@yosuke-yasuda
Created May 19, 2016 01:55
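# Tokenize a Japanese text column with MeCab (via RMeCab), one row per token.
#   tbl      : data frame containing the text column
#   text_col : unquoted name of the text column
#   .drop    : if TRUE, replace the text column with the tokens; if FALSE, keep it
# Adds two columns: .token (the token) and .pos (its part of speech).
mecab_tokenize <- function(tbl, text_col, .drop = TRUE) {
  # Fail early if the required packages are not installed
  loadNamespace("RMeCab")
  loadNamespace("tidyr")
  # Capture the unquoted column name as a string
  text_cname <- as.character(substitute(text_col))
  # RMeCabC expects character input, so convert factors first
  text <- as.character(tbl[[text_cname]])
  # RMeCabC returns a list of tokens, each named by its part of speech
  tokenize <- function(text) {
    tokens <- unlist(RMeCab::RMeCabC(text))
    data.frame(.token = unname(tokens), .pos = names(tokens), stringsAsFactors = FALSE)
  }
  if (.drop) {
    # Overwrite the text column with a list column of per-row token tables
    tbl[[text_cname]] <- lapply(text, tokenize)
    token_col <- text_cname
  } else {
    # Keep the original text and store the token tables in a new list column
    tbl$.token <- lapply(text, tokenize)
    token_col <- ".token"
  }
  # unnest_() is deprecated; unnest() with all_of() needs tidyr >= 1.0.0
  tidyr::unnest(tbl, cols = tidyselect::all_of(token_col))
}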