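To find and plot n-gram frequencies in R, you can follow the steps below: preprocess the text with tm, split it into character n-grams (here bigrams), count them, and plot the counts with ggplot2.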
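# Install the packages (only needed once), then load them
install.packages(c("tm", "ggplot2"))
library(tm)
library(ggplot2)
# Sample text: each element becomes one document
text <- c("This is a sample sentence.", "Another sentence for testing.")
# VCorpus transforms each document separately. Corpus() would return a SimpleCorpus,
# which applies the function to all texts at once and breaks the per-document n-grams.
corpus <- VCorpus(VectorSource(text))
# Lower-case the text, then strip punctuation and digits
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
# Character n-grams: all substrings of length k for each k in n (n may be a vector, e.g. 2:3)
ngram <- function(x, n) {
  x <- paste(x, collapse = " ")
  unlist(lapply(n, function(k) {
    if (nchar(x) < k) return(character(0))
    starts <- seq_len(nchar(x) - k + 1)
    substring(x, starts, starts + k - 1)
  }))
}
ngram_corpus <- tm_map(corpus, content_transformer(ngram), n = 2)  # 2-gram example
# Collect the n-grams of all documents and count them, most frequent first
ngram_freq <- table(unlist(lapply(ngram_corpus, content)))
ngram_freq <- sort(ngram_freq, decreasing = TRUE)
df <- data.frame(ngram = names(ngram_freq), freq = as.numeric(ngram_freq))
# Plot the 20 most frequent n-grams; reorder() keeps the bars in frequency order
df <- head(df, 20)
ggplot(df, aes(x = reorder(ngram, -freq), y = freq)) +
  geom_bar(stat = "identity") +
  labs(x = "n-gram", y = "frequency") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))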
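The `ngram()` function above works on characters, so its bigrams include pairs such as "s " that span a space. If word n-grams are wanted instead, RWeka's `NGramTokenizer` (configured with `Weka_control(min = 2, max = 2)`) is a common choice, but it needs Java. Below is a minimal base-R sketch under stated assumptions. `word_ngrams` is a hypothetical helper, not part of tm or RWeka. The input is assumed to be lower-cased and already stripped of punctuation and digits, as the preprocessing step does.

```r
# Hypothetical helper (not from tm/RWeka): split a string on whitespace and
# paste together every run of n consecutive words
word_ngrams <- function(x, n = 2) {
  words <- strsplit(trimws(x), "\\s+")[[1]]
  words <- words[nzchar(words)]
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(starts, function(i) paste(words[i:(i + n - 1)], collapse = " "), character(1))
}

# Assumption: text is already lower-cased and free of punctuation and digits
texts <- c("this is a sample sentence", "another sentence for testing")

# Count word bigrams across all documents, most frequent first
bigrams <- unlist(lapply(texts, word_ngrams, n = 2))
bigram_freq <- sort(table(bigrams), decreasing = TRUE)
word_df <- data.frame(ngram = names(bigram_freq), freq = as.numeric(bigram_freq))
print(word_df)
```

`word_df` has the same `ngram`/`freq` columns as `df`, so the same `ggplot()` call can plot it.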
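This lets you find and plot n-gram frequencies in R. Note that the code above is only an example. In practice you may need to adjust it, for example the value of n, the preprocessing steps, or the number of n-grams plotted.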
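A typical adjustment is to change n or compare several orders side by side. Below is a minimal sketch that assumes character n-grams and the lower-cased sample text. The helpers `char_ngrams` and `top_ngrams` and the cut-off `k` are hypothetical names. The sketch stacks the top k n-grams for each order into one long data frame, breaking ties alphabetically so the output is stable.

```r
# Hypothetical helper: every substring of length n of one string
char_ngrams <- function(x, n) {
  if (nchar(x) < n) return(character(0))
  starts <- seq_len(nchar(x) - n + 1)
  substring(x, starts, starts + n - 1)
}

# Hypothetical helper: top-k character n-grams for each order in ns, in one data frame
top_ngrams <- function(texts, ns = 1:3, k = 10) {
  per_n <- lapply(ns, function(n) {
    counts <- table(unlist(lapply(texts, char_ngrams, n = n)))
    df <- data.frame(n = rep(n, length(counts)),
                     ngram = names(counts),
                     freq = as.integer(counts))
    # Most frequent first; ties broken alphabetically
    df <- df[order(-df$freq, df$ngram), ]
    head(df, k)
  })
  out <- do.call(rbind, per_n)
  rownames(out) <- NULL
  out
}

# Sample text after lower-casing and removing punctuation
texts <- c("this is a sample sentence", "another sentence for testing")
all_df <- top_ngrams(texts, ns = 1:3, k = 5)
print(all_df)
```

You can pass the result to ggplot2 and split it into one panel per order with `facet_wrap(~ n, scales = "free_x")`.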