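I have used CountVectorizer to create word2vec representations for some text data. Now I want to group some of the resulting words (those that express a common meaning or aspect) into a new single word, and then obtain the new word2vec representation. How can I approach this?
from sklearn.feature_extraction.text import CountVectorizer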
import pandas as pd
import numpy as np
A = {'some_text': ('cat is red and fat', 'dog is blue hairy thin','horse i
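One possible approach, sketched under assumptions: map each word to a group label before counting, so that CountVectorizer (which produces bag-of-words counts) builds a single column per group. The `groups` mapping and the `group_tokens` helper are hypothetical names introduced for illustration, and only the two complete example sentences from the snippet above are used.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical mapping from individual words to the group word that replaces them.
groups = {"red": "colour", "blue": "colour", "fat": "build", "thin": "build"}

def group_tokens(text):
    """Split on whitespace and replace each word by its group label, if it has one."""
    return [groups.get(tok, tok) for tok in text.split()]

texts = ["cat is red and fat", "dog is blue hairy thin"]

# A custom tokenizer replaces the default token_pattern; the text is still
# lowercased by the vectorizer before it reaches group_tokens.
vectorizer = CountVectorizer(tokenizer=group_tokens, token_pattern=None)
counts = vectorizer.fit_transform(texts).toarray()

# vocabulary_ maps each feature (now including the group words) to its column index.
vocab = vectorizer.vocabulary_
print(sorted(vocab))
print(counts)
```

With this mapping, "red" and "blue" no longer get their own columns; both are counted under "colour". If the grouping should be applied after vectorizing instead, summing the corresponding columns of the count matrix would be an alternative.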
I need to join the word before and the word after each word that meets a condition. Specifically, the words that contain a comma.
vector <- c("Paulsen", "Kehr,", "Diego", "Schalper", "Sepúlveda,", "Diego")
#I know how to get which elements meet my condition:
grepl(",", vector)
#[1] FALSE TRUE FALSE FALSE TRUE FALSE
Desired output:
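The expected result is not shown, so the following is only one reading of the task: each element containing a comma is joined with the element immediately before it and the element immediately after it. It is a minimal sketch of that logic in Python rather than R, and `join_around_commas` is a hypothetical helper name.

```python
def join_around_commas(words):
    """Join every comma-containing element with its left and right neighbours.

    Assumption: matches do not overlap, and an element that lacks a neighbour
    on either side is left unchanged.
    """
    out = []
    i = 0
    while i < len(words):
        # words[i] is the left neighbour, words[i + 1] the element with the comma.
        if i + 2 < len(words) and "," in words[i + 1]:
            out.append(" ".join(words[i:i + 3]))
            i += 3
        else:
            out.append(words[i])
            i += 1
    return out

vector = ["Paulsen", "Kehr,", "Diego", "Schalper", "Sepúlveda,", "Diego"]
joined = join_around_commas(vector)
print(joined)
```

Under this assumption the six elements collapse into two strings, one per comma-containing element.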
I was experimenting with the similarity function in spaCy and noticed something I don't understand:
import spacy
nlp = spacy.load('en_core_web_sm')
doc1 = nlp("Honda Civic Toyota")
doc2 = nlp("Honda Civic Toyota car Christian God")
for token in doc1:
    print(token.text, doc1[0].similarity(token))
for token in doc2:
    print(token.text,
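For background, spaCy's `similarity` is by default a cosine similarity between vectors. The small pipelines (names ending in `sm`) do not ship static word vectors and only provide context-sensitive tensors, so the same token can score differently depending on the surrounding text. The sketch below only illustrates the cosine computation with NumPy; the three vectors are made-up values, not spaCy output.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(np.dot(u, v) / denom) if denom else 0.0

# Hypothetical 3-dimensional vectors, chosen only to illustrate the computation.
honda = np.array([0.9, 0.1, 0.0])
toyota = np.array([0.8, 0.2, 0.1])
god = np.array([0.0, 0.1, 0.9])

print(cosine_similarity(honda, toyota))  # close to 1: vectors point the same way
print(cosine_similarity(honda, god))     # close to 0: vectors are nearly orthogonal
```

If stable word-level similarities are the goal, a pipeline that includes static vectors (for example one ending in `md` or `lg`) is probably the better starting point.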
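Words appear multiple times when plotting t-SNE results for word embeddings.
I am reducing the dimensionality of Word2Vec word embeddings, but when I plot the results for a subset of most-similar words (I manually enter a few words whose most-similar words I want), the same words appear multiple times: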
from sklearn.manifold import TSNE
words = sum([[k] + v for k, v in similar_words.items()], [])
wvs = model.wv[words]
tsne = TSNE(n_components=3, random_state=0, n_iter=10000, perplexity=29)
np.set_printop
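One likely cause, assuming `similar_words` maps each query word to a list of its nearest neighbours: a word that is both a key and a neighbour, or a neighbour of several keys, ends up in `words` more than once, so it is embedded and plotted more than once. A minimal sketch of deduplicating while preserving order, using a hypothetical stand-in for `similar_words`:

```python
# Hypothetical stand-in for the similar_words dict from the question.
similar_words = {
    "king": ["queen", "prince"],
    "queen": ["king", "princess"],
}

# Same flattening as in the question: keys and neighbours end up in one list.
words = sum([[k] + v for k, v in similar_words.items()], [])

# dict.fromkeys keeps the first occurrence of each word and preserves order.
unique_words = list(dict.fromkeys(words))

print(words)
print(unique_words)
```

Passing `unique_words` instead of `words` to `model.wv[...]` and to the plot labels would then give one point per word.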