I tried the following code:
!pip install python-bidi
from wordcloud import WordCloud
from matplotlib import pyplot as plt
from bidi.algorithm import get_display
text="""মুস্তাফিজ"""
bidi_text = get_display(text)
print(bidi_text)
# https://github.com/amueller/word_cloud/issues/367
# https://stackoverflow.com/questions/54063438/create-wordcloud-in-python-for-foreign-language-hebrew
# https://www.omicronlab.com/bangla-fonts.html
rgx = r"[\u0980-\u09FF]+"
wordcloud = WordCloud(font_path='/content/Siyamrupali.ttf').generate(bidi_text)
#wordcloud = WordCloud(font_path='/content/FreeSansBold.ttf').generate(bidi_text)
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
Then I got this error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-87-56d899c0de07> in <module>()
12 # https://www.omicronlab.com/bangla-fonts.html
13 rgx = r"[\u0980-\u09FF]+"
---> 14 wordcloud = WordCloud(font_path='/content/Siyamrupali.ttf').generate(bidi_text)
15
16 #wordcloud = WordCloud(font_path='/content/FreeSansBold.ttf').generate(bidi_text)
2 frames
/usr/local/lib/python3.6/dist-packages/wordcloud/wordcloud.py in generate_from_frequencies(self, frequencies, max_font_size)
381 if len(frequencies) <= 0:
382 raise ValueError("We need at least 1 word to plot a word cloud, "
--> 383 "got %d." % len(frequencies))
384 frequencies = frequencies[:self.max_words]
385
ValueError: We need at least 1 word to plot a word cloud, got 0.
This line does not pick up the Bangla text: wordcloud = WordCloud(font_path='/content/Siyamrupali.ttf').generate(bidi_text)
I tried almost every Bangla font from here: https://www.omicronlab.com/bangla-fonts.html
Nothing worked.
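Posted on 2021-06-04 09:51:04
You defined rgx but never passed it to WordCloud, so the default regexp is still in use. When WordCloud processes the text, that default pattern fails to match the Bangla word and returns an empty list. Passing the rgx variable when you create the WordCloud object will solve your problem.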
wordcloud = WordCloud(font_path='/content/Siyamrupali.ttf',regexp=rgx).generate(bidi_text)
Here is the complete code snippet.
!pip install python-bidi
from wordcloud import WordCloud
from matplotlib import pyplot as plt
from bidi.algorithm import get_display
text="""মুস্তাফিজ"""
bidi_text = get_display(text)
print(bidi_text)
# https://github.com/amueller/word_cloud/issues/367
# https://stackoverflow.com/questions/54063438/create-wordcloud-in-python-for-foreign-language-hebrew
# https://www.omicronlab.com/bangla-fonts.html
rgx = r"[\u0980-\u09FF]+"
wordcloud = WordCloud(font_path='/content/Siyamrupali.ttf',
regexp=rgx).generate(bidi_text)
#wordcloud = WordCloud(font_path='/content/FreeSansBold.ttf').generate(bidi_text)
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
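To see why the original call reported zero words, it can help to run both patterns through Python's re module directly. This is a minimal sketch, assuming that WordCloud falls back to a pattern like r"\w[\w']+" when no regexp is given (an assumption about the installed version, so check your copy of wordcloud.py). The variable names are illustrative only.

```python
import re

text = "মুস্তাফিজ"

# Assumed WordCloud fallback pattern: a word character followed by
# at least one more word character or apostrophe.
default_pattern = r"\w[\w']+"

# Explicit range covering the Bengali Unicode block, as in the answer above.
bangla_pattern = r"[\u0980-\u09FF]+"

default_matches = re.findall(default_pattern, text)
bangla_matches = re.findall(bangla_pattern, text)

print("default pattern:", default_matches)
print("bangla pattern: ", bangla_matches)
```

In Python's re, \w does not match combining marks such as Bengali vowel signs and the virama, so every consonant in this word stands alone. A pattern that needs two consecutive word characters therefore finds nothing, while the explicit Bengali range matches the whole word.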
Posted on 2021-06-17 06:12:41
I generated a word cloud in Bangla using the code below. You can try it:
from wordcloud import WordCloud

def generate_Word_cloud(self, author_post, vocabularyWordnumber, img_file, stop_word_root_path):
    # Load the Bangla stop-word list, one word per line.
    stop_word_file = stop_word_root_path + '/stopwords-bn.txt'
    print(stop_word_file)
    with open(stop_word_file, "r", encoding="utf8") as f:
        stop_word = f.read().split("\n")
    print(stop_word)
    # Join all posts into one string for WordCloud to process.
    final_text = " ".join(author_post)
    print(final_text)
    wordcloud = WordCloud(stopwords=stop_word, font_path='/usr/share/fonts/truetype/freefont/kalpurush.ttf',
                          width=600, height=500, max_font_size=300, max_words=vocabularyWordnumber,
                          min_word_length=4, background_color="black").generate(final_text)
    # Write the rendered cloud to the given image path.
    wordcloud.to_file(img_file)
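If you would rather control tokenizing and stop-word filtering yourself, the counting step can be sketched with the standard library alone. This is only an illustration: count_bangla_words is a hypothetical helper, the sample stop word is assumed, and the pattern is the same Bengali block range used for rgx elsewhere in this thread.

```python
import re
from collections import Counter

# Same Bengali block range as the rgx variable used elsewhere in this thread.
BANGLA_WORD = re.compile(r"[\u0980-\u09FF]+")

def count_bangla_words(text, stop_words):
    """Count Bengali tokens in text, skipping anything listed in stop_words."""
    tokens = BANGLA_WORD.findall(text)
    return Counter(token for token in tokens if token not in stop_words)

name = "মুস্তাফিজ"  # the word from the question
stop = "এবং"        # sample stop word ("and"), assumed for illustration
frequencies = count_bangla_words(f"{name} {stop} {name}", {stop})
print(frequencies)
```

The resulting mapping could then be passed to a WordCloud object's generate_from_frequencies method, which skips WordCloud's own tokenizing step.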
Posted on 2021-10-14 10:56:37
I followed this comment and was finally able to solve the problem on Ubuntu.
Step 1
!sudo apt-get install libfreetype6-dev libharfbuzz-dev libfribidi-dev gtk-doc-tools
Step 2
!wget -O raqm-0.7.0.tar.gz https://raw.githubusercontent.com/python-pillow/pillow-depends/master/raqm-0.7.0.tar.gz
The raqm-0.7.0.tar.gz file should now be in the directory where you ran the download.
Step 3
!tar -xzvf raqm-0.7.0.tar.gz
Step 4
%cd raqm-0.7.0
Step 5
!./configure --prefix=/usr && make -j4 && sudo make -j4 install
Step 6
Now you just need to reinstall the Pillow library. Activate the correct environment, then run the following commands:
python3 -m pip install --upgrade pip
python3 -m pip install --upgrade Pillow
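That's it! You now have a working Pillow library that can correctly render Bengali and other Indic scripts in images.
Also, as @Farzana Eva suggested in their comment, you need to pass the rgx variable to the WordCloud object.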
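After reinstalling, one way to confirm that Pillow actually picked up Raqm is to query its features module. This is a small sketch rather than part of the original steps; pillow_has_raqm is a hypothetical helper name.

```python
def pillow_has_raqm():
    """Return True if the installed Pillow reports Raqm support, else False."""
    try:
        from PIL import features
        return bool(features.check("raqm"))
    except (ImportError, ValueError):
        # Pillow is not installed, or this Pillow version does not
        # recognise the "raqm" feature name.
        return False

raqm_available = pillow_has_raqm()
print("Raqm support:", raqm_available)
```

If this prints False after the rebuild, Pillow was probably installed into a different environment from the one you are running, or it could not find the Raqm library at install time.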
https://stackoverflow.com/questions/64629441