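Background
I have an image dataset (similar to http://www.image-net.org/) that provides a "description with typos" for each image. I want to run a deep convolutional neural network on it, but I first need to generate "labels". So the question is:
Question
How can I generate category "labels" from the "descriptions with typos"?
Technical info
The dataset has about 13M images, each with a corresponding (valid) "description" and an optional "input". Some examples of "description" are as follows:
Ideas
I am thinking of approaching the problem in the following ways.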
Posted on 2014-08-12 18:46:05
Here are some ideas:
- Look at an information retrieval course and implement the spelling check yourself; search for lecture3-tolerant-retrieval-handout-6-per.pdf (though I bet this is not the way to go). If you want word frequencies, search for "Natural Language Corpus Data".
- Use existing spelling-correction code such as [http://norvig.com/spell-correct.html](http://norvig.com/spell-correct.html) (available in many languages).
- Use the approach described at [http://viget.com/extend/tagging-text-automatically](http://viget.com/extend/tagging-text-automatically). I have never used it, but it should work reasonably well.
- I would not recommend k-means, because you do not know the number of groups in advance.
- Using the most frequent word might work for a few examples (like the ones you show), but it will probably fail in many cases.
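To make the spelling-correction and most-frequent-word ideas concrete, here is a minimal sketch in the spirit of the Norvig corrector linked above. It is not the code from that page. It assumes English descriptions, builds the vocabulary from the descriptions themselves, and treats any word seen at least `min_count` times as correctly spelled. The names `build_vocab`, `correct`, `label_for`, and the tiny `STOPWORDS` set are hypothetical and exist only for illustration.

```python
import re
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"
# Tiny illustrative stopword list (an assumption; extend it for real data).
STOPWORDS = {"a", "an", "and", "the", "of", "on", "in", "to", "with"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocab(descriptions, min_count=2):
    """Count all tokens; tokens seen at least min_count times count as known."""
    counts = Counter(tok for d in descriptions for tok in tokenize(d))
    known = {w for w, c in counts.items() if c >= min_count}
    return counts, known


def edits1(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in LETTERS]
    inserts = [l + c + r for l, r in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correct(word, counts, known):
    """Return word if known, else the most frequent known word one edit away."""
    if word in known:
        return word
    candidates = edits1(word) & known
    return max(candidates, key=counts.__getitem__) if candidates else word


def label_for(description, counts, known):
    """Correct each token, drop stopwords, and return the most frequent word."""
    words = [correct(t, counts, known) for t in tokenize(description)]
    words = [w for w in words if w not in STOPWORDS]
    if not words:
        return None
    freq = Counter(words)
    # max() keeps the first word reaching the highest count, so ties
    # resolve to the earliest word in the description.
    return max(words, key=freq.__getitem__)


# Toy corpus, made up for this example.
corpus = [
    "red car, shiny red car, red paint",
    "black cat sleeping next to another cat",
    "a car on the road, old car",
    "cat and kitten, cat",
]
counts, known = build_vocab(corpus)
print(label_for("caar, old car, fast cra", counts, known))
```

With 13M descriptions the `min_count` threshold would need tuning, since a frequent typo can pass it and be treated as a real word. As the last bullet warns, picking the single most frequent word is a crude baseline and not a full labeling method.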
I hope this helps.
https://stackoverflow.com/questions/25273507