Q: Counting words in a list

Stack Overflow user

Asked 2019-09-15 08:04:49

1 answer, 57 views, 0 followers, 0 votes

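I have a project whose goal is to count the words in a string. One of the most obvious approaches is to split the string into a list and then have the program check whether any list items are the same; finally, store each word as a dictionary key and the number of times it repeats as the value. I did this, but I get the error message "list indices must be integers or slices, not str". How can I fix this problem (code below)?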

words = content_string.lower()
punctuation = ["'", '"', ',', '.', '?', '!', ':', ';', '()','-']
words = "".join(i if i not in punctuation else "" for i in words)
words = words.split()

i = 0
counts = dict()

for i in words:
    if words[i] in counts:
        counts[words[i]] += 1
    else:
        counts[words[i]] = 1

sorted_counts = sorted(counts.items(), key=operator.itemgetter(1), reverse=True)
for i in len(range(9)):
    print(count[i])

python-3.x

python

1 Answer

Stack Overflow user

Answered 2019-09-15 08:31:49

Use collections.Counter (8.3. collections)

import collections
from pprint import pprint

content_string = 'I am having a project of which the goal is to count words in a string (unigrams). One of the most obvious ways to approach this is by splitting the string up to lists and then have the program run so it can see if any list items are the same; finally, put the word as the key of a dictionary, and the times of repetition as the key of the dictionary. I did this, but the error message appears of "list indices must be integers or slices, not str". What are some ways to fix this problem (code below).'

words = content_string.lower()
punctuation = ["'", '"', ',', '.', '?', '!', ':', ';', '(',')','-']
words = "".join(i if i not in punctuation else "" for i in words)
words = words.split()

word_count = collections.Counter()
for word in words:
    word_count[word] += 1

pprint(word_count.most_common())

Result

[('the', 11),
 ('of', 6),
 ('to', 4),
 ('a', 3),
 ('this', 3),
 ('i', 2),
 ('is', 2),
 ('string', 2),
 ('ways', 2),
 ('and', 2),
 ('list', 2),
 ('are', 2),
 ('as', 2),
 ('key', 2),
 ('dictionary', 2),
 ('am', 1),
 ('having', 1),
 ('project', 1),
 ('which', 1),
 ('goal', 1),
 ('count', 1),
 ('words', 1),
 ('in', 1),
 ('unigrams', 1),
 ('one', 1),
 ('most', 1),
 ('obvious', 1),
 ('approach', 1),
 ('by', 1),
 ('splitting', 1),
 ('up', 1),
 ('lists', 1),
 ('then', 1),
 ('have', 1),
 ('program', 1),
 ('run', 1),
 ('so', 1),
 ('it', 1),
 ('can', 1),
 ('see', 1),
 ('if', 1),
 ('any', 1),
 ('items', 1),
 ('same', 1),
 ('finally', 1),
 ('put', 1),
 ('word', 1),
 ('times', 1),
 ('repetition', 1),
 ('did', 1),
 ('but', 1),
 ('error', 1),
 ('message', 1),
 ('appears', 1),
 ('indices', 1),
 ('must', 1),
 ('be', 1),
 ('integers', 1),
 ('or', 1),
 ('slices', 1),
 ('not', 1),
 ('str', 1),
 ('what', 1),
 ('some', 1),
 ('fix', 1),
 ('problem', 1),
 ('code', 1),
 ('below', 1)]
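
As a side note, Counter can also be built directly from the list, and most_common accepts a limit, which covers the "print the top few" step the question was attempting. The following is only a minimal sketch; the sample sentence is made up for illustration and is not from the original post.

```python
import collections

# Hypothetical sample text, chosen only to keep the output short.
sample = "the cat and the dog and the bird"
words = sample.split()

# Counter accepts any iterable of hashable items and tallies them in one pass.
word_count = collections.Counter(words)

# most_common(n) returns the n highest counts as (word, count) tuples.
top_two = word_count.most_common(2)
print(top_two)  # [('the', 3), ('and', 2)]
```

Calling most_common() with no argument, as in the answer above, returns every word ordered from most to least frequent.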

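P.S. In for i in words:, i is actually a word, not an index. If you want both an index and a word, you can use for i, word in enumerate(words):, but, as you can see, using Counter solves the problem in a shorter way.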
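
To make the difference concrete, here is a small sketch that reproduces the error from the question and then shows what enumerate yields. The three-word list is a made-up example, not data from the post.

```python
# Hypothetical word list, used only for illustration.
words = ["red", "green", "red"]

# Iterating a list directly yields its items, so each item here is a str.
items = list(words)

# Using a str as a list index raises TypeError, the error seen in the question.
try:
    words[items[0]]
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# enumerate yields (index, item) pairs when the position is needed as well.
pairs = [(i, word) for i, word in enumerate(words)]
print(pairs)  # [(0, 'red'), (1, 'green'), (2, 'red')]
```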

Anyway, without Counter, you can solve it like this:

from pprint import pprint

content_string = 'I am having a project of which the goal is to count words in a string (unigrams). One of the most obvious ways to approach this is by splitting the string up to lists and then have the program run so it can see if any list items are the same; finally, put the word as the key of a dictionary, and the times of repetition as the key of the dictionary. I did this, but the error message appears of "list indices must be integers or slices, not str". What are some ways to fix this problem (code below).'

words = content_string.lower()
punctuation = ["'", '"', ',', '.', '?', '!', ':', ';', '(',')','-']
words = "".join(i if i not in punctuation else "" for i in words)
words = words.split()

word_count = {}

for word in words:
    try:
        word_count[word] += 1
    except KeyError:
        word_count[word] = 1

word_count = sorted(word_count.items(), key=lambda x: x[1], reverse=True)
pprint(word_count)
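
For comparison, here is one possible way to keep the structure of the question's own code (a plain dict, operator.itemgetter, printing the top entries) while avoiding the indexing error. This is a sketch under assumptions: the function name top_words and its n parameter are hypothetical, and dict.get is swapped in for the try/except used above.

```python
import operator

def top_words(text, n=9):
    """Return the n most frequent words in text as (word, count) pairs."""
    punctuation = "'\",.?!:;()-"
    # Drop punctuation character by character, then split on whitespace.
    cleaned = "".join(ch for ch in text.lower() if ch not in punctuation)
    counts = {}
    for word in cleaned.split():
        # dict.get supplies 0 for a word that has not been seen before.
        counts[word] = counts.get(word, 0) + 1
    ranked = sorted(counts.items(), key=operator.itemgetter(1), reverse=True)
    # Slicing is safe even when there are fewer than n distinct words.
    return ranked[:n]

print(top_words("A b a, b! a c.", n=2))  # [('a', 3), ('b', 2)]
```

Slicing the sorted list replaces the question's for i in len(range(9)) loop, which would fail because len(range(9)) is an int and cannot be iterated.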