Inverting a dictionary when some of the original values are the same

小能豆


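Suppose I have a dictionary named word_counter_dictionary that counts how many times each word appears in a document, in the form {'word' : number}. For example, the word "secondly" appears once, so its key/value pair is {'secondly' : 1}. I want to build an inverted dictionary in which the numbers become the keys and the words become their values, so that I can plot the 25 most frequently used words. I've read somewhere that setdefault() might come in handy, but I can't use it, because so far the course I'm taking has only covered get().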

inverted_dictionary = {}
for key in word_counter_dictionary:
    new_key = word_counter_dictionary[key]
    inverted_dictionary[new_key] = word_counter_dictionary.get(new_key, '') + str(key)   
    inverted_dictionary

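With the code above, this works fine until it reaches another word with the same value. For example, the word "saves" also appears once in the document, so Python adds the new key/value pair as usual. However, the new pair overwrites {1 : 'secondly'}, so only {1 : 'saves'} is left in the dictionary.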

So, in short, my goal is to get all of the words, grouped by their number of occurrences, into this new dictionary called inverted_dictionary.


2024-11-20

1 answer

小能豆

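The problem is that, while the inverted dictionary is being built, a word with the same number of occurrences as an earlier word overwrites that earlier word. To avoid this, each occurrence count needs to map to all of the words that have that count.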

We can change the code so that each count stores a list of all the words with that count, so no word is lost.

Solution

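First, make sure that the values in inverted_dictionary are lists rather than single strings. When a count that is already present comes up again, append the new word to that count's list instead of overwriting it.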

Modified code:

word_counter_dictionary = {
    'secondly': 1,
    'saves': 1,
    'apple': 3,
    'banana': 2,
    'orange': 3,
    'grape': 2
}

inverted_dictionary = {}

for word, count in word_counter_dictionary.items():
    # setdefault ensures that every count maps to a list of words
    inverted_dictionary.setdefault(count, []).append(word)

# Print the inverted dictionary
print(inverted_dictionary)

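Explanation:

setdefault(): this method returns the value stored under a key. If the key does not exist yet, it first inserts the key with the given default value (here, an empty list) and returns that default. Either way, we can then add the word to the returned list with append().
inverted_dictionary.setdefault(count, []): for each word, this checks whether its count is already a key in inverted_dictionary. If it is not, the key is initialized to an empty list; the word is then appended to that list.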
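The question mentions that only get() has been covered in the course so far. A minimal sketch of the same grouping using get() alone, reusing the sample word_counter_dictionary from above: read the current list for the count (or an empty list if the count is new), add the word, and store the list back.

```python
word_counter_dictionary = {
    'secondly': 1,
    'saves': 1,
    'apple': 3,
    'banana': 2,
    'orange': 3,
    'grape': 2
}

inverted_dictionary = {}

for word, count in word_counter_dictionary.items():
    # get() returns the existing list for this count, or a fresh [] if the count is not a key yet
    words = inverted_dictionary.get(count, [])
    words.append(word)
    # Store the list back so that a count seen for the first time becomes a key
    inverted_dictionary[count] = words

print(inverted_dictionary)
# {1: ['secondly', 'saves'], 3: ['apple', 'orange'], 2: ['banana', 'grape']}
```

Note that get() has to be called on inverted_dictionary. The code in the question calls word_counter_dictionary.get(new_key, ''), which looks up a number in a dictionary whose keys are words, so it always returns '' and each new word replaces the previous one.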

Result:

If you run the code above, inverted_dictionary contains the following:

{
    1: ['secondly', 'saves'],
    3: ['apple', 'orange'],
    2: ['banana', 'grape']
}

Next step: plotting the 25 most common words

To go on and plot the 25 most frequent words by count, you can:

Sort inverted_dictionary by count.
Extract the 25 most common words.

For example:

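import matplotlib.pyplot as plt

# Sort the inverted dictionary by count, in descending order
sorted_inverted = sorted(inverted_dictionary.items(), reverse=True, key=lambda x: x[0])

# Walk the counts from highest to lowest, collecting words until 25 are gathered
# (each count can hold several words, so slicing the counts would not cap the words)
top_25_words = []
for count, words in sorted_inverted:
    top_25_words.extend(words)
    if len(top_25_words) >= 25:
        break
top_25_words = top_25_words[:25]

# Plot a bar for each of the top 25 words, with its count as the bar height
plt.bar(top_25_words, [word_counter_dictionary[word] for word in top_25_words])
plt.xticks(rotation=90)
plt.show()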

This plots the 25 most common words and their counts, in descending order of count.
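If the only goal is the top 25 words, the original word_counter_dictionary can also be sorted by its values directly, without building the inverted dictionary first. A minimal sketch with plotting left out; the names sorted_pairs and top_25 are illustrative:

```python
word_counter_dictionary = {
    'secondly': 1,
    'saves': 1,
    'apple': 3,
    'banana': 2,
    'orange': 3,
    'grape': 2
}

# Sort (word, count) pairs by count, highest first; sorted() is stable even with
# reverse=True, so words with equal counts keep their original dictionary order
sorted_pairs = sorted(word_counter_dictionary.items(), key=lambda pair: pair[1], reverse=True)

# Keep at most 25 (word, count) pairs
top_25 = sorted_pairs[:25]

for word, count in top_25:
    print(word, count)
```

The words and counts in top_25 can be passed to plt.bar() the same way as in the plotting code above. If the course allows it, collections.Counter provides the same result through its most_common(25) method.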

2024-11-20