Counting occurrences of values in a Python list

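To count the time zones, we look at two approaches: one in pure Python, which takes a bit more work, and one using the `collections` module, which is simpler. We start with the pure Python approach. Among the 10 time zone entries, some are empty strings. As we iterate over the time zones, we keep the counts in a dictionary.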

time_zones = [
    'America/New_York',
    'America/Denver',
    'America/New_York',
    'America/Sao_Paulo',
    'America/New_York',
    'America/New_York',
    'Europe/Warsaw',
    '',
    '',
    '']

The pure Python approach

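def get_counts(sequence):
    counts = {}
    for x in sequence:
        if x in counts:
            counts[x] += 1      # seen before: increment its count
        else:
            counts[x] = 1       # first occurrence: start at 1
        print(counts)           # show the dict after each element is processed
    return counts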

print(get_counts(time_zones))

Output:

{'America/New_York': 1}
{'America/New_York': 1, 'America/Denver': 1}
{'America/New_York': 2, 'America/Denver': 1}
{'America/New_York': 2, 'America/Denver': 1, 'America/Sao_Paulo': 1}
{'America/New_York': 3, 'America/Denver': 1, 'America/Sao_Paulo': 1}
{'America/New_York': 4, 'America/Denver': 1, 'America/Sao_Paulo': 1}
{'America/New_York': 4, 'America/Denver': 1, 'America/Sao_Paulo': 1, 'Europe/Warsaw': 1}
{'America/New_York': 4, 'America/Denver': 1, 'America/Sao_Paulo': 1, 'Europe/Warsaw': 1, '': 1}
{'America/New_York': 4, 'America/Denver': 1, 'America/Sao_Paulo': 1, 'Europe/Warsaw': 1, '': 2}
{'America/New_York': 4, 'America/Denver': 1, 'America/Sao_Paulo': 1, 'Europe/Warsaw': 1, '': 3}
{'America/New_York': 4, 'America/Denver': 1, 'America/Sao_Paulo': 1, 'Europe/Warsaw': 1, '': 3}
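The `if`/`else` branch can be folded into a single line with `dict.get`, which returns a default value (here 0) when the key is missing. Below is a minimal sketch of that variant, not the original code: the name `get_counts_get` is hypothetical, the per-step `print` is dropped, and the `time_zones` list is repeated so the block runs on its own.

```python
def get_counts_get(sequence):
    """Count occurrences with a plain dict (hypothetical helper name)."""
    counts = {}
    for x in sequence:
        # dict.get returns 0 when x has not been seen yet
        counts[x] = counts.get(x, 0) + 1
    return counts


time_zones = [
    'America/New_York', 'America/Denver', 'America/New_York',
    'America/Sao_Paulo', 'America/New_York', 'America/New_York',
    'Europe/Warsaw', '', '', '']

print(get_counts_get(time_zones))
# {'America/New_York': 4, 'America/Denver': 1, 'America/Sao_Paulo': 1, 'Europe/Warsaw': 1, '': 3}
```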

Using the `collections` module

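(Translator's note: the explanation of defaultdict below is based on an answer I found on Stack Overflow, which is rather long, so here is the short version. With a regular dict, looking up a key that does not exist raises a KeyError. With a defaultdict, a missing key does not raise; instead it is created automatically, and its value is produced by the factory we passed in. Here the factory is int, and calling int() returns 0, so every new key starts at 0.)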
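The difference the note describes can be seen directly. This is a minimal sketch; the variable names `plain` and `counts` and the keys used are only for illustration.

```python
from collections import defaultdict

plain = {}
try:
    plain['Europe/Warsaw'] += 1    # the key does not exist yet
except KeyError as err:
    print('KeyError:', err)        # KeyError: 'Europe/Warsaw'

counts = defaultdict(int)          # int() == 0 is used for missing keys
counts['Europe/Warsaw'] += 1       # missing key is created as 0, then incremented
print(counts['Europe/Warsaw'])     # 1

# Even a plain read of a missing key inserts it with the default value.
print(counts['Asia/Tokyo'])        # 0
print(dict(counts))                # {'Europe/Warsaw': 1, 'Asia/Tokyo': 0}
```

The last two lines show a side effect worth knowing: with defaultdict, merely reading a key adds it to the dictionary.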

from collections import defaultdict

def get_counts2(sequence):
    counts = defaultdict(int)  # missing keys start at 0, because int() returns 0
    for x in sequence:
        counts[x] += 1
    return counts

print(get_counts2(time_zones))

Output:

defaultdict(<class 'int'>, {'America/New_York': 4, 'America/Denver': 1, 'America/Sao_Paulo': 1, 'Europe/Warsaw': 1, '': 3})
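`collections` also offers `Counter`, a `dict` subclass meant for exactly this kind of tallying: passing it an iterable counts every element in one call, so no explicit loop is needed. This is a sketch that swaps in `Counter` for `defaultdict`, not the approach used above; the 10-entry `time_zones` list is repeated so the block is self-contained.

```python
from collections import Counter

time_zones = [
    'America/New_York', 'America/Denver', 'America/New_York',
    'America/Sao_Paulo', 'America/New_York', 'America/New_York',
    'Europe/Warsaw', '', '', '']

counts = Counter(time_zones)           # tallies every element in one call
print(counts['America/New_York'])      # 4
print(counts[''])                      # 3

# A missing key reads as 0, but unlike defaultdict it is not inserted.
print(counts['Asia/Tokyo'])            # 0
print('Asia/Tokyo' in counts)          # False
```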

To get the top 10 time zones and their counts, we need a little dictionary manipulation:

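import pprint

# get_counts2 is used here because get_counts prints every intermediate dict
time_zones2 = get_counts2(time_zones)

def top_counts(count_dict, n=10):
    # swap to (count, tz) so that sorting orders the pairs by count
    vk_pairs = [(count, tz) for (tz, count) in count_dict.items()]
    vk_pairs.sort()          # ascending by count, ties broken by tz
    return vk_pairs[-n:]     # the last n pairs have the largest counts

    # Alternative: sort in descending order and take the first n
    # vk_pairs.sort(reverse=True)
    # return vk_pairs[:n]

pprint.pprint(top_counts(time_zones2))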

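Output (this listing comes from the full time zone dataset of the Chapter 14 example, not from the 10-entry sample above; on the sample, the same call returns only its five distinct entries):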

[(33, 'America/Sao_Paulo'),
 (35, 'Europe/Madrid'),  
 (36, 'Pacific/Honolulu'),
 (37, 'Asia/Tokyo'),
 (74, 'Europe/London'),
 (191, 'America/Denver'),
 (382, 'America/Los_Angeles'),
 (400, 'America/Chicago'),
 (521, ''),
 (1251, 'America/New_York')]
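The trick in `top_counts` is swapping each `(tz, count)` pair into `(count, tz)`: tuples compare element by element, so sorting them orders the pairs by count first, with ties broken by the time zone name. The standard library can also pick the top entries without building the swapped list. The sketch below (the helper name `top_counts_heap` is hypothetical) uses `heapq.nlargest` with a `key` function, alongside `Counter.most_common`; it runs on the 10-entry sample rather than the full dataset. Both return the largest count first, whereas `top_counts` returns the pairs in ascending order.

```python
import heapq
from collections import Counter

time_zones = [
    'America/New_York', 'America/Denver', 'America/New_York',
    'America/Sao_Paulo', 'America/New_York', 'America/New_York',
    'Europe/Warsaw', '', '', '']


def top_counts_heap(count_dict, n=10):
    """Return the n (tz, count) pairs with the largest counts, largest first."""
    # key=item[1] ranks pairs by their count, so no (count, tz) swap is needed
    return heapq.nlargest(n, count_dict.items(), key=lambda item: item[1])


counts = Counter(time_zones)
print(top_counts_heap(counts, n=2))   # [('America/New_York', 4), ('', 3)]
print(counts.most_common(2))          # [('America/New_York', 4), ('', 3)]
```

`heapq.nlargest` avoids sorting the whole list, which can help when there are many distinct keys and `n` is small; for a handful of entries, plain sorting works just as well.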

CHAPTER 14 Data Analysis Examples
