Checking for existence in a dict vs. a set in Python

Checking whether a key exists seems to be slightly faster with a set than with a dict:

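# Python 2 code (print statement, xrange)
import random
import string
import timeit

repeat = 3       # number of timing rounds; the minimum is reported
numbers = 1000   # executions of the statement per round

def time(statement, _setup=None):
    # Falls back to the module-level `setup` string defined further down.
    print min(
        timeit.Timer(statement, setup=_setup or setup).repeat(
            repeat, numbers))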

random.seed('slartibartfast')

# Integers
length = 100000
d = {}
for _ in range(length):
    d[random.randint(0, 10000000)] = 0
s = set(d)

setup = """from __main__ import s, d, length
"""

time('for i in xrange(length): check = i in d')
time('for i in xrange(length): check = i in s')

# Strings
d = {}
for _ in range(length):
    d[''.join(random.choice(string.ascii_uppercase) for __ in range(16))] = 0
s = set(d)

test_strings = []
for _ in range(length):
    # Note: this appends a generator object, not a 16-character string,
    # because the ''.join(...) used for the keys above is missing here.
    test_strings.append(random.choice(string.ascii_uppercase) for __ in range(16))

setup = """from __main__ import s, d, length, test_strings
"""

time('for i in test_strings: check = i in d')
time('for i in test_strings: check = i in s')

This prints something like:

10.1242966769
9.73939713014
10.5156763102
10.2767765061

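Is this to be expected, or is it a random artifact?

I am wondering whether it is worth creating sets for dict keys in performance-intensive code.

Edit: My measurements really made me wonder about the underlying implementation. I am not trying to save microseconds, I am just curious. And yes, if it turns out that the underlying implementation really favors sets, I could make a set of those dict keys, or not (I am actually patching legacy code).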
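For readers on Python 3, the measurement could be reproduced roughly as in the sketch below. It is not the code from the question: the helper name `best_time`, the smaller sizes, and the half-present/half-absent choice of probes are assumptions made for illustration, and the absolute numbers will vary by machine.

```python
import random
import string
import timeit

random.seed("slartibartfast")

LENGTH = 10_000  # hypothetical size, smaller than in the question
REPEAT = 3       # timing rounds; the minimum is reported
NUMBER = 20      # executions of the statement per round


def best_time(statement, namespace):
    """Return the fastest of REPEAT rounds, each running statement NUMBER times."""
    return min(timeit.repeat(statement, globals=namespace,
                             repeat=REPEAT, number=NUMBER))


def random_key():
    """Build a random 16-letter uppercase string."""
    return "".join(random.choice(string.ascii_uppercase) for _ in range(16))


d = {random_key(): 0 for _ in range(LENGTH)}
s = set(d)

# Half of the probes are existing keys, half are fresh random strings
# (which are almost certainly absent).
probes = list(d)[: LENGTH // 2] + [random_key() for _ in range(LENGTH // 2)]

namespace = {"d": d, "s": s, "probes": probes}
dict_time = best_time("for k in probes: k in d", namespace)
set_time = best_time("for k in probes: k in s", namespace)
print("dict: %.4f s, set: %.4f s" % (dict_time, set_time))
```

Passing `globals=namespace` replaces the `from __main__ import ...` setup string used above.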

Answer:

It probably depends on a variety of things. In my runs, dict lookups were slightly faster, but not enough to get excited about:

In [1]: import numpy as np

In [2]: d = {i: True for i in np.random.random(1000)}

In [3]: s = {i for i in np.random.random(1000)}

In [4]: checks = d.keys()[:500] + list(s)[:500]

In [5]: %timeit [k in d for k in checks]
10000 loops, best of 3: 83 µs per loop

In [6]: %timeit [k in s for k in checks]
10000 loops, best of 3: 88.4 µs per loop

In [7]: d = {i: True for i in np.random.random(100000)}

In [8]: s = {i for i in np.random.random(100000)}

In [9]: checks = d.keys()[:5000] + list(s)[:5000]

In [10]: %timeit [k in d for k in checks]
1000 loops, best of 3: 865 µs per loop

In [11]: %timeit [k in s for k in checks]
1000 loops, best of 3: 929 µs per loop

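Answer:

Honestly, it depends heavily on the hardware, the OS, and the data size and constraints. In general, performance will be nearly identical until you get to really large data sizes. Note that in a few of the runs here, dict does slightly better. At larger data structure sizes, internal implementation details start to dominate the difference, and on my machine set tends to perform better.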
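One reason the two are so close is that, in CPython, both `dict` and `set` are hash tables, so a membership test starts by hashing the probe in either case. The sketch below makes that visible with a hypothetical `CountingKey` class (not from the original post) that counts how often it is hashed:

```python
class CountingKey:
    """Hypothetical key type that counts how often it is hashed."""

    def __init__(self, value):
        self.value = value
        self.hash_calls = 0

    def __hash__(self):
        self.hash_calls += 1
        return hash(self.value)

    def __eq__(self, other):
        return isinstance(other, CountingKey) and self.value == other.value


d = {CountingKey(i): None for i in range(1000)}
s = set(d)
probe = CountingKey(42)  # equal to one of the stored keys

probe.hash_calls = 0
found_in_dict = probe in d
dict_hash_calls = probe.hash_calls

probe.hash_calls = 0
found_in_set = probe in s
set_hash_calls = probe.hash_calls

print(found_in_dict, dict_hash_calls)  # True 1
print(found_in_set, set_hash_calls)    # True 1
```

Each lookup hashes the probe exactly once, regardless of how many elements the container holds.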

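The reality is that in most cases the delta does not matter. If you really want better lookup performance, consider using Cython or moving to C-level operations, or use a library implementation designed for larger data sizes. Python's built-in types are not meant for performance once you reach a few million elements.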
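As for whether a separate set of keys is needed at all: the `in` operator works on the dict itself, and in Python 3 the `keys()` view is set-like, so set operators can be applied to it without copying the keys first. A small sketch with made-up data (`inventory` and `wanted` are hypothetical names):

```python
inventory = {"apple": 3, "pear": 0, "plum": 7}  # hypothetical data
wanted = ["apple", "kiwi", "plum"]

# Membership can be tested on the dict directly; no set is needed.
present = [name for name in wanted if name in inventory]

# In Python 3, dict.keys() returns a set-like view, so set operators work.
common = inventory.keys() & set(wanted)
unused = inventory.keys() - set(wanted)

print(present)         # ['apple', 'plum']
print(sorted(common))  # ['apple', 'plum']
print(unused)          # {'pear'}
```

In the Python 2 code used elsewhere on this page, `d.keys()` returns a plain list instead, so this applies to Python 3 only.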

>>> # With empty dict as setup in question
>>> time('for i in xrange(length): check = i in d')
2.83035111427
>>> time('for i in xrange(length): check = i in s')
2.87069892883

>>> d = { random.random(): None for _ in xrange(100000) }
>>> s = set(d)
>>> time('for i in xrange(length): check = i in d')
3.84766697884
>>> time('for i in xrange(length): check = i in s')
3.97955989838

>>> d = { random.randint(0, 1000000000): None for _ in xrange(100000) }
>>> s = set(d)
>>> time('for i in xrange(length): check = i in d')
3.96871709824
>>> time('for i in xrange(length): check = i in s')
3.62110710144

>>> d = { random.randint(0, 1000000000): None for _ in xrange(10000000) }
>>> s = set(d)
>>> time('for i in xrange(length): check = i in d')
10.6934559345
>>> time('for i in xrange(length): check = i in s')
5.7491569519

Source: utcz.com/qa/261456.html
