Checking existence in dict vs set in Python
Checking keys against a set seems to be slightly faster than checking them against a dict:
import random
import string
import timeit
repeat = 3
numbers = 1000
def time(statement, _setup=None):
    print min(
        timeit.Timer(statement, setup=_setup or setup).repeat(
            repeat, numbers))
random.seed('slartibartfast')
# Integers
length = 100000
d = {}
for _ in range(length):
    d[random.randint(0, 10000000)] = 0
s = set(d)
setup = """from __main__ import s, d, length
"""
time('for i in xrange(length): check = i in d')
time('for i in xrange(length): check = i in s')
# Strings
d = {}
for _ in range(length):
    d[''.join(random.choice(string.ascii_uppercase) for __ in range(16))] = 0
s = set(d)
test_strings = []
for _ in range(length):
    # ''.join builds a real 16-character string; appending the bare generator
    # expression would test generator objects, which are never keys of d or s
    test_strings.append(''.join(random.choice(string.ascii_uppercase) for __ in range(16)))
setup = """from __main__ import s, d, length, test_strings
"""
time('for i in test_strings: check = i in d')
time('for i in test_strings: check = i in s')
Prints:
10.1242966769
9.73939713014
10.5156763102
10.2767765061
Is this expected, or a random artifact?
I'm wondering whether it's worth creating sets of dict keys in performance-intensive code.
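On the question of building a set from the dict keys: in Python 3 (the code in this thread is Python 2), `d.keys()` already returns a set-like view that supports `in` and the set operators without copying anything. A small sketch; the dict contents and the `wanted` set are made up for illustration:

```python
# Python 3: dict.keys() returns a set-like view backed by the dict itself.
d = {"alpha": 1, "beta": 2, "gamma": 3}
keys_view = d.keys()  # no copy is made here

has_beta = "beta" in keys_view          # same hash lookup as "beta" in d
wanted = {"beta", "delta"}              # hypothetical keys to look for
common = keys_view & wanted             # set operators return ordinary sets
missing_from_wanted = keys_view - wanted

# The view is live: keys added to the dict show up without rebuilding anything.
d["delta"] = 4
has_delta = "delta" in keys_view

# An explicit set is a separate O(n) copy that must be kept in sync by hand.
key_set = set(d)
print(has_beta, common, missing_from_wanted, has_delta, key_set == keys_view)
```

The trade-off is that a separate set doubles the bookkeeping (and memory for the table) in exchange for, at best, a small lookup difference.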
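Edit: My measurements really made me curious about the underlying implementation. I'm not trying to save microseconds, I'm just curious. And yes, if it turns out the underlying implementation really favors sets, I could make a set of those dict keys, or not (I'm actually patching legacy code).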
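Whether a gap of a few percent is a real effect or a random artifact can be judged roughly by comparing it with the run-to-run spread of each measurement. A minimal sketch in Python 3; the workload (100,000 random integer keys probed with 0..99,999, so mostly misses), the seed, and the repeat count are arbitrary assumptions loosely modeled on the integer test above:

```python
import random
import timeit

# Illustrative workload (an assumption): 100,000 random integer keys,
# probed with 0..99,999, so most lookups miss.
random.seed(0)
d = {random.randint(0, 10_000_000): 0 for _ in range(100_000)}
s = set(d)
probes = list(range(100_000))


def run(container, repeat=7):
    """Return `repeat` wall-clock timings of one full pass over the probes."""
    return timeit.repeat(lambda: [p in container for p in probes],
                         repeat=repeat, number=1)


dict_times = run(d)
set_times = run(s)

for name, times in (("dict", dict_times), ("set", set_times)):
    spread = max(times) - min(times)
    print(f"{name}: best {min(times):.4f}s, spread {spread:.4f}s")

# Rough rule of thumb: if the gap between the best runs is smaller than
# either spread, the difference is within run-to-run noise on this machine.
gap = abs(min(dict_times) - min(set_times))
print(f"gap between best runs: {gap:.4f}s")
```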
Answer:
It probably depends on all sorts of things. In my runs, dict lookups were slightly faster, but not by enough to get excited about:
In [1]: import numpy as np
In [2]: d = {i: True for i in np.random.random(1000)}
In [3]: s = {i for i in np.random.random(1000)}
In [4]: checks = d.keys()[:500] + list(s)[:500]
In [5]: %timeit [k in d for k in checks]
10000 loops, best of 3: 83 µs per loop
In [6]: %timeit [k in s for k in checks]
10000 loops, best of 3: 88.4 µs per loop
In [7]: d = {i: True for i in np.random.random(100000)}
In [8]: s = {i for i in np.random.random(100000)}
In [9]: checks = d.keys()[:5000] + list(s)[:5000]
In [10]: %timeit [k in d for k in checks]
1000 loops, best of 3: 865 µs per loop
In [11]: %timeit [k in s for k in checks]
1000 loops, best of 3: 929 µs per loop
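The session above is Python 2: in Python 3, `d.keys()` is a view and `d.keys()[:500]` raises a `TypeError`. A rough Python 3 port using `timeit` from the standard library is sketched below; it substitutes `random.random()` for NumPy's generator, and the function name `compare` and its `number` parameter are illustrative choices:

```python
import random
import timeit


def compare(n, n_checks, number=100):
    """Time membership tests against a dict and a set of n random floats.

    Half of the n_checks probe keys come from the dict and half from the set,
    so each container sees roughly 50% hits and 50% misses.
    Returns (dict_us, set_us): best time per pass in microseconds.
    """
    d = {random.random(): True for _ in range(n)}
    s = {random.random() for _ in range(n)}
    half = n_checks // 2
    # Python 3: d.keys() is a view and cannot be sliced, so copy to a list first.
    checks = list(d)[:half] + list(s)[:half]
    t_dict = min(timeit.repeat(lambda: [k in d for k in checks],
                               repeat=3, number=number))
    t_set = min(timeit.repeat(lambda: [k in s for k in checks],
                              repeat=3, number=number))
    return t_dict / number * 1e6, t_set / number * 1e6


if __name__ == "__main__":
    for n, n_checks in ((1000, 1000), (100_000, 10_000)):
        dict_us, set_us = compare(n, n_checks)
        print(f"n={n}: dict {dict_us:.1f} us, set {set_us:.1f} us per pass")
```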
Answer:
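Honestly, it depends heavily on the hardware, the OS, and the size of and constraints on your data. In general, performance will be nearly identical until you reach really large data sizes. Note that in a couple of the runs here, dict does slightly better. At larger data-structure sizes, internal implementation details start to dominate the difference, and on my machine set tends to perform better.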
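One implementation detail worth looking at is each container's memory footprint, since a larger hash table can mean more cache misses on random lookups. In CPython 3.6 and later, for instance, a dict lookup goes through a separate index array into an entries array that also stores the values, while a set probes a single table of keys and hashes. A minimal sketch that measures the footprint on your own interpreter, assuming CPython (where `sys.getsizeof` includes the internal table); the numbers vary by version, so none are listed here:

```python
import sys


def footprint(n):
    """Return (dict_bytes, set_bytes) for containers holding the ints 0..n-1.

    sys.getsizeof reports the container's own memory (including its hash
    table in CPython) but not the memory of the key objects themselves.
    """
    d = dict.fromkeys(range(n))
    s = set(range(n))
    return sys.getsizeof(d), sys.getsizeof(s)


if __name__ == "__main__":
    for n in (1_000, 100_000, 1_000_000):
        d_bytes, s_bytes = footprint(n)
        print(f"n={n:>9,}: dict {d_bytes:>12,} bytes, set {s_bytes:>12,} bytes")
```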
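The reality is that in most cases the delta doesn't matter. If you really want better lookup performance, consider using Cython or moving to C-level operations, or use a library implementation designed for larger data sizes. Python's base types aren't built for performance once you reach a few million elements.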
>>> # With empty dict as setup in question
>>> time('for i in xrange(length): check = i in d')
2.83035111427
>>> time('for i in xrange(length): check = i in s')
2.87069892883
>>> d = { random.random(): None for _ in xrange(100000) }
>>> s = set(d)
>>> time('for i in xrange(length): check = i in d')
3.84766697884
>>> time('for i in xrange(length): check = i in s')
3.97955989838
>>> d = { random.randint(0, 1000000000): None for _ in xrange(100000) }
>>> s = set(d)
>>> time('for i in xrange(length): check = i in d')
3.96871709824
>>> time('for i in xrange(length): check = i in s')
3.62110710144
>>> d = { random.randint(0, 1000000000): None for _ in xrange(10000000) }
>>> s = set(d)
>>> time('for i in xrange(length): check = i in d')
10.6934559345
>>> time('for i in xrange(length): check = i in s')
5.7491569519
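The advice to move to C-level operations can be partly applied without Cython. For bulk membership tests, the per-item loop can be pushed into C with `map()` over the set's bound `__contains__` method, or with `set.intersection` when only the matching keys matter. This is a standard-library technique substituted for illustration, not the Cython route the answer mentions; the sketch below is Python 3 with an arbitrary workload, and whether it beats a comprehension on your data is worth timing rather than assuming:

```python
import random

# Illustrative data (assumed): 100,000 random integer keys and 100,000 probes.
random.seed(1)
s = {random.randint(0, 10_000_000) for _ in range(100_000)}
queries = list(range(100_000))

# Python-level loop: the interpreter executes bytecode for every query.
hits_loop = [q in s for q in queries]

# map() over the bound method s.__contains__: the iteration happens inside
# the C implementation of map, with no per-item bytecode.
hits_map = list(map(s.__contains__, queries))

# If only the matching values are needed, set.intersection does the whole
# scan in C and returns them as a new set.
found = s.intersection(queries)

print(sum(hits_map), len(found))
```

The same `map()` pattern works for a dict through `d.__contains__`.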
Source: utcz.com/qa/261456.html