Why is aiofiles slower than ordinary file operations?

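I am searching several log files for a given string and found that aiofiles is very slow. I am not sure whether I am using it incorrectly. Any pointers would be appreciated.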

    import asyncio
    import time

    import aiofiles

    files = [
        r'C:\log\20210523.log',
        r'C:\log\20210522.log',
        r'C:\log\20210521.log',
        r'C:\log\20210524.log',
        r'C:\log\20210525.log',
        r'C:\log\20210520.log',
        r'C:\log\20210519.log',
    ]


    async def match_content_in_file(filename: str, content: str, encoding: str = "gbk") -> bool:
        async with aiofiles.open(filename, mode="r", encoding=encoding) as f:
            # text = await f.read()
            # return content in text
            async for line in f:
                if content in line:
                    return True
        # No match: falls through and implicitly returns None.


    def match_content_in_file2(filename: str, content: str, encoding: str = "gbk") -> bool:
        with open(filename, mode="r", encoding=encoding) as f:
            # text = f.read()
            # return content in text
            for line in f:
                if content in line:
                    return True
        # No match: falls through and implicitly returns None.


    async def main3():
        start = time.time()
        tasks = [match_content_in_file(f, '808395') for f in files]
        l = await asyncio.gather(*tasks)
        print(l)
        end = time.time()
        print(end - start)


    def main2():
        start = time.time()
        l = []
        for f in files:
            l.append(match_content_in_file2(f, '808395'))
        print(l)
        end = time.time()
        print(end - start)


    if __name__ == '__main__':
        asyncio.run(main3())  # very slow
        main2()  # very fast

Measured results (each file is about 7.5 MB, times in seconds)

  • Reading line by line, the async version takes far longer.
[True, True, True, None, None, True, True]

Async: 40.80606389045715

-------------------------------------

[True, True, True, None, None, True, True]

Sync: 0.48870062828063965

  • Reading each file in one go, the async and sync versions are close, though sync is still slightly faster.
[True, True, True, False, False, True, True]

Async: 0.6835882663726807

-------------------------------------

[True, True, True, False, False, True, True]

Sync: 0.6745946407318115

Environment
Python 3.9.2, Windows 10


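Answer:

A disk is fastest when it reads one file at a time. Reading several files at once forces it to keep switching between disk blocks, which is slower.

Reading files is not like network communication. After a network request is sent you have to wait for the reply, and coroutines can use that waiting time to raise concurrency.
A disk does not work that way.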


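Answer:

aio is I/O multiplexing, so it can only help with I/O performance problems. Check the CPU: if a single core is already maxed out, coroutines will not improve performance.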
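
One way to act on the "check the CPU" suggestion is to compare wall-clock time with CPU time for a run. The sketch below is only an illustration: `measure` and `busy` are hypothetical helper names, and `busy` is a stand-in for the real scanning loop. If CPU time is close to wall time, the run was spent computing rather than waiting on I/O.

```python
import time


def measure(func, *args):
    """Run func(*args) and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # CPU time of this process, all threads
    result = func(*args)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return result, wall_used, cpu_used


def busy(n):
    """Hypothetical CPU-bound stand-in for the scanning loop."""
    total = 0
    for i in range(n):
        total += i
    return total


if __name__ == "__main__":
    result, wall, cpu = measure(busy, 2_000_000)
    # A ratio near 1.0 suggests the time went to computing, not to waiting.
    print(f"wall={wall:.3f}s cpu={cpu:.3f}s ratio={cpu / wall:.2f}")
```

With the code from the question, something like `measure(main2)` and `measure(lambda: asyncio.run(main3()))` would give the two figures to compare.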


Answer:

Why did my test show exactly the opposite? Environment: Python 3.8.2

    import asyncio
    import time

    files = [
        r'C:\log\20210523.log',
        r'C:\log\20210522.log',
        r'C:\log\20210521.log',
        r'C:\log\20210524.log',
        r'C:\log\20210525.log',
        r'C:\log\20210520.log',
        r'C:\log\20210519.log',
    ]


    def match_content_in_file(f, s):
        time.sleep(1)  # both versions sleep for 1 s


    async def match_content_in_file_asc(f, s):
        await asyncio.sleep(1)  # both versions sleep for 1 s


    async def main3():
        start = time.time()
        tasks = [match_content_in_file_asc(f, '808395') for f in files]
        l = await asyncio.gather(*tasks)
        print(l)
        end = time.time()
        print(end - start)


    def main2():
        start = time.time()
        l = []
        for f in files:
            l.append(match_content_in_file(f, '808395'))
        print(l)
        end = time.time()
        print(end - start)


    if __name__ == '__main__':
        asyncio.run(main3())  # very fast
        main2()  # very slow

outputs

/bin/python3 test.py

[None, None, None, None, None, None, None]

1.000645637512207

[None, None, None, None, None, None, None]

7.005064249038696
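
Note that this test replaces the file reading with a sleep, so it measures pure waiting. A rough sketch of why that matters: `asyncio.gather` only overlaps waits that are actually awaited, while a blocking call inside a coroutine still runs one after another. The function names below are made up for the illustration.

```python
import asyncio
import time


async def waits_cooperatively(delay):
    await asyncio.sleep(delay)  # yields to the event loop while waiting


async def blocks_the_loop(delay):
    time.sleep(delay)  # never yields, so the whole event loop stalls


async def timed_gather(worker, count, delay):
    """Run `count` copies of `worker` under gather, return elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(worker(delay) for _ in range(count)))
    return time.perf_counter() - start


if __name__ == "__main__":
    # Expected to take roughly one delay in total.
    print(asyncio.run(timed_gather(waits_cooperatively, 7, 0.1)))
    # Expected to take roughly seven delays in total.
    print(asyncio.run(timed_gather(blocks_the_loop, 7, 0.1)))
```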


Answer:

This test is interesting, so I ran it too.

  1. When none of the files exist, aiofiles is much faster
~/test ᐅ python3 -V

Python 3.8.1

~/test ᐅ python3 aiotest.py

[None, None, None, None, None, None, None]

1.0032050609588623

[None, None, None, None, None, None, None]

7.023258686065674

~/test ᐅ sw_vers

ProductName: Mac OS X

ProductVersion: 10.15.7

BuildVersion: 19H2

  2. When the files exist, aiofiles is far, far slower

    import asyncio
    import os
    import time
    from random import randint
    from pathlib import Path

    import aiofiles

    BASE_DIR = Path('log')

    files = [
        '20210523.log',
        '20210522.log',
        '20210521.log',
        '20210524.log',
        '20210525.log',
        '20210520.log',
        '20210519.log',
    ]


    def gen_files():
        if not BASE_DIR.exists():
            BASE_DIR.mkdir(parents=True)
        for fname in files:
            if not (p := BASE_DIR / fname).exists():
                # 1024*1024 seven-digit numbers, one per line (about 8 MB)
                nums = [randint(10**6, 10**7-1) for _ in range(1024*1024)]
                p.write_text('\n'.join(map(str, nums)))
                print(f'{p} created!')
        os.system(f'ls -lh {BASE_DIR}')


    async def match_content_in_file(filename: str, content: str, encoding: str = "gbk") -> bool:
        async with aiofiles.open(filename, mode="r", encoding=encoding) as f:
            # text = await f.read()
            # return content in text
            async for line in f:
                if content in line:
                    return True


    def match_content_in_file2(filename: str, content: str, encoding: str = "gbk") -> bool:
        with open(filename, mode="r", encoding=encoding) as f:
            # text = f.read()
            # return content in text
            for line in f:
                if content in line:
                    return True


    async def main3():
        print('Start async process...')
        start = time.time()
        tasks = [match_content_in_file(BASE_DIR/f, '808395') for f in files]
        l = await asyncio.gather(*tasks)
        print(l)
        end = time.time()
        print(end - start)


    def main2():
        print('Start sync process...')
        start = time.time()
        l = []
        for f in files:
            l.append(match_content_in_file2(BASE_DIR/f, '808395'))
        print(l)
        end = time.time()
        print(end - start)


    if __name__ == '__main__':
        gen_files()  # generate the test files
        asyncio.run(main3())  # very slow
        main2()  # very fast

    Result:

    total 114688

    -rw-r--r-- 1 lian staff 8.0M 6 12 00:24 20210519.log

    -rw-r--r-- 1 lian staff 8.0M 6 12 00:24 20210520.log

    -rw-r--r-- 1 lian staff 8.0M 6 12 00:23 20210521.log

    -rw-r--r-- 1 lian staff 8.0M 6 12 00:23 20210522.log

    -rw-r--r-- 1 lian staff 8.0M 6 12 00:23 20210523.log

    -rw-r--r-- 1 lian staff 8.0M 6 12 00:24 20210524.log

    -rw-r--r-- 1 lian staff 8.0M 6 12 00:24 20210525.log

    Start async process...

    [True, True, True, True, True, True, True]

    283.923513174057

    Start sync process...

    [True, True, True, True, True, True, True]

    0.46163487434387207
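
A possible reading of these numbers, offered as a sketch rather than a confirmed diagnosis: aiofiles describes itself as delegating file operations to a thread pool. Assuming each line read by `async for` costs one such round trip (an assumption, not checked against the aiofiles version used here), a file of 1024*1024 lines would need about a million thread hops, whereas one `read()` of the whole file needs a single hop, which would fit the near-equal timings reported in the question. The code below models both patterns with the standard library only. `scan_hop_per_line` and `scan_one_hop` are hypothetical names, and the sample file is generated just for the demo.

```python
import asyncio
import os
import tempfile


def scan_sync(path, needle, encoding="gbk"):
    """Blocking scan in the spirit of match_content_in_file2.

    Returns False (not None) when nothing matches.
    """
    with open(path, mode="r", encoding=encoding) as f:
        return any(needle in line for line in f)


async def scan_one_hop(path, needle):
    """Hand the whole scan to the default thread pool: one hop per file."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, scan_sync, path, needle)


async def scan_hop_per_line(path, needle, encoding="gbk"):
    """Assumed model of per-line async reading: one executor hop per line."""
    loop = asyncio.get_running_loop()
    hops = 0
    with open(path, mode="r", encoding=encoding) as f:
        while True:
            line = await loop.run_in_executor(None, f.readline)
            hops += 1
            if not line:  # empty string means end of file
                return False, hops
            if needle in line:
                return True, hops


async def demo():
    # Hypothetical small sample file; the real logs are not available here.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.log")
        with open(path, "w", encoding="gbk") as f:
            f.write("\n".join(str(n) for n in range(1000)) + "\n808395\n")
        found_fast = await scan_one_hop(path, "808395")
        found_slow, hops = await scan_hop_per_line(path, "808395")
        return found_fast, found_slow, hops


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Under that assumption, keeping the synchronous loop and wrapping it once per file, as `scan_one_hop` does, keeps the event loop free while paying for one hop per file instead of one per line. Whether the files are then read faster than in the plain synchronous version still depends on the disk.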
