Why is aiofiles slower than ordinary file operations?

Z时代
2024-03-12
Category: IT

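I am searching several log files to check whether each one contains a given string, and I found aiofiles to be very slow. Am I using it incorrectly? Any pointers would be appreciated.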

import asyncio
import time

import aiofiles

files = [
    r'C:\log\20210523.log',
    r'C:\log\20210522.log',
    r'C:\log\20210521.log',
    r'C:\log\20210524.log',
    r'C:\log\20210525.log',
    r'C:\log\20210520.log',
    r'C:\log\20210519.log',
]
async def match_content_in_file(filename:str,content:str,encoding:str="gbk")->bool:
    async with aiofiles.open(filename,mode="r",encoding=encoding) as f:
        # text = await f.read()
        # return content in text
        async for line in f:
            if content in line:
                return True
def match_content_in_file2(filename:str,content:str,encoding:str="gbk")->bool:
    with open(filename,mode="r",encoding=encoding) as f:
        # text = f.read()
        # return content in text
        for line in f:
            if content in line:
                return True
async def main3():
    start = time.time()
    tasks = [match_content_in_file(f,'808395') for f in files]
    l = await asyncio.gather(*tasks)
    print(l)
    end = time.time()
    print(end - start)
def main2():
    start = time.time()
    l = []
    for f in files:
        l.append(match_content_in_file2(f,'808395'))
    print(l)
    end = time.time()
    print(end-start)
if __name__ == '__main__':
    asyncio.run(main3())   # very slow
    main2()   # very fast

Measured results (each file is about 7.5 MB)

When the files are read line by line, the async version takes vastly longer.

[True, True, True, None, None, True, True]
Async: 40.80606389045715
-------------------------------------
[True, True, True, None, None, True, True]
Sync: 0.48870062828063965

When each file is read in a single call, the async and sync versions differ little, though the sync version is still slightly faster.

[True, True, True, False, False, True, True]
Async: 0.6835882663726807
-------------------------------------
[True, True, True, False, False, True, True]
Sync: 0.6745946407318115

Environment
Python 3.9.2, Windows 10

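Answer:

A hard disk is fastest when it reads one file at a time. Reading several files at once forces it to switch back and forth between disk blocks, which makes it slower.

Reading a file is not like network communication. After a network request is sent, there is a wait for the response, and coroutines can use that wait to raise the level of concurrency.
That does not work for a hard disk.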
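If the aim is only to keep the event loop free while the files are scanned, one option is to run the ordinary blocking scan in a worker thread, one hand-off per file. This is a minimal sketch, not the asker's code: `contains_sync` and `search_files` are hypothetical names, and `asyncio.to_thread` requires Python 3.9 or later (on 3.8, `loop.run_in_executor` would be the equivalent). It is not expected to beat the plain synchronous loop, for the disk-related reason given above.

```python
import asyncio
from pathlib import Path
from typing import List


def contains_sync(path: Path, needle: str, encoding: str = "gbk") -> bool:
    """Plain blocking scan that stops at the first matching line."""
    with open(path, mode="r", encoding=encoding) as f:
        return any(needle in line for line in f)


async def search_files(paths: List[Path], needle: str) -> List[bool]:
    # One worker-thread hand-off per file, instead of one per line.
    jobs = [asyncio.to_thread(contains_sync, p, needle) for p in paths]
    return await asyncio.gather(*jobs)
```

Unlike the functions in the question, this returns `False` rather than `None` when nothing matches.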

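Answer:

aio is I/O multiplexing, so it can only help with I/O-bound performance problems. Take a look at the CPU: if a single core is already saturated, coroutines will not improve performance.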
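One way to see where the CPU time could go: aiofiles documents that it delegates file operations to a thread pool, so if every line read by `async for` costs one hand-off to a worker thread (an assumption about this workload), about a million lines means about a million hand-offs. The sketch below measures only that hand-off cost, using the standard library; `fake_readline` is a hypothetical stand-in for a cheap buffered line read, and the timings will vary by machine.

```python
import asyncio
import time


def fake_readline() -> str:
    # Stand-in for a very cheap blocking call, such as one buffered line read.
    return "808395\n"


async def per_call_in_executor(n: int) -> float:
    """Await n tiny calls, each dispatched to the default thread pool."""
    loop = asyncio.get_running_loop()
    start = time.perf_counter()
    for _ in range(n):
        await loop.run_in_executor(None, fake_readline)
    return time.perf_counter() - start


def direct_calls(n: int) -> float:
    """Make the same n calls directly, with no thread hand-off."""
    start = time.perf_counter()
    for _ in range(n):
        fake_readline()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 20_000
    print(f"direct:   {direct_calls(n):.4f} s")
    print(f"executor: {asyncio.run(per_call_in_executor(n)):.4f} s")
```

If the executor figure is far larger than the direct one, the per-line overhead rather than the disk is the likely bottleneck, which would also fit the observation that reading each file in a single call is nearly as fast as the synchronous version.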

Answer:

Why does my test show exactly the opposite? Environment: Python 3.8.2

import time
import asyncio
files = [
    r'C:\log\20210523.log',
    r'C:\log\20210522.log',
    r'C:\log\20210521.log',
    r'C:\log\20210524.log',
    r'C:\log\20210525.log',
    r'C:\log\20210520.log',
    r'C:\log\20210519.log',
]
def match_content_in_file(f, s):
    time.sleep(1) # both versions sleep for 1 s
async def match_content_in_file_asc(f,s):
    await asyncio.sleep(1)  # both versions sleep for 1 s
async def main3():
    start = time.time()
    tasks = [match_content_in_file_asc(f,'808395') for f in files]
    l = await asyncio.gather(*tasks)
    print(l)
    end = time.time()
    print(end - start)
def main2():
    start = time.time()
    l = []
    for f in files:
        l.append(match_content_in_file(f,'808395'))
    print(l)
    end = time.time()
    print(end-start)
if __name__ == '__main__':
    asyncio.run(main3())   # very fast
    main2()   # very slow

outputs

/bin/python3 test.py
[None, None, None, None, None, None, None]
1.000645637512207
[None, None, None, None, None, None, None]
7.005064249038696

Answer:

This test is interesting, so I ran it as well.

When none of the files exist, aiofiles is much faster:

~/test ᐅ python3 -V
Python 3.8.1
~/test ᐅ python3 aiotest.py
[None, None, None, None, None, None, None]
1.0032050609588623
[None, None, None, None, None, None, None]
7.023258686065674
~/test ᐅ sw_vers
ProductName:    Mac OS X
ProductVersion:    10.15.7
BuildVersion:    19H2

When the files do exist, aiofiles is far, far slower:

import asyncio
import os
import time
from random import randint
from pathlib import Path
import aiofiles
BASE_DIR = Path('log')
files = [
    '20210523.log',
    '20210522.log',
    '20210521.log',
    '20210524.log',
    '20210525.log',
    '20210520.log',
    '20210519.log',
]
def gen_files():
    # Create each missing file with 1024*1024 random 7-digit numbers, one per line.
    if not BASE_DIR.exists():
        BASE_DIR.mkdir(parents=True)
    for fname in files:
        if not (p := BASE_DIR / fname).exists():
            nums = [randint(10**6, 10**7-1) for _ in range(1024*1024)]
            p.write_text('\n'.join(map(str, nums)))
            print(f'{p} created!')
    os.system(f'ls -lh {BASE_DIR}')
async def match_content_in_file(filename:str,content:str,encoding:str="gbk")->bool:
    async with aiofiles.open(filename,mode="r",encoding=encoding) as f:
        # text = await f.read()
        # return content in text
        async for line in f:
            if content in line:
                return True
def match_content_in_file2(filename:str,content:str,encoding:str="gbk")->bool:
    with open(filename,mode="r",encoding=encoding) as f:
        # text = f.read()
        # return content in text
        for line in f:
            if content in line:
                return True
async def main3():
    print('Start async process...')
    start = time.time()
    tasks = [match_content_in_file(BASE_DIR/f,'808395') for f in files]
    l = await asyncio.gather(*tasks)
    print(l)
    end = time.time()
    print(end - start)
def main2():
    print('Start sync process...')
    start = time.time()
    l = []
    for f in files:
        l.append(match_content_in_file2(BASE_DIR/f,'808395'))
    print(l)
    end = time.time()
    print(end-start)
if __name__ == '__main__':
    gen_files() # generate the test files
    asyncio.run(main3())   # very slow
    main2()   # very fast

Result:

total 114688
-rw-r--r-- 1 lian staff 8.0M 6 12 00:24 20210519.log
-rw-r--r-- 1 lian staff 8.0M 6 12 00:24 20210520.log
-rw-r--r-- 1 lian staff 8.0M 6 12 00:23 20210521.log
-rw-r--r-- 1 lian staff 8.0M 6 12 00:23 20210522.log
-rw-r--r-- 1 lian staff 8.0M 6 12 00:23 20210523.log
-rw-r--r-- 1 lian staff 8.0M 6 12 00:24 20210524.log
-rw-r--r-- 1 lian staff 8.0M 6 12 00:24 20210525.log
Start async process...
[True, True, True, True, True, True, True]
283.923513174057
Start sync process...
[True, True, True, True, True, True, True]
0.46163487434387207
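Since the question reports that reading a whole file in one call is nearly as fast as the synchronous version, a middle ground is to await once per large chunk rather than once per line. The sketch below illustrates that idea with the standard library only, handing each chunk read to the default thread pool. `contains_chunked` and `chunk_chars` are hypothetical names, and the same pattern should carry over to aiofiles only on the assumption that its `read` accepts a size the way the built-in file object does.

```python
import asyncio
from pathlib import Path


async def contains_chunked(path: Path, needle: str, encoding: str = "gbk",
                           chunk_chars: int = 1 << 20) -> bool:
    """Search a text file chunk by chunk, awaiting once per chunk."""
    loop = asyncio.get_running_loop()
    # Carry the last len(needle) - 1 characters over to the next chunk so that
    # a match straddling a chunk boundary is still found.
    keep = max(len(needle) - 1, 0)
    tail = ""
    f = await loop.run_in_executor(
        None, lambda: open(path, mode="r", encoding=encoding))
    try:
        while True:
            chunk = await loop.run_in_executor(None, f.read, chunk_chars)
            if not chunk:
                return False  # reached end of file without a match
            window = tail + chunk
            if needle in window:
                return True
            tail = window[-keep:] if keep else ""
    finally:
        await loop.run_in_executor(None, f.close)
```

With files of about 8 MB and the default chunk size, this makes on the order of ten hand-offs per file instead of roughly a million, while keeping memory use bounded.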

Source link: utcz.com/p/937976.html
