Using multithreading to read txt files in Python

Z时代
2024-01-10
Category: Q&A

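I am trying to read files in Python (scan their lines and look for terms) and write the results, for example a counter for each term. I need to do this for a large number of files (more than 3000). Is it possible to do this with multithreading? If so, how?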

So the scenario is this:

Read each file and scan its lines.

Write the counters for all the files I have read to the same output file.
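A minimal sketch of that two-step scenario, assuming Python 3, UTF-8 text and terms that are single whitespace-separated words. It uses a process pool rather than threads (the answer below explains why). The names count_terms, write_results and count_directory, the example terms and the output file term_counts.tsv are all hypothetical. Workers only count; the parent process writes the single output file, so no two processes ever write to it at the same time.

```python
# Minimal sketch (assumptions: Python 3, whitespace-separated terms, UTF-8 text).
# Workers count terms per file; only the parent process writes the single output file.
import os
from collections import Counter
from functools import partial
from multiprocessing import Pool


def count_terms(path, terms):
    """Step 1: scan one file line by line and count occurrences of each term."""
    wanted = set(terms)
    counts = Counter()
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            # Tokens are split on whitespace, so "sea," does not match "sea".
            counts.update(word for word in line.split() if word in wanted)
    return path, counts


def write_results(results, out_path):
    """Step 2: write every file's counters to one tab-separated output file."""
    with open(out_path, "w", encoding="utf-8") as out:
        for path, counts in results:
            for term, n in sorted(counts.items()):
                out.write(f"{path}\t{term}\t{n}\n")


def count_directory(dirname, terms, out_path, processes=None):
    """Run step 1 in a process pool over all .txt files in dirname, then step 2."""
    paths = [os.path.join(dirname, name)
             for name in sorted(os.listdir(dirname)) if name.endswith(".txt")]
    with Pool(processes) as pool:
        results = pool.map(partial(count_terms, terms=terms), paths)
    write_results(results, out_path)
    return results


if __name__ == "__main__":
    # Hypothetical directory, terms and output file name, for illustration only.
    if os.path.isdir("input"):
        count_directory("input", ["whale", "ship", "sea"], "term_counts.tsv")
```

The `if __name__ == "__main__":` guard matters for multiprocessing on platforms that start workers by re-importing the main module (Windows, macOS).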

The second question is: will it speed up the reading/writing?

I hope this is clear enough. Thanks,

Ron

Answer:

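I agree with @aix: multiprocessing is definitely the way to go. Whatever you do, you will be I/O bound; you can only read so fast, no matter how many parallel processes you are running. Still, some speedup is easy to get.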
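Since the question asks about threads specifically, here is a rough thread-based counterpart for comparison, assuming Python 3 and the standard-library concurrent.futures module. The function names and max_workers=8 are arbitrary, illustrative choices. In CPython, threads share one interpreter lock (GIL): blocking file reads release it, so threads can overlap time spent waiting on the disk, but the Python-level counting still runs one thread at a time. That is why processes are usually preferred once the per-line work becomes significant.

```python
# Minimal sketch (assumption: Python 3 standard library only).
# Threads share one interpreter lock (GIL): they can overlap time spent waiting
# on the disk, but the Python-level counting still runs one thread at a time.
from concurrent.futures import ThreadPoolExecutor


def count_lines_and_words(path):
    """Return (path, line count, word count) for one text file."""
    lines = words = 0
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            lines += 1
            words += len(line.split())
    return path, lines, words


def process_files_threaded(paths, max_workers=8):
    """Map count_lines_and_words over paths in a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(count_lines_and_words, paths))
```

Timing this against a process-pool version on your own files is the only reliable way to see which helps more on a given disk and CPU.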

Consider the following (input/ is a directory containing .txt files from Project Gutenberg).

import os
import time
from multiprocessing import Pool

def process_file(name):
    ''' Process one file: count number of lines and words '''
    linecount = 0
    wordcount = 0
    with open(name, 'r', encoding='utf-8', errors='replace') as inp:
        for line in inp:
            linecount += 1
            wordcount += len(line.split())  # split on any whitespace, no empty strings
    return name, linecount, wordcount

def file_paths(top):
    ''' Collect all file paths under top (os.walk replaces Python 2's os.path.walk) '''
    return [os.path.join(dirpath, name)
            for dirpath, _dirnames, filenames in os.walk(top)
            for name in filenames]

def process_files_parallel(paths):
    ''' Process each file in parallel via Pool.map() '''
    with Pool() as pool:  # one worker per CPU core by default; closed on exit
        return pool.map(process_file, paths)

def process_files(paths):
    ''' Process each file serially via map() '''
    return list(map(process_file, paths))  # list() forces Python 3's lazy map

if __name__ == '__main__':
    paths = file_paths('input/')
    start = time.time()
    process_files(paths)
    print("process_files()", time.time() - start)
    start = time.time()
    process_files_parallel(paths)
    print("process_files_parallel()", time.time() - start)

When I run this on a dual-core machine, there is a noticeable (but not 2x) speedup:

$ python process_files.py
process_files() 1.71218085289
process_files_parallel() 1.28905105591

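If the files are small enough to fit in memory and you have a lot of processing to do that is not I/O bound, you should see an even bigger improvement.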
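A sketch of that favourable case, assuming Python 3 and files that fit comfortably in memory: each task reads its whole file in one call and then spends its time on CPU work (here a word-frequency Counter, chosen only as an example). With thousands of small files, passing a chunksize to Pool.imap_unordered batches several files per worker round-trip, which reduces inter-process overhead. chunksize=16 and the function names are hypothetical, untuned choices; no timings are claimed.

```python
# Minimal sketch (assumptions: Python 3, each file fits comfortably in memory).
# Each task reads its whole file in one call, then spends its time on CPU work
# (building a word-frequency Counter), which is where extra processes can help.
import os
from collections import Counter
from multiprocessing import Pool


def word_frequencies(path):
    """Read a whole file at once, then count lower-cased words in memory."""
    with open(path, encoding="utf-8", errors="replace") as f:
        text = f.read()
    return path, Counter(text.lower().split())


def frequencies_parallel(paths, chunksize=16):
    """Merge per-file Counters as workers finish; chunksize batches tasks per worker."""
    total = Counter()
    with Pool() as pool:
        for _path, counts in pool.imap_unordered(word_frequencies, paths, chunksize=chunksize):
            total.update(counts)
    return total


if __name__ == "__main__":
    # Hypothetical usage with the input/ directory from the answer.
    if os.path.isdir("input"):
        paths = [os.path.join("input", name) for name in os.listdir("input")
                 if name.endswith(".txt")]
        print(frequencies_parallel(paths).most_common(10))
```

imap_unordered yields results in completion order, which is fine here because the per-file Counters are merged by addition.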

Source: utcz.com/qa/413861.html
