Why does a large data volume cause a memory leak in a Python script?

I have a Python script that loops over a very large collection built with mongoengine, fetching one record at a time and appending it as a row to a CSV file. Once the data volume reaches the 100,000-record level, the process frequently exhausts memory and is protectively killed by the Linux kernel (OOM):

Aug 22 18:59:13 ubuntu1 kernel: [35729.076177] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=user.slice,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/user@1000.service/app.slice/snap.pycharm-professional.pycharm-professional.5bec252c-af46-4b4b-b24d-ca56f3e18cc8.scope,task=python,pid=82650,uid=1000

Aug 22 18:59:13 ubuntu1 kernel: [35729.076201] Out of memory: Killed process 82650 (python) total-vm:25980252kB, anon-rss:22958888kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:45712kB oom_score_adj:0

The suspicious spot, found with a memory-profiler decorator, is here:

    Line #    Mem usage      Increment    Occurrences  Line Contents

       443    557.3 MiB      -906.4 MiB         47360  elif _attr_id in multiple_choices_attr_id_list:
       444    557.3 MiB      -723.6 MiB         37888      if isinstance(answer, list):
       445    557.3 MiB  -378176.5 MiB      19756656          answer = '|'.join([
       446    557.3 MiB  -376003.6 MiB      19643112              str(option_id_name_dict.get(_id, '')) for _id in option_id_order_list if _id in answer
       447                                                    ])
       448    557.3 MiB      -725.2 MiB         37888      _row.append(answer)

I have two questions:

First: why does the list comprehension on the two middle lines (445-446) cause a problem?

Second: why does the memory Increment reported by this decorator contain negative values?


Answer:

The first column represents the line number of the code that has been profiled, the second column (Mem usage) the memory usage of the Python interpreter after that line has been executed. The third column (Increment) represents the difference in memory of the current line with respect to the last one. The last column (Line Contents) prints the code that has been profiled.

As for the negative values: these figures are process-level memory snapshots, and each line here runs tens of thousands of times (see the Occurrences column). Whenever the garbage collector, or the allocator returning pages to the OS, frees memory between two snapshots, the computed difference comes out negative. A huge negative Increment therefore mostly means that the line churned through a lot of temporary objects that were freed again, not that it releases memory on balance.
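For orientation, here is a minimal sketch of how such a report is produced with the memory-profiler package; the function and its body are made up for illustration, not the asker's code:

    from memory_profiler import profile

    @profile
    def build_rows():
        # every line of this function appears in the printed report with
        # its Mem usage, Increment, Occurrences and Line Contents columns
        rows = []
        for i in range(100000):
            rows.append(str(i))
        return '|'.join(rows)

    if __name__ == '__main__':
        build_rows()  # the line-by-line table is printed when it returns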

The data volume is large, so process it row by row and release variables you no longer need in time: write out what needs writing to the file, and unset what needs unsetting.
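A minimal sketch of that advice, assuming a hypothetical mongoengine model named Answer with a single text field (not the asker's real schema). One detail worth knowing: by default a mongoengine QuerySet caches every document it has already yielded, so a plain loop over 100,000 documents keeps all of them alive; no_cache() disables that cache.

    import csv

    from mongoengine import Document, StringField, connect

    connect('mydb')  # hypothetical database name

    class Answer(Document):  # hypothetical schema
        text = StringField()

    def export_csv(path):
        with open(path, 'w', newline='') as f:
            writer = csv.writer(f)
            # no_cache() keeps the QuerySet from holding on to every
            # document it has yielded; batch_size() caps how many
            # documents each cursor round trip pulls into memory.
            for doc in Answer.objects.no_cache().batch_size(500):
                # each row is written out immediately, nothing accumulates
                writer.writerow([doc.text])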


Answer:

Query page by page, 10,000 records per page, and write each page to its own sheet; a single 100,000-row CSV dumped in one go probably won't even open anyway.
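A sketch of that paging suggestion, writing each page of 10,000 records to its own CSV chunk instead of one giant file; the Answer model is the same hypothetical one as above, and PAGE_SIZE is an assumption:

    import csv
    from itertools import count

    from mongoengine import Document, StringField, connect

    connect('mydb')  # hypothetical database name

    class Answer(Document):  # hypothetical schema, as above
        text = StringField()

    PAGE_SIZE = 10000  # one page = 10,000 records, as suggested

    def export_paged(prefix):
        for page in count():
            docs = list(Answer.objects.skip(page * PAGE_SIZE).limit(PAGE_SIZE))
            if not docs:
                break  # ran out of records
            # each page goes into its own numbered chunk file
            with open(f'{prefix}_{page:04d}.csv', 'w', newline='') as f:
                writer = csv.writer(f)
                for doc in docs:
                    writer.writerow([doc.text])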
