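I'm using Python 2 subprocess with threading threads to take standard input, process it with the binaries A, B, and C, and write the modified data to standard output.
This script (let's call it A_to_C.py) is very slow, and I'd like to learn how to fix it.
The general flow is as follows: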
    import subprocess
    import sys
    import threading

    # Stage A: fed from this script's standard input.
    A_process = subprocess.Popen(['A', '-'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    produce_A_thread = threading.Thread(target=produceA, args=(sys.stdin, A_process.stdin))

    # Stage B: fed from A's standard output.
    B_process = subprocess.Popen(['B'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    convert_A_to_B_thread = threading.Thread(target=produceB, args=(A_process.stdout, B_process.stdin))

    # Stage C: fed from B's standard output; writes to this script's standard output.
    C_process = subprocess.Popen(['C'], stdin=subprocess.PIPE)
    convert_B_to_C_thread = threading.Thread(target=produceC, args=(B_process.stdout, C_process.stdin))

    produce_A_thread.start()
    convert_A_to_B_thread.start()
    convert_B_to_C_thread.start()

    produce_A_thread.join()
    convert_A_to_B_thread.join()
    convert_B_to_C_thread.join()

    A_process.wait()
    B_process.wait()
    C_process.wait()
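The idea is that standard input goes into A_to_C.py:
1. The A binary processes a chunk of standard input and creates A-output via the function produceA.
2. The B binary processes a chunk of A's standard output and creates B-output via the function produceB.
3. The C binary processes a chunk of B's standard output via the function produceC and writes C-output to standard output.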
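I profiled the script with cProfile, and nearly all of its time seems to be spent acquiring thread locks.
For instance, in a 417 s test job, 416 s (over 99% of the total runtime) is spent acquiring thread locks: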
    $ python
    Python 2.6.6 (r266:84292, Nov 21 2013, 10:50:32)
    [GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import pstats
    >>> p = pstats.Stats('1.profile')
    >>> p.sort_stats('cumulative').print_stats(10)
    Thu Jun 12 22:19:07 2014    1.profile

             1755 function calls (1752 primitive calls) in 417.203 cpu seconds

       Ordered by: cumulative time
       List reduced from 162 to 10 due to restriction <10>

       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            1    0.020    0.020  417.203  417.203 A_to_C.py:90(<module>)
            1    0.000    0.000  417.123  417.123 A_to_C.py:809(main)
            6    0.000    0.000  416.424   69.404 /foo/python/2.7.3/lib/python2.7/threading.py:234(wait)
           32  416.424   13.013  416.424   13.013 {method 'acquire' of 'thread.lock' objects}
            3    0.000    0.000  416.422  138.807 /foo/python/2.7.3/lib/python2.7/threading.py:648(join)
            3    0.000    0.000    0.498    0.166 A_to_C.py:473(which)
           37    0.000    0.000    0.498    0.013 A_to_C.py:475(is_exe)
            3    0.496    0.165    0.496    0.165 {posix.access}
            6    0.000    0.000    0.194    0.032 /foo/python/2.7.3/lib/python2.7/subprocess.py:475(_eintr_retry_call)
            3    0.000    0.000    0.191    0.064 /foo/python/2.7.3/lib/python2.7/subprocess.py:1286(wait)
What am I doing wrong with my threading.Thread and/or subprocess.Popen arrangement that is causing this issue?
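A note on reading the profile: the 416 s sits under the three join() calls in threading.py, and in Python 2 a join() blocks by acquiring a lock, so that figure is most likely the main thread waiting for the worker threads to finish rather than lock contention as such. The sketch below illustrates the effect in isolation. The names `worker` and `main` are hypothetical and not taken from A_to_C.py, and the 0.3 s sleep simply stands in for slow I/O.

```python
import cProfile
import pstats
import threading
import time


def worker():
    # Stand-in for a slow relay thread (assumption: 0.3 s of "work").
    time.sleep(0.3)


def main():
    t = threading.Thread(target=worker)
    t.start()
    t.join()  # the main thread blocks here until worker() returns


profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

stats = pstats.Stats(profiler)
# stats.stats maps (filename, lineno, funcname) -> (cc, nc, tottime, cumtime, callers)
main_tottime, main_cumtime = [
    (v[2], v[3]) for k, v in stats.stats.items() if k[2] == 'main'
][0]

stats.sort_stats('cumulative').print_stats(5)
```

Here `main` has a cumulative time of roughly the sleep duration but almost no time of its own, which is the same shape as the profile above. To see where the time really goes, the work done inside the threads (or the binaries themselves) would have to be measured separately.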
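Solution
Your calls to subprocess.Popen() implicitly use the default value of bufsize, which is 0 in Python 2 and forces unbuffered I/O. Try passing a reasonable buffer size (4K, 16K, even 1M) and see whether it makes a difference.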
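To make the suggestion concrete, here is a minimal sketch. It assumes a stand-in child process (a Python one-liner that copies stdin to stdout) in place of the real binaries A, B and C, and an arbitrary 64 KiB buffer. The names `BUFSIZE`, `CHILD`, `pump` and `run_buffered` are hypothetical and do not come from the original script.

```python
import io
import subprocess
import sys
import threading

BUFSIZE = 64 * 1024  # assumed value; pick the real one by measuring

# Stand-in for a real binary such as A: copies its stdin to its stdout.
CHILD = [sys.executable, '-c',
         'import sys, shutil; '
         'src = getattr(sys.stdin, "buffer", sys.stdin); '
         'dst = getattr(sys.stdout, "buffer", sys.stdout); '
         'shutil.copyfileobj(src, dst)']


def pump(src, dst, chunk_size=BUFSIZE):
    """Copy src to dst in large chunks, then close dst so the child sees EOF."""
    try:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
    finally:
        dst.close()


def run_buffered(data):
    """Push data through the child using buffered pipes and return its output."""
    proc = subprocess.Popen(CHILD,
                            stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE,
                            bufsize=BUFSIZE)  # explicit buffer instead of the default
    feeder = threading.Thread(target=pump, args=(io.BytesIO(data), proc.stdin))
    feeder.start()
    out = proc.stdout.read()  # read in this thread while the feeder writes
    feeder.join()
    proc.stdout.close()
    proc.wait()
    return out


payload = b'x' * 200000
result = run_buffered(payload)
```

Whether this helps depends on how produceA, produceB and produceC move data. If they read or write a few bytes at a time, larger chunk sizes inside those functions are likely to matter as much as the bufsize argument.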