This is the code for sequential execution in a single process:
import requests, time, os, random

def img_down(url):
    # Prefix a random number so files sharing a basename do not overwrite each other
    with open("{}".format(str(random.random()) + os.path.basename(url)), "wb") as fob:
        fob.write(requests.get(url).content)

urllist = []
with open("urllist.txt", "r") as u:
    for a in u.readlines():
        urllist.append(a.strip())

s = time.perf_counter()  # time.clock() was removed in Python 3.8; perf_counter() gives wall-clock time
for url in urllist:
    img_down(url)
e = time.perf_counter()
print("time: {}".format(e - s))
This is the multi-process version:
from multiprocessing import Pool
import requests, os, time, random

def img_down(url):
    # Prefix a random number so files sharing a basename do not overwrite each other
    with open("{}".format(str(random.random()) + os.path.basename(url)), "wb") as fob:
        fob.write(requests.get(url).content)

if __name__ == "__main__":
    urllist = []
    with open("urllist.txt", "r") as urlfob:
        for line in urlfob.readlines():
            urllist.append(line.strip())

    s = time.perf_counter()  # time.clock() was removed in Python 3.8
    p = Pool()
    for url in urllist:
        p.apply_async(img_down, args=(url,))
    p.close()  # no more tasks will be submitted
    p.join()   # wait for all workers to finish
    e = time.perf_counter()
    print("time: {}".format(e - s))
But there is almost no difference between the times of the single-process and multi-process versions. The problem is probably that requests blocks on IO. Is my understanding correct? How should the code be modified so that multiple processes actually help?
Thanks!
phpcn_u1582  2017-06-22 11:54:30
The bottleneck when writing files is disk IO, not the CPU, so parallelism does not gain much here. Try removing the file writes and compare the times again.
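A minimal sketch of that experiment, assuming the same urllist.txt as in the question (fetch_only is a hypothetical helper, not from the original code): the download stays, but the file write is dropped, so any remaining cost is network-bound rather than disk-bound.

import requests, time

def fetch_only(url):
    # Download into memory and discard it; no disk IO involved
    return len(requests.get(url).content)

with open("urllist.txt", "r") as f:
    urllist = [line.strip() for line in f]

start = time.perf_counter()
for url in urllist:
    fetch_only(url)
print("no-write time: {}".format(time.perf_counter() - start))

Substituting fetch_only for img_down in both the single-process and multi-process scripts gives the comparison this answer describes; if the two times still match, the disk is not what is limiting you.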
怪我咯  2017-06-22 11:54:30
Pool() without arguments uses
os.cpu_count() or 1
as the number of worker processes. On a single-core CPU, or when the core count cannot be determined, the pool therefore has only one process.
That should be the reason.
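A quick way to check this, sketched below: print what os.cpu_count() reports and pass an explicit processes argument to Pool. The value 16 is an arbitrary example, not a recommendation; for IO-bound downloads, more workers than cores can still pay off.

from multiprocessing import Pool
import os, requests

def fetch_only(url):
    # Download into memory only, as the first answer suggests
    return len(requests.get(url).content)

if __name__ == "__main__":
    # None means the CPU count could not be determined,
    # in which case Pool() falls back to a single worker
    print("cpu_count:", os.cpu_count())

    with open("urllist.txt", "r") as f:
        urllist = [line.strip() for line in f]

    # Explicit worker count instead of the cpu_count() default;
    # 16 is an arbitrary example value
    with Pool(processes=16) as p:
        p.map(fetch_only, urllist)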