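I'm writing a crawler and need to store the scraped data in a database. Each page contains many entries (for example, one user may have many visitors), so the insert is written inside a loop: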
try:
    sql_visitor='INSERT INTO visitor (ownername,owneruid,visitorname,visitoruid,visittime) VALUE ("%s",%d,"%s",%d,"%s")'%(ownername,owneruid,visitorname,visitoruid,visitortime)
    print sql_visitor
    self.cursor.execute(sql_visitor)
    self.connect.commit()
except Exception as e:
    print e
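As a hedged aside on the insert above: DB-API drivers such as pymysql accept the values separately from the SQL text, so the driver handles quoting instead of Python string formatting. The sketch below assumes a pymysql-style cursor, where every placeholder is written `%s` (including the integer columns). `VISITOR_SQL`, `insert_visitors`, and the `rows` argument are illustrative names, not part of the original code.

```python
VISITOR_SQL = (
    "INSERT INTO visitor "
    "(ownername, owneruid, visitorname, visitoruid, visittime) "
    "VALUES (%s, %s, %s, %s, %s)"
)


def insert_visitors(connection, rows):
    """Insert the visitor rows of one page, one statement per row.

    rows is a list of 5-tuples in column order. A failing row (for
    example a duplicate key) is collected and returned instead of
    aborting the remaining rows, mirroring the per-row try/except above.
    """
    cursor = connection.cursor()
    failed = []
    for row in rows:
        try:
            # Values are passed separately; the driver escapes them.
            cursor.execute(VISITOR_SQL, row)
            connection.commit()
        except Exception as e:
            failed.append((row, e))
    return failed
```

The caller can then print or log the returned list, which keeps the database code separate from the reporting.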
Each page gets its own thread. Since that felt too slow, I started 5 of them:
import threading
import time

max_threads=5
while uid < 8000000 or threadlist:
    # iterate over a copy so finished threads can be removed safely
    for thread1 in threadlist[:]:
        if not thread1.is_alive():
            threadlist.remove(thread1)
    # top the pool back up to max_threads
    while len(threadlist) < max_threads and uid < 8000000:
        uid+=1
        thread2=threading.Thread(target=run,args=(uid,))
        thread2.setDaemon(True)
        thread2.start()
        threadlist.append(thread2)
    time.sleep(5)
It ran smoothly:
INSERT INTO visitor (ownername,owneruid,visitorname,visitoruid,visittime) VALUE ("huosai7",4893,"Liang2017",7252799,"2017-5-22 21:06")
INSERT INTO personalinfo (ownername,owneruid,jifen,huajiao,xiaomijiao,jinbi,haoyou,zhuti,rizhi,xiangce,fenxiang,kongjianfangwenliang,youxiangyanzheng,shipinrenzheng,juzhudi,chushengdi,shangcifabiaoshijian,shangcihuodongshijian,zuihoufangwen,zhuceshijian,zaixianshijian,shengri,xingbie) VALUE("huosai7",4893,0,0,0,0,0,0,0,0,0,0,0,0,"","","2100-01-01 12:00","2100-01-01 12:00","2100-01-01 12:00","2004-1-3 19:28",0,"2100-01-01 12:00",0)
INSERT INTO visitor (ownername,owneruid,visitorname,visitoruid,visittime) VALUE ("龙乐",4894,"Liang2017",7252799,"2017-5-22 21:06")
(1062, "Duplicate entry '4894-7252799-2017-05-22 21:06:00' for key 'PRIMARY'")
INSERT INTO personalinfo (ownername,owneruid,jifen,huajiao,xiaomijiao,jinbi,haoyou,zhuti,rizhi,xiangce,fenxiang,kongjianfangwenliang,youxiangyanzheng,shipinrenzheng,juzhudi,chushengdi,shangcifabiaoshijian,shangcihuodongshijian,zuihoufangwen,zhuceshijian,zaixianshijian,shengri,xingbie) VALUE("龙乐",4894,0,0,0,0,0,0,0,0,0,0,0,0,"","","2100-01-01 12:00","2100-01-01 12:00","2100-01-01 12:00","2004-1-3 20:21",0,"2100-01-01 12:00",0)
.......
Then I set max_threads to 10, and the result was as follows:
INSERT INTO visitor (ownername,owneruid,visitorname,visitoruid,visittime) VALUE ("xiao61",4889,"Liang2017",7252799,"2017-5-22 21:06")
(2006, 'MySQL server has gone away')
INSERT INTO personalinfo (ownername,owneruid,jifen,huajiao,xiaomijiao,jinbi,haoyou,zhuti,rizhi,xiangce,fenxiang,kongjianfangwenliang,youxiangyanzheng,shipinrenzheng,juzhudi,chushengdi,shangcifabiaoshijian,shangcihuodongshijian,zuihoufangwen,zhuceshijian,zaixianshijian,shengri,xingbie) VALUE("xiao61",4889,0,0,0,0,0,0,0,0,0,0,0,0,"","","2100-01-01 12:00","2100-01-01 12:00","2100-01-01 12:00","2004-1-3 15:56",0,"2100-01-01 12:00",0)
(2006, 'MySQL server has gone away')
INSERT INTO visitor (ownername,owneruid,visitorname,visitoruid,visittime) VALUE ("糊涂酷酷熊",4897,"Liang2017",7252799,"2017-5-22 21:06")
(2006, 'MySQL server has gone away')
INSERT INTO personalinfo (ownername,owneruid,jifen,huajiao,xiaomijiao,jinbi,haoyou,zhuti,rizhi,xiangce,fenxiang,kongjianfangwenliang,youxiangyanzheng,shipinrenzheng,juzhudi,chushengdi,shangcifabiaoshijian,shangcihuodongshijian,zuihoufangwen,zhuceshijian,zaixianshijian,shengri,xingbie) VALUE("糊涂酷酷熊",4897,611,0,1655,0,0,2,0,0,0,34,0,0,"","","2007-3-27 00:37","2007-3-27 00:37","2007-3-27 00:37","2004-1-3 21:08",0,"2100-01-01 12:00",1)
(2006, 'MySQL server has gone away')
.......
As you can see, error 2006 appeared. I then set max_threads to 30, and the result was as follows:
That's about it. Is this detailed enough? If not, just tell me what else you need!
巴扎黑 2017-06-13 09:26:41
I'm guessing you are using pymysql. Its thread safety level is declared as 1, which PEP 249 describes as follows:
Threads may share the module, but not connections.
That is, threads can share the module but cannot share connections, so you probably need to create a separate connection in each thread.
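A minimal sketch of the one-connection-per-thread idea, assuming the crawler's `run(uid)` worker currently uses a single connection shared by all threads. `connect_factory` is a hypothetical zero-argument callable that returns a new DB-API connection (with pymysql it could wrap a `pymysql.connect(...)` call with your own credentials); `get_connection` and `close_connection` are illustrative names, not an existing API.

```python
import threading

# Attributes set on a threading.local object are visible only to the
# thread that set them, so each thread ends up with its own connection.
_thread_state = threading.local()


def get_connection(connect_factory):
    """Return the calling thread's private connection, creating it on first use."""
    conn = getattr(_thread_state, "conn", None)
    if conn is None:
        conn = connect_factory()
        _thread_state.conn = conn
    return conn


def close_connection():
    """Close and forget the calling thread's connection, if it has one."""
    conn = getattr(_thread_state, "conn", None)
    if conn is not None:
        conn.close()
        _thread_state.conn = None
```

Under these assumptions, `run(uid)` would call `get_connection(...)` at the start, create its cursor from that connection, and call `close_connection()` in a `finally` block, since every page is handled by a new thread.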
By the way, why not use an ORM for this?