search

Home  >  Q&A  >  body text

python多线程爬虫往mysql里面写数据导致死锁

我在爬虫里面用的是MySQLdb这个包进行insert操作,我一开始是所有子线程公用一个mysql连接,结果发现数据写不进去,然后我又试过所有子线程公用一个cursor游标,出现了只有部分数据写进去了,而且自增id居然还自增了。(也就是数据没有写入进去,但是自增id的自增数增加了),最后我干脆每个子线程在每一次写入的时候创建一个连接句柄,然后就出现了如图症状,请问这该怎么办啊?有什么优化方法吗?

PHP中文网PHP中文网2811 days ago437

reply all(3)I'll reply

  • 怪我咯

    怪我咯2017-04-18 10:12:53

    Thanks for the invitation. Metadata Lock (MDL) was introduced into mysql's metadata protection mechanism in 5.5. 5.5 The protection of metadata is at the transaction level, and the MDL is only released after the transaction ends.
    When a session performs a DML operation in the main database and has not yet submitted, another session performs a DDL operation on the same object such as drop table. Since MySQL's binlog records based on the order of transaction submission, the slave database When applying, there will be a situation where the table is dropped first and then inserted into the table, causing an error in the application from the library. Therefore, MySQL introduced the Metadata lock after version 5.5.3. The Metadata lock will only be released after the transaction ends, so DDL operations cannot be performed before the transaction is committed or rolled back.
    The causes of Waiting for table metadata lock are generally the following simple scenarios:
    Scenario 1: Long transaction running, blocking DDL, and then blocking all subsequent operations on the same table
    Scenario 2: Uncommitted transactions, blocking DDL, Then block all subsequent operations on the same table

    So how to solve it?
    View all transaction locks in waiting blocked state

    USE INFORMATION_SCHEMA
    SELECT * 
    FROM INNODB_LOCKS 
    WHERE LOCK_TRX_ID IN (SELECT BLOCKING_TRX_ID FROM INNODB_LOCK_WAITS);
    

    Or you can directly target a specific table

    SELECT * FROM INNODB_LOCKS 
    WHERE LOCK_TABLE = db_name.table_name;

    Deleting the relevant records directly should unlock the lock.

    Recommendation: Use a connection pool instead of sharing a connection for all threads, and remember to commit or rollback each transaction.
    Recommended reference:
    https://dev.mysql.com/doc/ref...
    http://www.cnblogs.com/cchust...
    https://gold.xitu.io/entry/57 ...
    http://www.cnblogs.com/digdee...

    reply
    0
  • 天蓬老师

    天蓬老师2017-04-18 10:12:53

    The table is from Innodb, but automatic submission is not enabled and there is no manual submission.

    reply
    0
  • 黄舟

    黄舟2017-04-18 10:12:53

    First of all, multiple threads share a connection, which is equivalent to concurrency. Many threads need to wait.
    The utilization rate of one connection per thread is not high, so it is generally a connection pool. There is no need for recycling, and the connection is always maintained, avoiding frequent disconnections.

    With pictures, whether the subject has opened the transaction

    reply
    0
  • Cancelreply