Scrapy: INSERT into "new_table" only if no record exists in the "current table"

Question

I have been trying my hand at website scraping. I have successfully scraped the data into the current database table, but I want to insert into "new_table" only if the record does not already exist in the "current table". My code (pipeline) is:

table = 'products'
table2 = 'new_products'

def save(self, row):
    cursor = self.cnx.cursor()
    cursor.execute("SELECT DISTINCT product_id FROM pr

P粉278379495 · Answer

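If you only want to insert a row when it does not exist yet, you don't need to do what you are doing now. There is no need to select everything and then check whether the row you are looking for is in the result.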

What you need is a unique index on product_id in the second table (new_products).
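A minimal sketch of that step, under a few assumptions: the table is called new_products as in the question, the index name uq_new_products_product_id is made up for illustration, and an in-memory SQLite database stands in for the MySQL connection so the sketch runs without a server. The CREATE UNIQUE INDEX statement itself is written in a form that MySQL also accepts.

```python
import sqlite3

# Hypothetical index name; table and column names come from the question.
CREATE_UNIQUE_INDEX = (
    "CREATE UNIQUE INDEX uq_new_products_product_id "
    "ON new_products (product_id)"
)


def add_unique_index(cursor):
    """Create the index once, at setup time, not on every save() call."""
    cursor.execute(CREATE_UNIQUE_INDEX)


def demo():
    # SQLite in memory is only a stand-in for the real MySQL connection.
    cnx = sqlite3.connect(":memory:")
    cursor = cnx.cursor()
    cursor.execute(
        "CREATE TABLE new_products (product_id INTEGER, product_name TEXT)"
    )
    add_unique_index(cursor)

    cursor.execute("INSERT INTO new_products VALUES (1, 'first')")
    try:
        # Same product_id again: the unique index rejects the row.
        cursor.execute("INSERT INTO new_products VALUES (1, 'second')")
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True

    cursor.execute("SELECT COUNT(*) FROM new_products")
    count = cursor.fetchone()[0]
    cnx.close()
    return rejected, count
```

Note that creating the index fails if new_products already contains duplicate product_id values, so those would have to be cleaned up first.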

Then change the code to:

table = 'products'
table2 = 'new_products'

def save(self, row):
    cursor = self.cnx.cursor()

    # Always insert the scraped row into the main table.
    create_query = ("INSERT INTO " + self.table +
        "(rowid, date, listing_id, product_id, product_name, price, url) "
        "VALUES (%(rowid)s, %(date)s, %(listing_id)s, %(product_id)s, %(product_name)s, %(price)s, %(url)s)")

    cursor.execute(create_query, row)
    lastRecordId = cursor.lastrowid

    self.cnx.commit()
    print("Item saved with ID: {}".format(lastRecordId))

    # Insert into the second table; a duplicate product_id is silently skipped
    # thanks to the unique index and the no-op ON DUPLICATE KEY UPDATE.
    create_query = ("INSERT INTO " + self.table2 +
        "(rowid, date, listing_id, product_id, product_name, price, url) "
        "VALUES (%(rowid)s, %(date)s, %(listing_id)s, %(product_id)s, %(product_name)s, %(price)s, %(url)s) "
        "ON DUPLICATE KEY UPDATE product_id=product_id")
    cursor.execute(create_query, row)
    self.cnx.commit()
    cursor.close()

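With ON DUPLICATE KEY UPDATE, when MySQL finds a duplicate row (a product_id that already exists), it updates product_id to the value it already has, so the existing row is left unchanged and no error is raised.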
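The no-op behaviour can be tried out without a MySQL server. The sketch below is an illustration under stated assumptions, not the pipeline itself: it uses an in-memory SQLite database (version 3.24 or later), where the closest equivalent of MySQL's ON DUPLICATE KEY UPDATE product_id=product_id is ON CONFLICT(product_id) DO UPDATE SET product_id = product_id. The reduced column list, the index name and the helper save_if_new are hypothetical.

```python
import sqlite3

# SQLite counterpart of MySQL's
#   INSERT ... ON DUPLICATE KEY UPDATE product_id=product_id
UPSERT_NOOP = (
    "INSERT INTO new_products (product_id, product_name, price) "
    "VALUES (:product_id, :product_name, :price) "
    "ON CONFLICT(product_id) DO UPDATE SET product_id = product_id"
)


def save_if_new(cursor, row):
    """Insert row; if product_id already exists, leave the old row as it is."""
    cursor.execute(UPSERT_NOOP, row)


def demo():
    cnx = sqlite3.connect(":memory:")
    cursor = cnx.cursor()
    cursor.execute(
        "CREATE TABLE new_products "
        "(product_id INTEGER, product_name TEXT, price REAL)"
    )
    # The conflict clause only works because of this unique index.
    cursor.execute(
        "CREATE UNIQUE INDEX uq_new_products_product_id "
        "ON new_products (product_id)"
    )

    save_if_new(cursor, {"product_id": 1, "product_name": "first", "price": 9.5})
    # Second call with the same product_id: no error, no change.
    save_if_new(cursor, {"product_id": 1, "product_name": "second", "price": 1.0})
    cnx.commit()

    cursor.execute("SELECT product_id, product_name, price FROM new_products")
    rows = cursor.fetchall()
    cnx.close()
    return rows
```

After both calls the table still holds only the first row, which is the behaviour the answer relies on for new_products.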

If you set autocommit=True on the connection, you can remove the explicit commit() calls.

Edit

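If, as you said in the comments, you need to insert into the new table only when the record does not exist in the current table, you can change your code like this: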

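1. Rename the loop variable in the line old_ids = [row[0] for row in cursor.fetchall()], because it reuses the name of the row parameter. On Python 2 the comprehension variable leaks and overwrites row; on Python 3 it does not, but the reuse is still confusing.
2. The problem is in your if statement: the product_id variable does not exist there, so the check has to use row['product_id'] instead.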
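The renaming advice above can be illustrated with a short sketch. The two helper functions and their sample data are hypothetical; they only mimic the relevant line of save(). On Python 3 a list comprehension has its own scope, so the argument survives even when the name is reused, but the renamed version makes that obvious to the reader.

```python
def collect_ids_shadowed(row, fetched):
    # Loop variable reuses the name `row`, as in the question's code.
    old_ids = [row[0] for row in fetched]
    # On Python 3, `row` here is still the original argument.
    return row, old_ids


def collect_ids_renamed(row, fetched):
    # Distinct loop variable: no doubt about what `row` refers to.
    old_ids = [element[0] for element in fetched]
    return row, old_ids


item = {"product_id": 3}
fetched_rows = [(1,), (2,)]  # shape of cursor.fetchall() for one column

# Membership test as used in the if statement of save().
_, known_ids = collect_ids_renamed(item, fetched_rows)
is_new = item["product_id"] not in known_ids
```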

table = 'products'
table2 = 'new_products'

def save(self, row):
    cursor = self.cnx.cursor()

    # Collect the product_ids already stored before inserting the new row.
    cursor.execute("SELECT DISTINCT product_id FROM products;")
    old_ids = [element[0] for element in cursor.fetchall()]

    create_query = ("INSERT INTO " + self.table +
        "(rowid, date, listing_id, product_id, product_name, price, url) "
        "VALUES (%(rowid)s, %(date)s, %(listing_id)s, %(product_id)s, %(product_name)s, %(price)s, %(url)s)")

    cursor.execute(create_query, row)
    lastRecordId = cursor.lastrowid

    self.cnx.commit()
    print("Item saved with ID: {}".format(lastRecordId))

    # Only rows whose product_id was not in the current table go to the new table.
    if row['product_id'] not in old_ids:
        create_query = ("INSERT INTO " + self.table2 +
            "(rowid, date, listing_id, product_id, product_name, price, url) "
            "VALUES (%(rowid)s, %(date)s, %(listing_id)s, %(product_id)s, %(product_name)s, %(price)s, %(url)s)")
        cursor.execute(create_query, row)
        self.cnx.commit()

    # Close the cursor only after both inserts are done.
    cursor.close()
