

Scrapy: INSERT into "new_table" only if the record does not exist in the "current table"

I have been trying out website scraping. I can successfully scrape the data into the current database table, but I want to insert a record into "new_table" only if it does not already exist in the "current table".

My code is (pipeline):

table = 'products'
table2 = 'new_products'
def save(self, row): 

    cursor = self.cnx.cursor()
    cursor.execute("SELECT DISTINCT product_id FROM products;")
    old_ids = [row[0] for row in cursor.fetchall()]
    create_query = ("INSERT INTO " + self.table + 
        "(rowid, date, listing_id, product_id, product_name, price, url) "
        "VALUES (%(rowid)s, %(date)s, %(listing_id)s, %(product_id)s, %(product_name)s, %(price)s, %(url)s)")

    cursor.execute(create_query, row)
    lastRecordId = cursor.lastrowid

    print("Item saved with ID: {}" . format(lastRecordId))

    if not product_id in old_ids:
        create_query = ("INSERT INTO " + self.table2 + 
            "(rowid, date, listing_id, product_id, product_name, price, url) "
            "VALUES (%(rowid)s, %(date)s, %(listing_id)s, %(product_id)s, %(product_name)s, %(price)s, %(url)s)")

It does not work properly and raises an error:

2022-05-06 12:26:57 [scrapy.core.scraper] ERROR: Error processing {'date': '2022-05-06 12:26:57.575507',
 'listing_id': '0190199600119',
 'price': '4199.00',
 'product_id': '1209298',
 'product_name': 'APPLE 11" Magic Türkçe Q Klavye Siyah',
 'rowid': 456274953331128512,
 'url': ''}
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/twisted/internet/", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/local/lib/python3.8/dist-packages/scrapy/utils/", line 162, in f
    return deferred_from_coro(coro_f(*coro_args, **coro_kwargs))
  File "/root/teknosa/teknosa/", line 28, in process_item
  File "/root/teknosa/teknosa/", line 62, in save
    if not product_id in old_ids:
NameError: name 'product_id' is not defined
Saving item into db ...

Each product has a unique product ID.

If a product ID does not yet exist in the current table, I want to insert that product into "new_products".

How can I do that?

Thank you.

Latest edit: I now get this error.

2022-05-07 18:17:11 [scrapy.core.scraper] ERROR: Error processing {'date': '2022-05-07 18:17:11.902622',
 'listing_id': '8713439219357',
 'price': '99.00',
 'product_id': '1175529',
 'product_name': 'TRUST 21935 NANGA USB 3.1 Kart Okuyucu',
 'rowid': -411152717288573423,
 'url': ''}
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/mysql/connector/", line 523, in cmd_query
_mysql_connector.MySQLInterfaceError: Duplicate entry '-411152717288573423' for key 'products.rowid'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/twisted/internet/", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/local/lib/python3.8/dist-packages/scrapy/utils/", line 162, in f
    return deferred_from_coro(coro_f(*coro_args, **coro_kwargs))
  File "/root/teknosa/teknosa/", line 28, in process_item
  File "/root/teknosa/teknosa/", line 69, in save
    cursor.execute(create_query, row)
  File "/usr/local/lib/python3.8/dist-packages/mysql/connector/", line 269, in execute
    result = self._cnx.cmd_query(stmt, raw=self._raw,
  File "/usr/local/lib/python3.8/dist-packages/mysql/connector/", line 528, in cmd_query
    raise errors.get_mysql_exception(exc.errno, msg=exc.msg,
mysql.connector.errors.IntegrityError: 1062 (23000): Duplicate entry '-411152717288573423' for key 'products.rowid'

Asked by P粉122932466, 291 days ago

Answers (1)

  • P粉278379495, 2024-03-30 00:30:37




    table = 'products'
    table2 = 'new_products'

    def save(self, row):
        cursor = self.cnx.cursor()
        # Always record the scraped item in the main table.
        create_query = ("INSERT INTO " + self.table +
            "(rowid, date, listing_id, product_id, product_name, price, url) "
            "VALUES (%(rowid)s, %(date)s, %(listing_id)s, %(product_id)s, %(product_name)s, %(price)s, %(url)s)")
        cursor.execute(create_query, row)
        lastRecordId = cursor.lastrowid
        print("Item saved with ID: {}".format(lastRecordId))
        # Insert into the second table; an existing product_id triggers the
        # no-op update below instead of raising a duplicate-key error.
        create_query = ("INSERT INTO " + self.table2 +
            "(rowid, date, listing_id, product_id, product_name, price, url) "
            "VALUES (%(rowid)s, %(date)s, %(listing_id)s, %(product_id)s, %(product_name)s, %(price)s, %(url)s) ON DUPLICATE KEY UPDATE product_id=product_id")
        cursor.execute(create_query, row)
        self.cnx.commit()  # not needed if the connection uses autocommit=True

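    With ON DUPLICATE KEY UPDATE, when MySQL finds a duplicate row (a product_id that already exists), it updates product_id to the same product_id, so the statement has no effect. Note that this only works if product_id has a PRIMARY KEY or UNIQUE index in new_products.

    If you set autocommit=True on the connection, you can remove the commit calls.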
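To try the "skip duplicates silently" behaviour described above without a MySQL server, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. SQLite has no ON DUPLICATE KEY UPDATE, so INSERT OR IGNORE is swapped in as the closest equivalent. The reduced column list, the PRIMARY KEY on product_id, and the helper names insert_if_new and count_new_products are assumptions made for illustration; they are not part of the original pipeline.

```python
import sqlite3


def insert_if_new(conn, row):
    """Insert row into new_products; a row with a known product_id is skipped."""
    conn.execute(
        "INSERT OR IGNORE INTO new_products (product_id, product_name, price) "
        "VALUES (:product_id, :product_name, :price)",
        row,
    )
    conn.commit()


def count_new_products(conn):
    """Return the number of rows currently stored in new_products."""
    return conn.execute("SELECT COUNT(*) FROM new_products").fetchone()[0]


# In-memory database; the unique key on product_id is what makes the
# duplicate detection work (assumed schema, not the asker's real one).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE new_products ("
    "product_id TEXT PRIMARY KEY, product_name TEXT, price TEXT)"
)

item = {
    "product_id": "1209298",
    "product_name": 'APPLE 11" Magic Türkçe Q Klavye Siyah',
    "price": "4199.00",
}

insert_if_new(conn, item)
insert_if_new(conn, item)  # same product_id again: no error, no second row
print(count_new_products(conn))
```

Keep in mind that this only prevents duplicates inside new_products itself; it does not look at the products table.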



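    Alternatively, keep the explicit check and fix two things:

    1. Rename the loop variable in the line old_ids = [row[0] for row in cursor.fetchall()], because it reuses the name of the row parameter. (In Python 3 the comprehension variable does not actually overwrite the parameter, but the shadowing is confusing.)
    2. Your problem is in the if statement: the variable product_id does not exist. Use row['product_id'] instead.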

    table = 'products'
    table2 = 'new_products'

    def save(self, row):
        cursor = self.cnx.cursor()
        # Collect the product IDs already stored, before inserting the new item.
        cursor.execute("SELECT DISTINCT product_id FROM products;")
        old_ids = [element[0] for element in cursor.fetchall()]
        create_query = ("INSERT INTO " + self.table +
            "(rowid, date, listing_id, product_id, product_name, price, url) "
            "VALUES (%(rowid)s, %(date)s, %(listing_id)s, %(product_id)s, %(product_name)s, %(price)s, %(url)s)")
        cursor.execute(create_query, row)
        lastRecordId = cursor.lastrowid
        print("Item saved with ID: {}".format(lastRecordId))
        # Only products that were not known before go into the second table.
        # If product_id is a numeric column, convert both sides to the same type first.
        if row['product_id'] not in old_ids:
            create_query = ("INSERT INTO " + self.table2 +
                "(rowid, date, listing_id, product_id, product_name, price, url) "
                "VALUES (%(rowid)s, %(date)s, %(listing_id)s, %(product_id)s, %(product_name)s, %(price)s, %(url)s)")
            cursor.execute(create_query, row)
        self.cnx.commit()  # not needed if the connection uses autocommit=True

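Loading every product_id into Python for each item gets slow as the table grows. As a possible variation, the check can be pushed into the database with INSERT ... SELECT ... WHERE NOT EXISTS. The sketch below again uses sqlite3 as a stand-in so that it runs anywhere; the reduced column list and the helper names save and count are assumptions. With MySQL Connector the placeholders would be %(name)s, and depending on the MySQL version the SELECT may need FROM DUAL before the WHERE clause.

```python
import sqlite3


def save(conn, row):
    """Store row in products, and also in new_products if its product_id is new."""
    # This must run before the insert into products; afterwards the
    # product_id would always count as already known.
    conn.execute(
        "INSERT INTO new_products (product_id, product_name, price) "
        "SELECT :product_id, :product_name, :price "
        "WHERE NOT EXISTS ("
        "SELECT 1 FROM products WHERE product_id = :product_id)",
        row,
    )
    conn.execute(
        "INSERT INTO products (product_id, product_name, price) "
        "VALUES (:product_id, :product_name, :price)",
        row,
    )
    conn.commit()


def count(conn, table):
    """Return the row count of one of the two demo tables."""
    assert table in ("products", "new_products")
    return conn.execute("SELECT COUNT(*) FROM " + table).fetchone()[0]


db = sqlite3.connect(":memory:")
for name in ("products", "new_products"):
    db.execute(
        "CREATE TABLE " + name + " (product_id TEXT, product_name TEXT, price TEXT)"
    )

scraped = {
    "product_id": "1175529",
    "product_name": "TRUST 21935 NANGA USB 3.1 Kart Okuyucu",
    "price": "99.00",
}

save(db, scraped)  # first sighting: stored in both tables
save(db, scraped)  # second sighting: stored in products only
print(count(db, "products"), count(db, "new_products"))
```

This does not address the "Duplicate entry ... for key 'products.rowid'" error from the last edit of the question; that one comes from the unique key on rowid in products and has to be handled separately.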