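I have some code that scrapes the names and prices of minerals from 14 pages (so far) and saves them to a .txt file. I first tried it with only page 1, then added more pages to get more data. But then the code grabbed something it shouldn't have: a random name/string. I didn't expect it to pick that up, but it did, and it assigned the wrong price to it! This happens after a mineral with such an "unexpected name", and from there on every remaining entry in the list gets the wrong price.
Because that string is different from the others, the later code can't split it and raises an error: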
    cutted2 = split2.pop(1)
              ^^^^^^^^^^^^^
IndexError: pop index out of range
I tried to ignore these errors using one of the approaches from a different Stack Overflow page:
try:
    cutted2 = split2.pop(1)
except IndexError:
    continue
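That does run without errors... but then it assigns the wrong prices to the wrong minerals (as I noticed)! How can I change the code so that it ignores these "strange" names and simply continues with the list? The full code is below; as I recall, it stopped at URL5 with this pop index error: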
import requests
from bs4 import BeautifulSoup
import re

def collecter(URL):
    headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36"}
    soup = BeautifulSoup(requests.get(URL, headers=headers).text, "lxml")

    names = [n.getText(strip=True) for n in soup.select("table tr td font a")]
    prices = [
        p.getText(strip=True).split("Price:")[-1] for p in soup.select("table tr td font font")
    ]

    names[:] = [" ".join(n.split()) for n in names if not n.startswith("[")]
    prices[:] = [p for p in prices if p]

    with open("Minerals.txt", "a+", encoding='utf-8') as file:
        for name, price in zip(names, prices):
            # print(f"{name}\n{price}")
            # print("-" * 50)
            filename = str(name)+" "+str(price)+"\n"
            split1 = filename.split(' / ')
            cutted1 = split1.pop(0)
            split2 = cutted1.split(": ")
            try:
                cutted2 = split2.pop(1)
            except IndexError:
                continue
            two_prices = cutted2+" "+split1.pop(0)+"\n"
            file.write(two_prices)

URL1 = "https://www.fabreminerals.com/search_results.php?LANG=EN&SearchTerms=&submit=Buscar&MineralSpeciment=&Country=&Locality=&PriceRange=&checkbox=enventa&First=0"
URL2 = "https://www.fabreminerals.com/search_results.php?LANG=EN&SearchTerms=&submit=Buscar&MineralSpeciment=&Country=&Locality=&PriceRange=&checkbox=enventa&First=25"
URL3 = "https://www.fabreminerals.com/search_results.php?LANG=EN&SearchTerms=&submit=Buscar&MineralSpeciment=&Country=&Locality=&PriceRange=&checkbox=enventa&First=50"
URL4 = "https://www.fabreminerals.com/search_results.php?LANG=EN&SearchTerms=&submit=Buscar&MineralSpeciment=&Country=&Locality=&PriceRange=&checkbox=enventa&First=75"
URL5 = "https://www.fabreminerals.com/search_results.php?LANG=EN&SearchTerms=&submit=Buscar&MineralSpeciment=&Country=&Locality=&PriceRange=&checkbox=enventa&First=100"
URL6 = "https://www.fabreminerals.com/search_results.php?LANG=EN&SearchTerms=&submit=Buscar&MineralSpeciment=&Country=&Locality=&PriceRange=&checkbox=enventa&First=125"
URL7 = "https://www.fabreminerals.com/search_results.php?LANG=EN&SearchTerms=&submit=Buscar&MineralSpeciment=&Country=&Locality=&PriceRange=&checkbox=enventa&First=150"
URL8 = "https://www.fabreminerals.com/search_results.php?LANG=EN&SearchTerms=&submit=Buscar&MineralSpeciment=&Country=&Locality=&PriceRange=&checkbox=enventa&First=175"
URL9 = "https://www.fabreminerals.com/search_results.php?LANG=EN&SearchTerms=&submit=Buscar&MineralSpeciment=&Country=&Locality=&PriceRange=&checkbox=enventa&First=200"
URL10 = "https://www.fabreminerals.com/search_results.php?LANG=EN&SearchTerms=&submit=Buscar&MineralSpeciment=&Country=&Locality=&PriceRange=&checkbox=enventa&First=225"
URL11 = "https://www.fabreminerals.com/search_results.php?LANG=EN&SearchTerms=&submit=Buscar&MineralSpeciment=&Country=&Locality=&PriceRange=&checkbox=enventa&First=250"
URL12 = "https://www.fabreminerals.com/search_results.php?LANG=EN&SearchTerms=&submit=Buscar&MineralSpeciment=&Country=&Locality=&PriceRange=&checkbox=enventa&First=275"
URL13 = "https://www.fabreminerals.com/search_results.php?LANG=EN&SearchTerms=&submit=Buscar&MineralSpeciment=&Country=&Locality=&PriceRange=&checkbox=enventa&First=300"
URL14 = "https://www.fabreminerals.com/search_results.php?LANG=EN&SearchTerms=&submit=Buscar&MineralSpeciment=&Country=&Locality=&PriceRange=&checkbox=enventa&First=325"

collecter(URL1)
collecter(URL2)
collecter(URL3)
collecter(URL4)
collecter(URL5)
collecter(URL6)
collecter(URL7)
collecter(URL8)
collecter(URL9)
collecter(URL10)
collecter(URL11)
collecter(URL12)
collecter(URL13)
collecter(URL14)
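One likely reason the continue does not fix the prices: zip pairs names and prices purely by position, so a single stray entry in names shifts every later pair. A minimal sketch with toy data (the three items and their euro prices are taken from the sample output further down in this thread; the "Next page" string and both function names are made up for illustration):

```python
# Toy data: three real items from the thread's sample output plus one
# hypothetical stray link text ("Next page") that has no price of its own.
names = [
    "SM90CEX: Acanthite",
    "Next page",
    "TY86AN2: Acanthite after Polybasite",
    "SM75TDA: Adamite",
]
prices = ["€90", "€1600", "€75"]


def pair_by_position(names, prices):
    """Mimic the original loop: zip first, then skip names without ': '."""
    rows = []
    for name, price in zip(names, prices):
        if ": " not in name:
            continue  # the stray name is skipped, but its zipped price is lost too
        rows.append((name.split(": ", 1)[1], price))
    return rows


def pair_after_filtering(names, prices):
    """Drop stray names first, then zip, so positions line up again."""
    clean = [n for n in names if ": " in n]
    return [(n.split(": ", 1)[1], p) for n, p in zip(clean, prices)]


print(pair_by_position(names, prices))
# [('Acanthite', '€90'), ('Acanthite after Polybasite', '€75')]
print(pair_after_filtering(names, prices))
# [('Acanthite', '€90'), ('Acanthite after Polybasite', '€1600'), ('Adamite', '€75')]
```

Filtering before zipping only helps when the stray strings show up in one of the two lists; if both lists can contain extras, selecting each name together with its price from the same table row would probably be more robust, but that depends on the page markup.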
Edit: here is the fully working code, thanks to everyone who helped!
import requests
from bs4 import BeautifulSoup
import re

for URL in range(0,2569,25):
    headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36"}
    soup = BeautifulSoup(requests.get(f'https://www.fabreminerals.com/search_results.php?LANG=EN&SearchTerms=&submit=Buscar&MineralSpeciment=&Country=&Locality=&PriceRange=&checkbox=enventa&First={URL}', headers=headers).text, "lxml")

    names = [n.getText(strip=True) for n in soup.select("table tr td font>a")]
    prices = [p.getText(strip=True).split("Price:")[-1] for p in soup.select("table tr td font>font")]

    names[:] = [" ".join(n.split()) for n in names if not n.startswith("[") ]
    prices[:] = [p for p in prices if p]

    with open("MineralsList.txt", "a+", encoding='utf-8') as file:
        for name, price in zip(names, prices):
            # print(f"{name}\n{price}")
            # print("-" * 50)
            filename = str(name)+" "+str(price)+"\n"
            split1 = filename.split(' / ')
            cutted1 = split1.pop(0)
            split2 = cutted1.split(": ")
            cutted2 = split2.pop(1)
            try:
                two_prices = cutted2+" "+split1.pop(0)+"\n"
            except IndexError:
                two_prices = cutted2+"\n"
            file.write(two_prices)
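But after some changes it stops with a new error: it can't find the string by the given attributes, so it raises "IndexError: pop from empty list"... Even soup.select("table tr td font>font") did not help, the way it did for "names".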
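As an illustration of how the splitting could avoid pop altogether, here is a small sketch built on str.partition, which returns empty strings instead of raising when the separator is missing. The helper name format_row and the sample strings are assumptions, not part of the original code:

```python
def format_row(name, price):
    """Build 'Mineral first_price second_price' from one name/price pair.

    Returns None when the name has no 'CODE: ' prefix, so the caller can
    skip the row without catching IndexError.
    """
    code, sep, mineral = name.partition(": ")
    if not sep or not mineral:
        return None  # not an item name, e.g. a navigation link
    # Keep at most the first two prices; an empty or short price string is fine.
    parts = [p.strip() for p in price.split(" / ") if p.strip()]
    return " ".join([mineral] + parts[:2])


print(format_row("SM90CEX: Acanthite", "€90 / ¥13030"))  # Acanthite €90 ¥13030
print(format_row("Next page", "€90"))                    # None
```

A row with only one price, or none at all, still produces a line, which covers the case the try/except in the edited code was written for.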
P粉391955763 2024-02-22 14:52:43
You can try the following example, which also handles pagination:
import requests
from bs4 import BeautifulSoup

for URL in range(0,100,25):
    headers = {"User-Agent": "Mozilla/5.0"}
    soup = BeautifulSoup(requests.get(f'https://www.fabreminerals.com/search_results.php?LANG=EN&SearchTerms=&submit=Buscar&MineralSpeciment=&Country=&Locality=&PriceRange=&checkbox=enventa&First={URL}', headers=headers).text, "lxml")

    names = [ x.get_text(strip=True) for x in soup.select('table tr td font a')][:25]
    print(names)
    prices = [ x.get_text(strip=True) for x in soup.select('table tr td font:nth-child(3)')][:25]
    print(prices)

    # with open("Minerals.txt", "a+", encoding='utf-8') as file:
    #     for name, price in zip(names, prices):
    #         # print(f"{name}\n{price}")
    #         # print("-" * 50)
    #         filename = str(name)+" "+str(price)+"\n"
    #         split1 = filename.split(' / ')
    #         cutted1 = split1.pop(0)
    #         split2 = cutted1.split(": ")
    #         try:
    #             cutted2 = split2.pop(1)
    #         except IndexError:
    #             continue
    #         two_prices = cutted2+" "+split1.pop(0)+"\n"
    #         file.write(two_prices)
Output:
["NX51AH2:\n'lepidolite' after Elbaite with Elbaite", "TH27AL9:\n'Pearceite' with Calcite", "TFM69AN5:\n'Stilbite'", 'SM90CEX:\nAcanthite', 'TMA97AN5:\nAcanthite', 'TB90AE8:\n Acanthite', 'TZ71AK9:\nAcanthite', 'EC63G1:\nAcanthite', 'MN56K9:\nAcanthite', 'TF89AL3:\nAcanthite (Se-bearing) with Polybasite (Se-bearing) and Calcite', 'TP66AJ8:\nAcanthite (Se-bearing) with Pyrite', 'TY86AN2:\nAcanthite after Polybasite', 'TA66AF6:\nAcanthite with Calcite', 'JFD104AO2:\nAcanthite with Calcite', 'TX36AL6:\nAcanthite with Calcite', 'TA48AH1:\nAcanthite with Chalcopyrite', 'EF89L9:\nAcanthite with Pyrite and Calcite', 'TX89AN0:\nAcanthite with Siderite and Proustite', 'EA56K0:\nAcanthite with Silver', 'EC48K0:\nAcanthite with Silver', '11AT12:\nAcanthite, Calcite', '9EF89L9:\nAcanthite, Pyrite, Calcite', 'SM75TDA:\nAdamite', '2M14:\nAdamite', '20MJX66:\nAdamite'] ['Price:€580 / US8 / ¥84010 / AUD0', 'Price:€220 / US7 / ¥31860 / AUD0', 'Price:€450 / US4 / ¥65180 / AUD0', 'Price:€90 / US / ¥13030 / AUD0', 'Price:€240 / US7 / ¥34760 / AUD0', 'Price:€540 / US7 / ¥78220 / AUD0', 'Price:€580 / US8 / ¥84010 / AUD0', 'Price:€85 / US / ¥12310 / AUD0', 'Price:€155 / US9 / ¥22450 / AUD0', 'Price:€460 / US4 / ¥66630 / AUD0', 'Price:€1500 / US47 / ¥217290 / AUD10', 'Price:€1600 / US51 / ¥231770 / AUD60', 'Price:€160 / US5 / ¥23170 / AUD0', 'Price:€240 / US7 / ¥34760 / AUD0', 'Price:€1200 / US38 / ¥173830 / AUD50', 'Price:€290 / US9 / ¥42000 / AUD0', 'Price:€480 / US5 / ¥69530 / AUD0', 'Price:€4800 / US53 / ¥695320 / AUD00', 'Price:€150 / US4 / ¥21720 / AUD0', 'Price:€290 / US9 / ¥42000 / AUD0', 'Price:€70 / US / ¥10140 / AUD0', 'Price:€320 / US0 / ¥46350 / AUD0', 'Price:€75 / US / ¥10860 / AUD0', 'Price:€90 / US / ¥13030 / AUD0', 'Price:€140 / US4 / ¥20280 / AUD5'] ['5TD76M9:\nAdamite', 'MA54AE9:\nAdamite (variety Cu-bearing adamite) with Calcite', 'EA11Y6:\nAdamite (variety cuprian)', 'EB14Y6:\nAdamite (variety cuprian)', 'MC11X8:\nAdamite (variety cuprian) with Smithsonite', 
'JRM10AN8:\nAegirine', 'MFA46AP3:\nAegirine with Zircon, Orthoclase and Quartz (variety smoky)', 'EM48AF8:\nAlabandite with Calcite', 'MC92T6:\nAlabandite with Calcite and Rhodochrosite', 'TF16AN1:\nAlabandite with Rhodochrosite', 'TX17S1:\nAlabandite with Rhodochrosite', 'TD34S1:\nAlabandite with Rhodochrosite', '10TR46:\nAlmandine', 'HM90EJ:\nAnalcime', 'EFH36AP3:\nAnalcime with Natrolite, Rhodochrosite and Serandite', 'ELR67AP1:\nAnalcime with Quartz', 'EML88AP1:\nAnalcime with Quartz', 'TF87AF4:\nAndorite', 'TR88AJ3:\nAndorite', 'ND56AN0:\nAndorite with Zinkenite', 'SM180NH:\nAndradite (variety demantoid)', 'MT86AL3:\nAndradite (variety demantoid) with Calcite', 'MA27AL7:\nAndradite (variety demantoid) with Calcite', 'TC80TL:\nAndradite (variety topazolite) with Clinochlore', 'TC85TE:\nAndradite (variety topazolite) with Clinochlore'] ['Price:€180 / US5 / ¥26070 / AUD0', 'Price:€840 / US6 / ¥121680 / AUD90', 'Price:€60 / US / ¥8690 / AUD', 'Price:€90 / US / ¥13030 / AUD0', 'Price:€70 / US / ¥10140 / AUD0', 'Price:€580 / US8 / ¥84010 / AUD0', 'Price:€1600 / US51 / ¥231770 / AUD68', 'Price:€2700 / US86 / ¥391120 / AUD60', 'Price:€740 / US3 / ¥107190 / AUD40', 'Price:€110 / US3 / ¥15930 / AUD0', 'Price:€220 / US7 / ¥31860 / AUD0', 'Price:€920 / US9 / ¥133270 / AUD10', 'Price:€140 / US4 / ¥20280 / AUD0', 'Price:€90 / US / ¥13030 / AUD0', 'Price:€130 / US4 / ¥18830 / AUD0', 'Price:€260 / US8 / ¥37660 / AUD0', 'Price:€380 / US2 / ¥55040 / AUD0', 'Price:€240 / US7 / ¥34760 / AUD0', 'Price:€390 / US2 / ¥56490 / AUD0', 'Price:€150 / US4 / ¥21720 / AUD0', 'Price:€180 / US5 / ¥26070 / AUD0', 'Price:€1600 / US51 / ¥231770 / AUD60', 'Price:€2200 / US70 / ¥318690 / AUD90', 'Price:€80 / US / ¥11580 / AUD0', 'Price:€85 / US / ¥12310 / AUD0'] ['T29NAK3:\nAndradite (variety topazolite) with Clinochlore', 'TC85TV:\nAndradite (variety topazolite) with Clinochlore', 'T89GH5:\nAndradite (variety topazolite) with Clinochlore', 'TQ94Q0:\nAndradite (variety topazolite) with Stilbite', 
'SM140TFV:\nAndradite on Microcline', 'HM140NG:\nAndradite with Calcite', 'GM66R9:\nAndradite with Clinochlore', 'SM70TYW:\nAndradite with Epidote', 'TC290TVH:\nAndradite with Epidote and Microcline', 'TKX11AO7:\nAndradite with Microcline', 'TC2100TEJ:\nAndradite with Microcline', 'TH16AN2:\nAndradite with Microcline', 'TTX66AO7:\nAndradite with Microcline', 'TC2150TJL:\nAndradite with Microcline', 'TQ96AN2:\nAndradite with Microcline', 'TF48AF2:\nAnglesite', 'MA47AL4:\nAnglesite with Galena', 'LQ88AE6:\nAnglesite with Galena', 'ER90AL8:\nAnglesite with Galena', 'TP70AE1:\nAnglesite with Galena', 'N54NAL5:\nAnglesite with Galena', 'GV96R9:\nAnhydrite with Calcite and Pyrite', '11TV99:\nAnhydrite, Calcite', 'MG26AL4:\nAnorthoroselite with Calcite', 'XM260NFF:\nAragonite'] ['Price:€240 / US7 / ¥34760 / AUD0', 'Price:€85 / US / ¥12310 / AUD0', 'Price:€220 / US7 / ¥31860 / AUD0', 'Price:€980 / US11 / ¥141960 / AUD10', 'Price:€140 / US4 / ¥20280 / AUD0', 'Price:€140 / US4 / ¥20280 / AUD0', 'Price:€160 / US5 / ¥23170 / AUD0', 'Price:€70 / US / ¥10140 / AUD0', 'Price:€90 / US / ¥13030 / AUD0', 'Price:€70 / US / ¥10140 / AUD0', 'Price:€100 / US3 / ¥14480 / AUD0', 'Price:€110 / US3 / ¥15930 / AUD0', 'Price:€140 / US4 / ¥20280 / AUD0', 'Price:€150 / US4 / ¥21720 / AUD0', 'Price:€220 / US7 / ¥31860 / AUD0', 'Price:€380 / US2 / ¥55040 / AUD0', 'Price:€220 / US7 / ¥31860 / AUD0', 'Price:€360 / US1 / ¥52140 / AUD0', 'Price:€540 / US7 / ¥78220 / AUD0', 'Price:€540 / US7 / ¥78220 / AUD0', 'Price:€940 / US9 / ¥136160 / AUD50', 'Price:€220 / US7 / ¥31860 / AUD0', 'Price:€460 / US4 / ¥66630 / AUD0', 'Price:€140 / US4 / ¥20280 / AUD0', 'Price:€60 / US / ¥8690 / AUD'] ['XM295EAR:\nAragonite', 'ETE46AP2:\nAragonite', 'EXM26AP0:\nAragonite', 'EYB26AP0:\nAragonite', 'EXE56AP2:\nAragonite', 'ETF46AP0:\nAragonite', 'XM2160ERF:\nAragonite', 'EXM46AP0:\nAragonite', 'XM2190MEX:\nAragonite', 'XM2780EFT:\nAragonite', 'EHM93AO9:\nAragonite', 'TYB37AO8:\nAragonite (variety Cu-bearing aragonite)', 
'SM99AM3:\nAragonite (variety cuprian)', '1M06:\nAragonite (variety flos ferri)', 'TG69AL3:\nAragonite (variety tarnowitzite)', 'MLC96AO2:\nAragonite on Calcite', 'MLE68AO2:\nAragonite on Calcite', 'MTB66AP3:\nAragonite with Quartz (variety hematoide)', 'MXF96AP3:\nAragonite with Quartz (variety hematoide)', 'MRR47AP3:\nAragonite with Quartz (variety hematoide)', 'MTR37AP3:\nAragonite with Quartz (variety hematoide)', 'JFD193AP3:\nArfvedsonite with Microcline', 'TFX76AO7:\nArsenopyrite with Calcite, Pyrite, Sphalerite and Rhodochrosite', 'NB37AL3:\nArsenopyrite with Muscovite', 'HM220NX:\nArsenopyrite with Muscovite'] ['Price:€95 / US / ¥13760 / AUD6', 'Price:€140 / US4 / ¥20280 / AUD0', 'Price:€140 / US4 / ¥20280 / AUD0', 'Price:€140 / US4 / ¥20280 / AUD0', 'Price:€150 / US4 / ¥21720 / AUD0', 'Price:€150 / US4 / ¥21720 / AUD0', 'Price:€160 / US5 / ¥23170 / AUD6', 'Price:€160 / US5 / ¥23170 / AUD0', 'Price:€190 / US6 / ¥27520 / AUD3', 'Price:€780 / US4 / ¥112990 / AUD03', 'Price:€880 / US8 / ¥127470 / AUD50', 'Price:€240 / US7 / ¥34760 / AUD0', 'Price:€480 / US5 / ¥69530 / AUD0', 'Price:€100 / US3 / ¥14480 / AUD0', 'Price:€460 / US4 / ¥66630 / AUD0', 'Price:€190 / US6 / ¥27520 / AUD0', 'Price:€360 / US1 / ¥52140 / AUD0', 'Price:€160 / US5 / ¥23170 / AUD6', 'Price:€190 / US6 / ¥27520 / AUD3', 'Price:€230 / US7 / ¥33310 / AUD4', 'Price:€230 / US7 / ¥33310 / AUD4', 'Price:€240 / US7 / ¥34760 / AUD0', 'Price:€170 / US5 / ¥24620 / AUD0', 'Price:€220 / US7 / ¥31860 / AUD0', 'Price:€220 / US7 / ¥31860 / AUD0']
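The offsets used in the loop of this answer could also be turned into full URLs with urllib.parse.urlencode instead of one long f-string. A rough sketch: page_urls and BASE_URL are names chosen here, and the query parameters are copied from the URL in the question.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.fabreminerals.com/search_results.php"


def page_urls(total_items, page_size=25):
    """Yield one search URL per page; First is the offset of the page's first item."""
    for offset in range(0, total_items, page_size):
        query = {
            "LANG": "EN",
            "SearchTerms": "",
            "submit": "Buscar",
            "MineralSpeciment": "",
            "Country": "",
            "Locality": "",
            "PriceRange": "",
            "checkbox": "enventa",
            "First": offset,
        }
        yield f"{BASE_URL}?{urlencode(query)}"


for url in page_urls(100):
    print(url)  # four URLs, ending in First=0, 25, 50 and 75
```

Each URL can then be passed to requests.get as in the answer; keeping the parameters in a dict makes it easier to change a single filter later.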
P粉677684876 2024-02-22 00:54:41
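You just need to make the CSS selector more specific, so that it only matches links that are direct children of a font element (rather than nested several levels down):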
soup.select("table tr td font>a")
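Adding a further condition, namely that the link points to a single item rather than to the next/previous page links at the bottom of the page, will also help: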
soup.select("table tr td font>a[href*='CODE']")
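To see what the child combinator > changes compared with a plain descendant selector, here is a dependency-free sketch that mimics font>a[href*='CODE'] with the standard library's html.parser. It is not how BeautifulSoup evaluates selectors, and the sample markup (including the index.php?CODE=... href format) is an assumption made for the demo:

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}  # tags without a closing tag


class DirectFontLinks(HTMLParser):
    """Collect the text of <a> tags whose parent is <font> and whose href contains a marker."""

    def __init__(self, href_marker="CODE"):
        super().__init__()
        self.href_marker = href_marker
        self.stack = []      # currently open tags, outermost first
        self.links = []      # collected link texts
        self._buffer = None  # text pieces of the link being read, or None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        parent = self.stack[-1] if self.stack else None
        self.stack.append(tag)
        href = dict(attrs).get("href") or ""
        if tag == "a" and parent == "font" and self.href_marker in href:
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "a" and self._buffer is not None:
            self.links.append(" ".join("".join(self._buffer).split()))
            self._buffer = None
        if tag in self.stack:
            # Pop back to the matching open tag (tolerates unclosed tags).
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


# Hypothetical markup: one item link, one link nested deeper, one pager link.
SAMPLE = """
<table><tr><td>
  <font><a href="index.php?CODE=SM90CEX">SM90CEX: Acanthite</a></font>
  <font><b><a href="index.php?CODE=XX">nested link</a></b></font>
  <font><a href="search_results.php?First=25">[Next]</a></font>
</td></tr></table>
"""

parser = DirectFontLinks()
parser.feed(SAMPLE)
print(parser.links)  # ['SM90CEX: Acanthite']
```

The nested link is rejected because its parent is b rather than font, and the pager link is rejected because its href lacks the marker; those are the two conditions the selectors above express.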