So I have code that scrapes the name and price of minerals from (so far) 14 pages and saves them to a .txt file. I first tried it with page 1 only, then wanted to add more pages to get more data. But then the code grabs something it shouldn't: random names/strings. I didn't expect it to pick up that item, but it did, and it assigned the wrong price to it! It happens when a mineral has one of these "unexpected names", and from then on the entire rest of the list has the wrong price. See below:
Because this string differs from the other strings, the later code can't split it and raises this error:
    cutted2 = split2.pop(1)
              ^^^^^^^^^^^^^
IndexError: pop index out of range
I tried to ignore these errors using one of the approaches suggested on various Stack Overflow pages:
try:
    cutted2 = split2.pop(1)
except IndexError:
    continue
It actually worked and no errors were raised... but then it assigned the wrong price to the wrong mineral (as I noticed)! How can I change the code so that it ignores these "strange" names and carries on with the list? Below is the full code. As I recall, it stopped at URL5 and raised this pop index error:
import requests
from bs4 import BeautifulSoup
import re

def collecter(URL):
    headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36"}
    soup = BeautifulSoup(requests.get(URL, headers=headers).text, "lxml")
    names = [n.getText(strip=True) for n in soup.select("table tr td font a")]
    prices = [
        p.getText(strip=True).split("Price:")[-1] for p in soup.select("table tr td font font")
    ]
    names[:] = [" ".join(n.split()) for n in names if not n.startswith("[")]
    prices[:] = [p for p in prices if p]
    with open("Minerals.txt", "a+", encoding='utf-8') as file:
        for name, price in zip(names, prices):
            # print(f"{name}\n{price}")
            # print("-" * 50)
            filename = str(name)+" "+str(price)+"\n"
            split1 = filename.split(' / ')
            cutted1 = split1.pop(0)
            split2 = cutted1.split(": ")
            try:
                cutted2 = split2.pop(1)
            except IndexError:
                continue
            two_prices = cutted2+" "+split1.pop(0)+"\n"
            file.write(two_prices)

URL1 = "https://www.fabreminerals.com/search_results.php?LANG=EN&SearchTerms=&submit=Buscar&MineralSpeciment=&Country=&Locality=&PriceRange=&checkbox=enventa&First=0"
URL2 = "https://www.fabreminerals.com/search_results.php?LANG=EN&SearchTerms=&submit=Buscar&MineralSpeciment=&Country=&Locality=&PriceRange=&checkbox=enventa&First=25"
URL3 = "https://www.fabreminerals.com/search_results.php?LANG=EN&SearchTerms=&submit=Buscar&MineralSpeciment=&Country=&Locality=&PriceRange=&checkbox=enventa&First=50"
URL4 = "https://www.fabreminerals.com/search_results.php?LANG=EN&SearchTerms=&submit=Buscar&MineralSpeciment=&Country=&Locality=&PriceRange=&checkbox=enventa&First=75"
URL5 = "https://www.fabreminerals.com/search_results.php?LANG=EN&SearchTerms=&submit=Buscar&MineralSpeciment=&Country=&Locality=&PriceRange=&checkbox=enventa&First=100"
URL6 = "https://www.fabreminerals.com/search_results.php?LANG=EN&SearchTerms=&submit=Buscar&MineralSpeciment=&Country=&Locality=&PriceRange=&checkbox=enventa&First=125"
URL7 = "https://www.fabreminerals.com/search_results.php?LANG=EN&SearchTerms=&submit=Buscar&MineralSpeciment=&Country=&Locality=&PriceRange=&checkbox=enventa&First=150"
URL8 = "https://www.fabreminerals.com/search_results.php?LANG=EN&SearchTerms=&submit=Buscar&MineralSpeciment=&Country=&Locality=&PriceRange=&checkbox=enventa&First=175"
URL9 = "https://www.fabreminerals.com/search_results.php?LANG=EN&SearchTerms=&submit=Buscar&MineralSpeciment=&Country=&Locality=&PriceRange=&checkbox=enventa&First=200"
URL10 = "https://www.fabreminerals.com/search_results.php?LANG=EN&SearchTerms=&submit=Buscar&MineralSpeciment=&Country=&Locality=&PriceRange=&checkbox=enventa&First=225"
URL11 = "https://www.fabreminerals.com/search_results.php?LANG=EN&SearchTerms=&submit=Buscar&MineralSpeciment=&Country=&Locality=&PriceRange=&checkbox=enventa&First=250"
URL12 = "https://www.fabreminerals.com/search_results.php?LANG=EN&SearchTerms=&submit=Buscar&MineralSpeciment=&Country=&Locality=&PriceRange=&checkbox=enventa&First=275"
URL13 = "https://www.fabreminerals.com/search_results.php?LANG=EN&SearchTerms=&submit=Buscar&MineralSpeciment=&Country=&Locality=&PriceRange=&checkbox=enventa&First=300"
URL14 = "https://www.fabreminerals.com/search_results.php?LANG=EN&SearchTerms=&submit=Buscar&MineralSpeciment=&Country=&Locality=&PriceRange=&checkbox=enventa&First=325"

collecter(URL1)
collecter(URL2)
collecter(URL3)
collecter(URL4)
collecter(URL5)
collecter(URL6)
collecter(URL7)
collecter(URL8)
collecter(URL9)
collecter(URL10)
collecter(URL11)
collecter(URL12)
collecter(URL13)
collecter(URL14)
EDIT: Here is the fully working code, thanks to the helper!
import requests
from bs4 import BeautifulSoup
import re

for URL in range(0,2569,25):
    headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36"}
    soup = BeautifulSoup(requests.get(f'https://www.fabreminerals.com/search_results.php?LANG=EN&SearchTerms=&submit=Buscar&MineralSpeciment=&Country=&Locality=&PriceRange=&checkbox=enventa&First={URL}', headers=headers).text, "lxml")
    names = [n.getText(strip=True) for n in soup.select("table tr td font>a")]
    prices = [p.getText(strip=True).split("Price:")[-1] for p in soup.select("table tr td font>font")]
    names[:] = [" ".join(n.split()) for n in names if not n.startswith("[") ]
    prices[:] = [p for p in prices if p]
    with open("MineralsList.txt", "a+", encoding='utf-8') as file:
        for name, price in zip(names, prices):
            # print(f"{name}\n{price}")
            # print("-" * 50)
            filename = str(name)+" "+str(price)+"\n"
            split1 = filename.split(' / ')
            cutted1 = split1.pop(0)
            split2 = cutted1.split(": ")
            cutted2 = split2.pop(1)
            try:
                two_prices = cutted2+" "+split1.pop(0)+"\n"
            except IndexError:
                two_prices = cutted2+"\n"
            file.write(two_prices)
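The wrong prices described earlier are consistent with how zip works: it pairs items purely by position, so a single stray entry in one list shifts every pairing after it. A minimal sketch with made-up data (the names and prices below are hypothetical, not taken from the site):

```python
# Hypothetical data: one stray link text ("Next page") slipped into names.
names = ["AB12: Acanthite", "Next page", "CD34: Adamite", "EF56: Almandine"]
prices = ["€90", "€75", "€140"]  # correct prices, in order, for the 3 minerals

# zip pairs by position only, so everything after the stray entry is off by one.
pairs = list(zip(names, prices))

# Skipping the stray entry inside the loop (like `continue`) does not undo the shift:
# Adamite ends up with Almandine's price, and Almandine is dropped entirely.
kept = [(name, price) for name, price in pairs if ": " in name]
print(kept)
```

This is why skipping the bad row with `continue` silences the error but leaves the remaining rows mispriced; the stray entry has to be kept out of the list before pairing.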
But after some changes it stops with a new error: it can't find the string by the given attribute, hence the error "IndexError: popping from empty list" ... even with soup.select("table tr td font>font"), which helped in the case of the names.
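An IndexError from pop means the list produced by split had fewer elements than the code assumed. One way to sidestep that, sketched here with a hypothetical helper named split_record (it is not part of the code above, and the input formats are assumed from the strings discussed in this thread), is str.partition, which always returns three parts and makes a missing separator easy to detect:

```python
def split_record(name, price):
    """Return (mineral, euro_price), or None if name has no 'CODE:' prefix.

    Hypothetical helper for illustration only.
    """
    # partition never raises: sep is "" when ":" is absent.
    code, sep, mineral = name.partition(":")
    if not sep or not mineral.strip():
        return None
    # Take the first currency and drop an optional "Price:" label.
    euro = price.split(" / ")[0].replace("Price:", "").strip()
    return mineral.strip(), euro


print(split_record("SM90CEX: Acanthite", "Price:€90 / US$92 / ¥13030 / AUD$130"))
print(split_record("Next page", "Price:€90 / US$92"))
```

Returning None for an unexpected row lets the caller decide whether to skip or log it, instead of relying on an exception raised halfway through the parsing.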
P粉391955763 2024-02-22 14:52:43
You can try the following example, which also handles pagination:
import requests
from bs4 import BeautifulSoup

for URL in range(0,100,25):
    headers = {"User-Agent": "Mozilla/5.0"}
    soup = BeautifulSoup(requests.get(f'https://www.fabreminerals.com/search_results.php?LANG=EN&SearchTerms=&submit=Buscar&MineralSpeciment=&Country=&Locality=&PriceRange=&checkbox=enventa&First={URL}', headers=headers).text, "lxml")
    names = [ x.get_text(strip=True) for x in soup.select('table tr td font a')][:25]
    print(names)
    prices = [ x.get_text(strip=True) for x in soup.select('table tr td font:nth-child(3)')][:25]
    print(prices)
    # with open("Minerals.txt", "a+", encoding='utf-8') as file:
    #     for name, price in zip(names, prices):
    #         # print(f"{name}\n{price}")
    #         # print("-" * 50)
    #         filename = str(name)+" "+str(price)+"\n"
    #         split1 = filename.split(' / ')
    #         cutted1 = split1.pop(0)
    #         split2 = cutted1.split(": ")
    #         try:
    #             cutted2 = split2.pop(1)
    #         except IndexError:
    #             continue
    #         two_prices = cutted2+" "+split1.pop(0)+"\n"
    #         file.write(two_prices)

Output:
["NX51AH2:\n'lepidolite' after Elbaite with Elbaite", "TH27AL9:\n'Pearceite' with Calcite", "TFM69AN5:\n'Stilbite'", 'SM90CEX:\nAcanthite', 'TMA97AN5:\nAcanthite', 'TB90AE8:\n Acanthite', 'TZ71AK9:\nAcanthite', 'EC63G1:\nAcanthite', 'MN56K9:\nAcanthite', 'TF89AL3:\nAcanthite (Se-bearing) with Polybasite (Se-bearing) and Calcite', 'TP66AJ8:\nAcanthite (Se-bearing) with Pyrite', 'TY86AN2:\nAcanthite after Polybasite', 'TA66AF6:\nAcanthite with Calcite', 'JFD104AO2:\nAcanthite with Calcite', 'TX36AL6:\nAcanthite with Calcite', 'TA48AH1:\nAcanthite with Chalcopyrite', 'EF89L9:\nAcanthite with Pyrite and Calcite', 'TX89AN0:\nAcanthite with Siderite and Proustite', 'EA56K0:\nAcanthite with Silver', 'EC48K0:\nAcanthite with Silver', '11AT12:\nAcanthite, Calcite', '9EF89L9:\nAcanthite, Pyrite, Calcite', 'SM75TDA:\nAdamite', '2M14:\nAdamite', '20MJX66:\nAdamite'] ['Price:€580 / US$598 / ¥84010 / AUD$890', 'Price:€220 / US$227 / ¥31860 / AUD$330', 'Price:€450 / US$464 / ¥65180 / AUD$690', 'Price:€90 / US$92 / ¥13030 / AUD$130', 'Price:€240 / US$247 / ¥34760 / AUD$370', 'Price:€540 / US$557 / ¥78220 / AUD$830', 'Price:€580 / US$598 / ¥84010 / AUD$890', 'Price:€85 / US$87 / ¥12310 / AUD$130', 'Price:€155 / US$159 / ¥22450 / AUD$230', 'Price:€460 / US$474 / ¥66630 / AUD$700', 'Price:€1500 / US$1547 / ¥217290 / AUD$2310', 'Price:€1600 / US$1651 / ¥231770 / AUD$2460', 'Price:€160 / US$165 / ¥23170 / AUD$240', 'Price:€240 / US$247 / ¥34760 / AUD$370', 'Price:€1200 / US$1238 / ¥173830 / AUD$1850', 'Price:€290 / US$299 / ¥42000 / AUD$440', 'Price:€480 / US$495 / ¥69530 / AUD$740', 'Price:€4800 / US$4953 / ¥695320 / AUD$7400', 'Price:€150 / US$154 / ¥21720 / AUD$230', 'Price:€290 / US$299 / ¥42000 / AUD$440', 'Price:€70 / US$72 / ¥10140 / AUD$100', 'Price:€320 / US$330 / ¥46350 / AUD$490', 'Price:€75 / US$77 / ¥10860 / AUD$110', 'Price:€90 / US$92 / ¥13030 / AUD$130', 'Price:€140 / US$144 / ¥20280 / AUD$215'] ['5TD76M9:\nAdamite', 'MA54AE9:\nAdamite (variety Cu-bearing adamite) with 
Calcite', 'EA11Y6:\nAdamite (variety cuprian)', 'EB14Y6:\nAdamite (variety cuprian)', 'MC11X8:\nAdamite (variety cuprian) with Smithsonite', 'JRM10AN8:\nAegirine', 'MFA46AP3:\nAegirine with Zircon, Orthoclase and Quartz (variety smoky)', 'EM48AF8:\nAlabandite with Calcite', 'MC92T6:\nAlabandite with Calcite and Rhodochrosite', 'TF16AN1:\nAlabandite with Rhodochrosite', 'TX17S1:\nAlabandite with Rhodochrosite', 'TD34S1:\nAlabandite with Rhodochrosite', '10TR46:\nAlmandine', 'HM90EJ:\nAnalcime', 'EFH36AP3:\nAnalcime with Natrolite, Rhodochrosite and Serandite', 'ELR67AP1:\nAnalcime with Quartz', 'EML88AP1:\nAnalcime with Quartz', 'TF87AF4:\nAndorite', 'TR88AJ3:\nAndorite', 'ND56AN0:\nAndorite with Zinkenite', 'SM180NH:\nAndradite (variety demantoid)', 'MT86AL3:\nAndradite (variety demantoid) with Calcite', 'MA27AL7:\nAndradite (variety demantoid) with Calcite', 'TC80TL:\nAndradite (variety topazolite) with Clinochlore', 'TC85TE:\nAndradite (variety topazolite) with Clinochlore'] ['Price:€180 / US$185 / ¥26070 / AUD$270', 'Price:€840 / US$866 / ¥121680 / AUD$1290', 'Price:€60 / US$61 / ¥8690 / AUD$90', 'Price:€90 / US$92 / ¥13030 / AUD$130', 'Price:€70 / US$72 / ¥10140 / AUD$100', 'Price:€580 / US$598 / ¥84010 / AUD$890', 'Price:€1600 / US$1651 / ¥231770 / AUD$2468', 'Price:€2700 / US$2786 / ¥391120 / AUD$4160', 'Price:€740 / US$763 / ¥107190 / AUD$1140', 'Price:€110 / US$113 / ¥15930 / AUD$160', 'Price:€220 / US$227 / ¥31860 / AUD$330', 'Price:€920 / US$949 / ¥133270 / AUD$1410', 'Price:€140 / US$144 / ¥20280 / AUD$210', 'Price:€90 / US$92 / ¥13030 / AUD$130', 'Price:€130 / US$134 / ¥18830 / AUD$200', 'Price:€260 / US$268 / ¥37660 / AUD$400', 'Price:€380 / US$392 / ¥55040 / AUD$580', 'Price:€240 / US$247 / ¥34760 / AUD$370', 'Price:€390 / US$402 / ¥56490 / AUD$600', 'Price:€150 / US$154 / ¥21720 / AUD$230', 'Price:€180 / US$185 / ¥26070 / AUD$270', 'Price:€1600 / US$1651 / ¥231770 / AUD$2460', 'Price:€2200 / US$2270 / ¥318690 / AUD$3390', 'Price:€80 / US$82 / ¥11580 
/ AUD$120', 'Price:€85 / US$87 / ¥12310 / AUD$130'] ['T29NAK3:\nAndradite (variety topazolite) with Clinochlore', 'TC85TV:\nAndradite (variety topazolite) with Clinochlore', 'T89GH5:\nAndradite (variety topazolite) with Clinochlore', 'TQ94Q0:\nAndradite (variety topazolite) with Stilbite', 'SM140TFV:\nAndradite on Microcline', 'HM140NG:\nAndradite with Calcite', 'GM66R9:\nAndradite with Clinochlore', 'SM70TYW:\nAndradite with Epidote', 'TC290TVH:\nAndradite with Epidote and Microcline', 'TKX11AO7:\nAndradite with Microcline', 'TC2100TEJ:\nAndradite with Microcline', 'TH16AN2:\nAndradite with Microcline', 'TTX66AO7:\nAndradite with Microcline', 'TC2150TJL:\nAndradite with Microcline', 'TQ96AN2:\nAndradite with Microcline', 'TF48AF2:\nAnglesite', 'MA47AL4:\nAnglesite with Galena', 'LQ88AE6:\nAnglesite with Galena', 'ER90AL8:\nAnglesite with Galena', 'TP70AE1:\nAnglesite with Galena', 'N54NAL5:\nAnglesite with Galena', 'GV96R9:\nAnhydrite with Calcite and Pyrite', '11TV99:\nAnhydrite, Calcite', 'MG26AL4:\nAnorthoroselite with Calcite', 'XM260NFF:\nAragonite'] ['Price:€240 / US$247 / ¥34760 / AUD$370', 'Price:€85 / US$87 / ¥12310 / AUD$130', 'Price:€220 / US$227 / ¥31860 / AUD$330', 'Price:€980 / US$1011 / ¥141960 / AUD$1510', 'Price:€140 / US$144 / ¥20280 / AUD$210', 'Price:€140 / US$144 / ¥20280 / AUD$210', 'Price:€160 / US$165 / ¥23170 / AUD$240', 'Price:€70 / US$72 / ¥10140 / AUD$100', 'Price:€90 / US$92 / ¥13030 / AUD$130', 'Price:€70 / US$72 / ¥10140 / AUD$100', 'Price:€100 / US$103 / ¥14480 / AUD$150', 'Price:€110 / US$113 / ¥15930 / AUD$160', 'Price:€140 / US$144 / ¥20280 / AUD$210', 'Price:€150 / US$154 / ¥21720 / AUD$230', 'Price:€220 / US$227 / ¥31860 / AUD$330', 'Price:€380 / US$392 / ¥55040 / AUD$580', 'Price:€220 / US$227 / ¥31860 / AUD$330', 'Price:€360 / US$371 / ¥52140 / AUD$550', 'Price:€540 / US$557 / ¥78220 / AUD$830', 'Price:€540 / US$557 / ¥78220 / AUD$830', 'Price:€940 / US$969 / ¥136160 / AUD$1450', 'Price:€220 / US$227 / ¥31860 / AUD$330', 
'Price:€460 / US$474 / ¥66630 / AUD$700', 'Price:€140 / US$144 / ¥20280 / AUD$210', 'Price:€60 / US$61 / ¥8690 / AUD$92'] ['XM295EAR:\nAragonite', 'ETE46AP2:\nAragonite', 'EXM26AP0:\nAragonite', 'EYB26AP0:\nAragonite', 'EXE56AP2:\nAragonite', 'ETF46AP0:\nAragonite', 'XM2160ERF:\nAragonite', 'EXM46AP0:\nAragonite', 'XM2190MEX:\nAragonite', 'XM2780EFT:\nAragonite', 'EHM93AO9:\nAragonite', 'TYB37AO8:\nAragonite (variety Cu-bearing aragonite)', 'SM99AM3:\nAragonite (variety cuprian)', '1M06:\nAragonite (variety flos ferri)', 'TG69AL3:\nAragonite (variety tarnowitzite)', 'MLC96AO2:\nAragonite on Calcite', 'MLE68AO2:\nAragonite on Calcite', 'MTB66AP3:\nAragonite with Quartz (variety hematoide)', 'MXF96AP3:\nAragonite with Quartz (variety hematoide)', 'MRR47AP3:\nAragonite with Quartz (variety hematoide)', 'MTR37AP3:\nAragonite with Quartz (variety hematoide)', 'JFD193AP3:\nArfvedsonite with Microcline', 'TFX76AO7:\nArsenopyrite with Calcite, Pyrite, Sphalerite and Rhodochrosite', 'NB37AL3:\nArsenopyrite with Muscovite', 'HM220NX:\nArsenopyrite with Muscovite'] ['Price:€95 / US$98 / ¥13760 / AUD$146', 'Price:€140 / US$144 / ¥20280 / AUD$210', 'Price:€140 / US$144 / ¥20280 / AUD$210', 'Price:€140 / US$144 / ¥20280 / AUD$210', 'Price:€150 / US$154 / ¥21720 / AUD$230', 'Price:€150 / US$154 / ¥21720 / AUD$230', 'Price:€160 / US$165 / ¥23170 / AUD$246', 'Price:€160 / US$165 / ¥23170 / AUD$240', 'Price:€190 / US$196 / ¥27520 / AUD$293', 'Price:€780 / US$804 / ¥112990 / AUD$1203', 'Price:€880 / US$908 / ¥127470 / AUD$1350', 'Price:€240 / US$247 / ¥34760 / AUD$370', 'Price:€480 / US$495 / ¥69530 / AUD$740', 'Price:€100 / US$103 / ¥14480 / AUD$150', 'Price:€460 / US$474 / ¥66630 / AUD$700', 'Price:€190 / US$196 / ¥27520 / AUD$290', 'Price:€360 / US$371 / ¥52140 / AUD$550', 'Price:€160 / US$165 / ¥23170 / AUD$246', 'Price:€190 / US$196 / ¥27520 / AUD$293', 'Price:€230 / US$237 / ¥33310 / AUD$354', 'Price:€230 / US$237 / ¥33310 / AUD$354', 'Price:€240 / US$247 / ¥34760 / AUD$370', 
'Price:€170 / US$175 / ¥24620 / AUD$260', 'Price:€220 / US$227 / ¥31860 / AUD$330', 'Price:€220 / US$227 / ¥31860 / AUD$330']
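The pagination in this example relies on the First query parameter advancing in steps of 25. As a rough sketch, the page URLs can be generated rather than written out one by one; BASE_URL and page_urls are names introduced here for illustration, and the page size of 25 is taken from the offsets used in this thread:

```python
# URL template copied from the thread; only the First offset varies.
BASE_URL = (
    "https://www.fabreminerals.com/search_results.php"
    "?LANG=EN&SearchTerms=&submit=Buscar&MineralSpeciment=&Country="
    "&Locality=&PriceRange=&checkbox=enventa&First={offset}"
)


def page_urls(pages, page_size=25):
    """Build the URLs for the first `pages` result pages (hypothetical helper)."""
    return [BASE_URL.format(offset=i * page_size) for i in range(pages)]


urls = page_urls(14)
print(urls[0])
print(urls[-1])
```

The resulting list can then be iterated with a single loop instead of fourteen separate variables and function calls.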
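P粉677684876 2024-02-22 00:54:41
You just need to make the CSS selector more specific, so that it only matches links located directly inside the font element (rather than several levels further down):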
soup.select("table tr td font>a")
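Adding a further condition, namely that the link points to an individual item rather than to the next/previous-page links at the bottom of the page, will also help: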
soup.select("table tr td font>a[href*='CODE']")
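To see the difference between the three selectors, here is a small sketch run against simplified, hypothetical markup (the real page structure may differ; html.parser is used only so that the snippet needs nothing beyond bs4):

```python
from bs4 import BeautifulSoup

# Simplified, made-up markup: one item link, one nested "[zoom]" link,
# and one pagination link.
html = """
<table><tr><td>
  <font><a href="item.php?CODE=AB12">AB12: Acanthite</a>
    <font>Price:€90 / US$92</font>
    <b><a href="zoom.php?id=1">[zoom]</a></b>
  </font>
  <font><a href="search_results.php?First=25">Next</a></font>
</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")


def texts(selector):
    """Return the stripped text of every element matching the selector."""
    return [a.get_text(strip=True) for a in soup.select(selector)]


# Descendant combinator: any <a> at any depth below a <font>.
descendant = texts("table tr td font a")
# Child combinator: only <a> elements that are direct children of a <font>.
child = texts("table tr td font>a")
# Child combinator plus an attribute filter on the href.
items = texts("table tr td font>a[href*='CODE']")

print(descendant)
print(child)
print(items)
```

In this sketch the child combinator drops the nested link, and the attribute filter additionally drops the pagination link, leaving only the item itself.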