I'm trying to learn how to extract data from this url: https://denver.coloradotaxsale.com/index.cfm?folder=auctionResults&mode=preview
However, the problem is that the URL doesn't change when I switch pages, so I'm not sure how to enumerate or loop over them. Since the page holds about 3,000 sales records, I'm looking for a better approach.
This is my starting code. It's very simple, but I'd appreciate any help or tips you can offer. I think I might need to switch to another package, but I'm not sure which one; maybe BeautifulSoup?
```python
import requests
import pandas as pd

url = "https://denver.coloradotaxsale.com/index.cfm?folder=auctionResults&mode=preview"
html = requests.get(url).content
df_list = pd.read_html(html, header=1)[0]
df_list = df_list.drop([0, 1, 2])  # drop unneeded rows
```
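As an aside, `pd.read_html` parses every `<table>` in the HTML and returns a list of DataFrames, and `header=1` picks the second table row as the column names. A minimal offline sketch (the table contents here are made up for illustration, not real auction data):

```python
import pandas as pd

# A tiny HTML table standing in for the auction-results page (illustrative data only).
html = """
<table>
  <tr><th>ignore</th><th>ignore</th></tr>
  <tr><th>Parcel ID</th><th>Winning Bid</th></tr>
  <tr><td>0001234</td><td>$500.00</td></tr>
  <tr><td>0005678</td><td>$750.00</td></tr>
</table>
"""

# header=1 makes the second row the header, like the question's code.
df = pd.read_html(html, header=1)[0]
print(list(df.columns))  # ['Parcel ID', 'Winning Bid']
print(len(df))           # 2
```

This only helps once you have the right HTML in hand, though; it doesn't solve the pagination problem by itself.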
P粉600845163  2024-02-18 09:42:37
To get data from more pages, you can use the following example:
```python
import requests
import pandas as pd
from bs4 import BeautifulSoup

# The page switches pages via a POST form, so send the form fields yourself:
data = {
    "folder": "auctionResults",
    "LoginID": "00",
    "pageNum": "1",
    "orderBy": "AdvNum",
    "orderDir": "asc",
    "justFirstCertOnGroups": "1",
    "doSearch": "true",
    "itemIDList": "",
    "itemSetIDList": "",
    "interest": "",
    "premium": "",
    "itemSetDID": "",
}

url = "https://denver.coloradotaxsale.com/index.cfm?folder=auctionResults&mode=preview"

all_data = []
for data["pageNum"] in range(1, 3):  # <-- increase the number of pages here
    soup = BeautifulSoup(requests.post(url, data=data).content, "html.parser")
    for row in soup.select("#searchResults tr")[2:]:  # skip the two header rows
        tds = [td.text.strip() for td in row.select("td")]
        all_data.append(tds)

columns = [
    "SEQ NUM",
    "Tax Year",
    "Notice",
    "Parcel ID",
    "Face Amount",
    "Winning Bid",
    "Sold To",
]

df = pd.DataFrame(all_data, columns=columns)
# print the last 10 rows of the dataframe:
print(df.tail(10).to_markdown())
```
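The `range(1, 3)` above only fetches two pages. Since the site reportedly holds around 3,000 rows, one option is to keep posting increasing `pageNum` values until a page comes back with no result rows. A hedged sketch of that stopping rule (`fetch_page` is a stand-in for the `requests.post` call above, so the loop can be demonstrated without hitting the network):

```python
from bs4 import BeautifulSoup

def scrape_all_pages(fetch_page, max_pages=1000):
    """Collect result rows from numbered pages until one comes back empty.

    fetch_page(page_num) should return the HTML of that page, e.g. by
    wrapping requests.post(url, data={**data, "pageNum": str(page_num)}).
    """
    all_rows = []
    for page_num in range(1, max_pages + 1):
        soup = BeautifulSoup(fetch_page(page_num), "html.parser")
        rows = soup.select("#searchResults tr")[2:]  # skip the two header rows
        if not rows:  # empty page -> we're past the last page of results
            break
        for row in rows:
            all_rows.append([td.text.strip() for td in row.select("td")])
    return all_rows

# Offline demo: two fake pages of data, then an empty page (illustrative only).
def fake_fetch(page_num):
    if page_num > 2:
        return '<table id="searchResults"><tr></tr><tr></tr></table>'
    return f'''<table id="searchResults">
        <tr><th>h</th></tr><tr><th>h</th></tr>
        <tr><td>parcel-{page_num}</td><td>$100</td></tr>
    </table>'''

rows = scrape_all_pages(fake_fetch)
print(rows)  # [['parcel-1', '$100'], ['parcel-2', '$100']]
```

To use it against the real site, pass a function that does the `requests.post` with the form data shown above; whether the server really returns an empty `#searchResults` table past the last page is an assumption worth checking before relying on it.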
Prints: