Upload a CSV file of URLs from an HTML page and read the URLs to scrape with Flask

Question

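I need to build a web-based system that accepts an uploaded CSV file containing a list of URLs. After the upload, the system reads the URLs line by line and passes them to the next step, which is scraping. The scraping requires logging in to the website first, and I already have the source code for the login. The problem is that I want to connect an HTML page named "upload_page.html" to a Flask file named "upload_csv.py". Where in the Flask file should the login and scraping code go?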

P粉207969787 · Answer

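import time

import pandas as pd
from flask import request
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# The code below belongs inside the Flask view that handles the upload form.
# 'file' must match the name attribute of the form's file input.
csv_file = request.files['file']
# Load the CSV data into a DataFrame
df = pd.read_csv(csv_file)
final_data = []
# Initialize the web driver
chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--disable-gpu")
driver = webdriver.Chrome(options=chrome_options)
# Loop over the rows in the DataFrame and scrape each link
for index, row in df.iterrows():
    link = row['Link']
    # Login to the website
    # Replace this with your own login code
    driver.get("https://example.com/login")
    # find_element_by_name was removed in Selenium 4.3; use find_element(By.NAME, ...)
    username_field = driver.find_element(By.NAME, "username")
    password_field = driver.find_element(By.NAME, "password")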
    username_field.send_keys("myusername")
    password_field.send_keys("mypassword")
    password_field.send_keys(Keys.RETURN)
    # Wait for the login to complete
    WebDriverWait(driver, 10).until(EC.url_changes("https://example.com/login"))
    # Scrape the website
    driver.get(link)
    start = time.time()
    # Vertical scroll target in pixels, advanced on every pass of the loop
    finalScroll = 1000

    while True:
        # window.scrollTo(x, y) takes absolute coordinates, so keep x at 0
        # and move the vertical offset down to finalScroll
        driver.execute_script(f"window.scrollTo(0, {finalScroll})")
        finalScroll += 1000

        # Pause for 2 seconds so that lazily loaded data can appear
        time.sleep(2)
        end = time.time()
        # We will scroll for 20 seconds.
        if round(end - start) > 20:
            break
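
As for where this code goes in upload_csv.py, here is a minimal sketch rather than a definitive layout. It assumes upload_page.html sits in a templates/ folder next to the Flask file and that its form posts to /upload with enctype="multipart/form-data" and a file input named "file". The route paths and the helper names read_links and scrape_links are hypothetical, and the sketch reads the CSV with the standard csv module instead of pandas to stay small. The Selenium login and scroll loop would replace the body of scrape_links.

```python
import csv
import io

from flask import Flask, jsonify, render_template, request

app = Flask(__name__)  # looks for templates/upload_page.html (assumed location)


def read_links(data, column="Link"):
    """Return the non-empty values of `column` from raw CSV bytes."""
    text = data.decode("utf-8-sig")  # tolerate a BOM written by spreadsheet tools
    reader = csv.DictReader(io.StringIO(text))
    return [row[column].strip() for row in reader if (row.get(column) or "").strip()]


def scrape_links(links):
    """Hypothetical placeholder: put the Selenium login and scraping loop here."""
    return [{"link": link, "status": "pending"} for link in links]


@app.route("/", methods=["GET"])
def upload_page():
    # Serves the HTML form
    return render_template("upload_page.html")


@app.route("/upload", methods=["POST"])
def upload_csv():
    # Receives the form submission; "file" is the input's name attribute
    uploaded = request.files.get("file")
    if uploaded is None or uploaded.filename == "":
        return jsonify(error="no CSV file uploaded"), 400
    links = read_links(uploaded.read())
    return jsonify(results=scrape_links(links))


if __name__ == "__main__":
    app.run(debug=True)
```

Keeping the scraping in its own function means the view only handles the upload, and the scraper can later be moved to a background job if it runs longer than a browser request should wait.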
