
How to use Python to crawl CSDN popular comment URLs and store them in Redis

WBOY (forwarded)
2023-05-28 15:17:23 · 825 views

1. Configure webdriver

Download the Google Chrome driver (chromedriver) and point Selenium at your Chrome installation. The script opens the CSDN Java section and scrolls to the bottom repeatedly so that more feed items load.

import time
import random
from PIL import Image
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

if __name__ == '__main__':
    options = webdriver.ChromeOptions()
    # Path to the Chrome executable on the author's machine
    options.binary_location = r'C:\Users\hhh\AppData\Local\Google\Chrome\Application\谷歌浏览器.exe'
    # driver = webdriver.Chrome(executable_path=r'D:\360Chrome\chromedriver\chromedriver.exe')
    driver = webdriver.Chrome(options=options)
    # Use the Java section as an example
    driver.get('https://www.csdn.net/nav/java')
    # Scroll to the bottom of the page, pausing so new items can load
    for i in range(1, 20):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        time.sleep(2)
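The loop above always scrolls 19 times, even if the page has stopped loading new items. As a minimal sketch of an alternative, the helper below stops as soon as the page height no longer grows. The name scroll_to_bottom and its parameters are hypothetical and not part of the original article; the only Selenium call it relies on is execute_script.

```python
import time


def scroll_to_bottom(driver, max_rounds=20, pause=2.0):
    """Scroll until the page height stops growing or max_rounds is reached.

    `driver` is any object exposing Selenium's execute_script().
    The function name and parameters are illustrative assumptions.
    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        rounds += 1
        time.sleep(pause)  # give the feed time to load more items
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so stop early
        last_height = new_height
    return rounds
```

With the driver created above, calling scroll_to_bottom(driver) would take the place of the fixed for loop. Note that a slow network can make the height appear unchanged after one pause, so a longer pause may be needed in practice.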

2. Get the URL

from bs4 import BeautifulSoup
from lxml import etree

html = etree.HTML(driver.page_source)
# soup = BeautifulSoup(html, 'lxml')
# soup_herf = soup.find_all("#feedlist_id > li:nth-child(1) > div > div > h2 > a")
# soup_herf
# Collect the href of every article link in the feed list
title = html.xpath('//*[@id="feedlist_id"]/li/div/div/h2/a/@href')

As you can see, a large number of URLs are collected in a single pass, and it is fast.
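The XPath query returns raw href strings, which may include duplicates or non-article links. The sketch below shows one way to tidy such a list before storing it, using only the standard library. The function clean_urls and the default base URL are assumptions made for illustration; the article itself writes the list as-is.

```python
from urllib.parse import urljoin, urlparse


def clean_urls(hrefs, base="https://www.csdn.net/"):
    """Return absolute http(s) URLs from hrefs, de-duplicated, in first-seen order.

    The function name and the `base` default are illustrative assumptions.
    """
    seen = set()
    result = []
    for href in hrefs:
        href = href.strip()
        if not href:
            continue  # skip empty attributes
        url = urljoin(base, href)  # resolve relative links against the base
        if urlparse(url).scheme not in ("http", "https"):
            continue  # drop javascript:, mailto:, and similar links
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result
```

Applied to the list from the XPath query, this would be title = clean_urls(title).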

3. Write to Redis

After importing the redis package, configure the Redis host, port, and database number, then use the rpush function to append each URL to a list.
Start the Redis server before running the code.

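import redis

# Connect to database 1 of the local Redis server; decode_responses returns str instead of bytes
r_link = redis.Redis(port=6379, host='localhost', decode_responses=True, db=1)
for u in title:
    print("About to write {}".format(u))
    r_link.rpush("csdn_url", u)  # append the URL to the tail of the list
    print("{} written successfully!".format(u))
print('=' * 30, '\n', "Total URLs written: {}".format(len(title)), '\n', '=' * 30)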


Done!
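Writing one URL per rpush call costs one network round trip per URL. Since redis-py's rpush accepts several values in a single call, a batched variant is possible. The following is a rough sketch under that assumption; push_urls and batch_size are hypothetical names, and client stands for a redis.Redis instance such as r_link.

```python
def push_urls(client, key, urls, batch_size=500):
    """Append urls to the Redis list `key` in batches; return how many were sent.

    `client` is expected to behave like redis.Redis (only rpush is used).
    The function name and batch_size are illustrative assumptions.
    """
    urls = list(urls)
    for start in range(0, len(urls), batch_size):
        batch = urls[start:start + batch_size]
        client.rpush(key, *batch)  # one command per batch instead of per URL
    return len(urls)
```

For example, push_urls(r_link, "csdn_url", title) would write the whole list. An empty list results in no rpush call at all, which avoids sending a command without values.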

You can inspect the stored list in Redis Desktop Manager. Both crawling and writing are fast.
To consume the URLs, pop them off with rpop. Because they were added with rpush, rpop returns the most recently written URL first, so the list behaves like a stack.

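one_url = r_link.rpop("csdn_url")
while one_url:
    print("{} popped!".format(one_url))
    # Fetch the next URL; rpop returns None once the list is empty
    one_url = r_link.rpop("csdn_url")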
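Combining rpush with rpop yields last-in, first-out order. If the URLs should instead be processed in the order they were crawled, lpop can be used in place of rpop. The generator below is a small sketch of both options; drain and its fifo flag are hypothetical names, and client is assumed to be a redis.Redis instance such as r_link.

```python
def drain(client, key, fifo=False):
    """Yield items from the Redis list `key` until it is empty.

    fifo=False pops from the tail (rpop, stack order after rpush);
    fifo=True pops from the head (lpop, queue order after rpush).
    The function name and flag are illustrative assumptions.
    """
    pop = client.lpop if fifo else client.rpop
    while True:
        item = pop(key)
        if item is None:  # both commands return None on an empty list
            break
        yield item
```

A typical use would be: for url in drain(r_link, "csdn_url", fifo=True): print(url).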


Statement:
This article is reproduced from yisu.com. In case of infringement, please contact admin@php.cn to have it removed.