This article presents a Python crawler that collects all the movies listed on Tencent Video. It may be a useful reference if you need something similar.
Using Python to crawl all movies on Tencent Video
```python
# -*- coding: utf-8 -*-
import re
import string
import time
from urllib.request import Request, urlopen

import pymongo
from bs4 import BeautifulSoup

NUM = 0         # global: number of movies crawled
m_type = u''    # global: movie category
m_site = u'qq'  # global: movie site

# Fetch the page content for the given URL
def gethtml(url):
    req = Request(url)
    response = urlopen(req)
    html = response.read()
    return html

# Get the movie categories from the category list page
def gettags(html):
    global m_type
    soup = BeautifulSoup(html, 'html.parser')  # filter out the category content
    # print(soup)
    # ...

    # ...
    p_page = soup.find_all('p', {'class': 'mod_pagenav', 'id': 'pager'})
    # print(p_page)  # len(p_page), p_page[0]
    # Text of every link in the pager; the second-to-last one is the last page number (e.g. 25)
    pages = [a.get_text(strip=True) for a in p_page[0].find_all('a')]
    # print(pages)
    if len(pages) > 1:
        return int(pages[-2])
    else:
        return 1

def getmovielist(html):
    soup = BeautifulSoup(html, 'html.parser')
    # ...
```
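The page-count step reads the pager element and treats the second-to-last link as the number of the last page. Below is a minimal, offline sketch of that idea in Python 3, assuming the pager is a `p` element with class `mod_pagenav` and id `pager` (as in the `find_all` call above) and that its last link is a "next page" link. The function name `get_page_count` and the sample markup are hypothetical, not Tencent Video's real HTML.

```python
from bs4 import BeautifulSoup

# Hypothetical pager markup, modelled on the class/id used by the crawler.
SAMPLE_PAGER = """
<p class="mod_pagenav" id="pager">
  <a href="#">1</a><a href="#">2</a><a href="#">3</a>
  <a href="#">25</a><a href="#">Next</a>
</p>
"""

def get_page_count(html):
    """Return the last page number shown in the pager, or 1 if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    pager = soup.find("p", {"class": "mod_pagenav", "id": "pager"})
    if pager is None:
        return 1
    # Text of every link in the pager, e.g. ['1', '2', '3', '25', 'Next'].
    labels = [a.get_text(strip=True) for a in pager.find_all("a")]
    # Assumption: the last link is "Next", so the second-to-last is the last page.
    if len(labels) > 1 and labels[-2].isdigit():
        return int(labels[-2])
    return 1

print(get_page_count(SAMPLE_PAGER))     # 25
print(get_page_count("<html></html>"))  # 1
```

Reading the link text through BeautifulSoup instead of a regular expression keeps the count correct even when the number is wrapped in extra tags such as `<span>`.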
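The listing does not show how `gettags` pulls the categories out of the category list page. As a rough sketch only, here is one way to map each category name to its list-page URL, assuming every category is an `<a>` element whose `title` attribute holds the category name and whose `href` points to that category's list page. The name `get_tags`, the markup, and the URLs are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical category-list markup; real Tencent Video pages may differ.
SAMPLE_TAGS = """
<ul class="filter_list">
  <li><a href="https://example.com/list/action.html" title="Action">Action</a></li>
  <li><a href="https://example.com/list/comedy.html" title="Comedy">Comedy</a></li>
  <li><a href="#">No title, skipped</a></li>
</ul>
"""

def get_tags(html):
    """Map each category name (the link's title attribute) to its list-page URL."""
    soup = BeautifulSoup(html, "html.parser")
    tags_url = {}
    # Only links that carry both an href and a title are treated as categories.
    for a in soup.find_all("a", href=True, title=True):
        tags_url[a["title"]] = a["href"]
    return tags_url

print(get_tags(SAMPLE_TAGS))
# {'Action': 'https://example.com/list/action.html', 'Comedy': 'https://example.com/list/comedy.html'}
```

Keying the dictionary by category name fits the global `m_type`, which holds the current movie category while the crawler walks each list page.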
The above is the detailed content of Python crawler crawls all movies of Tencent Video (code). For more information, please follow other related articles on the PHP Chinese website!