


Crawling Douban movie data with Python and extracting values with XPath and the lxml module (code)
This article shows how to crawl Douban movie data with Python and extract values from the page using XPath and the lxml module, with code. It may be a useful reference for anyone who needs to do something similar.
Tools: Python 3.6.5, PyCharm development tools, Windows 10 operating system, Google Chrome
Purpose: crawl the title, link address, picture, number of reviewers, and rating of each movie in the Douban movie rankings.
Website: https://movie.douban.com/chart
Syntax notes:
XPath syntax:
Install the XPath Helper extension in Google Chrome; it helps locate data within page elements.
1. Select nodes (tags)
(1) /html/head/meta: selects all meta tags directly under head
(2) //li: selects all li tags on the current page
(3) /html/head//link: selects all link tags anywhere under head
2. //: selects from any node, at any depth
(1) //li: all li tags on the current page
(2) /html/head//link: all link tags anywhere under head
3. Uses of the @ symbol
(1) Select a specific element: //div[@class='feed']/ul/li selects the li elements under the ul under the div whose class is 'feed'
(2) a/@href: selects the href attribute value of a
4. Get text
(1) a/text(): gets the text directly under a
(2) a//text(): gets all the text under a, including text inside nested tags
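The selectors listed above can be tried out without touching the network. The following is a minimal sketch, assuming lxml is installed; the HTML in SAMPLE is made up for illustration and is not Douban's markup.

```python
from lxml import etree

# Made-up HTML used only to illustrate the selectors (not Douban's markup).
SAMPLE = """
<html>
  <head>
    <meta name="description" content="demo">
    <link rel="stylesheet" href="a.css">
  </head>
  <body>
    <div class="feed">
      <ul>
        <li><a href="/one">First <b>item</b></a></li>
        <li><a href="/two">Second</a></li>
      </ul>
    </div>
  </body>
</html>
"""

doc = etree.HTML(SAMPLE)

# // selects matching nodes at any depth
all_li = doc.xpath("//li")
head_links = doc.xpath("/html/head//link")

# @ inside [] filters by attribute; @ at the end of a path returns attribute values
feed_li = doc.xpath("//div[@class='feed']/ul/li")
hrefs = doc.xpath("//a/@href")

# text() returns only direct text nodes; //text() also descends into nested tags
direct_text = doc.xpath("//li[1]/a/text()")
all_text = doc.xpath("//li[1]/a//text()")

print(len(all_li), len(head_links), len(feed_li))
print(hrefs)
print(direct_text)
print(all_text)
```

Note the difference between the last two results: the word inside the nested b tag only shows up when //text() is used.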
lxml syntax:
1. Installation: pip install lxml
2. Usage:
from lxml import etree
element = etree.HTML("html string")
element.xpath("xpath expression")
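As a small illustration of these two calls, the sketch below parses a deliberately incomplete HTML fragment (invented for this example). It shows that etree.HTML() returns the root element of a repaired tree, that xpath() returns a list (empty when nothing matches), and that xpath() can also be called on an individual element with a relative path.

```python
from lxml import etree

# An invented fragment with unclosed tags; the HTML parser repairs it.
html_str = "<div><p class='msg'>Hello</p><p>World"
element = etree.HTML(html_str)

# The parser wraps the fragment in <html><body>...</body></html>
print(element.tag)

# xpath() always returns a list for node selections
paragraphs = element.xpath("//p")
texts = [p.text for p in paragraphs]

# No match gives an empty list, so indexing [0] blindly would raise IndexError
missing = element.xpath("//span")

# A path starting with "." is evaluated relative to the element it is called on
first_p = paragraphs[0]
cls = first_p.xpath("./@class")

print(texts, missing, cls)
```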
Code:
from lxml import etree
import requests

url = "https://movie.douban.com/chart"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.81 Safari/537.36"
}

# Fetch the page and decode the response bytes (UTF-8 by default)
response = requests.get(url, headers=headers)
html_str = response.content.decode()
# print(html_str)

# Parse the HTML string into an element tree
html = etree.HTML(html_str)
print(html)

# 1. Get the URLs of all the movies
# url_list = html.xpath("//div[@class='indent']/div/table//div[@class='pl2']/a/@href")
# print(url_list)

# 2. Get the addresses of all the images
# img_list = html.xpath("//div[@class='indent']/div/table//a[@class='nbg']/img/@src")
# print(img_list)

# Select one table per movie, then extract each field relative to that table
ret1 = html.xpath("//div[@class='indent']/div/table")
print(ret1)
for table in ret1:
    item = {}
    item["title"] = table.xpath(".//div[@class='pl2']/a/text()")[0].replace("/", "").strip()
    item["href"] = table.xpath(".//div[@class='pl2']/a/@href")[0]
    item["img"] = table.xpath(".//a[@class='nbg']/img/@src")[0]
    item["comment_num"] = table.xpath(".//span[@class='pl']/text()")[0]
    item["rating_num"] = table.xpath(".//span[@class='rating_nums']/text()")[0]
    print(item)
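The script above indexes every xpath() result with [0], which raises IndexError as soon as one movie lacks a field. The following is one possible variation, not the original author's method: a hypothetical helper first() returns a default instead, and the parsing is wrapped in a hypothetical function parse_movies() so it can be tested offline. The HTML in SAMPLE is invented; it only mimics the structure implied by the XPath expressions in the script, and the real page may differ.

```python
from lxml import etree

# Invented markup that mimics the structure the XPath expressions expect.
SAMPLE = """
<div class="indent"><div>
<table><tr>
  <td><a class="nbg" href="https://example.com/m/1"><img src="https://example.com/1.jpg"></a></td>
  <td><div class="pl2">
    <a href="https://example.com/m/1">Movie One / <span>Alt title</span></a>
    <div class="star"><span class="rating_nums">8.5</span><span class="pl">(1234 ratings)</span></div>
  </div></td>
</tr></table>
<table><tr>
  <td><div class="pl2"><a href="https://example.com/m/2">Movie Two</a></div></td>
</tr></table>
</div></div>
"""


def first(node, path, default=None):
    """Return the first string result of an XPath query, stripped, or a default.

    Hypothetical helper; intended for paths ending in text() or @attribute.
    """
    results = node.xpath(path)
    return results[0].strip() if results else default


def parse_movies(html_str):
    """Build one dictionary per movie table, tolerating missing fields."""
    doc = etree.HTML(html_str)
    movies = []
    for table in doc.xpath("//div[@class='indent']/div/table"):
        title = first(table, ".//div[@class='pl2']/a/text()", "")
        movies.append({
            "title": title.replace("/", "").strip(),
            "href": first(table, ".//div[@class='pl2']/a/@href"),
            "img": first(table, ".//a[@class='nbg']/img/@src"),
            "comment_num": first(table, ".//span[@class='pl']/text()"),
            "rating_num": first(table, ".//span[@class='rating_nums']/text()"),
        })
    return movies


movies = parse_movies(SAMPLE)
for movie in movies:
    print(movie)
```

With the live page, the same function could be called on html_str from the script above. The second sample entry has no image or rating, so those fields come back as None rather than stopping the loop.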
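Output: running the script prints the parsed root element, the list of matched table elements, and then one dictionary per movie.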