


Getting Started with Python Web Crawler: Understanding the Basics of Web Pages
1. The composition of a web page
A web page is mainly composed of three parts: HTML, CSS, and JavaScript. If a web page is compared to a human face, these three parts are like the eyes, nose, and mouth. The sections below introduce what each one does.
HTML
HTML (HyperText Markup Language) is the markup language used to build web pages. Its early versions were defined as an application of SGML (Standard Generalized Markup Language), and successive versions of the standard have added new elements that extend what a web page can express.
HTML syntax is built from tags, which define the structure and content of a page. An HTML document contains one root element with other elements nested inside it. Most elements are written as a start tag, some content, and an end tag; the start tag names the element type and can carry attributes, each with a name and a value.
The root element of an HTML document is <html>. It usually holds a <head>, which carries the document title and other metadata such as the character set and language, and a <body>, which carries the visible content.
HTML also provides elements for headings, paragraphs, tables, lists, images, links, and so on, along with their attributes. Together they define what content appears on the page and how it is organized.
The advantages of HTML include portability, cross-platform support, and the ability to carry rich content. It is the standard language for web pages and is also used in other settings such as HTML email.
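To make the idea of tags and attributes concrete, here is a minimal sketch that uses Python's standard html.parser module to list the start tags in a fragment of HTML. The snippet string and the TagLogger class name are made up for this illustration; a real crawler would feed in the text of a downloaded page instead.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record every start tag and its attributes as the parser meets them."""

    def __init__(self):
        super().__init__()
        self.seen = []  # list of (tag_name, {attribute: value}) pairs

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) tuples
        self.seen.append((tag, dict(attrs)))


# Hypothetical fragment, invented for this example
snippet = '<p class="intro">Hello, <a href="https://example.com">world</a>!</p>'

logger = TagLogger()
logger.feed(snippet)
for tag, attributes in logger.seen:
    print(tag, attributes)
```

Each printed line shows one element type together with the attribute names and values carried by its start tag.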
CSS
HTML defines the structure of a web page, but a page laid out with HTML alone looks plain. CSS is used to make it look better.
CSS (Cascading Style Sheets) is a language for describing how a web page is presented. It is a separate language from HTML, not an extension of it. Keeping style rules apart from the markup makes a design more flexible and easier to maintain.
A CSS rule consists of a selector and a block of declarations. The selector picks the elements to be styled; it can match by element type, class, ID, the universal selector (*), and so on. Each declaration pairs a property with a value, and the value may be a keyword, a number, a length, a string, or another type. Pseudo-class selectors such as :hover match elements in a particular state.
The advantages of CSS include maintainability, scalability, and customizability. With CSS you have finer control over the layout, style, and animation of a page, which makes it more attractive.
CSS3 extends earlier versions of CSS with new selectors, properties, and values. For example, it adds new pseudo-class selectors and the animation and transition properties, which make pages more vivid and interesting.
So what does CSS look like? Here is an excerpt.
#head { position: relative; height: 100%; width: 100%; min-height: 768px; cursor: default; }
This rule selects the element whose id is head (the # prefix marks an ID selector) and applies five declarations to it.
The declarations have the following meanings:
position: relative: The element uses relative positioning, that is, any offset is applied relative to the element's own normal position in the page flow.
height: 100%: The height of the element is 100% of the height of its containing block, which is usually its parent element.
width: 100%: The width of the element is 100% of the width of its containing block, which is usually its parent element.
min-height: 768px: The element is never shorter than 768 pixels, even when the height computed from height: 100% is smaller.
cursor: default: The mouse pointer shows the platform's default cursor (usually an arrow) while it is over the element.
Together, these declarations describe a relatively positioned element that fills the width and height of its parent, is at least 768 pixels tall, and shows the default cursor.
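The selector-plus-declarations shape of this rule can be taken apart with a few lines of Python. The function below is only a sketch for a single, simple rule like the excerpt; parse_css_rule is a hypothetical helper written for this article, and it does not handle comments, nested braces, or values that contain semicolons.

```python
def parse_css_rule(rule: str):
    """Split a single, simple CSS rule into its selector and declarations."""
    selector, _, body = rule.partition("{")
    declarations = {}
    # Drop the closing brace, then split the body on semicolons
    for declaration in body.rstrip().rstrip("}").split(";"):
        if ":" not in declaration:
            continue  # skip the empty piece after the last semicolon
        name, _, value = declaration.partition(":")
        declarations[name.strip()] = value.strip()
    return selector.strip(), declarations


rule = "#head { position: relative; height: 100%; width: 100%; min-height: 768px; cursor: default; }"
selector, declarations = parse_css_rule(rule)
print(selector)
for name, value in declarations.items():
    print(f"{name} -> {value}")
```

The result is the selector string and a dictionary that maps each property name to its value, which mirrors how the rule was read line by line above.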
JavaScript
JavaScript ("JS" for short) is a lightweight, interpreted or just-in-time compiled programming language with first-class functions. It was designed and implemented by Brendan Eich at Netscape in 1995 and is widely used in web browsers.
JavaScript is a prototype-based, multi-paradigm, dynamic scripting language that supports object-oriented, imperative, declarative, and functional programming styles. Its standard is ECMAScript. By 2012, all modern browsers fully supported ECMAScript 5.1, and older browsers supported at least ECMAScript 3.
The basic building blocks of JavaScript include variables, functions, objects, arrays, and closures. Variables store data, functions implement logic, objects bundle data with methods, arrays hold ordered collections of values or objects, and closures are functions that keep access to the variables of the scope in which they were created.
JavaScript's built-in objects include Function, Array, Object, String, and RegExp. ES6 (ECMAScript 2015) added class, let, const, and rest/spread syntax.
JavaScript's scope chain determines which variables a piece of code can see, which keeps code in different scopes from interfering with each other. The language also supports event handling, DOM manipulation, and modules. It is widely used in web browsers, mobile applications, game development, and other fields.
JavaScript is usually loaded from separate files with the .js extension.
To sum up, HTML defines the content and structure of a web page, CSS describes its style, and JavaScript defines its behavior.
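Because stylesheets and scripts are usually separate files referenced from the HTML, a crawler can see all three parts by looking at a page's markup. The sketch below uses the standard html.parser module to collect those references; the page string and the file names styles.css and app.js are assumptions made up for the example.

```python
from html.parser import HTMLParser


class ResourceFinder(HTMLParser):
    """Collect the stylesheet and script files that a page refers to."""

    def __init__(self):
        super().__init__()
        self.stylesheets = []
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").split()
        if tag == "link" and "stylesheet" in rel_values:
            self.stylesheets.append(attributes.get("href"))
        elif tag == "script" and attributes.get("src"):
            self.scripts.append(attributes["src"])


# Hypothetical page, invented for this example
page = """
<html>
  <head>
    <link rel="stylesheet" href="styles.css">
    <script src="app.js"></script>
  </head>
  <body><p>Hello</p></body>
</html>
"""

finder = ResourceFinder()
finder.feed(page)
print("CSS files:", finder.stylesheets)
print("JS files:", finder.scripts)
```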
2. The structure of a web page
Let's look at a sample.
<!DOCTYPE html>
<html>
<head>
  <title>Page title</title>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <link rel="stylesheet" href="styles.css">
</head>
<body>
  <header>
    <nav>
      <ul>
        <li><a href="#">Navigation link 1</a></li>
        <li><a href="#">Navigation link 2</a></li>
        <li><a href="#">Navigation link 3</a></li>
      </ul>
    </nav>
  </header>
  <main>
    <h2 id="网页标题">Page title</h2>
    <p>This is a paragraph.</p>
    <ul>
      <li>List item 1</li>
      <li>List item 2</li>
      <li>List item 3</li>
    </ul>
  </main>
  <footer>
    <p>Copyright information</p>
  </footer>
</body>
</html>
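In this example, <!DOCTYPE html> declares that this is an HTML5 document, and the <html> tag defines the root element of the document.
The <head> tag contains the document's metadata, such as the title, character set, and viewport. The <title> tag defines the document title, the two <meta> tags define the character set and the viewport, and the <link> tag attaches the stylesheet.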
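A crawler reads this structure to pull out the pieces it needs. The sketch below, which uses only the standard html.parser module, gathers the text found inside each kind of tag. It runs on an abbreviated copy of the sample page; the TextCollector class and the VOID_TAGS set are hypothetical names introduced for this illustration, and the approach is a simplification that assumes reasonably well-formed markup.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not be tracked as "open"
VOID_TAGS = {"meta", "link", "br", "img", "hr", "input"}


class TextCollector(HTMLParser):
    """Group the text of a page by the tag that directly contains it."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of tags that are currently open
        self.text = {}       # tag name -> list of text fragments

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Pop back to (and including) the matching start tag
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        content = data.strip()
        if content and self.open_tags:
            self.text.setdefault(self.open_tags[-1], []).append(content)


page = """<!DOCTYPE html>
<html>
<head>
  <title>Page title</title>
  <meta charset="UTF-8">
  <link rel="stylesheet" href="styles.css">
</head>
<body>
  <main>
    <h2>Page title</h2>
    <p>This is a paragraph.</p>
    <ul><li>List item 1</li><li>List item 2</li><li>List item 3</li></ul>
  </main>
</body>
</html>"""

collector = TextCollector()
collector.feed(page)
collector.close()
print("title:", collector.text["title"][0])
print("paragraph:", collector.text["p"][0])
print("list items:", collector.text["li"])
```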
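3. The node tree and relationships between nodes
A node tree is a tree-shaped data structure: data is represented as nodes, and the links between nodes express the hierarchy of the data. Trees are used to organize many kinds of data, such as the files in a file system and the data models inside applications. A browser represents an HTML page the same way, with each element and each piece of text as a node.
Relationships between nodes are usually described in the following terms:
Tree structure: nodes are connected in a tree. Every node except the root has exactly one parent, and a node may have any number of child nodes.
Node hierarchy: a node's position is determined by its chain of ancestors. Nodes that share the same parent are siblings, and all the nodes below a node are its descendants.
Inheritance: child nodes can inherit from their parents. In a web page, for example, some CSS properties set on an element, such as color, are inherited by its descendants.
Node attributes and relationships: a node carries attributes that describe it, such as its name, type, and value, and relationships that link it to other nodes, such as parent-child and sibling relationships.
Node traversal: the attributes and relationships of nodes are read and modified by traversing the tree. For example, a recursive traversal can find a node's children and descendants, and a depth-first search can visit the whole tree.
A schematic diagram of the node tree shows these relationships at a glance.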
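The parent, child, and sibling relationships described above can also be explored in code. The sketch below uses the standard xml.etree.ElementTree module, which only accepts well-formed markup, so the small markup string is an assumption chosen to be valid XML; real-world HTML often is not, and would need an HTML parser instead. ElementTree does not store parent links, so the parent_of dictionary is built by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment invented for this example
markup = """
<html>
  <head><title>Page title</title></head>
  <body>
    <h2>Page title</h2>
    <p>This is a paragraph.</p>
    <ul><li>List item 1</li><li>List item 2</li></ul>
  </body>
</html>
"""
root = ET.fromstring(markup.strip())

# Map every child node to its parent node
parent_of = {child: parent for parent in root.iter() for child in parent}


def outline(node, depth=0):
    """Depth-first traversal: return one indented tag name per node."""
    lines = ["  " * depth + node.tag]
    for child in node:
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(root)))

ul = root.find("body/ul")
body = parent_of[ul]
siblings = [node.tag for node in body if node is not ul]
print("parent of ul:", body.tag)
print("children of ul:", [child.tag for child in ul])
print("siblings of ul:", siblings)
```

The indented outline plays the role of a tree diagram: each level of indentation is one step further from the root.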
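4. Selectors
We know that a web page is made up of nodes, and CSS applies different style rules to different nodes. So how do we locate a node?
In CSS, we use CSS selectors to locate nodes.
A CSS selector is a pattern for selecting elements in an HTML document. Selectors are used in stylesheets to decide which HTML elements each style rule applies to.
The simplest selector is a type selector, which is just an element name. For example, the div selector selects all <div> elements, the a selector selects all <a> elements, and the img selector selects all <img> elements. A class selector such as .box selects elements whose class attribute includes box, and an ID selector such as #head selects the element whose id is head.
Several selectors can be combined in a comma-separated list to match more than one kind of element. For example, div, a, img matches every element of any of these types.
Selectors can also use pseudo-classes to match elements in a particular state. For example, the :hover pseudo-class applies styles while the mouse pointer is over an element, and the :focus pseudo-class applies styles while an element has focus.
Selectors can be chained to form more complex selectors. For example, div:hover a selects all <a> elements inside a <div> that the mouse pointer is over, and div:focus a selects all <a> elements inside a <div> that has focus.
In short, CSS selectors are a language for selecting elements in an HTML document, and they are what connects the rules in a stylesheet to the HTML elements those rules style.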
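To show how a selector locates nodes, here is a deliberately small, hand-rolled matcher written with the standard xml.etree.ElementTree module. It is a sketch, not a CSS engine: the matches and select functions are hypothetical helpers that understand only type, class, and ID selectors and comma-separated lists of them, and the markup string is invented and must be well-formed. Crawlers normally rely on a library for this; Beautiful Soup, for instance, offers a select method that accepts CSS selectors.

```python
import xml.etree.ElementTree as ET


def matches(element, simple_selector):
    """Check one simple selector: a type ("div"), a class (".box") or an ID ("#head")."""
    if simple_selector.startswith("#"):
        return element.get("id") == simple_selector[1:]
    if simple_selector.startswith("."):
        return simple_selector[1:] in element.get("class", "").split()
    return element.tag == simple_selector


def select(root, selector):
    """Return, in document order, the elements matching any selector in a comma-separated list."""
    parts = [part.strip() for part in selector.split(",")]
    return [el for el in root.iter() if any(matches(el, part) for part in parts)]


# Hypothetical, well-formed fragment invented for this example
markup = """
<body>
  <div id="head" class="box wide"><a href="/home">Home</a></div>
  <div class="box"><img src="logo.png" alt="logo" /></div>
  <p class="note">Text</p>
</body>
"""
root = ET.fromstring(markup.strip())

print([el.tag for el in select(root, "div")])
print([el.get("id") for el in select(root, "#head")])
print([el.tag for el in select(root, ".box")])
print([el.tag for el in select(root, "a, img")])
```

Pseudo-classes such as :hover are left out on purpose, since they describe the state of a page during user interaction and a crawler reading static markup has no such state.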