
Why do I get the 'list index out of range' error when using a Python crawler?

"list index out of range" error in Python crawler: Cause and solution

When crawling web pages with Python and BeautifulSoup, you will often run into the list index out of range error. It can appear even when the code has not changed, typically because the page is rendered dynamically or the site's structure has changed, so the parser finds fewer matching elements than the code expects. This article analyzes the cause of the error and provides a solution.

The following sample code shows how the error can arise:

import requests
from bs4 import BeautifulSoup

headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36 Edg/124.0.0.0'}
response = requests.get("https://www.iqiyi.com/ranks1/3/0", headers=headers)
print(response.status_code)
html = response.text
soup = BeautifulSoup(html, "html.parser")

def extract_data():
    # Each list is built by a separate find_all call, so the lengths can differ
    titles = [title.get_text().strip() for title in soup.find_all("div", class_="rvi__tit1")]
    heat = [heat.get_text().strip() for heat in soup.find_all("span", class_="rvi__index__num")]
    introductions = [intro.get_text().strip() for intro in soup.find_all("p", class_="rvi__des2")]
    return titles, heat, introductions

def display_data(titles, heat, introductions):
    # Fixed loop count: raises IndexError if any list has fewer than 10 items
    for i in range(10):
        print(f"Ranking: {i + 1}, Title: {titles[i]}, Popularity: {heat[i]}, Introduction: {introductions[i]}")


if __name__ == '__main__':
    titles, heat, introductions = extract_data()
    display_data(titles, heat, introductions)

In this example, the list index out of range error is raised in the display_data function. The three lists titles, heat, and introductions are built by separate find_all calls, so their lengths may be inconsistent. If the loop runs for a fixed count such as 10 and any one of the lists has fewer elements than that, accessing titles[i], heat[i], or introductions[i] goes past the end of the list and raises IndexError.
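The failure can be reproduced without any network access. The sketch below uses made-up data; the list contents and the display_fixed helper are hypothetical and are not part of the crawler above. It only illustrates how indexing a shorter list with a fixed loop count triggers the error.

```python
# Hypothetical sample data: heat has one entry fewer than titles,
# as can happen when a page entry lacks the matching element.
titles = ["Show A", "Show B", "Show C"]
heat = ["9800", "9500"]

def display_fixed(titles, heat, count):
    """Index both lists for a fixed number of rows (the fragile pattern)."""
    rows = []
    for i in range(count):
        rows.append(f"Ranking: {i + 1}, Title: {titles[i]}, Popularity: {heat[i]}")
    return rows

try:
    display_fixed(titles, heat, 3)
    error_message = None
except IndexError as exc:
    # heat[2] does not exist, so the third iteration fails
    error_message = str(exc)

print(error_message)  # list index out of range
```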

Solution:

The key is to check the length of each list before indexing into it and to access only elements within the valid index range. The improved code is as follows:

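import requests
from bs4 import BeautifulSoup

# ... (headers and request remain the same) ...

def extract_data():
    # Extraction is unchanged from the first example
    titles = [title.get_text().strip() for title in soup.find_all("div", class_="rvi__tit1")]
    heat = [heat.get_text().strip() for heat in soup.find_all("span", class_="rvi__index__num")]
    introductions = [intro.get_text().strip() for intro in soup.find_all("p", class_="rvi__des2")]
    return titles, heat, introductions

def display_data(titles, heat, introductions):
    min_len = min(len(titles), len(heat), len(introductions))  # Length of the shortest list
    for i in range(min_len):
        print(f"Ranking: {i + 1}, Title: {titles[i]}, Popularity: {heat[i]}, Introduction: {introductions[i]}")


if __name__ == '__main__':
    titles, heat, introductions = extract_data()
    display_data(titles, heat, introductions)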

By computing min_len, the length of the shortest of the three lists, and using it as the loop bound, the code never accesses an index outside any list, which avoids the list index out of range error. This makes the script more robust to changes in page structure and in the amount of data returned. Adding error handling (such as try-except blocks) is also good practice for more complex situations.
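The same idea can be written in other ways. The sketch below is one possible variation, not the article's own code: build_rows, build_rows_padded, safe_get, and the "N/A" placeholder are hypothetical names chosen for illustration. It shows zip (which stops at the shortest list), itertools.zip_longest (which keeps every row and fills the gaps), and a small try-except helper.

```python
from itertools import zip_longest

def build_rows(titles, heat, introductions):
    """Pair items by position; zip stops at the shortest list."""
    return [
        f"Ranking: {rank}, Title: {t}, Popularity: {h}, Introduction: {d}"
        for rank, (t, h, d) in enumerate(zip(titles, heat, introductions), start=1)
    ]

def build_rows_padded(titles, heat, introductions, placeholder="N/A"):
    """Keep every row; missing values are replaced by a placeholder."""
    triples = zip_longest(titles, heat, introductions, fillvalue=placeholder)
    return [
        f"Ranking: {rank}, Title: {t}, Popularity: {h}, Introduction: {d}"
        for rank, (t, h, d) in enumerate(triples, start=1)
    ]

def safe_get(items, index, default="N/A"):
    """Return items[index], or a default when the index is out of range."""
    try:
        return items[index]
    except IndexError:
        return default

# Made-up data with one missing popularity value
rows = build_rows(["A", "B", "C"], ["1", "2"], ["x", "y", "z"])
padded = build_rows_padded(["A", "B", "C"], ["1", "2"], ["x", "y", "z"])
```

Note that min_len, zip, and zip_longest all pair items by position. If an entry in the middle of the page lacks one field, the rows after it will be misaligned, so extracting all fields from each entry's parent element may be a safer design when the page structure allows it.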
