This tutorial demonstrates how to automate the collection of inflation data from IBGE (the Brazilian Institute of Geography and Statistics) using the Selenium library in Python. The goal is to extract the percentage variation of the IPCA (Extended National Consumer Price Index) from SIDRA (Sistema IBGE de Recuperação Automática), IBGE's automatic data retrieval system.
Before you start, make sure Python is installed on your system, along with its package manager, pip.
Create a new folder for your project. Inside it, create a Jupyter Notebook file (.ipynb) or a Python file (.py). Jupyter Notebook makes it easy to view and run code step by step.
Open your terminal or command prompt, navigate to your project folder and run the following commands to install the necessary libraries:
<code class="language-bash">pip install notebook selenium webdriver-manager pandas</code>
Create a virtual environment (recommended) to isolate the dependencies of this project:
<code class="language-bash">python -m venv venv          # Create the virtual environment
venv\Scripts\activate        # Activate the virtual environment (Windows)
source venv/bin/activate     # Activate the virtual environment (Linux/macOS)</code>
After activating the virtual environment, run the library installation command again. To save the dependencies in a requirements.txt file, use:
<code class="language-bash">pip freeze > requirements.txt</code>
This allows you to easily reproduce the environment on another computer.
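As a quick sanity check after installation, a short script along these lines can confirm that the libraries are importable from the active environment. It is only a sketch: the helper name `missing_packages` is hypothetical, and the module names assume the usual import names of the packages installed above (for example, webdriver-manager is imported as `webdriver_manager`).

```python
import importlib.util


def missing_packages(module_names):
    """Return the names from module_names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Import names assumed for the packages installed with pip above.
REQUIRED = ["notebook", "selenium", "webdriver_manager", "pandas"]

missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

If the script reports missing packages, the most likely cause is that the virtual environment was not active when pip ran.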
Download the version of ChromeDriver compatible with your Google Chrome version. You can find the download link on the official ChromeDriver website by searching for the version corresponding to your version of Chrome (go to chrome://settings/help to check your version). After downloading, unzip the file and remember where it was saved.
To make ChromeDriver easier to use, add the folder where you unzipped it (for example, C:\caminho\para\chromedriver) to the PATH environment variable.
To check that ChromeDriver is configured correctly, open your terminal and type:
<code class="language-bash">chromedriver --version</code>
The ChromeDriver version should be displayed.
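The same check can be done from Python, which is convenient inside a notebook. The following is a minimal sketch using only the standard library; the function names `find_chromedriver` and `chromedriver_version` are hypothetical helpers, not part of Selenium.

```python
import shutil
import subprocess


def find_chromedriver(name="chromedriver"):
    """Return the full path of the executable if it is on PATH, otherwise None."""
    return shutil.which(name)


def chromedriver_version(path):
    """Run '<path> --version' and return the text it prints."""
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


path = find_chromedriver()
if path is None:
    print("chromedriver was not found on PATH")
else:
    print(path, "->", chromedriver_version(path))
```

If the executable is not found, reopen the terminal (or restart Jupyter) so that the updated PATH is picked up.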
The Python script uses Selenium to access the SIDRA page, select the data and extract the IPCA percentage variation information. In the script, remember to replace 'C:\caminho\para\chromedriver.exe' with the correct path to your ChromeDriver.
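A minimal sketch of such a script, assuming Selenium 4, might look like this. The table URL, the XPath, the 30-second timeout and the helper names `parse_percent` and `scrape_ipca` are illustrative assumptions rather than values confirmed by this tutorial; inspect the SIDRA page in your browser to find the XPath that matches the cells you need.

```python
from pathlib import Path

# Assumed values for illustration only; adjust them to your setup.
SIDRA_URL = "https://sidra.ibge.gov.br/tabela/1737"      # assumed IPCA table page
CHROMEDRIVER_PATH = r"C:\caminho\para\chromedriver.exe"  # replace with your path
CELL_XPATH = "//table//td"                               # hypothetical XPath


def parse_percent(text):
    """Convert a Brazilian-formatted number such as '0,42' to a float, or None."""
    cleaned = text.strip().replace(".", "").replace(",", ".")
    try:
        return float(cleaned)
    except ValueError:
        return None


def scrape_ipca(url=SIDRA_URL, driver_path=CHROMEDRIVER_PATH, xpath=CELL_XPATH):
    """Open the page, save its HTML for debugging, and return the numeric cells."""
    # Selenium is imported here so parse_percent stays usable without it.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome(service=Service(driver_path))
    try:
        driver.get(url)
        # Wait until at least one matching element is present in the DOM.
        WebDriverWait(driver, 30).until(
            EC.presence_of_element_located((By.XPATH, xpath))
        )
        # Keep a copy of the loaded page to help debug the XPath later.
        Path("pagina_carregada.html").write_text(driver.page_source, encoding="utf-8")
        values = [parse_percent(cell.text) for cell in driver.find_elements(By.XPATH, xpath)]
        return [value for value in values if value is not None]
    finally:
        driver.quit()  # always close the browser, even if an error occurs
```

Calling `scrape_ipca()` opens Chrome, so run it only once ChromeDriver is configured. The explicit wait is used instead of a fixed sleep because it returns as soon as the element appears.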
Run the Python script. If everything is configured correctly, the script will open the SIDRA page, extract the IPCA percentage variation data, and save the loaded page's HTML to pagina_carregada.html (useful for debugging).
The extracted data can be processed further, for example to create graphs or reports.
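As one example of further processing, monthly percentage variations can be compounded into an accumulated variation for a period. The sketch below uses only the standard library; the function name `accumulated_variation` is hypothetical, and the sample values are made up for illustration and are not real IBGE figures.

```python
import math


def accumulated_variation(monthly_percent):
    """Compound monthly percentage variations into one accumulated percentage."""
    # Each month multiplies the price level by (1 + variation / 100).
    factor = math.prod(1 + value / 100 for value in monthly_percent)
    return (factor - 1) * 100


# Made-up monthly variations, in percent (NOT real IPCA data).
sample = [0.50, 0.30, -0.10]
print(f"Accumulated variation: {accumulated_variation(sample):.4f}%")
```

Compounding is used rather than a simple sum because each month's variation applies to the price level already changed by the previous months.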
This tutorial provides a basis for automating IBGE data collection. Keep in mind that the site's structure may change, which would require adjusting the XPath expressions in the script, so monitor the site for changes and update your script as needed. Also respect the terms of use of the IBGE website when collecting data.
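One way to soften the impact of layout changes is to try several XPath expressions in order and use the first one that matches. The following is a sketch under stated assumptions: `first_matching_xpath` is a hypothetical helper, the candidate expressions are placeholders, and `driver` is any Selenium WebDriver instance.

```python
def first_matching_xpath(driver, candidates):
    """Return (xpath, elements) for the first candidate that matches, else (None, [])."""
    for xpath in candidates:
        # "xpath" is the string value of Selenium's By.XPATH locator strategy.
        elements = driver.find_elements("xpath", xpath)
        if elements:
            return xpath, elements
    return None, []


# Placeholder expressions; replace them with the ones that fit the real page.
CANDIDATE_XPATHS = [
    "//table[@id='hypothetical-table-id']//td",
    "//table//td",
]
```

When no candidate matches, the helper returns `(None, [])`, which is a clear signal that the page has changed and the expressions need to be reviewed.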