A JavaScript scraper for the Wikipedia Academy Award List.

Susan Sarandon
2025-01-24 16:39:12

This tutorial demonstrates web scraping in Node.js: axios downloads the Wikipedia list of Academy Award-winning films, Cheerio parses the HTML, and the extracted table is saved to a CSV file.

First, install the required packages:

<code class="language-bash">npm install cheerio axios</code>

The Wikipedia page URL is:

<code class="language-javascript">const url = 'https://en.wikipedia.org/wiki/List_of_Academy_Award%E2%80%93winning_films';</code>

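The code fetches the page's HTML with axios, then parses it with Cheerio. Because the snippet uses top-level await, the script must run as an ES module (for example, with "type": "module" in package.json):

<code class="language-javascript">import fs from 'node:fs';
import axios from 'axios';
import * as cheerio from 'cheerio';

const { data: html } = await axios.get(url);
const $ = cheerio.load(html);

const theadData = []; // header cells, one array per tbody
const tableData = []; // rows that end up in the CSV</code>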

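The script first collects the header cells (th) of each tbody and uses the first table's headers as the CSV header row, then collects the data cells (td) of every table row:

<code class="language-javascript">// Collect the header cells (th) found in each table body.
$('tbody').each((i, tbody) => {
  const headerCells = [];
  $(tbody).find('th').each((j, cell) => {
    // trim() removes surrounding whitespace, including trailing newlines.
    headerCells.push($(cell).text().trim());
  });
  theadData.push(headerCells);
});

// The first table's header cells become the CSV header row.
tableData.push(theadData[0]);

// Collect the data cells (td) of every row; rows without td cells are skipped.
$('table tr').each((i, row) => {
  const rowData = [];
  $(row).find('td').each((j, cell) => {
    rowData.push($(cell).text().trim());
  });
  if (rowData.length) tableData.push(rowData);
});</code>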
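Cell text scraped from Wikipedia can carry more than surrounding whitespace, for example bracketed footnote markers or line breaks inside a cell. The following is a minimal sketch of a cleanup step, assuming such markers look like `[1]` or `[a]`; the helper name `cleanCell` is hypothetical and not part of the original script, and the sample strings are made up for illustration.

```javascript
// Hypothetical helper (not in the original script): normalizes the text of one table cell.
function cleanCell(text) {
  return text
    .replace(/\[(?:\d+|[a-z])\]/g, '') // assumption: footnote markers look like [1] or [a]
    .replace(/\s+/g, ' ')              // collapse newlines, tabs and repeated spaces
    .trim();
}

// Made-up sample inputs, not scraped data.
console.log(cleanCell('  Wings[1]\n'));              // prints "Wings"
console.log(cleanCell('The Broadway\nMelody [a]'));  // prints "The Broadway Melody"
```

In the script above, it could replace the plain `trim()` call, as in `rowData.push(cleanCell($(cell).text()))`. The pattern is deliberately narrow so that longer bracketed text inside a cell is left alone.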

Finally, the extracted data is formatted and saved to a CSV file using fs.writeFileSync, with semicolons as delimiters:

<code class="language-javascript">const csvContent = tableData.map((row) => row.join(';')).join('\n');
fs.writeFileSync('academy_awards.csv', csvContent, 'utf-8');</code>
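Joining fields with a bare `join(';')` produces a misaligned row whenever a cell itself contains a semicolon, a double quote, or a line break. One common remedy, sketched below, is to wrap such fields in double quotes and double any embedded quotes. The helper names `escapeCsvField` and `toCsv` are hypothetical, and the sample rows are made up rather than taken from the scraped page.

```javascript
// Hypothetical helper: quotes a field only when it would otherwise break the row.
function escapeCsvField(value, delimiter = ';') {
  const text = String(value);
  const needsQuoting = text.includes(delimiter) || /["\r\n]/.test(text);
  // Embedded double quotes are doubled, then the whole field is wrapped in quotes.
  return needsQuoting ? `"${text.replace(/"/g, '""')}"` : text;
}

// Hypothetical helper: turns an array of rows (arrays of cells) into CSV text.
function toCsv(rows, delimiter = ';') {
  return rows
    .map((row) => row.map((field) => escapeCsvField(field, delimiter)).join(delimiter))
    .join('\n');
}

// Made-up sample rows, not scraped data.
const sample = [
  ['Film', 'Year'],
  ['Example; Title', '2000'],
  ['Say "Hi"', '2001'],
];
console.log(toCsv(sample));
// Film;Year
// "Example; Title";2000
// "Say ""Hi""";2001
```

With these helpers, the `csvContent` line could become `const csvContent = toCsv(tableData);`, leaving the `fs.writeFileSync` call unchanged.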

Run the script using:

<code class="language-bash">node scraper.js</code>

The resulting academy_awards.csv file contains the scraped data.

This tutorial builds upon previous scraping tutorials using Go and Python.
