How to Extract Web Page Data into Java Programs with Jsoup?

Linda Hamilton
2024-10-30 21:47:30

Web Page Data Extraction for Java Programs

Pulling information from web pages into a Java program is known as web scraping: the program downloads a page, parses its HTML, and extracts the targeted data from the resulting document tree.

One widely used option is the Jsoup HTML parser, which supports jQuery-like CSS selectors and returns its matches as an Elements collection that can be iterated with Java's enhanced for loop. Here is a sample Java program that demonstrates the web scraping process:

<code class="java">import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class WebScraper {

    public static void main(String[] args) throws Exception {
        // Target URL
        String url = "https://www.bestbuy.com/site/best-buy-insignia-55-class-f30-series-led-4k-uhd-smart-fire-tv/6494164.p?skuId=6494164";

        // Connect to the URL and parse HTML content
        Document document = Jsoup.connect(url).get();

        // Get product information using CSS selectors
        String title = document.select("h1.page-title").text();
        String price = document.select(".priceView-customer-price").text();
        String description = document.select(".product-lang-en-us .product-description-rich-html").text();

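        // Print results
        System.out.println("Title: " + title);
        System.out.println("Price: " + price);
        System.out.println("Description: " + description);
    }
}</code>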
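
The enhanced for loop support mentioned above is easiest to see when a selector matches several elements. The following is a minimal sketch rather than a definitive implementation: it parses an inline HTML string with Jsoup.parse so that it runs without network access, and the class name ProductListScraper, the method extractNames, the sample markup, and the selector "li.product .name" are all hypothetical, made up for illustration and not taken from any real site.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

public class ProductListScraper {

    // Returns the text of every element matching the hypothetical
    // selector "li.product .name" in the given HTML.
    public static List<String> extractNames(String html) {
        // Jsoup.parse builds a Document from an HTML string (no HTTP request)
        Document document = Jsoup.parse(html);

        // select returns an Elements collection of all matches
        Elements items = document.select("li.product .name");

        // Elements is iterable, so the enhanced for loop works directly
        List<String> names = new ArrayList<>();
        for (Element item : items) {
            names.add(item.text());
        }
        return names;
    }

    public static void main(String[] args) {
        // Illustrative markup only; a real page would be fetched with Jsoup.connect(url).get()
        String html = "<ul>"
                + "<li class=\"product\"><span class=\"name\">Television</span></li>"
                + "<li class=\"product\"><span class=\"name\">Soundbar</span></li>"
                + "<li class=\"other\"><span class=\"name\">Not a product</span></li>"
                + "</ul>";

        for (String name : extractNames(html)) {
            System.out.println(name);
        }
    }
}
```

Run against the sample markup, the loop visits only the two list items with the class product and skips the third. The same pattern should apply to a Document obtained from Jsoup.connect(url).get(), provided the selector matches the structure of the actual page.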
