Concurrent Scraper

Barbara Streisand

Nov 06, 2024 pm 03:21 PM

Program objective

Access several web pages concurrently, extract the title of each page, and print the titles to the terminal. The program uses Go's concurrency (goroutines), so the pages are fetched at the same time instead of one after another, which saves time.

Explanation of the Code

Packages used

import (
    "fmt"
    "net/http"
    "sync"
    "github.com/PuerkitoBio/goquery"
)

fetchTitle function

This function is responsible for:

Accessing a web page (url)
Extracting the page title
Sending the result to a channel

func fetchTitle(url string, wg *sync.WaitGroup, results chan<- string) {
    defer wg.Done() // mark this goroutine as done when fetchTitle returns

Function parameters:

url string: The address of the web page (URL) that we access to obtain the title.
wg *sync.WaitGroup: Pointer to a WaitGroup, which we use to synchronize the completion of all tasks (goroutines) that are running at the same time. The * indicates that we are passing the address of the WaitGroup and not a copy of it.
results chan<- string: A send-only channel of strings through which the function reports its result (the page title or an error message).

The defer wg.Done() line tells the program to mark this task (goroutine) as completed when the fetchTitle function finishes. This is important so that main knows when all tasks have been completed.
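
The behaviour of defer wg.Done() can be sketched in isolation. The following is a minimal illustration, not the article's code: task and runTasks are hypothetical names, and the work done by each goroutine is reduced to incrementing a counter. It shows that the deferred Done() runs on every return path, including an early return, and that the WaitGroup is shared by passing its address.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// task is a hypothetical stand-in for fetchTitle. It may return early,
// but the deferred wg.Done() still runs on every path.
func task(fail bool, wg *sync.WaitGroup, succeeded *int64) {
	defer wg.Done()
	if fail {
		return // early return: Done() is still called
	}
	atomic.AddInt64(succeeded, 1)
}

// runTasks starts three tasks (one of which "fails") and returns how many
// succeeded once wg.Wait() has unblocked.
func runTasks() int64 {
	var wg sync.WaitGroup
	var succeeded int64
	for _, fail := range []bool{false, true, false} {
		wg.Add(1)
		go task(fail, &wg, &succeeded) // &wg: share one WaitGroup, do not copy it
	}
	wg.Wait() // returns only after all three Done() calls
	return atomic.LoadInt64(&succeeded)
}

func main() {
	fmt.Println(runTasks()) // prints 2
}
```

If task received the WaitGroup by value, each goroutine would call Done() on its own copy and wg.Wait() in runTasks would never return.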

HTTP Request

req, err := http.Get(url)
if err != nil {
    results <- fmt.Sprintf("error accessing %s: %v", url, err) // message wording is illustrative
    return
}
defer req.Body.Close()

http.Get(url): This line makes an HTTP GET request to the URL. This means we are accessing the page and asking the server for its content. Note that the returned value, named req here, is the server's response.
err != nil: Here we check if there was any error when accessing the page (for example, if the host cannot be reached or the server is not responding). If there is an error, we send a message to the results channel and end the function with return. A page that does not exist normally produces a response with a non-200 status rather than an error at this point.
defer req.Body.Close(): This ensures that, once we are done using the page content, the response body is closed and the resources behind it (such as the network connection) are released.
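
The request pattern described above (request, error check, deferred Close) can be tried without touching the network. This is a sketch under assumptions: fetchBody and demo are hypothetical helpers, and the standard library's net/http/httptest package stands in for a real site. The second call is made after the test server has been shut down, so that http.Get itself returns an error.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetchBody is a hypothetical helper that follows the same
// request / error check / deferred Close pattern as fetchTitle.
func fetchBody(url string) (string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return "", err // the request itself failed; there is no body to close
	}
	defer resp.Body.Close() // release the connection when we are done reading

	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(data), nil
}

// demo calls fetchBody against a local test server, then against the same
// address after the server has been closed.
func demo() (body string, failedAfterClose bool) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "<title>Example</title>")
	}))
	body, _ = fetchBody(srv.URL)
	srv.Close()

	_, err := fetchBody(srv.URL) // nothing is listening any more
	return body, err != nil
}

func main() {
	body, failed := demo()
	fmt.Println(body)
	fmt.Println(failed)
}
```

The deferred Close is placed after the error check because the response is nil when http.Get returns an error.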

Status Check

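if req.StatusCode != 200 {
    results <- fmt.Sprintf("error accessing %s: status %d", url, req.StatusCode) // message wording is illustrative
    return
}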

req.StatusCode != 200: We check if the server responded with the code 200 OK (indicates success). If it is not 200, it means the page did not load correctly. We then send an error message to the results channel and terminate the function.
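
The status check matters because a missing page does not make http.Get return an error. The sketch below illustrates this with a local test server; checkStatus and demo are hypothetical names, and the returned strings are only examples of the kind of message that could be sent to the results channel. It uses the constant http.StatusOK, which is equal to 200.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// checkStatus is a hypothetical helper that separates a failed request
// from a response with a non-200 status.
func checkStatus(url string) string {
	resp, err := http.Get(url)
	if err != nil {
		return "request failed: " + err.Error()
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK { // http.StatusOK is the constant 200
		return fmt.Sprintf("unexpected status %d", resp.StatusCode)
	}
	return "ok"
}

// demo serves one known path and requests both it and an unknown path.
func demo() (found, missing string) {
	mux := http.NewServeMux()
	mux.HandleFunc("/found", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "hello")
	})
	srv := httptest.NewServer(mux)
	defer srv.Close()

	return checkStatus(srv.URL + "/found"), checkStatus(srv.URL + "/missing")
}

func main() {
	found, missing := demo()
	fmt.Println(found)   // prints "ok"
	fmt.Println(missing) // prints "unexpected status 404"
}
```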

Title Loading and Search

doc, err := goquery.NewDocumentFromReader(req.Body)
if err != nil {
    results <- fmt.Sprintf("error parsing %s: %v", url, err) // message wording is illustrative
    return
}
title := doc.Find("title").Text()
results <- title
}

goquery.NewDocumentFromReader(req.Body): We load the HTML content of the page (provided by req.Body) into goquery, which lets us navigate and search specific parts of the HTML.
doc.Find("title").Text(): We look for the <title> tag in the HTML of the page and get the text inside it (i.e. the title).
results <- title: We send the extracted title to the results channel, where it will be read later.

main function

The main function sets up and coordinates the program.

func main() {
    urls := []string{
        "http://olos.novagne.com.br/Olos/login.aspx?logout=true",
        "http://sistema.novagne.com.br/novagne/",
    }

urls := []string{...}: We define a list of URLs that we want to process. Each URL will be passed to a goroutine that will extract the page title.

WaitGroup and Channel Configuration

var wg sync.WaitGroup
results := make(chan string, len(urls)) // Channel to store the results

var wg sync.WaitGroup: We create a new instance of WaitGroup, which keeps track of how many goroutines are still running and lets main wait until they have all finished before the program ends.
results := make(chan string, len(urls)): We create a results channel with capacity equal to the number of URLs. This channel will store messages with titles or errors.
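
The reason for giving the channel a capacity of len(urls) can be shown with a small sketch. The names items and demo are hypothetical. With a buffer as large as the number of senders, every send completes immediately even though nothing is reading yet.

```go
package main

import "fmt"

// demo sends one message per item into a channel whose capacity equals
// the number of items, with no receiver running, and reports how many
// messages are waiting in the buffer and the buffer's capacity.
func demo() (queued, capacity int) {
	items := []string{"a", "b", "c"}
	results := make(chan string, len(items))

	for _, it := range items {
		results <- "done: " + it // does not block: the buffer still has room
	}
	return len(results), cap(results)
}

func main() {
	queued, capacity := demo()
	fmt.Println(queued, capacity) // prints 3 3
}
```

This matters for the program's design: main calls wg.Wait() before it reads from the channel. With an unbuffered channel, each goroutine would block on its send, never reach wg.Done(), and wg.Wait() would never return.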

Starting the Goroutines

for _, url := range urls {
    wg.Add(1)
    go fetchTitle(url, &wg, results)
}

for _, url := range urls: Here we loop through each URL in the list.
wg.Add(1): For each URL, we increment the WaitGroup counter to indicate that a new task (goroutine) will be started.
go fetchTitle(url, &wg, results): We start fetchTitle as a goroutine for each URL, so that it runs concurrently with the others.

Waiting and Displaying Results

wg.Wait()
close(results)
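
The listing above stops at close(results). A plausible way to finish the step, assuming the results are simply read from the channel and printed, is to range over the closed channel. The sketch below shows the whole launch / wait / close / read sequence with a hypothetical fakeFetch in place of fetchTitle, so that it runs without network access; the repository linked below is the reference for the article's actual code.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// fakeFetch is a hypothetical stand-in for fetchTitle. It does no network
// access and simply sends an upper-cased copy of its input.
func fakeFetch(name string, wg *sync.WaitGroup, results chan<- string) {
	defer wg.Done()
	results <- strings.ToUpper(name)
}

// collect starts one goroutine per input, waits for all of them,
// closes the channel, and drains it into a slice.
func collect(names []string) []string {
	var wg sync.WaitGroup
	results := make(chan string, len(names))

	for _, name := range names {
		wg.Add(1)
		go fakeFetch(name, &wg, results)
	}

	wg.Wait()      // every goroutine has sent its result
	close(results) // no more sends; lets the range loop below terminate

	var out []string
	for r := range results {
		out = append(out, r)
	}
	sort.Strings(out) // arrival order is not deterministic, so sort for display
	return out
}

func main() {
	for _, r := range collect([]string{"go", "php"}) {
		fmt.Println(r)
	}
}
```

Closing the channel after wg.Wait() is what allows the range loop to end: ranging over a channel that is never closed blocks forever once the buffer is empty.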

REPO: https://github.com/ionnss/Scrapper-GoRoutine

ions,

another earth day
