
Preface

The development of programming languages has brought us countless possibilities. Go, a modern programming language, offers advantages such as efficiency, simplicity, and cross-platform support, and it is widely used in server-side programming, cloud computing, containers, and other fields. This article introduces how to use third-party libraries to query HTML documents in Go.

1. Go language and HTML

HTML is a markup language used to build web pages. It describes the structure of a page's content and is used together with other technologies such as CSS (for styling) and JavaScript (for behavior) to build rich, interactive pages. Go is a compiled, statically typed programming language known for its efficiency and its built-in support for concurrency. Go's standard library does not include a general-purpose HTML parser, but we can accomplish this task by using third-party libraries.

2. HTML parsing in Go language

In Go language, we can use a variety of tools to parse HTML documents, such as golang.org/x/net/html, github.com/PuerkitoBio/goquery, etc. These tools provide a set of methods and structures for parsing, traversing, and modifying HTML documents.

2.1 Using golang.org/x/net/html

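golang.org/x/net/html is maintained by the Go project as part of the golang.org/x supplementary repositories. It is not part of the standard library, so it has to be added to your module (for example with go get golang.org/x/net/html). The package implements an HTML5-compliant tokenizer and parser and provides an API for working with the resulting node tree. Next, we'll demonstrate how to use the library to query node data in an HTML document.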

The following is a simple HTML document:

<!DOCTYPE html>
<html>
  <head>
    <title>A Simple HTML Document</title>
  </head>
  <body>
    <h1 id="This-is-a-heading">This is a heading</h1>
    <p>This is a paragraph.</p>
    <p>This is another paragraph.</p>
  </body>
</html>

We now want to query the text content of all paragraph nodes (<p> elements) in this document. First, we parse the HTML document into a DOM tree, and then we find the paragraph nodes by recursively traversing that tree.

package main

import (
    "fmt"
    "strings"

    "golang.org/x/net/html"
)

var htmlString = `
<!DOCTYPE html>
<html>
  <head>
    <title>A Simple HTML Document</title>
  </head>
  <body>
    <h1 id="This-is-a-heading">This is a heading</h1>
    <p>This is a paragraph.</p>
    <p>This is another paragraph.</p>
  </body>
</html>
`

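func main() {
    // html.Parse expects an io.Reader, so wrap the string in a strings.Reader.
    reader := strings.NewReader(htmlString)
    doc, err := html.Parse(reader)
    if err != nil {
        fmt.Println("Failed to parse HTML string:", err)
        return
    }

    // find walks the tree depth-first and prints the text of each <p> element.
    var find func(*html.Node)
    find = func(n *html.Node) {
        if n.Type == html.ElementNode && n.Data == "p" {
            // Guard against empty paragraphs (FirstChild == nil) and
            // paragraphs whose first child is an element rather than text.
            if c := n.FirstChild; c != nil && c.Type == html.TextNode {
                fmt.Println(c.Data)
            }
            return
        }
        for c := n.FirstChild; c != nil; c = c.NextSibling {
            find(c)
        }
    }
    find(doc)
}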

In the above code, we use strings.NewReader() to wrap the string in an io.Reader and pass it to html.Parse(), which parses the document into a tree of *html.Node values. We then define a recursive function, find(), that walks the tree depth-first. When it reaches a <p> element node, it prints the data of that node's first child (here, the paragraph's text node) instead of descending further; otherwise it recurses into each child. Finally, calling find(doc) on the root node prints the text of both paragraphs.

2.2 Using github.com/PuerkitoBio/goquery

github.com/PuerkitoBio/goquery is a popular Go library that offers a jQuery-like API for parsing and querying HTML. It is built on top of golang.org/x/net/html and uses CSS selectors (via the cascadia library) to find elements, so we can query HTML documents without walking the DOM tree by hand.

We will use the same HTML document as in section 2.1.

We now want to query the text content of all paragraph nodes in the document, which can be easily achieved using goquery:

package main

import (
    "fmt"
    "strings"

    "github.com/PuerkitoBio/goquery"
)

var htmlString = `
<!DOCTYPE html>
<html>
  <head>
    <title>A Simple HTML Document</title>
  </head>
  <body>
    <h1 id="This-is-a-heading">This is a heading</h1>
    <p>This is a paragraph.</p>
    <p>This is another paragraph.</p>
  </body>
</html>
`

func main() {
    reader := strings.NewReader(htmlString)
    doc, err := goquery.NewDocumentFromReader(reader)
    if err != nil {
        fmt.Println("Failed to parse HTML string:", err)
        return
    }
    doc.Find("p").Each(func(i int, s *goquery.Selection) {
        fmt.Println(s.Text())
    })
}

In the above code, we again use strings.NewReader() to wrap the string in an io.Reader and pass it to goquery.NewDocumentFromReader() to parse the HTML document. We then call doc.Find("p") to select all paragraph elements with a CSS selector, and Each() visits every match so we can print its text with s.Text(), which returns the combined text of the element and all of its descendants.

3. Summary

This article showed two ways to query the content of HTML documents in Go: golang.org/x/net/html and github.com/PuerkitoBio/goquery. Both parse an HTML document into a tree that you can traverse and manipulate. golang.org/x/net/html is lower level: you walk the *html.Node values yourself, which gives you full control over the traversal. goquery builds on it and lets you express the same query as a short CSS selector, which usually makes the code more concise.

