
Golang html.Parse rewrites href query string to contain &amp;

王林, 2024-02-09 23:42:08



Question content

I have the following code:

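package main

import (
    "log"
    "os"
    "strings"

    "golang.org/x/net/html"
)

func main() {
    myHTMLDocument := `<!doctype html>
<html>
<head>
</head>
<body>
    <a href="http://www.example.com/input?foo=bar&baz=quux">WTF</a>
</body>
</html>`

    // Parse the document into a node tree.
    doc, err := html.Parse(strings.NewReader(myHTMLDocument))
    if err != nil {
        log.Fatal(err)
    }

    // Serialize the tree back to HTML on standard output.
    if err := html.Render(os.Stdout, doc); err != nil {
        log.Fatal(err)
    }
}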

The html.Render function produces the following output:

<!DOCTYPE html><html><head>

</head>
<body>
    <a href="http://www.example.com/input?foo=bar&amp;baz=quux">WTF</a>

</body></html>

Why does it rewrite the query string and convert & to &amp; (between bar and baz)?

Is there a way to avoid this behavior?

I'm trying to do a template conversion, but I don't want it to break my URLs.

Solution

html.Parse wants to generate valid HTML, and the HTML specification states that an ampersand in an href attribute must be encoded. In practice the parsed tree holds the decoded value, and html.Render escapes & as &amp; when it writes the attribute back out.

https://www.w3.org/TR/xhtml1/guidelines.html#C_12

In SGML and XML, the ampersand character ("&") declares the beginning of an entity reference (for example, &reg; represents the registered trademark symbol "®"). Unfortunately, many HTML user agents silently ignore incorrect usage of the & symbol in HTML documents, treating an & symbol that doesn't look like an entity reference as a literal & symbol. XML-based user agents will not tolerate this incorrect use, and any document that incorrectly uses the & symbol will not be "valid" and therefore will not conform to this specification. To ensure that the document is compatible with historical HTML user agents and XML-based user agents, an & symbol used in the document that is to be treated as a literal character must itself be expressed as an entity reference (such as "&amp;"). For example, when the href attribute of the a element refers to a CGI script that takes parameters, it must be expressed as http://my.site.dom/cgi-bin/myscript.pl?class=guest&amp;name=user instead of http://my.site.dom/cgi-bin/myscript.pl?class=guest&name=user.

In this case, Go is actually making your HTML better: the rendered output is valid.
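The encoding rule can be tried out in isolation. The following is a minimal sketch, not part of the original answer, that uses the standard library's html package (a different package from golang.org/x/net/html) to escape the URL from the question and decode it again. The helper names escapeHref and unescapeHref are hypothetical and exist only for illustration.

```go
package main

import (
	"fmt"
	"html" // standard library package, not golang.org/x/net/html
	"net/url"
)

// escapeHref (hypothetical helper) returns the form a URL takes when it is
// written into HTML source: a literal & becomes &amp;.
func escapeHref(rawURL string) string {
	return html.EscapeString(rawURL)
}

// unescapeHref (hypothetical helper) reverses that step, as an HTML parser
// does when it reads the attribute value.
func unescapeHref(attr string) string {
	return html.UnescapeString(attr)
}

func main() {
	raw := "http://www.example.com/input?foo=bar&baz=quux"

	escaped := escapeHref(raw)
	fmt.Println(escaped) // the value as it appears in the HTML source

	decoded := unescapeHref(escaped)
	fmt.Println(decoded == raw) // the round trip restores the original URL

	// The decoded URL still carries both query parameters.
	u, err := url.Parse(decoded)
	if err != nil {
		panic(err)
	}
	fmt.Println(u.Query().Get("foo"), u.Query().Get("baz"))
}
```

The point of the sketch is that &amp; is only the spelling of & inside HTML source; the URL itself is unchanged once the attribute is decoded.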

That being said, the browser will unescape it, so if you click the link, the resulting URL will still be correct (no &amp;, just &):

console.log(document.querySelector('a').href)
<a href="http://www.example.com/input?foo=bar&amp;baz=quux">WTF</a>


Statement:
This article is reproduced from stackoverflow.com. If there is any infringement, please contact admin@php.cn to have it removed.