Should HTML tags be closed or not?

When you write HTML5 code, have you ever wondered whether you should write <br> or <br />, or whether to write <input> or <input />? Why is it wrong to write <script src="my-script.js" />? I struggled with these questions myself, and I found that the topic is much more interesting than I expected.

If you are not interested in my research process, you can jump straight to the "Legality" section to get the answer.

Void elements

A void element is a special element that cannot contain any content. Other elements, such as <div>, may be empty or may contain other elements or text.

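The more common void elements are:

<br>, <hr>, <img>, <input>, <link>, <meta>

Less common void elements include:

<area>, <base>, <col>, <embed>, <source>, <track>, <wbr>

These are all the void elements in the current HTML standard. Older revisions also listed a few more, such as <command>, <keygen> and <param>, which have since been removed or made obsolete.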

<br></br> is not legal HTML because it implies that br can contain content (and <br>Hello!</br> is completely meaningless). Both <br> and <br /> are very common.

Although we all know that XHTML forces you to write <br />, HTML has no such requirement.

Tracing history

To fully understand void elements, we need to understand their history.

HTML, XML and XHTML all trace back to SGML. SGML stands for "Standard Generalized Markup Language", and it became an ISO standard in 1986. Both HTML and XML are derived from SGML: XML is a constrained subset of SGML, and XHTML is based on XML.

XHTML is basically the same as HTML, but based on XML.

After knowing this relationship, let’s move on to the most interesting part of this article.

SGML has a feature called "Null End Tag (NET)". When an element contains only plain text, a null end tag saves you from writing out the closing tag. For example, you can write <quote>Quoted text</quote> as <quote/Quoted text/.

An element without any content can then be written as <quote//, where quote is the tag name, the first / opens the element in null-end-tag mode, and the second / is the null end tag that closes it.

By this logic, the <br/ at the start of <br/> is already a complete element, so wouldn't <br/> be parsed as <br>>? If you think like me, you also think this syntax is stupid.
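To make the null end tag rule concrete, here is a toy sketch in Python that expands the shorthand into ordinary tags. It is not a real SGML parser: the function name expand_net and the small VOID set are made up for this illustration, and it assumes that void elements are the ones an SGML DTD would declare EMPTY.

```python
import re

# Illustrative subset of elements that carry no content (declared EMPTY in SGML terms).
VOID = {"br", "hr", "img", "input", "link", "meta"}

# "<name/" opens an element in null-end-tag mode; the next "/" closes it.
NET_START = re.compile(r"<([A-Za-z][A-Za-z0-9]*)/")


def expand_net(markup: str) -> str:
    """Rewrite null-end-tag shorthand into ordinary start and end tags."""
    out = []
    pos = 0
    while True:
        match = NET_START.search(markup, pos)
        if match is None:
            out.append(markup[pos:])
            return "".join(out)
        name = match.group(1)
        out.append(markup[pos:match.start()])
        if name.lower() in VOID:
            # An empty element has no content, so "<br/" is already complete.
            out.append(f"<{name}>")
            pos = match.end()
            continue
        end = markup.find("/", match.end())
        if end == -1:
            # No null end tag found: leave the remainder untouched.
            out.append(markup[match.start():])
            return "".join(out)
        out.append(f"<{name}>{markup[match.end():end]}</{name}>")
        pos = end + 1


print(expand_net("<quote/Quoted text/"))  # <quote>Quoted text</quote>
print(expand_net("<br/>"))                # <br>>
```

Under these assumptions the stray ">" after <br/ is left behind as plain text, which is exactly the oddity described above.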

Unfortunately, the authors of the HTML4 specification didn't think so and wrote it into the specification. The browser vendors of the time, however, evidently did not care for this syntax and largely never supported it. (In this case, the browser vendors may have done us a favor.)

The authors of the XML specification (which also applies to XHTML) realized that this syntax was not a good idea, so they simply left null end tags out. Instead, XML provides a much more readable syntax for elements without content. It is called the "self-closing tag", and it looks like this: <br />. This syntax looked so natural that most developers at the time assumed it was the correct way to write such tags.

Fortunately, HTML keeps improving, and W3C members have learned from the mistakes of the past. That is why HTML5 is such a big step forward compared with previous versions.

When introducing the new syntax of HTML5, W3C said:

The HTML5 syntax is compatible with HTML4 and XHTML1 documents, but it is not compatible with the more obscure SGML features of HTML4, such as the null end tag syntax (<em/content/).

Well done, HTML5!

(I think they should keep the "short tag" feature, like Nice , which I think is cool. But at least HTML is no longer so cluttered.)

Legality

Okay, let's go back to the question of legality from the beginning of the article. The current HTML5 specification describes the tag of a void element like this:

Such a tag must consist of the following parts, in this order:

A "<" character.

The tag name.

Optionally, one or more attributes, each preceded by one or more whitespace characters.

Optionally, one or more whitespace characters.

Optionally, a "/" character, which is only allowed on void elements.

A ">" character.

The "/" character in the second-to-last item is optional and has no effect. So there is no real difference between <br> and <br />.
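The ordered parts listed above can be sketched as a regular expression. The Python below is only an approximation under stated assumptions, not the specification's own grammar: the names ATTRIBUTE, START_TAG and check_start_tag are invented for this example, the attribute pattern is simplified, and foreign elements such as SVG (where a trailing "/" is also permitted) are ignored.

```python
import re

# Void elements from the current HTML standard.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}

# Simplified attribute: a name, optionally followed by a quoted or unquoted value.
ATTRIBUTE = r"""[^\s"'<>/=]+(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'=<>`]+))?"""

START_TAG = re.compile(
    r"<"                                # a "<" character
    r"(?P<name>[A-Za-z][A-Za-z0-9-]*)"  # the tag name
    rf"(?:\s+{ATTRIBUTE})*"             # optional attributes, each preceded by whitespace
    r"\s*"                              # optional whitespace
    r"(?P<slash>/?)"                    # optional "/"
    r">"                                # a ">" character
)


def check_start_tag(tag: str) -> bool:
    """Return True if the tag follows the order above and uses "/" only on void elements."""
    match = START_TAG.fullmatch(tag)
    if match is None:
        return False
    if match.group("slash") and match.group("name").lower() not in VOID_ELEMENTS:
        return False
    return True


print(check_start_tag("<br>"), check_start_tag("<br />"))  # True True
print(check_start_tag('<script src="my-script.js" />'))    # False
```

Both spellings of the br tag pass, while a trailing "/" on a non-void element such as script is rejected by this sketch.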

Correctness

Developers who like XML and XHTML may say, "Yes, the / is optional, but <br /> is 'more correct'."

I have to tell you that you are wrong. In fact, some argue that the / in a void element's tag is really a tolerated syntax error. The tolerance exists for compatibility reasons, and it means that all browsers and parsers treat <br> and <br /> the same.

On this point, Google's HTML style guide also explicitly says not to close void elements.

Disadvantages

Of course, not closing void elements also has disadvantages, but I don't think they outweigh the advantage: code that is clean and concise.

The first disadvantage is that the developer must know which elements are void. If you don't know whether a given element is void, then when you can't find its closing tag you will wonder whether you should add one. However, there are only a handful of void elements, and you can usually tell at a glance whether an element is one of them.

The second disadvantage is that editors may not handle unclosed void elements well. Editor developers must know about void elements and provide appropriate syntax highlighting and code completion. When you write the tag of a void element, the editor must know that no closing tag will ever follow it.

But these functions are very simple to implement, and the editors I know support this aspect quite well, so this is not really a disadvantage.
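As a rough illustration of how little an editor needs for this, here is a minimal Python sketch of the auto-completion decision. The function name auto_close is hypothetical and not taken from any real editor.

```python
# Void elements from the current HTML standard.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
})


def auto_close(tag_name: str) -> str:
    """Return the closing tag an editor could insert after a start tag is typed.

    Void elements get an empty string because they never have a closing tag.
    """
    name = tag_name.lower()
    if name in VOID_ELEMENTS:
        return ""
    return f"</{name}>"


print(repr(auto_close("img")))  # ''
print(repr(auto_close("div")))  # '</div>'
```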

My thoughts on void elements

I think the concept of void elements could actually be eliminated from HTML. We could give these elements content that replaces some of their attributes.

Take the <img> tag as an example. It has a mandatory alt attribute. This attribute exists so that users who cannot see the image (perhaps because of a visual impairment, or because their device does not support images) can still know what the image is about. (If the image is purely decorative, the attribute should be left empty.)

My question is: why not use the content of <img> instead of the alt attribute? I think it would be more intuitive to write it this way:

<img src="…">Image of doge</img>

The <meta> tag even has an attribute called content! Why not write the value of content directly as the content of the tag? <meta content="Value content"> should be written as <meta>Value content</meta>. The same goes for many other tags.

So only a few void elements really deserve to be kept, but the W3C has to consider backward compatibility, so changing the status quo is still very difficult.

Final thoughts: the <script> tag

This tag really bothers me because its meaning is so simple, yet writing it is so wordy. Writing <script src="my-script.js"></script> feels wrong, because the content of <script> has no logical relation to my-script.js. (The HTML specification allows you to give it both content and a src attribute.)

The problem is that <script> is not a void element: you can write JavaScript inside it. So the optional "/" is not allowed on it. (Translator's note: this is why <script src="my-script.js" /> is wrong.)

Using the <link> tag instead of <script> would be perfect, because it is already used to import external files and provides all the required attributes. Of course, the web platform always has to consider backward compatibility; otherwise all older browsers that do not support this syntax would be unable to parse your page.

Original link: Matias Meno Translation: Bole Online - Fang Yinghang