Why Does PHP DOMDocument::loadHTML Fail with UTF-8 Encoding, and How Can I Fix It?

Linda Hamilton
2024-12-23 05:28:14

Failed to Encode UTF-8 with PHP DOMDocument::loadHTML

DOMDocument::loadHTML can mangle non-ASCII characters when it is given a UTF-8 string that does not declare its encoding: text such as "é" typically comes back as "Ã©". This article explains why that happens and presents several ways to fix it.

Cause of the Issue

Unless the markup says otherwise, DOMDocument::loadHTML treats the input string as ISO-8859-1, which was historically the default character set in HTTP/1.1. Each byte of a multi-byte UTF-8 sequence is then read as a separate ISO-8859-1 character, so non-ASCII text is corrupted.
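The effect can be reproduced with a short script. This is only a sketch: the sample string and the variable names are assumptions made for illustration, and the exact bytes printed depend on the libxml2 version PHP was built against, so no output is shown.

```php
<?php
// Hypothetical sample input: UTF-8 markup with no encoding declaration.
// "\u{E9}" is "é", written as an escape so the script does not depend
// on the encoding of the source file.
$profile = "<p>caf\u{E9}</p>";

$dom = new DOMDocument();
$dom->loadHTML($profile);

// DOM properties always return UTF-8, so any corruption seen here
// was introduced while parsing.
$text = $dom->getElementsByTagName('p')->item(0)->textContent;

// Compare the parsed text with the original and dump its bytes.
var_dump($text === "caf\u{E9}");
echo bin2hex($text), "\n";
```

If the comparison prints false, the parser has decoded the two UTF-8 bytes of "é" as two separate single-byte characters.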

Alternative Solutions

1. Prepending Encoding Declarations

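For straightforward (X)HTML snippets, prepend an XML encoding declaration or a meta charset tag so that the parser treats the string as UTF-8. Either of the two meta variants below can be used:

$dom = new DOMDocument();

// Variant A: HTTP-equiv meta tag
$contentType = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';
$dom->loadHTML($contentType . $profile);

// Variant B: HTML5-style meta charset (older libxml2 builds may not recognize it)
$dom->loadHTML('<meta charset="utf-8">' . $profile);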
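As a minimal end-to-end sketch of the meta-tag approach, the script below wraps the prepend step in a helper and reads the text back. The function name loadUtf8Html and the sample string are assumptions made for illustration; they are not part of DOMDocument.

```php
<?php
/**
 * Hypothetical helper: parse a UTF-8 HTML fragment by prepending
 * a Content-Type meta tag that declares the encoding.
 */
function loadUtf8Html(string $html): DOMDocument
{
    $contentType = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';

    $dom = new DOMDocument();
    $dom->loadHTML($contentType . $html);

    return $dom;
}

// "\u{E9}" is "é"; the escape keeps the script independent of file encoding.
$profile = "<p>caf\u{E9}</p>";

$dom  = loadUtf8Html($profile);
$text = $dom->getElementsByTagName('p')->item(0)->textContent;

// The text should now round-trip unchanged.
var_dump($text === "caf\u{E9}");
```

Note that the prepended meta element becomes part of the parsed document, so it may need to be removed again if the markup is serialized later.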

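2. Converting to HTML Entities (the SmartDOMDocument Workaround)

If you cannot know whether the string already contains an encoding declaration, use the workaround from SmartDOMDocument: convert every non-ASCII character to an HTML entity before parsing, so that the input is pure ASCII and the parser's encoding assumption no longer matters: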

$dom->loadHTML(mb_convert_encoding($profile, 'HTML-ENTITIES', 'UTF-8'));

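3. PHP 8.2+ Workaround

PHP 8.2 deprecated the 'HTML-ENTITIES' encoding in mbstring, so the previous workaround now raises a deprecation notice. On PHP 8.2 and later, use mb_encode_numericentity instead: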

$dom->loadHTML(mb_encode_numericentity($profile, [0x80, 0x10FFFF, 0, ~0], 'UTF-8'));
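The following sketch shows what the numeric-entity conversion produces and that the parsed text survives intact. It assumes the mbstring extension is loaded; the helper name toNumericEntities and the sample string are hypothetical.

```php
<?php
/**
 * Hypothetical helper: replace every non-ASCII character with a
 * numeric character reference, leaving ASCII markup untouched.
 */
function toNumericEntities(string $html): string
{
    // Conversion map: [first code point, last code point, offset, mask].
    // 0x80..0x10FFFF covers everything outside ASCII; offset 0 and
    // mask ~0 leave the code point value unchanged.
    $convmap = [0x80, 0x10FFFF, 0, ~0];

    return mb_encode_numericentity($html, $convmap, 'UTF-8');
}

// "\u{E9}" is "é" (U+00E9, decimal 233).
$profile = "<p>caf\u{E9}</p>";
$escaped = toNumericEntities($profile);

echo $escaped, "\n"; // <p>caf&#233;</p>

$dom = new DOMDocument();
$dom->loadHTML($escaped);
$text = $dom->getElementsByTagName('p')->item(0)->textContent;

// The parser decodes the entity back to the original character.
var_dump($text === "caf\u{E9}");
```

Because the string handed to loadHTML contains only ASCII, this approach works whether or not the markup already carries an encoding declaration.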

Conclusion

In short, loadHTML has to be told that its input is UTF-8, either through an encoding declaration prepended to the markup or by converting non-ASCII characters to entities before parsing. With either approach in place, DOMDocument::loadHTML parses UTF-8 HTML correctly.
