Why are UTF-8 Characters Corrupted When Using `file_get_contents()`?

Susan Sarandon
2024-12-09 22:42:13

UTF-8 Characters Corrupted After file_get_contents()

The issue arises when loading HTML from an external server with UTF-8 encoding. Characters like ľ, š, č, ť, ž are corrupted and replaced with invalid characters.

The Root of the Problem

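file_get_contents() itself does not interpret or convert character encodings: it returns the response body as a raw byte string. Corruption usually appears at a later step, when those bytes are interpreted with the wrong charset, for example when the page is served in a different encoding than expected, or when your own script outputs the bytes without declaring UTF-8.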
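To make the byte-level view concrete, here is a minimal sketch that uses an in-memory string in place of a fetched page (the sample bytes are an assumption for illustration, not part of the original question). It suggests that the same two bytes read as one character or as two wrong ones, depending on the charset assumed when interpreting them.

```php
<?php
// Stand-in for bytes returned by file_get_contents(): "š" (U+0161) encoded as UTF-8.
$bytes = "\xC5\xA1";

// PHP strings are byte sequences, so strlen() counts bytes, not characters.
$byteCount = strlen($bytes);
$charCount = mb_strlen($bytes, 'UTF-8');
$hex       = bin2hex($bytes);

// Misreading the same bytes as ISO-8859-1 and re-encoding them produces mojibake.
$misread = mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');

echo "$byteCount bytes, $charCount character, hex $hex\n";
echo "misread as ISO-8859-1: $misread\n";
```

The bytes in $bytes never change; only the interpretation does, which is why the fix belongs in the steps after the fetch.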

Proposed Solution

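To resolve this, make sure the bytes are converted to UTF-8 where necessary and that every later step treats them as UTF-8. The following techniques cover the common cases.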

1. Manual Encoding Conversion

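Use the mb_convert_encoding() function to convert the fetched HTML to UTF-8. Detect the source encoding against a list of likely candidates; checking only for 'UTF-8' makes mb_detect_encoding() return false for any other encoding, which leaves the conversion without a valid source:

$html = file_get_contents('http://example.com/foreign.html');
// The candidate list is an assumption; adjust it to the encodings the remote site may use.
$encoding = mb_detect_encoding($html, ['UTF-8', 'ISO-8859-2'], true) ?: 'UTF-8';
$utf8_html = mb_convert_encoding($html, 'UTF-8', $encoding);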
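Detection is a heuristic, so when the server declares a charset it is usually safer to trust that declaration. The following is a sketch under stated assumptions: charset_from_header() and to_utf8() are hypothetical helper names, and the ISO-8859-2 fallback is only a guess suited to the Central European characters mentioned above.

```php
<?php
// Hypothetical helper: extract the charset from a Content-Type header line, if present.
function charset_from_header(string $header): ?string
{
    return preg_match('/charset\s*=\s*"?([A-Za-z0-9._-]+)/i', $header, $m) ? $m[1] : null;
}

// Hypothetical helper: convert bytes to UTF-8, trusting a declared charset when there is one.
function to_utf8(string $bytes, ?string $declared, string $fallback = 'ISO-8859-2'): string
{
    if ($declared === null && mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes; // already valid UTF-8, nothing to convert
    }
    return mb_convert_encoding($bytes, 'UTF-8', $declared ?? $fallback);
}

// "š" encoded as ISO-8859-2 is the single byte 0xB9.
$latin2  = "\xB9";
$charset = charset_from_header('Content-Type: text/html; charset=ISO-8859-2');
$utf8    = to_utf8($latin2, $charset);

echo bin2hex($utf8), "\n";
```

In a real script, the header lines of the last HTTP request made through file_get_contents() are available in $http_response_header, so each line can be passed to charset_from_header() until one yields a charset.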

2. Output Encoding

Declare the encoding of your own output by sending the following header before the script prints anything:

header('Content-Type: text/html; charset=UTF-8');

3. HTML Entity Conversion

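Convert the fetched HTML to HTML entities before outputting it. Note that htmlentities() also escapes the markup itself (< and > become &lt; and &gt;), so this suits displaying the source as text rather than rendering it: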

$html = file_get_contents('http://example.com/foreign.html');
$html_entities = htmlentities($html, ENT_COMPAT, 'UTF-8');
echo $html_entities;
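The difference between escaping everything and escaping only non-ASCII characters can be sketched as follows. The sample markup is an assumption for illustration, and mb_encode_numericentity() is offered here as one possible alternative, not as the approach of the original article.

```php
<?php
// Sample markup standing in for a fetched page: <p>š</p>
$html = "<p>\u{0161}</p>";

// htmlentities() escapes the tags as well, so a browser would show the source as text.
$escaped = htmlentities($html, ENT_COMPAT, 'UTF-8');

// To keep the markup and convert only non-ASCII characters to numeric entities,
// use a conversion map covering U+0080 through U+10FFFF.
$numeric = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

echo $escaped, "\n";
echo $numeric, "\n";
```

The numeric-entity form is pure ASCII, so it should display correctly even when the page charset is misdeclared, at the cost of less readable source.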

4. JSON Decoding

If the external HTML is delivered inside a JSON document, decode it with the json_decode() function, which expects UTF-8 input:

$json = file_get_contents('http://example.com/foreign.html');
$decoded_json = json_decode($json, true);
$html = $decoded_json['html'];
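Because json_decode() returns null on failure, it helps to surface decoding errors explicitly. This is a minimal sketch with an inline payload in place of the fetched response; the "html" key follows the example above, and the payload contents are assumptions.

```php
<?php
// Inline payload standing in for the fetched response; \u0161 is the JSON escape for "š".
$json = '{"html": "<p>\u0161</p>"}';

try {
    // JSON_THROW_ON_ERROR (PHP 7.3+) raises JsonException instead of silently returning null.
    $decoded = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    $html = $decoded['html'] ?? '';
} catch (JsonException $e) {
    $html = '';
}

// Input that is not valid UTF-8 is rejected: 0xB9 is "š" in ISO-8859-2, not UTF-8.
$bad = json_decode("{\"html\": \"\xB9\"}", true);
$badIsUtf8Error = json_last_error() === JSON_ERROR_UTF8;

echo $html, "\n";
```

If the payload arrives in another encoding, converting it to UTF-8 before calling json_decode() avoids the JSON_ERROR_UTF8 failure.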

By applying these techniques, you can avoid encoding problems when fetching content with file_get_contents() and display UTF-8 characters correctly.
