How to Read Unicode UTF-8 Files into Wstrings in Windows with C++11?
On Windows, reading Unicode (UTF-8) data from a file into a wide character string (std::wstring) can be done with facilities that the C++11 standard library provides.
The key piece is the std::codecvt_utf8 facet. It converts between UTF-8 encoded byte strings and character strings in UCS-2 or UCS-4 representation, depending on the width of the character type; with wchar_t, which is 16 bits wide on Windows, that means UCS-2. The facet works in both directions, so it can be used to read and to write UTF-8 files, both text and binary.
To use the facet, you typically construct a locale object. A locale encapsulates culture-specific information as a set of facets that together define a localized environment. Once the locale is built, the stream buffer can be imbued with it.
The following example shows the approach in practice:
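To make the "bridge" role of the facet concrete, here is a minimal sketch that converts in memory, without any file involved. It pairs std::codecvt_utf8 with std::wstring_convert, which is also declared in C++11 (both are deprecated since C++17, so a newer compiler may warn). The helper names utf8ToWide and wideToUtf8 are made up for this illustration.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: decode a UTF-8 byte string into a wide string.
std::wstring utf8ToWide(const std::string& utf8)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    // from_bytes throws std::range_error if the input is not valid UTF-8.
    return converter.from_bytes(utf8);
}

// Hypothetical helper: encode a wide string back into UTF-8 bytes.
std::string wideToUtf8(const std::wstring& wide)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    return converter.to_bytes(wide);
}
```

For instance, passing the two bytes "\xC3\xA9" to utf8ToWide should yield a one-character wide string holding U+00E9, and wideToUtf8 should turn that string back into the same two bytes.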
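<code class="cpp">#include <codecvt>
#include <fstream>
#include <locale>
#include <sstream>
#include <string>

std::wstring readFile(const char* filename)
{
    std::wifstream wif(filename);
    // Swap in a codecvt facet that decodes UTF-8 bytes into wchar_t.
    // std::locale() serves as the base locale here, because
    // std::locale::empty() is not part of standard C++.
    wif.imbue(std::locale(std::locale(), new std::codecvt_utf8<wchar_t>));
    std::wstringstream wss;
    wss << wif.rdbuf(); // copy the whole decoded file into the string stream
    return wss.str();
}</code>
This function opens the given UTF-8 file, reads its contents into a wstring, and returns the result.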
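Since the same facet also handles output, a writing counterpart can be sketched along the same lines. This is only an illustration under a few assumptions: the name writeUtf8File is hypothetical, the locale is imbued before the file is opened, and no byte order mark is written.

```cpp
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

// Hypothetical counterpart to readFile: writes a wstring to disk as UTF-8.
// Returns true if the file was opened and written without stream errors.
bool writeUtf8File(const char* filename, const std::wstring& text)
{
    std::wofstream wof;
    // Imbue first, then open, so the facet is in place before any I/O happens.
    wof.imbue(std::locale(std::locale(), new std::codecvt_utf8<wchar_t>));
    wof.open(filename);
    if (!wof)
        return false;
    wof << text;
    wof.close(); // flushes; sets failbit if the final write fails
    return !wof.fail();
}
```

A call such as writeUtf8File("out.txt", L"text") would then encode each wide character as UTF-8 on its way to disk; the file name here is just a placeholder.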
Another option is to set the global C++ locale before creating any streams. Every later call to the std::locale default constructor then returns a copy of that global locale, so streams constructed afterwards pick up the UTF-8 facet without an explicit imbue call.
<code class="cpp">std::locale::global(std::locale(std::locale(), new std::codecvt_utf8<wchar_t>));</code>
With either approach in place, reading a UTF-8 file into a wstring is a single call:
<code class="cpp">std::wstring wstr = readFile("a.txt");</code>
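Put together, the global-locale variant might look like the sketch below. The function names installUtf8GlobalLocale and readFileUsingGlobalLocale are invented for this example, and it assumes the global locale is installed before the file stream is constructed, since a stream copies the global locale at construction time.

```cpp
#include <codecvt>
#include <fstream>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: installs a global C++ locale whose codecvt facet
// decodes UTF-8. Call it once, before constructing any wide file streams.
void installUtf8GlobalLocale()
{
    std::locale::global(std::locale(std::locale(), new std::codecvt_utf8<wchar_t>));
}

// Hypothetical variant of readFile that relies on the global locale
// instead of imbuing the stream explicitly.
std::wstring readFileUsingGlobalLocale(const char* filename)
{
    std::wifstream wif(filename); // copies the global locale on construction
    std::wstringstream wss;
    wss << wif.rdbuf();
    return wss.str();
}
```

The trade-off is scope: the global locale affects every stream created afterwards in the program, whereas imbuing a single stream keeps the change local to that stream.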
Both techniques give a compact way to load UTF-8 files into wide character strings on Windows using only the C++11 standard library.