
How to Determine the True Length of a UTF-8 Encoded std::string in C++?

Linda Hamilton | Original: 2024-10-27 20:43:30 | 474 views

Determining the True Length of a UTF-8 Encoded std::string

In C++, a std::string is a sequence of char elements, each occupying one byte of memory. In UTF-8, however, a single character (code point) may be encoded as a sequence of one to four bytes. As a result, str.length() reports the number of bytes, which exceeds the number of characters whenever the string contains non-ASCII text.
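To make the discrepancy concrete, here is a minimal sketch. The helper name sample_text is hypothetical and exists only for illustration; the character é (U+00E9) is written as its two UTF-8 bytes so the result does not depend on how the source file is encoded.

```cpp
#include <string>

// Hypothetical helper for illustration: returns the UTF-8 bytes of "héllo".
// The 'é' (U+00E9) is spelled out as the two bytes 0xC3 0xA9.
std::string sample_text() {
    return std::string("h" "\xC3\xA9" "llo");
}
```

The text has 5 characters, but sample_text().length() returns 6 because é occupies two bytes.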

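In UTF-8, each code point is encoded as a sequence of one to four bytes. The number of bytes depends on the value of the code point, and the high bits of the first byte indicate the length of the sequence:

0x00000000 - 0x0000007F: 1 byte (first byte 0xxxxxxx)
0x00000080 - 0x000007FF: 2 bytes (first byte 110xxxxx)
0x00000800 - 0x0000FFFF: 3 bytes (first byte 1110xxxx)
0x00010000 - 0x0010FFFF: 4 bytes (first byte 11110xxx)

Every byte after the first in a multi-byte sequence is a continuation byte of the form 10xxxxxx. The four-byte bit pattern could represent values up to 0x001FFFFF, but Unicode ends at 0x0010FFFF.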
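The mapping from first byte to sequence length can be sketched as a small function. The name utf8_sequence_length is an assumption made for this example, not a standard library function, and the sketch only inspects bit patterns: it does not reject first bytes that are invalid for other reasons (for example 0xC0 or 0xF5).

```cpp
#include <cstddef>

// Hypothetical helper: returns the sequence length announced by a first byte.
// Returns 0 for a continuation byte (10xxxxxx) or for a byte that matches
// none of the four first-byte patterns (0xF8 to 0xFF).
std::size_t utf8_sequence_length(unsigned char lead) {
    if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII, 1 byte
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx: 2-byte sequence
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx: 3-byte sequence
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx: 4-byte sequence
    return 0;                             // continuation or unknown pattern
}
```

For instance, utf8_sequence_length(0xE2) returns 3, which matches the first byte of the euro sign (U+20AC, encoded as 0xE2 0x82 0xAC).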

To determine the actual length of a UTF-8 encoded std::string, you can employ the following approach:

Iterate through the string byte by byte, reading the current byte through a pointer s with *s.
For each byte, mask its top two bits with the & operator (byte & 0xC0) and check whether the result equals 0x80, the continuation byte pattern (10xxxxxx).

If the byte does not match the continuation pattern, increment the length count, because it is the first byte of a new character sequence.

Here's an example implementation:

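<code class="c++">const char* s = str.c_str();  // str is the UTF-8 encoded std::string
std::size_t len = 0;
// Count every byte that is not a continuation byte (10xxxxxx).
while (*s) len += (*s++ & 0xc0) != 0x80;</code>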
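The same counting rule can also be wrapped in a self-contained function that takes the std::string directly. This is a minimal sketch under the assumption that the input is valid UTF-8; the function name utf8_length is chosen for this example. Iterating over the string itself, rather than a null-terminated pointer, means an embedded '\0' byte does not end the count early.

```cpp
#include <cstddef>
#include <string>

// Counts UTF-8 code points by counting every byte that is not a
// continuation byte (10xxxxxx). Assumes str holds valid UTF-8;
// no validation is performed.
std::size_t utf8_length(const std::string& str) {
    std::size_t count = 0;
    for (unsigned char byte : str) {
        if ((byte & 0xC0) != 0x80) {
            ++count;  // first byte of a new sequence
        }
    }
    return count;
}
```

For example, utf8_length applied to the six bytes of "héllo" (with é encoded as 0xC3 0xA9) returns 5, while length() on the same string returns 6.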

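This approach yields the number of Unicode code points in a UTF-8 encoded std::string, provided the input is valid UTF-8; it performs no validation. A user-perceived character built from several code points, such as a base letter followed by a combining accent, is counted once per code point. Within those limits, the count is useful for character counting, string manipulation, and data parsing.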
