search
HomeCommon ProblemHow many bytes does the char type occupy?

The char type occupies 1 byte in C or C and 2 bytes in java. char is used in C or C to define character variables, and the char data type is an integer type and only occupies 1 byte. In Java, the char type occupies 2 bytes because the Java compiler uses Unicode encoding by default, so 2 bytes (16 bits) can represent all characters.

How many bytes does the char type occupy?

The operating environment of this tutorial: Windows 7 system, Dell G3 computer.

I searched on Baidu for "how many bytes does char occupy" and got the following answer:

How many bytes does the char type occupy?

char is used for Character variables defined in C or C are an integer type, occupying only one byte, and the value range is -128 ~ 127 (-27 ~ 27-1).

The char type occupies 1 byte, which is 8 bits. The positive integers that can be stored are 0111 1111, which is 127.

Obviously this is not the result we want, so I continued to search for "how many bytes does char in java occupy"

How many bytes does the char type occupy?

Char in Java is used to store the data type of characters. It occupies 2 bytes and uses unicode encoding. Its first 128 bytes of encoding are compatible with ASCII, but some characters require two chars to represent them.

Why does char occupy the same number of bytes in C or C and java?

What does it mean that some characters require two chars to represent?

Encoding

Before discussing this issue, let us first popularize some knowledge points.

First of all, we all know that the information stored in the computer is represented by binary numbers, so how do we let the computer store the Chinese characters or English that we humans use?

For example, how to convert 'a' into binary and store it in the computer is called encoding;

And analyzing and displaying the binary number stored in the computer is called Decode for .

Character set

Character (Character) is a general term for various characters and symbols, including various national characters, punctuation marks, graphic symbols, Numbers etc. A character set (Character set) is a collection of multiple characters. There are many types of character sets, and each character set contains a different number of characters. Common character set names: ASCII character set, GB2312 character set, BIG5 character set, GB18030 character set , Unicode character set, etc. This is the explanation given by Baidu Encyclopedia. Anyway, a character set is a collection of characters. There are many types of character sets, and the number of characters in a character set is also different. In order for a computer to accurately process text in various character sets, character encoding is required so that the computer can recognize and store various text.

unicode

Its name is Unicode, also called Universal Code. The number of symbols is constantly increasing and has exceeded one million.

Before Unicode was created, there were hundreds of encoding systems. No encoding can contain enough characters. As can be seen from its name, it is an encoding of all symbols. Each symbol is given a unique encoding, so the garbled code problem caused by different encodings will disappear.

Most computers use ASCII (American Standard Code for Information Interchange), which is a 7-bit encoding scheme that represents all uppercase and lowercase letters, numbers, punctuation marks, and control characters. Unicode contains ASCII codes, and '\u0000' to '\u007F' correspond to all 128 ACSII characters.

I can’t help but feel that only those with strength can set standards. Unicode is just a symbol set. It only specifies the binary code of the symbol. It only provides the mapping between characters and numbers, but does not specify how this binary code should be stored. We know that the number of English letters is very small and can be represented by one byte, but the number of Chinese symbols in Unicode is very large, and one byte cannot be used at all. As a result, various implementation methods for unicode character storage appeared later, such as UTF-8, UTF-16, etc. UTF-8 is the most widely used Unicode implementation on the Internet.

Inner code and external code

We often say that char in java occupies several bytes, which should be the char in the internal code in java.

Internal code refers to the encoding method of char and string in memory when Java is running; external code is the character encoding used externally when the program interacts with the outside world, such as serialization technology. Foreign code can be understood as: as long as it is not an internal code, it is a foreign code. It should be noted that the encoding method in the object code file (executable file or class file) generated by source code compilation belongs to foreign code. The internal code in the JVM uses UTF16. The 16 in UTF-16 refers to the minimum unit of 16 bits, that is, two bytes are one unit. In the early days, UTF16 was encoded using a fixed-length 2-byte encoding. Two bytes can represent 65536 symbols (in fact, it can actually represent less than this), which was enough to represent all characters in Unicode at that time. However, with the increase of characters in Unicode, 2 bytes cannot represent all characters. UTF16 uses 2 bytes or 4 bytes to complete the encoding. To deal with this situation, Java uses a pair of char to represent characters that require 4 bytes, taking into account forward compatibility requirements. Therefore, char in Java takes up two bytes, but some characters require two chars to represent them. This explains why some characters require two chars to represent them.

In addition: Java's class file uses UTF8 to store characters, that is to say, the characters in the class occupy 1 to 6 bytes. During Java serialization, characters are also encoded in UTF8, accounting for 1 to 6 characters.

length()

Then here’s another question: What is the String.length() of a character in Java?

After reading the previous knowledge points, you can’t open your mouth anymore and answer 1... Write a demo and have a look: use tiger to test in the Year of the Tiger, tigerUTF represents the corresponding unicode encoding.

         String tiger = "?";
         String tigerUTF = "\uD83D\uDC05";
         System.out.println(tigerUTF);
         System.out.println(tiger.length()); 
         System.out.println(tiger.codePointCount(0,tiger.length()));

How many bytes does the char type occupy?

It can be concluded that the result of calling String.length() is 2, which means that the string char array occupies UTF- 2 code units (i.e. 4 bytes) of 16 format, not how many characters there are. Of course, we can use the codePointCount method to get how many characters we want to get.

For more related knowledge, please visit the FAQ column!

The above is the detailed content of How many bytes does the char type occupy?. For more information, please follow other related articles on the PHP Chinese website!

Statement
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
deepseek web version official entrancedeepseek web version official entranceMar 12, 2025 pm 01:42 PM

The domestic AI dark horse DeepSeek has risen strongly, shocking the global AI industry! This Chinese artificial intelligence company, which has only been established for a year and a half, has won wide praise from global users for its free and open source mockups, DeepSeek-V3 and DeepSeek-R1. DeepSeek-R1 is now fully launched, with performance comparable to the official version of OpenAIo1! You can experience its powerful functions on the web page, APP and API interface. Download method: Supports iOS and Android systems, users can download it through the app store; the web version has also been officially opened! DeepSeek web version official entrance: ht

In-depth search deepseek official website entranceIn-depth search deepseek official website entranceMar 12, 2025 pm 01:33 PM

At the beginning of 2025, domestic AI "deepseek" made a stunning debut! This free and open source AI model has a performance comparable to the official version of OpenAI's o1, and has been fully launched on the web side, APP and API, supporting multi-terminal use of iOS, Android and web versions. In-depth search of deepseek official website and usage guide: official website address: https://www.deepseek.com/Using steps for web version: Click the link above to enter deepseek official website. Click the "Start Conversation" button on the homepage. For the first use, you need to log in with your mobile phone verification code. After logging in, you can enter the dialogue interface. deepseek is powerful, can write code, read file, and create code

How to solve the problem of busy servers for deepseekHow to solve the problem of busy servers for deepseekMar 12, 2025 pm 01:39 PM

DeepSeek: How to deal with the popular AI that is congested with servers? As a hot AI in 2025, DeepSeek is free and open source and has a performance comparable to the official version of OpenAIo1, which shows its popularity. However, high concurrency also brings the problem of server busyness. This article will analyze the reasons and provide coping strategies. DeepSeek web version entrance: https://www.deepseek.com/DeepSeek server busy reason: High concurrent access: DeepSeek's free and powerful features attract a large number of users to use at the same time, resulting in excessive server load. Cyber ​​Attack: It is reported that DeepSeek has an impact on the US financial industry.

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

AI Hentai Generator

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)
2 weeks agoBy尊渡假赌尊渡假赌尊渡假赌
Repo: How To Revive Teammates
4 weeks agoBy尊渡假赌尊渡假赌尊渡假赌
Hello Kitty Island Adventure: How To Get Giant Seeds
4 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SecLists

SecLists

SecLists is the ultimate security tester's companion. It is a collection of various types of lists that are frequently used during security assessments, all in one place. SecLists helps make security testing more efficient and productive by conveniently providing all the lists a security tester might need. List types include usernames, passwords, URLs, fuzzing payloads, sensitive data patterns, web shells, and more. The tester can simply pull this repository onto a new test machine and he will have access to every type of list he needs.

MantisBT

MantisBT

Mantis is an easy-to-deploy web-based defect tracking tool designed to aid in product defect tracking. It requires PHP, MySQL and a web server. Check out our demo and hosting services.

mPDF

mPDF

mPDF is a PHP library that can generate PDF files from UTF-8 encoded HTML. The original author, Ian Back, wrote mPDF to output PDF files "on the fly" from his website and handle different languages. It is slower than original scripts like HTML2FPDF and produces larger files when using Unicode fonts, but supports CSS styles etc. and has a lot of enhancements. Supports almost all languages, including RTL (Arabic and Hebrew) and CJK (Chinese, Japanese and Korean). Supports nested block-level elements (such as P, DIV),

ZendStudio 13.5.1 Mac

ZendStudio 13.5.1 Mac

Powerful PHP integrated development environment