How many bytes does a string occupy in java?

王林
2019-11-22 15:39:33

First, char is a primitive type in Java, and the size of every primitive type is fixed: for example, int occupies 4 bytes and double occupies 8 bytes. Because these sizes are the same on every platform, Java code stays portable. A char therefore always occupies 2 bytes. (Note: a single char can also hold a Chinese character, as long as the character lies in the Basic Multilingual Plane, which covers the commonly used ones.)
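The fixed sizes described above can be checked with a minimal sketch like the following. The class name PrimitiveSizes and its two helper methods are made up for this illustration; the BYTES constants are part of the standard wrapper classes since Java 8. The sketch also shows the limit of the note about Chinese characters: a rare character outside the Basic Multilingual Plane (here U+20000) needs two chars.

```java
final class PrimitiveSizes {

    // Number of char values (UTF-16 code units) used to hold the text.
    static int charCount(String text) {
        return text.length();
    }

    // Number of Unicode code points (whole characters) in the text.
    static int codePointCount(String text) {
        return text.codePointCount(0, text.length());
    }

    public static void main(String[] args) {
        // The BYTES constants report the fixed size of each primitive type.
        System.out.println("char:   " + Character.BYTES + " bytes");
        System.out.println("int:    " + Integer.BYTES + " bytes");
        System.out.println("double: " + Double.BYTES + " bytes");

        // A commonly used Chinese character fits in a single char.
        String common = "中";
        System.out.println("'中' uses " + charCount(common) + " char(s)");

        // U+20000 lies outside the Basic Multilingual Plane, so it is
        // stored as a surrogate pair: two chars for one code point.
        String rare = "\uD840\uDC00";
        System.out.println("U+20000 uses " + charCount(rare) + " char(s) for "
                + codePointCount(rare) + " code point(s)");
    }
}
```

Running it should report 2, 4 and 8 bytes for char, int and double, one char for '中', and two chars for the single code point U+20000.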

Second, the number of bytes a String occupies is more flexible, because it depends on the character encoding used to convert the string to bytes. An English (ASCII) character occupies 1 byte in both UTF-8 and GBK, while a Chinese character occupies a different number of bytes depending on the encoding: 3 bytes under UTF-8 (for commonly used characters) and 2 bytes under GBK.

The test code is as follows:

import java.io.UnsupportedEncodingException;

public class StrTest {

    public static void main(String[] args) throws UnsupportedEncodingException {
        String str1 = "hello";
        String str2 = "你好abc";

        // ASCII-only string: one byte per character in both encodings
        System.out.println("Bytes occupied by 'hello' in UTF-8: " + str1.getBytes("utf-8").length);
        System.out.println("Bytes occupied by 'hello' in GBK: " + str1.getBytes("gbk").length);

        // Two Chinese characters plus three ASCII characters
        System.out.println("Bytes occupied by '你好abc' in UTF-8: " + str2.getBytes("utf-8").length);
        System.out.println("Bytes occupied by '你好abc' in GBK: " + str2.getBytes("gbk").length);
    }

}

Output result:

Bytes occupied by 'hello' in UTF-8: 5
Bytes occupied by 'hello' in GBK: 5
Bytes occupied by '你好abc' in UTF-8: 9
Bytes occupied by '你好abc' in GBK: 7

As the output shows, when a String is encoded, an English character occupies 1 byte, while a Chinese character occupies 2 bytes (GBK) or 3 bytes (UTF-8). The same method can be used to check other encodings.
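As a sketch of how the same check might look for other encodings, the example below uses the constants in java.nio.charset.StandardCharsets, which avoids the checked UnsupportedEncodingException. The class name EncodingSizes and the helper byteLength are hypothetical names chosen for this illustration, and the selection of charsets is arbitrary.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingSizes {

    // Encodes the text with the given charset and returns the byte count.
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String text = "你好abc";
        Charset[] charsets = {
            StandardCharsets.UTF_8,
            StandardCharsets.UTF_16,     // big-endian with a 2-byte byte order mark
            StandardCharsets.UTF_16LE,   // little-endian, no byte order mark
            StandardCharsets.ISO_8859_1  // cannot represent Chinese characters
        };

        for (Charset charset : charsets) {
            System.out.println(charset.name() + ": " + byteLength(text, charset) + " bytes");
        }
    }
}
```

For '你好abc' this should give 9 bytes in UTF-8, 12 in UTF-16 (10 plus the byte order mark), 10 in UTF-16LE, and 5 in ISO-8859-1. The last figure is misleading: ISO-8859-1 cannot represent Chinese characters, so each one is replaced by a single substitute byte and the original text cannot be recovered.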

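Finally, this property of String can be used to check whether a string contains Chinese characters: if the number of characters differs from the number of bytes, the string contains non-ASCII characters, and a regular expression can then pick out the Chinese ones. The example is as follows: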

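import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StrTest {

    public static void main(String[] args) {
        searchChineseCharacter("Good morning");
        searchChineseCharacter("hello 早上好");
    }

    // Find the Chinese characters in a string
    public static void searchChineseCharacter(String str) {
        // Regular expression that matches Chinese characters
        String regex = "[\u4e00-\u9fa5]";

        // If the character count and the UTF-8 byte count differ, the string
        // contains non-ASCII characters, which may be Chinese
        if (str.length() != str.getBytes(StandardCharsets.UTF_8).length) {
            Pattern pattern = Pattern.compile(regex);
            Matcher matcher = pattern.matcher(str);
            System.out.print("Chinese characters in '" + str + "': ");

            while (matcher.find()) {
                System.out.print(matcher.group());
            }
            System.out.println();
        } else {
            System.out.println("'" + str + "' contains no Chinese characters");
        }
    }

}

Output result:

'Good morning' contains no Chinese characters
Chinese characters in 'hello 早上好': 早上好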
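Note that the length comparison only signals that some non-ASCII character is present; a string such as "café" also has more UTF-8 bytes than characters. As an alternative sketch, not the approach used in the article, the characters can be classified directly with Character.UnicodeScript (available since Java 7). The class name HanScriptDemo and its methods are hypothetical names for this illustration.

```java
final class HanScriptDemo {

    // Returns only the characters of the text that belong to the Han script.
    static String extractHan(String text) {
        StringBuilder han = new StringBuilder();
        text.codePoints()
            .filter(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN)
            .forEach(han::appendCodePoint);
        return han.toString();
    }

    // True if the text contains at least one Han character.
    static boolean containsHan(String text) {
        return !extractHan(text).isEmpty();
    }

    public static void main(String[] args) {
        String[] samples = {"Good morning", "hello 早上好", "café"};
        for (String sample : samples) {
            if (containsHan(sample)) {
                System.out.println("Chinese characters in '" + sample + "': " + extractHan(sample));
            } else {
                System.out.println("'" + sample + "' contains no Chinese characters");
            }
        }
    }
}
```

Working on code points rather than a fixed regex range means this version also recognizes Han characters outside U+4E00 to U+9FA5, including those stored as surrogate pairs.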
