Garbled characters when converting byte[] to String in Java
First, the straightforward way to turn a String into a byte[] and back again is very simple:
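```java
public class ByteToStringDemo { // class name added so the snippet compiles; the original showed only main()
    public static void main(String[] args) {
        String str = "我是中国人"; // "I am Chinese"
        // getBytes() with no argument encodes using the platform's default charset
        byte[] arr = str.getBytes();
        // Printing an array directly shows its type and hash code, not its contents ("打印" means "Print")
        System.out.println("打印:" + arr);
        for (byte e : arr) {
            System.out.print(e + " ");
        }
        // new String(byte[]) also decodes using the platform's default charset
        String str2 = new String(arr);
        System.out.println("\n打印2:" + str2);
    }
}
```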
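On the author's machine, whose default charset was evidently GBK (the ten numbers below are exactly the GBK bytes of the five characters), the output was as follows. The [B@15db9742 part is the array's type name ([B means byte array) followed by its hash code, not its contents, and it differs from run to run: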
```
打印:[B@15db9742
-50 -46 -54 -57 -42 -48 -71 -6 -56 -53
打印2:我是中国人
```
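This is character encoding at work. A byte holds 8 bits, and in GBK each of these Chinese characters takes two bytes, so the five characters need ten bytes. Java's byte type is signed, which is why the values print as negative numbers. Turning those numbers back into characters requires knowing which encoding standard produced them. Under UTF-8 (the default charset since Java 18), the same characters take three bytes each, so the array would hold 15 bytes instead.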
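To see that the byte count depends on the charset rather than on the characters themselves, here is a minimal sketch that encodes the same five characters with an explicit charset instead of the platform default. It assumes the runtime provides the GBK charset (standard JDK builds do, but GBK is not one of the guaranteed StandardCharsets), and the class and method names are made up for illustration. The string is written with Unicode escapes so that the encoding of the source file itself cannot affect the result.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class EncodingLengths {
    // Encodes the text with an explicit charset instead of the platform default.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        // The five characters from the example, written as Unicode escapes.
        String str = "\u6211\u662F\u4E2D\u56FD\u4EBA";
        // Assumption: the runtime ships the GBK charset (standard JDK builds include it).
        byte[] gbk = encode(str, Charset.forName("GBK"));
        byte[] utf8 = encode(str, StandardCharsets.UTF_8);

        System.out.println("chars: " + str.length()); // 5
        // 10 bytes: [-50, -46, -54, -57, -42, -48, -71, -6, -56, -53]
        System.out.println("GBK:   " + gbk.length + " bytes " + Arrays.toString(gbk));
        // 15 bytes: three per character
        System.out.println("UTF-8: " + utf8.length + " bytes " + Arrays.toString(utf8));
    }
}
```

Passing a Charset explicitly, rather than calling getBytes() with no argument, makes the result identical on every machine.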
So how does Java handle character encoding?
Java's String class does not ask you to specify an encoding table. How, then, does it know which character each number stands for? Because a String stores its characters as Unicode, logically as a sequence of UTF-16 code units. To represent a single character, Java also provides the char type: a fixed-size, unsigned 16-bit (two-byte) value ranging from 0 to 65535, intended to hold one Unicode character. Characters outside the Basic Multilingual Plane (code points above U+FFFF, such as many emoji) do not fit in one char and are stored as a pair of chars called a surrogate pair.
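The size and range of char, and the case where one character needs two chars, can be checked directly. A small sketch, with hypothetical class and helper names:

```java
class CharBasics {
    // Number of chars (UTF-16 code units) needed to store one Unicode code point.
    static int charsNeeded(int codePoint) {
        return new String(Character.toChars(codePoint)).length();
    }

    public static void main(String[] args) {
        System.out.println(Character.SIZE);            // 16 (bits per char)
        System.out.println((int) Character.MAX_VALUE); // 65535
        System.out.println(charsNeeded(0x4E2D));       // 1: U+4E2D is in the Basic Multilingual Plane
        System.out.println(charsNeeded(0x1F600));      // 2: U+1F600 (an emoji) needs a surrogate pair
    }
}
```

The JDK's Character.charCount(int) returns the same 1-or-2 answer without building a String.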
To get the Unicode numbers behind a String, you can use getChars(int srcBegin, int srcEnd, char[] dst, int dstBegin), which copies the String's characters into a char[] that you supply (toCharArray() does the same in one call). Each element of that array is the character's number in the Unicode (UTF-16) encoding.
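A minimal sketch of getChars in use, printing each char of the example string as a Unicode number in the usual U+XXXX notation. The class and method names are illustrative only.

```java
class UnicodeNumbers {
    // Copies the String's chars into an array with getChars, then formats each as U+XXXX.
    static String[] codeUnits(String s) {
        char[] dst = new char[s.length()];
        s.getChars(0, s.length(), dst, 0); // srcBegin, srcEnd (exclusive), dst, dstBegin
        String[] out = new String[dst.length];
        for (int i = 0; i < dst.length; i++) {
            out[i] = String.format("U+%04X", (int) dst[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        String str = "\u6211\u662F\u4E2D\u56FD\u4EBA"; // the five characters from the example
        System.out.println(String.join(" ", codeUnits(str)));
        // U+6211 U+662F U+4E2D U+56FD U+4EBA
    }
}
```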
Why does converting byte[] to String produce garbled characters?
Because, as mentioned above, the encoding standards differ. For example, in the GB2312 standard the Chinese character 当 ("dang") is represented by the two bytes 0xB5 and 0xB1. A system with a Western default encoding (such as ISO-8859-1 or Windows-1252) has no GB2312 table, so it treats 0xB5 and 0xB1 as two separate single-byte characters. Java, in turn, decodes bytes into its internal Unicode form using the charset you pass to new String(...), or the platform default if you pass none. If that charset differs from the one that produced the bytes, the wrong characters come out, and that is exactly what garbled text is.
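The mismatch can be reproduced in a few lines. This sketch assumes the runtime provides the GB2312 charset (standard JDK builds do, but it is not one of the guaranteed StandardCharsets) and uses ISO-8859-1 to stand in for an English system's single-byte encoding; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MismatchDemo {
    // Assumption: the runtime ships the GB2312 charset (standard JDK builds include it).
    static final Charset GB2312 = Charset.forName("GB2312");

    // Decodes the same bytes with whatever charset the caller chooses.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        byte[] bytes = "\u5F53".getBytes(GB2312); // the character "dang", encoded as GB2312
        System.out.printf("%02X %02X%n", bytes[0] & 0xFF, bytes[1] & 0xFF); // B5 B1

        // Mismatch: ISO-8859-1 reads each byte as its own character (micro sign, plus-minus sign).
        String garbled = decode(bytes, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.equals("\u00B5\u00B1")); // true

        // Match: decoding with the charset that produced the bytes restores the character.
        String correct = decode(bytes, GB2312);
        System.out.println(correct.equals("\u5F53")); // true
    }
}
```

The bytes never change; only the charset used to interpret them does. Decoding with the same charset that produced the bytes is what keeps text from becoming garbled.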
So how do we avoid garbled text when converting byte[] to String?
It depends on where the byte[] comes from. A common case is an image that is converted into a byte[], then into a String so it can be transmitted elsewhere; the receiver converts the String back into a byte[] and then back into an image.
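1. If the raw byte[] is turned into a String directly, data can be lost. In many charsets, such as UTF-8 or GBK, not every byte sequence is valid, so some byte combinations cannot be mapped to chars and are replaced (with U+FFFD under UTF-8, for instance). Converting the String back then does not reproduce the original bytes, and the longer the binary data, the more likely it contains such a sequence.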
2. Use the standard Base64 encoding. Base64 splits the binary data into 6-bit groups and maps each group to one of 64 printable ASCII characters (A-Z, a-z, 0-9, + and /, hence the name), so every 3 bytes become 4 characters and any byte sequence survives the trip through a String. Ready-made utility classes exist; for example, using Apache Commons Codec:
```java
import org.apache.commons.codec.binary.Base64;

public class UtilHelper {
    // Base64 string to byte[]
    public static byte[] base64String2ByteFun(String base64Str) {
        return Base64.decodeBase64(base64Str);
    }

    // byte[] to Base64 string
    public static String byte2Base64StringFun(byte[] b) {
        return Base64.encodeBase64String(b);
    }
}
```
This way, converting between byte[] and String is lossless: the receiver gets back exactly the bytes that were sent.
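To make the difference between the two approaches concrete, here is a sketch that round-trips a few arbitrary binary bytes through a String both ways. It swaps in java.util.Base64, which has shipped with the JDK since Java 8, instead of Apache Commons Codec so that it runs without external dependencies; the class and method names and the sample bytes are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

class RoundTripDemo {
    // Lossy: treats arbitrary binary data as UTF-8 text and back.
    static byte[] viaRawString(byte[] data) {
        String s = new String(data, StandardCharsets.UTF_8); // invalid sequences become U+FFFD
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Lossless: Base64 maps every 3 bytes to 4 ASCII characters.
    static byte[] viaBase64(byte[] data) {
        String s = Base64.getEncoder().encodeToString(data);
        return Base64.getDecoder().decode(s);
    }

    public static void main(String[] args) {
        // Hypothetical payload: bytes resembling the start of a PNG header, plus a few others.
        byte[] data = {(byte) 0x89, 0x50, 0x4E, 0x47, (byte) 0xFF, 0x00, (byte) 0xB5};

        System.out.println(Arrays.equals(data, viaRawString(data))); // false
        System.out.println(Arrays.equals(data, viaBase64(data)));    // true

        // 6-bit grouping: 3 input bytes become 4 output characters.
        System.out.println(Base64.getEncoder().encodeToString("Man".getBytes(StandardCharsets.US_ASCII))); // TWFu
    }
}
```

The raw round trip fails because 0x89, 0xFF and 0xB5 are not valid UTF-8 on their own, so each is replaced by U+FFFD during decoding and the re-encoded bytes no longer match.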
The above is the detailed content of Garbled characters appear when converting byte[] to String in java. For more information, please follow other related articles on the PHP Chinese website!