What to do if the bufferedinputstream is garbled?-Common Problem-php.cn

Home

Common Problem

What to do if the bufferedinputstream is garbled?

藏色散人

Mar 22, 2023 am 11:22 AM

Garbled characters

bufferedinputstream乱码是因为BufferedInputStream读取的是字节byte，那么如果读取的数据比较长，并且没有一次性读完，就会出现乱码，其解决乱码问题的办法就是用BufferedReader来读取，其读取代码如“BufferedReader reader = new BufferedReader (...)”。

What to do if the bufferedinputstream is garbled?

本教程操作环境：Windows10系统、Java8.0、Dell G3电脑。

bufferedinputstream乱码怎么办？

BufferedInputStream和BufferedOutputStream用法解决乱码

昨晚写了一个把所有的简体汉字转换成繁体并且取出拼音的程序，在IO流操作中遇到了中文乱码问题。

下面是我写的程序

package com.java.utils.charactor;
 
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
 
/**
 * 简繁体转换
 *
 * @author pengjianbo <pengjianbosoft@gmail.com>
 * $Id$
 */
public class SimTradConvert {
 
    public SimTradConvert() throws Exception {
 
        File simplFile = new File(
                "D:\\android\\JavaUtils\\src\\com\\java\\utils\\charactor\\simplified.txt");
        FileInputStream simplFis = new FileInputStream(simplFile);
        BufferedInputStream simplBis = new BufferedInputStream(simplFis);
        BufferedReader simplBr = new BufferedReader(new InputStreamReader(simplBis));
        StringBuffer simplsb = new StringBuffer();
 
        byte[] simplb = new byte[1024];
        while ((simplBis.read(simplb)) != -1) {
            simplsb.append(new String(simplb));
        }
        
        simplFis.close();
        simplBis.close();
        
        
        File tradFile = new File(
                "D:\\android\\JavaUtils\\src\\com\\java\\utils\\charactor\\traditional.txt");
        FileInputStream tradFis = new FileInputStream(tradFile);
        BufferedInputStream tradBis = new BufferedInputStream(tradFis);
        StringBuffer tradsb = new StringBuffer();
 
        byte[] tradb = new byte[1024];
        while ((tradBis.read(tradb)) != -1) {
            tradsb.append(new String(tradb));
        }
        
        tradBis.close();
        tradFis.close();
        
        System.out.println(simplsb.toString());
        /*CnGetPinyin pinyin = new CnGetPinyin();
        //连接SQLite的JDBC
        Class.forName("org.sqlite.JDBC");
        Connection conn = DriverManager.getConnection("jdbc:sqlite:pai.db");
        Statement stat = conn.createStatement();
        for(int i = 0; i < simplsb.length() -1; i++ ) {
            
            stat.executeUpdate( "insert into CNLang(pinyin,simp,trad) values(&#39;" + pinyin.getPinyin(simplsb.substring(i, i + 1)) + "&#39;,&#39;"
                                + simplsb.substring(i, i + 1) + "&#39;,&#39;" + tradsb.substring(i, i + 1) + "&#39;)");
            System.out.println("正在添加:" + simplsb.substring(i, i + 1) + "-->"  + tradsb.substring(i, i + 1));
            if( i > simplsb.length() -1 ) {
                stat.close();
                conn.close();
            }
        }*/
        
    }
 
    public static void main(String[] args) throws Exception {
        new SimTradConvert();
    }
 
}

在我的这个程序中，用BufferedInputStream，而且用了read(byte[]),就出了读取出来现在部分的中文乱码，我想是我这个byte[] tradb = new byte[1024];缓冲大小设置的问题，试图去更改byte[]的在小，结果出现乱码的地方和原先的不一样了。也就说明了，在缓冲的末尾的时候出了问题，末尾的那个字节容纳不了一个汉字，所以出现的乱码。我想如果用read()去读取的话应该不会出现这个问题的（没试过）。像我的这种读取大量的中文数据我想我宁愿用read去读，大不了就开一个线程嘛。

下面是我看到网上别人写的博客：后来在网上找一下资料，转载如下：

BufferedInputStream和BufferedOutputStream是过滤流,需要使用已存在的节点来构造,即必须先有InputStream或OutputStream,相对直接读写,这两个流提供带缓存的读写,提高了系统读写效率性能.BufferedInputStream读取的是字节byte,因为一个汉字占两个字节,而当中英文混合的时候,有的字符占一个字节,有的字符占两个字节,所以如果直接读字节,而数据比较长,没有一次读完的时候,很可能刚好读到一个汉字的前一个字节,这样,这个中文就成了乱码,后面的数据因为没有字节对齐,也都成了乱码.所以我们需要用BufferedReader来读取,它读到的是字符,所以不会读到半个字符的情况,不会出现乱码.

package com.pocketdigi;
 
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
 
public class Main {
 
    public static void main(String[] args) throws IOException {
        File f = new File("d:/a.txt");
        FileOutputStream fos = new FileOutputStream(f);
        // 构建FileOutputStream对象,文件不存在会自动新建
        BufferedOutputStream bos = new BufferedOutputStream(fos);
        bos.write("1我是中文".getBytes());
        bos.close();
        // 关闭输出流,写入数据,如果下面还要写用flush();
        // 因为是BufferOutputStream链接到FileOutputStream,只需关闭尾端的流
        // 所以不需要关闭FileOutputStream;
        FileInputStream fis = new FileInputStream(f);
        BufferedInputStream bis = new BufferedInputStream(fis);
        BufferedReader reader = new BufferedReader (new InputStreamReader(bis));
        //之所以用BufferedReader,而不是直接用BufferedInputStream读取,是因为BufferedInputStream是InputStream的间接子类,
        //InputStream的read方法读取的是一个byte,而一个中文占两个byte,所以可能会出现读到半个汉字的情况,就是乱码.
        //BufferedReader继承自Reader,该类的read方法读取的是char,所以无论如何不会出现读个半个汉字的.
        StringBuffer result = new StringBuffer();
        while (reader.ready()) {
            result.append((char)reader.read());
        }
        System.out.println(result.toString());
        reader.close();
 
 
    }
 
}

推荐学习：《Java视频教程》

The above is the detailed content of What to do if the bufferedinputstream is garbled?. For more information, please follow other related articles on the PHP Chinese website!

Statement

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn