This article explains how to truncate a Java string to a given number of bytes, with example code that implements it for reference.
Java string truncation: a detailed example
Preface:
In Java, the strings "abcd" and "ab你好" have the same length: four characters each.
Their byte counts differ, however: in GBK a Chinese character occupies two bytes, and in UTF-8 a common Chinese character occupies three.
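The difference between character count and byte count can be checked directly. The following is a minimal sketch; the class name `ByteLengthDemo` and the helper `byteLength` are hypothetical names used only for illustration, and it assumes the JDK in use provides the GBK charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ByteLengthDemo {

    // Number of bytes the string occupies when encoded with the given charset.
    static int byteLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // Assumption: the GBK charset is available in this JDK.
        Charset gbk = Charset.forName("GBK");
        for (String s : new String[] {"abcd", "ab你好"}) {
            System.out.println(s
                    + ": length=" + s.length()
                    + ", GBK bytes=" + byteLength(s, gbk)
                    + ", UTF-8 bytes=" + byteLength(s, StandardCharsets.UTF_8));
        }
    }
}
```

Both strings report a length of 4, while "ab你好" takes 6 bytes in GBK and 8 bytes in UTF-8.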
Define a method that returns a substring limited to a specified number of bytes, without splitting a character.
For example, take "ab你好" in GBK: the first three bytes cover "ab" plus half of "你", so that half must be discarded and the result is "ab".
Four bytes give "ab你", and five bytes still give "ab你".
Only the GBK and UTF-8 encodings are considered.
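The expected results in the example can be reproduced with a short sketch. It uses a plainly different technique from counting bytes backwards: it appends whole characters for as long as their encoded size still fits in the byte budget. The class name `ByteCutSketch` and the method `cutByBytes` are hypothetical, and the GBK charset is assumed to be available.

```java
import java.nio.charset.Charset;

public class ByteCutSketch {

    // Keep whole characters while their encoded size still fits in maxBytes.
    static String cutByBytes(String s, int maxBytes, Charset charset) {
        StringBuilder out = new StringBuilder();
        int used = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            String ch = new String(Character.toChars(cp));
            int size = ch.getBytes(charset).length;
            if (used + size > maxBytes) {
                break; // the next character would be split, so stop here
            }
            out.append(ch);
            used += size;
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumption: the GBK charset is available in this JDK.
        Charset gbk = Charset.forName("GBK");
        for (int n = 3; n <= 5; n++) {
            System.out.println(n + " bytes: " + cutByBytes("ab你好", n, gbk));
        }
        // 3 bytes: ab
        // 4 bytes: ab你
        // 5 bytes: ab你
    }
}
```

Encoding one character at a time is slower than inspecting the byte array, but it does not depend on how a particular encoding lays out its bytes.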
Example code:
import java.io.UnsupportedEncodingException;

import org.junit.Test;

/**
 * @author 胡龙华 (953801304@qq.com)
 * @version 2017-4-4 1:08:45 PM
 * @fileName StringCut.java
 */
public class StringCut {

    /** Prints the raw bytes of a mixed string in both encodings to reveal the pattern. */
    @Test
    public void analyze() {
        String str1 = "你好abc";
        try {
            byte[] bs1 = str1.getBytes("GBK");
            System.out.println("---GBK---");
            for (byte b : bs1) {
                System.out.print(b + " ");
            }
            System.out.println();
            // -60 -29 -70 -61 97 98 99
            // Pattern: in GBK each of these Chinese characters is stored as two negative bytes.

            byte[] bs2 = str1.getBytes("utf-8");
            System.out.println("---utf-8---");
            for (byte b : bs2) {
                System.out.print(b + " ");
            }
            System.out.println();
            // -28 -67 -96 -27 -91 -67 97 98 99
            // Pattern: in UTF-8 each of these Chinese characters is stored as three negative bytes.
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
    }

    /**
     * Idea: count backwards from byte len. If the run of consecutive negative
     * bytes has even length, keep all len bytes; if odd, drop the last byte.
     * Caveat: this relies on both bytes of a character being negative, which holds
     * for the GB2312 range but not for GBK characters whose second byte is 0x40-0x7E.
     *
     * @param str the string to cut
     * @param len the maximum number of bytes to keep (0 to the encoded length)
     * @return the cut string, or null if GBK is not supported
     */
    private static String stringCutByGBK(String str, int len) {
        try {
            int count = 0;
            byte[] bs = str.getBytes("GBK");
            for (int i = len - 1; i >= 0; i--) {
                if (bs[i] < 0) {
                    count++;
                } else {
                    break;
                }
            }
            // GBK bytes of "你好啊abc达哥":
            //   0   1   2   3   4   5  6  7  8   9  10  11  12
            // -60 -29 -70 -61 -80 -95 97 98 99 -76 -17 -72 -25
            int end = (count % 2 == 0) ? len : len - 1;
            String s = new String(bs, 0, end, "GBK");
            System.out.println("First " + len + " bytes: " + s);
            return s;
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
            return null;
        }
    }

    /**
     * Idea: count backwards from byte len. If the run of consecutive negative
     * bytes is a multiple of 3, keep all len bytes; otherwise drop the last
     * count % 3 bytes.
     * Caveat: this assumes every non-ASCII character takes exactly three bytes,
     * so it does not handle two-byte or four-byte UTF-8 sequences.
     *
     * @param str the string to cut
     * @param len the maximum number of bytes to keep (0 to the encoded length)
     * @return the cut string, or null if UTF-8 is not supported
     */
    private static String stringCutByUTF8(String str, int len) {
        try {
            int count = 0;
            byte[] bs = str.getBytes("UTF-8");
            for (int i = len - 1; i >= 0; i--) {
                if (bs[i] < 0) {
                    count++;
                } else {
                    break;
                }
            }
            int end = len - count % 3;
            String s = new String(bs, 0, end, "UTF-8");
            System.out.println("First " + len + " bytes: " + s);
            return s;
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
            return null;
        }
    }

    /** Cuts the sample string at every possible byte count in both encodings. */
    @Test
    public void test() {
        String str = "你好啊abc达哥";
        try {
            System.out.println("---GBK test---");
            byte[] bs = str.getBytes("GBK");
            for (int i = 0; i <= bs.length; i++) {
                stringCutByGBK(str, i);
            }
            System.out.println("---UTF-8 test---");
            byte[] bs2 = str.getBytes("utf-8");
            for (int i = 0; i <= bs2.length; i++) {
                stringCutByUTF8(str, i);
            }
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
    }
}
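The UTF-8 method above counts negative bytes in groups of three, which fits common Chinese characters but not two-byte or four-byte sequences. As a hedged alternative sketch (not the author's method), the cut point can be found from the UTF-8 byte layout itself: continuation bytes always have the bit pattern 10xxxxxx, so if the first dropped byte is a continuation byte, the cut falls inside a character and must move back to that character's lead byte. The class name `Utf8BoundaryCut` and the method `cutUtf8` are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class Utf8BoundaryCut {

    // Returns the longest prefix of s whose UTF-8 encoding fits in maxBytes
    // without splitting a multi-byte character.
    static String cutUtf8(String s, int maxBytes) {
        byte[] bs = s.getBytes(StandardCharsets.UTF_8);
        if (maxBytes >= bs.length) {
            return s; // everything fits
        }
        int end = Math.max(maxBytes, 0);
        // bs[end] is the first byte that would be dropped. If it is a
        // continuation byte (10xxxxxx), the cut is inside a character, so
        // back up until end points at that character's lead byte.
        while (end > 0 && (bs[end] & 0xC0) == 0x80) {
            end--;
        }
        return new String(bs, 0, end, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String str = "你好abc";
        for (int n = 0; n <= 9; n++) {
            System.out.println(n + " bytes: " + cutUtf8(str, n));
        }
    }
}
```

Using `StandardCharsets.UTF_8` also avoids the checked `UnsupportedEncodingException`, so no try/catch is needed in this sketch.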