ByteBuffer.asCharBuffer prints garbled text

Z时代
2024-01-10
Category: General

Background

The code below prints garbled text. The cause is that getBytes() is called without specifying a character encoding:

ByteBuffer byteBuffer = ByteBuffer.wrap("开源中国".getBytes());
CharBuffer buffer = byteBuffer.asCharBuffer();
System.out.println(buffer.toString());

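Specifying a UTF-16 charset gives the expected output. UTF_16BE is used here rather than UTF_16, because the UTF_16 encoder prepends a byte order mark (FE FF) that would appear in the CharBuffer as an extra, invisible U+FEFF character:

ByteBuffer byteBuffer = ByteBuffer.wrap("开源中国".getBytes(StandardCharsets.UTF_16BE));
CharBuffer buffer = byteBuffer.asCharBuffer();
System.out.println(buffer.toString());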
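To see how the UTF-16 variants behave when viewed through asCharBuffer, here is a minimal sketch. The class name Utf16Variants and the helper view are made up for illustration, and the string is written with Unicode escapes only so the result does not depend on the source file's encoding.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf16Variants {
    // Hypothetical helper: encode s with the given charset, then view the
    // bytes as chars using the given byte order.
    static String view(String s, Charset charset, ByteOrder order) {
        return ByteBuffer.wrap(s.getBytes(charset)).order(order).asCharBuffer().toString();
    }

    public static void main(String[] args) {
        String s = "\u5f00\u6e90\u4e2d\u56fd"; // the four characters from the article

        // UTF_16 writes a big-endian BOM first, so the view has one extra char.
        String withBom = view(s, StandardCharsets.UTF_16, ByteOrder.BIG_ENDIAN);
        System.out.println(withBom.length());                 // 5
        System.out.println((int) withBom.charAt(0) == 0xFEFF); // true

        // UTF_16BE matches the default (big-endian) byte order of a ByteBuffer.
        System.out.println(view(s, StandardCharsets.UTF_16BE, ByteOrder.BIG_ENDIAN).equals(s)); // true

        // UTF_16LE only round-trips if the buffer is switched to little-endian.
        System.out.println(view(s, StandardCharsets.UTF_16LE, ByteOrder.LITTLE_ENDIAN).equals(s)); // true
    }
}
```

The point is that the charset passed to getBytes and the byte order of the buffer have to agree; UTF_16BE with the default buffer order is the simplest matching pair.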

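Cause

The encoding used to produce the bytes does not match the encoding used to interpret them.

getBytes() without an argument encodes the string into a byte array using the default charset, which is UTF-8 here (the source code confirms this).

byteBuffer.asCharBuffer() does not decode anything. It creates a view that treats every two bytes as one char, that is, one UTF-16 code unit, using the buffer's byte order (big-endian by default). Java represents char and String internally as UTF-16, so the UTF-8 bytes end up being read as if they were UTF-16.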
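The byte pairing can be reproduced by hand. The sketch below is only an illustration (the class AsCharBufferView and its methods are hypothetical names); it combines the UTF-8 bytes two at a time in big-endian order and compares the result with what asCharBuffer returns.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class AsCharBufferView {
    // Combine bytes pairwise, high byte first, the way a big-endian
    // char view over a ByteBuffer reads them.
    static String pairBytes(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i + 1 < bytes.length; i += 2) {
            char c = (char) (((bytes[i] & 0xFF) << 8) | (bytes[i + 1] & 0xFF));
            sb.append(c);
        }
        return sb.toString();
    }

    static String viaAsCharBuffer(byte[] bytes) {
        return ByteBuffer.wrap(bytes).asCharBuffer().toString();
    }

    public static void main(String[] args) {
        // Unicode escapes for the article's string, to avoid depending on source encoding.
        byte[] utf8 = "\u5f00\u6e90\u4e2d\u56fd".getBytes(StandardCharsets.UTF_8);

        String view = viaAsCharBuffer(utf8);
        System.out.println(pairBytes(utf8).equals(view)); // true
        System.out.println(view.length());                // 6: 12 bytes become 6 chars

        // Print the code units that the garbled output consists of.
        for (char c : view.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

The first character, for example, is U+E5BC: the first UTF-8 byte pair E5 BC of the three-byte sequence E5 BC 80 that encodes the first character of the string.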

The following code confirms this:

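ByteBuffer byteBuffer = ByteBuffer.wrap("开源中国".getBytes());
System.out.println(byteBuffer.limit()); // 12: each character takes 3 bytes in UTF-8
// Decode the ByteBuffer as UTF-8 into a (UTF-16) CharBuffer, then print it via toString
System.out.println(StandardCharsets.UTF_8.decode(byteBuffer).toString()); // prints correctly
byteBuffer.rewind(); // decode() consumed the buffer, so reset the position to 0
CharBuffer buffer = byteBuffer.asCharBuffer();
System.out.println(buffer.toString()); // garbled: the 12 bytes are read as 6 UTF-16 code units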
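When the bytes really are text in a known encoding, decoding them with that charset is usually a safer choice than asCharBuffer. The following is a rough sketch under that assumption; DecodeBytes, decodeStrict and isValid are hypothetical names. It uses a CharsetDecoder configured to report errors, so a charset mismatch surfaces as an exception instead of silently producing garbage.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class DecodeBytes {
    // Decode strictly: malformed or unmappable input throws instead of being replaced.
    static String decodeStrict(byte[] bytes, Charset charset) throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    // Convenience wrapper: true if the bytes are well-formed in the given charset.
    static boolean isValid(byte[] bytes, Charset charset) {
        try {
            decodeStrict(bytes, charset);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws CharacterCodingException {
        String s = "\u5f00\u6e90\u4e2d\u56fd"; // the article's string, as Unicode escapes
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        byte[] utf16be = s.getBytes(StandardCharsets.UTF_16BE);

        System.out.println(decodeStrict(utf8, StandardCharsets.UTF_8).equals(s)); // true
        System.out.println(isValid(utf16be, StandardCharsets.UTF_8));             // false
    }
}
```

Strict decoding cannot catch every mismatch, since some byte sequences happen to be well-formed in more than one encoding, but it does turn many silent errors into visible ones.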

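Summary

When Java prints a CharBuffer or a char array, it treats the contents as UTF-16 code units, derives the code points from them, and maps those to characters.

String.getBytes(): with no argument, encodes the string using the default charset. That is UTF-8 here, but it can differ on Windows; JDK 18 and later default to UTF-8 on all platforms (JEP 400). Passing the charset explicitly avoids the dependency.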
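As a quick check of how much the chosen charset matters, this small sketch prints the default charset of the running JVM and the encoded length of the same string under a few standard charsets. ExplicitCharset and encodedLength are illustrative names, not part of any API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharset {
    // Number of bytes the string occupies in the given charset.
    static int encodedLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String s = "\u5f00\u6e90\u4e2d\u56fd"; // the article's string, as Unicode escapes

        // What getBytes() with no argument would use on this JVM.
        System.out.println(Charset.defaultCharset());

        System.out.println(encodedLength(s, StandardCharsets.UTF_8));    // 12
        System.out.println(encodedLength(s, StandardCharsets.UTF_16BE)); // 8
        System.out.println(encodedLength(s, StandardCharsets.UTF_16));   // 10 (2-byte BOM + 8)
    }
}
```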

Source: utcz.com/z/516245.html
