MSCV 如何处理中文的字符串字面量?
我在用VS2019写C++的过程中发现一个问题,在代码里面直接写中文,转换成字符数组时候会截断,具体代码如下:
#include <cstdio>#include <cstring>
#include <bitset>
using namespace std;
int main(){
const char* str1 = "退出 ";//最后有一个半角空格
int len1 = strlen(str1);
const char* str2 = "退出 ";//最后有一个全角空格
int len2 = strlen(str2);
printf("str1:%s, len1:%d\n", str1, len1);
for (int i = 0; i < len1; ++i)
printf("%s:0x%02X\n", bitset<8>(str1[i]).to_string().c_str(), (unsigned char)str1[i]);
printf("=====\n");
printf("str2:%s, len2:%d\n", str2, len2);
for (int i = 0; i < len2; ++i)
printf("%s:0x%02X\n", bitset<8>(str2[i]).to_string().c_str(), (unsigned char)str2[i]);
return 0;
}
上面这段代码,我用notepad++保存为UTF-8无BOM编码的cpp文件,用VS2019运行的结果是:
用WSL的G++ 7.5.0运行的结果是:
VS2019的显示有问题,不过这是控制台的事情,编码确实是UTF-8,我不清楚的地方是在字符数组里面, '出'的编码0xE587BA怎么就变成了0xE5873F?不但少了一个空格,连最后一个值都变了,为什么加了一个全角空格就可以?这好像和代码页有关,但是我搜了一会也没找到很好的解释。我现在已经知道怎么正确用VS显示中文,只是我不太明白这个现象的问题出在哪,求解惑。
回答:
微软文档:
You can use the /utf-8 option to specify both the source and execution character sets as encoded by using UTF-8. It is equivalent to specifying /source-charset:utf-8 /execution-charset:utf-8 on the command line. Any of these options also enables the /validate-charset option by default. For a list of supported code page identifiers and character set names, see Code Page Identifiers.By default, Visual Studio detects a byte-order mark to determine if the source file is in an encoded Unicode format, for example, UTF-16 or UTF-8. If no byte-order mark is found, it assumes the source file is encoded using the current user code page, unless you have specified a code page by using /utf-8 or the /source-charset option. Visual Studio allows you to save your C++ source code by using any of several character encodings. For information about source and execution character sets, see Character Sets in the language documentation.
你这个用法正好是微软不予支持的。
VS2019 可以自动检测 BOM ,但是没有 BOM ,那么默认是系统编码。如果想用 utf-8 ,需要根据自己的需求增加 /utf-8
或者 /source-charset:utf-8
选项。
以上是 MSCV 如何处理中文的字符串字面量? 的全部内容, 来源链接: utcz.com/p/193308.html