c++ jsoncpp中文和\uXXXX使用toStyledString生成字符串中文乱码解决方案
一、中文乱码解决方法1.1、乱码展示在使用jsoncpp解析含有中文的字符串的时候,使用toStyledString()函数生成的字符串中的中文部分将变成\u加4个16进制数字会出现解析乱码的情况。比如:1.2、乱码原因及解决方法jsoncpp的源码来分析(官方下载地址:http://sourceforge.net/projects/jsoncpp/files/)。通过分析StyledWriter
目录
一、中文乱码解决方法
1.1、乱码展示
在使用jsoncpp解析含有中文的字符串的时候,使用toStyledString()函数生成的字符串中的中文部分将变成\u加4个16进制数字会出现解析乱码的情况。
比如:
1.2、乱码原因及解决方法
jsoncpp的源码来分析(官方下载地址:http://sourceforge.net/projects/jsoncpp/files/ )。通过分析StyledWriter的writeValue函数发现他对字符串的处理通过valueToQuotedStringN函数进行了转义:
static String valueToQuotedStringN(const char* value, unsigned length) {
if (value == nullptr)
return "";
if (!isAnyCharRequiredQuoting(value, length))
return String("\"") + value + "\"";
// We have to walk value and escape any special characters.
// Appending to String is not efficient, but this should be rare.
// (Note: forward slashes are *not* rare, but I am not escaping them.)
String::size_type maxsize = length * 2 + 3; // allescaped+quotes+NULL
String result;
result.reserve(maxsize); // to avoid lots of mallocs
result += "\"";
char const* end = value + length;
for (const char* c = value; c != end; ++c) {
switch (*c) {
case '\"':
result += "\\\"";
break;
case '\\':
result += "\\\\";
break;
case '\b':
result += "\\b";
break;
case '\f':
result += "\\f";
break;
case '\n':
result += "\\n";
break;
case '\r':
result += "\\r";
break;
case '\t':
result += "\\t";
break;
// case '/':
// Even though \/ is considered a legal escape in JSON, a bare
// slash is also legal, so I see no reason to escape it.
// (I hope I am not misunderstanding something.)
// blep notes: actually escaping \/ may be useful in javascript to avoid </
// sequence.
// Should add a flag to allow this compatibility mode and prevent this
// sequence from occurring.
default: {
unsigned int cp = utf8ToCodepoint(c, end);
// don't escape non-control characters
// (short escape sequence are applied above)
if (cp < 0x80 && cp >= 0x20)
result += static_cast<char>(cp);
else if (cp < 0x10000) { // codepoint is in Basic Multilingual Plane
result += "\\u";
result += toHex16Bit(cp);
}
else { // codepoint is not in Basic Multilingual Plane
// convert to surrogate pair first
cp -= 0x10000;
result += "\\u";
result += toHex16Bit((cp >> 10) + 0xD800);
result += "\\u";
result += toHex16Bit((cp & 0x3FF) + 0xDC00);
}
}break;
}
}
result += "\"";
return result;
}
通过代码可以明白的看到default:里面处理的就是包括中文在内的字符:于是我们可以修改源代码重新编译库。将:
default: {
unsigned int cp = utf8ToCodepoint(c, end);
// don't escape non-control characters
// (short escape sequence are applied above)
if (cp < 0x80 && cp >= 0x20)
result += static_cast<char>(cp);
else if (cp < 0x10000) { // codepoint is in Basic Multilingual Plane
result += "\\u";
result += toHex16Bit(cp);
}
else { // codepoint is not in Basic Multilingual Plane
// convert to surrogate pair first
cp -= 0x10000;
result += "\\u";
result += toHex16Bit((cp >> 10) + 0xD800);
result += "\\u";
result += toHex16Bit((cp & 0x3FF) + 0xDC00);
}
//result += *c;
}break;
改为:
default: {
result += *c;
}break;
最终结果为:
参考链接:
c++ jsoncpp使用toStyledString生成字符串中文乱码解决方案
二、含有\uXXXX解析乱码的解决方法
2.1、乱码展示
json文件如下:
解析结果:
2.2、乱码原因
之前改过valueToQuotedStringN函数,这个函数是将字符串转化为unicode编码,所以直接读取\uXXXX格式的字符串得到的其实是utf-8的字符串(如果读的是中文才是unicode编码)。所以这里需要额外的将字符串转化为unicode代码
2.3、解决方法
utf-8转unicode:
wstring UTF8ToUnicode(const string& str)
{
int len = 0;
len = str.length();
int unicodeLen = ::MultiByteToWideChar(CP_UTF8,
0,
str.c_str(),
-1,
NULL,
0);
wchar_t * pUnicode;
pUnicode = new wchar_t[unicodeLen + 1];
memset(pUnicode, 0, (unicodeLen + 1) * sizeof(wchar_t));
::MultiByteToWideChar(CP_UTF8,
0,
str.c_str(),
-1,
(LPWSTR)pUnicode,
unicodeLen);
wstring rt;
rt = (wchar_t*)pUnicode;
delete pUnicode;
return rt;
}
在程序中加入该函数,并调用:
std::string ws2s(const std::wstring& ws)
{
std::string curLocale = setlocale(LC_ALL, NULL);
setlocale(LC_ALL, "chs");
const wchar_t* _Source = ws.c_str();
size_t _Dsize = 2 * ws.size() + 1;
char *_Dest = new char[_Dsize];
memset(_Dest, 0, _Dsize);
wcstombs(_Dest, _Source, _Dsize);
std::string result = _Dest;
delete[]_Dest;
setlocale(LC_ALL, curLocale.c_str());
return result;
}
//调用
std::string content = root["Cnki"][i]["content"].toStyledString();
wstring wstr = UTF8ToUnicode(content);//将utf-8转化为unicode格式
cout << ws2s(wstr) << endl;
结果:
参考链接:
开放原子开发者工作坊旨在鼓励更多人参与开源活动,与志同道合的开发者们相互交流开发经验、分享开发心得、获取前沿技术趋势。工作坊有多种形式的开发者活动,如meetup、训练营等,主打技术交流,干货满满,真诚地邀请各位开发者共同参与!
更多推荐
所有评论(0)