Contents

1. Fixing garbled Chinese text

1.1 Symptom

1.2 Cause and fix

2. Fixing garbled text when parsing \uXXXX escapes

2.1 Symptom

2.2 Cause

2.3 Fix


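1. Fixing garbled Chinese text

1.1 Symptom

When jsoncpp serializes a value that contains Chinese text, the string returned by toStyledString() no longer contains the Chinese characters. Each one is replaced by \u followed by four hexadecimal digits, so the output looks garbled. For example, 中文 comes out as \u4e2d\u6587.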
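The effect can be reproduced without jsoncpp. The following is a minimal sketch, assuming well-formed UTF-8 input; decodeUtf8, hexEscape and escapeNonAscii are hypothetical names chosen for illustration and are not jsoncpp functions. Short escapes such as \n are left out to keep it brief.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: decode one UTF-8 sequence starting at s[i] and advance i.
// Assumes well-formed input; no validation is attempted.
static unsigned decodeUtf8(const std::string& s, std::size_t& i) {
  assert(i < s.size());
  unsigned char b = static_cast<unsigned char>(s[i++]);
  if (b < 0x80) return b;                              // plain ASCII
  int extra = (b >= 0xF0) ? 3 : (b >= 0xE0) ? 2 : 1;   // continuation bytes
  unsigned cp = b & (0x3F >> extra);                   // payload bits of the lead byte
  while (extra-- > 0 && i < s.size())
    cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
  return cp;
}

// Hypothetical helper: format a 16-bit value as a \uXXXX escape (lowercase hex).
static std::string hexEscape(unsigned v) {
  char buf[8];
  std::snprintf(buf, sizeof buf, "\\u%04x", v & 0xFFFFu);
  return buf;
}

// Escape everything outside printable ASCII, which is what produces the symptom.
std::string escapeNonAscii(const std::string& utf8) {
  std::string out;
  for (std::size_t i = 0; i < utf8.size();) {
    unsigned cp = decodeUtf8(utf8, i);
    if (cp >= 0x20 && cp < 0x80) {
      out += static_cast<char>(cp);
    } else if (cp < 0x10000) {          // Basic Multilingual Plane
      out += hexEscape(cp);
    } else {                            // outside the BMP: emit a surrogate pair
      cp -= 0x10000;
      out += hexEscape((cp >> 10) + 0xD800);
      out += hexEscape((cp & 0x3FF) + 0xDC00);
    }
  }
  return out;
}
```

Passing the UTF-8 bytes of 中文 (E4 B8 AD E6 96 87) to escapeNonAscii yields \u4e2d\u6587, the same shape of output that toStyledString() produces.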

1.2 Cause and fix

The cause can be found in the jsoncpp source code (official download: http://sourceforge.net/projects/jsoncpp/files/ ). The writeValue function of StyledWriter passes every string through valueToQuotedStringN, which escapes it:

static String valueToQuotedStringN(const char* value, unsigned length) {
  if (value == nullptr)
    return "";

  if (!isAnyCharRequiredQuoting(value, length))
    return String("\"") + value + "\"";
  // We have to walk value and escape any special characters.
  // Appending to String is not efficient, but this should be rare.
  // (Note: forward slashes are *not* rare, but I am not escaping them.)
  String::size_type maxsize = length * 2 + 3; // allescaped+quotes+NULL
  String result;
  result.reserve(maxsize); // to avoid lots of mallocs
  result += "\"";
  char const* end = value + length;
  for (const char* c = value; c != end; ++c) {
    switch (*c) {
    case '\"':
      result += "\\\"";
      break;
    case '\\':
      result += "\\\\";
      break;
    case '\b':
      result += "\\b";
      break;
    case '\f':
      result += "\\f";
      break;
    case '\n':
      result += "\\n";
      break;
    case '\r':
      result += "\\r";
      break;
    case '\t':
      result += "\\t";
      break;
    // case '/':
    // Even though \/ is considered a legal escape in JSON, a bare
    // slash is also legal, so I see no reason to escape it.
    // (I hope I am not misunderstanding something.)
    // blep notes: actually escaping \/ may be useful in javascript to avoid </
    // sequence.
    // Should add a flag to allow this compatibility mode and prevent this
    // sequence from occurring.
    default: {
      unsigned int cp = utf8ToCodepoint(c, end);
      // don't escape non-control characters
      // (short escape sequence are applied above)
      if (cp < 0x80 && cp >= 0x20)
        result += static_cast<char>(cp);
      else if (cp < 0x10000) { // codepoint is in Basic Multilingual Plane
        result += "\\u";
        result += toHex16Bit(cp);
      } else { // codepoint is not in Basic Multilingual Plane
        // convert to surrogate pair first
        cp -= 0x10000;
        result += "\\u";
        result += toHex16Bit((cp >> 10) + 0xD800);
        result += "\\u";
        result += toHex16Bit((cp & 0x3FF) + 0xDC00);
      }
    } break;
    }
  }
  result += "\"";
  return result;
}

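The code shows that the default: branch handles every remaining character, Chinese included: anything outside printable ASCII is decoded to a code point and written as a \uXXXX escape. One fix is therefore to modify the source and rebuild the library. Change: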

    default: {
      unsigned int cp = utf8ToCodepoint(c, end);
      // don't escape non-control characters
      // (short escape sequence are applied above)
      if (cp < 0x80 && cp >= 0x20)
        result += static_cast<char>(cp);
      else if (cp < 0x10000) { // codepoint is in Basic Multilingual Plane
        result += "\\u";
        result += toHex16Bit(cp);
      } else { // codepoint is not in Basic Multilingual Plane
        // convert to surrogate pair first
        cp -= 0x10000;
        result += "\\u";
        result += toHex16Bit((cp >> 10) + 0xD800);
        result += "\\u";
        result += toHex16Bit((cp & 0x3FF) + 0xDC00);
      }
    } break;

to:

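    default: {
      // Copy the byte through unchanged so multi-byte UTF-8 sequences stay intact.
      // Caveat: control characters below 0x20 without a short escape also pass through raw.
      result += *c;
    } break;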

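After the library is rebuilt, toStyledString() writes the Chinese text as raw UTF-8 bytes instead of \uXXXX escapes.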
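The one-line patch also stops escaping control characters, which JSON does not allow unescaped inside strings. A stricter variant is sketched below as a standalone function; quoteKeepUtf8 is a hypothetical name, not part of jsoncpp, and the sketch assumes the input is already UTF-8.

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-in for the patched function: quote a string for JSON,
// keeping bytes >= 0x80 (UTF-8 sequences) as they are while still escaping
// control characters so the output remains valid JSON.
std::string quoteKeepUtf8(const std::string& value) {
  std::string result = "\"";
  for (unsigned char c : value) {
    switch (c) {
    case '"':  result += "\\\""; break;
    case '\\': result += "\\\\"; break;
    case '\b': result += "\\b"; break;
    case '\f': result += "\\f"; break;
    case '\n': result += "\\n"; break;
    case '\r': result += "\\r"; break;
    case '\t': result += "\\t"; break;
    default:
      if (c < 0x20) {
        // Remaining control characters get a \u00XX escape.
        char buf[8];
        std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(c));
        result += buf;
      } else {
        // Printable ASCII and UTF-8 lead/continuation bytes are copied as-is.
        result += static_cast<char>(c);
      }
    }
  }
  result += "\"";
  return result;
}
```

The same condition (escape only when the byte is below 0x20) could be applied inside the default: branch of valueToQuotedStringN. Recent jsoncpp releases also appear to expose an emitUTF8 setting on StreamWriterBuilder for this purpose, so it may be worth checking the version in use before patching the source.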

Reference:

"c++ jsoncpp使用toStyledString生成字符串中文乱码解决方案" (a fix for garbled Chinese in strings generated by jsoncpp's toStyledString)

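2. Fixing garbled text when parsing \uXXXX escapes

2.1 Symptom

The JSON file stores its Chinese text as \uXXXX escapes. When the file is parsed and the strings are printed, the console shows garbled characters.

2.2 Cause

The valueToQuotedStringN function changed above only affects writing: it is the function that turned strings into \uXXXX escapes. On reading, jsoncpp decodes each \uXXXX escape into UTF-8 bytes, so the parsed string is UTF-8 encoded (literal Chinese characters in the file, by contrast, are passed through in whatever encoding the file uses). Printing UTF-8 bytes on a console that expects the local code page, such as GBK on Chinese Windows, produces garbled text. The string therefore needs an extra conversion: from UTF-8 to a wide (Unicode) string, and from there to the local multibyte encoding.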
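To make the cause concrete, the sketch below shows roughly what a JSON reader does with \uXXXX escapes: it turns each code point into UTF-8 bytes. This is an illustration, not jsoncpp's code. codePointToUtf8 and decodeEscapes are hypothetical names, and the input is assumed to consist only of valid \uXXXX escapes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
std::string codePointToUtf8(unsigned cp) {
  assert(cp <= 0x10FFFF);
  std::string out;
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  return out;
}

// Hypothetical helper: decode a string made up only of \uXXXX escapes
// (for example "\\u4e2d\\u6587") into UTF-8, combining surrogate pairs.
std::string decodeEscapes(const std::string& s) {
  std::string out;
  for (std::size_t i = 0; i + 6 <= s.size(); i += 6) {
    unsigned cp = static_cast<unsigned>(std::stoul(s.substr(i + 2, 4), nullptr, 16));
    if (cp >= 0xD800 && cp <= 0xDBFF && i + 12 <= s.size()) {
      // High surrogate: the next escape holds the low surrogate.
      unsigned lo = static_cast<unsigned>(std::stoul(s.substr(i + 8, 4), nullptr, 16));
      cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
      i += 6;  // skip the second escape of the pair
    }
    out += codePointToUtf8(cp);
  }
  return out;
}
```

Decoding \u4e2d\u6587 this way gives the six bytes E4 B8 AD E6 96 87. A GBK console interprets those bytes pairwise as three unrelated characters, which is the garbled output described above.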

2.3 Fix

UTF-8 to Unicode (wide string), using the Win32 API:

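#include <windows.h>
#include <string>

// Convert a UTF-8 string to a wide (UTF-16) string with the Win32 API.
std::wstring UTF8ToUnicode(const std::string& str)
{
	// First call: query the required length in wchar_t units.
	// Passing -1 as the input length makes the result include the terminating null.
	int unicodeLen = ::MultiByteToWideChar(CP_UTF8, 0, str.c_str(), -1, NULL, 0);
	if (unicodeLen <= 0)
		return std::wstring(); // conversion failed
	wchar_t* pUnicode = new wchar_t[unicodeLen + 1];
	memset(pUnicode, 0, (unicodeLen + 1) * sizeof(wchar_t));
	// Second call: perform the conversion into the buffer.
	::MultiByteToWideChar(CP_UTF8, 0, str.c_str(), -1, pUnicode, unicodeLen);
	std::wstring rt = pUnicode;
	delete[] pUnicode; // array form, to match new[]
	return rt;
}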

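Add this function to the program together with ws2s below, which converts the wide string to the local multibyte encoding, and call both: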

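#include <clocale>
#include <cstdlib>
#include <string>
#include <vector>

// Convert a wide string to the multibyte encoding of the "chs" locale
// (Simplified Chinese on Windows).
std::string ws2s(const std::wstring& ws)
{
	// Remember the current locale so it can be restored afterwards.
	std::string curLocale = setlocale(LC_ALL, NULL);
	setlocale(LC_ALL, "chs");
	// Up to 2 bytes per wide character, plus the terminating null.
	size_t dsize = 2 * ws.size() + 1;
	std::vector<char> dest(dsize, 0);
	size_t written = wcstombs(dest.data(), ws.c_str(), dsize);
	setlocale(LC_ALL, curLocale.c_str());
	if (written == (size_t)-1)
		return std::string(); // a character could not be converted
	return std::string(dest.data());
}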


// Usage
std::string content = root["Cnki"][i]["content"].toStyledString();
std::wstring wstr = UTF8ToUnicode(content); // convert UTF-8 to a wide (UTF-16) string
std::cout << ws2s(wstr) << std::endl;       // print in the local multibyte encoding

Result: the Chinese text is displayed correctly.

Reference:

"C++ STRING 和WSTRING 之间的互相转换函数" (conversion functions between C++ string and wstring)
