
[Painful C++] Never use std::wifstream to read Unicode UTF-16 files

news 2025/8/20 16:40:36

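I got burned.

Recently my program has had to communicate with JScript / ActiveX scripts.

With ActiveX, exporting a file as UTF-8 fails with an inexplicable error: it writes about half of the file and then simply quits. Perhaps the system language setting is to blame.

Switching the export to UTF-16 (Unicode), however, works without problems: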

OpenTextFile method (Visual Basic for Applications) | Microsoft Learn

var fso = new ActiveXObject("Scripting.FileSystemObject");
var file = fso.CreateTextFile("", true, true );

But the pain does not end there. In a C++ program, a UTF-8 file can simply be read with std::ifstream.

std::ifstream file(L"");
if (file.is_open()) {
    std::string line;
    while (std::getline(file, line)) {
        ...
    }
}
file.close();

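A UTF-16 file, however, cannot be read directly with std::wifstream. According to an answer on StackOverflow, you have to tell std::wifstream which encoding to expect; only then will the C++ standard library skip the BOM and decode the file line by line.

Someone has written the approach up here: "用std::wifstream读取Unicode文本" (reading Unicode text with std::wifstream), CSDN blog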
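For readers who have not seen it, "telling std::wifstream the encoding" usually means imbuing the stream with a conversion facet. The following is only a sketch of that general idea, not the code from the linked post: the helper name `ReadLinesWithWifstream` is made up for illustration, and `std::codecvt_utf16` is assumed to be the facet in question (it has been deprecated since C++17, although common standard libraries still ship it).

```cpp
#include <codecvt>   // std::codecvt_utf16 (deprecated since C++17)
#include <fstream>
#include <locale>
#include <string>
#include <vector>

// Hypothetical helper: read all lines of a UTF-16 file through std::wifstream.
std::vector<std::wstring> ReadLinesWithWifstream(const char* path) {
    std::vector<std::wstring> lines;
    // Binary mode keeps the runtime from rewriting bytes before they are decoded.
    std::wifstream file(path, std::ios::binary);
    if (!file.is_open()) return lines;

    // consume_header skips the BOM; little_endian is the fallback byte order.
    using Facet = std::codecvt_utf16<
        wchar_t, 0x10FFFF,
        static_cast<std::codecvt_mode>(std::consume_header | std::little_endian)>;
    file.imbue(std::locale(file.getloc(), new Facet));  // the locale owns the facet

    std::wstring line;
    while (std::getline(file, line)) {
        if (!line.empty() && line.back() == L'\r') line.pop_back();  // CR of a CRLF
        lines.push_back(line);
    }
    return lines;
}
```

One caveat worth knowing: when wchar_t is 16 bits wide, as it is on Windows, this facet converts to UCS-2 rather than UTF-16, so code points outside the Basic Multilingual Plane cannot be represented in the result.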

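The very next day a problem showed up. Some emoji came out blank (for example 🍓). Perhaps this approach does not support UTF-16 surrogate pairs, that is, characters that occupy four bytes.

What the heck, the character simply vanished!

Shocking!

I immediately whipped ChatGPT into writing an alternative that reads the file straight into a TCHAR* array, and it works without a hitch! C++ really is a pain~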
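To make "surrogate pair" concrete: 🍓 is U+1F353, which lies above U+FFFF and therefore needs two 16-bit code units in UTF-16. The sketch below shows the standard arithmetic; `EncodeUtf16` is a hypothetical helper written for this illustration, not part of any library.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: encode one Unicode code point as UTF-16 code units.
std::vector<std::uint16_t> EncodeUtf16(std::uint32_t codePoint) {
    if (codePoint < 0x10000) {
        // Basic Multilingual Plane: a single code unit is enough.
        return { static_cast<std::uint16_t>(codePoint) };
    }
    std::uint32_t v = codePoint - 0x10000;  // leaves a 20-bit value
    std::uint16_t high = static_cast<std::uint16_t>(0xD800 + (v >> 10));    // top 10 bits
    std::uint16_t low  = static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF));  // low 10 bits
    return { high, low };
}
```

For U+1F353 this gives the pair 0xD83C 0xDF53. A decoder that treats every 16-bit unit as a complete character has no valid character to map either half to, which would fit the blank output described above.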

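#include <windows.h>
#include <tchar.h>

// Reads a whole UTF-16 file into a null-terminated buffer.
// Assumes a UNICODE build (TCHAR is wchar_t); the caller frees it with delete[].
TCHAR* ReadUTF16File(const TCHAR* filePath) {
    HANDLE hFile = CreateFile(filePath, GENERIC_READ, FILE_SHARE_READ, NULL,
                              OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile == INVALID_HANDLE_VALUE) {
        // Handle file opening error
        return NULL;
    }
    DWORD fileSize = GetFileSize(hFile, NULL);
    TCHAR* buffer = new TCHAR[fileSize / sizeof(TCHAR) + 1];
    DWORD bytesRead = 0;
    if (!ReadFile(hFile, buffer, fileSize, &bytesRead, NULL)) {
        bytesRead = 0; // treat a failed read as an empty file
    }
    // Terminate after the data actually read, not after the reported size.
    buffer[bytesRead / sizeof(TCHAR)] = _T('\0');
    CloseHandle(hFile);
    return buffer;
}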
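The function above works because it performs no conversion at all: the UTF-16 code units land in memory exactly as they are on disk, surrogate pairs included. The same idea can be sketched without the Win32 API. The helper `ReadUtf16LeFile` below is hypothetical and assumes the file is little-endian UTF-16, which is taken here to be what the ActiveX export produces.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical portable counterpart: load a UTF-16LE file as raw code units.
std::u16string ReadUtf16LeFile(const char* path) {
    std::ifstream file(path, std::ios::binary);
    if (!file) return {};
    // Slurp the raw bytes; nothing is decoded at this stage.
    std::string bytes((std::istreambuf_iterator<char>(file)),
                      std::istreambuf_iterator<char>());
    std::u16string text;
    text.reserve(bytes.size() / 2);
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        unsigned lo = static_cast<unsigned char>(bytes[i]);
        unsigned hi = static_cast<unsigned char>(bytes[i + 1]);
        text.push_back(static_cast<char16_t>(lo | (hi << 8)));  // little-endian unit
    }
    // Drop the byte order mark if the file starts with one.
    if (!text.empty() && text[0] == u'\uFEFF') text.erase(0, 1);
    return text;
}
```

Because each code unit is copied through untouched, a character such as 🍓 survives as its two surrogates and can be passed on to any API that accepts UTF-16.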

std::vector<std::wstring> _args;
QkString ln;
if (StrCmpN(_args[i].c_str(), L"-loadArgsW", 10) == 0) {
    // The file path starts 11 characters into the argument.
    TCHAR* all = ReadUTF16File(_args[i].c_str() + 11);
    if (all != nullptr) {
        // Skip the BOM (U+FEFF) if the file starts with one.
        TCHAR* current = (all[0] == 0xFEFF) ? all + 1 : all;
        TCHAR* next = nullptr;
        while ((next = _tcschr(current, _T('\n'))) != nullptr) {
            // Process the line from current to next
            ln.Empty();
            ln.Append(current, next - current);
            current = next + 1; // Move to the character after the newline
            ln.Trim();
            _args.push_back(ln.GetData());
        }
        if (*current != _T('\0')) {
            // Last line without a trailing newline
            ln = current;
            ln.Trim();
            _args.push_back(ln.GetData());
        }
        delete[] all; // release the buffer from ReadUTF16File
    }
}
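The line-splitting loop above depends on QkString and the TCHAR helpers. For comparison, here is a sketch of the same steps (split on '\n', trim each line, keep a final line that lacks a trailing newline) using only the standard library. `SplitTrimmedLines` is a hypothetical name, and treating spaces, tabs and carriage returns as the characters to trim is an assumption about what QkString's Trim removes.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split UTF-16 text into lines and trim each one.
std::vector<std::u16string> SplitTrimmedLines(const std::u16string& text) {
    const std::u16string blanks = u" \t\r";  // assumed trim set
    std::vector<std::u16string> lines;
    std::size_t start = 0;
    while (start < text.size()) {
        std::size_t end = text.find(u'\n', start);
        if (end == std::u16string::npos) end = text.size();  // last line, no newline
        std::u16string line = text.substr(start, end - start);
        std::size_t first = line.find_first_not_of(blanks);
        if (first == std::u16string::npos) {
            line.clear();  // nothing but blanks on this line
        } else {
            std::size_t last = line.find_last_not_of(blanks);
            line = line.substr(first, last - first + 1);
        }
        lines.push_back(line);
        start = end + 1;  // continue after the newline
    }
    return lines;
}
```

Splitting on the code unit '\n' is safe for surrogate pairs, since neither half of a pair can ever equal 0x000A.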