An effective way to find the encoding of any file

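Yes, this is a very common question, but it is still a murky one for me because I don't know much about the topic.

I want a very precise way to find a file's encoding. As precise as Notepad++.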

Answer:

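The StreamReader.CurrentEncoding property rarely returns the correct text file encoding for me. I've had greater success determining a file's encoding and endianness by analyzing its byte order mark (BOM). If the file has no BOM, this approach cannot determine the file's encoding.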

*UPDATED 4/08/2020 to include UTF-32LE detection and to return the correct encoding for UTF-32BE

using System.IO;
using System.Text;

/// <summary>
/// Determines a text file's encoding by analyzing its byte order mark (BOM).
/// Defaults to ASCII when detection of the text file's endianness fails.
/// </summary>
/// <param name="filename">The text file to analyze.</param>
/// <returns>The detected encoding.</returns>
public static Encoding GetEncoding(string filename)
{
    // Read the BOM. If the file is shorter than 4 bytes, the unread
    // entries of the array simply stay 0.
    var bom = new byte[4];
    using (var file = new FileStream(filename, FileMode.Open, FileAccess.Read))
    {
        file.Read(bom, 0, 4);
    }

    // Analyze the BOM. UTF-32LE must be tested before UTF-16LE because
    // both start with FF FE.
    // Note: Encoding.UTF7 is marked obsolete in .NET 5 and later.
    if (bom[0] == 0x2b && bom[1] == 0x2f && bom[2] == 0x76) return Encoding.UTF7;
    if (bom[0] == 0xef && bom[1] == 0xbb && bom[2] == 0xbf) return Encoding.UTF8;
    if (bom[0] == 0xff && bom[1] == 0xfe && bom[2] == 0 && bom[3] == 0) return Encoding.UTF32; // UTF-32LE
    if (bom[0] == 0xff && bom[1] == 0xfe) return Encoding.Unicode; // UTF-16LE
    if (bom[0] == 0xfe && bom[1] == 0xff) return Encoding.BigEndianUnicode; // UTF-16BE
    if (bom[0] == 0 && bom[1] == 0 && bom[2] == 0xfe && bom[3] == 0xff) return new UTF32Encoding(true, true); // UTF-32BE

    // We actually have no idea what the encoding is if we reach this point, so
    // you may wish to return null instead of defaulting to ASCII
    return Encoding.ASCII;
}
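For files without a BOM, one common fallback is to check whether the bytes form valid UTF-8. This goes beyond the BOM check above. The following is only a rough sketch of that idea: the class name EncodingProbe and both method names are made up for illustration, and it assumes the file is small enough to read into memory in one go.

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingProbe
{
    // Hypothetical helper (not from the answer above): returns true when
    // the bytes decode as UTF-8 without hitting an invalid sequence.
    public static bool IsValidUtf8(byte[] bytes)
    {
        // throwOnInvalidBytes: true makes GetString throw instead of
        // silently substituting U+FFFD for malformed input.
        var strictUtf8 = new UTF8Encoding(
            encoderShouldEmitUTF8Identifier: false,
            throwOnInvalidBytes: true);
        try
        {
            strictUtf8.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    // Hypothetical fallback for BOM-less files: returns a BOM-less UTF-8
    // encoding when the whole file validates, otherwise null ("unknown").
    public static Encoding GuessBomlessEncoding(string filename)
    {
        byte[] bytes = File.ReadAllBytes(filename);
        return IsValidUtf8(bytes) ? new UTF8Encoding(false) : null;
    }
}
```

Keep in mind that this is a heuristic, not a detection: pure ASCII text also passes the check, and a file that fails it could be in any legacy code page. It would probably make sense to call GetEncoding first and fall back to something like this only when no BOM is found.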

Source: utcz.com/qa/410696.html
