
Using PHP to Determine Whether a File Is UTF-8 Encoded (Checking the BOM)

WBOY (original), 2016-06-13 11:36:28

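UTF-8 files come in two kinds: those with a BOM (byte order mark) and those without one. Files with a BOM are easy to handle, because they start with the three bytes EF BB BF. Files without a BOM are trickier, because the encoding has to be inferred from the byte patterns. So I wrote a function that takes the file's contents as a string and makes the check. The code is as follows: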

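```php
<?php
// Returns 1: pure ASCII (every byte is no greater than 127)
// Returns 2: UTF-8 (a UTF-8 BOM is present, or the byte patterns look like UTF-8)
// Returns 0: not UTF-8; assumed to be ordinary GB encoding (GB2312/GBK)

function TestUtf8($text)
{
    // A UTF-8 BOM is the byte sequence EF BB BF at the very start of the text.
    if (strncmp($text, "\xEF\xBB\xBF", 3) == 0)
    {
        return 2;
    }

    $lastch = 0;     // previous byte
    $good = 0;       // continuation bytes that directly follow a lead byte
    $bad = 0;        // byte pairs that cannot occur in valid UTF-8
    $notAscii = 0;   // number of bytes >= 0x80
    $len = strlen($text);

    for ($i = 0; $i < $len; $i++)
    {
        $ch = ord($text[$i]);

        if ($ch >= 0x80) $notAscii++;

        if (($ch & 0xC0) == 0x80)
        {
            // Continuation byte (10xxxxxx)
            if (($lastch & 0xC0) == 0xC0)
            {
                $good += 1;   // follows a lead byte (11xxxxxx)
            }
            else if (($lastch & 0x80) == 0)
            {
                $bad += 1;    // follows an ASCII byte, which UTF-8 never allows
            }
            // After another continuation byte (inside a 3- or 4-byte
            // sequence) it counts as neither good nor bad.
        }
        else if (($lastch & 0xC0) == 0xC0)
        {
            $bad += 1;        // a lead byte not followed by a continuation byte
        }
        $lastch = $ch;
    }

    if ($notAscii == 0)
    {
        return 1;
    }
    else if ($good >= $bad)
    {
        return 2;
    }
    else
    {
        return 0;
    }
}
```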
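The BOM case described at the start is simple enough to handle on its own. Below is a minimal sketch, assuming PHP 7 or later; the helper names `hasUtf8Bom`, `stripUtf8Bom` and `isValidUtf8` are hypothetical and are not part of PHP or of the function above. `isValidUtf8` also swaps in a different technique from the counting heuristic: it relies on `preg_match()` returning `false` when a `/u` pattern is applied to a subject that is not valid UTF-8, so it performs a strict check rather than a majority vote.

```php
<?php
// Hypothetical helper: true if $text starts with the UTF-8 BOM (EF BB BF).
function hasUtf8Bom(string $text): bool
{
    return strncmp($text, "\xEF\xBB\xBF", 3) === 0;
}

// Hypothetical helper: drop a leading UTF-8 BOM, leaving other text untouched.
function stripUtf8Bom(string $text): string
{
    return hasUtf8Bom($text) ? substr($text, 3) : $text;
}

// Hypothetical helper: strict UTF-8 validation. With the /u modifier,
// preg_match() returns false instead of 1 when the subject is not valid
// UTF-8; otherwise the empty pattern always matches.
function isValidUtf8(string $text): bool
{
    return preg_match('//u', $text) === 1;
}

$withBom = "\xEF\xBB\xBF" . "hello";
echo strlen(stripUtf8Bom($withBom)), "\n"; // 5
var_dump(isValidUtf8("\xE4\xB8\xAD"));     // "中" in UTF-8: bool(true)
var_dump(isValidUtf8("\xD6\xD0"));         // "中" in GBK: bool(false)
```

In practice one might read a file with `file_get_contents()`, strip any BOM so the three marker bytes are not passed on as content, and convert text that fails the check with `iconv('GBK', 'UTF-8', $text)`. The trade-off: a strict check rejects a file containing a single stray byte, whereas the counting heuristic tolerates a few bad pairs.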
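The bit masks in `TestUtf8` sort every byte into one of three classes. The small sketch below applies the same tests to the character 中 in both encodings; the helpers `classifyByte` and `classifyBytes` are hypothetical and exist only for illustration.

```php
<?php
// Hypothetical helper mirroring the bit tests in TestUtf8:
//   (b & 0x80) == 0    -> 0xxxxxxx, an ASCII byte
//   (b & 0xC0) == 0x80 -> 10xxxxxx, a UTF-8 continuation byte
//   otherwise          -> 11xxxxxx, a UTF-8 lead byte
function classifyByte(int $b): string
{
    if (($b & 0x80) == 0) {
        return 'ascii';
    }
    if (($b & 0xC0) == 0x80) {
        return 'continuation';
    }
    return 'lead';
}

// Classify every byte of a string, in order.
function classifyBytes(string $text): array
{
    $classes = [];
    for ($i = 0, $n = strlen($text); $i < $n; $i++) {
        $classes[] = classifyByte(ord($text[$i]));
    }
    return $classes;
}

// "中" is E4 B8 AD in UTF-8 and D6 D0 in GBK.
echo implode(' ', classifyBytes("\xE4\xB8\xAD")), "\n"; // lead continuation continuation
echo implode(' ', classifyBytes("\xD6\xD0")), "\n";     // lead lead
```

Under the counting rules in `TestUtf8`, the UTF-8 bytes yield one good pair (B8 directly follows the lead byte E4, while AD follows a continuation byte and counts as neither). The GBK bytes yield one bad pair, because the lead-class byte D6 is followed by another lead-class byte.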
