Heim  >  Artikel  >  Backend-Entwicklung  >  php实现ocr文字识别

php实现ocr文字识别

WBOY
WBOYOriginal
2016-07-30 13:29:324427Durchsuche

更多:http://www.webyang.net/Html/web/article_161.html

OCR的百度定义 (Optical Character Recognition,光学字符识别)是指电子设备(例如扫描仪或数码相机)检查纸上打印的字符,通过检测暗、亮的模式确定其形状,然后用字符识别方法将形状翻译成计算机文字的过程;即,针对印刷体字符,采用光学的方式将纸质文档中的文字转换成为黑白点阵的图像文件,并通过识别软件将图像中的文字转换成文本格式,供文字处理软件进一步编辑加工的技术。

作为一个工程师,在实际编程中,可能需要把图片中的文字显示出来,这就需要用到ocr技术。因为php开发,所以优先选择php,找了php的ocr扩展测试了下,结果发现不可用(地址:http://sourceforge.net/projects/phpocr.berlios)?网上也看了很多朋友的demo,基本上原理都是对图片分解成0,1矩阵,然后根据特征,转化成相应的字符串。测试几个都是不可行的。然后看到别人说PHP搞OCR的很少,也不适合,语言效率太低,这种算法需要很高的效率。可以尝试C,MATLAB 等的OCR算法。搞matlab的玩OCR这类偏算法的很多。

无奈才虚学浅,不会C。无意中却发现百度有ocr的api提供:http://apistore.baidu.com/apiworks/servicedetail/146.html。

写了个玩下:

<ol>
<li value="1">
<span></span><span>php</span>
</li>
<li>
<span>header</span><span>(</span><span>"Content-type: text/html; charset=utf-8"</span><span>);</span>
</li>
<li><span> </span></li>
<li>
<span>function</span><span> curl</span><span>(</span><span>$img</span><span>)</span><span></span><span>{</span>
</li>
<li><span> </span></li>
<li>
<span>    $ch  </span><span>=</span><span> curl_init</span><span>();</span>
</li>
<li>
<span>    $url </span><span>=</span><span></span><span>'http://apis.baidu.com/apistore/idlocr/ocr'</span><span>;</span><span></span><span>//百度ocr api</span>
</li>
<li>
<span>    $header </span><span>=</span><span> array</span><span>(</span>
</li>
<li>
<span></span><span>'Content-Type:application/x-www-form-urlencoded'</span><span>,</span>
</li>
<li>
<span></span><span>'apikey:69c2ace1ef297ce88869f0751cb1b618'</span><span>,</span>
</li>
<li>
<span></span><span>);</span>
</li>
<li><span> </span></li>
<li>
<span>    $data_temp </span><span>=</span><span> file_get_contents</span><span>(</span><span>$img</span><span>);</span>
</li>
<li>
<span>    $data_temp </span><span>=</span><span> urlencode</span><span>(</span><span>base64_encode</span><span>(</span><span>$data_temp</span><span>));</span>
</li>
<li>
<span></span><span>//封装必要参数</span>
</li>
<li>
<span>    $data </span><span>=</span><span></span><span>"fromdevice=pc&clientip=127.0.0.1&detecttype=LocateRecognize&languagetype=CHN_ENG&imagetype=1&image="</span><span>.</span><span>$data_temp</span><span>;</span>
</li>
<li><span></span></li>
<li>
<span>    curl_setopt</span><span>(</span><span>$ch</span><span>,</span><span> CURLOPT_HTTPHEADER </span><span>,</span><span> $header</span><span>);</span><span></span><span>// 添加apikey到header</span>
</li>
<li>
<span>    curl_setopt</span><span>(</span><span>$ch</span><span>,</span><span> CURLOPT_POST</span><span>,</span><span></span><span>1</span><span>);</span>
</li>
<li>
<span>    curl_setopt</span><span>(</span><span>$ch</span><span>,</span><span> CURLOPT_POSTFIELDS</span><span>,</span><span> $data</span><span>);</span><span></span><span>// 添加参数</span>
</li>
<li>
<span>    curl_setopt</span><span>(</span><span>$ch</span><span>,</span><span> CURLOPT_RETURNTRANSFER</span><span>,</span><span></span><span>1</span><span>);</span>
</li>
<li>
<span>    curl_setopt</span><span>(</span><span>$ch </span><span>,</span><span> CURLOPT_URL </span><span>,</span><span> $url</span><span>);</span><span></span><span>// 执行HTTP请求</span>
</li>
<li>
<span>    $res </span><span>=</span><span> curl_exec</span><span>(</span><span>$ch</span><span>);</span>
</li>
<li>
<span></span><span>if</span><span></span><span>(</span><span>$res </span><span>===</span><span> FALSE</span><span>)</span><span></span><span>{</span>
</li>
<li>
<span>        echo </span><span>"cURL Error: "</span><span></span><span>.</span><span> curl_error</span><span>(</span><span>$ch</span><span>);</span>
</li>
<li>
<span></span><span>}</span>
</li>
<li>
<span>    curl_close</span><span>(</span><span>$ch</span><span>);</span>
</li>
<li><span></span></li>
<li>
<span>    $temp_var </span><span>=</span><span> json_decode</span><span>(</span><span>$res</span><span>,</span><span>true</span><span>);</span>
</li>
<li>
<span></span><span>return</span><span> $temp_var</span><span>;</span>
</li>
<li><span> </span></li>
<li><span>}</span></li>
<li><span> </span></li>
<li>
<span>$wordArr </span><span>=</span><span> curl</span><span>(</span><span>'4.jpg'</span><span>);</span>
</li>
<li>
<span>if</span><span>(</span><span>$wordArr</span><span>[</span><span>'errNum'</span><span>]</span><span></span><span>==</span><span></span><span>0</span><span>)</span><span></span><span>{</span>
</li>
<li>
<span>    var_dump</span><span>(</span><span>$wordArr</span><span>);</span>
</li>
<li>
<span>}</span><span></span><span>else</span><span></span><span>{</span>
</li>
<li>
<span>    echo </span><span>"识别出错:"</span><span>.</span><span>$wordArr</span><span>[</span><span>"errMsg"</span><span>];</span>
</li>
<li><span>}</span></li>
</ol>

测试了几张图片,准确度还是蛮高的。百分百的话,是不现实的~


版权声明:本文为博主原创文章,未经博主允许不得转载。

以上就介绍了php实现ocr文字识别,包括了方面的内容,希望对PHP教程有兴趣的朋友有所帮助。

Stellungnahme:
Der Inhalt dieses Artikels wird freiwillig von Internetnutzern beigesteuert und das Urheberrecht liegt beim ursprünglichen Autor. Diese Website übernimmt keine entsprechende rechtliche Verantwortung. Wenn Sie Inhalte finden, bei denen der Verdacht eines Plagiats oder einer Rechtsverletzung besteht, wenden Sie sich bitte an admin@php.cn