Using tesseract in PHP to recognize a CAPTCHA and simulate a login: the CAPTCHA is rejected as wrong

The code is as follows:

<?php
header("Content-type:text/html;charset=utf-8");
/**
 * Simulated login
 */

// 1. Initialize variables
$cookie_file = tempnam('./temp','cookie');
$login_url = "http://210.32.33.91:8080/reader/redr_verify.php"; // login page
$verify_code_url = "http://210.32.33.91:8080/reader/captcha.php"; // CAPTCHA page

// 2. Get cookies
echo "Fetching cookie...<br>";
$curl = curl_init();
$timeout = 5;
curl_setopt($curl, CURLOPT_URL, $login_url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, $timeout);
curl_setopt($curl,CURLOPT_COOKIEJAR,$cookie_file); // fetch the cookie and store it
$contents = curl_exec($curl);
curl_close($curl);
// 3. Fetch the CAPTCHA
echo "Cookie fetched, fetching CAPTCHA...<br>";
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $verify_code_url);
curl_setopt($curl, CURLOPT_COOKIEJAR, $cookie_file); // save cookie
curl_setopt($curl, CURLOPT_COOKIEFILE, $cookie_file); // use cookie
curl_setopt($curl, CURLOPT_HEADER, 0);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
$img = curl_exec($curl);
curl_close($curl);

$codename = time();
$fp = fopen("/home/wwwroot/default/tesseract/Test/images/$codename.png","w");
echo "<img src='./images/$codename.png'>";
fwrite($fp,$img);
fclose($fp);
// Start recognizing the CAPTCHA
echo "CAPTCHA fetched, sleeping, recognizing CAPTCHA...<br>";

passthru("/usr/bin/tesseract  /home/wwwroot/default/tesseract/Test/images/$codename.png /home/wwwroot/default/tesseract/Test/images/$codename");
$code = file_get_contents("./images/$codename.txt");

echo "CAPTCHA read successfully: $code<br>";

echo "Preparing simulated login...<br>";

$post_url = "http://210.32.33.91:8080/reader/redr_verify.php";
// For security, the real password is not given here.
$post = "number=1111111&passwd=111111&captcha=$code&select=cert_no&returnUrl=";
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $post_url);
curl_setopt($curl, CURLOPT_HEADER, false);
curl_setopt($curl, CURLOPT_RETURNTRANSFER,1);
curl_setopt($curl, CURLOPT_POSTFIELDS, $post);
curl_setopt($curl, CURLOPT_COOKIEJAR, $cookie_file);
curl_setopt($curl, CURLOPT_COOKIEFILE, $cookie_file);
$result=curl_exec($curl);
curl_close($curl);
echo str_replace('captcha.php','http://210.32.33.91:8080/reader/captcha.php',$result);
ringa_lee · 2793 days ago · 604

Replies (2)

  • 怪我咯

    怪我咯 2017-04-10 16:45:23

    Updated 2016/1/25 14:51

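    On Linux, give write permission to the captcha directory and the cookies directory.
    Step through your program with breakpoints and check what result it produces and what the image actually shows.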
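The permission check can be scripted before any request is made. This is a minimal sketch, assuming the `captcha` and `cookies` directories sit next to the script as in the demo; `findUnwritableDirs` is a hypothetical helper name, not part of the linked repository.

```php
<?php
// Hypothetical helper: return the directories that are missing or not writable
// by the user the PHP process runs as.
function findUnwritableDirs(array $dirs): array
{
    $problems = [];
    foreach ($dirs as $dir) {
        if (!is_dir($dir) || !is_writable($dir)) {
            $problems[] = $dir;
        }
    }
    return $problems;
}

// Assumed layout: captcha/ and cookies/ next to this script.
$dirs = [__DIR__ . '/captcha', __DIR__ . '/cookies'];
foreach (findUnwritableDirs($dirs) as $dir) {
    echo "Not writable or missing: $dir\n";
}
```

If the script prints nothing, both directories exist and are writable by the PHP process.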


    The code is at: https://github.com/rainwsy/sf/tree/master/library-OCR-login


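    Update:
    You should
    1. Save the CAPTCHA image and compare it with the recognized text.
    2. Check whether the session_id is the same on every request.
    3. CURLOPT_COOKIEJAR only needs to store the session_id the first time it is used; later requests read the session_id with CURLOPT_COOKIEFILE. You can also compare the session_id in the response headers of each request to see whether they match.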
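One way to do the session comparison in steps 2 and 3 is to read the cookie jar that cURL writes. cURL stores cookies in the Netscape cookie file format, with seven tab-separated fields per line. The sketch below is only an illustration: `parseCookieJar` is a hypothetical helper, the sample jar content is made up, and the cookie name `PHPSESSID` is an assumption (the target site may use a different name).

```php
<?php
// Hypothetical helper: parse Netscape-format cookie jar text into name => value.
function parseCookieJar(string $contents): array
{
    $cookies = [];
    foreach (preg_split('/\R/', $contents) as $line) {
        if (strpos($line, '#HttpOnly_') === 0) {
            // cURL marks HttpOnly cookies with this prefix; it is not a comment.
            $line = substr($line, strlen('#HttpOnly_'));
        } elseif ($line === '' || $line[0] === '#') {
            continue; // blank line or comment
        }
        // Fields: domain, subdomain flag, path, secure, expiry, name, value
        $fields = explode("\t", $line);
        if (count($fields) === 7) {
            $cookies[$fields[5]] = $fields[6];
        }
    }
    return $cookies;
}

// Made-up sample; in practice pass file_get_contents($cookie_file) after each request.
$sample = "# Netscape HTTP Cookie File\n"
    . "210.32.33.91\tFALSE\t/\tFALSE\t0\tPHPSESSID\tabc123\n";
$cookies = parseCookieJar($sample);
echo $cookies['PHPSESSID'] ?? '(no session cookie)', "\n"; // prints abc123
```

Printing the value after the CAPTCHA request and again after the login request shows whether both requests used the same session.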
    My CAPTCHA recognition result:

    I wrote a demo:

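    A few points: the session can be obtained in the same request that fetches the CAPTCHA, so there is no need to get the session first and then fetch the CAPTCHA.
    When I saw the account and password hidden, I wondered whether I was answering a fellow alumnus.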

    <?php
    require_once 'OCR.php';
    $loginUrl = "http://210.32.33.91:8080/reader/redr_verify.php"; // login page
    $captchaUrl = "http://210.32.33.91:8080/reader/captcha.php"; // CAPTCHA page
    
    $cookie_file = __DIR__ . DIRECTORY_SEPARATOR . 'cookies' . DIRECTORY_SEPARATOR . date('YmdHis') . '.txt';
    
    // Fetch the CAPTCHA (and store the session cookie in the same request)
    $captchaString = get($captchaUrl, $cookie_file, true);
    $tempCaptchaFile = __DIR__ . DIRECTORY_SEPARATOR . 'captcha' . DIRECTORY_SEPARATOR . date('YmdHis') . '.gif';
    file_put_contents($tempCaptchaFile, $captchaString);
    /* Since you say the CAPTCHA is not the problem, I won't post the OCR class here */
    $ocr = new OCR($tempCaptchaFile);
    $captcha = $ocr->getCaptcha();
    
    /* Log in */
    $username = 'username'; // placeholder
    $passwd = 'password';   // placeholder
    $postArray = [
        'number' => $username,
        'passwd' => $passwd,
        'captcha' => $captcha,
        'select' => 'cert_no',
        'returnUrl' => ''
    ];
    $postData = http_build_query($postArray);
    echo post($loginUrl, $postData, $cookie_file);
    
    function get($url, $cookie_file, $isCookiesSave = false)
    {
        // Initialize
        $curl = curl_init($url);
        $header = array();
        $header[] = 'User-Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36';
        curl_setopt($curl, CURLOPT_HTTPHEADER, $header);
        // Do not include response headers in the output
        curl_setopt($curl, CURLOPT_HEADER, 0);
        if ($isCookiesSave) {
            curl_setopt($curl, CURLOPT_COOKIEJAR, $cookie_file); // store cookies
        } else {
            curl_setopt($curl, CURLOPT_COOKIEFILE, $cookie_file);
        }
        // Return the response as a string instead of printing it
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
        // Follow redirects
        curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
        $info = curl_exec($curl);
        curl_close($curl);
        return $info;
    }
    
    function post($url, $data, $cookie_file)
    {
        // Initialize
        $curl = curl_init($url);
        $header = array();
        $header[] = 'User-Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36';
        curl_setopt($curl, CURLOPT_HTTPHEADER, $header);
        // Do not include response headers in the output
        curl_setopt($curl, CURLOPT_HEADER, 0);
        // Return the response as a string instead of printing it
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($curl, CURLOPT_COOKIEFILE, $cookie_file);
        // Send the request as POST
        curl_setopt($curl, CURLOPT_POST, 1);
        // Request body
        curl_setopt($curl, CURLOPT_POSTFIELDS, $data);
        // Follow redirects
        curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
        $response = curl_exec($curl);
        curl_close($curl);
        return $response;
    }
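One more thing worth checking in the question's code: it reads the tesseract output file with `file_get_contents` and interpolates it straight into the POST string. Tesseract's text output typically ends with a newline (and in some versions a form feed), and OCR can also insert stray spaces, so the submitted value may not match the CAPTCHA even when the characters were read correctly. A minimal sketch of cleaning the text first, assuming the CAPTCHA contains only ASCII letters and digits; `normalizeCaptcha` is a hypothetical helper and the sample string is made up:

```php
<?php
// Hypothetical helper: keep only ASCII letters and digits from raw OCR text.
// Assumes the CAPTCHA is alphanumeric; adjust the pattern if it is not.
function normalizeCaptcha(string $ocrText): string
{
    return preg_replace('/[^A-Za-z0-9]/', '', $ocrText);
}

$raw = "4 7x9\n";                 // made-up sample, not real tesseract output
$code = normalizeCaptcha($raw);

// http_build_query also URL-encodes the value, as the demo above does.
echo http_build_query(['captcha' => $code]), "\n"; // prints captcha=47x9
```

Logging both the raw and the cleaned value next to the saved image makes it easy to see whether a failure comes from recognition or from the session.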

  • ringa_lee

    ringa_lee 2017-04-10 16:45:23

    Comments online say the recognition rate is low.
