
PHP: Capturing All Images Across an Entire Domain with Regular Expressions

WBOY, 2016-07-20 11:12:42

Code source: jUnion

Applicable platforms: Windows, Linux (Ubuntu), PHP 5.2.5+, Apache

Function: Crawls an entire site and captures (saves) its images. The current version does not use PHP's cURL extension; this will be improved later.
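The listing delegates the actual pattern matching to the Pivotal class and to the patterns in config/capture.preg.php, and neither is shown. As a rough idea of what "regular-expression capture" of images involves, here is a minimal sketch. The function name extractImageUrls, the `<img ... src="...">` pattern, and the naive resolution of relative URLs against the site root are all assumptions, not the author's code.

```php
<?php
/**
 * Hypothetical helper: collect image URLs from an HTML string with a regular
 * expression, keeping only the extensions listed in $acceptTypes.
 * ASSUMPTION: relative URLs are resolved naively against the site root.
 */
function extractImageUrls($html, $site, array $acceptTypes)
{
    // Capture the src attribute of every <img> tag, single- or double-quoted.
    preg_match_all('/<img\b[^>]*?\bsrc\s*=\s*["\']([^"\']+)["\']/i', $html, $matches);

    $urls = array();
    foreach ($matches[1] as $src) {
        // Take the extension from the path only, ignoring any query string.
        $path = parse_url($src, PHP_URL_PATH);
        $ext  = strtolower(pathinfo((string) $path, PATHINFO_EXTENSION));
        if (!in_array($ext, $acceptTypes, true)) {
            continue;
        }
        // Naive resolution: anything that is not absolute is put under the site root.
        if (!preg_match('#^https?://#i', $src)) {
            $src = rtrim($site, '/') . '/' . ltrim($src, '/');
        }
        $urls[] = $src;
    }
    // Drop duplicates while keeping the original order.
    return array_values(array_unique($urls));
}

$html = '<p><img src="/upload/a.jpg" alt=""><img class="x" src=\'http://www.bizhibar.com/b.PNG?v=2\'>'
      . '<img src="/js/c.svg"><img src="/upload/a.jpg"></p>';
print_r(extractImageUrls($html, 'http://www.bizhibar.com/', array('gif', 'bmp', 'png', 'ico', 'jpg', 'jpeg')));
```

A real crawler would resolve relative paths against the URL of the page they came from rather than the site root. It would also need to handle images referenced from CSS, which this pattern does not see.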

Configuration (files in the config directory):
domain_name: domain name (default: bizhibar.com)
request_site: website URL (default: http://www.bizhibar.com/)
request_url: page of the site where crawling starts (default: http://www.bizhibar.com/)
accept_type: image types to capture (default: gif, bmp, png, ico, jpg, jpeg)
save_path: directory where images are saved (default: savefiles/)
partition_name: name prefix of the image subdirectories (default: img_)
dir_file_limit: maximum number of files per subdirectory (default: 100)
serialize_img_size: after every N image URLs read, the list of captured image URLs is cached in cache/accompImg; those images are skipped the next time the crawl is resumed (default: 30)
serialize_url_size: the same as serialize_img_size, but for page links: after every N links read, the list of crawled URLs is cached in cache/overURL, and those pages are skipped the next time (default: 10)
preform_page_time: base delay, in seconds, between page requests; it doubles each time a page fails to parse (no default given)
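Capture's constructor loads config/capture.site.php with `require` and reads its settings as an array, so the file presumably returns one. Below is a sketch using the defaults listed above. Some details are assumptions:

- the exact spelling of the keys the listing never reads (accept_type, save_path, partition_name, dir_file_limit, serialize_img_size);
- the array form of accept_type;
- the value of preform_page_time.

```php
<?php
// config/capture.site.php (sketch). Capture::__construct() does
// `self::$_CapSite = require $_cfg['site'];`, so the file must return an array.
// ASSUMPTIONS: key spelling for keys the class never reads, accept_type as an
// array, and the preform_page_time value.
return array(
    'domain_name'        => 'bizhibar.com',
    'request_site'       => 'http://www.bizhibar.com/',
    'request_url'        => 'http://www.bizhibar.com/',
    'accept_type'        => array('gif', 'bmp', 'png', 'ico', 'jpg', 'jpeg'),
    'save_path'          => 'savefiles/',
    'partition_name'     => 'img_',
    'dir_file_limit'     => 100,
    'serialize_img_size' => 30,
    'serialize_url_size' => 10,
    // Base delay in seconds between page requests (read in Capture::roundTagA);
    // 1 is an assumed value, the article gives no default.
    'preform_page_time'  => 1,
);
```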

Note: Criticism and suggestions are welcome. If you run into new problems or see areas that need improvement, please send me feedback.

<?php
// Entry script: crawl the whole site with no execution time limit.
set_time_limit(0);
require dirname(__FILE__).DIRECTORY_SEPARATOR.'include'.DIRECTORY_SEPARATOR.'Capture.const.php';
require __Home__.'include'.__Os__.'Capture.class.php';

// Paths to the two configuration files and the two resume caches.
$_cfg = array(
	'site'      => __Home__.'config'.__Os__.'capture.site.php',  // site settings
	'preg'      => __Home__.'config'.__Os__.'capture.preg.php',  // regular expressions
	'accompImg' => __Home__.'cache'.__Os__.'accompImg',          // image URLs already captured
	'overURL'   => __Home__.'cache'.__Os__.'overURL'             // page URLs already crawled
);

$_parse = new Capture( $_cfg );
$_parse->parseQuestUrl();

?>
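Capture.const.php is not included in the article. Judging from how the entry script uses the two constants, `__Os__` is a directory separator and `__Home__` is the project root (the parent of include/) with a trailing separator. A hypothetical version of the file might look like the sketch below. The importPath() helper is also hypothetical: it repeats the string handling of Capture::import() without the `require_once`, to show which file each dotted name maps to.

PHP reserves names that begin with a double underscore for its own magic constants, so new code is better off with other names. They are kept here only to match the listing.

```php
<?php
// Hypothetical include/Capture.const.php (the original file is not shown).
// ASSUMPTION: __Os__ is the directory separator and __Home__ the project root,
// i.e. the parent of include/, with a trailing separator.
if (!defined('__Os__')) {
    define('__Os__', DIRECTORY_SEPARATOR);
}
if (!defined('__Home__')) {
    define('__Home__', dirname(dirname(__FILE__)) . __Os__);
}

// Hypothetical helper mirroring Capture::import(): 'file.OperateFile'
// maps to <__Home__>include/file/OperateFile.class.php.
function importPath($class)
{
    return __Home__ . 'include' . __Os__ . str_replace('.', __Os__, $class) . '.class.php';
}

echo __Home__ . 'config' . __Os__ . 'capture.site.php', PHP_EOL;
echo importPath('file.OperateFile'), PHP_EOL;
```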
<?php
// include/Capture.class.php
/**
 * The main class
 * @author pankai<530911044@qq.com>
 * @date 2013-08-10
 */
class Capture {
	private static $_Config = array();
	
	private static $_CapSite = NULL;     // settings returned by config/capture.site.php
	private static $_CapPreg = NULL;     // patterns returned by config/capture.preg.php
	
	private static $_overURL = array();  // page URLs already crawled
	
	private $_mark = FALSE;              // TRUE once the current page has been parsed
	private static $_markTime = 1;       // multiplier for the delay between pages
	/**
	 * initialize the main class: Capture
	 * @param $_cfg array
	 */
	public function __construct( &$_cfg ) {
		self::$_Config = &$_cfg;
		
		self::$_CapSite = require $_cfg['site'];
		self::$_CapPreg = require $_cfg['preg'];
		
		// Replace the _request_site placeholder in every pattern with the site URL.
		foreach( self::$_CapPreg as $_key => $_value ) {
			self::$_CapPreg[$_key] = str_replace( '_request_site', self::$_CapSite['request_site'], $_value );
		}
		
		// Resume support: reload the page URLs crawled by a previous run.
		self::import( 'file.OperateFile' );
		if( file_exists( $_cfg['overURL'] ) && filesize( $_cfg['overURL'] ) > 0 ) {
			$_contents = OperateFile::readText( $_cfg['overURL'], filesize( $_cfg['overURL'] ) );
			self::$_overURL = unserialize( $_contents );
		}
		
		// Resume support: reload the image URLs captured by a previous run.
		self::import('pivotal.Pivotal');
		if( file_exists( $_cfg['accompImg'] ) && filesize( $_cfg['accompImg'] ) > 0 ) {
			$_contents = OperateFile::readText( $_cfg['accompImg'], filesize( $_cfg['accompImg'] ) );
			Pivotal::$_accompImg = unserialize( $_contents );
		}
		
	}
	/**
	 * Load a class using Java-style package notation (cf. import com.jUnion.Capture):
	 * import('file.OperateFile') loads include/file/OperateFile.class.php
	 * @param $_class
	 */
	public static function import( $_class ) {
		require_once __Home__.'include'.__Os__.str_replace( '.', __Os__, $_class ).'.class.php';
	}
	
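	/**
	 * Parse one page with Pivotal and return the links found on it.
	 * @param $_source page source returned by Http::get()
	 */
	// Taken by value: callers pass the return value of Http::get() directly,
	// which is not a variable and so cannot be passed by reference.
	private function getCapInstance( $_source ) {
		$this->_mark = FALSE;
		
		$_Captal = new Pivotal( self::$_Config, $_source );
		$_tagA = $_Captal->parseUrl();
		
		$this->_mark = TRUE;
		
		return $_tagA;
	}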
	
	/**
	 * Walk the (possibly nested) list of links depth-first and crawl every
	 * on-site URL that has not been visited yet.
	 * @param $_tagArr
	 */
	private function roundTagA( &$_tagArr ) {
		if( $_tagArr == NULL ) {
			return;
		}
		foreach( $_tagArr as $_tag ) {
			if( is_array( $_tag ) ) {
				$this->roundTagA( $_tag );
				continue;
			}
			// Skip links outside the configured domain.
			if( stripos( $_tag, self::$_CapSite['domain_name'] ) === FALSE ) {
				continue;
			}
			// Skip pages that were already crawled.
			if( in_array( $_tag, self::$_overURL ) ) {
				continue;
			}
			self::$_overURL[] = $_tag;
			// Checkpoint the visited list every serialize_url_size URLs.
			if( count( self::$_overURL ) % self::$_CapSite['serialize_url_size'] == 0 ) {
				OperateFile::setText( self::$_Config['overURL'], serialize( self::$_overURL ) );
			}
			// Fetch and parse the page, pausing after each attempt; the pause
			// doubles every time the page could not be parsed.
			do {
				$_tagA = $this->getCapInstance( Http::get( $_tag ) );
				sleep( self::$_CapSite['preform_page_time'] * self::$_markTime );
				if( $this->_mark === TRUE ) {
					self::$_markTime = 1;  // success: back to the base delay
					break;
				}
				self::$_markTime *= 2;
			} while( true );
			// Recurse into the links found on this page.
			$this->roundTagA( $_tagA );
		}
	}
	/**
	 * Entry point: crawl the site starting from request_url.
	 */
	public function parseQuestUrl() {
		self::import('http.Http');
		$_round_Arr = $this->getCapInstance( Http::get( self::$_CapSite['request_url'] ) );
		$this->roundTagA( $_round_Arr ); 
	}
}

?>
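Capture::roundTagA() combines three ideas:

- a depth-first walk over nested link arrays;
- a domain filter based on stripos();
- a visited list serialized to disk every serialize_url_size URLs, so that a later run can resume.

The sketch below isolates that logic so it can run without the unshown Http, Pivotal and OperateFile classes. The class CrawlSketch is hypothetical. It looks up a page's links in an in-memory array instead of fetching the page, and it writes the cache with file_put_contents() instead of OperateFile::setText().

```php
<?php
/**
 * Hypothetical stand-in for the traversal in Capture::roundTagA().
 * Fetching is replaced by a lookup in $linkMap (page URL => links found).
 */
class CrawlSketch
{
    public $visited = array();
    private $linkMap;
    private $domain;
    private $cacheFile;
    private $saveEvery;

    public function __construct(array $linkMap, $domain, $cacheFile, $saveEvery)
    {
        $this->linkMap   = $linkMap;
        $this->domain    = $domain;
        $this->cacheFile = $cacheFile;
        $this->saveEvery = $saveEvery;
        // Resume: URLs saved by a previous run will be skipped.
        if (is_file($cacheFile)) {
            $data = file_get_contents($cacheFile);
            if ($data !== false && $data !== '') {
                $this->visited = unserialize($data);
            }
        }
    }

    public function walk(array $links)
    {
        foreach ($links as $link) {
            if (is_array($link)) {           // nested group of links
                $this->walk($link);
                continue;
            }
            if (stripos($link, $this->domain) === false) {
                continue;                    // off-site link
            }
            if (in_array($link, $this->visited, true)) {
                continue;                    // already crawled
            }
            $this->visited[] = $link;
            // Checkpoint after every $saveEvery URLs.
            if (count($this->visited) % $this->saveEvery == 0) {
                file_put_contents($this->cacheFile, serialize($this->visited));
            }
            // Stand-in for Http::get() + Pivotal::parseUrl().
            $next = isset($this->linkMap[$link]) ? $this->linkMap[$link] : array();
            $this->walk($next);
        }
    }
}

$site = array(
    'http://www.bizhibar.com/'  => array('http://www.bizhibar.com/a',
                                         array('http://www.bizhibar.com/b', 'http://other.example/x')),
    'http://www.bizhibar.com/a' => array('http://www.bizhibar.com/b', 'http://www.bizhibar.com/'),
    'http://www.bizhibar.com/b' => array(),
);
$cache   = tempnam(sys_get_temp_dir(), 'overURL');
$crawler = new CrawlSketch($site, 'bizhibar.com', $cache, 2);
$crawler->walk(array('http://www.bizhibar.com/'));
print_r($crawler->visited);
```

As in the original, URLs visited after the last full batch are not written. After a restart, up to serialize_url_size - 1 pages may therefore be crawled again.

Note also that stripos() is a substring test, so a URL such as http://bizhibar.com.example.net/ passes it. Comparing parse_url($link, PHP_URL_HOST) against the domain would be stricter.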

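The do/while loop in roundTagA() sleeps for preform_page_time times a multiplier after each attempt. It doubles the multiplier whenever the page could not be parsed, which is a simple form of exponential backoff.

A minimal standalone sketch of that pattern follows. Several pieces are assumptions added so it can be tested without actually waiting:

- the Backoff class itself;
- the $maxTries cap (the original loop retries indefinitely);
- the injectable sleep callback;
- the FlakyFetch and SleepLog helpers.

After a success the multiplier returns to 1, so the next page starts again at the base delay.

```php
<?php
/**
 * Hypothetical exponential-backoff helper modelled on the retry loop in
 * Capture::roundTagA(). $attempt returns a non-null value on success, null on failure.
 */
class Backoff
{
    private $multiplier = 1;
    private $baseDelay;
    private $sleeper;

    public function __construct($baseDelay, $sleeper = 'sleep')
    {
        $this->baseDelay = $baseDelay;
        $this->sleeper   = $sleeper;   // injectable so tests need not wait
    }

    public function run($attempt, $maxTries = 5)
    {
        for ($try = 1; $try <= $maxTries; $try++) {
            $result = call_user_func($attempt);
            // Pause after every attempt, as the original loop does.
            call_user_func($this->sleeper, $this->baseDelay * $this->multiplier);
            if ($result !== null) {
                $this->multiplier = 1;   // success: back to the base delay
                return $result;
            }
            $this->multiplier *= 2;      // failure: wait twice as long next time
        }
        return null;                     // gave up after $maxTries attempts
    }
}

class FlakyFetch   // hypothetical: fails twice, then returns a page
{
    public $calls = 0;
    public function get() { return ++$this->calls < 3 ? null : '<html></html>'; }
}

class SleepLog     // hypothetical: records delays instead of sleeping
{
    public $delays = array();
    public function sleep($seconds) { $this->delays[] = $seconds; }
}

$log   = new SleepLog();
$fetch = new FlakyFetch();
$retry = new Backoff(2, array($log, 'sleep'));
$page  = $retry->run(array($fetch, 'get'));
print_r($log->delays);
```

With a base delay of 2 seconds, the two failures and the final success produce pauses of 2, 4 and 8 seconds. The next page then starts again at 2 seconds.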
Source: http://www.bkjia.com/PHPjc/444554.html