PHP code for remotely grabbing website images and saving them

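Example: a PHP class that captures the images of a website and saves them locally, starting from the home page and recursing into its subpages.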

<?php
/**
 * A class for grabbing images from a website
 *
 * @package default
 * @author WuJunwei
 */
class download_image
{
    public $save_path; // Directory where captured images are saved
    // Size limit for captured images, in bytes: only images larger than this limit are saved
    public $img_size = 0;
    // Static array recording the page URLs already crawled, to avoid crawling them again
    public static $a_url_arr = array();

    /**
     * @param String $save_path Directory where captured images are saved
     * @param Int $img_size Size limit for captured images, in bytes
     */
    public function __construct($save_path, $img_size)
    {
        $this->save_path = $save_path;
        $this->img_size = $img_size;
    }

    /**
     * Recursively download the images of a page and of every page it links to
     *
     * @param String $capture_url URL of the page to capture images from
     */
    public function recursive_download_images($capture_url)
    {
        if (in_array($capture_url, self::$a_url_arr)) { // Already crawled: exit the function directly
            return;
        }
        self::$a_url_arr[] = $capture_url; // Record the URL in the static array
        $this->download_current_page_images($capture_url); // Download all images on the current page
        // @ suppresses the warning raised when the address cannot be read
        $content = @file_get_contents($capture_url);
        if ($content === false) {
            return;
        }
        // Match the part of each <a> tag's href attribute that precedes any '?'
        $a_pattern = "|<a[^>]+href=['\" ]?([^ '\"?]+)['\" >?]|U";
        preg_match_all($a_pattern, $content, $a_out, PREG_SET_ORDER);
        $tmp_arr = array(); // Hyperlinks found on the current page that still need crawling
        foreach ($a_out as $v) {
            /**
             * Skip empty links, '#', '/' and duplicates:
             * 1: A link equal to the URL of the current page would cause an infinite loop.
             * 2: '', '#' and '/' also point to this page and would loop as well.
             * 3: A link may appear several times in one page; without this check
             *    the same subpage would be downloaded repeatedly.
             */
            if ($v[1] && !in_array($v[1], self::$a_url_arr) && !in_array($v[1], $tmp_arr)
                && !in_array($v[1], array('#', '/', $capture_url))) {
                $tmp_arr[] = $v[1];
            }
        }
        // Note: absolute links to other sites are followed too; only the time limit bounds the crawl
        foreach ($tmp_arr as $v) {
            $this->recursive_download_images($this->build_url($capture_url, $v));
        }
    }

    /**
     * Turn a link found on a page into an absolute URL
     *
     * @param String $capture_url URL of the page the link was found on
     * @param String $link Absolute URL or relative address
     * @return String Absolute URL
     */
    private function build_url($capture_url, $link)
    {
        if (preg_match('#^https?://#i', $link)) { // Starts with http:// or https://: usable directly
            return $link;
        }
        // Otherwise it is a relative address: prepend the site root of the current page
        $slash_pos = strpos($capture_url, '/', 8);
        $domain_url = ($slash_pos === false) ? $capture_url . '/' : substr($capture_url, 0, $slash_pos + 1);
        return $domain_url . ltrim($link, '/');
    }

    /**
     * Download all images on the current page
     *
     * @param String $capture_url URL of the page to capture images from
     */
    public function download_current_page_images($capture_url)
    {
        $content = @file_get_contents($capture_url); // @ suppresses warnings
        if ($content === false) {
            return;
        }
        // Match the part of each <img> tag's src attribute that precedes any '?'
        $img_pattern = "|<img[^>]+src=['\" ]?([^ '\"?]+)['\" >?]|U";
        preg_match_all($img_pattern, $content, $img_out, PREG_SET_ORDER);
        $photo_num = count($img_out); // Number of images matched
        echo $capture_url . ' Total found ' . $photo_num . " pictures\n";
        foreach ($img_out as $v) {
            $this->save_one_img($capture_url, $v[1]);
        }
    }

    /**
     * Save a single image
     *
     * @param String $capture_url URL of the page the image was found on
     * @param String $img_url URL of the image to save
     */
    public function save_one_img($capture_url, $img_url)
    {
        $img_url = $this->build_url($capture_url, $img_url); // Absolute image address
        $pathinfo = pathinfo($img_url); // Get the image path information
        $pic_name = $pathinfo['basename']; // Get the file name of the image
        if (file_exists($this->save_path . $pic_name)) { // File exists: already captured, exit the function
            echo $img_url . " The image has already been captured!\n";
            return;
        }
        // Read the image content into a string; @ suppresses the warning for an unreadable address
        $img_data = @file_get_contents($img_url);
        if ($img_data === false) {
            echo $img_url . " Image reading failed!\n";
            return;
        }
        if (strlen($img_data) > $this->img_size) { // Save only images larger than the limit
            $img_size = file_put_contents($this->save_path . $pic_name, $img_data);
            if ($img_size) {
                echo $img_url . " Image saved successfully!\n";
            } else {
                echo $img_url . " Failed to save image!\n";
            }
        } else {
            echo $img_url . " Image skipped: not larger than the size limit.\n";
        }
    }
} // END

set_time_limit(120); // Set the maximum execution time of the script as needed
$download_img = new download_image('E:/images/', 0); // Instantiate the image downloader
$download_img->recursive_download_images('http://bbs.it-home.org/'); // Recursively capture images
//$download_img->download_current_page_images($_POST['capture_url']); // Capture only the images of the given page
?>
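
The class above resolves every relative address against the site root, so a link such as img/a.png found on a page in a subdirectory ends up pointing at the wrong place. A minimal sketch of a fuller resolver, built on PHP's parse_url(), might look like the following. The function name resolve_url() is hypothetical and not part of the original class, and the sketch assumes the page URL is a well-formed http or https address; it does not collapse ../ segments.

```php
<?php
/**
 * Resolve a link against the page it was found on.
 * resolve_url() is a hypothetical helper, not part of the class above.
 */
function resolve_url(string $page_url, string $link): string
{
    // Absolute links are returned unchanged.
    if (preg_match('#^https?://#i', $link)) {
        return $link;
    }

    $parts  = parse_url($page_url);
    $scheme = $parts['scheme'] ?? 'http';
    $root   = $scheme . '://' . ($parts['host'] ?? '');
    if (isset($parts['port'])) {
        $root .= ':' . $parts['port'];
    }

    // Protocol-relative link, e.g. //cdn.example.com/a.png
    if (strpos($link, '//') === 0) {
        return $scheme . ':' . $link;
    }

    // Root-relative link, e.g. /img/a.png
    if (strpos($link, '/') === 0) {
        return $root . $link;
    }

    // Page-relative link: resolve against the directory of the current page.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $root . $dir . $link;
}

echo resolve_url('http://example.com/news/index.html', 'img/a.png'), "\n";
// prints http://example.com/news/img/a.png
```

Under these assumptions, such a helper could stand in for the site-root concatenation in both the link and the image branches, since the two share the same logic.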
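
Matching img tags with a regular expression is fragile when attributes are unquoted, reordered, or split across lines. As an alternative to the regex approach in the listing, the sketch below uses PHP's DOMDocument to collect src values. The function name extract_image_urls() is hypothetical, and the sketch assumes the DOM extension is enabled. Unlike the regex, it keeps any query string in the address.

```php
<?php
/**
 * Return the distinct src values of all <img> tags in an HTML string.
 * extract_image_urls() is a hypothetical helper; it needs the DOM extension.
 */
function extract_image_urls(string $html): array
{
    if (trim($html) === '') {
        return array(); // DOMDocument::loadHTML() rejects empty input
    }

    // Collect parser warnings internally instead of printing them,
    // since real-world HTML is rarely well-formed.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = array();
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = trim($img->getAttribute('src'));
        // Skip tags without a src and addresses already seen.
        if ($src !== '' && !in_array($src, $urls, true)) {
            $urls[] = $src;
        }
    }
    return $urls;
}

$html = '<html><body><img src="a.png"><img alt="x" src="/b.jpg?v=2"><img src="a.png"></body></html>';
print_r(extract_image_urls($html));
```

The returned addresses may still be relative, so each one would need to be converted to an absolute URL before it is downloaded.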

