
PHP Regular Expression: How to match all image links in HTML

PHPz
2023-06-23 11:17:33

We often need to extract the image links from an HTML page, for example to reuse them elsewhere, download the images, or process them in batches. PHP regular expressions offer a quick way to match every image link in a page.

1. Analysis of image links in HTML

In HTML, image links usually appear in <img> tags, in the following form:

<img src="image.jpg" alt="Picture">

The src attribute specifies the link address of the image. Image links generally take one of the following forms:

  1. Root-relative link: /images/picture.jpg
  2. Absolute link: https://www.example.com/images/picture.jpg
  3. Link with parameters: https://www.example.com/images/picture.jpg?size=large
  4. Relative path link: ../images/picture.jpg

We need to write regular expressions to match these four link formats.

2. PHP regular expression matching image link

PHP provides several regular expression functions. preg_match() is the most commonly used and returns the first match of a pattern in a text, while preg_match_all() returns every match. The following regular expression matches all four image link formats above:

$pattern = '/<img.+?src=[\'"](.+?)[\'"].*?>/';

This regular expression consists of multiple parts. Let’s explain them one by one:

  1. <img.+?src= matches the start of the <img> tag up to the src attribute. Here .+? is a non-greedy match of any characters until src= is encountered.
  2. ['"] matches a quotation mark, either single or double. Inside a single-quoted PHP string, the single quote must be escaped as \'.
  3. (.+?) matches any characters until the next quotation mark. It is a capturing group, so the matched text is available through the $matches array in subsequent code.
  4. .*?> is a non-greedy match of any characters up to the closing > symbol.
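To see how the capturing group behaves, the pattern can be tried on a single tag. This is only a minimal sketch: the sample tag and the variable name $tag are made up for illustration. preg_match() stops at the first match, and index 1 of $matches holds the text captured by (.+?).

```php
<?php
// The pattern discussed above. The single quotes inside the character
// class are escaped because the PHP string itself is single-quoted.
$pattern = '/<img.+?src=[\'"](.+?)[\'"].*?>/';

// Hypothetical sample tag, used only for this illustration.
$tag = '<img class="photo" src="/images/picture.jpg" alt="Picture">';

if (preg_match($pattern, $tag, $matches)) {
    // $matches[0] is the whole matched tag,
    // $matches[1] is the text captured by the first group (the src value).
    echo $matches[1], PHP_EOL;
}
```

Note that .+? before src= needs at least one character, which in practice is the space after <img.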

Next, we use the preg_match_all() function to extract all image links in the HTML:

$html = file_get_contents('example.html'); // Read the HTML file
preg_match_all($pattern, $html, $matches); // Match every <img> tag
$imgUrls = $matches[1]; // Link addresses captured by the first group
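The same steps can be run without a file on disk. The sketch below assumes a small inline HTML string (hypothetical content standing in for example.html) that contains one image for each of the four link formats listed earlier.

```php
<?php
$pattern = '/<img.+?src=[\'"](.+?)[\'"].*?>/';

// Inline sample HTML, an assumption made so the example is self-contained.
$html = <<<'SAMPLE'
<p>Gallery</p>
<img src="/images/picture.jpg" alt="a">
<img src='https://www.example.com/images/picture.jpg' alt="b">
<img class="big" src="https://www.example.com/images/picture.jpg?size=large">
<img src="../images/picture.jpg" alt="d">
SAMPLE;

// preg_match_all() returns the number of full matches it found.
$count = preg_match_all($pattern, $html, $matches);

// $matches[1] collects the first capturing group of every match.
$imgUrls = $matches[1];

print_r($imgUrls);
```

Because . does not match a newline by default, each tag in this sample is matched on its own line. A tag whose attributes are spread over several lines would need the s modifier or a different pattern.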

This gives us an array $imgUrls containing all image links. To match only image links of a certain format, modify the regular expression accordingly. For example, to match only absolute links:

$pattern = '/<img.+?src=[\'"](https?:\/\/.+?)[\'"].*?>/';

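This regular expression adds the restriction that the link must begin with http:// or https://, so only absolute links using these two protocols are matched. Because / is also the pattern delimiter, the slashes in the protocol prefix must be escaped as \/.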
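A quick way to check the restriction is to run the pattern against a mix of relative and absolute links. The sample HTML and the name $absoluteUrls below are assumptions made for this sketch.

```php
<?php
// Absolute-links-only pattern; the slashes after the protocol are escaped
// because "/" is also used as the pattern delimiter.
$pattern = '/<img.+?src=[\'"](https?:\/\/.+?)[\'"].*?>/';

// Hypothetical sample with two relative and two absolute image links,
// one tag per line.
$html = <<<'SAMPLE'
<img src="/images/a.jpg">
<img src="https://www.example.com/images/b.jpg">
<img src="../images/c.jpg">
<img src="http://www.example.com/images/d.jpg?size=large">
SAMPLE;

preg_match_all($pattern, $html, $matches);

// Only the links that start with http:// or https:// are captured.
$absoluteUrls = $matches[1];

print_r($absoluteUrls);
```

In this sample the two relative links are skipped. The tags sit on separate lines on purpose: when several tags share one line, .+? can run past a non-matching tag into the next one, so the full match in $matches[0] may span more than one tag.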

Summary

Matching image links in HTML with PHP regular expressions is not complicated. Write a regular expression that fits the link format, then use the preg_match_all() function to extract all the links. If you often need to extract other content from HTML, a similar method works as well.
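As one illustration of a similar method, the same pattern shape can be pointed at a different tag and attribute. The sketch below is an assumption about how such an adaptation might look, not something taken from the article: it swaps img and src for a and href, and uses \s after <a so that other tags beginning with "a" are not picked up.

```php
<?php
// Same structure as the image pattern, adapted (hypothetically) to anchors.
$linkPattern = '/<a\s.*?href=[\'"](.+?)[\'"].*?>/';

// Made-up sample HTML for this illustration.
$html = <<<'SAMPLE'
<a href="/about.html">About</a>
<a class="ext" href='https://www.example.com/'>Example</a>
SAMPLE;

preg_match_all($linkPattern, $html, $matches);

// Link targets captured by the first group.
$linkUrls = $matches[1];

print_r($linkUrls);
```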

