PHP Regular Expressions: How to match all JavaScript code in HTML
In web development, JavaScript is used to implement much of a page's behavior. In an HTML page, JavaScript code is usually embedded in <script> tags, but it can also appear in the attributes of other HTML elements, such as onclick and onload.
To find all of the JavaScript snippets in an HTML page, we can match them with PHP regular expressions.
A regular expression is a set of syntax rules for describing string patterns. In PHP, a pattern is wrapped in delimiters, most commonly the / symbol, as in /pattern/, where pattern is the pattern to be matched.
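As a small illustration of delimiters, the sketch below runs preg_match with two different delimiter characters. The sample strings are made up for this example; the point is only that a / inside the pattern must be escaped when / is also the delimiter, which matters later when matching a closing </script> tag.

```php
<?php
// Sample input; this string is an assumption made up for the illustration.
$subject = 'Call alert(1) from <script> or onclick.';

// Forward slashes as delimiters; the trailing "i" modifier ignores case.
$found = preg_match('/SCRIPT/i', $subject);   // 1 on a match, 0 otherwise

// Any non-alphanumeric, non-backslash, non-whitespace character can be a delimiter.
// With "#", the "/" in a closing tag needs no escaping.
$closing = preg_match('#</script>#', '<script>x()</script>');

// With "/" as the delimiter, the same slash has to be escaped.
$escaped = preg_match('/<\/script>/', '<script>x()</script>');

var_dump($found, $closing, $escaped);
```

All three calls report a match here. Which delimiter to use is a matter of taste; the rest of this article keeps / and escapes the slash.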
Commonly used regular expression metacharacters include:
. : Matches any single character (by default, any character except a newline)
* : Matches zero or more instances of the preceding character
+ : Matches one or more instances of the preceding character
? : Matches zero or one instance of the preceding character
| : Matches either of the alternatives on its two sides
\d : Matches a digit
\w : Matches letters, digits, and underscores
\s : Matches whitespace characters such as spaces, tabs, and newlines
First, we can use the preg_match_all function to match all <script> tags in the HTML page:
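$html = file_get_contents('example.html'); // Read the contents of the HTML file
// Regular expression for <script> elements; the "/" in the closing tag is escaped
// because "/" is also the pattern delimiter
$pattern = '/<script\b([^>]*)>(.*?)<\/script>/is';
preg_match_all($pattern, $html, $matches); // Run the match
In the above code, we use the file_get_contents function to get the contents of an HTML file, and then use the regular expression /<script\b([^>]*)>(.*?)<\/script>/is to match all <script> tags in the HTML page. The i modifier makes the match case-insensitive, and the s modifier lets . match newlines so that multi-line scripts are captured. The matching results are stored in the $matches array: $matches[1] holds the attributes of each tag and $matches[2] holds the code between the tags.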
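To make the shape of $matches concrete, here is a minimal sketch that runs a <script> pattern against an inline sample string rather than example.html. The sample markup is an assumption of this sketch, and the pattern is written with the closing-tag slash escaped so that it compiles with / as the delimiter.

```php
<?php
// Inline sample markup (an assumption for this sketch; the article reads example.html).
$html = <<<'HTML'
<html>
<head><script type="text/javascript">var a = 1;</script></head>
<body>
<SCRIPT>
console.log(a);
</SCRIPT>
</body>
</html>
HTML;

// Group 1: the tag's attributes; group 2: the code between the tags.
// "i" ignores case; "s" lets "." match newlines so multi-line scripts are captured.
$pattern = '/<script\b([^>]*)>(.*?)<\/script>/is';

// preg_match_all returns the number of full matches it found.
$count = preg_match_all($pattern, $html, $matches);

// $matches[0] holds the full matches, $matches[1] the attributes, $matches[2] the code.
print_r($matches[1]);
print_r($matches[2]);
```

Both the lowercase and the uppercase tag are found, and the second fragment keeps its surrounding newlines, so trimming each fragment before further processing may be worthwhile.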
However, this only captures the JavaScript code contained in <script> tags, not the code held in the attributes of other elements.
First, we need to know the names of the attributes that contain JavaScript code. For example, the code for a click event is placed in the onclick attribute, and the code for other events is placed in onload, onsubmit, onchange, and similar attributes.
All of these event attributes have names that start with on, so a single pattern can cover them. Note that PHP's built-in get_meta_tags function cannot help here: it only reads the name and content attributes of <meta> tags, so it never sees event attributes on other elements. Instead, we apply the pattern directly to the HTML source:
$html = file_get_contents('example.html'); // Read the contents of the HTML file
// Regular expression for JavaScript code in on* attributes:
// group 1 is the opening quote, group 2 is the attribute value
$pattern = '/\son[a-z]+\s*=\s*(["\'])(.*?)\1/is';
$matches = array(); // Stores the matching results
if (preg_match_all($pattern, $html, $submatches)) { // Match the code in the attributes
    $matches = $submatches[2];
}
In the above code, we use the regular expression /\son[a-z]+\s*=\s*(["'])(.*?)\1/is to match every attribute whose name starts with on. The backreference \1 requires the closing quote to be the same character as the opening quote, so a value such as alert("hi") inside single quotes is captured whole. Finally, we store the values captured by preg_match_all in the $matches array. Because the pattern is applied to raw text, it will also match attribute-like text that happens to appear inside a script or a comment.
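As a self-contained illustration of matching on* attributes, the sketch below runs an attribute pattern directly against an inline sample string. The sample markup and the exact pattern (with a backreference for the closing quote) are assumptions of this sketch, not the only way to write it.

```php
<?php
// Inline sample markup (an assumption made up for this sketch).
$html = <<<'HTML'
<body onload="init()">
<a href="#" onclick='alert("hi"); return false;'>Click</a>
<form onsubmit = "return validate(this)"></form>
</body>
HTML;

// \s         : the attribute name must follow whitespace
// on[a-z]+   : an attribute name starting with "on"
// (["\'])    : group 1, the opening quote
// (.*?)      : group 2, the attribute value (the JavaScript code)
// \1         : the closing quote must equal the opening quote
$pattern = '/\son[a-z]+\s*=\s*(["\'])(.*?)\1/is';

$count = preg_match_all($pattern, $html, $matches);

// $matches[2] lists the code of each handler in document order.
print_r($matches[2]);
```

The href attribute is skipped because its name does not start with on, and the onclick value keeps its inner double quotes because the match only ends at the matching single quote.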
Through the above two steps, we have successfully found all the JavaScript code in the HTML page. Now, we need to combine these code snippets into a string that can be easily processed.
$html = file_get_contents('example.html'); // Read the contents of the HTML file
$script_pattern = '/<script\b([^>]*)>(.*?)<\/script>/is';
$attr_pattern = '/\son[a-z]+\s*=\s*(["\'])(.*?)\1/is';
preg_match_all($script_pattern, $html, $script_matches); // Match the code in <script> tags
preg_match_all($attr_pattern, $html, $submatches); // Match the code in on* attributes
$attr_matches = $submatches[2]; // Group 2 holds the attribute values
// Merge all of the code into a single string, one fragment per line
$all_script = implode("\n", array_merge($script_matches[2], $attr_matches));
In the above code, we use the implode function to merge all the JavaScript code snippets in $script_matches[2] and $attr_matches into a single string, with a newline character separating each code fragment, ready for further processing.
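The two steps can also be wrapped in one function. The sketch below is a hypothetical helper (the name collect_inline_js and the sample markup are assumptions, not part of the original code); it additionally trims each fragment and drops empty ones, since a tag such as <script src="app.js"></script> has no inline code.

```php
<?php
/**
 * Hypothetical helper: collects inline JavaScript from <script> bodies
 * and on* attributes and returns it as one newline-separated string.
 */
function collect_inline_js(string $html): string
{
    $script_pattern = '/<script\b[^>]*>(.*?)<\/script>/is';   // group 1: script body
    $attr_pattern   = '/\son[a-z]+\s*=\s*(["\'])(.*?)\1/is';  // group 2: attribute value

    preg_match_all($script_pattern, $html, $script_matches);
    preg_match_all($attr_pattern, $html, $attr_matches);

    // Script bodies first, then attribute values.
    $fragments = array_merge($script_matches[1], $attr_matches[2]);

    // Trim whitespace and drop fragments that are empty after trimming.
    $fragments = array_filter(array_map('trim', $fragments), 'strlen');

    return implode("\n", $fragments);
}

// Sample markup (an assumption made up for this sketch).
$sample = '<script src="app.js"></script>'
        . '<script>run();</script>'
        . '<button onclick="save()">Save</button>';

echo collect_inline_js($sample), "\n";
```

Listing script bodies before attribute values follows the order used in the article's code; it does not reflect the order in which the fragments appear in the page.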