How to match Chinese characters with UTF-8 regular expression, utf-8 regular expression

Home

Backend Development

PHP Tutorial

How to match Chinese characters with UTF-8 regular expression, utf-8 regular expression_PHP tutorial

WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWB

Jul 13, 2016 am 09:45 AM

utf-8regular expression

How does UTF-8 regular expression match Chinese characters? UTF-8 regular expression

determines whether the input content contains illegal characters. Please see the code below

$str = "编程";
// if(!preg_match("/^[\x{4e00}-\x{9fa5}A-Za-z0-9_]+$/u",$str)) 
//UTF-8汉字字母数字下划线正则表达式
if(!preg_match("/^[\x{4e00}-\x{9fa5}]+$/u",$str)) //UTF-8汉字字母数字下划线正则表达式
 { 
  echo "<font color=red>您输入的[".$str."]含有违法字符</font>"; 
 }
 else 
 {
  echo "<font color=green>您输入的[".$str."]完全合法,通过!</font>"; 

 }

-----------------------

UTF-8 matches:
In JavaScript, it is very simple to determine whether a string is Chinese.

For example:

Copy code The code is as follows:
var str = "php programming";
if (/^[u4e00-u9fa5] $/.test(str))

{ alert("The string is all in Chinese");

}
else{ alert("This string is not all in Chinese");
}

In PHP, x is used to represent hexadecimal data.

So, transform it into the following code:

Copy code The code is as follows:
$str = "php programming";
if (preg_match("/^[x4e00-x9fa5] $/",$str))
{
print("This string is all in Chinese");
}
else { print("This string is not all Chinese");
}

It seems that no error is reported and the judgment result is correct. However, if $str is replaced with the word "programming", the result still shows "the string is not all in Chinese". It seems that this judgment is not accurate enough.
Important:

Looked up "Proficient in Regular Expressions" and found that for [x4e00-x9fa5], I made an enhanced explanation myself
In PHP's regular expression, [x4e00-x9fa5] is actually the concept of characters and character groups. x{hex} expresses a hexadecimal number. It should be noted that hex can be 1-2 digits or 4 digits. Yes, but if it is 4 digits, curly brackets must be added,
At the same time, if it is a hex larger than x{FF}, it must be used together with the u modifier, otherwise an illegal error will occur

You can only find regular rules for matching full-width characters on the Internet: ^[x80-xff]*^/ , you can match Chinese without adding braces [u4e00-u9fa5], but PHP does not support it. However, since x represents ten Why is hexadecimal data different from the range x4e00-x9fa5 provided in js?

So I changed to the code below and found that it was really accurate:

Copy code The code is as follows:
$str = "php programming";
if (preg_match("/^[x{4e00}-x{9fa5}] $/u",$str))
{
print("This string is all in Chinese");
}
else { print("This string is not all Chinese");
}

I know the final correct expression for using regular expressions to match Chinese characters under UTF-8 encoding in PHP - /^[x{4e00}-x{9fa5}] $/u. I wrote the following test code with reference to the above article (copy Save the following code as a .php file)

<&#63;php $action = trim($_GET['action']);

 if($action == "sub") { 

 $str = $_POST['dir'];  

 //if(!preg_match("/^[".chr(0xa1)."-".chr(0xff)."A-Za-z0-9_]+$/",$str)) //GB2312汉字字母数字下划线正则表达式  

 if(!preg_match("/^[\x{4e00}-\x{9fa5}A-Za-z0-9_]+$/u",$str)) 

 //UTF-8汉字字母数字下划线正则表达式 

 {   

echo "<font color=red>您输入的[".$str."]含有违法字符</font>";  

 }  

else  

{  

 echo "<font color=green>您输入的[".$str."]完全合法,通过!</font>";  

 } } 

&#63;<form method="POST" action="&#63;action=sub"> 输入字符(数字,字母,汉字,下划线): 

 <input type="text" name="dir" value=""> 

 <input type="submit" value="提交"> 

</form>

GBK:

Copy code The code is as follows:
preg_match("/^[".chr(0xa1)."-".chr(0xff)."A-Za-z0-9_] $/",$str); //GB2312 Chinese character alphanumeric underline regular expression

The above content is all about how to match Chinese characters with UTF-8 regular expression in PHP. I hope you like it.

Statement

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

php怎么去除字符串中的所有大写字母Sep 26, 2022 pm 07:59 PM

两种去除方法：1、利用preg_replace()执行正则表达式搜索所有大写字母并将其替换为空字符即可，语法“preg_replace('/[A-Z]/','',$str)”。2、利用preg_filter()执行正则表达式搜索所有大写字母并将其替换为空字符即可，语法“preg_filter('/[A-Z]/','',$str)”。

php怎么替换nbsp空格符Apr 24, 2022 pm 02:55 PM

方法：1、用“str_replace(" ","其他字符",$str)”语句，可将nbsp符替换为其他字符；2、用“preg_replace("/(\s|\&nbsp\;||\xc2\xa0)/","其他字符",$str)”语句。

使用Go语言编写高性能的正则表达式匹配Jun 15, 2023 pm 10:56 PM

随着数据量的不断增大，正则表达式匹配成为了程序中常用的操作之一。而在Go语言中，由于其天然的并行ism，以及与底层系统的交互性和高效性，使得Go语言的正则表达式匹配极具优势。那么如何使用Go语言编写高性能的正则表达式匹配呢？一、了解正则表达式在使用正则表达式前，我们首先需要了解正则表达式，了解其基本语法规则以及常用的匹配字符，使我们能够在编写正则表达式时更加

php怎么利用正则排除字符串中的字符Dec 15, 2022 pm 03:30 PM

两种方法：1、用preg_replace()，可执行正则表达式的搜索和替换，只需将字符串中匹配的字符替换为空字符即可，语法“preg_replace(正则, "", $str)”。2、用preg_match_all()，可搜索字符串中所有和正则表达式匹配的结果，会将每次的匹配结果放在一个数组$array中，语法“preg_match_all(正则,$str,$array);”。

javascript怎么正则替换非汉字的字符Oct 13, 2022 pm 05:37 PM

在javascript中，可以使用replace()函数配合正则表达式“/[u4e00-u9fa5|,]+/ig”来查找字符串中的所有非汉字字符，并将其替换为其他指定值，语法“字符串对象.replace(/[u4e00-u9fa5|,]+/ig,'指定替换值')”。

php怎么只获取中文字符Apr 28, 2022 pm 08:15 PM

php中可用preg_match_all()配合正则表达式过滤字符串，只获取中文字符；语法“preg_match_all("/[\x{4e00}-\x{9fff}]+/u","$str",$arr);”，会将匹配字符存入“$arr”数组中。

Java语言正则表达式的使用方法Jun 10, 2023 am 08:13 AM

Java语言正则表达式的使用方法正则表达式是一种强大的文本处理工具，可以用来匹配和验证文本。在Java语言中，也可以使用正则表达式来实现字符串的匹配和处理。本文将介绍Java语言正则表达式的使用方法，涵盖正则表达式的基础知识，常用的正则表达式语法，以及在Java程序中使用正则表达式的方法。一、基础知识正则表达式是什么？正则表达式是一种文本模式，用来描述一组字

PHP开发：如何编写高效的正则表达式Jun 15, 2023 pm 09:04 PM

在PHP开发中，正则表达式是非常重要的工具，用于匹配、查找和替换文本中的特定字符串。然而，编写高效的正则表达式并不是一件易事，需要开发者具备一定的技巧和经验。下面是一些可以帮助您编写高效正则表达式的技巧：1.尽可能使用非贪婪匹配默认情况下，正则表达式是贪婪的，即它们将尽可能匹配更多的文本。在某些情况下，可能需要使用非贪婪匹配来避免这种情况。非贪婪匹配使用"

See all articles

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)

2 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

Repo: How To Revive Teammates

4 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

Hello Kitty Island Adventure: How To Get Giant Seeds

3 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

How Long Does It Take To Beat Split Fiction?

3 weeks agoByDDD

R.E.P.O. Save File Location: Where Is It & How to Protect It?

3 weeks agoByDDD

Hot Tools

DVWA

Damn Vulnerable Web App (DVWA) is a PHP/MySQL web application that is very vulnerable. Its main goals are to be an aid for security professionals to test their skills and tools in a legal environment, to help web developers better understand the process of securing web applications, and to help teachers/students teach/learn in a classroom environment Web application security. The goal of DVWA is to practice some of the most common web vulnerabilities through a simple and straightforward interface, with varying degrees of difficulty. Please note that this software

Atom editor mac version download

The most popular open source editor

Dreamweaver Mac version

Visual web development tools

PhpStorm Mac version

The latest (2018.2.1) professional PHP integrated development tool

SecLists

SecLists is the ultimate security tester's companion. It is a collection of various types of lists that are frequently used during security assessments, all in one place. SecLists helps make security testing more efficient and productive by conveniently providing all the lists a security tester might need. List types include usernames, passwords, URLs, fuzzing payloads, sensitive data patterns, web shells, and more. The tester can simply pull this repository onto a new test machine and he will have access to every type of list he needs.

Hot Topics

Where is the login entrance for gmail email?

7316

1625

1349

1261

1208