A function that truly intercepts strings according to the rules of utf8 encoding (utf8 version sub_str)

Home

Backend Development

PHP Tutorial

A function that truly intercepts strings according to the rules of utf8 encoding (utf8 version sub_str)_PHP tutorial

WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWB

Jul 21, 2016 pm 03:14 PM

strsubutf8codeeffectfunctionFunctioncopystringinterceptVersionofcodinglawconduct

Copy code The code is as follows:

 
/* 
* Function: The function is the same as substr, except that it will not cause garbled characters
* Parameters: 
* Return: 
*/ 
function utf8_substr( $str , $start , $length=null ){ 
// Intercept normally first. 
$res = substr ( $str , $start , $length ); 
$strlen = strlen( $str ); 
/* Then determine whether the first and last 6 bytes are complete (not incomplete) */ 
// If The parameter start is a positive number
if ( $start >= 0 ){ 
//Truncate about 6 bytes forward
$next_start = $start + $length; // Initial position
$ next_len = $next_start + 6 $next_segm = substr( $str , $next_start , $next_len ); 
// If the first byte is not The first byte of the complete character, and then intercept about 6 bytes 
$prev_start = $start - 6 > 0 ? $start - 6 : 0; 
$prev_segm = substr( $str , $prev_start , $start - $prev_start ); 
} 
// start is a negative number
else{ 
// intercept about 6 bytes forward
$next_start = $strlen + $start + $ length; // Initial position
$next_len = $next_start + 6 $next_segm = substr( $str , $next_start , $next_len ); 
// If the first byte is not the first byte of the complete character, then intercept about 6 bytes. 
$start = $strlen + $start; 
$prev_start = $start - 6 > 0 ? $start - 6 : 0; 
$prev_segm = substr( $str , $prev_start , $start - $prev_start ); 
} 
// Determine whether the first 6 bytes comply with utf8 rules
if ( preg_match( '@^([x80-xBF]{0,5})[xC0-xFD]?@' , $next_segm , $bytes ) ){ 
if ( !empty( $bytes[1] ) ){ 
$bytes = $bytes[1]; 
$res .= $bytes; 
} 
} 
// Determine whether the last 6 bytes comply with utf8 rules
 $ord0 = ord( $res[0] ); 
if ( 128 = $ord0 ){ 
// Intercept from the back and add it in front of res. 
if ( preg_match( '@[xC0-xFD][x80-xBF]{0,5}$@' , $prev_segm , $bytes ) ){ 
if ( !empty( $bytes[0] ) ){ 
$bytes = $bytes[0]; 
$res = $bytes . $res; 
} 
} 
} 
return $res; 
} 
 

Test data::

Copy code The code is as follows:

 
$ str = 'dfjdjf test 13f test 65&2 data fdj (1 on mfe&...on'; 
var_dump( utf8_substr( $str , 22 , 12 ) ); echo ' 
 '; 
var_dump( utf8_substr( $str , 22 , -6 ) ); echo ' 
 '; 
var_dump( utf8_substr( $str , 9 , 12 ) ); echo ' 
 '; 
var_dump( utf8_substr( $str , 19 , 12 ) ); echo ' 
 '; 
var_dump( utf8_substr( $str , 28 , -6 ) ); echo ' 
 ' ; 

Display results:: (No garbled interception, everyone is welcome to test and submit bugs)
string(12) "According toｆｽdｊ"
string(26) "According toｆｽdｊ（1 mfe&…"
string(13) "13f try 65&2 number"
string(12) "Dataｆｄ"
string(20) "dｊ（1 is mfe&…"

Statement

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

如何在技嘉主板上设置键盘启动功能 (技嘉主板启用键盘开机方式)Dec 31, 2023 pm 05:15 PM

技嘉的主板怎么设置键盘开机首先，要支持键盘开机，一定是PS2键盘！！设置步骤如下：第一步：开机按Del或者F2进入bios，到bios的Advanced(高级)模式普通主板默认进入主板的EZ（简易）模式，需要按F7切换到高级模式，ROG系列主板默认进入bios的高级模式（我们用简体中文来示范）第二步：选择到——【高级】——【高级电源管理（APM）】第三步：找到选项【由PS2键盘唤醒】第四步：这个选项默认是Disabled（关闭）的，下拉之后可以看到三种不同的设置选择，分别是按【空格键】开机、按组

11个常见的分类特征的编码技术Apr 12, 2023 pm 12:16 PM

机器学习算法只接受数值输入，所以如果我们遇到分类特征的时候都会对分类特征进行编码，本文总结了常见的11个分类变量编码方法。1、ONE HOT ENCODING最流行且常用的编码方法是One Hot Enoding。一个具有n个观测值和d个不同值的单一变量被转换成具有n个观测值的d个二元变量，每个二元变量使用一位（0，1）进行标识。例如：编码后最简单的实现是使用pandas的' get_dummiesnew_df=pd.get_dummies(columns=[‘Sex’], data=df)2、

utf8编码汉字占多少字节Feb 21, 2023 am 11:40 AM

utf8编码汉字占3个字节。在UTF-8编码中，一个中文等于三个字节，一个中文标点占三个字节；而在Unicode编码中，一个中文（含繁体）等于两个字节。UTF-8使用1~4字节为每个字符编码，一个US-ASCIl字符只需1字节编码，带有变音符号的拉丁文、希腊文、西里尔字母、亚美尼亚语、希伯来文、阿拉伯文、叙利亚文等字母则需要2字节编码。

知识图谱：大模型的理想搭档Jan 29, 2024 am 09:21 AM

大型语言模型（LLM）具有生成流畅和连贯文本的能力，为人工智能的对话、创造性写作等领域带来了新的前景。然而，LLM也存在一些关键局限。首先，它们的知识仅限于从训练数据中识别出的模式，缺乏对世界的真正理解。其次，推理能力有限，不能进行逻辑推理或从多个数据源融合事实。面对更复杂、更开放的问题时，LLM的回答可能变得荒谬或矛盾，被称为“幻觉”。因此，尽管LLM在某些方面非常有用，但在处理复杂问题和真实世界情境时，仍存在一定的局限性。为了弥补这些差距，近年来出现了检索增强生成（RAG）系统，其核心思想是

常见的几种编码方式Oct 24, 2023 am 10:09 AM

常见的编码方式有ASCII编码、Unicode编码、UTF-8编码、UTF-16编码、GBK编码等。详细介绍：1、ASCII编码是最早的字符编码标准，使用7位二进制数表示128个字符，包括英文字母、数字、标点符号以及控制字符等；2、Unicode编码是一种用于表示世界上所有字符的标准编码方式，它为每个字符分配了一个唯一的数字码点；3、UTF-8编码等等。

如何解决php数据库查询结果编码的问题Mar 21, 2023 am 11:49 AM

PHP是一种流行的Web编程语言，可以用于编写动态网页和应用程序。在实际应用中，PHP经常需要与数据库进行交互，进行数据的查询和处理。然而，在使用PHP从数据库中获取结果时，可能会遇到编码的问题，这通常会导致出现乱码。那么，如何解决php数据库查询结果编码的问题呢？

PHP编码小技巧：如何生成带有防伪验证功能的二维码？Aug 17, 2023 pm 02:42 PM

PHP编码小技巧：如何生成带有防伪验证功能的二维码？随着电子商务和互联网的发展，二维码越来越被广泛应用于各行各业。而在使用二维码的过程中，为了确保产品的安全性和防止伪造，为二维码添加防伪验证功能是十分重要的一环。本文将介绍如何使用PHP生成带有防伪验证功能的二维码，并附上相应代码示例。在开始之前，我们需要准备以下几个必要的工具和库：PHPQRCode：PHP

一文搞懂如何基于 GenAI 提升编码效能Apr 01, 2024 pm 06:49 PM

Hellofolks，我是Luga，今天我们来聊一下人工智能(AI)生态领域相关的技术-GenAI。面对日新月异的技术创新以及差异化的业务场景挑战，传统的编码方式已经开始出现水土不服，难以完全应对日益增长的诉求。与此同时，新兴的通用GenAI（人工智能技术）具有极具潜力的能力来满足这一需求。GenAI作为人工智能技术的代表，以其强大的潜力和能力已经开始在各行各业得到广泛应用。它可以自动学习和适应不同场景下的编码需求，极大地提高了编码效率和质量。通过深度学习和模型优化，GenAI能够准确地理解不同

See all articles