Home >php教程 >php手册 >php字符串处理之全角半角转换,php字符串全角半角

php字符串处理之全角半角转换,php字符串全角半角

WBOY
WBOYOriginal
2016-06-13 08:45:401069browse

php字符串处理之全角半角转换,php字符串全角半角

半角全角的处理是字符串处理的常见问题,本文尝试为大家提供一个思路。

一、概念

全角字符unicode编码从65281~65374 (十六进制 0xFF01 ~ 0xFF5E)
半角字符unicode编码从33~126 (十六进制 0x21~ 0x7E)
空格比较特殊,全角为 12288(0x3000),半角为 32 (0x20)
而且除空格外,全角/半角按unicode编码排序在顺序上是对应的
所以可以直接通过用+-法来处理非空格数据,对空格单独处理

二、实现思路

1. 找到目标unicode的字符,可以使用正则表达式解决

2. 修改unicode编码

三、实现

1. 首先是两个unicode与字符的转换函数:

<span> 1</span>     <span>/*</span><span>*
</span><span> 2</span> <span>     * 将unicode转换成字符
</span><span> 3</span> <span>     * @param int $unicode
</span><span> 4</span> <span>     * @return string UTF-8字符
</span><span> 5</span> <span>     *</span><span>*/</span>
<span> 6</span>     <span>function</span> unicode2Char(<span>$unicode</span><span>){
</span><span> 7</span>         <span>if</span>(<span>$unicode</span> < 128)     <span>return</span> <span>chr</span>(<span>$unicode</span><span>);
</span><span> 8</span>         <span>if</span>(<span>$unicode</span> < 2048)    <span>return</span> <span>chr</span>((<span>$unicode</span> >> 6) + 192) .
<span> 9</span>                                       <span>chr</span>((<span>$unicode</span> & 63) + 128<span>);
</span><span>10</span>         <span>if</span>(<span>$unicode</span> < 65536)   <span>return</span> <span>chr</span>((<span>$unicode</span> >> 12) + 224) .
<span>11</span>                                       <span>chr</span>(((<span>$unicode</span> >> 6) & 63) + 128) .
<span>12</span>                                       <span>chr</span>((<span>$unicode</span> & 63) + 128<span>);
</span><span>13</span>         <span>if</span>(<span>$unicode</span> < 2097152) <span>return</span> <span>chr</span>((<span>$unicode</span> >> 18) + 240) .
<span>14</span>                                       <span>chr</span>(((<span>$unicode</span> >> 12) & 63) + 128) .
<span>15</span>                                       <span>chr</span>(((<span>$unicode</span> >> 6) & 63) + 128) .
<span>16</span>                                       <span>chr</span>((<span>$unicode</span> & 63) + 128<span>);
</span><span>17</span>         <span>return</span> <span>false</span><span>;
</span><span>18</span> <span>    }
</span><span>19</span>  
<span>20</span>     <span>/*</span><span>*
</span><span>21</span> <span>     * 将字符转换成unicode
</span><span>22</span> <span>     * @param string $char 必须是UTF-8字符
</span><span>23</span> <span>     * @return int
</span><span>24</span> <span>     *</span><span>*/</span>
<span>25</span>     <span>function</span> char2Unicode(<span>$char</span><span>){
</span><span>26</span>         <span>switch</span> (<span>strlen</span>(<span>$char</span><span>)){
</span><span>27</span>             <span>case</span> 1 : <span>return</span> <span>ord</span>(<span>$char</span><span>);
</span><span>28</span>             <span>case</span> 2 : <span>return</span> (<span>ord</span>(<span>$char</span>{1}) & 63) |
<span>29</span>                             ((<span>ord</span>(<span>$char</span>{0}) & 31) << 6<span>);
</span><span>30</span>             <span>case</span> 3 : <span>return</span> (<span>ord</span>(<span>$char</span>{2}) & 63) |
<span>31</span>                             ((<span>ord</span>(<span>$char</span>{1}) & 63) << 6) |
<span>32</span>                             ((<span>ord</span>(<span>$char</span>{0}) & 15) << 12<span>);
</span><span>33</span>             <span>case</span> 4 : <span>return</span> (<span>ord</span>(<span>$char</span>{3}) & 63) |
<span>34</span>                             ((<span>ord</span>(<span>$char</span>{2}) & 63) << 6) |
<span>35</span>                             ((<span>ord</span>(<span>$char</span>{1}) & 63) << 12) |
<span>36</span>                             ((<span>ord</span>(<span>$char</span>{0}) & 7)  << 18<span>);
</span><span>37</span>             <span>default</span> :
<span>38</span>                 <span>trigger_error</span>('Character is not UTF-8!', <span>E_USER_WARNING</span><span>);
</span><span>39</span>                 <span>return</span> <span>false</span><span>;
</span><span>40</span> <span>        }
</span><span>41</span>     }

  2. 全角转半角

<span> 1</span>     <span>/*</span><span>*
</span><span> 2</span> <span>     * 全角转半角
</span><span> 3</span> <span>     * @param string $str
</span><span> 4</span> <span>     * @return string
</span><span> 5</span> <span>     *</span><span>*/</span>
<span> 6</span>     <span>function</span> sbc2Dbc(<span>$str</span><span>){
</span><span> 7</span>         <span>return</span> <span>preg_replace</span><span>(
</span><span> 8</span>             <span>//</span><span> 全角字符 </span>
<span> 9</span>             '/[\x{3000}\x{ff01}-\x{ff5f}]/ue',
<span>10</span>             <span>//</span><span> 编码转换
</span><span>11</span> <span>            // 0x3000是空格,特殊处理,其他全角字符编码-0xfee0即可以转为半角</span>
<span>12</span>             '($unicode=char2Unicode(\'\0\')) == 0x3000 ? " " : (($code=$unicode-0xfee0) > 256 ? unicode2Char($code) : chr($code))',
<span>13</span>             <span>$str</span>
<span>14</span> <span>        );
</span><span>15</span>     }

3. 半角转全角

<span> 1</span>     <span>/*</span><span>*
</span><span> 2</span> <span>     * 半角转全角
</span><span> 3</span> <span>     * @param string $str
</span><span> 4</span> <span>     * @return string
</span><span> 5</span> <span>     *</span><span>*/</span>
<span> 6</span>     <span>function</span> dbc2Sbc(<span>$str</span><span>){</span>
<span> 7</span>         <span>return</span> <span>preg_replace</span><span>(
</span><span> 8</span>             <span>//</span><span> 半角字符 </span>
<span> 9</span>             '/[\x{0020}\x{0020}-\x{7e}]/ue',  
<span>10</span>             <span>//</span><span> 编码转换
</span><span>11</span> <span>            // 0x0020是空格,特殊处理,其他半角字符编码+0xfee0即可以转为全角</span>
<span>12</span>             '($unicode=char2Unicode(\'\0\')) == 0x0020 ? unicode2Char(0x3000) : (($code=$unicode+0xfee0) > 256 ? unicode2Char($code) : chr($code))',
<span>13</span>             <span>$str</span>
<span>14</span> <span>        );
</span><span>15</span>     }

四、测试

 示例代码:

<span>1</span> <span>$a</span> = 'abc12 345'<span>;
</span><span>2</span> <span>$sbc</span> = dbc2Sbc(<span>$a</span><span>);
</span><span>3</span> <span>$dbc</span> = sbc2Dbc(<span>$sbc</span><span>);
</span><span>4</span> 
<span>5</span> <span>var_dump</span>(<span>$a</span>, <span>$sbc</span>, <span>$dbc</span>);

结果:

<span>1</span> <span>string</span>(9) "abc12 345"
<span>2</span> <span>string</span>(27) "abc12 345"
<span>3</span> <span>string</span>(9) "abc12 345"

 

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn