
PHP string handling: converting between full-width and half-width characters

Author: WBOY (original post), 2016-06-13 08:45:40


Handling half-width and full-width characters is a common string-processing problem. This article offers one way to approach it.

I. Concepts

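Full-width characters occupy Unicode code points 65281–65374 (hex 0xFF01–0xFF5E).
Half-width characters occupy Unicode code points 33–126 (hex 0x21–0x7E).
The space is a special case: the full-width space is 12288 (0x3000) and the half-width space is 32 (0x20).
Apart from the space, full-width and half-width characters appear in the same order in Unicode, so every pair differs by the same offset: 65281 - 33 = 65248 (0xFEE0).
Non-space characters can therefore be converted by simply adding or subtracting this offset, while the space is handled separately.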
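A quick numeric check of these ranges in PHP. This is only a sketch; the constant name FULLWIDTH_OFFSET is illustrative and does not appear in the original code.

```php
<?php
// Hypothetical constant: the distance between the two parallel ranges
const FULLWIDTH_OFFSET = 0xFF01 - 0x21; // 65248, i.e. 0xFEE0

// The last characters of both ranges are the same distance apart
var_dump(0xFF5E - 0x7E === FULLWIDTH_OFFSET);    // bool(true)

// 'A' is U+0041; adding the offset gives U+FF21, the full-width 'Ａ'
printf("U+%04X\n", ord('A') + FULLWIDTH_OFFSET); // U+FF21

// The space does not follow the pattern: 0x20 + 0xFEE0 is U+FF00, not U+3000
printf("U+%04X\n", 0x20 + FULLWIDTH_OFFSET);     // U+FF00
```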

II. Approach

1. Find the characters to convert; a regular expression can do this.

2. Change each matched character's Unicode code point.
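The two steps can be illustrated in isolation. The sketch below assumes PHP 7+ (for the "\u{...}" string escapes) and a UTF-8 subject string; the sample text is made up for illustration.

```php
<?php
// Step 1: a /u regex with \x{...} escapes finds the full-width characters
$text = "\u{FF30}\u{FF28}\u{FF30}\u{3000}7"; // "ＰＨＰ　7"
preg_match_all('/[\x{3000}\x{FF01}-\x{FF5E}]/u', $text, $m);
var_dump(count($m[0])); // int(4): Ｐ, Ｈ, Ｐ and the full-width space; '7' is skipped

// Step 2: each match gets a new code point, e.g. 'Ｐ' U+FF30 -> U+0050 'P'
printf("U+%04X -> U+%04X\n", 0xFF30, 0xFF30 - 0xFEE0); // U+FF30 -> U+0050
```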

III. Implementation

1. First, two helper functions that convert between Unicode code points and UTF-8 characters:

```php
/**
 * Convert a Unicode code point to a UTF-8 character
 * @param int $unicode
 * @return string|false UTF-8 character, or false if the code point is too large
 */
function unicode2Char($unicode){
    // 1 byte: 0xxxxxxx
    if($unicode < 128)     return chr($unicode);
    // 2 bytes: 110xxxxx 10xxxxxx
    if($unicode < 2048)    return chr(($unicode >> 6) + 192) .
                                  chr(($unicode & 63) + 128);
    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    if($unicode < 65536)   return chr(($unicode >> 12) + 224) .
                                  chr((($unicode >> 6) & 63) + 128) .
                                  chr(($unicode & 63) + 128);
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    if($unicode < 2097152) return chr(($unicode >> 18) + 240) .
                                  chr((($unicode >> 12) & 63) + 128) .
                                  chr((($unicode >> 6) & 63) + 128) .
                                  chr(($unicode & 63) + 128);
    return false;
}

/**
 * Convert a UTF-8 character to its Unicode code point
 * @param string $char must be a single UTF-8 character
 * @return int|false
 */
function char2Unicode($char){
    // String offsets use $char[n]; the old $char{n} syntax was removed in PHP 8.0
    switch (strlen($char)){
        case 1 : return ord($char);
        case 2 : return (ord($char[1]) & 63) |
                        ((ord($char[0]) & 31) << 6);
        case 3 : return (ord($char[2]) & 63) |
                        ((ord($char[1]) & 63) << 6) |
                        ((ord($char[0]) & 15) << 12);
        case 4 : return (ord($char[3]) & 63) |
                        ((ord($char[2]) & 63) << 6) |
                        ((ord($char[1]) & 63) << 12) |
                        ((ord($char[0]) & 7)  << 18);
        default :
            trigger_error('Character is not UTF-8!', E_USER_WARNING);
            return false;
    }
}
```
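To see the bit layout at work, here is the 3-byte branch of unicode2Char() and the matching case of char2Unicode() traced by hand for U+FF21 (full-width 'Ａ'). It is a standalone sketch that inlines the same shifts and masks rather than calling the functions, and it assumes PHP 7+ for the "\u{FF21}" comparison.

```php
<?php
// Encode U+FF21 with the 3-byte pattern 1110xxxx 10xxxxxx 10xxxxxx
$cp = 0xFF21;
$bytes = chr(($cp >> 12) + 224)          // lead byte: top 4 bits
       . chr((($cp >> 6) & 63) + 128)    // continuation: middle 6 bits
       . chr(($cp & 63) + 128);          // continuation: low 6 bits
echo strtoupper(bin2hex($bytes)), "\n";  // EFBCA1
var_dump($bytes === "\u{FF21}");         // bool(true)

// Decoding reverses it: mask off the prefix bits and shift back into place
$decoded = (ord($bytes[2]) & 63)
         | ((ord($bytes[1]) & 63) << 6)
         | ((ord($bytes[0]) & 15) << 12);
printf("U+%04X\n", $decoded);            // U+FF21
```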

2. Full-width to half-width

```php
/**
 * Full-width to half-width
 * @param string $str UTF-8 string
 * @return string
 */
function sbc2Dbc($str){
    // A callback is used because the /e modifier no longer exists in PHP 7+
    return preg_replace_callback(
        // Full-width characters: U+3000 and U+FF01..U+FF5E
        '/[\x{3000}\x{ff01}-\x{ff5e}]/u',
        function ($match) {
            // Code conversion:
            // 0x3000 is the space and is handled separately; any other
            // full-width character minus 0xfee0 is its half-width form
            $unicode = char2Unicode($match[0]);
            return $unicode == 0x3000 ? ' ' : chr($unicode - 0xfee0);
        },
        $str
    );
}
```
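For comparison, here is a different technique from the article's regex callback: a sketch that precomputes a lookup table for all 95 full-width characters and applies it with PHP's built-in strtr(). The helper names utf8_3byte() and fullwidthToHalfwidthTable() are hypothetical, and the code assumes PHP 7+ (scalar type declarations and "\u{...}" escapes).

```php
<?php
// Hypothetical helper: encode a code point in U+0800..U+FFFF as 3 UTF-8 bytes
function utf8_3byte(int $cp): string {
    return chr(($cp >> 12) + 224) . chr((($cp >> 6) & 63) + 128) . chr(($cp & 63) + 128);
}

// Hypothetical helper: map every full-width character to its half-width form
function fullwidthToHalfwidthTable(): array {
    $map = [utf8_3byte(0x3000) => ' '];            // full-width space -> ASCII space
    for ($cp = 0xFF01; $cp <= 0xFF5E; $cp++) {
        $map[utf8_3byte($cp)] = chr($cp - 0xFEE0); // constant offset 0xFEE0
    }
    return $map;
}

$table = fullwidthToHalfwidthTable();
$half = strtr("\u{FF41}\u{FF42}\u{FF43}\u{FF11}\u{FF12}\u{3000}\u{FF13}\u{FF14}\u{FF15}", $table);
var_dump($half); // string(9) "abc12 345"
```

Because UTF-8 lead bytes never appear as continuation bytes, a 3-byte key can only match at a character boundary, so strtr() should be safe on UTF-8 input here. The table is built once, which may pay off when many strings are converted; the regex version needs no table at all.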

3. Half-width to full-width

```php
/**
 * Half-width to full-width
 * @param string $str UTF-8 string
 * @return string
 */
function dbc2Sbc($str){
    return preg_replace_callback(
        // Half-width characters: U+0020 (space) and U+0021..U+007E
        '/[\x{0020}-\x{007e}]/u',
        function ($match) {
            // Code conversion:
            // 0x0020 is the space and is handled separately; any other
            // half-width character plus 0xfee0 is its full-width form
            $unicode = char2Unicode($match[0]);
            return $unicode == 0x0020 ? unicode2Char(0x3000) : unicode2Char($unicode + 0xfee0);
        },
        $str
    );
}
```
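The reverse direction can also be sketched without regular expressions. The function halfwidthToFullwidth() below is hypothetical (not part of the original code) and assumes PHP 7+. It walks the string byte by byte: printable ASCII bytes gain the 0xFEE0 offset and are encoded as three UTF-8 bytes, and every other byte is copied unchanged, which leaves existing multi-byte UTF-8 characters intact because none of their bytes falls in 0x20–0x7E.

```php
<?php
// Hypothetical regex-free half-width -> full-width conversion
function halfwidthToFullwidth(string $str): string {
    $out = '';
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($str[$i]);
        if ($b === 0x20) {
            $cp = 0x3000;              // the space is the special case
        } elseif ($b >= 0x21 && $b <= 0x7E) {
            $cp = $b + 0xFEE0;         // constant offset
        } else {
            $out .= $str[$i];          // copy all other bytes unchanged
            continue;
        }
        // U+3000 and U+FF01..U+FF5E all need exactly 3 UTF-8 bytes
        $out .= chr(($cp >> 12) + 224) . chr((($cp >> 6) & 63) + 128) . chr(($cp & 63) + 128);
    }
    return $out;
}

var_dump(halfwidthToFullwidth('abc12 345')); // string(27) "ａｂｃ１２　３４５"
```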

IV. Testing

Sample code:

```php
$a = 'abc12 345';
$sbc = dbc2Sbc($a);
$dbc = sbc2Dbc($sbc);

var_dump($a, $sbc, $dbc);
```

Output:

```
string(9) "abc12 345"
string(27) "ａｂｃ１２　３４５"
string(9) "abc12 345"
```
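var_dump() reports string length in bytes, which is why the full-width line shows 27: each of the nine full-width characters (including U+3000) takes three bytes in UTF-8. A small check, assuming PHP 7+ for the "\u{...}" escapes:

```php
<?php
// The full-width string from the output above
$sbc = "\u{FF41}\u{FF42}\u{FF43}\u{FF11}\u{FF12}\u{3000}\u{FF13}\u{FF14}\u{FF15}";
var_dump(strlen($sbc));                 // int(27): strlen() counts bytes
var_dump(preg_match_all('/./u', $sbc)); // int(9): one match per UTF-8 character
```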

 
