>  기사  >  백엔드 개발  >  php字符串处理之全角半角变换

php字符串处理之全角半角变换

WBOY
WBOY원래의
2016-06-13 12:28:441441검색

php字符串处理之全角半角转换

半角全角的处理是字符串处理的常见问题,本文尝试为大家提供一个思路。

一、概念

全角字符unicode编码从65281~65374 (十六进制 0xFF01 ~ 0xFF5E)
半角字符unicode编码从33~126 (十六进制 0x21~ 0x7E)
空格比较特殊,全角为 12288(0x3000),半角为 32 (0x20)
而且除空格外,全角/半角按unicode编码排序在顺序上是对应的
所以可以直接通过用+-法来处理非空格数据,对空格单独处理

二、实现思路

1. 找到目标unicode的字符,可以使用正则表达式解决

2. 修改unicode编码

三、实现

1. 首先是两个unicode与字符的转换函数:

<span style="color: #008080;"> 1</span>     <span style="color: #008000;">/*</span><span style="color: #008000;">*</span><span style="color: #008080;"> 2</span> <span style="color: #008000;">     * 将unicode转换成字符</span><span style="color: #008080;"> 3</span> <span style="color: #008000;">     * @param int $unicode</span><span style="color: #008080;"> 4</span> <span style="color: #008000;">     * @return string UTF-8字符</span><span style="color: #008080;"> 5</span> <span style="color: #008000;">     *</span><span style="color: #008000;">*/</span><span style="color: #008080;"> 6</span>     <span style="color: #0000ff;">function</span> unicode2Char(<span style="color: #800080;">$unicode</span><span style="color: #000000;">){</span><span style="color: #008080;"> 7</span>         <span style="color: #0000ff;">if</span>(<span style="color: #800080;">$unicode</span> return <span style="color: #008080;">chr</span>(<span style="color: #800080;">$unicode</span><span style="color: #000000;">);</span><span style="color: #008080;"> 8</span>         <span style="color: #0000ff;">if</span>(<span style="color: #800080;">$unicode</span> return <span style="color: #008080;">chr</span>((<span style="color: #800080;">$unicode</span> >> 6) + 192) .<span style="color: #008080;"> 9</span>                                       <span style="color: #008080;">chr</span>((<span style="color: #800080;">$unicode</span> & 63) + 128<span style="color: #000000;">);</span><span style="color: #008080;">10</span>         <span style="color: #0000ff;">if</span>(<span style="color: #800080;">$unicode</span> return <span style="color: #008080;">chr</span>((<span style="color: #800080;">$unicode</span> >> 12) + 224) .<span style="color: #008080;">11</span>                                       <span style="color: #008080;">chr</span>(((<span style="color: #800080;">$unicode</span> >> 6) & 63) + 128) .<span style="color: #008080;">12</span>                                       <span style="color: #008080;">chr</span>((<span style="color: #800080;">$unicode</span> & 63) + 128<span style="color: #000000;">);</span><span style="color: #008080;">13</span>         <span style="color: #0000ff;">if</span>(<span style="color: #800080;">$unicode</span> return <span style="color: #008080;">chr</span>((<span style="color: #800080;">$unicode</span> >> 18) + 240) .<span style="color: #008080;">14</span>                                       <span style="color: #008080;">chr</span>(((<span style="color: #800080;">$unicode</span> >> 12) & 63) + 128) .<span style="color: #008080;">15</span>                                       <span style="color: #008080;">chr</span>(((<span style="color: #800080;">$unicode</span> >> 6) & 63) + 128) .<span style="color: #008080;">16</span>                                       <span style="color: #008080;">chr</span>((<span style="color: #800080;">$unicode</span> & 63) + 128<span style="color: #000000;">);</span><span style="color: #008080;">17</span>         <span style="color: #0000ff;">return</span> <span style="color: #0000ff;">false</span><span style="color: #000000;">;</span><span style="color: #008080;">18</span> <span style="color: #000000;">    }</span><span style="color: #008080;">19</span>  <span style="color: #008080;">20</span>     <span style="color: #008000;">/*</span><span style="color: #008000;">*</span><span style="color: #008080;">21</span> <span style="color: #008000;">     * 将字符转换成unicode</span><span style="color: #008080;">22</span> <span style="color: #008000;">     * @param string $char 必须是UTF-8字符</span><span style="color: #008080;">23</span> <span style="color: #008000;">     * @return int</span><span style="color: #008080;">24</span> <span style="color: #008000;">     *</span><span style="color: #008000;">*/</span><span style="color: #008080;">25</span>     <span style="color: #0000ff;">function</span> char2Unicode(<span style="color: #800080;">$char</span><span style="color: #000000;">){</span><span style="color: #008080;">26</span>         <span style="color: #0000ff;">switch</span> (<span style="color: #008080;">strlen</span>(<span style="color: #800080;">$char</span><span style="color: #000000;">)){</span><span style="color: #008080;">27</span>             <span style="color: #0000ff;">case</span> 1 : <span style="color: #0000ff;">return</span> <span style="color: #008080;">ord</span>(<span style="color: #800080;">$char</span><span style="color: #000000;">);</span><span style="color: #008080;">28</span>             <span style="color: #0000ff;">case</span> 2 : <span style="color: #0000ff;">return</span> (<span style="color: #008080;">ord</span>(<span style="color: #800080;">$char</span>{1}) & 63) |<span style="color: #008080;">29</span>                             ((<span style="color: #008080;">ord</span>(<span style="color: #800080;">$char</span>{0}) & 31) );<span style="color: #008080;">30</span>             <span style="color: #0000ff;">case</span> 3 : <span style="color: #0000ff;">return</span> (<span style="color: #008080;">ord</span>(<span style="color: #800080;">$char</span>{2}) & 63) |<span style="color: #008080;">31</span>                             ((<span style="color: #008080;">ord</span>(<span style="color: #800080;">$char</span>{1}) & 63) 32                             ((<span style="color: #008080;">ord</span>(<span style="color: #800080;">$char</span>{0}) & 15) );<span style="color: #008080;">33</span>             <span style="color: #0000ff;">case</span> 4 : <span style="color: #0000ff;">return</span> (<span style="color: #008080;">ord</span>(<span style="color: #800080;">$char</span>{3}) & 63) |<span style="color: #008080;">34</span>                             ((<span style="color: #008080;">ord</span>(<span style="color: #800080;">$char</span>{2}) & 63) 35                             ((<span style="color: #008080;">ord</span>(<span style="color: #800080;">$char</span>{1}) & 63) 36                             ((<span style="color: #008080;">ord</span>(<span style="color: #800080;">$char</span>{0}) & 7)  );<span style="color: #008080;">37</span>             <span style="color: #0000ff;">default</span> :<span style="color: #008080;">38</span>                 <span style="color: #008080;">trigger_error</span>('Character is not UTF-8!', <span style="color: #ff00ff;">E_USER_WARNING</span><span style="color: #000000;">);</span><span style="color: #008080;">39</span>                 <span style="color: #0000ff;">return</span> <span style="color: #0000ff;">false</span><span style="color: #000000;">;</span><span style="color: #008080;">40</span> <span style="color: #000000;">        }</span><span style="color: #008080;">41</span>     }

  2. 全角转半角

<span style="color: #008080;"> 1</span>     <span style="color: #008000;">/*</span><span style="color: #008000;">*</span><span style="color: #008080;"> 2</span> <span style="color: #008000;">     * 全角转半角</span><span style="color: #008080;"> 3</span> <span style="color: #008000;">     * @param string $str</span><span style="color: #008080;"> 4</span> <span style="color: #008000;">     * @return string</span><span style="color: #008080;"> 5</span> <span style="color: #008000;">     *</span><span style="color: #008000;">*/</span><span style="color: #008080;"> 6</span>     <span style="color: #0000ff;">function</span> sbc2Dbc(<span style="color: #800080;">$str</span><span style="color: #000000;">){</span><span style="color: #008080;"> 7</span>         <span style="color: #0000ff;">return</span> <span style="color: #008080;">preg_replace</span><span style="color: #000000;">(</span><span style="color: #008080;"> 8</span>             <span style="color: #008000;">//</span><span style="color: #008000;"> 全角字符 </span><span style="color: #008080;"> 9</span>             '/[\x{3000}\x{ff01}-\x{ff5f}]/ue',<span style="color: #008080;">10</span>             <span style="color: #008000;">//</span><span style="color: #008000;"> 编码转换</span><span style="color: #008080;">11</span> <span style="color: #008000;">            // 0x3000是空格,特殊处理,其他全角字符编码-0xfee0即可以转为半角</span><span style="color: #008080;">12</span>             '($unicode=char2Unicode(\'\0\')) == 0x3000 ? " " : (($code=$unicode-0xfee0) > 256 ? unicode2Char($code) : chr($code))',<span style="color: #008080;">13</span>             <span style="color: #800080;">$str</span><span style="color: #008080;">14</span> <span style="color: #000000;">        );</span><span style="color: #008080;">15</span>     }

3. 半角转全角

<span style="color: #008080;"> 1</span>     <span style="color: #008000;">/*</span><span style="color: #008000;">*</span><span style="color: #008080;"> 2</span> <span style="color: #008000;">     * 半角转全角</span><span style="color: #008080;"> 3</span> <span style="color: #008000;">     * @param string $str</span><span style="color: #008080;"> 4</span> <span style="color: #008000;">     * @return string</span><span style="color: #008080;"> 5</span> <span style="color: #008000;">     *</span><span style="color: #008000;">*/</span><span style="color: #008080;"> 6</span>     <span style="color: #0000ff;">function</span> dbc2Sbc(<span style="color: #800080;">$str</span><span style="color: #000000;">){</span><span style="color: #008080;"> 7</span>         <span style="color: #0000ff;">return</span> <span style="color: #008080;">preg_replace</span><span style="color: #000000;">(</span><span style="color: #008080;"> 8</span>             <span style="color: #008000;">//</span><span style="color: #008000;"> 半角字符 </span><span style="color: #008080;"> 9</span>             '/[\x{0020}\x{0020}-\x{7e}]/ue',  <span style="color: #008080;">10</span>             <span style="color: #008000;">//</span><span style="color: #008000;"> 编码转换</span><span style="color: #008080;">11</span> <span style="color: #008000;">            // 0x0020是空格,特殊处理,其他半角字符编码+0xfee0即可以转为全角</span><span style="color: #008080;">12</span>             '($unicode=char2Unicode(\'\0\')) == 0x0020 ? unicode2Char(0x3000) : (($code=$unicode+0xfee0) > 256 ? unicode2Char($code) : chr($code))',<span style="color: #008080;">13</span>             <span style="color: #800080;">$str</span><span style="color: #008080;">14</span> <span style="color: #000000;">        );</span><span style="color: #008080;">15</span>     }

四、测试

 示例代码:

<span style="color: #008080;">1</span> <span style="color: #800080;">$a</span> = 'abc12 345'<span style="color: #000000;">;</span><span style="color: #008080;">2</span> <span style="color: #800080;">$sbc</span> = dbc2Sbc(<span style="color: #800080;">$a</span><span style="color: #000000;">);</span><span style="color: #008080;">3</span> <span style="color: #800080;">$dbc</span> = sbc2Dbc(<span style="color: #800080;">$sbc</span><span style="color: #000000;">);</span><span style="color: #008080;">4</span> <span style="color: #008080;">5</span> <span style="color: #008080;">var_dump</span>(<span style="color: #800080;">$a</span>, <span style="color: #800080;">$sbc</span>, <span style="color: #800080;">$dbc</span>);

结果:

<span style="color: #008080;">1</span> <span style="color: #0000ff;">string</span>(9) "abc12 345"<span style="color: #008080;">2</span> <span style="color: #0000ff;">string</span>(27) "abc12 345"<span style="color: #008080;">3</span> <span style="color: #0000ff;">string</span>(9) "abc12 345"

 

성명:
본 글의 내용은 네티즌들의 자발적인 기여로 작성되었으며, 저작권은 원저작자에게 있습니다. 본 사이트는 이에 상응하는 법적 책임을 지지 않습니다. 표절이나 침해가 의심되는 콘텐츠를 발견한 경우 admin@php.cn으로 문의하세요.