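PHP String Handling: Converting Between Full-Width and Half-Width Characters
Converting between half-width and full-width characters is a common string-processing problem. This article offers one approach.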
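I. Concepts
Full-width characters occupy Unicode code points 65281 to 65374 (hexadecimal 0xFF01 to 0xFF5E).
Half-width characters occupy code points 33 to 126 (hexadecimal 0x21 to 0x7E).
The space is a special case: the full-width space is 12288 (0x3000) and the half-width space is 32 (0x20).
Apart from the space, full-width and half-width characters appear in the same order, so each pair differs by the constant offset 0xFEE0 (65248).
Non-space characters can therefore be converted by adding or subtracting that offset, while the space is handled separately.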
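As a quick check of this arithmetic, here is a minimal sketch. It assumes PHP 7.2 or later with the mbstring extension, so that the built-in `mb_ord` and `mb_chr` are available; these built-ins are not part of the original article, which uses its own helper functions.

```php
<?php
// Assumption: PHP >= 7.2 with mbstring (provides mb_ord / mb_chr).

// Constant distance between a half-width character and its full-width twin.
$offset = 0xFF01 - 0x21;   // 0xFEE0 (65248)

// 'A' is U+0041; adding the offset gives U+FF21, the full-width 'A'.
$fullA = mb_chr(mb_ord('A', 'UTF-8') + $offset, 'UTF-8');

// Going back: subtract the same offset.
$halfA = mb_chr(mb_ord($fullA, 'UTF-8') - $offset, 'UTF-8');

// The space does not follow the rule: 0x20 + 0xFEE0 is 0xFF00, not 0x3000.
$spaceFollowsRule = (0x20 + $offset === 0x3000);

printf("offset=0x%X bytes=%s half=%s spaceFollowsRule=%s\n",
    $offset, bin2hex($fullA), $halfA, var_export($spaceFollowsRule, true));
```

The last check shows why the space needs its own branch: shifting 0x20 by the offset does not land on 0x3000.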
II. Approach
1. Find the target characters by code point; a regular expression can do this.
2. Shift each matched character's code point.
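The first step can be sketched on its own as follows. The sample string is a hypothetical input, written with `\u{...}` escapes (PHP 7.0+) so the code points are unambiguous; the character class covers the full-width space plus the range 0xFF01 to 0xFF5E.

```php
<?php
// Step 1 only: locate full-width characters with a Unicode-aware regex.
// The /u modifier makes PCRE treat pattern and subject as UTF-8, so
// \x{ff01} matches a whole code point rather than individual bytes.

// Hypothetical input: full-width "PHP", a full-width space, then
// half-width "7", a half-width space and two CJK characters.
$sample = "\u{FF30}\u{FF28}\u{FF30}\u{3000}7 \u{4E2D}\u{6587}";

preg_match_all('/[\x{3000}\x{ff01}-\x{ff5e}]/u', $sample, $matches);

// $matches[0] holds each matched character as its own UTF-8 string.
$found = $matches[0];
echo count($found), " full-width characters found\n";
```

Only the three full-width letters and the full-width space match; the half-width digit, the ordinary space and the CJK characters are left alone.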
III. Implementation
1. First, two functions that convert between Unicode code points and characters:
/**
 * Convert a Unicode code point to a character.
 * @param int $unicode
 * @return string|false UTF-8 character, or false if out of range
 **/
function unicode2Char($unicode){
    if($unicode < 128)     return chr($unicode);
    if($unicode < 2048)    return chr(($unicode >> 6) + 192) .
                                  chr(($unicode & 63) + 128);
    if($unicode < 65536)   return chr(($unicode >> 12) + 224) .
                                  chr((($unicode >> 6) & 63) + 128) .
                                  chr(($unicode & 63) + 128);
    if($unicode < 2097152) return chr(($unicode >> 18) + 240) .
                                  chr((($unicode >> 12) & 63) + 128) .
                                  chr((($unicode >> 6) & 63) + 128) .
                                  chr(($unicode & 63) + 128);
    return false;
}

/**
 * Convert a character to its Unicode code point.
 * @param string $char must be a single UTF-8 character
 * @return int|false
 **/
function char2Unicode($char){
    // Square brackets are used for byte access; the $char{0} form
    // was deprecated in PHP 7.4 and removed in PHP 8.0.
    switch (strlen($char)){
        case 1 : return ord($char);
        case 2 : return (ord($char[1]) & 63) |
                        ((ord($char[0]) & 31) << 6);
        case 3 : return (ord($char[2]) & 63) |
                        ((ord($char[1]) & 63) << 6) |
                        ((ord($char[0]) & 15) << 12);
        case 4 : return (ord($char[3]) & 63) |
                        ((ord($char[2]) & 63) << 6) |
                        ((ord($char[1]) & 63) << 12) |
                        ((ord($char[0]) & 7) << 18);
        default :
            trigger_error('Character is not UTF-8!', E_USER_WARNING);
            return false;
    }
}
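To make the bit arithmetic concrete, the following sketch traces a single code point, U+FF21 (the full-width letter A), through the 3-byte branch of both helpers. It repeats the same shifts and masks inline instead of calling the functions, so it runs on its own.

```php
<?php
// Worked example for one code point: U+FF21 (full-width 'A').
$cp = 0xFF21;

// Encode: the 3-byte UTF-8 layout is 1110xxxx 10xxxxxx 10xxxxxx.
$bytes = chr(($cp >> 12) + 224)          // 0xEF: lead byte, top 4 bits
       . chr((($cp >> 6) & 63) + 128)    // 0xBC: middle 6 bits
       . chr(($cp & 63) + 128);          // 0xA1: low 6 bits

// Decode: strip the marker bits and shift each group back into place.
$decoded = (ord($bytes[2]) & 63)
         | ((ord($bytes[1]) & 63) << 6)
         | ((ord($bytes[0]) & 15) << 12);

printf("%s -> 0x%X\n", bin2hex($bytes), $decoded);   // efbca1 -> 0xFF21
```

Every character in the full-width range 0xFF01 to 0xFF5E, as well as the full-width space 0x3000, falls in this 3-byte branch, while every half-width character is a single byte.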
2. Full-width to half-width
/**
 * Convert full-width characters to half-width.
 * @param string $str
 * @return string
 **/
function sbc2Dbc($str){
    // The /e modifier was removed in PHP 7.0, so a callback is used instead.
    return preg_replace_callback(
        // Full-width characters
        '/[\x{3000}\x{ff01}-\x{ff5e}]/u',
        // Code point conversion:
        // 0x3000 is the space and is handled specially; any other
        // full-width code point minus 0xfee0 gives the half-width one.
        function ($m) {
            $unicode = char2Unicode($m[0]);
            return $unicode == 0x3000 ? ' ' : chr($unicode - 0xfee0);
        },
        $str
    );
}
3. Half-width to full-width
/**
 * Convert half-width characters to full-width.
 * @param string $str
 * @return string
 **/
function dbc2Sbc($str){
    // The /e modifier was removed in PHP 7.0, so a callback is used instead.
    return preg_replace_callback(
        // Half-width characters
        '/[\x{0020}-\x{7e}]/u',
        // Code point conversion:
        // 0x0020 is the space and is handled specially; any other
        // half-width code point plus 0xfee0 gives the full-width one.
        function ($m) {
            $unicode = char2Unicode($m[0]);
            return $unicode == 0x0020 ? unicode2Char(0x3000) : unicode2Char($unicode + 0xfee0);
        },
        $str
    );
}
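For comparison, here is a sketch of a different technique from the regex-based functions: a lookup table applied with `strtr`. It is an alternative, not the article's method. The function names `buildWidthMap`, `toFullWidth` and `toHalfWidth` are hypothetical, and the code assumes PHP 7.2 or later with the mbstring extension for `mb_chr`.

```php
<?php
// Assumption: PHP >= 7.2 with mbstring (mb_chr). All names are hypothetical.

/** Build a half-width => full-width lookup table. */
function buildWidthMap(): array {
    // The space is the special case: 0x20 maps to 0x3000.
    $map = [' ' => mb_chr(0x3000, 'UTF-8')];
    // Every other printable ASCII character is shifted by 0xFEE0.
    for ($code = 0x21; $code <= 0x7E; $code++) {
        $map[chr($code)] = mb_chr($code + 0xFEE0, 'UTF-8');
    }
    return $map;
}

function toFullWidth(string $str): string {
    return strtr($str, buildWidthMap());
}

function toHalfWidth(string $str): string {
    // array_flip swaps keys and values: full-width => half-width.
    return strtr($str, array_flip(buildWidthMap()));
}

$full = toFullWidth('abc12 345');
var_dump(strlen($full), toHalfWidth($full));
```

The table trades a little memory for not running a callback per match; characters outside the table, such as CJK text, pass through unchanged.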
IV. Testing
Sample code:
$a = 'abc12 345';
$sbc = dbc2Sbc($a);
$dbc = sbc2Dbc($sbc);

var_dump($a, $sbc, $dbc);
Result:
string(9) "abc12 345"
string(27) "ａｂｃ１２　３４５"
string(9) "abc12 345"