Heim >php教程 >php手册 >手把手教你做关键词匹配项目(搜索引擎)---- 第二十二天,教你做第二十二天

手把手教你做关键词匹配项目(搜索引擎)---- 第二十二天,教你做第二十二天

WBOY
WBOYOriginal
2016-06-13 09:18:211017Durchsuche

手把手教你做关键词匹配项目(搜索引擎)---- 第二十二天,教你做第二十二天

最新面试经历:面试的感触(二)、面试的感触

最新的架构:高并发数据采集的架构应用(Redis的应用)

吐槽:今天也是刚把心态调整好,继续写以前没有完成的文章,最近几个月自己也是休整了一段时间,回家做苦力,也当作是锻炼锻炼自己的身体,毕竟任何东西都换不回你的健康,我也是建议做IT行业的帅哥们多活动活动你们其它的部位。

 

第二十二天

起点:手把手教你做关键词匹配项目(搜索引擎)---- 第一天

回顾:手把手教你做关键词匹配项目(搜索引擎)---- 第二十一天

 

小帅帅是乐于做总结的人,根据以前所学的知识他总结了如下:

1. 宝贝属性的扩展和类型的问题初步已经得到很好的控制了,不过要推广和运营维护还是遇到了很大的障碍。

2. 对关键词的拆分使用了scws扩展以及自己原生的业务拆词方案,拆词有效的解决了词组方面的匹配难度。

3. 所有的初始工作好像已经完成了,只需要最后的整理项目应该可以正式运行起来了。

小帅帅的主动意识比较强烈,他没有去问于老大,就自己动手写了份代码,该代码主要是为了把所有的步骤连接起来。

宝贝属性的扩展CharList的构建请参照:手把手教你做关键词匹配项目(搜索引擎)---- 第十二天 ~ 手把手教你做关键词匹配项目(搜索引擎)---- 第十八天

Selector主要步骤如下:

1. 获取宝贝属性。

2. 使用业务知识扩充宝贝属性,形成CharList

3. 从词库中获取关键词

4. 关键词拆分算法

5. 匹配度算法

6. 返回匹配上的关键词列表

代码如下:

<span> 1</span> <?<span>php
</span><span> 2</span> <span>#</span><span>@Filename:selector/Selector.php</span>
<span> 3</span> <span>#</span><span>@Author:oshine</span>
<span> 4</span> 
<span> 5</span> <span>require_once</span> <span>dirname</span>(<span>__FILE__</span>) . '/SelectorItem.php'<span>;
</span><span> 6</span> <span>require_once</span> <span>dirname</span>(<span>__FILE__</span>) . '/charlist/CharList.php'<span>;
</span><span> 7</span> <span>require_once</span> <span>dirname</span>(<span>__FILE__</span>) . '/charlist/CharlistHandle.php'<span>;
</span><span> 8</span> <span>require_once</span> <span>dirname</span>(<span>dirname</span>(<span>__FILE__</span>)) . '/lib/Logger.php'<span>;
</span><span> 9</span> 
<span>10</span> <span>class</span><span> Selector
</span><span>11</span> <span>{
</span><span>12</span> 
<span>13</span>     <span>private</span> <span>static</span> <span>$charListHandle</span> = <span>array</span><span>(
</span><span>14</span>         "黑名单" => "BacklistCharListHandle",
<span>15</span>         "近义词" => "LinklistCharListHandle"
<span>16</span> <span>    );
</span><span>17</span> 
<span>18</span>     <span>public</span> <span>static</span> <span>function</span> select(<span>$num_iid</span><span>)
</span><span>19</span> <span>    {
</span><span>20</span>         <span>$selectorItem</span> = SelectorItem::createFromApi(<span>$num_iid</span><span>);
</span><span>21</span> 
<span>22</span>         Logger::trace(<span>$selectorItem</span>-><span>props_name);
</span><span>23</span> 
<span>24</span>         <span>$charlist</span> = <span>new</span><span> CharList();
</span><span>25</span> 
<span>26</span>         <span>foreach</span> (self::<span>$charListHandle</span> <span>as</span> <span>$matchKey</span> => <span>$className</span><span>) {
</span><span>27</span> 
<span>28</span>             <span>$handle</span> = self::createCharListHandle(<span>$className</span>, <span>$charlist</span>, <span>$selectorItem</span><span>);
</span><span>29</span>             <span>$handle</span>-><span>exec</span><span>();
</span><span>30</span> 
<span>31</span> <span>        }
</span><span>32</span> 
<span>33</span>         <span>$selectWords</span> = <span>array</span><span>();
</span><span>34</span> 
<span>35</span>         <span>$keywords</span> = DB::makeArray("select word from keywords"<span>);
</span><span>36</span>         <span>foreach</span> (<span>$keywords</span> <span>as</span> <span>$val</span><span>) {
</span><span>37</span>             <span>#</span><span> code...</span>
<span>38</span>             <span>$keywordEntity</span> = SplitterApp::<span>split</span>(<span>$val</span>["word"<span>]);
</span><span>39</span>             
<span>40</span>                 <span>#</span><span> code...</span>
<span>41</span>             <span>if</span>(MacthExector::macth(<span>$keywordEntity</span>,<span>$charlist</span><span>)){
</span><span>42</span>                 <span>$selectWords</span>[] = <span>$val</span>["word"<span>];
</span><span>43</span> <span>            }           
</span><span>44</span> 
<span>45</span> <span>        }
</span><span>46</span> 
<span>47</span>         <span>return</span> <span>$selectWords</span><span>;
</span><span>48</span> <span>    }
</span><span>49</span> 
<span>50</span>     <span>public</span> <span>static</span> <span>function</span> createCharListHandle(<span>$className</span>, <span>$charlist</span>, <span>$selectorItem</span><span>)
</span><span>51</span> <span>    {
</span><span>52</span>         <span>if</span> (<span>class_exists</span>(<span>$className</span><span>)) {
</span><span>53</span>             <span>return</span> <span>new</span> <span>$className</span>(<span>$charlist</span>, <span>$selectorItem</span><span>);
</span><span>54</span> <span>        }
</span><span>55</span>         <span>throw</span> <span>new</span> <span>Exception</span>("class not exists", 0<span>);
</span><span>56</span> <span>    }
</span><span>57</span> }

 测试驱动代码编程请参照:

也是使用一样的原理,先把测试代码写好,后续补全MatchExector代码。

MatchExector主要功能计算匹配度。

1. 如果只要有一个词在黑名单里面,匹配度肯定为零。

2. 如果是核心词,那么根据以前提到的算法来计算,请参照:手把手教你做关键词匹配项目(搜索引擎)---- 第十九天

<span> 1</span> <?<span>php
</span><span> 2</span> <span>#</span><span>@Filename:mathes/MatchExector.php</span>
<span> 3</span> <span>#</span><span>@Author:oshine</span>
<span> 4</span> 
<span> 5</span> <span>class</span><span> MatchExector {
</span><span> 6</span> 
<span> 7</span>     <span>public</span> <span>static</span> <span>function</span> match(KeywordEntity <span>$keywordEntity</span>,CharList <span>$charlist</span><span>){
</span><span> 8</span> 
<span> 9</span>         <span>$matchingDegree</span> = 0<span>;
</span><span>10</span>         <span>$elementWords</span> = <span>$keywordEntity</span>-><span>getElementWords();
</span><span>11</span>         <span>foreach</span> (<span>$elementWords</span> <span>as</span> <span>$word</span><span>) {
</span><span>12</span>             <span>#</span><span> code...</span>
<span>13</span>             <span>if</span>(<span>in_array</span>(<span>$word</span>, <span>$charlist</span>-><span>getBlacklist()))
</span><span>14</span>                 <span>return</span> <span>false</span><span>;
</span><span>15</span>             <span>if</span>(<span>in_array</span>(<span>$word</span>, <span>$charlist</span>-><span>getCore()))
</span><span>16</span>                 <span>$matchingDegree</span>+=<span>$keywordEntity</span>->calculateWeight(<span>$word</span><span>);
</span><span>17</span> 
<span>18</span> <span>        }
</span><span>19</span> 
<span>20</span>         <span>if</span>(<span>$matchingDegree</span>>0.8<span>)
</span><span>21</span>             <span>return</span> <span>true</span><span>;
</span><span>22</span>         <span>return</span> <span>false</span><span>;
</span><span>23</span> 
<span>24</span> <span>    }
</span><span>25</span>     
<span>26</span> }

 

整个代码相对来说实现了该有的功能,小帅帅非常的高兴,因为项目完成了肯定少不了项目奖金,说不定还有一餐丰富的晚餐,

想想都开始流口水了。

 

小帅帅把代码交给于老大,满怀期待的等候于老大的最后肯定。

于老大看了之后会有哪些反应呢?请关注第三章:关键词匹配项目深入研究(一)

 

第二章已完结,源代码地址:手把手教你做关键词匹配项目(二章完结篇)

 

Stellungnahme:
Der Inhalt dieses Artikels wird freiwillig von Internetnutzern beigesteuert und das Urheberrecht liegt beim ursprünglichen Autor. Diese Website übernimmt keine entsprechende rechtliche Verantwortung. Wenn Sie Inhalte finden, bei denen der Verdacht eines Plagiats oder einer Rechtsverletzung besteht, wenden Sie sich bitte an admin@php.cn