Build a Keyword Matching Project (Search Engine) Step by Step ---- Day 22
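Latest interview experience: Reflections on Interviews (Part 2), Reflections on Interviews
Latest architecture: Applying an Architecture for High-Concurrency Data Collection (Using Redis)
Rant: I only just got my mindset back in order today, so I am picking up the articles I left unfinished. Over the last few months I took a break and went home to do manual labor, which doubled as physical exercise. After all, nothing can buy back your health, so I would advise everyone working in IT to get the rest of your body moving too.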
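Day 22
Starting point: Build a Keyword Matching Project (Search Engine) Step by Step ---- Day 1
Recap: Build a Keyword Matching Project (Search Engine) Step by Step ---- Day 21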
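Xiao Shuaishuai likes to summarize. Based on what he had learned so far, he summed things up as follows:
1. The extension and typing problems of item attributes are now largely under control, but rolling them out, operating them, and maintaining them still runs into major obstacles.
2. Keywords are split with the scws extension together with our own business-specific word-splitting scheme. Splitting effectively reduces the difficulty of matching multi-word phrases.
3. All the groundwork seems to be done. After one final cleanup, the project should be ready to run for real.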
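Xiao Shuaishuai is quite proactive. Without asking Boss Yu, he wrote a piece of code on his own, whose main purpose is to tie all the steps together.
For how the CharList that extends the item attributes is built, see: Build a Keyword Matching Project (Search Engine) Step by Step ---- Day 12 through Build a Keyword Matching Project (Search Engine) Step by Step ---- Day 18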
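The main steps of Selector are:
1. Fetch the item attributes.
2. Extend the item attributes with business knowledge to form the CharList.
3. Fetch the keywords from the word library.
4. Run the keyword-splitting algorithm.
5. Run the matching-degree algorithm.
6. Return the list of keywords that matched.
The code is as follows: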
```php
<?php
#@Filename:selector/Selector.php
#@Author:oshine

require_once dirname(__FILE__) . '/SelectorItem.php';
require_once dirname(__FILE__) . '/charlist/CharList.php';
require_once dirname(__FILE__) . '/charlist/CharlistHandle.php';
require_once dirname(dirname(__FILE__)) . '/lib/Logger.php';
# Note: DB, SplitterApp and MatchExector must also be loaded before select() runs;
# their include paths are not shown in this file.

class Selector
{

    # Handlers that fill the CharList, keyed by a descriptive label.
    private static $charListHandle = array(
        "黑名单" => "BacklistCharListHandle",  # blacklist
        "近义词" => "LinklistCharListHandle"   # synonyms
    );

    public static function select($num_iid)
    {
        # Step 1: fetch the item attributes.
        $selectorItem = SelectorItem::createFromApi($num_iid);

        Logger::trace($selectorItem->props_name);

        # Step 2: extend the attributes into a CharList.
        $charlist = new CharList();

        foreach (self::$charListHandle as $matchKey => $className) {
            $handle = self::createCharListHandle($className, $charlist, $selectorItem);
            $handle->exec();
        }

        $selectWords = array();

        # Step 3: fetch the keywords from the word library.
        $keywords = DB::makeArray("select word from keywords");
        foreach ($keywords as $val) {
            # Step 4: split the keyword.
            $keywordEntity = SplitterApp::split($val["word"]);

            # Step 5: keep the keyword if it matches the CharList.
            if (MatchExector::match($keywordEntity, $charlist)) {
                $selectWords[] = $val["word"];
            }
        }

        # Step 6: return the matched keywords.
        return $selectWords;
    }

    public static function createCharListHandle($className, $charlist, $selectorItem)
    {
        if (class_exists($className)) {
            return new $className($charlist, $selectorItem);
        }
        throw new Exception("class not exists", 0);
    }
}
```
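The `createCharListHandle` factory relies on PHP being able to instantiate a class from a string holding its name. The sketch below shows that pattern in isolation. `DemoCharList`, `DemoBlacklistHandle`, and `createHandle` are hypothetical stand-ins rather than the project's real classes, and the idea that a handler fills the char list inside `exec()` is an assumption inferred from how `Selector::select` calls it.

```php
<?php
// Hypothetical stand-in for the project's CharList: it only collects blacklist words.
class DemoCharList
{
    private $blacklist = array();

    public function addBlacklist($word)
    {
        $this->blacklist[] = $word;
    }

    public function getBlacklist()
    {
        return $this->blacklist;
    }
}

// Hypothetical handler: receives the char list and the item, fills the list in exec().
class DemoBlacklistHandle
{
    private $charlist;
    private $item;

    public function __construct($charlist, $item)
    {
        $this->charlist = $charlist;
        $this->item = $item;
    }

    public function exec()
    {
        // Assumption: the item carries an array of words to block.
        foreach ($this->item['blocked'] as $word) {
            $this->charlist->addBlacklist($word);
        }
    }
}

// Same idea as Selector::createCharListHandle: build a handler from its class name.
function createHandle($className, $charlist, $item)
{
    if (class_exists($className)) {
        return new $className($charlist, $item);
    }
    throw new Exception("class not exists: " . $className, 0);
}

$charlist = new DemoCharList();
$item = array('blocked' => array('fake', 'replica'));

// Run every configured handler against the same char list.
foreach (array('DemoBlacklistHandle') as $className) {
    createHandle($className, $charlist, $item)->exec();
}

print_r($charlist->getBlacklist());
```

Keeping the handler names in a plain array means a new kind of attribute extension only needs a new class and one more array entry. The loop itself stays untouched.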
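For test-driven development, please refer to:
The same principle applies here: write the test code first, then fill in the MatchExector code afterwards.
The main job of MatchExector is to calculate the matching degree.
1. If even one word is on the blacklist, the matching degree is zero.
2. If a word is a core word, its contribution is calculated with the algorithm described earlier. See: Build a Keyword Matching Project (Search Engine) Step by Step ---- Day 19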
```php
<?php
#@Filename:mathes/MatchExector.php
#@Author:oshine

class MatchExector {

    public static function match(KeywordEntity $keywordEntity, CharList $charlist) {

        $matchingDegree = 0;
        $elementWords = $keywordEntity->getElementWords();
        foreach ($elementWords as $word) {
            # Rule 1: any blacklisted word rejects the keyword outright.
            if (in_array($word, $charlist->getBlacklist())) {
                return false;
            }
            # Rule 2: core words add their weight to the matching degree.
            if (in_array($word, $charlist->getCore())) {
                $matchingDegree += $keywordEntity->calculateWeight($word);
            }
        }

        if ($matchingDegree > 0.8) {
            return true;
        }
        return false;

    }

}
```
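In the write-the-test-first spirit mentioned above, the two matching rules can be exercised without the rest of the project. This is a minimal sketch under stated assumptions: `StubKeywordEntity`, `StubCharList`, and `demoMatch` are hypothetical names, the fixed per-word weights stand in for the Day 19 weighting (which is not reproduced here), and `demoMatch` mirrors the logic of `MatchExector::match` without its type hints so the stubs can be passed in.

```php
<?php
// Hypothetical stub: element words with fixed weights (the real weighting is from Day 19).
class StubKeywordEntity
{
    private $weights;

    public function __construct(array $weights)
    {
        $this->weights = $weights;
    }

    public function getElementWords()
    {
        return array_keys($this->weights);
    }

    public function calculateWeight($word)
    {
        return $this->weights[$word];
    }
}

// Hypothetical stub: a char list reduced to its blacklist and core words.
class StubCharList
{
    private $blacklist;
    private $core;

    public function __construct(array $blacklist, array $core)
    {
        $this->blacklist = $blacklist;
        $this->core = $core;
    }

    public function getBlacklist()
    {
        return $this->blacklist;
    }

    public function getCore()
    {
        return $this->core;
    }
}

// Mirrors the two rules: a blacklist hit rejects, core words accumulate weight.
function demoMatch($keywordEntity, $charlist)
{
    $matchingDegree = 0;
    foreach ($keywordEntity->getElementWords() as $word) {
        if (in_array($word, $charlist->getBlacklist())) {
            return false;
        }
        if (in_array($word, $charlist->getCore())) {
            $matchingDegree += $keywordEntity->calculateWeight($word);
        }
    }
    return $matchingDegree > 0.8;
}

$charlist = new StubCharList(array('replica'), array('dress', 'silk'));

// Both words are core words, total weight 0.9: accepted.
$full = demoMatch(new StubKeywordEntity(array('silk' => 0.5, 'dress' => 0.4)), $charlist);
// One blacklisted word: rejected regardless of the other weights.
$blocked = demoMatch(new StubKeywordEntity(array('silk' => 0.5, 'replica' => 0.5)), $charlist);
// Only one core word, total weight 0.5: below the 0.8 threshold, rejected.
$partial = demoMatch(new StubKeywordEntity(array('silk' => 0.5, 'red' => 0.5)), $charlist);

var_dump($full, $blocked, $partial);
```

Writing the three cases first makes the expected behavior explicit: accept on enough core weight, reject on any blacklist hit, reject on too little core weight.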
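Overall, the code does what it is supposed to do. Xiao Shuaishuai was delighted: finishing the project surely meant a project bonus, and maybe even a lavish dinner. Just thinking about it made his mouth water.
He handed the code to Boss Yu and waited eagerly for Boss Yu's final approval.
How will Boss Yu react after reading it? Stay tuned for Chapter 3: In-Depth Study of the Keyword Matching Project (Part 1)
Chapter 2 is now complete. Source code: Build a Keyword Matching Project Step by Step (Chapter 2 Finale)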
