


In-depth study of keyword matching projects (2) - The introduction of the idea of sub-tables, in-depth study of keywords
(2) The introduction of the idea of sub-tables
Recent articles: 1) Architecture application of high concurrent data collection (Redis application)
2) Highly available data collection platform (how to play with 3 languages php+.net+aauto)
Teach you step by step how to do a keyword matching projectThis part has basically been completed. In-depth research is to analyze the performance of the system and must be done under the stimulation of some environments. Some changes.
Teach you step by step how to do keyword matching project: Teach you step by step how to do keyword matching project (search engine)----Day 1~Teach you step by step how to do keyword matching project (search engine)--- - Day 22 (22 articles in total)
In-depth research: The previous section talked about in-depth research on keyword matching projects - the introduction of filters.
Each article will be divided into causes of the problem, solutions and some necessary implementation plans.
This article officially begins.
Antecedents of the problem
With the explosive growth of automatically collected data, the capacity of the vocabulary database is booming, from a few W data to millions of data, Xiao Shuai Shuai looks at the queries in the database and feels more and more Nothing can be done.
In addition, what Xiao Ding Ding often says to Xiao Shuai Shuai the most is: When can I choose words faster? Every time I wait for a long time and there is no response, I am really anxious to death.
Little handsome is also more anxious, and my heart is haggard. I really feel that this is the challenge. Xiao Shuaishuai had no choice but to continue to find the boss, asking him to reward him with a clever trick.
Boss Yu patted Xiao Shuaishuai on the shoulder: Young man, you know how difficult the project is!
Xiao Shuaishuai replied: Don’t ridicule me, I have felt it deeply, and I think my heart may not be able to bear it anymore.
Boss Yu: If you can’t bear this, I guess there will be more for you in the future.
Little handsome: big brother, let alone these virtual behaviors, hurry up the solution.
Boss Yu: Why are you in a hurry? Things cannot be rushed. Come here and I will give you a clear path.
“Does each baby have a category attribute? How many words in these millions of data really belong to this category? Assuming that we only use the vocabulary of this category, can our project continue to be stable?”
Solution
According to certain business needs, we can split the data table vertically or horizontally, which can effectively optimize performance.
Vertical segmentation is also called column segmentation. It splits uncommon columns or long fields to ensure that entities are in a relatively applicable state. Common ones include one-to-one relationships.
Horizontal splitting, also called row splitting, splits data records according to a certain business and stores them in different tables. Common table splitting operations include date splitting.
This case uses horizontal segmentation to split the data into categories.
Implementation Plan
In order not to change the structure of the data table, we designed it this way. We distinguish which data table the project uses according to the table name. The changes resulting from this are relatively few. We only need to change the code slightly to solve it. This is a very frustrating thing.
Modify the Keyword code and add data sources.
<?<span>php </span><span>define</span>('DATABASE_HOST','127.0.0.1'<span>); </span><span>define</span>('DATABASE_USER','xiaoshuaishuai'<span>); </span><span>define</span>('DATABASE__PASSWORD','xiaoshuaishuai'<span>); </span><span>define</span>('DATABASE_CHARSET','utf-8'<span>); </span><span>class</span><span> Keyword { </span><span>public</span> <span>$word</span><span>; </span><span>public</span> <span>static</span> <span>$conn</span> = <span>null</span><span>; </span><span>public</span> <span>function</span><span> getDbConn(){ </span><span>if</span>(self::<span>$conn</span> == <span>null</span><span>){ self</span>::<span>$conn</span> = <span>mysql_connect</span>(DATABASE_HOST,DATABASE_USER,<span>DATABASE__PASSWORD); </span><span>mysql_query</span>("SET NAMES '".DATABASE_CHARSET."'",self::<span>$conn</span><span>); </span><span>mysql_select_db</span>("dict",self::<span>$conn</span><span>); </span><span>return</span> self::<span>$conn</span><span>; } </span><span>return</span> self::<span>$conn</span><span>; } </span><span>public</span> <span>function</span><span> save(){ </span><span>$sql</span> = "insert into keywords(word) values ('<span>$this</span>->word')"<span>; </span><span>return</span> <span>mysql_query</span>(<span>$sql</span>,<span>$this</span>-><span>getDbConn()); } </span><span>public</span> <span>static</span> <span>function</span> getWordsSource(<span>$cid</span>,<span>$limit</span>=0,<span>$offset</span>=40<span>){ </span><span>$sql</span> = "SELECT * FROM keywords_<span>$cid</span> LIMIT <span>$limit</span>,<span>$ffset</span>"<span>; </span><span>return</span> DB::MakeArray(<span>$sql</span><span>); } </span><span>public</span> <span>static</span> <span>function</span> getWordsCount(<span>$cid</span><span>){ </span><span>$sql</span> = "SELECT count(*) FROM keywords_<span>$cid</span>"<span>; </span><span>return</span> DB::QueryScalar(<span>$sql</span><span>); } }</span>
New QueryScalar is added to the DB class, which is used to calculate the total amount
<?<span>php </span><span>#</span><span>@author oShine</span> <span>define</span>('DATABASE_HOST','127.0.0.1'<span>); </span><span>define</span>('DATABASE_USER','xiaoshuaishuai'<span>); </span><span>define</span>('DATABASE__PASSWORD','xiaoshuaishuai'<span>); </span><span>define</span>('DATABASE_CHARSET','utf-8'<span>); </span><span>class</span><span> DB { </span><span>public</span> <span>static</span> <span>$conn</span> = <span>null</span><span>; </span><span>public</span> <span>static</span> <span>function</span><span> Connect(){ </span><span>if</span>(self::<span>$conn</span> == <span>null</span><span>){ self</span>::<span>$conn</span> = <span>mysql_connect</span>(DATABASE_HOST,DATABASE_USER,<span>DATABASE__PASSWORD); </span><span>mysql_query</span>("SET NAMES '".DATABASE_CHARSET."'",self::<span>$conn</span><span>); </span><span>mysql_select_db</span>("dict",self::<span>$conn</span><span>); </span><span>return</span> self::<span>$conn</span><span>; } </span><span>return</span> self::<span>$conn</span><span>; } </span><span>public</span> <span>static</span> <span>function</span> Query(<span>$sql</span><span>){ </span><span>return</span> <span>mysql_query</span>(<span>$sql</span>,self::<span>Connect()); } </span><span>public</span> <span>static</span> <span>function</span> makeArray(<span>$sql</span><span>){ </span><span>$rs</span> = self::Query(<span>$sql</span><span>); </span><span>$result</span> = <span>array</span><span>(); </span><span>while</span>(<span>$data</span> = <span>mysql_fetch_assoc</span>(<span>$rs</span><span>)){ </span><span>$result</span>[] = <span>$data</span><span>; } </span><span>return</span> <span>$result</span><span>; } </span><span>public</span> <span>static</span> <span>function</span> QueryScalar(<span>$sql</span><span>){ </span><span>$rs</span> = self::Query(<span>$sql</span><span>); </span><span>$data</span> = <span>mysql_fetch_array</span>(<span>$rs</span><span>); </span><span>if</span>(<span>$data</span> == <span>false</span> || <span>empty</span>(<span>$data</span>) || !<span>isset</span>(<span>$data</span>[1])) <span>return</span> 0<span>; </span><span>return</span> <span>$data</span>[1<span>]; } } </span>
Modify the Selector code for word selection:
<?<span>php </span><span>#</span><span>@Filename:selector/Selector.php</span><span> #</span><span>@Author:oshine</span> <span>require_once</span> <span>dirname</span>(<span>__FILE__</span>) . '/SelectorItem.php'<span>; </span><span>require_once</span> <span>dirname</span>(<span>__FILE__</span>) . '/charlist/CharList.php'<span>; </span><span>require_once</span> <span>dirname</span>(<span>__FILE__</span>) . '/charlist/CharlistHandle.php'<span>; </span><span>require_once</span> <span>dirname</span>(<span>dirname</span>(<span>__FILE__</span>)) . '/lib/Logger.php'<span>; </span><span>class</span><span> Selector { </span><span>private</span> <span>static</span> <span>$charListHandle</span> = <span>array</span><span>( </span>"黑名单" => "BacklistCharListHandle", "近义词" => "LinklistCharListHandle"<span> ); </span><span>public</span> <span>static</span> <span>function</span> select(<span>$num_iid</span><span>) { </span><span>$selectorItem</span> = SelectorItem::createFromApi(<span>$num_iid</span><span>); Logger</span>::trace(<span>$selectorItem</span>-><span>props_name); </span><span>$charlist</span> = <span>new</span><span> CharList(); </span><span>foreach</span> (self::<span>$charListHandle</span> <span>as</span> <span>$matchKey</span> => <span>$className</span><span>) { </span><span>$handle</span> = self::createCharListHandle(<span>$className</span>, <span>$charlist</span>, <span>$selectorItem</span><span>); </span><span>$handle</span>-><span>exec</span><span>(); } </span><span>$selectWords</span> = <span>array</span><span>(); </span><span>$wordsCount</span> = Keyword::getWordsCount(selectorItem-><span>cid); </span><span>$offset</span> = 40<span>; </span><span>$page</span> = <span>ceil</span>(<span>$wordsCount</span>/<span>$offset</span><span>); </span><span>for</span>(<span>$i</span>=0;<span>$i</span><=<span>$page</span>;<span>$i</span>++<span>){ </span><span>$limit</span> = <span>$i</span>*<span>$offset</span><span>; </span><span>$keywords</span> = Keyword::getWordsSource(selectorItem->cid,<span>$limit</span>,<span>$offset</span><span>); </span><span>foreach</span> (<span>$keywords</span> <span>as</span> <span>$val</span><span>) { </span><span>#</span><span> code...</span> <span>$keywordEntity</span> = SplitterApp::<span>split</span>(<span>$val</span>["word"<span>]); </span><span>#</span><span> code...</span> <span>if</span>(MacthExector::macth(<span>$keywordEntity</span>,<span>$charlist</span><span>)){ </span><span>$selectWords</span>[] = <span>$val</span>["word"<span>]; } } } </span><span>return</span> <span>$selectWords</span><span>; } </span><span>public</span> <span>static</span> <span>function</span> createCharListHandle(<span>$className</span>, <span>$charlist</span>, <span>$selectorItem</span><span>) { </span><span>if</span> (<span>class_exists</span>(<span>$className</span><span>)) { </span><span>return</span> <span>new</span> <span>$className</span>(<span>$charlist</span>, <span>$selectorItem</span><span>); } </span><span>throw</span> <span>new</span> <span>Exception</span>("class not exists", 0<span>); } }</span>
Summary
Xiao Shuai Shuai has learned new knowledge points. Is this a way to reward the boss? Do you also want to reward me? Please give me a thumbs up!

本篇文章给大家带来了关于excel的相关知识,其中主要介绍了关于折叠表格的相关问题,就是分类汇总的功能,这样查看数据会非常的方便,下面一起来看一下,希望对大家有帮助。

本篇文章给大家带来了关于excel的相关知识,其中主要介绍了关于AGGREGATE函数的相关内容,该函数用法与SUBTOTAL函数类似,但在功能上比SUBTOTAL函数更加强大,下面一起来看一下,希望对大家有帮助。

在之前的文章《实用Excel技巧分享:利用 数据透视表 来汇总业绩》中,我们学习了下Excel数据透视表,了解了利用数据透视表来汇总业绩的方法。而今天我们来聊聊怎么计算时间差(年数差、月数差、周数差),希望对大家有所帮助!

在之前的文章《实用Word技巧分享:聊聊你没用过的“行号”功能》中,我们了解了Word中你肯定没用过的"行号”功能。今天继续实用Word技巧分享,看看Excel表格怎么借用Word进行分栏打印,快来收藏使用吧!

本篇文章给大家带来了关于excel的相关知识,其中主要介绍了关于zenmm制作倒计时牌的相关内容,使用Excel中的日期函数结合按指定时间刷新的VBA代码,即可制作出倒计时牌,下面一起来看一下,希望对大家有帮助。

在之前的文章《实用Excel技巧分享:原来“定位功能”这么有用!》中,我们了解了定位功能的妙用。而今天我们聊聊合并后的单元格如何实现筛选功能,分享一种复制粘贴和方法解决这个问题,另外还会给大家分享一种合并单元格的不错的替代方式。

本篇文章给大家带来了关于excel的相关知识,其中主要介绍了关于如何使用函数寻找总和为某个值的组合的问题,下面一起来看一下,希望对大家有帮助。

本篇文章给大家带来了关于Excel的相关知识,其中主要介绍了关于XLOOKUP函数的相关知识,包括了常规查询、逆向查询、返回多列、自动除错以及近似查找等内容,下面一起来看一下,希望对大家有帮助。


Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

ZendStudio 13.5.1 Mac
Powerful PHP integrated development environment

WebStorm Mac version
Useful JavaScript development tools

Notepad++7.3.1
Easy-to-use and free code editor

mPDF
mPDF is a PHP library that can generate PDF files from UTF-8 encoded HTML. The original author, Ian Back, wrote mPDF to output PDF files "on the fly" from his website and handle different languages. It is slower than original scripts like HTML2FPDF and produces larger files when using Unicode fonts, but supports CSS styles etc. and has a lot of enhancements. Supports almost all languages, including RTL (Arabic and Hebrew) and CJK (Chinese, Japanese and Korean). Supports nested block-level elements (such as P, DIV),

MinGW - Minimalist GNU for Windows
This project is in the process of being migrated to osdn.net/projects/mingw, you can continue to follow us there. MinGW: A native Windows port of the GNU Compiler Collection (GCC), freely distributable import libraries and header files for building native Windows applications; includes extensions to the MSVC runtime to support C99 functionality. All MinGW software can run on 64-bit Windows platforms.