search
HomeBackend DevelopmentPHP Tutorial百度的搜索拼音联想功能是大致上的原理是怎样的呢 谢谢!

在百度中  输入guangzhou下方就会提示广州、广州新闻。我在想百度是不是将一些热门关键字,然后用一个字段记住这些关键字的拼音;搜索的时候就直接查这个表。如果是拼音,就模糊匹配这个拼音标示列。完成匹配后将结果返回。这些只是我的想象,好像当中还有关键字权重机制。谷歌找不到相关资料;是不是有其它方式实现我没想到的呢。求助大侠 谢谢!


回复讨论(解决方案)

百度搜索时的下拉菜单 原理是一样的 ,再具体就是搜索技术了,不懂了

又到老徐showtime了...

我觉得你的想法应该是正确的.

一个小功能,但做起来是很复杂的
1.联想功能需要数据库,当然小型的写个文档也行了
2.每字联想还要ajax

不用百毒,但我上gg的时候,这个功能几乎每次都被firefox提示页面响应迟缓
可以理解,因为经过某巨型过滤器的原因,所以我用gg都是关闭这个功能的
写这几句没什么特别意思,只是提醒你虽然看上去很美,但还是离不开硬件支持的,慎用花哨的东西

原理上没有问题,实现起来有点麻烦
如果用 ajax 实现,那么速度是一个问题(本地测试时不会有问题)
所以百度为了提高速度,会让你安装“百度工具条”由控件完成

为什么装了百度工具条能够提高速度呢

前阵子和公司搜索部的人打了很多交到,了解了搜索引擎的工作大致原理。

搜索引擎内部有很多词表:

停词表,建义词表,同义词表、汉字-拼音的词表、suggest。

当你在搜索引擎上输入一个中文短句,搜索引擎首先会进行分词,然后将这些词,分别去上面提到的几个词表中查找有没有相关联的信息。如你所说的,就会去查找拼音-汉字的词表。遇到guangzhou = 广州,就会自动翻译过来。然后优先拿广州去进行搜索。
当你输入一个错误词后,可能会被搜索引擎的suggest纠正过来并提示你:您要找的是不是xxx?

其实上面只是搜索引擎处理搜索请求的其中一个分支,一次搜索会并行进行很多请求。
比如你在搜索引擎输入个短句。

搜索引擎首先会确定要搜索的内容:
1 整句
2 标准分词(可以理解为按中文语法分词)
3 自然分词(按单字、空格、标点进行分词)
...

然后分表拿每个分支,上面提到的那些辅助的词表,优化将要搜索的内容。
几个分支同时请求,拿到多个结果集。
接下来就是处理排序的问题了,一般来说,整句搜索拿到的结果相关度最高,所以权重也最高,理应排在第一位。但现实中的搜索引擎可能还要考虑到推广位,以及你要搜索的内容有更加官方的结果(比如你搜nginx,nginx的官方网站应该排在第一位)。或者是百度的百度推广,它可能会放在前面。

大致就是这样,实际上排序的逻辑是非常复杂的。它会根据好几个维度来确定排序结果,他们称这些叫“曲线”。当他调整每个维度的参数后,对排序结果都会产生影响。


在百度中  输入guangzhou下方就会提示广州、广州新闻。我在想百度是不是将一些热门关键字,然后用一个字段记住这些关键字的拼音;搜索的时候就直接查这个表。如果是拼音,就模糊匹配这个拼音标示列。完成匹配后将结果返回。这些只是我的想象,好像当中还有关键字权重机制。谷歌找不到相关资料;是不是有其它方式实现我没想到的呢。求助大侠 谢谢!



原理步骤
(1)获取拼音,转换成最可能的中文汉字。
(2)在这一串汉字或字符串中最可能的排越前。



至于为什么哪些是最可能的。这个是来自于数据分析结果,排序的最可能排最前面。百度每天的使用人次不止1亿次,通过数据分析当然使用越多越精准。




default7  有没些简单的搜索排序算法介绍一下呢。

如果弄清了这个问题, 对面试百度技术不会有什么问题. 可以透露一点, 由于大访问量和速度的原因, 不会直接访问关系数据库.

楼上是不是百度的大神啊  可不可以再多一点,我想做个简单的。 你透露的太一点点了阿 

前阵子和公司搜索部的人打了很多交到,了解了搜索引擎的工作大致原理。

搜索引擎内部有很多词表:

停词表,建义词表,同义词表、汉字-拼音的词表、suggest。

当你在搜索引擎上输入一个中文短句,搜索引擎首先会进行分词,然后将这些词,分别去上面提到的几个词表中查找有没有相关联的信息。如你所说的,就会去查找拼音-汉字的词表。遇到guangzhou = 广州,就……
有什么相关论文可以推荐一下 吗?

Statement
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
How can you prevent session fixation attacks?How can you prevent session fixation attacks?Apr 28, 2025 am 12:25 AM

Effective methods to prevent session fixed attacks include: 1. Regenerate the session ID after the user logs in; 2. Use a secure session ID generation algorithm; 3. Implement the session timeout mechanism; 4. Encrypt session data using HTTPS. These measures can ensure that the application is indestructible when facing session fixed attacks.

How do you implement sessionless authentication?How do you implement sessionless authentication?Apr 28, 2025 am 12:24 AM

Implementing session-free authentication can be achieved by using JSONWebTokens (JWT), a token-based authentication system where all necessary information is stored in the token without server-side session storage. 1) Use JWT to generate and verify tokens, 2) Ensure that HTTPS is used to prevent tokens from being intercepted, 3) Securely store tokens on the client side, 4) Verify tokens on the server side to prevent tampering, 5) Implement token revocation mechanisms, such as using short-term access tokens and long-term refresh tokens.

What are some common security risks associated with PHP sessions?What are some common security risks associated with PHP sessions?Apr 28, 2025 am 12:24 AM

The security risks of PHP sessions mainly include session hijacking, session fixation, session prediction and session poisoning. 1. Session hijacking can be prevented by using HTTPS and protecting cookies. 2. Session fixation can be avoided by regenerating the session ID before the user logs in. 3. Session prediction needs to ensure the randomness and unpredictability of session IDs. 4. Session poisoning can be prevented by verifying and filtering session data.

How do you destroy a PHP session?How do you destroy a PHP session?Apr 28, 2025 am 12:16 AM

To destroy a PHP session, you need to start the session first, then clear the data and destroy the session file. 1. Use session_start() to start the session. 2. Use session_unset() to clear the session data. 3. Finally, use session_destroy() to destroy the session file to ensure data security and resource release.

How can you change the default session save path in PHP?How can you change the default session save path in PHP?Apr 28, 2025 am 12:12 AM

How to change the default session saving path of PHP? It can be achieved through the following steps: use session_save_path('/var/www/sessions');session_start(); in PHP scripts to set the session saving path. Set session.save_path="/var/www/sessions" in the php.ini file to change the session saving path globally. Use Memcached or Redis to store session data, such as ini_set('session.save_handler','memcached'); ini_set(

How do you modify data stored in a PHP session?How do you modify data stored in a PHP session?Apr 27, 2025 am 12:23 AM

TomodifydatainaPHPsession,startthesessionwithsession_start(),thenuse$_SESSIONtoset,modify,orremovevariables.1)Startthesession.2)Setormodifysessionvariablesusing$_SESSION.3)Removevariableswithunset().4)Clearallvariableswithsession_unset().5)Destroythe

Give an example of storing an array in a PHP session.Give an example of storing an array in a PHP session.Apr 27, 2025 am 12:20 AM

Arrays can be stored in PHP sessions. 1. Start the session and use session_start(). 2. Create an array and store it in $_SESSION. 3. Retrieve the array through $_SESSION. 4. Optimize session data to improve performance.

How does garbage collection work for PHP sessions?How does garbage collection work for PHP sessions?Apr 27, 2025 am 12:19 AM

PHP session garbage collection is triggered through a probability mechanism to clean up expired session data. 1) Set the trigger probability and session life cycle in the configuration file; 2) You can use cron tasks to optimize high-load applications; 3) You need to balance the garbage collection frequency and performance to avoid data loss.

See all articles

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

SublimeText3 English version

SublimeText3 English version

Recommended: Win version, supports code prompts!

ZendStudio 13.5.1 Mac

ZendStudio 13.5.1 Mac

Powerful PHP integrated development environment

Safe Exam Browser

Safe Exam Browser

Safe Exam Browser is a secure browser environment for taking online exams securely. This software turns any computer into a secure workstation. It controls access to any utility and prevents students from using unauthorized resources.

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

EditPlus Chinese cracked version

EditPlus Chinese cracked version

Small size, syntax highlighting, does not support code prompt function