在百度中 输入guangzhou下方就会提示广州、广州新闻。我在想百度是不是将一些热门关键字,然后用一个字段记住这些关键字的拼音;搜索的时候就直接查这个表。如果是拼音,就模糊匹配这个拼音标示列。完成匹配后将结果返回。这些只是我的想象,好像当中还有关键字权重机制。谷歌找不到相关资料;是不是有其它方式实现我没想到的呢。求助大侠 谢谢!
回复讨论(解决方案)
百度搜索时的下拉菜单 原理是一样的 ,再具体就是搜索技术了,不懂了
又到老徐showtime了...
我觉得你的想法应该是正确的.
一个小功能,但做起来是很复杂的
1.联想功能需要数据库,当然小型的写个文档也行了
2.每字联想还要ajax
不用百毒,但我上gg的时候,这个功能几乎每次都被firefox提示页面响应迟缓
可以理解,因为经过某巨型过滤器的原因,所以我用gg都是关闭这个功能的
写这几句没什么特别意思,只是提醒你虽然看上去很美,但还是离不开硬件支持的,慎用花哨的东西
原理上没有问题,实现起来有点麻烦
如果用 ajax 实现,那么速度是一个问题(本地测试时不会有问题)
所以百度为了提高速度,会让你安装“百度工具条”由控件完成
为什么装了百度工具条能够提高速度呢
前阵子和公司搜索部的人打了很多交到,了解了搜索引擎的工作大致原理。
搜索引擎内部有很多词表:
停词表,建义词表,同义词表、汉字-拼音的词表、suggest。
当你在搜索引擎上输入一个中文短句,搜索引擎首先会进行分词,然后将这些词,分别去上面提到的几个词表中查找有没有相关联的信息。如你所说的,就会去查找拼音-汉字的词表。遇到guangzhou = 广州,就会自动翻译过来。然后优先拿广州去进行搜索。
当你输入一个错误词后,可能会被搜索引擎的suggest纠正过来并提示你:您要找的是不是xxx?
其实上面只是搜索引擎处理搜索请求的其中一个分支,一次搜索会并行进行很多请求。
比如你在搜索引擎输入个短句。
搜索引擎首先会确定要搜索的内容:
1 整句
2 标准分词(可以理解为按中文语法分词)
3 自然分词(按单字、空格、标点进行分词)
...
然后分表拿每个分支,上面提到的那些辅助的词表,优化将要搜索的内容。
几个分支同时请求,拿到多个结果集。
接下来就是处理排序的问题了,一般来说,整句搜索拿到的结果相关度最高,所以权重也最高,理应排在第一位。但现实中的搜索引擎可能还要考虑到推广位,以及你要搜索的内容有更加官方的结果(比如你搜nginx,nginx的官方网站应该排在第一位)。或者是百度的百度推广,它可能会放在前面。
大致就是这样,实际上排序的逻辑是非常复杂的。它会根据好几个维度来确定排序结果,他们称这些叫“曲线”。当他调整每个维度的参数后,对排序结果都会产生影响。
在百度中 输入guangzhou下方就会提示广州、广州新闻。我在想百度是不是将一些热门关键字,然后用一个字段记住这些关键字的拼音;搜索的时候就直接查这个表。如果是拼音,就模糊匹配这个拼音标示列。完成匹配后将结果返回。这些只是我的想象,好像当中还有关键字权重机制。谷歌找不到相关资料;是不是有其它方式实现我没想到的呢。求助大侠 谢谢!
原理步骤
(1)获取拼音,转换成最可能的中文汉字。
(2)在这一串汉字或字符串中最可能的排越前。
至于为什么哪些是最可能的。这个是来自于数据分析结果,排序的最可能排最前面。百度每天的使用人次不止1亿次,通过数据分析当然使用越多越精准。
default7 有没些简单的搜索排序算法介绍一下呢。
如果弄清了这个问题, 对面试百度技术不会有什么问题. 可以透露一点, 由于大访问量和速度的原因, 不会直接访问关系数据库.
楼上是不是百度的大神啊 可不可以再多一点,我想做个简单的。 你透露的太一点点了阿
前阵子和公司搜索部的人打了很多交到,了解了搜索引擎的工作大致原理。
搜索引擎内部有很多词表:
停词表,建义词表,同义词表、汉字-拼音的词表、suggest。
当你在搜索引擎上输入一个中文短句,搜索引擎首先会进行分词,然后将这些词,分别去上面提到的几个词表中查找有没有相关联的信息。如你所说的,就会去查找拼音-汉字的词表。遇到guangzhou = 广州,就……
有什么相关论文可以推荐一下 吗?

Long URLs, often cluttered with keywords and tracking parameters, can deter visitors. A URL shortening script offers a solution, creating concise links ideal for social media and other platforms. These scripts are valuable for individual websites a

Following its high-profile acquisition by Facebook in 2012, Instagram adopted two sets of APIs for third-party use. These are the Instagram Graph API and the Instagram Basic Display API.As a developer building an app that requires information from a

Laravel simplifies handling temporary session data using its intuitive flash methods. This is perfect for displaying brief messages, alerts, or notifications within your application. Data persists only for the subsequent request by default: $request-

This is the second and final part of the series on building a React application with a Laravel back-end. In the first part of the series, we created a RESTful API using Laravel for a basic product-listing application. In this tutorial, we will be dev

Laravel provides concise HTTP response simulation syntax, simplifying HTTP interaction testing. This approach significantly reduces code redundancy while making your test simulation more intuitive. The basic implementation provides a variety of response type shortcuts: use Illuminate\Support\Facades\Http; Http::fake([ 'google.com' => 'Hello World', 'github.com' => ['foo' => 'bar'], 'forge.laravel.com' =>

The PHP Client URL (cURL) extension is a powerful tool for developers, enabling seamless interaction with remote servers and REST APIs. By leveraging libcurl, a well-respected multi-protocol file transfer library, PHP cURL facilitates efficient execution of various network protocols, including HTTP, HTTPS, and FTP. This extension offers granular control over HTTP requests, supports multiple concurrent operations, and provides built-in security features.

Do you want to provide real-time, instant solutions to your customers' most pressing problems? Live chat lets you have real-time conversations with customers and resolve their problems instantly. It allows you to provide faster service to your custom

The 2025 PHP Landscape Survey investigates current PHP development trends. It explores framework usage, deployment methods, and challenges, aiming to provide insights for developers and businesses. The survey anticipates growth in modern PHP versio


Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

Dreamweaver CS6
Visual web development tools

SecLists
SecLists is the ultimate security tester's companion. It is a collection of various types of lists that are frequently used during security assessments, all in one place. SecLists helps make security testing more efficient and productive by conveniently providing all the lists a security tester might need. List types include usernames, passwords, URLs, fuzzing payloads, sensitive data patterns, web shells, and more. The tester can simply pull this repository onto a new test machine and he will have access to every type of list he needs.

Safe Exam Browser
Safe Exam Browser is a secure browser environment for taking online exams securely. This software turns any computer into a secure workstation. It controls access to any utility and prevents students from using unauthorized resources.

EditPlus Chinese cracked version
Small size, syntax highlighting, does not support code prompt function

mPDF
mPDF is a PHP library that can generate PDF files from UTF-8 encoded HTML. The original author, Ian Back, wrote mPDF to output PDF files "on the fly" from his website and handle different languages. It is slower than original scripts like HTML2FPDF and produces larger files when using Unicode fonts, but supports CSS styles etc. and has a lot of enhancements. Supports almost all languages, including RTL (Arabic and Hebrew) and CJK (Chinese, Japanese and Korean). Supports nested block-level elements (such as P, DIV),
