原文連結:https://i18n.site/blog/tech/search
經過幾週的開發,i18n.site(純靜態Markdown多語言翻譯網站架設工具)現已支援純前端全文搜尋。
本文將分享i18n.site純前端全文搜尋的技術實作。存取 i18n.site 體驗搜尋功能。
程式碼開源:搜尋核心/互動介面
對於文件/個人部落格等中小型純靜態網站,建立自建全文搜尋後端過於繁重,無服務全文搜尋是更常見的選擇。
無伺服器全文搜尋解決方案分為兩大類:
第一個涉及第三方搜尋服務供應商,例如 algolia.com,它們提供用於全文搜尋的前端元件。
此類服務需要根據搜尋量付費,且由於合規問題通常無法向中國大陸用戶提供。
它們不能離線或在內部網路上使用,並且有很大的限制。本文不再詳述。
第二類是純前端全文檢索。
目前常見的純前端全文檢索工具有lunrjs和ElasticLunr.js(基於lunrjs的二次開發)。
lunrjs 有兩種建立索引的方法,但都有各自的問題。
由於索引包含了文件中的所有單字,因此其尺寸較大。
每次新增或修改文件時,都必須載入新的索引檔。
這會增加用戶等待時間並消耗大量頻寬。
建立索引是一項計算密集型任務,每次訪問時重建索引可能會導致明顯的延遲,從而導致糟糕的用戶體驗。
除了lunrjs之外,還有其他全文搜尋解決方案,例如:
fusejs,透過計算字串之間的相似度進行搜尋。
此方案效能較差,不適合全文檢索(參考Fuse.js 查詢時間長超過10秒,如何最佳化?)。
TinySearch使用布隆過濾器進行搜索,無法進行前綴搜索(例如輸入goo搜索good或google),無法實現自動完成效果。
針對現有解決方案的缺陷,i18n.site開發了全新的純前端全文搜尋解決方案,具有以下特點:
以下將介紹i18n.site技術實現的細節。
分詞使用瀏覽器原生的Intl.Segmenter,所有主流瀏覽器都支援。
分詞的coffeescript程式碼如下:
SEG = new Intl.Segmenter 0, granularity: "word" seg = (txt) => r = [] for {segment} from SEG.segment(txt) for i from segment.split('.') i = i.trim() if i and !'|`'.includes(i) and !/\p{P}/u.test(i) r.push i r export default seg export segqy = (q) => seg q.toLocaleLowerCase()
地點:
此方法允許增量索引,減少計算負載。
In the front-end interface, a progress bar for index loading can be displayed to avoid lag during the initial load. See "Animated Progress Bar, Based on a Single progress + Pure CSS Implementation" English / Chinese.
The project is developed based on the asynchronous encapsulation of IndexedDB, idb.
IndexedDB reads and writes are asynchronous. When creating an index, documents are loaded concurrently to build the index.
To avoid data loss due to concurrent writes, you can refer to the following coffeescript code, which adds a ing cache between reading and writing to intercept competitive writes.
`coffee
pusher = =>
ing = new Map()
(table, id, val)=>
id_set = ing.get(id)
if id_set
id_set.add val
return
id_set = new Set([val]) ing.set id, id_set pre = await table.get(id) li = pre?.li or [] loop to_add = [...id_set] li.push(...to_add) await table.put({id,li}) for i from to_add id_set.delete i if not id_set.size ing.delete id break return
rindexPush = pusher()
prefixPush = pusher()
`
To display search results in real-time as the user types, for example, showing words like words and work that start with wor when wor is entered.
The search kernel uses the prefix table for the last word after segmentation to find all words with that prefix and search sequentially.
An anti-shake function, debounce (implemented as follows), is used in the front-end interaction to reduce the frequency of searches triggered by user input, thus minimizing computational load.
js
export default (wait, func) => {
var timeout;
return function(...args) {
clearTimeout(timeout);
timeout = setTimeout(func.bind(this, ...args), wait);
};
}
The search first segments the keywords entered by the user.
Assuming there are N words after segmentation, the results are first returned with all keywords, followed by results with N-1, N-2, ..., 1 keywords.
The search results displayed first ensure query precision, while subsequent loaded results (click the "Load More" button) ensure recall.
To improve response speed, the search uses the yield generator to implement on-demand loading, returning results after each limit query.
Note that after each yield, a new IndexedDB query transaction must be opened for the next search.
To display search results in real-time as the user types, for example, showing words like words and work that start with wor when wor is entered.
The search kernel uses the prefix table for the last word after segmentation to find all words with that prefix and search sequentially.
An anti-shake function, debounce (implemented as follows), is used in the front-end interaction to reduce the frequency of searches triggered by user input, thus minimizing computational load.
js
export default (wait, func) => {
var timeout;
return function(...args) {
clearTimeout(timeout);
timeout = setTimeout(func.bind(this, ...args), wait);
};
}
The index table does not store the original text, only words, reducing storage space.
Highlighting search results requires reloading the original text, and using service worker can avoid repeated network requests.
Also, because service worker caches all articles, once a search is performed, the entire website, including search functionality, becomes offline available.
The pure front-end search solution provided by i18n.site is optimized for MarkDown documents.
When displaying search results, the chapter name is shown, and clicking navigates to that chapter.
The pure front-end implementation of inverted full-text search, without the need for a server, is very suitable for small to medium-sized websites such as documents and personal blogs.
i18n.site's open-source self-developed pure front-end search is compact, responsive, and addresses the various shortcomings of current pure front-end full-text search solutions, providing a better user experience.
以上是純前端倒排全文搜索的詳細內容。更多資訊請關注PHP中文網其他相關文章!