Schema Design for Social Inboxes in MongoDB

Designing a schema is a critical part of any application. As with most databases, there are many options for modeling data in MongoDB, and it is important to weigh your application’s functional requirements and performance goals when choosing a design. In this post, we’ll explore three approaches for using MongoDB to build social inboxes or message timelines.

If you’re building a social network such as Twitter, you need a schema that is efficient both for users viewing their inbox and for users sending messages to all their followers. The whole point of social media, after all, is that you can connect in real time.

There are several design considerations for this kind of application:

  • The application needs to support a potentially large volume of reads and writes.
  • Reads and writes are not uniformly distributed across users. Some users post much more frequently than others, and some users have many, many more followers than others.
  • The application must provide a user experience that is instantaneous.
  • Edit 11/6: The application will have little to no user deletion of data. (A follow-up blog post will cover user deletions and historical data.)

Because we are designing an application that needs to support a large volume of reads and writes, we will use a sharded collection for the messages. All three designs include the concept of “fan out,” which refers to distributing the work across the shards in parallel:

  1. Fan out on Read
  2. Fan out on Write
  3. Fan out on Write with Buckets

Each approach presents trade-offs, and you should use the design that is best for your application’s requirements.

The first design you might consider is called Fan Out on Read. When a user sends a message, it is simply saved to the inbox collection. When any user views their own inbox, the application queries for all messages that include the user as a recipient. The messages are returned in descending date order so that users can see the most recent messages.

To implement this design, create a sharded collection called inbox and specify the from field, which holds the address of the sender, as the shard key. You can then add a compound index on the to field and the sent field. Once the document is saved into the inbox, the message is effectively sent to all the recipients. With this approach, sending a message is very efficient because it takes a single write.

Viewing an inbox, on the other hand, is less efficient. When a user views their inbox, the application issues a find command based on the to field, sorted by sent. Because the inbox collection uses from as its shard key, messages are grouped by sender across the shards. In MongoDB, queries that are not based on the shard key are routed to all shards, so every inbox view is routed to every shard in the system. As the system scales and many users view their inboxes, the load on each shard grows with the total number of inbox views. This design does not scale as well as one in which each query is routed to a single shard.

With the “Fan Out on Read” method, sending a message is very efficient, but viewing the inbox is less efficient.
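The routing behaviour described above can be illustrated with a small in-memory simulation. This is a minimal sketch in plain Python, not MongoDB code: the shards are ordinary lists, and `NUM_SHARDS`, `shard_for`, and the `FanOutOnReadInbox` class are hypothetical names made up for illustration. Only the field names from, to, and sent come from the text.

```python
from datetime import datetime, timedelta

NUM_SHARDS = 4  # hypothetical cluster size


def shard_for(key: str) -> int:
    """Stand-in for shard-key routing: map a key to one shard deterministically."""
    return sum(key.encode("utf-8")) % NUM_SHARDS


class FanOutOnReadInbox:
    """Simulated inbox collection sharded on the sender (the from field)."""

    def __init__(self):
        self.shards = [[] for _ in range(NUM_SHARDS)]

    def send(self, msg: dict) -> int:
        """Store the message once, on the sender's shard. Returns the write count."""
        self.shards[shard_for(msg["from"])].append(msg)
        return 1

    def view(self, user: str, limit: int = 50):
        """Ask every shard for messages addressed to `user`, newest first."""
        hits, shards_queried = [], 0
        for shard in self.shards:
            shards_queried += 1  # the query does not contain the shard key
            hits.extend(m for m in shard if user in m["to"])
        hits.sort(key=lambda m: m["sent"], reverse=True)
        return hits[:limit], shards_queried


inbox = FanOutOnReadInbox()
t0 = datetime(2024, 1, 1)
writes = inbox.send(
    {"from": "Joe", "to": ["Bob", "Jane"], "sent": t0, "message": "Hi!"}
)
inbox.send(
    {"from": "Ann", "to": ["Bob"], "sent": t0 + timedelta(minutes=1), "message": "Hello"}
)
messages, shards_queried = inbox.view("Bob")
print(writes, shards_queried, [m["message"] for m in messages])
```

In this sketch a send costs one write regardless of the number of recipients, while every inbox view touches all of the simulated shards.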

Fan out on Read suits an application in which most requests send messages and very few read them (let’s call it an anti-social app). For most social apps, however, there are more requests by users to view their inbox than there are to send messages.

The Fan out on Write design takes a different approach that is optimized for viewing inboxes. This time, instead of sharding our inbox collection on the sender, we shard on the message recipient. That way, when we view an inbox the query can be routed to a single shard, which scales very well. Our message document is the same, but now we save a copy of the message for every recipient.

With the “Fan Out on Write” method, viewing the inbox is efficient, but sending messages consumes more resources.
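The same kind of in-memory simulation shows how the costs shift. Again this is a hedged sketch rather than MongoDB code: `shard_for`, `NUM_SHARDS`, and `FanOutOnWriteInbox` are hypothetical, and the `recipient` field used as the shard key is an assumed name, since the text only says that the collection is sharded on the message recipient.

```python
from datetime import datetime

NUM_SHARDS = 4  # hypothetical cluster size


def shard_for(key: str) -> int:
    """Stand-in for shard-key routing: map a key to one shard deterministically."""
    return sum(key.encode("utf-8")) % NUM_SHARDS


class FanOutOnWriteInbox:
    """Simulated inbox collection sharded on the recipient."""

    def __init__(self):
        self.shards = [[] for _ in range(NUM_SHARDS)]

    def send(self, msg: dict) -> int:
        """Save one copy of the message per recipient. Returns the write count."""
        writes = 0
        for recipient in msg["to"]:
            # `recipient` is an assumed field name for the shard key.
            copy = dict(msg, recipient=recipient)
            self.shards[shard_for(recipient)].append(copy)
            writes += 1
        return writes

    def view(self, user: str, limit: int = 50):
        """Read only the shard that owns this recipient's copies."""
        shard = self.shards[shard_for(user)]
        mine = [m for m in shard if m["recipient"] == user]
        mine.sort(key=lambda m: m["sent"], reverse=True)
        return mine[:limit], 1  # exactly one shard queried


inbox = FanOutOnWriteInbox()
writes = inbox.send(
    {"from": "Joe", "to": ["Bob", "Jane"], "sent": datetime(2024, 1, 1), "message": "Hi!"}
)
messages, shards_queried = inbox.view("Bob")
print(writes, shards_queried, [m["message"] for m in messages])
```

Here the number of writes grows with the number of recipients, but each inbox view is answered by a single simulated shard.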

In practice we might implement the saving of messages asynchronously. Imagine two celebrities quickly exchanging messages at a high-profile event: the system could quickly be saturated with millions of writes. By saving a first copy of their message, then using a pool of background workers to write copies to all followers, we can ensure that the two celebrities can exchange messages quickly and that followers will soon have their own copies. Furthermore, we could maintain a last-viewed date on the user document to check whether each user has accessed the system recently. Zombie accounts probably shouldn’t get a copy of the message, and for users who haven’t accessed their account recently we could always fall back on our first design, Fan out on Read, to repopulate their inbox. Subsequent requests would then be fast again.
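One way to picture the asynchronous variant is a queue drained by a small pool of worker threads. The sketch below is only an illustration under stated assumptions: the in-memory `outbox`, `inboxes`, and `last_viewed` dictionaries stand in for collections, and the 30-day `ACTIVE_WINDOW`, the fixed `NOW`, and the worker count are hypothetical values, not figures from the post.

```python
import queue
import threading
from datetime import datetime, timedelta

ACTIVE_WINDOW = timedelta(days=30)  # hypothetical cutoff for "recently active"
NOW = datetime(2024, 1, 31)         # fixed clock so the example is repeatable

# Hypothetical last-viewed dates kept on the user documents.
last_viewed = {
    "bob": NOW - timedelta(days=1),
    "jane": NOW - timedelta(days=2),
    "zombie": NOW - timedelta(days=400),
}
inboxes = {user: [] for user in last_viewed}
inbox_lock = threading.Lock()
outbox = []           # holds the first copy of each message
jobs = queue.Queue()  # pending per-recipient copies


def send(msg: dict) -> None:
    """Save the first copy right away, then queue the fan-out work."""
    outbox.append(msg)
    for recipient in msg["to"]:
        jobs.put((recipient, msg))


def worker() -> None:
    """Write copies in the background, skipping inactive accounts."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel: shut this worker down
            jobs.task_done()
            break
        recipient, msg = job
        if NOW - last_viewed[recipient] <= ACTIVE_WINDOW:
            with inbox_lock:
                inboxes[recipient].append(msg)
        jobs.task_done()


workers = [threading.Thread(target=worker) for _ in range(3)]
for w in workers:
    w.start()

send({"from": "celebrity", "to": ["bob", "jane", "zombie"], "message": "Hi!"})
jobs.join()  # wait until every queued copy has been handled

for _ in workers:
    jobs.put(None)
for w in workers:
    w.join()

print({user: len(msgs) for user, msgs in inboxes.items()})
```

In this sketch the sender only waits for the first copy to be saved. A real system would also need the Fan out on Read fallback mentioned above for accounts that were skipped, which is not shown here.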

At this point we have improved the design for viewing inboxes by routing each inbox view to a single shard. However, each message in the user’s inbox will produce a random read operation. If each inbox view produces 50 random reads, then it only takes a relatively modest number of concurrent users to potentially saturate the disks. Fortunately, we can take advantage of the document data model to optimize this design further.
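To make the saturation point concrete, here is a rough back-of-the-envelope calculation. The disk throughput figure is a purely hypothetical assumption, as is the premise that every message read misses the cache and goes to disk. Only the figure of 50 random reads per inbox view comes from the text.

```python
def max_inbox_views_per_second(disk_random_reads_per_second: float,
                               reads_per_view: int) -> float:
    """Upper bound on inbox views per second if every read goes to disk."""
    return disk_random_reads_per_second / reads_per_view


ASSUMED_DISK_READS_PER_SECOND = 200  # hypothetical figure, not a measurement
READS_PER_VIEW = 50                  # one random read per message, per the text

views = max_inbox_views_per_second(ASSUMED_DISK_READS_PER_SECOND, READS_PER_VIEW)
print(views)  # 4.0 inbox views per second under these assumptions
```

Whatever the real disk figure is, the bound scales inversely with the number of reads per view, which is why cutting that number matters.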

Fan out on Write with Buckets refines the Fan Out on Write design by “bucketing” messages together into documents of 50 messages ordered by time. When a user views their inbox the request can be fulfilled by reading just a few documents of 50 messages each instead of performing many random reads. Because read time is dominated by seek time, reducing the number of seeks can provide a major performance improvement to the application. Another advantage to this approach is that there are fewer index entries.

To implement this design we create two collections, an inbox collection and a user collection. The inbox collection uses two fields for the shard key, owner and sequence, which hold the owner’s user id and a sequence number (i.e., the id of a 50-message “bucket” document in their inbox). The user collection contains simple user documents that track the total number of messages in each user’s inbox. Since we will probably need to show the total number of messages for a user in a variety of places in our application, this is a nice place to maintain the count instead of calculating it for each request. Our message document is the same as in the prior examples.

To send a message we iterate through the list of recipients as we did in the Fan out on Write example, but we also take another step to increment the count of total messages in the inbox of the recipient, which is maintained on the user document. Once we know the count of messages, we know the “bucket” in which to add the latest message. As these messages reach the 50 item threshold, the sequence number increments and we begin to add messages to the next “bucket” document. The most recent messages will always be in the “bucket” document with the highest sequence number. Viewing the most recent 50 messages for a user’s inbox is at most two reads; viewing the most recent 100 messages is at most three reads.
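The bucketing steps can be sketched with two dictionaries standing in for the two collections. This is an illustrative simulation, not MongoDB code: `msg_count`, `messages`, `send`, and `view` are assumed names, and the formula `(count - 1) // BUCKET_SIZE` is one possible way to derive the sequence number from the running count so that each bucket holds exactly 50 messages.

```python
BUCKET_SIZE = 50

users = {}  # user collection: owner -> {"msg_count": int}
inbox = {}  # inbox collection: (owner, sequence) -> bucket document


def send(msg: dict) -> None:
    """Append the message to each recipient's current bucket."""
    for recipient in msg["to"]:
        # Increment the recipient's running total on the user document.
        user = users.setdefault(recipient, {"msg_count": 0})
        user["msg_count"] += 1
        # The count tells us which bucket receives the message.
        sequence = (user["msg_count"] - 1) // BUCKET_SIZE
        bucket = inbox.setdefault(
            (recipient, sequence),
            {"owner": recipient, "sequence": sequence, "messages": []},
        )
        bucket["messages"].append(msg)


def view(owner: str, limit: int = 50):
    """Read buckets from the highest sequence down until `limit` is reached."""
    sequences = sorted((s for (o, s) in inbox if o == owner), reverse=True)
    messages, reads = [], 0
    for sequence in sequences:
        if len(messages) >= limit:
            break
        reads += 1  # one bucket document read
        messages.extend(reversed(inbox[(owner, sequence)]["messages"]))
    return messages[:limit], reads


for i in range(130):
    send({"from": "joe", "to": ["bob"], "message": f"msg {i}"})

latest_50, reads_50 = view("bob", 50)
latest_100, reads_100 = view("bob", 100)
print(reads_50, reads_100, latest_50[0]["message"])
```

With 130 messages the newest bucket holds 30, so the latest 50 messages take two bucket reads and the latest 100 take three, matching the bounds stated above.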

Normally a user’s entire inbox will exist on a single shard. However, it is possible that a few user inboxes could be spread across two shards. Because our application will probably page through a user’s inbox, it is still likely that every query for these few users will be routed to a single shard.
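The effect of a range boundary falling inside one user’s inbox can be sketched as follows. The split points and shard names are entirely hypothetical, and the lookup is a simplified stand-in for range-based routing on the (owner, sequence) shard key, not a description of MongoDB internals.

```python
from bisect import bisect_right

# Hypothetical split points on the (owner, sequence) shard key. Each range
# includes its lower bound and excludes its upper bound.
SPLITS = [("carol", 0), ("joe", 40), ("pat", 0)]
# Hypothetical placement: range i lives on RANGE_SHARD[i].
RANGE_SHARD = ["shard0", "shard1", "shard2", "shard0"]


def shard_of(owner: str, sequence: int) -> str:
    """Find the shard whose key range contains (owner, sequence)."""
    return RANGE_SHARD[bisect_right(SPLITS, (owner, sequence))]


def shards_for_page(owner: str, first_seq: int, last_seq: int) -> list:
    """Shards touched when reading buckets first_seq..last_seq of one inbox."""
    return sorted({shard_of(owner, s) for s in range(first_seq, last_seq + 1)})


print(shards_for_page("bob", 0, 10))   # whole inbox on one shard
print(shards_for_page("joe", 44, 45))  # newest page of a split inbox
print(shards_for_page("joe", 39, 40))  # the one page that straddles the split
```

In this made-up layout only the page that straddles the split at ("joe", 40) touches two shards. Every other page of the same inbox is served by one.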

Fan out on Write with Buckets is generally the most scalable of these three designs for social inbox applications. Every design presents different trade-offs. In this case viewing a user’s inbox is very efficient, but writes are somewhat more complex, and more disk space is consumed. For many applications these are the right trade-offs to make.

The following table summarizes the trade-offs of the three designs:

|                          | Fan out on Read                            | Fan out on Write                         | Fan out on Write with Buckets               |
|--------------------------|--------------------------------------------|------------------------------------------|---------------------------------------------|
| Send Message Performance | Best: single write                         | Good: shard per recipient, multiple writes | Worst: shard per recipient, appends (grows) |
| Read Inbox Performance   | Worst: broadcast to all shards, random reads | Good: single shard, random reads         | Best: single shard, single read             |
| Data Size                | Best: message stored once                  | Worst: copy per recipient                | Worst: copy per recipient                   |

Schema design is one of the most important optimizations you can make for your application. We have a number of additional resources available on schema design if you are interested in learning more:

  • Check out the recording of our recent Schema Design webinar on this topic.
  • Schema Design in general in the MongoDB Manual
  • You can also view our schema design resources on the MongoDB docs page
  • If you have any schema design questions, please view the archived questions or ask a question yourself on the MongoDB User Forum.