首页  >  文章  >  数据库  >  PHP 思维中的 MongoDB,第 3 部分

PHP 思维中的 MongoDB,第 3 部分

WBOY
WBOY原创
2016-06-07 16:28:13913浏览

这是该系列的第 3 部分,重点关注使用文档数据库的数据建模方面。前面的部分也可供阅读:第 1 部分:入门和第 2 部分:查询和索引。常见嫌疑犯艾尔

这是系列中的第 3 部分,重点关注使用文档数据库的数据建模方面。前面的部分也可供阅读:第 1 部分:入门和第 2 部分:查询和索引。

常见嫌疑人

尽管有大量关于数据建模以利用文档数据库的现有文章、演示文稿和网络广播,但作为本系列的一部分,这篇文章采取了稍微以 PHP 为中心的立场。不过,无论他们选择哪种编程语言,此信息对任何人都有用。

我们将使用两种不同的场景来研究文档世界中的数据建模,选择作为常见示例来说明关系数据库和文档数据库之间的实现差异:

  • 博客。您的花园各种数据,涵盖帖子、评论和标签
  • 私人销售/电子商务。查看订单、用户和产品的需求

场景 1:获取所有博客

我从经过尝试但真实的博客示例开始,因为它是一个常见的参考框架。评估 MongoDB 的最大挑战之一是快速理解文档数据建模并摆脱关系数据建模限制的能力。一个常见的场景有助于说明这一点。

您的典型博客将需要考虑以下元素:

作者在大多数情况下,这可以指向现有的用户数据帖子您存储每个博客帖子的实际数据的位置评论所有博客帖子都需要评论类别您如何选择按类别分组来组织帖子标签标签帮助人们查找帖子,并帮助您将相关帖子链接在一起

典型的第三范式关系模型将生成大约六个表。对于多对多关系(例如帖子到标签),您需要引用表以及带有键的链接表。例如,对于标签,您可能有 posts 表、tags 表和 posts_tags 表,该表简单地将每个引用的 post_id 与 tag_id 链接起来。这不仅使您的数据模型变得复杂,而且显示了关系理论和文档结构之间的脱节。

以下是如何使用关系模型将您的帖子挂钩到标签的示例:

PHP 思维中的 MongoDB,第 3 部分

这种方法可能会使您的代码变得复杂,因为您必须编写更复杂的查询来将多个表连接在一起。如果您的应用程序还需要创建新的帖子和标签,那么该逻辑也会更加复杂。如果您使用框架,您可能会发现自己花费更多时间来弄清楚如何处理数据库插入和更新,而不是实际编写代码。

例如,每次创建新博客文章时,您都必须执行以下操作:

  • 首先检查该类别是否存在,如果不存在则创建它
  • 检查标签,也创建它们
  • 为本文使用的每个标签创建帖子和标签之间的链接
  • 作者链接

评论当然是在帖子发布后发生的,所以插入它们会省点力气:

  • 检查此评论是否回复其他评论
  • 插入评论

使用 MongoDB,您可以通过几种方法来解决这个问题。

等值连接和嵌入列表

以我们的帖子和标签为例,您可以通过在帖子文档中存储 tag_ids 列表来删除中间部分。这就是文档模型的样子:

	> db.posts.findOne();
	{
		"_id" : ObjectId("508d27069cc1ae293b36928d"),
		"title" : "This is the title",
		"body" : "This is the body text.",
		"tags" : [
			ObjectId("508d35349cc1ae293b369299"),
			ObjectId("508d35349cc1ae293b36929a"),
			ObjectId("508d35349cc1ae293b36929b"),
			ObjectId("508d35349cc1ae293b36929c")
		],
		"created_date" : ISODate("2012-10-28T12:41:39.110Z"),
		"author_id" : ObjectId("508d280e9cc1ae293b36928e"),
		"category_id" : ObjectId("508d29709cc1ae293b369295"),
		"comments" : [
			ObjectId("508d359a9cc1ae293b3692a0"),
			ObjectId("508d359a9cc1ae293b3692a1"),
			ObjectId("508d359a9cc1ae293b3692a2")
		]
	}

此方法假定您将标签和评论存储在它们自己的集合中。当然,您的用户也位于单独的集合中。对于 MongoDB,等连接的工作方式与关系模型中的工作方式类似,但是您必须执行单独的查询来获取该数据。

您可能会问自己那么为什么运行单独的查询比带有一些连接的一条 SQL 语句更好? ​​考虑一下关系数据库的查询缓存。该一个查询将命中多个表,其中任何一个只需要更新一次,该查询就会从缓存中删除。

MongoDB caches queries too, however with separate queries for each collection, an update to one of these collections does not invalidate the cache for the others; and for example, a user updating their password will not touch the cached post, comments, categories, or tags. That same event on a relational database will drop the query from the cache, even though the data being updated had nothing to do in particular with that query.

One last comment about the monster SQL statement approach: Many platforms and frameworks are breaking out the logic that pulls the content for a page, and separating that from the blocks that typically populate the sidebars and footer. For example, the comments are always rendered from a separate module. If you are using a complex, heavy platform that means you have to run separate queries for the post and comments anyway, as the comments module won’t have access to the post content object.

The simplest example is running a single query to fetch the content of your blog post and render it in the main body of your app, and then run a separate query for grabbing all the comments and displaying them in a separate comments module at the bottom of your content area. Although you still can enforce relational integrity, at this point you are getting a minimized benefit from a relational engine as you are displaying related data from separate queries. Some modern platforms will do this with everything, including separating queries for authors, categories, and tags — so you’re running separate queries in the end regardless of database platform.

Embedded Lists, No Join

You could also just embed all of your tags and comments in each post, dropping your count to just the posts collection, and no longer needing a separate collection for comments or tags.

	> db.posts.findOne();
	{
		"_id" : ObjectId("508d27069cc1ae293b36928d"),
		"title" : "This is the title",
		"body" : "This is the body text.",
		"tags" : [
			"chocolate",
			"spleen",
			"piano",
			"spatula"
		],
		"created_date" : ISODate("2012-10-28T12:41:39.110Z"),
		"author_id" : ObjectId("508d280e9cc1ae293b36928e"),
		"category_id" : ObjectId("508d29709cc1ae293b369295"),
		"comments" : [
			{
				"subject" : "This is coment 1",
				"body" : "This is the body of comment 1.",
				"author_id" : ObjectId("508d345f9cc1ae293b369296"),
				"created_date" : ISODate("2012-10-28T13:34:23.929Z")
			},
			{
				"subject" : "This is coment 2",
				"body" : "This is the body of comment 2.",
				"author_id" : ObjectId("508d34739cc1ae293b369297"),
				"created_date" : ISODate("2012-10-28T13:34:43.192Z")
			},
			{
				"subject" : "This is coment 3",
				"body" : "This is the body of comment 3.",
				"author_id" : ObjectId("508d34839cc1ae293b369298"),
				"created_date" : ISODate("2012-10-28T13:34:59.336Z")
			}
		]
	}

This greatly speeds the assembly of data for rendering that page, as one query returns the post, tags and comments.

So which approach is better?

With document design you need to consider two things: scale and search. ScaleMongoDB Documents have a limit of 16MB, which although sounds quite small, can accommodate thousands of documents. However if you are expecting 20,000 comments per post on a high traffic website, or your comments are unlimited in size, then embedding might not work well for you.SearchDepending on how you want to find your documents, you should consider their structure. MongoDB makes it dead simple to embed lists and even other documents, but if you find yourself constantly reaching deeper and deeper to find the data you need, then performance can become a problem as your data set grows.

You must weigh both of these to decide what approach makes the most sense for your application’s needs.

Outro, or What’s Coming Next

The next article in this series covers Example Scenario 2: Private Sales. That article examines the issues of managing inventory and making atomic updates.

声明:
本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系admin@php.cn