P粉930448030 2023-08-23 14:08:54
@StoneHeart
I will always use EAV and MVC.
@Bill Karvin
All the things you mentioned here:
In my opinion, none of these belong in the database, because no database can handle these interactions and requirements as well as the application's programming language can.
In my opinion, using a database this way is like driving a nail with a rock. You can do it, but shouldn't you use a hammer, which is more precise and designed for the job?
This problem can be solved by running a small number of queries against part of the data and assembling the results into a tabular layout in the application. Even if you have 600GB of product data and need every row of that table, you can process it in batches.
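A minimal sketch of that batching idea, using Python's built-in sqlite3 module. The product and product_attr tables, the column names, and the pivot_in_batches helper are all hypothetical, made up for illustration; a real system would use its own schema and a much larger batch size.

```python
import sqlite3

# Hypothetical EAV schema and sample data, assumed for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product_attr (product_id INTEGER, attr TEXT, value TEXT);
INSERT INTO product VALUES (1, 'laptop'), (2, 'shirt'), (3, 'book');
INSERT INTO product_attr VALUES
  (1, 'ram_gb', '16'), (1, 'cpu', 'x86'), (2, 'size', 'M');
""")

def pivot_in_batches(conn, batch_size=2):
    """Yield one dict per product, reading batch_size products at a time."""
    last_id = 0
    while True:
        # Keyset pagination: fetch the next batch of products by id.
        products = conn.execute(
            "SELECT id, name FROM product WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size)).fetchall()
        if not products:
            return
        ids = [pid for pid, _ in products]
        placeholders = ",".join("?" * len(ids))
        # One query fetches every attribute row for this batch.
        attr_rows = conn.execute(
            f"SELECT product_id, attr, value FROM product_attr "
            f"WHERE product_id IN ({placeholders})", ids).fetchall()
        # Pivot attribute rows into one dict ("table row") per product.
        by_id = {pid: {"id": pid, "name": name} for pid, name in products}
        for pid, attr, value in attr_rows:
            by_id[pid][attr] = value
        for pid in ids:
            yield by_id[pid]
        last_id = ids[-1]

table = list(pivot_in_batches(conn, batch_size=2))
```

Each batch costs two queries no matter how many attributes exist, and the pivoting happens in the application rather than in SQL.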
Furthermore, if you want to improve query performance, you can pick specific operations, such as reporting or global text search, prepare index tables that store just the data they need, and regenerate those tables periodically, for example every 30 minutes.
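As a rough sketch of such an index table, again with Python's sqlite3 module: the product_search table and the rebuild_search_index function are hypothetical names, and the scheduling (cron, a background worker, and so on) is left out. The function only shows the regeneration step.

```python
import sqlite3

# Hypothetical schema: EAV source tables plus a flat search table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product_attr (product_id INTEGER, attr TEXT, value TEXT);
CREATE TABLE product_search (product_id INTEGER PRIMARY KEY, search_text TEXT);
INSERT INTO product VALUES (1, 'laptop'), (2, 'shirt');
INSERT INTO product_attr VALUES
  (1, 'ram_gb', '16'), (1, 'cpu', 'x86'), (2, 'size', 'M');
""")

def rebuild_search_index(conn):
    """Regenerate the flat search table from the EAV tables."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("DELETE FROM product_search")
        conn.execute("""
            INSERT INTO product_search (product_id, search_text)
            SELECT p.id,
                   p.name || ' ' || COALESCE(GROUP_CONCAT(a.value, ' '), '')
            FROM product p
            LEFT JOIN product_attr a ON a.product_id = p.id
            GROUP BY p.id, p.name
        """)

# Call this from a scheduler, e.g. every 30 minutes.
rebuild_search_index(conn)

# The global text search now reads a single flat table.
hits = [row[0] for row in conn.execute(
    "SELECT product_id FROM product_search WHERE search_text LIKE ?",
    ("%x86%",))]
```

The trade-off is staleness: anything written after the last rebuild is invisible to the search until the next one.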
You don't even need to worry about the cost of additional data storage, as it's getting cheaper every day.
If you are still worried about the performance of the operations the application performs, you can always preprocess the data in Erlang, C, or Go and then process the optimized data further in the main application.
P粉504920992 2023-08-23 09:38:42
You have at least the following five options for modeling the type hierarchy you describe:
Single table inheritance: Use one table for all product types, with enough columns to store all attributes of all types. This means there are many columns on each row, most of which are NULL on any given row.
Class table inheritance: Use a table for products to store common attributes of all product types. Then, use a table for each product type to store the attributes specific to that product type.
Concrete table inheritance: There is no table for common product attributes. Instead, use one table per product type, storing both the common product attributes and the type-specific attributes.
Serialized LOB: Use one table for products to store the attributes common to all product types. An additional column stores a BLOB of semi-structured data in XML, YAML, JSON, or some other format. This BLOB lets you store the attributes specific to each product type. You can use fancy design patterns such as Facade and Memento to describe this. Either way, you have a blob of attributes that can't be easily queried in SQL; you have to fetch the entire blob back to the application and sort it out there.
Entity-Attribute-Value: Use one table for products, and one table that pivots the attributes into rows instead of columns. EAV is not a valid design with respect to the relational paradigm, but many people use it anyway. This is the "property pattern" mentioned in another answer. Check out other questions with the eav tag on StackOverflow to learn about some of the pitfalls.
I wrote more about this in a presentation called Extensible Data Modeling.
Other thoughts on EAV: Many people seem to favor EAV, but I don't. It looks like the most flexible solution, and therefore the best. However, keep in mind the adage TANSTAAFL (there ain't no such thing as a free lunch). Here are some of the disadvantages of EAV:
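To make the Serialized LOB option concrete, here is a small sketch using Python's sqlite3 and json modules. The table layout, the attrs column, and the products_with_min_ram helper are assumptions made up for this example. It shows the point made above: the type-specific filter runs in the application, after every blob has been fetched.

```python
import json
import sqlite3

# Hypothetical schema: common attributes as columns, the rest as JSON text.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE product (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        price REAL NOT NULL,
        attrs TEXT NOT NULL  -- serialized, type-specific attributes
    )
""")
samples = [
    ("laptop", 999.0, {"ram_gb": 16, "cpu": "x86"}),
    ("shirt", 19.5, {"size": "M"}),
    ("tablet", 450.0, {"ram_gb": 8}),
]
conn.executemany(
    "INSERT INTO product (name, price, attrs) VALUES (?, ?, ?)",
    [(name, price, json.dumps(attrs)) for name, price, attrs in samples])

def products_with_min_ram(conn, min_gb):
    """Filter on a blob attribute by decoding every row in the application."""
    names = []
    for name, attrs_json in conn.execute(
            "SELECT name, attrs FROM product ORDER BY id"):
        attrs = json.loads(attrs_json)
        # Products without a ram_gb attribute are treated as 0 here.
        if attrs.get("ram_gb", 0) >= min_gb:
            names.append(name)
    return names

big_ram = products_with_min_ram(conn, 16)
```

Common attributes such as price can still be filtered and indexed in SQL; only the blob contents need this kind of full scan.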
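There is no way to make an attribute mandatory (the equivalent of NOT NULL).
Fetching results in a conventional tabular layout is complex and expensive, because a JOIN is required for each attribute.
The flexibility EAV provides requires sacrifices in other areas, potentially making your code as complex as (or worse than) it would have been if you had solved the original problem in a more conventional way.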
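A small sketch of the JOIN-per-attribute cost, using Python's sqlite3 module. The product and product_attr tables and the attribute names are hypothetical, chosen only for illustration.

```python
import sqlite3

# Hypothetical EAV schema: one row per (product, attribute) pair.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product_attr (product_id INTEGER, attr TEXT, value TEXT);
INSERT INTO product VALUES (1, 'laptop'), (2, 'shirt');
INSERT INTO product_attr VALUES
  (1, 'ram_gb', '16'), (1, 'cpu', 'x86'), (2, 'size', 'M');
""")

# Getting two attributes back as columns takes two joins; a third
# attribute would need a third join, and so on.
rows = conn.execute("""
    SELECT p.name, ram.value AS ram_gb, cpu.value AS cpu
    FROM product p
    LEFT JOIN product_attr ram
           ON ram.product_id = p.id AND ram.attr = 'ram_gb'
    LEFT JOIN product_attr cpu
           ON cpu.product_id = p.id AND cpu.attr = 'cpu'
    ORDER BY p.id
""").fetchall()
```

Nothing in this schema can require that a product has a given attribute, so a missing one simply comes back as NULL (None in Python), as it does for the shirt row.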
And, in most cases, having that level of flexibility isn't necessary. In your question about product types, it would be simpler to create a table for each product type to store the product-specific attributes, so that at least some consistent structure can be enforced for entries of the same product type.
I would use EAV only if each row is allowed to have a different set of properties. EAV is overkill when you have a limited set of product types. Class table inheritance would be my first choice.
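A minimal sketch of class table inheritance, using Python's sqlite3 module. The table names, columns, and constraints are assumptions invented for this example, not a schema from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
conn.executescript("""
-- Common attributes of every product type.
CREATE TABLE product (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    price REAL NOT NULL
);
-- One table per product type, sharing the product's primary key.
CREATE TABLE product_laptop (
    product_id INTEGER PRIMARY KEY REFERENCES product(id),
    ram_gb     INTEGER NOT NULL,
    cpu        TEXT NOT NULL
);
CREATE TABLE product_shirt (
    product_id INTEGER PRIMARY KEY REFERENCES product(id),
    size       TEXT NOT NULL CHECK (size IN ('S', 'M', 'L'))
);
""")

conn.execute("INSERT INTO product (id, name, price) VALUES (1, 'laptop', 999.0)")
conn.execute("INSERT INTO product_laptop VALUES (1, 16, 'x86')")
conn.execute("INSERT INTO product (id, name, price) VALUES (2, 'shirt', 19.5)")
conn.execute("INSERT INTO product_shirt VALUES (2, 'M')")

# One join returns a laptop with both common and type-specific columns.
laptops = conn.execute("""
    SELECT p.name, p.price, l.ram_gb, l.cpu
    FROM product p JOIN product_laptop l ON l.product_id = p.id
""").fetchall()

# Unlike EAV, the database rejects a laptop with a missing mandatory attribute.
try:
    conn.execute("INSERT INTO product (id, name, price) VALUES (3, 'bad laptop', 1.0)")
    conn.execute("INSERT INTO product_laptop (product_id, cpu) VALUES (3, 'arm')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Adding a new product type means adding a table, which is the price paid for getting real column types and constraints on each type's attributes.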
2019 Update: The more I see people using JSON as a solution to the "many custom attributes" problem, the less I like it. Queries become too complex, even with the special JSON functions that support them. Storing JSON documents also takes more storage space than storing the same data in regular rows and columns.
Basically, none of these solutions are easy or efficient in a relational database. The whole idea of having "variable attributes" is fundamentally at odds with relational theory.
In the end, you have to choose one of these solutions based on which is the least bad for your application. So you need to know how you are going to query the data before you choose a database design. No single solution is always "best", because any one of them might be the best choice for a given application.