I am new to database system design. After reading many articles, I am still confused about the limit up to which a single table is fine without sharding or partitioning. I know it's really difficult to give a general answer; it depends on many factors.
But when someone asks this question:
If the table has fewer than a million rows and grows by only a few thousand rows at a time, the choice is simple. But things get trickier when a table holds millions or billions of rows.
Note: I didn't specify a latency number in the question; please answer based on whatever latency you are comfortable with. Also, we are talking about structured data.
I'm not sure, but I can add 3 specific questions:
NOTE: Throughout this question, assume that we will choose an SQL solution. Also, if a use case provided doesn't make logical sense, ignore it; the aim is to acquire numerical knowledge.
Can anyone help me understand what the benchmarks are? Real numbers from a project you're currently working on would help: for a database of this size under this many queries, this is the latency observed. Anything that helps me relate table size and query volume to an expected latency.
P粉401901266 2024-01-17 09:55:18
Some answers, for MySQL. Since all databases are subject to disk space, network latency, etc., other engines may be similar.
It is possible to write a SELECT that takes hours or even days to run, so you need to understand whether a query is pathological like that. (I think this is an example of high "latency".)

PARTITIONing (especially in MySQL) has few uses. More details: Partition.

INDEXing is very important for performance.

Inserting a million rows per day is not a problem. (Of course, some schema designs may cause problems.) Rule of thumb: 100 inserts/sec may not be a problem; 1000/sec may be possible; after that it gets harder. More about high-speed ingest.

When you get into large databases, they come in a few different types, each with somewhat different characteristics.
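For the insert-rate rule of thumb, the cheapest way to get a number for your own setup is to measure it. Here is a minimal sketch using Python's built-in sqlite3 module (an assumption chosen so it runs anywhere; the answer is about MySQL, and absolute numbers vary by engine, hardware, and schema — the table name and row count are made up for illustration):

```python
import sqlite3
import time

# Illustrative only: an in-memory SQLite database, not MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")

# Batch the rows and wrap them in a single transaction; per-row
# autocommit is what usually caps naive ingest at low rates.
rows = [(i, f"payload-{i}") for i in range(100_000)]

start = time.perf_counter()
with conn:  # one transaction for the whole batch
    conn.executemany("INSERT INTO events (id, payload) VALUES (?, ?)", rows)
elapsed = time.perf_counter() - start

count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(f"inserted {count} rows in {elapsed:.3f}s "
      f"({count / elapsed:,.0f} rows/sec)")
```

Running this against your real database (and real row sizes) tells you where you sit relative to the 100/sec and 1000/sec thresholds mentioned above.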
Spatial data is one such type: it needs two-dimensional indexing, handled by SPATIAL indexes or related techniques.
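On the indexing point above, the standard way to check whether a SELECT will scan the whole table is to ask the engine for its query plan (EXPLAIN in MySQL). A minimal sketch using SQLite's EXPLAIN QUERY PLAN instead, with a made-up users table, to show the before/after effect of one index:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"user{i}@example.com") for i in range(10_000)])

query = "SELECT id FROM users WHERE email = ?"
param = ("user42@example.com",)

# The 4th column of EXPLAIN QUERY PLAN output is the plan description.
plan_before = conn.execute("EXPLAIN QUERY PLAN " + query, param).fetchone()[3]

conn.execute("CREATE INDEX idx_users_email ON users(email)")
plan_after = conn.execute("EXPLAIN QUERY PLAN " + query, param).fetchone()[3]

print("before:", plan_before)  # a full SCAN of the table
print("after: ", plan_after)   # a SEARCH using idx_users_email
```

The same query goes from a full scan (cost grows with table size, which is how "pathological" queries happen) to an index lookup. With MySQL you would read the `type` and `key` columns of EXPLAIN output instead.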