I am not a professional DBA, but as a developer working on B/S (browser/server) applications, I can never get away from the database. Most developers use only the four classic SQL statements (select, insert, delete, and update) without ever studying how they work. In this article, I want to talk about how select works inside the database.
The most classic topic in B/S architecture is the three-tier architecture, which is roughly divided into the data layer, the business logic layer, and the presentation layer. The data layer generally handles interaction with the database, such as querying records. We often write a SQL query and then have the program execute it. But what happens internally? Which step comes first, and which comes next? I suspect that most of my friends, like me, are not sure.
Step 1: The application sends the query SQL statement to the server for execution
When we execute the SQL statement in the data layer, the application program will connect to the corresponding database server and send the SQL statement to the server for processing.
Step 2: The server parses the requested SQL statement
1. SQL plan cache. Friends who often use Query Analyzer probably know this: a query often takes a particularly long time the first time it runs, but if you run the same statement again immediately, or within a certain period of time, the results come back quickly.
Reason:
After receiving the query request, the server does not go to the database to query immediately. Instead, it first looks in the plan cache to see whether a matching execution plan exists. If one does, the server reuses the compiled plan directly, which saves the time needed to compile an execution plan.
If the queried rows are already in the data buffer, there is no need to read the physical file; the data is fetched from the cache instead. Reading from memory is much faster than reading from the hard disk, which improves query efficiency. The data buffer is discussed again in Step 3.
2. If there is no matching execution plan in the SQL plan cache, the server first checks the syntax of the SQL statement submitted by the user. If there is a syntax error, the server ends the query and returns the corresponding error message to the calling application.
Note: The error message returned at this stage contains only basic syntax errors, such as a misspelled keyword (for example, select written as selct). If the statement references a column that does not exist in the table, the server will not catch it at this point. This step only verifies syntax; whether the semantics are correct is left to the next step.
3. Once the syntax check passes, the server verifies that the semantics are correct, for example, whether database objects such as tables, columns, and stored procedures actually exist. If any of them does not exist, the server reports an error to the application and ends the query.
4. The next step is to obtain a parsing lock on the object. When we query a table, the server first locks the object to keep the data consistent. Without the lock, another transaction could insert a row that the query then reads, and if that insert is later rolled back because its transaction fails, the query has performed a dirty read.
5. The next step is to verify the database user's permissions. Even when the syntax and semantics of the statement are correct, the query may still not return results: if the database user lacks the required access permissions, the server reports an insufficient-permissions error to the application. Larger projects often contain several database connection strings whose users have different permissions: some are read-only, some are write-only, and some can both read and write. Different operations are executed under different users, so if you are not careful, even a perfect SQL statement will be useless.
6. The last step of parsing is to determine the final execution plan. Once the syntax, semantics, and permissions have all been verified, the server still does not return results immediately. Instead, it optimizes your SQL and chooses among query algorithms so that the results come back as efficiently as possible. For example, for a query that joins tables, the server decides based on cost whether to use a hash join, a merge join, or a nested loops join, which index is more efficient, and so on. However, automatic optimization has its limits. To get efficient queries, you still need to optimize your own SQL statements.
After the execution plan is determined, the execution plan will be saved in the SQL plan cache. Next time there is the same execution request, it will be fetched directly from the plan cache to avoid recompiling the execution plan.
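The parsing stages above can be sketched as a toy model in Python. This is not how SQL Server is implemented; it only mirrors the order described in this article. The catalog, the user names, the `ToyServer` class, and the tiny `SELECT ... FROM ...` grammar are all hypothetical, the locking step is left out, and the sketch re-checks permissions on a cache hit (an assumption of mine, so that a cached plan cannot bypass the permission check).

```python
import re
from dataclasses import dataclass, field

# Hypothetical catalog and grants, standing in for real database metadata.
CATALOG = {"users": {"id", "name"}}
READ_GRANTS = {"report_reader": {"users"}, "writer_only": set()}

# Deliberately tiny grammar: SELECT col[, col...] FROM table
SELECT_RE = re.compile(
    r"^\s*select\s+(?P<cols>[\w\s,*]+?)\s+from\s+(?P<table>\w+)\s*;?\s*$",
    re.IGNORECASE,
)


class QueryError(Exception):
    """Carries the name of the stage that rejected the statement."""

    def __init__(self, stage, message):
        super().__init__(f"{stage} error: {message}")
        self.stage = stage


@dataclass
class ToyServer:
    plan_cache: dict = field(default_factory=dict)
    compilations: int = 0

    def _check_permission(self, user, table):
        if table not in READ_GRANTS.get(user, set()):
            raise QueryError("permission", f"{user} may not read {table}")

    def prepare(self, sql, user):
        """Return a plan for sql, compiling it only on a plan cache miss."""
        # Stage 1: plan cache lookup, keyed on the exact statement text.
        cached = self.plan_cache.get(sql)
        if cached is not None:
            self._check_permission(user, cached["table"])
            return cached

        # Stage 2: syntax check only; object names are not looked up yet.
        match = SELECT_RE.match(sql)
        if match is None:
            raise QueryError("syntax", "statement is not a valid SELECT")

        # Stage 3: semantic check; do the table and columns exist?
        table = match["table"].lower()
        columns = [c.strip().lower() for c in match["cols"].split(",")]
        if table not in CATALOG:
            raise QueryError("semantic", f"unknown table {table}")
        missing = [c for c in columns if c != "*" and c not in CATALOG[table]]
        if missing:
            raise QueryError("semantic", f"unknown column(s) {missing}")

        # Stage 5: permission check (stage 4, locking, is not modelled).
        self._check_permission(user, table)

        # Stage 6: "optimize" (here a fixed plan) and save it in the cache.
        plan = {"operator": "table scan", "table": table, "columns": columns}
        self.compilations += 1
        self.plan_cache[sql] = plan
        return plan
```

Calling `prepare("select name from users", "report_reader")` twice on one `ToyServer` compiles the plan once; the second call is served from `plan_cache`, which is the effect described in item 1.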
Step 3: Statement execution
After the server completes parsing the SQL statement, the server will know what the statement means, and then it will actually execute the SQL statement.
There are two situations at this time:
If the rows the query needs have already been read into the data buffer, the server reads them directly from the buffer and returns them to the application. This avoids reading from the physical file and improves query speed.
If the data row is not in the data buffer, the record will be read from the physical file and returned to the application, and the data row will be written to the data buffer for next use.
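The two situations can be sketched with a small in-memory model. This is only an illustration: `ToyBufferPool`, the `disk` dictionary that stands in for the physical file, and the least-recently-used eviction are my own assumptions, not a description of SQL Server's actual buffer manager.

```python
from collections import OrderedDict


class ToyBufferPool:
    """Keeps recently read pages in memory and counts physical reads."""

    def __init__(self, disk, capacity=2):
        self.disk = disk              # dict standing in for the data file
        self.capacity = capacity      # maximum number of pages kept in memory
        self.pages = OrderedDict()    # page_id -> data, oldest first
        self.physical_reads = 0

    def read(self, page_id):
        # Situation 1: the page is already buffered, so no file access.
        if page_id in self.pages:
            self.pages.move_to_end(page_id)
            return self.pages[page_id]

        # Situation 2: read from the "file" and keep a copy for next time.
        self.physical_reads += 1
        data = self.disk[page_id]
        self.pages[page_id] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)   # evict least recently used page
        return data


pool = ToyBufferPool({1: "row-1", 2: "row-2", 3: "row-3"})
pool.read(1)   # physical read
pool.read(1)   # served from the buffer
print(pool.physical_reads)
```

The second `read(1)` does not touch `disk`, which is why a repeated query tends to be faster than the first run.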
Note: There are several types of SQL cache; friends who are interested can search for them. Because of the cache, it is sometimes hard to see the effect of an optimization right away: the second execution is particularly fast regardless. So the cache is generally cleared first, and then the performance before and after optimization is compared. Here are a few common methods:
DBCC DROPCLEANBUFFERS
Removes all clean buffers from the buffer pool. Run CHECKPOINT first so that dirty pages are written to disk and can be dropped as well.
DBCC FREEPROCCACHE
Removes all elements from the procedure cache (the plan cache).
DBCC FREESYSTEMCACHE ('ALL')
Releases all unused cache entries from all caches. The SQL Server 2005 Database Engine proactively cleans up unused cache entries in the background to make memory available for current entries. However, you can use this command to manually remove unused entries from all caches.
These commands remove most, but not all, of the cache's influence. I do not know of a way to eliminate the cache effect completely; if you know one, please give me some advice.
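The habit of clearing the cache before comparing timings can be wrapped in a small helper. The sketch below is generic Python, not tied to any database driver: `time_cold_and_warm`, `run_query`, and `clear_caches` are hypothetical names, and in a real test `clear_caches` would be whatever function issues the DBCC commands above through your own connection. Here a dictionary plays the role of the cache so the example runs on its own.

```python
import time


def time_cold_and_warm(run_query, clear_caches):
    """Return (cold_seconds, warm_seconds) for one query callable."""
    clear_caches()                       # start from an empty cache
    start = time.perf_counter()
    run_query()                          # first run pays the full cost
    cold = time.perf_counter() - start

    start = time.perf_counter()
    run_query()                          # same query again, cache now warm
    warm = time.perf_counter() - start
    return cold, warm


# Stand-ins for a real query and a real cache (assumptions for this demo).
cache = {}
slow_reads = []


def run_query():
    if "q" not in cache:
        slow_reads.append(1)             # record that the slow path was taken
        cache["q"] = sum(range(100_000))
    return cache["q"]


cold, warm = time_cold_and_warm(run_query, cache.clear)
print(f"cold: {cold:.6f}s  warm: {warm:.6f}s")
```

Comparing the cold numbers before and after a change, rather than a cold run against a warm one, keeps the cache from being mistaken for the optimization.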
Conclusion: Only by understanding how the server processes the SQL that the application submits can we debug our applications well. When a query fails, check the following:
Make sure the SQL syntax is correct;
Make sure the SQL semantics are correct, that is, the referenced objects exist;
Make sure the database user has the corresponding access rights.
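The first two checks can be tried out with Python's built-in `sqlite3` module. SQLite is only a stand-in here: it is a different engine from the server discussed in this article, its error messages are its own, and it has no database users, so the permission check is not covered. The `classify_failure` helper and the `users` table are hypothetical names for this sketch.

```python
import sqlite3


def classify_failure(conn, sql):
    """Return 'ok', 'syntax', 'missing object', or 'other' for a statement."""
    try:
        conn.execute(sql).fetchall()
    except sqlite3.Error as exc:
        message = str(exc)
        if "syntax error" in message:
            return "syntax"
        if message.startswith(("no such table", "no such column")):
            return "missing object"
        return "other"
    return "ok"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")

for sql in (
    "SELCT name FROM users",        # misspelled keyword
    "SELECT nickname FROM users",   # column does not exist
    "SELECT name FROM userz",       # table does not exist
    "SELECT name FROM users",       # valid statement
):
    print(f"{classify_failure(conn, sql):15} {sql}")
```

A misspelled keyword is rejected before any object names are looked up, while a well-formed statement that names a missing table or column fails one step later, which matches the order of the checklist.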