Regarding Hibernate cache issues
1. About hibernate caching:
1.1. Basic caching principle
Hibernate's cache has two levels. The first level lives inside the Session and is called the first-level cache. It is enabled by default and cannot be turned off.
The second level is a process-scoped cache managed by the SessionFactory. It is shared globally, and any query path that goes through the second-level cache benefits from it. The second-level cache only works when it is configured correctly, and conditional queries must use methods that actually read from it, such as Query.iterate() or the load() and get() methods. Note that session.find() always fetches from the database and never reads the second-level cache, even when the cache already holds the required data.
When a query runs, the lookup order is: first the first-level cache; if the data is not there, the second-level cache; and only if it is missing there as well, the database. Note that these three sources are checked in order of decreasing speed: the first-level cache is fastest and the database is slowest.
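The lookup order above can be sketched with a toy model. This is illustrative only: the class and field names below are hypothetical stand-ins, not Hibernate API, and the maps merely simulate the per-Session first-level cache, the per-SessionFactory second-level cache, and the database.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of Hibernate's lookup order: L1 cache -> L2 cache -> database.
// Names are hypothetical; this is not Hibernate API.
public class CacheLookupSketch {
    public final Map<Long, String> firstLevel = new HashMap<>();  // per-Session
    public final Map<Long, String> secondLevel = new HashMap<>(); // per-SessionFactory
    public final Map<Long, String> database = new HashMap<>();    // stand-in for the DB

    public String lastSource; // records where the most recent hit came from

    public String load(Long id) {
        String value = firstLevel.get(id);
        if (value != null) { lastSource = "L1"; return value; }
        value = secondLevel.get(id);
        if (value != null) {
            lastSource = "L2";
            firstLevel.put(id, value); // promote into the Session cache
            return value;
        }
        value = database.get(id);      // slowest path: hit the database
        lastSource = "DB";
        if (value != null) {
            firstLevel.put(id, value);
            secondLevel.put(id, value);
        }
        return value;
    }

    public static void main(String[] args) {
        CacheLookupSketch s = new CacheLookupSketch();
        s.database.put(1L, "row-1");
        s.load(1L);
        System.out.println(s.lastSource); // first load falls through to the database
        s.firstLevel.clear();             // simulate closing the Session
        s.load(1L);
        System.out.println(s.lastSource); // now served by the second-level cache
    }
}
```

The second lookup illustrates why the second-level cache matters: even after the Session (and its first-level cache) is gone, the data is still served without touching the database.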
1.2. Existing problems
1.2.1. Problems with the first-level cache and the reasons for using the second-level cache
Because a Session's lifetime is usually very short, the first-level cache that lives inside it is equally short-lived, so its hit rate is low and its contribution to system performance is limited. The main purpose of this internal Session cache is to keep the Session's internal data state consistent; Hibernate does not provide it as a major performance optimization.
To get good performance out of Hibernate, beyond the usual techniques such as lazy loading, eager outer-join fetching, and query filtering, you also need to configure Hibernate's second-level cache. Its effect on overall system performance is often immediate!
(In my experience on past projects, it typically yields a 3-4x performance improvement.)
1.2.2. The problem of N+1 queries
When executing conditional queries, the iterate() method suffers from the well-known "n+1" query problem: on the first run, iterate() issues one query for the matching ids plus one query per matching row, i.e. n+1 queries in total. This cost is paid only on the first execution; subsequent runs of the same query are much faster. The method is therefore suited to business data with a large volume of records that is queried repeatedly.
Note, however, that for especially large data sets (such as transaction-log style data) you must configure a specific cache strategy for that persistent class, e.g. the maximum number of cached records and the cache time-to-live, to prevent the system from loading huge amounts of data into memory at once, exhausting memory resources and degrading performance.
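The n+1 cost and the subsequent speed-up can be made concrete by counting statements. The sketch below is a simulation only, with hypothetical names; it merely counts the statements an iterate()-style lookup would issue against a cold versus a warm second-level cache.

```java
import java.util.*;

// Simulation of the "n+1" behavior of a Query.iterate()-style lookup:
// one id-list query, plus one per-row query for every id not yet cached.
// Names are hypothetical; this is not Hibernate API.
public class NPlusOneSketch {
    final Set<Long> secondLevel = new HashSet<>(); // ids already cached
    int statements;                                // SQL statements issued so far

    List<Long> iterate(List<Long> matchingIds) {
        statements++; // the initial "select id from ... where ..." query
        for (Long id : matchingIds) {
            if (!secondLevel.contains(id)) {
                statements++;        // "select * from ... where id = ?"
                secondLevel.add(id); // the row is now cached
            }
        }
        return matchingIds;
    }

    public static void main(String[] args) {
        NPlusOneSketch s = new NPlusOneSketch();
        List<Long> ids = Arrays.asList(1L, 2L, 3L);
        s.iterate(ids);
        System.out.println(s.statements); // 4 statements: n + 1 on the first run
        s.iterate(ids);
        System.out.println(s.statements); // 5: only one more id-list query
    }
}
```

With n = 3 matching rows, the first run costs 4 statements, while every warm run afterward costs just the single id query, which is why the problem only hurts on the first execution.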
1.3. Other considerations for using hibernate’s second-level cache:
1.3.1. About the validity of data
In addition, Hibernate maintains the data in the second-level cache itself, keeping the cached data consistent with the real data in the database. Whenever you pass an object to save(), update(), or saveOrUpdate(), or obtain one through load(), get(), list(), iterate(), or scroll(), that object is added to the Session's internal cache. When flush() is subsequently called, the object's state is synchronized with the database.
That is to say, whenever data is deleted, updated, or inserted, the cache is updated at the same time, and this includes the second-level cache!
As long as database work goes through the Hibernate API, Hibernate automatically keeps the cached data valid for you.
However, if you bypass Hibernate and operate on the database directly through JDBC, Hibernate cannot detect those changes on its own and can no longer guarantee the validity of the data in the cache.
This problem is common to all ORM products. Fortunately, Hibernate exposes cache-clearing methods, giving us a way to restore data validity manually.
Both the first-level and the second-level cache have corresponding clearing methods. The clearing methods provided for the second-level cache are:
Evict all cached instances of a class (by the object's class);
Evict a single cached instance (by the object's class and primary-key id);
Evict the cached data of a collection belonging to an object.
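The semantics of the two evict overloads can be sketched as follows. In a real application the calls are SessionFactory.evict(Foo.class) and SessionFactory.evict(Foo.class, id); the map-backed store below is a toy stand-in for the second-level cache, with hypothetical names, not Hibernate API.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the two SessionFactory.evict(...) overloads described above.
public class EvictSketch {
    // second-level cache: one region per class, keyed by primary-key id
    final Map<String, Map<Long, Object>> regions = new HashMap<>();

    void put(Class<?> type, Long id, Object value) {
        regions.computeIfAbsent(type.getName(), k -> new HashMap<>()).put(id, value);
    }

    // evict(Class): drop every cached instance of the class
    void evict(Class<?> type) {
        regions.remove(type.getName());
    }

    // evict(Class, id): drop a single cached instance
    void evict(Class<?> type, Long id) {
        Map<Long, Object> region = regions.get(type.getName());
        if (region != null) region.remove(id);
    }

    int size(Class<?> type) {
        Map<Long, Object> region = regions.get(type.getName());
        return region == null ? 0 : region.size();
    }
}
```

After a direct SQL modification, the per-id overload is the cheaper choice when you know exactly which row changed; the per-class overload is the safe choice after a batch operation whose affected ids are unknown.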
1.3.2. Suitable situations for use
Not all situations are suitable for using second-level cache, and it needs to be decided according to the specific situation. At the same time, you can configure a specific cache strategy for a persistent object.
Situations suited to using the second-level cache:
1. The data will not be modified by a third party;
In general, it is best not to enable the second-level cache for data that may be modified by anything other than Hibernate, to avoid inconsistency. If such data must nevertheless be cached for performance, and may be modified by a third party, e.g. via direct SQL, you can still enable the second-level cache for it, but you must then manually call the cache-clearing methods after each such SQL modification to keep the data consistent.
2. The data size is within the acceptable range;
If the amount of data in the data table is particularly large, it is not suitable for secondary cache at this time. The reason is that too much cached data may cause memory resource constraints, which in turn reduces performance.
If the table is extremely large but only the more recent portion of the data is used regularly, you can still enable the second-level cache for it, provided you configure the persistent class's cache strategy separately, e.g. the maximum cache size and cache expiration time, and keep these parameters within a reasonable range (too high strains memory resources; too low makes caching pointless).
3. The data update frequency is low;
For data that is updated very frequently, the cost of repeatedly synchronizing the cached data can cancel out the benefit of reading from the cache, so caching has little value in this case.
4. Non-critical data (not financial data, etc.)
Financial data and the like are critical: invalid data must never appear or be used, so for safety it is best not to use the second-level cache for such data.
Because at this time the importance of "correctness" is far greater than the importance of "high performance".
2. Recommendations for using hibernate cache in the current system
1.4. Current situation
There are three situations in general systems that will bypass hibernate to perform database operations:
1. Multiple application systems access a database at the same time
In this case, using Hibernate's second-level cache will inevitably cause data inconsistency.
At this time, detailed design is required. For example, avoid simultaneous write operations to the same data table in the design, use various levels of locking mechanisms in the database, etc.
2. Dynamically related tables
3. SQL batch deletion
Analysis:
When the third situation (SQL batch deletion) occurs, subsequent queries can only take one of the following three forms:
a. The session.find() method:
As summarized earlier, find() never reads the second-level cache; it always queries the database directly.
b. When calling the iterate method to execute a conditional query:
Based on how iterate() executes, it first queries the database for the ids that match the condition, then fetches each object from the cache by id; only when an id is missing from the cache does it query the database for that row.
If a record has already been deleted directly via SQL, the id query will no longer return its id, so even if the record still sits in the cache it will never be handed to the caller, and no inconsistency arises. (This has been verified by testing.)
c. Use the get or load method to execute the query by id:
Objectively, stale data can be returned in this case. But SQL batch deletion in the system is generally applied to intermediate association tables, and those tables are almost always queried by condition rather than by id, so the probability of hitting this problem is very low.
If a value object genuinely needs to be queried by id and, because of its data volume, is also batch-deleted via SQL, then, when both conditions hold, you can manually clear that object's data from the second-level cache to guarantee that queries by id return correct results.
(This combination is unlikely to occur.)
1.5. Recommendations
1. It is recommended not to update persistent objects' data directly with SQL; batch deletion, however, is acceptable. (The system also has few places that require batch updates.)
2. If you must update data with SQL, you must also clear that object's cached data by calling SessionFactory.evict(class), SessionFactory.evict(class, id), or similar methods.
3. When the volume of data to delete is small, you can simply use Hibernate's own batch deletion, avoiding the cache-consistency problems caused by bypassing Hibernate with SQL.
4. Hibernate's batch deletion is not recommended for deleting large volumes of records.
The reason is that Hibernate's batch delete executes one query statement plus n delete statements, one per matching row, rather than a single conditional DELETE statement!
With many rows to delete, this becomes a serious performance bottleneck. If the batch is large, say more than 50 rows, JDBC can be used to delete them directly; only one SQL DELETE statement is executed, and performance improves dramatically. For the resulting cache-synchronization problem, Hibernate's methods for clearing the relevant second-level cache data can then be used:
call SessionFactory.evict(class), SessionFactory.evict(class, id), and similar methods.
In summary, for typical application development (not involving clusters, distributed data synchronization, and the like), direct SQL is only needed for batch deletion on intermediate association tables, and those tables are queried by condition, almost never by id. So you can perform the SQL delete directly, without even calling the cache-clearing methods, and no data-validity problems will arise later from having configured the second-level cache.
Even if, at some point in the future, code does query an intermediate-table object by id, the issue can still be solved by calling the cache-clearing methods.
3. Specific configuration method
From what I have seen, many Hibernate users take it on faith that "Hibernate will handle performance for us on its own" or that "calling the corresponding methods means every operation goes through the cache". In reality, although Hibernate provides a good caching mechanism and supports pluggable cache frameworks, it only works when invoked correctly! The performance problems of many Hibernate-based systems are therefore not caused by Hibernate being ineffective or bad, but by users not understanding how to use it correctly. Conversely, when configured properly, Hibernate's performance can pleasantly "surprise" you. The concrete configuration is explained below.
Hibernate provides a second-level cache interface,
net.sf.hibernate.cache.Provider,
ships a default implementation, net.sf.hibernate.cache.HashtableCacheProvider,
and also lets you plug in other implementations such as ehcache or jbosscache.
This is configured in the hibernate.cfg.xml file.
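A minimal sketch of the relevant hibernate.cfg.xml entry follows, assuming the Hibernate 2.x-era package names used throughout this article; the connection properties are elided, and the default HashtableCacheProvider named above is used for illustration (a provider such as ehcache would be substituted here in the same way).

```xml
<hibernate-configuration>
    <session-factory>
        <!-- ... connection properties elided ... -->
        <!-- second-level cache provider; HashtableCacheProvider is the
             built-in default implementation mentioned above -->
        <property name="hibernate.cache.provider_class">
            net.sf.hibernate.cache.HashtableCacheProvider
        </property>
    </session-factory>
</hibernate-configuration>
```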
Many Hibernate users think they are done once this step is configured.
Note: with only this configuration, Hibernate's second-level cache is not actually used at all! And because most code closes the Session immediately after use, the first-level cache contributes nothing either. The net result is that no cache is used and every Hibernate operation hits the database directly, with predictably poor performance.
The correct approach is, in addition to the configuration above, to configure each persistent (VO) class's specific cache strategy in its mapping file. For example:
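A minimal sketch of such a per-class setting in an *.hbm.xml mapping file, assuming Hibernate 2.x mapping syntax; the class name com.example.Foo, the table, and the columns are hypothetical placeholders:

```xml
<hibernate-mapping>
    <class name="com.example.Foo" table="FOO">
        <!-- second-level cache strategy for this persistent class -->
        <cache usage="read-write"/>
        <id name="id" column="ID">
            <generator class="native"/>
        </id>
        <property name="name" column="NAME"/>
    </class>
</hibernate-mapping>
```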
The key is the cache usage setting: read-only, read-write, transactional, and so on. Also pay attention when executing queries: a conditional query, or a query returning all results, via session.find() will not read cached data; cached data is only used when the query.iterate() method is called.
In short, configure Hibernate effectively and use it correctly for your particular business and project situation, so as to exploit its strengths and avoid its weaknesses. There is no one-size-fits-all solution for every case.