A brief analysis of the caching and lazy loading mechanisms in Java's Hibernate framework
The difference between Hibernate's first-level cache and second-level cache
The cache sits between the application and the physical data source. Its function is to reduce how often the application accesses the physical data source, thereby improving the application's runtime performance. The data in the cache is a copy of the data in the physical data source. The application reads and writes data from the cache at runtime, and the cache is synchronized with the physical data source at specific moments or on specific events.
The cache medium is usually memory, so the read and write speed is very fast. But if the amount of data stored in the cache is very large, the hard disk will also be used as the cache medium. The implementation of cache must not only consider the storage medium, but also consider managing concurrent access to the cache and the life cycle of the cached data.
Hibernate's caches comprise the Session cache and the SessionFactory cache, and the SessionFactory cache is further divided into a built-in cache and an external cache. The Session cache is built in and cannot be removed; it is also called Hibernate's first-level cache. The built-in cache of SessionFactory is implemented in much the same way as the Session cache: the former is the data held in certain collection properties of the SessionFactory object, the latter the data held in certain collection properties of the Session. The built-in cache of SessionFactory stores mapping metadata and predefined SQL statements. The mapping metadata is a copy of the data in the mapping files, and the predefined SQL statements are derived from the mapping metadata during Hibernate's initialization phase. This built-in cache is read-only: the application cannot modify the mapping metadata or predefined SQL statements in it, so SessionFactory never needs to synchronize it with the mapping files. The external cache of SessionFactory is a configurable plug-in that is not enabled by default. Its data is a copy of database data, and its medium can be memory or hard disk. The external cache of SessionFactory is also called Hibernate's second-level cache.
Hibernate’s two levels of cache are both located in the persistence layer and store copies of database data. So what is the difference between them? In order to understand the difference between the two, it is necessary to have a deep understanding of the two characteristics of the persistence layer cache: the scope of the cache and the concurrent access policy of the cache.
The scope of the cache in the persistence layer
The scope of the cache determines the life cycle of the cache and who can access it. The scope of cache is divided into three categories.
1 Transaction scope: The cache can only be accessed by the current transaction. The life cycle of the cache depends on the life cycle of the transaction. When the transaction ends, the cache also ends its life cycle. In this scope, the cache medium is memory. Transactions can be database transactions or application transactions. Each transaction has its own cache. The data in the cache is usually in the form of interrelated objects.
2 Process scope: The cache is shared by all transactions within the process. These transactions may access the cache concurrently, so necessary transaction isolation mechanisms must be adopted for the cache. The life cycle of the cache depends on the life cycle of the process. When the process ends, the cache also ends its life cycle. The process-wide cache may store a large amount of data, so the storage medium can be memory or hard disk. The data in the cache can be in the form of related objects or loose data of objects. The loose object data form is somewhat similar to the object's serialized data, but the algorithm for object decomposition into loose data is faster than the algorithm required for object serialization.
3 Cluster scope: In a cluster environment, the cache is shared by processes on one machine or multiple machines. The data in the cache is copied to each process node in the cluster environment, and remote communication is used between processes to ensure the consistency of the data in the cache. The data in the cache usually takes the form of loose data of objects.
For most applications, you should carefully consider whether to use a cluster-wide cache, because the access speed is not necessarily much faster than directly accessing the database data.
The persistence layer can provide caches at more than one scope. If the requested data is not found in the transaction-scope cache, it can be looked up in the process-scope or cluster-scope cache. If it is still not found, it has to be queried from the database. The transaction-scope cache is the first-level cache of the persistence layer and is usually required; the process-scope or cluster-scope cache is the second-level cache of the persistence layer and is usually optional.
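The lookup order described above can be sketched in plain Java. This is only an illustration, not Hibernate code: the class TieredCacheSketch, its maps, and the databaseHits counter are hypothetical names invented for this example, and an in-memory map stands in for the database.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative only: a two-tier cache lookup that falls back to a stand-in "database". */
public class TieredCacheSketch {
    // Process-scope cache: shared by all transactions, so it must be thread-safe.
    static final Map<Long, String> processCache = new ConcurrentHashMap<>();
    // Stand-in for the physical data source (an assumption of this sketch).
    static final Map<Long, String> database = new HashMap<>();
    // Counts how often a lookup had to go all the way to the database.
    static int databaseHits = 0;

    /** Transaction-scope cache: one instance per transaction, discarded when it ends. */
    static class TransactionCache {
        private final Map<Long, String> local = new HashMap<>();

        String find(Long id) {
            String value = local.get(id);               // 1. transaction scope
            if (value == null) {
                value = processCache.get(id);           // 2. process scope
                if (value == null) {
                    databaseHits++;
                    value = database.get(id);           // 3. database
                    if (value != null) {
                        processCache.put(id, value);
                    }
                }
                if (value != null) {
                    local.put(id, value);
                }
            }
            return value;
        }
    }

    public static void main(String[] args) {
        database.put(1L, "Tianjin");
        TransactionCache tx1 = new TransactionCache();
        tx1.find(1L);   // goes to the database
        tx1.find(1L);   // served by the transaction-scope cache
        TransactionCache tx2 = new TransactionCache();
        tx2.find(1L);   // new transaction: served by the process-scope cache
        System.out.println("database hits: " + databaseHits);
    }
}
```

The second transaction starts with an empty transaction-scope cache but still avoids the database, which is the benefit a process-scope cache is meant to provide.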
Concurrent access strategy for the cache of the persistence layer
When multiple concurrent transactions access the same data cached in the persistence layer at the same time, concurrency problems will occur, and necessary transaction isolation measures must be taken.
Concurrency problems will occur in the process-wide or cluster-wide cache, that is, the second-level cache. Therefore, the following four types of concurrent access strategies can be set, each strategy corresponding to a transaction isolation level.
Transactional: Applicable only in managed environments. It provides Repeatable Read transaction isolation level. For data that is frequently read but rarely modified, this isolation type can be used because it can prevent concurrency problems such as dirty reads and non-repeatable reads.
Read-write: Provides the Read Committed transaction isolation level. Applicable only in non-clustered environments. For data that is frequently read but rarely modified, this isolation type can be used because it can prevent concurrency problems such as dirty reads.
Nonstrict read-write: The consistency of the cache with the data in the database is not guaranteed. If two transactions may access the same cached data at the same time, a short expiration time must be configured for that data to avoid dirty reads as far as possible. This concurrent access strategy can be used for data that is rarely modified and for which occasional dirty reads are acceptable.
Read-only: This concurrent access strategy can be used for data that will never be modified, such as reference data.
The transactional concurrent access strategy has the highest transaction isolation level, and the read-only isolation level is the lowest. The higher the transaction isolation level, the lower the concurrency performance.
What kind of data is suitable to be stored in the second-level cache?
1. Data that is rarely modified
2. Data that is not critical, where occasional concurrency problems can be tolerated
3. Data that will not be accessed concurrently
4. Reference data
What kind of data is not suitable to be stored in the second-level cache?
1. Frequently modified data
2. Financial data, where concurrency problems are absolutely unacceptable
3. Data shared with other applications.
Hibernate's second-level cache
As mentioned earlier, Hibernate provides a two-level cache; the first level is the Session cache. Since the life cycle of the Session object usually corresponds to a database transaction or an application transaction, its cache is a transaction-scope cache. The first-level cache is mandatory: it may not be disabled and in fact cannot be removed. In the first-level cache, each instance of a persistent class has a unique OID.
The second-level cache is a pluggable cache managed by SessionFactory. Since the life cycle of the SessionFactory object corresponds to the entire process of the application, the second-level cache is a process-scope or cluster-scope cache. It stores the loose data of objects. Because the second-level cache can be accessed concurrently, it needs an appropriate concurrent access strategy, which determines the transaction isolation level of the cached data. A cache adapter is used to integrate a specific cache implementation with Hibernate. The second-level cache is optional and can be configured at per-class or per-collection granularity.
The general process of Hibernate's second-level cache strategy is as follows:
1) When running a conditional query, always issue a SQL statement such as select * from table_name where .... (selecting all fields), so that complete data objects are obtained from the database in one pass.
2) Put all the obtained data objects into the second-level cache according to their IDs.
3) When Hibernate accesses a data object by ID, it first looks in the Session first-level cache. If the object is not found there and the second-level cache is configured, it looks in the second-level cache. If the object is still not found, it queries the database and puts the result into the cache under its ID.
4) When deleting, updating, or adding data, the cache is updated at the same time.
Hibernate’s second-level cache strategy is a cache strategy for ID queries, but has no effect on conditional queries. To this end, Hibernate provides Query caching for conditional queries.
The process of Hibernate's Query caching strategy is as follows:
1) Hibernate first builds a Query Key from the information in the conditional query. The Query Key includes the SQL, the parameters the SQL requires, the record range (starting position, rowStart), the maximum number of records (maxRows), and so on.
2) Hibernate searches the Query cache for the corresponding result list based on this Query Key. If it exists, then return the result list; if it does not exist, query the database, obtain the result list, and put the entire result list into the Query cache according to the Query Key.
3) The SQL in Query Key involves some table names. If any data in these tables is modified, deleted, added, etc., these related Query Keys will be cleared from the cache.
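The three steps above can be sketched as follows. This is a simplified model in plain Java, not Hibernate's actual query cache: QueryCacheSketch, queryKey, and tableModified are hypothetical names, the cached result is reduced to a list of IDs, and each query is assumed to touch a single table.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative only: a query cache keyed by (SQL, parameters, rowStart, maxRows). */
public class QueryCacheSketch {
    // Query Key -> cached result list (reduced here to a list of entity IDs).
    private final Map<List<Object>, List<Long>> results = new HashMap<>();
    // Table name -> Query Keys whose SQL involves that table.
    private final Map<String, Set<List<Object>>> keysByTable = new HashMap<>();

    /** Step 1: build the Query Key; lists compare by content, so equal queries share a key. */
    static List<Object> queryKey(String sql, List<Object> params, int rowStart, int maxRows) {
        return Arrays.asList(sql, params, rowStart, maxRows);
    }

    /** Step 2 (lookup): returns null on a cache miss. */
    List<Long> get(List<Object> key) {
        return results.get(key);
    }

    /** Step 2 (store): cache the whole result list under the Query Key. */
    void put(List<Object> key, String table, List<Long> resultList) {
        results.put(key, resultList);
        keysByTable.computeIfAbsent(table, t -> new HashSet<>()).add(key);
    }

    /** Step 3: any change to a table clears every Query Key that involves it. */
    void tableModified(String table) {
        Set<List<Object>> keys = keysByTable.remove(table);
        if (keys != null) {
            results.keySet().removeAll(keys);
        }
    }

    public static void main(String[] args) {
        QueryCacheSketch cache = new QueryCacheSketch();
        String hql = "from User user where user.name = ?";
        List<Object> key = queryKey(hql, Arrays.<Object>asList("zx"), 0, 10);

        System.out.println("before put: " + cache.get(key));      // miss
        cache.put(key, "user", Arrays.asList(1L, 2L));
        List<Object> sameQuery = queryKey(hql, Arrays.<Object>asList("zx"), 0, 10);
        System.out.println("same query: " + cache.get(sameQuery)); // hit
        cache.tableModified("user");
        System.out.println("after update: " + cache.get(sameQuery)); // cleared
    }
}
```

Because the record range is part of the key, the same query with a different rowStart or maxRows is cached separately.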
Hibernate lazy loading mechanism
Lazy loading:
The lazy loading mechanism was introduced to avoid unnecessary performance overhead. Lazy loading means that the data is actually loaded only when it is really needed. Hibernate provides lazy loading of entity objects and lazy loading of collections. In addition, Hibernate3 also provides lazy loading of properties. Below we introduce each of these kinds of lazy loading in turn.
A. Lazy loading of entity objects:
If you want to use lazy loading for entity objects, you must make the corresponding configuration in the entity's mapping configuration file, as shown below:
<hibernate-mapping>
    <class name="com.neusoft.entity.User" table="user" lazy="true">
        ……
    </class>
</hibernate-mapping>
Enable the lazy loading feature of the entity by setting the lazy attribute of the class to true. If we run the following code:
User user = (User) session.load(User.class, "1");   // (1)
System.out.println(user.getName());                 // (2)
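When execution reaches (1), Hibernate has not yet issued any query for the data. If we inspect a memory snapshot of the user object at this point with a debugging tool (for example the debugger in JBuilder2005), we will be surprised to find that the returned object may be of type User$EnhancerByCGLIB$$bede8986 and that its properties are null. What is going on? As mentioned earlier, the session.load() method returns a proxy object for the entity, and the type returned here is the proxy class of the User object. Hibernate uses CGLIB to dynamically construct a proxy class for the target object; the proxy contains all of the target object's properties and methods, and every property is set to null. From the memory snapshot shown by the debugger we can see that the real User object is held in the proxy's CGLIB$CALLBACK_0.target property. When the code reaches (2) and calls user.getName(), CGLIB's callback mechanism actually invokes CGLIB$CALLBACK_0.getName(). When that method is called, Hibernate first checks whether CGLIB$CALLBACK_0.target is null. If it is not null, the target object's getName method is called. If it is null, Hibernate issues a database query, generating SQL similar to select * from user where id='1'; to fetch the data, constructs the target object, and assigns it to CGLIB$CALLBACK_0.target.
In this way, through an intermediate proxy object, Hibernate implements lazy loading of entities: the database query is issued only when the user actually requests a property of the entity object. Because entity lazy loading is carried out through the intermediate proxy class, only the session.load() method takes advantage of it, since only session.load() returns a proxy object of the entity class.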
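The proxy idea can be illustrated without Hibernate or CGLIB. The sketch below swaps in the JDK's java.lang.reflect.Proxy, which, unlike CGLIB, needs an interface to proxy; the User interface, UserImpl, loadFromDatabase, and the loads counter are hypothetical names made up for this example, and the "database" is simulated.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

/** Illustrative only: a lazy-loading proxy built with the JDK instead of CGLIB. */
public class LazyProxySketch {
    /** Hypothetical entity interface; JDK proxies require one, CGLIB does not. */
    public interface User {
        String getName();
    }

    /** The real entity that would be built from a database row. */
    public static class UserImpl implements User {
        private final String name;
        public UserImpl(String name) { this.name = name; }
        @Override public String getName() { return name; }
    }

    // Counts simulated database queries.
    static int loads = 0;

    /** Stand-in for "select * from user where id=?". */
    static User loadFromDatabase(String id) {
        loads++;
        return new UserImpl("user-" + id);
    }

    /** Returns a proxy immediately; the target is loaded on the first method call. */
    static User lazyUser(final String id) {
        InvocationHandler handler = new InvocationHandler() {
            private User target;   // stays null until the entity is first used

            @Override
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                if (target == null) {
                    target = loadFromDatabase(id);
                }
                return method.invoke(target, args);
            }
        };
        return (User) Proxy.newProxyInstance(
                User.class.getClassLoader(), new Class<?>[] {User.class}, handler);
    }

    public static void main(String[] args) {
        User user = lazyUser("1");                              // (1) no query yet
        System.out.println("loads after (1): " + loads);
        System.out.println(user.getName());                     // (2) triggers the load
        System.out.println("loads after (2): " + loads);
    }
}
```

The handler's target field plays the role that the text assigns to CGLIB$CALLBACK_0.target: it is checked on every call and filled in on first use.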
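B. Lazy loading of collections:
Within Hibernate's lazy loading mechanism, lazy loading of collections matters most, because it can improve performance substantially. Hibernate goes to considerable lengths to support it, including its own implementations of the JDK collections. The Set that we define in a one-to-many association to hold the associated objects is, at run time, not one of the standard JDK implementations but an instance of net.sf.hibernate.collection.Set, Hibernate's own implementation of the java.util.Set interface. By using its own collection classes, Hibernate implements lazy loading of collections. To use lazy loading for a collection, we must configure the association part of our entity class as follows: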
<hibernate-mapping>
    <class name="com.neusoft.entity.User" table="user">
        …..
        <set name="addresses" table="address" lazy="true" inverse="true">
            <key column="user_id"/>
            <one-to-many class="com.neusoft.entity.Address"/>
        </set>
    </class>
</hibernate-mapping>
Lazy loading of the collection is enabled by setting the lazy attribute of the <set> element to true. Consider the following code:
User user = (User) session.load(User.class, "1");
Collection addset = user.getAddresses();   // (1)
Iterator it = addset.iterator();           // (2)
while (it.hasNext()) {
    Address address = (Address) it.next();
    System.out.println(address.getAddress());
}
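When the program reaches (1), no query is issued to load the associated data. Only when it reaches (2) does the actual data read begin; at that point Hibernate uses the matching data index in the cache to look up the matching entity objects.
Here we have introduced a new concept, the data index, so let us first explain what it is. When Hibernate caches a collection, it caches it in two parts: first the list of IDs of all entities in the collection, and then the entity objects themselves. The list of entity IDs is the so-called data index. When Hibernate looks for a data index and does not find one, it executes a select SQL statement to fetch the matching data, builds the collection of entity objects and the data index, returns the collection, and places both the entity objects and the data index in Hibernate's cache. If, on the other hand, the data index is found, Hibernate takes the ID list from it and looks up each entity in the cache by ID; an entity that is found is returned from the cache, and for one that is not found a select SQL query is issued. This reveals another issue that can affect performance: the caching strategy for collections. Suppose we configure the collection as follows: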
<hibernate-mapping>
    <class name="com.neusoft.entity.User" table="user">
        …..
        <set name="addresses" table="address" lazy="true" inverse="true">
            <cache usage="read-only"/>
            <key column="user_id"/>
            <one-to-many class="com.neusoft.entity.Address"/>
        </set>
    </class>
</hibernate-mapping>
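Here we have applied the <cache usage="read-only"/> configuration. With this strategy on the collection, Hibernate caches only the data index and does not cache the entity objects in the collection. With the configuration above, we run the following code: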
User user = (User) session.load(User.class, "1");
Collection addset = user.getAddresses();
Iterator it = addset.iterator();
while (it.hasNext()) {
    Address address = (Address) it.next();
    System.out.println(address.getAddress());
}
System.out.println("Second query……");
User user2 = (User) session.load(User.class, "1");
// Iterate the collection again; an Iterator is needed here, not the Collection itself.
Iterator it2 = user2.getAddresses().iterator();
while (it2.hasNext()) {
    Address address2 = (Address) it2.next();
    System.out.println(address2.getAddress());
}
Running this code produces output similar to the following:
Select * from user where id='1';
Select * from address where user_id='1';
Tianjin
Dalian
Second query……
Select * from address where id='1';
Select * from address where id='2';
Tianjin
Dalian
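We see that the second time the query runs, two queries are executed against the address table. Why? After the entity was loaded the first time, only the collection's data index was cached, as dictated by the collection cache configuration; the entity objects in the collection were not cached. So when the entity is loaded again, Hibernate finds its data index but cannot find the corresponding entities in the cache, and it issues two select SQL queries based on the data index, which wastes performance. How can this be avoided? We must also specify a caching strategy for the entities in the collection, so we configure the collection as follows: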
<hibernate-mapping>
    <class name="com.neusoft.entity.User" table="user">
        …..
        <set name="addresses" table="address" lazy="true" inverse="true">
            <cache usage="read-write"/>
            <key column="user_id"/>
            <one-to-many class="com.neusoft.entity.Address"/>
        </set>
    </class>
</hibernate-mapping>
Hibernate now caches the entities in the collection as well. Running the code above again with this configuration produces output similar to the following:
Select * from user where id='1';
Select * from address where user_id='1';
Tianjin
Dalian
Second query……
Tianjin
Dalian
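This time no SQL statements are issued from the data index, because the entity objects stored in the collection can be obtained directly from the cache.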
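The difference between caching only the data index and caching the entities as well can be modeled in a few lines of plain Java. This is a rough simulation of the behavior described above, not Hibernate's implementation: CollectionCacheSketch, its maps, and the sqlLog list are hypothetical, and for brevity every row of the stand-in address table is assumed to belong to the same user.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: a collection cached as an ID list ("data index") plus an entity cache. */
public class CollectionCacheSketch {
    // Stand-in for the address table: id -> city (all rows belong to one user here).
    final Map<Long, String> addressTable = new LinkedHashMap<>();
    // Data index: owner id -> IDs of the entities in the collection.
    final Map<String, List<Long>> dataIndex = new HashMap<>();
    // Entity cache: entity id -> entity.
    final Map<Long, String> entityCache = new HashMap<>();
    // Records every simulated SQL statement.
    final List<String> sqlLog = new ArrayList<>();
    // When false, only the data index is cached, not the entities.
    final boolean cacheEntities;

    CollectionCacheSketch(boolean cacheEntities) {
        this.cacheEntities = cacheEntities;
    }

    List<String> loadAddresses(String userId) {
        List<String> result = new ArrayList<>();
        List<Long> ids = dataIndex.get(userId);
        if (ids == null) {
            // No data index yet: one select loads the whole collection.
            sqlLog.add("select * from address where user_id='" + userId + "'");
            ids = new ArrayList<>(addressTable.keySet());
            dataIndex.put(userId, ids);
            for (Long id : ids) {
                String city = addressTable.get(id);
                if (cacheEntities) {
                    entityCache.put(id, city);
                }
                result.add(city);
            }
            return result;
        }
        // Data index found: resolve each ID through the entity cache.
        for (Long id : ids) {
            String city = entityCache.get(id);
            if (city == null) {
                // Entity missing from the cache: one select per ID.
                sqlLog.add("select * from address where id='" + id + "'");
                city = addressTable.get(id);
                if (cacheEntities) {
                    entityCache.put(id, city);
                }
            }
            result.add(city);
        }
        return result;
    }

    public static void main(String[] args) {
        for (boolean cacheEntities : new boolean[] {false, true}) {
            CollectionCacheSketch cache = new CollectionCacheSketch(cacheEntities);
            cache.addressTable.put(1L, "Tianjin");
            cache.addressTable.put(2L, "Dalian");
            cache.loadAddresses("1");
            cache.loadAddresses("1");
            System.out.println("cacheEntities=" + cacheEntities
                    + " -> SQL statements: " + cache.sqlLog.size());
        }
    }
}
```

With only the data index cached, the second load costs one select per collection element; with the entities cached as well, it costs none.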
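C. Lazy loading of properties:
Hibernate3 introduced a new feature, lazy loading of properties, which provides another powerful tool for high-performance queries. When we discussed reading large data objects earlier, the User object had a resume field of type java.sql.Clob containing the user's résumé. Whenever we load the object we have to load this field as well, whether or not we actually need it, and reading such a large data object carries considerable performance overhead in itself. In Hibernate2 the only way to solve this was to split the User class using the performance-oriented granularity refinement discussed earlier (see that section). In Hibernate3, property lazy loading lets us read the field's data only when we actually need to operate on it. To do so, we must configure the entity class as follows: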
<hibernate-mapping>
    <class name="com.neusoft.entity.User" table="user">
        ……
        <property name="resume" type="java.sql.Clob" column="resume" lazy="true"/>
    </class>
</hibernate-mapping>
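Lazy loading of the property is enabled by setting the lazy attribute of the <property> element to true. To implement property lazy loading, Hibernate3 uses a class enhancer to instrument the entity class's Class file, adding CGLIB's callback logic to the entity class. So property lazy loading is also implemented through CGLIB. CGLIB is an open-source library that can manipulate the bytecode of Java classes and dynamically construct class objects that meet the requirements. With the configuration above, we run the following code: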
String sql = "from User user where user.name='zx' ";
Query query = session.createQuery(sql);
List list = query.list();                         // (1)
for (int i = 0; i < list.size(); i++) {
    User user = (User) list.get(i);
    System.out.println(user.getName());
    System.out.println(user.getResume());         // (2)
}
When execution reaches (1), SQL similar to the following is generated:
Select id,age,name from user where name='zx';
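Hibernate retrieves the column data for all non-lazy properties of the User entity at this point. When execution reaches (2), SQL similar to the following is generated: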
Select resume from user where id='1';
Only now is the resume field's data actually read.
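The effect of a lazy property can be imitated in plain Java with a loader that runs on first access. This is only a hand-written analogy: Hibernate3 achieves the same effect through bytecode enhancement rather than an explicit loader, and LazyPropertySketch, findByName, sqlLog, and the placeholder resume text are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Illustrative only: a property whose value is fetched on first access. */
public class LazyPropertySketch {
    // Records every simulated SQL statement.
    static final List<String> sqlLog = new ArrayList<>();

    static class User {
        private final String name;
        private final Supplier<String> resumeLoader;   // runs the second query
        private String resume;
        private boolean resumeLoaded;

        User(String name, Supplier<String> resumeLoader) {
            this.name = name;
            this.resumeLoader = resumeLoader;
        }

        String getName() {
            return name;                               // loaded eagerly with the entity
        }

        String getResume() {
            if (!resumeLoaded) {                       // first access: fetch the column
                resume = resumeLoader.get();
                resumeLoaded = true;
            }
            return resume;
        }
    }

    /** Loads only the non-lazy columns; the resume column is deferred. */
    static User findByName(String name) {
        sqlLog.add("select id, age, name from user where name='" + name + "'");
        return new User(name, () -> {
            sqlLog.add("select resume from user where id='1'");
            return "(resume text)";                    // placeholder value
        });
    }

    public static void main(String[] args) {
        User user = findByName("zx");                  // one statement so far
        System.out.println(user.getName());
        System.out.println(user.getResume());          // triggers the second statement
        for (String sql : sqlLog) {
            System.out.println(sql);
        }
    }
}
```

Calling getResume() a second time reuses the loaded value, so the large column is read at most once per object.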