Spring Data JPA Stream Query Methods-javaTutorial-php.cn


Patricia Arquette

Nov 22, 2024 05:35 AM


Introduction

Traditionally, fetching large amounts of data can strain memory resources, as it often involves loading the entire result set into memory.

=> Stream query methods address this by letting you process data incrementally with Java 8 Streams. With a suitable fetch size and a JDBC driver that honors it, only a portion of the result set has to be held in memory at any time, which improves scalability for large result sets.

In this blog post, we'll dive deep into how stream query methods work in Spring Data JPA, explore their use cases, and demonstrate their implementation.

For this guide, we’re using:

IDE: IntelliJ IDEA (recommended for Spring applications) or Eclipse
Java Version: 17
Spring Data JPA Version: 3.x (via Spring Boot 3.x, which brings Hibernate 6; the examples use Hibernate 6's AvailableHints and the jakarta.persistence namespace)

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-jpa</artifactId>
</dependency>

NOTE: For more detailed examples, please visit my GitHub repository here

1. What are Stream Query Methods?

Stream query methods in Spring Data JPA allow us to return query results as a Stream instead of a List or other collection types. This approach provides several benefits:

Efficient Resource Management: Data is processed incrementally, reducing memory overhead.
Lazy Processing: Results are fetched and processed on demand, which suits batch processing and large exports.
Integration with Functional Programming: Streams integrate with Java's functional programming features, enabling operations like filter, map, and collect.
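To make lazy processing concrete, here is a minimal plain-Java sketch that needs no database. IntStream.rangeClosed stands in for the rows a query cursor would return, and a counter records how many rows are pulled; the class and method names (LazyVsEagerDemo, firstMatchLazily, firstMatchEagerly) are hypothetical. The stream version stops reading as soon as findFirst has a match, while the List version reads every row before it starts searching. With Spring Data JPA the database still executes the whole query; what streaming can save is reading and materializing rows the application never uses.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

// Hypothetical demo: IntStream.rangeClosed stands in for a query cursor that yields row ids.
class LazyVsEagerDemo {

    // Lazy: rows are pulled one at a time and the pipeline stops at the first match.
    // Returns {firstMatch, rowsRead}.
    static int[] firstMatchLazily(int totalRows, int divisor) {
        AtomicInteger rowsRead = new AtomicInteger();
        int first = IntStream.rangeClosed(1, totalRows)
                .peek(id -> rowsRead.incrementAndGet()) // count every row pulled from the "cursor"
                .filter(id -> id % divisor == 0)
                .findFirst()
                .orElseThrow();
        return new int[] {first, rowsRead.get()};
    }

    // Eager: every row is read into a List before searching, like a List-returning query.
    // Returns {firstMatch, rowsRead}.
    static int[] firstMatchEagerly(int totalRows, int divisor) {
        AtomicInteger rowsRead = new AtomicInteger();
        List<Integer> all = IntStream.rangeClosed(1, totalRows)
                .peek(id -> rowsRead.incrementAndGet())
                .boxed()
                .toList();
        int first = all.stream().filter(id -> id % divisor == 0).findFirst().orElseThrow();
        return new int[] {first, rowsRead.get()};
    }

    public static void main(String[] args) {
        int[] lazy = firstMatchLazily(100_000, 7);
        int[] eager = firstMatchEagerly(100_000, 7);
        System.out.println("Stream: found " + lazy[0] + " after reading " + lazy[1] + " rows");
        System.out.println("List:   found " + eager[0] + " after reading " + eager[1] + " rows");
    }
}
```

Both versions find the same row (7), but the stream version reads only 7 rows while the List version reads all 100,000 first.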

2. How To Use Stream Query Methods?

=> Let's imagine that we are developing an e-commerce application and want to:

Retrieve all customers who placed orders after a specific date.
Filter by a total order amount at or above a given minimum.
Group customers by their total order value within the last 6 months.
Return the data as a summary of customer names and their total order values.

Entities

Customer: Represents a customer.

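import jakarta.persistence.*;
import lombok.Getter;
import lombok.Setter;

import java.util.List;

@Setter
@Getter
@Entity(name = "tbl_customer") // entity name used in JPQL; by default also the table name
public class Customer {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    private String name;
    private String email;

    // Orders are loaded lazily unless a query fetches them explicitly (e.g. JOIN FETCH)
    @OneToMany(mappedBy = "customer", cascade = CascadeType.ALL, fetch = FetchType.LAZY)
    private List<Order> orders;
}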

Order: Represents an order placed by a customer.

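import jakarta.persistence.*;
import lombok.Getter;
import lombok.Setter;

import java.time.LocalDateTime;

@Setter
@Getter
@Entity(name = "tbl_order") // a plain "order" would clash with the SQL keyword ORDER
public class Order {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    private Double amount;
    private LocalDateTime orderDate;

    // Owning side of the relationship: customer_id is the foreign key to tbl_customer
    @ManyToOne
    @JoinColumn(name = "customer_id")
    private Customer customer;
}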

Repository

CustomerRepository selects customers together with their orders placed on or after a given date. The query method returns a Stream instead of a List, so the result can be processed incrementally.

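import jakarta.persistence.QueryHint;
import org.hibernate.jpa.AvailableHints;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.jpa.repository.QueryHints;
import org.springframework.data.repository.query.Param;
import java.time.LocalDateTime;
import java.util.stream.Stream;

public interface CustomerRepository extends JpaRepository<Customer, Long> {
    @Query("""
                SELECT c FROM tbl_customer c JOIN FETCH c.orders o WHERE o.orderDate >= :startDate
            """)
    @QueryHints(
            @QueryHint(name = AvailableHints.HINT_FETCH_SIZE, value = "25") // JDBC fetch size: 25 rows per round trip
    )
    // The returned Stream must be consumed inside a transaction and closed after use
    Stream<Customer> findCustomerWithOrders(@Param("startDate") LocalDateTime startDate);
}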

NOTE:

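JOIN FETCH loads each customer's orders in the same query. Because the WHERE clause filters on the fetched alias o, each customer's orders collection contains only the orders that match the date condition.
@QueryHints passes provider-specific hints to the JPA provider (e.g., Hibernate) to tune query execution. Here, AvailableHints.HINT_FETCH_SIZE sets the JDBC fetch size to 25 rows.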

=> For example, if the query returns 100 records:

The first 25 records are fetched and processed by the application.
Once those are processed, the next 25 are fetched, and so on, until all 100 records are processed.
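This keeps the JDBC driver from holding all 100 records in memory at once. Whether the fetch size is honored depends on the driver: the PostgreSQL driver, for example, only fetches in batches when auto-commit is off (as inside a transaction), and MySQL Connector/J needs additional configuration. Hibernate also keeps every entity it returns in the persistence context until the entity is detached or the context is cleared.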
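The fetch-size behavior described above can be modeled without a database. In this sketch, BatchedCursor is a hypothetical stand-in for a JDBC result set: it copies fetchSize rows at a time from an in-memory "table" into a small buffer and counts the round trips. It is only a model of what a driver that honors the fetch size does, not Hibernate's or any driver's actual code.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.IntStream;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical model of a JDBC cursor that honors a fetch size (not real driver code).
class BatchedCursor implements Iterator<Integer> {
    private final List<Integer> table;                        // full result set, held by the "database"
    private final int fetchSize;                              // rows per round trip, e.g. 25
    private final Deque<Integer> buffer = new ArrayDeque<>(); // rows currently held by the application
    private int nextRow = 0;
    private int roundTrips = 0;
    private int maxBuffered = 0;

    BatchedCursor(List<Integer> table, int fetchSize) {
        if (fetchSize <= 0) throw new IllegalArgumentException("fetchSize must be positive");
        this.table = table;
        this.fetchSize = fetchSize;
    }

    @Override
    public boolean hasNext() {
        // Fetch the next batch only once the current batch has been fully consumed
        if (buffer.isEmpty() && nextRow < table.size()) {
            int end = Math.min(nextRow + fetchSize, table.size());
            buffer.addAll(table.subList(nextRow, end));
            nextRow = end;
            roundTrips++;
            maxBuffered = Math.max(maxBuffered, buffer.size());
        }
        return !buffer.isEmpty();
    }

    @Override
    public Integer next() {
        if (!hasNext()) throw new NoSuchElementException();
        return buffer.poll();
    }

    // Expose the cursor as a sequential Stream, similar in spirit to a Stream-returning query method
    Stream<Integer> stream() {
        return StreamSupport.stream(Spliterators.spliteratorUnknownSize(this, Spliterator.ORDERED), false);
    }

    int roundTrips() { return roundTrips; }
    int maxBuffered() { return maxBuffered; }

    public static void main(String[] args) {
        List<Integer> rows = IntStream.rangeClosed(1, 100).boxed().toList(); // 100 records
        BatchedCursor cursor = new BatchedCursor(rows, 25);
        long processed = cursor.stream().count();
        System.out.println(processed + " rows, " + cursor.roundTrips() + " round trips, at most "
                + cursor.maxBuffered() + " rows buffered");
    }
}
```

With 100 rows and a fetch size of 25, main reports 4 round trips with at most 25 rows buffered. Stopping early (for example with findFirst) skips the remaining batches entirely.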

Service


The service class processes the data using two parameters, startDate and minOrderAmount. Only the date condition is applied in the SQL query; the service consumes the result as a stream and filters and groups it in Java code.
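A minimal plain-Java sketch of that service logic follows. CustomerRow, OrderRow and CustomerOrderSummary are hypothetical records standing in for the JPA entities and a response DTO, and CustomerSummaryService.summarize is a hypothetical name. In a Spring service, the stream would come from customerRepository.findCustomerWithOrders(startDate) and the method would need @Transactional(readOnly = true), because Spring Data JPA only allows streaming query methods inside a surrounding transaction. The try-with-resources block closes the stream, which in the Spring version releases the underlying JDBC resources.

```java
import java.time.LocalDateTime;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical stand-ins for the Order and Customer entities and for a response DTO.
record OrderRow(double amount, LocalDateTime orderDate) {}
record CustomerRow(String name, List<OrderRow> orders) {}
record CustomerOrderSummary(String customerName, double totalAmount) {}

class CustomerSummaryService {

    // In a Spring service this method would be annotated with @Transactional(readOnly = true)
    // and `customers` would be customerRepository.findCustomerWithOrders(startDate).
    static List<CustomerOrderSummary> summarize(Stream<CustomerRow> customers,
                                                LocalDateTime startDate,
                                                double minOrderAmount) {
        // try-with-resources closes the stream, even if processing throws
        try (customers) {
            return customers
                    // total of each customer's orders placed on or after startDate
                    .map(c -> new CustomerOrderSummary(
                            c.name(),
                            c.orders().stream()
                                    .filter(o -> !o.orderDate().isBefore(startDate))
                                    .mapToDouble(OrderRow::amount)
                                    .sum()))
                    // keep customers whose total is at least minOrderAmount
                    .filter(s -> s.totalAmount() >= minOrderAmount)
                    .toList();
        }
    }

    public static void main(String[] args) {
        LocalDateTime start = LocalDateTime.parse("2024-05-01T00:00:00");
        Stream<CustomerRow> customers = Stream.of(
                new CustomerRow("Alice", List.of(new OrderRow(80.0, start.plusDays(1)),
                                                 new OrderRow(40.0, start.plusDays(2)))),
                new CustomerRow("Bob", List.of(new OrderRow(50.0, start.plusDays(3)))));
        System.out.println(summarize(customers, start, 100));
    }
}
```

Re-applying the date filter per order is harmless here: if the fetch join has already filtered each customer's orders collection, it simply keeps every order. In main, only Alice (80.0 + 40.0 = 120.0) reaches the minimum of 100.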

Controller


Testing

=> To create test data, you can run the following script from my source code or insert data yourself:

src/main/resources/dummy-data.sql

Request:

startDate: 2024-05-01T00:00:00
minOrderAmount: 100


Response:

Returns all customers whose total order amount is equal to or greater than minOrderAmount, along with their totals.


3. Stream vs List

=> You can use IntelliJ Profiler to monitor memory usage and execution time. Details on how to load and test with a large dataset are in my GitHub repository.

Small Dataset (10 customers, 100 orders)

Stream: Execution time (~5ms), Memory usage (Low)
List: Execution time (~4ms), Memory usage (Low)

Large Dataset (10,000 customers, 100,000 orders)

Stream: Execution time (~202ms), Memory usage (Moderate)
List: Execution time (~176ms), Memory usage (High)

Performance Metrics

Metric	Stream	List
Initial Fetch Time	Slightly slower (rows arrive in batches)	Faster (all at once)
Memory Consumption	Low (incremental processing)	High (entire dataset in memory)
Processing Overhead	Efficient for large datasets	May cause memory issues for large datasets
Batch Fetching	Supported (with fetch size)	Not applicable
Error Recovery	Graceful with early termination	Limited, as data is preloaded

Wrapping up

Spring Data JPA stream query methods offer an elegant way to process large datasets efficiently with Java Streams. By processing data incrementally, they can reduce memory consumption and integrate seamlessly with modern functional programming paradigms, provided the stream is consumed inside a transaction and closed after use.

What are your thoughts on stream query methods? Share your experiences and use cases in the comments below!

See you in the next posts. Happy Coding!
