Hierarchical data with PostgreSQL and Spring Data JPA

DDD

Nov 01, 2024 am 11:30 AM

He who plants a tree,
Plants a hope.
Plant a tree by Lucy Larcom

Intro

In this post I'm going to show you a couple of options for managing hierarchical data represented as a tree data structure. This is the natural approach when you need to implement things like:

file system paths
organisational charts
discussion forum comments
a more contemporary topic: small2big retrieval for RAG applications

If you know what a graph is already, a tree is basically a graph without any cycles. You can represent one visually like this.

[Image: the example tree, with each node labelled by its ID]

There are multiple alternatives for storing trees in relational databases. In the sections below, I'll show you three of them:

adjacency list
materialized paths
nested sets

There will be two parts to this blog post. This first part introduces the alternatives and shows how to load and store data: the basics. With that out of the way, the second part focuses on comparing them and their trade-offs; for example, I want to look at what happens at increased data volumes and which indexing strategies are appropriate.

All the code you'll see in the sections below can be found here if you're interested to check it out.

The running use-case will be that of employees and their managers, and the IDs for each will be exactly the ones you saw in the tree visualisation I showed above.

Local environment

I'm using the recently released Postgres 17 with Testcontainers. This gives me a repeatable setup to work with. For example, we can use an initialisation SQL script to automate the creation of a Postgres database with the necessary tables and populate it with some test data.

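import org.springframework.boot.test.context.TestConfiguration;
import org.springframework.boot.testcontainers.service.connection.ServiceConnection;
import org.springframework.context.annotation.Bean;
import org.testcontainers.containers.PostgreSQLContainer;
import org.testcontainers.utility.DockerImageName;

@TestConfiguration(proxyBeanMethods = false)
class TestcontainersConfiguration {

    private static final String POSTGRES = "postgres";

    // Starts a throwaway Postgres container; @ServiceConnection points the datasource at it.
    @Bean
    @ServiceConnection
    PostgreSQLContainer<?> postgresContainer() {
        return new PostgreSQLContainer<>(DockerImageName.parse("postgres:latest"))
                .withUsername(POSTGRES)
                .withPassword(POSTGRES)
                .withDatabaseName(POSTGRES)
                // Creates the tables and the test data when the container starts.
                .withInitScript("init-script.sql");
    }
}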

Let's jump in and have a look at the first approach.

1. The adjacency list model

This was the first solution for managing hierarchical data, and it is still widely present in codebases, so chances are you'll encounter it at some point. The idea is that we store the manager's ID, or more generically the parent ID, in the same row. It will quickly become clear once we look at the table structure.

Schema

The table corresponding to the adjacency list option looks like this:

create table employees
(
    id           bigserial primary key,
    manager_id   bigint references employees,
    name         text
);

In addition to the above, in order to ensure data integrity, we should also write constraint checks that ensure at least the following:

there is a single parent for every node
no cycles
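A plain foreign key cannot express the "no cycles" rule. As a minimal sketch of the rule itself, assuming the hierarchy has been loaded into a map from employee ID to manager ID (the class and method names are hypothetical, and in practice the check might just as well live in a trigger), the following Java method tells us whether a new manager assignment would close a loop:

```java
import java.util.Map;

class CycleCheck {
    // managerOf maps employee ID -> manager ID; root nodes are absent from the map.
    // Assumes the existing data is already free of cycles, otherwise the walk may not end.
    static boolean wouldCreateCycle(Map<Long, Long> managerOf, long employeeId, long newManagerId) {
        Long current = newManagerId;
        while (current != null) {
            if (current == employeeId) {
                // Walking up from the new manager reached the employee itself.
                return true;
            }
            current = managerOf.get(current);
        }
        return false;
    }
}
```

The walk goes up from the proposed manager to the root, so its cost is proportional to the depth of the tree rather than its size.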

Generating test data

Especially for Part 2 of this series, we need a way to generate as much data as we want for populating the schema. Let's first do it step by step for clarity, and then recursively.

Step by step

We start simple by inserting three levels of employees in the hierarchy explicitly.

You might know already about CTEs in Postgres - they are auxiliary named queries executed within the context of a main query. Below, you can see how I construct each level on the basis of the level before.

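with root as (
  insert into 
    employees(manager_id, name)
      select 
        null, 
        'root' || md5(random()::text) 
      from  
        generate_series(1, 1) g
      returning 
        employees.id
  ),
  first_level as (
    insert into 
      employees(manager_id, name)
        select 
          root.id, 
          'first_level' || md5(random()::text) 
        from 
          generate_series(1, 2) g, 
          root
        returning 
          employees.id
  ),
  second_level as (
    insert into 
      employees(manager_id, name)
        select 
          first_level.id, 
          'second_level' || md5(random()::text) 
        from 
          generate_series(1, 2) g, 
          first_level
        returning 
          employees.id
  )
insert into 
  employees(manager_id, name)
select 
  second_level.id, 
  'third_level' || md5(random()::text) 
from 
  generate_series(1, 2) g, 
  second_level;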

Let's verify that it works as expected so far, and for this purpose do a count to see how many elements have been inserted. You can compare it with the number of nodes in the tree visualisation I showed at the beginning of this post.

postgres=# select count(*) from employees;
 count 
-------
 15
(1 row)

Looks alright! Three levels, and in total we get 15 nodes.

Time to move on to the recursive approach.

Recursive

Writing recursive queries follows a standard procedure. We define a base step and a recursive step then "connect" them to each other using union all. At runtime Postgres will follow this recipe and generate all our results. Have a look.

create temporary sequence employees_id_seq;
insert into employees (id, manager_id, name)
with recursive t(id, parent_id, level, name) AS
(
  -- base step: the root
  select 
    nextval('employees_id_seq')::bigint,
    null::bigint, 
    1, 
    'root' from generate_series(1,1) g

    union all

    -- recursive step: two children for every row of the previous level
    select 
      nextval('employees_id_seq')::bigint, 
      t.id, 
      level+1, 
      'level' || level || '-' || md5(random()::text) 
    from 
      t, 
      generate_series(1,2) g
    where 
      level < 4 -- stop after four levels: 1 + 2 + 4 + 8 = 15 rows
)
select 
  id, 
  parent_id, 
  name 
from 
  t;
drop sequence employees_id_seq;

After running it, let's do a count again to see if the same number of elements are inserted.

postgres=# select count(*) from employees;
 count 
-------
 15
(1 row)

Cool! We're in business. We can now populate the schema with however many levels and elements we want, and thus completely control the inserted volume. No worries if recursive queries still look a bit difficult for now; we'll revisit them a bit later when writing the queries to retrieve the data.
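If the base step and recursive step still feel abstract, it can help to replay the evaluation by hand. The sketch below is a plain-Java simulation, not what Postgres literally executes: the class, the Row type and the counter standing in for the sequence are all hypothetical. It starts from the base row and keeps feeding the rows of the previous iteration into the recursive step until that step produces nothing new.

```java
import java.util.ArrayList;
import java.util.List;

class RecursiveCteSimulation {
    // One generated row: id, manager id (null for the root) and level.
    static final class Row {
        final long id;
        final Long managerId;
        final int level;

        Row(long id, Long managerId, int level) {
            this.id = id;
            this.managerId = managerId;
            this.level = level;
        }
    }

    static List<Row> generate(int maxLevel, int childrenPerNode) {
        long nextId = 1; // stands in for the database sequence
        List<Row> result = new ArrayList<>();
        List<Row> working = new ArrayList<>();
        working.add(new Row(nextId++, null, 1)); // base step: the root

        while (!working.isEmpty()) {
            result.addAll(working); // "union all" keeps every produced row
            List<Row> next = new ArrayList<>();
            for (Row parent : working) {
                if (parent.level < maxLevel) { // termination guard on the level
                    for (int i = 0; i < childrenPerNode; i++) {
                        next.add(new Row(nextId++, parent.id, parent.level + 1));
                    }
                }
            }
            working = next; // the recursive step only sees the newest rows
        }
        return result;
    }
}
```

Calling generate(4, 2) yields 1 + 2 + 4 + 8 = 15 rows, the same count as above.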

For now, let's proceed to have a look at the Hibernate entity we can use to map our table to a Java class.

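import jakarta.persistence.*;
import lombok.Getter;
import lombok.Setter;

import java.util.ArrayList;
import java.util.List;

@Entity
@Table(name = "employees")
@Getter
@Setter
public class Employee {
    @Id
    private Long id;

    private String name;

    // The owning side: maps the manager_id foreign key column.
    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "manager_id")
    private Employee manager;

    // The inverse side: mappedBy must name the field above.
    @OneToMany(
            mappedBy = "manager",
            cascade = CascadeType.ALL,
            orphanRemoval = true
    )
    private List<Employee> employees = new ArrayList<>();
}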

Nothing special, just a one-to-many relationship between managers and employees. You saw this coming. Let's start querying. 


  
  
  Descendants


All subordinates of a manager

To retrieve all employees who are subordinates of a specific manager, referenced by her ID, we'll write a recursive query again. Once more there is a base step and a recursive step that is linked up with the base step. Postgres then repeats the recursive step until it has retrieved all the relevant rows. Let's take the employee with ID = 2 as an example. This is a visual representation which hopefully makes it easier to understand what I've just described. I haven't included all the results, just the first few.
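The same walk can be written out in plain Java, which may make the two steps easier to see. This is only a sketch: the class and method names are hypothetical, and the hierarchy is assumed to be available as a map from employee ID to manager ID. The base step picks the direct reports of the manager, and the recursive step picks the reports of whatever the previous step returned.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DescendantsSketch {
    // managerOf maps employee ID -> manager ID.
    static List<Long> descendantsOf(Map<Long, Long> managerOf, long managerId) {
        // Invert the map once: manager ID -> direct reports.
        Map<Long, List<Long>> reports = new HashMap<>();
        for (Map.Entry<Long, Long> e : managerOf.entrySet()) {
            reports.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }

        List<Long> result = new ArrayList<>();
        // Base step: the direct reports of the given manager.
        List<Long> frontier = new ArrayList<>(reports.getOrDefault(managerId, List.of()));
        while (!frontier.isEmpty()) {
            Collections.sort(frontier); // deterministic order within a level
            result.addAll(frontier);
            // Recursive step: the reports of everyone found in the previous step.
            List<Long> next = new ArrayList<>();
            for (Long id : frontier) {
                next.addAll(reports.getOrDefault(id, List.of()));
            }
            frontier = next;
        }
        return result;
    }
}
```

If the 15 nodes are numbered level by level, so that node k reports to node k / 2 (an assumption that matches the path 1.2.4.8 used later in this post), the subordinates of ID = 2 come out as 4, 5, 8, 9, 10 and 11.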



Here's the JPQL query for querying descendants:

return entityManager.createQuery("""
 with employeeRoot as (
  select
    employee.employees employee
  from
    Employee employee
  where
    employee.id = :employeeId

  union all

  select
    employee.employees employee
  from
    Employee employee
  join
    employeeRoot root ON employee = root.employee
  order by
    employee.id
  )
  select 
    new Employee(
     root.employee.id
   )
  from 
  employeeRoot root
 """, Employee.class
)
 .setParameter("employeeId", employeeId)
 .getResultList();




In queries such as the above one, in order to make them cleaner and avoid needing to write the fully qualified name of the record we will write the results into, we can use the hypersistence-utils library to write a ClassImportIntegratorProvider:



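import io.hypersistence.utils.hibernate.type.util.ClassImportIntegrator;
import org.hibernate.integrator.spi.Integrator;
import org.hibernate.jpa.boot.spi.IntegratorProvider;

import java.util.List;

import static java.util.Collections.singletonList;

public class ClassImportIntegratorProvider implements IntegratorProvider {
    @Override
    public List<Integrator> getIntegrators() {
        return List.of(
                // Lets JPQL refer to Employee by its simple class name.
                new ClassImportIntegrator(
                        singletonList(
                                Employee.class
                        )
                )
        );
    }
}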





  
  
  Reviewing the generated queries


It works, but let's have a deeper look at what Hibernate generated. It's always good to understand what's happening under the hood; otherwise we might incur inefficiencies on every user request, and these add up.

We'll have to start the Spring Boot app with the following setting:



@DynamicPropertySource
static void registerPgProperties(DynamicPropertyRegistry registry) {
    registry.add("spring.jpa.show_sql", () -> true);
}




Alright, let's have a look. Here's the query for the descendants generated by Hibernate.



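with recursive employeeRoot (employee_id) as 
(
select 
  e1_0.id
from 
  employees eal1_0
join 
  employees e1_0 on eal1_0.id = e1_0.manager_id
where eal1_0.id=?

union all

(
select 
  e2_0.id
from 
  employees eal2_0
join 
  employeeRoot root1_0 on eal2_0.id = root1_0.employee_id
join 
  employees e2_0 on eal2_0.id = e2_0.manager_id
order by 
  eal2_0.id
)
)
select 
  root2_0.employee_id
from 
  employeeRoot root2_0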




Hmm - looks a bit more complicated than expected! Let's see if we can simplify it a bit, keeping in mind the picture I showed you earlier about the base step and the recursive step linked with the base step. We shouldn't need to do more than that. See what you think of the following.
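One way such a simplification could look is sketched here. Treat it as an assumption rather than the exact query from the original listing: it relies only on the employees table and its manager_id column from the schema above, compares manager_id directly instead of joining employees to itself, and is held in a hypothetical Java constant so that it could be handed to JDBC or to a native query.

```java
class SimplifiedDescendantsQuery {
    // Hand-simplified descendants query (illustrative, not generated by Hibernate).
    // Base step: direct reports of the given manager.
    // Recursive step: reports of the rows found so far.
    static final String SQL = """
            with recursive employeeRoot (employee_id) as (
              select e.id
              from employees e
              where e.manager_id = ?
              union all
              select e.id
              from employees e
              join employeeRoot root on e.manager_id = root.employee_id
            )
            select employee_id
            from employeeRoot
            """;
}
```

Compared with the generated statement, each step touches the employees table once instead of twice, and the only remaining join is the one that links the recursive step to the previous results.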




Much better! We removed some unnecessary joins. This is expected to make the query go faster because it will have less work to do.

Final result

As a final step let's clean up the query and replace the table names that Hibernate adds with ones that are more human readable.

Alright, time to see how we go "up" the tree.


  
  
  Ancestors


All managers up the chain

Let's first try to write down the conceptual steps for getting the managers of employee with ID = 14.



Looks very much like the one for the descendants, just the connection between the base step and the recursive step is the other way.
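In plain Java the upward walk is even shorter than the downward one. As before, this is a sketch under assumptions: the names are hypothetical and the hierarchy is assumed to be a map from employee ID to manager ID. The base step looks up the direct manager, and the recursive step looks up the manager of the manager found last.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class AncestorsSketch {
    // managerOf maps employee ID -> manager ID; the root is absent from the map.
    static List<Long> managersOf(Map<Long, Long> managerOf, long employeeId) {
        List<Long> chain = new ArrayList<>();
        Long current = managerOf.get(employeeId); // base step: the direct manager
        while (current != null) {
            chain.add(current);
            current = managerOf.get(current); // recursive step: one level further up
        }
        return chain;
    }
}
```

With the nodes numbered level by level (node k reporting to node k / 2, which is an assumption about the example tree), the chain for ID = 14 is 7, 3, 1.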

The JPQL query looks like this:






And that's it! I have looked at the SQL query generated but I could not find any extra commands that I could shave off. Time to move on to approach 2.


  
  
  2. Materialized paths


ltree is a Postgres extension we can use to work with hierarchical tree structures as materialized paths (starting from the top of the tree). For example, this is how we will record the path for leaf node 8: 1.2.4.8. It comes with several useful functions. We can use it as a table column:







To populate the above table with test data, I basically migrated the generated data from the adjacency list table you saw before, using the following SQL command. It's again a recursive query, which collects elements into an accumulator at every step.
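The accumulator idea is independent of SQL, so here is a small Java sketch of it. It is an illustration rather than the migration command itself: the class and method names are hypothetical, and the adjacency list is assumed to be a map from employee ID to manager ID. Each round extends the already known path of a manager with the ID of every direct report.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MaterializedPathsSketch {
    // Returns employee ID -> dot-separated path from the root, e.g. 8 -> "1.2.4.8".
    static Map<Long, String> buildPaths(Map<Long, Long> managerOf, long rootId) {
        Map<Long, String> paths = new TreeMap<>();
        paths.put(rootId, String.valueOf(rootId)); // base step: the root's path is its own ID

        List<Long> frontier = new ArrayList<>();
        frontier.add(rootId);
        while (!frontier.isEmpty()) {
            List<Long> next = new ArrayList<>();
            for (Map.Entry<Long, Long> e : managerOf.entrySet()) {
                if (frontier.contains(e.getValue())) {
                    // Accumulator: the manager's path plus this employee's ID.
                    paths.put(e.getKey(), paths.get(e.getValue()) + "." + e.getKey());
                    next.add(e.getKey());
                }
            }
            frontier = next; // continue with the level just added
        }
        return paths;
    }
}
```

Scanning the whole map on every round keeps the sketch short; for large inputs one would index the employees by manager first.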






Here are the entries that the above command generated.





We can proceed to write the Hibernate entity. In order to map columns of type ltree, I implemented a UserType. I can then map the path field with @Type(LTreeType.class):







We're ready to write some queries. In native SQL, it would look like the following:







But let's write our queries in JPQL. For this, we'll have to first write our custom StandardSQLFunction. This will allow us to define a substitution for the Postgres native operator.







We then have to register it as a FunctionContributor, like so:






The last step is to create a resource file in the META-INF/services folder called org.hibernate.boot.model.FunctionContributor where we will add a single line with the fully qualified name of the class above.

Okay, cool! We're finally in position to write the following query:





For example, we can call this method like this to retrieve all the paths that contain ID = 2:






Postgres offers a wide set of functions for working with ltrees. You can find them on the official docs page. There's also a useful cheatsheet.
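To make "paths that contain ID = 2" concrete without a database, here is a small Java sketch. It is a stand-in for the idea, not for the ltree operators themselves, and the class and method names are hypothetical. The point it illustrates is that a path matches when one of its labels equals the ID, which is roughly what an lquery pattern such as *.2.* expresses, and which is not the same as a substring match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PathQueriesSketch {
    // True if one of the dot-separated labels of the path equals the given ID.
    static boolean containsLabel(String path, long id) {
        return Arrays.asList(path.split("\\.")).contains(String.valueOf(id));
    }

    // Keeps only the paths that pass through the given ID, in their original order.
    static List<String> pathsContaining(List<String> paths, long id) {
        List<String> matches = new ArrayList<>();
        for (String path : paths) {
            if (containsLabel(path, id)) {
                matches.add(path);
            }
        }
        return matches;
    }
}
```

A naive path.contains("2") would also accept 1.3.6.12, which does not pass through node 2; comparing whole labels avoids that.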

It's important to add constraints to our schema in order to ensure data consistency - here's a good resource I found on this topic.


  
  
  3. Nested sets


This is easiest to understand with an image showing the intuition. Every node of the tree has an extra "left" and "right" column besides its ID. The rule is that all the children have their left and right values in between their parent's left and right values.
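A short Java sketch can show where the left and right values come from. It is an illustration under assumptions, not the population script used later: the names are hypothetical and the tree is assumed to be given as a map from each node to its children. A depth-first walk hands out a counter value when it enters a node (left) and another when it leaves it (right), so every subtree ends up strictly inside its parent's interval.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NestedSetsSketch {
    // Returns node ID -> {left, right}.
    static Map<Long, long[]> number(Map<Long, List<Long>> children, long rootId) {
        Map<Long, long[]> bounds = new HashMap<>();
        assign(children, rootId, new long[] {1}, bounds);
        return bounds;
    }

    private static void assign(Map<Long, List<Long>> children, long id,
                               long[] counter, Map<Long, long[]> bounds) {
        long left = counter[0]++; // taken on the way down
        for (long child : children.getOrDefault(id, List.of())) {
            assign(children, child, counter, bounds);
        }
        long right = counter[0]++; // taken on the way back up
        bounds.put(id, new long[] {left, right});
    }

    // A node is a descendant when its interval lies strictly inside the ancestor's.
    static boolean isDescendant(long[] node, long[] ancestor) {
        return ancestor[0] < node[0] && node[1] < ancestor[1];
    }
}
```

For a 15-node tree in which node k has the children 2k and 2k + 1 (an assumption about the example), the root gets the interval 1 to 30 and node 2 gets 2 to 15. Both directions of the hierarchy then reduce to interval comparisons, with no recursion at query time.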



Here's the table structure to represent the tree above.







In order to populate the table, I have converted the script from Joe Celko's "SQL for smarties" book into Postgres syntax. Here it is:






Alright, I'm ready to do some queries. Here's how to retrieve the ancestors.







For the descendants, we'd first have to retrieve the left and right, after which we can use the below query.







And that's it! You've seen how to go up or down the tree for all three approaches. I hope that you enjoyed the journey and you find it useful.


  
  
  Postgres vs. document/graph databases


The database we've used for the examples above is PostgreSQL. It is not the only option: you might wonder why not choose a document database like MongoDB, or a graph database like Neo4j, since they were actually built with this type of workload in mind.
Chances are, you already have your source-of-truth data in Postgres in a relational model, leveraging transactional guarantees. In that case, you should first check how well Postgres itself handles your auxiliary use-cases, in order to keep everything in one place. This way, you avoid the increased cost and operational complexity of spinning up, maintaining and upgrading a separate specialised data store, as well as needing to get familiar with it.


  
  
  Conclusion


There are several interesting options for modelling hierarchical data in your database applications. In this post I've shown you three ways to do it. Stay tuned for Part 2, where we will compare them and see what happens with larger volumes of data.


  
  
  References


https://dev.to/yugabyte/learn-how-to-write-sql-recursive-cte-in-5-steps-3n88

https://vladmihalcea.com/hibernate-with-recursive-query/

https://vladmihalcea.com/dto-projection-jpa-query/

https://tudborg.com/posts/2022-02-04-postgres-hierarchical-data-with-ltree/

https://aregall.tech/hibernate-6-custom-functions#heading-implementing-a-custom-function

https://www.amazon.co.uk/Joe-Celkos-SQL-Smarties-Programming/dp/0128007613 

https://madecurious.com/curiosities/trees-in-postgresql/

https://schinckel.net/2014/11/27/postgres-tree-shootout-part-2:-adjacency-list-using-ctes/
