Navigating the world of Harry Potter with Knowledge Graphs-Python Tutorial-php.cn

Home

Backend Development

Python Tutorial

Navigating the world of Harry Potter with Knowledge Graphs

Susan Sarandon

Dec 22, 2024 am 12:13 AM

Aim

Are you a Harry Potter fan who want to have everything about the Harry Potter universe on your fingertips? Or do you simply want to impress your friends with a cool chart of how the different characters in Harry Potter come together? Look no further than knowledge graphs.

This guide will show you how to get a knowledge graph up in Neo4J with just your laptop and your favourite book.

What is knowledge graph

According to Wikipedia:

A knowledge graph is a knowledge base that uses a graph-structured data model or topology to represent and operate on data.

What do you need

In terms of hardware, all you need is a computer, preferably one with a Nvidia graphics card. To be fully self-sufficient, I will go with a local LLM setup, but one could easily also use an OpenAI API for the same purpose.

Steps in setting up

You will need the following:

Ollama, and your favourite LLM model
a python environment
Neo4J

Ollama

As I am coding on Ubuntu 24.04 in WSL2, in order for any GPU workload to passthrough easily, I am using Ollama docker. Running Ollama as a docker container is as simple as first installing the Nvidia container toolkit, and then the following:

docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

If you do not have a Nvidia GPU, you can run a CPU-only Ollama using the following command in CLI:

docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

Once you are done, you can pull your favourite LLM model into Ollama. The list of models available on Ollama is here. For example if I want to pull qwen2.5, I can run the following command in CLI:

docker exec -it ollama ollama run qwen2.5

And you are done with Ollama!

Python environment

You will first want to create a python virtual environment, so that any packages you install, or any configurations changes you made, are restricted to within the environment, instead of having these applied globally. The following command will create a virtual environment harry-potter-rag:

python -m venv harry-potter-rag

You can then activate the virtual environment using the following command:

source tutorial-env/bin/activate

Next, use pip to install the relevant packages, mainly from LangChain:

%pip install --upgrade --quiet  langchain langchain-community langchain-openai langchain-experimental neo4j

Setting up Neo4J

We will set up Neo4J as a docker container. For ease of setting up with specific configurations, we use docker compose. You may simply copy the following into a file called docker-compose.yaml, and then run docker-compose up -d in the same directory to set up Neo4J.

This setup also ensures data, logs and plugins are persisted in local folders, i.e. /data. /logs and plugins.

docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

Building the Knowledge Graph

We can now start building the Knowledge Graph in Jupyter Notebook! We first set up an Ollama LLM instance using the following:

docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

Next, we connect our LLM to Neo4J:

docker exec -it ollama ollama run qwen2.5

Now, it is time to grab your favourite Harry Potter text, or any favourite book, and we will use LangChain to split the text into chunks. Chunking is a strategy to break down a long text into parts, and we can then send each part to the LLM to convert them into nodes and edges, and insert each chunk's nodes and edges in Neo4J. Just a quick primer, nodes are circles you see on a graph, and each edge joins two nodes together.

The code also prints the first chunk for a quick preview of how the chunks look like.

python -m venv harry-potter-rag

Now, it is time to let our GPU do the heavy lifting and convert out text into Knowledge Graph! Before we dive deep into the entire book, let us experiment with prompts to better guide the LLM in returning a graph in the way we want.

Prompts are essentially examples of what we expect, or instructions of what we want to appear in the response. In the context of knowledge graphs, we can instruct the LLM to only extract persons and organisations as nodes, and to only accept certain types of relationships given the entities. For example, we can allow the relationship of spouse to only happen between a person and another person, and not between a person and an organisation.

We can now employ the LLMGraphTransformer on the first chunk of text to see how the graph could turn out. This is a good chance for us to tweak the prompt until the result is to our liking.

The following example expects nodes which could be a Person or Organization, and the allowed_relationships specify the types of relationships that are allowed. In order to allow LLM to capture the variety of the original text, I also set strict_mode to False, so that any other relationships or entities which are not defined below can also be captured. If you instead set strict_mode to True, entities and relationships that do not comply with what is allowed could be either dropped, or forced into what is allowed (which may be inaccurate).

source tutorial-env/bin/activate

After you are satisfied with fine-tuning your prompt, it is now time to ingest into a Knowledge Graph. Note that the try - except is to explicitly handle any response that could not be properly inserted into Neo4J -- the code is designed so that any error is logged, but does not block the loop from moving on with converting subsequent chunks into graph.

docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

The loop above took me about 46 minutes to ingest Harry Potter and the Philosopher's Stone, Harry Potter and the Chamber of Secrets, and Harry Potter and the Prisoner of Azkaban. I end up with 4868 unique nodes! A quick preview is available below. You can see that the graph is really crowded, and and it is hard to distinguish who is related to who else, and in what way.

Navigating the world of Harry Potter with Knowledge Graphs

We can now leverage on cypher queries to look at say, Dumbledore!

docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

Navigating the world of Harry Potter with Knowledge Graphs

Ok now we get just Dumbledore himself. Let's see how he is related to Harry Potter.

docker exec -it ollama ollama run qwen2.5

Navigating the world of Harry Potter with Knowledge Graphs

Ok, now we are interested in what Harry and Dumbledore have spoked.

python -m venv harry-potter-rag

Navigating the world of Harry Potter with Knowledge Graphs

We can see that the graph is still really confusing, with many documents to go through to really find what we are looking for. We can see that the modelling of documents as nodes is not ideal, and further work could be done on the LLMGraphTransformer to make the graph more intuitive to use.

Conclusion

You can see how easy it is to set up a Knowledge Graph on your own local computer, without even needing to connect to the internet.

The github repo, which also contains the entire Knowledge Graph of the Harry Potter universe, is available here.

Postscript

To import the harry_potter.graphml file into Neo4J, copy the graphml file into neo4j /import folder, and run the following on the Neo4J browser:

source tutorial-env/bin/activate

The above is the detailed content of Navigating the world of Harry Potter with Knowledge Graphs. For more information, please follow other related articles on the PHP Chinese website!

Statement

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

How does the choice between lists and arrays impact the overall performance of a Python application dealing with large datasets?May 03, 2025 am 12:11 AM

ForhandlinglargedatasetsinPython,useNumPyarraysforbetterperformance.1)NumPyarraysarememory-efficientandfasterfornumericaloperations.2)Avoidunnecessarytypeconversions.3)Leveragevectorizationforreducedtimecomplexity.4)Managememoryusagewithefficientdata

Explain how memory is allocated for lists versus arrays in Python.May 03, 2025 am 12:10 AM

InPython,listsusedynamicmemoryallocationwithover-allocation,whileNumPyarraysallocatefixedmemory.1)Listsallocatemorememorythanneededinitially,resizingwhennecessary.2)NumPyarraysallocateexactmemoryforelements,offeringpredictableusagebutlessflexibility.

How do you specify the data type of elements in a Python array?May 03, 2025 am 12:06 AM

InPython, YouCansSpectHedatatYPeyFeLeMeReModelerErnSpAnT.1) UsenPyNeRnRump.1) UsenPyNeRp.DLOATP.PLOATM64, Formor PrecisconTrolatatypes.

What is NumPy, and why is it important for numerical computing in Python?May 03, 2025 am 12:03 AM

NumPyisessentialfornumericalcomputinginPythonduetoitsspeed,memoryefficiency,andcomprehensivemathematicalfunctions.1)It'sfastbecauseitperformsoperationsinC.2)NumPyarraysaremorememory-efficientthanPythonlists.3)Itoffersawiderangeofmathematicaloperation

Discuss the concept of 'contiguous memory allocation' and its importance for arrays.May 03, 2025 am 12:01 AM

Contiguousmemoryallocationiscrucialforarraysbecauseitallowsforefficientandfastelementaccess.1)Itenablesconstanttimeaccess,O(1),duetodirectaddresscalculation.2)Itimprovescacheefficiencybyallowingmultipleelementfetchespercacheline.3)Itsimplifiesmemorym

How do you slice a Python list?May 02, 2025 am 12:14 AM

SlicingaPythonlistisdoneusingthesyntaxlist[start:stop:step].Here'showitworks:1)Startistheindexofthefirstelementtoinclude.2)Stopistheindexofthefirstelementtoexclude.3)Stepistheincrementbetweenelements.It'susefulforextractingportionsoflistsandcanuseneg

What are some common operations that can be performed on NumPy arrays?May 02, 2025 am 12:09 AM

NumPyallowsforvariousoperationsonarrays:1)Basicarithmeticlikeaddition,subtraction,multiplication,anddivision;2)Advancedoperationssuchasmatrixmultiplication;3)Element-wiseoperationswithoutexplicitloops;4)Arrayindexingandslicingfordatamanipulation;5)Ag

How are arrays used in data analysis with Python?May 02, 2025 am 12:09 AM

ArraysinPython,particularlythroughNumPyandPandas,areessentialfordataanalysis,offeringspeedandefficiency.1)NumPyarraysenableefficienthandlingoflargedatasetsandcomplexoperationslikemovingaverages.2)PandasextendsNumPy'scapabilitieswithDataFramesforstruc

See all articles

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

What's New in Windows 11 KB5054979 & How to Fix Update Issues

4 weeks agoByDDD

How to fix KB5055523 fails to install in Windows 11?

3 weeks agoByDDD

How to fix KB5055518 fails to install in Windows 10?

3 weeks agoByDDD

Strength Levels for Every Enemy & Monster in R.E.P.O.

3 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

Blue Prince: How To Get To The Basement

3 weeks agoByDDD

Hot Tools

EditPlus Chinese cracked version

Small size, syntax highlighting, does not support code prompt function

SecLists

SecLists is the ultimate security tester's companion. It is a collection of various types of lists that are frequently used during security assessments, all in one place. SecLists helps make security testing more efficient and productive by conveniently providing all the lists a security tester might need. List types include usernames, passwords, URLs, fuzzing payloads, sensitive data patterns, web shells, and more. The tester can simply pull this repository onto a new test machine and he will have access to every type of list he needs.

MinGW - Minimalist GNU for Windows

This project is in the process of being migrated to osdn.net/projects/mingw, you can continue to follow us there. MinGW: A native Windows port of the GNU Compiler Collection (GCC), freely distributable import libraries and header files for building native Windows applications; includes extensions to the MSVC runtime to support C99 functionality. All MinGW software can run on 64-bit Windows platforms.