@herbertbeckman - LinkedIn
@rndtavares - LinkedIn
Reliable AI agent in prod with Java Quarkus Langchain4j - Part 1 - AI as Service
Reliable AI agent in prod with Java Quarkus Langchain4j - Part 2 - Memory (this article)
Reliable AI agent in prod with Java Quarkus Langchain4j - Part 3 - RAG (coming soon)
Reliable AI agent in prod with Java Quarkus Langchain4j - Part 4 - Guardrails (coming soon)
When we create an agent, we must keep in mind that LLMs do not store any kind of information, that is, they are stateless. For our agent to be able to "remember" information, we must implement memory management. Quarkus already provides us with a configured default memory; the catch is that, if due care is not taken, it can literally take down your agent by exhausting the RAM made available to it, as described in the Quarkus documentation. To get rid of this problem, and also to be able to run our agent in a scalable environment, we need a ChatMemoryStore.
We use a chat to interact with our agent, and there are important concepts we must know so that our interaction with it goes as smoothly as possible and does not cause bugs in production. First, we need to know the types of messages we use when interacting with it:
UserMessage: the message or request sent by the end customer. When we send a message in the Quarkus Dev UI, we are always sending a UserMessage. It is also used for the results of the tool calls we saw earlier.
AiMessage: the response message from the model. Whenever the LLM responds, our agent receives a message of this type. Its content alternates between a textual response and tool execution requests.
SystemMessage: this message can be defined only once, and only at development time. A short sketch of how each of these types is created in Langchain4j follows this list.
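To make the three types concrete, here is a minimal, illustrative sketch (the class name and message contents are ours, not from the article) of how each one is created with Langchain4j:

import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.data.message.ChatMessage;
import dev.langchain4j.data.message.SystemMessage;
import dev.langchain4j.data.message.UserMessage;

import java.util.List;

public class MessageTypesExample {

    public static void main(String[] args) {
        // The instruction that shapes the agent's behavior, defined once by the developer
        ChatMessage system = SystemMessage.from("You are an agent specialized in Brazilian football.");

        // What the end customer actually typed
        ChatMessage user = UserMessage.from("How many World Cups has Brazil won?");

        // What the model answered (normally produced by the LLM; built by hand here only to illustrate)
        ChatMessage ai = AiMessage.from("Brazil has won the World Cup 5 times.");

        // This is roughly the shape of the history a ChatMemory keeps for one interaction
        List<ChatMessage> history = List.of(system, user, ai);
        history.forEach(message -> System.out.println(message.type() + ": " + message));
    }
}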
Now that you know the 3 types of messages, let's explain how they should behave with some diagrams. All diagrams were taken from the presentation Java meets AI: Build LLM-Powered Apps with LangChain4j by Deandrea, Andrianakis, and Escoffier; I highly recommend the video.
The first diagram demonstrates the use of the 3 types of messages: UserMessage in blue, SystemMessage in red and AiMessage in green.
The second diagram demonstrates how the "memory" should be managed. An interesting detail is that we must maintain a certain order in the messages, and some premises must be respected.
Another important detail you should pay attention to is the size of your ChatMemory. The larger the memory of your interaction, the higher the token cost, since the LLM needs to process more text to produce a response. So establish a memory window that best suits your use case. One tip is to check the average number of messages your customers exchange to get an idea of the size of an interaction. We will show the implementation through MessageWindowChatMemory, the Langchain4j class that specializes in managing this for us; a small sketch of how the window behaves follows below.
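As a standalone illustration of the window (the message contents are invented), this is how MessageWindowChatMemory evicts the oldest messages once the limit is reached:

import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.memory.ChatMemory;
import dev.langchain4j.memory.chat.MessageWindowChatMemory;

public class MemoryWindowExample {

    public static void main(String[] args) {
        // Keep only the 3 most recent messages; older ones are evicted and no longer sent to the LLM
        ChatMemory memory = MessageWindowChatMemory.withMaxMessages(3);

        memory.add(UserMessage.from("Hi there!"));
        memory.add(AiMessage.from("Hello, how can I help?"));
        memory.add(UserMessage.from("Who won the 2002 World Cup?"));
        memory.add(AiMessage.from("Brazil."));

        // Only the last 3 messages remain, which is what keeps the token cost under control
        memory.messages().forEach(System.out::println);
    }
}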
Now that we know all these concepts and premises, let's get our hands dirty!
Here we will use MongoDB as our ChatMemoryStore. We followed the MongoDB docs and spun up an instance with Docker. Feel free to configure it however you wish; one possible command is sketched below.
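For example, a local instance could be started like this (the container name, image tag and credential placeholders are just examples; use whatever matches your environment):

docker run -d --name chat-memory-mongo \
  -p 27017:27017 \
  -e MONGO_INITDB_ROOT_USERNAME=<MONGODB_USER> \
  -e MONGO_INITDB_ROOT_PASSWORD=<MONGODB_PASSWORD> \
  mongo:7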
Let's start by adding the necessary dependency to connect to MongoDB using Quarkus.
<dependency>
    <groupId>io.quarkus</groupId>
    <artifactId>quarkus-mongodb-panache</artifactId>
</dependency>
After the dependencies, we need to add the connection settings in our src/main/resources/application.properties.
quarkus.mongodb.connection-string=mongodb://${MONGODB_USER}:${MONGODB_PASSWORD}@localhost:27017
quarkus.mongodb.database=chat_memory
We still won't be able to test our connection to the database, as we first need to create our entities and repositories.
Now let's implement our Interaction entity. This entity holds the list of messages exchanged. Whenever a new customer connects, a new Interaction is generated. If we need to reuse an Interaction, we simply pass the same Interaction identifier.
package <seupacote>;

import dev.langchain4j.data.message.ChatMessage;
import io.quarkus.mongodb.panache.common.MongoEntity;
import org.bson.codecs.pojo.annotations.BsonId;

import java.util.List;
import java.util.Objects;

@MongoEntity(collection = "interactions")
public class InteractionEntity {

    @BsonId
    private String interactionId;
    private List<ChatMessage> messages;

    public InteractionEntity() {
    }

    public InteractionEntity(String interactionId, List<ChatMessage> messages) {
        this.interactionId = interactionId;
        this.messages = messages;
    }

    public String getInteractionId() {
        return interactionId;
    }

    public void setInteractionId(String interactionId) {
        this.interactionId = interactionId;
    }

    public List<ChatMessage> getMessages() {
        return messages;
    }

    public void setMessages(List<ChatMessage> messages) {
        this.messages = messages;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        InteractionEntity that = (InteractionEntity) o;
        return Objects.equals(interactionId, that.interactionId);
    }

    @Override
    public int hashCode() {
        // kept consistent with equals(), which only compares interactionId
        return Objects.hash(interactionId);
    }
}
We can now create our repository.
package <seupacote>;

import dev.langchain4j.data.message.ChatMessage;
import io.quarkus.mongodb.panache.PanacheMongoRepositoryBase;

import java.util.List;

public class InteractionRepository implements PanacheMongoRepositoryBase<InteractionEntity, String> {

    public InteractionEntity findByInteractionId(String interactionId) {
        return findById(interactionId);
    }

    public void updateMessages(String interactionId, List<ChatMessage> messages) {
        persistOrUpdate(new InteractionEntity(interactionId, messages));
    }

    public void deleteMessages(String interactionId) {
        deleteById(interactionId);
    }
}
Now we will implement some Langchain4j components: the ChatMemoryStore and the ChatMemoryProvider. The ChatMemoryProvider is the class we will use in our agent; in it we add a ChatMemoryStore that uses our repository to store the messages in MongoDB. Here is the ChatMemoryStore:
package <seupacote>;

import dev.langchain4j.data.message.ChatMessage;
import dev.langchain4j.store.memory.chat.ChatMemoryStore;

import java.util.List;
import java.util.Objects;

public class MongoDBChatMemoryStore implements ChatMemoryStore {

    private InteractionRepository interactionRepository = new InteractionRepository();

    @Override
    public List<ChatMessage> getMessages(Object memoryId) {
        var interactionEntity = interactionRepository.findByInteractionId(memoryId.toString());
        return Objects.isNull(interactionEntity) ? List.of() : interactionEntity.getMessages();
    }

    @Override
    public void updateMessages(Object memoryId, List<ChatMessage> messages) {
        interactionRepository.updateMessages(memoryId.toString(), messages);
    }

    @Override
    public void deleteMessages(Object memoryId) {
        interactionRepository.deleteMessages(memoryId.toString());
    }
}
The ChatMemoryProvider will look like this:
package <seupacote>;

import dev.langchain4j.memory.chat.ChatMemoryProvider;
import dev.langchain4j.memory.chat.MessageWindowChatMemory;

import java.util.function.Supplier;

public class MongoDBChatMemoryProvider implements Supplier<ChatMemoryProvider> {

    private MongoDBChatMemoryStore mongoDBChatMemoryStore = new MongoDBChatMemoryStore();

    @Override
    public ChatMemoryProvider get() {
        return memoryId -> MessageWindowChatMemory.builder()
                .maxMessages(100)
                .id(memoryId)
                .chatMemoryStore(mongoDBChatMemoryStore)
                .build();
    }
}
Notice the MessageWindowChatMemory. This is where we implement the message window mentioned at the beginning of the article. In the maxMessages() method, change the value to whatever number best fits your scenario. What I recommend is using the largest number of messages an interaction has ever had in your scenario, or using the average. Here we set the arbitrary number 100.
Let's now change our agent to use our new ChatMemoryProvider and add a MemoryId. It should look like this:
package <seupacote>;

import dev.langchain4j.service.MemoryId;
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;
import io.quarkiverse.langchain4j.ToolBox;
import jakarta.enterprise.context.ApplicationScoped;

@ApplicationScoped
@RegisterAiService(
        chatMemoryProviderSupplier = MongoDBChatMemoryProvider.class
)
public interface Agent {

    @ToolBox(AgentTools.class)
    @SystemMessage("""
            Você é um agente especializado em futebol brasileiro, seu nome é FutAgentBR
            Você sabe responder sobre os principais títulos dos principais times brasileiros e da seleção brasileira
            Sua resposta precisa ser educada, você pode deve responder em Português brasileiro e de forma relevante a pergunta feita
            Quando você não souber a resposta, responda que você não sabe responder nesse momento mas saberá em futuras versões.
            """)
    String chat(@MemoryId String interactionId, @UserMessage String message);
}
This should break our AgentWSEndpoint. Let's change it so that it receives the Interaction identifier and we can use it as our MemoryId:
package <seupacote>;

import io.quarkus.websockets.next.OnTextMessage;
import io.quarkus.websockets.next.WebSocket;
import io.quarkus.websockets.next.WebSocketConnection;
import jakarta.inject.Inject;

import java.util.Objects;
import java.util.UUID;

@WebSocket(path = "/ws/{interactionId}")
public class AgentWSEndpoint {

    private final Agent agent;
    private final WebSocketConnection connection;

    @Inject
    AgentWSEndpoint(Agent agent, WebSocketConnection connection) {
        this.agent = agent;
        this.connection = connection;
    }

    @OnTextMessage
    String reply(String message) {
        var interactionId = connection.pathParam("interactionId");
        return agent.chat(
                Objects.isNull(interactionId) || interactionId.isBlank()
                        ? UUID.randomUUID().toString()
                        : interactionId,
                message
        );
    }
}
We can now test our agent again. To do this, we simply connect to the websocket passing whatever UUID we want. You can generate a new UUID with an online generator, or use the uuidgen command on Linux. A sample connection is sketched below.
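For example, assuming the application runs on port 8080 and you have the wscat CLI available (both of these are assumptions on our part), you could connect like this:

wscat -c "ws://localhost:8080/ws/$(uuidgen)"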
When you run this test, you will not receive any response from the agent. This happens because the agent is failing to write our messages to MongoDB, and it will tell you so through an exception. To be able to see this exception, we must add a new property to our src/main/resources/application.properties with the log level we want in Quarkus. Add the following line to it:
quarkus.log.level=DEBUG
Now test the agent again. In the logs you should see an exception complaining that MongoDB has no codec capable of handling Langchain4j's ChatMessage interface.
This exception occurs because MongoDB cannot handle Langchain4j's ChatMessage interface on its own, so we must implement a codec to make this possible. Quarkus already gives us the infrastructure to plug one in, but we need to make it explicit that we want to use it. We will then create the ChatMessageCodec and ChatMessageCodecProvider classes.
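Below is a minimal sketch of what these two classes can look like. It assumes we serialize each ChatMessage to a JSON string using Langchain4j's ChatMessageSerializer and ChatMessageDeserializer; the class names are ours, so adapt them to your project and check the Quarkus MongoDB documentation for how codec providers are registered in your Quarkus version.

package <seupacote>;

import dev.langchain4j.data.message.ChatMessage;
import dev.langchain4j.data.message.ChatMessageDeserializer;
import dev.langchain4j.data.message.ChatMessageSerializer;
import org.bson.BsonReader;
import org.bson.BsonWriter;
import org.bson.codecs.Codec;
import org.bson.codecs.DecoderContext;
import org.bson.codecs.EncoderContext;

public class ChatMessageCodec implements Codec<ChatMessage> {

    @Override
    public ChatMessage decode(BsonReader reader, DecoderContext decoderContext) {
        // Read the stored JSON string and turn it back into a Langchain4j ChatMessage
        return ChatMessageDeserializer.messageFromJson(reader.readString());
    }

    @Override
    public void encode(BsonWriter writer, ChatMessage value, EncoderContext encoderContext) {
        // Persist the ChatMessage as a JSON string
        writer.writeString(ChatMessageSerializer.messageToJson(value));
    }

    @Override
    public Class<ChatMessage> getEncoderClass() {
        return ChatMessage.class;
    }
}

And the provider:

package <seupacote>;

import dev.langchain4j.data.message.ChatMessage;
import org.bson.codecs.Codec;
import org.bson.codecs.configuration.CodecProvider;
import org.bson.codecs.configuration.CodecRegistry;

public class ChatMessageCodecProvider implements CodecProvider {

    @SuppressWarnings("unchecked")
    @Override
    public <T> Codec<T> get(Class<T> clazz, CodecRegistry registry) {
        // Hand our codec to the driver for ChatMessage implementations; returning null lets other providers handle the rest
        if (ChatMessage.class.isAssignableFrom(clazz)) {
            return (Codec<T>) new ChatMessageCodec();
        }
        return null;
    }
}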
Ready! Now we can test again and check the messages in our MongoDB. When querying, we can see the 3 types of messages in the document's messages array; a query like the one sketched below works.
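For example, with mongosh (the chat_memory database and interactions collection come from our configuration and entity; the connection string is just an example):

mongosh "mongodb://<MONGODB_USER>:<MONGODB_PASSWORD>@localhost:27017/chat_memory"
db.interactions.find().pretty()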
That ends the second part of our series. We hope you enjoyed it and see you in part 3.