Build an LLM application: Leveraging the vector search capabilities of Azure Cognitive Services
Author | Simon Bisson
Curator | Ethan
Microsoft’s cognitive search API now offers vector search as a service for use with large language models in Azure OpenAI and more.
Tools such as Semantic Kernel, TypeChat, and LangChain make it possible to build applications around generative AI technologies such as Azure OpenAI. They let you impose constraints on the underlying large language model (LLM), so that it can be used as a tool for building and running natural language interfaces.
Essentially, an LLM is a tool for navigating a semantic space, where a deep neural network predicts the next syllables in the chain of tokens that follow from the initial prompt. If the prompt is open-ended, the LLM may overshoot its inputs and produce something that seems plausible but is in fact complete nonsense.
Just as we tend to trust the output of search engines, we also tend to trust the output of LLMs, because we view them as another facet of familiar technology. But training large language models on trusted data from sites like Wikipedia, Stack Overflow, and Reddit doesn't convey an understanding of the content; it simply gives them the ability to generate text that follows the same patterns as the text in those sources. Sometimes the output may be correct, but other times it will be wrong.
How do we avoid errors and meaningless output from large language models and ensure that our users get accurate and reasonable answers to their queries?
What we need is a way to limit the LLM so that it only generates text from a smaller, trusted set of data. This is where Microsoft's new LLM-based development stack comes in. It provides the tools needed to control your model and prevent it from generating errors.
You can force a specific output format by using tools like TypeChat, or use an orchestration pipeline like Semantic Kernel to work with other, trusted sources of information, effectively grounding the model in a known semantic space and thereby constraining the LLM. Here, the LLM can do what it does well: summarize the constructed prompt and generate text based on it, without overshooting (or at least with a significantly reduced likelihood of overshooting).
What Microsoft calls "semantic memory" is the basis of this last approach. Semantic memory uses vector search to provide prompts that can be used to ground the factual output of the LLM. A vector database manages the context for the initial prompt, a vector search finds stored data that matches the initial user query, and the LLM generates text based on that data. You can see this approach in action in Bing Chat, which uses Bing's native vector search tools to build answers derived from its search database.
Semantic memory depends on having a way to provide a vector database and vector search to your LLM-based application. You can choose one of the growing number of open source vector databases, or add vector indexes to the SQL and NoSQL databases you already know. One new option that looks particularly useful extends Azure Cognitive Search, adding a vector index to your data and providing a new API for querying that index.
Azure Cognitive Search is built on Microsoft's own search tools. It provides a combination of familiar Lucene queries and its own natural language query tools. Azure Cognitive Search is a software-as-a-service platform that can host private data and access content using Cognitive Services APIs. Recently, Microsoft also added support for building and using vector indexes, which allows you to use similarity searches to rank relevant results in your data and use them in AI-based applications. This makes Azure Cognitive Search well suited to Azure-hosted LLM applications built with Semantic Kernel and Azure OpenAI, and Semantic Kernel plugins for Cognitive Search are available for both C# and Python.
Like other Azure offerings, Azure Cognitive Search is a managed service that works with other Azure services. It allows you to index and search across the various Azure storage services, hosting text, images, audio, and video. Data is stored in multiple regions, providing high availability and reducing latency and response times. Additionally, for enterprise applications, you can use Microsoft Entra ID (the new name for Azure Active Directory) to control access to private data.
Note that Azure Cognitive Search is a "bring your own embedding vector" service. Cognitive Search won't generate the vector embeddings you need, so you must use Azure OpenAI or the OpenAI embedding APIs to create embeddings for your content. This may require chunking large files to ensure you stay within the services' token limits. Be prepared to create new tables to index the vector data where needed.
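As a rough sketch of what that preparation might look like, the snippet below chunks a document and requests embeddings from an Azure OpenAI deployment using the openai Python package's v1.x client. The deployment name, endpoint, and key environment variables are placeholders, and the character-based chunking is a simplification of real token-aware chunking.

```python
# Sketch: creating embeddings for chunked content with Azure OpenAI.
# The deployment name and environment variables are placeholders.
import os
from openai import AzureOpenAI  # openai>=1.0

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2023-05-15",
)

def chunk_text(text: str, max_chars: int = 4000) -> list[str]:
    """Naive fixed-size chunking to stay under the embedding model's token limit."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def embed_chunks(text: str, deployment: str = "text-embedding-ada-002") -> list[list[float]]:
    chunks = chunk_text(text)
    response = client.embeddings.create(model=deployment, input=chunks)
    # One embedding vector per chunk, returned in input order.
    return [item.embedding for item in response.data]
```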
In Azure Cognitive Search, vector search uses a nearest neighbor model to return a user-selected number of documents that are similar to the original query. The process takes the vector embedding of the original query, calls the vector index, and returns similar vectors along with the indexed content associated with them, ready for use in an LLM prompt.
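To make the nearest-neighbor idea concrete, here is a minimal brute-force version of the same comparison in plain Python. Cognitive Search does this at scale with an approximate index rather than a linear scan, so this is purely illustrative.

```python
# Minimal illustration of nearest-neighbor retrieval by cosine similarity.
# The real service uses an approximate index; this linear scan shows the idea.
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vector: list[float], documents: list[tuple[str, list[float]]], k: int = 3):
    """documents: (id, vector) pairs; returns the k most similar ids with scores."""
    scored = [(doc_id, cosine_similarity(query_vector, vec)) for doc_id, vec in documents]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```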
Microsoft uses this vector store as part of the Retrieval Augmented Generation (RAG) design pattern for Azure Machine Learning, in conjunction with its prompt flow tool. RAG leverages the vector index in Cognitive Search to build the context that forms the basis of LLM prompts. This gives you a low-code way to build and use vector indexes, for example by setting the number of similar documents returned by a query.
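A simplified sketch of the generation half of RAG, reusing the Azure OpenAI client from the earlier snippet, might look like the following. The system prompt wording, the placeholder chunks, and the "gpt-35-turbo" deployment name are illustrative assumptions rather than a prescribed pattern.

```python
# Sketch of the RAG step: ground the chat completion in retrieved documents.
def build_grounded_messages(question: str, retrieved_docs: list[str]) -> list[dict]:
    context = "\n\n".join(retrieved_docs)
    system = ("Answer only from the provided context. "
              "If the answer is not in the context, say you don't know.\n\n"
              f"Context:\n{context}")
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]

# Reuses the AzureOpenAI `client` from the embedding sketch above.
response = client.chat.completions.create(
    model="gpt-35-turbo",  # placeholder deployment name
    messages=build_grounded_messages(
        "How do I claim travel expenses?",
        ["<retrieved chunk 1>", "<retrieved chunk 2>"],
    ),
)
print(response.choices[0].message.content)
```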
Azure Cognitive Search makes getting started with vector queries easy. Start by creating Azure OpenAI and Cognitive Search resources in the same region, which lets you load the search index with embeddings with minimal latency. You need to call both the Azure OpenAI API and the Cognitive Search API to load the index, so it's a good idea to make sure your code can respond to any rate limits the services impose by adding code that manages retries. When calling the service APIs, use asynchronous calls to generate embeddings and load the index.
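One possible shape for that retry handling, again assuming the client from the earlier sketch, is a simple exponential backoff around the embedding call; the backoff values here are arbitrary.

```python
# Sketch of a retry wrapper for embedding calls, reusing the AzureOpenAI
# `client` defined earlier. Backoff values are arbitrary placeholders.
import time
from openai import RateLimitError

def embed_with_retries(texts: list[str], deployment: str, max_attempts: int = 5) -> list[list[float]]:
    delay = 1.0
    for attempt in range(max_attempts):
        try:
            response = client.embeddings.create(model=deployment, input=texts)
            return [item.embedding for item in response.data]
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise
            time.sleep(delay)  # simple exponential backoff before retrying
            delay *= 2
```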
Vectors are stored in the search index as vector fields: arrays of floating point numbers with a fixed number of dimensions. The vectors are mapped through a hierarchical navigable small world (HNSW) proximity graph, which sorts them into neighborhoods of similar vectors and speeds up the actual process of searching the vector index.
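A hypothetical index definition with a vector field and an HNSW configuration might look like this. The class names follow the azure-search-documents 11.4 Python SDK and may differ in earlier preview versions; the index name, field names, service endpoint, and admin key are placeholders.

```python
# Sketch of an index with a 1,536-dimension vector field and HNSW configuration,
# using azure-search-documents 11.4 class names. Endpoint and key are placeholders.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchIndex, SimpleField, SearchableField, SearchField, SearchFieldDataType,
    VectorSearch, HnswAlgorithmConfiguration, VectorSearchProfile,
)

index = SearchIndex(
    name="docs-index",
    fields=[
        SimpleField(name="id", type=SearchFieldDataType.String, key=True),
        SearchableField(name="title", type=SearchFieldDataType.String),
        SearchableField(name="content", type=SearchFieldDataType.String),
        SearchField(
            name="contentVector",
            type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
            searchable=True,
            vector_search_dimensions=1536,  # must match the embedding model's output size
            vector_search_profile_name="hnsw-profile",
        ),
    ],
    vector_search=VectorSearch(
        algorithms=[HnswAlgorithmConfiguration(name="hnsw-config")],
        profiles=[VectorSearchProfile(name="hnsw-profile",
                                      algorithm_configuration_name="hnsw-config")],
    ),
)

index_client = SearchIndexClient("https://<search-service>.search.windows.net",
                                 AzureKeyCredential("<admin-key>"))
index_client.create_or_update_index(index)
```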
After defining the index schema for vector search, you can load data into your Cognitive Search index. Note that data may be associated with more than one vector. For example, if you use Cognitive Search to host company documents, you might have separate vectors for key document metadata terms and for the document content. The data must be stored as JSON documents, which simplifies the process of using the results to assemble prompt context. The index does not need to contain the source documents, as Cognitive Search supports the most common Azure storage options.
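Loading documents could then be a matter of uploading JSON records whose vector field carries the precomputed embedding, for example (reusing the placeholder endpoint, key, index, and the embed_chunks helper from the earlier sketches):

```python
# Sketch of loading JSON documents with precomputed embeddings into the index.
from azure.search.documents import SearchClient

search_client = SearchClient("https://<search-service>.search.windows.net",
                             "docs-index", AzureKeyCredential("<admin-key>"))

documents = [
    {
        "id": "1",
        "title": "Expense policy",
        "content": "Employees may claim travel expenses within 30 days...",
        # First chunk's embedding from the helper defined earlier.
        "contentVector": embed_chunks("Employees may claim travel expenses within 30 days...")[0],
    },
]
search_client.upload_documents(documents=documents)
```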
Before running a query, you first need to call your chosen embedding model with the query body. This returns a multidimensional vector that you can use to search the index of your choice. When calling the vector search API, specify the target vector index, the desired number of matches, and the relevant text fields in the index. Choosing an appropriate similarity measure can be very helpful for queries; the most commonly used is cosine similarity.
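Putting those steps together, a pure vector query might look like the sketch below, again using the 11.4 SDK's VectorizedQuery; the field and index names match the earlier placeholders.

```python
# Sketch of a pure vector query: embed the question, then ask the index for
# the three nearest documents. VectorizedQuery follows azure-search-documents 11.4.
from azure.search.documents.models import VectorizedQuery

question = "How do I claim travel expenses?"
query_vector = embed_with_retries([question], "text-embedding-ada-002")[0]

results = search_client.search(
    search_text=None,  # no keyword component, vector similarity only
    vector_queries=[VectorizedQuery(vector=query_vector,
                                    k_nearest_neighbors=3,
                                    fields="contentVector")],
    select=["title", "content"],
)
retrieved_docs = [result["content"] for result in results]
```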
Azure Cognitive Search's vector capabilities go beyond just matching text. Cognitive Search can be used with multilingual embeddings to support document searches across multiple languages. You can also use more complex APIs. For example, you can mix in Bing's semantic search tools in a hybrid search to provide more accurate results, thereby improving the quality of output from LLM-powered applications.
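A hybrid query is largely the same call with the keyword text supplied alongside the vector query, so the text ranking and vector similarity are combined in one result set; this reuses the placeholder names from the earlier sketches.

```python
# Hybrid search sketch: keyword text plus the vector query in a single call.
hybrid_results = search_client.search(
    search_text=question,
    vector_queries=[VectorizedQuery(vector=query_vector,
                                    k_nearest_neighbors=3,
                                    fields="contentVector")],
    select=["title", "content"],
)
```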
Microsoft is rapidly productizing the tools and technology it used to build its own GPT-4-based Bing search engine and its various Copilots. Orchestration engines like Semantic Kernel and Azure AI Studio's prompt flow are core to Microsoft's approach to working with large language models. Now that those foundations have been laid, we're seeing the company roll out more of the necessary enabling technology. Vector search and vector indexing are key to providing accurate responses. By building on familiar tools to deliver these services, Microsoft will help us minimize the cost and the learning curve.