


Build an LLM application: Leveraging the vector search capabilities of Azure Cognitive Search
Author | Simon Bisson
Curator | Ethan
Microsoft’s Cognitive Search API now offers vector search as a service for use with large language models in Azure OpenAI and beyond.
Tools such as Semantic Kernel, TypeChat, and LangChain make it possible to build applications around generative AI technologies such as Azure OpenAI. That is because they let you impose constraints on the underlying large language model (LLM), turning it into a tool for building and running natural language interfaces.
Essentially, an LLM is a tool for navigating a semantic space, where a deep neural network predicts the next token in a sequence, starting from the initial prompt. If the prompt is open-ended, the LLM may stray beyond the scope of its input and produce something that seems plausible but is actually complete nonsense.
Just as we tend to trust the output of search engines, we also tend to trust the output of LLMs, because we view them as another facet of familiar technology. But training large language models on trusted data from sites like Wikipedia, Stack Overflow, and Reddit doesn't convey an understanding of the content; it simply gives them the ability to generate text that follows the same patterns as the text in those sources. Sometimes the output may be correct, but other times it will be wrong.
How do we avoid errors and meaningless output from large language models, and ensure that our users get accurate and reasonable answers to their queries?
1. Limit large models with semantic memory constraints
What we need to do is constrain the LLM so that it only generates text from a smaller data set. This is where Microsoft's new LLM-based development stack comes in: it provides the tools needed to control your model and prevent it from generating errors.
You can force a specific output format with tools like TypeChat, or use an orchestration pipeline like Semantic Kernel to work with additional sources of information, effectively grounding the model in a known semantic space and thereby constraining the LLM. Here, the LLM can do what it does well: summarize the constructed prompt and generate text based on it, without overshooting (or at least with a significantly reduced likelihood of overshooting).
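TypeChat itself is a TypeScript library that validates model output against a type definition. As a language-neutral illustration of the same idea, here is a minimal Python sketch (the schema and field names are hypothetical) that rejects any model output not conforming to a required shape, so the caller can re-prompt instead of passing bad data downstream:

```python
import json

# Hypothetical schema: the fields we require from the model's JSON output.
RESPONSE_SCHEMA = {"answer": str, "sources": list}

def validate_response(raw: str) -> dict:
    """Parse model output as JSON and check it against the schema.

    Raises ValueError if the output does not conform, so the calling
    code can re-prompt the model rather than accept malformed output.
    """
    data = json.loads(raw)
    for field, expected_type in RESPONSE_SCHEMA.items():
        if not isinstance(data.get(field), expected_type):
            raise ValueError(f"field {field!r} missing or not {expected_type.__name__}")
    return data
```

The constraint here is structural rather than semantic, but it captures the core pattern: the LLM is only trusted to fill in a shape the application has already defined.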
What Microsoft calls "semantic memory" is the basis of that last approach. Semantic memory uses vector search to supply prompts that ground the LLM's output in fact. A vector database manages the context for the initial prompt, a vector search finds stored data that matches the initial user query, and the LLM generates text based on that data. You can see this approach in action in Bing Chat, which uses Bing's native vector search tools to build answers derived from its search database.
Semantic memory makes vector databases and vector search the foundation of LLM-powered applications. You can use one of the growing number of open source vector databases, or add vector indexes to familiar SQL and NoSQL databases. One new offering that looks particularly useful extends Azure Cognitive Search, adding a vector index to your data and providing a new API for querying that index.
2. Adding vector indexes to Azure Cognitive Search
Azure Cognitive Search is built on Microsoft's own search tools, offering a combination of familiar Lucene queries and its own natural language query tools. It is a software-as-a-service platform that can host private data and access content through Cognitive Services APIs. Recently, Microsoft added support for building and using vector indexes, which lets you use similarity search to rank relevant results in your data and use them in AI-based applications. That makes Azure Cognitive Search ideal for Azure-hosted LLM applications built with Semantic Kernel and Azure OpenAI; Semantic Kernel plugins for Cognitive Search are available for both C# and Python.
Like other Azure services, Azure Cognitive Search is a managed service that works with the rest of Azure. It lets you index and search across the various Azure storage services, hosting text, images, audio, and video. Data is stored in multiple regions, providing high availability and reducing latency and response times. Additionally, for enterprise applications, you can use Microsoft Entra ID (the new name for Azure Active Directory) to control access to private data.
3. Generate and store embedding vectors for content
Note that Azure Cognitive Search is a "bring your own embedding vector" service. Cognitive Search won't generate the vector embeddings for you, so you need to use Azure OpenAI or the OpenAI embedding API to create embeddings for your content. That may mean chunking large files to ensure you stay within the service's token limits. Be prepared to create new tables for indexing vector data when needed.
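A minimal sketch of that chunking step, using character counts as a stand-in for tokens (a real pipeline would use the chosen embedding model's tokenizer, e.g. tiktoken for OpenAI models, to respect the token limit exactly):

```python
def chunk_text(text: str, max_chars: int = 2000, overlap: int = 200) -> list[str]:
    """Split a long document into overlapping chunks for embedding.

    Character counts are a rough proxy for tokens here; swap in the
    embedding model's own tokenizer to enforce the real token limit.
    """
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # overlap preserves context across chunk boundaries
    return chunks
```

Each chunk is then embedded separately and stored as its own entry in the vector index.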
In Azure Cognitive Search, vector search uses a nearest-neighbor model to return a user-chosen number of documents similar to the original query. The query's own vector embedding is used to search the vector index, which returns similar vectors along with the indexed content, ready for use in an LLM prompt.
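The service does this at scale with an approximate index, but the ranking logic itself is simple. A brute-force sketch of nearest-neighbor search over cosine similarity, with toy two-dimensional vectors standing in for real embeddings:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def nearest_neighbors(query: list[float], docs: dict[str, list[float]], k: int = 3):
    """Rank stored document vectors by similarity to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

The top-k document IDs map back to the indexed content that gets assembled into the LLM prompt.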
Microsoft uses this vector store as part of the Retrieval Augmented Generation (RAG) design pattern for Azure Machine Learning, in conjunction with its prompt flow tool. RAG uses the vector index in Cognitive Search to build the context that forms the basis of an LLM prompt. This gives you a low-code way to build and use vector indexes, for example by setting the number of similar documents a query returns.
4. Getting started with vector search in Azure Cognitive Search
Azure Cognitive Search makes vector queries easy to use. Start by creating resources for Azure OpenAI and Cognitive Search in the same region, which lets you load the search index with embeddings at minimal latency. You need to call both the Azure OpenAI API and the Cognitive Search API to load the index, so it's a good idea to add code that manages retries, ensuring your application can respond to any rate limiting by the services. When you use the service APIs, you should also use asynchronous calls to generate embeddings and load the index.
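That retry handling is usually exponential backoff with jitter. A minimal, service-agnostic sketch (here any exception is treated as retryable for simplicity; real code would check for HTTP 429 or the SDK's throttling exception type):

```python
import random
import time

def with_retries(call, max_attempts: int = 5, base_delay: float = 1.0):
    """Invoke a rate-limited API call, backing off exponentially on failure.

    `call` is any zero-argument function. For simplicity every exception
    is treated as retryable; production code should retry only on
    throttling responses (e.g. HTTP 429) and fail fast on real errors.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            # exponential backoff with a little jitter to avoid thundering herds
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)
```

Wrapping both the embedding calls and the index-loading calls this way keeps bulk loads resilient to transient throttling.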
Vectors are stored in the search index as vector fields, where each vector is an array of floating-point numbers with a fixed number of dimensions. Vectors are mapped through a Hierarchical Navigable Small World (HNSW) graph, which sorts vectors into neighborhoods of similar vectors, speeding up the actual process of searching the vector index.
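An illustrative index definition in the shape of the Azure Cognitive Search REST API. The field and parameter names here are indicative only (the vector API has been evolving across preview versions), so check the current API reference for the exact schema before using this:

```python
# Indicative index definition (REST-API shape); verify names against the
# current Azure Cognitive Search API version before use.
index_definition = {
    "name": "docs-index",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "content", "type": "Edm.String", "searchable": True},
        {
            "name": "contentVector",
            "type": "Collection(Edm.Single)",  # array of 32-bit floats
            "dimensions": 1536,                # e.g. text-embedding-ada-002
            "vectorSearchProfile": "hnsw-profile",
        },
    ],
    "vectorSearch": {
        "algorithms": [
            {
                "name": "hnsw-config",
                "kind": "hnsw",
                # HNSW tuning knobs: graph connectivity (m) and
                # build/query-time effort (efConstruction / efSearch)
                "hnswParameters": {
                    "m": 4,
                    "efConstruction": 400,
                    "efSearch": 500,
                    "metric": "cosine",
                },
            }
        ],
        "profiles": [{"name": "hnsw-profile", "algorithm": "hnsw-config"}],
    },
}
```

The `dimensions` value must match the embedding model you chose earlier; a mismatch is rejected at indexing time.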
After defining the index schema for vector search, you can load data into the Cognitive Search index. Note that one piece of data may be associated with multiple vectors: for example, if you use Cognitive Search to host company documents, you might have one vector for key document metadata terms and another for the document content. Data must be stored as JSON documents, which simplifies the process of assembling prompt context from the results. The index does not need to contain the source documents, as it supports the most common Azure storage options.
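What such a document might look like, with separate vectors for content and metadata. The field names are hypothetical, and the four-element vectors are toys for readability; real embeddings run to hundreds or thousands of dimensions:

```python
# Toy indexed document with two vector fields (hypothetical field names;
# real embeddings would have e.g. 1536 dimensions, not 4).
document = {
    "id": "doc-001",
    "title": "Quarterly report",
    "content": "Revenue grew 12% year over year...",
    "contentVector": [0.12, -0.03, 0.44, 0.08],   # embedding of the body text
    "metadataVector": [0.31, 0.02, -0.11, 0.27],  # embedding of key metadata terms
}
```

Keeping the text fields alongside the vectors means a search result already carries everything needed to build the prompt context.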
Before running a query, you first call your chosen embedding model with the query text. This returns a multidimensional vector you can use to search the chosen index. When calling the vector search API, specify the target vector index, the desired number of matches, and the relevant text fields in the index. Choosing an appropriate similarity measure helps query quality; the most commonly used is the cosine metric.
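Putting those parameters together, a query request body might look like the following. This follows the general shape of the Cognitive Search vector query API, but the exact property names vary between API versions, so treat them as indicative:

```python
# Indicative vector query body (REST-API shape); property names vary by
# API version, so consult the current reference before use.
def build_vector_query(query_embedding: list[float], k: int = 5) -> dict:
    """Assemble a request body asking for the k nearest documents."""
    return {
        "vectorQueries": [
            {
                "kind": "vector",
                "vector": query_embedding,      # embedding of the user's query text
                "k": k,                         # number of matches to return
                "fields": "contentVector",      # the vector field to search
            }
        ],
        "select": "id,title,content",           # text fields returned for the prompt
    }
```

The returned documents' text fields are then concatenated into the context of the LLM prompt.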
5. Beyond simple text vectors
Azure Cognitive Search’s vector capabilities go beyond just matching text. Cognitive Search can work with multilingual embeddings to support document searches across multiple languages. You can also use more complex APIs; for example, you can mix in Bing’s semantic search tools in a hybrid search to deliver more accurate results, improving the quality of the output of LLM-powered applications.
Microsoft is rapidly productizing the tools and technology it used to build its own GPT-4-based Bing search engine and its various Copilots. Orchestration engines like Semantic Kernel and Azure AI Studio’s prompt flow are core to Microsoft’s approach to working with large language models. Now that those foundations have been laid, we're seeing the company roll out more of the necessary enabling technology. Vector search and vector indexing are key to providing accurate responses, and by building these services into familiar tools, Microsoft will help us minimize the cost and the learning curve.