


Build an LLM application: Leveraging the vector search capabilities of Azure Cognitive Search
Author | Simon Bisson
Curator | Ethan
Microsoft’s Cognitive Search API now offers vector search as a service for use with large language models in Azure OpenAI and more.
Tools such as Semantic Kernel, TypeChat, and LangChain make it possible to build applications around generative AI technologies such as Azure OpenAI. They do so by letting you impose constraints on the underlying large language model (LLM), which can then be used as a tool for building and running natural language interfaces.
Essentially, an LLM is a tool for navigating a semantic space, where a deep neural network predicts the next token in a chain of tokens that follows from the initial prompt. If the prompt is open-ended, the LLM may overrun its inputs and produce something that seems reasonable but is actually complete nonsense.
Just as we tend to trust the output of search engines, we also tend to trust the output of LLMs because we view them as another facet of a familiar technology. But training large language models on trusted data from sites like Wikipedia, Stack Overflow, and Reddit doesn't convey an understanding of the content; it simply gives them the ability to generate text that follows the same patterns as the text in those sources. Sometimes the output may be correct, but other times it may be wrong.
How do we avoid errors and meaningless output from large language models and ensure that our users get accurate and reasonable answers to their queries?
1. Constraining large language models with semantic memory
What we need to do is constrain the LLM so that it only generates text from a much smaller data set. This is where Microsoft's new LLM-based development stack comes in. It provides the tools needed to control your model and keep it from generating errors.
You can force a specific output format by using tools like TypeChat, or use orchestration pipelines like Semantic Kernel to work with other possible sources of information, effectively "grounding" the model in a known semantic space and so constraining the LLM. Here the LLM can do what it does well, summarizing the constructed prompt and generating text based on that prompt, without overrunning (or at least with a significantly reduced likelihood of overrunning).
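The idea of forcing an output format can be sketched in a few lines. This is not TypeChat itself, which is a TypeScript library; it is a minimal Python illustration of the same principle, and the `REQUIRED_FIELDS` schema and `parse_reply` helper are hypothetical names chosen for this example.

```python
import json

# Hypothetical schema: the fields and types the application expects back.
REQUIRED_FIELDS = {"answer": str, "sources": list}


def parse_reply(raw_reply: str) -> dict:
    """Parse a model reply and reject anything that is not the expected shape."""
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        if not isinstance(data[name], expected_type):
            raise ValueError(f"field {name!r} should be {expected_type.__name__}")
    return data
```

A caller would typically catch the `ValueError` and re-prompt the model rather than pass malformed output on to the user.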
What Microsoft calls "semantic memory" is the basis of the latter approach. Semantic memory uses vector search to supply prompts that can be used to ground the LLM's output in facts. A vector database manages the context for the initial prompt, a vector search looks for stored data that matches the initial user query, and the LLM generates text based on that data. You can see this approach in action in Bing Chat, which uses Bing's native vector search tools to build answers derived from its search database.
Semantic memory makes vector databases and vector search the means of delivering LLM-based applications. You can choose to use one of the growing number of open source vector databases, or add vector indexes to your familiar SQL and NoSQL databases. One new product that looks particularly useful extends Azure Cognitive Search, adding a vector index to your data and providing a new API for querying that index.
2. Adding vector indexes to Azure Cognitive Search
Azure Cognitive Search is built on Microsoft's own search tools. It provides a combination of familiar Lucene queries and its own natural language query tools. It is a software-as-a-service platform that hosts your private data and uses Cognitive Services APIs to access your content. Recently, Microsoft also added support for building and using vector indexes, which allows you to use similarity searches to rank relevant results in your data and use them in AI-based applications. This makes Azure Cognitive Search well suited to Azure-hosted LLM applications built with Semantic Kernel and Azure OpenAI, and Semantic Kernel plugins for Cognitive Search are available for both C# and Python.
Like other Azure services, Azure Cognitive Search is a managed service that works with other Azure services. It allows you to index and search across various Azure storage services that host text, images, audio, and video. Data is stored in multiple regions, providing high availability and reducing latency and response times. Additionally, for enterprise applications, you can use Microsoft Entra ID (the new name for Azure Active Directory) to control access to private data.
3. Generate and store embedding vectors for content
Note that Azure Cognitive Search is a "bring your own embedding vector" service. Cognitive Search won't generate the vector embeddings you need, so you have to use Azure OpenAI or the OpenAI embedding API to create embeddings for your content. This may require chunking large files to ensure you stay within the service's token limits. Be prepared to create new indexes for vector data where needed.
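Chunking can be sketched as follows. This is a minimal illustration that approximates tokens by whitespace-separated words, which is an assumption: a real tokenizer counts differently, and the `max_tokens` and `overlap` defaults are placeholders rather than actual service limits. The `chunk_text` name is hypothetical.

```python
def chunk_text(text: str, max_tokens: int = 200, overlap: int = 20) -> list[str]:
    """Split text into overlapping chunks of at most max_tokens words."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    words = text.split()  # assumption: one word stands in for one token
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

The overlap repeats a few words between neighboring chunks so that a sentence split at a boundary still appears intact in at least one of them.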
In Azure Cognitive Search, vector search uses a nearest neighbor model to return a user-selected number of documents that are similar to the original query. The search queries the vector index with the vector embedding of the original query and returns similar vectors, along with their indexed content, ready for use in an LLM prompt.
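To make the nearest neighbor idea concrete, here is a minimal sketch that scores every stored vector against the query and keeps the best `k`. It is an exhaustive scan over an in-memory dictionary, written only to show the principle; it is not how the service implements search, and the function names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def nearest_neighbors(
    query: list[float], index: dict[str, list[float]], k: int
) -> list[tuple[str, float]]:
    """Return the k (document id, score) pairs most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

The `k` argument plays the role of the user-selected number of documents: a larger value gives the LLM more context at the cost of a longer prompt.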
Microsoft uses this vector store as part of the Retrieval Augmented Generation (RAG) design pattern for Azure Machine Learning, in conjunction with its prompt flow tool. RAG uses the vector index in Cognitive Search to build the context that forms the basis of an LLM prompt. This gives you a low-code way to build and use vector indexes, for example by setting the number of similar documents a query returns.
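The context-building step of RAG can be sketched without any service calls. The example below assumes the similar documents have already been retrieved as plain strings; the `build_grounded_prompt` helper, the prompt wording, and the character budget are all illustrative assumptions, not what prompt flow generates.

```python
def build_grounded_prompt(question: str, passages: list[str], max_chars: int = 2000) -> str:
    """Assemble a prompt that asks the model to answer only from retrieved passages."""
    context_lines = []
    used = 0
    for number, passage in enumerate(passages, start=1):
        line = f"[{number}] {passage.strip()}"
        if used + len(line) > max_chars:
            break  # stop before the context outgrows the budget
        context_lines.append(line)
        used += len(line)
    context = "\n".join(context_lines)
    return (
        "Answer the question using only the numbered sources below. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Numbering the passages lets the model cite which source it drew on, which makes wrong answers easier to trace back to the retrieved data.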
4. Getting started with vector search in Azure Cognitive Search
Using Azure Cognitive Search for vector queries is straightforward. Start by creating resources for Azure OpenAI and Cognitive Search in the same region. This lets you load the search index with embeddings with minimal latency. You need to call both the Azure OpenAI API and the Cognitive Search API to load the index, so it's a good idea to make sure your code can respond to any rate limits in the services by adding code that manages retries. When you use the service APIs, you should use asynchronous calls to generate embeddings and load indexes.
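Retry handling and asynchronous calls can be combined in a small helper. The sketch below uses only the Python standard library; `RateLimitError` is a stand-in for whatever exception a real client raises when throttled, and `embed_one` is a hypothetical coroutine supplied by the caller that returns the embedding for one piece of text.

```python
import asyncio
import random


class RateLimitError(Exception):
    """Stand-in for the error a client raises when it is throttled."""


async def with_retries(make_call, max_attempts: int = 5, base_delay: float = 0.5):
    """Await make_call(), retrying with exponential backoff on RateLimitError."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await make_call()
        except RateLimitError:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            # Double the wait each time and add jitter so clients do not retry in lockstep.
            delay = base_delay * 2 ** (attempt - 1)
            await asyncio.sleep(delay + random.uniform(0, delay / 2))


async def embed_all(texts, embed_one, concurrency: int = 4, base_delay: float = 0.5):
    """Embed many texts concurrently while capping the number of in-flight calls."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(text):
        async with semaphore:
            return await with_retries(lambda: embed_one(text), base_delay=base_delay)

    # gather returns results in the same order as the input texts.
    return await asyncio.gather(*(guarded(text) for text in texts))
```

Capping concurrency with a semaphore keeps the number of simultaneous requests below the point where the service starts throttling, so retries become the exception rather than the norm.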
Vectors are stored in search indexes as vector fields, where each vector is an array of floating-point numbers with a fixed number of dimensions. The vectors are mapped using a Hierarchical Navigable Small World (HNSW) graph, which sorts vectors into neighborhoods of similar vectors and so speeds up the actual process of searching the vector index.
After defining the index schema for vector search, you can load data into your Cognitive Search index. Note that data may be associated with multiple vectors. For example, if you use Cognitive Search to host company documents, you might have separate vectors for key document metadata terms and for the document content. The data set must be stored as JSON documents, which simplifies the process of using results to assemble prompt context. The index does not need to contain the source documents, as it supports working with the most common Azure storage options.
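As a rough illustration, an index definition with two vector fields and a matching JSON document might look like the sketch below. The property names are assumptions modeled on the service's REST index definition and have changed between API versions, so they should be checked against the version in use. The index name, field names, sample text, and four-dimensional vectors are made up for readability.

```python
import json

EMBEDDING_DIMENSIONS = 4  # tiny for readability; real embedding models use many more

# Assumed index definition: verify property names against the API version in use.
index_definition = {
    "name": "company-docs",  # hypothetical index name
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "content", "type": "Edm.String", "searchable": True},
        {"name": "metadataVector", "type": "Collection(Edm.Single)",
         "dimensions": EMBEDDING_DIMENSIONS, "vectorSearchProfile": "hnsw-profile"},
        {"name": "contentVector", "type": "Collection(Edm.Single)",
         "dimensions": EMBEDDING_DIMENSIONS, "vectorSearchProfile": "hnsw-profile"},
    ],
    "vectorSearch": {
        "algorithms": [{"name": "hnsw-config", "kind": "hnsw",
                        "hnswParameters": {"metric": "cosine"}}],
        "profiles": [{"name": "hnsw-profile", "algorithm": "hnsw-config"}],
    },
}

# One document carrying two vectors: one for metadata terms, one for the content.
document = {
    "id": "doc-001",
    "content": "Expense claims must be filed within 30 days.",  # invented sample text
    "metadataVector": [0.12, -0.03, 0.88, 0.41],
    "contentVector": [0.05, 0.67, -0.22, 0.19],
}


def validate_document(doc: dict, definition: dict) -> None:
    """Raise ValueError if any vector field is missing or has the wrong length."""
    for field in definition["fields"]:
        dims = field.get("dimensions")
        if dims is None:
            continue  # not a vector field
        if len(doc.get(field["name"], [])) != dims:
            raise ValueError(f"{field['name']} must have {dims} dimensions")


validate_document(document, index_definition)
payload = json.dumps(document)  # JSON form of the document, ready to upload
```

Checking vector lengths before upload catches the common mistake of indexing embeddings from a different model than the one the index was sized for.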
Before running a query, you first need to call your chosen embedding model with the query text. This returns a multidimensional vector that you can use to search your chosen index. When calling the vector search API, specify the target vector index, the desired number of matches, and the relevant text fields in the index. It helps to choose an appropriate similarity measure for your queries; the most commonly used is the cosine metric.
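The two query steps, embedding the query text and then describing the search, can be sketched as a function that builds a request body. The body shape here is an assumption modeled on the REST API, where property names have differed between preview and later versions, so treat it as illustrative. `embed` is a hypothetical callable that wraps whichever embedding model you use.

```python
from typing import Callable


def build_vector_query(
    question: str,
    embed: Callable[[str], list[float]],
    vector_field: str,
    k: int,
    select: list[str],
) -> dict:
    """Embed the question and build a request body for a k-nearest-neighbor search."""
    if k < 1:
        raise ValueError("k must be at least 1")
    query_vector = embed(question)  # step 1: turn the query text into a vector
    return {
        # step 2: say which text fields to return with each match...
        "select": ", ".join(select),
        # ...and which vector field to search, with how many matches.
        "vectorQueries": [
            {"kind": "vector", "vector": query_vector, "fields": vector_field, "k": k}
        ],
    }
```

The query must be embedded with the same model that produced the stored vectors; otherwise the similarity scores are meaningless.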
5. Beyond simple text vectors
Azure Cognitive Search’s vector capabilities go beyond just matching text. Cognitive Search can be used with multilingual embeddings to support document searches across multiple languages. You can also use more complex APIs. For example, you can mix in Bing's semantic search tools in a hybrid search to provide more accurate results, thereby improving the quality of output from LLM-powered applications.
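A hybrid search has to merge a keyword ranking and a vector ranking into one list. The article does not say how the results are combined; the sketch below uses reciprocal rank fusion as one widely used technique, and that choice, along with the constant `k = 60`, is an assumption made for illustration.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document ids into one fused ranking."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document earns more credit the nearer it is to the top of each list.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)
```

Documents that appear in both rankings accumulate credit from each, so they tend to rise above documents that only one method found.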
Microsoft is rapidly productizing the tools and technology it used to build its own GPT-4-based Bing search engine and various Copilots. Orchestration engines like Semantic Kernel and Azure AI Studio’s prompt flow are core to Microsoft’s approach to working with large language models. Now that those foundations have been laid, we're seeing the company roll out more of the necessary enabling technology. Vector search and vector indexing are key to delivering accurate responses. By building on familiar tooling to deliver these services, Microsoft will help us keep costs and learning curves to a minimum.