Ultra-large-scale vector databases at the hundred-billion level are accelerating the evolution of AI

WBOY
2023-11-24 20:46:57

When the "War of the Gods" among large models began, a fatal problem emerged that users who tried them found intolerable. Many large models share a common flaw: they talk nonsense with a straight face, which is what we usually call AI hallucination. So how do you make large models more accurate, smarter, and less prone to nonsense? Besides model frameworks, data, and algorithms, there is another key component: the vector database.

Behind the Data Center

There are many interpretations of the relationship between vector databases and large models, and of why it matters. A vivid one is this: if a large model is a forgetful brain, the vector database is its "hippocampus", mainly responsible for storage and targeted recall. From an anatomical point of view, a person whose hippocampus is removed loses the ability to form new long-term memories and cannot retain new information such as sounds, sights, and tastes.

To put it bluntly, the fundamental reason large models hallucinate is that the vector database behind them is not powerful enough. As a result, a large model can only look for answers in the data it was given, and its inferences are often vague or nonsensical, which seriously hurts the user experience. Whether a large model is smart therefore depends on whether its vector database is powerful. This is also the fundamental reason Tencent Cloud is focusing on vector databases to build an AGI "data center".

Some may ask: if data scheduling is improved at the data center level, can traditional relational databases do the job as well? In reality, when enterprises build and use large models, they first need to connect massive amounts of data to the model safely and efficiently. Of all this complex data, only about 20% is structured data suited to relational databases; the remaining 80% is unstructured data such as text, images, video, and audio. A vector database can turn complex unstructured data into multi-dimensional coordinate values (vectors) and connect it to large models, with data processing efficiency reportedly 10 times that of traditional databases.

At the same time, a vector database can serve as an external knowledge base that supplies large models with the latest, most accurate, and most comprehensive information, supports efficient real-time question answering, and gives large models a long-term memory so that they do not lose context mid-conversation. Seen this way, it is easy to understand why vector databases and large models are the best partners.
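The external-knowledge-base idea can be sketched in a few lines. This is a minimal illustration, not Tencent Cloud's implementation: the bag-of-words `embed` function stands in for a real embedding model, and the names `retrieve`, `build_prompt`, and `KNOWLEDGE` are hypothetical. The sketch only shows the retrieval step that picks the most relevant stored text and places it in the prompt that would be sent to a large model.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, documents, k=1):
    """Return the k stored documents most similar to the question."""
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(question, documents, k=1):
    """Put the retrieved context in front of the question for the model."""
    context = "\n".join(retrieve(question, documents, k))
    return f"Context:\n{context}\n\nQuestion: {question}"

# Hypothetical knowledge base entries, for illustration only.
KNOWLEDGE = [
    "the vector database stores embeddings of documents",
    "relational databases store rows in tables",
    "the hippocampus supports long-term memory",
]

if __name__ == "__main__":
    print(build_prompt("what does a vector database store", KNOWLEDGE))
```

In a real system the word counts would be replaced by dense vectors from an embedding model, and the sorted scan by an index inside the vector database.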

Professional vector databases vs. traditional databases with vector plug-ins

With vector databases now the main track behind large models, leading companies are already innovating. According to preliminary statistics, more than 50 vendors are working on vector databases. In terms of technical route, they fall into two categories. One is the professional, vector-native database, designed for vectors from the start and able to store, index, and query vector data structures. The other is the traditional database with a vector plug-in added to enable vector retrieval.

On comparison, each approach has its own application scenarios. For example, a company that is just starting out, has little data, and does not want to introduce a new database can choose a traditional database with a vector plug-in. But an enterprise that has a large amount of data, wants to build smarter large models, and has higher requirements for performance and future growth will clearly be better served by a professional vector database product such as Tencent Cloud's.

In terms of applications, vector databases still have more potential. Many companies currently use them to address large-model weaknesses, such as hallucination, and for knowledge enhancement. Future development is not limited to these uses; vector databases can also perform well in image queries. For example, searching the photos on your mobile phone, much like an image search engine, is in fact a vector query.

Professional vector databases cannot replace traditional databases, especially in large-scale scenarios. Traditional relational databases and vector databases can develop together and complement each other. Vector databases use vectorized data to serve needs that traditional relational databases handle poorly, such as large-scale data, low-latency high-concurrency retrieval, and fuzzy matching. Vector databases support only the new data type and do not store the original data, while traditional databases support conventional data types such as numbers, strings, and timestamps. According to the article, the data scale supported by traditional databases is relatively small, up to about 100 million records, while vector databases support large-scale data on the order of 100 billion records. Traditional databases query by exact search, where a record either meets the conditions or does not; vector databases use approximate search, where the results should be as similar as possible to the input, and this demands more computing power. Upper-layer applications can access both through a unified API, which better suits the deployment and use of large-scale artificial intelligence applications.
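The difference between exact search and similarity search can be shown with a small sketch. The records, vectors, and function names below are made up for illustration. The similarity search here scans every record, which is an exhaustive simplification; production vector databases typically use approximate nearest-neighbor indexes to avoid scanning everything.

```python
import math

# Hypothetical records: a conventional column plus a vector column.
records = [
    {"id": 1, "name": "red apple", "vector": [0.9, 0.1, 0.0]},
    {"id": 2, "name": "green apple", "vector": [0.8, 0.3, 0.1]},
    {"id": 3, "name": "sports car", "vector": [0.0, 0.2, 0.9]},
]

def exact_search(rows, name):
    """Relational-style lookup: a row either matches the condition or not."""
    return [r["id"] for r in rows if r["name"] == name]

def cosine(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def similarity_search(rows, query_vector, k=2):
    """Vector-style lookup: rank all rows by similarity and keep the top k."""
    ranked = sorted(rows, key=lambda r: cosine(r["vector"], query_vector),
                    reverse=True)
    return [r["id"] for r in ranked[:k]]

if __name__ == "__main__":
    print(exact_search(records, "apple"))               # no exact match
    print(similarity_search(records, [1.0, 0.2, 0.0]))  # closest rows by vector
```

An exact query for "apple" finds nothing because no row has exactly that name, while the vector query still returns the two apple rows because their vectors lie closest to the query vector.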

Intelligent Evolution

Large models did not start from scratch, and neither did vector databases. So how did the vector database develop? The Tencent Cloud Database team has given this a great deal of thought.

Luo Yun, deputy general manager of Tencent Cloud Database, believes that a large model is in essence not an infinitely large store of data but a platform with intelligent computing capabilities: underlying computing power that could previously be accessed only through programming languages can now be scheduled with natural language, which he sees as an exciting singularity. After the excitement came calmer reflection. As humanity completes its digital transformation, is there anything beyond the computing platform? What exactly is the technical core of the AGI era? The conclusion was that the intelligent flow of underlying data is the golden key to unlocking the data center.

Today, once enterprises have general intelligent computing capabilities, the underlying data can flow quickly. Files can be stored in a file system, table data can be read from relational databases, and KV data from non-relational databases, so all data can be circulated and linked intelligently. But for data to converse with people, a computing platform is not enough. An intelligent data platform is also needed, one that can use natural language to extract data and hand it to the large model for computation. In achieving this, the vector database becomes an important hub.

Given how important the vector database is, how can a data platform built on traditional database experience be upgraded intelligently so that people can talk to it? This is exactly Tencent Cloud Database's specialty. At the Tencent Cloud Vector Database Technology Summit, Tencent Cloud announced that it had completed a test together with a third-party organization showing that Tencent Cloud Vector Database can support hundred-billion-scale vector data, with peak queries per second (QPS) significantly increased to 5 million.

Tencent Cloud Vector Database already has a large number of users, including companies such as Baichuan Intelligence, TAL, and SalesEasy. Tencent Cloud recently launched an AGI program together with Baichuan, giving away vector database instances and 4 million tokens of the Baichuan2 large model.

Through core technologies such as Embedding, vector indexing, distributed system architecture, and hardware acceleration, Tencent Cloud Vector Database can address specific problems across a broad range of scenarios, including text, images, video, audio, multi-modal data, biopharmaceuticals, and risk control. For example, Embedding technology maps high-dimensional data (such as text, pictures, and audio) to a lower-dimensional space; that is, pictures, sounds, and text are converted into vectors, and these vectors are stored to form a vector database. Methods for implementing the Embedding process include neural networks and LSH (locality-sensitive hashing).
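As an illustration of the LSH idea mentioned above, here is a minimal sketch of one common variant, random-hyperplane hashing for cosine similarity. It is not taken from Tencent Cloud's product, and the function names and parameters are assumptions. Each random hyperplane contributes one bit, depending on which side of the plane the vector falls, so vectors pointing in similar directions tend to share most of their bits.

```python
import random

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplane normals in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def signature(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector is on its non-negative side."""
    bits = []
    for plane in hyperplanes:
        dot = sum(p * x for p, x in zip(plane, vector))
        bits.append(1 if dot >= 0 else 0)
    return tuple(bits)

def hamming(a, b):
    """Number of differing bits; a small distance suggests similar direction."""
    return sum(x != y for x, y in zip(a, b))

if __name__ == "__main__":
    planes = make_hyperplanes(dim=4, n_bits=16)
    v = [0.5, 1.0, -0.2, 0.3]
    near = [0.6, 0.9, -0.1, 0.3]   # similar direction to v
    far = [-x for x in v]          # opposite direction
    print(hamming(signature(v, planes), signature(near, planes)))
    print(hamming(signature(v, planes), signature(far, planes)))
```

Signatures like these can be used as bucket keys, so a query only needs to be compared against vectors that land in the same or nearby buckets rather than against the whole collection.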

Tencent has been working to improve its vector database capabilities since 2019, with the aim of leading enterprise business into the AGI era. To date, Tencent Cloud has served more than 40 internal customers and supports more than 160 billion vector retrievals every day. It also serves 1,000 external customers, and that number is growing rapidly.

Looking ahead, AGI is evolving ever faster, bringing both surprises and challenges. Tencent Cloud Database says it will continue to explore and lead innovation. "Road to AGI, Together on the Path": this slogan sums up where Tencent Cloud's technical team stands today.

Statement:
This article is reproduced from sohu.com. If there is any infringement, please contact admin@php.cn to have it removed.