


A US$38 billion data giant wants to launch an 'AI' revolution in enterprises
Author | Wan Chen, Li Yuan
Editor | Jingyu
On June 28, local time, Databricks, the well-known American data platform company, held its annual Data and AI Summit. At the event, Databricks announced a series of major new products: LakehouseIQ, Lakehouse AI, Databricks Marketplace, and Lakehouse Apps.
From the name of the summit to the naming of the new products, it is clear that this data platform is riding the large language model wave to accelerate its transformation into an AI company.
Databricks CEO Ali Ghodsi on democratizing data and AI|Databricks
"What Databricks wants to achieve is "data inclusiveness" and AI inclusiveness. The former allows data to reach every employee, and the latter allows AI to enter every product. Databricks CEO Ali Ghodsi announced the team's mission in his speech .
Just before the conference started, Databricks had just announced the acquisition of MosaicML, a new force in the AI field, for US$1.3 billion, setting a current acquisition record in the AI field, which shows the company's strength and determination in AI transformation.
Liu Qi, founder and CEO of PingCAP, who is participating in the meeting ahead, told Geek Park that the Databricks platform has just launched enterprise-level AI applications, and more than 1,500 companies have already trained models on it. "The numbers exceed expectations." At the same time, he believes that Databricks’ previous accumulation in data AI allowed the company to quickly add new products based on the previous platform when AI became popular, and to quickly provide services related to large models.
"The most critical thing is speed." Liu Qi said that in the era of large models, how to integrate large models with existing products faster and solve users' pain points may be the biggest challenge for all data companies at the moment. It is also the biggest opportunity.
Key points
- With the upgraded interactive interface, ordinary users who are not data analysts can query and analyze data directly in natural language.
- It will become increasingly easy for enterprises to deploy large models to cloud databases, and to analyze data directly with off-the-shelf large-model tools.
- As AI advances, data will become more valuable and its potential will be further unlocked.
01
Databases embrace natural language interaction
At the conference, Databricks released a new tool, LakehouseIQ, which was quickly hailed as a killer feature. LakehouseIQ carries one of Databricks' biggest recent pushes: democratizing data analysis, so that ordinary users who know neither Python nor SQL can easily access company data and analyze it in natural language.
To achieve this, LakehouseIQ is designed as a collection of capabilities serving both ordinary end users and developers, with different functions for each audience.
LakehouseIQ product interface|Databricks
For developers, LakehouseIQ in Notebooks has been released. Within it, LakehouseIQ uses large language models to help developers complete, generate, and explain code, as well as fix and debug code and generate reports.
For non-programmers, Databricks provides an interface for direct natural language interaction. Driven by a large language model, it lets users search and query data in plain language. The feature is integrated with Unity Catalog, so companies can control access: searches and queries return only data the questioner is authorized to view.
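Databricks has not published LakehouseIQ's internals, but the general pattern it describes, translating a natural-language question into SQL with a language model and executing it only against data the requester may see, can be sketched roughly as follows. Every name here (translate_to_sql, ALLOWED_TABLES, the hard-coded SQL) is a hypothetical illustration, not the actual LakehouseIQ or Unity Catalog API.

```python
# Hypothetical sketch of natural-language querying with access control.
# None of these names correspond to the real LakehouseIQ API.

ALLOWED_TABLES = {"alice": {"sales", "customers"}, "bob": {"sales"}}

def translate_to_sql(question: str, schema_hint: str) -> str:
    """Stub for an LLM call that turns a question into SQL.
    A real system would prompt a large language model with the
    question plus schema and documentation context."""
    # e.g. "total revenue by region last quarter" ->
    return (
        "SELECT region, SUM(amount) AS revenue "
        "FROM sales WHERE quarter = '2023-Q2' GROUP BY region"
    )

def referenced_tables(sql: str) -> set[str]:
    """Naive table extraction; a real system would parse the SQL properly."""
    tokens = sql.replace(",", " ").split()
    return {tokens[i + 1].lower() for i, t in enumerate(tokens[:-1])
            if t.upper() in ("FROM", "JOIN")}

def answer(user: str, question: str) -> str:
    sql = translate_to_sql(question, schema_hint="sales(region, amount, quarter)")
    # Enforce catalog permissions before anything runs: execute only if
    # every referenced table is visible to the requesting user.
    if not referenced_tables(sql) <= ALLOWED_TABLES.get(user, set()):
        raise PermissionError(f"{user} is not authorized for this data")
    return sql  # a real system would execute this against the warehouse

print(answer("alice", "What was total revenue by region last quarter?"))
```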
In fact, since large models took off, querying and analyzing data in natural language has been a hot topic in data analytics, and many companies are moving in this direction, including Databricks' old rival Snowflake, whose just-announced Document AI feature targets the same ground.
LakehouseIQ Natural Language Query Interface|Databricks
However, Databricks claims LakehouseIQ is functionally superior, pointing out that general-purpose large language models are limited in their understanding of a specific customer's data, internal terminology, and usage patterns. Databricks' technology learns from customers' own schemas, documents, queries, popularity signals, lineage, notebooks, and business intelligence dashboards, and can therefore answer more queries.
There is another difference between the two: Snowflake's Document AI is limited to querying unstructured data in documents, while LakehouseIQ works with structured Lakehouse data and code.
02
From machine learning to AI
The similarities between the Databricks and Snowflake announcements don't end there.
At this conference, Databricks released Databricks Marketplace and Lakehouse AI, which match the focus of Snowflake's two-day conference exactly: both center on deploying large language models into database environments.
In Databricks' vision, the platform will not only help customers deploy large models but also provide ready-made large-model tools.
Databricks previously ran these offerings under the Databricks Machine Learning brand. At this conference it repositioned the brand entirely, upgrading it to Lakehouse AI with a focus on helping customers deploy large models.
Databricks Marketplace is now available. In it, users can access a large curated collection of open-source models, including MPT-7B, Falcon-7B, and Stable Diffusion, and can also discover and obtain datasets and other data assets. Lakehouse AI additionally provides large language model operations (LLMOps) capabilities.
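The listed models are standard open-source checkpoints, so independent of the Marketplace they can be loaded with the Hugging Face transformers library. A minimal sketch for MPT-7B (assuming a machine with enough memory; the prompt text is just an example):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# MPT-7B ships its architecture code inside the checkpoint repo,
# which is why transformers needs trust_remote_code=True.
# It reuses the GPT-NeoX tokenizer rather than shipping its own.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b",
    trust_remote_code=True,
)

inputs = tokenizer("The lakehouse architecture combines", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```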
Lakehouse AI Architecture Diagram|Databricks
Snowflake is also actively building in this area; its comparable capabilities are provided through Nvidia NeMo, Nvidia AI Enterprise, Dataiku, and John Snow Labs (the Nvidia partnership was one of the highlights of the Snowflake conference; see Geek Park's report).
Snowflake and Databricks differ in how they help customers deploy large models: Snowflake has chosen to lean on partners, while Databricks has built the functionality natively into its core platform.
On the ready-made tools side, Databricks announced that Databricks Marketplace will also offer Lakehouse Apps in the future. Lakehouse Apps will run directly on a customer's Databricks instance, where they can integrate with the customer's data, consume and extend Databricks services, and let users interact through a single sign-on experience. Data never needs to leave the customer's instance, so there are no data movement or security and access issues.
This matches Snowflake's offering in both naming and functionality. Snowflake's counterparts, Snowflake Marketplace and Snowflake Native Apps, are already live and were among the highlights of its launch; at the Snowflake conference, Bloomberg announced a Data License Plus (DL+) app that lets customers configure a ready-to-use cloud environment in minutes, with fully modeled Bloomberg subscription data and ESG content from multiple vendors.
03
Data platforms face new changes
In the opening keynote, Databricks shared a number: in the past 30 days, more than 1,500 customers have trained Transformer models on the Databricks platform.
Commenting on this striking figure, PingCAP's Liu Qi said it shows enterprises are adopting AI much faster than expected: "Applying a model does not necessarily require training one, so if 1,500 companies are training models, the number applying them must be far larger."
Another way to read it is that Databricks' strategic layout in AI is now quite comprehensive. It is more than a data warehouse or data lake: it now also provides AI training, AI serving, model management, and more.
Ali Ghodsi compares the large-model transformation of machine learning to the revolutions of computing and the Internet|Databricks
In other words, customers can train underlying models on the Databricks platform, or adapt a base model simply by tuning its parameters. For the AI services required on top of these models, Databricks has also laid out the corresponding infrastructure, releasing vector search and a feature store at the conference.
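Databricks did not detail the vector search implementation in the keynote, but the core idea behind any such service is standard: embed documents and queries as vectors and retrieve by similarity. A self-contained sketch with a toy bag-of-words embedding (a real deployment would use a learned embedding model and an approximate-nearest-neighbor index; all names here are illustrative):

```python
import numpy as np

# Toy bag-of-words embedding; a real system would use a learned model.
VOCAB = ["revenue", "churn", "cluster", "pipeline", "dashboard"]

def embed(text: str) -> np.ndarray:
    """Map text to a unit vector of vocabulary word counts."""
    words = text.lower().split()
    v = np.array([words.count(w) for w in VOCAB], dtype=float)
    n = np.linalg.norm(v)
    return v / n if n else v

docs = [
    "quarterly revenue dashboard",
    "customer churn prediction pipeline",
    "cluster sizing guide",
]
index = np.stack([embed(d) for d in docs])   # one unit vector per document

def search(query: str, k: int = 2):
    scores = index @ embed(query)            # cosine similarity to each doc
    top = np.argsort(scores)[::-1][:k]
    return [(docs[i], float(scores[i])) for i in top]

print(search("predict churn"))
```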
In short, Databricks is going all in on large models.
Databricks had already accumulated plenty of AI experience, for example using small models to build indexes, query data, and predict workloads with better efficiency and lower latency. Even so, the speed at which it has closed the gap on large models has surprised many.
Before the AI strategy on full display at today's summit, Databricks acquired Okera (AI data governance), released its own open-source large model Dolly 2.0, and bought MosaicML for US$1.3 billion, completing the series of moves in quick succession.
On this point, Howie, a veteran Silicon Valley observer, believes the two conferences make one thing clear: the founders of both companies are convinced that the database and data lake businesses they built are about to face fundamental change, and that the way they operated a year ago will not work for the next few years.
Correspondingly, closing the gap on large models quickly also means capturing the incremental market they bring.
Liu Qi believes the emergence of large models has created many demands that simply did not exist before. "Without data support, models cannot work, especially when it comes to differentiation. If everyone uses the same large model, there may be no difference between you and anyone else."
Compared with large models, though, the summit audience seemed to pay more attention to small models, for three advantages: speed, cost, and security. Liu Qi said that an enterprise can build differentiated models on its own unique data, and such a model must be small enough to meet three requirements: cheap enough, fast enough, and secure enough.
It is worth noting that both Databricks and Snowflake recently reported revenue, each growing more than 60% year over year. Against a market-wide slowdown in software spending, that growth reflects a rising focus on data. With the emergence of large models, the value of data took center stage at this data-plus-AI-themed Databricks summit.
With large models, automatic data generation becomes possible, and data volumes are expected to grow exponentially. Easy access to data, support for varied data formats, and mining the value behind data will become ever more common needs.
On the other hand, many companies are still exploring and waiting before integrating large models into enterprise software; given security, privacy, and cost concerns, few dare to use them directly yet. Once large models can be deployed directly against enterprise data without moving that data, the barrier to deployment will drop further, and the volume and speed of data consumption will be unleashed accordingly.