
Using Volcano Engine and Large Models to "Ignite" the Data Flywheel

王林 (Wang Lin) · forwarded · 2023-09-20
As large models transform industry after industry, Volcano Engine has been among the first to deliver its own answer for the data industry.

On September 19th, at the "Data Flywheel · V-Tech Data-Driven Technology Summit" held in Shanghai, Volcano Engine announced that its digital intelligence platform VeDI now applies large language model (LLM) capabilities.
After the upgrade, users can "find data" in natural language, get assistance with data warehouse model development, optimize code, generate visual charts, and complete attribution analysis in conversation. Even ordinary operators without coding skills can quickly find and analyze data. The related VeDI data products are currently in invited testing.

The upgraded data products greatly lower the threshold for using data. In the past, an ordinary operator who wanted a number often had to turn to R&D staff, who would write code to retrieve it, and analyzing a piece of data required a great deal of professional knowledge. Now, operators can state their needs in natural language at any time and get the data they want in real time.

This will further unlock the value of data. Within an enterprise, a lower usage threshold lets more people in the data consumption chain start working with data. Data needs previously suppressed by that threshold will be met, data-based business insights will arrive sooner, decisions will be more scientific, and data-driven business imagination will be unleashed.

For enterprises undergoing digitalization, the value of data will be released through higher-frequency circulation, and the data flywheel will spin faster.

Large models integrated into the full data link further lower the threshold for data production and use

Compared with small models, large models have powerful generalized reasoning, external tool retrieval, and code generation capabilities. These capabilities have a significant impact on data products.

Stronger generalized reasoning means higher intelligence, but it still needs to be supplemented by many tools covering specific abilities, such as mathematics and analysis. The natural-language interaction mode opened up in the era of large models has also brought new imaginative space to data products.

Starting in March this year, ByteDance began combining large models with data products internally. In rapidly iterated small-scale tests, Luo Xuan's team soon found that in the main data product scenarios, the improvements brought by large models were obvious. The team then began large-scale experiments across data product scenarios, continually quantifying scenario priorities and pushing large models into production.

In the process of large models transforming the data industry, scenario selection is one of the most critical steps. A suitable scenario must not only be grounded in current or foreseeable technology; it must also ensure that users or business parties get a better experience after the large model is added, while bringing more data consumption value and further driving data production.

Luo Xuan shared an example: if the original solution takes only 1-2 seconds but, due to large model latency, the natural-language version takes more than 5 seconds, then the scenario cannot meet the business's timeliness requirements and does not hold up.

"However, in short code generation, for example, adding natural language greatly improves the efficiency of the scenario. In the future, as large model performance keeps improving, the intelligent changes that large models can bring to every part of the data link will be even more worth looking forward to."

At this "Data Flywheel · V-Tech Data-Driven Technology Summit", the VeDI product upgrade announced by Volcano Engine covers two products: DataLeap and DataWind. In DataLeap, the "Find Data Assistant" supports finding data through question and answer, and the "Development Assistant" supports generating and optimizing SQL code from natural language; in DataWind, the "Analysis Assistant" supports completing data visualization queries and analysis in natural language. Together they cover the full link of finding, retrieving, and analyzing data, lowering the technical threshold across data production and consumption.

DataLeap - Find Data Assistant

"Finding data" is usually the first step in the data consumption chain: locating the correct data assets so they can be consumed. In the traditional process, however, it is not a simple task and relies heavily on business expertise. People usually have to confirm through keyword searches, manual screening, or by turning to professional data developers.


The "Find Data" function of the DataLeap Find Data Assistant, combined with a large language model (LLM), greatly lowers the threshold for "finding data".
With the Find Data Assistant, people without coding skills can also query "anthropomorphically" in natural language. For example, an e-commerce operator can ask directly: "For the operating conditions of the Haowu live broadcast room over the last seven days, which tables should be used?" The assistant will recommend tables relevant to the business conditions, based on the business knowledge base, and explain the data dimensions each table covers.
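To make this concrete, here is a minimal, illustrative sketch of table recommendation over a metadata catalog. The catalog entries, table names, and the term-overlap scoring are all invented for illustration; the real DataLeap assistant layers an LLM over a business knowledge base rather than doing simple token matching.

```python
# Toy metadata catalog: each table has a description and its dimensions.
# Names and descriptions are hypothetical, for illustration only.
CATALOG = {
    "dwd_live_room_daily": {
        "description": "daily live broadcast room operating metrics",
        "dimensions": ["date", "room_id", "gmv", "viewers"],
    },
    "dwd_order_detail": {
        "description": "order level sales detail records",
        "dimensions": ["order_id", "city", "amount", "paid_at"],
    },
}

def recommend_tables(question: str) -> list[str]:
    """Rank tables by how many description words appear in the question."""
    q_terms = set(question.lower().split())
    scored = []
    for name, meta in CATALOG.items():
        overlap = len(q_terms & set(meta["description"].split()))
        if overlap:
            scored.append((overlap, name))
    return [name for _, name in sorted(scored, reverse=True)]

print(recommend_tables(
    "which tables cover live room operating metrics for the last seven days"
))  # → ['dwd_live_room_daily']
```

A production system would replace the overlap score with LLM-driven semantic matching, which is exactly what lets it handle questions that share no keywords with the table descriptions.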

Currently, the Find Data Assistant supports question-and-answer retrieval over data types including Hive tables, datasets, dashboards, data indicators, and dimensions, as well as related business knowledge, enabling anthropomorphic queries.

Besides making "finding data" easier, combining the assistant with large model capabilities can also make it more accurate. Under past technical solutions, data asset retrieval relied on structured data management; unstructured business data could be left unconnected, so keyword retrieval suffered from link fragmentation, greatly reducing the efficiency of finding and consuming data in business scenarios. Moreover, keyword search returns a set of candidate answers that still require manual screening and confirmation rather than direct answers, making for a poor user experience.

Now, during conversations with users, a large language model (LLM) can understand users' true intentions, making retrieval more focused and saving the cost of human judgment, so "finding data" itself becomes faster. As model semantic understanding and analysis capabilities gradually improve, conversational retrieval achieves higher end-to-end efficiency than simple keyword retrieval.

DataLeap - Development Assistant

In data production and processing, the "Development Assistant" supports automatically generating SQL code from natural language; it can automatically repair bugs, optimize, explain, and annotate existing code. It can also answer SQL usage questions, such as documentation lookup, function usage, and code examples, through dialogue.
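One common pattern behind such assistants is to inject table schema metadata into the LLM prompt alongside the user's request. The sketch below shows only that prompt-assembly step, with an invented prompt format and schema; it is an assumption about how such a system could be wired, not DataLeap's actual implementation.

```python
# Assemble an NL-to-SQL prompt from table schemas plus a user request.
# Prompt wording and schema are illustrative assumptions.
def build_sql_prompt(request: str, schemas: dict[str, list[str]]) -> str:
    lines = ["You are a SQL assistant. Available tables:"]
    for table, columns in schemas.items():
        # e.g. "  orders(order_id, city, amount)"
        lines.append(f"  {table}({', '.join(columns)})")
    lines.append(f"Write a SQL query for: {request}")
    return "\n".join(lines)

prompt = build_sql_prompt(
    "total order amount grouped by city",
    {"orders": ["order_id", "city", "amount"]},
)
print(prompt)
```

Grounding the prompt in real schemas is what lets the model "automatically associate metadata information including table schema", as described below.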

Automatically developing SQL code

The Development Assistant is built on a large language model (LLM). Trained on massive amounts of code and corpora, it can automatically associate metadata, including table schemas, from the user's natural-language input, generate high-quality data processing code, and understand, rewrite, and answer questions about code.

The Development Assistant breaks the language barrier and greatly lowers the threshold for data development. "Originally, to process data you might need a programming language such as SQL or Python, which is a relatively strong skill requirement. Now you no longer need one and can use natural language. This means the requirements on the people doing this work have been lowered further."

Analysts and operators with data consumption needs can do basic ETL even without knowing SQL. Operators can have DataLeap automatically generate code for business needs such as order sales by city or live broadcast room traffic by time period. They can also ask about the meaning of code, for instance "Is there an optimization plan while this table is running?", or converse: "Help me check and fix this string of code." They can likewise parse generated code with one click, call SQL tools to check tables, and confirm AI automatic repairs to further optimize data assets.
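To show what an "order sales by city" request could resolve to, the sketch below runs the kind of SQL an assistant might generate against an in-memory SQLite table. The table name, columns, and rows are invented for illustration; the SQL itself is standard aggregation.

```python
import sqlite3

# Hypothetical orders table with a few sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, city TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Shanghai", 120.0), (2, "Beijing", 80.0), (3, "Shanghai", 50.0)],
)

# SQL of the kind a development assistant could generate from
# the request "order sales by city":
generated_sql = """
    SELECT city, SUM(amount) AS total_sales
    FROM orders
    GROUP BY city
    ORDER BY total_sales DESC
"""
print(conn.execute(generated_sql).fetchall())
# → [('Shanghai', 170.0), ('Beijing', 80.0)]
```

The point of the assistant is that the operator states the request in natural language and never has to write, or even read, the `GROUP BY` query themselves.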

More importantly, for professional developers, the DataLeap Development Assistant can take over basic work: handling requests from data analysts and the complex-but-basic needs of business operations staff, with engineers only correcting and checking the accuracy of the generated code at the end. R&D staff can then focus on more creative work and complex scenarios, use the assistant to optimize code, and improve productivity and code quality.

DataWind - Analysis Assistant

After finding and retrieving data comes the data analysis link. The DataWind Analysis Assistant, which combines large model capabilities, can help people in non-analyst roles complete data visualization queries, analysis, and other business exploration through natural-language dialogue, lowering the threshold for this step.

The first step is creating a dataset. With data assets in hand, operators create datasets using DataWind's drag-and-drop interface, then use natural language to define the logic of different fields, for example directly querying data for the "big celebrity live broadcast period".
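A natural-language field definition ultimately resolves into a computed column. The sketch below shows one possible resolution of a "prime-time live broadcast period" definition into a boolean field; the 19:00-23:00 window, the field name, and the row format are all illustrative assumptions.

```python
# Assumed resolution of the natural-language definition "prime time":
# broadcasts whose hour falls in [19, 23).
PRIME_START, PRIME_END = 19, 23

def add_prime_time_flag(rows: list[dict]) -> list[dict]:
    """Add a computed is_prime_time field based on the broadcast hour."""
    for row in rows:
        row["is_prime_time"] = PRIME_START <= row["hour"] < PRIME_END
    return rows

rows = add_prime_time_flag([{"hour": 20, "gmv": 300}, {"hour": 9, "gmv": 40}])
print([r["is_prime_time"] for r in rows])  # → [True, False]
```

In the product, the LLM's job is the translation step: turning a loose phrase like "big celebrity live broadcast period" into a precise, inspectable field definition of this kind.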


Field generation

Once the data is retrieved, operators can perform visual analysis and exploration. In the past, BI tools generally adopted drag-and-drop operation; although this lowered the threshold for building dashboards, analysis and insight still required a large amount of professional knowledge to truly understand the data, and that remained a "threshold".


Visual exploration

With the blessing of large models' more generalized reasoning capabilities, DataWind can now make basic hypotheses, verify them, and propose analytical ideas. The AI automatic analysis function DataWind provides supports exploring the reasons behind a chart. For example, AI can automatically analyze generated visual charts such as "live broadcast room traffic by time period" and "top regions by live broadcast room sales"; operators then only need to pursue further attribution through dialogue based on the analysis results.
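A basic building block of such attribution is decomposing a metric change across dimension values. The toy sketch below splits an overall traffic change by region; the numbers are invented, and the real DataWind feature layers LLM-driven reasoning on top of analyses like this.

```python
# Toy dimension-level attribution: each region's share of the overall
# change in a metric between two periods. Numbers are hypothetical.
def attribute_change(before: dict[str, float],
                     after: dict[str, float]) -> dict[str, float]:
    """Share of the total metric change contributed by each dimension value."""
    total_delta = sum(after.values()) - sum(before.values())
    return {
        key: (after.get(key, 0) - before.get(key, 0)) / total_delta
        for key in set(before) | set(after)
    }

shares = attribute_change(
    {"East": 1000, "North": 500},   # yesterday's traffic by region
    {"East": 1300, "North": 600},   # today's traffic by region
)
print(shares)  # East contributed 75% of the +400 change, North 25%
```

An operator asking "why did traffic rise?" in dialogue would get the narrative version of exactly this breakdown.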

At the same time, DataWind also connects with office collaboration tools such as Feishu. Users can run extended analysis via IM message subscriptions and natural dialogue, achieving flexible analysis anytime, anywhere. It supports self-service intelligence across the whole chain of datasets, visual insights, and message subscriptions, and connects with office tools to embed data analysis seamlessly into daily work.

By conducting extended analysis through IM message subscriptions and understanding results directly through natural-language dialogue, the data analysis and thinking cycle is greatly shortened, solving the pain point that analysis and insight used to require a great deal of professional knowledge.

At this stage, the application scenarios of the DataWind Analysis Assistant are already rich. Besides enabling conversational exploration in core analysis scenarios, it also extends to scenarios that used to require more technical skill, such as expression and formula generation.

Large models accelerate the data flywheel and help enterprises become more data-driven

ByteDance has deep data-driven genes. Since its founding, almost every scenario inside ByteDance has been subject to A/B testing, with data feedback driving business strategy: whether a Douyin video quality optimization works, whether a recommendation algorithm strategy change is accurate; even the name "Toutiao" was A/B tested.
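The statistical core of an A/B test is checking whether two observed rates differ by more than chance would explain. The sketch below is a standard two-proportion z-test with invented conversion counts; it illustrates the kind of check behind the decisions described above, not ByteDance's actual experimentation platform.

```python
import math

def ab_z_score(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """z statistic for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)   # pooled rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical experiment: variant B converts 2.6% vs A's 2.0%.
z = ab_z_score(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)
print(round(z, 2))  # |z| > 1.96 suggests significance at p < 0.05
```

At ByteDance's scale the tooling around this (traffic splitting, guardrail metrics, sequential monitoring) is the hard part; the test itself stays this simple.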

Within ByteDance, the scope of data consumption is very wide. Organizationally, everyone from top and middle management to front-line employees can basically see data and use it to evaluate the company's operating status, revenue and expenditure, business progress, and product strategy. In specific scenarios such as real-time marketing in live e-commerce, operations staff design and push marketing strategies based on real-time data.

Through data consumption, ByteDance achieves scientific decision-making and agile action, improving business value; frequent data consumption and the business benefits it brings also enable targeted, low-cost construction of high-quality data assets that better support business applications.

In April this year, drawing on ByteDance's more than ten years of data-driven practice, Volcano Engine released a new paradigm for enterprise digital-intelligence upgrades: the "data flywheel". The term summarizes the flywheel effect between data assets and business applications once an enterprise's data flows are fully integrated into its business flows.

Under the overall trend of digitalization, businesses across industries are moving closer to digital operations, and data matters more and more to enterprises. As a new factor of production, data is supporting enterprises' digital and intelligent transformation. Objectively, though, many companies that have done a lot of digital construction are still unable to fully release the value of their data.

"An enterprise may have deployed data products at a high price, yet very few people inside may actually use them. If data is difficult to circulate, it is difficult to realize its value." In the data product market, Luo Xuan has observed that many companies undergoing digital construction face problems such as high data construction and management costs, high barriers to using data products, and low data asset value.

From the perspective of the whole digitalization process, achieving "data-driven" is difficult but correct. Taking ByteDance as an example, Luo Xuan revealed that currently 80% of ByteDance employees can directly use data products, and manageable, operational data assets cover 80% of daily analysis scenarios. In ByteDance's experience, both the internal utilization rate of data products and the scenario coverage of manageable, operational data assets need to reach a high level before a company can form a good "data flywheel".

In this process, data products supported by large models may be an important driving force in helping enterprises reach those goals.
The digital intelligence platform VeDI, upgraded with large model capabilities, further lowers the threshold across the full process of data production and consumption: finding data, retrieving data, and data analysis. At the same level of demand, the upgraded VeDI expands the population able to use data products from professional data analysts to everyone with data needs, whether operations staff, executives, or product managers, making data consumption inclusive.

"Only by lowering the threshold and getting the data used can we know what value data will generate in circulation." For companies just entering the digitalization process, the value of data is a treasure still far from discovered, and lower-threshold data products may be the key that unlocks it.

With the support of large models, the "data flywheel" inside the enterprise will spin faster. The business gains a more powerful engine: business staff get data feedback in seconds and can optimize the business faster, and as data circulates faster, more high-quality data assets accumulate, bringing more insight to the business and ultimately making decisions more scientific and agile.


Statement: This article is reproduced from jiqizhixin.com. In case of infringement, please contact admin@php.cn for deletion.