
Self-service machine learning based on smart databases

WBOY (forwarded), 2023-04-09 16:31:08


Translator | Zhang Yi

Reviewers | Liang Ce, Sun Shujuan

1. How to become an IDO?

An IDO (insight-driven organization) is an organization whose decisions are driven by insights drawn from its data. To become an IDO, you first need data and the tools to operate on and analyze it. Second, you need data analysts or data scientists with the right experience. Finally, you need a technology or method for putting insight-driven decision-making into practice across the whole company.

Machine learning is a technology that helps organizations get the most out of their data. An ML process first uses data to train a prediction model, and once training succeeds, the model is used to answer data-related questions. Artificial neural networks are among the most effective of these techniques, and their design is inspired by our current understanding of how the human brain works. With the vast computing resources available today, neural networks trained on massive amounts of data can produce remarkably capable models.
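The train-then-predict cycle described above can be sketched in a few lines. This is a minimal illustration, not a neural network. It assumes a single numeric feature and fits an ordinary least-squares line, and the names `fit_line` and `predict` are hypothetical.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Train: fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model: Tuple[float, float], x: float) -> float:
    """Predict: apply the trained parameters to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Invented historical data that follows y = 2x + 1
model = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(predict(model, 10.0))  # 21.0
```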

Businesses already use a variety of self-service software and scripts to automate tasks and reduce human error. In the same way, basing decisions on data can reduce human error in decision-making.

2. Why are companies slow to adopt artificial intelligence?

Only a minority of companies use artificial intelligence or machine learning to process their data. According to the US Census Bureau, as of 2020 fewer than 10% of US businesses had adopted machine learning, and most of those were large companies.

Barriers to ML adoption include:

  • A shortage of skilled people. AI is still far from replacing humans in this work, and many companies lack the professionals they need and cannot afford to hire them. Data scientists are highly valued in this field, but they are also the most expensive to hire.
  • A lack of usable data, data security concerns, and the time it takes to implement ML algorithms.
  • Difficulty creating an environment in which the value of data can be realized. Such an environment requires the right tools, processes, and strategies.

3. AutoML tools alone are not enough to drive machine learning adoption

Although automated ML (AutoML) platforms are promising, their scope is currently quite limited. It is also still debated whether AutoML will soon replace data scientists.

To deploy self-service machine learning successfully in your company, AutoML tools are important, but processes, methods, and strategies need just as much attention. AutoML platforms are only tools, and most ML experts believe tools alone are not enough.

4. Break down the machine learning process


Any ML process starts with data. It is generally accepted that data preparation is the most important part of the ML process. Modeling is only one stage of the overall data pipeline, and it is the stage that AutoML tools simplify. The complete workflow still requires a lot of work to transform the data and feed it to the model. Data preparation and transformation are often among the most time-consuming and tedious parts of the job.

In addition, the business data used to train ML models is updated regularly. Enterprises therefore have to build complex ETL pipelines, which in turn requires mastering complex tools and processes. Keeping the ML process continuous and up to date is a challenging task in its own right.
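A minimal extract-transform-load step might look like the following. This sketch uses Python's built-in `sqlite3` module as a stand-in for a real data warehouse. The record layout borrows the column names from the retail example later in the article, and the sample records are invented. A step like this must run again every time the source data changes, before the model can be retrained. That repeated work is the maintenance burden described above.

```python
import sqlite3
from datetime import date

# Extract: raw records as they might arrive from a source system (invented sample)
raw = [
    "2023-01-02,shop_a,P1,5",
    "2023-01-02,shop_b,P1,not_a_number",  # a malformed record
    "2023-01-03,shop_a,P1,7",
]


def transform(line: str):
    """Parse and validate one raw record; return None if it cannot be cleaned."""
    day, shop, product, amount = line.split(",")
    try:
        return (date.fromisoformat(day).isoformat(), shop, product, int(amount))
    except ValueError:
        return None


def load(conn: sqlite3.Connection, rows) -> int:
    """Load cleaned records into the target table and report how many were written."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(date_of_sale TEXT, shop TEXT, product_code TEXT, amount INTEGER)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
clean = [r for r in (transform(line) for line in raw) if r is not None]
loaded = load(conn, clean)
print(loaded)  # 2: the malformed record was dropped
```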

5. Integrate ML with Applications

Now assume we have built an ML model and need to deploy it. The classic approach treats the model as an application-layer component, as shown below:

[Figure: an ML model deployed as an application-layer component between the database and business applications]

The model takes data as input and returns predictions as output, and business applications consume those predictions by integrating with the model's API. From a developer's perspective this all looks easy, but not once you consider the process around it. In a large organization, integrating with business applications and then maintaining that integration can be quite cumbersome. Even in a tech-savvy company, every code change request must pass through review and testing across several levels of departments. This reduces flexibility and increases the complexity of the overall workflow.

ML-based decision-making is much easier when there is enough flexibility to test different concepts and ideas, which is why people prefer products with self-service capabilities.
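The application-layer pattern can be sketched as follows. Here the "service" is a plain Python function standing in for a remote model API. The JSON payload shape and the 1.1x "model" are invented for illustration. What matters is the contract: every application that wants predictions has to serialize its data, call the model, and parse the response. Any change to that contract means code changes in every consumer.

```python
import json


# Hypothetical model service; in a real deployment this would sit behind
# an HTTP endpoint, often owned by a separate team.
def model_service(request_body: str) -> str:
    payload = json.loads(request_body)
    # Toy "model" (an assumption): predicts next-day sales as 1.1x today's amount
    preds = [round(row["amount"] * 1.1, 2) for row in payload["rows"]]
    return json.dumps({"predictions": preds})


# Business application: it must serialize its data, call the API, and parse the reply.
def application_forecast(rows):
    request_body = json.dumps({"rows": rows})
    response = json.loads(model_service(request_body))
    return response["predictions"]


print(application_forecast([{"amount": 10}, {"amount": 20}]))  # [11.0, 22.0]
```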

6. Self-service machine learning/intelligent database?

As we saw above, data is at the core of the ML process. Existing ML tools take data in and return predictions, and those predictions are themselves data.

This raises some questions:

  • Why do we want to treat ML as a standalone application and implement complex integration between ML models, applications and databases?
  • Why not make ML a core feature of the database?
  • Why not make ML models available through standard database syntax (such as SQL)?

Let's analyze these questions and the challenges behind them to find a solution.

Challenge #1: Complex Data Integration and ETL Pipelines

Maintaining complex data integration and ETL pipelines between ML models and databases is one of the biggest challenges facing ML processes.

SQL is an excellent tool for data manipulation, so we can solve this problem by bringing ML models into the data layer. In other words, the ML model learns inside the database and returns its predictions from there.

Challenge #2: Integration of ML Models with Applications

Integrating ML models with business applications through APIs is another challenge.

Business applications and BI tools are already tightly coupled with the database. Therefore, if the AutoML tool becomes part of the database, we can make predictions using standard SQL syntax. API integration between ML models and business applications is then no longer needed, because the models reside in the database.
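Here is a minimal sketch of the idea that a model can live in the data layer. SQLite is used as a stand-in database. The parameters of an already-trained linear model (slope and intercept, chosen by hand here) are stored in an ordinary table. This is not how MindsDB stores models. It only illustrates that predictions can then come from a plain SQL query with no API in between.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (shop TEXT, amount REAL);
INSERT INTO sales VALUES ('shop_a', 10), ('shop_b', 20);

-- Parameters of a trained model, stored as ordinary rows (hand-picked here)
CREATE TABLE model (slope REAL, intercept REAL);
INSERT INTO model VALUES (2.0, 1.0);
""")

# Any SQL client can now get predictions without calling an external API
rows = conn.execute("""
SELECT s.shop, s.amount, m.slope * s.amount + m.intercept AS predicted
FROM sales AS s CROSS JOIN model AS m
ORDER BY s.shop
""").fetchall()
print(rows)
```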

Solution: Embed AutoML in the database

Embedding AutoML tools in the database will bring many benefits, such as:

  • Anyone who works with data and knows SQL, whether a data analyst or a data scientist, can harness the power of machine learning.
  • Software developers can embed ML into business tools and applications more efficiently.
  • No complex integration is required between data and models, and between models and business applications.

With this approach, the relatively complex integration diagram above becomes:

[Figure: simplified integration with AutoML embedded in the database]

It looks simpler, and it makes the ML process smoother and more efficient.

7. How to implement self-service ML using models as virtual database tables

The next step is to implement the solution.

To do this, we use a construct called AI Tables, which brings machine learning into the data platform in the form of virtual tables. An AI Table is created like any other database table and can then be exposed to applications, BI tools, and database clients. Predictions are made simply by querying the data.
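A rough analogue of an AI Table can be built with standard tools. Register a Python function as an SQL function and wrap it in a view, so that clients see an ordinary table whose prediction column is computed on read. This sketch assumes SQLite and a hard-coded 1.5x "model". MindsDB's AI Tables are implemented differently, but the client-side experience is the same idea: query a table to get predictions.

```python
import sqlite3


def predict_amount(last_amount: float) -> float:
    # Hypothetical stand-in for a trained model
    return 1.5 * last_amount


conn = sqlite3.connect(":memory:")
# Expose the model to SQL as a function named predict_amount
conn.create_function("predict_amount", 1, predict_amount)
conn.executescript("""
CREATE TABLE sales (shop TEXT, amount REAL);
INSERT INTO sales VALUES ('shop_a', 10), ('shop_b', 4);
-- The view plays the role of the AI table: it looks like a table,
-- but its prediction column is computed by the model at query time
CREATE VIEW sales_forecast AS
SELECT shop, amount, predict_amount(amount) AS predicted_amount FROM sales;
""")

print(conn.execute("SELECT * FROM sales_forecast ORDER BY shop").fetchall())
```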


AI Tables were originally developed by MindsDB and are available as open source or as a managed cloud service. They integrate with traditional SQL databases as well as NoSQL data stores and streaming platforms such as Kafka and Redis.

8. Using AI Tables

The concept of AI Tables lets us run the ML process in the database, so that every step of the ML process (data preparation, model training, and prediction) can take place in the database.

  • Training AI Tables

First, the user creates an AI Table for their needs. An AI Table is similar to a machine learning model: its features are columns from the source table, and the AutoML engine completes the remaining modeling tasks on its own. Examples are given later.
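Here is a deliberately tiny sketch of what "the AutoML engine completes the remaining modeling tasks" could mean. The function `create_predictor` is hypothetical and is not MindsDB's API. The user names the feature columns and the target. The function then fits a few candidate models (a mean baseline plus a one-feature least-squares line per feature) and keeps whichever has the lowest error on a held-out slice. Real AutoML engines search far larger model spaces, but the division of labor is the same.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]


def _fit_line(xs, ys):
    """Least-squares line; falls back to a constant if the feature never varies."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def create_predictor(rows: List[Row], features: List[str], target: str) -> Tuple[str, Callable[[Row], float]]:
    """Hypothetical AutoML step: try candidate models, keep the best on validation data."""
    split = max(1, int(len(rows) * 0.75))
    train, valid = rows[:split], rows[split:] or rows
    y_train = [r[target] for r in train]

    candidates = {"baseline_mean": (lambda r, m=mean(y_train): m)}
    for f in features:
        slope, intercept = _fit_line([r[f] for r in train], y_train)
        candidates[f"line_on_{f}"] = lambda r, f=f, s=slope, b=intercept: s * r[f] + b

    def error(model):
        return mean((model(r) - r[target]) ** 2 for r in valid)

    best_name = min(candidates, key=lambda n: error(candidates[n]))
    return best_name, candidates[best_name]


# Invented training data: the target depends only on column_1
rows = [{"column_1": x, "column_2": (x * 7) % 5, "target": 3 * x + 2} for x in range(1, 9)]
name, model = create_predictor(rows, ["column_1", "column_2"], "target")
print(name, model({"column_1": 10, "column_2": 0}))
```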

  • Make predictions

Once the AI Table is created, it is ready to use without any further deployment. To make predictions, just run a standard SQL query on the AI Table.

You can make predictions one at a time or in batches. AI Tables can handle many complex machine learning tasks, such as multivariate time-series forecasting and anomaly detection.
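The difference between single and batch predictions lies only in the query. In this sketch, SQLite again stands in for the database. A hard-coded `predict_amount` function, registered with `sqlite3`'s `create_function`, is an assumption that stands in for the trained model. The first query scores one ad hoc value, and the second scores every row of a table in a single statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical model exposed to SQL; a real AI Table would be trained, not hard-coded
conn.create_function("predict_amount", 1, lambda x: 2 * x + 1)
conn.executescript("""
CREATE TABLE inputs (shop TEXT, amount INTEGER);
INSERT INTO inputs VALUES ('shop_a', 3), ('shop_b', 5), ('shop_c', 8);
""")

# One-off prediction for a single input value
single = conn.execute("SELECT predict_amount(?)", (4,)).fetchone()[0]

# Batch prediction for every row in a table, in one query
batch = conn.execute(
    "SELECT shop, predict_amount(amount) FROM inputs ORDER BY shop"
).fetchall()
print(single, batch)
```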

9. AI Tables Working Example

For retailers, making sure products are in stock at the right time is a complex task: when demand rises, supply has to rise with it. Using sales data and machine learning, we can predict how much stock of a given product is needed on a given day, which can bring retailers more revenue.

First you need to track the following information and create an AI Table:

  • Date the product was sold (date_of_sale)
  • Store where the product was sold (shop)
  • Which product was sold (product_code)
  • Quantity sold (amount)

As shown below:

[Figure: sample sales table with the columns date_of_sale, shop, product_code and amount]
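The four tracked fields map naturally onto a single table. The sketch below uses SQLite with invented sample rows. The column names come from the list above, and the column types are assumptions. It shows the daily per-store, per-product totals that a forecasting model would be trained on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (
    date_of_sale TEXT,     -- ISO 8601 date string
    shop         TEXT,     -- store where the product was sold
    product_code TEXT,     -- which product was sold
    amount       INTEGER   -- quantity sold
);
INSERT INTO sales VALUES
    ('2023-01-02', 'shop_a', 'hammer', 3),
    ('2023-01-02', 'shop_a', 'hammer', 2),
    ('2023-01-02', 'shop_b', 'hammer', 4),
    ('2023-01-03', 'shop_a', 'hammer', 6);
""")

# Daily quantity per store and product: the series a forecaster would learn from
daily = conn.execute("""
SELECT date_of_sale, shop, product_code, SUM(amount) AS total
FROM sales
GROUP BY date_of_sale, shop, product_code
ORDER BY date_of_sale, shop
""").fetchall()
print(daily)
```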

(1) Training AI Tables

To create and train AI Tables, you must first allow MindsDB to access the data. For detailed instructions, please refer to the MindsDB documentation.

Like ML models, AI Tables need historical data for training.

The following simple SQL command trains an AI Table:

[Figure: SQL command that trains an AI Table with CREATE PREDICTOR]

Let us analyze this query:

  • Use the CREATE PREDICTOR statement in MindsDB.
  • Define the source database that holds the historical data.
  • Train the AI Table on the historical data table (historical_table); the selected columns (column_1 and column_2) are the features used for prediction.
  • AutoML automatically completes the remaining modeling tasks.
  • MindsDB will identify the data type of each column, normalize and encode it, and build and train the ML model.
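The automatic steps in the last two bullets (detecting each column's type, then normalizing and encoding it) can be sketched as follows. This illustrates common preprocessing and is not MindsDB's actual pipeline. Numeric columns are min-max scaled to [0, 1], text columns are one-hot encoded, and all function names are hypothetical.

```python
from typing import Dict, List


def infer_type(values: List[str]) -> str:
    """Treat a column as numeric if every value parses as a float, else categorical."""
    try:
        for v in values:
            float(v)
        return "numeric"
    except ValueError:
        return "categorical"


def min_max_normalize(values: List[str]) -> List[float]:
    nums = [float(v) for v in values]
    lo, hi = min(nums), max(nums)
    span = (hi - lo) or 1.0  # avoid division by zero for constant columns
    return [(x - lo) / span for x in nums]


def one_hot_encode(values: List[str]) -> List[Dict[str, int]]:
    categories = sorted(set(values))
    return [{c: int(v == c) for c in categories} for v in values]


def preprocess(columns: Dict[str, List[str]]) -> Dict[str, list]:
    """Detect each column's type, then normalize or encode it accordingly."""
    out = {}
    for name, values in columns.items():
        if infer_type(values) == "numeric":
            out[name] = min_max_normalize(values)
        else:
            out[name] = one_hot_encode(values)
    return out


encoded = preprocess({"amount": ["2", "4", "6"], "shop": ["shop_b", "shop_a", "shop_b"]})
print(encoded["amount"])  # [0.0, 0.5, 1.0]
print(encoded["shop"])
```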

You can also see the model's overall accuracy and the confidence of each prediction, and estimate which columns (features) matter most for the result.

Database workloads often involve multivariate time-series data with high cardinality. With traditional methods, building ML models for such data takes considerable effort: we need to group the data and sort it by a given time, date, or timestamp field.

For example, suppose we want to predict the number of hammers sold by hardware stores. The data is grouped by store and product, and a prediction is made for each store-product combination. This leaves us with the problem of creating a separate time-series model for each group.

This sounds like a huge project, but MindsDB can create a single ML model that trains on all of the multivariate time-series data at once by using a GROUP BY clause. Let's see how this is done with just one SQL command:
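The traditional per-group approach looks roughly like this. Split the data by (shop, product_code), sort each group by date, and fit a separate model to every group. The forecaster here is a deliberately naive "last value plus average step" rule, chosen only to keep the sketch short, and the rows are invented. With thousands of stores and products, this approach means thousands of models to train, monitor, and maintain.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (date_of_sale, shop, product_code, amount); invented sample rows
rows = [
    ("2023-01-03", "shop_a", "hammer", 6),
    ("2023-01-01", "shop_a", "hammer", 2),
    ("2023-01-02", "shop_a", "hammer", 4),
    ("2023-01-01", "shop_b", "hammer", 10),
    ("2023-01-02", "shop_b", "hammer", 12),
    ("2023-01-01", "shop_a", "saw", 1),
    ("2023-01-02", "shop_a", "saw", 1),
]


def group_series(rows) -> Dict[Tuple[str, str], List[int]]:
    """Group by (shop, product_code) and order each group by date."""
    groups = defaultdict(list)
    for day, shop, product, amount in rows:
        groups[(shop, product)].append((day, amount))
    return {key: [a for _, a in sorted(vals)] for key, vals in groups.items()}


def naive_trend_forecast(series: List[int]) -> float:
    """One tiny 'model' per group: last value plus the average step."""
    if len(series) < 2:
        return float(series[-1])
    avg_step = (series[-1] - series[0]) / (len(series) - 1)
    return series[-1] + avg_step


# The traditional approach: a separate model for every store/product combination
forecasts = {key: naive_trend_forecast(s) for key, s in group_series(rows).items()}
print(forecasts)
```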

[Figure: single SQL command that creates the stock_forecaster predictor]

The stock_forecaster predictor is created to predict how many items a particular store will sell in the future. The data is ordered by date of sale and grouped by store, so the predictor can forecast the sales amount for each store.
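The single-model alternative can be sketched as a pooled autoregressive model. Order each store's sales by date, collect (previous day, next day) pairs from every store, and fit one shared set of coefficients. This only illustrates the "one model for all groups" idea. It is not how MindsDB's time-series engine works internally, and the AR(1) form and the sample data are assumptions.

```python
from collections import defaultdict
from typing import Dict, List

# (date_of_sale, shop, amount); invented sample rows
rows = [
    ("2023-01-01", "shop_a", 1), ("2023-01-02", "shop_a", 2),
    ("2023-01-03", "shop_a", 3), ("2023-01-04", "shop_a", 4),
    ("2023-01-01", "shop_b", 10), ("2023-01-02", "shop_b", 11),
    ("2023-01-03", "shop_b", 12), ("2023-01-04", "shop_b", 13),
]

# Analogue of ORDER BY date_of_sale + GROUP BY shop: one ordered series per store
groups: Dict[str, List[float]] = defaultdict(list)
for day, shop, amount in sorted(rows):
    groups[shop].append(float(amount))

# Build (previous amount, next amount) pairs from ALL stores together
pairs = [(s[i - 1], s[i]) for s in groups.values() for i in range(1, len(s))]

# Fit ONE shared model y_t = a * y_(t-1) + b by least squares
n = len(pairs)
mx = sum(x for x, _ in pairs) / n
my = sum(y for _, y in pairs) / n
a = sum((x - mx) * (y - my) for x, y in pairs) / sum((x - mx) ** 2 for x, _ in pairs)
b = my - a * mx

# The single model forecasts the next value for every store
forecast = {shop: a * s[-1] + b for shop, s in groups.items()}
print(forecast)
```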

(2) Batch prediction

The following query joins the sales data table with the predictor. The JOIN adds the predicted quantity to each record, so we get batch predictions for many records at once.

[Figure: SQL query that JOINs the sales data table with the predictor]
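Outside MindsDB, the shape of this batch query can be reproduced with an ordinary table of forecasts. In the sketch below, `stock_forecast` is a hypothetical SQLite table holding one prediction per store, and the sales rows are invented. In MindsDB, the predictor itself is the right-hand side of the JOIN. Here, the JOIN attaches the predicted amount to every sales record in a single statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_data (date_of_sale TEXT, shop TEXT, amount INTEGER);
INSERT INTO sales_data VALUES
    ('2023-01-04', 'shop_a', 4),
    ('2023-01-04', 'shop_b', 13);

-- Stand-in for the predictor output: one forecast per store
CREATE TABLE stock_forecast (shop TEXT, predicted_amount REAL);
INSERT INTO stock_forecast VALUES ('shop_a', 5.0), ('shop_b', 14.0);
""")

# The JOIN attaches a predicted quantity to every sales record in one query
batch = conn.execute("""
SELECT t.date_of_sale, t.shop, t.amount, p.predicted_amount
FROM sales_data AS t
JOIN stock_forecast AS p ON p.shop = t.shop
ORDER BY t.shop
""").fetchall()
print(batch)
```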

To learn more about analyzing and visualizing predictions in BI tools, check out this article.

(3) Practical Application

The traditional approach treats ML models as standalone applications. This requires maintaining ETL pipelines to the database and API integrations with business applications. Although AutoML tools make the modeling part easy and straightforward, the complete ML workflow still needs experienced experts to manage it. Since the database is already the preferred tool for data preparation, it makes more sense to bring ML into the database than to bring the data to ML. With AutoML residing in the database, MindsDB's AI Tables construct gives data practitioners self-service AutoML and streamlines machine learning workflows.

Original link: https://dzone.com/articles/self-service-machine-learning-with-intelligent-dat

Translator’s introduction

Zhang Yi is a 51CTO community editor and intermediate engineer. His research focuses on implementing artificial intelligence algorithms and applying them to real scenarios, and he is familiar with machine learning algorithms and automatic control algorithms. He follows developments in AI technology in China and abroad, especially the implementation and application of AI in intelligent connected cars, smart homes, and other fields.


Statement: This article is reproduced from 51cto.com.