Self-service machine learning based on smart databases
Translator: Zhang Yi
Reviewers: Liang Ce, Sun Shujuan
An IDO (insight-driven organization) is an organization that bases its decisions on insights drawn from data. To become an IDO, a company first needs data and the tools to operate on and analyze it. Second, it needs data analysts or data scientists with the appropriate experience. Finally, it needs a technology or method for putting insight-driven decision-making into practice throughout the company.
Machine learning (ML) is a technology that helps get the most out of data. An ML process first uses data to train a predictive model. Once training succeeds, the model is used to solve data-related problems. Artificial neural networks are among the most effective ML techniques, and their design is inspired by our current understanding of how the human brain works. With the vast computing resources now available, they can produce remarkable models trained on massive amounts of data.
Businesses use a variety of automated software and scripts to carry out different tasks without human error. In the same way, basing decisions on data helps avoid human error in decision-making.
Only a minority of companies use artificial intelligence or machine learning to process their data. According to the US Census Bureau, fewer than 10% of US businesses had adopted machine learning as of 2020, and most of those were large companies.
Barriers to ML adoption include:
Automated ML (AutoML) platforms are promising, but what they cover is still quite limited. There is also ongoing debate over whether AutoML will soon replace data scientists.
To deploy self-service machine learning successfully in your company, you do need AutoML tools, but processes, methods, and strategies need attention as well. AutoML platforms are just tools, and most ML experts agree that tools alone are not enough.
Every ML process starts with data. Data preparation is widely considered the most important part of the ML process. Modeling is only one stage of the overall data pipeline, and it is the stage that AutoML tools simplify. The complete workflow still takes a lot of work to transform the data and feed it to the model, and data preparation and transformation can be among the most time-consuming and unpleasant parts of the job.
In addition, the business data used to train ML models is updated regularly. Companies therefore have to build complex ETL pipelines, which requires mastering complex tools and processes. Keeping the ML process running continuously and in real time is a challenge in its own right.
Now assume we have built an ML model and need to deploy it. The classic approach treats the model as an application-layer component, as shown below:
The model takes data as input and returns predictions as output. Business applications consume those predictions through API integrations. From a developer's perspective this seems easy, but the picture changes once you consider the process. In a large organization, integrating with business applications and maintaining those integrations can be quite cumbersome. Even in a tech-savvy company, every code-change request must go through review and testing across several levels of departments. This reduces flexibility and makes the overall workflow more complex.
ML-based decision-making is much easier when there is enough flexibility to test concepts and ideas quickly. That is why people prefer products with self-service capabilities.
As we saw above, data is at the core of the ML process. Existing ML tools take data in and return predictions, and those predictions are themselves data.
Now comes the question:
Let's analyze the problems above and their challenges to find a solution.
Maintaining complex data integration and ETL pipelines between ML models and databases is one of the biggest challenges facing ML processes.
SQL is an excellent tool for manipulating data, so we can solve this problem by bringing ML models into the data layer. In other words, the ML model learns inside the database and returns its predictions there.
Integrating ML models with business applications through APIs is another challenge.
Business applications and BI tools are already tightly coupled with the database. If the AutoML tool becomes part of the database, we can make predictions with standard SQL syntax. API integration between ML models and business applications is then no longer needed, because the models reside in the database.
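The following sketch makes "the model learns in the database" concrete. It uses Python's built-in sqlite3 module and is not MindsDB. A one-variable least-squares line is fitted entirely with SQL aggregates and stored as an ordinary table, and predictions are produced by a plain SELECT. The tables `history`, `model` and `new_data` and the sample values are hypothetical, and the sample values are chosen to follow y = 1 + 2x.

```python
import sqlite3

# Hypothetical training data: (x, y) pairs that follow y = 1 + 2x.
HISTORY = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]

# Ordinary least squares for y = intercept + slope * x, written as SQL aggregates
# so that the "training" step runs inside the database engine.
FIT_SQL = """
SELECT (sy * sxx - sx * sxy) / (n * sxx - sx * sx) AS intercept,
       (n * sxy - sx * sy)   / (n * sxx - sx * sx) AS slope
FROM (SELECT COUNT(*) AS n, SUM(x) AS sx, SUM(y) AS sy,
             SUM(x * x) AS sxx, SUM(x * y) AS sxy
      FROM history) AS s
"""


def train_and_predict(new_xs: list) -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE history (x REAL, y REAL)")
    conn.executemany("INSERT INTO history VALUES (?, ?)", HISTORY)
    # Persist the fitted coefficients as a table: the "model" lives next to the data.
    conn.execute("CREATE TABLE model AS " + FIT_SQL)
    conn.execute("CREATE TABLE new_data (x REAL)")
    conn.executemany("INSERT INTO new_data VALUES (?)", [(x,) for x in new_xs])
    # Predicting is just another query over the data.
    rows = conn.execute(
        "SELECT d.x, m.intercept + m.slope * d.x AS predicted_y "
        "FROM new_data AS d CROSS JOIN model AS m ORDER BY d.x"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(train_and_predict([5.0, 6.0]))  # [(5.0, 11.0), (6.0, 13.0)]
```

A real AutoML engine fits far richer models, but the shape of the workflow is the same: no ETL step moves the data out, and no separate API serves the predictions.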
Embedding AutoML tools in the database will bring many benefits, such as:
With this approach, the relatively complex integration diagram above changes as follows:
This looks simpler, and it makes the ML process smoother and more efficient.
The next step in finding the solution is to implement it.
To do this, we use a construct called AI Tables, which brings machine learning to the data platform in the form of virtual tables. An AI Table is created like any other database table and can then be exposed to applications, BI tools, and database clients. Predictions are made simply by querying it.
AI Tables were developed by MindsDB and are available as open source or as a managed cloud service. They integrate with traditional SQL databases as well as NoSQL data sources such as Kafka and Redis.
The AI Tables concept lets us run the ML process inside the database, so that every step of the ML process (data preparation, model training, and prediction) can take place there.
First, users create an AI Table to suit their needs. An AI Table resembles an ML model, with columns from the source table serving as its features. The AutoML engine then completes the remaining modeling tasks on its own. Examples follow later.
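As a rough analogy for a "virtual table" that returns predictions, the following sketch uses a plain SQLite view in place of MindsDB's AI Table syntax. A Python function is registered with `sqlite3.Connection.create_function`, and a view calls that function for every row. Clients then query the view like any other table, either one row at a time or all rows at once. The function `predict_demand`, its coefficients, and the `weather` and `demand_forecast` names are hypothetical.

```python
import sqlite3


def predict_demand(temperature: float) -> float:
    """Hypothetical stand-in for a trained model; any Python callable could sit here."""
    return 50.0 + 2.0 * temperature


def open_forecast_view() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Allow application-defined functions inside views. This is already the default
    # in most SQLite builds, and versions without this pragma silently ignore it.
    conn.execute("PRAGMA trusted_schema = ON")
    # Expose the "model" to SQL as a scalar function predict_demand(x).
    conn.create_function("predict_demand", 1, predict_demand)
    conn.execute("CREATE TABLE weather (day TEXT, temperature REAL)")
    conn.executemany(
        "INSERT INTO weather VALUES (?, ?)",
        [("mon", 10.0), ("tue", 20.0), ("wed", 15.0)],
    )
    # To clients the view looks like a read-only table; each query computes predictions.
    conn.execute(
        "CREATE VIEW demand_forecast AS "
        "SELECT day, temperature, predict_demand(temperature) AS predicted_demand "
        "FROM weather"
    )
    return conn


if __name__ == "__main__":
    conn = open_forecast_view()
    # A single prediction: filter the view like any table.
    print(conn.execute(
        "SELECT predicted_demand FROM demand_forecast WHERE day = 'tue'"
    ).fetchone())  # (90.0,)
    # Batch predictions: select every row.
    print(conn.execute(
        "SELECT day, predicted_demand FROM demand_forecast ORDER BY day"
    ).fetchall())  # [('mon', 70.0), ('tue', 90.0), ('wed', 80.0)]
    conn.close()
```

The function is registered on one connection only, so the view works only on connections that register it. A managed product has to solve that distribution problem for you.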
Once the AI Table is created, it is ready for use without any further deployment. To make predictions, just run a standard SQL query on the AI Table.
Predictions can be made one at a time or in batches. AI Tables can also handle complex ML tasks such as multivariate time-series forecasting and anomaly detection.
For retailers, making sure products are in stock at the right time is a complex task. When demand rises, supply has to rise with it. Using this data and machine learning, we can predict how much stock of a given product is needed on a given day, which brings retailers more revenue.
First you need to track the following information and create an AI Table:
As shown below:
To create and train AI Tables, you must first allow MindsDB to access the data. For detailed instructions, please refer to the MindsDB documentation.
Like ML models, AI Tables need historical data for training.
The following simple SQL command trains an AI Table:
Let us analyze this query:
You can also see the model's overall accuracy and the confidence of each prediction, and estimate which columns (features) matter most to the result.
Database work often involves multivariate time-series data with high cardinality. With traditional methods, building ML models for such data takes considerable effort: the data has to be grouped and then ordered by a time, date, or timestamp field.
For example, suppose we want to predict how many hammers will be sold across a chain of hardware stores. The data is grouped by store and product, and a prediction is needed for every store-and-product combination. That means creating a separate time-series model for each group.
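The article does not say how MindsDB estimates column importance. One common, model-agnostic technique is permutation importance. You shuffle one feature column, re-score the model, and treat the increase in error as that column's importance. The following plain-Python sketch shows this technique under that assumption and makes no claim about MindsDB's internals. The toy model and data are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]
Model = Callable[[Row], float]


def mean_absolute_error(model: Model, rows: List[Row], targets: List[float]) -> float:
    return sum(abs(model(r) - t) for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(
    model: Model, rows: List[Row], targets: List[float], seed: int = 0
) -> List[float]:
    """For each feature column, shuffle it and report how much the error grows.

    A large increase means the model depends heavily on that column; zero means it ignores it.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    baseline = mean_absolute_error(model, rows, targets)
    importances = []
    for col in range(len(rows[0])):
        shuffled_col = [row[col] for row in rows]
        rng.shuffle(shuffled_col)
        # Rebuild each row with only column `col` replaced by its shuffled value.
        permuted = [
            tuple(row[:col]) + (value,) + tuple(row[col + 1:])
            for row, value in zip(rows, shuffled_col)
        ]
        importances.append(mean_absolute_error(model, permuted, targets) - baseline)
    return importances


def toy_model(row: Row) -> float:
    """Hypothetical model: uses feature 0 and ignores feature 1."""
    return 3.0 * row[0]


if __name__ == "__main__":
    rows = [(float(i), float(i % 3)) for i in range(20)]
    targets = [3.0 * i for i in range(20)]
    print(permutation_importance(toy_model, rows, targets))  # first value > 0, second 0.0
```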
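The following plain-Python sketch shows the bookkeeping that the traditional per-group approach requires. Records are grouped by (store, product), each group is ordered by date, and a separate forecast is produced per group. The forecaster is a deliberately trivial moving average that stands in for a real time-series model, and the records are hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

Key = Tuple[str, str]

# Hypothetical records: (store, product, sale_date, quantity), deliberately out of order.
SALES: List[Tuple[str, str, date, int]] = [
    ("store_a", "hammer", date(2022, 1, 3), 7),
    ("store_a", "hammer", date(2022, 1, 1), 5),
    ("store_b", "hammer", date(2022, 1, 1), 2),
    ("store_a", "hammer", date(2022, 1, 2), 6),
    ("store_b", "hammer", date(2022, 1, 2), 4),
    ("store_a", "nails", date(2022, 1, 1), 30),
]


def forecast_per_group(sales, window: int = 2) -> Dict[Key, float]:
    """Group by (store, product), order each series by date, and forecast the next
    value as the mean of the last `window` observations: one tiny model per group."""
    series: Dict[Key, List[Tuple[date, int]]] = defaultdict(list)
    for store, product, sale_date, quantity in sales:
        series[(store, product)].append((sale_date, quantity))  # grouping step
    forecasts: Dict[Key, float] = {}
    for key, points in series.items():
        points.sort()  # ordering step, repeated for every group
        recent = [quantity for _, quantity in points[-window:]]
        forecasts[key] = sum(recent) / len(recent)
    return forecasts


if __name__ == "__main__":
    for key, value in sorted(forecast_per_group(SALES).items()):
        print(key, value)
```

Even this toy version needs explicit grouping, sorting, and a loop of per-group models. With real forecasters and thousands of store and product combinations, that overhead is what a single grouped model is meant to remove.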
This sounds like a huge project, but MindsDB lets you use a GROUP BY clause to train a single ML model on the whole multivariate time series at once. Let's see how it's done with just one SQL command:
The stock_forecaster predictor is created to forecast how many items a particular store will sell in the future. The data is ordered by sale date and grouped by store, so we can predict the sales quantity for each store.
The following query joins the sales data table with the predictor. The JOIN adds the predicted quantity to each record, so we get batch predictions for many records at once.
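The following plain SQLite sketch shows how a JOIN attaches a prediction to every record in one statement. It is not MindsDB's syntax. In MindsDB the right-hand side of the JOIN is the predictor itself. Here, as an assumption for illustration, the predictions already sit in an ordinary table named `forecasts` with hypothetical values.

```python
import sqlite3


def batch_predictions() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (store TEXT, sale_date TEXT, quantity INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [
            ("store_a", "2022-01-01", 5),
            ("store_a", "2022-01-02", 6),
            ("store_b", "2022-01-01", 2),
        ],
    )
    # Stand-in for the predictor: one predicted quantity per store (hypothetical values).
    conn.execute("CREATE TABLE forecasts (store TEXT PRIMARY KEY, predicted_quantity REAL)")
    conn.executemany(
        "INSERT INTO forecasts VALUES (?, ?)", [("store_a", 6.5), ("store_b", 3.0)]
    )
    # One JOIN adds a predicted quantity to every sales record at once.
    rows = conn.execute(
        "SELECT s.store, s.sale_date, s.quantity, f.predicted_quantity "
        "FROM sales AS s JOIN forecasts AS f ON s.store = f.store "
        "ORDER BY s.store, s.sale_date"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in batch_predictions():
        print(row)
```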
To learn more about analyzing and visualizing predictions in BI tools, check out this article.
The traditional approach treats ML models as standalone applications, which means maintaining ETL pipelines to the database and API integrations with business applications. AutoML tools make the modeling part easy and straightforward, but the complete ML workflow still needs experienced experts to manage it. Because the database is already the preferred tool for data preparation, it makes more sense to bring ML to the database than to bring the data to ML. MindsDB's AI Tables construct places AutoML tools inside the database, giving data practitioners self-service AutoML and a streamlined machine learning workflow.
Original link: https://dzone.com/articles/self-service-machine-learning-with-intelligent-dat
Zhang Yi is a 51CTO community editor and intermediate engineer. His main research is implementing artificial intelligence algorithms and applying them in real scenarios, and he is familiar with machine learning algorithms and automatic control algorithms. He follows trends in artificial intelligence in China and abroad, especially its practical implementation in intelligent connected vehicles, smart homes, and other fields.