Self-service machine learning based on smart databases
Translator|Zhang Yi
Revised|Liang Ce Sun Shujuan
1. How to become an IDO?
An IDO (insight-driven organization) is an organization that bases its decisions on insights drawn from data. To become an IDO, you first need data and the tools to operate on and analyze it; second, data analysts or data scientists with appropriate experience; and finally, a technology or method for implementing insight-driven decision-making across the whole company.
Machine learning (ML) is a technology that can maximize the value of data. The ML process first uses data to train a prediction model, and then uses the trained model to solve data-related problems. Artificial neural networks are among the most effective ML techniques, and their design derives from our current understanding of how the human brain works. Given the vast computing resources now available, they can produce incredible models trained on massive amounts of data.
Businesses use a variety of self-service software and scripts to complete different tasks and avoid human error. Likewise, they can base decisions on data to avoid human error.
2. Why are companies slow to adopt artificial intelligence?
Only a minority of companies use artificial intelligence or machine learning to process data. According to the US Census Bureau, as of 2020 fewer than 10% of US businesses had adopted machine learning, and most of those were large companies.
Barriers to ML adoption include:
- There is still a lot of work to be done before artificial intelligence can replace humans. First of all, many companies lack ML professionals and cannot afford them: data scientists are highly regarded in this field, but they are also the most expensive to hire.
- A lack of available data, data-security concerns, and the time it takes to implement ML algorithms.
- It is difficult for companies to create an environment in which the advantages of data can be realized. Such an environment requires the relevant tools, processes and strategies.
3. AutoML tools alone are not enough to promote machine learning
Although automated ML (AutoML) platforms have a bright future, their coverage is currently quite limited. There is also debate over whether AutoML will soon replace data scientists.
If you want to deploy self-service machine learning successfully in your company, AutoML tools are indeed crucial, but processes, methods and strategies also need attention. AutoML platforms are just tools, and most ML experts believe that tools alone are not enough.
4. Break down the machine learning process
Any ML process starts with data. It is generally accepted that data preparation is the most important part of the ML process. Modeling is only one part of the overall data pipeline, and it is the part that AutoML tools simplify. The complete workflow still requires a lot of work to transform the data and feed it to the model, and data preparation and transformation can be among the most time-consuming and unpleasant parts of the job.
In addition, the business data used to train ML models is updated regularly. Enterprises therefore have to build complex ETL pipelines and master complex tools and processes, so keeping the ML process continuous and up to date is also a challenging task.
5. Integrate ML with Applications
Assume now that we have built the ML model and then need to deploy it. The classic deployment approach treats it as an application layer component, as shown below:
Its input is data and its output is the prediction. Business applications consume the output of ML models through API integration. From a developer's perspective this all seems easy, but it is less so once you consider the process around it. In a large organization, any integration with business applications, and its maintenance, can be quite cumbersome. Even if the company is tech-savvy, any request for a code change must go through a specific review and testing process spanning several departments. This reduces flexibility and increases the complexity of the overall workflow.
If there is enough flexibility to test various concepts and ideas, ML-based decision-making becomes much easier, which is why people prefer products with self-service capabilities.
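The three hops of the classic application-layer approach can be sketched in a few lines. This is only an illustration under stated assumptions: SQLite stands in for the business database, `model_api` is a hypothetical placeholder for a remote model endpoint (its "last amount plus one" rule is not a trained model), and the table names are invented for the example.

```python
import sqlite3

def model_api(rows):
    """Hypothetical stand-in for a remote model endpoint.

    Returns one prediction per input row using a placeholder rule
    (last amount + 1), not a trained model.
    """
    return [{"shop": shop, "predicted_amount": amount + 1} for shop, amount in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (shop TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("north", 10), ("south", 4)])
conn.execute("CREATE TABLE predictions (shop TEXT, predicted_amount INTEGER)")

# Hop 1: the application extracts data from the database.
rows = conn.execute("SELECT shop, amount FROM sales ORDER BY shop").fetchall()

# Hop 2: the application sends the data to the model and receives predictions.
results = model_api(rows)

# Hop 3: the application writes the predictions back so other tools can use them.
conn.executemany(
    "INSERT INTO predictions VALUES (:shop, :predicted_amount)", results
)

stored = conn.execute(
    "SELECT shop, predicted_amount FROM predictions ORDER BY shop"
).fetchall()
print(stored)
```

Each hop is a separate piece of code that has to be reviewed, tested and maintained, which is where the integration burden described above comes from.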
6. Self-service machine learning/intelligent database?
As we saw above, data is the core of the ML process: existing ML tools take data and return predictions, and these predictions are themselves a form of data.
Now comes the question:
- Why do we want to treat ML as a standalone application and implement complex integration between ML models, applications and databases?
- Why not make ML a core feature of the database?
- Why not make ML models available through standard database syntax (such as SQL)?
Let’s analyze these questions and the challenges behind them to find an ML solution.
Challenge #1: Complex Data Integration and ETL Pipelines
Maintaining complex data integration and ETL pipelines between ML models and databases is one of the biggest challenges facing ML processes.
SQL is an excellent data manipulation tool, so we can solve this problem by introducing ML models into the data layer. In other words, the ML model will learn in the database and return predictions.
Challenge #2: Integration of ML Models with Applications
Integrating ML models with business applications through APIs is another challenge.
Business applications and BI tools are tightly coupled with the database. Therefore, if the AutoML tool becomes part of the database, we can use standard SQL syntax to make predictions. As a result, API integration between ML models and business applications is no longer required, because the models reside in the database.
Solution: Embed AutoML in the database
Embedding AutoML tools in the database will bring many benefits, such as:
- Anyone who works with data and understands SQL (a data analyst or data scientist, for example) can harness the power of machine learning.
- Software developers can embed ML into business tools and applications more efficiently.
- No complex integration is required between data and models, and between models and business applications.
In this way, the above relatively complex integration diagram changes as follows:
The result looks simpler and makes the ML process smoother and more efficient.
7. How to implement self-service ML using models as virtual database tables
The next step in finding the solution is to implement it.
To do this, we use a construct called AI Tables. It brings machine learning to the data platform in the form of virtual tables. An AI Table can be created like any other database table and then exposed to applications, BI tools and DB clients. We make predictions simply by querying the data.
AI Tables were originally developed by MindsDB and are available as open source or as a managed cloud service. They integrate with traditional SQL and NoSQL databases, as well as with data streams such as Kafka and Redis.
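To make the idea of a model exposed as a virtual table more concrete, here is a minimal sketch. It is not MindsDB and does not use its API: a plain SQLite view stands in for the AI Table, the "model" is a single hypothetical growth factor stored in `model_params`, and all table and column names other than `shop`, `product_code` and `amount` are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Source data that already lives in the database.
conn.execute("CREATE TABLE sales (shop TEXT, product_code TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("north", "hammer", 20.0), ("south", "hammer", 10.0)],
)

# Hypothetical "trained model": one learned parameter kept in a table.
conn.execute("CREATE TABLE model_params (growth REAL)")
conn.execute("INSERT INTO model_params VALUES (1.5)")

# The view plays the role of the virtual table: it stores no predictions
# and computes them whenever it is queried.
conn.execute(
    """
    CREATE VIEW amount_forecast AS
    SELECT s.shop, s.product_code, s.amount * p.growth AS predicted_amount
    FROM sales AS s CROSS JOIN model_params AS p
    """
)

# A single prediction is an ordinary filtered query ...
single = conn.execute(
    "SELECT predicted_amount FROM amount_forecast WHERE shop = ?", ("north",)
).fetchone()[0]

# ... and so is a batch of predictions.
batch = conn.execute(
    "SELECT shop, predicted_amount FROM amount_forecast ORDER BY shop"
).fetchall()
print(single, batch)
```

Any client that can run a SELECT against the database can read `amount_forecast`, which is the property that makes the virtual-table form attractive for BI tools and applications.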
8. Using AI Tables
The concept of AI Tables enables us to perform the ML process in the database, so that all of its steps (data preparation, model training and prediction) can be carried out there.
- Training AI Tables
First, users create an AI Table according to their own needs. An AI Table is similar to a machine learning model and contains columns from the source table as features; the AutoML engine then completes the remaining modeling tasks on its own. Examples will be given later.
- Make predictions
Once the AI Table is created, it is ready for use without any further deployment. To make predictions, just run a standard SQL query on the AI Table.
You can make predictions one at a time or in batches. AI Tables can handle many complex machine learning tasks, such as multivariate time series forecasting and anomaly detection.
9. AI Tables Working Example
For retailers, making sure that products are in stock at the right time is a complex task. When demand increases, supply has to increase as well. With sales data and machine learning, we can predict how much stock a given product should have on a given day, resulting in more revenue for retailers.
First you need to track the following information and create an AI Table:
- Date the product was sold (date_of_sale)
- Store where the product was sold (shop)
- Specific product sold (product_code)
- Quantity of the product sold (amount)
As shown below:
(1) Training AI Tables
To create and train AI Tables, you must first allow MindsDB to access the data. For detailed instructions, please refer to the MindsDB documentation.
Like ML models, AI Tables need historical data for training.
The following uses a simple SQL command to train an AI Table:
Let us analyze this query:
- Use the CREATE PREDICTOR statement in MindsDB.
- Define the source database that holds the historical data.
- Train the AI Table on the historical data table (historical_table); the selected columns (column_1 and column_2) are the features used for prediction.
- AutoML automatically completes the remaining modeling tasks.
- MindsDB will identify the data type of each column, normalize and encode it, and build and train the ML model.
At the same time, you can see the overall accuracy of the model and the confidence of each prediction, and estimate which columns (features) matter most to the result.
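The preparation steps listed above (identify each column's data type, then normalize or encode it) can be illustrated generically. The sketch below does not reproduce MindsDB's internals: the type rule, min-max scaling and one-hot encoding are common textbook choices assumed here for illustration, and every function name is hypothetical.

```python
def infer_type(values):
    """Return 'numeric' if every value parses as a float, else 'categorical'."""
    try:
        for v in values:
            float(v)
        return "numeric"
    except (TypeError, ValueError):
        return "categorical"

def min_max_scale(values):
    """Rescale numbers to the range [0, 1]; a constant column maps to all zeros."""
    nums = [float(v) for v in values]
    lo, hi = min(nums), max(nums)
    if hi == lo:
        return [0.0 for _ in nums]
    return [(n - lo) / (hi - lo) for n in nums]

def one_hot(values):
    """Encode each value as a 0/1 vector over the sorted distinct categories."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

def encode_table(columns):
    """Encode every column of a {name: values} table according to its inferred type."""
    encoded = {}
    for name, values in columns.items():
        if infer_type(values) == "numeric":
            encoded[name] = min_max_scale(values)
        else:
            encoded[name] = one_hot(values)
    return encoded

table = {
    "shop": ["north", "south", "north"],
    "amount": ["10", "20", "30"],
}
encoded = encode_table(table)
print(encoded)
```

An AutoML engine performs this kind of work automatically for each column, so the user only has to name the source table and the target.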
In databases, we often need to process tasks involving multivariate time series data with high cardinality. Using traditional methods, considerable effort is required to create such ML models. We need to group the data and sort it based on a given time, date or timestamp data field.
For example, suppose we want to predict the number of hammers sold in a hardware store. The data is grouped by store and product, and a prediction is made for each distinct store and product combination. This leaves us with the problem of creating a time series model for each group.
This sounds like a huge project, but MindsDB provides a way to create a single ML model, using the GROUP BY statement, that is trained on the multivariate time series data in one go. Let’s see how it’s done using just one SQL command:
The stock_forecaster predictor is created to predict how many items a particular store will sell in the future. The data is sorted by sales date and grouped by store. So we can predict the sales amount for each store.
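The grouping and ordering described above can be sketched in plain Python. This is an illustration only: the forecast rule (the mean of the last `window` sales) is a naive stand-in chosen for brevity, not the model MindsDB trains, and `forecast_by_group`, the sample rows and the dates are all hypothetical.

```python
from collections import defaultdict
from datetime import date

def forecast_by_group(records, window=2):
    """Forecast the next amount per shop as the mean of its last `window` sales.

    records: iterable of (shop, date_of_sale, amount) tuples in any order.
    """
    series = defaultdict(list)
    for shop, day, amount in records:
        # Equivalent of GROUP BY shop: one series per store.
        series[shop].append((day, amount))

    forecasts = {}
    for shop, points in series.items():
        # Equivalent of ORDER BY date_of_sale within each group.
        points.sort(key=lambda p: p[0])
        recent = [amount for _, amount in points[-window:]]
        forecasts[shop] = sum(recent) / len(recent)
    return forecasts

sales = [
    ("north", date(2022, 1, 3), 8),
    ("south", date(2022, 1, 1), 2),
    ("north", date(2022, 1, 1), 4),
    ("north", date(2022, 1, 2), 6),
    ("south", date(2022, 1, 2), 4),
]
forecasts = forecast_by_group(sales)
print(forecasts)
```

The point of the single-statement approach is that one call covers every group, instead of one hand-built model per store.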
(2) Batch prediction
The following query joins the sales data table with the predictor. The JOIN operation adds the predicted quantity to each record, so we get batch predictions for many records at once.
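The shape of such a batch query can be tried out with SQLite. This sketch makes several assumptions: it is not MindsDB syntax, the predictor's output is represented by an ordinary table named `stock_forecaster` with made-up values, and `sales_data` is a hypothetical table name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Sales records for which we want predictions.
conn.execute("CREATE TABLE sales_data (shop TEXT, date_of_sale TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales_data VALUES (?, ?, ?)",
    [("north", "2022-01-02", 6), ("south", "2022-01-02", 4)],
)

# Stand-in for the predictor: in this sketch its output is a plain table.
conn.execute("CREATE TABLE stock_forecaster (shop TEXT, predicted_amount REAL)")
conn.executemany(
    "INSERT INTO stock_forecaster VALUES (?, ?)",
    [("north", 7.0), ("south", 3.0)],
)

# One JOIN attaches a prediction to every sales record at once.
batch = conn.execute(
    """
    SELECT s.shop, s.date_of_sale, s.amount, f.predicted_amount
    FROM sales_data AS s
    JOIN stock_forecaster AS f ON f.shop = s.shop
    ORDER BY s.shop
    """
).fetchall()
print(batch)
```

Because the result is an ordinary result set, it can be passed straight to a BI tool or written to another table.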
To learn more about analyzing and visualizing predictions in BI tools, check out this article.
(3) Practical Application
The traditional approach treats ML models as independent applications, requiring maintenance of ETL pipelines to the database and API integration to business applications. Although AutoML tools make the modeling part easy and straightforward, the complete ML workflow still requires experienced experts to manage. In fact, the database is already the preferred tool for data preparation, so it makes more sense to introduce ML into the database rather than introducing data into ML. Because AutoML tools reside in the database, the AI Tables construct from MindsDB provides data practitioners with self-service AutoML and streamlines machine learning workflows.
Original link: https://dzone.com/articles/self-service-machine-learning-with-intelligent-dat
Translator’s introduction
Zhang Yi, 51CTO community editor, intermediate engineer. Mainly researches the implementation of artificial intelligence algorithms and their application scenarios, has a working command of machine learning algorithms and automatic control algorithms, and continues to follow the development of artificial intelligence technology at home and abroad, especially its concrete implementation and application in intelligent connected vehicles, smart homes and other fields.