
Ten GitHub repositories for AutoML

王林
2023-04-12 11:43:09

Breakthroughs in artificial intelligence and machine learning have been among the most exciting topics of the past two decades. Machine learning and data science engineers still need extensive research and hard work to understand their models and run them effectively.


While they may vary from person to person, traditional machine learning steps include:

  1. Data acquisition
  2. Data exploration
  3. Data preparation
  4. Feature engineering
  5. Model selection
  6. Model training
  7. Hyperparameter tuning
  8. Prediction
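
As a rough illustration, the steps above can be walked through by hand with scikit-learn. This is a minimal sketch, not a recommended workflow: the Iris dataset, the logistic regression model, and the small grid of C values are all assumptions chosen only to keep the example short.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def run_manual_workflow(random_state=0):
    """Walk through the traditional steps by hand on a toy dataset."""
    # 1. Data acquisition: load a small built-in dataset (assumed for illustration).
    X, y = load_iris(return_X_y=True)

    # 2. Data exploration: check the basic shape of the data.
    n_samples, n_features = X.shape

    # 3. Data preparation: hold out a test set.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=random_state, stratify=y
    )

    # 4-5. Feature engineering and model selection: scale the features and
    # pick one model family by hand.
    pipeline = Pipeline([
        ("scale", StandardScaler()),
        ("model", LogisticRegression(max_iter=1000)),
    ])

    # 6-7. Model training and hyperparameter tuning with a cross-validated grid search.
    search = GridSearchCV(pipeline, {"model__C": [0.1, 1.0, 10.0]}, cv=5)
    search.fit(X_train, y_train)

    # 8. Prediction on unseen data.
    predictions = search.predict(X_test)
    accuracy = float((predictions == y_test).mean())
    return (n_samples, n_features), search.best_params_, accuracy


if __name__ == "__main__":
    shape, best_params, accuracy = run_manual_workflow()
    print(shape, best_params, round(accuracy, 3))
```

Even in this toy case, every choice (the scaler, the model family, the grid) is made by hand, which is the work AutoML frameworks try to take over.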

Eight steps may not seem like many, but each of them takes considerable time to get right when building a machine learning model.

The problem is exacerbated when non-expert machine learning practitioners go through these steps for the first time; the process often takes more time and resources to complete, and even then, the end result may not match expectations.

AutoML comes in handy by automating much of the model creation process for experts and non-experts alike.

What is automated machine learning (AutoML)?

Automated machine learning, often called AutoML, makes machine learning easier. It relies on automated processing performed by a given framework to make machine learning more accessible to people who are not machine learning experts.

It focuses on accelerating artificial intelligence research and improving the efficiency of machine learning models.

In the traditional machine learning process, the practitioner handles all 8 steps mentioned earlier, while with AutoML the user handles only two:

  1. Data acquisition: collecting the data, then filtering and cleaning it, before it is stored in the data warehouse.
  2. Prediction: the actual output returned by a given model; a well-trained model is likely to return an accurate final prediction.

The framework covers the other 6 steps: data exploration, data preparation, feature engineering, model selection, model training, and final model tuning.

Benefits of AutoML

  • Improve productivity
  • Better end results
  • Minimize errors
  • Expand machine learning

Popular AutoML Frameworks

Now that we have discussed what AutoML is and covered some of its advantages, we will look at the top 10 AutoML frameworks, where to find them, and what features they offer.

1. Google AutoML

Google AutoML is one of the best-known frameworks available and ranks first on our list. Google has launched several AutoML products, such as Google AutoML Vision and Google AutoML Natural Language.

2. Auto-sklearn

Users who have worked with machine learning before may be familiar with the name sklearn. Built on top of the popular scikit-learn library, Auto-sklearn is an open source framework that automates machine learning tasks.

Auto-sklearn stands out for its ability to perform model selection, hyperparameter tuning, and featurization on its own.

For model selection, Auto-sklearn automatically searches for the best algorithm for the problem the user provides.
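
To make the idea of automated model selection concrete, here is a naive sketch that compares a few candidate algorithms by cross-validated accuracy and keeps the best one. It is not Auto-sklearn's search procedure, which is considerably more sophisticated. The three candidates and the Wine dataset used in the usage note are assumptions made for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def select_best_model(X, y, cv=5):
    """Return (best_name, best_score, all_scores) using mean cross-validated accuracy."""
    # Hypothetical candidate list; a real AutoML system searches a far larger space.
    candidates = {
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "k_nearest_neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    }
    scores = {
        name: float(cross_val_score(model, X, y, cv=cv).mean())
        for name, model in candidates.items()
    }
    best_name = max(scores, key=scores.get)
    return best_name, scores[best_name], scores
```

Calling `select_best_model(*load_wine(return_X_y=True))` returns the name of the winning candidate along with the scores of all three, so the comparison stays inspectable.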

The second feature of Auto-sklearn is hyperparameter tuning. As one of the final steps for any machine learning or deep learning model, users need to find the model parameters that give the best results. This task takes a lot of time and is easily automated by such frameworks.
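
One of the simplest ways to automate hyperparameter tuning is random search, sketched below with the standard library only. The `validation_loss` function is a hypothetical stand-in for training a model and measuring its validation loss, and the two hyperparameters and their ranges are assumptions. Frameworks such as Auto-sklearn use more advanced strategies than this.

```python
import random


def validation_loss(learning_rate, num_layers):
    """Hypothetical stand-in for training a model and measuring validation loss."""
    return (learning_rate - 0.01) ** 2 * 1e4 + (num_layers - 3) ** 2


def random_search(n_trials=200, seed=0):
    """Sample configurations at random and keep the one with the lowest loss."""
    rng = random.Random(seed)
    best_config, best_loss = None, float("inf")
    for _ in range(n_trials):
        config = {
            # Sample the learning rate on a log scale between 1e-4 and 1e-1.
            "learning_rate": 10 ** rng.uniform(-4, -1),
            # Sample the number of layers uniformly from 1 to 6 inclusive.
            "num_layers": rng.randint(1, 6),
        }
        loss = validation_loss(**config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss


if __name__ == "__main__":
    print(random_search())
```

Fixing the seed makes the search reproducible, which helps when comparing tuning runs.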

The final benefit of Auto-sklearn is automatic featurization, the process of converting raw data into features that a model can use.

3. TPOT

TPOT, short for Tree-based Pipeline Optimization Tool, is one of the earliest open source AutoML packages for Python. It focuses on optimizing machine learning pipelines using genetic programming.

The main goal of TPOT is to build ML pipelines automatically by combining a flexible expression-tree representation of pipelines with stochastic search algorithms such as genetic programming.

Note that TPOT works on top of the scikit-learn library, which must be installed first.
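
The flavor of an evolutionary pipeline search can be shown with a small standard-library sketch. It is not TPOT's algorithm: it uses a fixed three-step pipeline instead of expression trees, only selection and mutation, and a made-up `fitness` function where a real system would cross-validate each pipeline. All step names and scores below are hypothetical.

```python
import random

# Hypothetical building blocks; a real tool offers a much larger operator set.
SCALERS = ["none", "standard", "minmax"]
SELECTORS = ["none", "variance_threshold", "select_k_best"]
MODELS = ["logistic_regression", "decision_tree", "random_forest"]
SEARCH_SPACE = [SCALERS, SELECTORS, MODELS]


def fitness(pipeline):
    """Made-up score; a real system would cross-validate the pipeline instead."""
    scaler, selector, model = pipeline
    score = {"logistic_regression": 0.80, "decision_tree": 0.75, "random_forest": 0.85}[model]
    score += {"none": 0.0, "standard": 0.03, "minmax": 0.02}[scaler]
    score += {"none": 0.0, "variance_threshold": 0.01, "select_k_best": 0.02}[selector]
    return score


def mutate(pipeline, rng):
    """Replace one randomly chosen step with a random option for that slot."""
    steps = list(pipeline)
    slot = rng.randrange(len(steps))
    steps[slot] = rng.choice(SEARCH_SPACE[slot])
    return tuple(steps)


def evolve(generations=20, population_size=8, seed=0):
    """Evolve a population of pipelines and return the fittest one found."""
    rng = random.Random(seed)
    population = [
        tuple(rng.choice(options) for options in SEARCH_SPACE)
        for _ in range(population_size)
    ]
    for _ in range(generations):
        # Selection: keep the fitter half, then refill with mutated copies.
        population.sort(key=fitness, reverse=True)
        survivors = population[: population_size // 2]
        children = [mutate(rng.choice(survivors), rng) for _ in survivors]
        population = survivors + children
    return max(population, key=fitness)


if __name__ == "__main__":
    best = evolve()
    print(best, fitness(best))
```

Because the fitter half always survives, the best score in the population never decreases from one generation to the next.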

4. AutoKeras

AutoKeras is an open source library built for AutoML and deep learning models, originally developed by DATA Lab.

AutoKeras helps non-expert machine learning and deep learning enthusiasts train and run their models with minimal effort. It aims to make machine learning accessible to everyone and is a great tool for beginners.

5. Ludwig

Ludwig is an open source autoML framework that focuses on assembling and training deep learning models using a simple configuration file system.

The user provides a configuration file that defines the inputs and outputs of a given model and their respective data types, and Ludwig uses these definitions to build its deep learning model.
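
A configuration of this kind can be sketched as a Python dictionary with `input_features` and `output_features` lists, each entry naming a column and its data type. The helper `build_ludwig_config`, the column names `review_text` and `sentiment`, and the dataset path argument are hypothetical; the `train_with_ludwig` function assumes the ludwig package is installed and is not exercised here.

```python
def build_ludwig_config(input_features, output_features):
    """Build a Ludwig-style config dict from {column_name: data_type} mappings."""
    return {
        "input_features": [
            {"name": name, "type": dtype} for name, dtype in input_features.items()
        ],
        "output_features": [
            {"name": name, "type": dtype} for name, dtype in output_features.items()
        ],
    }


# Hypothetical dataset columns: a free-text review and a sentiment label.
config = build_ludwig_config(
    input_features={"review_text": "text"},
    output_features={"sentiment": "category"},
)


def train_with_ludwig(config, dataset_path):
    """Train a model from the config; requires the ludwig package."""
    # Imported lazily so the config code above runs without Ludwig installed.
    from ludwig.api import LudwigModel

    model = LudwigModel(config)
    return model.train(dataset=dataset_path)
```

The same structure is typically written as a YAML file; the dictionary form is used here only so the example stays in one language.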

6. MLBox

MLBox is on the rise and is quickly becoming one of the top automated machine learning tools.

According to the official MLBox documentation, it provides the following benefits:

  • Fast reading and distributed data preprocessing/cleaning/formatting.
  • Highly robust feature selection and leak detection.
  • Accurate hyperparameter optimization in high-dimensional space.
  • State-of-the-art predictive models for classification and regression (deep learning, stacking, LightGBM, etc.).
  • Prediction with model interpretation.

7. AutoGluon

AutoGluon is aimed at both expert and non-expert machine learning practitioners. It focuses on automated stack ensembling, deep learning, and real-world applications spanning image, text, and tabular data.

According to the AutoGluon online documentation, AutoGluon enables users to:

  • Quickly prototype deep learning and classical ML solutions for raw data with just a few lines of code.
  • Automatically leverage state-of-the-art techniques (where appropriate) without expert knowledge.
  • Leverage automated hyperparameter tuning, model selection/ensembling, architecture search, and data processing.
  • Easily improve/tweak custom models and data pipelines, or customize AutoGluon for specific use cases.

8. Microsoft Neural Network Intelligence (NNI)

Microsoft Neural Network Intelligence, also known as NNI, is a toolkit designed to automate feature engineering, neural architecture search, hyperparameter tuning, and model compression for deep learning.

NNI supports PyTorch, TensorFlow, scikit-learn, XGBoost, LightGBM, and other frameworks. Its main benefit is neural architecture search: NNI supports both multi-trial (grid search, regularized evolution, policy-based RL, etc.) and one-shot (DARTS, ENAS, FBNet, etc.) neural architecture search.

The tool provides a variety of hyperparameter tuning algorithms, such as Bayesian optimization, exhaustive search, and heuristic search. See NNI's README file on GitHub to learn more about what else it offers.

9. TransmogrifAI

TransmogrifAI is designed to help developers improve machine learning productivity. It runs on Apache Spark.

According to the TransmogrifAI README on GitHub, automation lets it reach accuracy close to that of hand-tuned models in nearly 100x less time.

Like the other AutoML frameworks mentioned, TransmogrifAI can select the best algorithm for a dataset chosen by the user.

10. H2O AutoML

H2O AutoML is an open source framework created by H2O.ai that supports both R and Python.

It also supports the most widely used statistical and machine learning algorithms, including gradient boosting machines, generalized linear models, and deep learning.

The H2O autoML interface accommodates new machine learning users by requiring as few parameters as possible. The main task of the user when using the H2O tool is to provide the dataset.
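
A minimal sketch of that workflow in Python might look like the following. It assumes a classification task, a CSV file, and an installed h2o package with a Java runtime. The helper `split_columns`, the function `run_h2o_automl`, and the default values for `max_models` and `seed` are choices made for this example.

```python
def split_columns(columns, target):
    """Return (predictor_columns, target) in the form passed to train(x=..., y=...)."""
    if target not in columns:
        raise ValueError(f"target column {target!r} not found")
    return [column for column in columns if column != target], target


def run_h2o_automl(csv_path, target, max_models=10, seed=1):
    """Train H2O AutoML on a CSV file and return its leaderboard."""
    # Imported lazily so split_columns can be used without H2O installed.
    import h2o
    from h2o.automl import H2OAutoML

    h2o.init()
    frame = h2o.import_file(csv_path)
    # Assumption: a classification task, so the target must be categorical.
    frame[target] = frame[target].asfactor()
    x, y = split_columns(frame.columns, target)

    aml = H2OAutoML(max_models=max_models, seed=seed)
    aml.train(x=x, y=y, training_frame=frame)
    return aml.leaderboard
```

Apart from the dataset and the name of the target column, every argument here has a default, which matches the article's point about requiring as few parameters as possible.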

Other Useful AutoML Tools

1. Hypertunity

Hypertunity is a lightweight package for optimizing the hyperparameters of a given model. It is modular, simple, and extensible, which allows seamless scheduling implementations.

Hypertunity supports Bayesian optimization using GPyOpt, Slurm-compatible schedulers, and real-time visualization using Tensorboard (via the HParams plugin).

2. Dragonfly

Dragonfly is an open source autoML tool designed for scalable Bayesian optimization.

Bayesian optimization is used to optimize black-box functions that are very expensive to evaluate, where ordinary optimization methods are impractical.

Dragonfly allows new users to solve scalable Bayesian optimization problems with minimal knowledge.

3. Ray Tune

Ray Tune is another hyperparameter optimization tool. It is part of Ray, a unified framework for scaling AI and Python applications.

Ray enables simple scaling of AI workloads through distributed data processing, distributed training, scalable hyperparameter tuning, scalable reinforcement learning, and scalable, programmable serving.

4. Auto Graph Learning

Auto Graph Learning (AutoGL) is a distinctive AutoML framework that focuses on making machine learning on graph datasets easy and simple.

It manages datasets for graph-based machine learning through its own datasets module, which builds on the Dataset classes in PyTorch Geometric or Deep Graph Library.

GitHub Repository for Automated Machine Learning

As the field of machine and deep learning advances, the need for machine learning experts has increased significantly but remains unmet.

This is where automation of machine learning tools and techniques comes in, allowing new users to build fully functional and highly optimized models more easily than ever before.

In short, when looking for the perfect automated machine learning tool, you should focus on what you are trying to achieve with a given model and the exact part of the machine learning process you wish to automate. We recommend that you try several of the above autoML tools yourself and only use the ones you find efficient and easy to use.


Statement:
This article is reproduced from 51cto.com. If there is any infringement, please contact admin@php.cn to have it removed.