Will machine learning put database management system operators out of work?-LINUX-php.cn

Home

System Tutorial

LINUX

Will machine learning put database management system operators out of work?

WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWB

Jan 13, 2024 pm 06:24 PM

linuxlinux tutorialRed Hatlinux systemlinux commandlinux certificationred hat linuxlinux video

Introduction

The database management system (DBMS) is the most important part of any data-intensive application system. They can handle large amounts of data and complex workloads. But they are difficult to manage because they have hundreds or thousands of configuration "knobs" that control factors such as the amount of memory used for caching and how often data is written to the storage device. Organizations often hire experts to help fine-tune their campaigns, but experts are prohibitively expensive for many businesses.

This article was written by three guests from Carnegie Mellon University: Dana Van Aken, Andy Pavlo and Geoff Gordon Co-authored article. This project demonstrates how academic researchers can use the AWS Cloud Credits for Research Program (https://aws.amazon.com/research-credits/) to support their scientific breakthroughs.

OtterTune is a new tool developed by students and researchers in the Carnegie Mellon University Database Group (http://db.cs.cmu.edu/projects/autotune/) that automatically Configure button to find the appropriate setting. The goal is to make it easier for anyone to deploy a DBMS, even those with no expertise in database administration.

OtterTune is different from other DBMS configuration tools because it fully leverages the knowledge gained from tuning previously deployed DBMSs to tune newly deployed DBMSs. This significantly reduces the time and resources required to tune a newly deployed DBMS. To this end, OtterTune maintains a database containing tuning data collected from previous tuning sessions. It uses this data to build machine learning models that capture information about how the DBMS reacts to different configurations. OtterTune uses these models to guide users when trying out new applications, suggesting settings that improve specific goals, such as reducing latency or increasing throughput.

In this article we explore each component of OtterTune’s machine learning pipeline and demonstrate how they relate to each other to tune the configuration of your DBMS. We then evaluated OtterTune's performance on MySQL and Postgres by comparing the performance of its optimal configurations with configurations chosen by database administrators (DBAs) and other automated tuning tools.

OtterTune is an open source tool developed by students and researchers in the Database Group at Carnegie Mellon University. All code is placed on GitHub (https://github.com/cmu-db/ottertune) and is released under the Apache License 2.0.

working principle

The picture below shows the components and workflow of OtterTune.

Will machine learning put database management system operators out of work?

At the beginning of a new tuning session, the user tells OtterTune which specific goal to optimize for (such as latency or throughput). The client controller connects to the target DBMS and collects the Amazon EC2 instance type and current target.

The controller then begins the first observation period, during which it observes the DBMS and logs specific targets. After the observation period is over, the controller collects internal metrics from the DBMS, such as MySQL counts of pages read from disk and pages written to disk. The controller returns both specific goals and internal metrics to the tuning manager.

After OtterTune's tuning manager receives the metrics, it stores them in the database. OtterTune uses the results to calculate the next configuration the controller should be installed on the target DBMS. The tuning manager returns this configuration to the controller and estimates the expected improvements through actual runs. The user can decide to continue the tuning session or terminate it.

illustrate

OtterTune maintains a button blacklist for each DBMS version it supports. The blacklist includes buttons that do not need to be tuned (such as the path name of the DBMS storage file), or buttons that may have serious or hidden consequences (such as may cause the DBMS to lose data). At the beginning of each tuning session, OtterTune provides users with a blacklist so they can add any other buttons they want OtterTune to avoid tuning.

OtterTune makes certain assumptions that may limit its usefulness to some users. For example, it assumes that the user has administrator rights, allowing the controller to modify the DBMS configuration. If the user does not have administrator rights, they can deploy a second copy of the database to other hardware for OtterTune tuning experiments. This requires users to replay workload traces or forward queries from a production-grade DBMS. For a complete discussion of assumptions and limitations, see our paper (http://db.cs.cmu.edu/papers/2017/tuning-sigmod2017.pdf).

Machine Learning Pipeline

The diagram below shows how data is processed as it passes through OtterTune’s machine learning pipeline. All observations are placed in OtterTune's database.

OtterTune first transmits the observation results to the Workload Characterization component. This component identifies a small set of DBMS metrics that most accurately capture performance changes and unique characteristics of different workloads.

Next, the Knob Identification component generates a button sorting list, listing the buttons that have the greatest impact on the performance of the DBMS. OtterTune then feeds all this information to Automatic Tuner. This component maps the workload of the target DBMS to the most similar workload in the data repository and reuses the workload data to generate a more appropriate configuration.

Will machine learning put database management system operators out of work?

Now let’s dive into each component of the machine learning pipeline.

Workload Characterization: OtterTune uses the DBMS's internal runtime metrics to describe the behavioral characteristics of the workload. These metrics accurately represent the workload because they capture many aspects of runtime behavior. However, many metrics are redundant: some are the same metric value recorded in different units, and others represent independent parts of the DBMS where values are highly correlated. Streamlining redundant metrics is important because it reduces the complexity of the machine learning models that use them. To this end, we divide DBMS metrics into clusters based on correlation patterns. We then select a representative metric from each cluster, specifically the one closest to the cluster center. Subsequent components in the machine learning pipeline use these metrics.

Knob Identification: A DBMS may have hundreds of buttons, but only a small number of buttons affect the performance of the DBMS. OtterTune uses a popular feature selection technique called Lasso to decide which buttons significantly affect the overall performance of your system. By applying this technique to data in a database, OtterTune can identify the DBMS' button order of importance.

OtterTune then has to decide how many buttons to use in the proposed configuration. Using too many buttons greatly increases OtterTune's optimization time. Using too few buttons prevents OtterTune from finding the optimal configuration. To automate this process, OtterTune uses an incremental approach. It gradually increases the number of buttons used in a tuning session. This approach allows OtterTune to explore and optimize configurations for a small set of the most important buttons, and then expand the scope to consider additional buttons.

Automatic Tuner: The Automated Tuning component determines which configuration OtterTune should recommend by performing a two-step analysis after each observation period.

First, the system identifies the workload from a previous tuning session that best represents the target DBMS workload using performance data for the metrics identified in the Workload Characterization component. It compares the session's metrics to metrics from previous workloads to see which ones respond similarly to different button settings.

Then, OtterTune selects another button configuration to give it a try. It fits statistical models to the data that has been collected, as well as data from the most similar workloads in the repository. This model allows OtterTune to predict how the DBMS will perform using every possible configuration. OtterTune optimizes the next configuration to strike a balance between exploration (gathering information to improve the model) and exploitation (performing as well as possible on a specific metric).

accomplish

OtterTune is written in Python.

As far as the two components of Workload Characterization and Knob Identification are concerned, runtime performance is not the main issue to worry about, so we used scikit-learn to implement the corresponding machine learning algorithm. These algorithms run in a background process and will integrate new data as soon as it becomes available in OtterTune's database.

As for the Automatic Tuner, the machine learning algorithm is on the critical path. They run after each observation period, integrating new data so that OtterTune can select a button configuration to try next. Since performance is a consideration, we implemented these algorithms using TensorFlow.

To collect data on DBMS hardware, button configurations and runtime performance metrics, we integrated OtterTune's controller with the OLTP-Bench benchmarking framework.

Evaluate

For evaluation, we compared the best configuration selected by OtterTune with the following configurations for MySQL and Postgres performance:

Default: Configuration provided by DBMS
Tuning script: Configuration generated by open source tuning advisory tool
DBA: Database Administrator Generated Configuration
RDS: Configuration customized for DBMS, managed by Amazon RD, deployed on EC2 instances of the same type.

We conducted all experiments on Amazon EC2 Spot Instances. We conducted each trial on two instances: one for OtterTune's controller and another for the deployed target DBMS system. We used m4.large and m3.xlarge instance types respectively. We deployed OtterTune's tuning manager and data database on a local server equipped with 20 cores and 128GB of memory.

We used the TPC-C workload, which is the industry standard for evaluating the performance of online transaction processing (OLTP) systems.

We measured latency and throughput against each database we used in our experiments: MySQL and Postgres. The following figures show the results. The first graph shows the amount of 99th percentile latency, which represents the "worst case" time it takes for a transaction to complete. The second graph shows the results for throughput, measured as the average number of transactions completed per second.

MySQL results:

Will machine learning put database management system operators out of work?

Comparing the optimal configuration generated by OtterTune with the configuration generated by the tuning script and RDS, you will find that if you use OtterTune configuration, MySQL's latency is reduced by about 60% and the throughput is increased by 35%. OtterTune also produces configurations with results that are as good as those chosen by the database administrator.

A few MySQL buttons have a significant impact on the performance of TPC-C workloads. The configurations generated by OtterTune and the database administrator provide good settings for each of these buttons. RDS performed a little less well due to providing sub-optimal settings for one button. The configuration of the tuning script performed the worst since only one button was modified.

Results for Postgres:

Will machine learning put database management system operators out of work?

In terms of latency, OtterTune, tuning tools, database management, and RDS-generated configurations all offer similar improvements over Postgres' default settings. We can probably attribute this to the overhead required for the round trip between the OLTP-Bench client and the DBMS over the network. As for throughput, if you use the configuration recommended by OtterTune, Postgres performance is about 12% higher than the configuration selected by the database administrator and tuning script, and about 32% higher than RDS.

Similar to MySQL, only a few buttons have a significant impact on Postgres performance. OtterTune, database administrators, tuning scripts, and RDS-generated configurations all modify these buttons, and most provide pretty good settings.

Conclusion

OtterTune automates the process of finding the appropriate settings for a DBMS configuration button. To tune a newly deployed DBMS, it reuses training data collected from previous tuning sessions. Because OtterTune does not require the generation of initial data sets for training machine learning models, tuning time is significantly reduced.

What’s next? To accommodate the growing popularity of DBaaS deployments that do not have remote access to the DBMS's host system, OtterTune will soon be able to automatically detect the hardware capabilities of the target DMBS without requiring remote access.

For more details on OtterTune, see our paper or the code on GitHub. Please stay tuned to this website (http://ottertune.cs.cmu.edu/), we will soon launch OtterTune, an online tuning service.
About the Author:

Dana Van Aken is a doctoral student in computer science at Carnegie Mellon University, mentored by Dr. Andrew Pavlo.

Andy Pavlo is an assistant professor of database science in the Department of Computer Science at Carnegie Mellon University.

Geoff Gordon is an associate professor and associate director of education in the Machine Learning Department at Carnegie Mellon University.

The above is the detailed content of Will machine learning put database management system operators out of work?. For more information, please follow other related articles on the PHP Chinese website!

Statement

This article is reproduced at:Linux就该这么学. If there is any infringement, please contact admin@php.cn delete

Xojo 2025r1: Big Updates for Developers with Linux ARM Support, Web Drag and Drop, and Direct App Store PublishingApr 25, 2025 am 09:50 AM

The Future of Linux Software: Will Flatpak and Snap Replace Native Desktop Apps?Apr 25, 2025 am 09:10 AM

For years, Linux software distribution relied on native formats like DEB and RPM, deeply ingrained in each distribution's ecosystem. However, Flatpak and Snap have emerged, promising a universal approach to application packaging. This article exami

What are the differences in how Linux and Windows handle device drivers?Apr 25, 2025 am 12:13 AM

The differences between Linux and Windows in handling device drivers are mainly reflected in the flexibility of driver management and the development environment. 1. Linux adopts a modular design, and the driver can be loaded and uninstalled dynamically. Developers need to have an in-depth understanding of the kernel mechanism. 2. Windows relies on the Microsoft ecosystem, and the driver needs to be developed through WDK and signed and certified. The development is relatively complex but ensures the stability and security of the system.

Compare and contrast the security models of Linux and Windows.Apr 24, 2025 am 12:03 AM

The security models of Linux and Windows each have their own advantages. Linux provides flexibility and customizability, enabling security through user permissions, file system permissions, and SELinux/AppArmor. Windows focuses on user-friendliness and relies on WindowsDefender, UAC, firewall and BitLocker to ensure security.

How does hardware compatibility differ between Linux and Windows?Apr 23, 2025 am 12:15 AM

Linux and Windows differ in hardware compatibility: Windows has extensive driver support, and Linux depends on the community and vendors. To solve Linux compatibility problems, you can manually compile drivers, such as cloning RTL8188EU driver repository, compiling and installing; Windows users need to manage drivers to optimize performance.

What are the differences in virtualization support between Linux and Windows?Apr 22, 2025 pm 06:09 PM

The main differences between Linux and Windows in virtualization support are: 1) Linux provides KVM and Xen, with outstanding performance and flexibility, suitable for high customization environments; 2) Windows supports virtualization through Hyper-V, with a friendly interface, and is closely integrated with the Microsoft ecosystem, suitable for enterprises that rely on Microsoft software.

What are the main tasks of a Linux system administrator?Apr 19, 2025 am 12:23 AM

The main tasks of Linux system administrators include system monitoring and performance tuning, user management, software package management, security management and backup, troubleshooting and resolution, performance optimization and best practices. 1. Use top, htop and other tools to monitor system performance and tune it. 2. Manage user accounts and permissions through useradd commands and other commands. 3. Use apt and yum to manage software packages to ensure system updates and security. 4. Configure a firewall, monitor logs, and perform data backup to ensure system security. 5. Troubleshoot and resolve through log analysis and tool use. 6. Optimize kernel parameters and application configuration, and follow best practices to improve system performance and stability.

Is it hard to learn Linux?Apr 18, 2025 am 12:23 AM

Learning Linux is not difficult. 1.Linux is an open source operating system based on Unix and is widely used in servers, embedded systems and personal computers. 2. Understanding file system and permission management is the key. The file system is hierarchical, and permissions include reading, writing and execution. 3. Package management systems such as apt and dnf make software management convenient. 4. Process management is implemented through ps and top commands. 5. Start learning from basic commands such as mkdir, cd, touch and nano, and then try advanced usage such as shell scripts and text processing. 6. Common errors such as permission problems can be solved through sudo and chmod. 7. Performance optimization suggestions include using htop to monitor resources, cleaning unnecessary files, and using sy

See all articles

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Assassin's Creed Shadows: Seashell Riddle Solution

4 weeks agoByDDD

What's New in Windows 11 KB5054979 & How to Fix Update Issues

3 weeks agoByDDD

Where to find the Crane Control Keycard in Atomfall

4 weeks agoByDDD

Roblox: Dead Rails - How To Complete Every Challenge

1 months agoByDDD

Atomfall guide: item locations, quest guides, and tips

1 months agoByDDD

Hot Tools

Safe Exam Browser

Safe Exam Browser is a secure browser environment for taking online exams securely. This software turns any computer into a secure workstation. It controls access to any utility and prevents students from using unauthorized resources.

PhpStorm Mac version

The latest (2018.2.1) professional PHP integrated development tool

MinGW - Minimalist GNU for Windows

This project is in the process of being migrated to osdn.net/projects/mingw, you can continue to follow us there. MinGW: A native Windows port of the GNU Compiler Collection (GCC), freely distributable import libraries and header files for building native Windows applications; includes extensions to the MSVC runtime to support C99 functionality. All MinGW software can run on 64-bit Windows platforms.

MantisBT

Mantis is an easy-to-deploy web-based defect tracking tool designed to aid in product defect tracking. It requires PHP, MySQL and a web server. Check out our demo and hosting services.

VSCode Windows 64-bit Download

A free and powerful IDE editor launched by Microsoft

Hot Topics

Where is the login entrance for gmail email?

7715

1641

1395

1289

1232