Some thoughts and designs on migrating data from MySQL to HBase

1. Reasons for migration

As the business has grown, using MySQL both to build indexes and to serve searches has turned database I/O into the bottleneck of the data flow. For example, every full-table dump puts heavy pressure on the database and takes a long time, and the data volume has already reached the order of 100 million rows. For MySQL to keep providing good service, the next step would have to be splitting it into sub-databases and sub-tables (sharding). Given this, we are considering HBase for data storage, because HBase can hold far more data than MySQL and adding columns is very convenient.

2. Some differences between relational databases and NoSQL

(1) Differences in storage methods

In relational databases such as MySQL, SQL Server, and Oracle, data is stored by row, as shown in the following figure:


In HBase, by contrast, data is stored by column (more precisely, grouped by column family), as shown below:


The logical model of HBase is as follows:


In this model:

com.cnn.ww is the rowkey, which plays the same role as the primary key in MySQL.

contents and anchor are column families. Physically, data of the same column family is stored in the same file.

cnnsi.com and mylook.ca are columns under a column family. In HBase, columns can be added dynamically.

Each cell in the grid holds the value stored for a given rowkey and cf:column.

tn is a timestamp. Different timestamps identify different versions of the same cell.
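
The logical model described above can be pictured as nested maps. The following is only a rough in-memory sketch in Python, not HBase client code; the column name html, the cell values, and the timestamps are invented for illustration.

```python
# rowkey -> column family -> column -> {timestamp: value}
# The "html" column, the values, and the timestamps are made up for this sketch.
table = {
    "com.cnn.ww": {
        "contents": {"html": {3: "<html>c</html>", 2: "<html>b</html>"}},
        "anchor": {"cnnsi.com": {5: "CNN"}, "mylook.ca": {4: "CNN.com"}},
    }
}


def cells(tbl):
    """Yield every stored cell as (rowkey, 'cf:column', timestamp, value)."""
    for rowkey in sorted(tbl):
        for family in sorted(tbl[rowkey]):
            for column in sorted(tbl[rowkey][family]):
                versions = tbl[rowkey][family][column]
                for ts in sorted(versions, reverse=True):  # newest version first
                    yield rowkey, f"{family}:{column}", ts, versions[ts]


def latest(tbl, rowkey, family, column):
    """Return the value with the highest timestamp, or None if the cell is empty."""
    versions = tbl.get(rowkey, {}).get(family, {}).get(column, {})
    return versions[max(versions)] if versions else None
```

Calling latest(table, "com.cnn.ww", "contents", "html") returns the value written at timestamp 3, which is how a read without an explicit version behaves in this sketch.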

One of the storage structures is as follows:



(2) Some differences in CRUD

CRUD operations are the most basic and most commonly used database operations, and HBase has corresponding commands for them. The MySQL table-creation statement is not detailed here. In the HBase shell, a table is created as follows:

create 'table','columnfamily'

This creates a table named table with a column family named columnfamily. Other settings, such as block size and the number of versions, take their default values.

To read data, use a statement such as get 'table', 'row', 'cf:column' to fetch the corresponding value.

HBase has no separate concept of an update. Writing to an existing cell adds a new version, which is distinguished by its timestamp. The statement used is:

put 'table', 'row', 'cf:name', 'value'

This writes value to the name column under the cf column family.

Deleting data also differs. In MySQL you can only delete a whole row or set a column to empty (NULL), whereas in HBase you can delete a single column of a row directly.
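
To make these put/get/delete semantics concrete, here is a toy in-memory model in Python. It is a sketch under stated assumptions, not an HBase client: the class name VersionedTable, the counter used as a timestamp, and the max_versions value of 3 are all invented for illustration. Real HBase also handles deletes differently internally (it writes delete markers), which this sketch does not model.

```python
import itertools


class VersionedTable:
    """Toy model of versioned cells; hypothetical, not part of any HBase API."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions  # arbitrary value chosen for the sketch
        self.rows = {}  # rowkey -> {"cf:column": [(ts, value), ...], newest first}
        self._clock = itertools.count(1)  # stand-in for wall-clock timestamps

    def put(self, rowkey, column, value):
        """A write never overwrites in place; it adds a newer version."""
        versions = self.rows.setdefault(rowkey, {}).setdefault(column, [])
        versions.insert(0, (next(self._clock), value))
        del versions[self.max_versions:]  # drop versions beyond the limit

    def get(self, rowkey, column, versions=1):
        """Return up to `versions` values of one cell, newest first."""
        stored = self.rows.get(rowkey, {}).get(column, [])
        return [value for _, value in stored[:versions]]

    def delete(self, rowkey, column):
        """Remove one column of a row; the row's other columns are untouched."""
        self.rows.get(rowkey, {}).pop(column, None)
```

In this model, putting "old" and then "new" into the same cell leaves both versions stored, and a plain get returns only "new".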

(3) Differences in indexes

In MySQL you can create indexes to speed up filtered queries, but HBase only indexes the rowkey, so lookups by rowkey are the fastest kind of query.
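
The cost difference can be illustrated with a small Python sketch. It assumes, as a simplification, that rows are simply kept sorted by rowkey in memory; the class name SortedRows and the sample column cf:city are hypothetical.

```python
import bisect


class SortedRows:
    """Rows kept sorted by rowkey, the only 'index' available in this sketch."""

    def __init__(self, rows):
        self.keys = sorted(rows)
        self.values = [rows[key] for key in self.keys]

    def get(self, rowkey):
        """Lookup by rowkey: a binary search over the sorted keys."""
        i = bisect.bisect_left(self.keys, rowkey)
        if i < len(self.keys) and self.keys[i] == rowkey:
            return self.values[i]
        return None

    def filter_by_column(self, column, wanted):
        """Lookup by column value: every row has to be examined."""
        examined = 0
        hits = []
        for key, row in zip(self.keys, self.values):
            examined += 1
            if row.get(column) == wanted:
                hits.append(key)
        return hits, examined


# Sample data: 1000 rows with an invented cf:city column.
rows = {f"row{i:04d}": {"cf:city": "A" if i % 2 else "B"} for i in range(1000)}
store = SortedRows(rows)
```

Here store.get("row0007") needs only a handful of comparisons, while store.filter_by_column("cf:city", "A") walks all 1000 rows.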

(4) Thoughts on the development from mysql to nosql

Relational databases have a long history, but they struggle as data volume grows. In MySQL, for example, once a table reaches hundreds of millions of rows or more, querying through an index may no longer help much. In the end you can only query by primary key, or gradually move to a sub-database and sub-table (sharded) model, and sharding brings a lot of trouble to operation, maintenance, and use. This is the background against which NoSQL databases (NoSQL is short for "not only SQL"), with their primary-key-centred access, developed and spread as data volumes grew dramatically. Taking HBase as an example, it supports TB- and PB-scale data, and adding columns is particularly flexible.

(5) Why can HBase store massive amounts of data?

In effect, HBase can be regarded as the result of splitting MySQL into sub-databases and sub-tables. The difference is that sharded MySQL tables still support indexes, whereas HBase only supports the rowkey as a primary-key index. From the book we know that HBase stores data by column, and that when the data grows too large it is split by row, as shown below:



The resulting regions are placed on different machines, and a master manages them. In other words, the data is divided both by rows and by columns, which is what allows HBase to store such a large amount of data.
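
Splitting by row range and routing a rowkey to its region can be sketched as follows. This is a simplified Python illustration, not how HBase is implemented: the split points, the server names, and the row-count threshold are all invented, and a real cluster decides when to split by data size rather than by a row count.

```python
import bisect

# Hypothetical layout: region i serves rowkeys in [region_starts[i], region_starts[i + 1]).
region_starts = ["", "g", "n", "t"]
region_servers = ["server-a", "server-b", "server-c", "server-a"]


def locate(rowkey):
    """Return (region index, server) for the region whose key range contains rowkey."""
    i = bisect.bisect_right(region_starts, rowkey) - 1
    return i, region_servers[i]


def split_if_needed(region_rowkeys, max_rows):
    """Split a sorted list of rowkeys into two halves once it exceeds max_rows."""
    if len(region_rowkeys) <= max_rows:
        return [region_rowkeys]
    mid = len(region_rowkeys) // 2
    return [region_rowkeys[:mid], region_rowkeys[mid:]]
```

Because each region covers a contiguous range of sorted rowkeys, finding the right machine for a key is a single binary search over the region start keys.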

3. Some problems encountered in data migration

(1) Problems with composite (joint) indexes

Some situations in MySQL rely on a composite index. For example, suppose there is a table that maps products to categories. We need to get all the categories of a given product, and we also want to get all the products of a given category. In MySQL, composite indexes satisfy both requirements directly, but what should we do in HBase, which can only query by rowkey?

After reading the relevant material, I found the following two solutions:

1. Build a wide table

HBase allows different rows to have different columns, as long as they share a column family. For the situation above, you can therefore build a wide table with the category as the rowkey, as shown below:

Category id, as the rowkey

product_id, as the column name

The value stores whether the mapping has been deleted


With the category id as the rowkey, you can get all product_id values directly from one row and then filter out the deleted ones yourself.
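
The wide-table layout can be sketched in Python as one map per row. This is an in-memory illustration only; the column family name cf, the ids, and the convention that "1" means deleted are assumptions made for the example.

```python
# rowkey = category id; one column per product_id; value = deleted flag.
# The ids and the "0"/"1" flag convention are invented for this sketch.
wide_table = {
    "1": {"cf:1001": "0", "cf:1002": "1", "cf:1003": "0"},
    "2": {"cf:1001": "0"},
}


def products_in_category(table, category_id):
    """Read one row and keep only the product ids not flagged as deleted."""
    row = table.get(category_id, {})
    return sorted(
        column.split(":", 1)[1]
        for column, deleted in row.items()
        if deleted == "0"
    )
```

A single row read, products_in_category(wide_table, "1"), is enough to list the live products of category 1; the filtering on the deleted flag happens on the client side.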

2. Build a tall table

Building a tall table means that you do not need so many columns and store multiple rows instead. Because HBase sorts rows in lexicographic order of rowkey, the following design works:

Category id_product id, as the rowkey


Scanning the rows whose rowkey starts with a given category id, for example 1, then returns all of that category's data.
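
A prefix scan over lexicographically sorted rowkeys can be sketched in Python as below. The rowkeys are invented sample data, and prefix_scan is a hypothetical helper rather than an HBase call. The sketch also shows one detail worth watching under this assumption: including the separator in the prefix ("1_" rather than "1") keeps category 10 out of the results for category 1.

```python
import bisect

# Hypothetical rowkeys of the form <category id>_<product id>, sorted as strings.
tall_rowkeys = sorted(["1_1001", "1_1002", "10_2001", "2_1001"])


def prefix_scan(sorted_keys, prefix):
    """Return all keys that start with prefix, using the sort order to stop early."""
    start = bisect.bisect_left(sorted_keys, prefix)
    matched = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so no later key can match
        matched.append(key)
    return matched


def products_of(sorted_keys, category_id):
    """Extract product ids for one category; the trailing '_' avoids matching '10_...'."""
    prefix = f"{category_id}_"
    return [key[len(prefix):] for key in prefix_scan(sorted_keys, prefix)]
```

With this sample data, scanning with the bare prefix "1" also returns the row for category 10, while products_of(tall_rowkeys, "1") returns only the two products of category 1.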

Essentially, both methods build a secondary index over the data.
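
Since the original requirement is to look up in both directions (products of a category and categories of a product), one possible way to apply the same idea is to write each pair under two rowkeys, one per direction. The Python sketch below is only an assumption about how this could be laid out; the c_ and p_ key prefixes and the helper names are hypothetical and do not come from HBase or from the design above.

```python
import bisect


def index_keys(category_id, product_id):
    """Return the two rowkeys written for one (category, product) pair."""
    return [f"c_{category_id}_{product_id}", f"p_{product_id}_{category_id}"]


def build_index(pairs):
    """Build the sorted list of rowkeys for all pairs, covering both directions."""
    keys = []
    for category_id, product_id in pairs:
        keys.extend(index_keys(category_id, product_id))
    return sorted(keys)


def scan_suffixes(sorted_keys, prefix):
    """Prefix scan that returns the part of each matching key after the prefix."""
    start = bisect.bisect_left(sorted_keys, prefix)
    suffixes = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        suffixes.append(key[len(prefix):])
    return suffixes


index = build_index([("1", "1001"), ("1", "1002"), ("2", "1001")])
```

Scanning with the prefix "c_1_" lists the products of category 1, and scanning with "p_1001_" lists the categories of product 1001. The price is that every write has to maintain two rows.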


The above are some thoughts and designs on migrating data from MySQL to HBase.
