Hbase入门6 -白话MySQL(RDBMS)与HBase之间-Mysql Tutorial-php.cn

Home

Database

Mysql Tutorial

Hbase入门6 -白话MySQL(RDBMS)与HBase之间

WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWB

Jun 07, 2016 pm 04:26 PM

hbasemysqlgetting Started

我的废话1: 任何一项新技术并非救命稻草，一抹一擦立马药到病除的百宝箱，并非使用Spring或者NOSQL的产品就神乎其神+五光十色，如果那样基本是扯淡。同类型产品中不管那种技术最终要达到的目的是一样的，通过新的技术手段你往往可能避讳了当前你所需要面对

我的废话1:
任何一项新技术并非救命稻草，一抹一擦立马药到病除的百宝箱，并非使用Spring或者NOSQL的产品就神乎其神+五光十色，如果那样基本是扯淡。同类型产品中不管那种技术最终要达到的目的是一样的，通过新的技术手段你往往可能避讳了当前你所需要面对的问题，但过后新的问题又来了。也许回过头来看看还不如在原来的基础上多动动脑筋想想办法做些改良可以得到更高的回报。

传统数据库是以数据块来存储数据，简单来说，你的表字段越多，占用的数据空间就越多，那么查询有可能就要跨数据块，将会导致查询的速度变慢。在大型系统中一张表上百个字段，并且表中的数据上亿条这是完全是有可能的。因此会带来数据库查询的瓶颈。我们都知道一个常识数据库中表记录的多少对查询的性能有非常大的影响，此时你很有可能想到分表、分库的做法来分载数据库运算的压力，那么又会带来新的问题，例如：分布式事务、全局唯一ID的生成、跨数据库查询等，依旧会让你面对棘手的问题。如果打破这种按照行存储的模式，采用一种基于列存储的模式，对于大规模数据场景这样情况有可能发生一些好转。由于查询中的选择规则是通过列来定义的，因此整个数据库是自动索引化的。按列存储每个字段的数据聚集存储，可以动态增加，并且列为空就不存储数据，节省存储空间。每个字段的数据按照聚集存储，能大大减少读取的数据量，查询时指哪打哪，来的更直接。无需考虑分库、分表 Hbase将对存储的数据自动切分数据，并支持高并发读写操作，使得海量数据存储自动具有更强的扩展性。

Java中的HashMap是Key/Value的结构，你也可以把HBase的数据结构看做是一个Key/Value的体系,话说HBase的区域由表名和行界定的。在HBase区域每一个"列族"都由一个名为HStore的对象管理。每个HStore由一个或多个MapFiles(Hadoop中的一个文件类型)组成。MapFiles的概念类似于Google的SSTable。在Hbase里面有以下两个主要的概念，Row key 和 Column Family，其次是Cell qualifier和Timestamp tuple，Column family我们通常称之为“列族”，访问控制、磁盘和内存的使用统计都是在列族层面进行的。列族Column family是之前预先定义好的数据模型，每一个Column Family都可以根据“限定符”有多个column。在HBase每个cell存储单元对同一份数据有多个版本，根据唯一的时间戳来区分每个版本之间的差异，最新的数据版本排在最前面。

口水：Hbase将table水平划分成N个Region，region按column family划分成Store，每个store包括内存中的memstore和持久化到disk上的HFile。

上述可能我表达的还不够到位，下面来看一个实践中的场景，将原来是存放在MySQL中Blog中的数据迁移到HBase中的过程：
MySQL中现有的表结构：

迁移HBase中的表结构：

原来系统中有2张表blogtable和comment表，采用HBase后只有一张blogtable表，如果按照传统的RDBMS的话，blogtable表中的列是固定的，比如schema 定义了Author,Title,URL,text等属性，上线后表字段是不能动态增加的。但是如果采用列存储系统，比如Hbase，那么我们可以定义blogtable表，然后定义info 列族，User的数据可以分为：info:title ,info:author ,info:url 等，如果后来你又想增加另外的属性，这样很方便只需要 info:xxx 就可以了。
对于Row key你可以理解row key为传统RDBMS中的某一个行的主键，Hbase是不支持条件查询以及Order by等查询，因此Row key的设计就要根据你系统的查询需求来设计了额。 Hbase中的记录是按照rowkey来排序的，这样就使得查询变得非常快。

具体操作过程如下：
============================创建blogtable表=========================
create 'blogtable', 'info','text','comment_title','comment_author','comment_text'

============================插入概要信息=========================
put 'blogtable', '1', 'info:title', 'this is doc title'
put 'blogtable', '1', 'info:author', 'javabloger'
put 'blogtable', '1', 'info:url', 'http://www.javabloger.com/index.php'

put 'blogtable', '2', 'info:title', 'this is doc title2'
put 'blogtable', '2', 'info:author', 'H.E.'
put 'blogtable', '2', 'info:url', 'http://www.javabloger.com/index.html'

============================插入正文信息=========================
put 'blogtable', '1', 'text:', 'what is this doc context ?'
put 'blogtable', '2', 'text:', 'what is this doc context2?'

==========================插入评论信息===============================
put 'blogtable', '1', 'comment_title:', 'this is doc comment_title '
put 'blogtable', '1', 'comment_author:', 'javabloger'
put 'blogtable', '1', 'comment_text:', 'this is nice doc'

put 'blogtable', '2', 'comment_title:', 'this is blog comment_title '
put 'blogtable', '2', 'comment_author:', 'H.E.'
put 'blogtable', '2', 'comment_text:', 'this is nice blog'

HBase的数据查询\读取，可以通过单个row key访问，row key的range和全表扫描,大致如下：
注意：HBase不能支持where条件、Order by 查询，只支持按照Row key来查询，但是可以通过HBase提供的API进行条件过滤。
例如：http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/ColumnPrefixFilter.html

scan 'blogtable' ,{COLUMNS => ['text:','info:title'] } —> 列出文章的内容和标题

scan 'blogtable' , {COLUMNS => 'info:url' , STARTROW => '2'} —> 根据范围列出文章的内容和标题

get 'blogtable','1' —> 列出文章id 等于1的数据

get 'blogtable','1', {COLUMN => 'info'} —> 列出文章id 等于1 的 info 的头(Head)内容

get 'blogtable','1', {COLUMN => 'text'} —> 列出文章id 等于1 的 text 的具体(Body)内容

get 'blogtable','1', {COLUMN => ['text','info:author']} —> 列出文章id 等于1 的内容和作者(Body/Author)内容

我的废话2:
有人会问Java Web服务器中是Tomcat快还是GlassFish快？小型数据库中是MySQL效率高还是MS-SQL效率高？我看是关键用在什么场景和怎么使用这个产品(技术)，所以我渐渐的认为是需要对产品、技术本身深入的了解，而并非一项新的技术就是绝佳的选择。试问：Tomcat的默认的运行参数能和我们线上正在使用的GlassFish性能相提并论吗？我不相信GlassFishv2和GlassFishv3在默认的配置参数下有显著的差别。我们需要对产品本身做到深入的了解才能发挥他最高的性能，而并非感观听从厂家的广告和自己的感性认识迷信哪个产品的优越性。

我的废话3:
对于NOSQL这样的新技术，的的确确是可以解决过去我们所需要面对的问题，但也并非适合每个应用场景，所以在使用新产品的同时需要切合当前的产品需要，是需求在引导新技术的投入，而并非为了赶时髦去使用他。你的产品是否过硬不是你使用了什么新技术，用户关心的是速度和稳定性，不会关心你是否使用了 NOSQL。相反Google有着超大的数据量，能给全世界用户带来了惊人的速度和准确性，大家才会回过头来好奇Google到底是怎么做到的。所以根据自己的需要千万别太勉强自己使用了某项新技术。

我的废话4:
总之一句话，用什么不是最关键，最关键是怎么去使用！

相关文章:
Lily-建立在HBase上的分布式搜索
MySQL向Hive/HBase的迁移工具
HBase入门5(集群) -压力分载与失效转发
Hive入门3–Hive与HBase的整合
HBase入门篇4
HBase入门篇3
HBase入门篇2-Java操作HBase例子
HBase入门篇
基于Hbase存储的分布式消息(IM)系统-JABase

–end–

原文地址：Hbase入门6 -白话MySQL(RDBMS)与HBase之间, 感谢原作者分享。

Statement

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Explain the InnoDB Buffer Pool and its importance for performance.Apr 19, 2025 am 12:24 AM

InnoDBBufferPool reduces disk I/O by caching data and indexing pages, improving database performance. Its working principle includes: 1. Data reading: Read data from BufferPool; 2. Data writing: After modifying the data, write to BufferPool and refresh it to disk regularly; 3. Cache management: Use the LRU algorithm to manage cache pages; 4. Reading mechanism: Load adjacent data pages in advance. By sizing the BufferPool and using multiple instances, database performance can be optimized.

MySQL vs. Other Programming Languages: A ComparisonApr 19, 2025 am 12:22 AM

Compared with other programming languages, MySQL is mainly used to store and manage data, while other languages such as Python, Java, and C are used for logical processing and application development. MySQL is known for its high performance, scalability and cross-platform support, suitable for data management needs, while other languages have advantages in their respective fields such as data analytics, enterprise applications, and system programming.

Learning MySQL: A Step-by-Step Guide for New UsersApr 19, 2025 am 12:19 AM

MySQL is worth learning because it is a powerful open source database management system suitable for data storage, management and analysis. 1) MySQL is a relational database that uses SQL to operate data and is suitable for structured data management. 2) The SQL language is the key to interacting with MySQL and supports CRUD operations. 3) The working principle of MySQL includes client/server architecture, storage engine and query optimizer. 4) Basic usage includes creating databases and tables, and advanced usage involves joining tables using JOIN. 5) Common errors include syntax errors and permission issues, and debugging skills include checking syntax and using EXPLAIN commands. 6) Performance optimization involves the use of indexes, optimization of SQL statements and regular maintenance of databases.

MySQL: Essential Skills for Beginners to MasterApr 18, 2025 am 12:24 AM

MySQL is suitable for beginners to learn database skills. 1. Install MySQL server and client tools. 2. Understand basic SQL queries, such as SELECT. 3. Master data operations: create tables, insert, update, and delete data. 4. Learn advanced skills: subquery and window functions. 5. Debugging and optimization: Check syntax, use indexes, avoid SELECT*, and use LIMIT.

MySQL: Structured Data and Relational DatabasesApr 18, 2025 am 12:22 AM

MySQL efficiently manages structured data through table structure and SQL query, and implements inter-table relationships through foreign keys. 1. Define the data format and type when creating a table. 2. Use foreign keys to establish relationships between tables. 3. Improve performance through indexing and query optimization. 4. Regularly backup and monitor databases to ensure data security and performance optimization.

MySQL: Key Features and Capabilities ExplainedApr 18, 2025 am 12:17 AM

MySQL is an open source relational database management system that is widely used in Web development. Its key features include: 1. Supports multiple storage engines, such as InnoDB and MyISAM, suitable for different scenarios; 2. Provides master-slave replication functions to facilitate load balancing and data backup; 3. Improve query efficiency through query optimization and index use.

The Purpose of SQL: Interacting with MySQL DatabasesApr 18, 2025 am 12:12 AM

SQL is used to interact with MySQL database to realize data addition, deletion, modification, inspection and database design. 1) SQL performs data operations through SELECT, INSERT, UPDATE, DELETE statements; 2) Use CREATE, ALTER, DROP statements for database design and management; 3) Complex queries and data analysis are implemented through SQL to improve business decision-making efficiency.

MySQL for Beginners: Getting Started with Database ManagementApr 18, 2025 am 12:10 AM

The basic operations of MySQL include creating databases, tables, and using SQL to perform CRUD operations on data. 1. Create a database: CREATEDATABASEmy_first_db; 2. Create a table: CREATETABLEbooks(idINTAUTO_INCREMENTPRIMARYKEY, titleVARCHAR(100)NOTNULL, authorVARCHAR(100)NOTNULL, published_yearINT); 3. Insert data: INSERTINTObooks(title, author, published_year)VA

See all articles

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Assassin's Creed Shadows: Seashell Riddle Solution

3 weeks agoByDDD

What's New in Windows 11 KB5054979 & How to Fix Update Issues

2 weeks agoByDDD

Where to find the Crane Control Keycard in Atomfall

3 weeks agoByDDD

Assassin's Creed Shadows - How To Find The Blacksmith And Unlock Weapon And Armour Customisation

1 months agoByDDD

Roblox: Dead Rails - How To Complete Every Challenge

3 weeks agoByDDD

Hot Tools

SecLists

SecLists is the ultimate security tester's companion. It is a collection of various types of lists that are frequently used during security assessments, all in one place. SecLists helps make security testing more efficient and productive by conveniently providing all the lists a security tester might need. List types include usernames, passwords, URLs, fuzzing payloads, sensitive data patterns, web shells, and more. The tester can simply pull this repository onto a new test machine and he will have access to every type of list he needs.

DVWA

Damn Vulnerable Web App (DVWA) is a PHP/MySQL web application that is very vulnerable. Its main goals are to be an aid for security professionals to test their skills and tools in a legal environment, to help web developers better understand the process of securing web applications, and to help teachers/students teach/learn in a classroom environment Web application security. The goal of DVWA is to practice some of the most common web vulnerabilities through a simple and straightforward interface, with varying degrees of difficulty. Please note that this software

SAP NetWeaver Server Adapter for Eclipse

Integrate Eclipse with SAP NetWeaver application server.

MinGW - Minimalist GNU for Windows

This project is in the process of being migrated to osdn.net/projects/mingw, you can continue to follow us there. MinGW: A native Windows port of the GNU Compiler Collection (GCC), freely distributable import libraries and header files for building native Windows applications; includes extensions to the MSVC runtime to support C99 functionality. All MinGW software can run on 64-bit Windows platforms.

Safe Exam Browser

Safe Exam Browser is a secure browser environment for taking online exams securely. This software turns any computer into a secure workstation. It controls access to any utility and prevents students from using unauthorized resources.

Hot Topics

Where is the login entrance for gmail email?

7627

CakePHP Tutorial

1389

What is the format of the account name of steam

win11 activation key permanent

nyt connections hints and answers

140