search

前面我们已经把生产上使用的Hive版本由0.7.0升级到0.12。Hive 0.12在存储层做了很大的改进,包括更高的压缩比以及更高的查询效率。ORCFile在hive 0.11中被引入进来,这是一个不可思议的压缩格式,下面我们来看看它到底有多神奇。首先,我们看看如何使用ORCFi

前面我们已经把生产上使用的Hive版本由0.7.0升级到0.12。Hive 0.12在存储层做了很大的改进,包括更高的压缩比以及更高的查询效率。 ORCFile在hive 0.11中被引入进来,这是一个不可思议的压缩格式,下面我们来看看它到底有多神奇。   首先,我们看看如何使用ORCFile。先建立一张以ORCFile为存储格式的表,如下:
CREATE TABLE orc_test(
...
) STORED AS orc;
把存量的数据转换成ORCFile格式存储是超级简单的,我们只需要把存量表oldtable中的数据insert overwrite到新表orc_test里面即可。
INSERT INTO TABLE orc_test SELECT * FROM oldtable;
ORCFile存储格式有几个表属性可以进一步改善效果,这些属性如下:
属性 默认值 说明
orc.compress ZLIB 列压缩格式(NONE, ZLIB, SNAPPY)
orc.compress.size 262,144 (= 256 KiB) 每一个压缩块大小
orc.stripe.size 268,435,456 (= 256 MiB) 每一个stripe大小
orc.row.index.stride 10,000 index间隔行数(必须大于10000)
orc.create.index true 是否创建内联index
  如果想修改默认的列压缩格式,例如修改为SNAPPY:
CREATE TABLE orc_test2(
...
) STORED AS orc tblproperties ("orc.compress"="SNAPPY");
  使用ORC存储格式实验效果展示。 这里使用ubs一小时的数据作为测试用例。先看看测试表的数据文件存储情况:
   63426542  pt=2014.03.02.00/00000
   64508155  pt=2014.03.02.00/00001
   65929482  pt=2014.03.02.00/00002
   63833597  pt=2014.03.02.00/00003
   ..............................
   ..............................
   63850876  pt=2014.03.02.00/00126
   62094988  pt=2014.03.02.00/00127
测试表数据一共有128个文件,每个文件60MB左右。总的数据量为8063408920B=7.5GB。 通过insert overwrite导入到上文中的orc_test表中,默认结果生成4个文件,文件大小如下:
 260485638 2014-03-03 19:15 orc_test/000000_0
 261423217 2014-03-03 19:15 orc_test/000001_0
 253876162 2014-03-03 19:15 orc_test/000002_0
  18124387 2014-03-03 19:14 orc_test/000003_0
总大小为793909404字节=757MB。 增加文件个数,使其生成16个文件,结果如下:
108734630 2014-03-03 15:19 orc_test3/000000_0
108308186 2014-03-03 15:18 orc_test3/000001_0
110069765 2014-03-03 15:18 orc_test3/000002_0
110582145 2014-03-03 15:18 orc_test3/000003_0
109460085 2014-03-03 15:18 orc_test3/000004_0
110110866 2014-03-03 15:18 orc_test3/000005_0
109337311 2014-03-03 15:18 orc_test3/000006_0
108678617 2014-03-03 15:18 orc_test3/000007_0
108587893 2014-03-03 15:18 orc_test3/000008_0
109880571 2014-03-03 15:18 orc_test3/000009_0
108456961 2014-03-03 15:18 orc_test3/000010_0
109082450 2014-03-03 15:18 orc_test3/000011_0
110406471 2014-03-03 15:18 orc_test3/000012_0
110369926 2014-03-03 15:18 orc_test3/000013_0
109480607 2014-03-03 15:18 orc_test3/000014_0
109651103 2014-03-03 15:18 orc_test/000015_0
由结果可见,每个文件大小在100MB左右,表的总大小为1751197587字节=1.63GB。 测试小结:
测试表 存储 压缩比 count(*)时间(秒)
rcfile原始表 7.5GB 100% 48
orc_test(4个文件) 757MB 10% 57
orc_test3(16个文件) 1.63GB 22% 56
由上表可见,orc存储格式对存储的压缩提升了很多,而计算效率损失不大。业务方可以根据自身的计算特点选择使用。    
Statement
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
MySQL String Types: Storage, Performance, and Best PracticesMySQL String Types: Storage, Performance, and Best PracticesMay 10, 2025 am 12:02 AM

MySQLstringtypesimpactstorageandperformanceasfollows:1)CHARisfixed-length,alwaysusingthesamestoragespace,whichcanbefasterbutlessspace-efficient.2)VARCHARisvariable-length,morespace-efficientbutpotentiallyslower.3)TEXTisforlargetext,storedoutsiderows,

Understanding MySQL String Types: VARCHAR, TEXT, CHAR, and MoreUnderstanding MySQL String Types: VARCHAR, TEXT, CHAR, and MoreMay 10, 2025 am 12:02 AM

MySQLstringtypesincludeVARCHAR,TEXT,CHAR,ENUM,andSET.1)VARCHARisversatileforvariable-lengthstringsuptoaspecifiedlimit.2)TEXTisidealforlargetextstoragewithoutadefinedlength.3)CHARisfixed-length,suitableforconsistentdatalikecodes.4)ENUMenforcesdatainte

What are the String Data Types in MySQL?What are the String Data Types in MySQL?May 10, 2025 am 12:01 AM

MySQLoffersvariousstringdatatypes:1)CHARforfixed-lengthstrings,2)VARCHARforvariable-lengthtext,3)BINARYandVARBINARYforbinarydata,4)BLOBandTEXTforlargedata,and5)ENUMandSETforcontrolledinput.Eachtypehasspecificusesandperformancecharacteristics,sochoose

How to Grant Permissions to New MySQL UsersHow to Grant Permissions to New MySQL UsersMay 09, 2025 am 12:16 AM

TograntpermissionstonewMySQLusers,followthesesteps:1)AccessMySQLasauserwithsufficientprivileges,2)CreateanewuserwiththeCREATEUSERcommand,3)UsetheGRANTcommandtospecifypermissionslikeSELECT,INSERT,UPDATE,orALLPRIVILEGESonspecificdatabasesortables,and4)

How to Add Users in MySQL: A Step-by-Step GuideHow to Add Users in MySQL: A Step-by-Step GuideMay 09, 2025 am 12:14 AM

ToaddusersinMySQLeffectivelyandsecurely,followthesesteps:1)UsetheCREATEUSERstatementtoaddanewuser,specifyingthehostandastrongpassword.2)GrantnecessaryprivilegesusingtheGRANTstatement,adheringtotheprincipleofleastprivilege.3)Implementsecuritymeasuresl

MySQL: Adding a new user with complex permissionsMySQL: Adding a new user with complex permissionsMay 09, 2025 am 12:09 AM

ToaddanewuserwithcomplexpermissionsinMySQL,followthesesteps:1)CreatetheuserwithCREATEUSER'newuser'@'localhost'IDENTIFIEDBY'password';.2)Grantreadaccesstoalltablesin'mydatabase'withGRANTSELECTONmydatabase.TO'newuser'@'localhost';.3)Grantwriteaccessto'

MySQL: String Data Types and CollationsMySQL: String Data Types and CollationsMay 09, 2025 am 12:08 AM

The string data types in MySQL include CHAR, VARCHAR, BINARY, VARBINARY, BLOB, and TEXT. The collations determine the comparison and sorting of strings. 1.CHAR is suitable for fixed-length strings, VARCHAR is suitable for variable-length strings. 2.BINARY and VARBINARY are used for binary data, and BLOB and TEXT are used for large object data. 3. Sorting rules such as utf8mb4_unicode_ci ignores upper and lower case and is suitable for user names; utf8mb4_bin is case sensitive and is suitable for fields that require precise comparison.

MySQL: What length should I use for VARCHARs?MySQL: What length should I use for VARCHARs?May 09, 2025 am 12:06 AM

The best MySQLVARCHAR column length selection should be based on data analysis, consider future growth, evaluate performance impacts, and character set requirements. 1) Analyze the data to determine typical lengths; 2) Reserve future expansion space; 3) Pay attention to the impact of large lengths on performance; 4) Consider the impact of character sets on storage. Through these steps, the efficiency and scalability of the database can be optimized.

See all articles

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

Atom editor mac version download

Atom editor mac version download

The most popular open source editor

SublimeText3 Linux new version

SublimeText3 Linux new version

SublimeText3 Linux latest version

mPDF

mPDF

mPDF is a PHP library that can generate PDF files from UTF-8 encoded HTML. The original author, Ian Back, wrote mPDF to output PDF files "on the fly" from his website and handle different languages. It is slower than original scripts like HTML2FPDF and produces larger files when using Unicode fonts, but supports CSS styles etc. and has a lot of enhancements. Supports almost all languages, including RTL (Arabic and Hebrew) and CJK (Chinese, Japanese and Korean). Supports nested block-level elements (such as P, DIV),

MinGW - Minimalist GNU for Windows

MinGW - Minimalist GNU for Windows

This project is in the process of being migrated to osdn.net/projects/mingw, you can continue to follow us there. MinGW: A native Windows port of the GNU Compiler Collection (GCC), freely distributable import libraries and header files for building native Windows applications; includes extensions to the MSVC runtime to support C99 functionality. All MinGW software can run on 64-bit Windows platforms.

SublimeText3 English version

SublimeText3 English version

Recommended: Win version, supports code prompts!