By Lars Hofhansl

Modern CPU cores can execute hundreds of instructions in the time it takes to reload the L1 cache. "RAM is the new disk," as a coworker at Salesforce likes to say. The L1 cache is the new RAM, I might add.

As we add more and more CPU cores, we can easily become memory-I/O bound unless we are careful.

Many common problems I have seen over the years were related to:
  1. concurrency problems
    Aside from safety and liveness considerations, a typical problem is too much synchronization limiting potential parallel execution.
  2. unneeded or unintended memory barriers
    Memory barriers are required in Java by the following language constructs:
    • synchronized - sets read and write barriers as needed (details depend on JVM, version, and settings)
    • volatile - sets a read barrier before a read to a volatile, and write barrier after a write
    • final - sets a write barrier after the assignment
    • AtomicInteger, AtomicLong, etc - uses volatiles and hardware CAS instructions
  3. unnecessary, unintended, or repeated memory copy or access
    Memory copying is often seen in Java, for example because of the lack of in-array pointers, or really just because of general unawareness and the expectation that "the garbage collector will clean up the mess." Well, it does, but not without a price.
(Entire collections of books are dedicated to each of these topics, so I won't embarrass myself by going into more detail.)

Like any software project of reasonable size, HBase has problems of all the above categories.

Profiling in Java has become extremely convenient. Just start jVisualVM, which ships with the Sun/Oracle JDK, pick the process to profile (in my case a local HBase regionserver), and start profiling.

Over the past few weeks I did some on-and-off profiling in HBase, which led to the following issues:

HBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

Ironically, here it was the collection of a performance metric that caused a measurable slowdown of up to 15%(!) for very wide rows (> 10k columns).
The metric was maintained as an AtomicLong, which introduced a memory barrier in one of the hottest code paths in HBase.
The good folks at Facebook found the same issue at roughly the same time. (It turns out that they were also... uhm... the folks who introduced the problem.)
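
As a rough illustration of this kind of fix (a minimal sketch, not the actual HBase patch), the hot loop can count in a plain local variable and publish to the shared AtomicLong once per row instead of once per column. The class and method names below are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical illustration; not HBase code. */
final class BatchedMetric {
    // Shared counter: every update is an atomic read-modify-write with barrier semantics.
    static final AtomicLong cellsScanned = new AtomicLong();

    /** Touches the shared counter once per column (the expensive pattern). */
    static void scanRowNaive(int columns) {
        for (int i = 0; i < columns; i++) {
            cellsScanned.incrementAndGet();
        }
    }

    /** Counts in a local variable and publishes once per row. */
    static void scanRowBatched(int columns) {
        long local = 0;
        for (int i = 0; i < columns; i++) {
            local++; // stand-in for per-column work; plain local increment, no barrier
        }
        cellsScanned.addAndGet(local); // a single atomic update per row
    }

    public static void main(String[] args) {
        scanRowNaive(10_000);
        scanRowBatched(10_000);
        System.out.println(cellsScanned.get());
    }
}
```

Both methods leave the counter with the same total; the batched variant simply performs one atomic operation per row rather than one per column.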

HBASE-6621 - Reduce calls to Bytes.toInt

A KeyValue (the data structure that represents "columns" in HBase) is currently backed by a single byte[]. The sizes of the various parts are encoded in this byte[] and have to be read and decoded, each time costing an extra memory access. In many cases that can be avoided, leading to a slight performance improvement.
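
The idea can be sketched as follows, assuming a simplified, hypothetical record layout (two 4-byte big-endian lengths followed by key and value bytes) rather than the exact KeyValue format. Each length is decoded once into a local variable and then reused instead of being decoded again from the array.

```java
import java.nio.ByteBuffer;

/** Hypothetical layout: [4-byte key length][4-byte value length][key][value]. */
final class LengthPrefixedRecord {
    static final int HEADER_SIZE = 8;

    /** Decodes a big-endian int; every call reads four bytes from the array. */
    static int toInt(byte[] buf, int offset) {
        return ((buf[offset] & 0xFF) << 24)
             | ((buf[offset + 1] & 0xFF) << 16)
             | ((buf[offset + 2] & 0xFF) << 8)
             |  (buf[offset + 3] & 0xFF);
    }

    /** Sums the value lengths of all records in buf, decoding each length exactly once. */
    static long totalValueBytes(byte[] buf) {
        long total = 0;
        int pos = 0;
        while (pos < buf.length) {
            int keyLength = toInt(buf, pos);       // decoded once per record
            int valueLength = toInt(buf, pos + 4); // decoded once per record
            total += valueLength;
            pos += HEADER_SIZE + keyLength + valueLength; // reuse the locals, no re-decoding
        }
        return total;
    }

    public static void main(String[] args) {
        // Two records: (key 3 bytes, value 5 bytes) and (key 2 bytes, value 7 bytes).
        byte[] buf = ByteBuffer.allocate(2 * HEADER_SIZE + 3 + 5 + 2 + 7)
                .putInt(3).putInt(5).put(new byte[3 + 5])
                .putInt(2).putInt(7).put(new byte[2 + 7])
                .array();
        System.out.println(totalValueBytes(buf));
    }
}
```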

HBASE-6711 - Avoid local results copy in StoreScanner

All references pertaining to a single row (i.e. KeyValues with the same row key) were copied at the StoreScanner layer. Removing this led to another slight performance increase with wide rows.

HBASE-7180 - RegionScannerImpl.next() is inefficient

This introduces a mechanism for coprocessors to access RegionScanners at a lower level, thus allowing skipping of a lot of unnecessary setup for each next() call. In tight loops a coprocessor can make use of this new API to save another 10-15%.

HBASE-7279 - Avoid copying the rowkey in RegionScanner, StoreScanner, and ScanQueryMatcher

The row key of a KeyValue was copied in the various scan-related classes. To reduce that effect, the row key was previously cached in the KeyValue class, leading to extra memory required for each KeyValue.
This change avoids all copying and hence also obviates the need for caching the row key.
A KeyValue now is hardly more than an array pointer (a byte[], an offset, and a length), and no data is copied any longer all the way from the block loaded from disk or cache to the RPC layer (unless the KeyValues are optionally encoded on disk, in which case they still need to be decoded in memory - we're working on improving that too).
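
To illustrate what "an array pointer" means here, the following minimal sketch compares two row keys in place using only a backing array, an offset, and a length. The RowSlice class is hypothetical and is not the HBase KeyValue API; it only shows that no bytes need to be copied for the comparison.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical "array pointer": a view into a shared block, nothing is copied. */
final class RowSlice {
    final byte[] buf;
    final int offset;
    final int length;

    RowSlice(byte[] buf, int offset, int length) {
        this.buf = buf;
        this.offset = offset;
        this.length = length;
    }

    /** Unsigned lexicographic comparison directly on the backing arrays. */
    static int compare(RowSlice a, RowSlice b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a.buf[a.offset + i] & 0xFF;
            int y = b.buf[b.offset + i] & 0xFF;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length; // the shorter key sorts first on a common prefix
    }

    public static void main(String[] args) {
        // One shared "block" that holds three row keys back to back.
        byte[] block = "row1row1row2".getBytes(StandardCharsets.US_ASCII);
        RowSlice first = new RowSlice(block, 0, 4);
        RowSlice second = new RowSlice(block, 4, 4);
        RowSlice third = new RowSlice(block, 8, 4);
        System.out.println(compare(first, second) == 0); // same row key
        System.out.println(compare(first, third) < 0);   // "row1" sorts before "row2"
    }
}
```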

Previously the size of a KeyValue on the scan path was at least 116 bytes plus the length of the row key (which can be arbitrarily long). Now it is ~60 bytes, flat and including its own reference.
(Remember that during the course of a large scan we might be creating millions or even billions of KeyValue objects.)

This is a nice improvement both in terms of scan performance (15-20% for small row keys of a few bytes, much more for large ones) and in terms of produced garbage.
Since all copying is avoided, scanning now scales almost linearly with the number of cores.

HBASE-6852 - SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields

Other folks have been busy too. Here Cheng Hao found another problem with a scan-related metric that caused a noticeable slowdown (even though I did not believe him at first).
This removed another set of unnecessary memory barriers.

HBASE-7336 - HFileBlock.readAtOffset does not work well with multiple threads

This is a slightly different issue, caused by bad synchronization of the FSReader associated with a storefile. There is only a single reader per storefile. So if the file's blocks are not cached (possibly because the scan indicated that it wants no caching, since it expects to touch too many blocks), the scanner threads compete for read access to the storefile. That led to outright terrible performance, such as scanners timing out even with just two scanners accessing the same file in a tight loop.
This patch is a stopgap measure: attempt to acquire the lock on the reader; if that fails, switch to HDFS positional reads, which can read at an offset without affecting the state of the stream and hence require no locking.
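
The pattern can be sketched with plain java.nio as an analogy; this is not the HBase or HDFS code. A FileChannel stands in for the shared stream: a seek followed by a read changes the channel position and therefore needs the lock, while FileChannel.read(ByteBuffer, long) is a positional read that leaves the position untouched. The SharedReader class is hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical reader shared by several scanner threads. */
final class SharedReader {
    private final FileChannel channel;
    final ReentrantLock streamLock = new ReentrantLock();

    SharedReader(FileChannel channel) {
        this.channel = channel;
    }

    /** Reads length bytes at offset; a short read at end of file leaves trailing zeros. */
    byte[] readAt(long offset, int length) throws IOException {
        ByteBuffer dst = ByteBuffer.allocate(length);
        if (streamLock.tryLock()) {
            try {
                // Lock acquired: seek + read, which mutates the channel position.
                channel.position(offset);
                while (dst.hasRemaining() && channel.read(dst) >= 0) {
                    // keep reading until the buffer is full or EOF is reached
                }
            } finally {
                streamLock.unlock();
            }
        } else {
            // Lock is busy: positional read, does not touch the channel position.
            long pos = offset;
            while (dst.hasRemaining()) {
                int n = channel.read(dst, pos);
                if (n < 0) {
                    break;
                }
                pos += n;
            }
        }
        return dst.array();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("storefile", ".bin");
        try {
            Files.write(file, new byte[] {10, 11, 12, 13, 14, 15});
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                SharedReader reader = new SharedReader(ch);
                System.out.println(Arrays.toString(reader.readAt(2, 3)));
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Threads never block on the lock in this sketch: whoever fails to get it falls back to the positional path immediately.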

Summary

Together these various changes can lead to a ~40-50% scan performance improvement when using a single core, and even more when using multiple cores on the same machine (as is the case with HBase).

An entirely unscientific benchmark

20 million rows, with two column families of just a few dozen bytes each.

I performed two tests:

1. A scan that returns rows to the client

2. A scan that touches all rows via a filter but does not return anything to the client.

(This is useful to gauge the actual server side performance).


Further I tested with (1) no caching, all reads from disk (2) all data in the OS cache and (3) all data in HBase's block cache.


I compared 0.94.0 against the current 0.94 branch (what I will soon release as 0.94.4).


Results:

  • Scanning with scanner caching set to 10000:
    • 0.94.0
      no data in cache: 54s
      data in OS cache: 51s
      data in block cache: 35s

    • 0.94.4-snapshot
      no data in cache: 50s (IO bound between disk and network)
      data in OS cache: 43s
      data in block cache: 32s
      (limiting factor was shipping the results to the client)

  • All data filtered at the server (with a SingleColumnValueFilter that does not match anything, so each row is still scanned):
    • 0.94.0
      no data in cache: 31s
      data in OS cache: 25s
      data in block cache: 11s

    • 0.94.4-snapshot
      no data in cache: 22s
      data in OS cache: 17s
      data in block cache: 6.3s

I have not yet quantified the same with multiple concurrent scanners.

So as you can see scan performance has significantly improved since 0.94.0.
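
For readers who want to turn the server-side timings above into relative numbers, here is a small sketch of the arithmetic. The timings are the ones reported in the results; the class and method names are made up for illustration.

```java
/** Hypothetical helper for comparing two wall-clock timings of the same scan. */
final class ScanSpeedup {
    /** Fraction of wall-clock time saved, e.g. 0.25 means the run took 25% less time. */
    static double timeSaved(double beforeSeconds, double afterSeconds) {
        return (beforeSeconds - afterSeconds) / beforeSeconds;
    }

    /** Throughput ratio: how many times more rows per second the newer version scans. */
    static double throughputRatio(double beforeSeconds, double afterSeconds) {
        return beforeSeconds / afterSeconds;
    }

    public static void main(String[] args) {
        // Server-side filtered scan timings from the results (0.94.0 vs. 0.94.4-snapshot).
        String[] labels = {"no data in cache", "data in OS cache", "data in block cache"};
        double[] before = {31, 25, 11};
        double[] after = {22, 17, 6.3};
        for (int i = 0; i < labels.length; i++) {
            System.out.printf("%s: %.0f%% less time, %.2fx throughput%n",
                    labels[i],
                    100 * timeSaved(before[i], after[i]),
                    throughputRatio(before[i], after[i]));
        }
    }
}
```

For the filtered scans this works out to roughly 29%, 32%, and 43% less time, or about 1.4x to 1.75x the throughput.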

Salesforce just hired some performance engineers from a well-known chip manufacturer, and I plan to get some of their time to analyze HBase in even more detail, to track down memory stalls, etc.
