Filters in HBase (or intra row scanning part II)

Posted by WBOY, 2016-06-07 16:26:28

Filters in HBase are a somewhat obscure and under-documented feature. (Even we committers are often not aware of their usefulness; see HBASE-5229 and HBASE-4256... Or maybe it's just me...)

Intra-row scanning can be done using ColumnRangeFilter. Other filters such as ColumnPrefixFilter or MultipleColumnPrefixFilter might also be handy for this. What all three filters have in common is that they can provide scanners (see scanning in hbase) with what I will call "seek hints". These hints allow a scanner to seek to the next column, the next row, or an arbitrary next cell determined by the filter. This is far more efficient than a dumb filter that is passed each cell and decides, cell by cell, whether it is included in the result.
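To make the difference concrete, here is a minimal, self-contained sketch that does not use the HBase API at all. It models one row as a sorted set of String qualifiers and compares a filter that is handed every cell with one that seeks to the start of a column range and stops once it passes the end. The class `SeekHintSketch`, its `Result` type, and both method names are made up for illustration, and `TreeSet.ceiling` merely stands in for a scanner seek.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for scanning one sorted HBase row; not HBase code.
class SeekHintSketch {
    /** Matching qualifiers plus the number of cells that had to be examined. */
    static class Result {
        final List<String> columns = new ArrayList<>();
        int examined;
    }

    /** "Dumb" filter: every qualifier in the row is passed in and tested. */
    static Result scanEveryCell(TreeSet<String> row, String min, String max) {
        Result r = new Result();
        for (String q : row) {
            r.examined++;
            if (q.compareTo(min) >= 0 && q.compareTo(max) < 0) {
                r.columns.add(q);
            }
        }
        return r;
    }

    /** Seeking filter: jump to min (inclusive), stop at max (exclusive). */
    static Result scanWithSeek(TreeSet<String> row, String min, String max) {
        Result r = new Result();
        // ceiling() plays the role of the seek: a tree lookup, not a linear walk.
        String q = row.ceiling(min);
        while (q != null) {
            r.examined++;
            if (q.compareTo(max) >= 0) {
                break; // nothing further in this row can match
            }
            r.columns.add(q);
            q = row.higher(q);
        }
        return r;
    }

    public static void main(String[] args) {
        TreeSet<String> row = new TreeSet<>();
        for (int i = 0; i < 100_000; i++) {
            row.add(String.format("col%06d", i)); // zero-padded so sort order is numeric
        }
        Result naive = scanEveryCell(row, "col050000", "col050010");
        Result seeking = scanWithSeek(row, "col050000", "col050010");
        System.out.println("same result: " + naive.columns.equals(seeking.columns));
        System.out.println("examined: naive=" + naive.examined + ", seeking=" + seeking.examined);
    }
}
```

On this input both variants return the same ten qualifiers, but the seeking variant examines 11 of them while the naive one examines all 100,000. Real HBase seeks have their own cost, so treat the numbers as an illustration of the idea rather than a benchmark.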

Many other filters also provide these "seek hints". The exceptions are filters that filter on column values: there is no inherent ordering between column values, so these filters need to look at the value of each column.

For example, check out this code in MultipleColumnPrefixFilter (ASF 2.0 license):
    TreeSet<byte []> lesserOrEqualPrefixes =
      (TreeSet<byte []>) sortedPrefixes.headSet(qualifier, true);
    if (lesserOrEqualPrefixes.size() != 0) {
      byte [] largestPrefixSmallerThanQualifier = lesserOrEqualPrefixes.last();
      if (Bytes.startsWith(qualifier, largestPrefixSmallerThanQualifier)) {
        return ReturnCode.INCLUDE;
      }
      if (lesserOrEqualPrefixes.size() == sortedPrefixes.size()) {
        return ReturnCode.NEXT_ROW;
      } else {
        hint = sortedPrefixes.higher(largestPrefixSmallerThanQualifier);
        return ReturnCode.SEEK_NEXT_USING_HINT;
      }
    } else {
      hint = sortedPrefixes.first();
      return ReturnCode.SEEK_NEXT_USING_HINT;
    }

(the hint is used later to skip ahead to that column prefix)

See how this code snippet allows the filter to
  1. seek to the next row if all prefixes are known to be less than or equal to the current qualifier (and the largest didn't match the passed column qualifier). Note that a single seek to the next row can potentially skip millions of columns.
  2. seek to the next larger prefix if there are more prefixes, but the current one does not match the qualifier.
  3. seek to the first prefix (the smallest) if none of the prefixes are less than or equal to the current qualifier.
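The three cases can be tried out in isolation with the following simplified model. It is a sketch under assumptions, not the HBase implementation: qualifiers are Strings rather than byte arrays (so the ordering is Java's String ordering, not HBase's byte ordering), `Code` is a local enum standing in for the filter's return code, and the class name `PrefixSeekSketch` and method `filterColumn` are invented for the example. At least one prefix must be supplied.

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.TreeSet;

// Simplified, hypothetical model of the prefix decision; not HBase code.
class PrefixSeekSketch {
    enum Code { INCLUDE, NEXT_ROW, SEEK_NEXT_USING_HINT }

    final TreeSet<String> sortedPrefixes;
    String hint; // non-null only after a SEEK_NEXT_USING_HINT result

    PrefixSeekSketch(String... prefixes) {
        sortedPrefixes = new TreeSet<>(Arrays.asList(prefixes));
    }

    Code filterColumn(String qualifier) {
        hint = null;
        NavigableSet<String> lesserOrEqual = sortedPrefixes.headSet(qualifier, true);
        if (lesserOrEqual.isEmpty()) {
            // Case 3: the qualifier sorts before every prefix; jump to the smallest.
            hint = sortedPrefixes.first();
            return Code.SEEK_NEXT_USING_HINT;
        }
        String largest = lesserOrEqual.last();
        if (qualifier.startsWith(largest)) {
            return Code.INCLUDE;
        }
        if (lesserOrEqual.size() == sortedPrefixes.size()) {
            // Case 1: the qualifier is past the last prefix; the row is done.
            return Code.NEXT_ROW;
        }
        // Case 2: no match here, but a larger prefix exists; jump to it.
        hint = sortedPrefixes.higher(largest);
        return Code.SEEK_NEXT_USING_HINT;
    }

    public static void main(String[] args) {
        PrefixSeekSketch filter = new PrefixSeekSketch("b", "d");
        for (String q : new String[] {"a1", "b7", "c3", "d0", "e9"}) {
            Code code = filter.filterColumn(q);
            String suffix = filter.hint != null ? " (hint: " + filter.hint + ")" : "";
            System.out.println(q + " -> " + code + suffix);
        }
    }
}
```

With the prefixes "b" and "d", the qualifier "a1" yields a seek to "b", "c3" yields a seek to "d", "b7" and "d0" are included, and "e9" ends the row.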
If you didn't feel like looking at the code, the takeaway is that these filters can be used safely and efficiently in very wide rows. If the filter could instead indicate only INCLUDE or SKIP, and was forced to examine every version of every column of every row, it would be inefficient for wide rows with hundreds of thousands or millions of columns.

I'm in the process of adding more information about these filters to the HBase Book Reference Guide.