Usage issues and precautions for the HBase Thrift interface

WBOY
2016-07-21 15:09:47

HBase provides a Thrift interface for non-Java languages. Based on experience with the HBase Thrift interface (HBase version 0.92.1), this article summarizes some of the problems encountered and the related precautions.
1. Byte storage order
In HBase, cells are sorted lexicographically by row key, column family, column qualifier, and timestamp. When short, int, long, and similar values are converted to byte arrays with Bytes.toBytes(...), they are therefore stored in big-endian order (high byte at the low address, low byte at the high address), so that for non-negative numbers the byte order matches the numeric order. The same applies to values. When using the Thrift API from C++, PHP, Python, and so on, it is best to pack and unpack row keys and values consistently in big-endian order.
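For example, in C++ an int32_t can be converted to its big-endian (lexicographically sortable) byte form as follows. Copying the variable's raw bytes is not enough, because on a little-endian host such as x86 they are stored low byte first:

#include <arpa/inet.h>  // htonl, ntohl
#include <cstdint>
#include <cstring>      // memcpy
#include <string>

std::string key;
int32_t timestamp = 1352563200;
uint32_t be = htonl(static_cast<uint32_t>(timestamp));  // host order -> big-endian
key.append(reinterpret_cast<const char*>(&be), sizeof(be));

To convert the big-endian bytes back to an int:

uint32_t raw;
memcpy(&raw, key.data(), sizeof(raw));  // avoids an unaligned pointer cast
int32_t decoded = static_cast<int32_t>(ntohl(raw));  // big-endian -> host order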

PHP provides the pack() and unpack() functions for this conversion:

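$key = pack("N", $num);    // "N": unsigned 32-bit integer, big-endian
$arr = unpack("N", $key);  // unpack() returns an array indexed from 1
$num = $arr[1];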
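
The big-endian packing described in this section can also be written without htonl, using shifts, so that the result does not depend on the byte order of the host. The following is only a minimal sketch: the helper names encode_be32, decode_be32, and key_less are made up for illustration and are not part of HBase or Thrift, and key_less assumes that row keys are compared as unsigned bytes.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>

// Encode a 32-bit value as 4 bytes, most significant byte first (big-endian).
// Shifts are used, so the result is the same on little- and big-endian hosts.
std::string encode_be32(uint32_t v) {
    std::string out(4, '\0');
    out[0] = static_cast<char>((v >> 24) & 0xFF);
    out[1] = static_cast<char>((v >> 16) & 0xFF);
    out[2] = static_cast<char>((v >> 8) & 0xFF);
    out[3] = static_cast<char>(v & 0xFF);
    return out;
}

// Decode the first 4 bytes of `bytes` as a big-endian 32-bit value.
// The caller must pass at least 4 bytes.
uint32_t decode_be32(const std::string& bytes) {
    uint32_t v = 0;
    for (int i = 0; i < 4; ++i) {
        v = (v << 8) | static_cast<unsigned char>(bytes[i]);
    }
    return v;
}

// Compare two keys byte by byte as unsigned values (assumed key ordering).
bool key_less(const std::string& a, const std::string& b) {
    return std::lexicographical_compare(
        a.begin(), a.end(), b.begin(), b.end(),
        [](char x, char y) {
            return static_cast<unsigned char>(x) < static_cast<unsigned char>(y);
        });
}
```

With this encoding, key_less(encode_be32(255), encode_be32(256)) is true, matching the numeric order. The same two values copied as raw little-endian bytes would sort the other way round, which is why the byte order matters for row keys.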

2. Pitfalls when using TScan
In the PHP Thrift interface, TScan lets you set startRow, stopRow, columns, filter, and other attributes directly. These attributes are null by default and become non-null once set, either through the TScan constructor or by assigning to the member variables. When write() is called during an RPC to the Thrift Server, it simply checks whether each attribute is non-null and, if so, sends it over the Thrift protocol.
In the C++ Thrift interface, however, TScan has a member named __isset of type _TScan__isset, defined as follows:

typedef struct _TScan__isset {
  _TScan__isset()
      : startRow(false), stopRow(false), timestamp(false),
        columns(false), caching(false), filterString(false) { }
  bool startRow;
  bool stopRow;
  bool timestamp;
  bool columns;
  bool caching;
  bool filterString;
} _TScan__isset;

In C++, TScan's write() method checks the corresponding bool flag in __isset to decide whether to send startRow, stopRow, columns, the filter string, and so on to the Thrift Server over the Thrift protocol. These attributes must be set through the __set_xxx() methods to take effect, because the default TScan constructor leaves all of the __isset flags set to false.
Therefore, if you only initialize startRow, stopRow, columns, the filter, and other attributes directly, the flags stay false, nothing is sent, and the table is scanned from the beginning. Only a call to __set_xxx() sets the corresponding flag to true, so that the Thrift Server receives startRow, stopRow, columns, the filter, and other attributes and uses them for the scan.
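
The effect of the flags can be illustrated with a small stand-in. The sketch below is not the Thrift-generated TScan: MockScan, isset, set_startRow, set_stopRow, and fields_to_send are hypothetical names chosen for illustration, playing the roles of TScan, __isset, __set_startRow(), __set_stopRow(), and write() respectively.

```cpp
#include <string>
#include <vector>

// Simplified stand-in for a Thrift-generated struct. NOT the real TScan.
struct MockScan {
    std::string startRow;
    std::string stopRow;

    // Plays the role of the generated __isset member: one flag per field.
    struct IsSet {
        bool startRow = false;
        bool stopRow = false;
    } isset;

    // Play the role of the generated __set_xxx() methods:
    // assign the value AND mark the field as set.
    void set_startRow(const std::string& v) { startRow = v; isset.startRow = true; }
    void set_stopRow(const std::string& v) { stopRow = v; isset.stopRow = true; }

    // Plays the role of write(): only flagged fields would be serialized.
    // Returns the names of the fields that would be sent.
    std::vector<std::string> fields_to_send() const {
        std::vector<std::string> sent;
        if (isset.startRow) sent.push_back("startRow");
        if (isset.stopRow) sent.push_back("stopRow");
        return sent;
    }
};
```

Assigning scan.startRow = "row-100" directly leaves fields_to_send() empty, whereas scan.set_startRow("row-100") makes it return startRow. With the real generated class, the equivalent is to call the __set_xxx() methods rather than assigning to the members.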
3. Number of concurrent access threads
First, to minimize the time overhead of network transmission, the HBase Thrift Server is best deployed on the same machine as the application client. When starting the Thrift Server, configure the number of concurrent threads through its parameters; otherwise the server's threads can easily all become busy, and it stops responding to client read and write requests. The command is: bin/hbase-daemon.sh start thrift --threadpool -m 200 -w 500 (for more parameters, see bin/hbase-daemon.sh start thrift -h).
4. Maximum heap memory configuration
If a client scans through the Thrift Server to read data sequentially and sets a number of cached records (through the int32_t caching member of TScan), those cached records may occupy a considerable part of the Thrift Server's heap memory, especially when multiple clients access it concurrently.
Therefore, before starting the Thrift Server, consider increasing the maximum heap size; otherwise the process may die with a java.lang.OutOfMemoryError, especially when a large caching value is used for scans. (The default heap size is 1000 MB; it can be changed by setting HBASE_HEAPSIZE in conf/hbase-env.sh.)
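
To get a feel for the numbers, the memory held for cached scan records can be estimated roughly as the number of open scanners times the caching value times the average row size. The sketch below is only a back-of-the-envelope helper: the function name and all three inputs are assumptions about a particular workload, not HBase defaults, and the result ignores JVM object overhead.

```cpp
#include <cstdint>

// Rough lower bound on the bytes buffered for open scanners.
// All three inputs are workload-specific assumptions, not HBase defaults.
uint64_t estimate_scan_buffer_bytes(uint64_t concurrent_scanners,
                                    uint64_t caching_rows,
                                    uint64_t avg_row_bytes) {
    return concurrent_scanners * caching_rows * avg_row_bytes;
}
```

For example, with 50 concurrent scanners, a caching value of 1000, and an assumed average row size of 2048 bytes, the estimate is 102,400,000 bytes (about 98 MiB), already around a tenth of a 1000 MB heap before any other allocation.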
