Detailed introduction to the method of operating HBase data in Python
Configuring Thrift
Installing the Thrift package for Python
The Python IDE I use is PyCharm Community Edition. In the project settings, open Project Interpreter for the corresponding project, click "+" to add a package, search for hbase-thrift (a Python client for the HBase Thrift interface), and install it.
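After installing, you can quickly confirm that the packages are visible to the interpreter PyCharm is using by checking whether the `thrift` and `hbase` modules can be located. This is a minimal sketch, assuming Python 3.4+ (for `importlib.util.find_spec`) and that hbase-thrift provides a top-level `hbase` module, as the imports in the example script suggest. The helper name `check_modules` is hypothetical. It only locates the modules and does not import them.

```python
import importlib.util


def check_modules(names=("thrift", "hbase")):
    """Return a dict mapping each top-level module name to True if it can be found."""
    # find_spec returns None when a top-level module cannot be located
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules().items():
        print(name, "OK" if found else "missing")
```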
Installing server-side Thrift
Follow the "Getting Started" guide on the official Apache Thrift website. You can also install Thrift on your local machine to use it from the terminal.
You can also refer to the installation method in the article "Python calls HBase example".
First, install Thrift.
Download Thrift; here I use the thrift-0.7.0-dev.tar.gz release.
tar xzf thrift-0.7.0-dev.tar.gz
cd thrift-0.7.0-dev
./configure --with-cpp=no --with-ruby=no
make
sudo make install
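Once `make install` finishes, you can check from Python that the `thrift` compiler is on your PATH before generating the bindings. This is a minimal sketch using only the standard library (Python 3.5+ for `subprocess.run`). The helper name `thrift_compiler_version` is hypothetical, and it assumes the compiler accepts the `-version` flag to print its version.

```python
import shutil
import subprocess


def thrift_compiler_version(executable="thrift"):
    """Return the compiler's version output, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    # Assumption: `thrift -version` prints the compiler version
    completed = subprocess.run([path, "-version"], stdout=subprocess.PIPE,
                               stderr=subprocess.STDOUT, universal_newlines=True)
    return completed.stdout.strip()


if __name__ == "__main__":
    version = thrift_compiler_version()
    print(version or "thrift compiler not found on PATH")
```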
Then go to the HBase source package and change to the directory that contains Hbase.thrift:
src/main/resources/org/apache/hadoop/hbase/thrift/
Run:
thrift --gen py Hbase.thrift
mv gen-py/hbase/ /usr/lib/python2.4/site-packages/
(The site-packages path may vary depending on the Python version.)
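Because the destination of the `mv` command depends on the Python version and installation, one way to find the right directory is to ask the interpreter itself. This is a minimal sketch using the standard `sysconfig` module. The helper name `site_packages_dir` is hypothetical. Run it with the same interpreter that will import the `hbase` module. Some Linux distributions use `dist-packages` for the system Python, so compare the result with `sys.path` if imports still fail.

```python
import sysconfig


def site_packages_dir():
    """Return the install directory for pure-Python packages of this interpreter."""
    # "purelib" is the path for platform-independent modules
    return sysconfig.get_paths()["purelib"]


if __name__ == "__main__":
    # Use the printed directory as the target of: mv gen-py/hbase/ <directory>
    print(site_packages_dir())
```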
Example 1: Get data
# coding:utf-8
import csv

from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
from hbase import Hbase

HOST = "localhost"  # host running the HBase Thrift server
PORT = 9090         # default HBase Thrift server port; adjust to your setup


def client_conn(host=HOST, port=PORT):
    # Make socket
    transport = TSocket.TSocket(host, port)
    # Buffering is critical. Raw sockets are very slow
    transport = TTransport.TBufferedTransport(transport)
    # Wrap in a protocol
    protocol = TBinaryProtocol.TBinaryProtocol(transport)
    # Create a client to use the protocol encoder
    client = Hbase.Client(protocol)
    # Connect!
    transport.open()
    # Return the transport too, so the caller can close the connection
    return client, transport


if __name__ == "__main__":
    client, transport = client_conn()
    try:
        # Replace the placeholders with a real table and row key.
        # To fetch selected columns only:
        # r = client.getRowWithColumns("table name", "row name", ["column name"])
        # Bindings generated from newer Hbase.thrift files may also require an
        # attributes dict as a third argument, e.g. getRow(table, row, {})
        result = client.getRow("table name", "row name")
        data_simple = []
        if result:  # an empty list means no row was returned
            # columns maps "family:qualifier" -> TCell(value, timestamp)
            for column, cell in result[0].columns.items():
                data_simple.append((cell.timestamp, cell.value))
    finally:
        transport.close()

    # Python 2: csv files are opened in binary mode
    with open("data_xy_simple.csv", "wb") as csvfile_simple:
        writer_simple = csv.writer(csvfile_simple)
        writer_simple.writerow(["timestamp", "value"])
        writer_simple.writerows(data_simple)
    print("finished")
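The example above is written for Python 2, which is why the CSV file is opened in binary mode. If you port it to Python 3, the `csv` module expects a text-mode file opened with `newline=""`, and Thrift may return cell values as `bytes`. This is a minimal sketch of the CSV step under those assumptions. `write_cells_csv` is a hypothetical helper that takes `(timestamp, value)` pairs such as those collected in `data_simple`, and the sample values are illustrative only.

```python
import csv


def write_cells_csv(path, pairs, encoding="utf-8"):
    """Write (timestamp, value) pairs to a CSV file with a header row (Python 3)."""
    with open(path, "w", newline="", encoding=encoding) as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp", "value"])
        for timestamp, value in pairs:
            # Decode raw bytes so the CSV holds text, not "b'...'" literals
            if isinstance(value, bytes):
                value = value.decode(encoding)
            writer.writerow([timestamp, value])


if __name__ == "__main__":
    # Illustrative values only
    write_cells_csv("data_xy_simple.csv", [(1500000000000, b"42"), (1500000000001, "43")])
    print("finished")
```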
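If you know basic Python, you will recognize that result is a list of TRowResult objects and that result[0].columns is a dict mapping column names to TCell objects, so result[0].columns.items() yields (column name, cell) pairs, where each cell has a value and a timestamp. For details, look at the generated hbase/ttypes.py, or print the variables to inspect their values and types.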
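To see the shape of this data without a running HBase, you can mimic it with stand-in types. The namedtuples below are hypothetical stand-ins for the Thrift-generated `TRowResult` and `TCell` classes, not the classes themselves, and the row and column names are made up. `cells_of_first_row` is likewise a hypothetical helper that walks `result[0].columns.items()` the same way the example script does.

```python
from collections import namedtuple

# Hypothetical stand-ins for the Thrift-generated TCell and TRowResult classes
TCell = namedtuple("TCell", ["value", "timestamp"])
TRowResult = namedtuple("TRowResult", ["row", "columns"])


def cells_of_first_row(result):
    """Return (column, timestamp, value) tuples for the first row, or [] if empty."""
    if not result:  # an empty list means no row was returned
        return []
    return [(column, cell.timestamp, cell.value)
            for column, cell in sorted(result[0].columns.items())]


if __name__ == "__main__":
    result = [TRowResult(row="row1", columns={
        "cf:b": TCell(value="2", timestamp=200),
        "cf:a": TCell(value="1", timestamp=100),
    })]
    print(type(result), type(result[0].columns))
    for item in cells_of_first_row(result):
        print(item)
```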
Note: in the program above, transport.open() opens the connection. When you are done, call transport.close() to disconnect.
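One way to guarantee that `transport.close()` runs even if a call raises is to wrap the open/close pair in a context manager. This is a minimal sketch. `opened` is a hypothetical helper that works with any object exposing `open()` and `close()`, such as the `TBufferedTransport` built in `client_conn()`. The usage lines are comments because they need a running HBase Thrift server.

```python
from contextlib import contextmanager


@contextmanager
def opened(transport):
    """Open a Thrift transport and always close it when the block exits."""
    transport.open()
    try:
        yield transport
    finally:
        # Runs on normal exit and when an exception propagates
        transport.close()

# Usage sketch (needs a running HBase Thrift server):
#   transport = TTransport.TBufferedTransport(TSocket.TSocket("localhost", 9090))
#   client = Hbase.Client(TBinaryProtocol.TBinaryProtocol(transport))
#   with opened(transport):
#       result = client.getRow("table name", "row name")
```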
So far this only covers reading data; other HBase operations will be covered in future updates.
The above is the detailed content of Detailed introduction to the method of operating hbase data in Python. For more information, please follow other related articles on the PHP Chinese website!