
A detailed introduction to operating on HBase data in Python

高洛峰
Original, 2017-03-24 17:29:59

Configuring thrift
The Thrift package used by Python
I use PyCharm Community Edition as my Python IDE. In the project settings, open Project Interpreter, find the package list for the corresponding project, click "+" to add a package, search for hbase-thrift (Python client for HBase Thrift interface), and install it.
Install server-side thrift.
Refer to the official website; you can also install it on your local machine for use from the terminal.
Thrift Getting Started
You can also refer to the installation method in "Python calls HBase example":
First, install thrift
Download thrift; here I use the version thrift-0.7.0-dev.tar.gz
tar xzf thrift-0.7.0-dev.tar.gz
cd thrift-0.7.0-dev
sudo ./configure --with-cpp=no --with-ruby=no
sudo make
sudo make install
Then, go to the HBase source package and find
src/main/resources/org/apache/hadoop/hbase/thrift/
Execute
thrift --gen py Hbase.thrift
mv gen-py/hbase/ /usr/lib/python2.4/site-packages/ (may vary depending on the python version)
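After the steps above, it can help to confirm that both the thrift library and the generated hbase package are importable before running the example. The following is only a minimal sketch: the helper name missing_modules is hypothetical, and the module list simply mirrors the imports used in the example in this article.

```python
import importlib


def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Modules that the example in this article imports.
    required = ["thrift", "thrift.transport.TSocket", "hbase", "hbase.ttypes"]
    # An empty list means everything needed is on the import path.
    print(missing_modules(required))
```

If the printed list contains hbase, the generated gen-py/hbase directory is probably not in the site-packages directory of the interpreter you are running.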
Get Data example 1

# coding:utf-8
# Written for Python 2, which this article's setup targets.
import csv

from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
from hbase import Hbase

# Placeholders: replace with your Thrift server's hostname and port.
HOST = 'localhost'
PORT = 9090  # assumed here; 9090 is the usual HBase Thrift port


def client_conn():
    # Make socket
    transport = TSocket.TSocket(HOST, PORT)
    # Buffering is critical. Raw sockets are very slow
    transport = TTransport.TBufferedTransport(transport)
    # Wrap in a protocol
    protocol = TBinaryProtocol.TBinaryProtocol(transport)
    # Create a client to use the protocol encoder
    client = Hbase.Client(protocol)
    # Connect!
    transport.open()
    # Return the transport too, so the caller can close it when finished
    return client, transport


if __name__ == "__main__":
    client, transport = client_conn()
    try:
        # To fetch specific columns only:
        # r = client.getRowWithColumns('table name', 'row name', ['column name'])
        result = client.getRow("table name", "row name")
    finally:
        # Disconnect whether or not the read succeeded
        transport.close()
    data_simple = []
    # result[0].columns maps each column name to a cell with .value and .timestamp
    for k, v in result[0].columns.items():
        data_simple.append((v.timestamp, v.value))
    # "wb" is the csv file mode for Python 2; on Python 3 use open(..., "w", newline="")
    csvfile_simple = open("data_xy_simple.csv", "wb")
    writer_simple = csv.writer(csvfile_simple)
    writer_simple.writerow(["timestamp", "value"])
    writer_simple.writerows(data_simple)
    csvfile_simple.close()
    print("finished")

Anyone familiar with basic Python will recognize that result is a list and that result[0].columns is a dict, so result[0].columns.items() yields its key-value pairs. You can look up the relevant documentation, or print the variables to inspect their values and types.
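To make that structure concrete, here is a small sketch that flattens a getRow()-style result into plain tuples. It assumes only what the example uses (each row result has row and columns, and each cell has value and timestamp); the namedtuples are stand-ins for the Thrift-generated types so the snippet runs without a server, and the helper name cells_to_rows and the sample data are made up for illustration.

```python
from collections import namedtuple

# Stand-ins for the Thrift-generated types, so this runs without a server.
TCell = namedtuple("TCell", ["value", "timestamp"])
TRowResult = namedtuple("TRowResult", ["row", "columns"])


def cells_to_rows(result):
    """Flatten a list of row results into (row, column, timestamp, value) tuples."""
    rows = []
    for row_result in result:
        # columns is a dict: column name -> cell; sort for a stable order
        for column, cell in sorted(row_result.columns.items()):
            rows.append((row_result.row, column, cell.timestamp, cell.value))
    return rows


# Illustrative data only, shaped like what the example reads.
sample = [TRowResult(row="row1", columns={
    "cf:x": TCell(value="1.5", timestamp=1490347799000),
    "cf:y": TCell(value="2.5", timestamp=1490347799001),
})]

for record in cells_to_rows(sample):
    print(record)
```

Looping over the list instead of indexing result[0] also means an empty result produces no rows rather than raising an IndexError.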
Note: transport.open() opens the connection to the Thrift server. When you have finished, call transport.close() to disconnect.
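One way to make sure the connection is always closed, even when a read fails, is to wrap the open/close pair in a context manager. This is a sketch rather than part of the Thrift library: the helper name opened is hypothetical, and FakeTransport is a stand-in that only mimics the open() and close() methods so the snippet runs without a server.

```python
from contextlib import contextmanager


@contextmanager
def opened(transport):
    """Open a transport, yield it, and always close it afterwards."""
    transport.open()
    try:
        yield transport
    finally:
        transport.close()


class FakeTransport(object):
    """Hypothetical stand-in exposing the same open()/close() methods."""

    def __init__(self):
        self.is_open = False

    def open(self):
        self.is_open = True

    def close(self):
        self.is_open = False


fake = FakeTransport()
try:
    with opened(fake):
        # Simulate an error in the middle of reading data
        raise RuntimeError("simulated failure while reading")
except RuntimeError:
    pass
print(fake.is_open)  # prints False: the transport was closed despite the error
```

With a real connection, the buffered transport object created in the example would be passed to opened() in place of FakeTransport, and the client calls would go inside the with block.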
This article currently covers only reading data; other HBase operations will be added in future updates.
