Reading and Writing MySQL Data with MapReduce in Hadoop


WBOY

Jun 07, 2016, 04:27 PM


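Guiding questions
1. Which two classes let Hadoop MapReduce use a database as a data source?
2. What problem typically occurs when the MySQL driver jar is missing?
3. How do you add the jar?

Sometimes a project has a very large input data set but a very small output, for example PV and UV statistics. To serve real-time queries or OLAP workloads, MapReduce then has to exchange data with MySQL, and this is an area where HBase and Hive currently still need improvement.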

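Back to the point. Briefly, here are the background, the principle, and the things to watch out for:

1. To let MapReduce access relational databases (MySQL, Oracle) directly, Hadoop provides two classes, DBInputFormat and DBOutputFormat. DBInputFormat reads the rows of a database table into a MapReduce job (from where they can be written to HDFS), and DBOutputFormat writes the result set produced by MapReduce into a database table.

2. Because version 0.20 does not support DBInputFormat and DBOutputFormat well, this example uses the 0.19 API to show how the two classes are used.

At least in my 0.20.203 installation there is no db package under org.apache.hadoop.mapreduce.lib, so this article uses the old API (org.apache.hadoop.mapred) for its example.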

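3. If the MapReduce job fails with java.io.IOException: com.mysql.jdbc.Driver, the usual cause is that the program cannot find the MySQL driver jar. The fix is to make the jar available to every TaskTracker that runs the MapReduce program.

There are two ways to add the jar:

(1) Copy the jar into ${HADOOP_HOME}/lib on every node and restart the cluster. This is the more primitive approach.

(2) a) Upload the jar to HDFS: hadoop fs -put mysql-connector-java-5.1.0-bin.jar /hdfsPath/

    b) In the MR program, before submitting the job, add: DistributedCache.addFileToClassPath(new Path("/hdfsPath/mysql-connector-java-5.1.0-bin.jar"), conf);

Note: although the example uses the 0.19 API, it also works on 0.20; the compiler only warns that the methods are deprecated.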
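The error message in item 3 consists of nothing but the driver class name, which is consistent with a failed class lookup being rethrown as an IOException. A quick way to confirm that the jar is the problem is to try loading the driver class by name on the same classpath. The sketch below does this in plain Java; the class DriverCheck and its method are hypothetical helpers for illustration, not part of Hadoop or of the original example.

```java
// Hypothetical helper (not part of Hadoop): checks whether a JDBC driver
// class can be loaded from the current classpath.
final class DriverCheck {

    /** Returns true if the named class can be loaded, false if it is missing. */
    static boolean isDriverAvailable(String driverClassName) {
        try {
            Class.forName(driverClassName);
            return true;
        } catch (ClassNotFoundException e) {
            // This is the situation that surfaces in the job as
            // "java.io.IOException: com.mysql.jdbc.Driver".
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the driver class used in this article.
        String driver = args.length > 0 ? args[0] : "com.mysql.jdbc.Driver";
        System.out.println(driver + " available: " + isDriverAvailable(driver));
    }
}
```

Running this with the connector jar on the classpath should report the driver as available; running it without the jar reports it as missing. Note that it only checks the local classpath, so it does not prove that the task nodes can see the jar.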

4. Test data:

CREATE TABLE `t` (
  `id` int DEFAULT NULL,
  `name` varchar(10) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `t2` (
  `id` int DEFAULT NULL,
  `name` varchar(10) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
insert into t values (1,"june"),(2,"decli"),(3,"hello"),
        (4,"june"),(5,"decli"),(6,"hello"),(7,"june"),
        (8,"decli"),(9,"hello"),(10,"june"),
        (11,"june"),(12,"decli"),(13,"hello");


5. Code:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Iterator;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.IdentityReducer;
import org.apache.hadoop.mapred.lib.db.DBConfiguration;
import org.apache.hadoop.mapred.lib.db.DBInputFormat;
import org.apache.hadoop.mapred.lib.db.DBOutputFormat;
import org.apache.hadoop.mapred.lib.db.DBWritable;

/**
 * Function: tests data exchange between MapReduce and MySQL. This test case copies
 *           the rows of one table into another table. In practice you may only need
 *           to read from MySQL, or only write to MySQL.
 * date: 2013-7-29 2:34:04 AM
 * @author june
 */
public class Mysql2Mr {
    // DROP TABLE IF EXISTS `hadoop`.`studentinfo`;
    // CREATE TABLE studentinfo (
    // id INTEGER NOT NULL PRIMARY KEY,
    // name VARCHAR(32) NOT NULL);
    public static class StudentinfoRecord implements Writable, DBWritable {
        int id;
        String name;

        public StudentinfoRecord() {
        }

        // Writable: deserialize the record between tasks.
        @Override
        public void readFields(DataInput in) throws IOException {
            this.id = in.readInt();
            this.name = Text.readString(in);
        }

        @Override
        public String toString() {
            return this.id + " " + this.name;
        }

        // DBWritable: bind the fields to the INSERT statement.
        @Override
        public void write(PreparedStatement stmt) throws SQLException {
            stmt.setInt(1, this.id);
            stmt.setString(2, this.name);
        }

        // DBWritable: populate the fields from one row of the SELECT result.
        @Override
        public void readFields(ResultSet result) throws SQLException {
            this.id = result.getInt(1);
            this.name = result.getString(2);
        }

        // Writable: serialize the record between tasks.
        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(this.id);
            Text.writeString(out, this.name);
        }
    }

    // Note: this must be a static nested class. Hadoop instantiates the mapper by
    // reflection through a no-arg constructor, which a non-static inner class does
    // not have, so you would get:
    // Caused by: java.lang.NoSuchMethodException: DBInputMapper.<init>()
    // http://stackoverflow.com/questions/7154125/custom-mapreduce-input-format-cant-find-constructor
    // Many reposts of this example online get this wrong.
    public static class DBInputMapper extends MapReduceBase implements
            Mapper<LongWritable, StudentinfoRecord, LongWritable, Text> {
        public void map(LongWritable key, StudentinfoRecord value,
                OutputCollector<LongWritable, Text> collector, Reporter reporter) throws IOException {
            collector.collect(new LongWritable(value.id), new Text(value.toString()));
        }
    }

    public static class MyReducer extends MapReduceBase implements
            Reducer<LongWritable, Text, StudentinfoRecord, Text> {
        @Override
        public void reduce(LongWritable key, Iterator<Text> values,
                OutputCollector<StudentinfoRecord, Text> output, Reporter reporter) throws IOException {
            // Rebuild the record from the "id name" text emitted by the mapper.
            String[] splits = values.next().toString().split(" ");
            StudentinfoRecord r = new StudentinfoRecord();
            r.id = Integer.parseInt(splits[0]);
            r.name = splits[1];
            // DBOutputFormat writes the key (a DBWritable) to the table; the value is ignored.
            output.collect(r, new Text(r.name));
        }
    }

    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(Mysql2Mr.class);
        DistributedCache.addFileToClassPath(new Path("/tmp/mysql-connector-java-5.0.8-bin.jar"), conf);
        conf.setMapOutputKeyClass(LongWritable.class);
        conf.setMapOutputValueClass(Text.class);
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);
        conf.setOutputFormat(DBOutputFormat.class);
        conf.setInputFormat(DBInputFormat.class);
        // // mysql to hdfs
        // conf.setReducerClass(IdentityReducer.class);
        // Path outPath = new Path("/tmp/1");
        // FileSystem.get(conf).delete(outPath, true);
        // FileOutputFormat.setOutputPath(conf, outPath);
        DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver", "jdbc:mysql://192.168.1.101:3306/test",
                "root", "root");
        String[] fields = { "id", "name" };
        // Read the data from table t
        DBInputFormat.setInput(conf, StudentinfoRecord.class, "t", null, "id", fields);
        // MapReduce writes its output to table t2
        DBOutputFormat.setOutput(conf, "t2", "id", "name");
        // conf.setMapperClass(org.apache.hadoop.mapred.lib.IdentityMapper.class);
        conf.setMapperClass(DBInputMapper.class);
        conf.setReducerClass(MyReducer.class);
        JobClient.runJob(conf);
    }
}
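The Writable half of StudentinfoRecord has to write and read its fields in the same order, because Hadoop serializes the record into a byte stream and rebuilds it on the other side. The sketch below shows that round trip using only JDK streams. It is an illustration under assumptions: the class RecordRoundTrip is hypothetical, and writeUTF/readUTF stand in for Hadoop's Text.writeString/Text.readString, which use a different byte layout.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for StudentinfoRecord that needs no Hadoop jars.
final class RecordRoundTrip {
    int id;
    String name;

    // Same field order as in readFields: id first, then name.
    void write(DataOutput out) throws IOException {
        out.writeInt(id);
        out.writeUTF(name); // stand-in for Text.writeString(out, name)
    }

    void readFields(DataInput in) throws IOException {
        id = in.readInt();
        name = in.readUTF(); // stand-in for Text.readString(in)
    }

    /** Serializes src into a byte array and rebuilds a new record from those bytes. */
    static RecordRoundTrip copy(RecordRoundTrip src) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            src.write(new DataOutputStream(bytes));
            RecordRoundTrip dst = new RecordRoundTrip();
            dst.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return dst;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        RecordRoundTrip r = new RecordRoundTrip();
        r.id = 1;
        r.name = "june";
        RecordRoundTrip c = copy(r);
        System.out.println(c.id + " " + c.name); // prints "1 june"
    }
}
```

One design point worth noting about the example job itself: its reducer does not rely on this binary round trip but rebuilds the record from the mapper's "id name" text with split(" "), so a name that contains a space would lose everything after the first word. Emitting the record itself as the map output value would avoid that.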


6. Result:

After running the job twice, t2 contains every row of t twice. Table t2 has no primary key or unique constraint, and DBOutputFormat writes rows with plain INSERT statements, so each run appends another copy:

mysql> select * from t2;
+------+-------+
| id   | name  |
+------+-------+
|    1 | june  |
|    2 | decli |
|    3 | hello |
|    4 | june  |
|    5 | decli |
|    6 | hello |
|    7 | june  |
|    8 | decli |
|    9 | hello |
|   10 | june  |
|   11 | june  |
|   12 | decli |
|   13 | hello |
|    1 | june  |
|    2 | decli |
|    3 | hello |
|    4 | june  |
|    5 | decli |
|    6 | hello |
|    7 | june  |
|    8 | decli |
|    9 | hello |
|   10 | june  |
|   11 | june  |
|   12 | decli |
|   13 | hello |
+------+-------+
26 rows in set (0.00 sec)
mysql>


7. Log:

13/07/29 02:33:03 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/29 02:33:03 INFO filecache.TrackerDistributedCacheManager: Creating mysql-connector-java-5.0.8-bin.jar in /tmp/hadoop-june/mapred/local/archive/-8943686319031389138_-1232673160_640840668/192.168.1.101/tmp-work--8372797484204470322 with rwxr-xr-x
13/07/29 02:33:03 INFO filecache.TrackerDistributedCacheManager: Cached hdfs://192.168.1.101:9000/tmp/mysql-connector-java-5.0.8-bin.jar as /tmp/hadoop-june/mapred/local/archive/-8943686319031389138_-1232673160_640840668/192.168.1.101/tmp/mysql-connector-java-5.0.8-bin.jar
13/07/29 02:33:03 INFO filecache.TrackerDistributedCacheManager: Cached hdfs://192.168.1.101:9000/tmp/mysql-connector-java-5.0.8-bin.jar as /tmp/hadoop-june/mapred/local/archive/-8943686319031389138_-1232673160_640840668/192.168.1.101/tmp/mysql-connector-java-5.0.8-bin.jar
13/07/29 02:33:03 INFO mapred.JobClient: Running job: job_local_0001
13/07/29 02:33:03 INFO mapred.MapTask: numReduceTasks: 1
13/07/29 02:33:03 INFO mapred.MapTask: io.sort.mb = 100
13/07/29 02:33:03 INFO mapred.MapTask: data buffer = 79691776/99614720
13/07/29 02:33:03 INFO mapred.MapTask: record buffer = 262144/327680
13/07/29 02:33:03 INFO mapred.MapTask: Starting flush of map output
13/07/29 02:33:03 INFO mapred.MapTask: Finished spill 0
13/07/29 02:33:03 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
13/07/29 02:33:04 INFO mapred.JobClient:  map 0% reduce 0%
13/07/29 02:33:06 INFO mapred.LocalJobRunner:
13/07/29 02:33:06 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
13/07/29 02:33:06 INFO mapred.LocalJobRunner:
13/07/29 02:33:06 INFO mapred.Merger: Merging 1 sorted segments
13/07/29 02:33:06 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 235 bytes
13/07/29 02:33:06 INFO mapred.LocalJobRunner:
13/07/29 02:33:06 INFO mapred.Task: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
13/07/29 02:33:07 INFO mapred.JobClient:  map 100% reduce 0%
13/07/29 02:33:09 INFO mapred.LocalJobRunner: reduce > reduce
13/07/29 02:33:09 INFO mapred.Task: Task 'attempt_local_0001_r_000000_0' done.
13/07/29 02:33:09 WARN mapred.FileOutputCommitter: Output path is null in cleanup
13/07/29 02:33:10 INFO mapred.JobClient:  map 100% reduce 100%
13/07/29 02:33:10 INFO mapred.JobClient: Job complete: job_local_0001
13/07/29 02:33:10 INFO mapred.JobClient: Counters: 18
13/07/29 02:33:10 INFO mapred.JobClient:   File Input Format Counters
13/07/29 02:33:10 INFO mapred.JobClient:     Bytes Read=0
13/07/29 02:33:10 INFO mapred.JobClient:   File Output Format Counters
13/07/29 02:33:10 INFO mapred.JobClient:     Bytes Written=0
13/07/29 02:33:10 INFO mapred.JobClient:   FileSystemCounters
13/07/29 02:33:10 INFO mapred.JobClient:     FILE_BYTES_READ=1211691
13/07/29 02:33:10 INFO mapred.JobClient:     HDFS_BYTES_READ=1081704
13/07/29 02:33:10 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=2392844
13/07/29 02:33:10 INFO mapred.JobClient:   Map-Reduce Framework
13/07/29 02:33:10 INFO mapred.JobClient:     Map output materialized bytes=239
13/07/29 02:33:10 INFO mapred.JobClient:     Map input records=13
13/07/29 02:33:10 INFO mapred.JobClient:     Reduce shuffle bytes=0
13/07/29 02:33:10 INFO mapred.JobClient:     Spilled Records=26
13/07/29 02:33:10 INFO mapred.JobClient:     Map output bytes=207
13/07/29 02:33:10 INFO mapred.JobClient:     Map input bytes=13
13/07/29 02:33:10 INFO mapred.JobClient:     SPLIT_RAW_BYTES=75
13/07/29 02:33:10 INFO mapred.JobClient:     Combine input records=0
13/07/29 02:33:10 INFO mapred.JobClient:     Reduce input records=13
13/07/29 02:33:10 INFO mapred.JobClient:     Reduce input groups=13
13/07/29 02:33:10 INFO mapred.JobClient:     Combine output records=0
13/07/29 02:33:10 INFO mapred.JobClient:     Reduce output records=13
13/07/29 02:33:10 INFO mapred.JobClient:     Map output records=13


Reading data from MySQL directly in MapReduce

Data in MySQL:

mysql> select * from lxw_tbls;
+---------------------+----------------+
| TBL_NAME            | TBL_TYPE       |
+---------------------+----------------+
| lxw_test_table      | EXTERNAL_TABLE |
| lxw_t               | MANAGED_TABLE  |
| lxw_t1              | MANAGED_TABLE  |
| tt                  | MANAGED_TABLE  |
| tab_partition       | MANAGED_TABLE  |
| lxw_hbase_table_1   | MANAGED_TABLE  |
| lxw_hbase_user_info | MANAGED_TABLE  |
| t                   | EXTERNAL_TABLE |
| lxw_jobid           | MANAGED_TABLE  |
+---------------------+----------------+
9 rows in set (0.01 sec)
mysql> select * from lxw_tbls where TBL_NAME like 'lxw%' order by TBL_NAME;
+---------------------+----------------+
| TBL_NAME            | TBL_TYPE       |
+---------------------+----------------+
| lxw_hbase_table_1   | MANAGED_TABLE  |
| lxw_hbase_user_info | MANAGED_TABLE  |
| lxw_jobid           | MANAGED_TABLE  |
| lxw_t               | MANAGED_TABLE  |
| lxw_t1              | MANAGED_TABLE  |
| lxw_test_table      | EXTERNAL_TABLE |
+---------------------+----------------+
6 rows in set (0.00 sec)


MapReduce program code, ConnMysql.java:

package com.lxw.study;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.net.URI;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Iterator;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ConnMysql {

    private static Configuration conf = new Configuration();

    static {
        conf.addResource(new Path("F:/lxw-hadoop/hdfs-site.xml"));
        conf.addResource(new Path("F:/lxw-hadoop/mapred-site.xml"));
        conf.addResource(new Path("F:/lxw-hadoop/core-site.xml"));
        conf.set("mapred.job.tracker", "10.133.103.21:50021");
    }

    public static class TblsRecord implements Writable, DBWritable {
        String tbl_name;
        String tbl_type;

        public TblsRecord() {
        }

        // DBWritable: bind the fields to a prepared statement.
        @Override
        public void write(PreparedStatement statement) throws SQLException {
            statement.setString(1, this.tbl_name);
            statement.setString(2, this.tbl_type);
        }

        // DBWritable: populate the fields from one row of the query result.
        @Override
        public void readFields(ResultSet resultSet) throws SQLException {
            this.tbl_name = resultSet.getString(1);
            this.tbl_type = resultSet.getString(2);
        }

        // Writable: serialize the record between tasks.
        @Override
        public void write(DataOutput out) throws IOException {
            Text.writeString(out, this.tbl_name);
            Text.writeString(out, this.tbl_type);
        }

        // Writable: deserialize the record between tasks.
        @Override
        public void readFields(DataInput in) throws IOException {
            this.tbl_name = Text.readString(in);
            this.tbl_type = Text.readString(in);
        }

        @Override
        public String toString() {
            return this.tbl_name + " " + this.tbl_type;
        }
    }

    public static class ConnMysqlMapper extends Mapper<LongWritable, TblsRecord, Text, Text> {
        @Override
        public void map(LongWritable key, TblsRecord values, Context context)
                throws IOException, InterruptedException {
            context.write(new Text(values.tbl_name), new Text(values.tbl_type));
        }
    }

    public static class ConnMysqlReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            for (Iterator<Text> itr = values.iterator(); itr.hasNext();) {
                context.write(key, itr.next());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Path output = new Path("/user/lxw/output/");

        // Remove the output directory from a previous run, if any.
        FileSystem fs = FileSystem.get(URI.create(output.toString()), conf);
        if (fs.exists(output)) {
            fs.delete(output, true);
        }

        // JDBC driver for MySQL
        DistributedCache.addFileToClassPath(new Path(
                "hdfs://hd022-test.nh.sdo.com/user/liuxiaowen/mysql-connector-java-5.1.13-bin.jar"), conf);

        DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
                "jdbc:mysql://10.133.103.22:3306/hive", "hive", "hive");

        Job job = new Job(conf, "test mysql connection");
        job.setJarByClass(ConnMysql.class);

        job.setMapperClass(ConnMysqlMapper.class);
        job.setReducerClass(ConnMysqlReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        job.setInputFormatClass(DBInputFormat.class);
        FileOutputFormat.setOutputPath(job, output);

        // Column names
        String[] fields = { "TBL_NAME", "TBL_TYPE" };
        // The six arguments are:
        // 1. Job; 2. Class<? extends DBWritable>;
        // 3. table name; 4. WHERE condition;
        // 5. ORDER BY clause; 6. column names
        DBInputFormat.setInput(job, TblsRecord.class,
                "lxw_tbls", "TBL_NAME like 'lxw%'", "TBL_NAME", fields);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

}
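The arguments passed to DBInputFormat.setInput (table name, WHERE condition, ORDER BY clause, column names) correspond to the parts of a SELECT statement. The sketch below assembles such a statement in plain Java to make that mapping concrete. It is an illustration only: the class SelectSketch and the method buildSelect are hypothetical, and the exact SQL that Hadoop generates, including how it pages through each input split, depends on the Hadoop version.

```java
// Hypothetical illustration of how the setInput arguments relate to a query.
final class SelectSketch {

    /** Builds "SELECT fields FROM table [WHERE (conditions)] [ORDER BY orderBy]". */
    static String buildSelect(String table, String conditions, String orderBy, String... fields) {
        StringBuilder sql = new StringBuilder("SELECT ");
        sql.append(String.join(", ", fields));
        sql.append(" FROM ").append(table);
        // The condition and ordering are optional; null or empty means "omit".
        if (conditions != null && !conditions.isEmpty()) {
            sql.append(" WHERE (").append(conditions).append(")");
        }
        if (orderBy != null && !orderBy.isEmpty()) {
            sql.append(" ORDER BY ").append(orderBy);
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        // Same values as the setInput call in ConnMysql.
        System.out.println(buildSelect("lxw_tbls", "TBL_NAME like 'lxw%'", "TBL_NAME",
                "TBL_NAME", "TBL_TYPE"));
        // prints: SELECT TBL_NAME, TBL_TYPE FROM lxw_tbls WHERE (TBL_NAME like 'lxw%') ORDER BY TBL_NAME
    }
}
```

With the values used in ConnMysql, this is equivalent to the second query of the MySQL session above, restricted to the two listed columns.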


Output:

[lxw@hd025-test ~]$ hadoop fs -cat /user/lxw/output/part-r-00000
lxw_hbase_table_1	MANAGED_TABLE
lxw_hbase_user_info	MANAGED_TABLE
lxw_jobid	MANAGED_TABLE
lxw_t	MANAGED_TABLE
lxw_t1	MANAGED_TABLE
lxw_test_table	EXTERNAL_TABLE


http://www.aboutyun.com/forum.php?highlight=MapReduce+MySQL&mod=viewthread&tid=7405
