Hadoop Example: Multi-Table Join


WBOY

Jun 07, 2016, 04:31 PM


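A multi-table join is similar to a single-table join: it also processes the raw data to extract the information of interest. The input here consists of two files. One is the factory table, with a factory-name column and an address-ID column; the other is the address table, with an address-ID column and an address-name column. The task is to find the correspondence between factory names and address names in the input data and to output a table of factory names and address names.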

Sample data:

factory:

factoryname addressID
Beijing Red Star 1
Shenzhen Thunder 3
Guangzhou Honda 2
Beijing Rising 1
Guangzhou Development Bank 2
Tencent 3
Bank of Beijing 1

address:

addressID addressname
1 Beijing
2 Guangzhou
3 Shenzhen
4 Xian

Result:

factoryname                   addressname
Beijing Red Star              Beijing
Beijing Rising                Beijing
Bank of Beijing               Beijing
Guangzhou Honda               Guangzhou
Guangzhou Development Bank    Guangzhou
Shenzhen Thunder              Shenzhen
Tencent                       Shenzhen
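The result above is an inner join of the two tables on the address ID, which is why address 4 (Xian) does not appear. As a point of comparison before the MapReduce version, here is a minimal in-memory sketch of that join in plain Java. The class name `HashJoinSketch` and the method `join` are hypothetical names chosen for illustration, and the sketch assumes the ID is the last token of a factory row and the first token of an address row, as in the sample data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not part of the Hadoop job.
class HashJoinSketch {
    /** Joins "name ... id" factory rows with "id name" address rows on the ID. */
    static List<String> join(List<String> factoryRows, List<String> addressRows) {
        // Build a lookup table: address ID -> address name.
        Map<String, String> addressById = new HashMap<>();
        for (String row : addressRows) {
            String r = row.trim();
            int sep = r.indexOf(' ');              // the ID is the first token
            addressById.put(r.substring(0, sep), r.substring(sep + 1).trim());
        }
        List<String> result = new ArrayList<>();
        for (String row : factoryRows) {
            String r = row.trim();
            int sep = r.lastIndexOf(' ');          // the ID is the last token
            String name = r.substring(0, sep).trim();
            String addressName = addressById.get(r.substring(sep + 1));
            if (addressName != null) {             // inner join: skip unmatched rows
                result.add(name + "\t" + addressName);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // A few rows taken from the sample tables, without the header lines.
        List<String> factories = Arrays.asList(
                "Beijing Red Star 1", "Shenzhen Thunder 3",
                "Guangzhou Honda 2", "Tencent 3");
        List<String> addresses = Arrays.asList(
                "1 Beijing", "2 Guangzhou", "3 Shenzhen", "4 Xian");
        for (String line : join(factories, addresses)) {
            System.out.println(line);
        }
    }
}
```

This approach needs the whole address table in memory on one machine. The MapReduce job instead groups rows from both tables by address ID, so each reducer call only sees the rows for one ID.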

The code is as follows:

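import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class MTJoin {
    // Guards the header row so it is written once (this relies on a single reducer).
    public static int time = 0;

    /*
     * The mapper first decides whether an input line belongs to the left
     * (factory) table or the right (address) table, then splits the columns.
     * The join column (address ID) becomes the key; the remaining column and
     * the left/right table flag go into the value.
     */
    public static class Map extends Mapper<Object, Text, Text, Text> {
        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString(); // one line of an input file
            // Skip the header line of either input file.
            if (line.contains("factoryname") || line.contains("addressname")) {
                return;
            }
            String relationtype = ""; // "1" = factory table, "2" = address table
            String mapkey = "";
            StringBuilder mapvalue = new StringBuilder();
            int i = 0; // number of name tokens read so far
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                String token = itr.nextToken();
                // A token that starts with a digit is treated as the address ID.
                if (token.charAt(0) >= '0' && token.charAt(0) <= '9') {
                    mapkey = token;
                    // ID after the name: factory row. ID first: address row.
                    if (i > 0) {
                        relationtype = "1";
                    } else {
                        relationtype = "2";
                    }
                    continue;
                }
                // Collect the factory name or address name.
                mapvalue.append(token).append(" ");
                i++;
            }
            // Ignore blank lines and lines without an address ID.
            if (relationtype.isEmpty()) {
                return;
            }
            // Emit the record tagged with its table flag.
            context.write(new Text(mapkey),
                    new Text(relationtype + "+" + mapvalue.toString().trim()));
        }
    }

    /*
     * The reducer parses the map output, stores the values of the left and
     * right tables separately, then computes their Cartesian product and
     * writes it out.
     */
    public static class Reduce extends Reducer<Text, Text, Text, Text> {
        @Override
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // Write the header row.
            if (0 == time) {
                context.write(new Text("factoryname"), new Text("addressname"));
                time++;
            }
            // Lists instead of fixed String[10] arrays, so more than ten rows
            // per address ID do not overflow.
            List<String> factory = new ArrayList<String>();
            List<String> address = new ArrayList<String>();
            for (Text val : values) {
                String record = val.toString();
                if (record.length() < 2) {
                    continue;
                }
                // Read the left/right table flag; the payload starts after "+".
                char relationtype = record.charAt(0);
                if ('1' == relationtype) {
                    factory.add(record.substring(2)); // left table
                } else if ('2' == relationtype) {
                    address.add(record.substring(2)); // right table
                }
            }
            // Cartesian product; empty if either side has no rows for this key.
            for (String f : factory) {
                for (String a : address) {
                    context.write(new Text(f), new Text(a));
                }
            }
        }
    }

    // The source listing is cut off after the reducer. This driver is a
    // standard one assumed from the imports and from the run command
    // "hadoop jar MTJoin.jar MTJoin input output".
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: MTJoin <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "Multiple Table Join");
        job.setJarByClass(MTJoin.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Compile and package:

javac -classpath hadoop-core-1.1.2.jar:/opt/hadoop-1.1.2/lib/commons-cli-1.2.jar -d firstProject firstProject/MTJoin.java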

jar -cvf MTJoin.jar -C firstProject/ .

Delete the output directory if it already exists:

hadoop fs -rmr output

Upload the input files:

hadoop fs -mkdir input

hadoop fs -put factory input

hadoop fs -put address input

Run the job:

hadoop jar MTJoin.jar MTJoin input output

View the result:

hadoop fs -cat output/part-r-00000
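To check the join logic without a cluster, the three phases of the job can be imitated in plain Java: tag each line with its table flag (map), group the tagged values by address ID (shuffle), and emit the Cartesian product per ID (reduce). The following is only a rough sketch under assumptions. `ReduceSideJoinSketch`, `tag`, and `run` are hypothetical names, a `TreeMap` stands in for Hadoop's sorted shuffle, and the row format is taken to match the sample files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local simulation; it does not use any Hadoop classes.
class ReduceSideJoinSketch {
    /** Map step: returns {joinKey, taggedValue}, or null for header and blank lines. */
    static String[] tag(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length < 2 || line.contains("factoryname")
                || line.contains("addressname")) {
            return null;
        }
        String last = tokens[tokens.length - 1];
        if (Character.isDigit(tokens[0].charAt(0))) {
            // ID comes first: address row, tagged "2".
            String name = String.join(" ", Arrays.copyOfRange(tokens, 1, tokens.length));
            return new String[] {tokens[0], "2+" + name};
        }
        if (Character.isDigit(last.charAt(0))) {
            // ID comes last: factory row, tagged "1".
            String name = String.join(" ", Arrays.copyOfRange(tokens, 0, tokens.length - 1));
            return new String[] {last, "1+" + name};
        }
        return null;
    }

    /** Runs the map, shuffle, and reduce steps over lines from both files. */
    static List<String> run(List<String> lines) {
        // Shuffle: group tagged values by join key, with keys in sorted order.
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] kv = tag(line);
            if (kv != null) {
                grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
            }
        }
        // Reduce: per key, split the two tables and emit their Cartesian product.
        List<String> out = new ArrayList<>();
        for (List<String> values : grouped.values()) {
            List<String> factories = new ArrayList<>();
            List<String> addresses = new ArrayList<>();
            for (String v : values) {
                (v.charAt(0) == '1' ? factories : addresses).add(v.substring(2));
            }
            for (String f : factories) {
                for (String a : addresses) {
                    out.add(f + "\t" + a);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Lines from both input files mixed together, as the mapper would see them.
        List<String> lines = Arrays.asList(
                "Beijing Red Star 1", "Shenzhen Thunder 3", "Guangzhou Honda 2",
                "addressID addressname", "1 Beijing", "2 Guangzhou",
                "3 Shenzhen", "4 Xian");
        for (String row : run(lines)) {
            System.out.println(row);
        }
    }
}
```

An ID with rows from only one table, such as address 4 in the sample, yields an empty product and therefore no output, which matches the inner-join behaviour of the job.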

Author: a331251021. Published 2013-8-4 16:20:52.
