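A left outer join gotcha in Hive: where you put the filter condition changes the result. The session below shows the difference and needs little extra explanation.

hive> desc t1;
OK
id      int
name    string
p_id    int
Time taken: 0.118 seconds, Fetched: 3 row(s)

hive> desc t2;
OK
id      int
name    string
Time taken: 0.051 seconds, Fetched: 2 row(s)

hive> select * from t1;
OK
1       aaa     2
2       bbb     2
3       ccc     3
4       ddd     4
5       fff     3
6       ooo     23
Time taken: 0.418 seconds, Fetched: 6 row(s)

hive> select * from t2;
OK
4       jjj
4       jjj
4       jjj
2       abc
3       hhh
4       jjj
3       ii
2       fuck
7       shit
Time taken: 0.068 seconds, Fetched: 9 row(s)

hive> select * from t1 left outer join t2 on (t1.p_id=t2.id) where t2.name='abc';
OK
1       aaa     2       2       abc
2       bbb     2       2       abc
Time taken: 21.53 seconds, Fetched: 2 row(s)

hive> select * from t1 left outer join t2 on (t1.p_id=t2.id and t2.name='abc');
OK
1       aaa     2       2       abc
2       bbb     2       2       abc
3       ccc     3       NULL    NULL
4       ddd     4       NULL    NULL
5       fff     3       NULL    NULL
6       ooo     23      NULL    NULL
Time taken: 17.676 seconds, Fetched: 6 row(s)

To filter the right table's data in a Hive left outer join while keeping every row of the left table, use the second form and put the condition in the ON clause. The first form is a habit carried over from MySQL, but it does not give the result you might expect from a left outer join: the WHERE clause is applied after the join, so every row whose t2.name is NULL (that is, every t1 row without a match) is discarded, and only the two matched rows remain.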
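The two queries can also be reproduced outside Hive. The following is a minimal sketch, not part of the original session: it uses Python's built-in sqlite3 module as a stand-in for Hive (an assumption, chosen so the example runs without a cluster) and a trimmed-down copy of the sample data. The variable names where_rows and on_rows are illustrative only.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Hive (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, name TEXT, p_id INTEGER);
    CREATE TABLE t2 (id INTEGER, name TEXT);
""")

# Same t1 rows as in the session.
conn.executemany("INSERT INTO t1 VALUES (?, ?, ?)", [
    (1, "aaa", 2), (2, "bbb", 2), (3, "ccc", 3),
    (4, "ddd", 4), (5, "fff", 3), (6, "ooo", 23),
])

# Trimmed-down t2: one 'abc' row plus a few rows that do not pass the filter.
conn.executemany("INSERT INTO t2 VALUES (?, ?)", [
    (2, "abc"), (3, "hhh"), (3, "ii"), (4, "jjj"),
])

# Form 1: filter in WHERE. Applied after the join, so NULL-padded rows are dropped.
where_rows = conn.execute("""
    SELECT t1.id, t1.name, t1.p_id, t2.id, t2.name
    FROM t1 LEFT OUTER JOIN t2 ON t1.p_id = t2.id
    WHERE t2.name = 'abc'
    ORDER BY t1.id
""").fetchall()

# Form 2: filter in ON. Only restricts which t2 rows can match; all t1 rows survive.
on_rows = conn.execute("""
    SELECT t1.id, t1.name, t1.p_id, t2.id, t2.name
    FROM t1 LEFT OUTER JOIN t2 ON t1.p_id = t2.id AND t2.name = 'abc'
    ORDER BY t1.id
""").fetchall()

print(len(where_rows), "rows with the filter in WHERE")
for row in where_rows:
    print(row)

print(len(on_rows), "rows with the filter in ON")
for row in on_rows:
    print(row)
```

With this data, the first query returns only the two rows joined to abc, while the second keeps all six rows of t1 and pads the unmatched ones with NULL (None in Python), which matches the row counts in the Hive output.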