Home >Database >Mysql Tutorial >Why Does Adding a `WHERE master_id IS NOT NULL` Clause Increase the `COUNT(*)` in Hive?
Hive count exception: WHERE clause causes count to increase
During Hive data exploration, a strange phenomenon was observed. When counting using the simple statement select count(*) as c from mytable
, the result is 1,129,563. However, adding the filter select count(*) as c from mytable where master_id is not null
unexpectedly increased the count to 1,134,041.
Further investigation revealed that the master_id
column always contained non-null values. This leads to a question: How to explain that the implementation of the null value exclusion condition actually increases the number of rows?
Possible explanation: Hive statistics
The answer lies in understanding the impact of Hive statistics. By default, Hive utilizes statistics to optimize queries and improve performance. When executing a select count(*) as c from mytable
query (without any filtering), Hive may rely on stored statistics to estimate the count. However, these statistics are not always accurate or up-to-date.
In this case, the statistics may indicate that there are many rows in the master_id
column that contain null values. When adding the filter master_id is not null
, Hive re-evaluated the statistics and recognized that most rows contained non-null values. This results in a more accurate count, eliminating noticeable discrepancies.
Methods to solve the problem
To get an accurate count, especially when working with tables that have been significantly changed or have not been analyzed recently, it is recommended:
hive.compute.query.using.stats=false
to prevent Hive from using statistics and force a full table scan. ANALYZE TABLE
command to manually update table statistics and ensure their accuracy. hive.stats.autogather=true
to automatically collect statistics during data operations such as INSERT OVERWRITE
. The above is the detailed content of Why Does Adding a `WHERE master_id IS NOT NULL` Clause Increase the `COUNT(*)` in Hive?. For more information, please follow other related articles on the PHP Chinese website!