Home >Database >Mysql Tutorial >Why Does My Hive COUNT(*) Query Show Fewer Rows Than My Non-NULL Count Query?

Why Does My Hive COUNT(*) Query Show Fewer Rows Than My Non-NULL Count Query?

Patricia Arquette
Patricia ArquetteOriginal
2025-01-12 07:16:41179browse

Why Does My Hive COUNT(*) Query Show Fewer Rows Than My Non-NULL Count Query?

Hive query count difference: non-empty count is greater than total count

In Hive, we observed an interesting phenomenon regarding row count calculation. Suppose there is a table named mytable that contains a field named master_id. When the following query is executed, the total number of rows retrieved is 1,129,563:

<code class="language-sql">SELECT COUNT(*) AS c FROM mytable;</code>

However, when querying for the number of rows with non-empty master_id values, the count increases to 1,134,041:

<code class="language-sql">SELECT COUNT(*) AS c FROM mytable WHERE master_id IS NOT NULL;</code>

This difference occurs because, by default, queries without a WHERE clause may use statistics to estimate the number of rows. To solve this problem, parameter hive.compute.query.using.stats can be set to false to disable statistical estimation.

Alternatively, you can calculate table statistics explicitly using the ANALYZE TABLE statement, or set hive.stats.autogather to true to collect statistics when bulk importing data via the INSERT OVERWRITE operation. This will ensure accurate and consistent query results and avoid unexpected results like this.

The above is the detailed content of Why Does My Hive COUNT(*) Query Show Fewer Rows Than My Non-NULL Count Query?. For more information, please follow other related articles on the PHP Chinese website!

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn