Home >Database >Mysql Tutorial >How Can I Preserve Null Values During Apache Spark Joins?

How Can I Preserve Null Values During Apache Spark Joins?

DDD
DDDOriginal
2024-12-31 17:36:11201browse

How Can I Preserve Null Values During Apache Spark Joins?

Preserving Null Values in Apache Spark Joins

By default, Apache Spark omits rows with null values when performing joins. To include these values in the join output, Spark provides several options.

NULL-Safe Equality Operator (<=>)

Spark 1.6 introduced a special NULL-safe equality operator that allows you to include null values in your join criteria.

numbersDf
  .join(lettersDf, numbersDf("numbers") <=> lettersDf("numbers"))
  .drop(lettersDf("numbers"))

Column.eqNullSafe (PySpark 2.3.0 )

In PySpark 2.3.0 and later, you can use Column.eqNullSafe to perform NULL-safe equality checks.

numbers_df = sc.parallelize([
    ("123", ), ("456", ), (None, ), ("", )
]).toDF(["numbers"])

letters_df = sc.parallelize([
    ("123", "abc"), ("456", "def"), (None, "zzz"), ("", "hhh")
]).toDF(["numbers", "letters"])

numbers_df.join(letters_df, numbers_df.numbers.eqNullSafe(letters_df.numbers))

%<=>% (SparkR)

SparkR offers a %<=>% operator for NULL-safe equality checks.

numbers_df <- createDataFrame(data.frame(numbers = c("123", "456", NA, "")))
letters_df <- createDataFrame(data.frame(
  numbers = c("123", "456", NA, ""),
  letters = c("abc", "def", "zzz", "hhh")
))

head(join(numbers_df, letters_df, numbers_df$numbers %<=>% letters_df$numbers))

IS NOT DISTINCT FROM (SQL)

In SQL (Spark 2.2.0 ), you can use IS NOT DISTINCT FROM to preserve null values in joins.

SELECT * FROM numbers JOIN letters 
ON numbers.numbers IS NOT DISTINCT FROM letters.numbers

This operator can also be used with the DataFrame API:

numbersDf.alias("numbers")
  .join(lettersDf.alias("letters"))
  .where("numbers.numbers IS NOT DISTINCT FROM letters.numbers")

The above is the detailed content of How Can I Preserve Null Values During Apache Spark Joins?. For more information, please follow other related articles on the PHP Chinese website!

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn