How to Add Constant Columns in Spark DataFrames?

Susan Sarandon
2024-11-06 22:55:02

Adding Constant Columns in Spark DataFrames

In Spark, a constant column (one that holds the same value in every row) can be added to a DataFrame in several ways.

lit and Other Functions (Spark 1.3+)

In Spark versions 1.3 and above, the lit function is used to create a literal value, which can be used as the second argument to DataFrame.withColumn to add a constant column:

from pyspark.sql.functions import lit

df.withColumn('new_column', lit(10))

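For more complex columns, functions such as array, create_map, and struct can be used to build the desired column values. In PySpark the map constructor is named create_map (available since Spark 2.0); the Scala API calls the same function map.

from pyspark.sql.functions import array, create_map, lit, struct

df.withColumn("some_array", array(lit(1), lit(2), lit(3)))
df.withColumn("some_map", create_map(lit("key1"), lit(1), lit("key2"), lit(2)))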

typedLit (Spark 2.2+)

Spark 2.2 introduced the typedLit function in the Scala API, which accepts Seq, Map, and tuples as constants:

import org.apache.spark.sql.functions.typedLit

df.withColumn("some_array", typedLit(Seq(1, 2, 3)))
df.withColumn("some_struct", typedLit(("foo", 1, 0.3)))

Using a UDF

As an alternative to literal values, a User Defined Function (UDF) that returns the same value for every row can be used to add the column, although lit is usually preferable for plain literals because a Python UDF adds per-row serialization overhead:

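from pyspark.sql.functions import udf, lit
from pyspark.sql.types import IntegerType

# The argument is ignored; the function returns 10 for every row.
def add_ten(_):
    return 10

add_ten_udf = udf(add_ten, IntegerType())
# lit(1.0) is only a placeholder input for the UDF's single parameter.
df.withColumn('new_column', add_ten_udf(lit(1.0)))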

Note:

The constant values can also be passed as arguments to UDFs or SQL functions using the same constructs.
