How to Add Constant Columns in Spark DataFrames?
In Spark, a constant column (the same value for every row) can be added to a DataFrame in several ways.
In Spark 1.3 and later, the lit function creates a literal value that can be passed as the second argument to DataFrame.withColumn to add a constant column:
from pyspark.sql.functions import lit

df.withColumn('new_column', lit(10))
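To see this end to end, here is a minimal runnable sketch; the SparkSession setup and the two-row sample DataFrame are assumptions added for illustration:

from pyspark.sql import SparkSession
from pyspark.sql.functions import lit

# Assumed setup for illustration; reuse an existing SparkSession in practice.
spark = SparkSession.builder.appName("constant-column-demo").getOrCreate()
df = spark.createDataFrame([("a",), ("b",)], ["id"])

# Every row receives the same literal value 10.
df.withColumn("new_column", lit(10)).show()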
For more complex columns, functions such as array, create_map, and struct can be used to build the desired values:
from pyspark.sql.functions import array, create_map, struct, lit

df.withColumn("some_array", array(lit(1), lit(2), lit(3)))
df.withColumn("some_map", create_map(lit("key1"), lit(1), lit("key2"), lit(2)))
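The struct function mentioned above works the same way; a minimal sketch, with the field names col1 and col2 chosen via alias purely for illustration:

from pyspark.sql.functions import struct, lit

# Builds a two-field struct column holding the same values on every row.
df.withColumn("some_struct", struct(lit("foo").alias("col1"), lit(1).alias("col2")))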
Spark 2.2 introduced the typedLit function in the Scala API, which accepts Seq, Map, and tuple constants:
import org.apache.spark.sql.functions.typedLit

df.withColumn("some_array", typedLit(Seq(1, 2, 3)))
df.withColumn("some_struct", typedLit(("foo", 1, 0.3)))
As an alternative to literal values, a user-defined function (UDF) that returns the same constant for every row can be used to add the column:
from pyspark.sql.functions import udf, lit
from pyspark.sql.types import IntegerType

def add_ten(_):
    # Ignores its input and returns the constant 10 for every row.
    return 10

add_ten_udf = udf(add_ten, IntegerType())

# lit(1.0) is only a placeholder argument; the UDF ignores it.
df.withColumn('new_column', add_ten_udf(lit(1.0)))
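Note that this UDF approach is noticeably slower than lit: each row is shipped to a Python worker for evaluation, and the constant is opaque to the Catalyst optimizer, so prefer lit or typedLit whenever a plain literal is enough.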
Note: The constant values can also be passed as arguments to UDFs or SQL functions using the same constructs.
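As a sketch of that idea, a constant wrapped in lit can be passed to a UDF alongside a real column; the multiply_by helper and the amount column below are hypothetical, added only for illustration:

from pyspark.sql.functions import udf, lit, col
from pyspark.sql.types import IntegerType

# Hypothetical UDF: multiplies a column value by a constant factor.
multiply_by = udf(lambda value, factor: value * factor, IntegerType())

# The constant 3 travels to the UDF as a literal column.
df.withColumn('tripled', multiply_by(col('amount'), lit(3)))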