How to Implement IF-THEN-ELSE Logic in Spark DataFrames?
Spark Equivalent of IF Then ELSE
This article explains how to create a new column in a Spark DataFrame whose values depend on conditional rules.
Issue with When Function
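The code in question uses the when() function to create a new column named "Class" from the values in the "iris_class" column. It fails with an error stating that when() takes only two arguments: a single when() call accepts exactly one condition and one value, so further branches cannot be passed as extra arguments to the same call.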
Correct Syntax and Structure
The correct syntax for the when() function is:
F.when(condition1, value1).when(condition2, value2)...otherwise(otherwiseValue)
Multiple when() clauses can be chained this way, and they are evaluated in order, so the first matching condition determines the value. The otherwise() clause is optional; if it is omitted, rows that match none of the conditions get null.
The equivalent SQL for this syntax would be a CASE statement with multiple WHEN clauses, as shown below:
CASE WHEN condition1 THEN value1 WHEN condition2 THEN value2 ... ELSE otherwiseValue END
Recommended Solution
Therefore, the correct code to create the "Class" column should be:
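from pyspark.sql import functions as F

# Map each species label to a numeric class; any other label falls through to 2.
iris_spark_df = iris_spark.withColumn(
    "Class",
    F.when(iris_spark.iris_class == 'Iris-setosa', 0)
     .when(iris_spark.iris_class == 'Iris-versicolor', 1)
     .otherwise(2)
)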
Alternative Syntax
Another valid syntax for achieving the same result is:
iris_spark_df = iris_spark.withColumn(
    "Class",
    F.when(iris_spark.iris_class == 'Iris-setosa', 0)
     .otherwise(
         F.when(iris_spark.iris_class == 'Iris-versicolor', 1)
          .otherwise(2)
     )
)
Note on Hive IF
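The Hive IF conditional, which has the syntax IF(condition, if-true, if-false), is not available as a chained column function like when(). It can only be used in SQL: in raw SQL queries or in SQL expression strings. Older Spark versions required Hive support for it, while recent versions list if among the built-in SQL functions.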