
How to Concatenate Columns in an Apache Spark DataFrame?

Patricia Arquette
2025-01-18 18:42:13


Concatenating Columns in an Apache Spark DataFrame

In Apache Spark, you can concatenate columns in a DataFrame either with raw SQL or with the concat function added to the DataFrame API in Spark 1.5.0.

Using Raw SQL

To concatenate columns using raw SQL, employ the CONCAT function:

In Python:

df = sqlContext.createDataFrame([("foo", 1), ("bar", 2)], ("k", "v"))
df.registerTempTable("df")
sqlContext.sql("SELECT CONCAT(k, ' ',  v) FROM df")

In Scala:

import sqlContext.implicits._

val df = sc.parallelize(Seq(("foo", 1), ("bar", 2))).toDF("k", "v")
df.registerTempTable("df")
sqlContext.sql("SELECT CONCAT(k, ' ',  v) FROM df")

Using DataFrame API

Since Spark 1.5.0, you can use the concat function with the DataFrame API:

In Python:

from pyspark.sql.functions import concat, col, lit

df.select(concat(col("k"), lit(" "), col("v")))

In Scala:

import org.apache.spark.sql.functions.{concat, lit}

df.select(concat($"k", lit(" "), $"v"))

Using concat_ws

There's also the concat_ws function, which takes a string separator as its first argument:

from pyspark.sql.functions import concat_ws, col

df.select(concat_ws("-", col("k"), col("v")))

