
How to Convert a PySpark String Column to a Date Column?

Barbara Streisand
2024-12-01 11:26:10

Converting PySpark String to Date Format

You have a PySpark DataFrame with a string column in the MM/dd/yyyy format (for example, 11/25/1991), and you need to convert it to a date column.

Solution:

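To convert a PySpark string column to a date column, use the to_date function. In Spark 2.2 and later, to_date accepts a format argument and converts the string column directly. On older versions of Spark (< 2.2), to_date does not take a format, so follow the alternative approach in the next section.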

Alternative Approach for Spark < 2.2:

Use a combination of unix_timestamp and from_unixtime functions:

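from pyspark.sql import SparkSession
from pyspark.sql.functions import unix_timestamp, from_unixtime

# Reuse the active session, or create one when running outside the PySpark shell
spark = SparkSession.builder.getOrCreate()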

# Example DataFrame with string dates
df = spark.createDataFrame(
    [("11/25/1991",), ("11/24/1991",), ("11/30/1991",)],
    ["date_str"]
)

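# Parse the string to epoch seconds, format it back as a timestamp string, then cast to a date
df2 = df.select(
    "date_str",
    from_unixtime(unix_timestamp("date_str", "MM/dd/yyyy")).cast("date").alias("date")
)

This creates a new column named date of DateType, converted from the string column. The cast is needed because from_unixtime on its own returns a formatted string (yyyy-MM-dd HH:mm:ss), not a date.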
