How to Fetch Specific Query Results from an External Database into a Spark DataFrame?

DDD
2024-11-30 16:05:14

Fetching a Query Result from an External Database in Apache Spark 2.0.0

In Apache Spark, you can connect to an external database over JDBC and load its data into a DataFrame through the reader exposed as spark.read. When you point the reader at a table name, Spark loads the entire table. In some scenarios, however, you only need the result of a specific query.

Querying an External Database in PySpark

In PySpark, the "dbtable" option accepts anything that is valid in the FROM clause of a SQL statement, so you can pass a parenthesized subquery with an alias instead of a table name. Spark then loads the result set of that query rather than the entire table.

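from pyspark.sql import SparkSession

# Create (or reuse) a SparkSession
spark = SparkSession\
    .builder\
    .appName("spark play")\
    .getOrCreate()

# "dbtable" holds a parenthesized, aliased subquery instead of a table name;
# "port" in the URL is a placeholder for the port of your MySQL server
df = spark.read\
    .format("jdbc")\
    .option("url", "jdbc:mysql://localhost:port")\
    .option("dbtable", "(SELECT foo, bar FROM schema.tablename) AS tmp")\
    .option("user", "username")\
    .option("password", "password")\
    .load()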

In this example, the subquery (SELECT foo, bar FROM schema.tablename) is executed on the external database, and only its result set is loaded into the Spark DataFrame df. The alias (here tmp) is needed because MySQL requires every derived table to have one.
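The wrapping step is easy to get wrong by hand, for instance by leaving a trailing semicolon inside the parentheses. The following is a minimal sketch of one way to build the reader options from a plain query string. The helper name jdbc_query_options and its parameters are hypothetical and not part of PySpark, and the URL keeps the same "port" placeholder as the example above.

```python
def jdbc_query_options(url, query, user, password, alias="tmp"):
    """Build JDBC reader options that load a query result instead of a table.

    Hypothetical helper for illustration; it is not part of PySpark.
    """
    # A trailing semicolon is not valid inside a subquery, so drop it.
    subquery = query.strip().rstrip(";").strip()
    if not subquery:
        raise ValueError("query must not be empty")
    return {
        "url": url,
        # Parenthesize and alias the query so it can stand in for a table name.
        "dbtable": f"({subquery}) AS {alias}",
        "user": user,
        "password": password,
    }


opts = jdbc_query_options(
    "jdbc:mysql://localhost:port",  # "port" is a placeholder, as in the article
    "SELECT foo, bar FROM schema.tablename;",
    "username",
    "password",
)
print(opts["dbtable"])  # (SELECT foo, bar FROM schema.tablename) AS tmp
```

With a live SparkSession named spark, the resulting dictionary could then be passed to the reader as spark.read.format("jdbc").options(**opts).load().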
