How to Fetch Specific Query Results from an External Database into a Spark DataFrame?
In Apache Spark, you can connect to external databases over JDBC and load data into DataFrames with spark.read. When you point the reader at a database table, Spark retrieves the entire table by default. Often, however, you only need the result of a specific query, such as a few columns or a filtered subset of rows.
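For comparison, a plain table read looks roughly like the sketch below: passing a bare table name as "dbtable" makes Spark pull every row and column of that table. The helper names jdbc_table_options and read_jdbc, the URL, and the credentials are hypothetical placeholders, and a MySQL JDBC driver must be on Spark's classpath for load() to succeed.

```python
from typing import Any, Dict


def jdbc_table_options(url: str, table: str, user: str, password: str) -> Dict[str, str]:
    """Build DataFrameReader options that read a whole table (hypothetical helper)."""
    return {
        "url": url,
        "dbtable": table,  # a bare table name: Spark reads every row and column
        "user": user,
        "password": password,
    }


def read_jdbc(spark: Any, options: Dict[str, str]) -> Any:
    """Load a JDBC source into a DataFrame; `spark` is an existing SparkSession."""
    return spark.read.format("jdbc").options(**options).load()
```

Calling read_jdbc(spark, jdbc_table_options("jdbc:mysql://localhost:3306/mydb", "schema.tablename", "username", "password")) would load the full table; the database name mydb is an assumption for illustration.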
With PySpark, you can pass a parenthesized subquery with an alias as the "dbtable" option. Spark then reads the result set of that query instead of the entire table.
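The parentheses and the alias matter because Spark accepts anything that is valid in a FROM clause as the "dbtable" value and places it there in the SELECT statements it generates. The subquery must therefore be a valid derived table, and MySQL requires every derived table to have an alias. The functions below are a simplified, hypothetical illustration of that substitution, not Spark's actual implementation; the real statements also quote identifiers and may differ between Spark versions.

```python
from typing import List, Optional


def as_dbtable(query: str, alias: str = "tmp") -> str:
    """Wrap a SELECT so it can be passed as the "dbtable" option (hypothetical helper)."""
    # A trailing semicolon is not allowed inside a subquery, so strip it.
    body = query.strip().rstrip(";").rstrip()
    return f"({body}) AS {alias}"


def approximate_scan_sql(dbtable: str, columns: List[str], where: Optional[str] = None) -> str:
    """Simplified shape of the statement Spark sends for a JDBC scan (illustration only)."""
    sql = f"SELECT {', '.join(columns)} FROM {dbtable}"
    if where:
        # Filters Spark pushes down to the database end up in a WHERE clause.
        sql += f" WHERE {where}"
    return sql
```

For example, approximate_scan_sql(as_dbtable("SELECT foo, bar FROM schema.tablename;"), ["foo", "bar"]) yields "SELECT foo, bar FROM (SELECT foo, bar FROM schema.tablename) AS tmp", which shows why an unaliased or unparenthesized query would be rejected by MySQL.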
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("spark play") \
    .getOrCreate()

# Replace port, username and password with your own connection details.
df = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:mysql://localhost:port") \
    .option("dbtable", "(SELECT foo, bar FROM schema.tablename) AS tmp") \
    .option("user", "username") \
    .option("password", "password") \
    .load()
In this example, the query (SELECT foo, bar FROM schema.tablename) runs on the external database, and only its result set, the foo and bar columns, is loaded into the Spark DataFrame df.
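Spark 2.4 and later also accept a "query" option that takes the bare SELECT statement; Spark parenthesizes it and assigns an alias itself. According to the Spark documentation, "query" cannot be set together with "dbtable", nor together with "partitionColumn". A minimal sketch, with hypothetical helper names and placeholder connection details:

```python
from typing import Any, Dict


def jdbc_query_options(url: str, query: str, user: str, password: str) -> Dict[str, str]:
    """Options for reading a query result via the "query" option (hypothetical helper)."""
    return {
        "url": url,
        "query": query,  # bare SELECT: no parentheses or alias needed
        "user": user,
        "password": password,
    }


def read_query(spark: Any, options: Dict[str, str]) -> Any:
    """Load the query result into a DataFrame; `spark` is an existing SparkSession."""
    if "dbtable" in options:
        # Spark rejects "dbtable" and "query" set at the same time; fail early here.
        raise ValueError('specify either "dbtable" or "query", not both')
    return spark.read.format("jdbc").options(**options).load()
```

With an existing session, df = read_query(spark, jdbc_query_options("jdbc:mysql://localhost:port", "SELECT foo, bar FROM schema.tablename", "username", "password")) should give the same DataFrame as the "dbtable" version, without the hand-written alias.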