Fetching Query Results from an External Database in Apache Spark 2.0.0
In Apache Spark 2.0.0, the JDBC data source can load just the result set of a query from an external database instead of the entire table.
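A common PySpark pattern creates a DataFrame df by reading an entire MySQL table through the JDBC data source. To fetch only the results of a specific query, pass the query as the dbtable option of the reader instead of a table name. The dbtable option accepts anything that is valid in the FROM clause of a SQL query, so the query must be wrapped in parentheses and given an alias (MySQL requires every derived table to have one).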
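Because the dbtable value ends up inside a FROM clause, a stray trailing semicolon or a missing alias produces invalid SQL. A minimal sketch of a helper that guards against both is shown here; as_dbtable is a hypothetical name chosen for illustration, not part of PySpark.

```python
def as_dbtable(query: str, alias: str = "tmp") -> str:
    """Wrap a SELECT statement so it can be passed as the JDBC dbtable option.

    The dbtable value is placed in the FROM clause of the SQL Spark generates,
    so the query must be a parenthesized derived table with an alias.
    """
    # Drop surrounding whitespace and any trailing semicolons, which are not
    # allowed inside a subquery.
    body = query.strip().rstrip(";").strip()
    if not body:
        raise ValueError("query must not be empty")
    return f"({body}) AS {alias}"


print(as_dbtable("SELECT foo, bar FROM schema.tablename;"))
# (SELECT foo, bar FROM schema.tablename) AS tmp
```

The result can be passed directly to the reader, for example .option("dbtable", as_dbtable("SELECT foo, bar FROM schema.tablename")).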
The following code demonstrates how to fetch the result set of the query SELECT foo, bar FROM schema.tablename:
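```python
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("spark play") \
    .getOrCreate()

# The MySQL JDBC driver (Connector/J) must be on Spark's classpath.
# Replace localhost:port, username and password with your own connection details.
# The query is wrapped in parentheses and aliased because Spark places the
# dbtable value in the FROM clause of the SQL it sends to MySQL.
df = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:mysql://localhost:port") \
    .option("dbtable", "(SELECT foo, bar FROM schema.tablename) AS tmp") \
    .option("user", "username") \
    .option("password", "password") \
    .load()
```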
With this approach, Spark sends the subquery to the external database as part of the SQL it generates, so the database evaluates the query and only its result rows are loaded into the DataFrame. This reduces data transfer and can improve performance when you need only a subset of a table's columns or rows.
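To make this concrete, the sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, an assumption made only so the example runs without a database server or Spark. It issues two statements of the same shape that Spark's JDBC source sends for a dbtable subquery: a schema probe ending in WHERE 1=0, which returns column names but no rows, and a SELECT over the derived table. The table contents and the extra baz column are hypothetical, and the exact SQL Spark generates (identifier quoting, pushed-down filters) may differ.

```python
import sqlite3

# Hypothetical stand-in for the MySQL table: three columns, of which the
# query only needs foo and bar. baz holds bulky data we do not want to move.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (foo INTEGER, bar TEXT, baz TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?)",
    [(1, "a", "x" * 1000), (2, "b", "y" * 1000), (3, "c", "z" * 1000)],
)

# The value that would be passed as the dbtable option.
dbtable = "(SELECT foo, bar FROM tablename) AS tmp"

# 1. Schema probe: no rows come back, but the result's column names do.
cur = conn.execute(f"SELECT * FROM {dbtable} WHERE 1=0")
columns = [d[0] for d in cur.description]
print(columns)  # ['foo', 'bar']

# 2. Data fetch: the database evaluates the subquery, so baz is never returned.
rows = conn.execute(f"SELECT foo, bar FROM {dbtable}").fetchall()
print(rows)
conn.close()
```

Only the projected columns appear in either result, which is the same reason the Spark read transfers less data than loading the whole table.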