Problem with postgreSQL, trying to connect PySpark on Jupyter Notebook on Docker

This article covers a PostgreSQL connection error that appears when PySpark, running in Jupyter Notebook inside Docker, tries to write to a database installed on the host machine. It presents the original problem and the workaround that resolved it.

Problem content

I get the following error: Py4JJavaError: An error occurred while calling o124.save. : org.postgresql.util.PSQLException: Connection to localhost:5432 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections. It happens when I run the PySpark code below in Jupyter Notebook, with everything running in Docker. PostgreSQL is installed on my local machine (Windows).

from pyspark.sql import SparkSession
from pyspark.sql.functions import explode
import pyspark.sql.functions as f

# Register the PostgreSQL JDBC driver jar with the Spark session.
spark = SparkSession.builder.appName("ETL Pipeline").config("spark.jars", "./postgresql-42.7.1.jar").getOrCreate()
df = spark.read.text("./Data/WordData.txt")

# Split each line on spaces, then emit one row per word and count them.
df2 = df.withColumn("splitedData", f.split("value", " "))
df3 = df2.withColumn("words", explode("splitedData"))
wordsDF = df3.select("words")
wordCount = wordsDF.groupBy("words").count()

driver = "org.postgresql.Driver"
url = "jdbc:postgresql://localhost:5432/local_database"
table = "word_count"
user = "postgres"
password = "12345"

# The save mode is set with .mode(), not passed as a JDBC option.
wordCount.write.format("jdbc") \
    .option("driver", driver) \
    .option("url", url) \
    .option("dbtable", table) \
    .option("user", user) \
    .option("password", password) \
    .mode("append") \
    .save()

spark.stop()

I tried editing postgresql.conf to add "listen_addresses='localhost'" and editing pg_hba.conf to add "host all all 0.0.0.0/0 md5", but neither change worked, so I don't know what to do.
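One way to narrow this down, offered as a sketch rather than a definitive diagnosis: inside a container, localhost refers to the container itself, so a JDBC URL pointing at localhost:5432 does not reach a PostgreSQL server running on the Windows host. The hypothetical helper can_connect below (not part of the original post) attempts a plain TCP connection from the notebook. The host name host.docker.internal is the name Docker Desktop provides for the host machine and is only assumed to be available in this setup.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False
```

Calling can_connect("localhost", 5432) and can_connect("host.docker.internal", 5432) from a notebook cell shows which address, if any, is reachable from the container. Note also that listen_addresses='localhost' makes PostgreSQL listen on the loopback interface only, so connections arriving from a container would still be refused even when the host name is right.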

Workaround

I solved the problem by running PostgreSQL in Docker as well (using the image at https://hub.docker.com/_/postgres/ to create a container just for Postgres) and creating a network between the PySpark container and the PostgreSQL container with this command:

docker network create my_network

This command is for the postgres container:

docker run --name postgres_container --network my_network -e POSTGRES_PASSWORD=12345 -d -p 5432:5432 postgres:latest

This one is for the Jupyter PySpark container:

docker run --name jupyter_container --network my_network -it -p 8888:8888 -v C:\home\work\path:/home/jovyan/work jupyter/pyspark-notebook:latest
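With both containers on my_network, the Spark code has to address PostgreSQL by its container name instead of localhost, because Docker resolves container names on user-defined networks. The sketch below shows one way the write could look. The helper names jdbc_options and write_word_count are hypothetical, the database name postgres is assumed from the image's default, and the table name and credentials are taken from the question and the docker run command above.

```python
def jdbc_options(host, database, table, user, password, port=5432):
    """Build the option dict for Spark's JDBC data source."""
    return {
        "driver": "org.postgresql.Driver",
        "url": f"jdbc:postgresql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
    }


def write_word_count(df, options):
    """Append a Spark DataFrame to the table described by options."""
    df.write.format("jdbc").options(**options).mode("append").save()


# Assumed values: the container name from the docker run command above and
# the default database that the postgres image creates.
options = jdbc_options(
    host="postgres_container",
    database="postgres",
    table="word_count",
    user="postgres",
    password="12345",
)
```

In the notebook this would be called as write_word_count(wordCount, options) before spark.stop().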


Statement
This article is reproduced from Stack Overflow. If there is any infringement, please contact admin@php.cn to have it removed.
