ホームページ >バックエンド開発 >Python チュートリアル >Streamlit を使用してデータ アプリを深く掘り下げる

Streamlit を使用してデータ アプリを深く掘り下げる

Barbara Streisand
Barbara Streisandオリジナル
2024-09-23 08:15:03499ブラウズ

Deep Dive into Data Apps with Streamlit

対話型アプリケーションを迅速に開発および展開できる機能は非常に貴重です。 Streamlit は、データ サイエンティストや開発者が最小限のコードで直感的な Web アプリを作成できるようにする強力なツールです。必須のデータ サイエンス ライブラリがプリロードされている Python Data Science Notebook Docker Image と組み合わせることで、Streamlit アプリを構築するための堅牢な環境のセットアップがこれまでになく簡単になりました。

ストリームリットとは何ですか?

Streamlit は、データ サイエンスおよび機械学習プロジェクト用のインタラクティブな Web アプリケーションの作成プロセスを簡素化するオープンソース Python ライブラリです。 Streamlit を使用すると、すべて純粋な Python を使用して、わずか数分でデータ スクリプトを共有可能な Web アプリに変換できます。フロントエンド開発スキルや、Flask や Django などの Web フレームワークの知識は必要ありません。

Streamlit の主な機能は次のとおりです:

  • 使いやすい: 簡単な API を使用して、数行のコードでアプリを構築します。
  • インタラクティブなウィジェット: スライダー、ボタン、テキスト入力などを組み込んで、アプリをインタラクティブにします。
  • リアルタイム更新: データまたはコードが変更されると、アプリのコンテンツを自動的に更新します。
  • データ視覚化: Matplotlib、Seaborn、Plotly、Altair などのライブラリとシームレスに統合して、豊富な視覚化を実現します。
  • 導入準備完了: Streamlit Cloud、Heraku、AWS などのさまざまなプラットフォームにアプリを簡単に導入できます。

データ アプリに Streamlit を使用する理由

Streamlit には、データ アプリケーションの開発に理想的な選択肢となるいくつかの利点があります。

  • ラピッド プロトタイピング: 基盤となる Web インフラストラクチャを気にせずに、アイデアを機能的なアプリにすばやく変換します。
  • Python 構文: HTML、CSS、または JavaScript を学ぶ必要がなく、既存のスキルを活用して完全に Python でアプリを作成します。
  • インタラクティブなデータ探索: ユーザーがウィジェットを通じてデータを操作できるようにし、データセットやモデル結果の探索を容易にします。
  • コミュニティとサポート: プラグインと拡張機能の充実したエコシステムに貢献する、成長を続けるコミュニティから恩恵を受けましょう。
  • オープンソース: 継続的な開発とサポートを保証しながら、ニーズに合わせてライブラリを変更および拡張します。

Streamlit を使用することで、データ サイエンティストは、結果を視覚化して理解するための対話型ツールを関係者に提供しながら、データ分析とモデル構築に集中できます。

Python データ サイエンス ノートブックの Docker イメージの概要

Python Data Science Notebook Docker Image は、データ サイエンス ワークフローを合理化するために設計された Docker コンテナです。最小限の python:3.9-slim ベース イメージから構築されており、データ操作、機械学習、視覚化、データベース接続など、データ サイエンスのさまざまな側面に対応するプリインストールされたライブラリの包括的なスイートが含まれています。

主な特徴:

  • Jupyter Notebook アクセス: Web ブラウザーを通じて Jupyter Notebook を実行してアクセスし、インタラクティブなコーディング環境を促進します。
  • プリインストールされたライブラリ:
    • データ操作: pandas、numpy、polars、dask、ibis、pyiceberg、datafusion、sqlframe
    • 機械学習: scikit-learn、tensorflow、torch、xgboost、lightgbm
    • 視覚化: matplotlib、seaborn、plotly
    • データベースアクセス: psycopg2-binary、mysqlclient、sqlalchemy、duckdb、pyarrow
    • オブジェクトストレージ: boto3、s3fs、minio
    • ユーティリティ: openpyxl、requests、 beautifulsoup4、lxml、pyspark、dremio-simple-query
  • ユーザー構成: ホーム ディレクトリが /home/pydata に設定され、ユーザー pydata で動作します。作業ディレクトリは /home/pydata/work です。
  • ポート公開: ポート 8888 を公開して、Jupyter Notebook サーバーへのアクセスを許可します。

利点:

  • 一貫性: さまざまなマシンやチームメンバー間で一貫した開発環境を確保します。
  • 分離: 他のプロジェクトとの競合やローカル マシン上の依存関係を回避します。
  • 移植性: 開発環境をシステム間で簡単に移動したり、サーバーに展開したりできます。
  • 拡張可能: 必要に応じてライブラリや構成を追加して、Docker イメージをカスタマイズします。

この Docker イメージを利用すると、必要なツールとライブラリがすべて自由に使えるため、セットアップの時間を節約し、Streamlit アプリケーションの開発に集中できます。

環境のセットアップ

Python Data Science Notebook Docker イメージを使用して Streamlit アプリケーションの構築を開始するには、環境をセットアップする必要があります。これには、Docker のインストール、Docker イメージのプル、コンテナーの実行、Streamlit がインストールされて正しく機能していることの確認が含まれます。

Docker のインストール

Docker がマシンにまだインストールされていない場合は、次の手順に従います。

  1. Docker デスクトップをダウンロード:
  • Windows および macOS: Docker Desktop ダウンロード ページにアクセスし、オペレーティング システムのインストーラーをダウンロードします。
  • Linux: Ubuntu、Debian、Fedora、または特定のディストリビューションの公式 Docker インストール ガイドを参照してください。
  1. Docker をインストールします:
  • インストーラーを実行し、画面上の指示に従います。
  • Linux の場合は、ディストリビューションのインストール ガイドに記載されているコマンドライン手順に従います。
  1. インストールを確認します:

ターミナルまたはコマンド プロンプトを開いて、次のコマンドを実行します。

   docker --version

Docker のバージョン情報が表示され、Docker がインストールされていることを確認できます。

alexmerced/datanotebook Docker イメージのプル

alexmerced/datanotebook Docker イメージには、データ サイエンス ライブラリがプリインストールされた包括的な Python 環境が含まれています。

Docker イメージをプルします。

ターミナルで次を実行します:

docker pull alexmerced/datanotebook

このコマンドは、Docker Hub からローカル マシンにイメージをダウンロードします。

イメージがプルされていることを確認します。

システム上のすべての Docker イメージをリストします:

docker images

alexmerced/datanotebook が画像の中にリストされているのが確認できるはずです。

Jupyter Notebook Access を使用した Docker コンテナの実行

次に、イメージから Docker コンテナを実行し、Jupyter Notebook サーバーにアクセスします。

  • 作業ディレクトリに移動します

  • ターミナルを開き、Jupyter Notebook と Streamlit アプリを配置するディレクトリに移動します。

cd /path/to/your/project

Docker コンテナを実行します。

次のコマンドを実行します:

docker run -p 8888:8888 -p 8501:8501 -v $(pwd):/home/pydata/work alexmerced/datanotebook
  • ポート マッピング: -p 8888:8888 は、コンテナーのポート 8888 をローカル マシンにマッピングし、Jupyter Notebook へのアクセスを許可します。 -p 8501:8501 は、コンテナのポート 8501 をローカル マシンにマッピングし、Streamlit アプリへのアクセスを許可します。
  • ボリュームのマウント: -v $(pwd):/home/pydata/work は、現在のディレクトリをコンテナーにマウントし、ホストとコンテナーの間でファイル共有を有効にします。 ### Jupyter Notebook にアクセス:

Web ブラウザを開いて http://localhost:8888 に移動します。

パスワードやトークンを必要とせずに、Jupyter Notebook インターフェイスが表示されるはずです。

コンテナ内での Streamlit のインストールの確認

Streamlit が Docker コンテナ内にインストールされ、適切に機能していることを確認します。

  • Open a New Terminal in Jupyter Notebook.

  • In the Jupyter interface, click on the New dropdown menu and select Terminal.

  • In the terminal, run:

streamlit --version

If Streamlit is installed, the version number will be displayed.

If not installed, install it using:

pip install streamlit

Create a Test Streamlit App:

In the Jupyter interface, click on New and select Text File.

Save the file as app.py in your working directory.

Add the following code to app.py:

import streamlit as st

st.title("Streamlit Test App")
st.write("Congratulations! Streamlit is working inside the Docker container.")

Save the file.

Run the Streamlit App:

In the Jupyter terminal, execute:

streamlit run app.py --server.enableCORS false --server.enableXsrfProtection false --server.port 8501 --server.address 0.0.0.0

Server Flags Explained:

  • --server.enableCORS false: Disables Cross-Origin Resource Sharing protection.
  • --server.enableXsrfProtection false: Disables Cross-Site Request Forgery protection.
  • --server.port 8501: Runs the app on port 8501.
  • --server.address 0.0.0.0: Makes the server accessible externally.

Access the Streamlit App:

Open a new tab in your web browser and navigate to http://localhost:8501.

You should see the Streamlit app displaying the title and message.

Optional: Keep Streamlit Running in the Background:

To keep the Streamlit app running without occupying the terminal, you can run it in the background using nohup:

nohup streamlit run app.py --server.enableCORS false --server.enableXsrfProtection false --server.port 8501 --server.address 0.0.0.0 &

Exiting the Docker Container

In the Terminal Running the Container:

Press Ctrl + C to stop the container.

Alternatively, Use Docker Commands:

  • List running containers
docker ps

Stop the container using its Container ID:

docker stop <container_id>

Summary

You've successfully set up your environment:

  • Installed Docker (if necessary).
  • Pulled the alexmerced/datanotebook Docker image.
  • Ran the Docker container with Jupyter Notebook access.
  • Verified that Streamlit is installed and operational within the container.

With this setup, you're ready to develop and run Streamlit applications in a consistent and isolated environment, leveraging the powerful tools provided by the Docker image.

Getting Started with Streamlit

With your environment set up, it's time to dive into Streamlit and start building interactive applications. This section will guide you through creating your first Streamlit app, understanding the basic structure of a Streamlit script, and running Streamlit apps from within the Jupyter Notebook provided by the Docker container.

Creating Your First Streamlit App

Let's begin by creating a simple Streamlit application that displays text and a chart.

  1. Create a New Python Script:
  • In the Jupyter Notebook interface, click on New and select Text File.
  • Save the file as app.py in your working directory (/home/pydata/work).
  1. Write the Streamlit Code:

Open app.py and add the following code:

   import streamlit as st
   import pandas as pd
   import numpy as np

   st.title("My First Streamlit App")

   st.write("Welcome to my first Streamlit application!")

   # Create a random dataframe
   df = pd.DataFrame(
       np.random.randn(20, 3),
       columns=['Column A', 'Column B', 'Column C']
   )

   st.write("Here is a random dataframe:")
   st.dataframe(df)

   st.write("Line chart of the data:")
   st.line_chart(df)

Explanation:

  • Imports necessary libraries.
  • Sets the title and writes introductory text.
  • Generates a random DataFrame.
  • Displays the DataFrame and a line chart based on the data.

Save the Script:

  • Ensure that you save app.py after adding the code.

Understanding the Basic Structure of a Streamlit Script

A Streamlit script is a standard Python script with the streamlit library functions to create interactive elements.

  • Import Streamlit:
import streamlit as st

Set the Title and Headers:

st.title("App Title")
st.header("This is a header")
st.subheader("This is a subheader")

Write Text:

st.text("This is a simple text.")
st.markdown("This is a text with **markdown** formatting.")

Display Data:

st.dataframe(df)  # Displays an interactive table
st.table(df)      # Displays a static table

Display Charts:

st.line_chart(data)
st.bar_chart(data)
st.area_chart(data)

Add Interactive Widgets:

name = st.text_input("Enter your name:")
st.write(f"Hello, {name}!")

age = st.slider("Select your age:", 0, 100)
st.write(f"You are {age} years old.")

Layout Elements:

with st.sidebar:
    st.write("This is the sidebar.")

col1, col2 = st.columns(2)
col1.write("Content in column 1")
col2.write("Content in column 2")

Running Streamlit Apps from Within the Jupyter Notebook

To run your Streamlit app within the Docker container and access it from your host machine:

Open a Terminal in Jupyter Notebook:

In the Jupyter interface, click on New and select Terminal.

Navigate to the Working Directory:

cd /home/pydata/work

Run the Streamlit App:

Execute the following command:

streamlit run app.py --server.enableCORS false --server.enableXsrfProtection false --server.port 8501 --server.address 0.0.0.0

Explanation of Flags:

  • --server.enableCORS false: Disables Cross-Origin Resource Sharing protection.
  • --server.enableXsrfProtection false: Disables Cross-Site Request Forgery protection.
  • --server.port 8501: Sets the port to 8501.
  • --server.address 0.0.0.0: Makes the app accessible externally.

Access the Streamlit App:

  • Open your web browser and navigate to http://localhost:8501.

You should see your Streamlit app running.
Interact with the App:

  • Modify app.py to add more features or interactive elements.
  • Save the changes, and the app will automatically reload in the browser.

Tips for Running Streamlit in Docker

Expose the Correct Port:

When running the Docker container, ensure you expose the port used by Streamlit. If you use port 8501, run the container with:

docker run -p 8888:8888 -p 8501:8501 -v $(pwd):/home/pydata/work alexmerced/datanotebook

Running Multiple Apps:

Use different ports for each app and expose them accordingly.

Background Execution:

To run the Streamlit app without tying up the terminal, use:

nohup streamlit run app.py --server.enableCORS false --server.enableXsrfProtection false --server.port 8501 --server.address 0.0.0.0 &

This runs the app in the background and outputs logs to nohup.out.

Summary

In this section, you:

  • Created your first Streamlit app using the pre-configured Docker environment.
  • Learned about the basic structure and components of a Streamlit script.
  • Ran the Streamlit app from within the Jupyter Notebook environment.
  • Accessed and interacted with the app via your web browser.

With these foundational skills, you're ready to explore more advanced features of Streamlit to build sophisticated data applications.

Building Interactive Data Visualizations

Data visualization is a crucial aspect of data analysis and communication. Streamlit simplifies the process of creating interactive and dynamic visualizations that can help users explore and understand data more effectively. In this section, we'll explore how to use Streamlit's built-in functions and integrate popular visualization libraries to build interactive data visualizations.

Using Streamlit's Built-in Chart Functions

Streamlit provides easy-to-use functions for creating basic charts directly from data structures like Pandas DataFrames and NumPy arrays.

Line Chart

import streamlit as st
import pandas as pd
import numpy as np

# Generate random data
data = np.random.randn(100, 3)
columns = ['Feature A', 'Feature B', 'Feature C']
df = pd.DataFrame(data, columns=columns)

# Display line chart
st.line_chart(df)

Explanation: The st.line_chart() function takes a DataFrame or array-like object and renders an interactive line chart.

Bar Chart

# Display bar chart
st.bar_chart(df)

Explanation: st.bar_chart() displays a bar chart. It's useful for categorical data or comparing different groups.

Area Chart

# Display area chart
st.area_chart(df)

Explanation: st.area_chart() creates an area chart, which is similar to a line chart but with the area below the line filled.

Customizing Charts with Altair

For more advanced visualizations, Streamlit supports libraries like Altair, which provides a declarative statistical visualization library for Python.

Creating an Altair Chart

import altair as alt

# Create an Altair chart
chart = alt.Chart(df.reset_index()).mark_circle(size=60).encode(
    x='index',
    y='Feature A',
    color='Feature B',
    tooltip=['Feature A', 'Feature B', 'Feature C']
).interactive()

st.altair_chart(chart, use_container_width=True)

Explanation: This code creates an interactive scatter plot using Altair, where you can hover over points to see tooltips.

Interactive Widgets for User Input

Streamlit allows you to add widgets that enable users to interact with your visualizations.

Adding a Slider

# Slider to select number of data points
num_points = st.slider('Select number of data points', min_value=10, max_value=100, value=50)

# Generate data based on slider
data = np.random.randn(num_points, 3)
df = pd.DataFrame(data, columns=columns)

# Display updated chart
st.line_chart(df)

Explanation: The slider widget allows users to select the number of data points, and the chart updates accordingly.

Selectbox for Options

# Selectbox to choose a feature
feature = st.selectbox('Select a feature to display', columns)

# Display the selected feature
st.line_chart(df[feature])

Explanation: The selectbox lets users choose which feature to visualize.

Integrating Plotly for Advanced Visualizations

Plotly is another powerful library for creating interactive graphs.

Plotly Example

import plotly.express as px

# Create a Plotly figure
fig = px.scatter(df, x='Feature A', y='Feature B', size='Feature C', color='Feature C', hover_name='Feature C')

# Display the Plotly figure in Streamlit
st.plotly_chart(fig, use_container_width=True)

Explanation: This code creates an interactive scatter plot with Plotly, which includes zooming, panning, and tooltips.

Combining Widgets and Visualizations

You can combine multiple widgets and charts to create a rich interactive experience.

Example: Interactive Data Filtering

# Multiselect to choose features
selected_features = st.multiselect('Select features to visualize', columns, default=columns)

# Checkbox to toggle data normalization
normalize = st.checkbox('Normalize data')

# Process data based on user input
if normalize:
    df_normalized = (df - df.mean()) / df.std()
    data_to_plot = df_normalized[selected_features]
else:
    data_to_plot = df[selected_features]

# Display line chart of selected features
st.line_chart(data_to_plot)

Explanation: Users can select which features to visualize and whether to normalize the data, and the chart updates accordingly.

Best Practices for Interactive Visualizations

  • Limit Data Size: Large datasets can slow down your app. Consider sampling or aggregating data.
  • Use Caching: Use @st.cache_data decorator to cache data loading and computation functions.
  • Provide Instructions: Use st.markdown() or st.write() to guide users on how to interact with your app.
  • Optimize Layout: Organize widgets and charts using columns and expanders for a clean interface.

Example of Layout Optimization

# Create columns
col1, col2 = st.columns(2)

with col1:
    st.header('User Inputs')
    # Add widgets here
    num_points = st.slider('Number of points', 10, 100, 50)
    feature = st.selectbox('Feature', columns)

with col2:
    st.header('Visualization')
    # Generate and display chart
    data = np.random.randn(num_points, len(columns))
    df = pd.DataFrame(data, columns=columns)
    st.line_chart(df[feature])

Explanation: This layout separates user inputs and visualizations into two columns, making the app more organized.

Summary

In this section, you've learned how to:

  • Use Streamlit's built-in chart functions to create quick visualizations.
  • Customize charts using Altair and Plotly for more advanced visualizations.
  • Add interactive widgets like sliders and selectboxes to make your visualizations dynamic.
  • Combine widgets and charts to build a user-friendly data exploration tool.

By leveraging these features, you can create powerful interactive applications that make data exploration and analysis more accessible to your audience.

Advanced Streamlit Features

As you become more familiar with Streamlit, you'll discover a wealth of advanced features that allow you to build more sophisticated and powerful applications. In this section, we'll delve into some of these capabilities, including state management, dynamic content creation, file handling, and performance optimization through caching.

State Management with st.session_state

Streamlit runs your script from top to bottom every time a user interacts with a widget. To maintain state across these reruns, you can use st.session_state, which is a dictionary-like object that persists throughout the user's session.

Example: Counter Application

import streamlit as st

# Initialize counter in session state
if 'counter' not in st.session_state:
    st.session_state.counter = 0

# Increment counter on button click
if st.button('Increment'):
    st.session_state.counter += 1

st.write(f"Counter value: {st.session_state.counter}")

Explanation: The counter value is stored in st.session_state.counter, ensuring it persists across interactions.

Dynamic Content with st.expander and st.tabs

Streamlit provides layout elements to organize content and improve user experience.

Using st.expander

import streamlit as st

st.write("This is visible content")

with st.expander("Click to expand"):
    st.write("This content is hidden by default")

Explanation: st.expander creates a collapsible section that users can expand or collapse.

Using st.tabs

import streamlit as st

tab1, tab2 = st.tabs(["Tab 1", "Tab 2"])

with tab1:
    st.write("Content in Tab 1")

with tab2:
    st.write("Content in Tab 2")

Explanation: st.tabs allows you to organize content into tabs for better navigation.

Uploading and Handling Files with st.file_uploader

Allow users to upload files directly into your app for processing.

Example: CSV File Uploader

import streamlit as st
import pandas as pd

uploaded_file = st.file_uploader("Choose a CSV file", type="csv")

if uploaded_file is not None:
    df = pd.read_csv(uploaded_file)
    st.write("Uploaded Data:")
    st.dataframe(df)

Explanation: Users can upload a CSV file, which the app reads and displays as a DataFrame.

Caching with @st.cache_data for Performance Optimization

Heavy computations or data loading can slow down your app. Use caching to store results and avoid redundant processing.

Using @st.cache_data

import streamlit as st
import pandas as pd

@st.cache_data
def load_data(url):
    return pd.read_csv(url)

data_url = 'https://path-to-large-dataset.csv'
df = load_data(data_url)
st.write("Data loaded successfully")
st.dataframe(df.head())

Explanation: The @st.cache_data decorator caches the load_data function's output, improving performance on subsequent runs.

Customizing the App Layout

Enhance user experience by customizing your app's layout and appearance.

Setting Page Configuration

import streamlit as st

st.set_page_config(
    page_title="Advanced Streamlit Features",
    page_icon="?",
    layout="wide",
    initial_sidebar_state="expanded",
)

Explanation: st.set_page_config sets global configurations like the page title, icon, layout, and sidebar state.

Using Columns and Containers

import streamlit as st

col1, col2 = st.columns(2)

with col1:
    st.header("Column 1")
    st.write("Content for the first column")

with col2:
    st.header("Column 2")
    st.write("Content for the second column")

Explanation: Columns help organize content side by side.

Theming and Styling

Apply custom themes to match your app's branding or preferred aesthetics.

Applying a Custom Theme

Create a .streamlit/config.toml file in your app directory with the following content:

[theme]
primaryColor="#d33682"
backgroundColor="#002b36"
secondaryBackgroundColor="#586e75"
textColor="#ffffff"
font="sans serif"

Explanation: The theme settings adjust the app's color scheme and font.

Interactive Widgets for Advanced User Input

Streamlit offers a variety of widgets for complex user interactions.

Date Input and Time Input

import streamlit as st

date = st.date_input("Select a date")
time = st.time_input("Select a time")

st.write(f"You selected {date} at {time}")

Explanation: Allows users to input dates and times.

Color Picker

import streamlit as st

color = st.color_picker('Pick A Color', '#00f900')
st.write('The current color is', color)

Explanation: Users can select a color, which can be used in visualizations or styling.
Advanced Callbacks and Event Handling
Respond to user interactions with callbacks.

Using Button Callbacks

import streamlit as st

def on_button_click():
    st.write("Button was clicked!")

st.button("Click Me", on_click=on_button_click)

Explanation: The on_click parameter specifies a function to execute when the button is clicked.

Integrating with External APIs

Fetch and display data from external sources.

Example: Fetching Data from an API

import streamlit as st
import requests

st.write("Fetch data from an API")

response = requests.get('https://api.example.com/data')
if response.status_code == 200:
    data = response.json()
    st.write(data)
else:
    st.error("Failed to fetch data")

Explanation: Uses the requests library to fetch data from an API and display it.

Real-Time Data Updates with WebSockets

Streamlit supports bi-directional communication for real-time updates.

Using st.experimental_get_query_params

import streamlit as st

params = st.experimental_get_query_params()
st.write("Query parameters:", params)

Explanation: Access query parameters from the URL to control app behavior dynamically.

Modularizing Code with Components

Break down your app into reusable components.

Creating a Custom Component

# components.py
import streamlit as st

def display_header():
    st.title("Advanced Streamlit Features")
    st.write("This is a custom component")

# main app
import streamlit as st
from components import display_header

display_header()
st.write("Main app content")

Explanation: Organize code by splitting it into modules for better maintainability.

Localization and Internationalization

Make your app accessible to a global audience.

Setting the Language

import streamlit as st

st.write("Hello, World!")

# Use gettext or other localization libraries for translations

Explanation: While Streamlit doesn't provide built-in localization, you can use Python's localization libraries.

Accessibility Features

Ensure your app is usable by people with disabilities.

  • Use Semantic HTML: Streamlit automatically generates accessible HTML elements.

  • Provide Alt Text: When displaying images, use the caption parameter.

st.image('image.png', caption='Descriptive text')

Summary

In this section, we've explored several advanced features of Streamlit that empower you to build more interactive and efficient applications:

  • State Management: Use st.session_state to preserve data across user interactions.
  • Dynamic Layouts: Organize content with expanders, tabs, columns, and containers.
  • File Handling: Allow users to upload and interact with files directly in the app.
  • Performance Optimization: Improve app speed with caching decorators like @st.cache_data.
  • Customization: Enhance the look and feel with custom themes and page configurations.
  • Advanced Widgets: Utilize a variety of input widgets for richer user interactions.
  • External Integrations: Connect your app to external APIs and services.
  • Code Organization: Modularize your code for better readability and maintenance.
  • Global Reach: Consider localization and accessibility to reach a wider audience.

By mastering these advanced features, you can create sophisticated Streamlit applications that provide a seamless and engaging user experience.

Integrating Machine Learning Models

Streamlit excels at making machine learning models accessible through interactive web applications. In this section, we'll explore how to integrate machine learning models into your Streamlit apps using the pre-installed libraries in the Python Data Science Notebook Docker Image, such as TensorFlow, PyTorch, and scikit-learn.

Loading Pre-trained Models with TensorFlow and PyTorch

The Docker image comes with TensorFlow and PyTorch installed, allowing you to work with complex neural network models.

Using TensorFlow

Loading a Pre-trained Model

import streamlit as st
import tensorflow as tf

# Load a pre-trained model, e.g., MobileNetV2
model = tf.keras.applications.MobileNetV2(weights='imagenet')

st.write("Model loaded successfully.")

Making Predictions

from tensorflow.keras.applications.mobilenet_v2 import preprocess_input, decode_predictions
from PIL import Image
import numpy as np

uploaded_file = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])

if uploaded_file is not None:
    # Load and preprocess the image
    image = Image.open(uploaded_file)
    st.image(image, caption='Uploaded Image', use_column_width=True)

    img = image.resize((224, 224))
    img_array = np.array(img)
    img_array = preprocess_input(img_array)
    img_array = np.expand_dims(img_array, axis=0)

    # Make prediction
    predictions = model.predict(img_array)
    results = decode_predictions(predictions, top=3)[0]

    # Display predictions
    st.write("Top Predictions:")
    for i, res in enumerate(results):
        st.write(f"{i+1}. {res[1]}: {round(res[2]*100, 2)}%")

Explanation: Users can upload an image, and the app displays the top predictions from the pre-trained MobileNetV2 model.

Using PyTorch

Loading a Pre-trained Model

import streamlit as st
import torch
from torchvision import models, transforms

# Load a pre-trained ResNet model
model = models.resnet18(pretrained=True)
model.eval()

st.write("PyTorch model loaded successfully.")

Making Predictions

from PIL import Image
import torchvision.transforms as T

uploaded_file = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])

if uploaded_file is not None:
    # Load and preprocess the image
    image = Image.open(uploaded_file)
    st.image(image, caption='Uploaded Image', use_column_width=True)

    preprocess = T.Compose([
        T.Resize(256),
        T.CenterCrop(224),
        T.ToTensor(),
        T.Normalize(
            mean=[0.485, 0.456, 0.406], 
            std=[0.229, 0.224, 0.225]
        )
    ])
    img_t = preprocess(image)
    batch_t = torch.unsqueeze(img_t, 0)

    # Make prediction
    with torch.no_grad():
        out = model(batch_t)
    probabilities = torch.nn.functional.softmax(out[0], dim=0)

    # Load labels
    with open("imagenet_classes.txt") as f:
        labels = [line.strip() for line in f.readlines()]

    # Show top 3 predictions
    top3_prob, top3_catid = torch.topk(probabilities, 3)
    st.write("Top Predictions:")
    for i in range(top3_prob.size(0)):
        st.write(f"{i+1}. {labels[top3_catid[i]]}: {round(top3_prob[i].item()*100, 2)}%")

Note: Ensure that the imagenet_classes.txt file is available in your working directory.

Building a Simple Prediction App with scikit-learn

Let's build a simple regression app using scikit-learn.

Training a Model

import streamlit as st
from sklearn.datasets import load_boston
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Load dataset
data = load_boston()
X = data.data
y = data.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Train model
model = RandomForestRegressor()
model.fit(X_train, y_train)

st.write("Model trained successfully.")

Making Predictions with User Input

import numpy as np

st.header("Boston Housing Price Prediction")

# Feature input sliders
CRIM = st.number_input('Per capita crime rate by town', min_value=0.0, value=0.1)
ZN = st.number_input('Proportion of residential land zoned for lots over 25,000 sq.ft.', min_value=0.0, value=0.0)
# ... add inputs for other features

# For brevity, we'll use default values for the rest of the features
input_features = np.array([[CRIM, ZN] + [0]*(X.shape[1]-2)])

# Predict
prediction = model.predict(input_features)
st.write(f"Predicted median value of owner-occupied homes: ${prediction[0]*1000:.2f}")

Explanation: Users can input values for features, and the app predicts housing prices.

Visualizing Model Outputs and Performance Metrics

Visualizations help in understanding model performance.

Displaying Metrics

from sklearn.metrics import mean_squared_error, r2_score

# Predict on test set
y_pred = model.predict(X_test)

# Calculate metrics
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# Display metrics
st.write("Model Performance on Test Set:")
st.write(f"Mean Squared Error: {mse:.2f}")
st.write(f"R² Score: {r2:.2f}")

Plotting Actual vs. Predicted Values

import pandas as pd

df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})

st.write("Actual vs. Predicted Values")
st.line_chart(df)

Explanation: The line chart shows how closely the model's predictions match the actual values.

Tips for Integrating Machine Learning Models in Streamlit

Model Serialization: For complex models, consider saving and loading models using joblib or pickle to avoid retraining every time.

import joblib

# Save model
joblib.dump(model, 'model.joblib')

# Load model
model = joblib.load('model.joblib')
Use Caching for Models: Cache the model loading or training functions to improve performance.

python
Copy code
@st.cache_resource
def load_model():
    # Load or train model
    return model

Handle Large Models: Be mindful of resource limitations. Use efficient data structures and consider offloading heavy computations.

Provide Clear Instructions: Guide users on how to interact with the app, especially when expecting specific input formats.

Summary

In this section, you've learned how to:

  • Load and use pre-trained models with TensorFlow and PyTorch in your Streamlit apps.
  • Build a simple prediction app using scikit-learn.
  • Collect user input to make predictions and display results.
  • Visualize model outputs and performance metrics to evaluate model effectiveness.

By integrating machine learning models into your Streamlit applications, you can create powerful tools that make complex models accessible to end-users in an interactive and user-friendly manner.

Database Connectivity

In many data science projects, interacting with databases is essential for retrieving, processing, and storing data. Streamlit, combined with the powerful libraries included in the Python Data Science Notebook Docker Image, makes it straightforward to connect to various databases and integrate them into your applications. In this section, we'll explore how to connect to databases using sqlalchemy, psycopg2, and specifically how to interact with Dremio using the dremio-simple-query library.

Connecting to Dremio Using dremio-simple-query

Dremio is a data lakehouse platform that enables you to govern, join, and accelerate queries across various data sources such as Iceberg, Delta Lake, S3, JSON, CSV, RDBMS, and more. The dremio-simple-query library simplifies querying a Dremio source using Apache Arrow Flight, providing performant data retrieval for analytics.

Installing the dremio-simple-query Library

First, ensure that the dremio-simple-query library is installed in your environment:

pip install dremio-simple-query

Setting Up the Connection to Dremio

To connect to Dremio, you'll need to obtain your Dremio Arrow Flight endpoint and an authentication token.

Obtaining the Arrow Flight Endpoint

  • Dremio Cloud (NA): grpc+tls://data.dremio.cloud:443
  • Dremio Cloud (EU): grpc+tls://data.eu.dremio.cloud:443
  • Dremio Software (SSL): grpc+tls://:32010
  • Dremio Software (No SSL): grpc://:32010

Getting Your Authentication Token

  • Dremio Cloud: Obtain the token from the Dremio interface or via the REST API.
  • Dremio Software: Obtain the token using the REST API.

You can use the get_token function from the dremio-simple-query library to retrieve the token programmatically.

Connecting to Dremio

import streamlit as st
from dremio_simple_query.connect import get_token, DremioConnection
from os import getenv
from dotenv import load_dotenv

# Load environment variables from a .env file (optional)
load_dotenv()

# Retrieve Dremio credentials and endpoints
username = st.secrets["dremio_username"]
password = st.secrets["dremio_password"]
arrow_endpoint = st.secrets["dremio_arrow_endpoint"]  # e.g., "grpc+tls://data.dremio.cloud:443"
login_endpoint = st.secrets["dremio_login_endpoint"]  # e.g., "https://your-dremio-server:9047/apiv2/login"

# Get authentication token
payload = {
    "userName": username,
    "password": password
}
token = get_token(uri=login_endpoint, payload=payload)

# Establish connection to Dremio
dremio = DremioConnection(token, arrow_endpoint)

# Test the connection
try:
    st.success("Successfully connected to Dremio.")
except Exception as e:
    st.error(f"Failed to connect to Dremio: {e}")

Note: Ensure that you securely manage your credentials using Streamlit's secrets management or environment variables.

Querying Data from Dremio

You can now query data from Dremio and retrieve it in various formats.

Retrieving Data as an Arrow Table

# Query data and get a FlightStreamReader object
stream = dremio.toArrow("SELECT * FROM your_space.your_table LIMIT 100")

# Convert the stream to an Arrow Table
arrow_table = stream.read_all()

# Optionally, display the data in Streamlit
df = arrow_table.to_pandas()
st.write("Data from Dremio:")
st.dataframe(df)

Retrieving Data as a Pandas DataFrame

# Directly get a Pandas DataFrame
df = dremio.toPandas("SELECT * FROM your_space.your_table LIMIT 100")
st.write("Data from Dremio:")
st.dataframe(df)

Retrieving Data as a Polars DataFrame

# Get a Polars DataFrame
df_polars = dremio.toPolars("SELECT * FROM your_space.your_table LIMIT 100")
st.write("Data from Dremio (Polars DataFrame):")
st.write(df_polars)

Querying with DuckDB

You can leverage DuckDB for in-memory analytics on the data retrieved from Dremio.

Using the DuckDB Relation API

# Retrieve data as a DuckDB relation
duck_rel = dremio.toDuckDB("SELECT * FROM your_space.your_table LIMIT 100")

# Perform queries on the DuckDB relation
result = duck_rel.filter("column_name > 50").df()

# Display the result
st.write("Filtered Data using DuckDB:")
st.dataframe(result)

Querying Arrow Objects with DuckDB

Alternatively, you can query Arrow Tables using DuckDB:

import duckdb

# Get data from Dremio as an Arrow Table
stream = dremio.toArrow("SELECT * FROM your_space.your_table LIMIT 100")
arrow_table = stream.read_all()

# Create a DuckDB connection
con = duckdb.connect()

# Register the Arrow Table with DuckDB
con.register("dremio_table", arrow_table)

# Perform SQL queries using DuckDB
query = "SELECT * FROM dremio_table WHERE column_name > 50"
result = con.execute(query).fetch_df()

# Display the result
st.write("Filtered Data using DuckDB on Arrow Table:")
st.dataframe(result)

Best Practices for Using Dremio with Streamlit

  • Secure Credentials: Always handle your Dremio credentials securely. Use Streamlit's secrets management or environment variables to avoid hardcoding sensitive information.
  • Efficient Data Retrieval: Optimize your SQL queries to retrieve only the necessary data. Use LIMIT clauses and filters to reduce data transfer and improve performance.
  • Error Handling: Implement try-except blocks to manage exceptions and provide informative error messages to users.
  • Environment Configuration: Ensure that your arrow_endpoint and login_endpoint are correctly configured based on your Dremio deployment (Cloud or Software, with or without SSL).

Connecting to Databases Using sqlalchemy and psycopg2

In addition to Dremio, you might need to connect to other databases like PostgreSQL or MySQL. The Docker image comes with sqlalchemy, psycopg2-binary, and other database drivers pre-installed.

Setting Up a Connection to a PostgreSQL Database

from sqlalchemy import create_engine
import pandas as pd

# Database connection parameters
DB_USER = st.secrets["db_user"]
DB_PASSWORD = st.secrets["db_password"]
DB_HOST = st.secrets["db_host"]
DB_PORT = st.secrets["db_port"]
DB_NAME = st.secrets["db_name"]

# Create a database engine
engine = create_engine(f'postgresql+psycopg2://{DB_USER}:{DB_PASSWORD}@{DB_HOST}:{DB_PORT}/{DB_NAME}')

# Test the connection
try:
    with engine.connect() as connection:
        st.success("Successfully connected to the PostgreSQL database.")
except Exception as e:
    st.error(f"Failed to connect to the database: {e}")

Querying Data from the Database

# Sample query
query = "SELECT * FROM your_table LIMIT 10"

# Execute the query and load data into a DataFrame
df = pd.read_sql(query, engine)

# Display the data
st.write("Data from PostgreSQL:")
st.dataframe(df)

Handling Large Datasets with Dask

When dealing with large datasets, performance can become an issue. Dask is a parallel computing library that integrates with Pandas to handle larger-than-memory datasets efficiently.

Using Dask to Query Large Tables

Copy code
import dask.dataframe as dd

# Read data from SQL using Dask
df = dd.read_sql_table(
    table='large_table',
    uri=f'postgresql+psycopg2://{DB_USER}:{DB_PASSWORD}@{DB_HOST}:{DB_PORT}/{DB_NAME}',
    index_col='id'
)

# Perform computations with Dask DataFrame
filtered_df = df[df['value'] > 100]

# Compute the result and convert to Pandas DataFrame
result = filtered_df.compute()

# Display the result
st.write("Filtered Data:")
st.dataframe(result)

Best Practices for Database Connectivity

  • Secure Credentials: Use Streamlit's secrets management or environment variables to store sensitive information.
  • Parameterized Queries: Always use parameterized queries to prevent SQL injection.
  • Connection Management: Use context managers (with statements) to ensure connections are properly closed.
  • Error Handling: Implement try-except blocks to handle exceptions and provide user-friendly error messages.
  • Limit Data Fetching: When displaying data in the app, limit the number of rows fetched to prevent performance issues.

Summary

In this section, you've learned how to:

  • Connect to Dremio using the dremio-simple-query library and retrieve data efficiently using Apache Arrow Flight.
  • Query data from Dremio and convert it into various formats such as Arrow Tables, Pandas DataFrames, Polars DataFrames, and DuckDB relations.
  • Utilize DuckDB for in-memory analytics on data retrieved from Dremio.
  • Connect to other databases like PostgreSQL using sqlalchemy and psycopg2.
  • Handle large datasets efficiently using Dask. Implement best practices for secure and efficient database connectivity.

By integrating Dremio and other data systems into your Streamlit applications, you can create powerful data-driven apps that interact with live data sources, enabling real-time analysis and insights.

Deploying Streamlit Apps

With your Streamlit app developed and tested within the Docker environment, the next step is to deploy it so that others can access and use it. Deploying Streamlit apps can be done in several ways, including running the app locally, containerizing it with Docker, and deploying it to cloud platforms like Streamlit Community Cloud, Heroku, AWS, or other hosting services.

In this section, we'll explore how to:

  • Run your Streamlit app outside of Jupyter Notebook
  • Containerize your Streamlit app with Docker
  • Deploy your app to cloud platforms

Running Streamlit Apps Outside of Jupyter Notebook

While developing within Jupyter Notebook is convenient, deploying your app typically involves running it as a standalone script.

Steps to Run the App Locally

  1. Ensure Streamlit is Installed

If you followed the previous sections, Streamlit should already be installed in your Docker container. If not, install it using:

   pip install streamlit

Exit the Jupyter Notebook Environment

  • Stop the Jupyter Notebook server if it's still running.

  • Navigate to Your App Directory

  • Open a terminal and navigate to the directory containing your app.py file:

cd /home/pydata/work

Run the Streamlit App

Execute the following command:

streamlit run app.py

This command starts the Streamlit server and serves your app at http://localhost:8501 by default.

Access the App in Your Browser

Open your web browser and navigate to http://localhost:8501 to interact with your app.

Containerizing Your Streamlit App with Docker

Containerizing your app ensures consistency across different environments and simplifies deployment.

Creating a Dockerfile for Your Streamlit App

  • Create a Dockerfile

  • In your app directory, create a file named Dockerfile with the following content:

# Use the official Python image as base
FROM python:3.9-slim

# Set the working directory
WORKDIR /app

# Copy the requirements file
COPY requirements.txt .

# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the application code
COPY . .

# Expose the port Streamlit uses
EXPOSE 8501

# Run the Streamlit app
CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]

Create a requirements.txt File

List all your Python dependencies in a file named requirements.txt:

streamlit
pandas
numpy
# Add any other dependencies your app requires

Build the Docker Image

In your terminal, run:

docker build -t my-streamlit-app .

This builds the Docker image and tags it as my-streamlit-app.

Run the Docker Container

docker run -p 8501:8501 my-streamlit-app

Maps port 8501 in the container to port 8501 on your host machine.

Access the App

Open your web browser and navigate to http://localhost:8501.

Pushing the Docker Image to a Registry (Optional)

If you plan to deploy your app using Docker images, you may need to push it to a Docker registry like Docker Hub or a private registry.

# Tag the image for Docker Hub
docker tag my-streamlit-app your-dockerhub-username/my-streamlit-app


# Log in to Docker Hub
docker login

# Push the image
docker push your-dockerhub-username/my-streamlit-app

Deploying to Cloud Platforms

There are several cloud platforms that support deploying Streamlit apps. Below, we'll cover deploying to Streamlit Community Cloud, Heroku, and AWS Elastic Beanstalk.

Deploying to Streamlit Community Cloud

Streamlit offers a free hosting service for public GitHub repositories.

  • Push Your App to GitHub

  • Ensure your app code is in a GitHub repository.

  • Sign Up for Streamlit Community Cloud

  • Go to streamlit.io/cloud and sign up using your GitHub account.

  • Deploy Your App

  • Click on "New app".

  • Select your GitHub repository and branch.

  • Specify the location of your app.py file.

  • Click "Deploy".

  • Access Your App

Once deployed, you'll receive a URL where your app is hosted.

Deploying to Heroku

Heroku is a cloud platform that supports deploying applications using Docker.

Create a Procfile

In your app directory, create a file named Procfile with the following content:

web: streamlit run app.py --server.port=$PORT --server.address=0.0.0.0
Create a requirements.txt File
  • Ensure you have a requirements.txt file listing your dependencies.

  • Initialize a Git Repository

If you haven't already:

git init
git add .
git commit -m "Initial commit"
Create a Heroku App

Install the Heroku CLI and log in:

heroku login

Create a new app:

heroku create your-app-name

Deploy Your App

Push your code to Heroku:

git push heroku master

Scale the Web Process

heroku ps:scale web=1

Access Your App

heroku open

Deploying to AWS Elastic Beanstalk

AWS Elastic Beanstalk supports deploying applications in Docker containers.

  • Install the AWS Elastic Beanstalk CLI

  • Follow the official AWS documentation to install the EB CLI.

  • Initialize Elastic Beanstalk

eb init -p docker my-streamlit-app
Create an Environment
eb create my-streamlit-env
Deploy Your App
eb deploy
Access Your App
eb open

Deploying with Other Services

You can deploy your Streamlit app using other platforms like:

  • Google Cloud Run: For serverless container deployments.
  • Azure App Service: For deploying web apps on Azure.
  • Kubernetes: For scalable and managed deployments.
  • Docker Compose: For multi-container applications.

Example: Deploying to Google Cloud Run

Build and Push the Docker Image to Google Container Registry

# Build the Docker image
docker build -t gcr.io/your-project-id/my-streamlit-app .

# Push the image
docker push gcr.io/your-project-id/my-streamlit-app
Deploy to Cloud Run
gcloud run deploy my-streamlit-app \
  --image gcr.io/your-project-id/my-streamlit-app \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated

Best Practices for Deployment

  • Environment Variables: Use environment variables to manage secrets and configuration settings.
  • Logging: Implement logging to monitor your app's performance and errors.
  • Security: Ensure your app is secure by handling user input appropriately and securing API keys.
  • Scalability: Choose deployment options that allow your app to scale with user demand.
  • Continuous Integration/Continuous Deployment (CI/CD): Set up CI/CD pipelines to automate the deployment process.

Managing Secrets and Configuration

Use environment variables to store sensitive information:

import os

API_KEY = os.getenv("API_KEY")

Set the environment variable in your deployment platform's settings or configuration.

Implementing Logging

Use Python's built-in logging library:

import logging

logging.basicConfig(level=logging.INFO)
logging.info("This is an info message")

Handling User Input Securely

Validate and sanitize all user inputs to prevent security vulnerabilities like injection attacks.

Summary

In this section, you've learned how to:

  • Run your Streamlit app outside of the development environment
  • Containerize your app using Docker for consistent deployments
  • Deploy your app to cloud platforms like Streamlit Community Cloud, Heroku, and AWS Elastic Beanstalk
  • Apply best practices for deploying and maintaining your Streamlit applications

By deploying your Streamlit app, you make it accessible to a wider audience, allowing others to benefit from your interactive data applications.

Best Practices and Tips

Developing Streamlit applications involves not just coding but also adhering to best practices that ensure your app is efficient, maintainable, and user-friendly. In this section, we'll cover some essential tips and best practices to help you optimize your Streamlit apps.

Organizing Your Streamlit Codebase

A well-organized codebase enhances readability and maintainability, especially as your application grows in complexity.

Use Modular Code Structure

  • Separate Concerns: Break down your code into modules or scripts based on functionality, such as data loading, preprocessing, visualization, and utility functions.
  • Create a components Module: Encapsulate reusable UI components in a separate module to avoid code duplication.
  # components.py
  import streamlit as st

  def sidebar_filters():
      st.sidebar.header("Filters")
      # Add filter widgets
# main app.py
import streamlit as st
from components import sidebar_filters

sidebar_filters()
# Rest of your app code

Follow Naming Conventions

  • Consistent Naming: Use meaningful variable and function names that follow Python's naming conventions.
  • Folder Structure: Organize files into folders such as data, models, utils, and pages if using Streamlit's multipage apps.

Use Virtual Environments

  • Environment Isolation: Use virtual environments (e.g., venv, conda, or pipenv) to manage dependencies and avoid conflicts. Version Control
  • Git: Use Git for version control to track changes and collaborate with others.

  • .gitignore: Include a .gitignore file to exclude unnecessary files from your repository.

__pycache__/
.DS_Store
venv/
.env

Enhancing User Experience with Custom Themes and Layouts

A polished UI enhances the user experience and makes your app more engaging.

Custom Themes

  • Streamlit Themes: Customize the appearance of your app using Streamlit's theming options.

  • Modify config.toml: Create a .streamlit/config.toml file to define your theme settings.

Copy code
[theme]
primaryColor="#6eb52f"
backgroundColor="#f0f0f5"
secondaryBackgroundColor="#e0e0ef"
textColor="#262730"
font="sans serif"

Responsive Layouts

Use Columns and Containers: Organize content using st.columns(), st.container(), and st.expander() for a clean layout.

col1, col2 = st.columns(2)

with col1:
    st.header("Section 1")
    # Content for section 1

with col2:
    st.header("Section 2")
    # Content for section 2

Interactive Elements

Feedback: Use st.progress(), st.spinner(), and st.toast() to provide feedback during long computations.

with st.spinner('Loading data...'):
    df = load_data()
st.success('Data loaded successfully!')

Tooltips and Help Text: Add tooltips or help text to widgets to guide users.

st.text_input("Username", help="Enter your user ID assigned by the administrator")

Accessibility

Alt Text for Images: Use the caption parameter in st.image() to provide descriptions.

st.image('chart.png', caption='Sales over time')

Keyboard Navigation: Ensure that all interactive elements can be navigated using the keyboard.

Debugging Common Issues in Streamlit Apps

Being able to identify and fix issues quickly is crucial for smooth app development.

Common Issues and Solutions

App Crashes or Freezes
  • Infinite Loops: Ensure that your code doesn't have infinite loops that can block the app. Large Data Loading: Use caching with @st.cache_data to prevent reloading data on every interaction.
Slow Performance
  • Heavy Computations: Optimize code by using efficient algorithms or leveraging libraries like NumPy and Pandas.
  • Caching: Use @st.cache_data and @st.cache_resource to cache expensive operations. Widget State Not Preserved
  • Session State: Use st.session_state to maintain state across interactions.
if 'counter' not in st.session_state:
    st.session_state.counter = 0

increment = st.button('Increment')
if increment:
    st.session_state.counter += 1

st.write(f"Counter: {st.session_state.counter}")

Errors When Deploying

Dependency Mismatches: Ensure that all dependencies are listed in requirements.txt and versions are compatible.

Environment Variables: Check that all required environment variables are set in the deployment environment.

Streamlit Version Issues

API Changes: If you encounter deprecated functions, update your code to match the latest Streamlit API.

Version Pinning: Specify the Streamlit version in your requirements.txt to maintain consistency.

streamlit==1.25.0

Logging and Error Tracking

Use Logging
import logging

logging.basicConfig(level=logging.INFO)
logging.info("This is an info message")
Display Errors

Use st.error() to display error messages to the user.

try:
    result = perform_calculation()
    st.write(result)
except Exception as e:
    st.error(f"An error occurred: {e}")

Testing

Unit Tests: Write unit tests for your functions using unittest or pytest.
Test Scripts: Create test scripts to simulate user interactions and verify app behavior.

Performance Optimization

Optimizing your app's performance ensures a better user experience.

Efficient Data Handling

Lazy Loading: Load data only when necessary, perhaps in response to user input.
Data Sampling: For large datasets, consider using a sample for initial display and provide options to load more data.

Use of Caching

Cache Data Loading
@st.cache_data
def load_data():
    # Load data from source
    return data
Cache Computations
@st.cache_data
def compute_expensive_operation(params):
    # Perform computation
    return result

Optimize Resource Usage

Avoid Redundant Computations: Structure code to prevent unnecessary re-execution of functions.

Clear Session State When Needed: Manage st.session_state to free up memory if variables are no longer needed.

Security Considerations

Ensure your app is secure, especially when handling sensitive data.

Input Validation: Always validate and sanitize user inputs.

Secrets Management: Use Streamlit's secrets management to handle API keys and passwords.

import os

API_KEY = st.secrets["api_key"]

HTTPS: Deploy your app using HTTPS to encrypt data in transit.

Documentation and User Guides

Provide documentation to help users understand and navigate your app.

Inline Documentation: Use st.markdown() or st.write() to include instructions and explanations within the app.

User Manuals: Provide a downloadable or linked user guide for complex applications.

Tooltips: Utilize the help parameter in widgets to give users quick hints.

Keep Up with Streamlit Updates

Streamlit is actively developed, and staying updated can help you leverage new features.

Changelog: Regularly check the Streamlit changelog for updates.

Community Forums: Participate in the Streamlit community forums to learn from others and share your experiences.

Update Dependencies: Periodically update your dependencies to benefit from performance improvements and security patches.

Summary

By following these best practices and tips, you can:

  • Enhance the maintainability and readability of your code.
  • Create a more engaging and user-friendly app interface.
  • Quickly identify and resolve issues during development.
  • Optimize your app's performance for a better user experience.
  • Ensure the security and integrity of your application and data.

Implementing these strategies will help you develop professional, robust, and efficient Streamlit applications that meet the needs of your users and stakeholders.

Conclusion

In this comprehensive guide, we've embarked on a journey to master Streamlit using the Python Data Science Notebook Docker Image. Throughout the chapters, we've explored how to set up a robust environment, harness the power of Streamlit for building interactive data applications, and leverage advanced features to enhance functionality and user experience.

주요 학습 내용 요약

  • 환경 설정: Docker를 사용하여 일관되고 이식 가능한 개발 환경을 구축하여 필요한 모든 라이브러리와 도구를 쉽게 사용할 수 있도록 합니다.
  • Streamlit 시작하기: Streamlit 애플리케이션을 구성하는 기본 구조와 핵심 구성 요소를 이해하면서 첫 번째 Streamlit 앱을 만들었습니다.
  • 대화형 데이터 시각화: 내장된 Streamlit 기능과 Altair 및 Plotly와 같은 통합 라이브러리를 활용하여 동적인 대화형 시각화를 구축했습니다.
  • 고급 기능: st.session_state를 통한 상태 관리 활용, 레이아웃 요소를 통한 동적 콘텐츠 생성, 캐싱 메커니즘을 통한 성능 최적화.
  • 머신러닝 모델 통합: TensorFlow, PyTorch 및 scikit-learn을 사용하여 머신러닝 모델을 로드하고 상호작용하여 Streamlit 앱 내에서 예측을 수행하고 결과를 시각화합니다.
  • 데이터베이스 연결: 강력한 라이브러리를 사용하여 Dremio, PostgreSQL, MySQL을 포함한 다양한 데이터베이스에 연결하여 데이터를 효율적으로 쿼리하고 조작합니다.
  • Streamlit 앱 배포: 앱을 로컬에서 실행하는 것부터 Docker를 사용한 컨테이너화 및 Streamlit Community Cloud, Heroku, AWS와 같은 클라우드 플랫폼에 배포하는 것까지 다양한 배포 전략을 탐색했습니다.
  • 모범 사례 및 팁: 전문적이고 강력한 애플리케이션을 구축하기 위한 코드 구성, 사용자 경험 향상, 디버깅 기술, 성능 최적화 및 보안 고려 사항의 중요성을 강조했습니다.

Streamlit을 추가로 탐색하기 위한 다음 단계

상당량의 내용을 다루었지만 항상 배우고 탐색할 내용이 더 많습니다.

  • Streamlit 구성 요소 자세히 알아보기: 맞춤 구성 요소와 Streamlit 구성 요소 API를 실험하여 앱 기능을 확장하세요.
  • Streamlit의 다중 페이지 앱 살펴보기: 더 나은 사용자 탐색 및 구조를 위해 복잡한 애플리케이션을 여러 페이지로 구성합니다.
  • 추가 라이브러리 통합: 다른 데이터 과학 및 기계 학습 라이브러리를 통합하여 애플리케이션의 기능을 확장합니다.
  • 커뮤니티에 기여: Streamlit 커뮤니티와 앱 및 구성 요소를 공유하고, 오픈 소스 프로젝트에 기여하고, 토론에 참여하여 다른 사람들로부터 배웁니다.

추가 리소스 및 커뮤니티

  • 공식 Streamlit 문서: docs.streamlit.io
  • Streamlit 포럼: Streamlit Discourse 플랫폼에서 커뮤니티와 소통하세요.
  • GitHub의 Streamlit: github.com/streamlit/streamlit에서 소스 코드를 탐색하고 기여하세요.
  • 튜토리얼 및 강좌: 고급 주제와 실제 사용 사례를 다루는 온라인 튜토리얼, 강좌 및 웹 세미나를 찾아보세요.
  • 블로그 및 기사: 통찰력과 모범 사례를 공유하는 데이터 과학 전문가의 블로그와 기사를 팔로우하세요.

최종 생각

Streamlit은 데이터 애플리케이션을 만들고 공유하는 방식을 혁신하여 데이터 과학자와 개발자가 쉽게 대화형 웹 앱을 구축할 수 있도록 했습니다. Streamlit을 Python Data Science Notebook Docker 이미지와 결합하여 환경 설정을 단순화하고 애플리케이션 개발을 가속화하는 강력한 워크플로를 구축했습니다.

여행을 계속하면서 숙달의 열쇠는 지속적인 연습과 탐색이라는 점을 기억하세요. 주저하지 말고 새로운 아이디어를 실험하고, 피드백을 구하고, 애플리케이션을 반복하세요. 데이터 과학의 세계는 끊임없이 진화하고 있으며 Streamlit과 같은 도구는 모든 사람이 데이터에 더 쉽게 접근하고 참여하도록 만드는 데 앞장서고 있습니다.

以上がStreamlit を使用してデータ アプリを深く掘り下げるの詳細内容です。詳細については、PHP 中国語 Web サイトの他の関連記事を参照してください。

声明:
この記事の内容はネチズンが自主的に寄稿したものであり、著作権は原著者に帰属します。このサイトは、それに相当する法的責任を負いません。盗作または侵害の疑いのあるコンテンツを見つけた場合は、admin@php.cn までご連絡ください。