首頁 >後端開發 >Python教學 >使用 Streamlit 和 AWS Translator 的文件翻譯服務

使用 Streamlit 和 AWS Translator 的文件翻譯服務

Barbara Streisand
Barbara Streisand原創
2025-01-01 02:35:09279瀏覽

介紹:

DocuTranslator,一個文件翻譯系統,內建於 AWS 中,由 Streamlit 應用程式框架開發。該應用程式允許最終用戶將文件翻譯成他們想要上傳的首選語言。它提供了根據使用者需求翻譯成多種語言的可行性,這確實幫助使用者以舒適的方式理解內容。

背景:

這個專案的目的是提供一個用戶友好、簡單的應用程式介面,以完成用戶期望的簡單翻譯過程。在此系統中,沒有人需要透過進入 AWS Translate 服務來翻譯文檔,最終使用者可以直接存取應用程式端點並滿足要求。

高層架構圖:

Document Translation Service using Streamlit & AWS Translator

這是如何運作的:

  • 最終用戶可以透過應用程式負載平衡器存取應用程式。
  • 應用程式介面開啟後,使用者將上傳所需的待翻譯檔案和要翻譯的語言。
  • 提交這些詳細資訊後,檔案將上傳到提到的來源 S3 儲存桶,這會觸發 lambda 函數來連接 AWS Translator 服務。
  • 翻譯文件準備好後,將上傳到目標 S3 儲存桶。
  • 之後,最終使用者可以從 Streamlit 應用程式入口網站下載翻譯後的文件。

技術架構:

Document Translation Service using Streamlit & AWS Translator

以上架構顯示了以下要點 -

  • 應用程式程式碼已容器化並儲存到 ECR 儲存庫。
  • 根據上述設計,已經設定了一個 ECS 集群,它實例化了兩個從 ECR 儲存庫提取應用程式映像的任務。
  • 這兩個任務都是在 EC2 之上作為啟動類型啟動的。兩個 EC2 皆在 us-east-1a 和 us-east-1b 可用區的私有子網路中啟動。
  • 建立 EFS 檔案系統是為了在兩個底層 EC2 執行個體之間共用應用程式程式碼。在兩個可用區(us-east-1a 和 us-east-1b)中建立兩個掛載點。
  • 在私人子網路前面配置兩個公有子網,並在 us-east-1a 可用區的公有子網路中設定 NAT 網關。
  • 已在私有子網路前面設定應用程式負載平衡器,該子網路將流量分配到應用程式負載平衡器安全群組 (ALB SG) 連接埠 80 處的兩個公有子網路。
  • 兩個 EC2 執行個體配置在兩個不同的目標群組中,具有相同的 EC2 安全群組 (Streamlit_SG),該安全群組接受來自應用程式負載平衡器的 16347 連接埠流量。
  • EC2 實例中的連接埠 16347 和 ECS 容器中的連接埠 8501 之間配置了連接埠對映。一旦流量到達 EC2 安全群組的 16347 端口,將被重定向到 ECS 容器層級的 8501 端口。

資料如何儲存?

在這裡,我們使用 EFS 共用路徑在兩個底層 EC2 執行個體之間共用相同的應用程式檔案。我們在 EC2 執行個體內建立了一個掛載點 /streamlit_appfiles 並使用 EFS 共用掛載。這種方法將有助於在兩個不同的伺服器之間共享相同的內容。之後,我們的目的是創建一個複製相同的應用程式內容到容器工作目錄 /streamlit。為此,我們使用了綁定掛載,以便對 EC2 層級的應用程式程式碼進行的任何更改也將複製到容器。我們需要限制雙向複製,這意味著如果任何人錯誤地從容器內部更改程式碼,它不應該複製到 EC2 主機級別,因此容器內部工作目錄已建立為唯讀檔案系統。

Document Translation Service using Streamlit & AWS Translator

ECS容器配置和容量:

底層 EC2 配置:
實例類型:t2.medium
網路類型:私有子網路

容器配置:
圖:
網路模式:預設
主機連接埠:16347
貨櫃港:8501
任務CPU:2個vCPU(2048個)
任務記憶體:2.5 GB (2560 MiB)

Document Translation Service using Streamlit & AWS Translator

音量配置:
卷名:streamlit-volume
來源路徑:/streamlit_appfiles
容器路徑:/streamlit
只讀檔案系統:是

Document Translation Service using Streamlit & AWS Translator

任務定義參考:

{
    "taskDefinitionArn": "arn:aws:ecs:us-east-1:<account-id>:task-definition/Streamlit_TDF-1:5",
    "containerDefinitions": [
        {
            "name": "streamlit",
            "image": "<account-id>.dkr.ecr.us-east-1.amazonaws.com/anirban:latest",
            "cpu": 0,
            "portMappings": [
                {
                    "name": "streamlit-8501-tcp",
                    "containerPort": 8501,
                    "hostPort": 16347,
                    "protocol": "tcp",
                    "appProtocol": "http"
                }
            ],
            "essential": true,
            "environment": [],
            "environmentFiles": [],
            "mountPoints": [
                {
                    "sourceVolume": "streamlit-volume",
                    "containerPath": "/streamlit",
                    "readOnly": true
                }
            ],
            "volumesFrom": [],
            "ulimits": [],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "/ecs/Streamlit_TDF-1",
                    "mode": "non-blocking",
                    "awslogs-create-group": "true",
                    "max-buffer-size": "25m",
                    "awslogs-region": "us-east-1",
                    "awslogs-stream-prefix": "ecs"
                },
                "secretOptions": []
            },
            "systemControls": []
        }
    ],
    "family": "Streamlit_TDF-1",
    "taskRoleArn": "arn:aws:iam::<account-id>:role/ecsTaskExecutionRole",
    "executionRoleArn": "arn:aws:iam::<account-id>:role/ecsTaskExecutionRole",
    "revision": 5,
    "volumes": [
        {
            "name": "streamlit-volume",
            "host": {
                "sourcePath": "/streamlit_appfiles"
            }
        }
    ],
    "status": "ACTIVE",
    "requiresAttributes": [
        {
            "name": "com.amazonaws.ecs.capability.logging-driver.awslogs"
        },
        {
            "name": "ecs.capability.execution-role-awslogs"
        },
        {
            "name": "com.amazonaws.ecs.capability.ecr-auth"
        },
        {
            "name": "com.amazonaws.ecs.capability.docker-remote-api.1.19"
        },
        {
            "name": "com.amazonaws.ecs.capability.docker-remote-api.1.28"
        },
        {
            "name": "com.amazonaws.ecs.capability.task-iam-role"
        },
        {
            "name": "ecs.capability.execution-role-ecr-pull"
        },
        {
            "name": "com.amazonaws.ecs.capability.docker-remote-api.1.18"
        },
        {
            "name": "com.amazonaws.ecs.capability.docker-remote-api.1.29"
        }
    ],
    "placementConstraints": [],
    "compatibilities": [
        "EC2"
    ],
    "requiresCompatibilities": [
        "EC2"
    ],
    "cpu": "2048",
    "memory": "2560",
    "runtimePlatform": {
        "cpuArchitecture": "X86_64",
        "operatingSystemFamily": "LINUX"
    },
    "registeredAt": "2024-11-09T05:59:47.534Z",
    "registeredBy": "arn:aws:iam::<account-id>:root",
    "tags": []
}

Document Translation Service using Streamlit & AWS Translator

開發應用程式程式碼並建立 Docker 映像:

app.py

import streamlit as st
import boto3
import os
import time
from pathlib import Path

s3 = boto3.client('s3', region_name='us-east-1')
tran = boto3.client('translate', region_name='us-east-1')
lam = boto3.client('lambda', region_name='us-east-1')


# Function to list S3 buckets
def listbuckets():
    list_bucket = s3.list_buckets()
    bucket_name = tuple([it["Name"] for it in list_bucket["Buckets"]])
    return bucket_name

# Upload object to S3 bucket
def upload_to_s3bucket(file_path, selected_bucket, file_name):
    s3.upload_file(file_path, selected_bucket, file_name)

def list_language():
    response = tran.list_languages()
    list_of_langs = [i["LanguageName"] for i in response["Languages"]]
    return list_of_langs

def wait_for_s3obj(dest_selected_bucket, file_name):
    while True:
        try:
            get_obj = s3.get_object(Bucket=dest_selected_bucket, Key=f'Translated-{file_name}.txt')
            obj_exist = 'true' if get_obj['Body'] else 'false'
            return obj_exist
        except s3.exceptions.ClientError as e:
            if e.response['Error']['Code'] == "404":
                print(f"File '{file_name}' not found. Checking again in 3 seconds...")
                time.sleep(3)

def download(dest_selected_bucket, file_name, file_path):
     s3.download_file(dest_selected_bucket,f'Translated-{file_name}.txt', f'{file_path}/download/Translated-{file_name}.txt')
     with open(f"{file_path}/download/Translated-{file_name}.txt", "r") as file:
       st.download_button(
             label="Download",
             data=file,
             file_name=f"{file_name}.txt"
       )

def streamlit_application():
    # Give a header
    st.header("Document Translator", divider=True)
    # Widgets to upload a file
    uploaded_files = st.file_uploader("Choose a PDF file", accept_multiple_files=True, type="pdf")
    # # upload a file
    file_name = uploaded_files[0].name.replace(' ', '_') if uploaded_files else None
    # Folder path
    file_path = '/tmp'
    # Select the bucket from drop down
    selected_bucket = st.selectbox("Choose the S3 Bucket to upload file :", listbuckets())
    dest_selected_bucket = st.selectbox("Choose the S3 Bucket to download file :", listbuckets())
    selected_language = st.selectbox("Choose the Language :", list_language())
    # Create a button
    click = st.button("Upload", type="primary")
    if click == True:
        if file_name:
            with open(f'{file_path}/{file_name}', mode='wb') as w:
                w.write(uploaded_files[0].getvalue())
        # Set the selected language to the environment variable of lambda function
        lambda_env1 = lam.update_function_configuration(FunctionName='TriggerFunctionFromS3', Environment={'Variables': {'UserInputLanguage': selected_language, 'DestinationBucket': dest_selected_bucket, 'TranslatedFileName': file_name}})
        # Upload the file to S3 bucket:
        upload_to_s3bucket(f'{file_path}/{file_name}', selected_bucket, file_name)
        if s3.get_object(Bucket=selected_bucket, Key=file_name):
            st.success("File uploaded successfully", icon="✅")
            output = wait_for_s3obj(dest_selected_bucket, file_name)
            if output:
              download(dest_selected_bucket, file_name, file_path)
        else:
            st.error("File upload failed", icon="?")


streamlit_application()

about.py

import streamlit as st

## Write the description of application
st.header("About")
about = '''
Welcome to the File Uploader Application!

This application is designed to make uploading PDF documents simple and efficient. With just a few clicks, users can upload their documents securely to an Amazon S3 bucket for storage. Here’s a quick overview
of what this app does:

**Key Features:**
- **Easy Upload:** Users can quickly upload PDF documents by selecting the file and clicking the 'Upload' button.
- **Seamless Integration with AWS S3:** Once the document is uploaded, it is stored securely in a designated S3 bucket, ensuring reliable and scalable cloud storage.
- **User-Friendly Interface:** Built using Streamlit, the interface is clean, intuitive, and accessible to all users, making the uploading process straightforward.

**How it Works:**
1. **Select a PDF Document:** Users can browse and select any PDF document from their local system.
2. **Upload the Document:** Clicking the ‘Upload’ button triggers the process of securely uploading the selected document to an AWS S3 bucket.
3. **Success Notification:** After a successful upload, users will receive a confirmation message that their document has been stored in the cloud.
This application offers a streamlined way to store documents on the cloud, reducing the hassle of manual file management. Whether you're an individual or a business, this tool helps you organize and store your
 files with ease and security.
You can further customize this page by adding technical details, usage guidelines, or security measures as per your application's specifications.'''

st.markdown(about)

navigation.py

import streamlit as st

pg = st.navigation([
    st.Page("app.py", title="DocuTranslator", icon="?"),
    st.Page("about.py", title="About", icon="?")
], position="sidebar")

pg.run()

Dockerfile:

FROM python:3.9-slim
WORKDIR /streamlit
COPY requirements.txt /streamlit/requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
RUN mkdir /tmp/download
COPY . /streamlit
EXPOSE 8501
CMD ["streamlit", "run", "navigation.py", "--server.port=8501", "--server.headless=true"]

Docker 檔案將透過打包所有上述應用程式設定檔來建立映像,然後將其推送到 ECR 儲存庫。 Docker Hub 也可以用來儲存映像。

負載平衡

在該架構中,應用程式執行個體應該在私有子網路中創建,並且負載平衡器應該創建以減少私有 EC2 執行個體的傳入流量負載。
由於有兩個底層 EC2 主機可用於託管容器,因此在兩個 EC2 主機之間配置負載平衡以分配傳入流量。建立兩個不同的目標群組,在每個目標群組中放置兩個 EC2 實例,權重為 50%。

負載平衡器接受連接埠 80 的傳入流量,然後傳遞至連接埠 16347 處的後端 EC2 實例,並傳遞給對應的 ECS 容器。

Document Translation Service using Streamlit & AWS Translator

Document Translation Service using Streamlit & AWS Translator

拉姆達函數:

有一個lambda 函數,配置為將來源儲存桶作為輸入,從那裡下載pdf 檔案並提取內容,然後將內容從當前語言翻譯為使用者提供的目標語言,並建立一個文字檔案以上傳到目標S3桶。

{
    "taskDefinitionArn": "arn:aws:ecs:us-east-1:<account-id>:task-definition/Streamlit_TDF-1:5",
    "containerDefinitions": [
        {
            "name": "streamlit",
            "image": "<account-id>.dkr.ecr.us-east-1.amazonaws.com/anirban:latest",
            "cpu": 0,
            "portMappings": [
                {
                    "name": "streamlit-8501-tcp",
                    "containerPort": 8501,
                    "hostPort": 16347,
                    "protocol": "tcp",
                    "appProtocol": "http"
                }
            ],
            "essential": true,
            "environment": [],
            "environmentFiles": [],
            "mountPoints": [
                {
                    "sourceVolume": "streamlit-volume",
                    "containerPath": "/streamlit",
                    "readOnly": true
                }
            ],
            "volumesFrom": [],
            "ulimits": [],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "/ecs/Streamlit_TDF-1",
                    "mode": "non-blocking",
                    "awslogs-create-group": "true",
                    "max-buffer-size": "25m",
                    "awslogs-region": "us-east-1",
                    "awslogs-stream-prefix": "ecs"
                },
                "secretOptions": []
            },
            "systemControls": []
        }
    ],
    "family": "Streamlit_TDF-1",
    "taskRoleArn": "arn:aws:iam::<account-id>:role/ecsTaskExecutionRole",
    "executionRoleArn": "arn:aws:iam::<account-id>:role/ecsTaskExecutionRole",
    "revision": 5,
    "volumes": [
        {
            "name": "streamlit-volume",
            "host": {
                "sourcePath": "/streamlit_appfiles"
            }
        }
    ],
    "status": "ACTIVE",
    "requiresAttributes": [
        {
            "name": "com.amazonaws.ecs.capability.logging-driver.awslogs"
        },
        {
            "name": "ecs.capability.execution-role-awslogs"
        },
        {
            "name": "com.amazonaws.ecs.capability.ecr-auth"
        },
        {
            "name": "com.amazonaws.ecs.capability.docker-remote-api.1.19"
        },
        {
            "name": "com.amazonaws.ecs.capability.docker-remote-api.1.28"
        },
        {
            "name": "com.amazonaws.ecs.capability.task-iam-role"
        },
        {
            "name": "ecs.capability.execution-role-ecr-pull"
        },
        {
            "name": "com.amazonaws.ecs.capability.docker-remote-api.1.18"
        },
        {
            "name": "com.amazonaws.ecs.capability.docker-remote-api.1.29"
        }
    ],
    "placementConstraints": [],
    "compatibilities": [
        "EC2"
    ],
    "requiresCompatibilities": [
        "EC2"
    ],
    "cpu": "2048",
    "memory": "2560",
    "runtimePlatform": {
        "cpuArchitecture": "X86_64",
        "operatingSystemFamily": "LINUX"
    },
    "registeredAt": "2024-11-09T05:59:47.534Z",
    "registeredBy": "arn:aws:iam::<account-id>:root",
    "tags": []
}

應用測試:

開啟應用程式負載平衡器 URL「ALB-747339710.us-east-1.elb.amazonaws.com」以開啟 Web 應用程式。瀏覽任何pdf 文件,保持來源"fileuploadbucket-hwirio984092jjs" 和目標儲存桶"translatedfileuploadbucket-kh939809kjkfjsekfl" loadbucket-kh939809kjkfjsekfl"

大程式碼就是上面提到的。選擇您想要翻譯文件的語言,然後按一下上傳。點擊後,應用程式將開始輪詢目標 S3 儲存桶以查明翻譯檔案是否已上傳。如果找到確切的文件,則會顯示一個新選項“下載”,用於從目標 S3 儲存桶下載文件。

申請連結:http://alb-747339710.us-east-1.elb.amazonaws.com/

Document Translation Service using Streamlit & AWS Translator

實際內容:

import streamlit as st
import boto3
import os
import time
from pathlib import Path

s3 = boto3.client('s3', region_name='us-east-1')
tran = boto3.client('translate', region_name='us-east-1')
lam = boto3.client('lambda', region_name='us-east-1')


# Function to list S3 buckets
def listbuckets():
    list_bucket = s3.list_buckets()
    bucket_name = tuple([it["Name"] for it in list_bucket["Buckets"]])
    return bucket_name

# Upload object to S3 bucket
def upload_to_s3bucket(file_path, selected_bucket, file_name):
    s3.upload_file(file_path, selected_bucket, file_name)

def list_language():
    response = tran.list_languages()
    list_of_langs = [i["LanguageName"] for i in response["Languages"]]
    return list_of_langs

def wait_for_s3obj(dest_selected_bucket, file_name):
    while True:
        try:
            get_obj = s3.get_object(Bucket=dest_selected_bucket, Key=f'Translated-{file_name}.txt')
            obj_exist = 'true' if get_obj['Body'] else 'false'
            return obj_exist
        except s3.exceptions.ClientError as e:
            if e.response['Error']['Code'] == "404":
                print(f"File '{file_name}' not found. Checking again in 3 seconds...")
                time.sleep(3)

def download(dest_selected_bucket, file_name, file_path):
     s3.download_file(dest_selected_bucket,f'Translated-{file_name}.txt', f'{file_path}/download/Translated-{file_name}.txt')
     with open(f"{file_path}/download/Translated-{file_name}.txt", "r") as file:
       st.download_button(
             label="Download",
             data=file,
             file_name=f"{file_name}.txt"
       )

def streamlit_application():
    # Give a header
    st.header("Document Translator", divider=True)
    # Widgets to upload a file
    uploaded_files = st.file_uploader("Choose a PDF file", accept_multiple_files=True, type="pdf")
    # # upload a file
    file_name = uploaded_files[0].name.replace(' ', '_') if uploaded_files else None
    # Folder path
    file_path = '/tmp'
    # Select the bucket from drop down
    selected_bucket = st.selectbox("Choose the S3 Bucket to upload file :", listbuckets())
    dest_selected_bucket = st.selectbox("Choose the S3 Bucket to download file :", listbuckets())
    selected_language = st.selectbox("Choose the Language :", list_language())
    # Create a button
    click = st.button("Upload", type="primary")
    if click == True:
        if file_name:
            with open(f'{file_path}/{file_name}', mode='wb') as w:
                w.write(uploaded_files[0].getvalue())
        # Set the selected language to the environment variable of lambda function
        lambda_env1 = lam.update_function_configuration(FunctionName='TriggerFunctionFromS3', Environment={'Variables': {'UserInputLanguage': selected_language, 'DestinationBucket': dest_selected_bucket, 'TranslatedFileName': file_name}})
        # Upload the file to S3 bucket:
        upload_to_s3bucket(f'{file_path}/{file_name}', selected_bucket, file_name)
        if s3.get_object(Bucket=selected_bucket, Key=file_name):
            st.success("File uploaded successfully", icon="✅")
            output = wait_for_s3obj(dest_selected_bucket, file_name)
            if output:
              download(dest_selected_bucket, file_name, file_path)
        else:
            st.error("File upload failed", icon="?")


streamlit_application()

翻譯內容(加拿大法文)

import streamlit as st

## Write the description of application
st.header("About")
about = '''
Welcome to the File Uploader Application!

This application is designed to make uploading PDF documents simple and efficient. With just a few clicks, users can upload their documents securely to an Amazon S3 bucket for storage. Here’s a quick overview
of what this app does:

**Key Features:**
- **Easy Upload:** Users can quickly upload PDF documents by selecting the file and clicking the 'Upload' button.
- **Seamless Integration with AWS S3:** Once the document is uploaded, it is stored securely in a designated S3 bucket, ensuring reliable and scalable cloud storage.
- **User-Friendly Interface:** Built using Streamlit, the interface is clean, intuitive, and accessible to all users, making the uploading process straightforward.

**How it Works:**
1. **Select a PDF Document:** Users can browse and select any PDF document from their local system.
2. **Upload the Document:** Clicking the ‘Upload’ button triggers the process of securely uploading the selected document to an AWS S3 bucket.
3. **Success Notification:** After a successful upload, users will receive a confirmation message that their document has been stored in the cloud.
This application offers a streamlined way to store documents on the cloud, reducing the hassle of manual file management. Whether you're an individual or a business, this tool helps you organize and store your
 files with ease and security.
You can further customize this page by adding technical details, usage guidelines, or security measures as per your application's specifications.'''

st.markdown(about)

結論:

本文向我們展示了文件翻譯過程如何像我們想像的那樣簡單,最終用戶必須單擊一些選項來選擇所需的信息,並在幾秒鐘內獲得所需的輸出,而無需考慮配置。目前,我們已經包含了翻譯 pdf 文件的單一功能,但稍後我們將對此進行更多研究,以便在單一應用程式中具有多種功能,並具有一些有趣的功能。

以上是使用 Streamlit 和 AWS Translator 的文件翻譯服務的詳細內容。更多資訊請關注PHP中文網其他相關文章!

陳述:
本文內容由網友自願投稿,版權歸原作者所有。本站不承擔相應的法律責任。如發現涉嫌抄襲或侵權的內容,請聯絡admin@php.cn