Understanding MLOps with a ZenML Project

Lisa Kudrow, original, 2025-03-08 11:16:09, 493 views

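The AI revolution is upon us, but amid the chaos most of us overlook a very important question: how do we maintain these sophisticated AI systems? That is where Machine Learning Operations (MLOps) comes in. In this blog, we will understand the importance of MLOps with ZenML, an open-source MLOps framework, by building an end-to-end project.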

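This article was published as a part of the Data Science Blog.

Table of contents

What is MLOps?
Initial Configurations
Exploratory Data Analysis (EDA)
Defining Steps for ZenML as Modular Coding
Frequently Asked Questions

What is MLOps?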
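MLOps empowers machine learning engineers to streamline the ML model lifecycle. Productionizing machine learning is difficult. The machine learning lifecycle consists of many complex components, such as data ingestion, data preparation, model training, model tuning, model deployment, model monitoring, and explainability. MLOps automates each step of the process through robust pipelines to reduce manual errors. It is a collaborative practice for running your AI infrastructure with minimal manual effort and maximum operational efficiency. Think of MLOps as DevOps for the AI industry, with some added spice.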

Understanding MLOps with a Hands-on Project

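Now, with the help of a simple yet production-grade end-to-end data science project, we will see how MLOps is implemented. In this project, we create and deploy a machine learning model that predicts the customer lifetime value (CLTV) of a customer. CLTV is a key metric that companies use to see how much they will profit or lose from a customer in the long term. Using this metric, a company can choose whether or not to spend further on the customer, for example on targeted ads.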
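To make the metric concrete, here is a minimal sketch of one common simplified CLTV estimate: average order value times purchase frequency times an expected customer lifespan. The formula, the function name `simple_cltv`, and the sample numbers are assumptions for illustration only. The project itself trains a model to predict CLTV from the Online Retail data rather than using this closed form.

```python
from statistics import mean


def simple_cltv(order_values, orders_per_year, expected_years):
    """Estimate CLTV as average order value * yearly frequency * lifespan.

    This is a textbook-style simplification, not the model built in this project.
    """
    if not order_values:
        return 0.0
    average_order_value = mean(order_values)
    return average_order_value * orders_per_year * expected_years


if __name__ == "__main__":
    # Hypothetical customer: three past orders, about 4 orders a year, 2-year horizon
    print(simple_cltv([20.0, 35.0, 50.0], orders_per_year=4, expected_years=2))
```

A closed form like this is handy as a sanity check on predictions, but it ignores everything a trained model can pick up from purchase history.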

Let's start implementing the project in the next sections.

Initial Configurations

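Now, let's get straight into the project configurations. First, we need to download the Online Retail dataset from the UCI Machine Learning Repository. ZenML is not supported on Windows, so we need to use Linux (WSL on Windows) or macOS. Next, download requirements.txt. Then let's head to the terminal for a few configurations.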

# Make sure you have Python 3.10 or above installed
python --version

# Create a new Python environment using any method
python3.10 -m venv myenv

# Activate the environment
source myenv/bin/activate

# Install the requirements from the file downloaded above
pip install -r requirements.txt

# Install ZenML with its server extras
# (quoted so the shell does not interpret the brackets, and with no spaces around ==)
pip install "zenml[server]==0.66.0"

# Initialize a ZenML repository in the project folder
zenml init

# Launch the ZenML dashboard
zenml up
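The two prerequisites mentioned above (Python 3.10 or above, and a non-Windows system) can also be checked from Python before installing anything. This is only a convenience sketch: the function name `environment_problems` is made up for this example and is not part of ZenML.

```python
import platform
import sys


def environment_problems(version_info=None, system=None):
    """Return a list of human-readable problems; an empty list means OK."""
    version_info = sys.version_info if version_info is None else version_info
    system = platform.system() if system is None else system
    problems = []
    if tuple(version_info[:2]) < (3, 10):
        problems.append("Python 3.10 or above is required")
    if system == "Windows":
        problems.append("Use Linux (WSL on Windows) or macOS")
    return problems


if __name__ == "__main__":
    for problem in environment_problems():
        print("Problem:", problem)
```

Inside WSL, `platform.system()` reports `Linux`, so the check passes there as intended.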

Now simply log in to the ZenML dashboard with the default login credentials (no password required).

Congratulations! You have successfully completed the project configuration.

Exploratory Data Analysis (EDA)

Now it's time to get your hands dirty with the data. We will create a Jupyter notebook for analyzing our data.

Pro tip: do your own analysis without following me.

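Or you can follow this notebook, where we have created the different data analysis methods used in the project.

Now, assuming you have performed your share of data analysis, let's jump straight to the spicy part.

Defining Steps for ZenML as Modular Coding

To increase the modularity and reusability of our code, we use ZenML's @step decorator, which organizes the code so it can be passed into pipelines hassle-free, reducing the chance of errors.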

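In our source folder, we write the methods for each step before initializing the steps themselves. We follow the Strategy design pattern for each of them: every method (data ingestion, data cleaning, feature engineering, and so on) gets an abstract base class that defines the interface, and concrete strategies implement it.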
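The idea can be sketched without any dependencies. The example below is a stand-in and not the project's code: the class names `IngestionStrategy`, `CSVTextIngestion`, `JSONLinesIngestion`, and `Ingestor` are hypothetical, the sample rows are invented, and rows are plain dictionaries instead of a pandas DataFrame so that the snippet runs with the standard library only.

```python
import csv
import io
import json
from abc import ABC, abstractmethod


class IngestionStrategy(ABC):
    """Common interface: turn raw text into a list of row dictionaries."""

    @abstractmethod
    def ingest(self, text: str) -> list:
        ...


class CSVTextIngestion(IngestionStrategy):
    def ingest(self, text: str) -> list:
        # DictReader uses the first line as column names
        return list(csv.DictReader(io.StringIO(text)))


class JSONLinesIngestion(IngestionStrategy):
    def ingest(self, text: str) -> list:
        # One JSON object per non-empty line
        return [json.loads(line) for line in text.splitlines() if line.strip()]


class Ingestor:
    """Context object: delegates to whichever strategy it currently holds."""

    def __init__(self, strategy: IngestionStrategy):
        self._strategy = strategy

    def set_strategy(self, strategy: IngestionStrategy) -> None:
        self._strategy = strategy

    def ingest(self, text: str) -> list:
        return self._strategy.ingest(text)


if __name__ == "__main__":
    ingestor = Ingestor(CSVTextIngestion())
    print(ingestor.ingest("InvoiceNo,Quantity\n536365,6\n"))

    # Swap the strategy at runtime without touching the calling code
    ingestor.set_strategy(JSONLinesIngestion())
    print(ingestor.ingest('{"InvoiceNo": "536365", "Quantity": 6}\n'))
```

The calling code only ever talks to `Ingestor`, so supporting a new file format means adding one class rather than editing existing ones.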

Sample code of ingest_data.py:

import logging
import pandas as pd
from abc import ABC, abstractmethod

# Setup logging configuration
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")

# Abstract Base Class for Data Ingestion Strategy
# ------------------------------------------------
# This class defines a common interface for different data ingestion strategies.
# Subclasses must implement the `ingest` method.
class DataIngestionStrategy(ABC):
    @abstractmethod
    def ingest(self, file_path: str) -> pd.DataFrame:
        """
        Abstract method to ingest data from a file into a DataFrame.

        Parameters:
        file_path (str): The path to the data file to ingest.

        Returns:
        pd.DataFrame: A dataframe containing the ingested data.
        """
        pass

# Concrete Strategy for XLSX File Ingestion
# -----------------------------------------
# This strategy handles the ingestion of data from an XLSX file.
class XLSXIngestion(DataIngestionStrategy):
    def __init__(self, sheet_name=0):
        """
        Initializes the XLSXIngestion with optional sheet name.

        Parameters:
        sheet_name (str or int): The sheet name or index to read, default is the first sheet.
        """
        self.sheet_name = sheet_name

    def ingest(self, file_path: str) -> pd.DataFrame:
        """
        Ingests data from an XLSX file into a DataFrame.
        Returns an empty DataFrame if the file cannot be read.

        Parameters:
        file_path (str): The path to the XLSX file.

        Returns:
        pd.DataFrame: A dataframe containing the ingested data.
        """
        try:
            logging.info(f"Attempting to read XLSX file: {file_path}")
            # Read identifier-like columns as strings so codes are not parsed as numbers
            df = pd.read_excel(
                file_path,
                dtype={"InvoiceNo": str, "StockCode": str, "Description": str},
                sheet_name=self.sheet_name,
            )
            logging.info(f"Successfully read XLSX file: {file_path}")
            return df
        except FileNotFoundError:
            logging.error(f"File not found: {file_path}")
        except pd.errors.EmptyDataError:
            logging.error(f"File is empty: {file_path}")
        except Exception as e:
            logging.error(f"An error occurred while reading the XLSX file: {e}")
        return pd.DataFrame()


# Context Class for Data Ingestion
# --------------------------------
# This class uses a DataIngestionStrategy to ingest data from a file.
class DataIngestor:
    def __init__(self, strategy: DataIngestionStrategy):
        """
        Initializes the DataIngestor with a specific data ingestion strategy.

        Parameters:
        strategy (DataIngestionStrategy): The strategy to be used for data ingestion.
        """
        self._strategy = strategy

    def set_strategy(self, strategy: DataIngestionStrategy):
        """
        Sets a new strategy for the DataIngestor.

        Parameters:
        strategy (DataIngestionStrategy): The new strategy to be used for data ingestion.
        """
        logging.info("Switching data ingestion strategy.")
        self._strategy = strategy

    def ingest_data(self, file_path: str) -> pd.DataFrame:
        """
        Executes the data ingestion using the current strategy.

        Parameters:
        file_path (str): The path to the data file to ingest.

        Returns:
        pd.DataFrame: A dataframe containing the ingested data.
        """
        logging.info("Ingesting data using the current strategy.")
        return self._strategy.ingest(file_path)


# Example usage
if __name__ == "__main__":
    # Example file path for XLSX file
    # file_path = "../data/raw/your_data_file.xlsx"

    # XLSX Ingestion Example
    # xlsx_ingestor = DataIngestor(XLSXIngestion(sheet_name=0))
    # df = xlsx_ingestor.ingest_data(file_path)

    # Show the first few rows of the ingested DataFrame if successful
    # if not df.empty:
    #     logging.info("Displaying the first few rows of the ingested data:")
    #     print(df.head())
    pass

We follow this pattern to create the rest of the methods. You can copy the code from the given GitHub repository.

After writing all the methods, it's time to initialize the ZenML steps in the steps folder. All the methods we have created so far are used in the ZenML steps accordingly. The data ingestion step, data_ingestion_step.py, for example, wraps the DataIngestor class above. We follow the same pattern to create the rest of the ZenML steps in the project, and you can copy them from the project repository as well.

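Wow! Congratulations on creating and learning one of the most important parts of MLOps. It's okay to feel a little overwhelmed, since this is your first time. Don't stress too much: everything will make sense once you run your first production-grade ML model.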

Building Pipelines

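It's time to build our pipelines. No, not the kind that carry water or oil. A pipeline is a series of steps organized in a specific order to form a complete machine learning workflow. In ZenML, the @pipeline decorator specifies a pipeline that contains the steps we created above. This approach ensures that the output of one step can be used as the input of the next step.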
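The data-flow idea (each step's output becomes the next step's input) can be illustrated in plain Python. This is a conceptual stand-in only: it does not use ZenML's decorators, and `run_pipeline`, `drop_missing`, `add_total`, and the sample rows are all hypothetical names and data chosen for this sketch.

```python
def run_pipeline(steps, initial_input):
    """Run steps in order, feeding each step's output into the next one."""
    data = initial_input
    for step_function in steps:
        data = step_function(data)
    return data


def drop_missing(rows):
    # Keep only rows where every value is present
    return [row for row in rows if all(value is not None for value in row.values())]


def add_total(rows):
    # Derive a new feature from two existing columns
    return [{**row, "total": row["quantity"] * row["unit_price"]} for row in rows]


if __name__ == "__main__":
    raw_rows = [
        {"quantity": 2, "unit_price": 3.5},
        {"quantity": None, "unit_price": 1.0},
    ]
    print(run_pipeline([drop_missing, add_total], raw_rows))
```

Because every step only depends on the value handed to it, steps can be reordered, replaced, or tested one at a time.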

The project's training pipeline is defined in training_pipeline.py.

Now you can run training_pipeline.py to train your ML model with a single click. You can check the pipeline in your ZenML dashboard.

You can also check the model details, train multiple models, and compare them in the MLflow dashboard, which is launched from the terminal with the mlflow ui command.

Creating the Deployment Pipeline

Next, we create deployment_pipeline.py.

After you run the deployment pipeline, you can view it in your ZenML dashboard.

For reference, this is data_ingestion_step.py, the ZenML step that wraps the DataIngestor class from ingest_data.py:

import os
import sys

# Make the project root importable so that `src` can be found
sys.path.append(os.path.dirname(os.path.dirname(__file__)))

import pandas as pd
from src.ingest_data import DataIngestor, XLSXIngestion
from zenml import step


@step
def data_ingestion_step(file_path: str) -> pd.DataFrame:
    """
    Ingests data from an XLSX file into a DataFrame.

    Parameters:
    file_path (str): The path to the XLSX file.

    Returns:
    pd.DataFrame: A dataframe containing the ingested data.
    """
    # Initialize the DataIngestor with an XLSXIngestion strategy
    ingestor = DataIngestor(XLSXIngestion())

    # Ingest data from the specified file
    df = ingestor.ingest_data(file_path)

    return df

Congratulations! You have deployed the best model using MLflow and ZenML in your local instance.

Create a Flask App

The next step is to create a Flask app that presents our model to end users. For that, we have to create app.py and, within the templates folder, index.html.
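Whatever the page looks like, a Flask route has to turn the submitted strings into numbers before it can call the model. The sketch below shows only that conversion step as a plain function, assuming index.html submits the model inputs through a form. The function name `parse_features` and the field names are hypothetical, since the real ones depend on the features the trained model expects.

```python
def parse_features(form, feature_names):
    """Convert submitted form strings into a list of floats, in a fixed order.

    Raises ValueError with a readable message if a field is missing or not numeric.
    """
    values = []
    for name in feature_names:
        raw = form.get(name, "").strip()
        if not raw:
            raise ValueError(f"Missing value for '{name}'")
        try:
            values.append(float(raw))
        except ValueError:
            raise ValueError(f"'{name}' must be a number, got '{raw}'") from None
    return values


if __name__ == "__main__":
    # Hypothetical field names; the real ones depend on the trained model's features
    FEATURES = ["frequency", "total_amount", "avg_order_value"]
    submitted = {"frequency": "12", "total_amount": "540.5", "avg_order_value": "45"}
    print(parse_features(submitted, FEATURES))
```

Inside a route, `request.form` could be passed as `form`, because it offers the same `.get(name, default)` lookup as a dictionary. Keeping the parsing in its own function makes it testable without starting a web server.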

For reference, this is training_pipeline.py, the training pipeline introduced earlier:

import os
import sys

# Make the project root importable so that `steps` can be found
sys.path.append(os.path.dirname(os.path.dirname(__file__)))

from steps.data_ingestion_step import data_ingestion_step
from steps.handling_missing_values_step import handling_missing_values_step
from steps.dropping_columns_step import dropping_columns_step
from steps.detecting_outliers_step import detecting_outliers_step
from steps.feature_engineering_step import feature_engineering_step
from steps.data_splitting_step import data_splitting_step
from steps.model_building_step import model_building_step
from steps.model_evaluating_step import model_evaluating_step
from steps.data_resampling_step import data_resampling_step
from zenml import Model, pipeline


@pipeline(model=Model(name='CLTV_Prediction'))
def training_pipeline():
    """
    Defines the complete training pipeline for CLTV Prediction.
    Steps:
    1. Data ingestion
    2. Dropping unnecessary columns
    3. Detecting and handling outliers
    4. Feature engineering
    5. Handling missing values
    6. Splitting data into train and test sets
    7. Resampling the training data
    8. Model training
    9. Model evaluation
    """
    # Step 1: Data ingestion
    raw_data = data_ingestion_step(file_path='data/Online_Retail.xlsx')

    # Step 2: Drop unnecessary columns
    columns_to_drop = ["Country", "Description", "InvoiceNo", "StockCode"]
    refined_data = dropping_columns_step(raw_data, columns_to_drop)

    # Step 3: Detect and handle outliers
    outlier_free_data = detecting_outliers_step(refined_data)

    # Step 4: Feature engineering
    features_data = feature_engineering_step(outlier_free_data)

    # Step 5: Handle missing values
    cleaned_data = handling_missing_values_step(features_data)

    # Step 6: Data splitting, with "CLTV" as the target column
    train_features, test_features, train_target, test_target = data_splitting_step(cleaned_data, "CLTV")

    # Step 7: Data resampling (training data only)
    train_features_resampled, train_target_resampled = data_resampling_step(train_features, train_target)

    # Step 8: Model training
    trained_model = model_building_step(train_features_resampled, train_target_resampled)

    # Step 9: Model evaluation on the held-out test set
    evaluation_metrics = model_evaluating_step(trained_model, test_features, test_target)

    # Return evaluation metrics
    return evaluation_metrics


if __name__ == "__main__":
    # Run the pipeline
    training_pipeline()

To open the MLflow dashboard, run the following in the terminal:

mlflow ui

The final step is to commit these changes to your GitHub repository and deploy the model online on a cloud server. For this project, we deploy app.py on a free Render server.

Go to render.com and connect your project's GitHub repository to Render.

That's it. You have successfully created your first MLOps project. Hope you enjoyed it!

Conclusion

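MLOps has become an essential practice for managing the complexity of machine learning workflows, from data ingestion to model deployment. By leveraging ZenML, an open-source MLOps framework, we streamlined the process of building, training, and deploying a production-grade ML model for customer lifetime value (CLTV) prediction. Through modular coding, robust pipelines, and seamless integrations, we demonstrated how to create an end-to-end project efficiently. As businesses increasingly rely on AI-driven solutions, frameworks like ZenML empower teams to maintain scalability, reproducibility, and performance with minimal manual intervention.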

Key Takeaways

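MLOps simplifies the ML lifecycle, reducing errors and increasing efficiency through automated pipelines.
Building an end-to-end pipeline involves defining clear steps, from data ingestion to deployment.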
