
Understanding MLOps with a ZenML Project

Lisa Kudrow
2025-03-08 11:16:09

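In the middle of the AI revolution, most of us overlook a critical question: how do we maintain these complex AI systems? This is where Machine Learning Operations (MLOps) comes in. In this blog, we will understand the importance of MLOps by building an end-to-end project.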

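This article was published as a part of the Data Science Blogathon.

What is MLOps?

MLOps empowers machine learning engineers to streamline the lifecycle of an ML model. Putting machine learning into production is hard. The ML lifecycle consists of many complex components, such as data ingestion, data preparation, model training, model tuning, model deployment, model monitoring, and explainability. MLOps automates each step of this process through robust pipelines to reduce manual errors. It is a collaborative practice that streamlines your AI infrastructure with minimal manual effort and maximum operational efficiency. Think of MLOps as DevOps for the AI industry, with some extra spice.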
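What is ZenML?

ZenML is an open-source MLOps framework that simplifies the development, deployment, and management of machine learning workflows. By applying the principles of MLOps, it integrates with a wide range of tools and infrastructure and gives users a modular way to maintain their AI workflows in a single workspace. ZenML provides features such as auto-logging, a metadata tracker, a model tracker, an experiment tracker, an artifact store, and simple Python decorators that wrap your core logic without complex configuration.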
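Understanding MLOps with a hands-on project

Now we will see how MLOps is implemented with the help of a simple, end-to-end, production-grade data science project. In this project, we will create and deploy a machine learning model that predicts the Customer Lifetime Value (CLTV) of a customer. CLTV is a key metric companies use to see how much they stand to gain or lose from a customer over the long term. Using this metric, a company can decide, for example, whether to spend more on targeted ads for that customer.

Let's start implementing the project in the next section.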
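The article does not spell out how CLTV is computed, so the following is only a minimal sketch under a stated assumption: CLTV is approximated by the historical revenue per customer, that is, `Quantity * UnitPrice` summed over each customer's invoice lines. The column names follow the Online Retail dataset; the sample rows and the function name `historical_cltv` are made up for illustration. The project itself predicts CLTV with a trained model rather than computing it this way.

```python
from collections import defaultdict


def historical_cltv(rows):
    """Sum Quantity * UnitPrice per CustomerID (a naive CLTV proxy, assumed here)."""
    totals = defaultdict(float)
    for row in rows:
        if row["CustomerID"] is None:
            continue  # skip invoice lines that have no customer attached
        totals[row["CustomerID"]] += row["Quantity"] * row["UnitPrice"]
    return dict(totals)


# Hypothetical invoice lines; a negative quantity models a returned item.
sample_rows = [
    {"CustomerID": 17850, "Quantity": 6, "UnitPrice": 2.5},
    {"CustomerID": 17850, "Quantity": 2, "UnitPrice": 4.0},
    {"CustomerID": 13047, "Quantity": 10, "UnitPrice": 1.5},
    {"CustomerID": 13047, "Quantity": -2, "UnitPrice": 1.5},
    {"CustomerID": None, "Quantity": 1, "UnitPrice": 9.0},
]

cltv = historical_cltv(sample_rows)
print(cltv)
```

A proxy like this is mainly useful as a sanity check on the target column before any model is trained.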

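Initial configuration

Now let's get straight into the project configuration. First, download the Online Retail dataset from the UCI Machine Learning Repository. ZenML is not directly supported on Windows, so we need to use Linux (WSL on Windows) or macOS. Next, download requirements.txt. Now let's move to the terminal for a few configuration steps.
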
    # Make sure you have Python 3.10 or above installed
    python --version
    
    # Make a new Python environment using any method
    python3.10 -m venv myenv 
    
    # Activate the environment
    source myenv/bin/activate
    
    # Install the requirements from the provided source above
    pip install -r requirements.txt
    
    # Install ZenML with the server extra (quoted so the shell leaves the brackets alone)
    pip install "zenml[server]==0.66.0"
    
    # Initialize a ZenML repository in the project folder
    zenml init
    
    # Launch the ZenML dashboard
    zenml up

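Now simply log in to the ZenML dashboard with the default login credentials (no password required).

Congratulations, you have successfully completed the project configuration.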
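If the setup misbehaves, the two prerequisites stated above (Python 3.10 or newer, and a non-Windows environment) can be checked with a few lines of standard-library Python. This is only a sketch; the function name `check_environment` is hypothetical and is not part of ZenML.

```python
import platform
import sys


def check_environment(version_info=sys.version_info, system=None):
    """Return a list of problems; an empty list means both prerequisites are met."""
    system = system or platform.system()
    problems = []
    if tuple(version_info[:2]) < (3, 10):
        problems.append("Python 3.10 or above is required")
    if system == "Windows":
        problems.append("use WSL, Linux, or macOS instead of native Windows")
    return problems


if __name__ == "__main__":
    # Print one line per detected problem; print nothing when all is well.
    for problem in check_environment():
        print("Problem:", problem)
```

The arguments default to the running interpreter and operating system, so they only need to be passed explicitly when testing.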

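Exploratory Data Analysis (EDA)

Now it's time to get our hands dirty with the data. We will create a Jupyter notebook to analyze our data.

Pro tip: do your own analysis rather than just following mine.

Alternatively, you can simply follow this notebook, in which we created different data analysis methods to use in our project.

Now, assuming you have done your share of data analysis, let's jump straight to the spicy part.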

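Defining ZenML steps as modular code

To make our code more modular and reusable, we use ZenML's @step decorator. It organizes our code so that it can be passed into pipelines hassle-free, which reduces the chance of errors.

In our source folder (src), we write the methods for each step before initializing the steps themselves. We follow a system design pattern for every method: each one (data ingestion, data cleaning, feature engineering, and so on) gets an abstract class that defines its strategy.

We follow this pattern to create the rest of the methods; you can copy their code from the given GitHub repository. Sample code for ingest_data.py: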

    import logging
    import pandas as pd
    from abc import ABC, abstractmethod
    
    # Setup logging configuration
    logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
    
    # Abstract Base Class for Data Ingestion Strategy
    # ------------------------------------------------
    # This class defines a common interface for different data ingestion strategies.
    # Subclasses must implement the `ingest` method.
    class DataIngestionStrategy(ABC):
        @abstractmethod
        def ingest(self, file_path: str) -> pd.DataFrame:
            """
            Abstract method to ingest data from a file into a DataFrame.
    
            Parameters:
            file_path (str): The path to the data file to ingest.
    
            Returns:
            pd.DataFrame: A dataframe containing the ingested data.
            """
            pass
        
    # Concrete Strategy for XLSX File Ingestion
    # -----------------------------------------
    # This strategy handles the ingestion of data from an XLSX file.
    class XLSXIngestion(DataIngestionStrategy):
        def __init__(self, sheet_name=0):
            """
            Initializes the XLSXIngestion with optional sheet name.
    
            Parameters:
            sheet_name (str or int): The sheet name or index to read, default is the first sheet.
            """
            self.sheet_name = sheet_name
    
        def ingest(self, file_path: str) -> pd.DataFrame:
            """
            Ingests data from an XLSX file into a DataFrame.
    
            Parameters:
            file_path (str): The path to the XLSX file.
    
            Returns:
            pd.DataFrame: A dataframe containing the ingested data.
            """
            try:
                logging.info(f"Attempting to read XLSX file: {file_path}")
                df = pd.read_excel(
                    file_path,
                    dtype={"InvoiceNo": str, "StockCode": str, "Description": str},
                    sheet_name=self.sheet_name,
                )
                logging.info(f"Successfully read XLSX file: {file_path}")
                return df
            except FileNotFoundError:
                logging.error(f"File not found: {file_path}")
            except pd.errors.EmptyDataError:
                logging.error(f"File is empty: {file_path}")
            except Exception as e:
                logging.error(f"An error occurred while reading the XLSX file: {e}")
            return pd.DataFrame()
    
    
    # Context Class for Data Ingestion
    # --------------------------------
    # This class uses a DataIngestionStrategy to ingest data from a file.
    class DataIngestor:
        def __init__(self, strategy: DataIngestionStrategy):
            """
            Initializes the DataIngestor with a specific data ingestion strategy.
    
            Parameters:
            strategy (DataIngestionStrategy): The strategy to be used for data ingestion.
            """
            self._strategy = strategy
    
        def set_strategy(self, strategy: DataIngestionStrategy):
            """
            Sets a new strategy for the DataIngestor.
    
            Parameters:
            strategy (DataIngestionStrategy): The new strategy to be used for data ingestion.
            """
            logging.info("Switching data ingestion strategy.")
            self._strategy = strategy
    
        def ingest_data(self, file_path: str) -> pd.DataFrame:
            """
            Executes the data ingestion using the current strategy.
    
            Parameters:
            file_path (str): The path to the data file to ingest.
    
            Returns:
            pd.DataFrame: A dataframe containing the ingested data.
            """
            logging.info("Ingesting data using the current strategy.")
            return self._strategy.ingest(file_path)
    
    
    # Example usage
    if __name__ == "__main__":
        # Example file path for XLSX file
        # file_path = "../data/raw/your_data_file.xlsx"
    
        # XLSX Ingestion Example
        # xlsx_ingestor = DataIngestor(XLSXIngestion(sheet_name=0))
        # df = xlsx_ingestor.ingest_data(file_path)
    
        # Show the first few rows of the ingested DataFrame if successful
        # if not df.empty:
        #     logging.info("Displaying the first few rows of the ingested data:")
        #     print(df.head())
        pass
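To illustrate how the same pattern extends to another file format, here is a minimal, self-contained sketch that uses only the standard library. The class names `IngestionStrategy`, `CSVIngestion`, and `Ingestor` are hypothetical, and the strategy returns a list of dicts rather than a DataFrame, so this is not the project's code.

```python
import csv
import io
from abc import ABC, abstractmethod


class IngestionStrategy(ABC):
    """Common interface: every strategy turns a text source into a list of rows."""

    @abstractmethod
    def ingest(self, source):
        ...


class CSVIngestion(IngestionStrategy):
    def __init__(self, delimiter=","):
        self.delimiter = delimiter

    def ingest(self, source):
        # DictReader takes the first line as the header row.
        return list(csv.DictReader(source, delimiter=self.delimiter))


class Ingestor:
    """Context class: delegates the work to whichever strategy is currently set."""

    def __init__(self, strategy):
        self._strategy = strategy

    def set_strategy(self, strategy):
        self._strategy = strategy

    def ingest_data(self, source):
        return self._strategy.ingest(source)


# In-memory sample data, so the sketch runs without any file on disk.
ingestor = Ingestor(CSVIngestion())
rows = ingestor.ingest_data(io.StringIO("InvoiceNo,Quantity\n536365,6\n536366,2\n"))

# Swap the strategy at runtime without touching the calling code.
ingestor.set_strategy(CSVIngestion(delimiter=";"))
semicolon_rows = ingestor.ingest_data(io.StringIO("InvoiceNo;Quantity\n536367;4\n"))
print(rows, semicolon_rows)
```

The calling code only ever talks to `Ingestor`, which is why a new format means a new strategy class and no change to the callers.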

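After writing all the methods, it's time to initialize the ZenML steps in our steps folder. All the methods we have created so far will now be used in the ZenML steps.

We follow the same pattern to create the rest of the ZenML steps in our project; you can copy them from the project repository. Sample code for data_ingestion_step.py:
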

    import os
    import sys
    sys.path.append(os.path.dirname(os.path.dirname(__file__)))
    
    import pandas as pd
    from src.ingest_data import DataIngestor, XLSXIngestion
    from zenml import step
    
    @step
    def data_ingestion_step(file_path: str) -> pd.DataFrame:
        """
        Ingests data from an XLSX file into a DataFrame.
    
        Parameters:
        file_path (str): The path to the XLSX file.
    
        Returns:
        pd.DataFrame: A dataframe containing the ingested data.
        """
        # Initialize the DataIngestor with an XLSXIngestion strategy
        
        ingestor = DataIngestor(XLSXIngestion())
        
        # Ingest data from the specified file
        
        df = ingestor.ingest_data(file_path)
        
        return df

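Wow! Congratulations on creating and learning one of the most important parts of MLOps. It's okay to feel a little overwhelmed, since this is your first time. Don't stress too much, because everything will make sense once you run your first production-grade ML model.

Building the pipeline

It's time to build our pipeline. No, not the kind that carries water or oil. A pipeline is a series of steps organized in a specific order to form our complete machine learning workflow. In ZenML, the @pipeline decorator marks the function that contains the steps we created above. This approach ensures that the output of one step can be used as the input of the next.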
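The idea of feeding one step's output into the next can be sketched in plain Python before looking at the real pipeline. This is a toy illustration only: `run_in_order` and the three step functions are hypothetical, and ZenML's @step and @pipeline decorators add tracking and artifact storage that this sketch does not attempt to reproduce.

```python
def ingest():
    """First step: produce raw values (None marks a missing entry)."""
    return [4.0, 6.0, None, 8.0]


def drop_missing(values):
    """Second step: remove the missing entries."""
    return [v for v in values if v is not None]


def average(values):
    """Third step: reduce the cleaned values to one number."""
    return sum(values) / len(values)


def run_in_order(first_step, *later_steps):
    """Call the first step, then pass each result on to the following step."""
    result = first_step()
    for step_fn in later_steps:
        result = step_fn(result)
    return result


result = run_in_order(ingest, drop_missing, average)
print(result)
```

Reordering the steps changes what each one receives, which is why the order inside a pipeline definition matters.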

Here is our training_pipeline.py:

    import os
    import sys

    # Make the project root importable so the `steps` package can be found
    sys.path.append(os.path.dirname(os.path.dirname(__file__)))

    from steps.data_ingestion_step import data_ingestion_step
    from steps.handling_missing_values_step import handling_missing_values_step
    from steps.dropping_columns_step import dropping_columns_step
    from steps.detecting_outliers_step import detecting_outliers_step
    from steps.feature_engineering_step import feature_engineering_step
    from steps.data_splitting_step import data_splitting_step
    from steps.model_building_step import model_building_step
    from steps.model_evaluating_step import model_evaluating_step
    from steps.data_resampling_step import data_resampling_step
    from zenml import Model, pipeline


    @pipeline(model=Model(name='CLTV_Prediction'))
    def training_pipeline():
        """
        Defines the complete training pipeline for CLTV Prediction.
        Steps:
        1. Data ingestion
        2. Dropping unnecessary columns
        3. Detecting and handling outliers
        4. Feature engineering
        5. Handling missing values
        6. Splitting data into train and test sets
        7. Resampling the training data
        8. Model training
        9. Model evaluation
        """
        # Step 1: Data ingestion
        raw_data = data_ingestion_step(file_path='data/Online_Retail.xlsx')

        # Step 2: Drop unnecessary columns
        columns_to_drop = ["Country", "Description", "InvoiceNo", "StockCode"]
        refined_data = dropping_columns_step(raw_data, columns_to_drop)

        # Step 3: Detect and handle outliers
        outlier_free_data = detecting_outliers_step(refined_data)

        # Step 4: Feature engineering
        features_data = feature_engineering_step(outlier_free_data)

        # Step 5: Handle missing values
        cleaned_data = handling_missing_values_step(features_data)

        # Step 6: Data splitting
        train_features, test_features, train_target, test_target = data_splitting_step(cleaned_data, "CLTV")

        # Step 7: Data resampling
        train_features_resampled, train_target_resampled = data_resampling_step(train_features, train_target)

        # Step 8: Model training
        trained_model = model_building_step(train_features_resampled, train_target_resampled)

        # Step 9: Model evaluation
        evaluation_metrics = model_evaluating_step(trained_model, test_features, test_target)

        # Return evaluation metrics
        return evaluation_metrics


    if __name__ == "__main__":
        # Run the pipeline
        training_pipeline()

Now we can run training_pipeline.py to train our ML model in a single click. You can check the pipeline in your ZenML dashboard.

[Screenshot: the training pipeline in the ZenML dashboard]

We can inspect our model details, and we can also train multiple models and compare them in the MLflow dashboard by running the following command in the terminal:

    mlflow ui

Creating the deployment pipeline

Next, we create deployment_pipeline.py, following the code in the project's GitHub repository.

Once the deployment pipeline has run, its run view appears in the ZenML dashboard.

[Screenshot: the deployment pipeline in the ZenML dashboard]

Congratulations, you have deployed the best model using MLflow and ZenML in your local instance.

Creating the Flask app

Our next step is to create a Flask app that exposes our model to end users. For that, we need to create app.py and, inside the templates folder, index.html, following the code in the project's GitHub repository.

[Screenshot: the app after running app.py]

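The last step is to commit these changes to your GitHub repository and deploy the model online on any cloud server. For this project, we deploy app.py on a free Render server, and you can do the same.

Go to render.com and connect your GitHub repository to Render.

That's it. You have successfully created your first MLOps project. Hope you enjoyed it!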

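Conclusion

MLOps has become an essential practice for managing the complexity of machine learning workflows, from data ingestion to model deployment. By leveraging ZenML, an open-source MLOps framework, we streamlined the process of building, training, and deploying a production-grade ML model for Customer Lifetime Value (CLTV) prediction. Through modular coding, robust pipelines, and seamless integrations, we demonstrated how to create an end-to-end project efficiently. As businesses increasingly rely on AI-driven solutions, frameworks like ZenML empower teams to maintain scalability, reproducibility, and performance with minimal manual intervention.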

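Key takeaways

  • MLOps simplifies the ML lifecycle, reducing errors and improving efficiency through automated pipelines.
  • ZenML provides modular, reusable coding structures for managing machine learning workflows.
  • Building an end-to-end pipeline involves defining clear steps, from data ingestion to deployment.
  • Deployment pipelines and Flask apps make ML models production-ready and accessible.
  • Tools like ZenML and MLflow enable seamless tracking, monitoring, and optimization of ML projects.

Frequently asked questions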

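Q1. What is MLOps, and why is it important?
MLOps (Machine Learning Operations) streamlines the ML lifecycle by automating processes such as data ingestion, model training, deployment, and monitoring, which ensures efficiency and scalability.

Q2. What is ZenML used for?
ZenML is an open-source MLOps framework that simplifies the development, deployment, and management of machine learning workflows with modular and reusable code.

Q3. Can I use ZenML on Windows?
ZenML does not directly support Windows, but it can be used with WSL (Windows Subsystem for Linux).

Q4. What is the purpose of pipelines in ZenML?
Pipelines in ZenML define a sequence of steps, ensuring a structured and reusable workflow for machine learning projects.

Q5. How does the Flask app integrate with the ML model?
The Flask app acts as the user interface: it lets end users enter data and receive predictions from the deployed ML model.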
