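Amid the AI revolution, most of us overlook a critical question: how do we maintain these sophisticated AI systems? That is where Machine Learning Operations (MLOps) comes into play. In this blog, we will understand the importance of MLOps by building an end-to-end project.
This article was published as a part of the Data Science Blogathon.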
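Now, let's get straight into the project configuration. First, we need to download the Online Retail dataset from the UCI Machine Learning Repository. ZenML is not supported on Windows, so we need to use Linux (WSL on Windows) or macOS. Next, download requirements.txt. Now, let's move to the terminal for a few configurations.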
    # Make sure you have Python 3.10 or above installed
    python --version

    # Make a new Python environment using any method
    python3.10 -m venv myenv

    # Activate the environment
    source myenv/bin/activate

    # Install the requirements from the provided source above
    pip install -r requirements.txt

    # Install the ZenML server (quoted so the shell does not expand the brackets)
    pip install "zenml[server]==0.66.0"

    # Initialize ZenML in this project directory
    zenml init

    # Launch the ZenML dashboard
    zenml up
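Before running the commands above, it can help to confirm that the machine meets the two requirements mentioned so far: Python 3.10 or newer, and an operating system other than native Windows. The following is a minimal sketch using only the standard library; the function name `environment_problems` is hypothetical and not part of ZenML or the project.

```python
import platform
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in the setup comments


def environment_problems(version_info=None, system=None):
    """Return a list of human-readable problems; an empty list means OK."""
    # Fall back to the running interpreter and OS when no values are passed in.
    version = tuple(version_info or sys.version_info)[:2]
    system = system or platform.system()

    problems = []
    if version < MIN_PYTHON:
        problems.append(
            f"Python {version[0]}.{version[1]} found, need "
            f"{MIN_PYTHON[0]}.{MIN_PYTHON[1]} or newer"
        )
    if system == "Windows":
        problems.append("native Windows detected, use WSL, Linux, or macOS")
    return problems


if __name__ == "__main__":
    for problem in environment_problems():
        print(f"WARNING: {problem}")
```

Inside WSL, `platform.system()` reports `Linux`, so the check passes there as expected.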
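Now, simply log in to the ZenML dashboard with the default login credentials (no password required).
Congratulations, you have successfully completed the project configuration.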
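Exploratory Data Analysis (EDA)
Pro tip: do your own analysis without following mine.
Alternatively, you can just follow this notebook, where we have created different data analysis methods to use in our project. Now, assuming you have performed your share of data analysis, let's jump straight to the spicy part.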
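To increase the modularity and reusability of our code, we use the @step decorator from ZenML. It organizes our code so that it can be passed into pipelines hassle-free, reducing the chance of errors.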
Sample code for ingesting data, ingest_data.py:
    import logging
    import pandas as pd
    from abc import ABC, abstractmethod

    # Setup logging configuration
    logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")


    # Abstract Base Class for Data Ingestion Strategy
    # ------------------------------------------------
    # This class defines a common interface for different data ingestion strategies.
    # Subclasses must implement the `ingest` method.
    class DataIngestionStrategy(ABC):
        @abstractmethod
        def ingest(self, file_path: str) -> pd.DataFrame:
            """
            Abstract method to ingest data from a file into a DataFrame.

            Parameters:
            file_path (str): The path to the data file to ingest.

            Returns:
            pd.DataFrame: A dataframe containing the ingested data.
            """
            pass


    # Concrete Strategy for XLSX File Ingestion
    # -----------------------------------------
    # This strategy handles the ingestion of data from an XLSX file.
    class XLSXIngestion(DataIngestionStrategy):
        def __init__(self, sheet_name=0):
            """
            Initializes the XLSXIngestion with optional sheet name.

            Parameters:
            sheet_name (str or int): The sheet name or index to read, default is the first sheet.
            """
            self.sheet_name = sheet_name

        def ingest(self, file_path: str) -> pd.DataFrame:
            """
            Ingests data from an XLSX file into a DataFrame.

            Parameters:
            file_path (str): The path to the XLSX file.

            Returns:
            pd.DataFrame: A dataframe containing the ingested data.
            """
            try:
                logging.info(f"Attempting to read XLSX file: {file_path}")
                df = pd.read_excel(
                    file_path,
                    dtype={'InvoiceNo': str, 'StockCode': str, 'Description': str},
                    sheet_name=self.sheet_name,
                )
                logging.info(f"Successfully read XLSX file: {file_path}")
                return df
            except FileNotFoundError:
                logging.error(f"File not found: {file_path}")
            except pd.errors.EmptyDataError:
                logging.error(f"File is empty: {file_path}")
            except Exception as e:
                logging.error(f"An error occurred while reading the XLSX file: {e}")
            # Reached only when reading failed: return an empty DataFrame
            return pd.DataFrame()


    # Context Class for Data Ingestion
    # --------------------------------
    # This class uses a DataIngestionStrategy to ingest data from a file.
    class DataIngestor:
        def __init__(self, strategy: DataIngestionStrategy):
            """
            Initializes the DataIngestor with a specific data ingestion strategy.

            Parameters:
            strategy (DataIngestionStrategy): The strategy to be used for data ingestion.
            """
            self._strategy = strategy

        def set_strategy(self, strategy: DataIngestionStrategy):
            """
            Sets a new strategy for the DataIngestor.

            Parameters:
            strategy (DataIngestionStrategy): The new strategy to be used for data ingestion.
            """
            logging.info("Switching data ingestion strategy.")
            self._strategy = strategy

        def ingest_data(self, file_path: str) -> pd.DataFrame:
            """
            Executes the data ingestion using the current strategy.

            Parameters:
            file_path (str): The path to the data file to ingest.

            Returns:
            pd.DataFrame: A dataframe containing the ingested data.
            """
            logging.info("Ingesting data using the current strategy.")
            return self._strategy.ingest(file_path)


    # Example usage
    if __name__ == "__main__":
        # Example file path for XLSX file
        # file_path = "../data/raw/your_data_file.xlsx"

        # XLSX Ingestion Example
        # xlsx_ingestor = DataIngestor(XLSXIngestion(sheet_name=0))
        # df = xlsx_ingestor.ingest_data(file_path)

        # Show the first few rows of the ingested DataFrame if successful
        # if not df.empty:
        #     logging.info("Displaying the first few rows of the ingested data:")
        #     print(df.head())
        pass
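After writing all the methods, it's time to initialize the ZenML steps in our steps folder. All the methods we have created so far will be used in the corresponding ZenML steps.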
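One benefit of the strategy pattern used in ingest_data.py is that supporting another file format only requires a new strategy class, while the calling code stays the same. The sketch below illustrates that idea with the standard library only, so rows are plain dictionaries rather than pandas DataFrames. The class names `IngestionStrategy`, `CSVIngestion`, `JSONLinesIngestion`, and `Ingestor`, as well as the sample values, are hypothetical and not part of the project.

```python
import csv
import io
import json
from abc import ABC, abstractmethod


class IngestionStrategy(ABC):
    """Common interface: turn raw text into a list of row dictionaries."""

    @abstractmethod
    def ingest(self, text: str) -> list[dict]:
        ...


class CSVIngestion(IngestionStrategy):
    def ingest(self, text: str) -> list[dict]:
        # DictReader uses the first line as column names; values stay strings.
        return list(csv.DictReader(io.StringIO(text)))


class JSONLinesIngestion(IngestionStrategy):
    def ingest(self, text: str) -> list[dict]:
        # One JSON object per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]


class Ingestor:
    """Context object: delegates to whichever strategy is currently set."""

    def __init__(self, strategy: IngestionStrategy):
        self._strategy = strategy

    def set_strategy(self, strategy: IngestionStrategy) -> None:
        self._strategy = strategy

    def ingest_data(self, text: str) -> list[dict]:
        return self._strategy.ingest(text)


# Illustrative sample values only.
ingestor = Ingestor(CSVIngestion())
csv_rows = ingestor.ingest_data("InvoiceNo,Quantity\n536365,6\n")

ingestor.set_strategy(JSONLinesIngestion())  # swap format, same caller code
json_rows = ingestor.ingest_data('{"InvoiceNo": "536365", "Quantity": 6}\n')
```

The caller only ever talks to `ingest_data`, which mirrors how `DataIngestor` hides the XLSX-specific details in the project code.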
Sample code for data_ingestion_step.py:
    import os
    import sys

    # Make the project root importable so that `src` can be found
    sys.path.append(os.path.dirname(os.path.dirname(__file__)))

    import pandas as pd
    from src.ingest_data import DataIngestor, XLSXIngestion
    from zenml import step


    @step
    def data_ingestion_step(file_path: str) -> pd.DataFrame:
        """
        Ingests data from an XLSX file into a DataFrame.

        Parameters:
        file_path (str): The path to the XLSX file.

        Returns:
        pd.DataFrame: A dataframe containing the ingested data.
        """
        # Initialize the DataIngestor with an XLSXIngestion strategy
        ingestor = DataIngestor(XLSXIngestion())

        # Ingest data from the specified file
        df = ingestor.ingest_data(file_path)

        return df
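Wow! Congratulations on creating and learning one of the most important parts of MLOps. It's okay to feel a little overwhelmed, since this is your first time. Don't stress too much, because everything will make sense once you run your first production-grade ML model.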
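It's time to build our pipelines. No, not the kind that carry water or oil. A pipeline is a series of steps organized in a specific order to form our complete machine learning workflow. The @pipeline decorator is used in ZenML to define a pipeline that contains the steps we created above. This approach ensures that the output of one step can be used as the input to the next step.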
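To make the data-flow idea concrete, here is a minimal sketch in plain Python where each step's return value feeds the next step. It does not use ZenML and is not how ZenML implements `@step` or `@pipeline`; the decorator `toy_step`, the step functions, and the sample rows are all hypothetical and exist only for illustration.

```python
import functools
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s - %(message)s")


def toy_step(func):
    """Hypothetical stand-in for a step decorator: logs, then runs the function."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logging.info("Running step: %s", func.__name__)
        return func(*args, **kwargs)

    return wrapper


@toy_step
def ingest() -> list[dict]:
    # Made-up rows standing in for the retail data.
    return [
        {"customer": "A", "quantity": 2, "unit_price": 3.0},
        {"customer": "A", "quantity": 1, "unit_price": 4.0},
        {"customer": "B", "quantity": 5, "unit_price": 1.0},
    ]


@toy_step
def add_revenue(rows: list[dict]) -> list[dict]:
    # Derive a new column from two existing ones.
    return [{**row, "revenue": row["quantity"] * row["unit_price"]} for row in rows]


@toy_step
def total_per_customer(rows: list[dict]) -> dict[str, float]:
    totals: dict[str, float] = {}
    for row in rows:
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["revenue"]
    return totals


def toy_pipeline() -> dict[str, float]:
    # Each step's return value is passed straight into the next step.
    rows = ingest()
    rows = add_revenue(rows)
    return total_per_customer(rows)


result = toy_pipeline()
```

Because every step is a small function with explicit inputs and outputs, steps can be tested on their own and rearranged without touching each other's internals.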
Here is our training_pipeline.py:

    import os
    import sys

    # Make the project root importable so that `steps` can be found
    sys.path.append(os.path.dirname(os.path.dirname(__file__)))

    from steps.data_ingestion_step import data_ingestion_step
    from steps.handling_missing_values_step import handling_missing_values_step
    from steps.dropping_columns_step import dropping_columns_step
    from steps.detecting_outliers_step import detecting_outliers_step
    from steps.feature_engineering_step import feature_engineering_step
    from steps.data_splitting_step import data_splitting_step
    from steps.model_building_step import model_building_step
    from steps.model_evaluating_step import model_evaluating_step
    from steps.data_resampling_step import data_resampling_step
    from zenml import Model, pipeline


    @pipeline(model=Model(name='CLTV_Prediction'))
    def training_pipeline():
        """
        Defines the complete training pipeline for CLTV Prediction.

        Steps:
        1. Data ingestion
        2. Dropping unnecessary columns
        3. Detecting and handling outliers
        4. Feature engineering
        5. Handling missing values
        6. Splitting data into train and test sets
        7. Resampling the training data
        8. Model training
        9. Model evaluation
        """
        # Step 1: Data ingestion
        raw_data = data_ingestion_step(file_path='data/Online_Retail.xlsx')

        # Step 2: Drop unnecessary columns
        columns_to_drop = ["Country", "Description", "InvoiceNo", "StockCode"]
        refined_data = dropping_columns_step(raw_data, columns_to_drop)

        # Step 3: Detect and handle outliers
        outlier_free_data = detecting_outliers_step(refined_data)

        # Step 4: Feature engineering
        features_data = feature_engineering_step(outlier_free_data)

        # Step 5: Handle missing values
        cleaned_data = handling_missing_values_step(features_data)

        # Step 6: Data splitting
        train_features, test_features, train_target, test_target = data_splitting_step(cleaned_data, "CLTV")

        # Step 7: Data resampling
        train_features_resampled, train_target_resampled = data_resampling_step(train_features, train_target)

        # Step 8: Model training
        trained_model = model_building_step(train_features_resampled, train_target_resampled)

        # Step 9: Model evaluation
        evaluation_metrics = model_evaluating_step(trained_model, test_features, test_target)

        # Return evaluation metrics
        return evaluation_metrics


    if __name__ == "__main__":
        # Run the pipeline
        training_pipeline()

Now we can run training_pipeline.py to train our ML model in a single click. You can check the pipeline in your ZenML dashboard:

Creating the Deployment Pipeline

When we run the deployment pipeline, we get a view like this in the ZenML dashboard:

Creating the Flask App

To create the index.html file, follow the code below:
Your app.py should look like this after execution:
mlflow ui
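The last step is to commit these changes to your GitHub repository and deploy the model online on any cloud server. For this project, we deploy app.py on a free Render server, and you can do the same.
That's it. You have successfully created your first MLOps project. Hope you enjoyed it!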
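Conclusion
MLOps has become an essential practice for managing the complexity of machine learning workflows, from data ingestion to model deployment. By leveraging ZenML, an open-source MLOps framework, we streamlined the process of building, training, and deploying a production-grade ML model for Customer Lifetime Value (CLTV) prediction. Through modular coding, robust pipelines, and seamless integrations, we demonstrated how to create an end-to-end project efficiently. As businesses increasingly rely on AI-driven solutions, frameworks like ZenML empower teams to maintain scalability, reproducibility, and performance with minimal manual intervention.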