Best Practices for Using Pydantic in Python

PHPz | Original | 2024-07-19 04:28:31

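Pydantic is a Python library that uses type hints to simplify data validation. It helps ensure data integrity and offers an easy way to create data models with automatic type checking and validation.

In software applications, reliable data validation is essential for preventing errors, security issues, and unpredictable behavior.

This guide presents best practices for using Pydantic in Python projects, covering model definition, data validation, error handling, and performance optimization. The examples use the Pydantic v2 API.

Installing Pydantic

To install Pydantic, use pip, the Python package installer, with the following command: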

pip install pydantic

This command installs Pydantic and its dependencies.

Basic Usage

Create a Pydantic model by defining a class that inherits from BaseModel. Use Python type annotations to specify the type of each field.

from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str
    email: str

Pydantic supports a wide range of field types, including int, str, float, bool, list, and dict. You can also define nested models and custom types.

from typing import List, Optional
from pydantic import BaseModel

class Address(BaseModel):
    street: str
    city: str
    zip_code: Optional[str] = None

class User(BaseModel):
    id: int
    name: str
    email: str
    age: Optional[int] = None
    addresses: List[Address]

After defining a Pydantic model, create an instance by supplying the required data. Pydantic validates the data and raises an error if any field does not meet the specified requirements.

user = User(
    id=1,
    name="John Doe",
    email="john.doe@example.com",
    addresses=[{"street": "123 Main St", "city": "Anytown", "zip_code": "12345"}]
)

print(user)

# Output:
# id=1 name='John Doe' email='john.doe@example.com' age=None addresses=[Address(street='123 Main St', city='Anytown', zip_code='12345')]
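
The validation step described above can be sketched as follows. This is a minimal illustration, assuming Pydantic v2 in its default (lax) mode, where compatible input such as a numeric string is converted to the annotated type and incompatible input raises ValidationError. The variable names coerced_id, failed, and error_count are only for illustration.

```python
from typing import List, Optional

from pydantic import BaseModel, ValidationError


class Address(BaseModel):
    street: str
    city: str
    zip_code: Optional[str] = None


class User(BaseModel):
    id: int
    name: str
    email: str
    age: Optional[int] = None
    addresses: List[Address]


# In the default (lax) mode, the numeric string "1" is converted to the int 1.
user = User(id="1", name="John Doe", email="john.doe@example.com", addresses=[])
coerced_id = user.id

# A value that cannot be converted to int raises ValidationError.
try:
    User(id="one", name="John Doe", email="john.doe@example.com", addresses=[])
    failed = False
except ValidationError as exc:
    failed = True
    error_count = exc.error_count()  # number of individual field errors
```

If silent conversion is not wanted, Pydantic also offers a strict mode; that is outside the scope of this sketch.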

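Defining Pydantic Models

Pydantic models use Python type annotations to define the types of their data fields.

They support a variety of built-in types, including:

Primitive types: str, int, float, bool
Collection types: list, tuple, set, dict
Optional types: Optional from the typing module, for fields that can be None
Union types: Union from the typing module, for fields that can be one of several types

Example: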

from typing import List, Dict, Optional, Union
from pydantic import BaseModel

class Item(BaseModel):
    name: str
    price: float
    tags: List[str]
    metadata: Dict[str, Union[str, int, float]]

class Order(BaseModel):
    order_id: int
    items: List[Item]
    discount: Optional[float] = None

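Custom Types

In addition to the built-in types, you can define custom types using Pydantic's conint, constr, and other constraint functions.

These let you add extra validation rules, such as length limits on strings or value ranges on integers.

Example: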

from pydantic import BaseModel, conint, constr

class Product(BaseModel):
    name: constr(min_length=2, max_length=50)
    quantity: conint(gt=0, le=1000)
    price: float

product = Product(name="Laptop", quantity=5, price=999.99)
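
The same constraints can also be written with typing.Annotated and pydantic.Field, which is the style the Pydantic v2 documentation tends to favor over conint and constr. The sketch below assumes Pydantic v2 and Python 3.9 or later; the rejected and error_type variables exist only to show the outcome.

```python
from typing import Annotated

from pydantic import BaseModel, Field, ValidationError


class Product(BaseModel):
    # Same rules as constr(min_length=2, max_length=50)
    name: Annotated[str, Field(min_length=2, max_length=50)]
    # Same rules as conint(gt=0, le=1000)
    quantity: Annotated[int, Field(gt=0, le=1000)]
    price: float


product = Product(name="Laptop", quantity=5, price=999.99)

# quantity=0 violates gt=0, so validation fails.
try:
    Product(name="Laptop", quantity=0, price=999.99)
    rejected = False
except ValidationError as exc:
    rejected = True
    error_type = exc.errors()[0]["type"]
```

One practical benefit of this form is that static type checkers see the fields as plain str and int.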

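Required and Optional Fields

By default, fields in a Pydantic model are required unless they are explicitly marked as optional.

If a required field is missing when the model is instantiated, Pydantic raises a ValidationError.

Example: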

from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str
    email: str

user = User(id=1, name="John Doe")


# Output
#  Field required [type=missing, input_value={'id': 1, 'name': 'John Doe'}, input_type=dict]
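
To handle the missing-field case instead of letting the exception propagate, the error can be caught and inspected. This is a small sketch assuming Pydantic v2, where ValidationError.errors() returns a list of dictionaries with keys such as "loc", "type", and "msg".

```python
from pydantic import BaseModel, ValidationError


class User(BaseModel):
    id: int
    name: str
    email: str


try:
    User(id=1, name="John Doe")  # email is missing
    errors = []
except ValidationError as exc:
    errors = exc.errors()

# Each entry names the failing field and the kind of failure.
for err in errors:
    print(err["loc"], err["type"], err["msg"])
```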

Optional Fields with Default Values

You can make a field optional by using Optional from the typing module and providing a default value.

Example:

from pydantic import BaseModel
from typing import Optional

class User(BaseModel):
    id: int
    name: str
    email: Optional[str] = None

user = User(id=1, name="John Doe")

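In this example, email is optional and defaults to None if it is not provided.

Nested Models

Pydantic lets you nest models inside one another to represent complex data structures.

A nested model is defined as a field of another model, which ensures data integrity and validation at multiple levels.

Example: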

from pydantic import BaseModel
from typing import Optional, List


class Address(BaseModel):
    street: str
    city: str
    zip_code: Optional[str] = None

class User(BaseModel):
    id: int
    name: str
    email: str
    addresses: List[Address]

user = User(
    id=1,
    name="John Doe",
    email="john.doe@example.com",
    addresses=[{"street": "123 Main St", "city": "Anytown"}]
)

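Best Practices for Managing Nested Data

When working with nested models, keep the following in mind:

Validate data at each level: Make sure each nested model has its own validation rules and constraints.
Use clear and consistent naming conventions: This makes the data structure easier to read and maintain.
Keep models simple: Avoid overly complex nested structures. If a model becomes too complex, consider breaking it into smaller, more manageable components.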
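
Validation at each level can be observed in the error location that Pydantic reports for nested data. The sketch below assumes Pydantic v2; the second address is deliberately missing its city, and the locations variable is only for illustration.

```python
from typing import List, Optional

from pydantic import BaseModel, ValidationError


class Address(BaseModel):
    street: str
    city: str
    zip_code: Optional[str] = None


class User(BaseModel):
    id: int
    name: str
    addresses: List[Address]


locations = []
try:
    User(
        id=1,
        name="John Doe",
        addresses=[
            {"street": "123 Main St", "city": "Anytown"},
            {"street": "456 Oak Ave"},  # city is missing here
        ],
    )
except ValidationError as exc:
    # "loc" is a path: field name, list index, nested field name.
    locations = [err["loc"] for err in exc.errors()]

print(locations)
```

Because the path includes the list index, an error in a deeply nested structure can be traced back to the exact item that caused it.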

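Data Validation

Pydantic includes a set of built-in validators that handle common data validation tasks automatically.

These validators include:

Type validation: Ensures that fields match their type annotations (e.g., int, str, list).
Range validation: Enforces value ranges and lengths using constraints such as conint, constr, and confloat.
Format validation: Checks specific formats, such as EmailStr for validating email addresses.
Collection validation: Ensures that elements within collections (e.g., lists, dictionaries) conform to the specified types and constraints.

These validators simplify the process of ensuring data integrity and conformance within your models.

Here is an example that demonstrates the built-in validators:

from pydantic import BaseModel, EmailStr, conint, constr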

class User(BaseModel):
    id: conint(gt=0)  # id must be greater than 0
    name: constr(min_length=2, max_length=50)  # name must be between 2 and 50 characters
    email: EmailStr  # email must be a valid email address
    age: conint(ge=18)  # age must be 18 or older

user = User(id=1, name="John Doe", email="john.doe@example.com", age=25)

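In this example, the User model uses built-in validators to ensure that id is greater than 0, name is between 2 and 50 characters long, email is a valid email address, and age is 18 or older.
To use the email validator, you need to install Pydantic's email extra: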

pip install "pydantic[email]"

Custom Validators

Pydantic allows you to define custom validators for more complex validation logic.

Custom validators are defined using the @field_validator decorator within your model class.

Example of a custom validator:

from pydantic import BaseModel, field_validator


class Product(BaseModel):
    name: str
    price: float

    @field_validator('price')
    @classmethod
    def price_must_be_positive(cls, value):
        if value <= 0:
            raise ValueError('Price must be positive')
        return value

product = Product(name="Laptop", price=999.99)

Here, the price_must_be_positive validator ensures that the price field is a positive number.

Custom validators are registered automatically when you define them within a model using the @field_validator decorator. Validators can be applied to individual fields or across multiple fields.

Example of registering a validator for multiple fields:

from pydantic import BaseModel, field_validator


class Person(BaseModel):
    first_name: str
    last_name: str

    @field_validator('first_name', 'last_name')
    @classmethod
    def names_cannot_be_empty(cls, value):
        if not value:
            raise ValueError('Name fields cannot be empty')
        return value

person = Person(first_name="John", last_name="Doe")

In this example, the names_cannot_be_empty validator ensures that both the first_name and last_name fields are not empty.
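
A field validator sees one field at a time. When a rule depends on how two fields relate to each other, Pydantic v2 also provides the model_validator decorator. The following is a minimal sketch; the DayRange model and its fields are hypothetical and not part of the examples above.

```python
from pydantic import BaseModel, ValidationError, model_validator


class DayRange(BaseModel):  # hypothetical model for illustration
    start_day: int
    end_day: int

    @model_validator(mode="after")
    def end_not_before_start(self):
        # Runs after both fields have been validated individually.
        if self.end_day < self.start_day:
            raise ValueError("end_day must not be before start_day")
        return self


valid_range = DayRange(start_day=1, end_day=5)

try:
    DayRange(start_day=5, end_day=1)
    message = ""
except ValidationError as exc:
    message = exc.errors()[0]["msg"]

print(message)
```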

Using Config Classes

Pydantic models can be customized through model-level configuration.

In Pydantic v2, you set it by assigning a ConfigDict to the model_config class attribute; the inner Config class from Pydantic v1 is still accepted but deprecated. The configuration lets you set options that affect the model's behavior, such as validation rules, JSON serialization, and more.

Example of model configuration:

from pydantic import BaseModel, ConfigDict

class User(BaseModel):
    model_config = ConfigDict(
        str_strip_whitespace=True,  # Strip whitespace from strings
        str_min_length=1,  # Minimum length for any string field
    )

    id: int
    name: str
    email: str

user = User(id=1, name="  John Doe  ", email="john.doe@example.com")

print(user)

# Output:
# id=1 name='John Doe' email='john.doe@example.com'

In this example, the configuration strips whitespace from string fields and enforces a minimum length of 1 for any string field.

Some common configuration options in Pydantic include:

str_strip_whitespace: Automatically strip leading and trailing whitespace from string fields.
str_min_length: Set a minimum length for any string field.
validate_default: Validate all fields, even those with default values.
validate_assignment: Enable validation on assignment to model attributes.
use_enum_values: Use the values of enums directly instead of the enum instances.
json_encoders: Define custom JSON encoders for specific types (deprecated in Pydantic v2 in favor of custom serializers such as @field_serializer).
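
As a quick illustration of two of these options, the sketch below combines use_enum_values with str_strip_whitespace. It assumes Pydantic v2's model_config = ConfigDict(...) form; the Color enum and Paint model are hypothetical.

```python
from enum import Enum

from pydantic import BaseModel, ConfigDict


class Color(Enum):  # hypothetical enum for illustration
    RED = "red"
    BLUE = "blue"


class Paint(BaseModel):  # hypothetical model for illustration
    model_config = ConfigDict(
        use_enum_values=True,       # store "red" instead of Color.RED
        str_strip_whitespace=True,  # trim leading/trailing whitespace
    )

    label: str
    color: Color


paint = Paint(label="  Wall  ", color=Color.RED)
print(repr(paint.label), repr(paint.color))
```

Storing the raw enum value can simplify serialization, at the cost of losing the enum type on the attribute.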

Error Handling

When Pydantic finds data that doesn't conform to the model's schema, it raises a ValidationError.

This error provides detailed information about the issue, including the field name, the incorrect value, and a description of the problem.

Here's an example of how default error messages are structured:

from pydantic import BaseModel, ValidationError, EmailStr

class User(BaseModel):
    id: int
    name: str
    email: EmailStr

try:
    user = User(id='one', name='John Doe', email='invalid-email')
except ValidationError as e:
    print(e.json())

# Output:
# [{"type":"int_parsing","loc":["id"],"msg":"Input should be a valid integer, unable to parse string as an integer","input":"one","url":"https://errors.pydantic.dev/2.8/v/int_parsing"},{"type":"value_error","loc":["email"],"msg":"value is not a valid email address: An email address must have an @-sign.","input":"invalid-email","ctx":{"reason":"An email address must have an @-sign."},"url":"https://errors.pydantic.dev/2.8/v/value_error"}]

In this example, the error message will indicate that id must be an integer and email must be a valid email address.

Customizing Error Messages

Pydantic allows you to customize error messages for specific fields by raising exceptions with custom messages in validators or by setting custom configurations.

Here’s an example of customizing error messages:

from pydantic import BaseModel, ValidationError, field_validator

class Product(BaseModel):
    name: str
    price: float

    @field_validator('price')
    @classmethod
    def price_must_be_positive(cls, value):
        if value <= 0:
            raise ValueError('Price must be a positive number')
        return value

try:
    product = Product(name='Laptop', price=-1000)
except ValidationError as e:
    print(e.json())

# Output:
# [{"type":"value_error","loc":["price"],"msg":"Value error, Price must be a positive number","input":-1000,"ctx":{"error":"Price must be a positive number"},"url":"https://errors.pydantic.dev/2.8/v/value_error"}]

In this example, the error message for price is customized to indicate that it must be a positive number.

Best Practices for Error Reporting

Effective error reporting involves providing clear, concise, and actionable feedback to users or developers.

Here are some best practices:

Log errors: Use logging mechanisms to record validation errors for debugging and monitoring purposes.
Return user-friendly messages: When exposing errors to end-users, avoid technical jargon. Instead, provide clear instructions on how to correct the data.
Aggregate errors: When multiple fields are invalid, aggregate the errors into a single response to help users correct all issues at once.
Use consistent formats: Ensure that error messages follow a consistent format across the application for easier processing and understanding.

Examples of best practices in error reporting:

from pydantic import BaseModel, ValidationError, EmailStr
import logging

logging.basicConfig(level=logging.INFO)

class User(BaseModel):
    id: int
    name: str
    email: EmailStr

def create_user(data):
    try:
        user = User(**data)
        return user
    except ValidationError as e:
        logging.error("Validation error: %s", e.json())
        return {"error": "Invalid data provided", "details": e.errors()}

user_data = {'id': 'one', 'name': 'John Doe', 'email': 'invalid-email'}
response = create_user(user_data)
print(response)

# Output:
# ERROR:root:Validation error: [{"type":"int_parsing","loc":["id"],"msg":"Input should be a valid integer, unable to parse string as an integer","input":"one","url":"https://errors.pydantic.dev/2.8/v/int_parsing"},{"type":"value_error","loc":["email"],"msg":"value is not a valid email address: An email address must have an @-sign.","input":"invalid-email","ctx":{"reason":"An email address must have an @-sign."},"url":"https://errors.pydantic.dev/2.8/v/value_error"}]
# {'error': 'Invalid data provided', 'details': [{'type': 'int_parsing', 'loc': ('id',), 'msg': 'Input should be a valid integer, unable to parse string as an integer', 'input': 'one', 'url': 'https://errors.pydantic.dev/2.8/v/int_parsing'}, {'type': 'value_error', 'loc': ('email',), 'msg': 'value is not a valid email address: An email address must have an @-sign.', 'input': 'invalid-email', 'ctx': {'reason': 'An email address must have an @-sign.'}}]}

In this example, validation errors are logged, and a user-friendly error message is returned, helping maintain application stability and providing useful feedback to the user.
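
For end users, the raw error list is often more detail than needed. One possible approach, sketched here under the assumption of Pydantic v2's errors() structure, is to group the messages by field. The friendly_errors helper is hypothetical, and the model uses plain int and str fields so the sketch runs without the email extra.

```python
from typing import Dict, List

from pydantic import BaseModel, ValidationError


class User(BaseModel):
    id: int
    name: str
    age: int


def friendly_errors(exc: ValidationError) -> Dict[str, List[str]]:
    """Group error messages by dotted field path (hypothetical helper)."""
    grouped: Dict[str, List[str]] = {}
    for err in exc.errors():
        field = ".".join(str(part) for part in err["loc"])
        grouped.setdefault(field, []).append(err["msg"])
    return grouped


try:
    User(id="one", name="John Doe", age="old")
    report = {}
except ValidationError as exc:
    # All invalid fields are reported together in one response.
    report = friendly_errors(exc)

print(report)
```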

Performance Considerations

Lazy initialization is a technique that postpones the creation of an object until it is needed.

In Pydantic, this can be useful for models with fields that are costly to compute or fetch. By delaying the initialization of these fields, you can reduce the initial load time and improve performance.

Example of lazy initialization:

from functools import cached_property
from pydantic import BaseModel

class DataModel(BaseModel):
    name: str

    @cached_property
    def expensive_computation(self) -> str:
        # Simulate an expensive computation; runs once, on first access
        result = "Computed Value"
        return result

data_model = DataModel(name="Test")
print(data_model.expensive_computation)

In this example, expensive_computation is a cached property rather than a model field: it is computed only when accessed for the first time and the result is reused afterwards, so no work is done during model initialization.

Redundant Validation

Pydantic models automatically validate data during initialization.

However, if you know that certain data has already been validated or if validation is not necessary in some contexts, you can disable validation to improve performance.

This can be done using the model_construct method, which bypasses validation:

Example of avoiding redundant validation:

from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str
    email: str

# Constructing a User instance without validation
data = {'id': 1, 'name': 'John Doe', 'email': 'john.doe@example.com'}
user = User.model_construct(**data)

In this example, User.model_construct is used to create a User instance without triggering validation, which can be useful in performance-critical sections of your code.
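
The trade-off is that model_construct accepts whatever it is given. The sketch below, assuming Pydantic v2, shows that a wrongly typed value passes through unchanged, which is why this method is best reserved for data from a trusted source. The variable names are only for illustration.

```python
from pydantic import BaseModel, ValidationError


class User(BaseModel):
    id: int
    name: str
    email: str


# model_construct performs no validation or conversion.
unchecked = User.model_construct(
    id="not-an-int", name="John Doe", email="john.doe@example.com"
)
id_type = type(unchecked.id).__name__

# The regular constructor rejects the same data.
try:
    User(id="not-an-int", name="John Doe", email="john.doe@example.com")
    validated_ok = True
except ValidationError:
    validated_ok = False

print(id_type, validated_ok)
```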

Efficient Data Parsing

When dealing with large datasets or high-throughput systems, efficiently parsing raw data becomes critical.

Pydantic provides the model_validate_json method, which parses a JSON string directly into a Pydantic model in a single step, without first converting it to a Python dictionary.

Example of efficient data parsing:

from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str
    email: str

json_data = '{"id": 1, "name": "John Doe", "email": "john.doe@example.com"}'
user = User.model_validate_json(json_data)
print(user)

In this example, model_validate_json is used to parse JSON data into a User model directly, providing a more efficient way to handle serialized data.
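
When the JSON payload is a list of records rather than a single object, Pydantic v2's TypeAdapter can validate the whole array in one call. This is a minimal sketch; the adapter variable name is an arbitrary choice.

```python
from typing import List

from pydantic import BaseModel, TypeAdapter


class User(BaseModel):
    id: int
    name: str
    email: str


# Build the adapter once and reuse it; construction has some overhead.
user_list_adapter = TypeAdapter(List[User])

json_data = (
    '[{"id": 1, "name": "John Doe", "email": "john.doe@example.com"},'
    ' {"id": 2, "name": "Jane Roe", "email": "jane.roe@example.com"}]'
)
users = user_list_adapter.validate_json(json_data)
print(users)
```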

Controlling Validation

Pydantic models can be configured to validate data only when necessary.

The validate_default and validate_assignment configuration options control when validation occurs, which can help improve performance:

validate_default: When set to False (the default), default values are not validated; only values supplied during initialization are validated.
validate_assignment: When set to True, validation is performed on field assignment after the model is created.

Example configuration:

from pydantic import BaseModel, ConfigDict

class User(BaseModel):
    model_config = ConfigDict(
        validate_default=False,  # Do not validate default values
        validate_assignment=True,  # Validate fields on assignment
    )

    id: int
    name: str
    email: str

user = User(id=1, name="John Doe", email="john.doe@example.com")
user.email = "new.email@example.com"  # This assignment will trigger validation

In this example, validate_default is set to False to skip validating default values during initialization, and validate_assignment is set to True to ensure that fields are validated when they are updated.
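
The effect of validate_assignment is easiest to see with an invalid assignment. The sketch below assumes Pydantic v2 and the model_config = ConfigDict(...) form; the assignment_rejected flag is only for illustration.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class User(BaseModel):
    model_config = ConfigDict(validate_assignment=True)

    id: int
    name: str


user = User(id=1, name="John Doe")

# Assignments go through the same validation as construction,
# so the numeric string is converted to an int.
user.id = "2"

# An invalid value is rejected and the previous value is kept.
try:
    user.id = "two"
    assignment_rejected = False
except ValidationError:
    assignment_rejected = True

print(user.id, assignment_rejected)
```

Without validate_assignment, both assignments would succeed silently and the model could end up holding a string in an int field.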

Settings Management

The BaseSettings class, which in Pydantic v2 lives in the separate pydantic-settings package, is designed for managing application settings, supporting environment variable loading and type validation.

This helps in configuring applications for different environments (e.g., development, testing, production).

Consider this .env file:

database_url=db
secret_key=sk
debug=False

Example of using BaseSettings:

from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")  # Also load values from a .env file

    database_url: str
    secret_key: str
    debug: bool = False

settings = Settings()
print(settings.model_dump())

# Output:
# {'database_url': 'db', 'secret_key': 'sk', 'debug': False}

In this example, settings are loaded from environment variables, and model_config specifies that values can also be loaded from a .env file.

To use BaseSettings, you need to install an additional package:

pip install pydantic-settings

Managing settings effectively involves a few best practices:

Use environment variables: Store configuration values in environment variables to keep sensitive data out of your codebase.
Provide defaults: Define sensible default values for configuration settings to ensure the application runs with minimal configuration.
Separate environments: Use different configuration files or environment variables for different environments (e.g., .env.development, .env.production).
Validate settings: Use Pydantic's validation features to ensure all settings are correctly typed and within acceptable ranges.

Common Pitfalls and How to Avoid Them

One common mistake when using Pydantic is misapplying type annotations, which can lead to validation errors or unexpected behavior.

Here are a few typical mistakes and their solutions:

Misusing Union Types: Using Union incorrectly can complicate type validation and handling.
Optional Fields without Default Values: In Pydantic v2, annotating a field as Optional only allows None as a value; the field is still required unless you also give it a default. Forgetting the default leads to unexpected "field required" errors.
Incorrect Type Annotations: Assigning incorrect types to fields can cause validation to fail. For example, using str for a field that should be an int.
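
The Optional pitfall can be reproduced in a few lines. This sketch assumes Pydantic v2 behavior; the Profile model and its fields are hypothetical.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class Profile(BaseModel):  # hypothetical model for illustration
    nickname: Optional[str]    # may be None, but must be provided
    bio: Optional[str] = None  # may be None and may be omitted


# Passing None explicitly is accepted.
profile = Profile(nickname=None)

# Omitting nickname entirely is not.
try:
    Profile()
    missing_fields = []
except ValidationError as exc:
    missing_fields = [err["loc"] for err in exc.errors() if err["type"] == "missing"]

print(missing_fields)
```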

Ignoring Performance Implications

Ignoring performance implications when using Pydantic can lead to slow applications, especially when dealing with large datasets or frequent model instantiations.

Here are some strategies to avoid performance bottlenecks:

Leverage Configuration Options: Use Pydantic's configuration options like validate_default and validate_assignment to control when validation occurs.
Optimize Nested Models: When working with nested models, ensure that you are not over-validating or duplicating validation logic.
Use Efficient Parsing Methods: Utilize model_validate_json and model_validate for efficient data parsing.
Avoid Unnecessary Validation: Use the model_construct method to create models without validation when the data is already known to be valid.

Overcomplicating Models

Overcomplicating Pydantic models can make them difficult to maintain and understand.

Here are some tips to keep models simple and maintainable:

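Document your models: Use docstrings and comments to explain complex validation rules or business logic embedded in your models.
Encapsulate logic appropriately: Keep validation and business logic in appropriate model methods or external utilities so that model definitions stay uncluttered.
Use inheritance sparingly: Inheritance can promote code reuse, but overusing it can make the model hierarchy complex and harder to follow.
Avoid excessive nesting: Deeply nested models can be hard to manage. Aim for a balanced level of nesting.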

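Conclusion

This guide has covered a range of best practices for using Pydantic effectively in Python projects.

We started with the basics of getting started with Pydantic, including installation, basic usage, and model definition. We then explored more advanced features such as custom types, serialization and deserialization, and settings management.

Key performance considerations, such as optimizing model initialization and efficient data parsing, were highlighted to help your applications run smoothly.

We also discussed common pitfalls, such as misusing type annotations, ignoring performance implications, and overcomplicating models, and provided strategies for avoiding them.

Applying these best practices in real projects will help you take full advantage of Pydantic and make your code more robust, maintainable, and performant.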
