Advanced Topics

Custom Config Models

BackendConfig is a Pydantic BaseModel and can be extended to add domain- or capability-specific configuration to an Inference Backend. In the example below, BackendConfig is extended and assigned as the CustomBackend's backend_config_model class attribute.

from typing import Any

from packflow import BackendConfig, InferenceBackend


class CustomConfigModel(BackendConfig):
    # New domain-specific field with a default value
    model_path: str = "./resources/model.joblib"

    # A field without a default is required, so the backend cannot
    # start unless a value is provided
    output_class_names: list[str]


class CustomBackend(InferenceBackend):
    backend_config_model = CustomConfigModel

    def initialize(self):
        self.logger.info(f"Loading model from {self.config.model_path}")
        self.model = lambda x: x  # Mock loading of a model

    def execute(self, inputs: Any) -> Any:
        return self.model(inputs)

    def transform_outputs(self, outputs: Any) -> list[dict]:
        cleaned = []
        for row in outputs:
            cleaned.append(dict(zip(self.config.output_class_names, row)))
        return cleaned

Inside the backend's methods, the config fields are accessed via the self.config instance attribute.
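To make the transform_outputs step concrete, the zip pairing can be exercised standalone (the class names and output values below are made up for illustration):

```python
output_class_names = ["cat", "dog"]      # would come from self.config
outputs = [[0.9, 0.1], [0.2, 0.8]]       # raw model output rows

# Pair each configured class name with the corresponding value in each row
cleaned = [dict(zip(output_class_names, row)) for row in outputs]
print(cleaned)
# [{'cat': 0.9, 'dog': 0.1}, {'cat': 0.2, 'dog': 0.8}]
```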

Note

The BackendConfig will be interpreted and validated according to Pydantic's validation rules. In the example above, because output_class_names has no default, Pydantic treats it as a required field. Any instance of CustomBackend must therefore be initialized with the output_class_names keyword argument. For example:

backend = CustomBackend(output_class_names=["doubled"])

Config Hierarchy

The InferenceBackend in Packflow automatically loads and validates configuration passed through three layers:

  1. Base Config Defaults
    • The default values defined in the config model.

  2. Keyword Arguments Passed at Backend Load Time
    • These override the base config defaults.

  3. An Optional JSON Configuration File
    • These overrides take precedence over both of the other layers. The Inference Backend loads the file from the path set in the BACKEND_CONFIG_FILE_PATH environment variable.

Important

All configurations are deep merged: only explicitly set fields override their counterparts, while unset fields retain their existing values.

Example: If the base is {"parent": {"child_1": 1, "child_2": 2}} and the overrides are {"parent": {"child_2": 200, "child_3": 3}}, then the deep-merged output will be {"parent": {"child_1": 1, "child_2": 200, "child_3": 3}}
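The merge behavior described above can be sketched as a small recursive helper (a hypothetical illustration; Packflow's actual implementation may differ):

```python
def deep_merge(base: dict, overrides: dict) -> dict:
    """Recursively merge `overrides` into `base`, returning a new dict.

    Nested dicts are merged key by key; any other value in `overrides`
    replaces the corresponding value in `base`.
    """
    merged = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


base = {"parent": {"child_1": 1, "child_2": 2}}
overrides = {"parent": {"child_2": 200, "child_3": 3}}
print(deep_merge(base, overrides))
# {'parent': {'child_1': 1, 'child_2': 200, 'child_3': 3}}
```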

All configurations are validated through Pydantic, ensuring correctness at load time.

Why Use a JSON Configuration File?

An external JSON configuration file is especially helpful in production environments, where it allows reconfiguration as conditions change. For example, if the input fields in the lab/training environment are INPUT0 and INPUT1 but in production they are input_0 and input_1, the Inference Backend can be reconfigured without changing its code by creating the following config.json:

{
  "configs": {
    "feature_names": ["input_0", "input_1"]
  }
}

Then setting the BACKEND_CONFIG_FILE_PATH environment variable to the absolute path to the config file:

export BACKEND_CONFIG_FILE_PATH=/path/to/config.json

The file will then be loaded automatically and validated against the Inference Backend's config model.

Important

The loaded JSON configuration file must contain a "configs" parent key, or all values will be ignored. This behavior ensures the config file format remains extensible to new fields in future releases of Packflow.
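The loading behavior can be approximated as follows. This is a hedged sketch using only the standard library; the function name is hypothetical and the real Packflow code may differ:

```python
import json
import os


def load_config_overrides() -> dict:
    """Load override values from the file named by BACKEND_CONFIG_FILE_PATH.

    Returns an empty dict when the variable is unset or the file has no
    "configs" parent key (mirroring the documented behavior).
    """
    path = os.environ.get("BACKEND_CONFIG_FILE_PATH")
    if not path:
        return {}
    with open(path) as f:
        data = json.load(f)
    # Only values under the "configs" parent key are honored
    return data.get("configs", {})
```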

Example Use Cases

Here are some example use cases for custom configuration models:

  • Use Case #1: Dynamic Input Field/Feature Names
    • A machine learning model requires a specific set of features to be processed.
    • Create a custom configuration model that includes these features and use it to validate the input data.

  • Use Case #2: Test/Prod Configurations
    • A data processing pipeline requires different configurations for different environments (e.g., development, testing, production).
    • Create custom configuration models for each environment and use them to configure the pipeline.
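Use Case #1 can be sketched with a small validation helper. The function name and error format below are hypothetical, not part of Packflow's API:

```python
def validate_input_fields(inputs: dict, feature_names: list[str]) -> None:
    """Raise ValueError if any configured feature is missing from the inputs."""
    missing = [name for name in feature_names if name not in inputs]
    if missing:
        raise ValueError(f"Missing required input fields: {missing}")


# Passes silently: all configured features are present
validate_input_fields({"input_0": 1.0, "input_1": 2.0}, ["input_0", "input_1"])
```

Because feature_names lives in the config, the same backend code validates INPUT0/INPUT1 in the lab and input_0/input_1 in production.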

Creating Reusable Backends

Inference backends can be designed to be reusable across multiple projects, sharing common functionality and reducing duplication.

Benefits of Reusable Backends

  • Write once, use many: A single reusable Inference Backend can support multiple projects that share similar requirements.

  • Simplified maintenance: Updates to the Inference Backend can be applied universally, reducing the effort needed to maintain multiple bespoke solutions.

Example: SklearnPipelineBackend

For instance, organizations that frequently use Scikit-Learn pipelines can create a single SklearnPipelineBackend. This Inference Backend can be configured to run multiple pipelines, leveraging:

  • Backend configurations: Specify key information like input names and types.

  • Optimized inference code: Reuse optimized code for inference, improving performance and reducing duplication.

Best Practices

  • Modular design: Design Inference Backends to be modular, making it easier to reuse and combine them.

  • Flexible configuration: Use configuration options to adapt the Inference Backend to work with different projects and requirements.

Logging Configuration

Packflow uses the loguru library for logging and defaults to the INFO log level to minimize noise in production environments. The log level can be controlled via the PACKFLOW_LOG_LEVEL environment variable.

Setting the Log Level

To change the log level, set the PACKFLOW_LOG_LEVEL environment variable to one of the following values:

  • DEBUG: Detailed diagnostic information useful for troubleshooting

  • INFO: General informational messages (default)

  • WARNING: Warning messages for potentially problematic situations

  • ERROR: Error messages for serious problems

  • CRITICAL: Critical messages for very serious errors

Example:

# Enable debug logging for detailed diagnostics
export PACKFLOW_LOG_LEVEL=DEBUG
python inference.py

# Use warning level to see only warnings and errors
export PACKFLOW_LOG_LEVEL=WARNING
python inference.py
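Packflow wires this up through loguru internally; as a rough standard-library analogue of the env-var pattern (illustration only, not Packflow's actual code):

```python
import logging
import os

# Read the level name from the environment, defaulting to INFO
level_name = os.environ.get("PACKFLOW_LOG_LEVEL", "INFO")
logging.basicConfig(level=getattr(logging, level_name, logging.INFO))

logging.getLogger(__name__).info("Log level set to %s", level_name)
```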

Note

The verbose field in the BackendConfig controls whether execution metrics are logged during inference. This is separate from the overall log level and defaults to True. Set verbose=False to suppress metrics logging regardless of the log level.