Advanced Topics
###############

.. _custom-config-models:

Custom Config Models
====================

The ``BackendConfig`` is a subclass of Pydantic's ``BaseModel`` and can be extended to add domain- or capability-specific configuration to an Inference Backend.

In the example below, ``BackendConfig`` is extended and the resulting model is assigned to the ``CustomBackend``'s ``backend_config_model`` class attribute.

.. literalinclude:: ../code-examples/usage/extending-config-model.py
   :language: python
   :linenos:
   :emphasize-lines: 6-11,18,27

Further down in the example, the config fields are accessed through the ``self.config`` instance attribute.

.. note::

   The ``BackendConfig`` is interpreted and validated according to `Pydantic's validation rules`_.

   In the example above, the ``output_class_names`` field is required, so Pydantic treats it as such.
   Therefore, every instance of ``CustomBackend`` must be initialized with the ``output_class_names`` keyword argument.

   For example: ``backend = CustomBackend(output_class_names=["doubled"])``

.. _Pydantic's validation rules: https://docs.pydantic.dev/latest/concepts/models/#basic-model-usage

.. _config-hierarchy:

Config Hierarchy
----------------

The ``InferenceBackend`` in Packflow automatically loads and validates configuration from three sources, listed in increasing order of precedence:

1. Base Config Arguments - the default values defined in the config model.
2. Keyword Arguments Passed at Backend Load Time - these override the base config values.
3. An optional JSON configuration file - this overrides all other configs. The Inference Backend loads this overrides file from the path given in the ``BACKEND_CONFIG_FILE_PATH`` environment variable.

.. important::

   All configurations are deep-merged, meaning only the fields explicitly set in a higher-precedence source override the corresponding values from lower-precedence sources.


   **Example:** If the base is ``{"parent": {"child_1": 1, "child_2": 2}}`` and the overrides are ``{"parent": {"child_2": 200, "child_3": 3}}``, then the deep-merged output is ``{"parent": {"child_1": 1, "child_2": 200, "child_3": 3}}``.

All configurations are validated through Pydantic, so invalid values are caught at load time.

Why Use a JSON Configuration File?
----------------------------------

An external JSON configuration file is especially helpful for custom configurations in production environments, because it allows the backend to be reconfigured as conditions change without modifying its code.

For example, if the input fields in the lab/training environment are ``INPUT0`` and ``INPUT1`` but in production they are ``input_0`` and ``input_1``, the new names can be supplied without changing the Inference Backend's code by creating the following ``config.json`` file:

.. code-block:: json

   {
     "configs": {
       "feature_names": ["input_0", "input_1"]
     }
   }

Then set the ``BACKEND_CONFIG_FILE_PATH`` environment variable to the absolute path of the config file:

.. code-block:: bash

   export BACKEND_CONFIG_FILE_PATH=/path/to/config.json

The file is then loaded automatically and validated against the Inference Backend's config model.

.. important::

   The loaded JSON configuration file *must* contain a ``"configs"`` parent key; otherwise, all of its values are ignored.
   This requirement keeps the config file format extensible to new fields in future releases of Packflow.

Example Use Cases
-----------------

Here are some example use cases for custom configuration models:

- Use Case #1: Dynamic Input Field/Feature Names

  - A machine learning model requires a specific set of features to be processed.
  - Create a custom configuration model that includes these features and use it to validate the input data.

- Use Case #2: Test/Prod Configurations

  - A data processing pipeline requires different configurations for different environments (e.g., development, testing, production).
  - Create custom configuration models for each environment and use them to configure the pipeline.

Creating Reusable Backends
==========================

Inference Backends can be designed to be reusable across multiple projects, sharing common functionality and reducing duplication.

Benefits of Reusable Backends
-----------------------------

* **Write once, use many**: A single reusable Inference Backend can support multiple projects that share similar requirements.
* **Simplified maintenance**: Updates to the Inference Backend can be applied universally, reducing the effort needed to maintain multiple bespoke solutions.

Example: SklearnPipelineBackend
-------------------------------

For instance, organizations that frequently use scikit-learn pipelines can create a single ``SklearnPipelineBackend``.
This Inference Backend can be configured to run many different pipelines, leveraging:

* **Backend configurations**: Specify key information such as input names and types.
* **Optimized inference code**: Reuse optimized inference code, improving performance and reducing duplication.

Best Practices
--------------

* **Modular design**: Design Inference Backends to be modular, making them easier to reuse and combine.
* **Flexible configuration**: Use configuration options to adapt the Inference Backend to different projects and requirements.

.. _logging-configuration:

Logging Configuration
=====================

Packflow uses the `loguru `_ library for logging and defaults to the ``INFO`` log level to minimize noise in production environments.
The log level can be controlled through the ``PACKFLOW_LOG_LEVEL`` environment variable.

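
Because Packflow logs through loguru, the configured level behaves as a severity threshold: a message is emitted only if its severity is at or above that level. The stdlib-only sketch below illustrates this rule using loguru's default numeric severities (``DEBUG`` 10, ``INFO`` 20, ``WARNING`` 30, ``ERROR`` 40, ``CRITICAL`` 50). It assumes Packflow passes ``PACKFLOW_LOG_LEVEL`` to its loguru sink in the usual way; ``is_emitted`` and ``visible_levels`` are hypothetical helpers, not Packflow functions.

.. code-block:: python

   from typing import List

   # Loguru's default numeric severities for the levels Packflow documents.
   SEVERITY = {"DEBUG": 10, "INFO": 20, "WARNING": 30, "ERROR": 40, "CRITICAL": 50}


   def is_emitted(message_level: str, configured_level: str) -> bool:
       """Hypothetical helper: True if a message at message_level passes the threshold."""
       return SEVERITY[message_level] >= SEVERITY[configured_level]


   def visible_levels(configured_level: str) -> List[str]:
       """Hypothetical helper: levels that would be emitted, from least to most severe."""
       return [name for name in SEVERITY if is_emitted(name, configured_level)]


   print(visible_levels("INFO"))     # ['INFO', 'WARNING', 'ERROR', 'CRITICAL']
   print(visible_levels("WARNING"))  # ['WARNING', 'ERROR', 'CRITICAL']

Of the levels listed here, only ``DEBUG`` is dropped under the default ``INFO`` setting, which is why switching to ``DEBUG`` is the usual first step when troubleshooting.
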

Setting the Log Level
---------------------

To change the log level, set the ``PACKFLOW_LOG_LEVEL`` environment variable to one of the following values:

* ``DEBUG``: Detailed diagnostic information useful for troubleshooting
* ``INFO``: General informational messages (default)
* ``WARNING``: Warning messages for potentially problematic situations
* ``ERROR``: Error messages for serious problems
* ``CRITICAL``: Critical messages for very serious errors

**Example:**

.. code-block:: bash

   # Enable debug logging for detailed diagnostics
   export PACKFLOW_LOG_LEVEL=DEBUG
   python inference.py

   # Use warning level to see only warnings and errors
   export PACKFLOW_LOG_LEVEL=WARNING
   python inference.py

.. note::

   The ``verbose`` field in the ``BackendConfig`` controls whether execution metrics are logged during inference.
   It is separate from the overall log level and defaults to ``True``.
   Set ``verbose=False`` to suppress metrics logging regardless of the log level.
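
A launch script may want to check the level before starting inference, for example to fail fast on a misspelled value. The sketch below mirrors the documented behavior (read ``PACKFLOW_LOG_LEVEL``, fall back to ``INFO``) using only the standard library. ``resolve_log_level`` is a hypothetical helper, not part of Packflow, and rejecting values outside the five levels listed above is this sketch's own choice rather than documented Packflow behavior.

.. code-block:: python

   import os
   from typing import Mapping, Optional

   # Levels documented above; treating them as the complete accepted set is an assumption.
   ALLOWED_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")
   DEFAULT_LEVEL = "INFO"  # Packflow's documented default


   def resolve_log_level(environ: Optional[Mapping[str, str]] = None) -> str:
       """Hypothetical helper: return PACKFLOW_LOG_LEVEL, or INFO when it is unset."""
       env = os.environ if environ is None else environ
       level = env.get("PACKFLOW_LOG_LEVEL", DEFAULT_LEVEL)
       if level not in ALLOWED_LEVELS:
           # Fail before inference starts instead of running with an unexpected level.
           raise ValueError(
               f"PACKFLOW_LOG_LEVEL={level!r} is not one of: {', '.join(ALLOWED_LEVELS)}"
           )
       return level

For example, ``resolve_log_level({"PACKFLOW_LOG_LEVEL": "DEBUG"})`` returns ``"DEBUG"`` and ``resolve_log_level({})`` returns ``"INFO"``; calling it with no argument reads the real process environment.
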