Scikit-Learn Classifier

This notebook is an example of developing a modular, reusable Scikit-Learn classification backend. In this guide we will:

  1. Create a project with Poetry

  2. Train a classifier with Scikit-Learn

  3. Develop the Inference Backend for running the model with Packflow

  4. Load and validate the Backend from the installed package

Creating a Project

First, we’ll install Poetry and create a new project:

[1]:
%pip install poetry --quiet
Note: you may need to restart the kernel to use updated packages.
[2]:
%%sh

poetry new sklearn_classifier
Created package sklearn_classifier in sklearn_classifier

Next, we need to add a few dependencies to our Poetry project:

[3]:
%%sh

poetry --directory ./sklearn_classifier add scikit-learn joblib pandas
Using version ^1.8.0 for scikit-learn
Using version ^1.5.3 for joblib
Using version ^3.0.0 for pandas

Updating dependencies
Resolving dependencies...

No dependencies to install or update

Writing lock file

Training an Iris Classifier

For our sample use case, we’ll use the Scikit-Learn Iris dataset and train a simple decision tree classifier:

[4]:
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True, as_frame=True)

X.sample(3)
[4]:
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)
21 5.1 3.7 1.5 0.4
29 4.7 3.2 1.6 0.2
111 6.4 2.7 5.3 1.9
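The target `y` holds integer class labels 0–2. A quick way to see which species each label maps to, which is useful when interpreting predictions later:

```python
from sklearn.datasets import load_iris

# target_names maps the integer labels 0, 1, 2 to species names
iris = load_iris()
print(list(iris.target_names))  # → ['setosa', 'versicolor', 'virginica']
```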

For simplicity, we will ignore best practices (no train/test split or cross-validation) and train our model on the entire dataset:

[5]:
from sklearn.tree import DecisionTreeClassifier
import joblib

model = DecisionTreeClassifier()

model.fit(X, y)

joblib.dump(model, "sklearn_classifier/src/sklearn_classifier/model.joblib")
[5]:
['sklearn_classifier/src/sklearn_classifier/model.joblib']

The model has now been trained (fit) and serialized with joblib to the path output above.
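Before wiring the model into a backend, it’s worth sanity-checking the serialized artifact. A minimal round-trip sketch (re-fitting locally and dumping to a temporary path, rather than the package path used above):

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True, as_frame=True)
model = DecisionTreeClassifier().fit(X, y)

# Dump to a temporary path for the round-trip check
path = Path(tempfile.mkdtemp()) / "model.joblib"
joblib.dump(model, path)

# The restored model should produce identical predictions
restored = joblib.load(path)
assert (restored.predict(X) == model.predict(X)).all()
```

Note that joblib artifacts are only guaranteed to load correctly under the same scikit-learn version they were saved with, which is one reason the Poetry project pins its dependencies.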

Developing the Inference Backend

Now we can develop the Inference Backend for running and sharing the model with Packflow:

[6]:
%%writefile sklearn_classifier/src/sklearn_classifier/inference.py

# -- Packflow imports --
from packflow import InferenceBackend, BackendConfig
from packflow.utils.normalize import ensure_valid_output

# -- Imports that are required to run the model --
from pathlib import Path
import pandas as pd
import sklearn
import joblib


class SklearnClassifierConfig(BackendConfig):
    # Create a config field for where to load the model from
    serialized_model_path: Path = Path(__file__).resolve().parent / 'model.joblib'

    # Define the default input feature names
    feature_names: list[str] = [
        'sepal length (cm)',
        'sepal width (cm)',
        'petal length (cm)',
        'petal width (cm)'
    ]


class Backend(InferenceBackend):
    # override the default model with the custom model defined above
    backend_config_model = SklearnClassifierConfig

    def initialize(self):
        self.logger.info(f'Loading model from: {self.config.serialized_model_path}')
        self.model = joblib.load(self.config.serialized_model_path)

    def transform_inputs(self, inputs):
        """
        Convert input array (this backend uses the Numpy Preprocessor) to a Pandas DataFrame
        """
        return pd.DataFrame(columns=self.config.feature_names, data=inputs)


    def execute(self, inputs):
        """
        Run the Pandas DataFrame through the loaded model
        and return the predicted class.
        """
        return self.model.predict(inputs)

    def transform_outputs(self, outputs):
        """
        Use Packflow.dev to convert outputs to safe return types.

        Note:
            This method is less flexible and does not apply business-logic.
            However, for this demo we will assume the output does not need
            any special postprocessing.
        """
        return ensure_valid_output(outputs, parent_key='class')


# Set defaults for base fields
backend = Backend(
    input_format='numpy'
)
Writing sklearn_classifier/src/sklearn_classifier/inference.py
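The backend’s three hooks mirror a plain pandas/scikit-learn pipeline. Independent of Packflow, the same flow can be sketched directly; the `ensure_valid_output` step is approximated here with a plain `int()` cast, which is an assumption about its behavior:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

feature_names = [
    'sepal length (cm)',
    'sepal width (cm)',
    'petal length (cm)',
    'petal width (cm)',
]

X, y = load_iris(return_X_y=True, as_frame=True)
model = DecisionTreeClassifier().fit(X, y)

raw = np.array([[5.1, 3.5, 1.4, 0.2]])                 # numpy input, as the backend receives it
frame = pd.DataFrame(columns=feature_names, data=raw)  # transform_inputs
preds = model.predict(frame)                           # execute
outputs = [{'class': int(c)} for c in preds]           # transform_outputs (approximated)

print(outputs)  # → [{'class': 0}]
```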

We also need to add the inference module to the package’s __init__.py file so it can be imported:

[7]:
%%writefile sklearn_classifier/src/sklearn_classifier/__init__.py

from . import inference
Overwriting sklearn_classifier/src/sklearn_classifier/__init__.py

Now that we’ve added an inference.py file to our Poetry package, we can use Packflow’s ModuleLoader to import the backend and run it wherever needed.

Validating the Inference Backend

Now let’s install the package, then load the backend and validate that it runs as expected:

[8]:
%pip install ./sklearn_classifier --quiet
Note: you may need to restart the kernel to use updated packages.

IMPORTANT

You will likely need to restart the kernel in this notebook to proceed with loading and running the inference backend!

[9]:
from packflow.loaders import ModuleLoader

# Import from the installed Poetry package:
# we want the `backend` object from the `inference` module
backend = ModuleLoader("sklearn_classifier.inference:backend").load()

backend
2026-01-21 14:16:08.037 | DEBUG    | packflow.utils.normalize.base:_import_module:30 - TorchScalarHandler Type Converter is not available. Reason: No module named 'torch'
2026-01-21 14:16:08.038 | DEBUG    | packflow.utils.normalize.base:_import_module:30 - TorchTensorHandler Type Converter is not available. Reason: No module named 'torch'
2026-01-21 14:16:08.039 | DEBUG    | packflow.utils.normalize.base:_import_module:30 - PillowImageHandler Type Converter is not available. Reason: No module named 'PIL'
2026-01-21 14:16:08.059 | DEBUG    | packflow.backend.configuration:load_backend_configuration:63 - Loaded raw configuration: {'input_format': 'numpy'}
2026-01-21 14:16:08.060 | INFO     | packflow.backend.configuration:load_backend_configuration:67 - Configuration: SklearnClassifierConfig(verbose=True, input_format=<InputFormats.NUMPY: 'numpy'>, rename_fields={}, feature_names=['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)'], flatten_nested_inputs=False, flatten_lists=False, nested_field_delimiter='.', serialized_model_path=PosixPath('/Users/cdao-user/.pyenv/versions/3.11.14/envs/packflow/lib/python3.11/site-packages/sklearn_classifier/model.joblib'))
2026-01-21 14:16:08.060 | INFO     | sklearn_classifier.inference:initialize:31 - Loading model from: /Users/cdao-user/.pyenv/versions/3.11.14/envs/packflow/lib/python3.11/site-packages/sklearn_classifier/model.joblib
2026-01-21 14:16:08.061 | INFO     | packflow.backend.base:_initialize:103 - Initialized Backend in 0.0009 ms
[9]:
Backend[
  SklearnClassifierConfig(verbose=True, input_format=<InputFormats.NUMPY: 'numpy'>, rename_fields={}, feature_names=['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)'], flatten_nested_inputs=False, flatten_lists=False, nested_field_delimiter='.', serialized_model_path=PosixPath('/Users/cdao-user/.pyenv/versions/3.11.14/envs/packflow/lib/python3.11/site-packages/sklearn_classifier/model.joblib'))
]
[10]:
from sklearn.datasets import load_iris

X, _ = load_iris(return_X_y=True, as_frame=True)

outputs = backend.validate(X.sample(10).to_dict("records"))

outputs[:5]
2026-01-21 14:16:08.071 | INFO     | packflow.backend.base:__call__:86 - ExecutionMetrics(batch_size=10, execution_times=ExecutionTimes(preprocess=0.01938, transform_inputs=0.09275, execute=0.60546, transform_outputs=0.02338), total_execution_time=0.74097)
[10]:
[{'class': 0}, {'class': 0}, {'class': 1}, {'class': 0}, {'class': 1}]
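The output conversion matters because `DecisionTreeClassifier.predict` returns a NumPy array, and NumPy scalars are not JSON-serializable, which becomes a problem as soon as the backend’s responses are sent over the wire. A minimal illustration (using a plain `int()` cast; the exact conversions `ensure_valid_output` applies are internal to Packflow):

```python
import json
import numpy as np

pred = np.int64(1)  # the scalar dtype model.predict() returns for iris labels

try:
    json.dumps({'class': pred})
except TypeError as exc:
    print(exc)  # the stdlib json module cannot serialize numpy scalars

print(json.dumps({'class': int(pred)}))  # → {"class": 1}
```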

Conclusion

In this notebook, we walked through a simple example of creating an inference backend for a scikit-learn classifier.

Try extending this example to support custom output logic or different model types!