{ "cells": [ { "cell_type": "markdown", "id": "b784c0c7-7729-408a-a68f-a2b407ce03e1", "metadata": {}, "source": [ "# Scikit-Learn Classifier" ] }, { "cell_type": "markdown", "id": "b09ed980-8367-4f2e-a1ba-cc255a6797dd", "metadata": {}, "source": [ "This notebook is an example of developing a modular, reusable Scikit-Learn classification backend.\n", "In this guide, we will:\n", "\n", "1. Create a project with Poetry\n", "2. Train a classifier with Scikit-Learn\n", "3. Develop the Inference Backend for running the model with Packflow\n", "4. Load and validate the Backend from the installed package\n", "\n", "## Creating a Project\n", "\n", "First, we'll install Poetry and create a new project:" ] }, { "cell_type": "code", "execution_count": 1, "id": "1eb9b828-b480-4c50-86b6-6e434fd08cad", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Note: you may need to restart the kernel to use updated packages.\n" ] } ], "source": [ "%pip install poetry --quiet" ] }, { "cell_type": "code", "execution_count": 2, "id": "2d09201e-ced1-4eee-a680-39cf483990f7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Created package sklearn_classifier in sklearn_classifier\n" ] } ], "source": [ "%%sh\n", "\n", "poetry new sklearn_classifier" ] }, { "cell_type": "markdown", "id": "d2cf8853-9d9b-4d4a-a2cc-669b8360e0dc", "metadata": {}, "source": [ "Next, we'll add a few dependencies to our Poetry project:" ] }, { "cell_type": "code", "execution_count": 3, "id": "87ae0685-890b-4f05-8ae9-181d88e2c791", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Using version ^1.8.0 for scikit-learn\n", "Using version ^1.5.3 for joblib\n", "Using version ^3.0.0 for pandas\n", "\n", "Updating dependencies\n", "Resolving dependencies...\n", "\n", "No dependencies to install or update\n", "\n", "Writing lock file\n" ] } ], "source": [ "%%sh\n", "\n", "poetry --directory 
./sklearn_classifier add scikit-learn joblib pandas" ] }, { "cell_type": "markdown", "id": "007e6aa3-0141-45b0-92f4-6689bc7d05ab", "metadata": {}, "source": [ "## Training an Iris Classifier\n", "\n", "For our sample use case, we'll use the Scikit-Learn Iris dataset and train a simple decision tree classifier:" ] }, { "cell_type": "code", "execution_count": 4, "id": "bc223ca1-2fc7-4d29-8b5f-1dcba964e9bc", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|  | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|
| 21 | 5.1 | 3.7 | 1.5 | 0.4 |
| 29 | 4.7 | 3.2 | 1.6 | 0.2 |
| 111 | 6.4 | 2.7 | 5.3 | 1.9 |