About Packflow
What is Packflow?
Packflow is a software development kit (SDK) that simplifies writing production-ready inference code and standardizes the packaging of AI/ML models that run on streaming data sources.
Many existing packaging frameworks are geared toward inference APIs and often require custom preprocessing steps. This is particularly challenging for data sources that generate data one row at a time as key-value pairs (e.g., firewall logs or message streams).
Packflow, by contrast, is optimized to run models on either individual events or batches of events, streamlining development and reducing the need for additional preprocessing. The framework handles both single-record and batch processing with the same code, so there is no need to write separate logic for streaming (one-at-a-time) and batch scenarios. By leveraging Packflow's out-of-the-box workflows and utilities, teams can focus on building and deploying models, significantly reducing the time and effort required to onboard new capabilities.
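As an illustrative sketch only (the class and method names below are assumptions, not Packflow's actual API), a model wrapper that serves both single-event and batch inference from one code path might look like this:

```python
from typing import Any, Dict, List, Union

Event = Dict[str, Any]
Batch = List[Event]

class EventModel:
    """Hypothetical wrapper showing how one code path can serve
    both single-event (streaming) and batch inference."""

    def predict(self, data: Union[Event, Batch]) -> List[Any]:
        # Normalize a single event into a batch of one, so the
        # scoring logic below only ever deals with a list.
        batch: Batch = [data] if isinstance(data, dict) else data
        return [self._score(event) for event in batch]

    def _score(self, event: Event) -> Any:
        # Placeholder scoring rule; a real model would run inference here.
        return len(event)

model = EventModel()
model.predict({"src_ip": "10.0.0.1", "action": "deny"})  # single event -> [2]
model.predict([{"a": 1}, {"a": 1, "b": 2}])              # batch -> [1, 2]
```

The normalization step at the top of `predict` is the key idea: callers never need to know or care whether they are in a streaming or batch context.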
Packflow also ships with a lightweight command-line interface (CLI) that can scaffold new projects, template code, and archive or package project files in a consistent, reproducible format, enabling customizable and maintainable deployment pipelines for downstream users.
What is Packflow Not?
Packflow is not meant to facilitate or replace any steps relating to model training or validation. A Packflow project should contain only the required artifacts (e.g., serialized model files), requirements, and code for inference. Including additional training artifacts and code may result in Packflow gathering incorrect runtime requirements or lead to unnecessary latency.
Terminology
The following terms are essential to understanding the Packflow framework:
Core Definitions
Event: A single, discrete unit of data processed by the model. It is analogous to a row in a dataset - for example, a single firewall log entry or one message from a data stream.
Batch: A collection of events, represented as a list of length N, where each element is an individual event.
Records: The data structure containing a batch of events, typically a list of dictionaries. This term is inspired by pandas and used extensively throughout the documentation.
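These terms map directly onto plain Python structures. A minimal sketch (the field names here are invented for illustration):

```python
# An event: one discrete unit of data, analogous to a row in a dataset.
event = {"timestamp": "2024-01-01T00:00:00Z", "src_ip": "10.0.0.1", "action": "deny"}

# A batch: a list of length N, where each element is an individual event.
batch = [
    event,
    {"timestamp": "2024-01-01T00:00:05Z", "src_ip": "10.0.0.2", "action": "allow"},
]

# "Records" refers to this list-of-dictionaries layout, matching the
# orientation pandas produces via DataFrame.to_dict(orient="records").
assert isinstance(batch, list) and all(isinstance(e, dict) for e in batch)
```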