alex.dev@portfolio:~$ project-detail

Minimal Scale Inference Engine

Open-source minimal inference serving engine. Strips complexity to expose the core concepts of LLM serving (continuous batching, KV-cache management, and efficient GPU scheduling) for educational and production use.

Project Category

MLOps

Tech Stacks:

Overview

Minimal Scale Inference Engine is an educational and production-ready inference server that strips away complexity to expose the core algorithms behind modern LLM serving.

Key Features

  • Continuous batching with annotated implementation (see the sketch after this list)

  • KV-cache management with visual debugging

  • GPU scheduling with profiling hooks
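
The sketch below illustrates the two ideas at the heart of the engine: a scheduler that admits new sequences mid-flight (continuous batching), backed by a fixed pool of KV-cache blocks. It is a minimal, simulated sketch under assumed names (Scheduler, BlockAllocator, BLOCK_SIZE), not the project's actual code, and the block-paging scheme is an illustrative assumption.

    from collections import deque
    from dataclasses import dataclass, field

    BLOCK_SIZE = 16   # tokens per KV-cache block
    NUM_BLOCKS = 64   # size of the simulated block pool
    MAX_BATCH = 4     # max sequences decoded per step

    @dataclass
    class Sequence:
        seq_id: int
        prompt_len: int
        max_new_tokens: int
        generated: int = 0
        blocks: list = field(default_factory=list)  # ids of owned KV-cache blocks

        @property
        def length(self):
            return self.prompt_len + self.generated

    class BlockAllocator:
        """Fixed pool of cache blocks; sequences grow one block at a time."""
        def __init__(self, num_blocks):
            self.free = deque(range(num_blocks))

        def grow(self, seq):
            # ceil(length / BLOCK_SIZE) minus blocks already owned
            needed = -(-seq.length // BLOCK_SIZE) - len(seq.blocks)
            if needed > len(self.free):
                return False  # a real engine would preempt or swap here
            for _ in range(needed):
                seq.blocks.append(self.free.popleft())
            return True

        def release(self, seq):
            self.free.extend(seq.blocks)
            seq.blocks.clear()

    class Scheduler:
        """Continuous batching: admit new sequences whenever a batch slot
        and enough cache blocks are free, instead of draining the batch."""
        def __init__(self):
            self.waiting = deque()
            self.running = []
            self.alloc = BlockAllocator(NUM_BLOCKS)

        def submit(self, seq):
            self.waiting.append(seq)

        def step(self):
            # 1. Admit queued sequences while capacity allows.
            while self.waiting and len(self.running) < MAX_BATCH:
                if not self.alloc.grow(self.waiting[0]):
                    break  # out of blocks; retry on a later step
                self.running.append(self.waiting.popleft())

            # 2. "Decode" one token per running sequence (a real engine
            #    would run one fused forward pass over the whole batch).
            for seq in self.running:
                seq.generated += 1
                self.alloc.grow(seq)  # extend cache at block boundaries

            # 3. Retire finished sequences and free their blocks at once,
            #    so step 1 can admit new work on the next iteration.
            done = [s for s in self.running if s.generated >= s.max_new_tokens]
            for seq in done:
                self.running.remove(seq)
                self.alloc.release(seq)
            return done

    if __name__ == "__main__":
        sched = Scheduler()
        for i in range(6):
            sched.submit(Sequence(seq_id=i, prompt_len=20, max_new_tokens=8 + 4 * i))
        step = 0
        while sched.running or sched.waiting:
            for seq in sched.step():
                print(f"step {step}: seq {seq.seq_id} done at {seq.length} tokens")
            step += 1

Releasing blocks the moment a sequence finishes is what lets the scheduler admit queued requests immediately; that, rather than waiting for the whole batch to drain, is the defining property of continuous batching.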

Automated SRE with Agentic Workflow and eBPF

This project brings Site Reliability Engineering into a new era by combining agentic workflows with the power of eBPF. Instead of reactive firefighting, the system acts as an intelligent agent inside the infrastructure, observing, diagnosing, and remediating issues in real time.
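
As a minimal sketch of that loop, the Python stub below wires the three phases together: observe, diagnose, remediate. Everything in it is an illustrative assumption rather than the project's actual code: the metric names, thresholds, and rule-based diagnosis are placeholders, the observe phase stands in for values that would really come from eBPF probes, and the remediate phase only logs where the real system would act (for example through the Kubernetes API).

    import random
    import time
    from dataclasses import dataclass

    @dataclass
    class Signal:
        """One observation, e.g. derived from an eBPF probe."""
        name: str
        value: float
        threshold: float

    def observe():
        # Stub: in the real system these values would come from eBPF
        # programs attached to kernel tracepoints; here they are random.
        return [
            Signal("p99_syscall_latency_ms", random.uniform(0, 20), threshold=10.0),
            Signal("tcp_retransmit_rate", random.uniform(0, 0.1), threshold=0.05),
        ]

    def diagnose(signals):
        # Map anomalous signals to suspected causes. A production agent
        # might hand this context to an LLM; the sketch uses plain rules.
        return [
            f"{s.name} above threshold ({s.value:.3f} > {s.threshold})"
            for s in signals
            if s.value > s.threshold
        ]

    def remediate(causes):
        # Placeholder: a real remediation step might restart a pod or
        # adjust limits via the Kubernetes API. Here we only log intent.
        for cause in causes:
            print(f"[remediate] acting on: {cause}")

    def agent_loop(iterations=3, interval_s=1.0):
        # The agent's core cycle: observe -> diagnose -> remediate.
        for _ in range(iterations):
            causes = diagnose(observe())
            if causes:
                remediate(causes)
            else:
                print("[ok] all signals within thresholds")
            time.sleep(interval_s)

    if __name__ == "__main__":
        agent_loop()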

Project Category

AI Engineering

Tech Stacks:

PyTorch

Kubernetes

FastAPI

TypeScript

© Tahnik Ahmed | 2026