░░░░░░  ░░  ░    ░░    ░░░░░░░░   ░░        ░░  ░    ░░    ░░░░░░░░░░░░ 
  ░░   ░  ░ ░    ░░░   ░  ░░  ░  ░░        ░  ░ ░    ░░░░░░░░     ░    ░
  ▒▒  ▒    ▒▒▒▒▒▒▒▒▒▒  ▒  ▒▒  ▒▒▒         ▒    ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ ▒    ▒
  ▒▒  ▒▒▒▒▒▒▒▒▒▒▒▒▒  ▒▒▒  ▒▒  ▒▒▒         ▒▒▒▒▒▒▒▒▒▒▒▒▒    ▒▒▒▒▒▒ ▒    ▒
  ▓▓  ▓    ▓▓    ▓▓   ▓▓  ▓▓  ▓  ▓▓       ▓    ▓▓    ▓▓    ▓▓     ▓    ▓
  ▓▓  ▓    ▓▓    ▓▓    ▓▓▓▓▓▓▓▓   ▓▓      ▓    ▓▓    ▓▓    ▓▓▓▓▓▓▓▓▓▓▓▓

MLOps/AI Systems Performance Engineer

I build the infrastructure that makes AI systems work in production —
from inference engine internals to self-healing Kubernetes platforms.

Explore my work in MLOps, AI infrastructure, and systems performance engineering, from deploying OSS models on Kubernetes to building developer tools and teaching AI engineering courses.

Featured Project:

SentinelOps - Automated SRE with Agentic Workflow and eBPF

SRE

eBPF

An AI-powered Site Reliability Engineering platform that uses eBPF for deep kernel-level observability and agentic workflows for automated incident response. The result is self-healing infrastructure that detects, analyzes, and resolves production issues without human intervention.

View Project

Architecture

Deep dives into MLOps, AI systems performance, and infrastructure engineering. The blog posts cover production ML pipelines, CUDA optimization, distributed training, and cloud-native AI, each with detailed discussion.

MLOps

AI Engineering

AI Systems

Infrastructure

0 Minute Read

Why a Ray Cluster?

None of this is really the machine learning engineer's problem to solve, yet they end up either doing it themselves or asking platform engineers for help. The funny part is that what we call "machine learning engineering" is gradually turning into plain "software engineering". So how can we, as machine learning engineers, handle this scenario? By building a Ray cluster. Ray is essentially a distributed computing framework that distributes machine learning workloads very efficiently across the nodes in a cluster.

Read the blog

MLOps

Infrastructure

15 Minute Read

How to Build Efficient and Cost Effective Data Warehouses for SMBs?

Modernizing data warehouses with a hybrid Azure approach enables centralized storage, real‑time analytics, and secure integration across tools like Synapse, Data Lake, Stream Analytics, and Power BI to deliver scalable insights and compliance‑ready infrastructure.

Read the blog

MLOps

Infrastructure

12 Minute Read

Building Production ML Pipelines with Kubernetes

A deep dive into designing fault-tolerant, scalable ML training and serving pipelines on K8s — from resource scheduling to model versioning.

Read the blog
