🎯
Focusing


Organizations

@didmathematikoi


Aymane Biri — Senior Software/AI Engineer

LLM/DL inference optimization • Deployment orchestration • Systems engineering
Agadir, Morocco · [email protected] · +212 696 408 522


👋 About me

I build reliable, high-performance AI systems from first principles—bridging low-level engineering with pragmatic product delivery.
Currently at Omniops (KSA), focusing on LLM/DL inference performance (latency/throughput/cost) and deployment orchestration at scale.

🚀 What I work on

  • Serving pipelines for LLMs & DL models: token streaming, concurrency control, batching, KV cache/memory efficiency
  • Orchestrating inference across clusters: Kubernetes + queues + autoscaling + observability
  • Production toolchains: Python/TS, FastAPI/Flask, React, Docker, K8s, Postgres, Redis, RabbitMQ, Celery

🧠 Highlights

  • Led 1337 AI Exploration Lab (8 → 16 engineers): CV for industrial inspection, chemical process modeling, HR RAG chatbot, SFM stock tracking
  • Built iOS Bluetooth plugin + proximity algorithm for Wiqaytna (Moroccan COVID tracing app)
  • Security background (web audits, IR tooling) and microservices ERP (auth, BI, i18n, automation)
  • Bronze — MCPC 2020, first to solve Problem C; 1st place OpenSourceDays 2019 & 2021

🛠️ Core stack

Languages: C/C++, Python, JS/TS
AI/Serving: vLLM, RAG patterns; ops: batching, streaming, caching, tracing
Backend/Infra: FastAPI/Flask, Docker, Kubernetes, Celery, RabbitMQ, Redis, Postgres
Frontend: React
Domains: Inference perf, systems programming, reliability engineering

🔩 Principles I optimize for

  • Latency/Throughput/Cost trade-offs with measurable SLOs
  • Determinism & debuggability via structured logs, traces, and health signals
  • Simple-by-default architectures that scale without heroics

📌 Selected work

  • caLLMe — Voice-first real-time LLM assistant (VAD → STT → Gen → TTS with interruptibility)
  • K8s Inference Orchestrator — Queue-routed tasks, autoscaling, backpressure, observability
  • HR RAG Chatbot — Policy/benefits QA with retrieval + structured outputs
  • Industrial CV — Inspection + predictive maintenance pipelines

📫 Reach me


**Languages:** Arabic (native), English (professional), French (very good) · **Hobbies:** Electronics, Psychology, Guitar & Guembri

Pinned

  1. auto-graph (Public)

    A web tool for creating and visualizing graphs and trees, and for trying out code.

    TypeScript · 20 stars · 1 fork

  2. RT (Public)

    Forked from Pinkyboi/RT

    A ray-tracing program written from scratch in C, with complex shapes, texture mapping, soft shadows, multiple lights, and fractals.

    C · 27 stars · 2 forks

  3. KSICARDOOM (Public)

    A DOOM- and Duke Nukem 3D-style ray-casting game written from scratch in C, featuring a level editor and multiplayer.

    C · 63 stars

  4. BnademOverflow/libCplus (Public archive)

    A C library with many useful functions, algorithms, and data structures.

    C · 51 stars · 5 forks

  5. beginners_guide_to_raycasting (Public)

    Code for my video guide to raycasting: https://www.youtube.com/watch?v=DFZnzCbmlng

    C · 10 stars

  6. caLLMe (Public)

    Real-time voice conversation with LLMs using an asynchronous voice-to-text-to-voice pipeline.

    Python · 19 stars · 1 fork