ML Infrastructure · Rihal · Oman

Firas Al Wadhahi

I build and run the on-premise GPU platform that puts large language models into production — reliably, at the latency and cost that actually matter.


On-premise LLM deployment


DGX H200 · vLLM · Kubernetes · SLURM · Agentic GenAI

firas@om-mct-01: ~

┌─ RACK · OM-MCT-01 ─────────────────────┐

▣ DGX H200     8×H200 · 141GB
▣ DGX H200     8×H200 · 141GB
▤ vLLM         inference · paged-attn
▤ KUBERNETES   orchestration · MIG
▤ SLURM        scheduler · queue
░ STORAGE      nvme · weights

└────────────────────────────────────────┘

01 About

Research-grade models, production-grade systems.

I lead AI/ML infrastructure at Rihal, a technology company headquartered in Muscat, Oman. My work centres on building and operating production-grade systems for large language models — from bare-metal GPU provisioning through inference optimisation to the agentic products that ship to end users.

Our compute platform runs on NVIDIA DGX H200 hardware. I own the full stack: SLURM scheduling, Kubernetes orchestration, vLLM serving, and the platform abstraction layers that let product teams deploy GenAI capabilities without reasoning about what sits beneath. On-premise by design — data sovereignty and inference cost are non-negotiable in the markets we serve.

I care about systems that are honest about their limits. The hard constraints of hardware, latency, and accuracy matter more to me than benchmark leaderboards. If a model cannot serve a request in under two seconds on the worst-case hardware, the capability does not exist.
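A minimal sketch of how a two-second budget like this could be checked against measured latencies. It assumes latencies are collected in seconds and uses a nearest-rank 99th percentile; the function names and the sample values are illustrative assumptions, not taken from the production platform.

```python
def p99(latencies_s):
    """Nearest-rank 99th percentile of a non-empty list of latencies (seconds)."""
    if not latencies_s:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_s)
    # 1-based nearest-rank index: ceil(0.99 * n), computed with integers
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_budget(latencies_s, budget_s=2.0):
    """True when the p99 latency is strictly under the budget."""
    return p99(latencies_s) < budget_s


# Illustrative samples only, not measurements from a real cluster.
samples = [0.4] * 98 + [1.7, 3.1]
print(p99(samples), meets_budget(samples))
```

A single slow outlier in a hundred requests does not break the budget here, but a second one would; that is the sense in which the tail, not the average, decides whether the capability exists.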

profile.json · live

Role: ML Manager
Company: Rihal
Location: Muscat, OM
Focus: LLM Infra · GPU · Agents
Hardware: NVIDIA DGX H200
Serving: vLLM
Deployment: On-premise
p99 Latency: < 2.0s

02 Experience

05/2024 – Present (current)

Machine Learning Manager

Rihal · Muscat, Oman

› Own the on-premise GPU platform end-to-end: DGX H200 cluster, SLURM scheduling, Kubernetes orchestration, vLLM serving.
› Lead the ML team shipping agentic GenAI products into production across the region.
› Set the inference SLOs: sub-two-second worst-case latency on the hardware that actually exists.

06/2022 – 05/2024

Lead Machine Learning Engineer

Rihal · Muscat, Oman

› Built the platform abstraction that lets product teams deploy LLM capabilities without reasoning about the metal beneath.
› Stood up the observability and serving stack for self-hosted models.

10/2021 – 06/2022

Machine Learning Engineer

Rihal · Muscat, Oman

› Shipped applied ML features from research notebook to production endpoint.

05/2019 – 10/2021

Logistics Technology Strategist

ASYAD Group · Muscat, Oman

› Drove data-engineering and technology strategy across national logistics operations.

03 Education

MSc, Data Science & Machine Learning

Sultan Qaboos University

2021 – 2024

Muscat, Oman

B.Eng, Computer Engineering

Caledonian College of Engineering

2015 – 2019

Muscat, Oman

04 Stack

The cluster


05 Selected Work

Things I've shipped

PROJECT

Insurance Portal

AI agent that autonomously issues vehicle insurance policies end-to-end.

Python · LangChain · vLLM · FastAPI · Agents

PROJECT

GPU-as-a-Service

DGX H200 cluster-backed GPU hosting for clients, built on Kubernetes.

Kubernetes · NVIDIA DGX · H200 · MIG · Infrastructure

PROJECT

LLM Guardrails Workshop

Arabic-language workshop on LLM safety and guardrails.

LLM Safety · Arabic · Guardrails · Workshop · GenAI

06 Contact

Get in touch

firas@om-mct-01: ~/contact

LinkedIn: in/firas-al-wadhahi

Based in: Muscat, Oman
