ML Manager · Rihal · Oman
Firas Al Wadhahi
About
I lead AI/ML infrastructure at Rihal, a technology company headquartered in Muscat, Oman. My work centres on building and operating production-grade systems for large language models — from bare-metal GPU provisioning through inference optimisation to the agentic products that ship to end users. The mandate is clear: get research-grade models into production reliably, at the latency and cost that actually matter.
Our compute platform runs on NVIDIA DGX H200 hardware. I own the full stack: SLURM scheduling, Kubernetes orchestration, vLLM serving, and the platform abstraction layers that let product teams deploy GenAI capabilities without needing to reason about what sits beneath. On-premises by design: data sovereignty and inference cost are non-negotiable constraints in the markets we serve.
Before ML infrastructure I worked across data engineering and applied research. I care about systems that are honest about their limits — the hard constraints of hardware, latency, and accuracy matter more to me than benchmark leaderboards. If a model cannot serve a request in under two seconds on the worst-case hardware, the capability does not exist.
Experience
2022 — Present
Muscat, Oman
ML Manager
Rihal · Current
- Architected and operate a DGX H200 GPU cluster for on-premises LLM inference, serving production traffic at sub-2s P95 latency.
- Deployed and maintain a vLLM-based multi-model serving stack on Kubernetes, supporting 10+ concurrent GenAI products.
- Led development of agentic systems (RAG pipelines, tool-calling agents, and LLM guardrails) shipped to enterprise customers.
- Built an internal GPU-as-a-Service platform abstracting SLURM and Kubernetes from product teams.
- Established MLOps practices: model versioning, drift detection, rollback pipelines, and cost attribution per workload.
2020 — 2022
Muscat, Oman
Senior Data Engineer
Rihal
- Designed and delivered large-scale data pipelines for government and enterprise clients across Oman.
- Introduced dbt and Airflow, reducing pipeline failure rate by 60% and cutting time-to-insight from days to hours.
- Prototyped early ML models for fraud detection and demand forecasting, laying the groundwork for the ML practice.
2018 — 2020
Remote
Data & ML Consultant
Independent Consulting
- Delivered applied ML projects for clients in insurance and logistics, focusing on predictive modelling and pipeline automation.
- Built a real-time telematics scoring engine later integrated into the Bima Automation platform at Rihal.
Selected Work
Projects
Bima Automation
AI agent that autonomously issues vehicle insurance policies end-to-end.
GPU-as-a-Service
DGX H200 cluster-backed GPU hosting for clients, built on Kubernetes.
LLM Guardrails Workshop
Arabic-language workshop on LLM safety and guardrails.