FAW

ML Manager · Rihal · Oman

Firas Al Wadhahi

ML Manager — Rihal

On-premise LLM deployment

About

I lead AI/ML infrastructure at Rihal, a technology company headquartered in Muscat, Oman. My work centres on building and operating production-grade systems for large language models — from bare-metal GPU provisioning through inference optimisation to the agentic products that ship to end users. The mandate is clear: get research-grade models into production reliably, at the latency and cost that actually matter.

Our compute platform runs on NVIDIA DGX H200 hardware. I own the full stack: SLURM scheduling, Kubernetes orchestration, vLLM serving, and the platform abstraction layers that let product teams deploy GenAI capabilities without needing to reason about what sits beneath. On-premise by design — data sovereignty and inference cost are non-negotiable constraints in the markets we serve.

Before ML infrastructure I worked across data engineering and applied research. I care about systems that are honest about their limits — the hard constraints of hardware, latency, and accuracy matter more to me than benchmark leaderboards. If a model cannot serve a request in under two seconds on the worst-case hardware, the capability does not exist.

Experience

2022 — Present

Muscat, Oman

ML Manager

Rihal · Current
  • Architected and operate a DGX H200 GPU cluster for on-premise LLM inference, serving production traffic at sub-2s P95 latency.
  • Deployed and maintain a vLLM-based multi-model serving stack on Kubernetes, supporting 10+ concurrent GenAI products.
  • Led development of agentic systems (RAG pipelines, tool-calling agents, and LLM guardrails) shipped to enterprise customers.
  • Built an internal GPU-as-a-Service platform abstracting SLURM and Kubernetes from product teams.
  • Established MLOps practices: model versioning, drift detection, rollback pipelines, and cost attribution per workload.

2020 — 2022

Muscat, Oman

Senior Data Engineer

Rihal
  • Designed and delivered large-scale data pipelines for government and enterprise clients across Oman.
  • Introduced dbt and Airflow, reducing pipeline failure rate by 60% and cutting time-to-insight from days to hours.
  • Prototyped early ML models for fraud detection and demand forecasting, laying the groundwork for the ML practice.

2018 — 2020

Remote

Data & ML Consultant

Independent Consulting
  • Delivered applied ML projects for clients in insurance and logistics, focusing on predictive modelling and pipeline automation.
  • Built a real-time telematics scoring engine later integrated into the Bima Automation platform at Rihal.

Skills & Stack

The toolkit

LLM Infrastructure
GPU & Hardware
Orchestration
DevOps / MLOps
Languages & Frameworks

Selected Work

Projects

Get in touch

Contact

Let's build something precise.