Abnob Doss

Data Engineer @ Vitol

Building data platforms across every paradigm: low-latency tick systems, petabyte-scale ETL, and enterprise data infrastructure.

About

In 7+ years at Vitol and Citi, I've built data platforms across fundamentally different paradigms: sub-millisecond KDB+ tick databases, petabyte-scale Spark ETL frameworks, and enterprise Oracle platform re-engineering. At Vitol, I build data access tooling and ETL pipelines for the company's data platform.

Skills

Application Development

Python (7 yrs) · Java (2 yrs) · KDB+/q (2 yrs)

Big Data & Data Engineering

Spark · Hive · Apache Arrow · Oracle · Luigi

Experience

Vitol

Current

Houston, TX · April 2024 – Present (1+ yr)

Data Engineer

April 2024 – Present (1+ yr)

Python · Apache Arrow
  • Built a new data access library for the company's market data platform that provides pandas-like tabular queries alongside timeseries access; it was adopted company-wide and enables self-service data access across trading desks.
  • Scaled the data platform to 30+ new feeds and migrated 100+ legacy feeds to next-gen ETL, including a real-time streaming application replacing legacy ingestion, with zero-downtime transitions.

Citi

June 2017 – April 2024 (7 yrs)

Big Data Engineer (Vice President)

Strategic Ledger · Irving, TX · September 2021 – April 2024 (2.5 yrs)

Spark · Java · Oracle
  • Solved the scaling limits of a 40 TB Oracle RAC platform, profiling and re-engineering across every layer from RAC configuration and memory management to sharding strategies and application query patterns, avoiding a costly platform migration and eliminating downstream reporting timeouts.
  • Optimized data pipeline throughput across the full ingestion-to-query stack, benchmarking wire protocols, tuning Spark processing, and evaluating Apache Iceberg to accelerate downstream data consumption.
  • Deployed a Python/Luigi + Spark ETL framework across two org-wide data initiatives, leading teams of up to 10 engineers and integrating 100+ production tables at 120+ TB annual throughput.

Big Data Engineer (Assistant Vice President)

Genesis Wholesale · Irving, TX · July 2020 – September 2021 (1 yr)

Spark · Python · Hive · Luigi
  • Led a 15-person team replacing an end-of-life Netezza platform with a Hive-based data warehouse, designing a hybrid strategy that offloaded historical data while preserving current-day processing on a reduced footprint.
  • Built a Python/Luigi + Spark ETL framework and onboarded 20+ wholesale banking products within a year, reducing end-to-end pipeline runtimes from 2 hours to 10 minutes and building a platform that grew to 2+ petabytes.
  • Guided consultants on building a T+1 balance projection model, ensuring reliable estimates for senior management reporting when upstream feeds were missing or late.

Software Engineer (Officer)

Commodities Technology · Houston, TX · June 2018 – July 2020 (2 yrs)

KDB+/q · Python
  • Scaled a greenfield KDB+ tick data platform across three commodities exchanges (CME, LME, ICE), managing ICE production certification from feed handler validation through KDB+ analytics.
  • Optimized tick-to-trade latency through Solarflare kernel bypass and hardware-level server tuning for the commodities electronic trading platform.
  • Built a market access control dashboard with real-time kill-switch capabilities for the commodities electronic trading desk, interfacing with KDB+ market access services.
  • Reduced a trader's pricing model runtime from 4 hours to 10 minutes, then generalized it into a reusable Python framework adopted by multiple traders for energy product price modeling.

Summer Analyst (Intern)

Commodities Technology · Houston, TX · June 2017 – August 2017

Java · Python
  • Diagnosed and fixed a memory leak in a legacy Java regression tool, restoring the team's ability to test critical trading reports.
  • Migrated database deployments from a legacy tool to the team's CI/CD pipeline, saving over 100 hours of developer time per year.
  • Built release coordination tooling for a global team, saving tens of developer hours monthly by automating compliance validation for rotating release managers.

Publication

Automatic Exercise Recognition with Machine Learning

Precision Health and Medicine, 2020, Volume 843 · December 2018

View on Springer

Education

Texas A&M University

Bachelor of Science in Computer Science

Minors in Neuroscience & Mathematics

May 2018