Currently at Nutanix · ex-Meta

Hi, I'm Dheeraj.
Software Engineer.

Started at Meta in 2022 on Horizon Worlds (the Metaverse), building tools that let creators make games, then moved to MTIA to work on the PyTorch compiler and device kernels for Meta's AI accelerator. Now at Nutanix, building agents and RAG on top of their hybrid cloud. Bit of a generalist: I like working across very different kinds of problems.

Previously & now
Nutanix
Meta
01 · experience

Where I've been building

From creator tools in Meta's metaverse and compilers on Meta's AI silicon, to GPU research at ASU, to the hybrid cloud at Nutanix.

☁️ Hybrid cloud · hypervisor · Nutanix

Software Engineer, Enterprise AI

Nutanix

Now

Apr 2025 to Present

Building an enterprise RAG product and an agent platform on Nutanix's hybrid-cloud stack.

  • Built a RAG application where users ask natural-language questions over their own documents. Handles upload, parsing, chunking, and retrieval end to end, with ChromaDB as the vector store.
  • Productionized the RAG pipeline: containerized the services and deployed them on Kubernetes so the full flow runs as a real product, not a notebook demo.
  • Building an agent app that exposes MCP servers as tools so users can converse with the agent and get work done across those surfaces.
Python · ChromaDB · LLMs · RAG · MCP · Docker · Kubernetes
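
For readers new to RAG, the chunking and retrieval steps mentioned above can be sketched roughly as follows. This is an illustrative toy, not the product code: `chunk_text` and `retrieve` are hypothetical names, the chunk sizes are arbitrary, and plain word overlap stands in for the embedding search a vector store such as ChromaDB would perform.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by shared words with the query (toy stand-in for vector search)."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(c.lower().split())), c) for c in chunks]
    # Stable sort: chunks with equal scores keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored[:k] if score > 0]


pieces = chunk_text("abcdefghij", chunk_size=4, overlap=2)
hits = retrieve("cat", ["the cat sat", "dogs bark loudly", "a cat and a dog"], k=2)
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from either side, at the cost of storing some text twice.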
🧠 MTIA · Meta silicon · FX Graph Compiler

Software Engineer, FX Graph Compiler (Meta Training & Inference Accelerators)

Meta (MTIA)

Aug 2024 to Apr 2025

Compiler work on Meta's in-house AI silicon (MTIA). Python frontend, C++ backend, and device kernels. Lowering PyTorch graphs down to accelerator-ready code.

  • Added compiler passes at the FX-graph level: architecture-based op decompositions, pre/post TorchInductor op decompositions, and tensor broadcasting.
  • Lowered 2 IG production models end-to-end onto MTIA silicon. Authored FX-level op decompositions, wrote kernels, and made model-level changes as needed.
  • Proposed and shipped support for data-dependent dynamic operators on MTIA, leveraging PyTorch 2.0 Dynamo tracer features to handle shape-dependent control flow.
Python · C++ · PyTorch · TorchInductor · Dynamo · Compilers · CUDA · Kernels
🥽 Horizon Worlds · Meta

Software Engineer, Horizon Creator Tools

Meta (Reality Labs)

Jun 2022 to Aug 2024

Creator tooling in Horizon Worlds, Meta's cross-platform metaverse game engine. Work spanned graphics, scripting, and the asset pipeline.

  • Delivered Unity Asset Bundles as an ingestible asset format for Horizon (C# + React), enabling scriptable animations, custom shaders, and collisions with native Horizon entities. Grew user retention ~20% by unlocking more expressive worlds.
  • Shipped a Unity extension using UI Toolkit that let creators upload UABs directly into Horizon, backed by Hack + GraphQL services for file storage, user validation, and side-effect tracking.
  • Co-architected and launched Horizon Templates (Horizon's answer to Unreal Blueprints / Unity Prefabs) in C#, Hack, GraphQL, and React. Reusable, version-controlled assets with instantiation, overrides, and property reversion.
  • Built Runtime Inspector, a real-time debugging and profiling tool that removed full Unity reloads and saved developers roughly 2 minutes per day.
  • Migrated ~30% of the Horizon scripting API from the legacy Codeblocks engine to a new TypeScript-based scripting system.
  • Ported ~50% of Horizon's asset-library game components from Unity C# to a new C++ standalone engine for better performance and cross-platform support.
C# · C++ · TypeScript · React · VR · Hack (PHP) · GraphQL · Unity · UI Toolkit
🧪 CUDA · decision-tree inference · ASU · Tempe

Graduate Research Assistant

Arizona State University · Tempe, AZ

May 2021 to Dec 2021

Autonomous-vehicle falsification research plus GPU benchmarking of decision-tree inference.

  • Integrated the SVL Simulator with S-TaLiRo (a MATLAB toolbox for falsifying cyber-physical systems) via a Python bridge that drove simulations until falsification. Modified SVL's C# source to support custom scenarios.
  • Built a CUDA C++ GPU algorithm for decision-tree inference that was at least 2x faster than our optimized CPU baseline. Also explored FIL (cuML) for forest inference, tree re-organization, and compiling decision trees into tensor computations.
CUDA · C++ · Python · C# · MATLAB
📐 Delaunay on GPU · Oculus VR · BITS Pilani

Undergraduate Research Assistant

BITS Pilani · Pilani, India

Aug 2019 to May 2020

GPU graphics and VR-based biomechanics research during undergrad.

  • Parallelized Delaunay triangulation on GPU using CUDA C++ to generate meshes from point sets (Jan to May 2020).
  • Tracked neck biomechanics with an Oculus Rift VR headset and mapped them to a real human skull. Computed each vertebra's position via inverse kinematics in OpenSim + Unity, then plotted position over time to surface defects (Aug to Dec 2019).
CUDA · C++ · Unity · OpenSim · VR
02 · skills

The stack I ship with

Low-level systems, GPU pipelines, and shiny pixels. I'm happiest at the seam between them.

Languages

C++ · C# · TypeScript · Python · Hack (PHP) · Java · JavaScript · SQL
🛠️

Frameworks & Libraries

React · GraphQL · CUDA · PyTorch · RAY · OpenGL · NumPy · pandas · OpenCV · TensorFlow · Django · ROS
🎮

3D / Game Engines

Unity 3D · Unreal Engine · Godot · Blender · Maya
🧰

Tools

Docker · Postman · LaTeX · Git
03 · projects

Projects & open source

Grad-school research, graphics experiments, and a from-scratch web game engine. Everything links straight to GitHub.

🎮

EverythingJs

Apr 2025

Cross-platform game engine for the web, built from scratch on PlayCanvas, WebGL, React, Node, and Zustand.

TypeScript · WebGL · PlayCanvas · React
🏏

CricketHero

May 2023

Browser cricket game prototype. HTML/JS side project.

HTML · JavaScript · Game
🧬

NEAT

Apr 2022

Python implementation of NeuroEvolution of Augmenting Topologies (NEAT) for evolving neural networks.

Python · ML · Evolutionary
🚗

BlackBox-SVL

Nov 2021

Python interface between the SVL Simulator and the S-TaLiRo falsification toolbox for end-to-end AV scenario testing. Built during ASU research.

Python · C# · Simulation
🌓

Computer Graphics Projects

Sep 2021

OpenGL graphics experiments including shadow mapping for static and dynamic objects. Fixed shadow acne, jaggy edges, and z-buffer issues along the way.

C++ · OpenGL · Graphics
🌲

Decision Tree on GPU

Jun 2021

CUDA C++ implementation of decision-tree inference, benchmarked against cuML's FIL and CPU baselines. At least 2x faster than the optimized CPU version.

CUDA · C++ · cuML
🧬

NN from Scratch (GPU)

Mar 2021

Neural network implemented from scratch with GPU kernels. Hands-on companion to the autodiff project.

Python · CUDA · ML
🧮

Automatic Differentiator

Feb 2021

Reverse-mode automatic differentiation for a neural-network computation graph (ReLU, Softmax, etc.) with CUDA C++ kernels wired in via ctypes.

Python · CUDA · C++ · ML
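
As a rough illustration of the technique, reverse-mode differentiation over a small computation graph can be sketched in pure Python. This is not the project's code: the `Value` class is a hypothetical name, it handles scalars only, and it covers just add, multiply, and ReLU, with no CUDA kernels or ctypes bindings.

```python
class Value:
    """A scalar node in a computation graph (hypothetical name, for illustration)."""

    def __init__(self, data, parents=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents        # nodes this value was computed from
        self._backward = lambda: None  # pushes self.grad back to the parents

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = backward
        return out

    def relu(self):
        out = Value(max(0.0, self.data), (self,))
        def backward():
            self.grad += (1.0 if self.data > 0 else 0.0) * out.grad
        out._backward = backward
        return out

    def backward(self):
        # Topologically order the graph, then apply the chain rule
        # from the output back to the inputs.
        order, seen = [], set()
        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)
        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


x, w, b = Value(3.0), Value(-2.0), Value(7.0)
y = (x * w + b).relu()  # relu(3 * -2 + 7) = relu(1)
y.backward()
```

Each operation records how to push its output gradient back to its inputs, so a single backward pass yields the gradient with respect to every input at once.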
🗄️

Mini-base Extension

Jan 2021

Extended the Mini-base DBMS with a skyline operator (NestedLoops, BlockNestedLoops, SortFirst, BtreeSorted) plus Hash and Clustered B-Tree indexing.

Java · Databases
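
For context, a skyline query returns the rows that no other row beats on every attribute. A minimal nested-loops sketch in Python, assuming smaller values are better on every attribute (the Mini-base extension itself was written in Java and may use a different preference direction; `dominates` and `skyline_nested_loops` are hypothetical names):

```python
def dominates(a, b):
    """True if a is no worse than b on every attribute and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline_nested_loops(points):
    """Return the points not dominated by any other point, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# (price, distance) pairs: lower is better on both attributes.
hotels = [(50, 8), (60, 3), (80, 2), (70, 5), (90, 9)]
best = skyline_nested_loops(hotels)
```

The nested-loops variant compares every pair of rows, which is why block-based, sort-first, and index-based variants exist for larger inputs.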
🌌

N-Body Simulation

Nov 2018

Classic n-body gravitational simulation in C++.

C++ · Simulation
04 · education

Where I learned the fundamentals

🎓Arizona State University

M.S. in Computer Science

Arizona State University, Tempe

Jan 2021 to May 2022

About a year and a half of grad CS at ASU. It mostly filled in the systems, compilers, and ML foundations I hadn't formally studied before, which came in handy later on the compiler side at Meta.

📘BITS Pilani

M.Sc. Physics and B.E. Electronics & Communication Engineering

Birla Institute of Technology & Science, Pilani

2015 to 2020

Five-year dual-degree program. Physics was the math-heavy side, and it is still where most of my intuition for linear algebra comes from. Electronics was the hardware side: signals, digital logic, computer architecture. That's where my interest in low-level code started.

05 · certifications

Udacity Nanodegrees

🏅

Udacity · Deep Learning Nanodegree

CNN for dog-breed classification, RNN for Seinfeld script generation, GAN for celebrity faces.

🏅

Udacity · Computer Vision Nanodegree

Haar-cascade face detection, YOLO pedestrian detection, CNN facial keypoints, CNN-RNN image captioning, SLAM for a virtual bot.

🏅

Udacity · Deep Reinforcement Learning Nanodegree

Deep Q-Network (DQN) bot in Unity, actor-critic (DDPG) for a double-jointed arm, and more.

06 · let's talk

Get in touch

Open to chatting about systems engineering, graphics, game engines, or anything you think I'd find interesting. ✉️