Saurav Muralidharan

Sr. Research Scientist @ NVIDIA

Email: mail [at]

I am a senior researcher in the Programming Systems Research Group at NVIDIA. My current work focuses on improving the performance and efficiency of deep neural networks. More broadly, I'm interested in research problems that lie at the intersection of systems and machine learning.

Prior to joining NVIDIA, I received my Ph.D. in Computer Science from the University of Utah under the guidance of Prof. Mary Hall. While at Utah, I worked on machine learning-based techniques to improve the performance, portability, and energy efficiency of GPU programs.


[ Google Scholar | DBLP ]

Publications
A Programming System for Model Compression
V. Joseph, S. Muralidharan, A. Garg, M. Garland, G. Gopalakrishnan
NeurIPS Systems for ML Workshop, December 2019, Vancouver, Canada.  
Designing a Tunable Nested Data-Parallel Programming System
S. Muralidharan, M. Garland, A. Sidelnik, M. Hall
ACM Transactions on Architecture and Code Optimization (TACO '16), 2016.
Abstractions and Strategies for Adaptive Programming
S. Muralidharan
Ph.D. Dissertation, University of Utah, December 2016. 
Architecture-Adaptive Code Variant Tuning
S. Muralidharan, A. Roy, M. Hall, M. Garland, P. Rai
21st ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '16), April 2016, Atlanta, GA.  
A Collection-Oriented Programming Model for Performance Portability
S. Muralidharan, M. Garland, B. Catanzaro, A. Sidelnik, M. Hall
20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '15) (short paper), February 2015, San Francisco, CA. 
Nitro: A Framework for Adaptive Code Variant Tuning
S. Muralidharan, M. Shantharam, M. Hall, M. Garland, B. Catanzaro
28th IEEE International Parallel & Distributed Processing Symposium (IPDPS '14), May 2014, Phoenix, AZ.   
Towards Making Autotuning Mainstream
P. Basu, M. Hall, M. Khan, S. Maindola, S. Muralidharan, S. Ramalingam, A. Rivera, M. Shantharam, A. Venkat
International Journal of High Performance Computing Applications, Volume 27 (IJHPCA '13), November 2013.
Galaxia: A Semi-Decentralized System for Implementing Secure-Group P2P Networks
S. Muralidharan, S. Koroth, N. Anto, R. Pandarachalil
International Conference on Networks & Communications (NetCoM '09), December 2009, Chennai, India.


Preprints
A Programmable Approach to Model Compression
V. Joseph, S. Muralidharan, A. Garg, M. Garland, G. Gopalakrishnan
arXiv:1911.02497 (2019).

Open-Source Software


Condensa

Condensa is a framework for programmable model compression in Python. It comes with a set of built-in compression operators that can be composed into complex compression schemes targeting specific combinations of DNN architecture, hardware platform, and optimization objective. To recover any accuracy lost during compression, Condensa uses a constrained optimization formulation of model compression and employs an Augmented Lagrangian-based algorithm as the optimizer.
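The operator-composition idea can be illustrated with a small standalone sketch. This is not Condensa's actual API; the function names (`prune`, `quantize`, `compose`) and parameters are hypothetical, chosen only to show how simple compression operators chain into a scheme:

```python
import numpy as np

def prune(w, density):
    """Magnitude pruning: keep the largest-|w| fraction `density` of weights."""
    k = int(np.ceil(density * w.size))
    threshold = np.sort(np.abs(w), axis=None)[-k]
    return np.where(np.abs(w) >= threshold, w, 0.0)

def quantize(w, bits):
    """Symmetric uniform quantization; zero maps exactly to zero."""
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / levels
    return np.round(w / scale) * scale

def compose(*ops):
    """Chain operators into a single compression scheme."""
    def scheme(w):
        for op in ops:
            w = op(w)
        return w
    return scheme

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
scheme = compose(lambda x: prune(x, density=0.5),
                 lambda x: quantize(x, bits=8))
compressed = scheme(w)
print(np.count_nonzero(compressed))  # 8 of 16 weights remain nonzero
```

A real system layers accuracy recovery on top of such a scheme; the constrained-optimization step that Condensa performs is omitted here.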

Nitro Autotuning Framework

Nitro is a programmer-directed code variant tuning framework, jointly developed by the University of Utah and NVIDIA Research. It utilizes machine learning-based classification to automatically find the best implementation (variant) of a computation for a given input. Nitro provides C++ and Python interfaces for programmers to specify variants, input dataset features, and constraints.
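The variant-tuning workflow that Nitro automates can be sketched in plain Python. The names below (`tune`, `select`, `features`) are hypothetical and do not reflect Nitro's C++/Python interfaces; the sketch shows the core loop: time each variant on training inputs, record input features, then pick a variant for new inputs via nearest-neighbor lookup:

```python
import bisect
import time

# Two "variants" of the same computation: summing a list.
def variant_loop(xs):      # plain Python loop
    total = 0
    for x in xs:
        total += x
    return total

def variant_builtin(xs):   # C-accelerated builtin
    return sum(xs)

VARIANTS = [variant_loop, variant_builtin]

def features(xs):
    """A single toy feature: input size (real tuners use richer features)."""
    return len(xs)

def tune(training_inputs):
    """Time every variant on each training input; record the winner per feature."""
    model = []
    for xs in training_inputs:
        timings = []
        for v in VARIANTS:
            t0 = time.perf_counter()
            v(xs)
            timings.append(time.perf_counter() - t0)
        best = min(range(len(VARIANTS)), key=timings.__getitem__)
        model.append((features(xs), best))
    model.sort()
    return model

def select(model, xs):
    """1-nearest-neighbor lookup on the feature to pick a variant."""
    f = features(xs)
    i = bisect.bisect_left([feat for feat, _ in model], f)
    candidates = model[max(0, i - 1):i + 1]
    _, best = min(candidates, key=lambda m: abs(m[0] - f))
    return VARIANTS[best]

# Train on representative inputs, then select for a new one.
model = tune([list(range(10)), list(range(5000))])
chosen = select(model, list(range(100)))
assert chosen(list(range(100))) == sum(range(100))
```

Either variant computes the same result; the tuner's job is only to pick the faster one for a given input class, which is what the ML-based classification in Nitro generalizes.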

Professional Service