I am a researcher in the programming systems team at NVIDIA. My current work focuses on improving the performance and efficiency of deep neural networks through pruning & quantization, neural architecture search (NAS), and compiler- and runtime-based approaches. More broadly, I'm interested in research problems that lie at the intersection of systems and machine learning.
Prior to joining NVIDIA, I received my Ph.D. in Computer Science from the University of Utah under the guidance of Prof. Mary Hall. While at Utah, I worked on machine learning-based techniques to improve the performance, portability, and energy efficiency of GPU programs.
|Reliable Model Compression via Label-Preservation-Aware Loss Functions|
V. Joseph, S. A. Siddiqui, A. Bhaskara, G. Gopalakrishnan, S. Muralidharan, M. Garland, S. Ahmed, A. Dengel
arXiv 2012.01604 (2020). [ pdf]
|A Programmable Approach to Neural Network Compression|
V. Joseph, G. Gopalakrishnan, S. Muralidharan, M. Garland, A. Garg
IEEE Micro Special Issue on Machine Learning for Systems, 2020.
[ code | pdf (arXiv) | talk]
|Designing a Tunable Nested Data-Parallel Programming System
S. Muralidharan, M. Garland, A. Sidelnik, M. Hall,
ACM Transactions on Architecture and Code Optimization (TACO '16). [ pdf]
|Abstractions and Strategies for Adaptive Programming
Ph.D. Dissertation, University of Utah, December 2016. [ pdf]
|Architecture-Adaptive Code Variant Tuning
S. Muralidharan, A. Roy, M. Hall, M. Garland, P. Rai
ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '16), April 2016, Atlanta, GA.
[ pdf | slides]
|A Collection-Oriented Programming Model for Performance Portability
S. Muralidharan, M. Garland, B. Catanzaro, A. Sidelnik, M. Hall
ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '15) (short paper), February 2015, San Francisco, CA. [ pdf]
|Nitro: A Framework for Adaptive Code Variant Tuning
S. Muralidharan, M. Shantharam, M. Hall, M. Garland, B. Catanzaro
IEEE International Parallel & Distributed Processing Symposium (IPDPS '14), May 2014, Phoenix, AZ.
[ code | pdf | slides]
|Towards Making Autotuning Mainstream
P. Basu, M. Hall, M. Khan, S. Maindola, S. Muralidharan, S. Ramalingam, A. Rivera, M. Shantharam, A. Venkat
International Journal of High Performance Computing Applications, Volume 27 (IJHPCA '13), November 2013.
Condensa is a framework for programmable model compression in Python. It comes with a set of built-in compression operators which may be used to compose complex compression schemes targeting specific combinations of DNN architecture, hardware platform, and optimization objective. To recover any accuracy lost during compression, Condensa uses a constrained optimization formulation of model compression and employs an Augmented Lagrangian-based algorithm as the optimizer.
|Nitro Autotuning Framework
Nitro is a programmer-directed code variant tuning framework, jointly developed by the University of Utah and NVIDIA Research. It utilizes machine learning-based classification to automatically find the best implementation (variant) of a computation for a given input. Nitro provides C++ and Python interfaces for programmers to specify variants, input dataset features, and constraints.