Model Compression (Pruning and Quantization Strategies)

Dr. Srinivas Rana

Date: 02/04/2025

Abstract:

Model compression, with a particular focus on pruning and quantization strategies, is an essential topic in machine learning. As models become more complex and resource-demanding, it is crucial to explore ways to make them more efficient without sacrificing performance. Pruning removes redundant weights or structures from a trained network, while quantization represents weights and activations at lower numerical precision. Together, these two key techniques reduce model size, improve inference speed, and lower resource consumption, making models more suitable for deployment on edge devices or in environments with limited computational power.
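As a rough illustration of the two techniques (a minimal sketch, not drawn from the talk itself), the following Python snippet applies magnitude-based pruning and post-training dynamic quantization using PyTorch's built-in utilities; the toy network, the 50% sparsity level, and the choice of dynamic quantization are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy network standing in for a trained model (illustrative assumption).
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# Pruning: zero out the 50% of weights with the smallest L1 magnitude
# in each Linear layer, then make the sparsity permanent.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # bake the pruning mask into the weights

# Quantization: convert Linear weights to 8-bit integers with dynamically
# quantized activations, shrinking the model and speeding up CPU inference.
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Both models accept the same inputs; the compressed one is smaller and faster.
x = torch.randn(1, 784)
print(model(x).shape, quantized_model(x).shape)

In practice, pruning is usually interleaved with fine-tuning to recover accuracy, and quantization schemes range from post-training conversion, as sketched above, to quantization-aware training.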

Bio:

Dr. Srinivas Rana is currently a Senior ML Scientist at Wadhwani AI, where he leads a portfolio of healthcare solutions for screening and diagnostics across diverse domains. Previously, he worked in the UK with various organizations in the medical devices and life sciences sectors. He holds a PhD from IIT Madras.