My research lies in the theoretical foundations of reinforcement learning and multi-armed bandits, with a primary focus on sequential decision-making under uncertainty. Grounded in stochastic inference, information theory, and coding-theoretic methods, my work aims to characterize the fundamental limits of learning and to design provably efficient algorithms with strong statistical guarantees.
A central theme of my research is the analysis of exploration–exploitation trade-offs using tools from probability theory, concentration inequalities, and information-theoretic lower bounds. I study regret minimization and best-arm identification problems in stochastic and structured bandit models, with an emphasis on tight finite-time performance guarantees and matching minimax lower bounds. This perspective highlights how information acquisition, uncertainty quantification, and adaptive sampling jointly govern learning efficiency under partial feedback.
An important direction of my current and future work concerns distributed and decentralized learning, including distributed multi-armed bandits and federated learning settings. Leveraging my background in coding and communication theory, I investigate how communication constraints, information compression, and limited feedback affect learning performance in networked environments. My goal is to develop algorithms that are both statistically optimal and communication-efficient, and to characterize the fundamental trade-offs among regret, communication cost, and scalability in large-scale learning systems.
At IIT Kharagpur, I aim to strengthen the theoretical foundations of bandits and reinforcement learning, mentor students in rigorous mathematical analysis, and foster interdisciplinary collaborations across AI, mathematics, and systems research.