Earlier deep learning systems effectively aimed to learn to reason almost from scratch from data, bypassing readily available domain knowledge and the know-how of well-understood reasoning processes (for example, physical processes and mathematical theorems). As a result, deep learning models of the pre-Transformer era achieved state-of-the-art accuracy on hard tasks while being hard to interpret, exhibiting biases of various kinds, and producing logically inconsistent results, such as failing to solve simpler versions of problems they appeared to master. This raises the question of whether such models have actually mastered the underlying tasks by following a sound reasoning path. Even in the current era of foundation models (up to and including large reasoning models), LLMs and LRMs are expected to learn such processes mostly from large corpora, using training objectives that do not explicitly enforce sound reasoning and lack the guarantees of formal systems. Naturally, the issues of inconsistency and limited reasoning ability persist.
In the Tr^2AIL (Trust and Transparency in AI using Logic) lab, the umbrella theme of our research group is to bridge this gap by evaluating, enhancing, and explaining AI reasoning systems under the lens of logic (and other well-understood reasoning processes). Part of our effort is dedicated to proposing benchmarks that evaluate logical properties of these systems, marking a shift away from holistic accuracy as a standalone statistic. Insights from evaluation in turn inform enhancement: our evaluation across reasoning dimensions informs our multi-hop proof generation for NLP tasks (QA/NLI), tool-augmented models for complex mathematical problems, neuro-symbolic models for logical problem solving in natural language, and Process Reward Models for evaluating physical safety and logical coherence. We are also exploring the impact of our robust reasoning solutions in healthcare and education; a toy illustration of consistency-oriented evaluation appears in the sketch below.
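As a toy illustration of what evaluating logical properties beyond holistic accuracy can mean, the Python sketch below scores a model for "downward consistency": whether a model that answers a composite question correctly also answers its simpler sub-questions correctly. This is a minimal sketch, not the lab's benchmark code; the `model` interface, the field names (`question`, `answer`, `subquestions`), and the stubbed data are all illustrative assumptions.

```python
# Minimal sketch of a consistency-oriented evaluation (illustrative only).
# Alongside holistic accuracy, it measures "downward consistency": among
# composite questions the model answers correctly, the fraction for which
# it also answers every simpler sub-question correctly.
from typing import Callable, Dict, List

def consistency_report(
    model: Callable[[str], str],  # hypothetical interface: question -> answer
    items: List[Dict],            # each item: question, gold answer, sub-questions
) -> Dict[str, float]:
    correct, consistent, scored = 0, 0, 0
    for item in items:
        main_ok = model(item["question"]).strip() == item["answer"]
        correct += main_ok
        if main_ok:  # score consistency only where the composite question was solved
            scored += 1
            consistent += all(
                model(q).strip() == a for q, a in item["subquestions"]
            )
    return {
        "accuracy": correct / len(items),
        "downward_consistency": consistent / scored if scored else float("nan"),
    }

if __name__ == "__main__":
    # Stub standing in for an LLM: it solves the composite question
    # but gets the simpler sub-question wrong (an inconsistency).
    stub = {"What is 12 * 12 + 7?": "151", "What is 12 * 12?": "142"}
    data = [{
        "question": "What is 12 * 12 + 7?",
        "answer": "151",
        "subquestions": [("What is 12 * 12?", "144")],
    }]
    print(consistency_report(lambda q: stub.get(q, "?"), data))
    # -> {'accuracy': 1.0, 'downward_consistency': 0.0}
```

On this toy data the model scores perfect accuracy yet zero downward consistency, which is exactly the kind of gap that accuracy as a standalone statistic hides.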
Publications
-
SMAB: MAB based word Sensitivity Estimation Framework and its Applications in Adversarial Text Generation by Pandey S.K., Vashistha S., Das D., Choudhury M., Aditya S. NAACL 2025, 9158-9176 (2025)
-
Image Understanding using vision and reasoning through Scene Description Graph by Aditya S., Yang Y., Baral C., Aloimonos Y. Computer Vision and Image Understanding 173, 33-45 (2018)
-
Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs by Puerto H., Tutek M., Aditya S., Zhu X., Gurevych I. EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference 11234-11258 (2024)
-
ERVQA: A Dataset to Benchmark the Readiness of Large Vision Language Models in Hospital Environments by Ray S., Gupta K., Kundu S., Kasat P.A., Aditya S., Goyal P. EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference 15594-15608 (2024)
-
MATHSENSEI: A Tool-Augmented Large Language Model for Mathematical Reasoning by Das D., Banerjee D., Aditya S., Kulkarni A. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2024, Volume 1, 942-966 (2024)
-
TEXT2AFFORD: Probing Object Affordance Prediction abilities of Language Models solely from Text by Adak S., Agrawal D., Mukherjee A., Aditya S. CoNLL 2024 - 28th Conference on Computational Natural Language Learning, Proceedings of the Conference 342-364 (2024)
-
Tricking LLMs into Disobedience: Formalizing, Analyzing, and Detecting Jailbreaks by Rao A., Vashistha S., Naik A., Aditya S., Choudhury M. 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings 16802-16830 (2024)
-
Prover: Generating Intermediate Steps for NLI with Commonsense Knowledge Retrieval and Next-Step Prediction by Ghoshal D., Aditya S., Choudhury M. AACL-IJCNLP, 872-884 (2023)
Principal Investigator
- Agentic Verifiers - Provably Safe Test-time Scaling for Reasoning Models (Microsoft Research Lab India Private Limited)
Co-Principal Investigator
- Using Large Language Models to Enhance Learning Efficiency and Student Engagement in Indian Education System (IIT Kharagpur AI4ICPS I Hub Foundation)
Ph.D. Students
Arghyadeep Ghosh
Area of Research: Neurosymbolic Reasoning in NLP, Safe AI
Ishan Sahu
Area of Research: Learning-Enabled Cyber-Physical Systems
Sachin Vashistha
Area of Research: Reasoning, NLP, Causality
Subha Mondal
Area of Research: Reasoning, NLP, Education and LLMs
MS Students
Aritra Dutta
Area of Research: Reasoning, Vision and Language
Kunal Kingkar Das
Area of Research: Vision and Language