Sriram Balasubramanian
PhD candidate at the University of Maryland, working on mechanistic interpretability and the safety of advanced AI systems.
About
I'm a PhD candidate in Computer Science at the University of Maryland, College Park, advised by Prof. Soheil Feizi. I work on uncovering the mechanisms that drive the success of modern neural networks, because I think a principled understanding of these mechanisms is essential for safely developing and reliably controlling advanced AI.
More broadly, I'm concerned about the impact of advanced AI on human systems and the role of humanity in an era of widespread, superhuman AI. Previously, I was a research fellow at Microsoft Research India, and before that I did my BTech (Hons.) in CS at IIT Bombay.
Research
Interpretability
Decomposing vision and language models into interpretable parts — heads, MLPs, circuits — and tying them to human-readable concepts.
Robustness
The geometry of model decisions — blind spots, masking, and the gap between what models claim and what they actually compute.
AI & Society
Where generative models meet the world: artistic style copyright, the (im)possibility of reliable AI-text detection, and other safety-adjacent questions.
Selected publications
A Closer Look at Bias and Chain-of-Thought Faithfulness of Large (Vision) Language Models
Rethinking Artistic Copyright Infringement in the Era of Text-to-Image Generative Models
Can AI-Generated Text be Reliably Detected? Stress Testing AI Text Detectors Under Various Attacks
Decomposing and Interpreting Image Representations via Text in ViTs Beyond CLIP
Exploring the Geometry of Blind Spots in Vision Models
Towards Improved Input Masking for Convolutional Neural Networks
Simulating Network Paths with Recurrent Buffering Units
What's in a Name? Are BERT Named Entity Representations just as Good for any other Name?
Full list on Google Scholar.
Experience
2022 — Now
PhD candidate, Computer Science
University of Maryland, College Park
Advised by Prof. Soheil Feizi. Mechanistic interpretability of vision and language models; CoT faithfulness; post-hoc attribution.
2020 — 2022
Research Fellow
Microsoft Research India
Worked with Nagarajan Natarajan and Venkata Padmanabhan on building ML models for simulating network behaviour.
2016 — 2020
BTech (Honours), Computer Science
Indian Institute of Technology Bombay
Bachelor's thesis with Prof. Sunita Sarawagi on the robustness of BERT-derived representations to uncommon named entities.