We research the foundations and applications of approaches that make AI explainable and controllable.
Mechanistic Interpretability for AI Safety
We seek to understand the mechanisms of DNN models and reveal how their internal mechanisms give rise to their behavior, especially behavior related to safety. We advance the frontier of mechanistic interpretability, probing DNNs including language models and beyond. We analyze DNNs at multiple levels of abstraction: layer-wise, module-wise, attention-wise, neuron-wise, SAE features, etc. The flip side of interpretability is controllability: we also study how to control DNNs by modifying their internal mechanisms. A minimal probing sketch follows the list below.
Related works include:
- Truth Neurons (2025)
- What does the Knowledge Neuron Thesis Have to do with Knowledge? (2024)
- A State-Vector Framework for Dataset Effects (2023)
- Predicting Fine-Tuning Performance with Probing (2023)
- On the Data Requirements of Probing (2022)
- An Information-Theoretic View on Selecting Linguistic Probes (2020)
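To make the idea of layer-wise analysis concrete, here is a minimal sketch of linear probing on a small language model. The model name (gpt2), the toy sentences, and the binary labels are illustrative assumptions, not our actual experimental setup; real probing studies use annotated datasets and held-out evaluation.

```python
# A minimal sketch of layer-wise linear probing, assuming a HuggingFace GPT-2
# model and toy sentiment-style labels; these are placeholders for illustration.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

texts = ["the movie was wonderful", "the movie was terrible"] * 8
labels = [1, 0] * 8  # toy labels; real probes use annotated datasets

def layer_representation(text: str, layer: int) -> torch.Tensor:
    """Mean-pool the hidden states of one transformer layer for one input."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.hidden_states[layer].mean(dim=1).squeeze(0)

# Fit a separate linear probe per layer and compare training accuracy:
# higher accuracy suggests the property is more linearly decodable at that layer.
for layer in range(model.config.n_layer + 1):
    features = torch.stack([layer_representation(t, layer) for t in texts]).numpy()
    probe = LogisticRegression(max_iter=1000).fit(features, labels)
    print(f"layer {layer:2d}: probe accuracy {probe.score(features, labels):.2f}")
```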
Transparent and Efficient DNNs
We aim to make DNNs transparent and efficient, joining two goals that have traditionally been pursued separately. The directions we are exploring include pruning a majority of the parameters, developing parameter-efficient and/or quantized adapters and other architectural components, and accelerating them on the appropriate hardware. We are also interested in studying the trade-off between transparency and efficiency. A minimal pruning sketch appears below.
Related works include:
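As one concrete example of the pruning direction, the sketch below applies magnitude-based unstructured pruning using PyTorch's built-in utilities. The tiny MLP and the 90% sparsity target are illustrative assumptions, not a recipe for any particular model.

```python
# A minimal sketch of magnitude-based unstructured pruning with
# torch.nn.utils.prune; the toy MLP and sparsity level are placeholders.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# Zero out the 90% of weights with the smallest magnitude in each linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.9)
        prune.remove(module, "weight")  # make the sparsity permanent

total = sum(p.numel() for p in model.parameters())
zeros = sum((p == 0).sum().item() for p in model.parameters())
print(f"global sparsity (weights and biases): {zeros / total:.2%}")
```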
Reasoning, Explanation and AI4Research
We develop AI tools for scientific research. We are interested in how AI can help researchers at multiple steps in the process of producing novel scientific knowledge: understanding scientific papers, reasoning about the knowledge, formulating scientific hypotheses, planning experiments, analyzing experimental results, generating scientific explanations, writing academic papers, and more. Together, these steps lead to the discovery of novel knowledge. A schematic pipeline sketch follows the list below.
Related works include:
- $ACCORD$: Closing the Commonsense Measurability Gap (2025)
- What Would You Ask When You First Saw $a^2+b^2=c^2$? Evaluating LLM on Curiosity-Driven Questioning (2025)
- LLM-Generated Black-box Explanations can be Adversarially Helpful (2024)
- Scenarios and Approaches for Situated Natural Language Explanations (2024)
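The sketch below chains the steps listed above into a schematic pipeline. The `generate` callable stands in for any text-generation backend, and the prompts are hypothetical placeholders, not the prompts used in our work.

```python
# A minimal sketch of a staged research-assistant pipeline; `generate` is a
# hypothetical placeholder for a text-generation backend, and the prompts are
# illustrative only.
from typing import Callable

def research_pipeline(paper_text: str, generate: Callable[[str], str]) -> dict:
    """Chain the steps: summarize, hypothesize, plan, explain."""
    summary = generate(f"Summarize the key findings of this paper:\n{paper_text}")
    hypothesis = generate(f"Propose one novel, testable hypothesis given these findings:\n{summary}")
    plan = generate(f"Outline an experiment to test this hypothesis:\n{hypothesis}")
    explanation = generate(f"Explain, for a non-expert, why this experiment matters:\n{plan}")
    return {
        "summary": summary,
        "hypothesis": hypothesis,
        "experiment_plan": plan,
        "explanation": explanation,
    }

if __name__ == "__main__":
    # Echo backend so the sketch runs without any model; swap in a real LLM call.
    result = research_pipeline(
        "(paper text here)",
        generate=lambda prompt: f"[model output for: {prompt[:40]}...]",
    )
    print(result["hypothesis"])
```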
Applications of AI Agents
We explore novel use cases of AI agents driven by strong, general-purpose foundation models (including but not limited to language models and vision-language models). We focus on problems with profound real-world impact, in domains such as finance, sports, and education. We explore innovative architectures for these agents and develop benchmarks that rigorously evaluate their performance. A minimal agent-loop sketch appears below.
Related works include:
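To illustrate the kind of control flow such agents use, here is a minimal agent loop. The tool set, the `decide` policy, and the stopping rule are hypothetical placeholders that only demonstrate the loop structure, not a specific agent architecture we build.

```python
# A minimal sketch of a tool-calling agent loop; the tools and the `decide`
# policy are hypothetical placeholders illustrating the control flow only.
from typing import Callable

TOOLS: dict[str, Callable[[str], str]] = {
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),  # toy tool
    "lookup": lambda query: f"(stub) no data found for '{query}'",
}

def run_agent(task: str, decide: Callable[[str], tuple[str, str]], max_steps: int = 5) -> str:
    """Alternate between the model's decisions and tool executions."""
    transcript = f"Task: {task}"
    for _ in range(max_steps):
        action, argument = decide(transcript)   # model picks the next action
        if action == "final_answer":
            return argument
        observation = TOOLS[action](argument)   # execute the chosen tool
        transcript += f"\n{action}({argument}) -> {observation}"
    return "Stopped: step budget exhausted."

if __name__ == "__main__":
    # Scripted stand-in for a real model, so the sketch runs offline.
    script = iter([("calculator", "21 * 2"), ("final_answer", "The result is 42.")])
    print(run_agent("What is 21 * 2?", decide=lambda _: next(script)))
```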
Special thanks to our sponsors for supporting our research: