Research projects

We study the foundations and applications of approaches that make AI explainable and controllable.

Mechanistic interpretability for AI Safety

We seek to understand the internal mechanisms of deep neural networks (DNNs) and reveal how those mechanisms give rise to model behavior, especially behavior related to safety. We advance the frontier of mechanistic interpretability, probing language models and other DNNs at multiple levels of abstraction: layers, modules, attention heads, neurons, sparse autoencoder (SAE) features, and more. The flip side of interpretability is controllability: we also study how to steer DNNs by intervening on their internal mechanisms.
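As a toy illustration of neuron-level analysis and intervention (a minimal sketch with a random stand-in network, not our actual tooling): cache the hidden activations of a tiny MLP, ablate one hidden neuron, and measure how much the output shifts because of that neuron.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-layer MLP; random weights stand in for a trained model.
W1 = rng.normal(size=(4, 8))
W2 = rng.normal(size=(8, 3))

def forward(x, ablate_neuron=None):
    """Run the MLP, caching intermediate activations.

    ablate_neuron: index of a hidden neuron to zero out,
    a simple causal intervention on the internal mechanism.
    """
    h = np.maximum(x @ W1, 0.0)          # hidden activations (ReLU)
    if ablate_neuron is not None:
        h = h.copy()
        h[..., ablate_neuron] = 0.0      # ablate a single neuron
    out = h @ W2
    return out, {"hidden": h}

x = rng.normal(size=(1, 4))
clean_out, cache = forward(x)
ablated_out, _ = forward(x, ablate_neuron=2)

# Output change attributable to that single neuron:
effect = np.abs(clean_out - ablated_out).sum()
print(f"total output change from ablating neuron 2: {effect:.3f}")
```

The same cache-then-intervene pattern scales up to real models (e.g. via forward hooks on transformer layers), where the interesting part is locating which components matter for a safety-relevant behavior.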

Related works include:

Transparent and Efficient DNNs

We aim to make DNNs both transparent and efficient, uniting two goals that have traditionally been pursued separately. The directions we are exploring include pruning a large fraction of parameters, developing parameter-efficient and/or quantized adapters and other architectural components, and accelerating them on appropriate hardware. We are also interested in the trade-off between transparency and efficiency.
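A minimal sketch of one of these directions, global magnitude pruning (the layer size and sparsity level here are illustrative, not our methods): zero out the smallest-magnitude weights of a layer and verify the resulting sparsity.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(256, 256))    # stand-in for one trained layer

def magnitude_prune(w, sparsity):
    """Zero out the fraction `sparsity` of weights with smallest |w|."""
    threshold = np.quantile(np.abs(w), sparsity)
    mask = np.abs(w) >= threshold        # keep only large-magnitude weights
    return w * mask, mask

pruned, mask = magnitude_prune(weights, sparsity=0.9)  # drop ~90% of weights
print(f"fraction of weights kept: {mask.mean():.2f}")
```

In practice the pruned model is usually fine-tuned afterwards to recover accuracy, and realizing actual speedups from the sparsity depends on hardware and kernel support.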

Related works include:

Reasoning, Explanation and AI4Research

We develop AI tools for scientific research. We are interested in how AI can assist researchers at multiple steps in the process of producing novel scientific knowledge: understanding scientific papers, reasoning over existing knowledge, generating hypotheses, planning experiments, analyzing experimental results, producing scientific explanations, and writing academic papers. Together, these steps lead toward the discovery of new knowledge.

Related works include:

Applications of AI Agents

We explore novel use cases of AI agents driven by strong, general-purpose foundation models (including but not limited to language models and vision-language models). We focus on problems with profound real-world impact, in domains such as finance, sports, and education. We explore innovative architectures for these agents and develop benchmarks that rigorously evaluate their performance.

Related works include:

Special thanks to the sponsors for supporting our research: