Multimodal LLMs
Vision-language understanding, retrieval-augmented and in-context learning for large multimodal models.
§ Hello, I'm
Algorithm Engineer · Meituan
I work on Multimodal Large Language Models and Continual Learning, with a long-standing interest in building intelligent visual systems that learn continually and data-efficiently. I received my M.Sc. from the FVL Lab, Fudan University, advised by Prof. Yu-Gang Jiang and Prof. Zhineng Chen, and my B.Eng. from East China University of Science and Technology.
Our paper TDR (Task-Decoupled Retrieval for In-Context Learning) is released on arXiv.
Contributed to the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge (CVPR 2024 Workshops).
MRN was accepted to ICCV 2023.
TCIL was accepted to AAAI 2023.
Class- and task-incremental learning that resists catastrophic forgetting under dynamic data streams.
Few-shot and representation learning for robust visual systems with limited supervision.
* denotes my contribution. Selected works are listed below; see Google Scholar for the full list.