Anomaly Detection (AD) focuses on identifying anomalies by learning exclusively from normal samples, because anomalous data are costly to collect and scarce owing to their long-tailed distribution. Given its practical significance, AD has been widely deployed in applications such as industrial defect inspection and security surveillance. In recent years, Multimodal Large Language Models (MLLMs) have shown revolutionary capabilities across various vision tasks, yet their potential in AD remains underexplored. How can AD benefit from MLLMs? What breakthroughs can MLLMs bring to this long-established vision problem? In this talk, we will discuss state-of-the-art MLLM-based AD methods for both image and video scenarios. Key advancements will be highlighted, including few-/zero-shot learning, multimodal reasoning, instruction data, benchmarking, and real-world adaptability. We will conclude by exploring future directions and open challenges, aiming to bridge the gap between AD research and the rapid progress in MLLMs.
Shao-Yuan Lo is a Research Scientist at Honda Research Institute USA. He received his Ph.D. from Johns Hopkins University in 2023, and his M.S. and B.S. degrees from National Chiao Tung University in 2019 and 2017, respectively. His recent research focuses on Multimodal LLMs and Trustworthy AI. He has first-/corresponding-authored nearly 20 publications in venues such as IEEE T-PAMI, IEEE T-IP, CVPR, and ECCV. He won the Best Paper Award at ACM Multimedia Asia 2019.