Kun-Yu Lin
I am currently a postdoctoral research fellow at the University of Hong Kong, under the supervision of Prof. Kai Han.
I earned my PhD degree from Sun Yat-sen University, under the supervision of Prof. Wei-Shi Zheng.
Before that, I received my Bachelor's and Master's degrees from Sun Yat-sen University.
During my PhD, I was fortunate to study as a visiting student at MMLab@NTU, under the supervision of Prof. Chen Change Loy and Prof. Henghui Ding.
My research interests include computer vision and machine learning.
                 
Email / Scholar / Github
News
❅ 09/2025: Two papers were accepted to NeurIPS 2025.
❅ 07/2025: One paper was accepted to TPAMI.
❅ 06/2025: Three papers were accepted to ICCV 2025.
❅ 05/2025: Releasing Panoptic Captioning, a novel captioning task that seeks the minimum text equivalence of images.
❅ 02/2025: Four papers were accepted to CVPR 2025. Sincere congratulations to Jiaming, Yi-Xing, Yu and Wei-Jin.
❅ 12/2024: One paper was accepted to AAAI 2025.
❅ 07/2024: One paper was accepted to TPAMI.
❅ 03/2024: Releasing XOV-Action, the first cross-domain open-vocabulary action recognition benchmark!
❅ 09/2023: One paper was accepted to NeurIPS 2023.
❅ 09/2023: One paper was accepted to TPAMI.
❅ 07/2023: One paper was accepted to ICCV 2023.
❅ 03/2023: Two papers were accepted to CVPR 2023.
              
Selected Works
Most of my research is on video understanding, trustworthy deep learning, and vision-language models.
Some works are highlighted.
# denotes equal contribution. * denotes corresponding author.
    
Panoptic Captioning: Seeking An Equivalency Bridge for Image and Text
Kun-Yu Lin, Hongjun Wang, Weining Ren, Kai Han*
NeurIPS, 2025
arXiv / project page
A novel vision-language task, named panoptic captioning, which seeks the conceptual minimum text equivalence of images through comprehensive text representations.
ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations
Tianming Liang, Kun-Yu Lin, Chaolei Tan, Jianguo Zhang, Wei-Shi Zheng, Jian-Fang Hu*
ICCV, 2025
arXiv / project page / github
A strong referring video object segmentation model built on visual grounding foundations, and the core of the runner-up solution for the PVUW Challenge RVOS Track at CVPR 2025.
Exploring the Limits of Vision-Language-Action Manipulations in Cross-task Generalization
Jiaming Zhou, Ke Ye, Jiayi Liu, Teli Ma, Zifan Wang, Ronghe Qiu, Kun-Yu Lin, Zhilin Zhao, Junwei Liang*
NeurIPS, 2025
arXiv / project page
A cross-task manipulation generalization benchmark to evaluate existing Vision-Language-Action (VLA) models, and a novel generalizable VLA method.
Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation
Jiaming Zhou, Teli Ma, Kun-Yu Lin, Ronghe Qiu, Zifan Wang, Junwei Liang*
CVPR, 2025
arXiv / project page / github
A new paradigm, utilizing paired human-robot videos, to adapt human-data pretrained models for robotic manipulation.
Decoupled Distillation to Erase: A General Unlearning Method for Any Class-centric Tasks
Yu Zhou#, Dian Zheng#, Qijie Mo, Renjie Lu, Kun-Yu Lin*, Wei-Shi Zheng*
CVPR, 2025, Highlight
paper / arXiv
A general unlearning solution for any class-centric task, without using any retained data or pretrained model knowledge.
ParGo: Bridging Vision-Language with Partial and Global Views
An-Lan Wang, Bin Shan, Wei Shi, Kun-Yu Lin, Xiang Fei, Guozhi Tang, Lei Liao, Jingqun Tang, Can Huang, Wei-Shi Zheng*
AAAI, 2025
paper / arXiv / github
A novel connector that bridges vision and language modalities by leveraging both global and partial views, and a large-scale image-text dataset with detailed captions.
Rethinking CLIP-based Video Learners in Cross-Domain Open-Vocabulary Action Recognition
Kun-Yu Lin, Henghui Ding, Jiaming Zhou, Yu-Ming Tang, Yi-Xing Peng, Zhilin Zhao, Chen Change Loy, Wei-Shi Zheng
arXiv, 2024
arXiv / github
The first benchmark, named XOV-Action, for cross-domain open-vocabulary action recognition, and a simple yet effective method to address the scene bias in this task.
Human-Centric Transformer for Domain Adaptive Action Recognition
Kun-Yu Lin, Jiaming Zhou, Wei-Shi Zheng*
TPAMI, 2025
paper / arXiv
A human-centric video network to address the context bias in domain adaptive action recognition.
Diversifying Spatial-Temporal Perception for Video Domain Generalization
Kun-Yu Lin, Jia-Run Du, Yipeng Gao, Jiaming Zhou, Wei-Shi Zheng*
NeurIPS, 2023
paper / arXiv / github
A diversity-aware video network to address domain-specific bias in video domain generalization.
Event-Guided Procedure Planning from Instructional Videos with Text Supervision
An-Lan Wang#, Kun-Yu Lin#, Jia-Run Du, Jingke Meng*, Wei-Shi Zheng*
ICCV, 2023
paper / arXiv
A new event-guided paradigm to address the semantic gap between observed states and unobserved actions for procedure planning in instructional videos.
AsyFOD: An Asymmetric Adaptation Paradigm for Few-Shot Domain Adaptive Object Detection
Yipeng Gao#, Kun-Yu Lin#, Junkai Yan, Yaowei Wang, Wei-Shi Zheng*
CVPR, 2023
paper / github
An asymmetric adaptation paradigm for few-shot domain adaptive object detection.
DilateFormer: Multi-Scale Dilated Transformer for Visual Recognition
Jiayu Jiao#, Yu-Ming Tang#, Kun-Yu Lin, Yipeng Gao, Jinhua Ma, Yaowei Wang, Wei-Shi Zheng*
TMM, 2023
paper / arXiv / project page / github
A new vision transformer architecture for efficient and effective visual understanding.
Supervision Adaptation Balancing In-distribution Generalization and Out-of-distribution Detection
Zhilin Zhao, Longbing Cao, Kun-Yu Lin
TPAMI, 2023
paper / arXiv / github
A theoretical method for balancing in-distribution generalization and out-of-distribution detection.
Revealing the Distributional Vulnerability of Discriminators by Implicit Generators
Zhilin Zhao, Longbing Cao, Kun-Yu Lin
TPAMI, 2023
paper / arXiv / github
A theoretical method based on implicit generators to improve out-of-distribution detection.
Adversarial Partial Domain Adaptation by Cycle Inconsistency
Kun-Yu Lin, Jiaming Zhou, Yukun Qiu, Wei-Shi Zheng*
ECCV, 2022
paper / github
A simple yet effective method based on cycle transformation to filter out outlier classes in partial domain adaptation.
            
Academic Service
Reviewer of CVPR 2023, CVPR 2024, CVPR 2025
Reviewer of ICCV 2023, ICCV 2025
Reviewer of ECCV 2024
Reviewer of ICLR 2025
Reviewer of NeurIPS 2024, NeurIPS 2025
Reviewer of TPAMI
Reviewer of IJCV
            
This website borrows from Jon Barron.