
(I am currently open to internship opportunities. Feel free to reach out!)
Hello! My name is Shih-Po (Robert) Lee, 李仕柏 in Mandarin. I am a fourth-year PhD student in the Khoury College of Computer Sciences at Northeastern University, advised by Ehsan Elhamifar.
My research focuses on video understanding, computer vision, and deep learning. I am currently working on multimodal large language models for understanding long egocentric procedural videos.
News
Jun 19, 2026
Our paper, "ESTANet: Efficient Online Error Detection in Procedural Videos via Prediction Inconsistency", has been accepted to ECCV 2026.
Jun 5, 2026
I will present "AXG-Reasoner: Error Detection and Explanation in Long Task Videos with Vision–Language Models" in the morning session at CVPR 2026 in Denver.
Feb 20, 2026
Our paper, "AXG-Reasoner: Error Detection and Explanation in Long Task Videos with Vision–Language Models", has been accepted to CVPR 2026.
Oct 22, 2025
Attended ICCV 2025 in Hawaii and presented "Error Recognition in Procedural Videos using Generalized Task Graph".
May 5, 2025
Joined Honda Research Institute USA as a research intern.
Research Projects





HuPR: A Benchmark for Human Pose Estimation Using Millimeter Wave Radar
WACV 2023 · January

GSVNet: Guided Spatially-Varying Convolution for Fast Semantic Segmentation on Video
ICME 2021 · July

Weakly-Supervised Image Semantic Segmentation Using Graph Convolutional Networks
ICME 2021 · July
Experience

Research Intern · Honda Research Institute USA
May 2025 – August 2025

Visiting Student · Electrical & Computer Engineering
September 2021 – March 2022

Research Assistant · UW-NCTU AI Lab
August 2020 – August 2022

Teaching Assistant · MediaTek In-house AI Training Program
August 2019 – January 2020
Education
PhD, Khoury College of Computer Sciences
September 2022 – Present

M.S. in Computer Science and Engineering
September 2018 – August 2020
B.S. in Computer Science and Engineering
September 2014 – July 2018
Publications
S.-P. Lee, R. Ghoddoosian, F. Siddiqui, E. Sachdeva, and B. Dariush, “ESTANet: Efficient Online Error Detection in Procedural Videos via Prediction Inconsistency.” ECCV, September 2026.
S.-P. Lee and E. Elhamifar, “AXG-Reasoner: Error Detection and Explanation in Long Task Videos with Vision–Language Models.” CVPR, June 2026.
S.-P. Lee and E. Elhamifar, “Error Recognition in Procedural Videos using Generalized Task Graph.” ICCV, October 2025.
S.-P. Lee, Z. Lu, Z. Zhang, M. Hoai, and E. Elhamifar, “Error Detection in Egocentric Procedural Task Videos.” CVPR, June 2024.
S.-P. Lee, N. P. Kini, W.-H. Peng, C.-W. Ma, and J.-N. Hwang, “HuPR: A Benchmark for Human Pose Estimation Using Millimeter Wave Radar.” WACV, January 2023.
S.-P. Lee, S. C. Chen, and W.-H. Peng, “GSVNet: Guided Spatially-Varying Convolution for Fast Semantic Segmentation on Video.” ICME, July 2021.
S. Y. Pan, C. Y. Lu, S.-P. Lee, and W.-H. Peng, “Weakly-Supervised Image Semantic Segmentation Using Graph Convolutional Networks.” ICME, July 2021.