I am an Assistant Professor in the Computer Science Department of UT Dallas and lead the Computer Vision and Multimodal Computing (CVMC) Lab. Before coming to UTD, I finished my PhD at the University of Rochester, advised by Chenliang Xu, my master's degree at Tsinghua University working with Wenming Yang, and my B.E. degree at Xidian University. I was a visiting student at SIAT advised by Yu Qiao. I did internships at Adobe Research with Dingzeyu Li and Meta with Alexander Richard. I am interested in solving core computer vision, computer audition, and machine learning problems and applying the developed learning approaches to broad AI applications, such as multisensory perception, computational photography, AR/VR, accessibility, and healthcare. My work has been recognized with awards including the AAAI New Faculty Highlights, Cisco Faculty Research Award, and Amazon Research Award.
Call for papers:
We are organizing an Audio Imagination workshop at NeurIPS 2024. We invite submissions for the Main Paper and Demo Tracks. Papers will be submitted through OpenReview. OpenReview submission link: Audio Imagination 2024. The submission deadline is 09/20/2024. Please go to the Submission Page for submission instructions and more details.
Students at UTD:
Sai Nagender Vasireddy (PhD student; Fall 2022)
Shijian Deng (PhD student; Spring 2023)
Saksham Singh Kushwaha (PhD student; Summer 2023)
Jia Li (PhD student; Fall 2024)
Xinpeng Li (PhD student; Fall 2024)
Ziru Huang (Visiting student; Tsinghua University; Spring 2024)
Michael Yang (K12; Summer 2023)
Matthew Wang (K12; Summer 2023)
External Student Collaborators:
Tianyu Yang (PhD student at University of Notre Dame)
Shentong Mo (PhD student at Carnegie Mellon University)
Kai Wang (PhD student at University of Toronto)
Alumni:
Yiyang Nan (Graduate student at Brown University; Spring 2023-2024; Next: researcher at Cohere for AI)
William Doan (Undergraduate; Fall 2023 - Summer 2024; Jonsson School of Engineering and Computer Science Award)
Zeke Barnett (K12; Parish Episcopal School at Dallas, Spring 2023 - Spring 2024; Next: Undergraduate at CMU)
Anikait Bharadwaj (K12; Frisco ISD; Spring 2024)
Aditya Kulkarni (Undergraduate; Spring 2023)
Atmin Mehul Sheth (Undergraduate at UTD; 2023)
Yuxin Ye (Graduate student at Tsinghua University)
Yichen Chi (Graduate student at Tsinghua University)
Junhao Gu (PhD student at Tsinghua University)
Jiamiao Zhang (Graduate student at Tsinghua University)
Sen Fang (Undergraduate at Victoria University, Next: PhD student at Rutgers University)
Sasha Kaplan (Undergraduate; Spring 2023)
Sisi Aarukapalli (Undergraduate; Summer 2023)
Harsh Singh (PhD student at UTD; Spring and Summer 2023; Next: CV MSC at MBZUAI)
Yulang Wu (Graduate student at UTD CS, Spring 2023; Next: Postdoc at University of California San Francisco)
Guangyao Li (PhD student at Renmin University of China, Fall 2020 - Spring 2023)
Shijian Deng (Graduate student at University of Rochester; next: PhD student at UTD)
Hai Wang (Graduate student at Tsinghua University; next: PhD student at UCL)
Sizhe Li (Undergraduate student at University of Rochester; next: Visiting student at MIT)
Yiyang Su (Undergraduate at University of Rochester; next: PhD student at Michigan State University)
Rohan Sharma (Graduate student at University of Rochester; next: PhD student at SUNY Buffalo)
Chenxiao Guan (Undergraduate at University of Rochester; next: Graduate student at CMU)
Most recent publications on Google Scholar.
‡ indicates equal contribution.
Diff-SAGe: End-to-End Spatial Audio Generation Using Diffusion Models
Saksham Singh Kushwaha, Jianbo Ma, Mark R. P. Thomas, Yapeng Tian, and Avery Bruni
Preprint'24.
AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation
Kai Wang, Shijian Deng, Jing Shi, Dimitrios Hatzinakos, Yapeng Tian
Preprint'24.
Text-to-Audio Generation Synchronized with Videos
Shentong Mo, Jing Shi, Yapeng Tian
Preprint'24.
Robust Active Speaker Detection in Noisy Environments
Siva Sai Nagender Vasireddy, Chenxu Zhang, Xiaohu Guo, Yapeng Tian
Preprint'24.
DAVIS: High-Quality Audio-Visual Separation with Generative Diffusion Models
Chao Huang, Susan Liang, Yapeng Tian, Anurag Kumar, Chenliang Xu
Preprint'23.
SignDiff: Learning Diffusion Models for American Sign Language Production
Sen Fang, Chunyu Sui, Xuedong Zhang, Yapeng Tian
Preprint'23.
Learning in Audio-visual Context: A Review, Analysis, and New Perspective
Yake Wei, Di Hu, Yapeng Tian, Xuelong Li
Preprint'22.
Audio-Visual Dataset Distillation
Saksham Singh Kushwaha, Siva Sai Nagender Vasireddy, Kai Wang, Yapeng Tian
TMLR'24: Transactions on Machine Learning Research
Continual Audio-Visual Sound Separation
Weiguo Pian, Yiyang Nan, Shijian Deng, Shentong Mo, Yunhui Guo, Yapeng Tian
NeurIPS'24: The Annual Conference on Neural Information Processing Systems
Hear Me, See Me, Understand Me: Audio-Visual Autism Behavior Recognition
Shijian Deng, Erin Kosloski, Siddhi Patel, Zeke A Barnett, Yiyang Nan, Alexander M Kaplan, Sisira Aarukapalli, William Doan, Matthew Wang, Harsh Singh, Rollins Pamela, Yapeng Tian
TMM'24: IEEE Transactions on Multimedia.
CookAR: Affordance Augmentations in Wearable AR to Support Kitchen Tool Interactions for People with Low Vision (Belonging & Inclusion Best Paper)
Jaewook Lee, Andrew D. Tjahjadi, Jiho Kim, Junpu Yu, Minji Park, Jiawen Zhang, Jon E. Froehlich, Yapeng Tian, Yuhang Zhao
UIST'24: ACM Symposium on User Interface Software and Technology.
Towards Long Form Audio-visual Video Understanding
Wenxuan Hou, Guangyao Li, Yapeng Tian, Di Hu
TOMM'24: ACM Transactions on Multimedia Computing, Communications, and Applications.
Benchmarking and Optimizing Federated Learning with Hardware-related Metrics
Kai Pan, Yapeng Tian, Yinhe Han, Yiming Gan
BMVC'24: British Machine Vision Conference
EgoVSR: Towards High-Quality Egocentric Video Super-Resolution
Yichen Chi, Junhao Gu, Jiamiao Zhang, Wenming Yang, Yapeng Tian
TCSVT'24: IEEE Transactions on Circuits and Systems for Video Technology.
MIMOSA: Human-AI Co-Creation of Computational Spatial Audio Effects on Videos
Zheng Ning, Zheng Zhang, Jerrick Ban, Kaiwen Jiang, Ruohong Gan, Yapeng Tian, Toby Jia-Jun Li
C&C'24: ACM Conference on Creativity & Cognition.
AV-Mamba: Cross-Modality Selective State Space Models for Audio-Visual Question Answering
Ziru Huang, Jia Li, Wenjie Zhao, Yunhui Guo, Yapeng Tian
CVPRW'24: CVPR Sight and Sound Workshop
Learning Continual Audio-Visual Sound Separation Models
Weiguo Pian, Yiyang Nan, Shijian Deng, Shentong Mo, Yunhui Guo, Yapeng Tian
CVPRW'24: CVPR Sight and Sound Workshop
Audio-Visual Autism Behavior Recognition with MMLMs
Shijian Deng, Erin Kosloski, Siddhi Patel, Zeke A Barnett, Yiyang Nan, Alexander M Kaplan, Sisira Aarukapalli, William Doan, Matthew Wang, Harsh Singh, Rollins Pamela, Yapeng Tian
CVPRW'24: CVPR Sight and Sound Workshop
Dataset distillation for audio-visual datasets
Saksham Singh Kushwaha, Siva Sai Nagender Vasireddy, Kai Wang, Yapeng Tian
CVPRW'24: CVPR Sight and Sound Workshop
DiffTED: One-shot Audio-driven TED Talk Video Generation with Diffusion-based Co-speech Gestures
Steven Hogue, Chenxu Zhang, Hamza Daruger, Yapeng Tian, Xiaohu Guo
CVPRW'24: CVPR HuMoGen Workshop
Towards Efficient Audio-Visual Learners via Empowering Pre-trained Vision Transformers with Cross-Modal Adaptation
Kai Wang, Yapeng Tian, Dimitrios Hatzinakos
CVPRW'24: CVPR Multimodal Foundation Models Workshop
MA-AVT: Modality Alignment for Parameter-Efficient Audio-Visual Transformers
Tanvir Mahmud, Shentong Mo, Yapeng Tian, Diana Marculescu
CVPRW'24: CVPR Efficient Deep Learning for Computer Vision Workshop
T-VSL: Text-Guided Visual Sound Source Localization in Mixtures
Tanvir Mahmud, Yapeng Tian, Diana Marculescu
CVPR'24: IEEE/CVF Conference on Computer Vision and Pattern Recognition
OSCaR: Object State Captioning and State Change Representation
Nguyen Nguyen, Jing Bi, Ali Vosoughi, Yapeng Tian, Pooyan Fazli, Chenliang Xu
NAACL'24: The North American Chapter of the Association for Computational Linguistics (Findings)
SPICA: Interactive Video Content Exploration through Augmented Audio Descriptions for Blind or Low-Vision Viewers
Zheng Ning, Brianna Wimer, Kaiwen Jiang, Keyi Chen, Jerrick Ban, Yapeng Tian, Yuhang Zhao, Toby Li
CHI'24: The ACM Conference on Human Factors in Computing Systems.
STADNet: Spatial-Temporal Attention-Guided Dual-Path Network for cardiac cine MRI super-resolution
Jun Lyu, Shuo Wang, Yapeng Tian, Jing Zou, Shunjie Dong, Chengyan Wang, Angelica I Aviles-Rivero, Jing Qin
MIA'24: Medical Image Analysis
Unveiling Cross Modality Bias in Visual Question Answering: A Causal View with Possible Worlds VQA
Ali Vosoughi‡, Shijian Deng‡, Songyang Zhang, Yapeng Tian, Chenliang Xu, Jiebo Luo
TMM'24: IEEE Transactions on Multimedia
LAVSS: Location-Guided Audio-Visual Spatial Audio Separation
Yuxin Ye, Wenming Yang, Yapeng Tian
WACV'24: Winter Conference on Applications of Computer Vision.
Disentangled Counterfactual Learning for Physical Audiovisual Commonsense Reasoning
Changsheng Lv, Shuai Zhang, Yapeng Tian, Mengshi Qi, Huadong Ma
NeurIPS'23: The Annual Conference on Neural Information Processing Systems.
AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis
Susan Liang, Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu
NeurIPS'23: The Annual Conference on Neural Information Processing Systems.
PEANUT: A Human-AI Collaborative Tool for Annotating Audio-Visual Data
Zheng Zhang‡, Zheng Ning‡, Chenliang Xu, Yapeng Tian, Toby Jia-Jun Li
UIST'23: ACM Symposium on User Interface Software and Technology.
Towards Robust Active Speaker Detection
Siva Sai Nagender Vasireddy, Chenxu Zhang, Xiaohu Guo, Yapeng Tian
ICCVW'23: ICCV AV4D Workshop.
Position-Aware Audio-Visual Separation for Spatial Audio
Yuxin Ye, Wenming Yang, Yapeng Tian
ICCVW'23: ICCV AV4D Workshop.
Towards Better Egocentric Action Understanding in a Multi-Input Multi-Output View
Wenxuan Hou, Ruoxuan Feng, Yixin Xu, Yapeng Tian, Di Hu
ICCVW'23: ICCV AV4D Workshop.
Neural Acoustic Context Field: Rendering Realistic Room Impulse Response With Neural Fields
Susan Liang, Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu
ICCVW'23: ICCV AV4D Workshop.
Separating Invisible Sounds Toward Universal Audio-Visual Scene-Aware Sound Separation
Yiyang Su, Ali Vosoughi, Shijian Deng, Yapeng Tian, Chenliang Xu
ICCVW'23: ICCV AV4D Workshop.
Audio-Visual Class-Incremental Learning
Weiguo Pian‡, Shentong Mo‡, Yunhui Guo, Yapeng Tian
ICCV'23: IEEE/CVF International Conference on Computer Vision.
Class-Incremental Grouping Network for Continual Audio-Visual Learning
Shentong Mo‡, Weiguo Pian‡, Yapeng Tian
ICCV'23: IEEE/CVF International Conference on Computer Vision.
DiffIR: Efficient Diffusion Model for Image Restoration
Bin Xia, Yulun Zhang, Shiyin Wang, Yitong Wang, Xinglong Wu, Yapeng Tian, Wenming Yang, Luc Van Gool
ICCV'23: IEEE/CVF International Conference on Computer Vision.
Dual Arbitrary Scale Super-Resolution for Multi-Contrast MRI
Jiamiao Zhang, Yichen Chi, Jun Lyu, Wenming Yang, Yapeng Tian
MICCAI'23: Medical Image Computing and Computer-Assisted Intervention.
Meta-Learning based Degradation Representation for Blind Super-Resolution
Bin Xia, Yapeng Tian, Yulun Zhang, Yucheng Hang, Wenming Yang, Qingmin Liao
TIP'23: IEEE Transactions on Image Processing.
AV-SAM: Segment Anything Model Meets Audio-Visual Localization and Segmentation
Shentong Mo, Yapeng Tian
CVPRW'23: CVPR Sight and Sound Workshop.
DiffAVA: Personalized Text-to-Audio Generation with Visual Alignment
Shentong Mo, Jing Shi, Yapeng Tian
CVPRW'23: CVPR Sight and Sound Workshop.
AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis
Susan Liang, Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu
CVPRW'23: CVPR Sight and Sound Workshop.
Audio-Visual Grouping Network for Sound Localization from Mixtures
Shentong Mo, Yapeng Tian
CVPR'23: IEEE/CVF Conference on Computer Vision and Pattern Recognition.
Egocentric Audio-Visual Object Localization
Chao Huang, Yapeng Tian, Anurag Kumar, and Chenliang Xu
CVPR'23: IEEE/CVF Conference on Computer Vision and Pattern Recognition.
Structured Sparsity Learning for Efficient Video Super-Resolution
Bin Xia, Jingwen He, Yulun Zhang, Yitong Wang, Yapeng Tian, Wenming Yang, Luc Van Gool
CVPR'23: IEEE/CVF Conference on Computer Vision and Pattern Recognition.
Knowledge Distillation based Degradation Estimation for Blind Super-Resolution
Bin Xia, Yulun Zhang, Yitong Wang, Yapeng Tian, Wenming Yang, Radu Timofte, Luc Van Gool
ICLR'23: International Conference on Learning Representations.
Basic Binary Convolution Unit for Binarized Image Restoration Network
Bin Xia, Yulun Zhang, Yitong Wang, Yapeng Tian, Wenming Yang, Radu Timofte, Luc Van Gool
ICLR'23: International Conference on Learning Representations.
STDAN: Deformable Attention Network for Space-Time Video Super-Resolution
Hai Wang, Xiaoyu Xiang, Yapeng Tian, Wenming Yang, Qingmin Liao
TNNLS'23: IEEE Transactions on Neural Networks and Learning Systems.
GDSSR: Toward Real-World Ultra-High-Resolution Image Super-Resolution
Yichen Chi, Wenming Yang, Yapeng Tian
SPL'23: IEEE Signal Processing Letters.
Towards Unified, Explainable, and Robust Multisensory Perception
Yapeng Tian
AAAI'23: AAAI Conference on Artificial Intelligence. (NFH program)
Multi-modal Grouping Network for Weakly-Supervised Audio-Visual Video Parsing
Shentong Mo, Yapeng Tian
NeurIPS'22: The Annual Conference on Neural Information Processing Systems.
Learning Spatio-Temporal Downsampling for Effective Video Upscaling
Xiaoyu Xiang, Yapeng Tian, Vijay Rengarajan, Lucas Young, Bo Zhu, Rakesh Ranjan
ECCV'22: European Conference on Computer Vision.
Audio-Visual Scene Understanding Towards Unified, Explainable, and Robust Multisensory Perception
Yapeng Tian
PhD Thesis
DuDoCAF: Dual-Domain Cross-Attention Fusion with Recurrent Transformer for Fast Multi-contrast MR Imaging
Jun Lyu, Bin Sui, Chengyan Wang, Yapeng Tian, Qi Dou, and Jing Qin
MICCAI'22: Medical Image Computing and Computer Assisted Intervention.
Audio-Visual Object Localization in Egocentric Videos
Chao Huang, Yapeng Tian, Anurag Kumar, and Chenliang Xu
CVPRW'22: CVPR Workshops
Egocentric audio-visual learning.
Learning to Answer Questions in Dynamic Audio-Visual Scenarios
Guangyao Li‡, Yake Wei‡, Yapeng Tian‡, Chenliang Xu, Ji-Rong Wen, and Di Hu
CVPR'22 Oral: IEEE/CVF Conference on Computer Vision and Pattern Recognition.
Transformer-empowered Multi-contrast MRI Super-Resolution
Guangyuan Li, Jun Lv, Yapeng Tian, Qi Dou, Chengyan Wang, Chenliang Xu, Jing Qin
CVPR'22: IEEE/CVF Conference on Computer Vision and Pattern Recognition.
Coarse-to-Fine Embedded PatchMatch and Multi-Scale Dynamic Aggregation for Reference-based Super-Resolution
Bin Xia, Yapeng Tian, Yucheng Hang, Wenming Yang, Qingmin Liao, Jie Zhou
AAAI'22: The AAAI Conference on Artificial Intelligence.
Efficient Non-Local Contrastive Attention for Image Super-Resolution
Bin Xia‡, Yucheng Hang‡, Yapeng Tian, Wenming Yang, Qingmin Liao, Jie Zhou
AAAI'22: The AAAI Conference on Artificial Intelligence.
Space-Time Memory Network for Sounding Object Localization in Videos
Sizhe Li‡, Yapeng Tian‡, and Chenliang Xu
BMVC'21: The British Machine Vision Conference.
Video Matting via Consistency-Regularized Graph Neural Networks
Tiantian Wang, Sifei Liu, Yapeng Tian, Kai Li, and Ming-Hsuan Yang
ICCV'21: IEEE/CVF International Conference on Computer Vision.
Can audio-visual integration strengthen robustness under multimodal attacks?
Yapeng Tian and Chenliang Xu
CVPR'21: IEEE/CVF Conference on Computer Vision and Pattern Recognition.
Cyclic Co-Learning of Sounding Object Visual Grounding and Sound Separation
Yapeng Tian, Di Hu, and Chenliang Xu
CVPR'21: IEEE/CVF Conference on Computer Vision and Pattern Recognition.
Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing
Yapeng Tian, Dingzeyu Li, and Chenliang Xu
ECCV'20 Spotlight: European Conference on Computer Vision.
Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution
Xiaoyu Xiang‡, Yapeng Tian‡, Yulun Zhang, Yun Fu, Jan Allebach, and Chenliang Xu
CVPR'20: IEEE/CVF Conference on Computer Vision and Pattern Recognition.
TDAN: Temporally Deformable Alignment Network for Video Super-Resolution
Yapeng Tian, Yulun Zhang, Yun Fu, and Chenliang Xu
CVPR'20: IEEE/CVF Conference on Computer Vision and Pattern Recognition.
This is the first work that uses deformable alignment to address video restoration.
Deep Audio Prior
Yapeng Tian, Chenliang Xu, and Dingzeyu Li
CVPRW'20: CVPR Workshops.
Residual Dense Network for Image Super-Resolution
Yulun Zhang, Yapeng Tian, Yu Kong, Bineng Zhong, Yun Fu
TPAMI'20: IEEE Transactions on Pattern Analysis and Machine Intelligence.
CFSNet: Toward a Controllable Feature Space for Image Restoration
Wei Wang‡, Ruiming Guo‡, Yapeng Tian, and Wenming Yang
ICCV'19: IEEE/CVF International Conference on Computer Vision.
Interpretable and Controllable Audio-Visual Video Captioning
Yapeng Tian, Chenxiao Guan, Goodman Justin, Marc Moore, and Chenliang Xu
CVPRW'19: CVPR Workshops.
Multisensory interpretability in terms of the audio-visual video captioning task.
LCSCNet: Linear Compressing Based Skip-Connecting Network for ISR
Wenming Yang, Xuechen Zhang, Yapeng Tian, Wei Wang, Jing-Hao Xue, Qingmin Liao
TIP'19: IEEE Trans. Image Processing.
Deep Learning for Single Image Super-Resolution: A Brief Review
Wenming Yang, Xuechen Zhang, Yapeng Tian, Wei Wang, Jing-Hao Xue, Qingmin Liao
TMM'19: IEEE Trans. Multimedia.
Audio-Visual Event Localization in Unconstrained Videos
Yapeng Tian, Jing Shi, Bochen Li, Zhiyao Duan, Chenliang Xu
ECCV'18: European Conference on Computer Vision.
Residual Dense Network for Image Super-Resolution
Yulun Zhang, Yapeng Tian, Yu Kong, Bineng Zhong, Yun Fu
CVPR'18 Spotlight: IEEE/CVF Conf. on Computer Vision and Pattern Recognition.
NTIRE 2017 Challenge on Single Image Super-Resolution: Methods and Results
Timofte et al.
CVPRW'17: CVPR Workshops.
Consistent Coding Scheme for Single-Image Super-Resolution
Wenming Yang, Yapeng Tian, Fei Zhou, Qingmin Liao, Hai Chen, Chenglin Zheng
TMM'16: IEEE Trans. Multimedia. (First student author)
Anchored Neighborhood Regression based SISR from Self-examples
Yapeng Tian, Fei Zhou, Wenming Yang, Xuesen Shang, Qingmin Liao
ICIP'16: IEEE International Conference on Image Processing.
SISR Using Clustering-Based Global Regression and Propagation Filtering
Wenming Yang, Yapeng Tian, Fei Zhou, ..., Qingmin Liao
ACPR'15 Oral: Asian Conference on Pattern Recognition. (First student author)
Organizer:
Senior Program Committee or Area Chair:
Session Chair:
Conference Program Committee/Reviewer:
Journal Reviewer:
Talks and Seminars:
KTH Dive-Deep Seminar, Dec. 2021
RIT PhD Colloquium Series, Oct. 2021
UIST Belonging & Inclusion Best Paper, 2024
Amazon Research Award, 2024
Undergraduate Research Apprenticeship Program (URAP) award, 2023 and 2024
Cisco Faculty Research Award, 2023
AAAI New Faculty Highlights, 2023
CVPR Doctoral Consortium, 2022
Top 10% of High-Scoring Reviewers for NeurIPS, 2020
Outstanding Graduate of Tsinghua University, 2017
Outstanding Master Thesis Award, Tsinghua University, 2017
National Scholarship, Tsinghua University, 2016
Full CV in PDF.
This website was built with Jekyll based on a template from Martin Saveski.