
OpenAlex is an open-access bibliographic catalogue of scientific papers, authors and institutions, named after the Library of Alexandria. Its citation coverage is excellent, and I hope you will find this listing of citing articles useful!
Clicking an article title takes you to the article as listed in CrossRef. Clicking an Open Access link takes you to the "best Open Access location". Clicking the citation count opens this same listing for that article. Lastly, at the bottom of the page, you'll find basic pagination options.
Requested Article:
VatLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning
Qiushi Zhu, Long Zhou, Ziqiang Zhang, et al.
IEEE Transactions on Multimedia (2023) Vol. 26, pp. 1055-1064
Open Access | Times Cited: 15
Showing 15 citing articles:
V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models
Heng Wang, Jianbo Ma, Santiago Pascual, et al.
Proceedings of the AAAI Conference on Artificial Intelligence (2024) Vol. 38, Iss. 14, pp. 15492-15501
Open Access | Times Cited: 8
Av-Data2Vec: Self-Supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations
Jiachen Lian, Alexei Baevski, Wei-Ning Hsu, et al.
2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) (2023)
Open Access | Times Cited: 12
Expression Prompt Collaboration Transformer for universal referring video object segmentation
Jiajun Chen, Jiacheng Lin, Guojin Zhong, et al.
Knowledge-Based Systems (2025), pp. 113006-113006
Closed Access
Target speaker lipreading by audio-visual self-distillation pretraining and speaker adaptation
Jing-Xuan Zhang, Tingzhi Mao, Longjiang Guo, et al.
Expert Systems with Applications (2025) Vol. 272, pp. 126741-126741
Closed Access
Audio-visual representation learning via knowledge distillation from speech foundation models
Jing-Xuan Zhang, Genshun Wan, Jianqing Gao, et al.
Pattern Recognition (2025), pp. 111432-111432
Closed Access
BRAVEn: Improving Self-supervised pre-training for Visual and Auditory Speech Recognition
Alexandros Haliassos, Andreas Zinonos, Rodrigo Mira, et al.
ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2024), pp. 11431-11435
Open Access | Times Cited: 3
Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer
Maxime Burchi, Krishna C. Puvvada, Jagadeesh Balam, et al.
ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2024), pp. 10211-10215
Open Access | Times Cited: 3
Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation
Qiushi Zhu, Jie Zhang, 裕二 池谷, et al.
Proceedings of the AAAI Conference on Artificial Intelligence (2024) Vol. 38, Iss. 17, pp. 19768-19776
Open Access | Times Cited: 1
AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
Jeongsoo Choi, Se Jin Park, Minsu Kim, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024) Vol. 22, pp. 27315-27327
Closed Access | Times Cited: 1
FedSea: Federated Learning via Selective Feature Alignment for Non-IID Multimodal Data
Min Tan, Yinfu Feng, Lingqiang Chu, et al.
IEEE Transactions on Multimedia (2023) Vol. 26, pp. 5807-5822
Closed Access | Times Cited: 2
Do VSR Models Generalize Beyond LRS3?
Yasser Abdelaziz Dahou Djilali, Sanath Narayan, Eustache Le Bihan, et al.
2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (2024), pp. 6621-6630
Open Access
Efficient audio-visual information fusion using encoding pace synchronization for Audio-Visual Speech Separation
Xinmeng Xu, Weiping Tu, Yuhong Yang
Information Fusion (2024), pp. 102749-102749
Closed Access
Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation
Minsu Kim, Jeong Hun Yeo, Se Jin Park, et al.
(2024), pp. 1311-1320
Closed Access
Multimodal large model pretraining, adaptation and efficiency optimization
Lixia Ji, Shijie Xiao, J. Feng, et al.
Neurocomputing (2024) Vol. 619, pp. 129138-129138
Closed Access
Lip2Vec: Efficient and Robust Visual Speech Recognition via Latent-to-Latent Visual to Audio Representation Mapping
Yasser Abdelaziz Dahou Djilali, Sanath Narayan, Haithem Boussaid, et al.
2021 IEEE/CVF International Conference on Computer Vision (ICCV) (2023), pp. 13744-13755
Open Access | Times Cited: 1