
OpenAlex is an open-access bibliographic catalogue of scientific papers, authors and institutions, named after the Library of Alexandria. Its citation coverage is excellent, and I hope you will find this listing of citing articles useful!
If you click an article title, you'll navigate to the article as listed in CrossRef. If you click an Open Access link, you'll navigate to the "best Open Access location". Clicking a citation count will open this listing for that article. Lastly, at the bottom of the page, you'll find basic pagination options.
Requested Article:
i-Code: An Integrative and Composable Multimodal Learning Framework
Ziyi Yang, Yuwei Fang, Chenguang Zhu, et al.
Proceedings of the AAAI Conference on Artificial Intelligence (2023) Vol. 37, Iss. 9, pp. 10880-10890
Open Access | Times Cited: 18
Showing 18 citing articles:
Unifying Vision, Text, and Layout for Universal Document Processing
Zineng Tang, Ziyi Yang, Guoxin Wang, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 19254-19264
Open Access | Times Cited: 51
VatLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning
Qiushi Zhu, Long Zhou, Ziqiang Zhang, et al.
IEEE Transactions on Multimedia (2023) Vol. 26, pp. 1055-1064
Open Access | Times Cited: 15
Listen as you wish: Fusion of audio and text for cross-modal event detection in smart cities
Haoyu Tang, Yupeng Hu, Yunxiao Wang, et al.
Information Fusion (2024) Vol. 110, pp. 102460-102460
Closed Access | Times Cited: 4
MM-Path: Multi-modal, Multi-granularity Path Representation Learning
Ronghui Xu, Hanyin Cheng, Chenjuan Guo, et al.
(2025), pp. 1703-1714
Closed Access
A survey on advancements in image–text multimodal models: From general techniques to biomedical implementations
Ruifeng Guo, Jingxuan Wei, Linzhuang Sun, et al.
Computers in Biology and Medicine (2024) Vol. 178, pp. 108709-108709
Closed Access | Times Cited: 2
Foreground and Text-lines Aware Document Image Rectification
Heng Li, Xiangping Wu, Qingcai Chen, et al.
2021 IEEE/CVF International Conference on Computer Vision (ICCV) (2023), pp. 19517-19526
Closed Access | Times Cited: 6
Topic and Style-aware Transformer for Multimodal Emotion Recognition
Shuwen Qiu, Nitesh Sekhar, Prateek Singhal
Findings of the Association for Computational Linguistics: ACL 2022 (2023), pp. 2074-2082
Open Access | Times Cited: 4
CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation
Zineng Tang, Ziyi Yang, Mahmoud Khademi, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024) Vol. 34, pp. 27415-27424
Closed Access | Times Cited: 1
Real-Time Audio-Visual End-To-End Speech Enhancement
Zirun Zhu, Hemin Yang, Min Tang, et al.
ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2023), pp. 1-5
Open Access | Times Cited: 3
Self-supervised Cross-modal Pretraining for Speech Emotion Recognition and Sentiment Analysis
Iek‐Heng Chu, Ziyi Chen, Xinlu Yu, et al.
(2022), pp. 5105-5114
Open Access | Times Cited: 4
FashionERN: Enhance-and-Refine Network for Composed Fashion Image Retrieval
Yanzhe Chen, Huasong Zhong, Xiangteng He, et al.
Proceedings of the AAAI Conference on Artificial Intelligence (2024) Vol. 38, Iss. 2, pp. 1228-1236
Open Access
UVMO: Deep unsupervised visual reconstruction-based multimodal-assisted odometry
Songrui Han, Mingchi Li, Hongying Tang, et al.
Pattern Recognition (2024) Vol. 153, pp. 110573-110573
Closed Access
Digital Human Intelligent Interaction System Based on Multimodal Pre-training Mode
Xuliang Yang, Yong Fang, Lili Wang, et al.
Applied Artificial Intelligence (2024) Vol. 38, Iss. 1
Open Access
Efficient Low-Dimensional Representation Via Manifold Learning-Based Model for Multimodal Sentiment Analysis
Xingang Wang, Mengyi Wang, Hai Cui, et al.
(2024), pp. 1-7
Closed Access
MM-Reasoner: A Multi-Modal Knowledge-Aware Framework for Knowledge-Based Visual Question Answering
Mahmoud Khademi, Ziyi Yang, Felipe Vieira Frujeri, et al.
(2023), pp. 6571-6581
Open Access | Times Cited: 1
Inferential Knowledge-Enhanced Integrated Reasoning for Video Question Answering
Jianguo Mao, Wenbin Jiang, Hong Liu, et al.
Proceedings of the AAAI Conference on Artificial Intelligence (2023) Vol. 37, Iss. 11, pp. 13380-13388
Open Access
MDLDroid: Multimodal Deep Learning Based Android Malware Detection
Narendra Singh, Somanath Tripathy
Lecture notes in computer science (2023), pp. 159-177
Closed Access
Generate to Understand for Representation in One Pre-training Stage
Changshang Xue, Xiao‐Fang Zhong, Xiaoqing Liu
(2023), pp. 258-267
Closed Access