OpenAlex Citation Counts

OpenAlex is an open-access bibliographic catalogue of scientific papers, authors, and institutions, named after the Library of Alexandria. Its citation coverage is excellent, and I hope you find this listing of citing articles useful!

Clicking an article title takes you to the article as listed in CrossRef. Clicking an Open Access link takes you to the "best Open Access location". Clicking a citation count opens this listing for that article. Lastly, at the bottom of the page, you'll find basic pagination options.
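
For readers who would rather page through a citing-articles list programmatically, here is a minimal sketch. It assumes the public OpenAlex REST API, with its `cites` filter and `page` / `per-page` parameters; this page does not say how it builds its own listing, so treat the mapping as an illustration only. The work ID `W0000000000` is a hypothetical placeholder, and the helper names are made up for this example.

```python
from math import ceil
from urllib.parse import urlencode

# Assumed endpoint: the public OpenAlex works API.
API_BASE = "https://api.openalex.org/works"


def page_range(page, per_page, total):
    """Return the 1-based (first, last) item positions shown on a page."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    first = (page - 1) * per_page + 1
    if first > total:
        raise ValueError("page is past the end of the listing")
    last = min(page * per_page, total)
    return first, last


def citing_works_url(work_id, page=1, per_page=25):
    """Build a URL listing the works that cite `work_id` (hypothetical helper)."""
    params = {
        "filter": f"cites:{work_id}",  # works whose references include work_id
        "per-page": per_page,
        "page": page,
    }
    # Keep ':' unescaped so the filter stays readable in the URL.
    return f"{API_BASE}?{urlencode(params, safe=':')}"


if __name__ == "__main__":
    total = 131      # citation count taken from a listing like the one below
    per_page = 25
    pages = ceil(total / per_page)
    for page in range(1, pages + 1):
        first, last = page_range(page, per_page, total)
        print(f"Showing {first}-{last} of {total}:",
              citing_works_url("W0000000000", page, per_page))
```

With 131 citing articles and 25 per page, the loop covers six pages, and the last page holds items 126-131.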

Requested Article:

VLP: A Survey on Vision-language Pre-training
Feilong Chen, Duzhen Zhang, Minglun Han, et al.
Machine Intelligence Research (2023) Vol. 20, Iss. 1, pp. 38-56
Open Access | Times Cited: 131

Showing 1-25 of 131 citing articles:

Multimodal Learning With Transformers: A Survey
Peng Xu, Xiatian Zhu, David A. Clifton
IEEE Transactions on Pattern Analysis and Machine Intelligence (2023) Vol. 45, Iss. 10, pp. 12113-12132
Open Access | Times Cited: 359

Vision-Language Pre-Training with Triple Contrastive Learning
Jinyu Yang, Jiali Duan, Son N. Tran, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 15650-15659
Open Access | Times Cited: 184

Vision-Language Models for Vision Tasks: A Survey
J Zhang, Jiaxing Huang, Sheng Jin, et al.
IEEE Transactions on Pattern Analysis and Machine Intelligence (2024) Vol. 46, Iss. 8, pp. 5625-5644
Open Access | Times Cited: 132

Large-scale Multi-modal Pre-trained Models: A Comprehensive Survey
Xiao Wang, Guangyao Chen, Guangwu Qian, et al.
Machine Intelligence Research (2023) Vol. 20, Iss. 4, pp. 447-482
Open Access | Times Cited: 96

Sigmoid Loss for Language Image Pre-Training
Xiaohua Zhai, Basil Mustafa, А. И. Колесников, et al.
2021 IEEE/CVF International Conference on Computer Vision (ICCV) (2023)
Open Access | Times Cited: 90

CLIP2Point: Transfer CLIP to Point Cloud Classification with Image-Depth Pre-Training
Tianyu Huang, Bowen Dong, Yunhan Yang, et al.
2021 IEEE/CVF International Conference on Computer Vision (ICCV) (2023), pp. 22100-22110
Open Access | Times Cited: 57

VadCLIP: Adapting Vision-Language Models for Weakly Supervised Video Anomaly Detection
Peng Wu, Xuerong Zhou, Guansong Pang, et al.
Proceedings of the AAAI Conference on Artificial Intelligence (2024) Vol. 38, Iss. 6, pp. 6074-6082
Open Access | Times Cited: 26

Vision-language models for medical report generation and visual question answering: a review
Iryna Hartsock, Ghulam Rasool
Frontiers in Artificial Intelligence (2024) Vol. 7
Open Access | Times Cited: 21

BB-GeoGPT: A framework for learning a large language model for geographic information science
Yifan Zhang, Zhiyun Wang, Zhengting He, et al.
Information Processing & Management (2024) Vol. 61, Iss. 5, pp. 103808-103808
Closed Access | Times Cited: 20

Video Question Answering: Datasets, Algorithms and Challenges
Yaoyao Zhong, Wei Ji, Junbin Xiao, et al.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (2022), pp. 6439-6455
Open Access | Times Cited: 44

Open Vocabulary Semantic Segmentation with Patch Aligned Contrastive Learning
Jishnu Mukhoti, Tsung‐Yu Lin, Omid Poursaeed, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 19413-19423
Open Access | Times Cited: 35

Two Birds With One Stone: Knowledge-Embedded Temporal Convolutional Transformer for Depression Detection and Emotion Recognition
Wenbo Zheng, Lan Yan, Fei‐Yue Wang
IEEE Transactions on Affective Computing (2023) Vol. 14, Iss. 4, pp. 2595-2613
Closed Access | Times Cited: 25

Federated Incremental Semantic Segmentation
Jiahua Dong, Duzhen Zhang, Cong Yang, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 3934-3943
Open Access | Times Cited: 25

Bidirectional generation of structure and properties through a single molecular foundation model
Jinho Chang, Jong Chul Ye
Nature Communications (2024) Vol. 15, Iss. 1
Open Access | Times Cited: 17

FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions
Noam Rotstein, David Bensaïd, Shaked Brody, et al.
2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (2024), pp. 5677-5688
Open Access | Times Cited: 17

From image to language: A critical analysis of Visual Question Answering (VQA) approaches, challenges, and opportunities
Md Farhan Ishmam, Md Sakib Hossain Shovon, M. F. Mridha, et al.
Information Fusion (2024) Vol. 106, pp. 102270-102270
Open Access | Times Cited: 14

Review of multimodal machine learning approaches in healthcare
Felix H. Krones, Umar Marikkar, Guy Parsons, et al.
Information Fusion (2024), pp. 102690-102690
Open Access | Times Cited: 14

UniPAD: A Universal Pre-Training Paradigm for Autonomous Driving
Honghui Yang, Sha Zhang, Di Huang, et al.
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024), pp. 15238-15250
Closed Access | Times Cited: 9

Real-time localization and navigation method for autonomous vehicles based on multi-modal data fusion by integrating memory transformer and DDQN
Li Zha, Gong Chen, Kunfeng Lv
Image and Vision Computing (2025), pp. 105484-105484
Closed Access | Times Cited: 1

Parameter-Efficient Transfer Learning for Remote Sensing Image–Text Retrieval
Yuan Yuan, Yang Zhan, Zhitong Xiong
IEEE Transactions on Geoscience and Remote Sensing (2023) Vol. 61, pp. 1-14
Open Access | Times Cited: 20

Transferable Multimodal Attack on Vision-Language Pre-training Models
Haodi Wang, Kai Dong, Zhilei Zhu, et al.
2022 IEEE Symposium on Security and Privacy (SP) (2024) Vol. 34, pp. 1722-1740
Closed Access | Times Cited: 7

Scaling-up medical vision-and-language representation learning with federated learning
Siyu Lu, Zheng Liu, Tianlin Liu, et al.
Engineering Applications of Artificial Intelligence (2023) Vol. 126, pp. 107037-107037
Closed Access | Times Cited: 16

RNA trafficking and subcellular localization—a review of mechanisms, experimental and predictive methodologies
Jun Wang, Marc Horlacher, Lixin Cheng, et al.
Briefings in Bioinformatics (2023) Vol. 24, Iss. 5
Open Access | Times Cited: 15

TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight Inheritance
Kan Wu, Houwen Peng, Zhenghong Zhou, et al.
2021 IEEE/CVF International Conference on Computer Vision (ICCV) (2023), pp. 21913-21923
Open Access | Times Cited: 15

Page 1 - Next Page
