Multi-evidence and multi-modal fusion network for ground-based cloud recognition

Shuang Liu, Mei Li, Zhong Zhang, Baihua Xiao, Tariq S. Durrani

Research output: Contribution to journal › Article › peer-review



Deep neural networks have recently drawn much attention in ground-based cloud recognition. Yet such approaches focus solely on learning global features from visual information, which yields incomplete representations of ground-based clouds. In this paper, we propose a novel method named multi-evidence and multi-modal fusion network (MMFN) for ground-based cloud recognition, which learns extended cloud information by fusing heterogeneous features in a unified framework. Specifically, MMFN exploits multiple pieces of evidence, i.e., global and local visual features, from ground-based cloud images using the main network and the attentive network. In the attentive network, local visual features are extracted from attentive maps, which are obtained by refining salient patterns from convolutional activation maps. Meanwhile, the multi-modal network in MMFN learns multi-modal features for ground-based clouds. To fully fuse the multi-modal and multi-evidence visual features, we design two fusion layers in MMFN that combine the multi-modal features with the global and local visual features, respectively. Furthermore, we release the first multi-modal ground-based cloud dataset, named MGCD, which contains not only ground-based cloud images but also the multi-modal information corresponding to each cloud image. MMFN is evaluated on MGCD and achieves a classification accuracy of 88.63% in comparison with state-of-the-art methods, which validates its effectiveness for ground-based cloud recognition.
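The two-fusion-layer idea described in the abstract can be pictured with a minimal sketch. The abstract does not state which fusion operator is used, so the concatenation below, the final joining of the two fused vectors, and every name, dimension, and value are assumptions made purely for illustration and should not be read as the authors' implementation.

```python
# Illustrative sketch only: the fusion operator (concatenation), the final
# joining step, and all names and values below are assumptions.

def fuse(visual, multimodal):
    """Hypothetical fusion layer: concatenate a visual feature vector
    with the multi-modal feature vector."""
    return list(visual) + list(multimodal)


def mmfn_style_features(global_visual, local_visual, multimodal):
    """Apply two fusion layers, one per piece of visual evidence,
    and join their outputs into a single representation."""
    fused_global = fuse(global_visual, multimodal)  # fusion layer 1
    fused_local = fuse(local_visual, multimodal)    # fusion layer 2
    return fused_global + fused_local


# Placeholder feature vectors (made-up numbers).
global_visual = [0.2, 0.7, 0.1]  # stands in for main-network output
local_visual = [0.9, 0.4]        # stands in for attentive-network output
multimodal = [21.5, 0.63]        # stands in for multi-modal-network output

features = mmfn_style_features(global_visual, local_visual, multimodal)
print(len(features))
```

In this sketch the multi-modal vector appears in both fused outputs, which mirrors the abstract's statement that the multi-modal features are combined with the global and the local visual features respectively.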
Original language: English
Article number: 464
Number of pages: 20
Journal: Remote Sensing
Issue number: 3
Publication status: Published - 2 Feb 2020


  • ground-based cloud recognition
  • convolutional neural network (CNN)
  • feature fusion

