A review of few-shot image recognition using semantic information

Liyong Guo; Erzam Marlisah; Hamidah Ibrahim; Noridayu Manshor

doi:10.18488/76.v10i2.3472

Authors

Liyong Guo, Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, Selangor, Malaysia. https://orcid.org/0009-0002-2465-3914
Erzam Marlisah, Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, Selangor, Malaysia. https://orcid.org/0000-0002-3763-0191
Hamidah Ibrahim, Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, Selangor, Malaysia. https://orcid.org/0000-0002-9900-0531
Noridayu Manshor, Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, Selangor, Malaysia. https://orcid.org/0000-0002-5188-3793

DOI:

https://doi.org/10.18488/76.v10i2.3472

Abstract

In recent years, deep learning techniques have been widely applied to image recognition to improve performance. However, deep learning requires a substantial amount of labeled data for model training, and collecting that data is both expensive and time-consuming. Few-shot learning (FSL) has emerged as a viable alternative that addresses this difficulty. FSL aims to replicate the human ability to acquire new concepts from a small set of examples and prior experience. Research on FSL has investigated many approaches to extracting as much information as possible from limited data or to making use of affordable and easily accessible sources of information, and researchers increasingly incorporate external data into FSL techniques. This paper explores in depth how semantic information can be leveraged to enhance few-shot learning. By reviewing papers from the last five years indexed in Web of Science (WOS), IEEE, and ScienceDirect (some arXiv papers are also included), this study examines the strategies employed to bridge the gap between visual and semantic information. The review also covers zero-shot learning, which is considered a subcategory of FSL, to enrich the analysis. Moreover, this paper identifies the potential of semantic information to enhance fine-grained few-shot (FGFS) learning, where techniques such as direct projection and generative adversarial networks (GANs) emerge as promising avenues.
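
As an illustration of what "direct projection" can mean in this setting, the following is a minimal sketch and not a method taken from the reviewed papers. It assumes a fixed, hand-written linear map `W` from a 2-dimensional semantic space to a 3-dimensional visual feature space (in practice such a map would be learned), toy feature vectors, and a hypothetical mixing weight `alpha`. Each class prototype blends the mean of the support features with the projected semantic vector, and a query is assigned to the prototype with the highest cosine similarity. With no support images the prototype falls back to the projection alone, which corresponds to the zero-shot case.

```python
from math import sqrt


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def project(semantic, weights):
    """Linear projection: each row of `weights` produces one visual dimension."""
    return [sum(w * s for w, s in zip(row, semantic)) for row in weights]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fused_prototype(support, semantic, weights, alpha=0.5):
    """Blend the visual mean with the projected semantic vector.

    `alpha` is a hypothetical mixing weight (an assumption of this sketch).
    With an empty support set the projection alone is returned (zero-shot).
    """
    projected = project(semantic, weights)
    if not support:
        return projected
    visual = mean_vector(support)
    return [alpha * v + (1 - alpha) * p for v, p in zip(visual, projected)]


def classify(query, prototypes):
    """Return the class whose prototype is most similar to the query."""
    return max(prototypes, key=lambda name: cosine(query, prototypes[name]))


# Toy data (all values are illustrative assumptions).
W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # maps 2-d semantic to 3-d visual
support = {"cat": [[0.9, 0.1, 0.4]], "dog": [[0.2, 0.8, 0.6]]}  # 1-shot
semantic = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}

prototypes = {c: fused_prototype(support[c], semantic[c], W) for c in support}
prediction = classify([0.8, 0.2, 0.5], prototypes)
print(prediction)
```

Setting `alpha` closer to 1 trusts the few visual examples more, while values closer to 0 lean on the semantic side; how that balance is chosen is a design decision this sketch leaves open.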