| English abstract of the paper |
The rapid growth in the volume, diversity, and complexity of visual content in the digital world has made the need for visual content search and retrieval systems evident. The web now holds visual data at a massive scale, and conventional approaches based on manually created metadata cannot cope with its diversity and sheer volume. Without an accurate and fast way to understand and retrieve it, the enormous volume of data generated on the web will sink into digital archives and never be found again. Recently, significant effort has gone into retrieving these images, particularly in Content-Based Image Retrieval (CBIR) and Semantic Image Retrieval (SIR). Such systems search and retrieve images based on their internal content and high-level, human-understandable semantics, rather than only on the metadata that may be associated with them. This paper provides a comprehensive review of recent advances in content-based image retrieval. It critically discusses the strengths and weaknesses of each research area in content-based retrieval, and presents an overall framework of the process together with the progress made in image preprocessing, feature extraction and embedding, machine learning, benchmark datasets, similarity matching, and performance evaluation. Finally, the paper presents novel research approaches, challenges, and suggestions for advancing research in this field. The paper is organized as follows: after the introduction, Section 2 describes the components of a CBIR system framework and, after a brief look at classical methods, examines how modern approaches work and the challenges they face. 
In Section 3, we give an overview of "relevance feedback" and explain why it is needed to improve retrieval performance in CBIR systems, followed by an introduction to the prominent solutions in this domain. Finally, Section 4 reviews the image datasets commonly used in content-based image retrieval and discusses their characteristics. Given recent advances in computer vision and image processing, especially on the "image-text relationship" and how to integrate the two modalities to improve retrieval performance, a large part of this study focuses on solutions in this area and the performance of the prominent methods. The main current research in this field is monopolized by large companies and organizations with vast financial resources, which has slowed the progress of academic research. These companies, with access to enormous data and financial resources, have trained models, some well known and some not, at very large scale (billions of images and texts), and after training they have deployed the final models in various web services without publishing the details of the research. Importantly, scaling laws apply in this field: any entity with more computational and storage resources can train better and more accurate models. This has made it harder for small research units and universities to enter the field, leaving them to wait for the aforementioned organizations and companies to publish their research. There is a pressing need for effective solutions that require limited resources yet achieve high accuracy and remain competitive with the massive models at a fraction of their training budget. 
This has already happened with large language models: within two years, multiple research groups have matched the accuracy of OpenAI's GPT-4 language model with models that can run on home devices. Research in this field needs to shift from pursuing accuracy through greater scale to pursuing accuracy at lower cost; otherwise, the field will remain monopolized by profit-driven companies. |