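Abstract: |
Fine-grained image classification has attracted considerable attention because of problems such as subtle inter-class differences and background interference. To address them, this paper proposes a random guided multi-scale fusion network (RG-MSF-Net) that fully exploits the multi-level feature information of fine-grained images and improves classification performance. RG-MSF-Net consists of a random guided feature extraction (RGFE) module, a multi-scale feature fusion (MSFF) module, and a dynamic saliency classification module (DSCM). The RGFE module randomly reorganizes image blocks and introduces a spatial attention mechanism, which strengthens the model's generalization ability and its attention to both local and global features. The MSFF module combines Swin-Transformer with a feature pyramid network (FPN) to fuse multi-scale features efficiently. The DSCM uses dynamic masks and a weighted classification mechanism to improve the model's focus on key features. Experimental results show that RG-MSF-Net reaches Top-1 accuracies of 91.4% and 90.5% on the fine-grained bird classification datasets CUB-200-2011 and NA-Birds, respectively, which is competitive with a variety of advanced methods. |
Key words: fine-grained image classification; random guided feature extraction (RGFE); multi-scale feature fusion (MSFF); saliency classification |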
DOI:10.20192/j.cnki.JSHNU(NS).2025.02.004 |
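Classification number: TN391.7 |
Foundation items: National Natural Science Foundation of China (62477013); Natural Science Foundation of Shanghai (24ZR1456100); Shanghai Science and Technology Commission Project (24DZ2305900); Shanghai Special Fund for Promoting High-Quality Industrial Development (2211106)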
|
A fine-grained bird image classification method based on random block guidance and multi-level feature fusion |
LIN Xiao, WANG Zhengkai, LI Yan
|
College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 201418, China
|
Abstract: |
Fine-grained image classification has attracted much attention because of challenges such as subtle differences between categories and background interference. This paper proposes a random guided multi-scale fusion network (RG-MSF-Net) to fully exploit the multi-level feature information of fine-grained images and improve classification performance. RG-MSF-Net consists of a random guided feature extraction (RGFE) module, a multi-scale feature fusion (MSFF) module, and a dynamic saliency classification module (DSCM). The RGFE module randomly reorganizes image blocks and introduces a spatial attention mechanism, which enhances the model's generalization ability and its attention to both local and global features. The MSFF module combines Swin-Transformer with a feature pyramid network (FPN) to fuse multi-scale features efficiently. The DSCM uses dynamic masks and a weighted classification mechanism to improve the model's focus on key features. Experimental results show that RG-MSF-Net achieves Top-1 accuracies of 91.4% and 90.5% on the fine-grained bird classification datasets CUB-200-2011 and NA-Birds, respectively, which is competitive with a variety of advanced methods. |
Key words: fine-grained image classification; random guided feature extraction (RGFE); multi-scale feature fusion (MSFF); saliency classification
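
The abstract says that the RGFE module randomly reorganizes image blocks but gives no procedure. The sketch below shows one common reading of that idea, a jigsaw-style shuffle of equal-sized patches, and is not the authors' implementation. The function name `shuffle_patches`, the `grid` and `seed` parameters, and the list-of-rows image layout are all assumptions made for illustration.

```python
import random


def shuffle_patches(image, grid, seed=None):
    """Split a 2-D image (list of rows) into grid x grid patches and permute them.

    Hypothetical helper: returns the reassembled image and the permutation used.
    """
    height, width = len(image), len(image[0])
    if height % grid or width % grid:
        raise ValueError("image sides must be divisible by grid")
    ph, pw = height // grid, width // grid  # patch height and width

    # perm[k] is the index of the source patch placed at destination slot k.
    perm = list(range(grid * grid))
    random.Random(seed).shuffle(perm)

    out = [[None] * width for _ in range(height)]
    for dst, src in enumerate(perm):
        dr, dc = divmod(dst, grid)  # destination patch row and column
        sr, sc = divmod(src, grid)  # source patch row and column
        for i in range(ph):
            for j in range(pw):
                out[dr * ph + i][dc * pw + j] = image[sr * ph + i][sc * pw + j]
    return out, perm


# Toy 4x4 "image" whose pixel values are 0..15, split into a 2x2 grid of patches.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
shuffled, perm = shuffle_patches(image, grid=2, seed=0)
```

Pixels inside each patch keep their relative positions and only the patch layout changes, so local texture is preserved while the global arrangement is disturbed. In a training pipeline the same operation would typically be applied per sample to image tensors rather than nested lists.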