Journal of Chinese Agricultural Mechanization

Journal of Chinese Agricultural Mechanization ›› 2025, Vol. 46 ›› Issue (3): 182-187.DOI: 10.13733/j.jcam.issn.2095-5553.2025.03.027

• Agricultural Informationization Engineering •

Enhanced FoveaBox with multi-granularity feature perception for green apple occlusion detection

Ren Jingjing1, Zhang Xiaoyong1, Jia Weikuan2   

  1. Department of Intelligence and Information Engineering, Taiyuan University, Taiyuan 030032, China;
  2. School of Information Science and Engineering, Shandong Normal University, Jinan 250358, China
  • Online:2025-03-15 Published:2025-03-13

基于多粒度特征感知的FoveaBox绿色苹果抗遮挡检测模型

任晶晶1,张小勇1,贾伟宽2   

  1. 太原学院智能与信息工程系,太原市,030032; 2. 山东师范大学信息科学与工程学院,济南市,250358
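  • Funding:
    General Program of the National Natural Science Foundation of China (62372278); Science and Technology Innovation Project of Higher Education Institutions of Shanxi Province (2024L386); Natural Science Foundation of Shandong Province (ZR2020MF076)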

Abstract:

Fruit detection is a crucial sub-task in smart agriculture because its accuracy directly affects the efficiency of intelligent orchard operations. However, current feature extraction networks, particularly convolutional neural networks, extract features only from local receptive fields. This limits the detection of fruits that are occluded by branches and leaves or that overlap one another, and results in low detection accuracy. To improve the detection accuracy of occluded targets, this study proposes an enhanced, occlusion-resistant FoveaBox detection model. First, the Swin Transformer is employed as the backbone network. By computing the similarity between patches, it extracts multi-granularity hierarchical features from a global receptive field, overcoming the restriction of traditional convolution to local regions and improving the representational capacity of the feature maps. Next, a Feature Pyramid Network aggregates shallow, high-resolution features with high-level semantic information through lateral connections and a top-down structure, and outputs pyramidal feature maps. This aggregation enhances the model's ability to detect occluded objects. The pyramidal feature maps are then fed into the Fovea head network, which consists of a classification sub-network and a bounding box sub-network for object detection. Finally, the model is iteratively optimized with Focal Loss and the Smooth L1 loss until it converges. Experimental results show that the proposed model reaches an average precision of 86.3% at an IoU threshold of 0.5, outperforming advanced models such as FCOS, TOOD and LAD. It significantly improves the detection accuracy of occluded targets.
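
The two training objectives named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it shows the commonly used binary form of Focal Loss for a single prediction and the standard Smooth L1 loss for a single regression residual. The function names and the default values `alpha=0.25`, `gamma=2.0` and `beta=1.0` are assumptions chosen for illustration; the abstract does not state which hyperparameters were used.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for one predicted probability p and label y in {0, 1}.

    alpha and gamma are illustrative defaults, not values from the paper.
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)          # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p           # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma down-weights examples that are already well classified
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss for one box-regression value.

    Quadratic for small residuals, linear for large ones.
    beta is an assumed transition point.
    """
    d = abs(pred - target)
    if d < beta:
        return 0.5 * d * d / beta
    return d - 0.5 * beta


if __name__ == "__main__":
    # A confident correct prediction contributes far less than a confident wrong one.
    print(focal_loss(0.9, 1), focal_loss(0.1, 1))
    print(smooth_l1(0.5, 0.0), smooth_l1(3.0, 1.0))
```

In a detector of this kind, the first loss would typically be applied to the classification sub-network's outputs and the second to the bounding box sub-network's outputs, with the two terms summed during training.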

Key words: occluded apple detection, multi-granularity feature perception, FoveaBox, Swin Transformer, area similarity calculation
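
The reported average precision is measured at an IoU threshold of 0.5. As a rough illustration of what that threshold means, the sketch below computes the intersection over union of two axis-aligned boxes and checks whether a predicted box would count as a match. The `(x1, y1, x2, y2)` box format, the function names and the `>=` comparison are assumptions made for this example; the abstract does not describe the evaluation code.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Width and height of the overlap region, zero if the boxes are disjoint
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred, gt, threshold=0.5):
    """True if the predicted box overlaps the ground-truth box enough to count as a hit."""
    return iou(pred, gt) >= threshold


if __name__ == "__main__":
    apple = (0.0, 0.0, 2.0, 2.0)        # hypothetical ground-truth box
    shifted = (1.0, 0.0, 3.0, 2.0)      # hypothetical prediction, half a box off
    print(iou(apple, shifted), is_match(shifted, apple))
```

A prediction shifted by half a box width has an IoU of one third and would not count as a match at the 0.5 threshold, which is why heavily occluded fruits with poorly localized boxes lower the score.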

摘要:

目标果实检测精度直接影响果园智能作业的效率,当前以卷积神经网络为代表的特征提取网络仅从局部感受野中提取特征用于目标检测,果实受枝叶遮挡或果实间重叠时存在一定的局限性,导致检测精度偏低。为提升被遮挡目标果实的检测精度,提出抗遮挡的FoveaBox果实检测优化模型。首先,新模型引入Swin Transformer作为骨干网络,通过计算块间的相似度,打破传统卷积仅从局部区域提取特征的限制,从而增强特征映射的表征能力;其次,采用特征金字塔网络,通过横向连接和自顶向下结构聚合浅层高分辨率特征与高层语义信息,输出金字塔型特征映射;然后,将金字塔型特征映射输入Fovea头部网络中,利用分类子网络与边界框子网络进行检测目标;最后,通过焦点损失函数Focal Loss与Smooth L1对模型进行迭代寻优,直至模型收敛。验证表明,优化模型在IoU为0.5阈值下的平均精确度可达86.3%,优于FCOS、TOOD与LAD等先进模型。提出的抗遮挡的FoveaBox可在一定程度上提升被遮挡目标的检测精确度。

关键词: 被遮挡苹果检测, 多粒度特征感知, FoveaBox, Swin Transformer, 区域相似度计算
