
Journal of Chinese Agricultural Mechanization (中国农机化学报)

2025, Vol. 46, Issue 6: 162-168. DOI: 10.13733/j.jcam.issn.2095-5553.2025.06.024

• Facility Agriculture and Plant Protection Machinery Engineering •

 Recognition of diseased corn leaves based on a multimodal hierarchical attention network

Shang Yanliang (商炎亮)1, Wu Kai (吴凯)2, Zhou Wen (周文)3

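  1. Xuchang Electrical Vocational College, Xuchang, Henan, 461000; 2. School of Information Engineering, Zhengzhou University, Zhengzhou, 450001;
    3. State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190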
  • Publication date: 2025-06-15  Release date: 2025-05-22
  • Funding:
    National Natural Science Foundation of China (61972362)

Leaf recognition of corn disease based on multimodal hierarchical attention network

Shang Yanliang1, Wu Kai2, Zhou Wen3   

  1. Xuchang Electrical Vocational College, Xuchang, 461000, China; 2. School of Information Engineering, Zhengzhou University, Zhengzhou, 450001, China; 3. State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
  • Published: 2025-06-15  Online: 2025-05-22

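Abstract:

Corn leaf diseases impair leaf photosynthesis and severely hinder grain filling, so timely and accurate detection of diseased corn leaves helps improve corn yield and quality. To this end, a few-shot (small-sample) method for recognizing diseased corn leaves based on a multimodal hierarchical attention network is proposed. First, the visual feature extraction network VGG16 maps the input corn leaf into a visual semantic space, and the semantic correlation between the support branch and the query branch is computed layer by layer. Then, a text converter maps the text labels of the corn leaves into a text semantic space, and modality cross-attention establishes the contextual semantic association between vision and text so that the model focuses on the diseased region as much as possible. Finally, masked average pooling is used to generate a generalized prototype set that guides the recognition of unknown corn leaf diseases. The model was tested on a self-built dataset and an open-source corn disease leaf dataset. The results show that the proposed model achieves a recognition accuracy of 96.08% on the self-built dataset and 98.11% on the open-source Plant Village dataset.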

Keywords: corn disease, leaf recognition, visual semantics, text semantics, multimodal hierarchical attention
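
The abstract names a modality cross-attention step that relates visual features to text-label features. The paper's actual layer design is not given on this page, so the following is only a minimal sketch of generic scaled dot-product cross-attention in NumPy. The token counts, the feature width of 64, the random inputs, and the absence of learned projection matrices are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the per-row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(visual, text):
    """Scaled dot-product cross-attention (hypothetical helper).

    visual: (N, d) visual tokens used as queries.
    text:   (M, d) text tokens used as keys and values.
    Returns the (N, d) text-informed features and the (N, M) weights.
    """
    d = visual.shape[-1]
    scores = visual @ text.T / np.sqrt(d)   # similarity of each visual token to each text token
    weights = softmax(scores, axis=-1)      # each row sums to 1 over the text tokens
    attended = weights @ text               # weighted mix of text features per visual token
    return attended, weights

# Assumed toy shapes: a 7x7 feature map flattened to 49 tokens, 5 text tokens.
rng = np.random.default_rng(0)
visual_tokens = rng.normal(size=(49, 64))
text_tokens = rng.normal(size=(5, 64))
attended, weights = cross_attention(visual_tokens, text_tokens)
```

In a trained model the queries, keys, and values would normally pass through learned linear projections first; they are omitted here to keep the sketch short.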

Abstract:

 Corn leaf diseases impair leaf photosynthesis and seriously affect the filling of corn kernels, so timely and accurate detection and identification of diseased corn leaves helps improve corn yield and quality. To address this problem, a few-shot (small-sample) method for recognizing diseased corn leaves based on a multimodal hierarchical attention network is proposed. First, the visual feature extraction network VGG16 maps the input corn leaf into the visual semantic space, and the semantic correlation between the support branch and the query branch is calculated layer by layer. Second, a text converter maps the corn leaf text labels into the text semantic space, and a modality cross-attention mechanism establishes the contextual semantic correlation between vision and text, focusing on the diseased area as much as possible. Finally, masked average pooling is used to generate a generalized prototype set that guides the identification of unknown corn leaf diseases. In tests on self-built and open-source corn disease leaf datasets, the proposed model achieves a recognition accuracy of 96.08% on the self-built dataset and 98.11% on the open-source Plant Village dataset.

Key words: corn disease, leaf recognition, visual semantics, text semantics, multimodal hierarchical attention
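
The masked average pooling and prototype steps mentioned in the abstract can be sketched as follows. This is a generic illustration of the technique, not the authors' implementation: the feature values, the mask, the class names (`rust_example`, `blight_example`), and the use of cosine similarity for matching are all assumptions made here.

```python
import numpy as np

def masked_average_pooling(features, mask, eps=1e-8):
    """Average a (C, H, W) feature map over the pixels where mask == 1.

    mask is a binary (H, W) array marking the region of interest,
    for example the diseased area of a support image.
    """
    weighted = features * mask[None, :, :]           # zero out features outside the mask
    return weighted.sum(axis=(1, 2)) / (mask.sum() + eps)

def cosine_similarity(a, b, eps=1e-8):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def classify(query_vec, prototypes):
    # Pick the class whose prototype is most similar to the query vector.
    scores = {label: cosine_similarity(query_vec, p) for label, p in prototypes.items()}
    return max(scores, key=scores.get)

# Toy support feature map with 2 channels on a 2x2 grid (made-up numbers).
features = np.array([[[1.0, 2.0], [3.0, 4.0]],
                     [[10.0, 20.0], [30.0, 40.0]]])
mask = np.array([[1.0, 0.0], [0.0, 1.0]])            # keep top-left and bottom-right pixels
prototype = masked_average_pooling(features, mask)   # approximately [2.5, 25.0]

# Hypothetical prototype set and query vector.
prototypes = {"rust_example": np.array([1.0, 0.0]),
              "blight_example": np.array([0.0, 1.0])}
predicted = classify(np.array([0.9, 0.1]), prototypes)
```

Pooling only inside the mask keeps background pixels out of the class prototype, which is the usual motivation for this step in few-shot settings.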

CLC number: