
Journal of Chinese Agricultural Mechanization

Journal of Chinese Agricultural Mechanization, 2024, Vol. 45, Issue (7): 180-187. DOI: 10.13733/j.jcam.issn.2095-5553.2024.07.027

• Agricultural Informatization Engineering •

  • Funding:
    National Natural Science Foundation of China (32072357); Funding Program for Young Backbone Teachers in Higher Education Institutions of Henan Province (2015GGJS-300)

Detection of crop disease leaf based on multimodal feature alignment 

Zhou Yifan (1, 2), Liu Dongyang (3), Zhou Yuping (4)

  1. School of Information Engineering, Zhumadian Vocational and Technical College, Zhumadian, 463000, China
  2. Henan Rural Smart Agriculture Engineering Research Center, Zhumadian Vocational and Technical College, Zhumadian, 463000, China
  3. College of Information and Electrical Engineering, China Agricultural University, Beijing, 100083, China
  4. State Key Laboratory of Crop Biology, Shandong Agricultural University, Taian, 271018, China
  • Published: 2024-07-15 Online: 2024-06-24


Abstract: To address the low accuracy with which existing crop disease leaf detection methods locate diseased leaf regions from image features, a new crop disease leaf detection method based on multimodal feature alignment is proposed. In the training phase, images and text from a crop leaf dataset are encoded with a visual encoder and a text encoder. The diseased regions in a given image are located from the visual encoding features, and the fused visual and text encoding features are used for fine-grained classification of the disease type in each diseased region. In the inference phase, the pretrained disease region localization module locates the diseased regions in a given test image, and the extracted regions are fed to the pretrained classification model. The fine-grained classification result for each diseased region is then obtained quickly by computing the similarity between the predicted text values and the original labels in the text set. In tests on several open-source crop disease datasets, the proposed method achieves precision of 0.9574, 0.9611, 0.9580, and 0.9502 on the potato, tomato, apple, and strawberry diseased leaf datasets, respectively. It shows better overall performance and good practical application value.

Key words: disease leaf detection, multimodal feature, visual encoding features, text encoding features, fine-grained classification
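
The inference step described in the abstract, matching a predicted representation of a diseased region against the labels in the text set by similarity, can be sketched as follows. This is a minimal illustration only, not the authors' implementation: it assumes cosine similarity as the similarity measure, and the function names, label strings, and three-dimensional embedding vectors are all hypothetical toy values standing in for real encoder outputs.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_region(region_embedding, label_embeddings):
    """Return (best_label, score): the label whose embedding is most similar to the region."""
    scores = {
        label: cosine_similarity(region_embedding, embedding)
        for label, embedding in label_embeddings.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores[best_label]


# Hypothetical text-set embeddings; a real system would obtain these from a text encoder.
label_embeddings = {
    "potato early blight": [0.9, 0.1, 0.0],
    "potato late blight": [0.1, 0.9, 0.0],
    "healthy potato leaf": [0.0, 0.1, 0.9],
}

# Hypothetical embedding predicted for one localized diseased region.
region_embedding = [0.2, 0.8, 0.1]

label, score = classify_region(region_embedding, label_embeddings)
print(label, round(score, 3))
```

Because the label embeddings can be computed once and reused, classifying each new region reduces to a handful of dot products, which is one plausible reading of why the abstract describes this step as fast.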
