

Journal of Chinese Agricultural Mechanization (中国农机化学报) ›› 2024, Vol. 45 ›› Issue (1): 274-284. DOI: 10.13733/j.jcam.issn.2095-5553.2024.01.038

• Agricultural Intelligence Research •

  • Supported by: Beijing Municipal Science and Technology Project (Z211100004621006); sub-project of the "Science and Technology Innovation 2030" Program (2021ZD0113602)

3D information detection method for facility greenhouse tomato based on improved YOLOv5l

Lin Sen1, 2, Xu Tongyu1, Ge Yuhao2, Ma Jing3, Sun Tianlong2, Zhao Chunjiang2   

  • Online: 2024-01-15  Published: 2024-02-06


Abstract: To address inaccurate fruit recognition and positioning caused by occlusion and complex lighting in greenhouse environments, this study combines a deep learning object detection algorithm with the Intel RealSense D435i depth camera and proposes a method for obtaining the position of a tomato in three-dimensional space, enabling a greenhouse picking robot to perform tomato localization and harvesting. Based on the YOLOv5 network, GhostConvolution replaces the CSP structure of the original network, and the multi-scale connection scheme of BiFPN is adopted to make full use of the tomato feature information extracted at different feature layers, improving the accuracy of bounding-box regression. Several attention mechanisms were compared, and the CBAM attention mechanism was selected for insertion into the model's feature extraction network. The model then obtains the center point of each tomato detected in the two-dimensional video stream from the RGBD camera and computes the tomato's spatial coordinates in the camera coordinate system. To minimize the impact of the complex greenhouse environment on target recognition and the final picking result, all depth readings beyond 1.5 m are filtered out, so that the vision algorithm focuses only on recognizing and detecting targets within a 1.5 m range. Experiments show that the mean average precision of the model is 82.4% for red tomatoes and 82.2% for green tomatoes. Finally, the method of combining the depth camera with the object detection network to measure the depth of tomato targets is described, providing theoretical support for the vision system of a tomato picking robot.
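The final step described above, converting a detected tomato's image-plane center point plus its depth reading into coordinates in the camera frame and discarding targets beyond 1.5 m, can be sketched with the standard pinhole-camera deprojection model. This is an illustrative sketch only: the intrinsic parameters below (`FX`, `FY`, `CX`, `CY`) are hypothetical placeholders, not the D435i's actual calibration, and the function name `deproject` is assumed for illustration.

```python
# Pinhole-camera deprojection: map a pixel (u, v) with depth d (metres)
# to (X, Y, Z) in the camera coordinate system.
# Intrinsics are illustrative placeholders for a 640x480 stream,
# not the D435i's real calibration values.
FX, FY = 615.0, 615.0   # focal lengths in pixels (assumed)
CX, CY = 320.0, 240.0   # principal point (assumed)
MAX_RANGE_M = 1.5       # mirror the paper's 1.5 m working-range cut-off

def deproject(u, v, depth_m):
    """Return camera-frame (X, Y, Z) for pixel (u, v) at depth depth_m,
    or None when the point is invalid or beyond the working range."""
    if depth_m <= 0 or depth_m > MAX_RANGE_M:
        return None  # filtered out, as targets past 1.5 m are ignored
    x = (u - CX) * depth_m / FX
    y = (v - CY) * depth_m / FY
    return (x, y, depth_m)

# A detection centred on the optical axis at 1.0 m:
print(deproject(320, 240, 1.0))   # (0.0, 0.0, 1.0)
# A tomato 2.3 m away is rejected by the range filter:
print(deproject(100, 200, 2.3))   # None
```

In practice, the same computation is provided by the RealSense SDK (given the stream's actual intrinsics), so the hand-written version above serves only to make the geometry of the paper's localization step explicit.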

Key words: tomato, deep learning, picking robot, 3D object detection, YOLOv5
