
Journal of Chinese Agricultural Mechanization

Journal of Chinese Agricultural Mechanization ›› 2022, Vol. 43 ›› Issue (10): 149-156. DOI: 10.13733/j.jcam.issn.2095-5553.2022.10.022

• Research on Agricultural Intelligence •

Research on motion planning method of vacuum-vibration seeding manipulator based on deep reinforcement learning

Chen Jin, Zhang Zhiqiao, Liao Caiqi, Tang Xueming

  1. School of Mechanical Engineering, Jiangsu University, Zhenjiang, Jiangsu, 212013
  • Published: 2022-10-15 Online: 2022-09-19
  • Funding:
    National Natural Science Foundation of China (31871528)

Abstract: Dynamically adjusting the manipulator's preset motion plan is important for improving the working efficiency of a vacuum-vibration tray seeding assembly line. Based on the working characteristics of existing vacuum-vibration tray seeding lines, a motion planning method for the seeding manipulator based on deep reinforcement learning is proposed. A Markov model is built from the perception and action spaces. Combining the position and orientation of the suction plate with the target action state, a process-action-based reward function is designed that guides the suction plate to carry seeds to the area above the assembly line and then follow the seedling tray while dropping the seeds. The seeding environment parameters are reconstructed in V-REP and a reinforcement learning framework over the manipulator's action set is established; the improved proximal policy optimization (PPO) algorithm is shown to effectively accelerate convergence. Simulation tests show that the absolute position errors of the suction plate along the X, Y, and Z axes are less than 2.5, 3.0, and 1.1 mm, respectively; the absolute deflection angle errors about the three axes are less than 1.20°, 1.14°, and 1.28°; the maximum distance error is 4.5 mm; and the average seeding cycle is about 5.1 s. All indicators meet the design requirements, and the method can provide a basis for improving the working efficiency of vacuum-vibration tray seeding assembly lines.
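The abstract describes a reward built from the suction plate's pose and the current stage of the seeding action, but it does not give the formula. The following Python sketch shows one way such a stage-dependent pose reward could look. It is an illustration only, not the authors' reward function: the `Pose` class, the stage names `"approach"` and `"follow"`, the weights, the tolerances, and the bonus value are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose:
    """Position in mm and orientation in degrees (hypothetical container)."""
    x: float
    y: float
    z: float
    rx: float
    ry: float
    rz: float


def position_error(a: Pose, b: Pose) -> float:
    """Euclidean distance between two positions, in mm."""
    return math.sqrt((a.x - b.x) ** 2 + (a.y - b.y) ** 2 + (a.z - b.z) ** 2)


def angle_error(a: Pose, b: Pose) -> float:
    """Largest absolute per-axis angle difference, in degrees.

    Assumes small angles, so wrap-around at 360 degrees is ignored.
    """
    return max(abs(a.rx - b.rx), abs(a.ry - b.ry), abs(a.rz - b.rz))


def reward(plate: Pose, target: Pose, stage: str,
           pos_tol: float = 5.0, ang_tol: float = 1.5) -> float:
    """Stage-dependent shaping reward (all constants are assumed values).

    "approach": the plate carries seeds toward the area above the line,
                so position dominates.
    "follow":   the plate tracks the seedling tray while dropping seeds,
                so orientation is weighted more heavily.
    """
    if stage == "approach":
        w_pos, w_ang = 0.01, 0.005
    elif stage == "follow":
        w_pos, w_ang = 0.02, 0.05
    else:
        raise ValueError(f"unknown stage: {stage}")

    d = position_error(plate, target)
    a = angle_error(plate, target)
    r = -(w_pos * d + w_ang * a)       # dense penalty that shrinks near the target
    if d < pos_tol and a < ang_tol:    # bonus once the pose is inside tolerance
        r += 1.0
    return r
```

A dense penalty plus a tolerance bonus is a common shaping pattern; in a real setup the target pose in the `"follow"` stage would be updated every step from the moving seedling tray.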

Key words: vacuum-vibration, manipulator, deep reinforcement learning, reward function
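For readers unfamiliar with proximal policy optimization, the sketch below computes the standard clipped surrogate objective that PPO maximizes. It shows the textbook form only; the abstract does not specify what the authors' improved variant changes, so none of those modifications are represented here. The function name and the default `eps=0.2` are illustrative choices.

```python
import math


def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Mean clipped surrogate objective of standard PPO.

    logp_new / logp_old: log-probabilities of the taken actions under the
    current and the data-collecting policy; advantages: advantage estimates.
    """
    if not advantages or not (len(logp_new) == len(logp_old) == len(advantages)):
        raise ValueError("inputs must be non-empty and of equal length")

    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)                       # probability ratio r_t
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)  # clip r_t to [1-eps, 1+eps]
        total += min(ratio * adv, clipped * adv)        # pessimistic (lower) bound
    return total / len(advantages)
```

Taking the minimum of the clipped and unclipped terms removes the incentive to move the policy far from the one that collected the data, which is what keeps each update step conservative.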

CLC number: