Research on motion planning method of vacuum-vibration seeding manipulator based on deep reinforcement learning

doi:10.13733/j.jcam.issn.2095-5553.2022.10.022

Abstract

Dynamically adjusting the preset plan of the manipulator is important for improving the work efficiency of the vacuum-vibration seeding assembly line. Based on the working characteristics of the existing vacuum-vibration seeding assembly line, a motion planning method for the vacuum-vibration seeding manipulator based on deep reinforcement learning is proposed. A Markov model is established from the perception and action spaces. A reward function based on process actions is designed by combining the position and posture of the suction plate with the target action state. It guides the suction plate to carry seeds to the area above the assembly line and to follow the seedling tray while discharging seeds. V-REP is used to reconstruct the seeding environment parameters, and a reinforcement learning framework for the manipulator action set is established, showing that the improved proximal policy optimization algorithm effectively accelerates convergence. Simulation experiments indicate that the absolute position errors of the suction plate along the X, Y, and Z axes are less than 2.5, 3.0, and 1.1 mm, respectively, and that the absolute deflection angle errors about each axis are less than 1.20°, 1.14°, and 1.28°. The maximum distance error is 4.5 mm, and the average sowing period is about 5.1 s. All indexes meet the design requirements, so the method can provide a basis for improving the work efficiency of the vacuum-vibration seeding assembly line.
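As a rough illustration of how a reward that combines the suction plate's position and posture with a target pose might be structured, the following Python sketch penalizes position and angle error and adds a bonus when the pose falls inside a tolerance window. It is not the reward function from the paper. The names `Pose`, `pose_errors`, `within_tolerance`, and `shaped_reward`, the weights, and the bonus value are all assumptions made for illustration. The tolerance values simply reuse the error bounds reported above as hypothetical success thresholds.

```python
import math
from dataclasses import dataclass

# Hypothetical thresholds: the error bounds reported in the abstract,
# reused here as success tolerances purely for illustration.
POS_TOL_MM = (2.5, 3.0, 1.1)       # X, Y, Z position tolerance in mm
ANG_TOL_DEG = (1.20, 1.14, 1.28)   # deflection angle tolerance per axis in degrees


@dataclass
class Pose:
    xyz_mm: tuple    # suction plate position (x, y, z) in mm
    rpy_deg: tuple   # suction plate deflection angles per axis in degrees


def pose_errors(current, target):
    """Return per-axis absolute position and angle errors.

    Assumes small angles, so no wrap-around handling is applied.
    """
    pos_err = tuple(abs(c - t) for c, t in zip(current.xyz_mm, target.xyz_mm))
    ang_err = tuple(abs(c - t) for c, t in zip(current.rpy_deg, target.rpy_deg))
    return pos_err, ang_err


def within_tolerance(current, target):
    """True when every axis error is below its hypothetical threshold."""
    pos_err, ang_err = pose_errors(current, target)
    pos_ok = all(e < tol for e, tol in zip(pos_err, POS_TOL_MM))
    ang_ok = all(e < tol for e, tol in zip(ang_err, ANG_TOL_DEG))
    return pos_ok and ang_ok


def shaped_reward(current, target, w_pos=0.01, w_ang=0.05, bonus=10.0):
    """Dense penalty on pose error plus a sparse bonus inside the tolerance window.

    The weights and bonus are illustrative assumptions, not tuned values.
    """
    pos_err, ang_err = pose_errors(current, target)
    distance = math.sqrt(sum(e * e for e in pos_err))   # Euclidean distance in mm
    penalty = w_pos * distance + w_ang * sum(ang_err)
    return -penalty + (bonus if within_tolerance(current, target) else 0.0)
```

In a sketch like this, the dense penalty gives the agent a gradient toward the target pose during the approach, while the bonus marks the moment the suction plate is aligned closely enough for the next process action to begin.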

Key words: vacuumvibration, manipulator, deep reinforcement learning, reward function

CLC Number: S223.2

Chen Jin, Zhang Zhiqiao, Liao Caiqi, Tang Xueming. Research on motion planning method of vacuum-vibration seeding manipulator based on deep reinforcement learning[J]. Journal of Chinese Agricultural Mechanization, 2022, 43(10): 149-156.
