Mechanical arm six-degree-of-freedom real-time grabbing method based on deep reinforcement learning

Document No.: 1929916    Publication date: 2021-12-07

Reading note: This technology, "Mechanical arm six-degree-of-freedom real-time grabbing method based on deep reinforcement learning", was designed and created by 禹鑫燚, 徐靖, 黄睿, 邹超, 欧林林 and 陈磊 on 2021-08-24. Its main content is as follows: The invention relates to a six-degree-of-freedom real-time grabbing method for a mechanical arm based on deep reinforcement learning, comprising the following steps. Step 1: acquiring image information of an object on a grabbing operation table through a binocular camera; Step 2: carrying out target detection training on the images using a YOLOv5 pruning network model; Step 3: establishing a reinforcement learning network model; Step 4: completing the grabbing movement of the mechanical arm through robot forward and inverse kinematics; Step 5: performing reinforcement learning model training so that the mechanical arm completes the grabbing action. The invention overcomes the defects of the prior art and provides a real-time object detection system, easy to implement and highly applicable, based on the YOLOv5 pruning network and the Policy Gradient reinforcement learning method; while ensuring high precision, the system realizes fast, real-time target detection and completes the grabbing action.

1. A six-degree-of-freedom real-time grabbing method for a mechanical arm based on deep reinforcement learning, characterized by comprising the following steps:

step 1: acquiring image information of an object on a grabbing operation table through a binocular camera;

step 2: carrying out target detection training on the images using a YOLOv5 pruning network model;

step 3: establishing a reinforcement learning network model;

step 4: completing the grabbing movement of the mechanical arm through robot forward and inverse kinematics;

step 5: performing reinforcement learning model training so that the mechanical arm completes the grabbing action.

2. The six-degree-of-freedom real-time grabbing method for the mechanical arm based on deep reinforcement learning according to claim 1, characterized in that step 2 comprises the following specific steps:

2.1): In order to reduce the possibility of gradient explosion and gradient vanishing and to reduce the influence of the pooling layer on gradient calculation, the skip-layer connection structure of the residual network ResNet is referenced: the stride parameter of the down-sampling convolutional layers is set to 2, and Batch Normalization (BN) is added to the remaining convolutional layers. Referring to the CSPNet network structure, the CSP1_X module is composed of a CBL module, X Res_unit modules, a convolutional layer and a Concat operation; the CSP2_X module is composed of convolutional layers, X Res_unit modules and a Concat operation. The input layer is composed of a convolutional layer, Batch Normalization (BN) and the Leaky_ReLU activation function. According to the size of the input color image and the size of the anchor boxes in the learning data set, adaptive multi-scale prediction is achieved;

2.2): Through automatic learning based on the training data and the K-means clustering algorithm, YOLOv5 can relearn the anchor-box sizes even when the sizes of the target objects in the data set differ from those in the COCO data set, obtaining preset anchor boxes suitable for predicting the object boundaries in the custom data set. The prediction formulas in the forward inference of the YOLOv5 pruning network model are as follows:

b_y = σ(t_y) + c_y   (1)

b_x = σ(t_x) + c_x   (2)

The target detection box is predicted to obtain the center coordinates b_x, b_y of the prediction box relative to the current feature map and the width and height b_w, b_h of the prediction box; c_x, c_y are the upper-left corner coordinates of the grid cell in the output feature map, and p_w, p_h are the width and height of the anchor box; t_x, t_y are the coordinate offset values predicted by the network, and t_w, t_h are the scale factors predicted by the network.

2.3): Designing the YOLOv5 Loss function: the GIOU_Loss function replaces the Smooth L1 Loss function, and the target confidence loss adopts binary cross entropy; the designed target confidence loss function is as follows:

L_conf(o, c) = - Σ_i [ o_i · ln(ĉ_i) + (1 - o_i) · ln(1 - ĉ_i) ]   (3)

ĉ_i = Sigmoid(c_i)   (4)

where o_i indicates whether a target actually exists in detection box i, and ĉ_i, obtained from the network output c_i through the Sigmoid function, represents the predicted confidence of detection box i.

2.4): The target class loss function also adopts binary cross entropy; the designed target class loss function is as follows:

L_cla(O, C) = - Σ_{i∈pos} Σ_{j∈cla} [ O_ij · ln(Ĉ_ij) + (1 - O_ij) · ln(1 - Ĉ_ij) ]   (5)

Ĉ_ij = Sigmoid(C_ij)   (6)

where O_ij indicates whether the target in detection box i belongs to class j, and Ĉ_ij, obtained from the network output C_ij through the Sigmoid function, represents the Sigmoid probability of the class-j target in target detection box i.

2.5): The target location loss function is an MSE loss function, as follows:

L_loc(l, g) = Σ_{i∈pos} Σ_{m∈{x,y,w,h}} ( l̂_i^m - ĝ_i^m )²   (7)

wherein:

ĝ_x = g_x - c_x,   ĝ_y = g_y - c_y   (8)

ĝ_w = ln( g_w / p_w ),   ĝ_h = ln( g_h / p_h )   (9)

l̂ = ( t_x, t_y, t_w, t_h )   (10)

where l̂ represents the coordinate offsets of the prediction box (as in YOLOv3, what the network predicts are coordinate offset values) and ĝ represents the coordinate offsets of the real box; (b_x, b_y, b_w, b_h) are the parameters of the prediction box, (c_x, c_y, p_w, p_h) are the parameters of the anchor box, and (g_x, g_y, g_w, g_h) are the parameters of the real box;

2.6): All the loss functions are added through weights to obtain the total loss function:

L(O, o, C, c, l, g) = λ_conf · L_conf(o, c) + λ_cla · L_cla(O, C) + λ_loc · L_loc(l, g)   (11)

2.7): First, the total loss function of the model is continuously reduced through steps 2.1)-2.4) so as to update the model weights, yielding trained weight parameters. Then, the updated model weight parameters are imported into the YOLOv5 pruning model. Finally, the image information of the object on the grabbing operation table collected in step 1 is used as the input of the network model, and the output is the center coordinate point and the label value of the object in the image.

3. The six-degree-of-freedom real-time grabbing method for the mechanical arm based on deep reinforcement learning according to claim 1, characterized in that step 3 comprises the following specific steps:

3.1): The network performs forward inference through the following formulas:

r(s, a) = E[ r_t | s_t = s, a_t = a ]   (12)

J(θ) = E_{τ∼π_θ}[ Σ_t r(s_t, a_t) ]   (13)

d^π(s) = Σ_{t≥0} γ^t · Pr{ s_t = s | s_0, π }   (14)

Q^π(s, a) = E[ Σ_{k≥0} γ^k · r_{t+k} | s_t = s, a_t = a ]   (15)

where equation (12) represents the expected reward under state s and action a, in which a_t denotes the action taken at time t, s_t denotes the state at time t, and r_t denotes the reward at time t; equation (13) represents the total reward function of the network; equation (14) is the state distribution function; equation (15) represents the state-action function.

3.2.1): Designing the reinforcement learning network loss function using the cross entropy loss; the formulas are as follows:

∇_θ J(θ) = E_{τ∼π_θ}[ R(τ) · ∇_θ ln Pr{ τ; θ } ]   (16)

∇_θ J(θ) = E_{τ∼π_θ}[ R(τ) · Σ_t ∇_θ ln π(s_t, a_t) ]   (17)

where τ = s_0 a_0 s_1 a_1 ... s_n a_n ... is the Markov process (trajectory) and R(τ) is its total reward.

Since the state-transition probabilities do not depend on θ and Pr{ a | s } = π(s, a), equation (17) can be obtained from equation (16).

3.2.2): The weight update function is as follows:

Δω ∝ Σ_s d^π(s) Σ_a π(s, a) · [ Q^π(s, a) - f_ω(s, a) ] · ∇_ω f_ω(s, a)   (18)

where f_ω: S × A → R is an approximation function of Q^π(s, a); when Δω = 0 at the minimum, equation (19) can be derived:

∇_θ J(θ) = Σ_s d^π(s) Σ_a ∇_θ π(s, a) · f_ω(s, a)   (19)

3.2.3): When the compatibility condition of equation (20) is satisfied,

∇_ω f_ω(s, a) = ∇_θ π(s, a) / π(s, a)   (20)

the final loss function is obtained through the weighting coefficients, as follows:

L(θ) = - Σ_s d^π(s) Σ_a π(s, a) · ln π(s, a) · f_ω(s, a)   (21)

3.3): First, the network model is designed according to the above formulas: the feature extraction network consists of convolutional layers, Batch Normalization layers, Max pooling layers and a fully connected layer. Then, the model weights are updated by reducing the loss function in step 3.2.3), yielding trained weight parameters, and the updated weight parameters are imported into the reinforcement learning network model. Next, the color image and the depth image obtained in step 1 are scaled and normalized so that the formats of the two images meet the input requirements of the reinforcement learning network. The two tensors output by the feature extraction network are spliced transversely through the concat operation of PyTorch and sent into a network consisting of Batch Normalization and convolutional layers, which outputs a feature probability heat map of size 12544.

3.4): Finally, the output tensor is reshaped into 16 heat maps of size 28 × 28, and the coordinates of the maximum-probability grabbing point are found, i.e., a group of three-dimensional coordinates is output.

3.5): Each element in the output three-dimensional array is converted into the angle by which the end of the mechanical arm rotates around the x, y and z coordinate axes respectively; the specific conversion formulas are as follows:

a_x = ((best_pix_ind[0] - 14) * 30 / 28) - π   (22)

b_y = ((best_pix_ind[1] - 14) * 30 / 28)   (23)

r_z = (best_pix_ind[2] * 180 / 16)   (24)

where a_x represents the rotation angle of the end of the mechanical arm around the x-axis, i.e., the roll angle of the end effector; b_y represents the rotation angle of the end of the mechanical arm around the y-axis, i.e., the pitch angle of the end effector; and r_z represents the rotation angle of the end of the mechanical arm around the z-axis, i.e., the yaw angle of the end effector.

4. The six-degree-of-freedom real-time grabbing method for the mechanical arm based on deep reinforcement learning according to claim 1, characterized in that step 4 comprises the following specific steps:

First, the 6 joint angles of the mechanical arm in the current state are solved through robot inverse kinematics. Then, the object center coordinates obtained by the YOLOv5 recognition module in step 2 and the three-dimensional rotation of the mechanical arm end output by the reinforcement learning network in step 3 are fed into the robot forward kinematics, which yields the motion trajectory along which the mechanical arm moves to the target point with the grabbing posture of the end effector; the end effector is then controlled to close the gripper and attempt the grabbing action. When the grab succeeds, the reinforcement learning network returns 1; when the grab fails, the reinforcement learning network returns 0.

5. The six-degree-of-freedom real-time grabbing method for the mechanical arm based on deep reinforcement learning according to claim 1, characterized in that step 5 comprises the following steps:

Step 4 is performed continuously to obtain a series of reinforcement learning network return values, and the model weight parameters are continuously updated by reducing the loss function of the reinforcement learning model. Finally, the trained weight parameters are imported into the model, and step 4 is repeated continuously to complete the six-degree-of-freedom real-time detection and grabbing task of the mechanical arm.

Technical Field

The invention belongs to the technology of real-time object grabbing by a mechanical arm based on deep reinforcement learning, and particularly relates to the YOLOv5 pruning network, Kinevt forward and inverse kinematics, CoppeliaSim Edu simulation software and the Policy Gradient reinforcement learning strategy.

Background

Grabbing is a fundamental and important problem in robotics; although it is critical, solutions have long been unsatisfactory. With the rapid development of deep learning and reinforcement learning in recent years, however, many feasible ideas have emerged for intelligent mechanical arm grabbing. Real-time target detection is a popular research topic in computer vision, covering the design of lightweight target detection networks, the production of target data sets, the study of model deployment carriers, and so on. One of its most direct applications lies in accurate and fast intelligent sorting, such as robot intelligent sorting on an unmanned assembly line.

In an unmanned robot intelligent sorting environment, how to obtain a suitable grabbing posture for the mechanical arm has long been a major problem standing in the way of automatically grabbing a target object. In the early days of research, Antonio Bicchi and Vijay Kumar et al. worked on finding the appropriate grabbing posture of the robotic arm through traditional physical posture analysis (Antonio Bicchi and Vijay Kumar. "Robotic grasping and contact: A review". In: IEEE International Conference on Robotics and Automation (ICRA). Vol. 1. IEEE. 2000, pp. 348-353.). However, these methods based on physical analysis not only require the calculation of large amounts of experimental data, which inevitably costs much time and computation, but also require accurate object models, which are not always available. It is difficult to apply these algorithms to target objects that are not recorded in the data set.

With the development of deep learning and computer vision, Lerrel Pinto and Abhinav Gupta et al. proposed data-driven, learning-based methods to solve this problem (Lerrel Pinto and Abhinav Gupta. "Supersizing self-supervision: Learning to grasp from 50k tries and 700 robot hours". In: 2016 IEEE International Conference on Robotics and Automation (ICRA). IEEE. 2016, pp. 3406-3413.). These methods were first based on grabbing in a two-dimensional plane: Sulabh Kumra and Christopher Kanan et al. generate the corresponding two-dimensional grabbing posture from the grabbing postures in a learning data set, obtaining high accuracy on two-dimensional grabbing benchmarks. However, the two-dimensional planar grabbing model places many restrictions on the grabbing posture: the gripper at the end of the mechanical arm can only approach the object from directly above, and in practical applications this single grabbing direction greatly limits intelligent grabbing; for example, it is difficult for the gripper to grab a horizontally placed wooden board.

Thus, the idea of 6-degree-of-freedom (6-DOF) grasping for the mechanical arm was proposed. Although the 6D pose estimation proposed by Sida Peng et al. can realize 6-DOF grabbing of objects in a data set, the success rate on objects not recorded in the data set is low, so the method cannot be generalized to new application scenarios (Sida Peng et al. "PVNet: Pixel-wise voting network for 6DoF pose estimation". In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019, pp. 4561-4570.). The PointNetGPD proposed by Hongzhuo Liang uses a two-step sampling-evaluation method to determine a reliable grabbing pose by evaluating a large number of samples; however, this method is quite time-consuming (Hongzhuo Liang et al. "PointNetGPD: Detecting grasp configurations from point sets". In: 2019 International Conference on Robotics and Automation (ICRA). IEEE. 2019, pp. 3629-3635.). Florence et al. perform posture transfer from existing grabbing postures, but such methods have relatively low success rates when facing unknown target objects or objects whose geometry is not similar to the data set (Peter Florence, Lucas Manuelli, and Russ Tedrake. "Dense Object Nets: Learning Dense Visual Object Descriptors By and For Robotic Manipulation". In: Conference on Robot Learning (CoRL) (2018).). Mousavian et al. input a partial point-cloud view captured by an RGBD camera into a neural network and output a 6-DOF grabbing pose; however, owing to potential failures in sensor storage and transmission, the three-dimensional point-cloud depth data input to the network is less stable than conventional two-dimensional RGB image data. Therefore, a mechanical arm real-time target detection and grabbing system combining real-time target detection, reinforcement learning, forward and inverse kinematics and related modules has become a problem to be solved.

Disclosure of Invention

The invention overcomes the defects of the prior art and provides a six-degree-of-freedom real-time grabbing method for a mechanical arm that is easy to implement and highly applicable. By establishing a YOLOv5 pruning network and a Policy Gradient reinforcement learning model, the invention ensures high precision while realizing fast, real-time target detection and grabbing actions.

The invention takes an image sequence as input. First, a YOLOv5 pruning model is used to perform target detection and recognition on each frame; the model consists of convolutional layers, Batch Normalization (BN) layers, LeakyReLU layers and upsampling layers, and the structure of the network model is shown in FIG. 1. When constructing the network model, the channels of the convolutional layers are pruned: the channels in each convolutional layer are ranked by importance, using the magnitude of the γ parameter in the BN layer as the index measuring each channel's importance; a percentage pruning threshold is set, and channels whose importance falls below the threshold are cut. The pruned neural network model is then trained and its parameters fine-tuned; the pruning and fine-tuning steps are repeated, and pruning stops once the target indexes are reached. YOLOv5 uses Mosaic data augmentation to improve the training speed and accuracy of the model, and adopts adaptive anchor-box computation and adaptive image scaling. An Intel RealSense D415 binocular camera is installed at the end of the mechanical arm; object image information on the operating table is collected through the binocular camera and fed into the pruned YOLOv5 model to obtain the center coordinate point and label of the grabbed object. Then, the color and depth image information acquired by the binocular camera is normalized, and the result is sent into the trained reinforcement learning network, which outputs a grabbing confidence and the maximum-probability grabbing point; the image grabbing point is converted into the angles by which the end of the mechanical arm must rotate around the coordinate axes, i.e., the two-dimensional image information is converted into a three-dimensional hemispherical grabbing-angle map, as shown in FIG. 2. The object center coordinate point output by the YOLOv5 pruning model and the three rotation angles output by the reinforcement learning network are input into the robot forward kinematics to obtain the corresponding mechanical arm motion trajectory and complete the grabbing action.
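To make the channel-pruning rule concrete, the following is a minimal PyTorch sketch, not the patented implementation itself: it collects the BN γ (scale) parameters across the model, derives a percentile threshold, and builds per-layer keep masks. The module traversal and the example 30% pruning ratio are assumptions for illustration.

```python
import torch
import torch.nn as nn

def bn_gamma_prune_masks(model: nn.Module, prune_ratio: float = 0.3):
    """Rank channels by the |gamma| of their BN layer; mask the lowest fraction."""
    gammas = torch.cat([m.weight.data.abs().flatten()
                        for m in model.modules()
                        if isinstance(m, nn.BatchNorm2d)])
    threshold = torch.quantile(gammas, prune_ratio)  # percentage threshold
    masks = {}
    for name, m in model.named_modules():
        if isinstance(m, nn.BatchNorm2d):
            # Keep only channels whose importance (|gamma|) exceeds the threshold.
            masks[name] = m.weight.data.abs() > threshold
    return masks
```

After the masked channels are physically removed to obtain a slimmer network, the model is retrained and fine-tuned, and the prune/fine-tune cycle repeats until the indexes are reached, as the paragraph above describes.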

The invention relates to a mechanical arm six-degree-of-freedom real-time grabbing method based on deep reinforcement learning, which comprises the following specific steps:

Step 1: The image information of the object on the grabbing operation table is acquired through a binocular camera:

First, an Intel D415 depth camera is vertically fixed at the end of the mechanical arm so that it can acquire complete image information of the object on the grabbing operation table.

Step 2: and (3) carrying out target detection training on the image by using a YOLOv5 pruning network model:

step 2.1: when a YOLOv5 pruning network model is designed, the depth of a YOLOv5 main network is deepened by considering a residual error structure shortcut design in a ResNet network, the downsampling of a convolutional layer is realized by setting a step length parameter in the convolutional layer, and the identification accuracy of the YOLOv5 network is improved. The next but three convolutional layers last used for prediction are followed by a Batch Normalization (BN) operation, followed by a Leaky _ Relu activation function. A top-down characteristic pyramid multi-scale model structure is adopted, and three characteristic graphs output by a network are fused through an upsampling operation, so that the purpose of multi-scale prediction is achieved.

Step 2.2: In order to accelerate the regression of the prediction box, the prediction formulas in the forward inference of the model network are as follows:

b_y = σ(t_y) + c_y   (1)

b_x = σ(t_x) + c_x   (2)

b_x, b_y are the relative center coordinates of the prediction box on the feature map of the corresponding size; b_w, b_h are the width and height of the prediction box; c_x, c_y are the upper-left corner coordinates of the grid cell in the output feature map; p_w, p_h are the width and height of the anchor box; t_x, t_y are the predicted coordinate offset values; and t_w, t_h are the predicted scale factors;

step 2.3: designing a YOLOv5 Loss function, and replacing a Smooth L1 Loss function with a GIOU _ Loss function, so that the detection precision of the algorithm is further improved, the Loss function is reduced and the model weight parameters are updated by continuously training data in a data set;

step 2.4: first, the updated model weight parameters are imported into the YOLOv5 pruning model. And secondly, the image information of the object on the grabbing operation table collected in the step 1 is used as the input of a network model, and the image information is output as a central coordinate point and a label value of the object in the image.

Step 3: Establishing a reinforcement learning network model:

step 3.1: designing a reinforcement learning network loss function, and calculating a cross entropy loss function, thereby further improving the detection precision of the algorithm;

step 3.1.1: designing a target confidence coefficient loss function;

step 3.1.2: establishing a weight updating function;

step 3.1.3: obtaining a final loss function through the weight coefficient;

step 3.2: the reinforcement learning network is composed of a plurality of feature extraction networks. Firstly, performing feature extraction on the color picture and the depth information obtained in the step 1 by utilizing a multilayer convolutional neural network to respectively obtain two tensors of color and depth. And then splicing the two tensors by using the concat of the Pythroch, and sending the two tensors into an ordered container consisting of BatchNormalization (BN) and a convolution layer to obtain the capture probability feature tensor.

Step 3.3: Finally, the output tensor is reshaped into 16 heat maps of size 28 × 28, and the coordinates of the maximum-probability grabbing point are found, i.e., a group of three-dimensional coordinates is output.

Step 3.4: Each element in the array is converted into the angle by which the end of the mechanical arm rotates around the x, y and z coordinate axes respectively; the specific conversion formulas are as follows:

a_x = ((best_pix_ind[0] - 14) * 30 / 28) - π   (22)

b_y = ((best_pix_ind[1] - 14) * 30 / 28)   (23)

r_z = (best_pix_ind[2] * 180 / 16)   (24)

where a_x represents the rotation angle of the end of the mechanical arm around the x-axis, i.e., the roll angle of the end effector; b_y represents the rotation angle of the end of the mechanical arm around the y-axis, i.e., the pitch angle of the end effector; and r_z represents the rotation angle of the end of the mechanical arm around the z-axis, i.e., the yaw angle of the end effector.

Step 4: Completing the grabbing movement of the mechanical arm through robot forward and inverse kinematics:

First, the 6 joint angles of the mechanical arm in the current state are solved through robot inverse kinematics. Then, the object center coordinates obtained by the YOLOv5 recognition module in step 2 and the three-dimensional rotation of the mechanical arm end output by the reinforcement learning network in step 3 are fed into the robot forward kinematics, which yields the motion trajectory along which the mechanical arm moves to the target point with the grabbing posture of the end effector; the end effector is then controlled to close the gripper and attempt the grabbing action. When the grab succeeds, the reinforcement learning network returns 1; when the grab fails, the reinforcement learning network returns 0.

Step 5: Performing reinforcement learning model training so that the mechanical arm completes the grabbing action:

Step 4 is performed continuously to obtain a series of reinforcement learning network return values, and the model weight parameters are continuously updated by reducing the loss function of the reinforcement learning model. Finally, the trained weight parameters are imported into the model, and step 4 is repeated to complete the six-degree-of-freedom real-time detection and grabbing task of the mechanical arm.

In conclusion, the advantage of the method is that the neural network channels are pruned on the basis of the existing high-precision detection of the YOLOv5 recognition model, reducing the computation and storage of the neural network without harming model performance. Meanwhile, a reinforcement learning network is designed for this method, overcoming the complex computation and high time cost of deriving mechanical arm grabbing postures through traditional physical analysis, and solving the problem that 6-DOF grabbing postures cannot be applied to target objects not recorded in the data set. The method not only ensures a high grabbing success rate of the mechanical arm model but also benefits from the generalization of reinforcement learning, i.e., it can be applied to new grabbed objects; it likewise avoids the time-consuming calculation of traditional methods and reduces the instability of inputting partial point-cloud models. The invention realizes real-time detection of the grabbed object and the function of 6-DOF grabbing.

Drawings

FIG. 1 is a block diagram of the YOLOv5 model in accordance with the present invention;

FIG. 2 is a three-dimensional hemispherical view of an end effector according to the present invention;

FIG. 3 is a training flow chart of YOLOv5 in the present invention;

FIG. 4 is a flow diagram of a reinforcement learning network in accordance with the present invention;

fig. 5 is a flow chart of the real-time detection and grasping of the robotic arm in the present invention.

Detailed Description

The invention is further described below with reference to the accompanying drawings.

The invention discloses a mechanical arm real-time grabbing method based on the YOLOv5 pruning network and reinforcement learning; the specific process is as follows:

Step 1: The image information of the object on the grabbing operation table is acquired through a binocular camera: first, an Intel D415 depth camera is vertically fixed at the end of the mechanical arm so that it can acquire complete image information of the object on the grabbing operation table.

Step 2, carrying out target detection training on the image by using a YOLOv5 pruning network model;

step 2.1: since, in theory, the deeper the network, the better its performance. However, experiments show that the derivative of the activation HAN function is needed in the back propagation process, and if the derivative is greater than 1, the gradient update will increase towards an exponential explosion mode as the number of network layers increases, i.e. gradient explosion; if the derivative is less than 1, the gradient update information decreases towards an exponential decay mode as the number of network layers increases, i.e. the gradient disappears. In order to reduce the possibility of gradient explosion and gradient disappearance, the invention designs a Resnet jump layer connection structure of a reference residual error network when a Yolov5 prunes a network model, sets the step size parameter of a convolutional layer to be 2, and adds Batch Normalization (BN) to the rest convolutional layers. The CSP1_ X module consists of a CBL module, a Res _ unit module, a convolutional layer and a Concate; the CSP2_ X module is composed of a convolutional layer and X Res _ unit modules, concatees. The input layer is composed of convolution layer, Batch Normalization (BN), and Leaky _ Relu activation function. According to the size of the input color image and the size of an anchor frame in the learning data set, the purpose of self-adaptive multi-scale prediction is achieved.

Step 2.2: Through automatic learning based on the training data and the K-means clustering algorithm, YOLOv5 can relearn the anchor-box sizes even when the sizes of the target objects in the data set differ from those in the COCO data set, obtaining preset anchor boxes suitable for predicting the object boundaries in the custom data set. The prediction formulas in the forward inference of the YOLOv5 pruning network model are as follows:

b_y = σ(t_y) + c_y   (1)

b_x = σ(t_x) + c_x   (2)

The target detection box is predicted to obtain the center coordinates b_x, b_y of the prediction box relative to the current feature map and the width and height b_w, b_h of the prediction box; c_x, c_y are the upper-left corner coordinates of the grid cell in the output feature map, and p_w, p_h are the width and height of the anchor box; t_x, t_y are the coordinate offset values predicted by the network, and t_w, t_h are the scale factors predicted by the network.
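The decoding step of equations (1) and (2) is direct to code; the sketch below assumes the standard YOLO parameterization in which the box width and height are recovered as b_w = p_w · e^(t_w) and b_h = p_h · e^(t_h), which the text's scale factors imply but do not spell out:

```python
import math

def decode_box(t, cell, anchor):
    """Decode predicted offsets (tx, ty, tw, th) into a box on the feature map."""
    tx, ty, tw, th = t
    cx, cy = cell      # upper-left corner of the grid cell
    pw, ph = anchor    # anchor-box width and height
    sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
    bx = sigmoid(tx) + cx    # equation (2)
    by = sigmoid(ty) + cy    # equation (1)
    bw = pw * math.exp(tw)   # width/height decoding (standard YOLO assumption)
    bh = ph * math.exp(th)
    return bx, by, bw, bh
```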

Step 2.3: Designing the YOLOv5 Loss function: the GIOU_Loss function replaces the Smooth L1 Loss function, and the target confidence loss adopts binary cross entropy; the designed target confidence loss function is as follows:

L_conf(o, c) = - Σ_i [ o_i · ln(ĉ_i) + (1 - o_i) · ln(1 - ĉ_i) ]   (3)

ĉ_i = Sigmoid(c_i)   (4)

where o_i indicates whether a target actually exists in detection box i, and ĉ_i, obtained from the network output c_i through the Sigmoid function, represents the predicted confidence of detection box i.

Step 2.4: The target class loss function also adopts binary cross entropy; the designed target class loss function is as follows:

L_cla(O, C) = - Σ_{i∈pos} Σ_{j∈cla} [ O_ij · ln(Ĉ_ij) + (1 - O_ij) · ln(1 - Ĉ_ij) ]   (5)

Ĉ_ij = Sigmoid(C_ij)   (6)

where O_ij indicates whether the target in detection box i belongs to class j, and Ĉ_ij, obtained from the network output C_ij through the Sigmoid function, represents the Sigmoid probability of the class-j target in target detection box i.

Step 2.5: The target location loss function is an MSE loss function, as follows:

L_loc(l, g) = Σ_{i∈pos} Σ_{m∈{x,y,w,h}} ( l̂_i^m - ĝ_i^m )²   (7)

wherein:

ĝ_x = g_x - c_x,   ĝ_y = g_y - c_y   (8)

ĝ_w = ln( g_w / p_w ),   ĝ_h = ln( g_h / p_h )   (9)

l̂ = ( t_x, t_y, t_w, t_h )   (10)

where l̂ represents the coordinate offsets of the prediction box (as in YOLOv3, what the network predicts are coordinate offset values) and ĝ represents the coordinate offsets of the real box; (b_x, b_y, b_w, b_h) are the parameters of the prediction box, (c_x, c_y, p_w, p_h) are the parameters of the anchor box, and (g_x, g_y, g_w, g_h) are the parameters of the real box;

step 2.6: and adding all the loss functions through weights to obtain a total loss function:

L(O, o, C, c, l, g) = λ_conf · L_conf(o, c) + λ_cla · L_cla(O, C) + λ_loc · L_loc(l, g)   (11)
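A compact PyTorch sketch of how the three weighted components of equation (11) combine; the λ values and the use of logits-based binary cross entropy are illustrative assumptions, not values fixed by the method:

```python
import torch.nn.functional as F

def total_loss(pred_conf, tgt_conf, pred_cls, tgt_cls, pred_off, tgt_off,
               lam_conf=1.0, lam_cla=0.5, lam_loc=5.0):
    """Weighted sum of confidence, class and location losses, cf. equation (11)."""
    l_conf = F.binary_cross_entropy_with_logits(pred_conf, tgt_conf)  # eqs. (3)-(4)
    l_cla = F.binary_cross_entropy_with_logits(pred_cls, tgt_cls)     # eqs. (5)-(6)
    l_loc = F.mse_loss(pred_off, tgt_off)                             # eqs. (7)-(10)
    return lam_conf * l_conf + lam_cla * l_cla + lam_loc * l_loc
```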

step 2.7: firstly, continuously reducing the total loss function of the model through steps 2-1.2-2.4 so as to update the weight of the model, and obtaining a trained weight parameter. Then, the updated model weight parameters are imported into the YOLOv5 pruning model. And secondly, the image information of the object on the grabbing operation table collected in the step 1 is used as the input of a network model, and the image information is output as a central coordinate point and a label value of the object in the image.

Step 3, establishing a reinforcement learning model:

step 3.1: the network is forward-inferred by the following formula:

where equation (12) represents the expected reward under state s, action a, where atIndicating the action taken at time t, stIndicating the state at time t, rtRepresenting the reward at time t; equation (13) represents the total reward function of the network; equation (14) is a state distribution function; equation (15) represents a state-action function.

Step 3.2.1: Designing the reinforcement learning network loss function using the cross entropy loss; the formulas are as follows:

∇_θ J(θ) = E_{τ∼π_θ}[ R(τ) · ∇_θ ln Pr{ τ; θ } ]   (16)

∇_θ J(θ) = E_{τ∼π_θ}[ R(τ) · Σ_t ∇_θ ln π(s_t, a_t) ]   (17)

where τ = s_0 a_0 s_1 a_1 ... s_n a_n ... is the Markov process (trajectory) and R(τ) is its total reward.

Since the state-transition probabilities do not depend on θ and Pr{ a | s } = π(s, a), equation (17) can be obtained from equation (16).

Step 3.2.2: The weight update function is as follows:

Δω ∝ Σ_s d^π(s) Σ_a π(s, a) · [ Q^π(s, a) - f_ω(s, a) ] · ∇_ω f_ω(s, a)   (18)

where f_ω: S × A → R is an approximation function of Q^π(s, a); when Δω = 0 at the minimum, equation (19) can be derived:

∇_θ J(θ) = Σ_s d^π(s) Σ_a ∇_θ π(s, a) · f_ω(s, a)   (19)

Step 3.2.3: When the compatibility condition of equation (20) is satisfied,

∇_ω f_ω(s, a) = ∇_θ π(s, a) / π(s, a)   (20)

the final loss function is obtained through the weighting coefficients, as follows:

L(θ) = - Σ_s d^π(s) Σ_a π(s, a) · ln π(s, a) · f_ω(s, a)   (21)
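In practice the surrogate loss of equation (21) is minimized by gradient descent; below is a minimal REINFORCE-style PyTorch sketch. The use of the grab return value as the reward signal follows step 4; the function interface itself is an assumption for illustration:

```python
import torch

def reinforce_step(optimizer, log_probs, returns):
    """One policy-gradient update: L = -mean( log pi(a|s) * R ), cf. equation (21)."""
    loss = -(torch.stack(log_probs)
             * torch.as_tensor(returns, dtype=torch.float32)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```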

step 3.3: firstly, a network model is designed according to the formula: the feature extraction network consists of a convolutional layer, a Batch Normalization layer, a Max boosting pooling layer, and a full connection layer. Then, the model weight is updated by reducing the loss function in step 3-2.3, resulting in trained weight parameters. And then, importing the updated weight parameters into the reinforcement learning network model. Secondly, the color image and the depth image obtained in the step 1 are subjected to scaling and normalization processing, so that the formats of the two images meet the input requirements of the reinforcement learning network. And transversely splicing the two tensors output by the feature extraction network through the concat of the Pythroch, sending the two tensors into a network consisting of Batch Normalization and a convolution layer, and outputting a feature probability heat point diagram with the size of 12544.

Step 3.4: Finally, the output tensor is reshaped into 16 heat maps of size 28 × 28, and the coordinates of the maximum-probability grabbing point are found, i.e., a group of three-dimensional coordinates is output.
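Locating the maximum-probability grabbing point across the 16 heat maps reduces to an argmax; a small NumPy sketch (the [row, column, rotation-bin] ordering of the returned index is an assumption consistent with equations (22)-(24) below):

```python
import numpy as np

def best_grasp_index(heatmaps):
    """heatmaps: (16, 28, 28) array -> best_pix_ind = [row, col, rotation_bin]."""
    k, i, j = np.unravel_index(np.argmax(heatmaps), heatmaps.shape)
    return [int(i), int(j), int(k)]
```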

Step 3.5: Each element in the output three-dimensional array is converted into the angle by which the end of the mechanical arm rotates around the x, y and z coordinate axes respectively; the specific conversion formulas are as follows:

a_x = ((best_pix_ind[0] - 14) * 30 / 28) - π   (22)

b_y = ((best_pix_ind[1] - 14) * 30 / 28)   (23)

r_z = (best_pix_ind[2] * 180 / 16)   (24)

where a_x represents the rotation angle of the end of the mechanical arm around the x-axis, i.e., the roll angle of the end effector; b_y represents the rotation angle of the end of the mechanical arm around the y-axis, i.e., the pitch angle of the end effector; and r_z represents the rotation angle of the end of the mechanical arm around the z-axis, i.e., the yaw angle of the end effector.
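Equations (22)-(24) translate directly into code; a short sketch (math.pi stands for the π of equation (22)):

```python
import math

def index_to_angles(best_pix_ind):
    """Map the heat-map index to end-effector rotations, per equations (22)-(24)."""
    a_x = ((best_pix_ind[0] - 14) * 30 / 28) - math.pi  # roll about the x-axis
    b_y = ((best_pix_ind[1] - 14) * 30 / 28)            # pitch about the y-axis
    r_z = (best_pix_ind[2] * 180 / 16)                  # yaw about the z-axis
    return a_x, b_y, r_z
```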

Step 4: Completing the mechanical arm grabbing movement through robot forward and inverse kinematics:

First, the 6 joint angles of the mechanical arm in the current state are solved through robot inverse kinematics. Then, the object center coordinates obtained by the YOLOv5 recognition module in step 2 and the three-dimensional rotation of the mechanical arm end output by the reinforcement learning network in step 3 are fed into the robot forward kinematics, which yields the motion trajectory along which the mechanical arm moves to the target point with the grabbing posture of the end effector; the end effector is then controlled to close the gripper and attempt the grabbing action. When the grab succeeds, the reinforcement learning network returns 1; when the grab fails, the reinforcement learning network returns 0.
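The move-grab-report cycle of this step can be summarized as below. This is a sketch under stated assumptions: solve_ik, plan_motion, close_gripper and grasp_succeeded are hypothetical interfaces standing in for the robot inverse/forward kinematics and gripper control, which the method does not name:

```python
def attempt_grasp(robot, target_xyz, rotation_xyz):
    """Move to the target pose, try to grasp, and return the RL reward (1 or 0)."""
    joints = solve_ik(robot)                       # hypothetical: current 6 joint angles
    path = plan_motion(joints, target_xyz, rotation_xyz)  # hypothetical: FK trajectory
    robot.execute(path)                            # move to the target point
    close_gripper(robot)                           # attempt the grabbing action
    return 1 if grasp_succeeded(robot) else 0      # return value fed to the RL network
```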

Step 5: Performing reinforcement learning model training so that the mechanical arm completes the grabbing action:

Step 4 is performed continuously to obtain a series of reinforcement learning network return values, and the model weight parameters are continuously updated by reducing the loss function of the reinforcement learning model. Finally, the trained weight parameters are imported into the model, and step 4 is repeated continuously to complete the six-degree-of-freedom real-time detection and grabbing task of the mechanical arm.
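Putting steps 1-5 together, the training loop might look like the following sketch. All names here (camera, yolo_detect, robot, and the GraspNet, index_to_angles, attempt_grasp and reinforce_step sketches above) are illustrative assumptions, not the patent's actual code:

```python
import torch

net = GraspNet()
optimizer = torch.optim.SGD(net.parameters(), lr=1e-4)

for episode in range(10000):
    rgb, depth = camera.capture()           # step 1: binocular camera tensors
    target_xyz = yolo_detect(rgb)           # step 2: object center from YOLOv5
    heatmaps = net(rgb, depth)              # step 3: (1, 16, 28, 28) logits
    dist = torch.distributions.Categorical(logits=heatmaps.flatten())
    idx = dist.sample()                     # sample a grasp point / rotation bin
    flat = int(idx)
    k, i, j = flat // 784, (flat % 784) // 28, flat % 28  # unravel (16, 28, 28)
    angles = index_to_angles([i, j, k])                   # equations (22)-(24)
    reward = attempt_grasp(robot, target_xyz, angles)     # step 4: returns 1 or 0
    reinforce_step(optimizer, [dist.log_prob(idx)], [reward])  # step 5: update
```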
