Motion quality assessment method based on human body pose estimation and terminal equipment

Document No.: 1800861; publication date: 2021-11-05

Reading note: The technology "Motion quality assessment method based on human body pose estimation and terminal equipment" was designed by 张盛, 孙万虎 and 周茂林 on 2021-07-30. The invention discloses a motion quality assessment method based on human body pose estimation and a terminal device. The method comprises: receiving a video frame and performing human body detection using a fast object detection approach; cropping the detected human body region from the original video frame and feeding it to a lightweight human body pose estimation network to detect 2D human joint points; filtering the joint point detection result of each frame; evaluating motion quality from the filtered joint points; and recording the amount of exercise and giving an overall motion quality score. The invention enables real-time motion capture and evaluation on general embedded platforms with AI acceleration capability. It provides a real-time motion capture algorithm based on human pose estimation, supported by fast human body detection, filtering, action quality evaluation and automatic action counting; it monitors motion quality by comparing actions and scoring action quality, implements automatic action counting by designing three human motion states, and delivers an efficient real-time user experience.

1. A motion quality assessment method based on human body pose estimation is characterized by comprising the following steps:

S1: receiving a video frame, and detecting the pose of a human body by adopting a rapid target detection mode;

S2: cropping the detected human body part from the original image of the video frame and inputting it into a lightweight human body pose estimation network to detect the human body 2D joint points;

S3: filtering the joint point detection result of each frame;

S4: evaluating the motion quality of the filtered joint points;

S5: recording the motion amount and giving a total score of motion quality.

2. The motion quality assessment method based on human pose estimation according to claim 1, characterized in that: in the step S1, a lightweight human body detector is used to initialize a human body pose detection area, and the object position frame prediction process using the anchor point is as follows:

S11: scaling the input video frame pictures to a fixed size;

S12: dividing the fixed-size image into S × S grids and generating B rectangular frames of different scales for each grid, thereby generating S × S × B anchor points, each grid predicting the object type and the object whose center point falls within that grid;

S13: outputting the corrected values of the anchor points, filtering them to obtain an object detection result, and training the human body detector.

3. The motion quality assessment method based on human pose estimation according to claim 1, characterized in that: in the step S2, a fast human body detector is used, or the result feedback of the human body pose detection is used to perform human body position location; the two modes can be alternately applied to improve the calculation efficiency.

4. The motion quality assessment method based on human pose estimation according to claim 3, characterized in that: the fast human body detector is used for positioning the position of a human body during initialization, and after the operation is stable, the two modes are switched according to the following rules:

when the confidence coefficient of human pose detection in the initialization and operation processes is lower than a certain threshold value, providing a position area of a human body in a video frame by using a rapid human body detector;

when the confidence coefficient of the human body pose detection in the stable operation process is higher than the threshold value, the human body position is determined from historical video frames, and the computing resources of the fast human body detector are released, so that the time delay of the human body position detection process is reduced.

5. The motion quality assessment method based on human pose estimation according to claim 3, characterized in that: the calculation formula for positioning the human body by using the result feedback of the human body pose detection is as follows:

$W = (1+\lambda)\,(\max_i\{p_i(x)\} - \min_i\{p_i(x)\})$, $H = (1+\eta)\,(\max_i\{p_i(y)\} - \min_i\{p_i(y)\})$,

wherein $i = 1, 2, \dots, N$; $\lambda$ and $\eta$ are design parameters for adjusting the pose detection result to more closely approximate the detection scale of the fast human body detector; and $p_i$ is the position of the $i$-th human body joint point in the human body pose detection result;

obtaining the bounding box of the human body position, and locating the coordinate point $(x_0, y_0)$ at the upper left corner of the bounding box.

6. The motion quality assessment method based on human pose estimation according to claim 1, characterized in that: in the step S2, a deep learning network based on an hourglass network structure is used to detect the human body pose, so as to complete 2D estimation of the human body pose and obtain the 2D positions of the human body joint points.

7. The motion quality assessment method based on human pose estimation according to claim 1, characterized in that: in the step S3, the filter adapts the cut-off frequency of the low-pass filter for each new sampling point according to the estimated speed of the signal, so as to track the dynamic signal and remove signal jitter, wherein $T_e$ is the signal sampling time interval, $f_c$ is the cut-off frequency of the low-pass filter, and $(\beta, f_{c\min}, T_e)$ are the parameter values of the filter.

8. The motion quality assessment method based on human pose estimation according to claim 1, characterized in that: in the step S4, on the basis of the human body pose detection, the joint point positions and the included angles between joint vectors are used to compare the similarity of two actions in the real-time video stream;

taking the real-time online action video stream as the main branch, sampling the real-time video frames at a down-sampling rate $\gamma$ to select the action video frames to be compared, and recording the acquisition timestamp so as to match the corresponding standard action video frame; retaining the video frames $\{f_i \mid i = 1, 2, \dots, t, t+1, t+2, \dots, t+n\}$ within a time window $\tau$; selecting the standard action video frame $S_t$ according to the selected timestamp, and selecting the most similar frame among the video frames within the time window $\tau$ for comparison, wherein the similarity is defined by the Euclidean distance between the normalized joint point positions of the two types of video frames;

before video frame comparison, a normalization operation is carried out on the human body pose detection result, wherein $(x_0, y_0)$ is the selected normalized origin of coordinates; after normalization, the normalized coordinates of each key point are obtained, and the key included angles are calculated;

setting the full score of the action as $\omega$ and the score ratio of the joint point position error to the angle error as $\mu : \nu$, each of the two types of error is assigned its share of the full score; the normalized joint point position errors and angle errors are passed through an error scoring function to obtain the joint point weights $\delta_i$ and the angle weights $\Theta_i$, from which the overall score is obtained.

9. The motion quality assessment method based on human pose estimation according to claim 1, characterized in that: in the step S4, evaluating the motion quality of the filtered joint points includes: action counting, action correction and action suggestion.

10. The motion quality assessment method based on human pose estimation according to claim 1, characterized in that: in the step S5, three human motion states, denoted 0, 1 and 2, are designed; a transition from state 0 to state 1 is recorded as one state transition; a transition from state 1 to state 2 completes one cycle; each time the state transition process 0 → 1 → 2 is completed, one movement is counted and the state machine is reset to state 0, so that the count accumulates; wherein the state transition is determined from the difference between the position coordinates of the joint point between the preceding and following three frames.

11. A terminal device for motion quality assessment based on human pose estimation, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1-10 when executing the computer program.

Technical Field

The invention relates to the field of motion quality evaluation, in particular to a motion quality evaluation method based on human body pose estimation and a terminal device.

Background

With rapid economic and cultural development, people increasingly pursue a better physical and mental life and pay more attention to physical exercise, whether to cope with heavy work pressure or to stay healthy. At present, fitness activities are generally carried out in dedicated gyms, where individuals can hire professional fitness coaches for guidance; this carries a certain economic cost and, more importantly, a considerable time cost. With the rapid development of artificial intelligence, it is now entirely feasible to create an AI fitness coach that serves as an individual's personal coach, and, most importantly, such an always-available coach can manage and guide an individual's health more scientifically.

However, because artificial intelligence models are generally deployed on GPU servers, motion capture and motion quality assessment based on human pose estimation cannot at present be carried out in real time.

Disclosure of Invention

In order to remedy the defect that motion quality assessment based on human body pose estimation in the prior art is not real-time, the invention provides a motion quality assessment method based on human body pose estimation and a terminal device.

The technical problem of the invention is solved by the following technical scheme:

The invention provides a motion quality assessment method based on human body pose estimation, comprising the following steps: S1: receiving a video frame, and detecting the pose of a human body by adopting a rapid target detection mode; S2: cropping the detected human body part from the original image of the video frame and inputting it into a lightweight human body pose estimation network to detect the human body 2D joint points; S3: filtering the joint point detection result of each frame; S4: evaluating the motion quality of the filtered joint points; S5: recording the motion amount and giving a total score of motion quality.

In some embodiments, in step S1, a lightweight human body detector is used to initialize the human body pose detection area, and the object position frame prediction process using anchor points includes: S11: scaling the input video frame pictures to a fixed size; S12: dividing the fixed-size image into S × S grids and generating B rectangular frames of different scales for each grid, thereby generating S × S × B anchor points, each grid predicting the object type and the object whose center point falls within that grid; S13: outputting the corrected values of the anchor points, filtering them to obtain an object detection result, and training the human body detector.

In some embodiments, in the step S2, the human body position is located by using a fast human body detector, or by using the result feedback of the human body pose detection; the two modes can be alternately applied to improve the calculation efficiency.

In some embodiments, the fast human body detector is used to locate the human body during initialization, and after operation becomes stable, the two modes are switched according to the following rules: when the confidence of human pose detection during initialization or operation is lower than a certain threshold, the fast human body detector provides the position area of the human body in the video frame; when the confidence of human pose detection during stable operation is higher than the threshold, the human body position is determined from historical video frames and the computing resources of the fast human body detector are released, thereby reducing the latency of human body position detection.

In some embodiments, the calculation formula for human body positioning using the result feedback of human body pose detection is: $W = (1+\lambda)\,(\max_i\{p_i(x)\} - \min_i\{p_i(x)\})$, $H = (1+\eta)\,(\max_i\{p_i(y)\} - \min_i\{p_i(y)\})$, wherein $i = 1, 2, \dots, N$; $\lambda$ and $\eta$ are design parameters for adjusting the pose detection result to more closely approximate the detection scale of the fast human body detector; and $p_i$ is the position of the $i$-th human body joint point in the human body pose detection result. This gives the bounding box of the human body position and locates the coordinate point $(x_0, y_0)$ at the upper left corner of the bounding box.

In some embodiments, in the step S2, a deep learning network based on an hourglass network structure is used to detect the human body pose, so as to complete 2D estimation of the human body pose and obtain the 2D positions of the human body joint points.

In some embodiments, in the step S3, the filter adapts the cut-off frequency of the low-pass filter for each new sampling point according to the estimated speed of the signal, so as to track the dynamic signal and remove signal jitter, wherein $T_e$ is the signal sampling time interval, $f_c$ is the cut-off frequency of the low-pass filter, and $(\beta, f_{c\min}, T_e)$ are the parameter values of the filter.

In some embodiments, in the step S4, on the basis of the human body pose detection, the joint point positions and the included angles between joint vectors are used to compare the similarity of two actions in the real-time video stream. The real-time online action video stream is taken as the main branch; the real-time video frames are sampled at a down-sampling rate $\gamma$ to select the action video frames to be compared, and the acquisition timestamp is recorded to match the corresponding standard action video frame. The video frames $\{f_i \mid i = 1, 2, \dots, t, t+1, t+2, \dots, t+n\}$ within a time window $\tau$ are retained. The standard action video frame $S_t$ is selected according to the selected timestamp, and the most similar frame among the video frames within the time window $\tau$ is selected for comparison, wherein the similarity is defined by the Euclidean distance between the normalized joint point positions of the two types of video frames. Before video frame comparison, a normalization operation is carried out on the human body pose detection result, wherein $(x_0, y_0)$ is the selected normalized origin of coordinates; after normalization, the normalized coordinates of each key point are obtained, and the key included angles are calculated. The full score of the action is set to $\omega$ and the score ratio of the joint point position error to the angle error to $\mu : \nu$, so that each of the two types of error is assigned its share of the full score; the normalized joint point position errors and angle errors are passed through an error scoring function to obtain the joint point weights $\delta_i$ and the angle weights $\Theta_i$, from which the overall score is obtained.

In some embodiments, in the step S4, evaluating the motion quality of the filtered joint points includes: action counting, action correction and action suggestion.

In some embodiments, in the step S5, three human motion states, denoted 0, 1 and 2, are designed; a transition from state 0 to state 1 is recorded as one state transition; a transition from state 1 to state 2 completes one cycle; each time the state transition process 0 → 1 → 2 is completed, one movement is counted and the state machine is reset to state 0, so that the count accumulates; wherein the state transition is determined from the difference between the position coordinates of the joint point between the preceding and following three frames.

The invention also provides a terminal device for motion quality assessment based on human body pose estimation, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, and is characterized in that the processor implements the steps of any one of the methods when executing the computer program.

Compared with the prior art, the invention has the following advantages: through fast human body detection, filtering and action quality evaluation, the invention provides a real-time motion capture algorithm based on human body pose estimation as technical support, and monitors motion quality by comparing actions and scoring action quality; a lightweight human body detector is used to initialize the human body pose detection area, which is more compatible with embedded AI devices, so that real-time motion capture and evaluation can be carried out on general embedded platforms with AI acceleration capability, realizing an efficient real-time user experience.

In some embodiments, the beneficial effects of the present invention compared to the prior art include: the human body is located in two modes, either with the fast human body detector or with feedback from the human body pose detection result, and applying the two modes alternately improves computational efficiency.

In some embodiments, the beneficial effects of the present invention compared to the prior art include: the automatic counting of the actions is realized by designing three human motion states.

Drawings

FIG. 1 is a schematic flow diagram of an embodiment of the present invention;

FIG. 2 is a network structure diagram of a feature extraction based backbone network according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of human joint points and angle selection according to an embodiment of the present invention;

FIG. 4 is a scoring function image of an embodiment of the present invention;

FIG. 5 is a diagram illustrating an automatic counting function based on a state machine according to an embodiment of the present invention.

Detailed Description

The invention will be further described with reference to the accompanying drawings and preferred embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.

It should be noted that the terms of orientation such as left, right, up, down, top and bottom in the present embodiment are only relative concepts to each other or are referred to the normal use state of the product, and should not be considered as limiting.

The real-time motion quality assessment method, which can run on embedded devices with AI acceleration, comprises the following specific steps: S1: receiving a video frame, and detecting the pose of a human body by adopting a rapid target detection mode; S2: cropping the detected human body part from the original image of the video frame and inputting it into a lightweight human body pose estimation network to detect the human body 2D joint points; S3: filtering the joint point detection result of each frame; S4: evaluating the motion quality of the filtered joint points; S5: recording the motion amount and giving a total score of motion quality. The overall flow is shown in FIG. 1.

Fast human body detection. In order to achieve an efficient real-time user experience while extracting the human foreground and improving the stability of human body pose detection when the algorithm program is initialized, a lightweight human body detector is used to initialize the human body pose detection area, which is also more compatible with embedded AI devices. The very lightweight object detection algorithm YOLOv4-tiny is adopted; it can reach a real-time detection speed of 371 fps on a 1080Ti GPU while maintaining high accuracy, and since the target application scene is not very complex, the algorithm meets the actual design requirements. YOLOv4-tiny uses an optimized, compressed CSPDarknet53 as the backbone network for feature extraction and uses CSPBlock modules to fuse spatial information within the backbone; its network structure is shown in FIG. 2. YOLOv4-tiny predicts object position frames using anchor points. The prediction process is as follows: first, the input picture is scaled to a fixed size; the fixed-size image is then divided into S × S grids, and B rectangular frames of different scales are generated for each grid, giving S × S × B anchor points in total, where each grid is only responsible for predicting the type and the frame of objects whose center point falls within that grid; finally, the anchor correction values output by the network are filtered by the NMS (non-maximum suppression) algorithm to obtain the final object detection result. Only a human body detector needs to be trained.
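The anchor-count arithmetic and the NMS filtering step described above can be sketched as follows. This is a minimal illustration in plain Python and not the YOLOv4-tiny implementation; the box format (x1, y1, x2, y2), the function names and the 0.5 IoU threshold are assumptions made for the example.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2); assumed box format


def anchor_count(s: int, b: int) -> int:
    """Number of anchors for an S x S grid with B frames per grid cell."""
    return s * s * b


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy non-maximum suppression; returns the indices of kept boxes."""
    # Visit boxes from the highest score to the lowest.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap a kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

For a person-only detector, the surviving box with the highest score would typically be taken as the initial human body pose detection area.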

Cropping the human body target frame. In order to further optimize the program code, reduce computational overhead and improve real-time performance, the method provides two modes for locating the human body: the first uses the fast human body detector to locate the human body; the second uses feedback from the human body pose detection result. The two modes can be applied alternately in the program, which can greatly improve its computational efficiency. The fast human body detector is used to locate the human body when the program is initialized; once the program runs stably, the two localization modes are switched according to the following rules:

When the confidence of human pose detection falls below a certain threshold, during initialization or while the program is running, the fast human body detector provides the position area of the human body in the video frame. While the program is running stably and the confidence of human pose detection is above the threshold, the human body position is determined from historical video frames; the computing resources of the fast human body detector are then released, which improves efficiency and reduces the latency of human body position detection. The pose-feedback localization formula is: $W = (1+\lambda)\,(\max_i\{p_i(x)\} - \min_i\{p_i(x)\})$, $H = (1+\eta)\,(\max_i\{p_i(y)\} - \min_i\{p_i(y)\})$, where $i = 1, 2, \dots, N$; $\lambda$ and $\eta$ are adjustable design parameters that bring the pose detection result closer to the detection scale of the fast human body detector; and $p_i$ is the position of the $i$-th human body joint point in the pose detection result. This gives the bounding frame of the human body position, and at the same time the coordinate point $(x_0, y_0)$ at its upper left corner is located.
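The switching rule and the pose-feedback bounding frame can be sketched as below. The width and height follow the formulas for W and H given above; the placement of $(x_0, y_0)$ is an assumption (the enlarged frame is centred on the tight joint frame), because the corresponding formula is not reproduced in the text. The function names and the default values for the threshold, λ and η are likewise hypothetical.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def bbox_from_joints(joints: List[Point], lam: float = 0.2,
                     eta: float = 0.2) -> Tuple[float, float, float, float]:
    """Return (x0, y0, W, H) for the frame around the detected joints.

    W = (1 + lam) * (max x - min x), H = (1 + eta) * (max y - min y).
    Assumption: the extra margin is split evenly on both sides, so the
    enlarged frame stays centred on the joints.
    """
    xs = [p[0] for p in joints]
    ys = [p[1] for p in joints]
    span_x = max(xs) - min(xs)
    span_y = max(ys) - min(ys)
    w = (1.0 + lam) * span_x
    h = (1.0 + eta) * span_y
    x0 = min(xs) - lam * span_x / 2.0  # upper-left corner (assumed placement)
    y0 = min(ys) - eta * span_y / 2.0
    return x0, y0, w, h


def use_detector(pose_confidence: Optional[float],
                 threshold: float = 0.5) -> bool:
    """True when the fast detector should run for the next frame.

    None stands for initialization, when no pose result exists yet.
    """
    return pose_confidence is None or pose_confidence < threshold
```

In a frame loop, `use_detector` would decide whether to call the detector or to reuse `bbox_from_joints` on the previous frame's joints.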

Fast human body pose detection. In order to build a real-time intelligent human motion quality system, the method needs a lightweight real-time human pose estimation algorithm to detect the human pose before the overall motion quality assessment can be completed. In the method, a deep learning network based on an hourglass network structure is used to detect the human body pose and complete 2D human pose estimation, thereby obtaining the 2D positions of the human body joint points.

Filter. The human body pose detected by the lightweight deep network is a result for a single video frame. Considering the detection errors of the algorithm, the method uses the temporal information of the video frames to apply a simple correction to the pose detection result, using the One Euro filter. The One Euro filter is an adaptive first-order low-pass filter: for each new sampling point it adapts the cut-off frequency of the low-pass filter according to the estimated signal speed, so it tracks dynamic signals well with low latency and removes signal jitter. Here $T_e$ is the signal sampling time interval and $f_c$ is the cut-off frequency of the low-pass filter; the parameters $(\beta, f_{c\min}, T_e)$ are the filter's parameter values and need to be set empirically. In general, relatively small values of $(\beta, f_{c\min})$ are more effective at eliminating joint position jitter.
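A minimal sketch of a One Euro filter for one scalar coordinate is given below. It follows the published filter definition: smoothing factor $\alpha = 1 / (1 + \tau / T_e)$ with $\tau = 1 / (2\pi f_c)$, and adaptive cut-off $f_c = f_{c\min} + \beta\,|\hat{\dot{x}}|$. Whether the formula omitted from the text matches this form exactly is an assumption, and the derivative cut-off `d_cutoff` comes from the published filter rather than from the parameter list above.

```python
import math
from typing import Optional


def smoothing_factor(t_e: float, cutoff: float) -> float:
    """alpha = 1 / (1 + tau / T_e), with tau = 1 / (2 * pi * f_c)."""
    tau = 1.0 / (2.0 * math.pi * cutoff)
    return 1.0 / (1.0 + tau / t_e)


class OneEuroFilter:
    """Adaptive first-order low-pass filter for a single coordinate."""

    def __init__(self, t_e: float, f_cmin: float = 1.0, beta: float = 0.0,
                 d_cutoff: float = 1.0) -> None:
        self.t_e = t_e            # sampling interval T_e in seconds
        self.f_cmin = f_cmin      # minimum cut-off frequency
        self.beta = beta          # speed coefficient
        self.d_cutoff = d_cutoff  # cut-off used to smooth the derivative
        self.x_prev: Optional[float] = None
        self.dx_prev = 0.0

    def __call__(self, x: float) -> float:
        if self.x_prev is None:   # first sample passes through unchanged
            self.x_prev = x
            return x
        # Estimate and smooth the signal speed.
        dx = (x - self.x_prev) / self.t_e
        a_d = smoothing_factor(self.t_e, self.d_cutoff)
        dx_hat = a_d * dx + (1.0 - a_d) * self.dx_prev
        # A faster signal gets a higher cut-off, hence less lag.
        cutoff = self.f_cmin + self.beta * abs(dx_hat)
        a = smoothing_factor(self.t_e, cutoff)
        x_hat = a * x + (1.0 - a) * self.x_prev
        self.x_prev, self.dx_prev = x_hat, dx_hat
        return x_hat
```

One filter instance would be kept per joint coordinate (x and y separately) and called once per video frame.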

Action quality evaluation. In order to monitor motion quality, the method compares actions and scores action quality. On the basis of human body pose detection, the method compares action similarity using the joint point positions and the included angles between joint vectors; an example is shown in FIG. 3.

When comparing two actions in a real-time video stream, the first problem to solve is action alignment. The motion quality monitoring designed in the method compares general real-time human motion video frames with a section of standard action video frames. When a camera acquires video frames in real time and they are processed at the same time, a large and unpredictable number of frames are dropped, so the method includes a procedure for sampling and aligning actions. First, since the offline standard action video frames are preprocessed, relatively few of their frames are dropped, and the standard video covers exactly one or several cycles of the action (as short and continuous as possible). Therefore, the real-time online action video stream is taken as the main branch: real-time video frames are sampled at a down-sampling rate $\gamma$ to select the action video frames to be compared, and the acquisition timestamp is recorded to match the corresponding standard action video frame. At the same time, the regular video frames $\{f_i \mid i = 1, 2, \dots, t, t+1, t+2, \dots, t+n\}$ within a time window $\tau$ are retained, where the video frame at time $t$ is the one selected by sampling. The standard action video frame $S_t$ is selected according to the chosen timestamp, and only the most similar frame in $\{f_i \mid i = 1, 2, \dots, t, t+1, t+2, \dots, t+n\}$ needs to be selected for comparison. The similarity is defined by the Euclidean distance between the normalized joint point positions of the two types of video frames; all joint point position coordinates used in this distance are coordinates after the normalization operation.
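The sampling and alignment procedure can be sketched as follows. The exact distance formula is not reproduced in the text, so the sketch assumes the Euclidean distance between the stacked normalized joint coordinates of two frames; the helper names and the reading of the down-sampling rate γ as "every γ-th frame" are also assumptions.

```python
import math
from typing import List, Sequence, Tuple

Pose = Sequence[Tuple[float, float]]  # normalized joint positions of one frame


def sample_indices(n_frames: int, gamma: int) -> List[int]:
    """Indices of the frames selected at down-sampling rate gamma."""
    return list(range(0, n_frames, gamma))


def pose_distance(a: Pose, b: Pose) -> float:
    """Euclidean distance between two poses (assumed similarity measure)."""
    return math.sqrt(sum((ax - bx) ** 2 + (ay - by) ** 2
                         for (ax, ay), (bx, by) in zip(a, b)))


def most_similar_frame(window: List[Pose], standard: Pose) -> int:
    """Index of the frame in the time window closest to the standard frame."""
    return min(range(len(window)),
               key=lambda i: pose_distance(window[i], standard))
```

Searching only the frames retained in the time window keeps the comparison cheap while tolerating dropped frames around the sampled timestamp.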

In order to eliminate the influence of body shape and camera viewing distance, the human body pose detection result must be normalized before video frames are compared. In the normalization, $(x_0, y_0)$ is the normalized origin of coordinates, which can be chosen as required. After normalization, the normalized coordinates of each key point are obtained, and the key included angles can then be calculated in the manner shown in FIG. 3.
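A minimal sketch of the normalization and of one included-angle calculation is given below. The normalization formula is not reproduced in the text, so the sketch assumes a translation to the origin $(x_0, y_0)$ followed by division by a single scale factor (for example the bounding frame height); the included angle is taken at a middle joint between the vectors to its two neighbouring joints.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def normalize(joints: List[Point], origin: Point, scale: float) -> List[Point]:
    """Shift joints to the chosen origin and divide by a scale factor.

    Assumption: one uniform scale (e.g. frame height) is used for both axes.
    """
    x0, y0 = origin
    return [((x - x0) / scale, (y - y0) / scale) for x, y in joints]


def included_angle(a: Point, b: Point, c: Point) -> float:
    """Angle in degrees at joint b between the vectors b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(v1[0], v1[1])
    n2 = math.hypot(v2[0], v2[1])
    if n1 == 0.0 or n2 == 0.0:
        return 0.0  # degenerate case: two joints coincide
    cos_t = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to [-1, 1] to guard against rounding before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))
```

For an elbow angle, for example, `a`, `b` and `c` would be the shoulder, elbow and wrist joints. Because the angle depends only on directions, it is unchanged by the uniform scaling used in `normalize`.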

Once all comparison parameters are ready, the action quality is evaluated. The full score of the action is set to $\omega$, and the score ratio of joint point position error to angle error is $\mu : \nu$, which determines the share of the full score assigned to each of the two types of error. The image of the error scoring function designed in the method is shown in FIG. 4. The normalized joint point position errors and angle errors are passed through the scoring function to obtain the joint point weights $\delta_i$ and the angle weights $\Theta_i$, from which the overall score is obtained. The scoring function has three parameters $(k, \alpha, \beta)$, which need to be set from actual data; the same scoring function is used for joint point error and angle error, but with different parameter values.
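The score split and aggregation can be sketched as below, under several assumptions. Splitting the full score $\omega$ in the ratio $\mu : \nu$ gives the shares $\omega\mu/(\mu+\nu)$ and $\omega\nu/(\mu+\nu)$; averaging the per-joint and per-angle weights within each share is an assumed aggregation, since the overall-score formula is not reproduced in the text. The scoring function with parameters $(k, \alpha, \beta)$ is also not available, so `linear_falloff` is a purely hypothetical stand-in that maps an error to a weight in [0, 1].

```python
from typing import Callable, Sequence, Tuple

ScoreFn = Callable[[float], float]  # maps an error to a weight in [0, 1]


def split_full_score(omega: float, mu: float, nu: float) -> Tuple[float, float]:
    """Split the full score omega between position and angle in ratio mu:nu."""
    total = mu + nu
    return omega * mu / total, omega * nu / total


def linear_falloff(limit: float) -> ScoreFn:
    """Hypothetical scoring function: 1 at zero error, 0 at `limit` or more."""
    def fn(error: float) -> float:
        return max(0.0, 1.0 - abs(error) / limit)
    return fn


def overall_score(pos_errors: Sequence[float], ang_errors: Sequence[float],
                  omega: float, mu: float, nu: float,
                  pos_fn: ScoreFn, ang_fn: ScoreFn) -> float:
    """Combine joint weights and angle weights into one score (assumed form)."""
    pos_share, ang_share = split_full_score(omega, mu, nu)
    pos_weights = [pos_fn(e) for e in pos_errors]   # joint point weights
    ang_weights = [ang_fn(e) for e in ang_errors]   # angle weights
    return (pos_share * sum(pos_weights) / len(pos_weights)
            + ang_share * sum(ang_weights) / len(ang_weights))
```

Passing the scoring functions as arguments reflects the statement that position and angle errors share one functional form but use different parameter values.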

Automatic action counting. The method also provides an automatic counting function based on a state machine, as shown in FIG. 5. Considering the periodicity of human motion, three states are designed, denoted 0, 1 and 2. A transition from state 0 to state 1 is recorded as one state transition; a transition from state 1 to state 2 completes one cycle. Only after the full state transition process 0 → 1 → 2 is one movement counted; the state machine is then reset to state 0, and the count accumulates in this way. The state transitions are determined from the difference between the position coordinates of the joint point between the preceding and following three frames.
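The three-state counter can be sketched as follows. The text does not give the transition conditions in detail, so the sketch assumes that one tracked joint coordinate is compared with its value three frames earlier: a rise above a threshold triggers 0 → 1, and a fall below the negative threshold triggers 1 → 2. The class name, the threshold and the choice of a single coordinate are hypothetical.

```python
from collections import deque
from typing import Deque


class RepCounter:
    """Counts repetitions with a 0 -> 1 -> 2 state machine."""

    def __init__(self, threshold: float, lag: int = 3) -> None:
        self.threshold = threshold
        # Holds the current sample and the `lag` samples before it.
        self.history: Deque[float] = deque(maxlen=lag + 1)
        self.state = 0
        self.count = 0

    def update(self, y: float) -> int:
        """Feed one joint coordinate per frame; returns the running count."""
        self.history.append(y)
        if len(self.history) < (self.history.maxlen or 0):
            return self.count  # not enough frames yet
        diff = self.history[-1] - self.history[0]
        if self.state == 0 and diff > self.threshold:
            self.state = 1       # first half of the movement detected
        elif self.state == 1 and diff < -self.threshold:
            self.state = 2       # return movement detected: cycle complete
        if self.state == 2:
            self.count += 1      # one full 0 -> 1 -> 2 pass counts once
            self.state = 0       # reset so the count can accumulate
        return self.count
```

Requiring the full 0 → 1 → 2 sequence means that a return movement without a preceding first half is not counted.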

The terminal device for motion quality assessment based on human body pose estimation comprises a memory, a processor and a computer program which is stored in the memory and can run on the processor, wherein the processor implements the steps of any one of the above methods when executing the computer program.

The foregoing is a more detailed description of the invention in connection with specific preferred embodiments, and the invention is not to be considered limited to these specific details. For those skilled in the art to which the invention pertains, several equivalent substitutions or obvious modifications can be made without departing from the spirit of the invention, and all such variants with the same properties or uses are considered to be within the scope of the invention.
