Method and system for determining feature points based on multiple image acquisition devices

Document No.: 1833262 · Publication date: 2021-11-12

Abstract (designed and created by 张硕 and 张惠斌, 2021-07-15): The invention provides a method and a system for determining feature points based on a plurality of image acquisition devices, wherein the method comprises the following steps: generating a respective dynamic image file for each image acquisition device from the acquired dynamic images; determining a respective neural network for each dynamic image file according to a preset configuration file, and performing data processing with each file's neural network to acquire a heat map set associated with each image acquisition device; performing data fusion on the heat map sets to obtain a data-fused heat map set for each image acquisition device, and thereby obtaining three-dimensional information including the subject object; and performing image recognition on each dynamic image file to determine object information of the object objects related to the three-dimensional information, determining a feature point repair type based on that object information, and repairing the feature points of the three-dimensional information according to the repair type to determine a plurality of feature points associated with the subject object.

1. A method of determining feature points based on a plurality of image acquisition devices, the method comprising:

each image acquisition device respectively acquires a dynamic image of the subject object based on its respective reference position, and a respective dynamic image file is generated for each image acquisition device from the acquired dynamic image;

determining a respective neural network for each dynamic image file according to a preset configuration file, and performing data processing by using the respective neural network for each dynamic image file to acquire a heat map set associated with each image acquisition device;

performing data fusion on the plurality of heat map sets to obtain a data-fused heat map set for each image acquisition device, obtaining three-dimensional information including the subject object based on the data-fused heat map set for each image acquisition device; and

performing image recognition on each dynamic image file to determine object information of an object related to the three-dimensional information, determining a characteristic point repairing type based on the object information of the object, and repairing the characteristic points of the three-dimensional information according to the characteristic point repairing type to determine a plurality of characteristic points associated with the subject object.

2. The method according to claim 1, further comprising, before each image acquisition device respectively acquires a dynamic image of the subject object based on the respective reference position:

a position attribute and a direction attribute of the subject object are acquired, and a reference position is determined for each of the plurality of image acquisition apparatuses based on the position attribute and the direction attribute.

3. The method of claim 2, the location attribute comprising: the location coordinates of the subject object and/or the location area of the subject object.

4. The method of claim 2, the directional attribute comprising: a single orientation information of the subject object or a plurality of orientation information of the subject object.

5. The method of claim 2, the obtaining the location attribute and the orientation attribute of the subject object comprising:

receiving input data and analyzing the input data to determine a position attribute and a direction attribute of the subject object; or

And acquiring the positioning information of the main body object by using the positioning equipment, and determining the position attribute and the direction attribute of the main body object according to the positioning information.

6. The method of claim 2, the determining a reference location for each of a plurality of image acquisition devices based on the location attribute and the orientation attribute comprising:

determining a plurality of candidate positions for acquiring a dynamic image of the subject object based on the position attribute and the direction attribute;

determining a reference position for each image acquisition device from a plurality of candidate positions;

wherein the reference position of each image capturing device is different.

7. The method of claim 1, wherein each image acquisition device respectively acquiring the dynamic image of the subject object based on the respective reference position comprises:

each image acquisition device respectively acquires a dynamic image of the subject object at a respective predetermined photographing angle at a respective reference position; or

Each image acquisition apparatus forms a movement path based on a respective reference position, selects a photographing position by the respective movement path, and acquires a moving image of the subject object at the photographing position at the selected photographing angle, respectively.

8. The method of claim 1, wherein generating a respective dynamic image file for each image acquisition device from the acquired dynamic images comprises:

obtaining a dynamic image data stream from the dynamic image acquired by each image acquisition device;

generating a respective dynamic image file for each image acquisition device using the dynamic image data stream.

9. The method of claim 1, wherein the preset configuration file comprises a name of the neural network and parameter information of the neural network.

10. The method of claim 9, wherein determining a respective neural network for each dynamic image file according to a preset profile comprises:

determining a neural network to be used according to the name of the neural network in a preset configuration file;

carrying out parameter configuration on the neural network to be used according to the parameter information of the neural network;

determining the neural network subjected to parameter configuration as the respective neural network of each dynamic image file;

wherein the neural network of each dynamic image file is the same neural network.

11. The method of claim 1, wherein the data processing using the respective neural network for each dynamic image file to obtain the set of heat maps associated with each image acquisition device comprises:

performing data processing on the dynamic image file acquired by each image acquisition device using that file's respective neural network, so as to acquire the heat map set associated with each image acquisition device.

12. The method of claim 1, the data fusing the plurality of heat map sets to obtain a data fused heat map set for each image acquisition device comprising:

selecting in turn each heat map set of the plurality of heat map sets as a current heat map set, so as to:

determine each heat map set of the plurality of heat map sets other than the current heat map set as a fusion heat map set; and

perform data fusion on the current heat map set based on the plurality of fusion heat map sets to obtain the data-fused heat map set of each image acquisition device.

13. The method of claim 1, the obtaining three-dimensional information including a subject object based on the data-fused heat map set of each image acquisition device comprising:

identifying the feature points of the subject object based on the data-fused heat map set of each image acquisition device to obtain a plurality of two-dimensional feature points at the same moment;

calibrating intrinsic and extrinsic parameters for the two-dimensional feature points according to each image acquisition device's world coordinate system coordinates and image coordinates, and acquiring three-dimensional information including the subject object based on the intrinsic and extrinsic parameters.

14. The method of claim 1, wherein the performing image recognition on each dynamic image file to determine object information of a subject object to which the three-dimensional information relates comprises:

image recognition is performed on each of the moving image files by using an image recognition device to determine object information of an object to which the three-dimensional information relates.

15. The method of claim 1 or 14, the object information comprising: the number of object objects and the type of object objects.

16. The method of claim 1, wherein determining a feature point repair type based on object information of the object, and performing feature point repair on the three-dimensional information according to the feature point repair type to determine a plurality of feature points associated with the subject object comprises:

analyzing the object information of the object objects, and when the number of the object objects is zero, determining that the feature point repair type is no repair;

and when the feature point repair type is no repair, performing no feature point repair on the three-dimensional information, and directly determining the plurality of feature points associated with the subject object from the three-dimensional information of the subject object.

17. The method of claim 1, wherein determining a feature point repair type based on object information of the object, and performing feature point repair on the three-dimensional information according to the feature point repair type to determine a plurality of feature points associated with the subject object comprises:

analyzing the object information of the object objects to determine that the number of the object objects is not zero and the types of the object objects are auxiliary objects, and determining that the characteristic point repairing type is auxiliary object repairing;

integrally marking the subject object in the three-dimensional information by using a subject-object identification network, and extracting feature points, spatial features and temporal features of the subject object based on the integral mark;

carrying out feature point identification and feature point tracking on the object, wherein the feature point tracking comprises marking physical shape and position information on each frame, and extracting spatial features and time features of the object;

and repairing the feature points of the three-dimensional information according to the auxiliary object repair, so as to determine a plurality of feature points associated with the subject object.

18. The method of claim 17, wherein when the auxiliary object is a follow-up auxiliary object, the spatial features of the subject object and of the object in each frame of image, and their temporal features across successive frames, vary;

when the auxiliary object is a fixed auxiliary object, the spatial characteristics of the subject object in each frame of image and the temporal characteristics of the continuous frames are changed, while the object has stable spatial characteristics in each frame of image and the temporal characteristics of the continuous frames are kept consistent;

there is a fluctuation in the spatial characteristics of the subject object and the interactive portion of the object.

19. The method of claim 18, the repairing feature points of the three-dimensional information according to an auxiliary object repair to determine a plurality of feature points associated with the subject object comprises:

processing the data-fused three-dimensional information with a subject-object recognition network, wherein the subject-object recognition network is a recognition network that fuses a top-down posture recognition network with a deep-convolution-based object recognition network;

marking the subject object and the object to obtain a subject object range B_s and an object range B_o;

within the subject object range B_s, initially recognizing a plurality of parts of the subject object through a bottom-up posture recognition network to obtain a plurality of initial feature points S_parts;

within the object range B_o, marking the object as O;

extracting the spatial features F_space of each frame and the temporal features F_time of successive frames;

fusing the spatial features F_space and the temporal features F_time to determine the object type of the object, and labeling the sub-scenes of the follow-up auxiliary object and the fixed auxiliary object as s_i, where i = 1, 2; s_1 is the follow-up auxiliary object and s_2 is the fixed auxiliary object; the spatial features F_space comprise shape, volume, angle, texture, color, gradient, and location; the temporal features F_time comprise displacement, velocity, context information, and rotation;

for the s_1 sub-scene, over time there is a dynamic intersection between the s_1 sub-scene of the subject object range and the s_1 sub-scene of the object range during the time period [T_1, T_2], where T_1 is a first time and T_2 is a second time;

extracting the temporal and spatial features of the subject object and of the object, recorded respectively as the subject object spatial features, the subject object temporal features, the object spatial features, and the object temporal features;

when the object type of the auxiliary object is a follow-up auxiliary object, constructing a local interactive feature extraction operator A_switho to re-identify, over the intersection time period [T_1, T_2], the initial feature points of the plurality of occluded parts, so as to determine a plurality of feature points S'_parts associated with the subject object;

for the s_2 sub-scene, over time there is a dynamic intersection between the s_2 sub-scene of the subject object range and the s_2 sub-scene of the object range;

extracting the temporal and spatial features of the subject object and of the object, recorded respectively as the subject object spatial features, the subject object temporal features, and the object spatial features; in the s_2 scene, where the object type of the auxiliary object is a fixed auxiliary object, the object is stationary, so the object has no temporal features;

over the time period [T_1, T_2] of the dynamic intersection during which the object O enters the subject object range, for each occlusion time t_j, using the local interactive feature extraction operator A_sbyo of the subject object and the object, combined with kinematic prior knowledge K_prior, patching each frame f under occlusion (t = t_j) to determine a plurality of feature points S''_parts associated with the subject object.

20. The method of claim 1, wherein determining a feature point repair type based on object information of the object, and performing feature point repair on the three-dimensional information according to the feature point repair type to determine a plurality of feature points associated with the subject object comprises:

analyzing the object information of the object objects to determine that the number of the object objects is not zero and the types of the object objects are auxiliary objects, and determining that the characteristic point repairing type is auxiliary object repairing;

integrally marking the subject object in the three-dimensional information by using a subject-object identification network, and extracting feature points, spatial features and temporal features of the subject object based on the integral mark;

carrying out feature point identification and feature point tracking on the object, wherein the feature point tracking comprises marking physical shape and position information on each frame, and extracting spatial features and time features of the object;

and repairing the feature points of the three-dimensional information according to the auxiliary object repair, so as to determine a plurality of feature points associated with the subject object.

21. The method of claim 20, wherein the spatial characteristics of the subject object and the object in each frame of the image and the temporal characteristics of the object in successive frames vary.

22. The method of claim 1, the repairing feature points of the three-dimensional information according to an auxiliary object repair to determine a plurality of feature points associated with a subject object comprises:

processing the data-fused three-dimensional information with a subject-object recognition network, wherein the subject-object recognition network is a recognition network that fuses a top-down posture recognition network with a deep-convolution-based object recognition network;

acquiring an object range B_i for each of the subject object and the object objects, where N ≥ i ≥ 1, N is the total number of subject objects and object objects, and i is a natural number;

for each B_i, identifying the respective parts of the subject object and the object through a bottom-up posture recognition network to obtain a plurality of initial feature points, where N is the total number of subject objects and object objects and M is the total number of initial feature points of a given subject object or object; these comprise the plurality of initial feature points of the subject object and the plurality of initial feature points of the object;

constructing an interaction recognition operator A_interactive from the B_i and the initial feature points, and identifying and repairing the feature points of the subject object and the object in the interaction region based on the interaction recognition operator A_interactive;

constructing a subject-object recognition operator A_sando, and based on the subject-object recognition operator A_sando respectively calculating the spatio-temporal features of the object range B_i corresponding to each of the subject object and the object objects;

calculating from these the trajectory over time of the support points of each of the subject object and the object objects, where T is the upper limit of the time range, S denotes space, the subscript s-t denotes spatio-temporal, s (spatial) denotes space, and t (temporal) denotes time, together with the trajectories of all support points of the subject object and the object objects over time; determining the object objects and the subject object S by comparing these trajectories (the object trajectory denoted C_ot);

preliminarily reclassifying each B_i according to subject object and object to obtain the subject object range B_s and the object range B_o; in the time dimension, there is a dynamic intersection between the subject object range B_s and the object range B_o for T_2 ≥ t ≥ T_1, where T_1 is a first time and T_2 is a second time;

extracting the temporal and spatial features of both the subject object and the object within the interaction time period [T_1, T_2], recorded respectively as the subject object spatial features, the subject object temporal features, the object spatial features, and the object temporal features;

the subject-object recognition operator A_sando, on the basis of the feature point position information and the point-to-point direction information, further adds mechanical information; using the lever principle, it respectively calculates the moments and force arms of the subject object and the object within the dynamic intersection; and, combined with kinematic prior knowledge K_prior, it recalibrates the initial feature points of the subject object and the object to obtain a plurality of feature points B'_s associated with the subject object and a plurality of feature points B'_o associated with the object.

23. The method of claim 22, wherein when the number of object objects is 1, identifying and repairing the feature points of the subject object and the object in the interaction region based on the interaction recognition operator A_interactive comprises:

identifying the feature points of the subject object and the object in the interaction region using a top-down posture recognition network combined with a bottom-up posture recognition network, to obtain a subject object range B_1, an object range B_2, a plurality of initial feature points of the subject object, and a plurality of initial feature points of the object;

performing, through the interaction recognition operator A_interactive, secondary matching of the excluded feature points of B_1 and B_2 within the complementary range;

and updating to obtain the latest feature points of the subject object and of the object respectively.

24. A system for determining feature points based on a plurality of image acquisition devices, the system comprising:

a plurality of image acquisition devices, each of which respectively acquires a dynamic image of the subject object based on a respective reference position and generates a respective dynamic image file for each image acquisition device from the acquired dynamic image;

the data processing equipment is used for determining a respective neural network for each dynamic image file according to a preset configuration file, and performing data processing by using the respective neural network for each dynamic image file so as to acquire a heat map set associated with each image acquisition equipment;

a data fusion device for data-fusing the plurality of heat map sets to obtain a data-fused heat map set for each image acquisition device, and obtaining three-dimensional information including the subject object based on the data-fused heat map set for each image acquisition device; and

the image recognition device is used for performing image recognition on each dynamic image file to determine object information of an object related to the three-dimensional information, determining a characteristic point repairing type based on the object information of the object, and performing characteristic point repairing on the three-dimensional information according to the characteristic point repairing type to determine a plurality of characteristic points associated with the subject object.

25. A computer-readable storage medium, characterized in that the storage medium stores a computer program for performing the method of any of the preceding claims 1-23.

26. An electronic device, characterized in that the electronic device comprises:

a processor;

a memory for storing the processor-executable instructions;

the processor is configured to read the executable instructions from the memory and execute the instructions to implement the method of any of claims 1-23.

Technical Field

The present invention relates to the field of image recognition technology, and more particularly, to a method and system for determining feature points based on a plurality of image acquisition devices, a computer-readable storage medium, and an electronic device.

Background

Currently, posture assessment techniques in computer vision are widely used in fields such as sports events, clinical surgery simulation teaching, brain function development, and rehabilitation training. However, owing to the symmetry of the human body and the complexity of open scenes, conventional posture assessment methods fail to identify key points or feature points under conditions such as a partial field of view, unusual body positions, human-human interaction, and human-object interaction.

In rehabilitation, exercises often involve active or passive movement of local joints, such as flexion and extension of the elbow or pronation and supination of the forearm. Video acquisition typically uses a close-range partial view, so the human body information captured in the field of view is incomplete and key points or feature points cannot be identified. Rehabilitation exercises may also be performed in different positions, such as supine, lateral, or sitting. When standing or sitting in a lateral position, the symmetric structure of the human body causes self-occlusion: the side nearer the lens occludes the far side, so identification of key points or feature points on the far side fails.

In addition, rehabilitation training often requires a therapist's cooperation for passive training, and the close contact between therapist and patient causes mutual occlusion, so key point or feature point identification fails. For example, when a therapist assists a patient with passive forearm pronation and supination training, the patient adopts a lying position and the therapist adopts a seated position supporting the patient. The close contact between the two bodies causes identification of the upper-limb wrist key points or feature points, and of some lower-limb key points or feature points, to fail.

Rehabilitation training also often involves auxiliary devices such as pull cords and rebounders. Identification of key points or feature points can likewise fail when the auxiliary device occludes parts of the body during use.

The prior art currently lacks an effective solution to these key point or feature point identification failures. The present invention designs an accurate classification framework that fuses rehabilitation medicine, motion control theory, computer vision, and artificial intelligence, solving the failure of key point estimation in posture estimation caused by human self-occlusion and by mutual occlusion between humans and between humans and objects.

Disclosure of Invention

In order to solve the problems in the prior art, the present invention provides a method for determining feature points based on a plurality of image acquisition devices, the method comprising:

each image acquisition device respectively acquires a dynamic image of the subject object based on its respective reference position, and a respective dynamic image file is generated for each image acquisition device from the acquired dynamic image;

determining a respective neural network for each dynamic image file according to a preset configuration file, and performing data processing by using the respective neural network for each dynamic image file to acquire a heat map set associated with each image acquisition device;

performing data fusion on the plurality of heat map sets to obtain a data-fused heat map set for each image acquisition device, obtaining three-dimensional information including the subject object based on the data-fused heat map set for each image acquisition device; and

performing image recognition on each dynamic image file to determine object information of an object related to the three-dimensional information, determining a characteristic point repairing type based on the object information of the object, and repairing the characteristic points of the three-dimensional information according to the characteristic point repairing type to determine a plurality of characteristic points associated with the subject object.

Before each image acquisition device respectively acquires a dynamic image of the subject object based on the respective reference position, the method further comprises:

a position attribute and a direction attribute of the subject object are acquired, and a reference position is determined for each of the plurality of image acquisition apparatuses based on the position attribute and the direction attribute.

The location attributes include: the location coordinates of the subject object and/or the location area of the subject object.

The directional attributes include: a single orientation information of the subject object or a plurality of orientation information of the subject object.

The acquiring the position attribute and the direction attribute of the subject object includes:

receiving input data and analyzing the input data to determine a position attribute and a direction attribute of the subject object; or

And acquiring the positioning information of the main body object by using the positioning equipment, and determining the position attribute and the direction attribute of the main body object according to the positioning information.

The determining a reference position for each of a plurality of image acquisition devices based on the position attribute and the orientation attribute comprises:

determining a plurality of candidate positions for acquiring a dynamic image of the subject object based on the position attribute and the direction attribute;

determining a reference position for each image acquisition device from a plurality of candidate positions;

wherein the reference position of each image capturing device is different.

Each of the image acquisition devices respectively acquiring a moving image of the subject object based on the respective reference positions includes:

each image acquisition device respectively acquires a dynamic image of the subject object at a respective predetermined photographing angle at a respective reference position; or

Each image acquisition apparatus forms a movement path based on a respective reference position, selects a photographing position by the respective movement path, and acquires a moving image of the subject object at the photographing position at the selected photographing angle, respectively.

Generating a respective dynamic image file for each image acquisition device from the acquired dynamic image includes:

obtaining a dynamic image data stream from the dynamic image acquired by each image acquisition device;

generating a respective dynamic image file for each image acquisition device using the dynamic image data stream.

The preset configuration file comprises the name of the neural network and parameter information of the neural network.

The determining a respective neural network for each dynamic image file according to a preset configuration file comprises:

determining a neural network to be used according to the name of the neural network in a preset configuration file;

carrying out parameter configuration on the neural network to be used according to the parameter information of the neural network;

determining the neural network subjected to parameter configuration as the respective neural network of each dynamic image file;

wherein the neural network of each dynamic image file is the same neural network.

Processing data using the respective neural network of each dynamic image file to obtain a set of heat maps associated with each image capture device comprises:

performing data processing on the dynamic image file acquired by each image acquisition device using that file's respective neural network, so as to acquire the heat map set associated with each image acquisition device.

The data fusing the plurality of heat map sets to obtain a data fused heat map set for each image acquisition device comprises:

selecting in turn each heat map set of the plurality of heat map sets as a current heat map set, so as to:

determine each heat map set of the plurality of heat map sets other than the current heat map set as a fusion heat map set; and

perform data fusion on the current heat map set based on the plurality of fusion heat map sets to obtain the data-fused heat map set of each image acquisition device.

The obtaining three-dimensional information including the subject object based on the data-fused heat map set of each image acquisition device comprises:

identifying the feature points of the subject object based on the data-fused heat map set of each image acquisition device to obtain a plurality of two-dimensional feature points at the same moment;

calibrating intrinsic and extrinsic parameters for the two-dimensional feature points according to each image acquisition device's world coordinate system coordinates and image coordinates, and acquiring three-dimensional information including the subject object based on the intrinsic and extrinsic parameters.

The image recognition for each dynamic image file to determine the object information of the object related to the three-dimensional information includes:

image recognition is performed on each of the moving image files by using an image recognition device to determine object information of an object to which the three-dimensional information relates.

The object information includes: the number of object objects and the type of object objects.

Determining a characteristic point repairing type based on object information of the object, and repairing the characteristic point of the three-dimensional information according to the characteristic point repairing type to determine a plurality of characteristic points associated with the subject object comprises:

analyzing the object information of the object objects, and when the number of the object objects is zero, determining that the feature point repair type is no repair;

and when the feature point repair type is no repair, performing no feature point repair on the three-dimensional information, and directly determining the plurality of feature points associated with the subject object from the three-dimensional information of the subject object.

Determining a characteristic point repairing type based on object information of the object, and repairing the characteristic point of the three-dimensional information according to the characteristic point repairing type to determine a plurality of characteristic points associated with the subject object comprises:

analyzing the object information of the object objects to determine that the number of the object objects is not zero and the types of the object objects are auxiliary objects, and determining that the characteristic point repairing type is auxiliary object repairing;

integrally marking the subject object in the three-dimensional information by using a subject-object identification network, and extracting feature points, spatial features and temporal features of the subject object based on the integral mark;

carrying out feature point identification and feature point tracking on the object, wherein the feature point tracking comprises marking physical shape and position information on each frame, and extracting spatial features and time features of the object;

and repairing the feature points of the three-dimensional information according to the auxiliary object repair, so as to determine a plurality of feature points associated with the subject object.

When the auxiliary object is a follow-up type auxiliary object, the spatial characteristics of the subject object and the object in each frame of image and the temporal characteristics in the continuous frames are changed;

when the auxiliary object is a fixed auxiliary object, the spatial characteristics of the subject object in each frame of image and the temporal characteristics of the continuous frames are changed, while the object has stable spatial characteristics in each frame of image and the temporal characteristics of the continuous frames are kept consistent;

there is a fluctuation in the spatial characteristics of the subject object and the interactive portion of the object.

The repairing of the feature points of the three-dimensional information according to the auxiliary object repairing to determine a plurality of feature points associated with the subject object includes:

processing the data-fused three-dimensional information with a subject-object recognition network, wherein the subject-object recognition network is a recognition network that fuses a top-down posture recognition network with a deep-convolution-based object recognition network;

marking the subject object and the object to obtain a subject object range B_s and an object range B_o;

within the subject object range B_s, initially recognizing a plurality of parts of the subject object through a bottom-up posture recognition network to obtain a plurality of initial feature points S_parts;

within the object range B_o, marking the object as O;

extracting the spatial features F_space of each frame and the temporal features F_time of successive frames;

fusing the spatial features F_space and the temporal features F_time to determine the object type of the object, and labeling the sub-scenes of the follow-up auxiliary object and the fixed auxiliary object as s_i, where i = 1, 2; s_1 is the follow-up auxiliary object and s_2 is the fixed auxiliary object; the spatial features F_space comprise shape, volume, angle, texture, color, gradient, and location; the temporal features F_time comprise displacement, velocity, context information, and rotation;

for the s_1 sub-scene, over time there is a dynamic intersection between the s_1 sub-scene of the subject object range and the s_1 sub-scene of the object range during the time period [T_1, T_2], where T_1 is a first time and T_2 is a second time;

extracting the temporal and spatial features of the subject object and of the object, recorded respectively as the subject object spatial features, the subject object temporal features, the object spatial features, and the object temporal features;

when the object type of the auxiliary object is a follow-up auxiliary object, constructing a local interactive feature extraction operator A_switho to re-identify, over the intersection time period [T_1, T_2], the initial feature points of the plurality of occluded parts, so as to determine a plurality of feature points S'_parts associated with the subject object;

for the s_2 sub-scene, over time there is a dynamic intersection between the s_2 sub-scene of the subject object range and the s_2 sub-scene of the object range;

extracting the temporal and spatial features of the subject object and of the object, recorded respectively as the subject object spatial features, the subject object temporal features, and the object spatial features; in the s_2 scene, where the object type of the auxiliary object is a fixed auxiliary object, the object is stationary, so the object has no temporal features;

over the time period [T_1, T_2] of the dynamic intersection during which the object O enters the subject object range, for each occlusion time t_j, using the local interactive feature extraction operator A_sbyo of the subject object and the object, combined with kinematic prior knowledge K_prior, patching each frame f under occlusion (t = t_j) to determine a plurality of feature points S''_parts associated with the subject object.

Determining a characteristic point repairing type based on object information of the object, and repairing the characteristic point of the three-dimensional information according to the characteristic point repairing type to determine a plurality of characteristic points associated with the subject object comprises:

analyzing the object information of the object objects to determine that the number of the object objects is not zero and the types of the object objects are auxiliary objects, and determining that the characteristic point repairing type is auxiliary object repairing;

integrally marking the subject object in the three-dimensional information by using a subject-object identification network, and extracting feature points, spatial features and temporal features of the subject object based on the integral mark;

carrying out feature point identification and feature point tracking on the object, wherein the feature point tracking comprises marking physical shape and position information on each frame, and extracting spatial features and time features of the object;

and repairing the feature points of the three-dimensional information according to the auxiliary object repair, so as to determine a plurality of feature points associated with the subject object.

The spatial characteristics of the subject object and the object in each frame of the image and the temporal characteristics in successive frames vary.

Performing feature point restoration on the three-dimensional information according to the auxiliary object restoration to determine a plurality of feature points associated with the subject object includes:

processing the data-fused three-dimensional information with a subject-object recognition network, wherein the subject-object recognition network is a recognition network that fuses a top-down posture recognition network with a deep-convolution-based object recognition network;

acquiring an object range B_i for each of the subject object and the object objects, where N ≥ i ≥ 1, N is the total number of subject objects and object objects, and i is a natural number;

for each B_i, identifying the respective parts of the subject object and the object through a bottom-up posture recognition network to obtain a plurality of initial feature points, where N is the total number of subject objects and object objects and M is the total number of initial feature points of a given subject object or object; these comprise the plurality of initial feature points of the subject object and the plurality of initial feature points of the object;

constructing an interaction recognition operator A_interactive from the B_i and the initial feature points, and identifying and repairing the feature points of the subject object and the object in the interaction region based on the interaction recognition operator A_interactive;

constructing a subject-object recognition operator A_sando (where "sando" denotes the subject and the object), and based on the subject-object recognition operator A_sando respectively calculating the spatio-temporal features of the object range B_i corresponding to each of the subject object and the object objects;

calculating from these the trajectory over time of the support points of each of the subject object and the object objects, where T is the upper limit of the time range, S denotes space, the subscript s-t denotes spatio-temporal, s (spatial) denotes space, and t (temporal) denotes time, together with the trajectories of all support points of the subject object and the object objects over time; determining the object objects and the subject object S by comparing these trajectories (the object trajectory denoted C_ot);

preliminarily reclassifying each B_i according to subject object and object to obtain the subject object range B_s and the object range B_o; in the time dimension, there is a dynamic intersection between the subject object range B_s and the object range B_o for T_2 ≥ t ≥ T_1, where T_1 is a first time and T_2 is a second time;

extracting the temporal and spatial features of both the subject object and the object within the interaction time period [T_1, T_2], recorded respectively as the subject object spatial features, the subject object temporal features, the object spatial features, and the object temporal features;

the subject-object recognition operator A_sando, on the basis of the feature point position information and the point-to-point direction information, further adds mechanical information for each feature point j (M ≥ j ≥ 1); using the lever principle, it respectively calculates the moments and force arms of the subject object and the object within the dynamic intersection; and, combined with kinematic prior knowledge K_prior, it recalibrates the initial feature points of the subject object and the object to obtain a plurality of feature points B'_s associated with the subject object and a plurality of feature points B'_o associated with the object.

When the number of object objects is 1, identifying and repairing the feature points of the subject object and the object in the interaction region based on the interaction recognition operator A_interactive comprises:

identifying the feature points of the subject object and the object in the interaction region using a top-down posture recognition network combined with a bottom-up posture recognition network, to obtain a subject object range B_1, an object range B_2, a plurality of initial feature points of the subject object, and a plurality of initial feature points of the object;

performing, through the interaction recognition operator A_interactive, secondary matching of the excluded feature points of B_1 and B_2 within the complementary range;

and updating to obtain the latest feature points of the subject object and of the object respectively.

according to another aspect of the present invention, there is provided a system for determining feature points based on a plurality of image acquisition apparatuses, the system comprising:

a plurality of image acquisition devices, each of which respectively acquires a dynamic image of the subject object based on a respective reference position and generates a respective dynamic image file for each image acquisition device from the acquired dynamic image;

the data processing equipment is used for determining a respective neural network for each dynamic image file according to a preset configuration file, and performing data processing by using the respective neural network for each dynamic image file so as to acquire a heat map set associated with each image acquisition equipment;

a data fusion device for data-fusing the plurality of heat map sets to obtain a data-fused heat map set for each image acquisition device, and obtaining three-dimensional information including the subject object based on the data-fused heat map set for each image acquisition device; and

the image recognition device is used for performing image recognition on each dynamic image file to determine object information of an object related to the three-dimensional information, determining a characteristic point repairing type based on the object information of the object, and performing characteristic point repairing on the three-dimensional information according to the characteristic point repairing type to determine a plurality of characteristic points associated with the subject object.

According to another aspect of the invention, a computer-readable storage medium is characterized in that the storage medium stores a computer program for executing any of the methods described above.

According to another aspect of the present invention, an electronic apparatus, characterized in that the electronic apparatus comprises:

a processor;

a memory for storing the processor-executable instructions;

the processor is configured to read the executable instructions from the memory and execute the instructions to implement any of the methods described above.

The invention integrates theoretical knowledge of rehabilitation medicine, motion control, and related disciplines into computer vision posture estimation, and thereby resolves the various key point estimation failures that arise during rehabilitation training. By fusing prior knowledge of the rehabilitation scheme, local motion features are enhanced on top of a conventional part recognition network, enabling recognition of incomplete human motion and solving the key point estimation failure caused by a partial field of view. Through multi-camera data acquisition and fusion of intersecting field-of-view data, physical-space data augmentation is achieved, improving the network's recognition accuracy against self-occlusion caused by human body symmetry. The problem of human-human and human-object mutual occlusion is solved by actively marking the subject and the object (the subject being the rehabilitation patient; the object being an assistant, rehabilitation doctor, or rehabilitation therapist), fusing rehabilitation medicine and motion control theory, extracting multi-dimensional features to construct an interaction recognition operator, reprocessing the results of the key point recognition network, re-matching wrongly associated parts in human-human interaction, and completing lost parts in human-object interaction.

Drawings

A more complete understanding of exemplary embodiments of the present invention may be had by reference to the following drawings in which:

fig. 1 is a flowchart of a method of determining feature points based on a plurality of image acquisition devices according to an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of a functional module according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of determining the precise location of a point of an object in space according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of generating three-dimensional information according to an embodiment of the invention;

FIG. 5 is a schematic diagram of feature point extraction according to an embodiment of the invention;

FIG. 6 is a diagram illustrating feature point extraction according to another embodiment of the present invention;

FIG. 7 is a diagram illustrating feature point extraction according to yet another embodiment of the present invention;

FIG. 8 is a diagram illustrating feature point extraction according to still another embodiment of the present invention;

fig. 9 is a schematic configuration diagram of a system for determining feature points based on a plurality of image acquisition apparatuses according to an embodiment of the present invention.

Detailed Description

Fig. 1 is a flow chart of a method 100 of determining feature points based on multiple image acquisition devices according to an embodiment of the present invention. The method 100 begins at step 101.

In step 101, each image acquisition device respectively acquires a dynamic image of the subject object based on its respective reference position, and a respective dynamic image file is generated for each image acquisition device from the acquired dynamic image. Before each image acquisition device acquires the dynamic image of the subject object based on the respective reference position, the method further comprises: acquiring a position attribute and a direction attribute of the subject object, and determining a reference position for each of the plurality of image acquisition devices based on the position attribute and the direction attribute.

The location attributes include: the location coordinates of the subject object and/or the location area of the subject object. The directional attributes include: a single orientation information of the subject object or a plurality of orientation information of the subject object.

The acquiring the position attribute and the direction attribute of the subject object includes: receiving input data and analyzing the input data to determine a position attribute and a direction attribute of the subject object; or acquiring the positioning information of the main body object by using the positioning equipment, and determining the position attribute and the direction attribute of the main body object according to the positioning information.

The determining a reference position for each of the plurality of image acquisition devices based on the position attribute and the direction attribute comprises: determining a plurality of candidate positions for acquiring a dynamic image of the subject object based on the position attribute and the direction attribute; determining a reference position for each image acquisition device from the plurality of candidate positions; wherein the reference position of each image acquisition device is different.

Each of the image acquisition devices respectively acquiring a moving image of the subject object based on the respective reference positions includes: each image acquisition device respectively acquires a dynamic image of the subject object at a respective predetermined photographing angle at a respective reference position; or each image capturing apparatus forms a movement path based on the respective reference position, selects a photographing position by the respective movement path, and separately captures a moving image of the subject at the photographing position at the selected photographing angle.

Generating a respective dynamic image file for each image acquisition device from the acquired dynamic image includes: obtaining a dynamic image data stream from the dynamic image acquired by each image acquisition device; and generating a respective dynamic image file for each image acquisition device using the dynamic image data stream.
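As an illustrative, non-normative sketch of this step, the snippet below reads a frame stream from several cameras with OpenCV and writes one dynamic image file per device. The device indices, file names, and fixed frame count are assumptions made for the example, not details fixed by the method.

```python
import cv2

DEVICE_IDS = [0, 1, 2]   # e.g., three cameras placed at their reference positions
FRAMES_PER_FILE = 300    # assumed length of one dynamic image file

def capture_dynamic_image_files(device_ids, frames_per_file):
    """Read a frame stream from each device and write one video file per device."""
    for dev in device_ids:
        cap = cv2.VideoCapture(dev)
        if not cap.isOpened():
            raise RuntimeError(f"device {dev} not available")
        w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
        h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
        fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
        out = cv2.VideoWriter(f"device_{dev}.mp4",
                              cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
        for _ in range(frames_per_file):
            ok, frame = cap.read()
            if not ok:
                break
            out.write(frame)   # the device's dynamic image data stream
        cap.release()
        out.release()          # one dynamic image file per image acquisition device

if __name__ == "__main__":
    capture_dynamic_image_files(DEVICE_IDS, FRAMES_PER_FILE)
```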

At step 102, a respective neural network is determined for each dynamic image file according to a preset configuration file, and data processing is performed using each dynamic image file's respective neural network to acquire the heat map set associated with each image acquisition device.

The preset configuration file comprises the name of the neural network and parameter information of the neural network. The determining a respective neural network for each dynamic image file according to a preset configuration file comprises: determining a neural network to be used according to the name of the neural network in a preset configuration file; carrying out parameter configuration on the neural network to be used according to the parameter information of the neural network; determining the neural network subjected to parameter configuration as the respective neural network of each dynamic image file; wherein the neural network of each dynamic image file is the same neural network.

Processing data using the respective neural network of each dynamic image file to obtain the heat map set associated with each image acquisition device comprises: performing data processing on the dynamic image file acquired by each image acquisition device using that file's respective neural network, so as to acquire the heat map set associated with each image acquisition device.
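One way to read the configuration-driven network selection is sketched below in Python. The JSON schema ({"name": ..., "params": ...}), the network registry, and the stub network builder are all hypothetical stand-ins; in practice the registry would map the configured name to a real pose estimation network whose per-frame outputs form the heat map set.

```python
import json
import numpy as np

def build_stub_network(num_keypoints=17):
    """Stand-in for a real pose network: maps a frame (H, W, C) to a
    (num_keypoints, H, W) heat map. Replace with an actual model."""
    def net(frame):
        h, w = frame.shape[:2]
        return np.zeros((num_keypoints, h, w), dtype=np.float32)
    return net

# Hypothetical registry of networks selectable by name in the config file.
NETWORK_REGISTRY = {"stub_pose_net": build_stub_network}

def load_network(config_path):
    """Pick the network named in the preset configuration file and apply its
    parameter information; every dynamic image file gets the same network."""
    with open(config_path) as f:
        cfg = json.load(f)   # e.g. {"name": "stub_pose_net",
                             #       "params": {"num_keypoints": 17}}
    return NETWORK_REGISTRY[cfg["name"]](**cfg["params"])

def heat_map_set(net, frames):
    """Run every frame of one dynamic image file through the network; the
    per-frame heat maps form that device's heat map set."""
    return [net(frame) for frame in frames]
```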

At step 103, data fusion is performed on the plurality of heat map sets to obtain a data-fused heat map set for each image acquisition device, and three-dimensional information including the subject object is obtained based on the data-fused heat map set for each image acquisition device.

The data fusion of the plurality of heat map sets to obtain a data-fused heat map set for each image acquisition device comprises: selecting in turn each heat map set of the plurality of heat map sets as a current heat map set, so as to: determine each heat map set of the plurality of heat map sets other than the current heat map set as a fusion heat map set; and perform data fusion on the current heat map set based on the plurality of fusion heat map sets to obtain the data-fused current heat map set of each image acquisition device.
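A minimal sketch of the leave-one-out fusion loop just described. The method does not fix the fusion operator itself, so a convex combination with the mean of the other devices' heat map sets stands in for it here, and the sets are assumed to be spatially aligned (in practice they would first be warped into a common view).

```python
import numpy as np

def fuse_heat_map_sets(heat_map_sets, alpha=0.5):
    """Leave-one-out fusion: each device's heat map set is selected in turn as
    the current set and fused with all of the other devices' (fusion) sets."""
    heat_map_sets = [np.asarray(s) for s in heat_map_sets]
    fused = []
    for i, current in enumerate(heat_map_sets):
        others = [s for j, s in enumerate(heat_map_sets) if j != i]
        # Placeholder fusion operator: blend the current set with the mean
        # of the remaining sets.
        fused.append(alpha * current + (1.0 - alpha) * np.mean(others, axis=0))
    return fused
```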

The obtaining three-dimensional information including the subject object based on the data-fused heat map set of each image acquisition device comprises: identifying the feature points of the subject object based on the data-fused heat map set of each image acquisition device to obtain a plurality of two-dimensional feature points at the same moment; and calibrating intrinsic and extrinsic parameters for the two-dimensional feature points according to each image acquisition device's world coordinate system coordinates and image coordinates, and acquiring three-dimensional information including the subject object based on the intrinsic and extrinsic parameters.
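The lifting of calibrated 2D feature points to 3D can be illustrated with standard linear (DLT) triangulation, assuming each device's intrinsic and extrinsic parameters have been combined into a 3x4 projection matrix; this is one conventional realization of the step, not necessarily the exact one intended.

```python
import numpy as np

def triangulate_point(points_2d, projection_matrices):
    """Linear (DLT) triangulation: recover one 3D point from its 2D feature
    point in each calibrated view. Each P is the 3x4 projection matrix built
    from that device's intrinsic and extrinsic parameters."""
    rows = []
    for (u, v), P in zip(points_2d, projection_matrices):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    X = vt[-1]             # right singular vector of the smallest singular value
    return X[:3] / X[3]    # homogeneous -> Euclidean coordinates
```

Applying this to each fused two-dimensional feature point at a given instant yields the three-dimensional feature points that make up the subject object's three-dimensional information.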

In step 104, image recognition is performed on each moving image file to determine object information of an object to which the three-dimensional information relates, a feature point repair type is determined based on the object information of the object, and feature point repair is performed on the three-dimensional information according to the feature point repair type to determine a plurality of feature points associated with the subject object.

Performing image recognition on each dynamic image file to determine the object information of the objects involved in the three-dimensional information includes: performing image recognition on each dynamic image file by an image recognition device to determine the object information of the objects to which the three-dimensional information relates. The object information includes the number of the object objects and the type of the object objects.

Determining a feature point repair type based on the object information, and repairing the feature points of the three-dimensional information according to that type to determine a plurality of feature points associated with the subject object, comprises: analyzing the object information and determining that the number of object objects is zero, in which case the feature point repair type is no repair; when the repair type is no repair, the feature points of the three-dimensional information are not repaired, and the plurality of feature points associated with the subject object are determined directly from the three-dimensional information of the subject object.
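A minimal dispatch over the three cases reads as follows; the field and case names are illustrative stand-ins for the object information and repair types of this section.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ObjectInfo:
    count: int            # number of object objects in the scene
    kind: Optional[str]   # e.g. "auxiliary_object" or "auxiliary_person"

def repair_type(info: ObjectInfo) -> str:
    """Map recognized object information to a feature point repair type."""
    if info.count == 0:
        return "no_repair"   # use the fused 3D feature points directly
    if info.kind == "auxiliary_object":
        return "auxiliary_object_repair"
    return "auxiliary_person_repair"

print(repair_type(ObjectInfo(count=0, kind=None)))   # -> no_repair
```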

Determining a feature point repair type based on the object information, and repairing the feature points of the three-dimensional information according to that type to determine a plurality of feature points associated with the subject object, comprises: analyzing the object information and determining that the number of object objects is not zero and the type of the object objects is an auxiliary object (such as a handheld device, or a fixed device such as a large instrument), in which case the feature point repair type is auxiliary object repair; integrally marking the subject object in the three-dimensional information with a subject-object recognition network, and extracting the feature points, spatial features, and temporal features of the subject object based on the integral mark; performing feature point identification and feature point tracking on the object, where tracking includes marking physical shape and position information on every frame, and extracting the spatial and temporal features of the object; and repairing the feature points of the three-dimensional information according to the auxiliary object repair, so as to determine the plurality of feature points associated with the subject object.

When the auxiliary object is a follow-up type auxiliary object, the spatial features of both the subject object and the object in each frame, and their temporal features across successive frames, change. When the auxiliary object is a fixed type auxiliary object, the subject object's spatial features in each frame and temporal features across successive frames change, while the object's spatial features are stable in each frame and its temporal features remain consistent across successive frames; only the spatial features at the interacting portion of the subject object and the object fluctuate.

Repairing the feature points of the three-dimensional information according to the auxiliary object repair to determine the plurality of feature points associated with the subject object includes: processing the data-fused three-dimensional information with a subject-object recognition network, which fuses a top-down pose recognition network with a deep-convolution-based object recognition network;

marking the subject object (for example, a target object or a person to be recognized) and the object object, to obtain a subject object range B_s and an object range B_o;

within the subject object range B_s, initially recognizing the plurality of parts of the subject object through a bottom-up pose recognition network to obtain a plurality of initial feature points S_parts;

within the object range B_o, marking the object as O;

extracting the spatial feature F_space of each frame and the temporal feature F_time of successive frames;

fusing the spatial feature F_space and the temporal feature F_time to determine the object class of the object, and labeling the sub-scenes s_i (i = 1, 2) of the follow-up type auxiliary object (such as a handheld or mobile device) and the fixed type auxiliary object (such as a fixed device or a wall), respectively, where s_1 denotes the follow-up type and s_2 the fixed type; the spatial feature F_space includes shape, volume, angle, texture, color, gradient, and location, and the temporal feature F_time includes displacement, velocity, context information, and rotation;

for the s_1 sub-scene, as time advances a dynamic intersection arises between the subject object range and the object range over a period [T_1, T_2], where T_1 is a first time and T_2 is a second time;

extracting the temporal and spatial features of both the subject object and the object, recorded respectively as the subject object spatial feature, the subject object temporal feature, the object spatial feature, and the object temporal feature;

where the object class of the auxiliary object is the follow-up type, constructing a local interaction feature extraction operator A_switho to re-identify, within the intersection time period [T_1, T_2], the initial feature points of the plurality of occluded parts, thereby determining a plurality of feature points S'_parts associated with the subject object;

for the s_2 sub-scene, as time advances a dynamic intersection likewise arises between the subject object range and the object range;

extracting the temporal and spatial features of the subject object and the spatial feature of the object, recorded respectively as the subject object spatial feature, the subject object temporal feature, and the object spatial feature; since in the s_2 scene the object class of the auxiliary object is the fixed type and the object is still, the object has no temporal feature;

within the time period [T_1, T_2] of the dynamic intersection, the subject object enters the object range and occlusion occurs; for each occlusion time t_j, the subject-object local interaction feature extraction operator A_sbyo, combined with kinematic prior knowledge K_prior, patches each occluded frame f (t = t_j), so that a plurality of feature points S''_parts associated with the subject object are determined (illustrated in the sketch below).
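The sketch below illustrates the patching step in miniature, assuming per-frame 3D feature points in a (T, J, 3) array and an occlusion mask over [T_1, T_2]; plain linear interpolation from the visible frames stands in for the operator A_sbyo and the kinematic prior K_prior, which a real system would replace with bone-length and joint-limit constraints.

```python
import numpy as np

def patch_occluded(joints, occluded):
    """joints: (T, J, 3) feature point track; occluded: (T,) bool mask.
    Re-estimate every occluded coordinate from the visible frames."""
    fixed = joints.copy()
    t = np.arange(len(joints))
    vis = ~occluded
    for j in range(joints.shape[1]):
        for c in range(3):
            fixed[occluded, j, c] = np.interp(
                t[occluded], t[vis], joints[vis, j, c])
    return fixed

# Example: 25 joints over 60 frames, occluded during frames 20..29.
track = np.random.rand(60, 25, 3)
occ = np.zeros(60, dtype=bool)
occ[20:30] = True                     # the intersection period [T1, T2]
repaired = patch_occluded(track, occ)
```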

Determining a feature point repair type based on the object information of the object, and repairing the feature points of the three-dimensional information according to that type to determine a plurality of feature points associated with the subject object, comprises:

analyzing the object information and determining that the number of object objects is not zero and the type of the object objects is an auxiliary person (for example, an assisting person in rehabilitation), in which case the feature point repair type is auxiliary person repair;

integrally marking the subject object in the three-dimensional information with the subject-object recognition network, and extracting the feature points, spatial features, and temporal features of the subject object based on the integral mark;

performing feature point identification and feature point tracking on the object, where tracking includes marking physical shape and position information on every frame, and extracting the spatial and temporal features of the object;

and repairing the feature points of the three-dimensional information according to the auxiliary person repair, so as to determine the plurality of feature points associated with the subject object.

In this case the spatial features of both the subject object and the object in each frame, and their temporal features across successive frames, vary. Performing feature point repair on the three-dimensional information according to this repair type to determine the plurality of feature points associated with the subject object includes:

processing the data-fused three-dimensional information with a subject-object recognition network, which fuses a top-down pose recognition network with a deep-convolution-based object recognition network;

acquiring a range B_i for each of the subject object and the object objects, where 1 ≤ i ≤ N, N is the total number of subject and object objects, and i is a natural number;

for each B_i, identifying the respective parts of the subject object and the objects through a bottom-up pose recognition network to obtain a plurality of initial feature points S_i^j, where 1 ≤ i ≤ N, 1 ≤ j ≤ M, N is the total number of subject and object objects, and M is the total number of initial feature points of a subject or object object; these comprise the plurality of initial feature points of the subject object and the plurality of initial feature points of each object;

constructing an interaction recognition operator A_interactive from B_i and S_i^j, and identifying and repairing, based on A_interactive, the feature points of the subject object and the objects within the interaction region;

constructing a subject-object recognition operator A_sando and, based on A_sando, respectively calculating the spatio-temporal features corresponding to each range B_i of the subject object and the objects;

from the spatio-temporal features, calculating the trajectory over time of each support point of the subject object and the objects, where T is the upper limit of the time range, the subscript s-t denotes the spatio-temporal feature, s (spatial) denotes space, and t (temporal) denotes time, together with the trajectory C_ot over time of all support points of the subject object and the objects taken as a whole; by comparing the individual trajectories with C_ot, determining the object objects O and the subject object S;

re-classifying each B_i according to subject object and object, preliminarily obtaining the subject object range B_s and the object range B_o; in the time dimension there is a dynamic intersection between B_s and B_o for T_1 ≤ t ≤ T_2, where T_1 is a first time and T_2 is a second time;

extracting the temporal and spatial features of both the subject object and the object within the interaction time period [T_1, T_2], recorded respectively as the subject object spatial feature, the subject object temporal feature, the object spatial feature, and the object temporal feature;

the subject-object recognition operator A_sando is based on the feature point position information and the point-to-point direction information, further adding mechanical information; using the lever principle, the moments and moment arms of the subject object and the object within the dynamic intersection are calculated respectively; combined with kinematic prior knowledge R_prior, the initial feature points of the subject object and the object are recalibrated to obtain a plurality of feature points B'_s associated with the subject object and a plurality of feature points B'_o associated with the object (illustrated in the sketch below).
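The lever-principle term can be pictured with this minimal sketch; treating the contact point of the dynamic intersection as the fulcrum, with a force applied at a feature point, is an assumption of the sketch, since the disclosure does not give the exact formulation of A_sando's mechanical term.

```python
import numpy as np

def moment_and_arm(fulcrum, point, force):
    """Moment vector and moment-arm length about `fulcrum` for a force
    applied at `point` (all 3-vectors in consistent world units)."""
    r = np.asarray(point, float) - np.asarray(fulcrum, float)  # lever arm
    m = np.cross(r, np.asarray(force, float))                  # moment
    f = np.linalg.norm(force)
    arm = np.linalg.norm(m) / f if f else 0.0   # effective arm length
    return m, arm

# Example: 9.8 N upward at 0.3 m from the contact point.
m, arm = moment_and_arm([0, 0, 0], [0.3, 0, 0], [0, 9.8, 0])
```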

When the number of object objects is 1, identifying and repairing, based on the interaction recognition operator A_interactive, the feature points of the subject object and the object within the interaction region includes:

identifying the feature points of the subject object and the object within the interaction region using a top-down pose recognition network combined with a bottom-up pose recognition network, to obtain a subject object range B_1 and an object range B_2 together with a plurality of initial feature points of the subject object and a plurality of initial feature points of the object;

performing, through the interaction recognition operator A_interactive, secondary matching within the complementary ranges for the feature points eliminated from B_1 and B_2;

and updating to obtain the latest feature points of the subject object and the object respectively.

Fig. 2 is a schematic structural diagram of the functional modules according to an embodiment of the present invention. In terms of functional structure, the invention comprises a data fusion module, a scene classification module, an occlusion repair module, a quantitative analysis module, and a scale mapping module. For example, the data fusion module is configured as follows:

In the pose estimation process, a single camera position recognizes poorly under occlusion and similar conditions and cannot accurately locate human pose information in space. Multi-camera-position information is a feasible and efficient solution to this problem: as shown in fig. 3, different camera positions C_u and C_v simultaneously capture an arbitrary point of the spatial object, obtaining respective plane images, from which the exact position of the spatial point P can be deduced back through projective geometry.

Mainstream two-dimensional pose recognition uses a heat map to regress each key point of the pose. The invention uses two neural network paths to obtain the heat maps of the different camera positions, then fuses the two paths of heat maps to obtain two new paths of heat maps, thereby obtaining spatial three-dimensional pose information; the specific flow is shown in fig. 4.

The scene classification module and the occlusion repair module are configured as follows. Fields such as sports events, clinical surgery simulation teaching, brain function development, and rehabilitation training are semi-open scenes that follow industry standards and have specific constraints, i.e., there are certain rule-based requirements on the position of the human body, on person-object interaction, and on person-person interaction.

Therefore, the first link of the overall processing flow is scene classification: by means of a classification network for automatic person and object identification, the scenes are divided into three types, namely single person, person-object, and person-person.

1) Single person: a single-person scene means that only one subject in the recognition area serves as the tracking target; everything else is excluded as interference.

2) Person-object: the subject and the object in the scene are automatically identified through scene classification. Taking rehabilitation training as an example, the subject is a rehabilitation patient and the object is a rehabilitation aid. The subject is marked integrally by the recognition network, and its key point information is then extracted. The object is identified and tracked, i.e., each frame is marked with physical shape, position information, and so on. By extracting the spatial and temporal features of the object, the person-object interaction scene is further divided into two sub-scenes: person with a moving object and person with a stationary object.

2.1 Person with a moving object

In this sub-scene, the spatial features in each frame and the temporal features across successive frames of both the subject person and the object change, i.e., both the person and the object are moving.

2.2 Person with a stationary object

In this sub-scene, only the subject person's spatial features in each frame and temporal features across successive frames change; the object has stable spatial features in each frame, its temporal features remain consistent across successive frames, and only the spatial features at the person-object interaction portion fluctuate. That is, the object is stationary or relatively stable while the subject person moves.
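A minimal sketch of this sub-scene split follows, deciding between the two cases from the drift of the object's tracked position; the variance threshold is an illustrative assumption standing in for the fused F_space/F_time decision of the disclosure.

```python
import numpy as np

def classify_subscene(object_centers, tol=1e-3):
    """object_centers: (T, 2) or (T, 3) per-frame object position.
    Returns 's1' (moving object) or 's2' (stationary object)."""
    drift = np.var(np.asarray(object_centers, float), axis=0).sum()
    return "s1" if drift > tol else "s2"

still = [[1.0, 2.0]] * 30                       # never moves -> s2
moving = [[0.05 * t, 2.0] for t in range(30)]   # drifts      -> s1
print(classify_subscene(still), classify_subscene(moving))
```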

The specific identification scheme is as follows. The fused data result directly enters the subject-object recognition network (i.e., a fusion of a top-down human pose recognition network and a deep-convolution-based object recognition network). First, the subject person and the object are marked as wholes to obtain their regression bounding boxes B_s and B_o. Within B_s, each part of the human body is initially and accurately recognized through the bottom-up network and marked S_parts. Within B_o, the detected object is denoted O. Meanwhile, the spatial features F_space of each frame (such as shape, volume, angle, texture, color, gradient, and location) and the temporal features F_time of successive frames (displacement, velocity, context information, rotation, etc.) are extracted. Fusing F_space and F_time determines the state of the object, and the two sub-scene marks s_i, i = 1, 2, for the moving-object and stationary-object cases are applied respectively.

For the s_1 sub-scene, as time advances, a dynamic intersection arises between the B_s and B_o ranges. At this point the temporal and spatial features of the subject and the object are extracted simultaneously, and the constructed person-object local interaction feature extraction operator A_switho is applied: within the intersection time period [T_1, T_2], each occluded part is accurately re-identified to obtain complete key point information, marked S'_parts.

For the s_2 sub-scene, as time advances, a dynamic intersection likewise arises between the B_s and B_o ranges, and the temporal and spatial features of the subject and the object are extracted simultaneously (in the s_2 scene the object does not move, so it has no temporal feature). Within the intersection time period [T_1, T_2], the subject enters the object range and occlusion occurs; for each occlusion time t_i, the person-object local interaction feature extraction operator A_sbyo, combined with the kinematic prior knowledge K_prior of rehabilitation training, repairs each occluded frame f (t = t_i) to obtain a new key part recognition result S''_parts.

3) Person-person:

The fused data result directly enters a top-down human pose recognition network to obtain the bounding box range of each person in the data, marked B_i, 1 ≤ i ≤ N, where N is the total number of people. Each B_i then enters a bottom-up pose recognition network to recognize the corresponding human body parts, marked S_i^j, where N is the total number of people and M is the total number of key points of the human body (the number of key points varies slightly across pose models such as BODY_25, BODY_19, BODY_23, BODY_25B, BODY_135, COCO_BODY, and MPI_BODY).

3.1) The interaction recognition operator A_interactive is constructed from B_i and S_i^j, and each human body part within the interaction region is identified and repaired. The specific flow is as follows.

First, the data B_1 and B_2 and their corresponding initial key point sets, generated by the initial combination of the top-down and bottom-up networks, are obtained, as shown in fig. 5.

Secondly, secondary matching of complementary bounding boxes is performed on the key points eliminated in the two bounding boxes through a PAF operator, as shown in fig. 6.

Finally, the latest joint points are updated and obtained for the two persons respectively; the repair effect is shown in fig. 7.

3.2) The subject-object recognition operator A_sando is constructed, and the spatio-temporal features corresponding to each human body region B_i are calculated respectively. From these, the trajectory of each human body support point over time and the trajectory C_ot of the global support points as a whole over time are calculated; by comparing the former with C_ot, the objects O and the subject S are identified.

Each B_i is preliminarily classified into subject and object, marked B_s and B_o respectively. In the time dimension, a dynamic intersection of the subject and object bounding boxes occurs; at this point the temporal and spatial features of the subject and the object are extracted simultaneously within the interaction period [T_1, T_2], as shown in fig. 8.

The operator A_sando is based on the key point position information and the point-to-point direction information, further adding mechanical information for each key point j, 1 ≤ j ≤ M. Because the human skeleton is a rigid structure, the lever principle is used to calculate the moments and moment arms of the subject and the object within the contact area respectively. Fused with the prior knowledge R_prior of the rehabilitation training scheme, the subject and object are recalibrated as B'_s and B'_o.

Thus, the occlusion completion and subject-object identification work is accomplished for each of the single-person, person-object, and person-person scenes.

The quantitative analysis module, based on the results of the scene classification and occlusion repair modules, obtains accurate bounding box data B_i (1 ≤ i ≤ N) of the subject and objects (subject only in the single-person case) and the corresponding complete key point data S'_parts; on this basis it computes various quantitative features, mainly including static features, dynamic features, statistical features, and kinematic features.
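A minimal sketch of such statistics follows, assuming a (T, J, 3) key point track sampled at a fixed frame rate; the four quantities are illustrative examples of the static, dynamic, statistical, and kinematic categories, not a list taken from the disclosure.

```python
import numpy as np

def quantitative_features(track, fps=30.0):
    """track: (T, J, 3) complete key point data after occlusion repair."""
    track = np.asarray(track, float)
    vel = np.diff(track, axis=0) * fps           # per-frame velocity
    speed = np.linalg.norm(vel, axis=-1)         # (T-1, J) speeds
    return {
        "static_mean_pose": track.mean(axis=0),      # static feature
        "dynamic_peak_speed": speed.max(),           # dynamic feature
        "statistical_speed_std": speed.std(),        # statistical feature
        "kinematic_path_length": speed.sum() / fps,  # kinematic feature
    }
```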

The scale mapping module normalizes the plurality of feature points of the target object to obtain a plurality of universal feature points, and determines the motion attribute of the target object based on the change angle and/or movement data of at least one universal feature point. It then maps the motion attribute of the target object into a data file with a preset format according to a predefined data mapping rule.
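A minimal sketch of this module's flow could look as follows, assuming 3D feature points, a joint angle as the change angle, and JSON as the preset format; the normalization rule, the three points chosen for the angle, and the output field names are all assumptions of the sketch.

```python
import json
import numpy as np

def normalize_points(points):
    """Center on the centroid and scale to unit size, yielding
    'universal' feature points comparable across subjects."""
    pts = np.asarray(points, float)
    pts -= pts.mean(axis=0)
    return pts / np.linalg.norm(pts, axis=1).max()

def joint_angle(a, b, c):
    """Angle at b (degrees) between segments b-a and b-c."""
    u, v = np.asarray(a) - b, np.asarray(c) - b
    cosang = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0))))

def map_to_file(points, path="motion.json"):
    pts = normalize_points(points)
    record = {                        # predefined data mapping rule
        "universal_points": pts.tolist(),
        "angle_deg": joint_angle(pts[0], pts[1], pts[2]),  # hypothetical
    }
    with open(path, "w") as f:
        json.dump(record, f)          # data file in the preset format
```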

Fig. 9 is a schematic structural diagram of a system 900 for determining feature points based on a plurality of image acquisition devices according to an embodiment of the present invention. The system 900 includes: an image acquisition device 901, a data processing device 902, a data fusion device 903, and an image recognition device 904.

A plurality of image acquisition devices 901, each of which acquires a dynamic image of the subject object based on its respective reference position and generates its respective dynamic image file from the acquired dynamic image. Before each image acquisition device acquires the dynamic image of the subject object based on its respective reference position, the system further: acquires a position attribute and a direction attribute of the subject object, and determines a reference position for each of the plurality of image acquisition devices based on the position attribute and the direction attribute.

The position attribute includes the position coordinates of the subject object and/or the location area of the subject object. The direction attribute includes a single piece of orientation information of the subject object or multiple pieces of orientation information of the subject object.

Acquiring the position attribute and the direction attribute of the subject object includes: receiving input data and analyzing it to determine the position attribute and direction attribute of the subject object; or acquiring positioning information of the subject object with a positioning device and determining the position attribute and direction attribute from that information.

Determining a reference position for each of the plurality of image acquisition devices based on the position attribute and the direction attribute comprises: determining a plurality of candidate positions for acquiring the dynamic image of the subject object based on the position attribute and the direction attribute; and selecting a reference position for each image acquisition device from the candidate positions, where each device's reference position is different.

Each image acquisition device acquiring a dynamic image of the subject object based on its respective reference position includes: each device acquiring the dynamic image at its reference position at its respective predetermined shooting angle; or each device forming a movement path based on its reference position, selecting a shooting position along that path, and acquiring the dynamic image of the subject object at the selected shooting angle from that position.
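A minimal sketch of candidate position selection follows: cameras are assumed to sit on a circle around the subject, oriented by its facing direction; the radius and angular spacing are illustrative assumptions.

```python
import math

def candidate_positions(center, facing_deg, n=8, radius=3.0):
    """Return n (x, y) candidate positions ringing `center`, with the
    first candidate aligned to the subject's facing direction."""
    cx, cy = center
    step = 360.0 / n
    return [
        (cx + radius * math.cos(math.radians(facing_deg + k * step)),
         cy + radius * math.sin(math.radians(facing_deg + k * step)))
        for k in range(n)
    ]

# Assign one distinct reference position per image acquisition device.
refs = candidate_positions((0.0, 0.0), facing_deg=90.0)[:2]
```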

Generating a respective dynamic image file for each image acquisition device from the acquired dynamic images includes: obtaining a dynamic image data stream from the dynamic image acquired by each image acquisition device; and generating the respective dynamic image file for each image acquisition device from that data stream.

A data processing device 902, configured to determine a respective neural network for each dynamic image file according to a preset configuration file, and to perform data processing using each dynamic image file's neural network so as to obtain the heat map set associated with each image acquisition device.

The preset configuration file comprises the name of the neural network and the parameter information of the neural network. Determining a respective neural network for each dynamic image file according to the preset configuration file comprises: determining the neural network to be used according to the network name in the configuration file; configuring the parameters of that network according to the parameter information; and taking the parameter-configured network as the respective neural network of each dynamic image file, where every dynamic image file uses the same neural network.

Performing data processing using each dynamic image file's neural network to obtain the heat map set associated with each image acquisition device comprises: processing the dynamic image file acquired by each image acquisition device with its respective neural network to acquire the heat map set associated with that device.

A data fusion device 903 for data fusing the plurality of heat map sets to obtain a data fused heat map set for each image acquisition device, and obtaining three-dimensional information including the subject object based on the data fused heat map set for each image acquisition device.

Data fusing the plurality of heat map sets to obtain a data-fused heat map set for each image acquisition device comprises: selecting each of the plurality of heat map sets in turn as the current heat map set, determining every other heat map set of the plurality as a fusing heat map set, and data fusing the current heat map set with those fusing heat map sets to obtain the data-fused current heat map set for each image acquisition device.

Obtaining three-dimensional information including the subject object based on the data-fused heat map set of each image acquisition device comprises: identifying the feature points of the subject object from each device's data-fused heat map set to obtain a plurality of two-dimensional feature points at the same instant; calibrating the intrinsic and extrinsic parameters of each image acquisition device from its world-coordinate and image-coordinate correspondences; and recovering the three-dimensional information including the subject object from the two-dimensional feature points using those intrinsic and extrinsic parameters.

An image recognition device 904, configured to perform image recognition on each dynamic image file to determine the object information of the objects to which the three-dimensional information relates, to determine a feature point repair type based on that object information, and to repair the feature points of the three-dimensional information according to the repair type so as to determine a plurality of feature points associated with the subject object.

Performing image recognition on each dynamic image file to determine the object information of the objects involved in the three-dimensional information includes: performing image recognition on each dynamic image file by the image recognition device to determine the object information of the objects to which the three-dimensional information relates. The object information includes the number of the object objects and the type of the object objects.

Determining a feature point repair type based on the object information, and repairing the feature points of the three-dimensional information according to that type to determine a plurality of feature points associated with the subject object, comprises: analyzing the object information and determining that the number of object objects is zero, in which case the feature point repair type is no repair; when the repair type is no repair, the feature points of the three-dimensional information are not repaired, and the plurality of feature points associated with the subject object are determined directly from the three-dimensional information of the subject object.

Determining a feature point repair type based on the object information, and repairing the feature points of the three-dimensional information according to that type to determine a plurality of feature points associated with the subject object, comprises: analyzing the object information and determining that the number of object objects is not zero and the type of the object objects is an auxiliary object (such as a handheld device, or a fixed device such as a large instrument), in which case the feature point repair type is auxiliary object repair; integrally marking the subject object in the three-dimensional information with a subject-object recognition network, and extracting the feature points, spatial features, and temporal features of the subject object based on the integral mark; performing feature point identification and feature point tracking on the object, where tracking includes marking physical shape and position information on every frame, and extracting the spatial and temporal features of the object; and repairing the feature points of the three-dimensional information according to the auxiliary object repair, so as to determine the plurality of feature points associated with the subject object.

When the auxiliary object is a follow-up type auxiliary object, the spatial features of both the subject object and the object in each frame, and their temporal features across successive frames, change. When the auxiliary object is a fixed type auxiliary object, the subject object's spatial features in each frame and temporal features across successive frames change, while the object's spatial features are stable in each frame and its temporal features remain consistent across successive frames; only the spatial features at the interacting portion of the subject object and the object fluctuate.

Repairing the feature points of the three-dimensional information according to the auxiliary object repair to determine the plurality of feature points associated with the subject object includes: processing the data-fused three-dimensional information with a subject-object recognition network, which fuses a top-down pose recognition network with a deep-convolution-based object recognition network;

marking the subject object (for example, a target object or a person to be recognized) and the object object, to obtain a subject object range B_s and an object range B_o;

within the subject object range B_s, initially recognizing the plurality of parts of the subject object through a bottom-up pose recognition network to obtain a plurality of initial feature points S_parts;

within the object range B_o, marking the object as O;

extracting the spatial feature F_space of each frame and the temporal feature F_time of successive frames;

fusing the spatial feature F_space and the temporal feature F_time to determine the object class of the object, and labeling the sub-scenes s_i (i = 1, 2) of the follow-up type auxiliary object (such as a handheld or mobile device) and the fixed type auxiliary object (such as a fixed device or a wall), respectively, where s_1 denotes the follow-up type and s_2 the fixed type; the spatial feature F_space includes shape, volume, angle, texture, color, gradient, and location, and the temporal feature F_time includes displacement, velocity, context information, and rotation;

for the s_1 sub-scene, as time advances a dynamic intersection arises between the subject object range and the object range over a period [T_1, T_2], where T_1 is a first time and T_2 is a second time;

extracting the temporal and spatial features of both the subject object and the object, recorded respectively as the subject object spatial feature, the subject object temporal feature, the object spatial feature, and the object temporal feature;

where the object class of the auxiliary object is the follow-up type, constructing a local interaction feature extraction operator A_switho to re-identify, within the intersection time period [T_1, T_2], the initial feature points of the plurality of occluded parts, thereby determining a plurality of feature points S'_parts associated with the subject object;

for the s_2 sub-scene, as time advances a dynamic intersection likewise arises between the subject object range and the object range;

extracting the temporal and spatial features of the subject object and the spatial feature of the object, recorded respectively as the subject object spatial feature, the subject object temporal feature, and the object spatial feature; since in the s_2 scene the object class of the auxiliary object is the fixed type and the object is still, the object has no temporal feature;

within the time period [T_1, T_2] of the dynamic intersection, the subject object enters the object range and occlusion occurs; for each occlusion time t_j, the subject-object local interaction feature extraction operator A_sbyo, combined with kinematic prior knowledge K_prior, patches each occluded frame f (t = t_j), so that a plurality of feature points S''_parts associated with the subject object are determined.

Determining a feature point repair type based on the object information of the object, and repairing the feature points of the three-dimensional information according to that type to determine a plurality of feature points associated with the subject object, comprises:

analyzing the object information and determining that the number of object objects is not zero and the type of the object objects is an auxiliary person (for example, an assisting person in rehabilitation), in which case the feature point repair type is auxiliary person repair;

integrally marking the subject object in the three-dimensional information with the subject-object recognition network, and extracting the feature points, spatial features, and temporal features of the subject object based on the integral mark;

performing feature point identification and feature point tracking on the object, where tracking includes marking physical shape and position information on every frame, and extracting the spatial and temporal features of the object;

and repairing the feature points of the three-dimensional information according to the auxiliary person repair, so as to determine the plurality of feature points associated with the subject object.

In this case the spatial features of both the subject object and the object in each frame, and their temporal features across successive frames, vary. Performing feature point repair on the three-dimensional information according to this repair type to determine the plurality of feature points associated with the subject object includes:

processing the data-fused three-dimensional information with a subject-object recognition network, which fuses a top-down pose recognition network with a deep-convolution-based object recognition network;

acquiring a range B_i for each of the subject object and the object objects, where 1 ≤ i ≤ N, N is the total number of subject and object objects, and i is a natural number;

for each B_i, identifying the respective parts of the subject object and the objects through a bottom-up pose recognition network to obtain a plurality of initial feature points S_i^j, where 1 ≤ i ≤ N, 1 ≤ j ≤ M, N is the total number of subject and object objects, and M is the total number of initial feature points of a subject or object object; these comprise the plurality of initial feature points of the subject object and the plurality of initial feature points of each object;

constructing an interaction recognition operator A_interactive from B_i and S_i^j, and identifying and repairing, based on A_interactive, the feature points of the subject object and the objects within the interaction region;

constructing a subject-object recognition operator A_sando and, based on A_sando, respectively calculating the spatio-temporal features corresponding to each range B_i of the subject object and the objects;

from the spatio-temporal features, calculating the trajectory over time of each support point of the subject object and the objects, where T is the upper limit of the time range, the subscript s-t denotes the spatio-temporal feature, s (spatial) denotes space, and t (temporal) denotes time, together with the trajectory C_ot over time of all support points of the subject object and the objects taken as a whole; by comparing the individual trajectories with C_ot, determining the object objects O and the subject object S;

re-classifying each B_i according to subject object and object, preliminarily obtaining the subject object range B_s and the object range B_o; in the time dimension there is a dynamic intersection between B_s and B_o for T_1 ≤ t ≤ T_2, where T_1 is a first time and T_2 is a second time;

extracting the temporal and spatial features of both the subject object and the object within the interaction time period [T_1, T_2], recorded respectively as the subject object spatial feature, the subject object temporal feature, the object spatial feature, and the object temporal feature;

the subject-object recognition operator A_sando is based on the feature point position information and the point-to-point direction information, further adding mechanical information; using the lever principle, the moments and moment arms of the subject object and the object within the dynamic intersection are calculated respectively; combined with kinematic prior knowledge R_prior, the initial feature points of the subject object and the object are recalibrated to obtain a plurality of feature points B'_s associated with the subject object and a plurality of feature points B'_o associated with the object.

When the number of object objects is 1, identifying and repairing, based on the interaction recognition operator A_interactive, the feature points of the subject object and the object within the interaction region includes:

identifying the feature points of the subject object and the object within the interaction region using a top-down pose recognition network combined with a bottom-up pose recognition network, to obtain a subject object range B_1 and an object range B_2 together with a plurality of initial feature points of the subject object and a plurality of initial feature points of the object;

performing, through the interaction recognition operator A_interactive, secondary matching within the complementary ranges for the feature points eliminated from B_1 and B_2;

and updating to obtain the latest feature points of the subject object and the object respectively.
