Method and device for generating sight line estimation model and method and device for estimating sight line

Document No.: 106333    Publication date: 2021-10-15

Reading note: This technology, "Method and device for generating sight line estimation model and method and device for estimating sight line", was created by 江筱, 武锐 and 郭少博 on 2021-07-16. Its main content is as follows: The embodiments of the disclosure disclose a method and a device for generating a sight line estimation model, a sight line estimation method and device, a computer-readable storage medium, and an electronic device, wherein the method includes: predicting a sample eye image through a backbone network in a sight estimation model to be trained to obtain predicted eye features; predicting the sight direction of the predicted eye features through a reconstruction network in the sight estimation model to be trained to obtain a predicted sight direction; determining parameters of a first loss function based on the predicted gaze direction and the annotated gaze direction; determining parameters of a second loss function based on the predicted eye features and the annotated eye features; and training the sight estimation model based on the first loss function and the second loss function. The embodiments of the disclosure decompose the gaze prediction task into two prediction tasks, so that the gaze prediction process of the trained model is more refined, the prediction accuracy is higher, and the adaptability to use scenarios is better.

1. A method of generating a gaze estimation model, comprising:

predicting the sample eye image through a backbone network in a sight estimation model to be trained to obtain predicted eye features, wherein the sample eye image has corresponding annotated eye features and an annotated gaze direction;

predicting the sight direction of the predicted eye features through a reconstruction network in the sight estimation model to be trained to obtain a predicted sight direction;

determining parameters of a first loss function based on the predicted gaze direction and the annotated gaze direction;

determining parameters of a second loss function based on the predicted eye feature and the annotated eye feature;

training the gaze estimation model based on the first loss function and the second loss function.

2. The method according to claim 1, wherein the predicting the sample eye image through the backbone network in the sight line estimation model to be trained to obtain predicted eye features comprises:

determining a sample type of the sample eye image;

if the sample type indicates that the sample eye image is a sample synthesized eye image, predicting the sample synthesized eye image through the backbone network to obtain a first predicted eye sub-feature corresponding to the sample synthesized eye image;

if the sample type indicates that the sample eye image is a sample real eye image, predicting the sample real eye image through the backbone network to obtain a second predicted eye sub-feature corresponding to the sample real eye image;

and obtaining a predicted eye feature based on the first predicted eye sub-feature and the second predicted eye sub-feature.

3. The method according to claim 2, wherein the predicting the gaze direction of the predicted eye feature through a reconstruction network in the gaze estimation model to be trained to obtain a predicted gaze direction comprises:

and respectively predicting the sight direction of the first predicted eye sub-feature and the second predicted eye sub-feature through a reconstruction network in the sight estimation model to be trained to obtain a first predicted sight direction corresponding to the sample synthesized eye image and a second predicted sight direction corresponding to the sample real eye image.

4. The method of claim 3, wherein said determining parameters of a first loss function based on the predicted gaze direction and the annotated gaze direction comprises:

determining parameters of the first loss function based on a first annotated gaze direction and a first predicted gaze direction corresponding to the sample synthesized eye image, and a second annotated gaze direction and a second predicted gaze direction corresponding to the sample real eye image;

the determining parameters of a second loss function based on the predicted eye features and the annotated eye features comprises:

and determining parameters of the second loss function based on a first annotated eye feature and the first predicted eye sub-feature corresponding to the sample synthesized eye image.

5. The method of claim 1, wherein the method further comprises:

and evaluating the predicted sight direction through a sight uncertainty evaluation network in the sight estimation model to be trained to obtain evaluation information representing uncertainty of the predicted sight direction.

6. The method according to one of claims 1-5, wherein the annotated eye feature comprises an annotated eyeball radius and annotated eye keypoint information, and the second loss function comprises a first sub-loss function and a second sub-loss function;

predicting the sample eye image through a backbone network in the sight estimation model to be trained to obtain predicted eye characteristics, wherein the predicting comprises the following steps:

predicting the sample eye image through the backbone network to obtain predicted eyeball radius and predicted eye key point information;

the determining parameters of a second loss function based on the predicted eye features and the annotated eye features comprises:

determining parameters of the first sub-loss function based on the predicted eyeball radius and the annotated eyeball radius;

determining parameters of the second sub-loss function based on the predicted eye keypoint information and the annotated eye keypoint information.

7. A gaze estimation method, comprising:

acquiring an eye image to be estimated;

inputting the eye image to be estimated into a backbone network of a pre-trained sight estimation model to obtain predicted eye feature information, wherein the sight estimation model is obtained by pre-training based on the method of one of claims 1 to 6;

and inputting the predicted eye feature information into a reconstruction network included in the sight estimation model to obtain predicted sight direction information.

8. A generation apparatus of a sight line estimation model, comprising:

the first prediction module is used for predicting the sample eye image through a backbone network in a sight estimation model to be trained to obtain predicted eye features, wherein the sample eye image has corresponding annotated eye features and an annotated gaze direction;

the second prediction module is used for predicting the sight direction of the predicted eye features through a reconstruction network in the sight estimation model to be trained to obtain a predicted sight direction;

a first determination module to determine parameters of a first loss function based on the predicted gaze direction and the annotated gaze direction;

a second determination module for determining parameters of a second loss function based on the predicted ocular feature and the annotated ocular feature;

a training module to train the gaze estimation model based on the first loss function and the second loss function.

9. A gaze estimation device, comprising:

the acquisition module is used for acquiring an eye image to be estimated;

a third prediction module, configured to input the eye image to be estimated into a backbone network of a pre-trained sight estimation model to obtain predicted eye feature information, where the sight estimation model is obtained by training in advance based on the method according to any one of claims 1 to 6;

and the fourth prediction module is used for inputting the predicted eye feature information into a reconstruction network included in the sight estimation model to obtain predicted sight direction information.

10. A computer-readable storage medium, the storage medium storing a computer program for performing the method of any of the preceding claims 1-7.

11. An electronic device, the electronic device comprising:

a processor;

a memory for storing the processor-executable instructions;

the processor is configured to read the executable instructions from the memory and execute the instructions to implement the method of any one of claims 1 to 7.

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for generating a gaze estimation model, a gaze estimation method and apparatus, a computer-readable storage medium, and an electronic device.

Background

Gaze tracking technology is used to estimate the gaze direction of a subject and is an important technology in human-computer interaction. In recent years, research on gaze tracking has mainly focused on two types of schemes: Appearance-based and Model-based. The Appearance-based scheme directly regresses the gaze direction from input images of the face and eye region using a deep neural network. It is a purely data-driven scheme that is easy to deploy and does not depend on specific equipment; however, it is difficult for this scheme to achieve good results when the amount of data is small. The Model-based scheme is based on a 3D eyeball model: gaze-related parameters (corneal curvature radius, pupil position, and the like) are obtained using modeling methods such as the Pupil Center Corneal Reflection (PCCR) method, and the gaze direction is derived from them. This scheme has high accuracy and does not require much training data; however, it needs specific hardware equipment, has poor robustness, and is difficult to deploy in in-cabin scenarios with complex environments.

Disclosure of Invention

The embodiment of the disclosure provides a method and a device for generating a sight line estimation model, a sight line estimation method and a device, a computer-readable storage medium and an electronic device.

The embodiment of the present disclosure provides a method for generating a sight line estimation model, including: predicting the sample eye image through a backbone network in a sight estimation model to be trained to obtain predicted eye characteristics, wherein the sample eye image has corresponding eye labeling characteristics and a corresponding sight labeling direction; predicting the sight direction of the predicted eye features through a reconstruction network in a sight estimation model to be trained to obtain a predicted sight direction; determining parameters of a first loss function based on the predicted gaze direction and the annotated gaze direction; determining parameters of a second loss function based on the predicted eye features and the labeled eye features; based on the first loss function and the second loss function, a line of sight estimation model is trained.

An embodiment of the present disclosure provides a gaze estimation method, including: acquiring an eye image to be estimated; inputting an eye image to be estimated into a backbone network of a pre-trained sight estimation model to obtain predicted eye characteristic information, wherein the sight estimation model is obtained by training in advance based on a generation method of the sight estimation model; and inputting the predicted eye feature information into a reconstruction network included in the sight estimation model to obtain predicted sight direction information.

According to another aspect of the embodiments of the present disclosure, there is provided an apparatus for generating a gaze estimation model, the apparatus including: the first prediction module is used for predicting the sample eye image through a trunk network in the sight estimation model to be trained to obtain predicted eye characteristics, and the sample eye image has corresponding marked eye characteristics and marked sight direction; the second prediction module is used for predicting the sight direction of the predicted eye features through a reconstruction network in the sight estimation model to be trained to obtain the predicted sight direction; a first determination module for determining parameters of a first loss function based on the predicted gaze direction and the annotated gaze direction; a second determination module for determining parameters of a second loss function based on the predicted eye features and the annotated eye features; and the training module is used for training the sight estimation model based on the first loss function and the second loss function.

According to another aspect of the embodiments of the present disclosure, there is provided a gaze estimation device, including: the acquisition module is used for acquiring an eye image to be estimated; the third prediction module is used for inputting the eye image to be estimated into a backbone network of a pre-trained sight estimation model to obtain predicted eye characteristic information, wherein the sight estimation model is obtained by training in advance based on the generation method of the sight estimation model; and the fourth prediction module is used for inputting the predicted eye feature information into a reconstruction network included in the sight estimation model to obtain predicted sight direction information.

According to another aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium storing a computer program for executing the above-described sight line estimation model generation method or sight line estimation method.

According to another aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including: a processor; a memory for storing processor-executable instructions; and the processor is used for reading the executable instructions from the memory and executing the instructions to realize the generation method of the sight line estimation model or the sight line estimation method.

Based on the method and device for generating a gaze estimation model, the gaze estimation method and device, the computer-readable storage medium, and the electronic device provided by the embodiments of the present disclosure, the gaze estimation model is arranged as a backbone network and a reconstruction network. The backbone network predicts the sample eye image to obtain a predicted eye feature, and the reconstruction network performs gaze direction prediction on the predicted eye feature to obtain the predicted gaze direction of the sample eye image. Parameters of a first loss function are then determined based on the predicted gaze direction and the annotated gaze direction, parameters of a second loss function are determined based on the predicted eye feature and the annotated eye feature, and finally the gaze estimation model is trained based on the first loss function and the second loss function. The gaze prediction task is thereby decomposed into two prediction tasks, namely predicting the eye features and predicting the gaze, which makes the gaze prediction process of the trained model more refined. By introducing supervision on the eye features, the model can learn eye feature information related to the gaze direction and obtain eye features representing the eyeball structure, and the gaze direction is then reconstructed from these eye features, so that the prediction accuracy of the gaze direction is higher, the dependence on the head pose of the detected person is lower, and the adaptability to use scenarios is better.

The technical solution of the present disclosure is further described in detail by the accompanying drawings and examples.

Drawings

The above and other objects, features and advantages of the present disclosure will become more apparent by describing in more detail embodiments of the present disclosure with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the principles of the disclosure and not to limit the disclosure. In the drawings, like reference numbers generally represent like parts or steps.

Fig. 1 is a system diagram to which the present disclosure is applicable.

Fig. 2 is a flowchart illustrating a method for generating a gaze estimation model according to an exemplary embodiment of the disclosure.

Fig. 3 is a flowchart illustrating a method for generating a gaze estimation model according to another exemplary embodiment of the disclosure.

Fig. 4 is a schematic diagram of an exemplary training process of a method for generating a line-of-sight estimation model according to an exemplary embodiment of the disclosure.

Fig. 5 is a flowchart illustrating a gaze estimation method according to an exemplary embodiment of the disclosure.

Fig. 6 is a schematic structural diagram of a device for generating a gaze estimation model according to an exemplary embodiment of the present disclosure.

Fig. 7 is a schematic structural diagram of a device for generating a gaze estimation model according to another exemplary embodiment of the present disclosure.

Fig. 8 is a schematic structural diagram of a gaze estimation device according to an exemplary embodiment of the present disclosure.

Fig. 9 is a schematic structural diagram of a gaze estimation device according to another exemplary embodiment of the present disclosure.

Fig. 10 is a block diagram of an electronic device provided in an exemplary embodiment of the present disclosure.

Detailed Description

Hereinafter, example embodiments according to the present disclosure will be described in detail with reference to the accompanying drawings. It is to be understood that the described embodiments are merely a subset of the embodiments of the present disclosure and not all embodiments of the present disclosure, with the understanding that the present disclosure is not limited to the example embodiments described herein.

It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.

It will be understood by those of skill in the art that the terms "first," "second," and the like in the embodiments of the present disclosure are used merely to distinguish one element from another, and are not intended to imply any particular technical meaning, nor is the necessary logical order between them.

It is also understood that in embodiments of the present disclosure, "a plurality" may refer to two or more and "at least one" may refer to one, two or more.

It is also to be understood that any reference to any component, data, or structure in the embodiments of the disclosure, may be generally understood as one or more, unless explicitly defined otherwise or stated otherwise.

In addition, the term "and/or" in the present disclosure is only one kind of association relationship describing an associated object, and means that three kinds of relationships may exist, for example, a and/or B may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" in the present disclosure generally indicates that the former and latter associated objects are in an "or" relationship.

It should also be understood that the description of the various embodiments of the present disclosure emphasizes the differences between the various embodiments, and the same or similar parts may be referred to each other, so that the descriptions thereof are omitted for brevity.

Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.

The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.

Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.

The disclosed embodiments may be applied to electronic devices such as terminal devices, computer systems, servers, etc., which are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known terminal devices, computing systems, environments, and/or configurations that may be suitable for use with electronic devices, such as terminal devices, computer systems, servers, and the like, include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set top boxes, programmable consumer electronics, network pcs, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above systems, and the like.

Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.

Summary of the application

Appearance-based methods (ABMs) in the related art require a large amount of training data, are sensitive to noise in the training data, and have poor interpretability. They are implicit learning methods that ignore the fact that the eyeball is a regular geometric sphere and do not incorporate explicit gaze-related information, such as predicting the gaze direction from human-eye key points, and therefore tend to over-fit the images.

The GBM (model-based) scheme in the related art has higher accuracy, but imposes more restrictions on the use scenario, and prediction may fail when the user has a large head pose, the eyes are occluded, or the head moves frequently. Furthermore, such methods require high-resolution images and therefore expensive hardware, including cameras and flashes, depth estimation devices, and the like.

Exemplary System

Fig. 1 illustrates an exemplary system architecture 100 to which the method or apparatus for generating a gaze estimation model, or the gaze estimation method or apparatus, of embodiments of the present disclosure may be applied.

As shown in fig. 1, system architecture 100 may include terminal device 101, network 102, and server 103. Network 102 is the medium used to provide communication links between terminal devices 101 and server 103. Network 102 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.

A user may use terminal device 101 to interact with server 103 over network 102 to receive or send messages and the like. Various communication client applications, such as an image capturing application, a search application, a web browser application, an instant messaging tool, and the like, may be installed on the terminal device 101.

The terminal device 101 may be various electronic devices including, but not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle-mounted terminal (e.g., a car navigation terminal), etc., and a fixed terminal such as a digital TV, a desktop computer, etc.

The server 103 may be a server that provides various services, such as a background server that performs model training or line-of-sight estimation on images uploaded by the terminal device 101. The background server can acquire the training samples and train the sight estimation model based on the training samples, or perform sight estimation on the acquired eye images by using the sight estimation model to obtain the predicted sight direction of the eye images.

It should be noted that the method for generating the gaze estimation model or the gaze estimation method provided in the embodiment of the present disclosure may be executed by the server 103 or the terminal device 101, and accordingly, the device for generating the gaze estimation model or the gaze estimation device may be provided in the server 103 or the terminal device 101.

It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. In the case where the sample eye images required for training the gaze estimation model or the images for gaze direction prediction using the gaze estimation model do not need to be acquired from a remote location, the system architecture may include no network, and only a server or a terminal device.

Exemplary method

Fig. 2 is a flowchart illustrating a method for generating a gaze estimation model according to an exemplary embodiment of the disclosure. The embodiment can be applied to an electronic device (such as the terminal device 101 or the server 103 shown in fig. 1), and as shown in fig. 2, the method includes the following steps:

step 201, predicting the sample eye image through a backbone network in the sight estimation model to be trained to obtain predicted eye characteristics.

In this embodiment, the electronic device may predict the sample eye image through a backbone network in the gaze estimation model to be trained, so as to obtain a predicted eye feature. The sample eye image has a corresponding annotated eye feature and annotated gaze direction. The sample eye image may be an image extracted from a preset sample set; its corresponding annotated eye feature represents a structural feature of the eye, and its corresponding annotated gaze direction represents the direction in which the eye gazes. Generally, the annotated gaze direction includes a pitch angle and a yaw angle of the line from the eyeball center to the pupil center in a preset coordinate system (for example, a coordinate system established with the eyeball center as the coordinate origin).

The backbone network in the gaze estimation model may include a deep neural network composed of a plurality of convolutional layers, fully-connected layers, and the like. The backbone network may extract feature data (representing features such as color, shape, and texture) of the input eye image and then predict the eye features using the obtained feature data.
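As an illustrative aid only, and not as the implementation of the disclosed embodiments, a minimal sketch of such a backbone network in PyTorch-style Python might look as follows; the class name EyeBackbone, the layer sizes, and the two output heads are assumptions chosen for illustration:

```python
import torch
import torch.nn as nn

class EyeBackbone(nn.Module):
    """Illustrative backbone: extracts eye features from an eye image and predicts
    an eyeball radius and eye key points (eyeball center, pupil center)."""

    def __init__(self):
        super().__init__()
        # Convolutional feature extractor (color/shape/texture features).
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Two prediction heads: eyeball radius and key-point coordinates.
        self.radius_head = nn.Linear(32, 1)    # predicted eyeball radius r
        self.keypoint_head = nn.Linear(32, 4)  # (x_ec, y_ec, x_ic, y_ic)

    def forward(self, eye_image):
        feat = self.features(eye_image).flatten(1)
        return self.radius_head(feat), self.keypoint_head(feat)
```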

Step 202, conducting sight line direction prediction on the predicted eye features through a reconstruction network in the sight line estimation model to be trained, and obtaining the predicted sight line direction.

In this embodiment, the electronic device may perform gaze direction prediction on the predicted eye features through a reconstruction network in the gaze estimation model to be trained, so as to obtain a predicted gaze direction. The reconstruction network is used for representing the corresponding relation between the predicted eye features and the predicted sight direction. The reconstruction network may be constructed based on a correspondence table obtained by counting a large number of predicted eye features and predicted gaze directions, or may be constructed based on a preset calculation formula.

As an example, the predicted eye features may include coordinates of eye key points and the eyeball radius. Using the key point coordinates and the eyeball radius as parameters of a formula for calculating the predicted gaze direction, a pitch angle and a yaw angle representing the predicted gaze direction can be output. In general, the key point information used in the embodiments of the present disclosure represents the projections of the three-dimensional eyeball center point and the pupil center point onto the two-dimensional eye image. Thus, the reconstruction network may determine the predicted gaze direction according to the following formula:

θ = arcsin((y_ic − y_ec) / r),  φ = arcsin((x_ic − x_ec) / (r·cos θ))  (1)

wherein θ and φ are the pitch angle and the yaw angle in three-dimensional space, respectively, (x_ic, y_ic) and (x_ec, y_ec) are the pupil center and the eyeball center, respectively, and r is the eyeball radius.
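For illustration only, a reconstruction network implementing a closed-form relation of this kind could be sketched as follows; the function name reconstruct_gaze and the sign conventions are assumptions that follow formula (1) as reconstructed above rather than the original patent figures:

```python
import torch

def reconstruct_gaze(keypoints: torch.Tensor, radius: torch.Tensor) -> torch.Tensor:
    """Map predicted eye features to a gaze direction (pitch, yaw).

    keypoints: (N, 4) tensor holding (x_ec, y_ec, x_ic, y_ic),
               i.e. eyeball center and pupil center on the eye image.
    radius:    (N, 1) tensor holding the predicted eyeball radius r.
    """
    x_ec, y_ec, x_ic, y_ic = keypoints.unbind(dim=1)
    r = radius.squeeze(1)
    # Clamp the arcsin arguments to a valid range for numerical safety.
    pitch = torch.arcsin(torch.clamp((y_ic - y_ec) / r, -1.0, 1.0))
    yaw = torch.arcsin(torch.clamp((x_ic - x_ec) / (r * torch.cos(pitch)), -1.0, 1.0))
    return torch.stack([pitch, yaw], dim=1)
```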

In step 203, parameters of the first loss function are determined based on the predicted gaze direction and the annotated gaze direction.

In this embodiment, the electronic device may determine parameters of the first loss function based on the predicted gaze direction and the annotated gaze direction. In general, the predicted gaze direction and the annotated gaze direction may be parameters of the first loss function. Alternatively, the predicted gaze direction and the labeled gaze direction may be converted by scaling, coordinate system conversion, etc., and the obtained conversion result may be used as a parameter of the first loss function.

The first loss function is used to determine the difference between the predicted gaze direction and the annotated gaze direction. The first loss function may be constructed based on various existing loss functions, such as the KL loss.

Step 204, determining parameters of the second loss function based on the predicted eye features and the annotated eye features.

In this embodiment, the electronic device may determine the parameters of the second loss function based on the predicted eye features and the annotated eye features. In general, the predicted eye feature and the annotated eye feature may be parameters of a second loss function. Alternatively, the predicted eye feature and the labeled eye feature may be converted in a manner such as scaling, and the obtained conversion result is used as a parameter of the second loss function.

The second loss function is used to determine the difference between the predicted eye feature and the annotated eye feature. The second loss function may be constructed based on various existing loss functions, such as the L2 loss.

Step 205, training a sight line estimation model based on the first loss function and the second loss function.

In this embodiment, the electronic device may train the gaze estimation model based on the first loss function and the second loss function.

As an example, denoting the first loss function by L1 and the second loss function by L2, the total loss function is expressed as:

L=a1*L1+a2*L2 (2)

wherein a1 and a2 are preset weights. During training, the loss values of L1 and L2 corresponding to the currently input sample eye image can be calculated, the loss value of L can then be calculated using formula (2), and the parameters of the gaze estimation model can then be iteratively updated with newly input sample eye images using the gradient descent method, so that the loss value of L is gradually reduced. When a preset training end condition is met, training ends and the trained gaze estimation model is obtained. The preset training end condition may include, but is not limited to, at least one of the following: the loss value of L converges, the number of training iterations exceeds a preset number, or the training time exceeds a preset time.
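A minimal training-step sketch corresponding to formula (2) might look as follows; the weights, the optimizer choice, the simple L1/L2 loss placeholders, and helper names such as backbone and reconstruct_gaze are illustrative assumptions rather than the disclosed implementation:

```python
import torch
import torch.nn.functional as F

a1, a2 = 1.0, 0.5  # assumed preset weights for the first and second loss functions

def training_step(backbone, reconstruct_gaze, optimizer, batch):
    images, gt_gaze, gt_eye_feat = batch                         # samples and annotations
    pred_radius, pred_keypoints = backbone(images)               # predicted eye features
    pred_gaze = reconstruct_gaze(pred_keypoints, pred_radius)    # predicted gaze direction

    loss1 = F.l1_loss(pred_gaze, gt_gaze)                        # first loss: gaze direction
    loss2 = F.mse_loss(torch.cat([pred_radius, pred_keypoints], dim=1),
                       gt_eye_feat)                              # second loss: eye features
    total = a1 * loss1 + a2 * loss2                              # formula (2)

    optimizer.zero_grad()
    total.backward()   # gradient descent update of the gaze estimation model
    optimizer.step()
    return total.item()
```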

In the method provided by the above embodiment of the present disclosure, the gaze estimation model is arranged as a backbone network and a reconstruction network: the backbone network predicts the sample eye image to obtain a predicted eye feature, the reconstruction network performs gaze direction prediction on the predicted eye feature to obtain the predicted gaze direction of the sample eye image, parameters of a first loss function are determined based on the predicted gaze direction and the annotated gaze direction, parameters of a second loss function are determined based on the predicted eye feature and the annotated eye feature, and the gaze estimation model is trained based on the first loss function and the second loss function. The gaze prediction task is thereby decomposed into two prediction tasks, namely predicting the eye features and predicting the gaze, which makes the gaze prediction process of the trained model more refined. By introducing supervision on the eye features, the model learns eye feature information related to the gaze direction, obtains eye features representing the eyeball structure, and reconstructs the gaze direction from these eye features, so that the prediction accuracy of the gaze direction is higher, the dependence on the head pose of the detected person is lower, and the adaptability to use scenarios is better.

In some optional implementations, the method may further include:

and evaluating the predicted sight direction through a sight uncertainty evaluation network in the sight estimation model to be trained to obtain evaluation information representing uncertainty of the predicted sight direction.

The uncertainty evaluation network is used for representing the corresponding relation between the predicted gaze direction and the evaluation information. The uncertainty of the predicted gaze direction may represent the quality of the input sample eye image: the better the quality (e.g., a more complete eye image, a sharper image, a more open eye, etc.), the lower the uncertainty; the worse the quality (e.g., an incomplete eye image, lower sharpness, a closed eye, etc.), the higher the uncertainty. Generally, the uncertainty evaluation network may determine the probability distribution of the predicted gaze direction, determine the standard deviation corresponding to the currently obtained predicted gaze direction according to the probability distribution, and determine evaluation information representing the uncertainty of the predicted gaze direction according to the standard deviation. The evaluation information may be a numerical value whose magnitude indicates the uncertainty of the predicted gaze direction.

As an example, the predicted gaze direction follows a Gaussian distribution, represented by:

P(pred_gaze) = (1 / √(2πσ²)) · exp(−(pred_gaze − gt_gaze)² / (2σ²))  (3)

wherein pred_gaze indicates the predicted gaze direction, gt_gaze indicates the annotated gaze direction (i.e., the ground truth), and σ indicates the standard deviation.

During training, the regression objective is to fit the predicted distribution to the true distribution, so the first loss function can be the KL loss, which is used to calculate the regression loss:

L_reg = D_KL(P_gt(gaze) ∥ P_pred(gaze)) ∝ (gt_gaze − pred_gaze)² / (2σ²) + (1/2)·log σ² + const  (4)

wherein P_pred(gaze) indicates the distribution of the predicted gaze direction, P_gt(gaze) indicates the distribution of the true gaze direction, here the distribution of the annotated gaze direction, D_KL indicates the KL divergence, and σ indicates the standard deviation. Replacing log σ² in equation (4) with α and neglecting the constant term in equation (4), the final KL loss function is shown as follows:

L_reg ≈ (e^(−α) / 2)·(gt_gaze − pred_gaze)² + α/2  (5)

In general, α in equation (5) may be used as the evaluation information output by the uncertainty evaluation network to represent the score of the predicted gaze direction: the larger the difference between the distribution of the predicted gaze direction and the true distribution, the larger the value of α, that is, the larger the uncertainty. It should be noted that the evaluation information can also be obtained in other ways based on α, for example, e^α may be used as the evaluation information.

When the trained sight line evaluation model is used for sight line prediction, evaluation information which represents uncertainty of the predicted sight line direction and corresponds to the eye image can be output at the same time, and therefore the evaluation information can be used for evaluating the quality of the input eye image.

The evaluation information output by the implementation mode can be used for evaluating the quality of the input eye image, so that the quality of the eye image can be accurately and efficiently evaluated.

In some optional implementations, the labeling the eye feature includes labeling an eyeball radius and labeling eye keypoint information, and the second loss function includes a first sub-loss function and a second sub-loss function. The first sub-loss function is used for determining the difference between the predicted eyeball radius and the labeled eyeball radius, and the second sub-loss function is used for determining the difference between the coordinates of the predicted eye key point and the coordinates of the labeled eye key point. The first sub-loss function and the second sub-loss function may use various existing regression loss functions, such as L2 (mean square error) loss, L1 (mean absolute value error) loss, and the like.

The labeling eyeball radius represents the radius of an eyeball indicated by the sample eye image, and the labeling key point information represents the specific position of the eye. In general, the key point information used in the embodiments of the present disclosure represents a projection of a three-dimensional eyeball center point on a two-dimensional eye image and a pupil center point on the eye image.

On this basis, step 201 may be performed as follows:

and predicting the sample eye image through a backbone network to obtain the predicted eyeball radius and the predicted eye key point information.

The backbone network in this embodiment is used to predict the radius of the eyeball and the position of the eye key point indicated by the input sample eye image.

Based on this, step 204 may be performed as follows:

first, parameters of a first sub-loss function are determined based on the predicted eyeball radius and the annotated eyeball radius.

Then, parameters of the second sub-loss function are determined based on the predicted eye keypoint information and the annotated eye keypoint information.

In general, the predicted eyeball radius and the labeled eyeball radius may be parameters of a first sub-loss function, and the predicted eye key point information and the labeled eye key point information may be parameters of a second sub-loss function. Or, the predicted eyeball radius and the labeled eyeball radius, and the predicted eye key point information and the labeled eye key point information may be converted in a manner of scaling, coordinate system conversion, and the like, and the obtained conversion result is used as a parameter of the first sub-loss function and the second sub-loss function.

As an example, denoting the first sub-loss function by L11 and the second sub-loss function by L12, the above formula (2) can be modified as follows:

L=a11*L11+a12*L12+a2*L2 (6)

wherein a11, a12, and a2 are preset weights. During training, the loss values of L11, L12, and L2 corresponding to the currently input sample eye image can be calculated, the loss value of L can then be calculated using formula (6), and the parameters of the gaze estimation model can then be iteratively updated with newly input sample eye images using the gradient descent method, so that the loss value of L is gradually reduced. When the preset training end condition is met, training ends and the trained gaze estimation model is obtained.
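For reference, a minimal sketch of this refined total loss is shown below; the weight values and function name are assumptions:

```python
def total_loss(loss_radius, loss_keypoints, loss_gaze,
               a11: float = 1.0, a12: float = 1.0, a2: float = 1.0):
    """Formula (6): weighted sum of the two eye-feature sub-losses and the gaze loss."""
    return a11 * loss_radius + a12 * loss_keypoints + a2 * loss_gaze
```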

According to the implementation mode, the eyeball radius and the eye key point information are used as the eye characteristic information, the corresponding first sub-loss function and the second sub-loss function are set, the three-dimensional size information of the eyeball model can be introduced into the sight line estimation model, more effective information is provided for the sight line direction prediction during the sight line direction prediction, meanwhile, the sight line prediction task is decomposed into three prediction tasks of the eyeball radius, the eye key point information and the sight line direction, the sight line estimation process is further refined, and therefore the sight line estimation precision is further improved.

With further reference to fig. 3, a flow diagram of yet another embodiment of a method of generating a gaze estimation model is shown. As shown in fig. 3, on the basis of the embodiment shown in fig. 2, step 201 may include the following steps:

in step 2011, the sample type of the sample eye image is determined.

Step 2012, if the sample type indicates that the sample eye image is a sample synthesized eye image, predicting the sample synthesized eye image through the trunk network to obtain a first predicted eye sub-feature corresponding to the sample synthesized eye image.

Wherein the sample synthetic eye image may be an image generated from a pre-established eye model.

And 2013, if the sample type indicates that the sample eye image is the sample real eye image, predicting the sample real eye image through the backbone network to obtain a second predicted eye sub-feature corresponding to the sample real eye image.

The sample real eye image may be an image previously taken of a real eye.

Step 2014, a predicted ocular feature is obtained based on the first predicted ocular sub-feature and the second predicted ocular sub-feature.

In training the model, a plurality of sample synthetic eye images and a plurality of sample real eye images are generally used as the sample eye images required for training. Typically, an equal number of sample synthetic eye images and sample real eye images (at least one of each) constitute one batch of samples, and the model is trained batch by batch. The predicted eye feature includes the first predicted eye sub-feature and the second predicted eye sub-feature.
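Purely as an illustration of this batch composition (the collection and field names are assumptions, not the disclosed implementation), each training batch might be assembled as follows:

```python
import random

def make_batch(synthetic_samples, real_samples, half_batch_size):
    """Draw the same number of synthetic and real sample eye images for one batch.

    Synthetic samples carry annotated eye features and an annotated gaze direction;
    real samples carry only an annotated gaze direction."""
    batch = random.sample(synthetic_samples, half_batch_size) \
          + random.sample(real_samples, half_batch_size)
    random.shuffle(batch)
    return batch
```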

In the method provided by the embodiment corresponding to fig. 3, the samples required for model training are set as the sample synthesized eye image and the sample real eye image, the sample synthesized eye image can provide accurate three-dimensional eye features for the model during training, and the sample real eye image can test the prediction performance of the trunk network, so that the two are combined during model training, and the prediction accuracy of the model is gradually improved.

In some optional implementations, based on the method provided by the corresponding embodiment of fig. 3, step 202 may be performed as follows:

and respectively predicting the sight direction of the first predicted eye sub-feature and the second predicted eye sub-feature through a reconstruction network in a sight estimation model to be trained to obtain a first predicted sight direction corresponding to the sample synthesized eye image and a second predicted sight direction corresponding to the sample real eye image.

According to the implementation mode, the first predicted sight direction and the second predicted sight direction corresponding to the sample synthesized eye image and the sample real eye image are respectively determined, so that the ideal synthesized eye image and the actually shot eye image can be fully utilized during training, the sight direction prediction capability of the model is trained, the model learns more eye states from more types of samples, and the precision of the model for predicting the sight direction is improved.

In some alternative implementations, step 203 may be performed as follows:

and determining parameters of the first loss function based on a first labeling sight line direction and a first prediction sight line direction corresponding to the sample synthesized eye image and a second labeling sight line direction and a second prediction sight line direction corresponding to the sample real eye image.

In general, a first annotated gaze direction and a first predicted gaze direction, and a second annotated gaze direction and a second predicted gaze direction may be parameters of the first loss function. Alternatively, the first marked sight line direction and the first predicted sight line direction, and the second marked sight line direction and the second predicted sight line direction may be converted in a manner of scaling, coordinate system conversion, and the like, and the obtained conversion result is used as a parameter of the first loss function.

The first loss function may determine a loss value representing a difference between the first annotated gaze direction and the first predicted gaze direction and a loss value representing a difference between the second annotated gaze direction and the second predicted gaze direction.

The determining parameters of a second loss function based on the predicted eye features and the annotated eye features comprises:

and determining parameters of a second loss function based on the first annotated ocular feature and the first predicted ocular sub-feature corresponding to the sample synthesized ocular image.

The first annotated eye feature is obtained by annotating the sample synthetic eye image in advance. In this embodiment, the sample real eye image is a two-dimensional image captured of a real eye, so it lacks three-dimensional features of the eye and does not have a corresponding annotated eye feature. During training, the loss value of the second loss function can be calculated from the first annotated eye feature and the first predicted eye sub-feature corresponding to the sample synthesized eye image, and the backbone network is trained based on this loss value. The backbone network is used to predict the second predicted eye sub-feature corresponding to the sample real eye image, and the reconstruction network is used to calculate the second predicted gaze direction corresponding to the sample real eye image. The loss value of the first loss function is calculated from the first annotated gaze direction and the first predicted gaze direction corresponding to the sample synthetic eye image, and the second annotated gaze direction and the second predicted gaze direction corresponding to the sample real eye image, and the backbone network continues to be trained based on this loss value. Usually, during training, the total loss value can be calculated using formula (2), and the parameters of the backbone network are adjusted by the gradient descent method so that the total loss value gradually converges; in this way, the sample synthesized eye images and the sample real eye images are combined to train the gaze estimation model.

After each training iteration, a new sample real eye image is input into the backbone network with the latest adjusted parameters, and the second predicted eye sub-feature corresponding to the new sample real eye image is obtained and continues to participate in training. This is iterated multiple times until the gaze estimation model meets the training end condition.
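As a rough sketch of this mixed supervision (the dictionary field names and masking scheme are assumptions used only for illustration), the per-batch losses might be computed as follows:

```python
def mixed_supervision_losses(pred, batch_meta):
    """Compute losses for a mixed batch of synthetic and real sample eye images.

    - The second loss (eye features) uses only the sample synthesized eye images,
      which carry annotated eye features.
    - The first loss (gaze direction) uses both synthesized and real eye images.
    """
    syn = batch_meta["is_synthetic"]  # boolean mask over the batch
    feature_loss = ((pred["eye_feat"][syn] - batch_meta["gt_eye_feat"][syn]) ** 2).mean()
    gaze_loss = ((pred["gaze"] - batch_meta["gt_gaze"]) ** 2).mean()
    return gaze_loss, feature_loss
```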

According to the selectable implementation mode, the eye characteristic information and the sight direction are marked on the sample synthesized eye image, the sight direction is marked on the sample real eye image, the eye characteristic recognition capability of the three-dimensional space characteristic training model marked on the sample synthesized eye image is effectively utilized in training, and the sight estimation capability of the sight direction training model marked on the sample synthesized eye image and the sample real eye image is utilized, so that the three-dimensional eyeball model and the mixed sample data training strategy are combined, more effective information is provided for the model to accurately predict the sight direction, and the sight estimation accuracy of the trained model is further improved.

Further referring to fig. 4, it shows an exemplary training process diagram of the method for generating the gaze estimation model provided by the embodiment of the present disclosure. As shown in fig. 4, the training samples include a sample synthesized eye image 401 and a sample real eye image 402. Using a batch training method, an equal number of sample synthesized eye images 401 and sample real eye images 402 are taken as one batch and input into the backbone network 403 of the gaze estimation model to be trained, so as to obtain a first predicted eyeball radius 404 and first predicted eye key point information 405 corresponding to the sample synthesized eye image 401, and a second predicted eyeball radius 406 and second predicted eye key point information 407 corresponding to the sample real eye image 402.

Then, the first predicted eyeball radius 404 and the first predicted eye key point information 405, and the second predicted eyeball radius 406 and the second predicted eye key point information 407 are input into a reconstruction network 408 included in the sight line estimation model to be trained, so as to obtain a first predicted sight line direction 409 corresponding to the sample synthetic eye image 401 and a second predicted sight line direction 410 corresponding to the sample real eye image 402.

The sample synthetic eye image 401 may have a corresponding annotated eyeball radius 411, annotated eye keypoint information 412, a first annotated gaze direction 413, and the sample real eye image 402 may have a corresponding second annotated gaze direction 414. During training, the labeled eyeball radius 411 and the first predicted eyeball radius 404 are taken as parameters of a first sub-loss function 415 included in the second loss function; the annotation eye keypoint information 412 and the first predicted eye keypoint information 405 are used as parameters of a second sub-loss function 416 included in the second loss function; the first annotated visual direction 413 and the first predicted visual direction 409, and the second annotated visual direction 414 and the second predicted visual direction 410 are taken as parameters of a first loss function 417, and parameters of the backbone network are adjusted by using a gradient descent method and a back propagation method, so that a total loss value obtained based on each loss function is minimized. And when the training ending condition is met (for example, the total loss value is converged), ending the training to obtain the trained sight line estimation model.

With further reference to fig. 5, fig. 5 is a schematic flow chart diagram of a gaze estimation method provided by an exemplary embodiment of the present disclosure. The embodiment can be applied to an electronic device (such as the terminal device 101 or the server 103 shown in fig. 1), and as shown in fig. 5, the method includes the following steps:

step 501, obtaining an eye image to be estimated.

In this embodiment, the electronic device may acquire the eye image to be estimated from a local or remote location. The eye image to be estimated may be an image of eyes of a target object (for example, a pedestrian on the road, a passenger in the vehicle, or the like) photographed in various scenes. For example, a camera is arranged inside the vehicle, and the camera can shoot eyes of a person in the vehicle, and an obtained eye image is an eye image to be estimated.

Step 502, inputting the eye image to be estimated into a backbone network of a pre-trained sight estimation model to obtain predicted eye feature information.

In this embodiment, the electronic device may input the eye image to be estimated into a backbone network of a pre-trained gaze estimation model, so as to obtain predicted eye feature information. The sight line estimation model is obtained by training in advance based on the method described in the embodiment corresponding to fig. 2.

Step 503, inputting the predicted eye feature information into a reconstruction network included in the sight estimation model to obtain predicted sight direction information.

In this embodiment, the electronic device may input the predicted eye feature information into a reconstruction network included in the gaze estimation model to obtain predicted gaze direction information. After the gaze direction information is obtained, it can generally be used in human-computer interaction scenarios, for example, controlling a controlled device through the gaze direction information.
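For reference, inference with the trained model might then look like the following sketch; the function reuses the illustrative helpers introduced earlier, and all names are assumptions:

```python
import torch

@torch.no_grad()
def estimate_gaze(backbone, reconstruct_gaze, eye_image):
    """Predict eye features with the backbone, then reconstruct the gaze direction."""
    pred_radius, pred_keypoints = backbone(eye_image.unsqueeze(0))  # add batch dim
    pitch_yaw = reconstruct_gaze(pred_keypoints, pred_radius)
    return pitch_yaw.squeeze(0)  # predicted gaze direction information (pitch, yaw)
```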

According to the sight estimation method provided by the embodiment of the disclosure, by using the sight estimation model obtained by training in the embodiment corresponding to fig. 2, when the sight estimation is performed, the eye features are predicted first, and then the sight direction is reconstructed by using the eye features, so that the prediction accuracy of the sight direction is higher, the dependence on the head posture of the detected person is lower, and the adaptability to the use scene is better.

In some optional implementations, after step 503, the electronic device may further perform the following steps:

firstly, inputting the predicted sight direction information into an uncertainty evaluation network included in a sight estimation model to obtain evaluation information representing uncertainty of the predicted sight direction.

The evaluation information may be a numerical value, and the magnitude of the numerical value may indicate uncertainty of the predicted line-of-sight direction. Generally, the distribution of the predicted viewing direction conforms to a gaussian distribution, that is, there is a standard deviation between the predicted viewing direction and the actual viewing direction, and the evaluation information can be obtained from the standard deviation. For example, the larger the standard deviation, the larger the evaluation information in numerical form, the higher the uncertainty indicating the predicted gaze direction; conversely, the smaller the standard deviation, the smaller the evaluation information in numerical form, indicating the lower uncertainty of the predicted gaze direction.

Then, based on the evaluation information, information characterizing the quality of the eye image to be estimated is generated.

The evaluation information may indicate the quality of the image; that is, when the evaluation information indicates a high uncertainty of the predicted gaze direction, the quality of the image is poor, and conversely, when the evaluation information indicates a low uncertainty of the predicted gaze direction, the quality of the image is good. The information characterizing the quality of the eye image to be estimated may take various forms, such as text, symbols, or numbers.
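A trivial sketch of such a mapping is shown below; the threshold and the quality labels are assumed values that would in practice be calibrated on validation data:

```python
def image_quality_from_uncertainty(score: float, threshold: float = 1.0) -> str:
    """Map an uncertainty score (e.g. exp(alpha)) to a textual quality label."""
    return "good quality" if score < threshold else "poor quality"
```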

The realization mode generates the information representing the image quality through the evaluation information output by the uncertainty evaluation network included in the sight line estimation model, and can accurately evaluate the quality of the eye images, thereby being beneficial to the operations of screening the eye images and the like and providing the reference of the image quality in engineering application.

Exemplary devices

Fig. 6 is a schematic structural diagram of a device for generating a gaze estimation model according to an exemplary embodiment of the present disclosure. The embodiment may be applied to an electronic device, and as shown in fig. 6, the apparatus for generating a gaze estimation model includes: the first prediction module 601 is configured to predict a sample eye image through a backbone network in a sight estimation model to be trained to obtain a predicted eye feature, where the sample eye image has a corresponding labeled eye feature and a labeled sight direction; the second prediction module 602 is configured to perform gaze direction prediction on the predicted eye features through a reconstruction network in the gaze estimation model to be trained, so as to obtain a predicted gaze direction; a first determining module 603 for determining parameters of a first loss function based on the predicted gaze direction and the annotated gaze direction; a second determining module 604 for determining parameters of a second loss function based on the predicted eye features and the annotated eye features; a training module 605 for training the gaze estimation model based on the first loss function and the second loss function.

In this embodiment, the first prediction module 601 may predict the sample eye image through a backbone network in the sight line estimation model to be trained, so as to obtain a predicted eye feature. The sample eye image has corresponding labeled eye features and a labeled sight direction. The sample eye image may be an image extracted from a preset sample set; its labeled eye features represent structural features of the eye, and its labeled sight direction represents the direction in which the eye gazes. Generally, the labeled sight direction includes a pitch angle and a yaw angle of the line from the eyeball center to the pupil center in a preset coordinate system (for example, a coordinate system whose origin is the eyeball center).
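
For illustration, the following Python sketch converts a (pitch, yaw) gaze annotation into a unit direction vector in an eyeball-centered coordinate system; the axis convention and the use of radians are assumptions, not taken from the disclosure.

import numpy as np

def gaze_angles_to_vector(pitch: float, yaw: float) -> np.ndarray:
    """Convert a (pitch, yaw) gaze annotation, in radians, to a unit
    direction vector (assumed convention: x right, y up, z toward the camera)."""
    x = -np.cos(pitch) * np.sin(yaw)
    y = -np.sin(pitch)
    z = -np.cos(pitch) * np.cos(yaw)
    return np.array([x, y, z])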

The backbone network in the sight line estimation model may be a deep neural network formed of a plurality of convolutional layers, fully connected layers, and the like. The backbone network may extract feature data (representing features such as color, shape, and texture) from the input eye image, and then predict the eye features using the obtained feature data.
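
A minimal sketch of such a backbone, assuming PyTorch, a single-channel input image, and an arbitrary layer configuration (none of which is specified by the disclosure), might look as follows.

import torch
import torch.nn as nn

class EyeFeatureBackbone(nn.Module):
    """Stacked convolutional layers followed by a fully connected layer that
    regresses the eye features (e.g. eyeball radius and eye key points)."""
    def __init__(self, feature_dim: int = 16):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),          # global pooling over the feature map
        )
        self.fc = nn.Linear(128, feature_dim)

    def forward(self, eye_image: torch.Tensor) -> torch.Tensor:
        x = self.conv(eye_image).flatten(1)
        return self.fc(x)                     # predicted eye features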

In this embodiment, the second prediction module 602 may perform gaze direction prediction on the predicted eye features through a reconstruction network in the gaze estimation model to be trained, so as to obtain a predicted gaze direction. The reconstruction network is used for representing the corresponding relation between the predicted eye features and the predicted sight direction. The reconstruction network may be constructed based on a correspondence table obtained by counting a large number of predicted eye features and predicted gaze directions, or may be constructed based on a preset calculation formula.
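
A minimal sketch of a learned reconstruction network, again assuming PyTorch and an arbitrary MLP architecture, is shown below; a table- or formula-based reconstruction would replace this module.

import torch
import torch.nn as nn

class GazeReconstructionNet(nn.Module):
    """Small MLP that maps predicted eye features to a (pitch, yaw) gaze
    direction; the exact architecture is an assumption for illustration."""
    def __init__(self, feature_dim: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feature_dim, 64), nn.ReLU(),
            nn.Linear(64, 2),                 # pitch and yaw
        )

    def forward(self, eye_features: torch.Tensor) -> torch.Tensor:
        return self.mlp(eye_features)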

In this embodiment, the first determination module 603 may determine parameters of the first loss function based on the predicted gaze direction and the annotated gaze direction. In general, the predicted gaze direction and the annotated gaze direction may be parameters of the first loss function. Alternatively, the predicted gaze direction and the labeled gaze direction may be converted by scaling, coordinate system conversion, etc., and the obtained conversion result may be used as a parameter of the first loss function.

The first loss function is used to determine the difference between the predicted gaze direction and the annotated gaze direction. The first loss function may be constructed based on various existing loss functions, such as a KL (Kullback-Leibler divergence) loss.

In this embodiment, the second determination module 604 may determine the parameters of the second loss function based on the predicted eye features and the annotated eye features. In general, the predicted eye feature and the annotated eye feature may be parameters of a second loss function. Alternatively, the predicted eye feature and the labeled eye feature may be converted in a manner such as scaling, and the obtained conversion result is used as a parameter of the second loss function.

The second loss function is used to determine the difference between the predicted eye feature and the annotated eye feature. The second loss function may be constructed based on various existing loss functions, such as an L2 loss.

In this embodiment, the training module 605 may train the gaze estimation model based on the first loss function and the second loss function.
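
For illustration, a single joint training step under these two losses might look like the following PyTorch sketch; the use of L2 losses for both terms and the equal weighting are assumptions (the disclosure mentions, for example, a KL loss for the first loss function).

import torch.nn.functional as F

def training_step(backbone, recon_net, optimizer,
                  eye_images, gt_eye_features, gt_gaze):
    """One illustrative optimization step: the first loss (gaze direction)
    and the second loss (eye features) are summed and back-propagated jointly."""
    pred_features = backbone(eye_images)        # backbone prediction
    pred_gaze = recon_net(pred_features)        # reconstructed gaze direction

    first_loss = F.mse_loss(pred_gaze, gt_gaze)               # L2 surrogate for the gaze loss
    second_loss = F.mse_loss(pred_features, gt_eye_features)  # L2 loss on the eye features
    loss = first_loss + second_loss             # equal weighting is an assumed hyper-parameter

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()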

Referring to fig. 7, fig. 7 is a schematic structural diagram of a device for generating a gaze estimation model according to another exemplary embodiment of the present disclosure.

In some optional implementations, the first prediction module 601 includes: a first determining unit 6011 configured to determine a sample type of the sample eye image; a first prediction unit 6012, configured to predict, if the sample type indicates that the sample eye image is a sample synthesized eye image, the sample synthesized eye image through a backbone network, to obtain a first predicted eye sub-feature corresponding to the sample synthesized eye image; a second prediction unit 6013, configured to predict, if the sample type indicates that the sample eye image is the sample real eye image, the sample real eye image through the backbone network, to obtain a second predicted eye sub-feature corresponding to the sample real eye image; a second determining unit 6014, configured to obtain the predicted eye feature based on the first predicted eye sub-feature and the second predicted eye sub-feature.

In some optional implementations, the second prediction module 602 is further configured to: and respectively predicting the sight direction of the first predicted eye sub-feature and the second predicted eye sub-feature through a reconstruction network in a sight estimation model to be trained to obtain a first predicted sight direction corresponding to the sample synthesized eye image and a second predicted sight direction corresponding to the sample real eye image.

In some optional implementations, the first determining module 603 is further configured to: determining parameters of a first loss function based on a first labeling sight line direction and a first prediction sight line direction corresponding to the sample synthesis eye image and a second labeling sight line direction and a second prediction sight line direction corresponding to the sample real eye image; the second determination module 604 is further configured to: and determining parameters of a second loss function based on the first annotated ocular feature and the first predicted ocular sub-feature corresponding to the sample synthesized ocular image.

In some optional implementations, the apparatus further comprises: the evaluation module 606 is configured to evaluate the predicted gaze direction through a gaze uncertainty evaluation network in the gaze estimation model to be trained, so as to obtain evaluation information representing uncertainty of the predicted gaze direction.

In some optional implementations, the labeled eye features include a labeled eyeball radius and labeled eye key point information, and the second loss function includes a first sub-loss function and a second sub-loss function. The first prediction module 601 is further configured to: predict the sample eye image through the backbone network to obtain a predicted eyeball radius and predicted eye key point information. The second determining module 604 includes: a third determining unit 6041 for determining parameters of the first sub-loss function based on the predicted eyeball radius and the labeled eyeball radius; and a fourth determining unit 6042 for determining parameters of the second sub-loss function based on the predicted eye key point information and the labeled eye key point information.
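
A sketch of such a composite second loss, assuming L2 sub-losses and illustrative weights (none of which is specified by the disclosure), is shown below.

import torch.nn.functional as F

def second_loss(pred_radius, gt_radius, pred_keypoints, gt_keypoints,
                radius_weight: float = 1.0, keypoint_weight: float = 1.0):
    """Composite second loss: a first sub-loss on the eyeball radius and a
    second sub-loss on the eye key points, combined with assumed weights."""
    first_sub_loss = F.mse_loss(pred_radius, gt_radius)
    second_sub_loss = F.mse_loss(pred_keypoints, gt_keypoints)
    return radius_weight * first_sub_loss + keypoint_weight * second_sub_loss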

The apparatus for generating a sight estimation model provided by the above embodiment of the present disclosure configures the sight estimation model as a backbone network and a reconstruction network. The backbone network predicts the sample eye image to obtain a predicted eye feature, and the reconstruction network predicts the sight direction from the predicted eye feature to obtain a predicted sight direction of the sample eye image. Parameters of a first loss function are determined based on the predicted sight direction and the labeled sight direction, parameters of a second loss function are determined based on the predicted eye feature and the labeled eye feature, and the sight estimation model is trained based on the first loss function and the second loss function. The sight prediction task is thus decomposed into two prediction tasks, namely predicting the eye features and predicting the sight direction, so that the sight prediction process of the trained model is more refined. By introducing supervision of the eye features, the model learns eye feature information related to the sight direction, obtains eye features representing the eyeball structure, and reconstructs the sight direction from those features, so that the prediction accuracy of the sight direction is higher, the dependence on the head posture of the detected person is lower, and the adaptability to the use scene is better.

Fig. 8 is a schematic structural diagram of a gaze estimation device according to an exemplary embodiment of the present disclosure. The embodiment can be applied to an electronic device. As shown in fig. 8, the gaze estimation apparatus includes: an obtaining module 801, configured to obtain an eye image to be estimated; a third prediction module 802, configured to input the eye image to be estimated into a backbone network of a pre-trained gaze estimation model to obtain predicted eye feature information, where the gaze estimation model is trained in advance based on the method described in the embodiment corresponding to fig. 2; and a fourth prediction module 803, configured to input the predicted eye feature information into a reconstruction network included in the gaze estimation model, so as to obtain predicted gaze direction information.

In this embodiment, the obtaining module 801 may obtain the eye image to be estimated locally or remotely. The eye image to be estimated may be an image of the eyes of a target object (for example, a person) captured in various scenes. For example, a camera arranged inside a vehicle may capture the eyes of a person in the vehicle, and the obtained eye image is the eye image to be estimated.

In this embodiment, the third prediction module 802 may input the eye image to be estimated into a backbone network of a pre-trained gaze estimation model to obtain predicted eye feature information. The sight line estimation model is obtained by training in advance based on the method described in the embodiment corresponding to fig. 2.

In this embodiment, the fourth prediction module 803 may input the predicted eye feature information into a reconstruction network included in the gaze estimation model to obtain predicted gaze direction information. After the gaze direction information is obtained, it can generally be used in human-computer interaction scenarios, for example, to control a controlled device based on the gaze direction information.
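
For illustration, the two-stage inference pipeline described above might be sketched as follows in PyTorch, reusing hypothetical backbone and reconstruction modules.

import torch

@torch.no_grad()
def estimate_gaze(backbone, recon_net, eye_image: torch.Tensor):
    """The eye image is first passed through the backbone to obtain predicted
    eye features, which are then fed into the reconstruction network to obtain
    the predicted gaze direction."""
    backbone.eval()
    recon_net.eval()
    eye_features = backbone(eye_image)        # predicted eye feature information
    gaze_direction = recon_net(eye_features)  # predicted gaze direction information
    return eye_features, gaze_direction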

Referring to fig. 9, fig. 9 is a schematic structural diagram of a gaze estimation device according to another exemplary embodiment of the present disclosure.

In some optional implementations, the apparatus further comprises: a third determining module 804, configured to input the predicted gaze direction information into an uncertainty evaluation network included in the gaze estimation model, to obtain evaluation information representing uncertainty of the predicted gaze direction; a generating module 805 configured to generate information characterizing the quality of the eye image to be estimated based on the evaluation information.

According to the sight estimation device provided by the embodiments of the present disclosure, the sight estimation model trained in the embodiment corresponding to fig. 2 first predicts the eye features and then reconstructs the sight direction from those features when performing sight estimation. As a result, the prediction accuracy of the sight direction is higher, the dependence on the head posture of the detected person is lower, and the adaptability to the use scene is better.

Exemplary electronic device

Next, an electronic apparatus according to an embodiment of the present disclosure is described with reference to fig. 10. The electronic device may be either or both of the terminal device 101 and the server 103 as shown in fig. 1, or a stand-alone device separate from them, which may communicate with the terminal device 101 and the server 103 to receive the collected input signals therefrom.

FIG. 10 illustrates a block diagram of an electronic device in accordance with an embodiment of the disclosure.

As shown in fig. 10, the electronic device 1000 includes one or more processors 1001 and memory 1002.

The processor 1001 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 1000 to perform desired functions.

Memory 1002 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory may include, for example, random access memory (RAM) and/or cache memory. Non-volatile memory may include, for example, read-only memory (ROM), a hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 1001 to implement the method for generating a gaze estimation model or the gaze estimation method of the various embodiments of the present disclosure, and/or other desired functions. Various contents such as a sample eye image and an eye image to be estimated can also be stored in the computer-readable storage medium.

In one example, the electronic device 1000 may further include: an input device 1003 and an output device 1004, which are interconnected by a bus system and/or other form of connection mechanism (not shown).

For example, when the electronic apparatus is the terminal apparatus 101 or the server 103, the input device 1003 may be a camera, a mouse, a keyboard, or the like, and is used to input a sample eye image, an eye image to be estimated, various commands, or the like. When the electronic apparatus is a stand-alone apparatus, the input device 1003 may be a communication network connector for receiving the input sample eye image, eye image to be estimated, various commands, and the like from the terminal apparatus 101 and the server 103.

The output device 1004 may output various information including a sight line estimation model, a predicted sight line direction, and the like to the outside. The output devices 1004 may include, for example, a display, speakers, a printer, and a communication network and its connected remote output devices, among others.

Of course, for simplicity, only some of the components of the electronic device 1000 relevant to the present disclosure are shown in fig. 10, omitting components such as buses, input/output interfaces, and the like. In addition, the electronic device 1000 may include any other suitable components depending on the particular application.

Exemplary computer program product and computer-readable storage Medium

In addition to the above-described methods and apparatuses, embodiments of the present disclosure may also be a computer program product including computer program instructions that, when executed by a processor, cause the processor to perform the steps in the generation method of the gaze estimation model or the gaze estimation method according to various embodiments of the present disclosure described in the above-described "exemplary methods" section of this specification.

The computer program product may write program code for carrying out operations of embodiments of the present disclosure in any combination of one or more programming languages, including object-oriented programming languages such as Java or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.

Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform the steps in the generation method of the gaze estimation model or the gaze estimation method according to various embodiments of the present disclosure described in the "exemplary methods" section above in this specification.

The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The foregoing describes the general principles of the present disclosure in conjunction with specific embodiments, however, it is noted that the advantages, effects, etc. mentioned in the present disclosure are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present disclosure. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the disclosure is not intended to be limited to the specific details so described.

In the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts in the embodiments are referred to each other. For the system embodiment, since it basically corresponds to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.

The block diagrams of devices, apparatuses, and systems referred to in this disclosure are given only as illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. These devices, apparatuses, and systems may be connected, arranged, or configured in any manner, as will be appreciated by those skilled in the art. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. The word "or" as used herein means, and is used interchangeably with, the word "and/or," unless the context clearly dictates otherwise. The word "such as" as used herein means, and is used interchangeably with, the phrase "such as but not limited to".

The methods and apparatus of the present disclosure may be implemented in a number of ways. For example, the methods and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.

It is also noted that in the devices, apparatuses, and methods of the present disclosure, each component or step can be decomposed and/or recombined. These decompositions and/or recombinations are to be considered equivalents of the present disclosure.

The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit embodiments of the disclosure to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.
