Method and system for automatic pupil detection


Note: This technology, "Method and system for automatic pupil detection" (用于自动瞳孔检测的方法和系统), was designed and created by 艾哈迈德·贾维什 and 伊德里斯·S·阿利姆 on 2020-07-27. Its main content includes: Systems and methods of detecting pupil position on a subject's head are described. When the camera and light source are positioned in front of the subject's head, the camera is caused to capture image data representing a frontal portion of the subject's head. A face of the subject is detected from the image data. A maximum of two eyes are identified from the detected face. The reflection of the light source on each of the identified eyes is detected. The pupil position of each identified eye is estimated based on the position of the detected light source reflection on the identified eye. One or more of the interpupillary distance, the left pupillary distance, and the right pupillary distance may be determined from the estimated pupil location.

1. A computer-implemented method of detecting pupil location on a subject's head, the method comprising:

causing a camera of an electronic device to capture image data representing a frontal portion of a head of the subject when the camera and a display screen of the electronic device are positioned in front of the head of the subject;

detecting a face of the subject from the image data;

identifying a maximum of two eyes from the detected face;

detecting a reflection of the display screen on each of the identified eyes; and

estimating a pupil position of each identified eye based on a position of the detected display screen reflection on the identified eye.

2. The method of claim 1, wherein detecting the reflection of the display screen on each identified eye comprises convolving an area of the detected face including the identified eye with a convolution kernel configured to identify a bright region having a screen reflection shape and surrounded by relatively dark pixels.

3. The method of claim 1, wherein detecting the reflection of the display screen on each identified eye comprises convolving an area of the detected face including the identified eye with a first convolution kernel and a second convolution kernel, wherein the first convolution kernel is configured to identify a bright region having a screen reflection shape and surrounded by relatively dark pixels, and wherein the second convolution kernel is configured to identify dark pixels surrounding a relatively bright region having a screen reflection shape.

4. The method of any of claims 1 to 3, further comprising, prior to causing the camera to capture the image data, presenting a gaze focus on the display screen.

5. The method of claim 4, further comprising, prior to causing the camera to capture the image data, directing a gaze of the subject to the gaze focus.

6. The method of claim 4, wherein estimating a pupil location of each identified eye comprises determining a point on a respective detected screen reflection corresponding to the gaze focus on the display screen.

7. The method of claim 6, wherein estimating a pupil location of each identified eye further comprises applying a lateral shift to the determined point on the respective detected screen reflection to compensate for an offset between an optical axis and a visual axis of the identified eye.

8. The method of any preceding claim, further comprising causing the display screen to illuminate a front portion of the subject's head while causing the camera to capture the image data.

9. The method of claim 8, further comprising, during causing the camera to capture at least a portion of the image data, determining whether a lighting condition of a frontal portion of the subject's head is indicative of a bright environment.

10. The method of claim 9, wherein determining whether the lighting condition of the frontal portion of the subject's head is indicative of a bright environment comprises:

for at least one identified eye, determining whether there are additional light source reflections on the at least one identified eye and comparing the average intensity of each of the additional light source reflections to the average intensity of the detected display screen reflection on the at least one identified eye; and

wherein the lighting condition of the frontal portion of the subject's head is indicative of a bright environment if there is at least one additional light source reflection on the at least one identified eye having an average intensity that substantially matches the average intensity of the detected display screen reflection on the at least one identified eye.

11. The method of claim 10, further comprising, if it is determined that the lighting condition of the frontal portion of the subject's head indicates a bright environment, causing the camera to re-capture image data representing the frontal portion of the subject's head under a different lighting condition.

12. The method of claim 11, further comprising prompting the subject to face away from an additional light source to provide the different lighting condition.

13. The method of any one of the preceding claims, further comprising determining a confidence level for the estimate of the pupil location of each identified eye based on a degree of displacement of the pupil location of the identified eye relative to a center location of the iris on the identified eye.

14. The method of any one of the preceding claims, further comprising determining at least one of an interpupillary distance, a left pupillary distance and a right pupillary distance from the estimated pupil location.

15. An electronic device for detecting a pupil location on a subject's head, the electronic device comprising:

a forward-facing camera;

a display screen;

at least one processor communicatively coupled to the forward-facing camera and the display screen; and

a memory to store a set of instructions that, as a result of execution by the at least one processor, cause the electronic device to perform the method of any of claims 1-14.

16. A non-transitory computer-readable storage medium having stored thereon computer-executable instructions that, as a result of being executed, cause at least one computer processor to perform the method of any one of claims 1-14.

Background

There is great interest in turning eyeglasses into wearable electronic devices. These wearable electronic devices come in various feature combinations, including eyewear with integrated cameras (e.g., Snap Inc. SPECTACLES), eyewear with integrated audio (e.g., BOSE FRAMES ALTO audio sunglasses), and eyewear with integrated displays (e.g., FOCALS by North Inc. and the VUZIX BLADE). These wearable electronic devices are commonly referred to collectively as smart glasses. However, the complexity of these devices may vary greatly depending on the features supported by the device. Smart glasses that provide a display may be referred to as wearable heads-up displays (WHUDs) to distinguish them from smart glasses that do not provide a display.

One type of WHUD is a virtual retinal display in which a projector renders a raster scan onto the eye of a user (i.e., the person wearing the WHUD). The WHUD includes a support frame that has the appearance of an eyeglass frame. The support frame accommodates two lenses. The projector is mounted in a temple of the support frame, and an optical combiner integrated in one of the lenses receives the light from the projector and redirects it to the eye. The optical combiner may be a free-space combiner, such as a holographic combiner, or a substrate-guided combiner, such as a waveguide or light guide combiner with input and output couplers. The WHUD has an eyebox that defines a range of eye positions at which the user can see the display content. The eyebox may be defined by a single exit pupil or multiple exit pupils, depending on the configuration of the optical combiner forming the exit pupil(s). Typically, the user will be able to see all of the display content when the user's eye is within the eyebox, or when the eye's pupil is aligned with at least one exit pupil. Conversely, when the user's eye is outside the eyebox, the user will not be able to see at least a portion of the display content.

As with conventional eyeglasses, the WHUD includes a support frame (or frames) that needs to be sized and adjusted to the head, and lenses that need to be mounted to the support frame, with or without a prescription. Unlike ordinary eyeglasses, WHUDs have additional requirements regarding the alignment of the eye with the eyebox and the placement of the optical combiner on the lens. The optical combiner must be positioned on the lens, and with respect to the support frame, such that the user can see the display content when the support frame is mounted on the user's head, and this placement must be customized for each user. One of the key measurements for sizing a WHUD, or any eyeglasses, is the interpupillary distance (IPD), also known as the pupillary distance (PD). The IPD is the distance between the centers of the pupils, typically in millimeters. Other measurements related to the IPD are the left pupillary distance (LPD) and the right pupillary distance (RPD). The LPD is the distance from the left pupil center to the center of the nose bridge, and the RPD is the distance from the right pupil center to the center of the nose bridge. If the eyes are perfectly symmetric about the center of the bridge of the nose and look straight forward or at infinity, the LPD should equal the RPD, and each should equal half the IPD. In prescription ordinary eyeglasses and prescription WHUDs, the IPD is used to determine where the optical center of the lens should be. Furthermore, in a WHUD the IPD (or the IPD together with the LPD or RPD) is used to determine the optimal placement of the optical combiner on the lens.

An eye care professional (e.g., an optometrist or ophthalmologist) may provide pupil position measurements such as the IPD as part of an eye exam and prescription information. However, requiring a potential user to see an eye care professional for an eye exam before a WHUD can be sized constitutes a significant obstacle to large-scale adoption of WHUDs. There are manual methods of measuring the IPD. One manual method is based on physically measuring the distance between the pupils using a ruler placed on the forehead. However, this manual method does not necessarily produce pupil position measurements accurate enough to size the WHUD (in order to project content into the eye through the eyebox).

Disclosure of Invention

A computer-implemented method of detecting a pupil location on a subject's head can be summarized as including: causing a camera to capture image data representing a frontal portion of the subject's head when the camera and light source are positioned in front of the subject's head; detecting a face of the subject from the image data; identifying a maximum of two eyes from the detected face; detecting a reflection of the light source on each of the identified eyes; and estimating a pupil position of each of the identified eyes based on the position of the detected light source reflections on the identified eyes.

In some cases, the method includes causing the camera to capture the image data while causing the light source to illuminate a frontal portion of the subject's head.

In some cases, the light source is a display screen of an electronic device, and detecting a reflection of the light source on each identified eye includes detecting a reflection of the display screen on each identified eye.

In some cases, detecting the reflection of the display screen on each identified eye includes convolving an area of the detected face including the identified eye with a convolution kernel configured to identify a bright region having a screen reflection shape and surrounded by relatively dark pixels.

In other cases, detecting a reflection of the display screen on each identified eye includes convolving an area of the detected face that includes the identified eye with a first convolution kernel and a second convolution kernel. The first convolution kernel is configured to identify bright regions having a screen reflection shape and surrounded by relatively dark pixels. The second convolution kernel is configured to identify dark pixels that surround a relatively bright area having a screen reflection shape.

In some cases, the method includes presenting a gaze focus on the display screen prior to causing the camera to capture the image data.

In some cases, the method includes directing the gaze of the subject to the gaze focus prior to causing the camera to capture the image data.

In some cases, estimating a pupil location of each identified eye includes determining a point on a respective detected screen reflection corresponding to the gaze focus on the display screen.

In some cases, estimating the pupil location of each identified eye further comprises applying a lateral shift to a determined point on the respective detected screen reflection to compensate for an offset between the optical axis and the visual axis of the identified eye.

In some cases, the method includes, during causing the camera to capture at least a portion of the image data, determining whether lighting conditions of a frontal portion of the subject's head indicate a bright environment.

In some cases, determining whether the lighting condition of the frontal portion of the subject's head indicates a bright environment comprises: determining, for at least one identified eye, whether there are additional light source reflections on the at least one identified eye, and comparing the average intensity of each of the additional light source reflections to the average intensity of the light source reflection detected on the at least one identified eye. The lighting condition of the frontal portion of the subject's head indicates a bright environment if there is at least one additional light source reflection on the at least one identified eye with an average intensity that substantially matches the average intensity of the light source reflection detected on the at least one identified eye.

In some cases, if it is determined that the lighting conditions of the frontal portion of the subject's head indicate a bright environment, the method includes causing the camera to re-capture image data representing the frontal portion of the subject's head under a different lighting condition.

In some cases, the method includes prompting the subject to face away from additional light sources to provide different lighting conditions.

In some cases, the method includes determining a confidence level of the estimate of the pupil location for each identified eye based on a degree of displacement of the pupil location of the identified eye relative to a center location of an iris on the identified eye.

In some cases, the method includes determining at least one of an interpupillary distance, a left pupillary distance, and a right pupillary distance from the estimated pupil location.

A system for detecting a pupil location on a subject's head can be summarized as including: a forward-facing camera; a display screen; at least one processor communicatively coupled to the forward-facing camera and the display screen; and a memory for storing a set of instructions that, as a result of execution by the at least one processor, cause the system to: capture image data representing a frontal portion of the subject's head using the forward-facing camera; detect a face of the subject in the image data; identify a maximum of two eyes from the detected face; detect a reflection of the display screen on each of the identified eyes; and estimate a pupil position of each identified eye based on the position of the detected screen reflection on the identified eye.

A non-transitory computer-readable storage medium having stored thereon computer-executable instructions that, as a result of being executed, cause at least one computer processor to: obtaining image data representing a frontal portion of a subject's head; detecting a face of the subject from the image data; identifying a maximum of two eyes from the detected face; detecting a reflection of a light source having a known shape on each of the identified eyes; and estimating a pupil position of each of the identified eyes based on the position of the detected light source reflection on the identified eyes.

The foregoing general description and the following detailed description are examples of the invention, and are intended to provide an overview or framework for understanding the nature of the invention as it is claimed. The accompanying drawings are included to provide a further understanding of the invention, and are incorporated in and constitute a part of this specification. The drawings illustrate various embodiments of the invention and together with the description serve to explain the principles and operations of the invention.

Drawings

In the drawings, like reference numbers indicate similar elements or acts. The sizes and relative positions of elements in the drawings are not necessarily drawn to scale. For example, the shapes of various elements and angles are not necessarily drawn to scale, and some elements are arbitrarily enlarged and positioned to improve drawing legibility. Further, the particular shapes of the elements as drawn are not intended to convey any information regarding the actual shape of the particular elements, and have been selected solely for ease of recognition in the drawings.

FIG. 1 is a front view of an exemplary electronic device that may be configured to automatically detect pupil locations on a user's head.

Fig. 2 is a flow chart illustrating a method of automatically detecting pupil position on the head.

Fig. 3A is a front view of the electronic device of fig. 1, illustrating an exemplary home screen of a UI (app UI) of the glasses sizing application.

Fig. 3B is a front view of the electronic device of fig. 1, showing an exemplary pupil detection screen of the app UI.

Fig. 3C is a side view of the electronic device tilted with respect to the vertical direction.

Fig. 3D and 3E are front views of the electronic device of fig. 1, showing different states of a processing screen of the app UI.

Fig. 4 is a schematic diagram of the face region and eye region of an image captured by a front camera.

FIG. 5A is an example of a convolution kernel for detecting screen reflections.

FIG. 5B is an image representation of the convolution kernel of FIG. 5A.

Fig. 6A to 6D are examples of convolution kernels for detecting screen reflection.

Fig. 7A and 7B show the proportional relationship between the display screen and the screen reflection.

Fig. 8A illustrates a lateral shift of the left pupil position to compensate for the offset between the optical axis of the left eye and the visual axis.

Fig. 8B illustrates a lateral shift of the right pupil position to compensate for the offset between the optical axis of the right eye and the visual axis.

Fig. 9A is a schematic diagram illustrating the alignment of the center of the iris of an eye with the position of the pupil detected on the eye.

Fig. 9B is a schematic diagram illustrating the offset between the center of the iris of an eye and the pupil location detected on the eye.

Detailed Description

Certain specific details are set forth in the following description in order to provide a thorough understanding of various disclosed embodiments. One skilled in the relevant art will recognize, however, that the embodiments may be practiced without one or more of the specific details, or with other methods, components, materials, and so forth. In other instances, well-known structures associated with portable electronic devices and head-mounted devices have not been shown or described in detail to avoid unnecessarily obscuring descriptions of the embodiments. For the sake of continuity and brevity, the same or similar reference numerals may be used for the same or similar objects in a plurality of drawings. For the sake of brevity, the term "corresponding to" may be used to describe a correspondence between features of different drawings. When a feature in a first drawing is described as corresponding to a feature in a second drawing, the feature in the first drawing is considered to have the characteristic of the feature in the second drawing, and vice versa, unless otherwise specified.

Throughout this disclosure, unless the context requires otherwise, the word "comprise" and variations such as "comprises" and "comprising" are to be interpreted in an open, inclusive sense, that is, as "including, but not limited to."

Reference in the present disclosure to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

In this disclosure, the singular forms "a," "an," and "the" include plural referents unless the content clearly dictates otherwise. It should also be noted that the term "or" is generally employed in its broadest sense, i.e., as a meaning of "and/or," unless the content clearly dictates otherwise.

The headings and abstract of the disclosure are provided herein for convenience only and do not interpret the scope or meaning of the embodiments.

Fig. 1 illustrates a Portable Electronic Device (PED) 100 that may be configured to automatically detect pupil locations on a subject's head according to the methods described herein. The term "pupil position" refers to the central position of the pupil of the eye. For a typical user with two eyes, there are two pupil positions, from which the IPD can be calculated or, if the fixation point is known, the exact LPD and RPD can be determined. Although the electronic device 100 is described as being portable, it should be understood that the method may be implemented in electronic devices that are not considered portable. For purposes of illustration, PED 100 is shown in fig. 1 as a smartphone, but the method of detecting pupil location may be implemented with other types of portable electronic devices (e.g., a tablet computer or a notebook computer). Furthermore, the smartphone shown in fig. 1 as PED 100 is simplified and does not show all the features that a smartphone may have. Generally, a PED (or electronic device in general) configured to detect pupil position is a system comprising a camera and a light source whose positions and geometry are fixed and known during use of the system. In one example, PED 100 includes, among other electronic components, a front-facing (FF) camera 104 (or "selfie" camera), a display screen 108 that acts as a light source, at least one processor 112 for executing computer-readable instructions, and a memory 116 for storing data and computer-readable instructions. The term "front-facing camera" or "FF camera" primarily indicates that the camera faces the user when the user is operating PED 100. The memory 116 includes one or more types of data storage, such as Random Access Memory (RAM), Read Only Memory (ROM), flash memory, solid state drives, and the like. PED 100 may have an electronic front-facing (FF) flash that may be selectively activated (or turned on) when an image is captured by FF camera 104. On a smartphone, the FF flash is typically implemented by displaying a white screen at maximum brightness on the display screen.

In one embodiment, to allow PED 100 to be used to detect pupil positions on a subject user's head, a glasses sizing application is stored in memory 116 of electronic device 100. For example, at the request of the subject user, the processor 112 may obtain the glasses sizing application from an application store and store it in the memory 116. The user may activate the glasses sizing application using any suitable method for launching an application on the electronic device (e.g., by selecting an icon in a menu presented on the display 108 or by voice command). In embodiments herein, the glasses sizing application generally comprises computer-readable instructions for capturing an image of a scene in front of FF camera 104 and extracting pupil location information from the captured image. FF camera 104 is positioned in front of the subject user's head, so the scene captured by the FF camera includes a frontal portion of the head. The frontal portion of the head is the portion of the head that includes the face. Any suitable method of positioning FF camera 104 in front of the user's head may be used, such as the user holding PED 100 in a selfie position, or supporting PED 100 on a tripod, selfie stick, or the like positioned in front of the user.

FF cameras, such as those built into smartphones, typically do not capture high-resolution images. Therefore, it is very difficult, and sometimes even impossible, to extract pupil data directly from FF camera images. Herein, the pupil position is detected using a method that does not rely on extracting pupil data from FF camera images. It has been found that if the FF camera captures an image of the frontal portion of the head with the flash on, there will be a reflection of the display screen in each pupil of the eyes. These screen reflections can be detected and used as an accurate estimate of the pupil center, especially if the gaze direction of the subject user is known. To ensure accurate detection of screen reflections, the method includes evaluating the lighting conditions at the time the FF camera image is captured. If the lighting conditions could cause ambiguity in detecting screen reflections, the user is prompted to change the environment of the FF camera so that a better FF camera image can be captured.

Fig. 2 is a flowchart illustrating a method of detecting pupil positions on the head of a subject user. The method illustrated in fig. 2 and described below can be provided as a set of instructions (or a glasses sizing application) stored in a memory of an electronic device (e.g., PED 100 in fig. 1) and executable by at least one processor of the electronic device. At step 200, a processor of the electronic device (e.g., processor 112 in fig. 1) causes a User Interface (UI) of the glasses sizing application (hereinafter, app UI) to be presented on a display screen of the electronic device (e.g., display screen 108 in fig. 1). The app UI may be presented in response to a user accessing the glasses sizing application on the electronic device, for example, by selecting an icon representing the glasses sizing application or by voice command. Fig. 3A shows an example home screen of app UI 300 on display screen 108 of PED 100. The details of the home screen are design elements and may be different from what is shown in fig. 3A. For purposes of illustration, the home screen of app UI 300 may include a set of instructions or information 304 and a button 308 related to the use of the glasses sizing application. In one embodiment, button 308 is configured to initiate a glasses sizing operation (or WHUD sizing operation). After reading the instructions 304, the user may select button 308 to begin the glasses sizing operation.

Returning to fig. 2, at 204, the processor receives a request to begin a glasses sizing operation. In response to the request to begin the glasses sizing operation, at 208, the processor presents a pupil detection screen of the app UI on the display screen. The processor may further turn on the FF camera (e.g., camera 104 in fig. 1). For purposes of illustration, fig. 3B shows an exemplary pupil detection screen of app UI 300 on display screen 108. The details of the pupil detection screen are design elements and may be different from what is shown in fig. 3B. For purposes of illustration, the pupil detection screen includes a capture window 312, a top bar 314 above the capture window 312, and a bottom bar 316 below the capture window 312. Because the processor has turned on the FF camera 104, the scene in front of the FF camera 104 (in this case, the user operating PED 100) is displayed within the capture window 312. The bottom bar 316 includes a button 320. In one embodiment, button 320 is configured to initiate a pupil detection process. The user may select button 320 to begin the pupil detection process. When the pupil detection process is running, feedback regarding the process may be displayed in the bottom bar 316 and/or the top bar 314.

Returning to fig. 2, at 212, the processor receives a request to begin a pupil detection process. In response to receiving the request to begin the pupil detection process, the processor may perform some preliminary checks. For example, the upper eyelid typically obscures a portion of the eye. Generally, the lower the position at which the eye is fixated, the larger the portion that is obscured by the upper eyelid. Therefore, it is preferable that, when capturing an image of the user's face for pupil detection purposes, the eyes are not looking down to such an extent that the upper eyelid substantially covers the pupil. Fig. 3C shows an example of PED 100 (and thus the camera 104 included in PED 100) in a tilted position with respect to the vertical direction (the direction of gravity). The angle between PED 100 and the vertical is denoted by β and may be referred to hereinafter as the tilt angle. A tilt angle threshold may be predetermined; beyond this threshold, the gaze of the eyes may be too low. As a non-limiting example, the tilt angle threshold may be 20 degrees. Returning to fig. 2, in one embodiment, at 216, the processor determines the tilt angle of the electronic device (e.g., PED 100) relative to the vertical direction. The processor may obtain the measurements required to determine the tilt angle of the electronic device from a sensor unit in the electronic device, such as an inertial measurement unit (IMU). The processor determines whether the tilt angle of the electronic device (i.e., the angle between the electronic device and the vertical direction) exceeds the tilt angle threshold. If the tilt angle exceeds the tilt angle threshold, the processor may prompt the user to lift the device higher (e.g., by displaying an appropriate message on the app UI), causing the tilt angle to decrease. The processor may repeatedly determine the tilt angle of the electronic device relative to the vertical and check whether the tilt angle is outside of an acceptable range until the user has properly adjusted the orientation of the device.
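For illustration only, the following sketch shows one way the tilt check at 216 could be implemented. The helper hooks read_gravity and prompt_user, the gravity-vector convention, and the 20-degree threshold value are assumptions, not part of this disclosure.

```python
import math
import time

TILT_THRESHOLD_DEG = 20.0  # illustrative; the text gives 20 degrees only as a non-limiting example


def tilt_angle_deg(gx, gy, gz):
    """Tilt of the device relative to vertical, from an IMU gravity vector.

    gz is assumed to be the gravity component along the screen normal, so the
    angle is 0 when the screen is held perfectly vertical and 90 when the
    device lies flat.
    """
    g = math.sqrt(gx * gx + gy * gy + gz * gz)
    return math.degrees(math.asin(min(1.0, abs(gz) / g)))


def wait_until_upright(read_gravity, prompt_user, poll_s=0.2):
    """Repeatedly check the tilt angle and prompt the user until it is acceptable.

    read_gravity() -> (gx, gy, gz) and prompt_user(msg) are hypothetical hooks
    into the device's sensor unit (IMU) and the app UI.
    """
    while True:
        beta = tilt_angle_deg(*read_gravity())
        if beta <= TILT_THRESHOLD_DEG:
            return beta
        prompt_user("Please lift the device higher.")
        time.sleep(poll_s)
```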

At 220, the processor may present a gaze focus on the pupil detection screen of the app UI. When the FF camera captures an image of the frontal portion of the user's head, it is desirable for the user to focus on the gaze focus. The gaze focus may be any UI element, such as a textual element or a graphical element. Further, the UI element may be animated or have features that attract the attention of the user. In some cases, the user image displayed in the capture window of the app UI may serve as the gaze focus. For purposes of illustration, fig. 3D shows an example gaze focus 332 in the top bar 314. Gaze focus 332 is shown as a timer. However, as previously described, any suitable UI element that can provide a point of focus may be used as the gaze focus (e.g., simple text such as "see here" or an animation that draws the user's attention). A cue (such as cue 334) may be displayed in the bottom bar 316 to direct the user's gaze to the gaze focus 332.

Returning to fig. 2, at 224, the processor causes the FF camera to capture one or more image frames (or, simply, images) with the FF flash turned on; these FF camera image frames may be referred to as flash-on (FON) images. Turning on the flash is equivalent to illuminating the frontal portion of the user's head captured by the FF camera. Act 224 may be performed while the user is looking at the gaze focus. After capturing the FON image(s) at 224, the processor may turn off the FF camera (and turn off the FF flash) and present the processing screen of the app UI, as shown at 228. Fig. 3E shows an example of the processing screen of app UI 300 after FF camera 104 is turned off. In this case, the gaze focus is removed, as it is no longer needed. Capture window 312 is empty because FF camera 104 has been turned off. A message 336 may be displayed in the bottom bar 316 to indicate that the application is processing the captured image.

Returning to fig. 2, at 232, the processor selects at least one FON image and detects the screen reflection on each eye region in the selected FON image (it is not necessary to detect the screen reflections on the left and right eye regions from the same FON image, although it may be convenient to do so). Referring to fig. 4, in one embodiment, detecting the screen reflections may include detecting a face region 400 from the selected FON image, for example, using a face recognition library (such as DLIB). A region of interest (ROI) 404 including the left eye and an ROI 408 including the right eye are detected from the face region 400 using the face recognition library. Then, a left screen reflection 412 in the ROI 404 and a right screen reflection 416 in the ROI 408 are detected. In one embodiment, the screen reflections 412, 416 are detected using a convolution-based approach.
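As one illustrative sketch (not part of the original disclosure), the face region and eye ROIs could be obtained with DLIB's frontal face detector and 68-point landmark model as follows; the model file name and the padding value are assumptions.

```python
import cv2
import dlib

# dlib's frontal face detector and 68-point landmark model; the model file name
# is an assumption used for illustration.
DETECTOR = dlib.get_frontal_face_detector()
PREDICTOR = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")


def eye_rois(fon_image_bgr, pad=10):
    """Return (left_eye_roi, right_eye_roi) cropped from the first detected face.

    In the 68-point scheme, landmarks 36-41 outline the subject's right eye and
    42-47 the subject's left eye.
    """
    gray = cv2.cvtColor(fon_image_bgr, cv2.COLOR_BGR2GRAY)
    faces = DETECTOR(gray)
    if len(faces) == 0:
        return None

    shape = PREDICTOR(gray, faces[0])

    def crop(idx_range):
        xs = [shape.part(i).x for i in idx_range]
        ys = [shape.part(i).y for i in idx_range]
        x0, y0 = max(min(xs) - pad, 0), max(min(ys) - pad, 0)
        x1, y1 = max(xs) + pad, max(ys) + pad
        return fon_image_bgr[y0:y1, x0:x1]

    return crop(range(42, 48)), crop(range(36, 42))
```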

In one example of the convolution-based method, a convolution kernel (or filter) for detecting a screen reflection (hereinafter referred to as a screen reflection kernel) is constructed. In one embodiment, the screen reflection kernel is a matrix that favors a bright area having a screen reflection shape and surrounded by dark pixels. The screen reflection shape is determined by the shape of the display screen (or known light source) reflected onto the eye. In one non-limiting example, the screen reflection shape is a rectangle. Fig. 5A shows an example of a screen reflection kernel 420. However, this example is not intended to be limiting, as weights other than those shown in fig. 5A may be used in the matrix, and the size of the matrix may be different from that shown in fig. 5A. The inner area of the matrix, i.e., the area filled with 1's, is usually slightly smaller than the size of the screen reflection to be detected. Fig. 5B shows an image representation of the screen reflection kernel 420, where 1 is rendered as white and -1 as black. The screen reflection kernel (e.g., kernel 420 in fig. 5A) is convolved with the ROI 404 (in fig. 4) to detect a bright region having a screen reflection shape (e.g., a rectangular shape), which corresponds to at least a portion of the left screen reflection 412. Similarly, the screen reflection kernel (e.g., kernel 420 in fig. 5A) is convolved with the ROI 408 (in fig. 4) to detect a bright region having a screen reflection shape (e.g., a rectangular shape), which corresponds to at least a portion of the right screen reflection 416. Each window of the convolution operation can be normalized by the pixels within the window; this ensures that windows with a large difference between the bright inner region and the dark outer region end up with a higher response after convolution. The neighborhood of the bright region corresponding to the screen reflection found by the convolution process may be searched to determine the boundaries of the detected screen reflection.
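A minimal sketch of this single-kernel approach is given below, assuming OpenCV and NumPy; the kernel dimensions and the per-window normalization by pixel energy are illustrative interpretations of the text rather than the exact implementation.

```python
import cv2
import numpy as np


def screen_reflection_kernel(inner_h=6, inner_w=10, border=4):
    """Kernel favoring a bright rectangle (the screen reflection shape) surrounded
    by dark pixels: +1 inside, -1 outside. Sizes are illustrative; the inner
    region should be slightly smaller than the expected reflection."""
    k = -np.ones((inner_h + 2 * border, inner_w + 2 * border), dtype=np.float32)
    k[border:border + inner_h, border:border + inner_w] = 1.0
    return k


def detect_screen_reflection(eye_roi_gray, kernel):
    """Correlate the eye ROI with the kernel, normalizing each window by its pixel
    energy so that windows with a strong bright-inside / dark-outside contrast
    score highest. Returns the (x, y) of the best response."""
    img = eye_roi_gray.astype(np.float32)
    response = cv2.filter2D(img, -1, kernel)
    # Per-window normalization: divide by the L2 norm of the pixels under the kernel.
    window_energy = cv2.filter2D(img * img, -1, np.ones_like(kernel))
    response /= np.sqrt(window_energy) + 1e-6
    _, _, _, max_loc = cv2.minMaxLoc(response)
    return max_loc  # approximate center of the detected reflection
```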

In another convolution method, two screen reflection kernels are constructed: a first kernel to find the bright area corresponding to the screen reflection, and a second kernel to find the dark border around the screen reflection. The first kernel is similar to the screen reflection kernel described above (e.g., kernel 420 in fig. 5A) in that it favors a bright region having a screen reflection shape and surrounded by dark pixels, except that the outer region of the first kernel is filled with 0's instead of -1's, so that the convolution response does not take the dark outer region into account. Fig. 6A shows a matrix representation of an example first kernel 424. Fig. 6B shows an image representation of the first kernel 424, where 1 is rendered as white and 0 as black. The second kernel favors dark pixels surrounding a bright area having the screen reflection shape. Fig. 6C shows a matrix representation of an example second kernel 428. Fig. 6D shows an image representation of the second kernel 428, where 1 is rendered as white and 0 as black. The example kernels shown in figs. 6A to 6D detect screen reflections having a rectangular shape. However, kernels that detect screen reflections having other shapes may be constructed in a similar manner. To detect the left screen reflection, each of the first and second kernels (e.g., kernel 424 in fig. 6A and kernel 428 in fig. 6C) is convolved with the left ROI 404 (in fig. 4). The result of the convolution using the second kernel is divided by the result of the convolution using the first kernel to obtain a final output image including the detected left screen reflection. Similarly, to detect the right screen reflection, each of the first and second kernels is convolved with the right ROI 408 (in fig. 4), and the result of the convolution using the second kernel is divided by the result of the convolution using the first kernel to obtain a final output image including the detected right screen reflection. This method avoids the normalization step of the previous convolution method and speeds up the detection of screen reflections.
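A corresponding sketch of the two-kernel variant follows, again assuming OpenCV and NumPy. The kernel sizes are illustrative, and the reading that the detected reflection corresponds to a low value of the ratio image is an assumption.

```python
import cv2
import numpy as np


def ratio_kernels(inner_h=6, inner_w=10, border=4):
    """First kernel: 1's inside the screen-reflection shape, 0's outside.
    Second kernel: 0's inside, 1's on the surrounding border."""
    h, w = inner_h + 2 * border, inner_w + 2 * border
    first = np.zeros((h, w), dtype=np.float32)
    first[border:border + inner_h, border:border + inner_w] = 1.0
    second = 1.0 - first
    return first, second


def detect_screen_reflection_ratio(eye_roi_gray, first, second):
    """Divide the second-kernel response by the first-kernel response, as the text
    describes, and take the location where the interior is bright relative to its
    border (a low value of the ratio image, under this reading)."""
    img = eye_roi_gray.astype(np.float32) + 1e-6
    bright = cv2.filter2D(img, -1, first)   # interior response
    dark = cv2.filter2D(img, -1, second)    # border response
    output = dark / bright
    _, _, min_loc, _ = cv2.minMaxLoc(output)
    return min_loc
```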

If there are multiple reflections on an eye ROI (e.g., 404 or 408 in fig. 4) from additional light sources (i.e., light sources other than the known light source included in the PED), the reflection detected by the convolution process is not necessarily the one corresponding to the screen reflection inside the pupil. Therefore, it is useful to check whether there are multiple reflections on the eye ROI that may interfere with pupil position detection. Returning to fig. 2, at 236, the processor determines whether the lighting condition of the frontal portion of the user's head when the selected FON image was captured indicates a bright environment, which may be the case if the user was facing one or more additional light sources other than the known light source (e.g., a window, artificial light, or a bright reflection off a wall or surface in front of the user) during the capture of the FON image. To make this determination, all reflections inside each of the eye ROI 404 (in fig. 4) and the eye ROI 408 (in fig. 4) are detected. In one embodiment, all reflections R within each eye ROI are detected using an adaptive threshold T calculated based on the average intensity Ir of the screen reflection (which may be the average intensity of the screen reflection found by the convolution process described above). If dist(ri, (cx, cy)) < D and area(ri) < A, then the reflection ri ∈ R is considered an indication of a bright environment, where dist is the Euclidean distance, area is the area function, and (cx, cy) is the center of the detected screen reflection. If one or more of the reflections satisfy the above condition, then the lighting condition of the frontal portion of the user's head when the selected FON image was captured indicates a bright environment.
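The following sketch illustrates one possible form of this bright-environment check. The distance bound D (dist_max), area bound A (area_max), and the way the adaptive threshold T is derived from the screen reflection's mean intensity Ir are assumptions made for illustration.

```python
import cv2
import numpy as np


def bright_environment(eye_roi_gray, screen_refl_center, screen_refl_mask,
                       dist_max=25.0, area_max=150, intensity_ratio=0.8):
    """Check whether extra reflections on the eye ROI indicate a bright environment.

    A reflection r_i counts when dist(r_i, (cx, cy)) < D, area(r_i) < A, and its
    mean intensity roughly matches that of the detected screen reflection.
    D (dist_max), A (area_max), and the matching ratio are illustrative values.
    """
    img = eye_roi_gray.astype(np.float32)
    cx, cy = screen_refl_center
    screen_mean = img[screen_refl_mask > 0].mean()   # Ir
    T = intensity_ratio * screen_mean                # adaptive threshold
    _, binary = cv2.threshold(eye_roi_gray, int(T), 255, cv2.THRESH_BINARY)
    binary[screen_refl_mask > 0] = 0                 # ignore the screen reflection itself
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(binary)
    for i in range(1, n):                            # label 0 is the background
        area = stats[i, cv2.CC_STAT_AREA]
        dx, dy = centroids[i][0] - cx, centroids[i][1] - cy
        close = (dx * dx + dy * dy) ** 0.5 < dist_max
        matches = img[labels == i].mean() >= T
        if close and area < area_max and matches:
            return True
    return False
```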

If the processor concludes that the selected FON image was captured in a bright environment, then at 240 the processor discards the FON images captured at 224. The processor may notify the user that pupil detection failed because the user was facing an additional light source, and may prompt the user to face away from any additional light sources. The processor may stop the glasses sizing operation and return to step 200. Once the user is facing away from the additional light source (i.e., any light source other than the known light source included in the PED), the user may attempt to run the glasses sizing operation again.

If the processor concludes that the selected FON image was not captured in a bright environment, then at 244 the processor estimates the pupil positions from the screen reflections detected at 232. Depending on the error tolerance, any point on the detected left screen reflection may be taken as the left pupil position (or left pupil center), and any point on the detected right screen reflection may be taken as the right pupil position (or right pupil center). If a more accurate pupil position is desired, the left pupil position may be determined from the proportional relationship between the left screen reflection and the display screen, and the right pupil position may be determined from the proportional relationship between the right screen reflection and the display screen. For example, fig. 7A shows a display screen 432 having a height Sy and a width Sx. The gaze focus G is shown on the display screen 432 (this is where the user was focusing when the FON image of the user was captured). Gaze focus G is displaced vertically by Sdy from the upper edge of display screen 432 and horizontally by Sdx from the left edge of the display screen. Fig. 7B shows a screen reflection 436 (which may be the left screen reflection or the right screen reflection) having a height Ry and a width Rx. The pupil center (or pupil position) P to be determined is displaced vertically by Rdy from the upper edge of the screen reflection and horizontally by Rdx from the left edge of the screen reflection. Since Rx, Ry, Sx, Sy, Sdx, and Sdy are known, Rdx and Rdy can be determined by the following equations (1) and (2):

Rdx = Rx × (Sdx / Sx)   (1)

Rdy = Ry × (Sdy / Sy)   (2)

the left and right pupil positions obtained by the above expressions may be adjusted to compensate for the offset between the optical axis of the eye and the visual axis. In one example, this includes applying a right offset (right horizontal shift) to the left pupil position and applying a left offset (left horizontal shift) to the right pupil position. In one non-limiting example, a right offset approximately equal to the horizontal displacement of the right edge of the baseline left pupil position from the left screen reflection provides suitable compensation for the offset between the optical axis of the left eye and the visual axis. Similarly, in one non-limiting example, a left offset approximately equal to the horizontal displacement of the left edge of the baseline right pupil position from the left screen reflection provides suitable compensation for the offset between the optical axis of the right eye and the visual axis. The baseline left and right pupil positions may be determined according to equations (1) and (2). For purposes of illustration, fig. 8A shows a baseline left pupil position PL on the left screen reflex 412. The horizontal displacement of the baseline pupil position PL from the right edge of the screen reflection 412 is represented by Δ xl. According to one example, if the coordinates of the baseline pupil position PL are (x1, y1), the left pupil position PL' that compensates for the shift between the optical axis of the eye and the visual axis may be (x1+ Δ xl, y). However, a different right shift than shown in FIG. 8B may be applied to PL (e.g., one less than Δ xl or greater than Δ xl). Fig. 8B shows the baseline right pupil position PR on the right screen reflection 416 shifted to the left of PR by an offset Δ xr. If the coordinates of the baseline pupil position PR are (x2, y2), the right pupil position PR' that compensates for the offset between the optical axis of the eye and the visual axis may be (x 2- Δ xr, y 2). As described above, for the left pupil position, a different left shift (e.g., one shift less than Δ xr or greater than Δ xr) may be applied to PR than shown in fig. 8B.

Returning to fig. 2, at 248, the processor may determine a confidence level for the screen reflection detection and the resulting estimated pupil positions. In one example, determining the confidence level of the screen reflection detection includes detecting the left iris and the right iris from the selected FON image (or selected FON images). The left and right irises can be found by a convolution process using an iris kernel having a generally circular shape approximating the shape of the iris. The iris is surrounded by the sclera. Thus, the convolution process can detect a non-white region surrounded by white pixels. Next, the center of each iris is determined. For each eye, the confidence is calculated as a Gaussian-weighted distance of the detected pupil position from the center of the iris of the eye. The highest confidence value of 1 occurs when the iris center and the pupil position of the eye are aligned. Fig. 9A illustrates a case where the detected pupil position 450 is aligned with the center of the iris 454. In this case, the confidence level is 1. Fig. 9B illustrates a case where the detected pupil position 450' is not aligned with the center of the iris 454. The further the pupil position is from the center of the iris, the lower the confidence in the screen reflection detection. In fig. 9A and 9B, the Gaussian-weighted distance is calculated in the horizontal direction (along the x-axis). However, the Gaussian-weighted distance can be calculated in both the horizontal direction (along the x-axis) and the vertical direction (along the y-axis) to obtain a more accurate measurement. In some cases, the horizontally and vertically calculated Gaussian-weighted distances may be multiplied by each other to obtain a final confidence value. The pupil position determined at 244 for each eye will have an associated confidence measure (or a set of associated confidence measures if the Gaussian-weighted distances are calculated in the horizontal and vertical directions and are not combined). As previously described, the detected pupil position compared to the iris center in the confidence calculation may be the baseline pupil position or the pupil position adjusted to compensate for the offset between the optical axis and the visual axis of the eye.
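A minimal sketch of the Gaussian-weighted confidence measure follows; the Gaussian widths are assumptions, since the disclosure does not specify them.

```python
import math


def reflection_confidence(pupil_xy, iris_center_xy, sigma_x=8.0, sigma_y=8.0,
                          use_vertical=True):
    """Gaussian-weighted confidence for a detected pupil position.

    Returns 1.0 when the pupil position coincides with the iris center and falls
    off with displacement. The sigmas (in pixels) are assumptions; the disclosure
    does not specify the Gaussian width.
    """
    dx = pupil_xy[0] - iris_center_xy[0]
    conf = math.exp(-(dx * dx) / (2.0 * sigma_x * sigma_x))
    if use_vertical:
        dy = pupil_xy[1] - iris_center_xy[1]
        conf *= math.exp(-(dy * dy) / (2.0 * sigma_y * sigma_y))
    return conf
```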

Returning to fig. 2, the processor may calculate the IPD from the determined pupil positions. The IPD is simply the distance between the left and right pupil positions. The processor may also calculate the LPD and RPD: assuming the user's gaze point is known, the simulated pupils may be rotated to estimate a gaze toward infinity, after which the center of the nose bridge and the two pupil measurements may be used to calculate the LPD and RPD, respectively. The pupil detection data (e.g., pupil positions, the confidence level of the screen reflection detection, and any calculated IPD, LPD, and RPD) may be stored in a memory of the electronic device (e.g., memory 116 in fig. 1) and/or transmitted to a remote location/server for glasses or WHUD sizing or other operations requiring information about the pupil positions on the subject's head.
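As an illustrative sketch, the IPD can be computed as the distance between the two estimated pupil positions; the pixel-to-millimeter scale is an assumption here, since this section does not describe how it is obtained.

```python
import math


def interpupillary_distance(left_pupil_xy, right_pupil_xy, mm_per_pixel=None):
    """IPD as the distance between the two estimated pupil positions.

    The pixel-to-millimeter scale is assumed to come from elsewhere (for example,
    a reference object or depth data); if it is not supplied, the distance is
    returned in pixels.
    """
    d = math.hypot(right_pupil_xy[0] - left_pupil_xy[0],
                   right_pupil_xy[1] - left_pupil_xy[1])
    return d * mm_per_pixel if mm_per_pixel else d
```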

The foregoing description of illustrated embodiments and implementations, including what is described in the abstract of the invention, is not intended to be exhaustive or to limit the embodiments and implementations to the precise forms disclosed. While specific embodiments of, and examples for, the invention are described herein for illustrative purposes, various equivalent modifications are possible without departing from the spirit and scope of the present invention, as those skilled in the relevant art will recognize.
