Apparatus for machine learning based visual device selection

Document No.: 1942816 · Published: 2021-12-07

Reading note: This technique, "Apparatus for machine learning based visual device selection," was designed and created by J·安多科 and E·内特 on 2020-04-09. Abstract: The present disclosure relates to determining a visual device for a patient or user. In an embodiment, a machine learning based approach considers a user's face in the context of a database of labeled faces and visual devices, each of the labeled images reflecting the aesthetic value of a proposed visual device relative to the patient's or user's face.

1. An apparatus for determining a suitable visual device, the apparatus comprising:

processing circuitry configured to

receiving at least one input, the at least one input comprising an image of a person's face,

applying a neural network to the at least one input, the neural network generating at least one suitability metric for the at least one input, and

determining the suitable visual device based on the at least one suitability metric generated by the neural network,

wherein the at least one suitability metric corresponds to a correlation synchronization between the person's face and a visual device.

2. The apparatus of claim 1, wherein the at least one input comprises a visual device image.

3. The apparatus of claim 1, wherein the at least one input comprises an image of a face of the person, wherein the person is wearing a visual device.

4. The apparatus of claim 1, wherein the at least one input is a processed at least one input comprising morphological features determined from an image of the person's face.

5. The apparatus of claim 2, wherein the at least one input is a processed at least one input comprising a visual device attribute determined from the visual device image.

6. The apparatus of claim 1, wherein the at least one input comprises a visual measurement of the person, the visual measurement indicative of visual acuity of the person.

7. The apparatus of claim 1, wherein the processing circuitry is further configured to

training the neural network on a training database,

wherein the training database comprises a training image corpus comprising facial images of a person and visual device images, each combination of an image in the facial images of the person and an image in the visual device images being associated in the training database with at least one training suitability metric assigned by a marker group.

8. The apparatus of claim 7, wherein the corpus of training images comprises images of a person wearing a visual device, each of the images of the person wearing a visual device being associated in the training database with at least one training suitability metric assigned by the set of markers.

9. The apparatus of claim 1, wherein the neural network comprises implicit inputs, the implicit inputs being a predefined set of visual devices, the at least one suitability metric generated by the neural network being at least one matching score of the at least one input to each of the predefined set of visual devices.

10. The apparatus of claim 9, wherein to determine the suitable visual device, the processing circuitry is further configured to

selecting a largest at least one matching score, the largest at least one matching score corresponding to the visual device of the predefined set of visual devices that best matches the face of the person in the at least one input.

11. The apparatus of claim 10, wherein the maximum at least one matching score is selected from a vector comprising the at least one matching score, each of the at least one matching scores in the vector corresponding to one visual device in the predefined set of visual devices, the at least one matching score being based on a percentage of markers in the marker group that assigned a same value of the at least one matching score.

12. The apparatus of claim 11, wherein to determine the suitable visual device, the processing circuitry is further configured to

calculating coordinates corresponding to the at least one input,

calculating a center of gravity of a cluster associated with each visual device of the predefined set of visual devices,

calculating a distance between the coordinates and each center of gravity of the clusters, the distances being ordered in a vector, and

selecting the cluster of the clusters that minimizes the distance between the coordinates and each center of gravity of the clusters.

13. The apparatus of claim 12, wherein the cluster associated with each visual device of the predefined set of visual devices comprises matching coordinates corresponding to at least one training input that maximizes at least one training matching score during training of the neural network, the at least one training input comprising morphological features of the person's face.

14. A method for determining a suitable visual device, the method comprising:

receiving, by processing circuitry, at least one input comprising an image of a person's face;

applying, by the processing circuitry, a neural network to the at least one input, the neural network generating at least one suitability metric for the at least one input; and

determining, by the processing circuitry, the suitable visual device based on the at least one suitability metric generated by the neural network,

wherein the at least one suitability metric corresponds to a correlation synchronization between the person's face and a visual device.

15. A non-transitory computer-readable storage medium storing computer-readable instructions that, when executed by a computer, cause the computer to perform a method for determining a suitable visual device, the method comprising:

receiving at least one input, the at least one input comprising an image of a person's face;

applying a neural network to the at least one input, the neural network generating at least one suitability metric for the at least one input; and

determining the suitable visual device based on the at least one suitability metric generated by the neural network,

wherein the at least one suitability metric corresponds to a correlation synchronization between the person's face and a visual device.

Technical Field

The present disclosure relates to eyewear, and in particular to the matching of a vision device to a patient's face.

Background

During the process of selecting new ocular equipment or eyewear, patients often must rely on self-examination to judge the aesthetics of the new eyewear on their face. At the same time, the patient may be torn between their own opinion of how the new eyewear looks on their face and the imagined opinion of a third party (e.g., a friend, family member, or professional) as to whether the new eyewear suits their face. Given both the aesthetic appeal and the necessity of eyewear for proper vision, the task of eyewear selection can be burdensome, with no effective way to confidently purchase eyewear that the user, the user's doctor, and the user's friends will all be satisfied with. The present disclosure provides a solution to this problem.

The foregoing "background" description is intended to generally introduce the context of the disclosure. Work of the presently named inventors, to the extent it is described in this background section, as well as aspects of the description that may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present disclosure.

Disclosure of Invention

The present disclosure relates to an apparatus, method, and computer-readable storage medium for determining a suitable vision device.

According to an embodiment, the present disclosure further relates to an apparatus for determining a suitable visual device, the apparatus comprising processing circuitry configured to: receiving at least one input, the at least one input comprising an image of a person's face; applying a neural network to the at least one input, the neural network generating at least one suitability metric for the at least one input; and determining the suitable visual device based on the at least one suitability metric generated by the neural network, wherein the at least one suitability metric corresponds to a correlation synchronization between the person's face and a visual device.

According to an embodiment, the present disclosure further relates to a method for determining a suitable visual device, the method comprising: receiving, by processing circuitry, at least one input comprising an image of a person's face; applying, by the processing circuitry, a neural network to the at least one input, the neural network generating at least one suitability metric for the at least one input; and determining, by the processing circuitry, the suitable visual device based on the at least one suitability metric generated by the neural network, wherein the neural network includes implicit inputs, and wherein the at least one suitability metric corresponds to a correlation synchronization between the person's face and a visual device.

The preceding paragraphs have been provided as a general introduction and are not intended to limit the scope of the claims below. The described embodiments, together with further advantages, will be best understood by reference to the following detailed description taken in conjunction with the accompanying drawings.

Drawings

A more complete appreciation of the present disclosure and many of the attendant advantages thereof will be readily obtained as the same becomes better understood by reference to the following detailed description when considered in connection with the accompanying drawings, wherein:

FIG. 1 is a flow diagram of an implementation of a machine learning based vision device selection tool in accordance with an exemplary embodiment of the present disclosure;

FIG. 2A is a schematic diagram of one aspect of an image input process according to an exemplary embodiment of the present disclosure;

FIG. 2B is a schematic diagram of one aspect of an image preparation process in accordance with an exemplary embodiment of the present disclosure;

FIG. 3 is a schematic diagram of one aspect of an image preparation process in accordance with an exemplary embodiment of the present disclosure;

FIG. 4 is an aspect of a flow chart of a training process for a neural network of a visual device selection tool in accordance with an exemplary embodiment of the present disclosure;

FIG. 5A is a schematic illustration of an input to a marking process according to an exemplary embodiment of the present disclosure;

FIG. 5B is a schematic diagram of a marking process according to an exemplary embodiment of the present disclosure;

FIG. 6A is a diagram of inputs to a training process for a machine learning based visual device selection tool according to an exemplary embodiment of the present disclosure;

FIG. 6B is a diagram of inputs to a training process of a machine learning based visual device selection tool according to an exemplary embodiment of the present disclosure;

FIG. 6C is a diagram of inputs to a training process of a machine learning vision device selection tool in accordance with an exemplary embodiment of the present disclosure;

FIG. 6D is a diagram of inputs to a training process of a machine learning vision device selection tool in accordance with an exemplary embodiment of the present disclosure;

FIG. 6E is a diagram of inputs to a training process of a machine learning vision device selection tool in accordance with an exemplary embodiment of the present disclosure;

FIG. 7A is a schematic diagram illustrating tagging of inputs in accordance with an exemplary embodiment of the present disclosure;

FIG. 7B is a schematic diagram illustrating tagging of inputs in accordance with an exemplary embodiment of the present disclosure;

FIG. 7C is a schematic diagram illustrating tagging of inputs in accordance with an exemplary embodiment of the present disclosure;

FIG. 7D is a schematic diagram illustrating tagging of inputs in accordance with an exemplary embodiment of the present disclosure;

FIG. 8A is a diagram illustrating multiple marker sets according to an exemplary embodiment of the present disclosure;

FIG. 8B is a diagram illustrating tagging of an input with multiple tagging groups, according to an exemplary embodiment of the present disclosure;

FIG. 8C is a diagram illustrating tagging of an input with multiple tagging groups, according to an exemplary embodiment of the present disclosure;

FIG. 8D is a diagram illustrating tagging of an input with multiple tagging groups, according to an exemplary embodiment of the present disclosure;

FIG. 8E is a diagram illustrating tagging of an input with multiple tagging groups, according to an exemplary embodiment of the present disclosure;

FIG. 9A is a schematic diagram illustrating a marker set according to an exemplary embodiment of the present disclosure;

FIG. 9B is a diagram of inputs to a training process for a machine learning based vision device selection tool in accordance with an exemplary embodiment of the present disclosure;

FIG. 9C is a diagram of inputs to a training process for a machine learning based vision device selection tool in accordance with an exemplary embodiment of the present disclosure;

FIG. 9D is a diagram of inputs to a training process for a machine learning based vision device selection tool in accordance with an exemplary embodiment of the present disclosure;

FIG. 9E is a diagram of inputs to a training process for a machine learning based vision device selection tool in accordance with an exemplary embodiment of the present disclosure;

FIG. 9F is a diagram of inputs to a training process for a machine learning based visual device selection tool according to an exemplary embodiment of the present disclosure;

FIG. 10A is a diagram illustrating tagging of an input with a tagging set according to an exemplary embodiment of the present disclosure;

FIG. 10B is a schematic diagram illustrating tagging of an input with a tagging set in accordance with an exemplary embodiment of the present disclosure;

FIG. 10C is a diagram illustrating tagging of an input with a tagging set, according to an exemplary embodiment of the present disclosure;

FIG. 10D is a diagram illustrating tagging of an input with a tagging set, according to an exemplary embodiment of the present disclosure;

FIG. 11A is a schematic diagram of a neural network of a machine learning based visual device selection tool in which the inputs are morphological features and visual device attributes, according to an exemplary embodiment of the present disclosure;

FIG. 11B is a schematic diagram of a neural network of a machine learning based visual device selection tool in which the inputs are morphological features and visual device attributes, in accordance with an exemplary embodiment of the present disclosure;

FIG. 11C is a schematic diagram of a neural network of a machine learning based visual device selection tool in which the inputs are facial images and visual device attributes, according to an exemplary embodiment of the present disclosure;

FIG. 11D is a schematic diagram of a neural network of a machine learning based visual device selection tool in which the inputs are morphological features and visual device images in accordance with an exemplary embodiment of the present disclosure;

FIG. 11E is a schematic diagram of a neural network of a machine learning based visual device selection tool in which the inputs are a facial image and a visual device image in accordance with an exemplary embodiment of the present disclosure;

FIG. 11F is a schematic diagram of a neural network of a machine learning based visual device selection tool in which the inputs are morphological features, visual measurements and visual device attributes, in accordance with an exemplary embodiment of the present disclosure;

FIG. 11G is a schematic diagram of a neural network of a machine learning based visual device selection tool in which the inputs are morphological features, visual measurements and visual device attributes, in accordance with an exemplary embodiment of the present disclosure;

FIG. 11H is a schematic diagram of a neural network of a machine learning based visual device selection tool in which the inputs are facial images, visual measurements and visual device attributes, according to an exemplary embodiment of the present disclosure;

FIG. 11I is a schematic diagram of a neural network of a machine learning based visual device selection tool in which the inputs are morphological features, visual measurements and visual device images in accordance with an exemplary embodiment of the present disclosure;

FIG. 11J is a schematic diagram of a neural network of a machine learning based visual device selection tool in which the inputs are a facial image, visual measurements and a visual device image, according to an exemplary embodiment of the present disclosure;

FIG. 11K is a schematic diagram of a neural network of a machine learning based vision device selection tool in which the inputs are morphological features in accordance with an exemplary embodiment of the present disclosure;

FIG. 11L is a schematic diagram of a neural network of a machine learning based vision device selection tool in which the inputs are morphological features in accordance with an exemplary embodiment of the present disclosure;

FIG. 11M is a schematic diagram of a neural network of a machine learning based vision device selection tool in which the inputs are morphological features in accordance with an exemplary embodiment of the present disclosure;

FIG. 11N is a schematic diagram of a neural network of a machine learning based vision device selection tool in which the inputs are morphological features in accordance with an exemplary embodiment of the present disclosure;

FIG. 11O is a schematic diagram of a neural network of a machine learning based vision device selection tool in which the inputs are morphological features and vision measurements, in accordance with an exemplary embodiment of the present disclosure;

FIG. 11P is a schematic diagram of a neural network of a machine learning based vision device selection tool in which the inputs are morphological features and vision measurements, in accordance with an exemplary embodiment of the present disclosure;

FIG. 11Q is a schematic diagram of a neural network of a machine learning based vision device selection tool in which the inputs are facial images and vision measurements, according to an exemplary embodiment of the present disclosure;

FIG. 11R is a schematic diagram of a neural network of a machine learning based vision device selection tool in which the inputs are facial images and vision measurements, according to an exemplary embodiment of the present disclosure;

FIG. 12A is a schematic diagram illustrating the preparation of a training database for a neural network training process in accordance with an exemplary embodiment of the present disclosure;

FIG. 12B is a schematic diagram of a neural network of a training process for a machine learning based visual device selection tool, where the input is metric facial marker coordinates;

FIG. 13A is a generalized flow diagram of a neural network configured to process heterogeneous input data in accordance with an exemplary embodiment of the present disclosure;

FIG. 13B is an aspect of a generalized flow diagram of a neural network configured to process heterogeneous input data, according to an exemplary embodiment of the present disclosure;

FIG. 13C is an aspect of a generalized flow diagram of a neural network configured to process heterogeneous input data, according to an exemplary embodiment of the present disclosure;

FIG. 13D is an aspect of a generalized flow diagram of a neural network configured to process heterogeneous input data, according to an exemplary embodiment of the present disclosure;

FIG. 13E is an aspect of a generalized flow diagram of a neural network configured to process heterogeneous input data, according to an exemplary embodiment of the present disclosure;

FIG. 13F is an aspect of a generalized flow diagram of a neural network configured to process heterogeneous input data, according to an exemplary embodiment of the present disclosure;

FIG. 14 is a flowchart of training a neural network of a machine learning based vision device selection tool in accordance with an exemplary embodiment of the present disclosure;

FIG. 15A is a generalized flow diagram of an embodiment of an artificial neural network;

FIG. 15B is a flow diagram of an implementation of a convolutional neural network, according to an exemplary embodiment of the present disclosure;

FIG. 16 is an example of a feedforward artificial neural network; and

fig. 17 is a hardware configuration of a machine learning based vision device selection tool according to an exemplary embodiment of the present disclosure.

Detailed Description

The terms "a" or "an," as used herein, are defined as one or more than one. The term "plurality," as used herein, is defined as two or more than two. The term "another," as used herein, is defined as at least a second or more. The terms "including" and/or "having," as used herein, are defined as comprising (i.e., open language). The terms "visual equipment," "eyeglasses," and "visual device" may be used interchangeably to refer to a device having both a frame and lenses. The term "visual equipment" may be used to refer to a single visual device, while the term "visual equipments" may be used to refer to more than one visual device. Reference throughout this document to "one embodiment," "certain embodiments," "an embodiment," "an implementation," "an example," or similar terms means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Similarly, the terms "facial image" and "image of a person's face" are corresponding terms that may be used interchangeably. Thus, the appearances of such phrases in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments without limitation.

Today, patients or other users seeking visual equipment or glasses often have little guidance regarding what is ophthalmically suitable and aesthetically pleasing. For some patients or users, cultural trends drive their decision-making. For other patients or users, the opinions of friends and family matter most. For still other patients or users, who prioritize ergonomics and visual acuity, the opinion of a trained vision expert is essential. Currently, users may turn to methods that provide some, but not all, of the above features. For example, in one approach, a decision tree may be implemented to match the frame of a visual device to morphological features detected from landmarks on the person's face, the match determining the ideal visual device. In another approach, the user may be asked to answer questions about their own style, lifestyle, and personality in order to determine the user's tastes and habits. These features can then be used to recommend desired visual devices based on an implementation of decision trees or content-based filtering. As an extension of the above, yet another approach employs a user preference model to analyze the user's browsing history in order to determine visual device characteristics, such as frames, that appear relevant to the user (e.g., content-based filtering) or the consumer profile closest to the user (e.g., collaborative filtering).

The above-described methods, while partially addressing the needs of the user, do not provide robust, end-to-end input to the user when making visual device selections. To this end, the present disclosure describes a machine learning based visual device selection tool for presenting appropriate visual device selections to a user based on morphological and structural features, ophthalmic needs, and aesthetic appeal.

Referring now to the drawings, FIG. 1 is a generalized flow diagram of a machine learning based vision device selection tool (ML-VEST) 100 according to an exemplary embodiment of the present disclosure. The ML-VEST 100 may include an input preparation process 110, a machine learning application process 115, and a visual device selection process 125. Initially, a user provides input 105 to the ML-VEST 100. Based on the neural network training, the input 105 may be provided 114 directly to the machine learning application process 115, or may be provided to the input preparation process 110, where the input 105 is prepared according to the specifications of a particular implementation of the machine learning application process 115. In an embodiment, the input 105 may be a user facial image that needs to be prepared and is accordingly provided to the input preparation process 110. The prepared input, or prepared image, may then be passed to the machine learning application process 115. A suitability metric 120 may be generated by the machine learning application process 115, wherein the prepared image corresponding to the user's face is scored based on its "suitability" with a visual device or eyewear, a correlation synchronization that provides a metric quantifying the fit between the user's face and the visual device. Based on the magnitude of the suitability metric 120 scored for the visual device or devices, the visual device selection process 125 may select the ideal glasses for the face of the user providing the input. In an embodiment, a user may provide an image of the user's face and an image of a visual device of interest. In processing the images, the ML-VEST 100 may generate a suitability metric 120 that, when compared to a predetermined threshold for the suitability metric 120, indicates whether the visual device should be selected as the user's ideal visual device.
In an embodiment, the suitability metric 120 may be associated with a confidence level that, when compared to a confidence level threshold, indicates whether the suitability metric 120 is accurate. In another embodiment, the user may provide only a facial image as input, and the ideal or suitable visual device may be selected from a database of multiple eyeglasses, or may be selected, for example, from a subset of eyeglasses pre-selected by the user or available to the user. To this end, as described above, a suitability metric 120 may be generated for each candidate visual device, and a comparison of the generated suitability metrics 120 may indicate the visual device to be selected as the ideal visual device. In this way, the user may be recommended an ideal visual device reflecting the user's unique morphological characteristics in the context of the user's preferences with respect to aesthetic appeal and visual acuity.
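The threshold-and-confidence logic described above can be sketched as follows. This is an illustrative assumption about one possible implementation; the function name and the numeric thresholds are not taken from the disclosure.

```python
def select_device(suitability, confidence,
                  suitability_threshold=0.7, confidence_threshold=0.8):
    """Decide whether a visual device should be selected as ideal.

    A device is recommended only when its suitability metric exceeds a
    predetermined threshold AND the associated confidence level indicates
    the metric can be trusted. Both threshold values are illustrative.
    """
    if confidence < confidence_threshold:
        # The metric is deemed unreliable; do not recommend the device.
        return False
    return suitability >= suitability_threshold

# A device scored 0.85 with confidence 0.9 passes both checks.
recommended = select_device(0.85, 0.9)
```

When multiple candidate devices are scored, the same check could be run per device and the surviving candidates compared by magnitude, as the passage describes.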

FIG. 2A depicts at least one input 205 that may be provided by a user to the ML-VEST. As described with reference to FIG. 1, the at least one input 205 may be provided to an input preparation process or may be provided directly to a machine learning application process. The at least one user-provided input 205 may include, among others, an image of the user's face 206, an image of the user's face alongside a separately provided image of a visual device 207, an image of the user's face while wearing a visual device 208, and visual measurements corresponding to the user 209. A visual measurement may be a standard ophthalmic measurement of visual acuity.

When provided to the input preparation process, each of the at least one input may be prepared prior to being provided to the machine learning application process, as shown in fig. 2B. Thus, FIG. 2B depicts an input preparation process 210 that may be implemented on at least one input received. In embodiments, the input preparation process 210 described herein may be implemented on input provided by a user during application of the ML-VEST and input provided during training of the ML-VEST neural network.

At a high level, whether applied during application of the ML-VEST or during training of the ML-VEST, the input preparation process 210 performs at least one input preparation function 211 and generates at least one input preparation output 213. As will be understood by one of ordinary skill in the art, the at least one input preparation function 211 and the at least one input preparation output 213 may be selected such that a similar process is performed during application of the ML-VEST and during training of the neural network of the ML-VEST.

At a low level, the at least one input preparation function 211 may include, for example, image classification, image segmentation, and convolution 212, among others. Image segmentation may be performed to detect relevant characteristics of the at least one input during training of the neural network of the ML-VEST and during application of the ML-VEST. These relevant characteristics, each a case of the at least one input preparation output 213, may be, for example, morphological features such as "face width" and "nose size," or visual device attributes such as "frame shape" and "frame color." Additional morphological features include face shape, skin color, eye color, hair color, and the like. Such morphological features may be computed via image processing (i.e., image segmentation/classification) as described above, or may be determined or measured manually on the input image, where manual measurement requires a calibration object to accurately compute the dimensions of the features. Additional visual device attributes may include lens width, lens height, nose bridge distance, temple length, and the like. Such visual device attributes may likewise be calculated via image processing as described above, or may be determined or measured manually on the input image, again with a calibration object required for accurate calculation. In some cases, the visual device attributes may be accessed from a database containing the visual device attributes.
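A morphological feature such as "face width" could, under the manual-measurement path described above, be computed from two landmark coordinates and a calibration scale. The landmark names, coordinates, and scale below are illustrative assumptions, not values from the disclosure.

```python
import math

def face_width(landmarks, scale_mm_per_px):
    """Face width from two cheek landmarks, using a calibration scale.

    `landmarks` maps hypothetical landmark names to (x, y) pixel
    coordinates; `scale_mm_per_px` would come from a calibration object
    in the image, as the text notes manual measurement requires.
    """
    p1, p2 = landmarks["left_cheek"], landmarks["right_cheek"]
    return math.dist(p1, p2) * scale_mm_per_px

landmarks = {"left_cheek": (40, 120), "right_cheek": (200, 120)}
# 160 px between the landmarks * 0.9 mm/px calibration = 144 mm
width = face_width(landmarks, scale_mm_per_px=0.9)
```

Visual device attributes such as lens width could be computed the same way from landmarks on the device image.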

In an embodiment, and in addition to the image segmentation and image classification described above, convolution 212 may be performed on the at least one input. Convolution 212 may include the use of a convolution filter and may speed up feature extraction. As will be described later, the convolution 212 may also be performed by the neural network of the ML-VEST itself, thereby bypassing the input preparation process 210.
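The convolution-filter step can be illustrated with a minimal "valid" 2D convolution over a grayscale image. This is only a sketch of the operation; a real implementation would use an optimized library rather than nested Python loops, and the image and kernel values are made up.

```python
def convolve2d(image, kernel):
    """'Valid' 2D convolution of a grayscale image with a small kernel.

    Slides the kernel over every fully-contained window of the image
    and accumulates the element-wise products, producing a feature map.
    """
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            acc = sum(image[i + m][j + n] * kernel[m][n]
                      for m in range(kh) for n in range(kw))
            row.append(acc)
        out.append(row)
    return out

# A horizontal edge filter responds strongly at the boundary between
# the dark (0) and bright (9) halves of this tiny image.
image = [[0, 0, 0, 0],
         [0, 0, 0, 0],
         [9, 9, 9, 9],
         [9, 9, 9, 9]]
edge = [[-1, -1, -1],
        [ 0,  0,  0],
        [ 1,  1,  1]]
features = convolve2d(image, edge)
```

Stacks of such filters are what a convolutional neural network learns, which is why the same step can instead live inside the network.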

In embodiments, the at least one input may be provided to the input preparation process 210, or may be provided directly to the machine learning application process. For example, the at least one input may be a visual measurement of the corresponding user. When provided by the user, the at least one input may include sphere and addition (add) powers, and may be provided directly to the neural network of the ML-VEST.

Referring to FIG. 3, after the input preparation process, the prepared at least one input may be delivered to the machine learning application process 315 of the ML-VEST, as needed. In general, the inputs to the machine learning application process may include at least one input provided directly (e.g., a facial image and a visual device image, or a facial image with a visual device worn) and at least one prepared input (e.g., morphological features from a facial image, visual device attributes from a visual device image). Taken together, several use cases for the inputs to the neural network of the ML-VEST can be considered: (1) morphological features and visual device attributes, obtained from the facial image and the visual device image or from a facial image with a visual device worn; (2) a facial image and visual device attributes; (3) morphological features and visual device attributes; (4) a facial image and a visual device image; (5) morphological features, visual device attributes, and visual measurements; (6) a facial image, visual device attributes, and visual measurements; (7) morphological features, a visual device image, and visual measurements; (8) a facial image, a visual device image, and visual measurements; (9) morphological features; (10) a facial image; (11) morphological features and visual measurements; and (12) a facial image and visual measurements.
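For a use case that mixes heterogeneous inputs, such as use case (5), the pieces might be assembled into one flat feature vector before reaching the network. The feature names and the simple one-hot frame-shape encoding below are illustrative assumptions; the disclosure does not specify an encoding.

```python
def build_input_vector(morph, device_attrs, visual_meas):
    """Concatenate morphological features, visual device attributes,
    and visual measurements into a single flat feature vector.

    The categorical frame shape is one-hot encoded; numeric features
    pass through unchanged. All names here are hypothetical.
    """
    shape_codes = {"round": [1, 0], "rectangular": [0, 1]}
    return ([morph["face_width_mm"], morph["nose_size_mm"]]
            + shape_codes[device_attrs["frame_shape"]]
            + [device_attrs["lens_width_mm"]]
            + [visual_meas["sphere"], visual_meas["addition"]])

x = build_input_vector(
    {"face_width_mm": 144.0, "nose_size_mm": 38.0},
    {"frame_shape": "round", "lens_width_mm": 52.0},
    {"sphere": -1.25, "addition": 2.0},
)
```

Image-based use cases would instead feed pixel arrays to convolutional layers, as the later figures suggest.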

Returning to fig. 3, and based on the selected use case (as outlined above), machine learning may be applied to the prepared at least one input, wherein a suitability metric 320 may be generated as an output of the machine learning application process 315. For each use case, a detailed description of the neural network of the ML-VEST will be provided with reference to subsequent figures. An evaluation of the magnitude of a single suitability metric, or a comparison of the magnitudes of multiple suitability metrics, may then be used to select the ideal visual device 325 for the user.

The type of suitability metric 320 and the final selected ideal visual device 325 may be based on the training of the neural network of the ML-VEST. Thus, FIG. 4A provides a flow diagram of the training of the neural network used during the machine learning application process of the ML-VEST 435.

In general, training includes providing the same input to the ML-VEST and to a group of markers that score the suitability of the input during the tagging process 440, in order to generate training suitability metric data or "ground truth" data. To train the neural network 430, the fitness metric 420 generated by the neural network 430 of the ML-VEST may be compared to the training fitness metrics scored by the marker group during the tagging process 440. The error value 438 generated therebetween may be evaluated and the parameters of the neural network 430 of the ML-VEST may be adjusted 439, thereby making future fitness metrics generated by the neural network 430 increasingly accurate relative to the fitness metrics scored during the labeling process 440.
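The compare-error-adjust loop described above may be sketched with a toy one-layer model trained by gradient descent against labeler scores. The data and model below are stand-ins, assuming a mean-squared error between predicted and labeled suitability:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: feature vectors (e.g., morphological features plus device
# attributes) and the "ground truth" suitability scores from the labelers.
features = rng.normal(size=(200, 6))
true_w = rng.normal(size=6)
labeled_scores = features @ true_w + 0.01 * rng.normal(size=200)

w = np.zeros(6)                      # model parameters (here: one linear layer)
lr = 0.05
for step in range(500):
    predicted = features @ w         # suitability metric generated by the model
    error = predicted - labeled_scores
    w -= lr * (features.T @ error) / len(error)   # adjust parameters

# Error between generated metrics and labeled ("ground truth") metrics.
final_mse = float(np.mean((features @ w - labeled_scores) ** 2))
```

With each pass, the model's predictions track the labelers' scores more closely, mirroring the adjustment 439 driven by the error value 438.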

Specifically, training initially includes receiving at least one input 405 from a training database 436. As mentioned with respect to fig. 2A, the training database 436 may comprise a plurality of inputs including a facial image, a facial image alongside a visual device image, a facial image of a person wearing a visual device, and a visual measurement corresponding to the visual acuity of the user's eyes. The plurality of inputs stored in the training database 436 are intended to come from a diverse population and a variety of visual devices, allowing the ML-VEST to robustly select an ideal visual device for an arbitrary user. However, it will be appreciated that the plurality of inputs stored in the training database 436 may be any kind of input and may be customized for a particular application. For example, the plurality of inputs stored in the training database 436 may also include facial images (or morphological features thereof) from a population of people, visual measurements corresponding to the facial images, and visual device images (or visual device attributes thereof), among others.

If desired, at least one input 405 from the plurality of inputs may be provided to the input preparation process 410 or directly to the neural network 430. Additionally, the at least one input 405 may be provided to the tagging process 440. In an embodiment, the at least one input 405 provided to both the input preparation process 410 and the labeling process 440 may be a subset of the plurality of inputs stored in the training database, as shown in FIG. 5A. In an example, the subset may include a facial image and a visual device image. Thus, the trained neural network 430 will be able to generate the fitness metric 420 for any visual device. In another example, the subset may include only facial images. In contrast to providing a facial image together with a visual device image, the facial image may instead be evaluated against a predefined list of visual devices, which serves as an implicit input to the neural network 430. In training the neural network 430, the facial images may be scored (i.e., labeled) for each visual device in the predefined list of visual devices, and the output of the neural network 430 may then be a list of matching scores, one for each visual device in the predefined list, relative to the facial image. In other words, the implicit input may be a selected subset or a predefined list of visual device images. As described in the use cases, the selected subset of visual devices may be used to evaluate a facial image, each visual device in the selected subset being given a suitability score relative to the facial image. It may be appreciated that the selection of an input, including an implicit input, from among the plurality of inputs stored in the training database 436 may be based on the particular implementation of the neural network 430 of the ML-VEST.
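With a predefined list of visual devices as the implicit input, the output layer can emit one matching score per device. A hypothetical linear output head in Python might look like the following (device names, shapes, and weights are illustrative):

```python
import numpy as np

# Hypothetical: the predefined list of visual devices is an implicit input;
# the output layer has one matching score per device in the list.
PREDEFINED_DEVICES = ["VE_1", "VE_2", "VE_3"]

def score_face(face_features: np.ndarray, W: np.ndarray) -> dict:
    """Return a matching score for the face against each predefined device."""
    scores = face_features @ W            # shape (3,): one score per device
    return dict(zip(PREDEFINED_DEVICES, scores.tolist()))

rng = np.random.default_rng(0)
matches = score_face(rng.normal(size=5), rng.normal(size=(5, 3)))
```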

According to an embodiment, and as introduced above, the labeling process 440 may provide "ground truth" data or training data on which the neural network 430 may be trained to learn how to accurately classify or predict the fitness metric. In the context of the present disclosure, the tagging process 440 may include scoring and commenting on each of a plurality of images provided as the at least one input.

In an embodiment, the marker may view, for example, a person's face from an input image alongside a visual device image and provide scores and comments thereon. Referring to fig. 5A, the inputs to the labeling process may include a facial image alongside the visual device image 508, a facial image 509 of a person wearing the visual device, and a visual measurement 549 corresponding to the facial image, as previously described. Additionally, the input to the tagging process may include a facial image 552 of a person wearing a virtual vision device. The image 552 of the face wearing a virtual vision device, referred to as a "virtual try-on" operation, provides a way for the marker to visualize the vision device on the face when a real image of the face wearing a particular vision device is not available. The virtual try-on operation generated in the marking process may be created by: first, segmenting the image of the visual device so that only the pattern of the visual device remains; and second, copying the pattern of the visual device onto the facial image, thereby simulating a real image of the face wearing the visual device. The marker may then evaluate the virtual try-on operation and label it in the usual manner.
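The two-step virtual try-on creation (isolating the device pattern, then copying it onto the face) can be sketched for grayscale arrays as follows. The uniform-background assumption and the toy images are simplifications of what an actual marking process would use:

```python
import numpy as np

def virtual_try_on(face: np.ndarray, device: np.ndarray,
                   background: float = 1.0) -> np.ndarray:
    """Composite a device image onto a face image of the same size.

    Step 1: keep only the device pattern by masking out the (assumed
    uniform) background. Step 2: copy the pattern onto the face image.
    """
    mask = device != background          # True where the device pattern is
    result = face.copy()
    result[mask] = device[mask]
    return result

# 4x4 grayscale toy images: mid-gray face, device with one dark "frame" pixel.
face = np.full((4, 4), 0.5)
device = np.full((4, 4), 1.0)           # white background
device[1, 1] = 0.0                      # device pattern
tryon = virtual_try_on(face, device)
```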

In the embodiments introduced above, the marker provides a label or a series of labels or scores that define the characteristics of the output layer of the neural network of the ML-VEST. For example, referring to fig. 5B, the labeling process 540 may include determining whether a visual device fits a face, referred to as binary visual device suitability 541. The labeling process 540 may further include determining a match score between the visual device and the face, referred to as an overall visual device match score 542. Still further, the tagging process 540 may include determining a matching score for the visual device and face for each criterion in a predefined list of criteria, referred to as per-criterion visual device matching 543. The list of predefined criteria may include, for example, a matching score for face width versus frame width, a matching score for face shape versus frame shape, a matching score for face height versus frame height, and a matching score for skin color versus frame color/adornment, among others. The above-described labels of the labeling process 540 may be represented as corresponding values. For example, the binary visual device suitability 541 may be represented by 0 or 1, the overall visual device matching score 542 may be represented as a score between 0 and N (e.g., 2 out of 5 stars), and each per-criterion visual device matching score 543 may likewise be represented as a score between 0 and N (e.g., 4 out of 5 stars). In addition to providing a label for each image according to the particular process of the labeling process 540, the marker may also provide a comment 544 as to why a particular label was assigned, the comment 544 including phrases such as "the visual device is too large for the face width" or "the visual device is too dark for the skin tone".
In an embodiment, the tagging process 540 may include, in addition to the facial image and the visual device image, a visual measurement corresponding to the facial image, such that a marker with visual expertise is able to take visual acuity into account when labeling.
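One label per face/device pair, comprising the three label types and an optional comment, might be recorded as in the sketch below. The field names and the 0..N convention follow the description above, but the class itself is hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class SuitabilityLabel:
    """One marker's label for a single face/visual-device pair."""
    binary_suitability: int                            # 0 or 1
    overall_score: int                                 # 0..N, e.g. 2 of 5 stars
    per_criterion: dict = field(default_factory=dict)  # e.g. {"face_width_vs_frame_width": 4}
    comment: str = ""                                  # free-text rationale

    def validate(self, n: int = 5) -> bool:
        """Check that every score respects the 0/1 and 0..N conventions."""
        return (self.binary_suitability in (0, 1)
                and 0 <= self.overall_score <= n
                and all(0 <= s <= n for s in self.per_criterion.values()))
```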

It will be appreciated that the above-described labeling process may be iterated until all possible combinations of faces and visual devices are labeled. For example, a label may be provided for each combination of a single facial image with each of multiple visual device images.
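Iterating over every face/device combination amounts to a Cartesian product; a minimal sketch with placeholder identifiers:

```python
from itertools import product

faces = ["face_1", "face_2"]
devices = ["VE_1", "VE_2", "VE_3"]

# Every face/device combination is presented to the markers for labeling.
pairs_to_label = list(product(faces, devices))
```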

Returning now to FIG. 4A, and with the understanding that the labels of the labeling process 440 define the desired outputs of the neural network 430, the neural network 430 may be trained. A more comprehensive but general description of the training of a neural network is provided with respect to figs. 14-16. As illustrated in fig. 4A, the processed at least one input may be provided to the hidden layer 1 or the input layer of the neural network 430. In an example, the neural network 430 may be a fully-connected neural network, allowing each fully-connected layer of the neural network to learn from all combinations of features or outputs of the previous layer. As discussed with respect to the input preparation process 410, the input layer of the neural network 430 may vary according to the use case. After passing the processed at least one input through the Nth hidden layer of the neural network 430, a fitness metric 420 may be generated from the output layer. The generated fitness metric 420 is intended to match the label, or training fitness metric, of the labeling process 440. Accordingly, the value of the fitness metric 420 may be compared to the labels or training data of the labeling process 440 at the error determination 438 to determine the accuracy of the output of the neural network 430. Based on the error determination 438, the training process 435 may conclude or may return to the 1st hidden layer of the neural network 430, where the coefficients/weights of each hidden layer may be updated based on the error of the error determination 438. In particular, the training process 435 of the ML-VEST and the neural network 430 may continue until the error determination 438 meets a criterion. The criterion may be one of a plurality of criteria including an error value or a number of iterations.
Once the error between the fitness metric and the training data has met the criterion of the error determination 438, the neural network 430 is ready to be implemented in the ML-VEST.

During implementation within the ML-VEST, the suitability metric 420 may be one of a plurality of suitability metrics 420 describing the at least one input, including an image of a person's face and each visual device of a plurality of visual devices of a visual device database, and the suitability metrics may be further output to a visual device selection process 425, as shown in fig. 4B. The visual device selection process 425 may take each of the plurality of suitability metrics 420 and select a suitable visual device. In one example, the suitable visual device maximizes a binary visual device suitability, an overall visual device match score, or a per-criterion visual device match score with accompanying comments, as specified during the training 435 of the neural network 430. In another example, the suitable visual device may be determined by comparing each of the binary visual device suitability, the overall visual device match score, or the per-criterion visual device match score, and the accompanying comments, to a predetermined threshold.
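The two selection policies mentioned (maximization, and comparison to a predetermined threshold) can be sketched over a per-device score dictionary. The scores, device names, and threshold below are illustrative:

```python
def select_device(suitability: dict, threshold=None):
    """Pick a suitable device from per-device suitability metrics.

    With no threshold, the maximizing device is returned; otherwise the
    first device whose metric meets the threshold (hypothetical policy),
    or None if no device qualifies.
    """
    if threshold is None:
        return max(suitability, key=suitability.get)
    for device, score in suitability.items():
        if score >= threshold:
            return device
    return None

scores = {"VE_1": 0.31, "VE_2": 0.87, "VE_3": 0.55}
best = select_device(scores)            # argmax policy
```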

Figs. 6A-6E illustrate exemplary inputs to the ML-VEST. For example, FIG. 6A depicts a case in which the at least one input to the training of the ML-VEST includes input images taken from a training database, the input images containing an image of a person's face and an individual visual device image 608. Fig. 6B illustrates inputs for the training of the ML-VEST, where the at least one input includes a facial image 607 of a person and visual device attributes 651, the visual device attributes 651 similarly retrieved from a training database. In view of fig. 6A, fig. 6C illustrates a case in which the at least one input to the training of the ML-VEST includes an image 609 of the face of a person wearing the visual device. Figs. 6D and 6E include, as the at least one input, morphological features of a facial image stored in a training database. Referring to fig. 6D, the at least one input of the ML-VEST may be morphological features 653 of a facial image in the training database and visual device attributes 651 of the plurality of visual devices. As shown in fig. 6E, the at least one input obtained from the training database may include morphological features 653 of a facial image and a visual device image 605.

Fig. 7A reflects the labeling process described in view of fig. 5, wherein the at least one input 705 of the labeling process comprises an image of a person's face and an individual visual device image. The at least one input 705 of FIG. 7A, or in another embodiment the processed at least one input, may be labeled by a marker. In an embodiment, the marker is one of a marker group 745. With respect to fig. 7A, the marker group 745 may be a group of lay (non-expert) persons. As depicted in fig. 5, the labels provided by each of the group of lay persons with respect to the facial image of the person and the visual device include a binary visual device suitability 741, an overall visual device matching score 742, and a per-criterion visual device matching score 743. These labels determined by the marker group 745 may be deployed as "ground truth" data or training data during training of the neural network of the ML-VEST and define the output layer of the neural network. The above-described labeling process may be repeated for each combination of the facial image of a person with the visual device images of multiple visual devices in the training database.

Fig. 7B reflects the labeling process described in view of fig. 5, wherein the at least one input 705 comprises an image of the face of a person wearing a visual device. The at least one input 705 of FIG. 7B, or in another embodiment the processed at least one input, may be labeled by a marker. In an embodiment, the marker is one of a marker group 745. With respect to fig. 7B, the marker group 745 may be a group of lay persons. The labels provided by each of the group of lay persons with respect to the facial image of the person wearing the vision device include a binary vision device suitability 741, an overall vision device matching score 742, and a per-criterion vision device matching score 743, as described in fig. 5. These labels determined by the marker group 745 may be deployed as "ground truth" data or training data during training of the neural network of the ML-VEST, and may define the output layer of the neural network. The above-described labeling process may be repeated for each of the facial images of persons wearing visual devices among the plurality of images of the training database.

Fig. 7C reflects the labeling process described in view of fig. 5, wherein the at least one input 705 comprises an image of a person's face and an image of a visual device. The at least one input 705 of fig. 7C may be labeled by a marker. In an embodiment, to ease the task of the marker, the at least one input 705 may be the processed at least one input. The processed at least one input may be a virtual try-on operation 752, or 2D VTO, wherein the facial image and the visual device image are manipulated such that the face appears as if the visual device were worn. The marker group 745 that provides the suitability score relative to the processed at least one input may be a group of lay persons. The labels provided by each of the group of lay persons with respect to the virtual try-on operation 752 of the facial image of the person "wearing" the vision device include a binary vision device suitability 741, an overall vision device matching score 742, and a per-criterion vision device matching score 743, as described in fig. 5. These labels determined by the marker group 745 may be deployed as "ground truth" data or training data during training of the neural network of the ML-VEST, and may define the output layer of the neural network. The above-described labeling process may be repeated for the virtual try-on operation 752 of each of the facial images of persons "wearing" visual devices among the plurality of images of the training database.

Fig. 7D reflects the labeling process described in view of fig. 5, wherein the at least one input 705 comprises an image of a person's face. The at least one input 705 of fig. 7D may be labeled by a marker in view of a virtual try-on operation 752 of a visual device in a subset of visual device images 732, e.g., selected from a training database. The marker group 745 that provides the suitability score relative to the virtual try-on operation 752 may be a group of lay persons. The labels provided by each of the group of lay persons with respect to the virtual try-on operation 752 of the facial image of the person "wearing" each visual device in the subset include a binary visual device suitability 741, an overall visual device matching score 742, and a per-criterion visual device matching score 743, as described in fig. 5. These labels determined by the marker group 745 may be deployed as "ground truth" data or training data during training of the neural network of the ML-VEST, and may define the output layer of the neural network. In an example, the above-described labeling process is repeated for the virtual try-on operations 752 of each of the facial images of persons "wearing" the visual devices in the subset of visual device images in the training database, from visual device 1 or VE 1 to VE 2 and up to VE N.

According to an embodiment, as depicted in FIG. 8A, the marker group of the labeling process may include sub-groups having related characteristics. For example, the marker group 845 may include multiple categories of markers 850, each category of the multiple categories of markers 850 connected by a common feature. Within a category of markers, each marker is typically defined as belonging to a consumer segment, e.g., by gender, age, socio-professional category, location, style, etc. These markers may be further defined by a combination of consumer segments, such as "a male in his fifties who frequently travels by plane", "a female in her forties with children", and so on. Thus, applying each of the multiple categories of markers 850 to the training process of FIG. 4A, the neural network may be trained such that the output of the neural network reflects the opinion of a group of people, as defined above. For example, a neural network trained on the opinions of markers of a category defined as "female professional in her twenties" would accordingly generate a suitability metric indicative of the opinions of markers of that category. During implementation of the ML-VEST, the user may pre-select a desired category of markers 850 to provide the particular opinion of interest.
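Pre-selecting a category of markers amounts to filtering the training labels by the category that produced them; a hypothetical sketch (records and category names invented for illustration):

```python
# Hypothetical label records tagged with the marker category that produced them.
labels = [
    {"category": "20s_female_professional", "face": "f1", "device": "VE_1", "score": 4},
    {"category": "50s_male_frequent_flyer", "face": "f1", "device": "VE_1", "score": 2},
    {"category": "20s_female_professional", "face": "f2", "device": "VE_2", "score": 5},
]

def training_labels(category: str) -> list:
    """Keep only labels from the pre-selected marker category, so a network
    trained on them reflects that category's opinion."""
    return [rec for rec in labels if rec["category"] == category]
```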

It can be appreciated that by changing the categories of markers in the marker group, the ML-VEST can be adjusted according to the user's desires. For example, a user may desire to know which visual device looks best on their face according to the opinion of local men. In another example, a user may desire to know which visual device looks best on their face according to the opinion of celebrities. In any case, the ML-VEST and the marker group therein may be adjusted as necessary to obtain a result reflecting the desired opinion.

To this end, fig. 8B reflects the labeling process described in view of fig. 5, wherein the marker group may be one of a plurality of categories of markers, and wherein the at least one input 805 may include a facial image displayed alongside a visual device image. The at least one input 805 of FIG. 8B, or in another embodiment the processed at least one input, may be labeled by a marker group. The marker group may be a first category of markers 846, up to an Nth category of markers 847, where each category reflects a particular population defined by, for example, classical consumer segmentation criteria (described in fig. 8A). The labels provided by each of the first category of markers 846 through the Nth category of markers 847 for the side-by-side display of the facial image and the visual device image include a binary visual device suitability 841, an overall visual device match score 842, and a per-criterion visual device match score 843, as described in fig. 5. These labels, determined by category of marker, may be deployed as "ground truth" data or training data during training of the neural network of the ML-VEST, and may define the output layer of the neural network. In an example, the above-described labeling process is repeated for each combination of the facial images in the training database with the visual device images of multiple visual devices.

Fig. 8C reflects the labeling process described in view of fig. 5, wherein the marker group may be one of a plurality of categories of markers, and wherein the at least one input 805 may include a facial image of a person wearing a visual device. The at least one input 805 of FIG. 8C, or in another embodiment the processed at least one input, may be labeled by a marker group. The marker group may be a first category of markers 846, up to an Nth category of markers 847, where each category reflects a particular population defined by, for example, classical consumer segmentation criteria (described in fig. 8A). The labels provided by each of the first category of markers 846 through the Nth category of markers 847 with respect to the facial image of the person wearing the vision device include a binary vision device suitability 841, an overall vision device match score 842, and a per-criterion vision device match score 843, as described in fig. 5. These labels, determined by category of marker, may be deployed as "ground truth" data or training data during training of the neural network of the ML-VEST, and may define the output layer of the neural network. In an example, the above-described labeling process is repeated for each of the facial images of persons wearing visual devices among the plurality of images in the training database.

In view of fig. 7C, fig. 8D reflects the labeling process, wherein the marker group may be one of a plurality of categories of markers, and wherein the at least one input 805 may include a facial image alongside a visual device image. The at least one input 805 of FIG. 8D, or in another embodiment the processed at least one input, may be labeled by a marker group. In an embodiment, to ease the task of the marker, the at least one input 805 may be the processed at least one input. The processed at least one input may be a virtual try-on operation 852, in which the facial image and the visual device image are manipulated such that the face appears to be wearing the visual device. The marker group may be a first category of markers 846, up to an Nth category of markers 847, where each category reflects a particular population defined by, for example, classical consumer segmentation criteria (described in fig. 8A). The labels provided by each of the first category of markers 846 through the Nth category of markers 847 with respect to the virtual try-on operation 852 include a binary visual device suitability 841, an overall visual device match score 842, and a per-criterion visual device match score 843, as described in fig. 5. These labels, determined by category of marker, may be deployed as "ground truth" data or training data during training of the neural network of the ML-VEST, and may define the output layer of the neural network. In an example, the labeling process described above is repeated for each virtual try-on operation 852 combining a facial image in the training database with a visual device of the plurality of visual devices.

In view of fig. 7D, fig. 8E reflects labeling, wherein the marker group may be one of a plurality of categories of markers, and wherein the at least one input 805 may include an image of the face of the person. In an embodiment, to ease the task of the marker, the at least one input 805 may be a virtual try-on operation 852, wherein the facial image and the visual device images in a subset 832 of visual device images selected from the training database are manipulated and combined such that the face appears to be "wearing" each visual device. The marker group may be a first category of markers 846, up to an Nth category of markers 847, where each category reflects a particular population defined by, for example, classical consumer segmentation criteria (described in fig. 8A). The labels provided by each of the first category of markers 846 through the Nth category of markers 847 with respect to the virtual try-on operation 852 of the facial image "wearing" a visual device in the subset include a binary visual device suitability 841, an overall visual device matching score 842, and a per-criterion visual device matching score 843, as described in fig. 5. These labels, determined by category of marker, may be deployed as "ground truth" data or training data during training of the neural network of the ML-VEST, and may define the output layer of the neural network. From visual device 1 or VE 1 to VE 2 and up to VE N, the above labeling process may be repeated for the virtual try-on operation 852 of each of the facial images of persons "wearing" the visual devices in the subset of visual device images in the training database.

According to an embodiment, and referring to fig. 9A, the marker group 945 may include multiple categories of markers, one of which is an expert category of markers 948. The expert category of markers 948 may be heterogeneous or may be divided into subcategories of expert markers. For example, the expert category of markers 948 may include eye care professionals, cosmetologists, photo artists, and the like. In another example, eye care professionals, cosmetologists, photo artists, etc. may constitute subcategories, and may provide profession-specific labels for a combination of a face and a visual device.

To this end, as illustrated by the example at least one input and processed at least one input of figs. 9B-9F, an expert category of markers allows additional characteristics of the at least one input to be defined and considered during labeling. For example, FIG. 9B depicts a case in which at least one input 905 of the training of the ML-VEST (the at least one input 905 being obtained from the training database) contains a facial image and an individual visual device image 908. Further, where the expert marker is, for example, an eye care professional, the facial image in the at least one input may be further associated with visual measurements 949 such as, for example, a medical prescription and a pupillary distance. Similarly, FIG. 9C illustrates at least one input 905 of the training of the ML-VEST, the at least one input including a facial image and visual device attributes 951, the visual device attributes 951 determined from an input preparation process substantially similar to that described in FIG. 3. Further, as with fig. 9B, the face of the person in the at least one input 905 may be associated with visual measurements 949 such as those described above. In view of FIG. 9B, FIG. 9D illustrates a case in which the at least one input 905 of the training of the ML-VEST includes a facial image 909 of a person wearing a vision device. Moreover, as described above, the at least one input 905 of fig. 9D may be further associated with a visual measurement 949. Referring to fig. 9E, the at least one input 905 of the ML-VEST may include morphological features 953 of a facial image in the training database and visual device attributes 951 of a plurality of visual devices. Further, the morphological features 953 of the facial image may be associated with visual measurements 949 such as those described above. Referring to fig. 9F, the at least one input 905 of the ML-VEST may include morphological features 953 of a facial image and a visual device image in the training database.
Further, morphological features 953 of the facial image may be associated with visual measurements 949 such as those described above.

Fig. 10A reflects the labeling process described in view of fig. 5, wherein the at least one input 1005 includes a facial image of a person and a separate visual device image. Additionally, as fig. 10A employs a marker group 1045 of expert markers 1048, who may be eye care professionals, in an example the at least one input 1005 may include a visual measurement 1049 associated with the facial image. As mentioned, the at least one input 1005 of fig. 10A, or in another embodiment the processed at least one input, may be labeled by a marker 1048 of the expert category in the marker group 1045. The labels of each of the expert category markers 1048 may include a binary visual device suitability 1041, an overall visual device matching score 1042, and a per-criterion visual device matching score 1043, as described in fig. 5. Additionally, these labels may include comments about the suitability of the visual device and, in the case of an eye care professional, comments made in the context of the visual measurements 1049. These labels, determined by the expert category of markers 1048, may be deployed as "ground truth" data or training data during training of the neural network of the ML-VEST, and may define the output layer of the neural network. In an example, the above-described labeling process is repeated for each combination of the facial images in the training database with the visual device images of multiple visual devices.

Fig. 10B reflects the labeling process described in view of fig. 5, wherein the at least one input 1005 comprises a facial image of a person wearing a vision device. Additionally, as fig. 10B employs a marker group 1045 of expert markers 1048, who may be eye care professionals, in an example the at least one input 1005 may include a vision measurement 1049 associated with the facial image of the person wearing the vision device. As mentioned, the at least one input 1005 of fig. 10B, or in another embodiment the processed at least one input, may be labeled by the marker 1048 of the expert category in the marker group 1045. The labels of each of the expert category markers 1048 may include a binary visual device suitability 1041, an overall visual device matching score 1042, and a per-criterion visual device matching score 1043, as described in fig. 5. Additionally, these labels may include comments about the suitability of the visual device and, in the case of an eye care professional, comments made in the context of the visual measurements 1049. These labels, determined by the expert category of markers 1048, may be deployed as "ground truth" data or training data during training of the neural network of the ML-VEST and define the output layer of the neural network. The above-described labeling process may be repeated for each of the facial images of persons wearing visual devices among the plurality of images in the training database.

Fig. 10C reflects the labeling process described in view of fig. 5, wherein the at least one input 1005 includes a facial image of a person and a separate visual device image. The at least one input 1005 of fig. 10C, or in another embodiment the processed at least one input, may be labeled by a marker group 1045 of expert markers 1048. In an embodiment, to ease the task of the marker, the at least one input 1005 may be the processed at least one input. The processed at least one input may be a virtual try-on operation 1052, in which the facial image and the visual device image are manipulated such that the face appears to be wearing the visual device. The labels of the expert category markers 1048 may include a binary visual device suitability 1041, an overall visual device matching score 1042, and a per-criterion visual device matching score 1043, as described in fig. 5. These labels, determined by the expert category of markers 1048, may be deployed as "ground truth" data or training data during training of the neural network of the ML-VEST, and may define the output layer of the neural network. The above-described labeling process may be repeated for each virtual try-on operation 1052 combining a facial image in the training database with a visual device of the plurality of visual devices.

Fig. 10D reflects the labeling process described in view of fig. 5, wherein the at least one input 1005 includes a face image and a vision measurement 1049 associated with the face image. In an embodiment, to alleviate the task of the marker, the at least one input 1005 may be a processed at least one input, the processed at least one input being the result of a virtual try-on operation 1052 in which the face image and a visual device image, selected from a subset 1032 of the visual device images in the training database, are manipulated and combined such that the face appears to be "wearing" the visual device. The labels of the expert category markers 1048 may include a binary visual device suitability 1041, an overall visual device matching score 1042, and a visual device matching score by standard 1043, as described in fig. 5. These labels, determined by the expert category of markers 1048, may be deployed as "ground truth" data or training data during training of the neural network of the ML-VEST, and may define the output layer of the neural network. In an example, the above-described labeling process is repeated for the virtual try-on operations 1052 of each of the facial images of the persons "wearing" the visual devices in the subset of visual device images in the training database, from visual device 1, or VE 1, to VE 2 and up to VE N.

Each of the above-described labeling schemes of the labeling process may be implemented in the training process of the ML-VEST as introduced in FIG. 4A. In particular, and with reference to the use case described with respect to fig. 3, an embodiment of the ML-VEST may proceed as illustrated in fig. 11A through 11R. It will be appreciated that "visual equipment" has been abbreviated as "VE" where appropriate in the description and drawings, and that the two may be used interchangeably.

Referring to fig. 11A, an exemplary embodiment of case (1), at least one input 1105 may be passed to an input preparation process 1110 before being input to a neural network 1115. The at least one input 1105 may include a facial image and a visual device image, the visual device image being provided with the facial image or selected from a plurality of visual devices in a database. According to an embodiment, the input preparation process 1110 may include image processing or manual measurement to derive morphological features 1153 and visual device attributes 1151 from the facial image and the visual device image, respectively. Additionally, the visual device attributes 1151 may be obtained by a request from a database. The processed at least one input may be delivered to an input layer 1116 of the neural network 1115, and the neural network 1115 may be applied thereto. The structure of the neural network 1115 may include data preparation (including averaging, normalization, etc.) and a fully connected neural network and/or a convolutional plus fully connected neural network. The output layer 1117 of the neural network 1115 reflects the neural network 1115's prediction of the labels defined by the marker group during the labeling process. The prediction may be a suitability metric 1120 generated by the neural network 1115 for the combination of the at least one input 1105.
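The case (1) pipeline described above, in which processed morphological features and visual device attributes are delivered to a fully connected network whose output layer emits a suitability metric, can be sketched as follows. This is a minimal illustration only: the feature dimensions, hidden-layer size, and random untrained weights are assumptions, not the patented implementation.

```python
import numpy as np

def suitability_mlp(morph_features, device_attrs, weights):
    """Concatenate the processed inputs, normalize them (the data-preparation
    stage), and apply a small fully connected network that outputs a
    suitability metric in (0, 1)."""
    x = np.concatenate([morph_features, device_attrs])
    x = (x - x.mean()) / (x.std() + 1e-8)        # normalization
    W1, b1, w2, b2 = weights
    h = np.maximum(0.0, W1 @ x + b1)             # hidden layer, ReLU activation
    return 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))  # sigmoid output layer

rng = np.random.default_rng(0)
morph = rng.normal(size=4)   # stand-in morphological features 1153
attrs = rng.normal(size=3)   # stand-in visual device attributes 1151
weights = (rng.normal(size=(8, 7)), np.zeros(8), rng.normal(size=8), 0.0)
score = suitability_mlp(morph, attrs, weights)   # suitability metric for this pair
```

In a trained system the weights would be fitted against the marker labels described above rather than drawn at random.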

Similar to the above, referring now to fig. 11B, in an exemplary embodiment of case (1), at least one input 1105 may be passed to an input preparation process 1110 before being input to the neural network 1115. The at least one input 1105 may include an image of a face wearing a visual device. According to an embodiment, the input preparation process 1110 may include image processing or manual measurement to derive morphological features 1153 and visual device attributes 1151 for the face and the visual device, respectively. The processed at least one input may be delivered to an input layer 1116 of the neural network 1115, and the neural network 1115 may be applied thereto. The output layer 1117 reflects the neural network 1115's prediction of the labels defined by the marker group. The prediction is a suitability metric 1120 generated by the neural network 1115 for each combination of the at least one input 1105.

As an exemplary embodiment of case (2), fig. 11C provides a schematic diagram of an ML-VEST in which at least one input 1105 includes a facial image and a visual device image, the visual device image being provided with the facial image or selected from a plurality of visual devices in a database. The facial image may be passed directly to the neural network 1115. As previously described, the visual device image may be passed to the input preparation process 1110 before being delivered to the neural network 1115. The structure of the neural network 1115 may include data preparation (including averaging, normalization, etc.) and a fully connected neural network and/or a convolutional plus fully connected neural network. To this end, the visual device image may be prepared via image processing and manual measurement to generate visual device attributes 1151. Additionally, the visual device attributes 1151 may be obtained via a request from a database. Unlike the previous embodiments, the input preparation process 1110 applied to the facial image may be external to or integral with the neural network 1115. For example, a convolutional neural network 1112 may be applied to the facial image in order to perform feature extraction and prepare the image for input to the input layer of the neural network 1115 (whereby the image, together with the processed visual device image, constitutes the processed at least one input). After both of the at least one input 1105 are prepared, the processed at least one input may be delivered to the input layer of the neural network 1115, and the neural network 1115 may be applied thereto. The output layer 1117 reflects the neural network 1115's prediction of the labels indicated by the marker group. The prediction is a suitability metric 1120 that the neural network 1115 generates for each combination of the at least one input 1105.

As an exemplary embodiment of case (3), fig. 11D provides a schematic diagram of an ML-VEST in which at least one input 1105 includes a facial image and a visual device image, the visual device image being provided with the facial image or selected from a plurality of visual devices in a database. The visual device image may be passed directly to the neural network 1115. As previously described, the facial image may be passed to the input preparation process 1110 before being delivered to the neural network 1115. To this end, the facial image may be prepared via image processing and manual measurement to generate morphological features 1153. Unlike the previous embodiments, the input preparation process 1110 applied to the visual device image may be external to or integral with the neural network 1115. For example, a convolutional neural network 1112 may be applied to the visual device image in order to perform feature extraction and prepare the image for input to the input layer of the neural network 1115 (whereby the image, together with the processed facial image, constitutes the processed at least one input). After both of the at least one input 1105 are prepared, the processed at least one input may be delivered to the input layer of the neural network 1115, and the neural network 1115 may be applied thereto. The structure of the neural network 1115 may include data preparation (including averaging, normalization, etc.) and a fully connected neural network and/or a convolutional plus fully connected neural network. The output layer 1117 reflects the neural network 1115's prediction of the labels defined by the marker group. The prediction is a suitability metric 1120 that the neural network 1115 generates for each combination of the at least one input 1105.

As an exemplary embodiment of case (4), fig. 11E provides a schematic diagram of an ML-VEST in which at least one input 1105 includes a facial image and a visual device image, the visual device image being provided with the facial image or selected from a plurality of visual devices in a database. The at least one input 1105 may pass directly to the neural network 1115, where convolution is performed. As described above, the input preparation process 1110 applied to the at least one input may be external to or integral with the neural network 1115. For example, a convolutional neural network 1112 may be applied to the at least one input 1105, including the visual device image and the facial image, in order to perform feature extraction and prepare the images for input to the input layer of the neural network 1115. After both of the at least one input 1105 are prepared by convolution, the processed at least one input may be delivered to the input layer of the neural network 1115, and the neural network 1115 may be applied thereto. The structure of the neural network 1115 may include data preparation (including averaging, normalization, etc.) and a fully connected neural network and/or a convolutional plus fully connected neural network. The output layer 1117 reflects the neural network 1115's prediction of the labels indicated by the marker group. The prediction is a suitability metric 1120 that the neural network 1115 generates for each combination of the at least one input 1105.
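The case (4) arrangement, in which both images pass through a convolutional feature-extraction stage before the fully connected stage, can be illustrated with a toy single-channel convolution followed by global average pooling. The image sizes, kernel count, and random weights are assumptions chosen only to make the sketch self-contained.

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Single-channel 'valid' 2-D convolution (no padding)."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def extract_feature(img, kernel):
    """Convolution -> ReLU -> global average pool: one scalar per kernel."""
    return np.maximum(0.0, conv2d_valid(img, kernel)).mean()

rng = np.random.default_rng(1)
face_img = rng.random((16, 16))     # stand-in for a facial image
device_img = rng.random((16, 16))   # stand-in for a visual device image
kernels = [rng.normal(size=(3, 3)) for _ in range(4)]

# The convolutional stage prepares both inputs for the fully connected stage.
features = np.array([extract_feature(face_img, k) for k in kernels] +
                    [extract_feature(device_img, k) for k in kernels])
W = rng.normal(size=features.size)
suitability = 1.0 / (1.0 + np.exp(-(W @ features)))  # suitability metric in (0, 1)
```

A production system would use a trained convolutional network rather than random kernels; the sketch only shows the data flow from images to a single suitability metric.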

Referring to fig. 11F, an exemplary embodiment of case (5), and in view of fig. 11A, at least one input 1105 may be passed to an input preparation process 1110 before being input to the neural network 1115. The at least one input 1105 may include a facial image and a visual device image, the visual device image being provided with the facial image or selected from a plurality of visual devices in a database. Additionally, the at least one input 1105 may include visual measurements 1149 corresponding to the facial image. According to an embodiment, the input preparation process 1110 may include image processing or manual measurement to derive morphological features 1153 and visual device attributes 1151 from the facial image and the visual device image, respectively. Additionally, the visual device attributes 1151 may be obtained by a request from a database. The processed at least one input may be delivered to an input layer 1116 of the neural network 1115, and the neural network 1115 may be applied thereto. The output layer 1117 of the neural network 1115 reflects the neural network 1115's prediction of the labels defined by the marker group during the labeling process. In an example, the marker group may comprise expert markers. The prediction may be a suitability metric 1120 generated by the neural network 1115 for the combination of the at least one input 1105.

Referring now to fig. 11G, an exemplary embodiment of case (5), and in view of fig. 11B, at least one input 1105 may be passed to an input preparation process 1110 before being input to the neural network 1115. The at least one input 1105 may include an image of a face wearing a visual device. Additionally, the at least one input 1105 may include visual measurements 1149 corresponding to the image of the face wearing the visual device. According to an embodiment, the input preparation process 1110 may include image processing or manual measurement to derive morphological features 1153 and visual device attributes 1151 for the face and the visual device, respectively. The processed at least one input and the visual measurements 1149 may be delivered to an input layer 1116 of the neural network 1115, and the neural network 1115 may be applied thereto. The output layer 1117 reflects the neural network 1115's prediction of the labels defined by the marker group, which in an embodiment comprises expert markers. The prediction is a suitability metric 1120 generated by the neural network 1115 for each combination of the at least one input 1105.

In view of fig. 11C, fig. 11H, an exemplary embodiment of case (6), provides a schematic diagram of an ML-VEST in which at least one input 1105 includes a facial image and a visual device image, the visual device image being provided with the facial image or selected from a plurality of visual devices in a database. The facial image of the person may be passed directly to the neural network 1115. Additionally, the at least one input 1105 includes visual measurements 1149 corresponding to the facial image. As previously described, the visual device image may be passed to the input preparation process 1110 before being delivered to the neural network 1115. To this end, the visual device image may be prepared via image processing and manual measurement to generate visual device attributes 1151. Additionally, the visual device attributes 1151 may be obtained via a request from a database. Unlike the previous embodiments, the input preparation process 1110 applied to the facial image may be external to or integral with the neural network 1115. For example, a convolutional neural network 1112 may be applied to the facial image in order to perform feature extraction and prepare the image for input to the input layer of the neural network 1115 (whereby the image, together with the processed visual device image, constitutes the processed at least one input). After both of the at least one input 1105 are prepared, the processed at least one input and the visual measurements 1149 may be delivered to the input layer of the neural network 1115, and the neural network 1115 may be applied thereto. The output layer 1117 reflects the neural network 1115's prediction of the labels indicated by the marker group, which in an embodiment comprises expert markers. The prediction is a suitability metric 1120 that the neural network 1115 generates for each combination of the at least one input 1105.

In view of fig. 11D, fig. 11I, an exemplary embodiment of case (7), provides a schematic diagram of an ML-VEST in which at least one input 1105 includes a facial image and a visual device image, the visual device image being provided with the facial image or selected from a plurality of visual devices in a database. The visual device image may be passed directly to the neural network 1115. Additionally, the at least one input 1105 includes visual measurements 1149 corresponding to the facial image. As previously described, the facial image may be passed to the input preparation process 1110 before being delivered to the neural network 1115. To this end, the facial image may be prepared via image processing and manual measurement to generate morphological features 1153. Unlike the previous embodiments, the input preparation process 1110 applied to the visual device image may be external to or integral with the neural network 1115. For example, a convolutional neural network 1112 may be applied to the visual device image in order to perform feature extraction and prepare the image for input to the input layer of the neural network 1115 (whereby the image, together with the processed facial image, constitutes the processed at least one input). After both of the at least one input 1105 are prepared, the processed at least one input and the visual measurements 1149 may be delivered to the input layer of the neural network 1115, and the neural network 1115 may be applied thereto. The output layer 1117 reflects the neural network 1115's prediction of the labels defined by the marker group, which in an embodiment comprises expert markers. The prediction is a suitability metric 1120 that the neural network 1115 generates for each combination of the at least one input 1105.

In view of fig. 11E, fig. 11J, an exemplary embodiment of case (8), provides a schematic diagram of an ML-VEST in which at least one input 1105 includes a facial image and a visual device image, the visual device image being provided with the facial image or selected from a plurality of visual devices in a database. The at least one input 1105 may pass directly to the neural network 1115, where convolution is performed. Additionally, the at least one input 1105 may include visual measurements 1149 corresponding to the facial image. As described above, the input preparation process 1110 applied to the at least one input 1105 may be external to or integral with the neural network 1115. For example, a convolutional neural network 1112 may be applied to the at least one input 1105, including the visual device image and the facial image, in order to perform feature extraction and prepare the images for input to the input layer of the neural network 1115. After both of the at least one input 1105 are prepared by convolution, the processed at least one input and the visual measurements 1149 may be delivered to the input layer of the neural network 1115, and the neural network 1115 may be applied thereto. The output layer 1117 reflects the neural network 1115's prediction of the labels indicated by the marker group, which in an embodiment comprises expert markers. The prediction is a suitability metric 1120 that the neural network 1115 generates for each combination of the at least one input 1105.

In view of FIG. 8E, FIG. 11K reflects a schematic of ML-VEST in which at least one input 1105 includes a face image. In such a process reflecting case (9), the at least one input 1105 may pass through a neural network 1115 having an architecture that allows the at least one input 1105 to be evaluated for each visual device in the subset 1132 of visual device images. In an embodiment, at least one input 1105 may be passed to an input preparation process 1110 before being delivered to the neural network 1115. To this end, a face image may be prepared via image processing and manual measurement to generate morphological features 1153.

Unlike the previous embodiments, the subset 1132 of visual device images obtained from the database 1155 is not provided as the at least one input 1105 to the neural network 1115. Instead, the neural network 1115, trained based in part on the subset 1132 of visual device images, is applied to the at least one input 1105. According to an embodiment, and in the context of training the neural network 1115, each visual device image in the subset 1132 of visual device images from the database 1155 needs to be pre-processed.

For each visual device i in the subset 1132 of visual device images from the database 1155, and in view of the morphological features j derived for each facial image, a statistical suitability score may be calculated. The statistical suitability score may include the percentage p_ji of markers that give: (1) the same binary score for the visual device image i relative to the morphological feature j (case binary score of 0 or 1), (2) the same matching score for the visual device image i in the subset 1132 of visual device images from database 1155 relative to the morphological feature j (case matching score between 0 and N), or (3) the same matching score, or the same ordering, per item of the determined criteria list for the visual device image i in the subset 1132 of visual device images from database 1155 relative to the morphological feature j (case matching score per item between 0 and N). For each of the above cases, the following vectors associated with the percentage p_ji for a given morphological feature j may be obtained. One vector may be a vector of N binary values {0, 1}, each corresponding to the suitability of a morphological feature j of a facial image to an image i in the subset 1132 of visual device images. The second vector may be a vector of integer values between 0 and X, where each integer value corresponds to a matching score of an image i in the subset 1132 of visual device images to a facial image. The third vector may be a vector of N lists l of M integer values between 0 and X, each integer value of each list l corresponding to a matching score of an image i in the subset 1132 of visual device images, relative to the facial image, for each rule in a set of M matching rules. In view of the above pre-processing, training may then be started. The at least one input of the training may be a morphological feature, and the neural network may be configured as a combined neural network having convolutional layers and fully connected layers.
In addition, the activation function may be any type of standard activation function with weights associated with p_ji, including rectified linear units (ReLUs). The associated vector containing the matching information may be referred to as a target vector. Neural network training can be completed on the whole target vector, or on a portion thereof by training specially selected neurons of the target vector.
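The construction of the percentage p_ji and of the target vectors described above can be sketched as follows. The five markers, three visual devices, and label values are fabricated for illustration, and aggregating marker labels by majority vote and per-device mode is one plausible reading (an assumption), not a prescribed method.

```python
import numpy as np

# Hypothetical labels from 5 markers for N = 3 visual devices and one
# morphological feature j: a binary suitability and a 0..X match score.
binary_labels = np.array([[1, 0, 1],   # marker 1: devices 1 and 3 suitable
                          [1, 0, 1],
                          [1, 1, 1],
                          [0, 0, 1],
                          [1, 0, 1]])
score_labels = np.array([[4, 1, 5],
                         [4, 2, 5],
                         [3, 1, 5],
                         [4, 1, 4],
                         [4, 1, 5]])

# Consensus binary vector and p_ji: the fraction of markers giving the
# majority binary score to each device i for feature j.
majority = (binary_labels.mean(axis=0) >= 0.5).astype(int)
p_ji = (binary_labels == majority).mean(axis=0)

# Target vectors as described above: a vector of N binary values {0, 1}
# and a vector of integer match scores (here the per-device mode).
binary_target = majority
def mode(col):
    vals, counts = np.unique(col, return_counts=True)
    return int(vals[np.argmax(counts)])
score_target = np.array([mode(score_labels[:, i]) for i in range(score_labels.shape[1])])
```

The resulting `binary_target` and `score_target` vectors play the role of the target vector that the network is trained against, with `p_ji` available for weighting the activation functions.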

Returning now to FIG. 11K, the output layer 1117 of the neural network 1115 reflects the prediction of the fitness metric 1120 by the neural network 1115. The suitability metric 1120 may be a match score 1121 and may be one of the vectors described above. Specifically, the matching score 1121 may be the following vector: (1) a vector of N binary values {0, 1}, each binary value corresponding to a suitability of a morphological feature j of a facial image to an image i in a subset 1132 of N visual device images, (2) a vector of integer values between 0 and X, wherein each integer value corresponds to a matching score 1121 of an image i in the subset 1132 of visual device images to a facial image, or (3) a vector of N lists l of M integer values between 0 and X, each integer value of each list l corresponding to a matching score 1121 of an image i in the subset 1132 of N visual device images relative to a facial image for each rule in a set of M matching rules. In an embodiment, the match score 1121 prediction can be generated by the neural network 1115 for each combination of at least one input 1105 from a database 1155 and a subset 1132 of visual device images, the database 1155 including "VE 1", "VE 2", and consecutive visual device images up to "VE N". In embodiments, database 1155 may be a pre-selected subset of images of visual devices available at the retailer or previously determined to be suitable for certain characteristics associated with the user.

In view of FIG. 8E, FIG. 11L reflects a schematic of an ML-VEST in which at least one input 1105 includes a facial image. In such a process, reflecting case (9), the at least one input 1105 may pass through a neural network 1115 having an architecture that allows the at least one input 1105 to be evaluated in view of the morphological features and corresponding visual device attributes associated with each visual device in the subset of visual device images. In an embodiment, the at least one input 1105 may be passed to an input preparation process 1110 before being delivered to the neural network 1115. To this end, the facial image may be prepared via image processing and manual measurement to generate morphological features 1153. Unlike the previous embodiments, the visual device, or the visual device attributes associated with each visual device in the subset of visual device images, is not provided as the at least one input 1105 to the neural network 1115. Instead, the neural network 1115, trained based in part on the morphological features and corresponding visual device attributes associated with each visual device in the subset of visual device images, is applied to the at least one input 1105. According to an embodiment, and in the context of training the neural network 1115, and in view of the morphological features of the facial images, each visual device image in the subset of visual device images from the database needs to be pre-processed 1133. Pre-processing 1133 includes the definition of the coordinate information of F_ji, the morphological feature j that best matches visual device i, or VE i, and the percentage p_ji of the Np markers that give the highest score to the coordinate pair {visual device i, morphological feature j}.

To this end, a statistical suitability score may be calculated for each visual device image i in the subset of visual device images, and in view of the morphological features of the facial images. The statistical suitability score may include the percentage p_ji of markers that: (1) give the same binary score for the visual device i in the subset of visual device images relative to the morphological feature j of the facial image (case binary score of 0 or 1), (2) give the same matching score, or have the same ordering, for the visual device i in the subset of visual device images relative to the morphological feature j of the facial image (case matching score between 0 and N), or (3) give the same matching score, or have the same ordering, per item in the list of determined criteria for the visual device i in the subset of visual device images relative to the morphological feature j (case matching score per item between 0 and N). For simplicity, it may be assumed that incomplete data sets are ignored, and that each marker gives a matching score (between 0 and N) for each visual device i from the subset of visual device images with respect to the morphological feature j. Furthermore, for each entry of the morphological feature j, only the visual device attribute with the highest score per marker is retained. In view of the above, it is possible to obtain a matrix associating each visual device i in the subset of visual device images with all morphological features F_ji. The morphological features F_ji may include the best match according to the marker percentage p_ji.

In view of the above pre-processing, training may then be started. The at least one input to the training may be morphological features and visual measurements, and the neural network may be configured as a combined neural network having convolutional layers and fully-connected layers. The fully connected layer is configured for embedding. The embedding layer 1118 is a fully connected layer of D neurons containing vector representations of morphological features for each visual device i in a vector space determined during preprocessing. Each cluster i of the D-dimensional vector space 1119 contained within the embedding layer 1118 represents a vision device, and each morphological feature may be represented by D vector coordinates.

During training, random sampling may be performed to randomly select a particular number of morphological feature pairs, defined as {F_ki, F_li}. As an exemplary pair, F_ki and F_li may be determined, with corresponding percentages p_ki and p_li, to be good matches for visual device i. Back propagation can then be considered in order to minimize the two activation functions f(F_ki, p_ki) and f(F_li, p_li), wherein f is the activation function. As another exemplary pair, F_ki and F_li may be determined, with corresponding percentages p_ki and p_li, to be poor matches for visual device i. Back propagation can then be considered in order to maximize the two activation functions f(F_ki, p_ki) and f(F_li, p_li), wherein f is the activation function.
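The pair-based training step can be sketched with a plain Euclidean contrastive update, substituting an explicit distance objective for the activation-function formulation above (an assumption made for concreteness). The learning rate, margin, and 3-dimensional embeddings are illustrative only.

```python
import numpy as np

def contrastive_step(e_k, e_l, good_match, lr=0.1, margin=2.0):
    """One gradient step on a contrastive objective for a pair of embeddings.

    Good-match pairs (both features match visual device i well) are pulled
    together in the embedding space; poor-match pairs are pushed apart,
    up to the margin.
    """
    diff = e_k - e_l
    dist = np.linalg.norm(diff)
    if good_match:
        grad = diff                 # descend on squared distance
    else:
        if dist >= margin:
            return e_k, e_l         # already far enough apart
        grad = -diff                # ascend on distance (up to the margin)
    return e_k - lr * grad, e_l + lr * grad

rng = np.random.default_rng(2)
e_k, e_l = rng.normal(size=3), rng.normal(size=3)   # embeddings of F_ki, F_li
d0 = np.linalg.norm(e_k - e_l)
e_k, e_l = contrastive_step(e_k, e_l, good_match=True)
d1 = np.linalg.norm(e_k - e_l)   # distance shrinks for a good-match pair
```

Repeated over many sampled pairs, updates of this kind produce the per-device clusters in the D-dimensional vector space that the post-processing step relies on.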

Returning now to fig. 11L, the output layer 1117 of the neural network 1115 reflects the prediction of the suitability metric 1120 by the neural network 1115. The suitability metric 1120 may be the morphological feature coordinates F_ji in the D-dimensional vector space 1119. The post-processing of the morphological feature coordinates F_ji may include: (1) calculating the center of gravity of each cluster i in the D-dimensional vector space 1119, and (2) calculating the distance between the output coordinates and the center of gravity of each cluster i, thereby generating a vector containing the visual devices (the centers of gravity of the clusters i) sorted from closest to farthest from the output coordinates. In an embodiment, morphological feature coordinate F_ji predictions may be generated by the neural network 1115 for each of the at least one input 1105, taking into account the morphological features and corresponding visual device attributes in the subset of visual device images from the database used to train the neural network 1115. In embodiments, the database may be a pre-selected subset of visual device images available at the retailer or previously determined to be suitable for certain characteristics associated with the user.
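The two post-processing steps above (computing the center of gravity of each cluster, then sorting visual devices by distance from the output coordinates) can be sketched directly. The cluster coordinates and device names ("VE1", etc.) are fabricated for illustration.

```python
import numpy as np

def rank_devices(output_coords, cluster_points):
    """Post-processing of the embedding output.

    cluster_points maps each visual device i to the D-dimensional coordinates
    of the morphological features in its cluster. Returns devices sorted from
    the closest cluster center of gravity to the farthest.
    """
    centroids = {i: np.mean(pts, axis=0) for i, pts in cluster_points.items()}
    dists = {i: np.linalg.norm(output_coords - c) for i, c in centroids.items()}
    return sorted(dists, key=dists.get)

# Illustrative 2-D embedding space with three visual device clusters.
clusters = {
    "VE1": np.array([[0.0, 0.0], [0.2, 0.0]]),   # centroid (0.1, 0.0)
    "VE2": np.array([[5.0, 5.0], [5.0, 5.2]]),   # centroid (5.0, 5.1)
    "VE3": np.array([[2.0, 0.0], [2.2, 0.0]]),   # centroid (2.1, 0.0)
}
ranking = rank_devices(np.array([0.0, 0.0]), clusters)
```

The resulting vector of devices, ordered from closest to farthest, is the sorted recommendation described in the text.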

FIG. 11M reflects a schematic of an ML-VEST in which at least one input 1105 includes a facial image. In such a process reflecting the case (10), the at least one input 1105 may pass through a neural network 1115 having an architecture that allows the at least one input 1105 to be evaluated for each visual device in the subset 1132 of visual device images. In an embodiment, the at least one input 1105 may pass directly to the neural network 1115, where the convolution is performed. The convolution may be performed by, for example, a convolutional neural network 1112 applied to at least one input 1105 including a facial image in order to perform feature extraction and prepare the facial image for input to the input layer of the neural network 1115.

Unlike the previous embodiments, the subset 1132 of visual device images obtained from the database 1155 is not provided as the at least one input 1105 to the neural network 1115. Instead, the neural network 1115, trained based in part on the subset 1132 of visual device images, is applied to the at least one input 1105. According to an embodiment, and in the context of training the neural network 1115, each visual device image in the subset 1132 of visual device images from the database 1155 needs to be pre-processed.

For each visual device i in the subset 1132 of visual device images from the database 1155, and in view of the morphological features j derived for each facial image, a statistical suitability score may be calculated. The statistical suitability score may include the percentage p_ji of markers that give: (1) the same binary score for the visual device image i relative to the morphological feature j (case binary score of 0 or 1), (2) the same matching score for the visual device image i in the subset 1132 of visual device images from database 1155 relative to the morphological feature j (case matching score between 0 and N), or (3) the same matching score, or the same ordering, per item of the determined criteria list for the visual device image i in the subset 1132 of visual device images from database 1155 relative to the morphological feature j (case matching score per item between 0 and N). For each of the above cases, the following vectors associated with the percentage p_ji for a given morphological feature j may be obtained. One vector may be a vector of N binary values {0, 1}, each binary value corresponding to the suitability of the morphological feature j of the facial image to an image i in the subset 1132 of visual device images. The second vector may be a vector of integer values between 0 and X, where each integer value corresponds to a matching score of an image i in the subset 1132 of visual device images to a facial image. The third vector may be a vector of N lists l of M integer values between 0 and X, each integer value of each list l corresponding to a matching score of an image i in the subset 1132 of visual device images, relative to the facial image, for each rule in a set of M matching rules. In view of the above pre-processing, training may then be started.
The at least one input of the training may be a morphological feature, and the neural network may be configured as a combined neural network having convolutional layers and fully connected layers. In addition, the activation function may be any type of standard activation function with weights associated with p_ji, including rectified linear units (ReLUs). The associated vector containing the matching information may be referred to as a target vector. Neural network training can be completed on the whole target vector, or on a portion thereof by training specially selected neurons of the target vector.

Returning now to FIG. 11M, the output layer 1117 of the neural network 1115 reflects the prediction of the fitness metric 1120 by the neural network 1115. The suitability metric 1120 may be a match score 1121 and may be one of the vectors described above. Specifically, the matching score 1121 may be the following vector: (1) a vector of N binary values {0, 1}, each binary value corresponding to a suitability of a morphological feature j of a facial image to an image i in the subset 1132 of visual device images, (2) a vector of integer values between 0 and X, wherein each integer value corresponds to a matching score 1121 of an image i in the subset 1132 of visual device images to a facial image, or (3) a vector of N lists l of M integer values between 0 and X, each integer value of each list l corresponding to a matching score 1121 of an image i in the subset 1132 of visual device images relative to a facial image for each rule in a set of M matching rules. In an embodiment, the match score 1121 prediction can be generated by the neural network 1115 for each combination of at least one input 1105 from a database 1155 and a subset 1132 of visual device images, the database 1155 including "VE 1", "VE 2", and consecutive visual device images up to "VE N". In embodiments, database 1155 may be a pre-selected subset of images of visual devices available at the retailer or previously determined to be suitable for certain characteristics associated with the user.

FIG. 11N reflects a schematic of an ML-VEST in which at least one input 1105 includes a facial image. In such a process reflecting the case (10), the at least one input 1105 may pass through a neural network 1115 having an architecture that allows the at least one input 1105 to be evaluated for morphological features and corresponding visual device attributes associated with each visual device in the subset of visual device images. In an embodiment, the at least one input 1105 may pass directly to the neural network 1115, where the convolution is performed. The convolution may be performed by, for example, a convolutional neural network 1112 applied to at least one input 1105 including a facial image in order to perform feature extraction and prepare the facial image for input to the input layer of the neural network 1115.

Unlike the previous embodiments, the visual device attributes and corresponding morphological features associated with each visual device in the subset of visual device images are not provided as the at least one input 1105 to the neural network 1115. Instead, the neural network 1115, trained based in part on the morphological features and corresponding visual device attributes associated with each visual device in the subset of visual device images, is applied to the at least one input 1105. According to an embodiment, and in the context of training the neural network 1115, each visual device image in a subset of visual device images from the database needs to be pre-processed 1133 in view of the morphological features of the facial images. Pre-processing 1133 includes defining the coordinate information of F_ji, the facial image j that best matches visual device i, and the percentage p_ji of the Np labelers that gave the highest score to the pair {visual device i, facial image j}.

To this end, for each visual device image i in the subset of visual device images, and in view of the facial image j, a statistical suitability score may be calculated. The statistical suitability score may include the percentage p_ji of labelers that: (1) associated visual device i with facial image j (case of a binary score of 1), (2) gave the same match score, or the same ranking, for visual device i in the subset of visual device images relative to facial image j (case of a match score between 0 and N), or (3) gave the same match score, or the same ranking, for each entry in the determined criteria list for visual device i in the subset of visual device images relative to facial image j (case of a match score between 0 and N per entry). For simplicity, it may be assumed that incomplete data sets are ignored and that each labeler gives a match score (between 0 and N) for each visual device i from the subset of visual device images relative to facial image j. Furthermore, for each entry of facial image j, only the visual device attribute with the highest score per labeler is retained. In view of the above, a matrix may be obtained that associates each visual device i in the subset of visual device images with all facial images F_ji. The matrix F_ji may include the best matches by labeler percentage p_ji.
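The statistical pre-processing above can be sketched in a few lines. The labeler data, the score scale, and the `top_device_percentages` helper are hypothetical illustrations of computing the percentages p_ji, under the simplification (as in the text) that only each labeler's top-scored device is retained:

```python
from collections import Counter

# Hypothetical labeler data: for each labeler, the match scores they gave
# each visual device i (list position) for each facial image j (key).
labels = [
    {"face_0": [3, 9, 5], "face_1": [8, 2, 4]},   # labeler A
    {"face_0": [4, 9, 6], "face_1": [8, 3, 1]},   # labeler B
    {"face_0": [9, 2, 3], "face_1": [7, 8, 2]},   # labeler C
]

def top_device_percentages(labels, face):
    """For one facial image j, the fraction p_ji of labelers whose highest
    score went to visual device i (only each labeler's top device counts)."""
    tops = Counter(max(range(len(lab[face])), key=lab[face].__getitem__)
                   for lab in labels)
    n = len(labels)
    return {i: tops[i] / n for i in tops}

# Matrix F associating every device i with every facial image j via p_ji.
faces = ["face_0", "face_1"]
F = {j: top_device_percentages(labels, j) for j in faces}
print(F["face_0"])   # device 1 is the best match for 2 of 3 labelers
```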

In view of the above pre-processing, training may then begin. The at least one input to the training may be a facial image, and the neural network may be configured as a combined neural network having convolutional layers and fully-connected layers. The fully-connected layer is configured for embedding. The embedding layer 1118 is a fully-connected layer of D neurons containing a vector representation of the facial image, for each visual device i, in the vector space determined during pre-processing. Each cluster i of the D-dimensional vector space 1119 contained within the embedding layer 1118 represents a visual device, and each facial image may be represented by D vector coordinates.

During training, random sampling may be performed to randomly select a particular number of pairs of facial images, a pair being defined as {F_ki, F_li}. As an exemplary pair, F_ki and F_li are determined to be good matches of visual device i, with corresponding percentages p_ki and p_li. Back propagation can then be considered in order to minimize the two activation functions f(F_ki, p_ki) and f(F_li, p_li), where f is the activation function. As another exemplary pair, F_ki and F_li are determined to be poor matches of visual device i, with corresponding percentages p_ki and p_li. Back propagation can then be considered in order to maximize the two activation functions f(F_ki, p_ki) and f(F_li, p_li), where f is the activation function.
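The pair-sampling scheme can be illustrated with a toy sketch. Here a squared-distance objective stands in for the activation-based objective f(F_ki, p_ki) described above, and the fixed embedding coordinates, learning rate, and direct coordinate updates are hypothetical simplifications of true back propagation through network weights:

```python
# Toy sketch of the pair-sampling scheme: embeddings of two facial images
# are pulled together when both are good matches of the same device i, and
# pushed apart when both are poor matches.
emb = {"F_k": [0.9, 0.2], "F_l": [0.1, 0.8]}   # hypothetical D=2 embeddings

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pair_step(a, b, good_match, lr=0.1):
    """One gradient step on squared distance: minimise it for good pairs
    (pull together), maximise it for poor pairs (push apart)."""
    sign = 1.0 if good_match else -1.0
    for d in range(len(emb[a])):
        grad = 2.0 * (emb[a][d] - emb[b][d])
        emb[a][d] -= sign * lr * grad
        emb[b][d] += sign * lr * grad

before = sq_dist(emb["F_k"], emb["F_l"])
pair_step("F_k", "F_l", good_match=True)   # both match device i well
after = sq_dist(emb["F_k"], emb["F_l"])
print(after < before)   # True: the good pair moved closer together
```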

Returning now to FIG. 11N, the output layer 1117 of the neural network 1115 reflects the prediction of the fitness metric 1120 by the neural network 1115. The suitability metric 1120 may be the facial image coordinates F_ij in the D-dimensional vector space 1119. Post-processing of the facial image coordinates F_ij may include: (1) calculating the center of gravity of each cluster i in the D-dimensional vector space 1119, and (2) calculating the distance between the output coordinates and the center of gravity of each cluster i, thereby generating a vector containing the visual devices (the center of gravity of each cluster i) sorted from closest to farthest from the output coordinates. In an embodiment, a prediction of the facial image coordinates F_ij may be generated by the neural network 1115 for each of the at least one input 1105, in view of the morphological features and the visual device attributes associated with each image of the subset of visual device images from the database. In embodiments, the database may be a pre-selected subset of visual device images available at the retailer or previously determined to be suitable for certain characteristics associated with the user.
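The two post-processing steps can be sketched as follows; the D=2 cluster coordinates, device names, and output point are hypothetical:

```python
import math

# Sketch of the post-processing described above: (1) compute the centre of
# gravity of each device cluster i in the embedding space, then (2) rank
# devices by distance of their centroid from the network's output coordinates.
clusters = {   # hypothetical D=2 coordinates of training faces per device
    "VE1": [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2]],
    "VE2": [[0.9, 0.8], [1.0, 1.0], [0.8, 0.9]],
    "VE3": [[0.5, 0.5], [0.4, 0.6]],
}

def centroid(points):
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def rank_devices(output_coords, clusters):
    cents = {i: centroid(pts) for i, pts in clusters.items()}
    return sorted(cents, key=lambda c: math.dist(output_coords, cents[c]))

print(rank_devices([0.3, 0.5], clusters))   # closest centroid first
```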

FIG. 11O reflects a schematic of an ML-VEST in which at least one input 1105 includes a facial image. In such a process reflecting the case (11), the at least one input 1105 may pass through a neural network 1115 having an architecture that allows the at least one input 1105 to be evaluated for each visual device in the subset 1132 of visual device images. In an embodiment, at least one input 1105 may be passed to an input preparation process 1110 before being delivered to the neural network 1115. To this end, a face image may be prepared via image processing and manual measurement to generate morphological features 1153. In addition to the above, the at least one input 1105 may include visual measurements 1149 corresponding to facial images.

Unlike the previous embodiments, the subset 1132 of visual device images obtained from the database 1155 is not provided as the at least one input 1105 to the neural network 1115. Instead, the neural network 1115, trained based in part on the subset 1132 of visual device images, is applied to the at least one input 1105. According to an embodiment, and in the context of training the neural network 1115, each visual device image in the subset 1132 of visual device images from the database 1155 needs to be pre-processed.

For each visual device i in the subset 1132 of visual device images from the database 1155, and in view of the morphological features j derived for each facial image, a statistical suitability score may be calculated. The statistical suitability score may include the percentage p_ji of labelers that gave: (1) the same binary score for visual device image i relative to morphological feature j (case of a binary score of 0 or 1), (2) the same match score for visual device image i in the subset 1132 of visual device images from database 1155 relative to morphological feature j (case of a match score between 0 and N), or (3) the same match score, or the same ranking, per entry of the determined criteria list for visual device image i in the subset 1132 of visual device images from database 1155 relative to morphological feature j (case of a match score between 0 and N per entry). For each of the above cases, a vector associated with the percentage p_ji for a given morphological feature j may be obtained, as follows. One vector may be a vector of N binary values {0, 1}, each corresponding to the suitability of a morphological feature j of a facial image to an image i in the subset 1132 of N visual device images. A second vector may be a vector of integer values between 0 and X, where each integer value corresponds to the match score of an image i in the subset 1132 of N visual device images to a facial image. A third vector may be a vector of N lists l of M integer values between 0 and X, each integer value of each list l corresponding to the match score of an image i in the subset 1132 of visual device images relative to a facial image, for each rule in a set of M matching rules. In view of the above pre-processing, training may then begin. The at least one input of the training may be a morphological feature, and the neural network may be configured as a combined neural network having convolutional layers and fully-connected layers. In addition, the activation function may be any type of standard activation function with weights associated with p_ji, including rectified linear units (ReLUs). The associated vector containing the matching information may be referred to as a target vector. Training of the neural network may be performed on the entire target vector, or on a portion of it by specifically training selected neurons of the target vector.
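The p_ji-weighted activation and the whole-vector versus selected-neuron training options can be sketched as follows. The ReLU choice, the target-vector values, and the neuron indices are hypothetical:

```python
# Sketch of training targets: a ReLU activation weighted by the labeler
# percentage p_ji, plus the option of training on the whole target vector
# or only on a selected subset of its output neurons.
def relu(x):
    return x if x > 0.0 else 0.0

def weighted_activation(x, p_ji):
    """Standard activation whose output is weighted by the labeler
    percentage p_ji associated with the {device i, feature j} pair."""
    return p_ji * relu(x)

# Whole target vector vs a selected-neuron subset of it.
target_vector = [0.8, 0.1, 0.0, 0.6, 0.3]
selected = [0, 3]                       # train only these output neurons
subset_target = [target_vector[k] for k in selected]

print(weighted_activation(1.5, p_ji=0.5))   # 0.75
print(subset_target)                        # [0.8, 0.6]
```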

Returning now to fig. 11O, the processed at least one input and the visual measurement 1149 may be delivered to an input layer of the neural network 1115. The output layer 1117 of the neural network 1115 reflects the prediction of the fitness metric 1120 by the neural network 1115. The suitability metric 1120 may be a match score 1121 and may be one of the vectors described above. Specifically, the matching score 1121 may be the following vector: (1) a vector of N binary values {0, 1}, each binary value corresponding to a suitability of a morphological feature j of a facial image to an image i in a subset 1132 of N visual device images, (2) a vector of integer values between 0 and X, wherein each integer value corresponds to a matching score 1121 of an image i in the subset 1132 of N visual device images with a facial image, or (3) a vector of N lists l of M integer values between 0 and X, each integer value of each list l corresponding to a matching score 1121 of an image i in the subset 1132 of N visual device images relative to a facial image for each rule in a set of M matching rules. In an embodiment, the match score 1121 prediction can be generated by the neural network 1115 for each combination of at least one input 1105 from a database 1155 and a subset 1132 of visual device images, the database 1155 including "VE 1", "VE 2", and consecutive visual device images up to "VE N". In embodiments, database 1155 may be a pre-selected subset of images of visual devices available at the retailer or previously determined to be suitable for certain characteristics associated with the user.

FIG. 11P reflects a schematic of an ML-VEST in which at least one input 1105 includes a facial image. In such a process reflecting the case (11), the at least one input 1105 may pass through a neural network 1115 having an architecture that allows the at least one input 1105 to be evaluated for morphological features and corresponding visual device attributes associated with each visual device in the subset of visual device images. In an embodiment, at least one input 1105 may be passed to an input preparation process 1110 before being delivered to the neural network 1115. To this end, a face image may be prepared via image processing and manual measurement to generate morphological features 1153. Additionally, the at least one input 1105 may include visual measurements 1149 corresponding to facial images.

Unlike the previous embodiments, the visual device attributes and corresponding morphological features associated with each visual device in the subset of visual device images are not provided as the at least one input 1105 to the neural network 1115. Instead, the neural network 1115, trained based in part on the morphological features and corresponding visual device attributes associated with each visual device in the subset of visual device images, is applied to the at least one input 1105. According to an embodiment, and in the context of training the neural network 1115, each visual device image in a subset of visual device images from the database needs to be pre-processed 1133 in view of the morphological features of the facial images. Pre-processing 1133 includes defining the coordinate information of F_ji, the morphological feature j that best matches visual device i, and the percentage p_ji of the Np labelers that gave the highest score to the pair {visual device i, morphological feature j}.

To this end, a statistical suitability score may be calculated for each visual device image i in the subset of visual device images, and in view of the morphological features of the facial images. The statistical suitability score may include the percentage p_ji of labelers that: (1) associated visual device i with morphological feature j (case of a binary score of 1), (2) gave the same match score, or the same ranking, for visual device i in the subset of visual device images relative to morphological feature j (case of a match score between 0 and N), or (3) gave the same match score, or the same ranking, for each entry in the determined criteria list for visual device i in the subset of visual device images relative to morphological feature j (case of a match score between 0 and N per entry). For simplicity, it may be assumed that incomplete data sets are ignored and that each labeler gives a match score (between 0 and N) for each visual device i from the subset of visual device images relative to morphological feature j. Furthermore, for each entry of morphological feature j, only the visual device attribute with the highest score per labeler is retained. In view of the above, a matrix may be obtained that associates each visual device i in the subset of visual device images with all morphological features F_ji. The morphological features F_ji may include the best matches by labeler percentage p_ji.

In view of the above pre-processing, training may then be started. The at least one input to the training may be morphological features and visual measurements, and the neural network may be configured as a combined neural network having convolutional layers and fully-connected layers. The fully connected layer is configured for embedding. The embedding layer 1118 is a fully connected layer of D neurons containing vector representations of morphological features for each visual device i in a vector space determined during preprocessing. Each cluster i of the D-dimensional vector space 1119 contained within the embedding layer 1118 represents a vision device, and each morphological feature may be represented by D vector coordinates.

During training, random sampling may be performed to randomly select a particular number of morphological feature pairs, a pair being defined as {F_ki, F_li}. As an exemplary pair, F_ki and F_li are determined to be good matches of visual device i, with corresponding percentages p_ki and p_li. Back propagation can then be considered in order to minimize the two activation functions f(F_ki, p_ki) and f(F_li, p_li), where f is the activation function. As another exemplary pair, F_ki and F_li are determined to be poor matches of visual device i, with corresponding percentages p_ki and p_li. Back propagation can then be considered in order to maximize the two activation functions f(F_ki, p_ki) and f(F_li, p_li), where f is the activation function.

Returning now to fig. 11P, the output layer 1117 of the neural network 1115 reflects the prediction of the fitness metric 1120 by the neural network 1115. The suitability metric 1120 may be the morphological feature coordinates F_ij in the D-dimensional vector space 1119. Post-processing of the morphological feature coordinates F_ij may include: (1) calculating the center of gravity of each cluster i in the D-dimensional vector space 1119, and (2) calculating the distance between the output coordinates and the center of gravity of each cluster i, thereby generating a vector containing the visual devices (the center of gravity of each cluster i) sorted from closest to farthest from the output coordinates. In an embodiment, a prediction of the morphological feature coordinates F_ij may be generated by the neural network 1115 for each at least one input 1105, in view of the morphological features and the corresponding visual device attributes in the subset of visual device images from the database. In embodiments, the database may be a pre-selected subset of visual device images available at the retailer or previously determined to be suitable for certain characteristics associated with the user.

FIG. 11Q reflects a schematic of an ML-VEST in which at least one input 1105 includes a facial image. In such a process reflecting the case (12), the at least one input 1105 may pass through a neural network 1115 having an architecture that allows the at least one input 1105 to be evaluated for each visual device in the subset 1132 of visual device images. In an embodiment, the at least one input 1105 may pass directly to the neural network 1115, where the convolution is performed. The convolution may be performed by, for example, a convolutional neural network 1112 applied to at least one input 1105 including a facial image in order to perform feature extraction and prepare the facial image for input to the input layer of the neural network 1115. Additionally, the at least one input 1105 may include visual measurements 1149 corresponding to facial images.

Unlike the previous embodiments, the subset 1132 of visual device images obtained from the database 1155 is not provided as the at least one input 1105 to the neural network 1115. Instead, the neural network 1115, trained based in part on the subset 1132 of visual device images, is applied to the at least one input 1105. According to an embodiment, and in the context of training the neural network 1115, each visual device image in the subset 1132 of visual device images from the database 1155 needs to be pre-processed.

For each visual device i in the subset 1132 of visual device images from the database 1155, and in view of the morphological features j derived for each facial image, a statistical suitability score may be calculated. The statistical suitability score may include the percentage p_ji of labelers that gave: (1) the same binary score for visual device image i relative to morphological feature j (case of a binary score of 0 or 1), (2) the same match score for visual device image i in the subset 1132 of visual device images from database 1155 relative to morphological feature j (case of a match score between 0 and N), or (3) the same match score, or the same ranking, per entry of the determined criteria list for visual device image i in the subset 1132 of visual device images from database 1155 relative to morphological feature j (case of a match score between 0 and N per entry). For each of the above cases, a vector associated with the percentage p_ji for a given morphological feature j may be obtained, as follows. One vector may be a vector of N binary values {0, 1}, each corresponding to the suitability of a morphological feature j of a facial image to an image i in the subset 1132 of N visual device images. A second vector may be a vector of integer values between 0 and X, where each integer value corresponds to the match score of an image i in the subset 1132 of N visual device images to a facial image. A third vector may be a vector of N lists l of M integer values between 0 and X, each integer value of each list l corresponding to the match score of an image i in the subset 1132 of visual device images relative to a facial image, for each rule in a set of M matching rules. In view of the above pre-processing, training may then begin. The at least one input of the training may be a morphological feature, and the neural network may be configured as a combined neural network having convolutional layers and fully-connected layers. In addition, the activation function may be any type of standard activation function with weights associated with p_ji, including rectified linear units (ReLUs). The associated vector containing the matching information may be referred to as a target vector. Training of the neural network may be performed on the entire target vector, or on a portion of it by specifically training selected neurons of the target vector.

Returning now to fig. 11Q, the output layer 1117 of the neural network 1115 reflects the prediction of the fitness metric 1120 by the neural network 1115. The suitability metric 1120 may be a match score 1121 and may be one of the vectors described above. Specifically, the matching score 1121 may be the following vector: (1) a vector of N binary values {0, 1}, each binary value corresponding to a suitability of a morphological feature j of a facial image to an image i in a subset 1132 of N visual device images, (2) a vector of integer values between 0 and X, wherein each integer value corresponds to a matching score 1121 of an image i in the subset 1132 of N visual device images with a facial image, or (3) a vector of N lists l of M integer values between 0 and X, each integer value of each list l corresponding to a matching score 1121 of an image i in the subset 1132 of N visual device images relative to a facial image for each rule in a set of M matching rules. In an embodiment, the match score 1121 prediction can be generated by the neural network 1115 for each combination of at least one input 1105 from a database 1155 and a subset 1132 of visual device images, the database 1155 including "VE 1", "VE 2", and consecutive visual device images up to "VE N". In embodiments, database 1155 may be a pre-selected subset of images of visual devices available at the retailer or previously determined to be suitable for certain characteristics associated with the user.

FIG. 11R reflects a schematic of an ML-VEST in which at least one input 1105 includes a facial image. In such a process reflecting the case (12), the at least one input 1105 may pass through a neural network 1115 having an architecture that allows the at least one input 1105 to be evaluated for morphological features and corresponding visual device attributes associated with each visual device in the subset of visual device images. In an embodiment, the at least one input 1105 may pass directly to the neural network 1115, where the convolution is performed. The convolution may be performed by, for example, a convolutional neural network 1112 applied to at least one input 1105 including a facial image in order to perform feature extraction and prepare the facial image for input to the input layer of the neural network 1115. Additionally, the at least one input 1105 may include visual measurements 1149 corresponding to facial images.

Unlike the previous embodiments, the visual device attributes and corresponding morphological features associated with each visual device in the subset of visual device images are not provided as the at least one input 1105 to the neural network 1115. Instead, the neural network 1115, trained based in part on the morphological features and corresponding visual device attributes associated with each visual device in the subset of visual device images, is applied to the at least one input 1105. According to an embodiment, and in the context of training the neural network 1115, each visual device image in a subset of visual device images from the database needs to be pre-processed 1133 in view of the morphological features of the facial images. Pre-processing 1133 includes defining the coordinate information of F_ji, the facial image j that best matches visual device i, and the percentage p_ji of the Np labelers that gave the highest score to the pair {visual device i, facial image j}.

To this end, a statistical suitability score may be calculated for each visual device image i in the subset of visual device images, and in view of the facial image 1105. The statistical suitability score may include the percentage p_ji of labelers that: (1) associated visual device i with facial image j (case of a binary score of 1), (2) gave the same match score, or the same ranking, for visual device i in the subset of visual device images relative to facial image j (case of a match score between 0 and N), or (3) gave the same match score, or the same ranking, for each entry in the determined criteria list for visual device i in the subset of visual device images relative to facial image j (case of a match score between 0 and N per entry). For simplicity, it may be assumed that incomplete data sets are ignored and that each labeler gives a match score (between 0 and N) for each visual device i from the subset of visual device images relative to facial image j. Furthermore, for each entry of facial image j, only the visual device attribute with the highest score per labeler is retained. In view of the above, a matrix may be obtained that associates each visual device i in the subset of visual device images with all facial images F_ji. The facial images F_ji may include the best matches by labeler percentage p_ji.

In view of the above pre-processing, training may then be started. The at least one input to the training may be morphological features and visual measurements, and the neural network may be configured as a combined neural network having convolutional layers and fully-connected layers. The fully connected layer is configured for embedding. The embedding layer 1118 is a fully connected layer of D neurons containing a vector representation of the face image for each visual device i in the vector space determined during pre-processing. Each cluster i of the D-dimensional vector space 1119 contained within the embedding layer 1118 represents a visual device, and each facial image may be represented by D-vector coordinates.

During training, random sampling may be implemented to randomly select a particular number of pairs of facial images, a pair being defined as {F_ki, F_li}. As an exemplary pair, F_ki and F_li are determined to be good matches of visual device i, with corresponding percentages p_ki and p_li. Back propagation can then be considered in order to minimize the two activation functions f(F_ki, p_ki) and f(F_li, p_li), where f is the activation function. As another exemplary pair, F_ki and F_li are determined to be poor matches of visual device i, with corresponding percentages p_ki and p_li. Back propagation can then be considered in order to maximize the two activation functions f(F_ki, p_ki) and f(F_li, p_li), where f is the activation function.

Returning now to FIG. 11R, the output layer 1117 of the neural network 1115 reflects the prediction of the fitness metric 1120 by the neural network 1115. The suitability metric 1120 may be the facial image coordinates F_ij in the D-dimensional vector space 1119. Post-processing of the facial image coordinates F_ij may include: (1) calculating the center of gravity of each cluster i in the D-dimensional vector space 1119, and (2) calculating the distance between the output coordinates and the center of gravity of each cluster i, thereby generating a vector containing the visual devices (the center of gravity of each cluster i) sorted from closest to farthest from the output coordinates. In an embodiment, a prediction of the facial image coordinates F_ij may be generated by the neural network 1115 for each of the at least one input 1105, in view of the morphological features and corresponding visual device attributes in the subset of visual device images from the database. In embodiments, the database may be a pre-selected subset of visual device images available at the retailer or previously determined to be suitable for certain characteristics associated with the user.

According to embodiments of the present disclosure, the neural network of the ML-VEST may be configured to determine a suitability metric for the user that reflects the ideal coordinates of a visual device. To this end, FIG. 12A reflects the labeling process used therein, the structure of the ML-VEST being substantially similar to that described above. First, an input image 1205 may be received. The input image 1205 may be delivered to an image preparation process 1210, whereby the coordinates of facial landmarks are determined. These facial landmarks may be determined by: first, calibrating the image to allow distance measurement; second, detecting landmarks of the person's facial image by classical image processing or deep learning techniques so as to extract the metric coordinates of the landmarks; and third, normalizing these extracted coordinates with respect to an anatomical reference point (e.g., the lowest point of the chin). These normalized, extracted coordinates may be delivered as an input layer to the training process 1235 of the neural network. As discussed above, this same image preparation process 1210 may be followed during implementation of the ML-VEST. The processed input images may then be passed to a labeling process 1240, where a group of labelers, and in particular labelers 1248 of the expert category, label the processed input images. In an example, the labelers of the expert category may be eye care professionals, and the facial image is modeled in 3D with the assistance of the expert so as to generate an ideal visual device from the facial morphological features of the person in the image. In an example, the model may comprise metric coordinates corresponding to the inner and outer contour landmarks of an ideal visual device.
As previously described, these inner and outer contour landmarks of an ideal visual device may be normalized to an anatomical reference point (e.g., the lowest point of the chin) and may be used as the output 1220 of the neural network. In other words, the above-described landmarks of the ideal visual device 1220 are used as training data during training and define the output layer of the neural network during implementation of the ML-VEST. As previously described, it can be appreciated that the training process 1235 can be repeated for each of a plurality of human facial images used in the training database.
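The coordinate normalization described above can be sketched as follows; the landmark names and millimetre values are hypothetical:

```python
# Sketch of landmark normalization: metric landmark coordinates (after
# image calibration) are expressed relative to an anatomical reference
# point, here the lowest point of the chin, so that all faces and ideal
# device contours share a common origin.
landmarks = {                      # (x, y) in millimetres, hypothetical
    "left_pupil":  (42.0, 95.0),
    "right_pupil": (102.0, 96.0),
    "nose_tip":    (72.0, 55.0),
    "chin_lowest": (72.0, 8.0),
}

def normalise(landmarks, ref="chin_lowest"):
    """Translate every landmark so the reference point becomes the origin."""
    rx, ry = landmarks[ref]
    return {name: (x - rx, y - ry) for name, (x, y) in landmarks.items()}

norm = normalise(landmarks)
print(norm["chin_lowest"])   # (0.0, 0.0): the reference is now the origin
print(norm["nose_tip"])      # (0.0, 47.0)
```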

The above-described labeling scheme of the labeling process may be implemented within the ML-VEST. In particular, the implementation may proceed as illustrated in fig. 12B.

Referring to fig. 12B, the input image 1205 may be passed to an image preparation process 1210 before being input to a neural network 1215. The input image 1205 may include an image of a person's face. According to an embodiment, the image preparation process 1210 may include image processing to derive morphological features and landmarks of the person's facial image. The processed input images may be delivered to an input layer of the neural network 1215, and the neural network 1215 is applied to them. The output layer reflects the prediction, by the neural network 1215, of the landmarks indicated by the label set. The prediction, or suitability metric 1220, reflects the coordinates of the ideal visual device for the image of the person's face, as generated by the neural network 1215. During implementation, within the ML-VEST of fig. 12B, of the neural network trained in fig. 12A, the output ideal visual device coordinates can be compared to the coordinates of multiple visual devices within the database, with the most highly correlated one selected as the visual device appropriate for the user.
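The final comparison step can be sketched as follows. The contour coordinates and device names are hypothetical, and a mean point-to-point distance between aligned contours stands in for whatever correlation measure an implementation might use:

```python
import math

# Sketch of the final selection step: the ideal-device contour coordinates
# predicted by the network are compared with the stored contour coordinates
# of each visual device in the database, and the closest device is chosen.
ideal = [(0.0, 0.0), (10.0, 2.0), (20.0, 1.0), (30.0, 0.0)]
database = {
    "VE1": [(0.0, 1.0), (10.0, 3.0), (20.0, 2.0), (30.0, 1.0)],
    "VE2": [(0.0, 5.0), (10.0, 9.0), (20.0, 8.0), (30.0, 5.0)],
}

def contour_distance(a, b):
    """Mean point-to-point distance between two aligned contours."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def select_device(ideal, database):
    """Return the device whose stored contour is closest to the ideal one."""
    return min(database, key=lambda d: contour_distance(ideal, database[d]))

print(select_device(ideal, database))   # VE1: offset of only 1 mm per point
```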

Fig. 13A-13F illustrate exemplary embodiments and components of a neural network of an ML-VEST of the present disclosure.

By way of review, the following use cases for input to the neural network of the ML-VEST can be considered: (1) morphological features and visual device attributes obtained from the facial image and the visual device image, or from an image of the face wearing the visual device; (2) facial image and visual device attributes; (3) morphological features and visual device attributes; (4) a facial image and a visual device image; (5) morphological features, visual device attributes, and visual measurements; (6) facial image, visual device attributes, and visual measurements; (7) morphological features, visual device image, and visual measurements; (8) a facial image, a visual device image, and a visual measurement; (9) morphological features; (10) a facial image; (11) morphological features and visual measurements; (12) facial image and visual measurements.

For all cases except case (1), case (4), case (9), and case (10), two heterogeneous input streams must be merged. To process the facial image and the visual device image, a "convolutional + fully-connected" neural network portion may be used, as shown in fig. 13A and described in detail in fig. 13B. To this end, the neural network employs a series of convolutional layers, each layer consisting of convolutional filters of varying size, padding, stride, and depth, followed by an activation layer (e.g., ReLU, leaky ReLU) or a pooling filter (e.g., max pooling, average pooling). The last convolutional layer may then be vectorized, and each real number of the resulting vector may be processed by a fully-connected layer, where the activation function may be selected from the group consisting of ReLU, leaky ReLU, sigmoid, TanH, and the like.
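A minimal numpy sketch of one such "convolutional + fully-connected" portion follows. The toy image, the single filter and its weights, and the one-conv-layer scale are illustrative assumptions, not the patented architecture:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv2d_valid(img, kernel):
    """Single-channel 'valid' convolution (really cross-correlation, as is
    conventional in CNN libraries)."""
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool2x2(x):
    oh, ow = x.shape[0] // 2, x.shape[1] // 2
    return x[:2*oh, :2*ow].reshape(oh, 2, ow, 2).max(axis=(1, 3))

# Toy 4x4 "image" and a 3x3 edge-like filter (weights illustrative only).
img = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)

feat = max_pool2x2(relu(conv2d_valid(img, kernel)))  # conv -> ReLU -> pool
vec = feat.reshape(-1)                               # vectorize last conv layer

# One fully-connected node with a sigmoid activation, producing a single
# suitability probability (weights again illustrative).
w, b = np.full(vec.shape, 0.1), -0.5
prob = 1.0 / (1.0 + np.exp(-(vec @ w + b)))
```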

At this point, a subset of the neural network inputs remains to be processed before the final output. The "data ready + fully connected" neural network portion shown in fig. 13C may be used to process data including real numbers and enumerations representing character strings, such as morphological features, visual device attributes, and visual measurements. First, the data may be aggregated to form a vector of heterogeneous data. The vector may then be homogenized so that it contains only real numbers. To this end, each enumeration may be replaced by an integer corresponding to the item number within the enumeration. For example, the color "yellow" may be replaced by "2" because "yellow" is entry number "2" in the enumeration comprising the available colors "blank", "yellow", "orange", "red", and the like. In this way, each character string may be replaced by the item number of its entry. Next, each real number of the resulting vector may be processed by fully-connected layers, the processing being driven by an activation function selected from the group consisting of ReLU, leaky ReLU, sigmoid, and TanH.
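The data preparation step might look like the following sketch. The color enumeration matches the example in the text, while the feature values and layer weights are invented for illustration:

```python
import numpy as np

# Hypothetical enumeration of available colors; "yellow" is entry number 2
# when items are numbered starting from 1, matching the example in the text.
COLORS = ["blank", "yellow", "orange", "red"]

def encode(value, enumeration):
    """Replace an enumerated string by its 1-based item number."""
    return float(enumeration.index(value) + 1)

# Aggregate heterogeneous data (a morphological feature, a device attribute,
# and a visual measurement; all values illustrative) into one real vector.
pupillary_distance_mm = 62.0
frame_color = encode("yellow", COLORS)  # -> 2.0
sphere_power = -1.25
x = np.array([pupillary_distance_mm, frame_color, sphere_power])

# One fully-connected layer with a ReLU activation (weights illustrative).
W = np.array([[0.01, 0.5, -0.2],
              [-0.02, 0.1, 0.3]])
b = np.array([0.0, 1.0])
h = np.maximum(W @ x + b, 0.0)  # ReLU(Wx + b)
```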

After homogenization, there may still be unprocessed nodes. To combine the outputs of the "convolutional + fully-connected" network portion and the "data ready + fully connected" network portion, (1) the output vectors of the two network portions must be concatenated to generate a unique vector, and (2) the unique vector must be processed by the "output ready" network portion, as shown in fig. 13D. The "output ready" network portion may consist of a series of fully-connected layers for which the activation function is one selected from the group consisting of ReLU, leaky ReLU, sigmoid, TanH, and the like. The number and size of these fully-connected layers may be based on the desired output. For example, if the output is unique, as in cases (1) through (8), the final fully-connected layer may consist of a single node representing the probability (a real number between 0 and 1) that the visual device fits the facial image given as input. In this case, the probability may be (1) thresholded (e.g., with a threshold of 0.5) if the desired output is a binary (match) value, or (2) scaled to fit a predefined score range (e.g., if the score is between 0 and 10, the probability will be multiplied by 10) if the desired output is a (match) score.
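A sketch of the merge and of the single-node "output ready" post-processing; the two output vectors and the final-layer weights are invented for illustration:

```python
import numpy as np

# Outputs of the two network portions (illustrative values).
conv_fc_out = np.array([0.2, 0.7])       # "convolutional + fully-connected"
data_fc_out = np.array([0.1, 0.4, 0.9])  # "data ready + fully connected"

# (1) Concatenate into a unique vector, then (2) a final fully-connected
# node turns it into a suitability probability (weights illustrative).
unique = np.concatenate([conv_fc_out, data_fc_out])
w = np.array([0.3, 0.4, 0.1, 0.2, 0.5])
prob = 1.0 / (1.0 + np.exp(-(unique @ w)))  # real number between 0 and 1

binary_match = prob >= 0.5   # desired output: binary value (threshold 0.5)
score_0_to_10 = prob * 10.0  # desired output: score in a 0..10 range
```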

As shown in fig. 13E, the output may comprise more than one node. If there are multiple outputs, but each output relates to the same (and unique) visual device, as in cases (1) through (8) when each output value is a matching score tied to a particular criterion, then there may be as many nodes in the final fully-connected layer as there are criteria to be scored, each node representing the probability (a real number between 0 and 1) that the visual device fits the facial image given as input with respect to criterion i. If the desired output is a (match) score, each probability may be scaled to fit within a predefined score range (e.g., if the score is between 0 and 10, the probability will be multiplied by 10). If there are multiple outputs and each output may involve several visual devices, as in cases (9) through (12) and as shown in fig. 13F, there may be as many nodes in the final fully-connected layer as there are criteria to be scored multiplied by the number of visual devices to be scored, each node representing the probability (a real number between 0 and 1) that a visual device fits the facial image given as input with respect to criterion i. In this case, each probability may be (1) thresholded (e.g., with a threshold of 0.5) if the desired output is a binary (match) value, or (2) scaled to fit a predefined score range (e.g., if the score is between 0 and 10, the probability will be multiplied by 10) if the desired output is a (match) score.
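For the multi-device, multi-criterion case, the final layer's nodes can be viewed as a devices-by-criteria grid. The probabilities and the mean-score selection rule below are illustrative assumptions, not mandated by the text:

```python
import numpy as np

# Suppose the final fully-connected layer has (devices x criteria) nodes,
# here 3 visual devices scored on 2 criteria (values illustrative).
n_devices, n_criteria = 3, 2
probs = np.array([0.2, 0.9,   # device 0: criterion 0, criterion 1
                  0.7, 0.6,   # device 1
                  0.4, 0.1])  # device 2

# Scale each probability into a 0..10 score range, one row per device.
scores = (probs * 10.0).reshape(n_devices, n_criteria)

# One simple selection rule (an assumption): rank devices by their mean
# score across criteria and keep the best.
best_device = int(scores.mean(axis=1).argmax())
```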

With regard to case (1), since only the processed features of the facial image and the image of the visual device are present as inputs, there may be at least one "data ready + fully connected" neural network portion, similar to that described above, and the output of the "data ready + fully connected" neural network portion may be processed as described above in order to achieve the desired output.

With regard to case (4), because there are only both facial and visual device images as inputs, there may be at least two "convolutional + fully-connected" neural network portions, one for each image, where the outputs of the two network portions may be combined in the same manner as described above in order to achieve the desired output.

With respect to the exemplary vector embodiments of case (9) (fig. 11K) and case (10) (fig. 11M), there may be at least one "convolution + fully connected" neural network portion because there is only a facial image as input.

With respect to the exemplary coordinate-based embodiments of case (10) (fig. 11N) and case (12) (fig. 11R), because there are facial images and visual measurements as inputs, there may be at least one "convolution + fully-connected" neural network portion, the fully-connected layer containing the embedding.

FIG. 14 is a non-limiting example of an embodiment of a training process 435 for training a neural network using training data. As described above, the training data may include a plurality of labeled input images or data from one or more sources, including a training database connected, for example, by a wired connection or a wireless connection.

In operation 1480 of process 435, an initial guess for the coefficients of the neural network is generated. For example, the initial guess may be based on a priori knowledge of the data being collected and the associated metrics therein. Additionally, the initial guess may be based on one of LeCun initialization, Xavier initialization, and Kaiming initialization.
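The named initialization schemes follow standard formulas, sketched below with arbitrary layer sizes:

```python
import numpy as np

rng = np.random.default_rng(0)

def xavier_uniform(fan_in, fan_out):
    """Xavier (Glorot) uniform initialization:
    U(-a, a) with a = sqrt(6 / (fan_in + fan_out))."""
    a = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-a, a, size=(fan_out, fan_in))

def kaiming_normal(fan_in, fan_out):
    """Kaiming (He) initialization for ReLU layers: N(0, 2 / fan_in)."""
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_out, fan_in))

W1 = xavier_uniform(64, 32)   # weights of a 64 -> 32 fully-connected layer
W2 = kaiming_normal(32, 16)   # weights of a 32 -> 16 fully-connected layer
```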

Operation 1481 of process 435 provides a non-limiting example of an optimization method for training a neural network. In operation 1481 of process 435, an error is calculated (e.g., using a loss function or a cost function) to represent a measure (e.g., a distance measure) of the difference between the labeled data (i.e., the ground-truth data) and the output data of the neural network as applied in the current iteration. The error may be calculated using any known cost function or distance measure between the training data. Further, in certain embodiments, one or more of a hinge loss and a cross-entropy loss may be used to calculate the error/loss function. In an example, the loss function may be defined as the mean squared error between the output (S_NN) of the neural network and the labeled ground-truth data (S_AGT), or

E = (1/n) * sum_{i=1}^{n} (S_NN(i) - S_AGT(i))^2,

where n is the number of training subjects. Optimization methods including stochastic gradient descent can be used to minimize this loss.

Additionally, the loss function may be combined with a regularization method to avoid overfitting the network to the particular instances represented in the training data. Regularization can help prevent overfitting in machine learning problems. If training runs too long and the model is assumed to have sufficient representational power, the network will learn noise specific to the dataset, which is called overfitting. In the case of overfitting, the generalization capability of the neural network becomes poor, and the variance will be large due to the differences in noise between datasets. The total error is minimal when the sum of the bias and the variance is minimal. Therefore, it is desirable to reach a local minimum that explains the data in the simplest possible way, to maximize the likelihood that the trained network represents a general solution rather than a solution specific to the noise in the training data. This goal may be achieved by, for example, early stopping, weight regularization, lasso regularization, ridge regularization, or elastic net regularization.

In some embodiments, the neural network is trained using backpropagation. Backpropagation can be used to train neural networks and is used in conjunction with gradient descent optimization methods. During a forward pass, the algorithm computes the predictions of the network based on the current parameters Θ. These predictions are then input into the loss function, by which they are compared to the corresponding ground-truth labels (i.e., the labeled data). During the backward pass, the model calculates the gradient of the loss function with respect to the current parameters, and then updates the parameters by taking a step of a predefined size in the direction that minimizes the loss (e.g., in accelerated methods, such as the Nesterov momentum method and various adaptive methods, the step size can be selected so that the loss function converges faster).

The optimization method that performs backpropagation may use one or more of gradient descent, batch gradient descent, stochastic gradient descent, and mini-batch stochastic gradient descent. Additionally, the optimization method may be accelerated using one or more momentum update techniques that result in faster convergence rates of stochastic gradient descent in deep networks, including, for example, the Nesterov momentum technique or adaptive methods such as the Adagrad sub-gradient method, the AdaDelta or RMSProp parameter update variants of the Adagrad method, and the Adam adaptive optimization technique. The optimization method may also apply a second-order method by incorporating the Jacobian matrix into the update step.
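As a toy illustration of one such accelerated update, the following applies gradient descent with Nesterov momentum to a one-dimensional quadratic loss; the learning rate and momentum values are arbitrary:

```python
# Minimize f(theta) = (theta - 3)^2 with gradient descent plus
# Nesterov momentum.
def grad(theta):
    return 2.0 * (theta - 3.0)

theta, v = 0.0, 0.0
lr, mu = 0.1, 0.9
for _ in range(200):
    # Nesterov update: evaluate the gradient at the "look-ahead" point
    # theta + mu * v rather than at theta itself.
    v = mu * v - lr * grad(theta + mu * v)
    theta = theta + v
```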

The forward and backward passes may be performed incrementally through the various layers of the network. In the forward pass, execution begins by providing the input through the first layer, creating output activations for the subsequent layer. This process is repeated until the loss function at the last layer is reached. During the backward pass, each layer, starting with the last, calculates the gradient with respect to its own learnable parameters (if any) and with respect to its own input, which serves as the upstream derivative for the previous layer. This process is repeated until the input layer is reached.

Returning to the non-limiting example shown in fig. 14, in operation 1482 of process 435, a change in the error (e.g., an error gradient) is calculated as a function of a change in the network, and this change in the error may be used to select a direction and step size for a subsequent change in the weights/coefficients of the neural network. Calculating the gradient of the error in this manner is consistent with certain embodiments of gradient descent optimization methods. In certain other embodiments, this operation may be omitted and/or replaced with another operation in accordance with another optimization algorithm (e.g., a non-gradient-descent optimization algorithm such as simulated annealing or a genetic algorithm), as will be understood by those of ordinary skill in the art.

In operation 1483 of process 435, a new set of coefficients is determined for the neural network. For example, the weights/coefficients may be updated using the changes calculated in operation 1482, as in a gradient descent optimization method or an over-relaxation acceleration method.

In operation 1484 of process 435, a new error value is calculated using the updated weights/coefficients of the neural network.

In operation 1485 of process 435, predefined stopping criteria are used to determine whether training of the network is complete. For example, a predefined stopping criterion may evaluate whether the new error and/or the total number of iterations performed exceeds a predefined value. For example, the stopping criterion may be fulfilled if the new error is below a predefined threshold or a maximum number of iterations is reached. When the stopping criteria are not met, the training process performed in process 435 will continue back to the beginning of the iterative loop (which includes operation 1482, operation 1483, operation 1484, and operation 1485) by returning and repeating operation 1482 using the new weights and coefficients. When the stopping criteria are met, the training process performed in process 435 is complete.
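The iterative loop of operations 1481 through 1485 can be sketched on a toy least-squares problem; the linear model, its data, and the hyperparameters below stand in for the actual network and are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy training data: a labeled linear relationship y = 2x + 1, standing in
# for labeled images and their suitability metrics.
x = rng.uniform(-1, 1, size=100)
y = 2.0 * x + 1.0

# Operation 1480: initial guess for the coefficients.
w, b = 0.0, 0.0

lr, max_iters, tol = 0.1, 10_000, 1e-8
for it in range(max_iters):
    pred = w * x + b
    err = np.mean((pred - y) ** 2)        # operations 1481/1484: error
    if err < tol:                         # operation 1485: stopping criterion
        break
    grad_w = np.mean(2 * (pred - y) * x)  # operation 1482: error gradient
    grad_b = np.mean(2 * (pred - y))
    w -= lr * grad_w                      # operation 1483: new coefficients
    b -= lr * grad_b
```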

Fig. 15A shows a flow diagram of an embodiment of process 435. Fig. 15A is generic to any type of layer in a feed-forward Artificial Neural Network (ANN), including, for example, the fully-connected layers shown in fig. 11A-11E. The ANN of the present disclosure may be preceded by an image processing neural network performing convolution, pooling, batch normalization, and activation ahead of the fully-connected layers, resulting in the combined flow diagrams of figs. 15A and 15B, as will be understood by one of ordinary skill in the art. The embodiment of process 435 shown in fig. 15A also corresponds to applying the ANN of the present disclosure to the corresponding training data of the present disclosure.

In operation 1586, weights/coefficients corresponding to connections between neurons (i.e., nodes) are applied to respective inputs corresponding to the processed input image data.

In operation 1587, the weighted inputs are summed. When the only non-zero weights/coefficients connected to a given neuron in the next layer are regionally localized in the processed input image data represented in the previous layer, the combination of operations 1586 and 1587 is substantially the same as performing a convolution operation.

In operation 1588, a respective threshold is applied to the weighted sum of the respective neurons.

In process 1589, the steps of weighting, summing, and thresholding are repeated for each subsequent layer.
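Operations 1586 through 1589 amount to the per-layer recurrence sketched below; the two-layer network and its weights are invented for illustration:

```python
import numpy as np

def forward(x, layers):
    """Feed-forward pass: for each layer, weight the inputs (operation 1586),
    sum them (operation 1587), apply the activation/threshold (operation
    1588), and repeat for each subsequent layer (operation 1589)."""
    for W, b, activation in layers:
        x = activation(W @ x + b)
    return x

relu = lambda z: np.maximum(z, 0.0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# A tiny two-layer network with illustrative weights.
layers = [
    (np.array([[1.0, -1.0], [0.5, 0.5]]), np.array([0.0, -0.25]), relu),
    (np.array([[1.0, 2.0]]), np.array([-0.5]), sigmoid),
]
out = forward(np.array([1.0, 0.5]), layers)
```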

Fig. 15B shows a flow diagram of another embodiment of a process 435 in which a convolutional neural network is applied during an image preparation step to prepare an input image for applying the ANN of the present disclosure, as discussed in fig. 11C-11E. Thus, the embodiment of process 435 shown in FIG. 15B corresponds to operating on the input image data at the hidden layer using a non-limiting embodiment of a convolutional neural network.

In operation 1590, the computation of the convolutional layer is performed as previously discussed and in accordance with an understanding of the convolutional layer by one of ordinary skill in the art.

In operation 1591, after convolution, batch normalization may be performed to control the variation of the output of the previous layer, as will be understood by one of ordinary skill in the art.

In operation 1592, after batch normalization, activation is performed according to the foregoing description of activation and according to an understanding of activation by one of ordinary skill in the art. In an example, the activation function is a rectified activation function such as, for example, the ReLU discussed above.

In another embodiment, the ReLU layer of operation 1592 may be performed before the batch normalization layer of operation 1591.

In operation 1593, after batch normalization and activation, the output from the convolutional layer is the input to the pooling layer, which is performed according to the description of the pooling layer above and according to the understanding of the pooling layer by one of ordinary skill in the art.

In process 1594, the steps of the convolutional, pooling, batch normalization, and ReLU layers may be repeated, in whole or in part, for a predefined number of layers. After (or interleaved with) the above layers, the output of the ReLU layer may be fed to a predefined number of ANN layers, which are performed according to the description provided for the ANN layers in fig. 9A. The final output will be the processed input image characteristics, as previously described.

With regard to convolutional neural network architectures, generally, convolutional layers are placed close to the input layer, while the fully-connected layers that perform high-level reasoning are placed further down the architecture, towards the loss function. Pooling layers may be inserted after convolutions to reduce the spatial extent of the feature maps, thereby reducing the number of learnable parameters. Batch normalization layers regulate gradient perturbations due to outliers and accelerate the learning process. Activation functions are also incorporated into the various layers to introduce non-linearity and enable the network to learn complex predictive relationships. The activation function may be a saturating activation function (e.g., a sigmoid or hyperbolic tangent activation function) or a rectified activation function (e.g., the ReLU discussed above).
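Training-time batch normalization, as referenced above, can be sketched as follows; the batch values are illustrative, and the running statistics used at inference time are omitted for brevity:

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch of activations to zero mean and unit variance per
    feature, then apply the learnable scale (gamma) and shift (beta)."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

# A batch of 4 activations for 2 features (illustrative values).
x = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0],
              [4.0, 40.0]])
y = batch_norm(x)
```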

Fig. 16 shows an example of interconnections between layers in an ANN as described in this disclosure. The ANN may include a fully connected layer, and in view of fig. 15B, may include a convolutional layer, a pooling layer, a batch normalization layer, and an activation layer, all of which are explained above and below. In an embodiment, the convolutional neural network layer may be embedded in the ANN. Alternatively, a convolutional neural network may be arranged before the ANN, wherein the output layer of the convolutional neural network partially defines the input layer of the ANN. The placement of the convolutional neural network relative to the ANN is important because the convolutional neural network provides, in part, the processed input image to the input layer of the ANN.

Still referring to fig. 16, fig. 16 shows an example of a generic ANN with N inputs, K hidden layers, and three outputs. Each layer is made up of nodes (also referred to as neurons), and each node performs a weighted sum of its inputs and compares the result of the weighted sum with a threshold to generate an output. ANNs constitute a class of functions, the members of which are obtained by varying thresholds, connection weights, or architectural details such as the number of nodes and/or node connectivity. The nodes in an ANN may be referred to as neurons (or as neuron nodes), and the neurons may have interconnections between the different layers of the ANN system. The simplest ANN has three layers and is called an autoencoder. The CNNs of the present disclosure may have more than three layers of neurons, and as many output neurons as input neurons, where N is the number of data entries in the processed input image data. Synapses (i.e., connections between neurons) store values called "weights" (also interchangeably referred to as "coefficients" or "weighting coefficients"), which manipulate the data in the calculations. The output of an ANN depends on three types of parameters: (i) the pattern of interconnections between the different layers of neurons, (ii) the learning process for updating the weights of the interconnections, and (iii) the activation function that converts a neuron's weighted input into its output activation.

Mathematically, a neuron's network function m(x) is defined as a composition of other functions n_i(x), which may themselves be further defined as compositions of other functions. This can be conveniently represented as a network structure, with arrows depicting the dependencies between variables, as shown in fig. 16. For example, an ANN may use a nonlinear weighted sum, where m(x) = K(sum_i w_i n_i(x)), and where K (commonly referred to as the activation function) is some predefined function, such as the hyperbolic tangent.

In fig. 16, neurons (i.e., nodes) are depicted by circles around the threshold function. For the non-limiting example shown in fig. 16, the input is depicted as a circle around a linear function, and the arrows indicate directional communication between the neurons. In certain embodiments, the ANN is a feed-forward network.

Using a set of observations, the ANN of the present disclosure finds m* ∈ F by searching within the class of functions F to be learned, so as to accomplish the particular task in some optimal sense (e.g., meeting the stopping criteria used in operation 1485 of process 435 discussed above). For example, in some embodiments, this may be accomplished by defining a cost function C : F → R such that, for the optimal solution m*, C(m*) ≤ C(m) for all m ∈ F (i.e., no solution has a cost lower than the cost of the optimal solution). The cost function C is a measure of how far away (e.g., the error) a particular solution is from an optimal solution to the problem to be solved. The learning algorithm iteratively searches through the solution space to find a function with the smallest possible cost. In some embodiments, the cost is minimized over a sample of the data (i.e., the training data).

Referring now to FIG. 17, FIG. 17 is a hardware description of an ML-VEST according to an exemplary embodiment of the present disclosure.

In FIG. 17, the ML-VEST includes a CPU 1760 that performs the above-described process. The ML-VEST may be a general purpose computer or a specific special purpose machine. In one embodiment, the ML-VEST becomes a specific dedicated machine when the processor 1760 is programmed to perform visual device selection (and in particular, any of the processes discussed with reference to the above disclosure).

Alternatively or additionally, the CPU 1760 may be implemented on an FPGA, ASIC, PLD, or using discrete logic circuitry, as will be appreciated by one of ordinary skill in the art. Further, the CPU 1760 may be implemented as multiple processors working in conjunction in parallel to execute the instructions of the inventive process described above.

The ML-VEST also includes a network controller 1763, such as an Intel Ethernet PRO network interface card, for interfacing with a network 1775. As can be appreciated, the network 1775 can be a public network, such as the internet, or a private network, such as a LAN or WAN network, or any combination thereof, and can also include PSTN or ISDN sub-networks. The network 1775 can also be wired (such as an ethernet network) or can be wireless (such as a cellular network, including EDGE, 3G, and 4G wireless cellular systems). The wireless network may also be WiFi, bluetooth, or any other form of wireless communication known.

During the training process 435, input training images may be obtained from the training database 1736, which is connected to the ML-VEST wirelessly via the network 1775, or via a hardwired connection through the storage controller 1772. In an embodiment, the training database 1736 is a visual device database.

The ML-VEST further includes a display controller 1764, such as a graphics card or graphics adapter for interfacing with a display 1765, such as a monitor. The general purpose I/O interface 1766 interfaces with a keyboard and/or mouse 1767 and a touchscreen panel 1768 on or separate from the display 1765. The general purpose I/O interface also connects to various peripheral devices 1769, including printers and scanners.

A sound controller 1770 is also provided in the ML-VEST to interface with the speaker/microphone 1771 to provide sound and/or music.

The general storage controller 1772 connects the storage media disk 1762 with a communication bus 1773, which may be ISA, EISA, VESA, PCI, etc., to interconnect all the components of the ML-VEST. Descriptions of the general features and functionality of the display 1765, keyboard and/or mouse 1767, as well as the display controller 1764, storage controller 1772, network controller 1763, sound controller 1770, and general purpose I/O interface 1766 are omitted herein for the sake of brevity, as these features are well known.

The example circuit elements described in the context of the present disclosure may be replaced with other elements and constructed in a different manner than the examples provided herein. Furthermore, circuitry configured to perform features described herein may be implemented in multiple circuit units (e.g., chips), or these features may be combined in circuitry on a single chipset.

The functions and features described herein may also be performed by various distributed components of the system. For example, one or more processors may perform these system functions, where the processors are distributed across multiple components in communication with a network. In addition to including various human interaction and communication devices (e.g., display monitors, smart phones, tablets, Personal Digital Assistants (PDAs)), the distributed components may include one or more client and server machines that may share processing. The network may be a private network such as a LAN or WAN, or may be a public network such as the internet. Input to the system may be received via direct user input and may be received remotely in real-time or as a batch process. Additionally, some embodiments may be performed on different modules or hardware than those described. Accordingly, other embodiments are within the scope of what may be claimed.

Obviously, many modifications and variations are possible in light of the above teaching. It is, therefore, to be understood that within the scope of the appended claims, the invention may be practiced otherwise than as specifically described herein.

Embodiments of the present disclosure may also be described by the parenthetical statements below.

(1) An apparatus for determining a suitable vision device, the apparatus comprising processing circuitry configured to: receiving at least one input, the at least one input comprising an image of a person's face; applying a neural network to the at least one input, the neural network generating at least one fitness metric for the at least one input; and determining the appropriate visual device based on the at least one suitability metric generated by the neural network, wherein the at least one suitability metric corresponds to a correlation synchronization between the person's face and a visual device.

(2) The apparatus of (1), wherein the at least one input comprises a visual device image.

(3) The apparatus of (1) or (2), wherein the at least one input comprises an image of the person's face, wherein the person is wearing a vision device.

(4) The apparatus of any of (1) through (3), wherein the at least one input is processed at least one input that includes morphological features determined from an image of the person's face.

(5) The apparatus of any of (1) to (4), wherein the at least one input is processed at least one input comprising a visual device property determined from an image of the visual device.

(6) The apparatus of any of (1) to (5), wherein the at least one input comprises a visual measurement of the person, the visual measurement being indicative of visual acuity of the person.

(7) The apparatus of any of (1) to (6), wherein the processing circuitry is further configured to train the neural network on a training database, wherein the training database comprises a training image corpus comprising facial images of a person and visual device images, each combination of an image in the facial images of the person and an image in the visual device images being associated in the training database with at least one training suitability metric assigned by a marker group.

(8) The apparatus of any of (1) through (7), wherein the training image corpus comprises images of a person wearing a visual device, each of the images of the person wearing a visual device being associated in the training database with at least one training suitability metric assigned by the marker group.

(9) The apparatus of any of (1) through (8), wherein the neural network includes an implicit input, the implicit input being a predefined set of visual devices, the at least one suitability metric generated by the neural network being at least one matching score of the at least one input to each of the predefined set of visual devices.

(10) The apparatus of any of (1) to (9), wherein to determine the appropriate visual device, the processing circuitry is further configured to: selecting a largest at least one matching score, the largest at least one matching score being the one of the predetermined set of visual devices that best matches the face of the person in the at least one input.

(11) The apparatus of any of (1) through (10), wherein the maximum at least one match score is selected from a vector comprising the at least one match score, each of the at least one match scores in the vector corresponding to one visual device of the predetermined set of visual devices, the at least one match score being based on a percentage of markers of the marker group assigned a same value of the at least one match score.

(12) The apparatus of any of (1) to (11), wherein, to determine the appropriate visual device, the processing circuitry is further configured to: calculating coordinates corresponding to the at least one input; calculating a center of gravity of a cluster associated with each of the predefined set of vision devices; calculating a distance between the coordinate and each center of gravity of the cluster, the distances being ordered in a vector; and selecting a cluster of the clusters that minimizes a distance between the coordinate and each center of gravity of the cluster.

(13) The apparatus of any of (1) to (12), wherein the cluster associated with each visual device of the predefined set of visual devices comprises matching coordinates corresponding to at least one training input that maximizes at least one training matching score during training of the neural network, the at least one training input comprising morphological features of the person's face.

(14) A method for determining a suitable vision apparatus, the method comprising: receiving, by processing circuitry, at least one input comprising an image of a person's face; applying, by the processing circuitry, a neural network to the at least one input, the neural network generating at least one fitness metric for the at least one input; and determining, by the processing circuitry, the appropriate visual device based on the at least one suitability metric generated by the neural network, wherein the at least one suitability metric corresponds to a correlation synchronization between the person's face and visual device.

(15) A non-transitory computer-readable storage medium storing computer-readable instructions that, when executed by a computer, cause the computer to perform a method for determining a suitable vision device, the method comprising: receiving at least one input, the at least one input comprising an image of a person's face; applying a neural network to the at least one input, the neural network generating at least one fitness metric for the at least one input; and determining the appropriate visual device based on the at least one suitability metric generated by the neural network, wherein the at least one suitability metric corresponds to a correlation synchronization between the person's face and a visual device.

(16) The apparatus of any of (1) to (13), wherein the at least one training suitability metric comprises a scored assessment of the suitability of a visual device for a person's face.

(17) The apparatus of any of (1) through (13) and (16), wherein the marker group includes an ophthalmologist.

(18) The apparatus of any of (1) through (13) and (16) through (17), wherein, to determine the suitable visual device, the processing circuitry is further configured to compare the at least one suitability metric to a predetermined threshold.
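The threshold test of (18) can be sketched in a few lines. The threshold value 0.5 is an arbitrary illustration; the disclosure leaves the actual value open:

```python
SUITABILITY_THRESHOLD = 0.5  # illustrative value, not from the disclosure

def is_suitable(suitability_metric, threshold=SUITABILITY_THRESHOLD):
    # A candidate visual device is retained only when its suitability
    # metric meets or exceeds the predetermined threshold.
    return suitability_metric >= threshold

# A device scoring 0.8 passes; one scoring 0.3 is rejected.
```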

(19) The method of (14), wherein the at least one input includes a visual device image.

(20) The method of (14) or (19), wherein the at least one input includes an image of the person's face, wherein the person is wearing a vision device.

(21) The method of any of (14) and (19) to (20), wherein the at least one input is a processed at least one input comprising morphological features determined from an image of the person's face.

(22) The method of any of (14) and (19) to (21), wherein the at least one input is a processed at least one input, the processed at least one input including a visual device attribute determined from the visual device image.

(23) The method of any of (14) and (19) to (22), wherein the at least one input includes a visual measurement of the person, the visual measurement being indicative of visual acuity of the person.

(24) The method of any of (14) and (19) to (23), further comprising training, by the processing circuitry, the neural network on a training database, wherein the training database comprises a training image corpus comprising facial images of a person and visual device images, each combination of an image in the facial images of the person and an image in the visual device images being associated in the training database with at least one training suitability metric assigned by a marker group.

(25) The method of any of (14) and (19) to (24), wherein the training image corpus comprises images of a person wearing a visual device, each of the images of the person wearing a visual device being associated in the training database with at least one training suitability metric assigned by the marker group.
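One way to picture the training database described in (24) and (25): each combination of a face image and a visual device image, as well as each image of a person already wearing a device, is stored with the training suitability metric assigned by the marker group. The record layout and file names below are assumptions for illustration only:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TrainingRecord:
    face_image: str              # a face image from the training corpus
    device_image: Optional[str]  # a visual device image, or None when the
                                 # face image already shows the device worn,
                                 # as in (25)
    suitability: float           # training suitability metric assigned by
                                 # the marker group

training_db = [
    TrainingRecord("face_001.png", "frame_a.png", 0.9),
    TrainingRecord("face_002_wearing_frame_b.png", None, 0.4),
]

# Records of persons wearing a device carry no separate device image.
worn = [r for r in training_db if r.device_image is None]
```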

(26) The method of any of (14) and (19) to (25), wherein the neural network includes implicit inputs, the implicit inputs being a predefined set of visual devices, the at least one suitability metric generated by the neural network being at least one matching score of the at least one input to each of the predefined set of visual devices.

(27) The method of any of (14) and (19) to (26), further comprising, to determine the suitable visual device, selecting, by the processing circuitry, a maximum at least one matching score, the maximum at least one matching score corresponding to the one of the predefined set of visual devices that best matches the face of the person in the at least one input.

(28) The method of any of (14) and (19) to (27), wherein the maximum at least one matching score is selected from a vector comprising the at least one matching score, each of the at least one matching scores in the vector corresponding to one visual device in the predefined set of visual devices, the at least one matching score being based on a percentage of markers in the marker group that assigned a same value of the at least one matching score.
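The selection step of (26) through (28) reduces to an argmax over the vector of matching scores, one entry per device in the predefined set. In this sketch the scores are mocked as the fraction of markers who assigned the same value, following (28); the device names and votes are invented for illustration:

```python
def matching_score(votes):
    # Fraction of markers in the marker group who assigned the same
    # (most common) value, as in (28).
    modal = max(set(votes), key=votes.count)
    return votes.count(modal) / len(votes)

votes_per_device = {
    "aviator": [5, 5, 5, 4],   # three of four markers agree on 5
    "wayfarer": [3, 4, 2, 5],  # no value is assigned by more than one marker
}

# The vector of matching scores, one per device in the predefined set.
score_vector = {d: matching_score(v) for d, v in votes_per_device.items()}

# The device with the maximum matching score best matches the input face.
best_device = max(score_vector, key=score_vector.get)
# best_device == "aviator" with a score of 0.75
```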

(29) The method of any of (14) and (19) to (28), further comprising, to determine the suitable visual device: calculating, by the processing circuitry, coordinates corresponding to the at least one input; calculating, by the processing circuitry, a center of gravity of the cluster associated with each visual device of the predefined set of visual devices; calculating, by the processing circuitry, a distance between the coordinates and each cluster center of gravity, the distances being ordered in a vector; and selecting, by the processing circuitry, the cluster, among the clusters, that minimizes the distance between the coordinates and its center of gravity.

(30) The method of any of (14) and (19) to (29), wherein the cluster associated with each visual device of the predefined set of visual devices includes matching coordinates corresponding to at least one training input that maximizes at least one training matching score during training of the neural network, the at least one training input including morphological features of the person's face.

(31) The method of any of (14) and (19) to (30), wherein the at least one training suitability metric comprises a scored assessment of the suitability of a visual device for a person's face.

(32) The method of any of (14) and (19) to (31), wherein the marker group includes an ophthalmologist.

(33) The method of any of (14) and (19) to (32), further comprising comparing, by the processing circuitry, the at least one suitability metric to a predetermined threshold for determining the suitable visual device.

(34) The method of (15), wherein the at least one input comprises a visual device image.

(35) The method of (15) or (34), wherein the at least one input comprises an image of the person's face, wherein the person is wearing a vision device.

(36) The method of any of (15) and (34) to (35), wherein the at least one input is a processed at least one input comprising morphological features determined from an image of the person's face.

(37) The method of any of (15) and (34) to (36), wherein the at least one input is a processed at least one input, the processed at least one input including a visual device attribute determined from the visual device image.

(38) The method of any of (15) and (34) to (37), wherein the at least one input includes a visual measurement of the person, the visual measurement indicative of visual acuity of the person.

(39) The method of any of (15) and (34) to (38), further comprising training the neural network on a training database, wherein the training database comprises a training image corpus comprising facial images of a person and visual device images, each combination of an image in the facial images of the person and an image in the visual device images being associated in the training database with at least one training suitability metric assigned by a marker group.

(40) The method of any of (15) and (34) to (39), wherein the training image corpus comprises images of a person wearing a visual device, each of the images of the person wearing a visual device being associated in the training database with at least one training suitability metric assigned by the marker group.

(41) The method of any of (15) and (34) to (40), wherein the neural network includes implicit inputs, the implicit inputs being a predefined set of visual devices, the at least one suitability metric generated by the neural network being at least one matching score of the at least one input to each of the predefined set of visual devices.

(42) The method of any of (15) and (34) to (41), further comprising, to determine the suitable visual device, selecting a maximum at least one matching score, the maximum at least one matching score corresponding to the one of the predefined set of visual devices that best matches the face of the person in the at least one input.

(43) The method of any of (15) and (34) to (42), wherein the maximum at least one matching score is selected from a vector comprising the at least one matching score, each of the at least one matching scores in the vector corresponding to one visual device in the predefined set of visual devices, the at least one matching score being based on a percentage of markers in the marker group that assigned a same value of the at least one matching score.

(44) The method of any of (15) and (34) to (43), further comprising, to determine the suitable visual device: calculating coordinates corresponding to the at least one input; calculating a center of gravity of the cluster associated with each visual device of the predefined set of visual devices; calculating a distance between the coordinates and each cluster center of gravity, the distances being ordered in a vector; and selecting the cluster, among the clusters, that minimizes the distance between the coordinates and its center of gravity.

(45) The method of any of (15) and (34) to (44), wherein the cluster associated with each visual device of the predefined set of visual devices includes matching coordinates corresponding to at least one training input that maximizes at least one training matching score during training of the neural network, the at least one training input including morphological features of the person's face.

(46) The method of any of (15) and (34) to (45), wherein the at least one training suitability metric comprises a scored assessment of the suitability of a visual device for a person's face.

(47) The method of any of (15) and (34) to (46), wherein the marker group includes an ophthalmologist.

(48) The method of any of (15) and (34) to (47), further comprising comparing the at least one suitability metric to a predetermined threshold for determining the suitable visual device.

Accordingly, the foregoing discussion discloses and describes merely exemplary embodiments of the present invention. As will be understood by those skilled in the art, the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. Accordingly, the disclosure of the present invention is intended to be illustrative, but not limiting, of the scope of the invention, as well as other claims. This disclosure, including any readily discernible variants of the teachings herein, defines, in part, the scope of the foregoing claim terminology such that no inventive subject matter is dedicated to the public.
