Retinal vascular measurements


Abstract: Disclosed is a method for training a neural network to quantify vessel caliber from retinal fundus images. The method comprises: receiving a plurality of fundus images; pre-processing the fundus images to normalize image characteristics of the fundus images; and training a multi-layer neural network comprising a convolution unit, a plurality of dense blocks alternating with transition units for down-sampling image features determined by the neural network, and a fully-connected unit, wherein each dense block comprises a series of cAdd units fitted with a plurality of convolutions, and each transition layer comprises a convolution with pooling. (Designed and created 2020-02-11 by 许为宁, 李梦莉, 徐德江, 黄天荫, 张艳蕾.)

1. A method for training a neural network for automatic retinal vascular measurements, comprising:

receiving a plurality of fundus images;

pre-processing the fundus image to normalize image characteristics of the fundus image; and

training a multi-layer neural network on the pre-processed fundus image, the neural network comprising a convolution unit, a plurality of dense blocks alternating with a transition unit for down-sampling image features determined by the neural network, and a fully-connected unit, wherein each dense block comprises a series of cAdd units fitted with a plurality of convolutions, and each transition layer comprises convolutions with pooling.

2. The method of claim 1, further comprising grouping input channels of each cAdd unit into non-overlapping groups and adding an output of the cAdd unit to one of the non-overlapping groups, thereby forming an input to a next cAdd unit in the series, wherein for successive cAdd units in the series, an output of a previous cAdd unit is added to a different one of the non-overlapping groups.

3. The method of claim 1 or 2, further comprising:

automatically detecting the center of the optic disc in each fundus image; and

cropping the corresponding image to a region of predetermined size centered on the optic disc center.

4. The method of claim 1 or 2, wherein pre-processing the fundus images comprises applying global contrast normalization to each fundus image.

5. The method of claim 3, wherein pre-processing the fundus image further comprises median filtering using a kernel of a predetermined size.

6. The method of any of claims 1-5, wherein there are five dense blocks in the plurality of dense blocks.

7. The method of any one of claims 1 to 6, wherein each dense block comprises a series of cAdd units fitted with two types of convolutions.

8. The method of claim 7, wherein the two types of convolutions comprise a 1x1 convolution and a 3x3 convolution.

9. The method of any one of claims 1 to 8, wherein the convolution of each transition layer is a 1x1 convolution.

10. A method of quantifying a vessel caliber of a retinal fundus image, comprising:

receiving a retinal fundus image; and

applying a neural network trained according to any one of claims 1 to 9 to the retinal fundus image.

11. A computer system for training a neural network to generate retinal vascular measurements, comprising:

a memory; and

at least one processor, the memory storing a multi-layer neural network and instructions that, when executed by the at least one processor, cause the at least one processor to:

receiving a plurality of fundus images;

pre-processing the fundus image to normalize image characteristics of the fundus image; and

training the neural network on the pre-processed fundus image, the neural network comprising a convolution unit, a plurality of dense blocks alternating with a transition unit for down-sampling image features determined by the neural network, and a fully-connected unit, wherein each dense block comprises a series of cAdd units fitted with a plurality of convolutions, and each transition layer comprises convolutions with pooling.

12. The computer system of claim 11, wherein the instructions further cause the processor to group input channels of each cAdd unit into non-overlapping groups and add an output of the cAdd unit to one of the non-overlapping groups, thereby forming an input to a next cAdd unit in the series, wherein for successive cAdd units in the series, an output of a previous cAdd unit is added to a different one of the non-overlapping groups.

13. The computer system of claim 11 or 12, wherein the instructions further cause the processor to:

automatically detecting the center of the optic disc in each fundus image; and

cropping the corresponding image to a region of predetermined size centered on the optic disc center.

14. The computer system of any of claims 11 to 13, wherein the instructions cause the processor to pre-process the fundus images by applying global contrast normalization to each fundus image.

15. The computer system of any of claims 11 to 14, wherein there are five dense blocks in the plurality of dense blocks.

16. The computer system of any one of claims 11 to 15, wherein each dense block comprises a series of cAdd units fitted with two types of convolutions.

17. The computer system of claim 16, wherein the two types of convolutions comprise a 1x1 convolution and a 3x3 convolution.

18. The computer system of any of claims 11 to 17, wherein the convolution of each transition layer is a 1x1 convolution.

19. The computer system of any of claims 11 to 18, wherein the neural network is trained on the pre-processed fundus image to quantify a vascular caliber of a retinal fundus image.

Technical Field

The present invention relates to a deep learning system for automatic retinal vessel measurement from fundus photographs.

Background

Clinical studies have shown that alterations in retinal vascular structure are early warnings of underlying cardiovascular disease (CVD) and other conditions, such as dementia and diabetes. This is because the condition of the retinal arterioles and venules reflects the condition of blood vessels elsewhere in the body.

Currently, scoring of retinal photographs by human evaluators is challenged by implementation issues, the availability and training of evaluators, and long-term financial sustainability. Deep Learning Systems (DLS) have been proposed as an option for large-scale analysis of retinal images. A DLS processes natural raw data using artificial intelligence and representation-learning methods to identify complex structures in high-dimensional information. In contrast to traditional pattern-recognition software that detects specific images, patterns and lesions, a DLS uses large datasets to enable the mining, extraction and machine learning of meaningful patterns or features.

The performance of a DLS depends in part on the connectivity of the layers of the neural network that extract features from the image. The greater the number of available features, the higher the confidence of the evaluation. However, this comes at the cost of memory and other computer resources. Problems such as the vanishing gradient problem also arise when training the neural network to ensure that errors are propagated back through the network.

It would therefore be desirable to provide a method of training DLS through a wide variety of retinal images to address the problems noted in the current prior art and/or to provide the public with a useful choice.

Disclosure of Invention

The invention relates to a novel deep learning system for automatic retinal vessel measurement for non-invasive observation of cardiovascular disorders. In particular, embodiments of the present invention relate to methods for obtaining retinal image characteristics and automatically computing a measure related to a medical condition based on retinal vascular characteristics.

Disclosed herein is a method for training a neural network for automatic retinal vascular measurements, the method comprising:

receiving a plurality of fundus images;

pre-processing the fundus image to normalize image characteristics of the fundus image; and

training a multi-layer neural network on the pre-processed fundus images, the neural network comprising convolution units, a plurality of dense blocks alternating with transition units for down-sampling image features determined by the neural network, and fully-connected units, wherein each dense block comprises a series of cAdd units fitted with a plurality of convolutions, and each transition layer comprises a convolution with pooling.

The method may further comprise: grouping the input channels of each cAdd unit into non-overlapping groups and adding the output of the cAdd unit to one of the non-overlapping groups, thereby forming an input to the next cAdd unit in the series, wherein for successive cAdd units in the series, the output of the previous cAdd unit is added to a different one of the non-overlapping groups. In this context, the cAdd units form a series. During processing, an input is provided to the first cAdd unit in the series, and that unit processes the input and passes it to the next cAdd unit in the series, and so on until the last cAdd unit. As a result, it will be understood that each cAdd unit in a given series (except for the first cAdd unit) will have a "previous" cAdd unit, which is the unit from which it receives output. Similarly, it will be understood that each cAdd unit in a given series (except for the last cAdd unit) will have a "next" cAdd unit to which it passes its output.

The method may include: the center of the optic disc in each fundus image is automatically detected, and the corresponding image is cropped to a region of a predetermined size centered on the optic disc center.

Pre-processing the fundus images may include applying global contrast normalization to each fundus image. Pre-processing the fundus images may also include median filtering using a kernel of a predetermined size.

Preferably, there are five dense blocks in the plurality of dense blocks. Each dense block may comprise a series of cAdd units fitted with two types of convolutions. The two types of convolutions may include a 1x1 convolution and a 3x3 convolution.

The convolution for each transition layer may be a 1x1 convolution.

Also disclosed herein is a method of quantifying a vascular aperture of a retinal fundus image, the method comprising:

receiving a retinal fundus image; and

applying a neural network trained according to the method described above to the retinal fundus image.

Also disclosed herein is a computer system for training a neural network to generate retinal vascular measurements, the computer system comprising:

a memory; and

at least one processor, the memory storing a multi-layer neural network and instructions that, when executed by the at least one processor, cause the at least one processor to:

receiving a plurality of fundus images;

pre-processing the fundus image to normalize image characteristics of the fundus image; and

training the neural network on the pre-processed fundus image, the neural network comprising a convolution unit, a plurality of dense blocks alternating with a transition unit for down-sampling image features determined by the neural network, and a fully-connected unit, wherein each dense block comprises a series of cAdd units fitted with a plurality of convolutions, and each transition layer comprises convolutions with pooling.

The instructions may also cause the processor to group the input lanes of each cAdd unit into non-overlapping groups and add the output of the cAdd unit to one of the non-overlapping groups, thereby forming an input to a next cAdd unit in the series, and for successive cAdd units in the cAdd unit, the output of a previous cAdd unit in the series is added to a different one of the non-overlapping groups.

The instructions may also cause the processor to:

automatically detecting the center of the optic disc in each fundus image; and

the corresponding image is cropped to a region of predetermined size centered on the optic disc center.

The instructions may cause the processor to pre-process the fundus images by applying global contrast normalization to each fundus image.

There may be five dense blocks in the plurality of dense blocks. Each dense block may comprise a series of cAdd units fitted with two types of convolutions. The two types of convolutions may include a 1x1 convolution and a 3x3 convolution.

The convolution for each transition layer may be a 1x1 convolution.

The neural network may be trained on the pre-processed fundus image to quantify a vascular caliber of the retinal fundus image.

Drawings

Some embodiments will now be described, by way of non-limiting example only, with reference to the accompanying drawings, in which:

FIG. 1 illustrates a cloud-based interface for interacting with a system according to the present teachings for evaluating retinal fundus images;

FIG. 2 illustrates a neural network for use in the present method;

FIG. 3 schematically illustrates the propagation mechanism of element-wise Addition (eAdd) and channel-wise Concatenation (cCon);

FIG. 4 illustrates the overall architecture of a deep neural network using cAdd;

FIG. 5 illustrates one embodiment of a propagation mechanism with a cAdd level that includes four cAdd units;

FIG. 6 illustrates portability of cAdd into existing architectures by replacing different propagation mechanisms;

FIG. 7 illustrates bottleneck units using different propagation mechanisms;

FIG. 8 shows a comparison of ResNet and cResNet performance on CIFAR-10;

FIG. 9 shows a training curve and a test curve for cResNet-1224;

FIG. 10 shows training curves for Wide-ResNet (WRN) and cWRN on CIFAR-10;

FIG. 11 shows a comparison of the training curves for CondenseNet and cCondenseNet on CIFAR-10;

FIG. 12 shows neuron weights in convolutional layers of an architecture using cAdd, eAdd, and cCon;

FIG. 13 illustrates a method for training a neural network to classify fundus images and for subsequently classifying fundus images; and

FIG. 14 schematically illustrates a system for performing the method of FIG. 13, or by which a cloud-based platform implementing the method of FIG. 13 may be accessed.

Detailed Description

A system is described herein that utilizes deep learning to estimate retinal vascular parameters, such as vessel caliber and other measurements such as, for example, vessel density. The method has been developed and tested using ~10,000 retinal images from various population-based studies. It can be effectively used for large-scale scoring of population-based studies, representing a significant time and cost saving for the clinician researcher. Furthermore, due to the breadth of the training data used, the system is not limited to a particular population/race and can be used as-is for the general population. This removes geographic restrictions on clinician researchers, enabling the system to be usefully applied to cloud-based platforms that are accessible from anywhere.

Embodiments of the method, and systems for performing the method, employ pre-processing to normalize image factors, so that the method is not limited to a particular model/type of retinal fundus camera. The method can be used as-is for any optic-disc-centered retinal image with a sufficient field of view.

Previous systems, such as the system described in WO 2019/022663, the entire contents of which are incorporated herein by reference, provide a semi-automated platform for large-scale scoring of retinal images. However, manual input is often required to locate the optic disc, correct the vessel type, and edit the tracked vessels and segment widths. Embodiments disclosed herein eliminate such manual input, thus allowing retinal images to be scored more easily and quickly and resulting in significant time savings for population-based studies. The automatic scoring system has the advantages that it can be used on demand and guarantees perfectly reproducible results.

A method 100 to achieve this is set forth in fig. 13. The method 100 is used to train a neural network for automatic retinal vascular measurements and broadly includes:

step 102: receiving a plurality of fundus images;

step 104: pre-processing the fundus image to normalize image characteristics of the fundus image; and

step 106: training a multi-layer neural network on the pre-processed fundus image, the neural network comprising a convolution unit, a plurality of dense blocks alternating with a transition unit for down-sampling image features determined by the neural network, and a fully-connected unit, wherein each dense block comprises a series of cAdd units fitted with a plurality of convolutions, and each transition layer comprises convolutions with pooling.

The method 100 and the system implementing it enable the calculation of quantifiable measurements in retinal images for large scale scoring of retinal fundus images. As a result:

a) the step of performing pre-processing of the input retinal image minimizes deleterious aspects of the retinal image and standardizes the retinal image score;

b) the multilayer neural network forms a deep learning framework for quantifying retinal vascular parameters; and

c) the method may be implemented on a cloud-based platform to facilitate large-scale population studies over time.

The preprocessing step 104 is intended to remove variances in the fundus images caused by changes in illumination, camera, angle, and other factors. In other words, the preprocessing step 104 is intended to remove the confounding effect of noise characteristics in the fundus images.

The width of the blood vessels in the retinal image is measured based on the number of pixels. However, the units of vascular parameters such as CRAE and CRVE are in microns. Each Image has an Image Conversion Factor (ICF) that gives a mapping from the number of pixels to microns. Images from different fundus cameras may have different ICFs due to magnification effects, image resolution, and other reasons.

The method normalizes each image by resizing it. In one embodiment, the ratio cf/16.38 is used for resizing, where cf is the ICF of the image and 16.38 is the ICF of the normalized image. As a result, all images have the same size and the same pixel-to-micron mapping.
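As an illustration only, this resizing might be implemented as in the following sketch. The target ICF of 16.38 is taken from the text; the function name, the use of OpenCV, and the choice of interpolation are illustrative assumptions rather than part of the disclosed method.

```python
import cv2  # assumed image-processing library


def resize_to_standard_icf(image, cf, target_icf=16.38):
    """Resize a fundus image so its image conversion factor (ICF) matches
    the normalized ICF (microns per pixel) used by the method.

    image: H x W x 3 fundus image as a numpy array.
    cf:    the ICF of this particular image.
    """
    ratio = cf / target_icf  # the ratio cf/16.38 described in the text
    new_width = int(round(image.shape[1] * ratio))
    new_height = int(round(image.shape[0] * ratio))
    return cv2.resize(image, (new_width, new_height),
                      interpolation=cv2.INTER_LINEAR)
```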

The present preprocessing step 104 also involves automatically detecting the center of the optic disc in each fundus image, and cropping the corresponding image to a region of predetermined size centered on the optic disc center. In particular, step 104 involves region cropping to focus neural network training and any subsequent assessment of the retinal fundus on the features of the optic disc. The optic disc detection algorithm is described in WO 2019/022663 and is incorporated herein by reference. The optic disc detection algorithm is applied to the ICF normalized image to locate the optic disc center. The image is then cropped to a region of predetermined size, e.g., 512 x 512, centered on the optic disc.

Once the optic disc has been identified and the image cropped to a standard area size, the image is normalized to remove noise resulting from, for example, camera calibration and variations in camera type. To achieve this, global contrast normalization is applied over the cropped image to reduce color variation between retinal images from different ethnicities. The total range of contrast for the entire image is scaled by a scaling factor to normalize the contrast variation, and then each pixel contrast is scaled using the same scaling factor.

After contrast normalization, the image is filtered to remove noise. Currently, median filtering is used with kernels of a predetermined size, e.g., 5 pixels.
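A minimal sketch of this preprocessing follows, assuming the optic disc center has already been located by a separate detector (such as the one referenced from WO 2019/022663, not reproduced here). The function names are illustrative; numpy and scipy are assumed, the 512 x 512 crop size and 5-pixel median kernel follow the text, and border handling is omitted for brevity.

```python
import numpy as np
from scipy.ndimage import median_filter


def crop_and_normalize(image, disc_center, crop_size=512, kernel_size=5):
    """Crop a square region centered on the optic disc, apply global
    contrast normalization, then median-filter to suppress noise."""
    row, col = disc_center
    half = crop_size // 2
    crop = image[row - half:row + half, col - half:col + half].astype(np.float32)

    # Global contrast normalization: one scaling factor for the whole crop,
    # applied to every pixel, to reduce contrast/color variation.
    crop -= crop.mean()
    scale = np.sqrt((crop ** 2).mean()) + 1e-8
    crop /= scale

    # Median filtering with a kernel of predetermined size (5 pixels).
    crop = median_filter(crop, size=(kernel_size, kernel_size, 1))
    return crop
```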

After preprocessing the images according to step 104, they are used in training to quantify retinal vascular parameters. A deep learning architecture is used for the quantification. In particular, step 106 involves a propagation mechanism known as channel-wise addition. As mentioned above, the input is pre-processed to produce 512 x 512 cropped images. The preprocessed images are then passed to a neural network to train the neural network, step 106. The output of the neural network is a fully-connected layer with the predicted calibers (craeB, craeC, crveB, crveC) or other measurements for the input image. The neural network may have various compositions, but includes at least a convolution unit, followed by a plurality of dense blocks (currently five) alternating with transition units to down-sample the features, and a fully-connected unit. Each dense block comprises, or is, a series of channel-wise addition (cAdd) units equipped with a number of convolutions (currently two types, which in the illustrated embodiment are 1x1 and 3x3). Each transition unit currently includes a convolution with pooling (which in the illustrated embodiment is 1x1).

In the neural network, which uses channel-wise addition (cAdd), each image first passes through the convolutional layer and max pooling; currently, the convolutional layer uses a 7x7 window with a stride of 2, and the max pooling layer uses a 3x3 window with a stride of 2.

The convolutional layer in this embodiment is followed by a series of five dense blocks alternating with transition layers/units. The transition units down-sample the features, i.e., the features detected by the previous layer/unit in the neural network. Each dense block includes a series of cAdd units. Each transition layer includes a convolution followed by average pooling; currently the convolution is a 1x1 convolution followed by 2x2 average pooling. Finally, the output is a regression layer with one output node.

Table 1: detailed architecture of neural networks

This architecture is illustrated in the flow chart of FIG. 2, where, in use, the cropped retinal image 200 is passed to a convolution unit 202, followed by dense blocks 204 alternating with transition layers 206 (the last dense block may not be followed by a transition unit), followed by a fully-connected unit 208 that outputs the vessel caliber (output 210). Each dense block includes a series of cAdd units 212 fitted with two types of convolutions, as detailed in Table 1. The output size is gradually reduced by the transition units, and the output of the fully-connected unit is 1x1.
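The skeleton below sketches this architecture in PyTorch under stated assumptions: the number of cAdd units per dense block, the channel width, and the four-value output head are illustrative (Table 1 is not reproduced here, and the text also mentions a single-node regression layer), and `make_unit` stands for a factory producing one cAdd unit, such as the one sketched after the propagation analysis below.

```python
import torch.nn as nn


class Transition(nn.Sequential):
    """Transition unit: 1x1 convolution followed by 2x2 average pooling."""
    def __init__(self, in_ch, out_ch):
        super().__init__(
            nn.BatchNorm2d(in_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
            nn.AvgPool2d(kernel_size=2),
        )


class VesselCaliberNet(nn.Module):
    """Convolution unit, five dense blocks of cAdd units alternating with
    transition units, and a fully-connected regression head."""
    def __init__(self, make_unit, units_per_block=(4, 4, 4, 4, 4), channels=64):
        super().__init__()
        self.stem = nn.Sequential(                       # convolution unit
            nn.Conv2d(3, channels, kernel_size=7, stride=2, padding=3, bias=False),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
        )
        layers = []
        for i, n_units in enumerate(units_per_block):
            layers.append(nn.Sequential(*[make_unit(channels, j) for j in range(n_units)]))
            if i < len(units_per_block) - 1:             # last block has no transition unit
                layers.append(Transition(channels, channels))
        self.blocks = nn.Sequential(*layers)
        self.head = nn.Sequential(                       # fully-connected unit
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(channels, 4),                      # e.g. craeB, craeC, crveB, crveC
        )

    def forward(self, x):                                # x: 512 x 512 cropped image
        return self.head(self.blocks(self.stem(x)))
```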

Any suitable error function may be used in order to propagate the error through the neural network. Currently, the mean absolute error is used as the loss function.

The model was then trained using stochastic gradient descent with a Nesterov momentum of 0.9, no dampening, and a weight decay of 10⁻⁴. In the experiments, a batch size of 80 was used, with a cosine learning rate schedule and a dropout rate of 0.2.
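A minimal training-loop sketch consistent with these settings, assuming a `model` such as the skeleton above and a `train_loader` yielding (image, caliber) batches of size 80; the number of epochs and the initial learning rate are illustrative assumptions, and the 0.2 dropout rate is assumed to be applied inside the model.

```python
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import CosineAnnealingLR


def train(model, train_loader, epochs=300, device="cuda"):
    model = model.to(device)
    criterion = nn.L1Loss()                              # mean absolute error loss
    optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                          dampening=0, nesterov=True,    # Nesterov momentum, no dampening
                          weight_decay=1e-4)
    scheduler = CosineAnnealingLR(optimizer, T_max=epochs)  # cosine learning rate
    for _ in range(epochs):
        for images, calibers in train_loader:
            images, calibers = images.to(device), calibers.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), calibers)
            loss.backward()
            optimizer.step()
        scheduler.step()
    return model
```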

Recent Deep Neural Networks (DNNs) utilize identity mappings, which are propagated via element-wise addition (eAdd) or channel-wise concatenation (cCon). Unlike cCon, cAdd eliminates the need to store the concatenated feature maps, thus reducing memory requirements.

As described with reference to FIGS. 6 and 7, the proposed cAdd mechanism is more generally compatible with convolutional neural architectures, and can deepen and widen a neural architecture with fewer parameters than cCon and eAdd. To illustrate, cAdd has been incorporated into current state-of-the-art architectures such as ResNet, Wide-ResNet, and CondenseNet, and, as discussed below, experiments on CIFAR-10, CIFAR-100, and SVHN demonstrate that cAdd-based architectures can achieve much higher accuracy with far fewer parameters than their corresponding base architectures.

In particular, deeper and wider neural networks often yield better performance. However, deep and wide networks suffer from vanishing gradients and quadratic growth in the number of parameters. Furthermore, computational complexity and memory requirements are also escalating in these architectures, which makes scalable learning more difficult to implement in real-world applications.

The depth of a neural architecture is critical to its performance. Current neural architectures use identity mappings in the form of skip connections to increase their depth. This allows the gradient to be passed directly backwards, thus allowing an increase in depth without the vanishing gradient problem. As mentioned above, propagation of these identity mappings from one block to the next is achieved via eAdd or cCon.

FIG. 3 illustrates the eAdd and cCon propagation mechanisms. In eAdd 200, the addition is performed on corresponding elements, so the input size of each cell remains the same; for example, six input channels 202 produce six output channels 204. On the other hand, cCon 206 concatenates the inputs 208 from all preceding cells, so the input size 210 grows for each subsequent cell. As a result, cCon can learn more complex features; however, it requires more memory to store the concatenated inputs.

To maintain the feature complexity of, for example, a cCon, while saving memory by avoiding a quadratic increase in input size, cAdd can be easily incorporated into any prior art neural architecture. As a result, computational and memory requirements are reduced while achieving high accuracy.

To keep the memory requirements small, small residual portions are sequentially generated and added to a portion of the channels of the identity part in one cell. This cell is repeated multiple times until all channels have been added to. In this way, the depth of the network is increased and the number of parameters is reduced.

FIG. 4 shows the overall architecture 300 of a neural network using cAdd. It has several stages 302, 304, 306, and the cAdd cells within each stage have the same resolution for the input and output feature maps, which allows channel-wise addition. In some embodiments, the resolution may vary across stages, which enables down-sampling by the transition units.

This cAdd design has several advantages. cAdd provides a shortcut that allows the gradient to bypass the cell directly, which mitigates the vanishing gradient problem. cAdd adds the output features back instead of concatenating them; as a result, the input is kept at the same size for each cell and less memory is required. More complex features can also be generated, because cAdd significantly increases the width and depth of convolutional neural networks (CNNs). Furthermore, fewer parameters are required when compared to existing neural networks having the same width and depth.

These advantages are demonstrated by experimental results on CIFAR-10, CIFAR-100 and SVHN, which demonstrate the efficacy of the proposed propagation mechanism. As discussed with reference to FIGS. 8-12, cAdd-based neural networks consistently achieve higher accuracy with fewer parameters than their corresponding base networks.

With respect to neural networks using the eAdd propagation mechanism, depth is crucial for achieving higher performance. However, deep neural networks are difficult to optimize. eAdd was introduced in ResNet to significantly deepen the neural network and also to ease the training process. It has been widely used in many deep neural networks including Inception-ResNet, Wide-ResNet, ResNeXt, PyramidNet, Shake-Shake Net, and ShuffleNet. It is also adopted by AlphaGo and by automatically designed architectures such as NASNet, ENAS and AmoebaNets.

The width of the neural network is also critical to achieving accuracy. Unlike ResNet, which achieves higher performance by simply stacking element-wise additions, Wide-ResNet widens the network by increasing the input channels along the depth. Experimental results show that a 16-layer Wide-ResNet outperforms a thousand-layer ResNet in both accuracy and efficiency. For Wide-ResNet, the increase in width only occurs between stages, and the input size within a stage remains the same. PyramidNet uses a widening step factor to gradually increase its width in a pyramid-like shape, which has been experimentally demonstrated to improve generalization capability. ResNeXt uses multi-branch element-wise addition, replacing the single branch with a set of small homogeneous branches. Simply adding more branches improves the performance of ResNeXt. Instead of summing all the small branches directly, Shake-Shake Net uses a random affine combination to significantly improve generalization capability.

Unlike manual designs, which require human expertise, automatically designed architectures search the entire architecture space to find the best design. Although the learned architectures have many different small branches, a common characteristic is that they all use eAdd to sum the branches.

Since the eAdd requires that the output size be at least the same as or larger than the input size, the neural network can become deeper or wider when the number of parameters is limited, but not both. There is therefore a trade-off between neural network width and depth.

With regard to neural networks using cCon, such as DenseNet, features from all preceding units are used as inputs to generate a small number of outputs that are passed to subsequent units. While this enhances feature propagation and reuse, it is not necessary to use all existing features as input for each subsequent layer.

CondenseNet selects only the most relevant inputs through learned group convolutions. It sparsifies the convolutional layers by pruning away unimportant filters during the condensing stage and optimizes the sparsified model in the second half of the training process. Owing to this pruning of redundant filters, CondenseNet is more efficient than the compact MobileNets and ShuffleNets, which are specifically designed for mobile devices and use depth-wise separable convolutions.

For automatically designed architectures, cCon is used extensively in their most accurate models, especially for combining all the cell outputs. However, since concatenation increases the input size linearly, it also increases the number of parameters and the memory requirements. In contrast, the proposed cAdd keeps the input size constant by adding the outputs back to selected inputs. Furthermore, eAdd enables a neural network to be deepened or widened, but not both. Conversely, cAdd can both deepen and widen a neural network for the same number of parameters.

As a result, cAdd combines the benefits of the eAdd and cCon propagation mechanisms to deepen and widen the network with fewer parameters. FIG. 5 shows the propagation in a cAdd stage 400 across four cAdd units. Each unit generates a small number of output channels, and the outputs thus generated are then added back to the corresponding skipped connections to form the input to the next unit. In the embodiment shown in FIG. 5, the first cAdd unit 402 generates output 404, which is then added onto the skipped connections to form the input of the subsequent cAdd unit 406. Specifically, cAdd unit 402 generates three outputs 404. The three outputs 404 are added back to the first three skipped connections 408 (i.e., the channels skipped by cAdd unit 402) to form the input to the second cAdd unit 406.

Let M be the number of input channels. To ensure that all skipped connections are covered, the input channels of each cell are grouped into non-overlapping portions or groups. Each cAdd unit adds its outputs to one of the non-overlapping groups, and for consecutive cAdd units the group added to is a different one of the non-overlapping groups.

The size of each portion (i.e., non-overlapping group) is controlled by a parameter α, such that each portion 410, 412 has exactly ⌊M/α⌋ channels, except the last portion 414, which has ⌊M/α⌋ + R channels, where R is the number of remaining channels. With further reference to FIG. 5, the second input portion 412 has ⌊M/α⌋ channels; these channels are overwritten by adding the output of cAdd unit 406 to them. The third and last input portion 414 has ⌊M/α⌋ + R channels, which are skipped in cAdd units 402 and 406 and are overwritten by adding the output of cAdd unit 416 to them. As a result, each cAdd unit skips all but one of the non-overlapping groups, and each non-overlapping group that is skipped in one cAdd unit is connected in at least one other cAdd unit. In fact, in the embodiment shown in FIG. 5, of the α non-overlapping groups 410, 412, 414 and the α+1 cAdd units 402, 406, 416, 418 in each cAdd stage 400, each non-overlapping group is connected in exactly one cAdd unit except for one non-overlapping group 414, which is connected both to the first cAdd unit 402, which receives the input channels of the cAdd stage 400, and to the final cAdd unit 418, which provides the output of the cAdd stage 400.

For the addition operation to be meaningful, the number of outputs generated by one unit must match the number of channels to be overwritten in the next unit. Mathematically, the number of output channels N_k of the k-th cAdd unit is therefore the size of the group it overwrites: N_k = ⌊M/α⌋ if the k-th unit overwrites one of the first α−1 groups, and N_k = ⌊M/α⌋ + R if it overwrites the last group, where R is the number of remaining channels. (1)

to analyze the propagation mechanism, let X ═ X1,x2,…,xM]Is an input to the cAdd unit, and Y ═ Y1,y2,…,yN]Is the output of X after passing through the non-linear transformation function F (-) of the convolution block, which may have different layers consisting of Batch Normalization (BN), modified linear units (ReLU), and convolution layer (Conv), in other words,

the cAdd unit adds its output Y back to a portion of its input X to form the input X' of the next unit as follows:

X′=X+TY (3)

where T is an M × N sparse matrix with T_ij = 1 if y_j is to be added to x_i, and T_ij = 0 otherwise.

According to equations 2 and 3:

X′ = X + T F(X) (4)

now consider the propagation from cAdd units s to cAdd units e, whose corresponding inputs are X, respectivelySAnd Xe. This results in:

let E be the error penalty. Then XsThe gradient above can be expressed as:

it is not possible for all training samples within a batch to have the component in equation (6) always equal to-1. Thus, this suggests that cAdd propagation mechanisms can alleviate the vanishing gradient problem.

To analyze the parameters, note that the predetermined parameter α controls the complexity of each cAdd unit. A large α means that the number of output channels is significantly reduced, resulting in a reduced number of parameters in the neural network. It is therefore useful to analyze the number of parameters in a neural architecture that uses cAdd.

Fig. 6, comprising fig. 6(a) to 6(d), shows the basic units of a neural architecture using different propagation mechanisms. The symbols are:

Conv(I, O, L): a convolutional layer with I input channels, O output channels, and a kernel size of L.

BN(I): batch normalization with I input channels.

ReLU: a rectified linear unit.

In these embodiments, a simple substitution has been applied: in FIGS. 6(a) and 6(b), the outputs of the batch normalization layers 500, 502 are fed to eAdd 504 and cAdd 506, respectively, and each takes the same input channels. Similarly, in FIGS. 6(c) and 6(d), the input channels 508 and convolutional layer outputs 510, 516 may be fed to cAdd 512 in the same manner as cCon 514. In each case, the non-overlapping groups are propagated through the network.

For a fair comparison, assume that the growth rate g of the cCon units is M/α, so that the cCon units have the same number of outputs as cAdd. Table 2 gives the number of parameters required for a neural network with M input channels and U elementary units.

TABLE 2 comparison of the required parameters

A neural network using cAdd has about 2α times fewer parameters than a network using eAdd. In other words, with the same number of parameters, the depth of a neural network using cAdd may be increased by a factor of about 2α, or its width may be increased by a factor of about √(2α), compared to using eAdd. Such an increase may improve the generalization capability of the neural network, thus leading to higher accuracy.

The number of parameters required for cCon in Table 2 is greater than for cAdd. The additional term (M/α)² · L² · (U² − U)/2 is introduced by the concatenation operation.

In addition to the ability to widen and deepen the network compared with eAdd, and to reduce parameter requirements compared with cCon, cAdd units can easily be incorporated into existing neural networks by replacing their corresponding eAdd and/or cCon units.

For neural networks using eAdd, there are two kinds of units: a basic unit and a bottleneck unit. In the eAdd basic unit, the number of output channels must be the same as the number of input channels for element-wise addition. This is no longer the case when eAdd is replaced with cAdd. Under the cAdd operation, the number of output channels O is determined based on equation 1. In other words, for cAdd, the initial convolution layer of the eAdd basic unit is simply changed from Conv(M, M, L) to Conv(M, O, L), with O << M.

The bottleneck unit of eAdd uses a convolutional layer with a kernel size of 1x1 to spatially combine a large number of input feature maps with few parameters (see bottleneck unit 600 of FIG. 7(a)). Because of the element-wise addition requirement 602, an additional convolutional layer 604 is needed to expand the size of the output channels back to M. However, this is not necessary for channel-wise addition. FIG. 7(b) shows the modified cAdd bottleneck unit 606 with cAdd 608. Similar adaptations can be applied to variants such as those used in pre-activation ResNet and PyramidNet.
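A minimal sketch of this adaptation, assuming the CAddUnit-style addition shown above is applied outside the unit; the module name and the bottleneck width are illustrative. The point being illustrated is that the final expansion convolution required by eAdd is dropped, because cAdd only needs the O output channels of the group it overwrites.

```python
import torch.nn as nn


class CAddBottleneck(nn.Module):
    """Bottleneck branch adapted for cAdd (cf. FIG. 7(b)): a 1x1 reduction
    followed by a 3x3 convolution producing O channels, with no trailing
    1x1 expansion back to M channels as the eAdd bottleneck requires."""

    def __init__(self, in_channels, out_channels, bottleneck_width=None):
        super().__init__()
        bottleneck_width = bottleneck_width or out_channels
        self.branch = nn.Sequential(
            nn.BatchNorm2d(in_channels), nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, bottleneck_width, kernel_size=1, bias=False),
            nn.BatchNorm2d(bottleneck_width), nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck_width, out_channels, kernel_size=3,
                      padding=1, bias=False),
            # no Conv2d(out_channels, in_channels, 1) expansion layer here
        )

    def forward(self, x):
        return self.branch(x)   # the channel-wise addition is applied outside
```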

Adapting a cCon-based neural network to use cAdd is simple, whereby the number of output channels for the base unit and the bottleneck unit is determined using equation 1 rather than by the growth rate.

The efficacy of cAdd was experimentally compared with that of eAdd and cCon. Three widely used CNN architectures, ResNet, WRN, and CondenseNet, were adapted to use cAdd, as described in the previous section. The adapted architectures are referred to as cResNet, cWRN, and cCondenseNet, respectively. Each architecture has three stages.

The networks were trained using stochastic gradient descent with a Nesterov momentum of 0.9, no dampening, and a weight decay of 10⁻⁴. For a fair comparison, all training settings (learning rate, batch size, epochs, and data augmentation) are the same as in the original papers, unless otherwise noted. The following datasets were used:

CIFAR-10: it has 10 object classes, each with 6000 32x32 color images. There were 50,000 images for training and 10000 for testing.

CIFAR-100: it has 100 classes, each with 600 32x32 color images. The training and test sets contained 50,000 and 10,000 images, respectively.

SVHN: this has over 600,000 images of real world house numbers of 32x 32. There were 73,257 images for training, 26,032 for testing, and an additional 531,131 for additional training.

In this set of experiments, the performance of ResNet and cResNet was examined. Like ResNet, all cResNets were trained over 300 epochs using a batch size of 128 (α = 7). The learning rate starts at 0.1 and is divided by 10 after the 150th and 225th epochs. For the 1224-layer cResNet, the initial learning rate was 0.01 over the first 20 epochs, and then returned to 0.1 to continue training.
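A minimal sketch of this schedule using PyTorch's MultiStepLR, under the settings stated above; the stand-in model is illustrative and the per-epoch training code is omitted.

```python
import torch.nn as nn
from torch import optim
from torch.optim.lr_scheduler import MultiStepLR

model = nn.Conv2d(3, 16, 3)                      # stand-in for a cResNet
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                      nesterov=True, weight_decay=1e-4)
# Start at 0.1 and divide by 10 after the 150th and 225th epochs.
scheduler = MultiStepLR(optimizer, milestones=[150, 225], gamma=0.1)

for epoch in range(300):
    # train for one epoch with batch size 128 here; for the 1224-layer
    # cResNet the first 20 epochs would instead use a learning rate of 0.01
    scheduler.step()
```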

Table 3 shows the results of ResNet, pre-activated ResNet, and cResNet on the CIFAR-10, CIFAR-100, and SVHN datasets. ResNet-20, with 0.27 million parameters, has a depth of 20, and the widths of its three stages are 16, 32 and 64, respectively. In contrast, cResNet-86, with a comparable number of parameters (0.21 million), has a depth of 86 and its corresponding widths are 84, 112 and 140. The increased width and depth of cResNet-86 over ResNet-20 enable it to achieve much higher accuracy on CIFAR-10. In fact, cResNet-86 has better accuracy than ResNet-56, which has four times the number of parameters, on the CIFAR-10, CIFAR-100 and SVHN datasets.

Table 3: Top-1 error rates for ResNet and cResNet. The width is the number of input channels in the three stages. Results marked with + are taken from the cited references. The results of cResNet were averaged over 5 runs and are given in the format "mean ± std".

The difference in accuracy widens significantly when the width of cResNet-86 is increased to 168-196-308 so that its number of parameters (0.84 million) is comparable to ResNet-56. In the experiments, cResNet-86 also outperforms ResNet-110, ResNet-164, and pre-activated ResNet-164, which have twice the number of parameters. cResNet-170, with 1.65 million parameters, gives the best results among all ResNets and pre-activated ResNets.

FIG. 8 shows the top-1 error rates of cResNet and ResNet on the CIFAR-10 dataset as a function of the number of parameters. Clearly, the error rate of cResNet is always lower than that of ResNet for the same number of parameters. The graph of FIG. 8 also shows that, at its lowest error rate, ResNet has about 8 times as many parameters as cResNet.

Another advantage cAdd has over eAdd is its ability to reduce overfitting. ResNet-1202 has 19.4 million parameters and, due to overfitting, has a higher error rate than ResNet-110. On the other hand, cResNet-1224, which is much wider and deeper than ResNet-1202, achieved the lowest top-1 error rate of 4.06 on CIFAR-10 (see Table 3) without overfitting, as demonstrated by its training and testing curves in FIG. 9.

The performance of WRN and cWRN was also examined experimentally. Similar to WRN, cWRN was trained over 200 epochs using a batch size of 128 (α = 7). The learning rate starts at 0.1 and is annealed by a factor of 5 after the 60th, 120th and 160th epochs for the CIFAR-10 and CIFAR-100 datasets. For the SVHN dataset, cWRN was trained over 160 epochs using a batch size of 128, and optimized by dividing the initial learning rate of 0.01 by 10 after the 80th and 120th epochs.

The results are given in Table 4. All cWRNs are much wider and deeper than the corresponding WRNs and achieve a lower top-1 error rate with fewer parameters on all three datasets. Specifically, cWRN-130-2 outperforms WRN-52-1 on all three datasets with about half the parameters (0.39 million versus 0.76 million). Overall, cWRN-88-13 gives the best performance.

Table 4: Top-1 error rates for WRN and cWRN. The width is the number of input channels in the three stages. Results for cWRN were averaged over 5 runs and are given in the format "mean ± std".

FIG. 10 shows the top-1 error rates of cWRN and WRN on the CIFAR-10 dataset as a function of the number of parameters. cWRN is shown to require about 1.4 times fewer parameters than WRN for the same error rate.

Finally, the performance of cAdd in CondenseNet was examined. All cCondenseNets (α = 6) were trained over 300 epochs using a batch size of 64 and a cosine-shaped learning rate from 0.1 to 0. cCondenseNet-254 was trained over 600 epochs using a dropout rate of 0.1 to ensure a fair comparison with CondenseNet-182.

Table 5 shows that cCondenseNet-254 gives the best performance on both CIFAR-10 and CIFAR-100. It has 456 input channels, 38 times the width of CondenseNet-182, and 254 convolutional layers, 1.4 times the depth of CondenseNet-182. cCondenseNet-146 and cCondenseNet-110 are clearly much wider and deeper, with fewer parameters, than their counterparts CondenseNet-86 and CondenseNet-50. In particular, although cCondenseNet-110 has fewer parameters than CondenseNet-50, its top-1 error rate is lower: 5.74 versus 6.22.

Table 5: Top-1 error rates for CondenseNet and cCondenseNet. The width is the number of input channels, or the growth rate, in the three stages. The results of cCondenseNet were averaged over 5 runs and are given in the format "mean ± std".

FIG. 11 compares the top-1 error rates on CIFAR-10. Clearly, for the same error rate, cCondenseNet requires about 1.4 times fewer parameters than CondenseNet.

To determine what is happening inside the neural networks, the weight norm can be used to measure the activity of neurons during feature learning. FIG. 12 shows the mean and standard deviation of the neuron weights within each convolutional layer of neural networks trained using cAdd (adapted from ResNet-26 and DenseNet-28), eAdd (ResNet-26) and cCon (DenseNet-28). As shown, neurons in the cAdd-based networks have larger weights than those in the eAdd- and cCon-based networks. This indicates that cAdd neurons are more active than eAdd and cCon neurons during feature learning. This may be because eAdd and cCon introduce a large number of learnable weights, many of which are close to zero and can be pruned without sacrificing accuracy. Using cAdd, the number of weights can be reduced, resulting in fewer parameters as set forth in Table 2, and higher accuracy as shown in Tables 3-5.
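The per-layer statistic described above could be computed as in the following sketch; whether the mean is taken over the absolute weights or the raw weights is an assumption, and the stand-in model is illustrative only.

```python
import torch.nn as nn


def conv_weight_stats(model):
    """Mean and standard deviation of the weights in each convolutional
    layer, used as a simple measure of how active its neurons are."""
    stats = []
    for name, module in model.named_modules():
        if isinstance(module, nn.Conv2d):
            w = module.weight.detach()
            stats.append((name, w.abs().mean().item(), w.std().item()))
    return stats


# The comparison in the text applies this to networks trained with cAdd,
# eAdd and cCon respectively; a stand-in model is used here.
model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Conv2d(16, 32, 3))
for name, mean, std in conv_weight_stats(model):
    print(f"{name}: mean |w| = {mean:.4f}, std = {std:.4f}")
```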

As discussed with reference to fig. 3-6, depth and width are important dimensions for the neural network to achieve higher performance. Depth controls the complexity of the learned features. Deeper neural networks may learn more complex features, while wider networks enable more features to participate in the final classification.

For cAdd-based architectures, there is the flexibility to increase the depth or the width, or both, while retaining about the same number of parameters. It is therefore useful to study the effect of the depth and width of cAdd-based architectures on their classification accuracy. To do this, ResNet-56 with 0.85 million parameters and CondenseNet-86 with 0.52 million parameters were used as baselines, and different cResNets and cCondenseNets were designed to have about the same number of parameters at different depths and widths. Table 6 shows the results on both the CIFAR-10 and CIFAR-100 datasets.

TABLE 6 Top-1 error rates for cResNet and cCondenseNet on CIFAR-10 and CIFAR-100 datasets.

As shown in table 6, the best performance was obtained when the increase in depth was balanced with the increase in width, indicating that both depth and width were equally important. This is significant because the performance of a neural network depends on the number of features and the complexity of those features.

As stated above, the channel-level additive propagation mechanism may be used to deepen and widen the neural network with significantly fewer parameters when compared to other propagation mechanisms.

The above may then be used in step 106 to train the multi-layer neural network, by passing the pre-processed fundus images to the input channels of the cAdd stage, and then passing the output of the cAdd stage to the convolutional layer of the neural network.

To facilitate access to the system, a cloud-based web system may be used that provides the one-stop interface shown in FIG. 1, allowing the user to manage research studies and the processing of digital fundus photographs for objective assessment of retinal vascular parameters. This platform is designed to meet the needs of participating users at all levels, providing a simple solution for the management of research studies across different teams of an organization. The cloud-based platform can also be accessed from most internet-enabled devices, thereby reducing geographical limitations on system usage. FIG. 1 shows the key functions of platform 108: a registered user may log into the platform via interface 110 and view and modify study details, upload images, and generate blood vessel parameter data through interfaces 112, 114, and 116.

A system for performing the method of FIG. 13 and for classifying fundus images using the method is also provided. In this regard, the present method will be understood to be performed, accessed or embodied on a system that includes a memory and a processor (which may be distributed across multiple servers), the memory including instructions that, when executed by the processor, result in performance of the method. The system may be a stand-alone system or, in a preferred embodiment, deployed on a cloud platform. A serverless computing model may also be used to build and host the neural network and the interface for accessing it. This allows the system to be instantly scalable without human intervention.

FIG. 14 is a block diagram illustrating an exemplary mobile computer device 1000 in which embodiments of the present invention may be practiced. The mobile computer device 1000 may be a mobile computer device such as a smart phone, a personal data assistant (PDA), a handheld computer, or a multimedia internet-enabled cellular telephone. For ease of description, the mobile computer device 1000 is depicted hereinafter, by way of non-limiting example, as a mobile device such as an iPhone™ manufactured by Apple™ Inc., or a mobile device manufactured by LG™, HTC™ or Samsung™.

As shown, the mobile computer device 1000 includes the following components in electronic communication via the bus 1006:

(a) a display 1002;

(b) a non-volatile (non-transitory) memory 1004;

(c) random access memory ("RAM") 1008;

(d) n processing components 1010;

(e) a transceiver component 1012 comprising N transceivers; and

(f) user controls 1014.

Although the components depicted in FIG. 14 represent physical components, FIG. 14 is not intended to be a hardware diagram. Thus, many of the components depicted in FIG. 14 may be realized by common constructs or distributed among additional physical components. Furthermore, it is of course contemplated that the functional components described with reference to FIG. 14 may be implemented with other existing and yet-to-be-developed physical components and architectures.

The display 1002 generally operates to provide a representation of content to a user and may be implemented with any of a wide variety of displays (e.g., CRT, LCD, HDMI, pico projector, and OLED display).

Generally, the non-volatile data storage 1004 (also referred to as non-volatile memory) functions to store (e.g., persistently store) data and executable code.

For example, in some embodiments, the non-volatile memory 1004 includes boot loader code, modem software, operating system code, file system code, and code to facilitate implementation of components that are not depicted or described for simplicity, as is well known to those of ordinary skill in the art.

In many embodiments, the non-volatile memory 1004 is implemented by a flash memory (e.g., a NAND or ONENAND memory), although it is, of course, contemplated that other memory types could also be utilized. Although it is possible to execute code from the non-volatile memory 1004, executable code in the non-volatile memory 1004 is typically loaded into RAM 1008 and executed by one or more of the N processing components 1010.

The N processing components 1010 coupled to the RAM 1008 generally operate to execute instructions stored in the non-volatile memory 1004. Those of ordinary skill in the art will appreciate that the N processing elements 1010 may include video processors, modem processors, DSPs, Graphics Processing Units (GPUs), and other processing elements.

The transceiver component 1012 includes N transceiver chains that may be used to communicate with external devices via a wireless network. Each of the N transceiver chains may represent a transceiver associated with a particular communication scheme. For example, each transceiver may correspond to a protocol specific to a local area network, a cellular network (e.g., a CDMA network, a GPRS network, a UMTS network), and other types of communication networks.

It should be appreciated that FIG. 14 is merely exemplary and that, in one or more exemplary embodiments, the functions described herein may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code encoded on the non-transitory computer-readable medium 1004. Non-transitory computer-readable media 1004 include computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a computer.

In some embodiments, the mobile computer device 1000 is embodied by a wearable device such as a smart Watch (e.g., Apple Watch) or a fitness tracker (e.g., FitBit). Alternatively, the mobile computer device 1000 is connected with a smart watch or fitness tracker.

Embodiments of the present process may have particular industrial applications, for example:

as an automated tool for measuring retinal vessel calibre in large population-based studies.

As an automated assistant for clinicians and scorers to obtain the second opinion.

As an independent on-demand risk assessment for cardiovascular disease via the internet.

It will be understood that many further modifications and permutations of the various aspects of the described embodiments are possible. Accordingly, the described aspects are intended to embrace all such alterations, modifications and variations that fall within the spirit and scope of the appended claims.

Throughout this specification and the claims which follow, unless the context requires otherwise, the word "comprise", and variations such as "comprises" and "comprising", will be understood to imply the inclusion of a stated integer or step or group of integers or steps but not the exclusion of any other integer or step or group of integers or steps.

The reference in this specification to any prior publication (or information derived from it), or to any matter which is known, is not, and should not be taken as an acknowledgment or admission or any form of suggestion that prior publication (or information derived from it) or known matter forms part of the common general knowledge in the field of endeavour to which this specification relates.
