Data processing system and data processing method

Document No.: 1220310 Publication date: 2020-09-04

Abstract (created by Yoichi Yaguchi, 2018-01-16): A data processing system (100) is provided with a learning unit that optimizes an optimization target parameter of a neural network based on a comparison between output data, output by performing neural-network-based processing on learning data, and the ideal output data for that learning data. The activation function f(x) of the neural network is a function such that, when the 1st parameter is C and the 2nd parameter taking a non-negative value is W, the output value for an input value continuously takes values within the range C ± W, the output value for an input value is uniquely determined, and the graph of the function is point-symmetric about the point corresponding to f(x) = C. The learning unit optimizes the 1st parameter and the 2nd parameter as optimization target parameters.

1. A data processing system, characterized in that

the data processing system includes a learning unit that optimizes an optimization target parameter of a neural network based on a comparison between output data, output by performing neural-network-based processing on learning data, and the ideal output data for that learning data,

the activation function f(x) of the neural network is a function such that, when the 1st parameter is C and the 2nd parameter taking a non-negative value is W, the output value for an input value continuously takes values within the range C ± W, the output value for an input value is uniquely determined, and the graph of the function is point-symmetric about the point corresponding to f(x) = C, and

the learning unit sets an initial value of the 1st parameter to 0, and optimizes the 1st parameter and the 2nd parameter as optimization target parameters.

2. The data processing system of claim 1,

the activation function f (x) is represented by

[ numerical formula 1]

f(x) = max((C - W), min((C + W), x)).

3. The data processing system of claim 1,

the activation function f (x) is represented by

[ numerical formula 2]

4. The data processing system according to any one of claims 1 to 3,

the neural network is a convolutional neural network having the 1st parameter and the 2nd parameter, the 1st parameter and the 2nd parameter being independent for each component.

5. The data processing system of claim 4,

the component is a channel.

6. The data processing system of any one of claims 1 to 5,

the learning unit, when the 2nd parameter is equal to or less than a predetermined threshold value, does not execute arithmetic processing that affects only the output of the activation function.

7. A data processing method, characterized in that the data processing method has the steps of:

outputting output data corresponding to the learning data by performing neural network-based processing on the learning data; and

optimizing an optimization target parameter of the neural network based on a comparison between output data corresponding to learning data and ideal output data for the learning data,

the activation function f(x) of the neural network is a function such that, when the 1st parameter is C and the 2nd parameter taking a non-negative value is W, the output value for an input value continuously takes values within the range C ± W, the output value for an input value is uniquely determined, and the graph of the function is point-symmetric about the point corresponding to f(x) = C,

the initial value of the 1 st parameter is set to 0,

in the step of optimizing the optimization target parameter, the 1st parameter and the 2nd parameter are optimized as optimization target parameters.

8. A program for causing a computer to realize a function of optimizing an optimization target parameter of a neural network based on a comparison between output data, output by performing neural-network-based processing on learning data, and the ideal output data for that learning data,

the activation function f(x) of the neural network is a function such that, when the 1st parameter is C and the 2nd parameter taking a non-negative value is W, the output value for an input value continuously takes values within the range C ± W, the output value for an input value is uniquely determined, and the graph of the function is point-symmetric about the point corresponding to f(x) = C,

in the function of optimizing the optimization target parameter, the initial value of the 1st parameter is set to 0, and the 1st parameter and the 2nd parameter are optimized as optimization target parameters.

Technical Field

The present invention relates to a data processing system and a data processing method.

Background

A neural network is a mathematical model including one or more nonlinear units, and is a machine learning model that predicts an output corresponding to an input. Most neural networks have one or more intermediate layers (hidden layers) in addition to the input layer and the output layer. The output of each intermediate layer becomes the input of the next layer (the next intermediate layer or the output layer). Each layer of the neural network generates an output from its input and its own parameters.

Disclosure of Invention

Problems to be solved by the invention

It is desirable to achieve learning with higher accuracy and more stability.

The present invention has been made in view of such circumstances, and an object thereof is to provide a technique capable of realizing learning with higher accuracy and more stability.

Means for solving the problems

In order to solve the above problem, a data processing system according to an aspect of the present invention includes a learning unit that optimizes an optimization target parameter of a neural network based on a comparison between output data, output by performing neural-network-based processing on learning data, and the ideal output data for that learning data. The activation function f(x) of the neural network is a function such that, when the 1st parameter is C and the 2nd parameter taking a non-negative value is W, the output value for an input value continuously takes values within the range C ± W, the output value for an input value is uniquely determined, and the graph of the function is point-symmetric about the point corresponding to f(x) = C. The learning unit optimizes the 1st parameter and the 2nd parameter as optimization target parameters.

Another embodiment of the present invention is a data processing method. The method includes the steps of: outputting output data corresponding to learning data by performing neural-network-based processing on the learning data; and optimizing an optimization target parameter of the neural network based on a comparison between the output data corresponding to the learning data and the ideal output data for that learning data. The activation function f(x) of the neural network is a function such that, when the 1st parameter is C and the 2nd parameter taking a non-negative value is W, the output value for an input value continuously takes values within the range C ± W, the output value for an input value is uniquely determined, and the graph of the function is point-symmetric about the point corresponding to f(x) = C. In the step of optimizing the optimization target parameter, the 1st parameter and the 2nd parameter are optimized as optimization target parameters.

In addition, any combination of the above constituent elements, and any conversion of the expressions of the present invention among a method, an apparatus, a system, a recording medium, a computer program, and the like, are also valid as aspects of the present invention.

Advantageous Effects of the Invention

According to the present invention, learning with higher accuracy and more stability can be realized.

Drawings

FIG. 1 is a block diagram illustrating the function and structure of a data processing system of an embodiment.

Fig. 2 is a diagram showing a flowchart of the learning process performed by the data processing system.

Fig. 3 is a diagram showing a flowchart of application processing performed by the data processing system.

Detailed Description

The present invention will be described below with reference to the accompanying drawings according to preferred embodiments.

Before the embodiments are explained, basic knowledge and findings will be explained. In gradient-based learning, it is known that when the average of the inputs supplied to a given layer of a neural network deviates from zero, learning is delayed because the direction of the weight update is correspondingly biased.

On the other hand, using the ReLU function as the activation function alleviates the vanishing-gradient problem that makes deep neural networks difficult to train. Deep neural networks made trainable in this way, with their improved expressive power, achieve high performance in a variety of tasks including image classification. Since the gradient of the ReLU function for positive inputs is always 1, it alleviates the vanishing gradients that occur with, for example, a sigmoid function, whose gradient for inputs of large absolute value is always much smaller than 1. However, the output of the ReLU function is non-negative, and its average deviates significantly from zero. Therefore, the average of the input to the next layer deviates from zero, and learning may be delayed.
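This bias can be illustrated numerically with a minimal NumPy sketch (an illustration added here, not part of the patent): even for zero-mean inputs, the mean of the ReLU output is markedly positive.

```python
import numpy as np

def relu(x):
    # ReLU: passes positive inputs unchanged, zeroes out negative inputs.
    return np.maximum(x, 0.0)

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)  # inputs with (approximately) zero mean
y = relu(x)

# For standard normal inputs, E[max(X, 0)] = 1/sqrt(2*pi) ~ 0.399, so the
# mean of the next layer's input deviates well above zero.
```

The output mean near 0.4 is exactly the deviation the text describes: every downstream layer sees a positively biased input.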

The Leaky ReLU, PReLU, RReLU, and ELU functions have been proposed with nonzero gradients for negative inputs, but the average of the output of all of these functions is still greater than zero. The CReLU and NCReLU functions concatenate ReLU(x) and ReLU(-x) along the channel dimension in convolutional deep learning, and the BReLU function inverts the sign of half of the channels, making the average of the entire layer zero; however, these do not solve the problem that the average of each individual channel deviates from zero. Furthermore, they cannot be applied to neural networks that have no concept of a channel.

The Nonlinear Generator (NG) is defined as f(x) = max(x, a) (a is a parameter). If a ≤ min(x), it becomes an identity map; therefore, in a neural network initialized so that the average of the inputs of each layer is zero, the average of the outputs of each layer is also zero. Experimental results show that, with such initialization, learning converges even when the average later deviates from zero, indicating that a zero average is important chiefly as the initial point of learning. Here, when the initial value a0 of a is too small, convergence takes a very long time to start, so a0 ≈ min(x0) (x0 being the initial value of x) is preferable. However, the computational graph structures of recent neural networks are complicated, and it is difficult to provide an appropriate initial value.

Batch Normalization (BN) normalizes the mean and variance over a mini-batch and makes the mean of the output zero, thereby speeding up learning. However, it has recently been reported that when a shift (offset) is applied in an arbitrary layer of a neural network, the regularity of the neural network cannot be guaranteed, and low-accuracy local solutions exist.

Therefore, in order to realize learning with higher accuracy and more stability, that is, to solve the learning-delay problem, the vanishing-gradient problem, the initial-value problem, and the low-accuracy local-solution problem, an activation function is required that does not depend on the initial value of the input, has no offset, so that its output average is zero in the initial state of the neural network, and has a sufficiently large gradient (close to 1) over a sufficiently wide range.

In the following, a case where the data processing device is applied to image processing will be described as an example, but it can be understood by those skilled in the art that the data processing device can also be applied to voice recognition processing, natural language processing, and other processing.

Fig. 1 is a block diagram illustrating the function and structure of a data processing system 100 of an embodiment. The blocks shown here can be implemented in hardware by devices and machinery typified by a computer's CPU (central processing unit), and in software by computer programs and the like; here, functional blocks realized by their cooperation are depicted. Those skilled in the art will understand that these functional blocks can be realized in various forms by combinations of hardware and software.

The data processing system 100 executes a "learning process", in which a neural network is trained based on learning images and ground-truth values that are the ideal output data for those images, and an "application process", in which the learned neural network is applied to an image to perform image processing such as image classification, object detection, or image segmentation.

In the learning process, the data processing system 100 executes neural-network-based processing on a learning image and outputs output data for that image. The data processing system 100 then updates the parameters of the neural network that are the target of optimization (learning) (hereinafter referred to as "optimization target parameters") so that the output data approaches the ground-truth value. By repeating this, the optimization target parameters are optimized.

In the application process, the data processing system 100 executes neural-network-based processing on an image using the optimization target parameters optimized in the learning process, and outputs output data for that image. The data processing system 100 interprets the output data and performs image classification, object detection, or image segmentation on the image.

The data processing system 100 includes an acquisition unit 110, a storage unit 120, a neural network processing unit 130, a learning unit 140, and an interpretation unit 150. The function of the learning process is mainly realized by the neural network processing unit 130 and the learning unit 140, and the function of the application process is mainly realized by the neural network processing unit 130 and the interpretation unit 150.

In the learning process, the acquisition unit 110 acquires a plurality of learning images and the ground-truth values corresponding to those images at a time. In the application process, the acquisition unit 110 acquires an image to be processed. The number of channels of the image does not matter; it may be, for example, an RGB image or a grayscale image.

The storage unit 120 stores the images acquired by the acquisition unit 110, and also serves as a work area for the neural network processing unit 130, the learning unit 140, and the interpretation unit 150, and as a storage area for the parameters of the neural network.

The neural network processing unit 130 executes processing based on a neural network. The neural network processing unit 130 includes an input layer processing unit 131 that executes processing corresponding to each component (component) of the input layer of the neural network, an intermediate layer processing unit 132 that executes processing corresponding to each component of each of 1 or more intermediate layers (hidden layers), and an output layer processing unit 133 that executes processing corresponding to each component of the output layer.

The intermediate layer processing unit 132 executes, as the processing for each component of each intermediate layer, activation processing that applies the activation function to input data from the preceding layer (the input layer or the preceding intermediate layer). The intermediate layer processing unit 132 may also execute convolution processing, thinning processing, and other processing in addition to the activation processing.

The activation function is given by the following equation (1).

[ numerical formula 1]

f(x_c) = max((C_c - W_c), min((C_c + W_c), x_c)) …(1)

Here, C_c is a parameter indicating the center value of the output values (hereinafter referred to as the "center value parameter"), and W_c is a parameter taking a non-negative value (hereinafter referred to as the "width parameter"). The center value parameter C_c and the width parameter W_c are set independently for each component. A component is, for example, a channel of the input data, a coordinate of the input data, or the input data itself.

That is, the activation function of the present embodiment is a function such that the output value for an input value continuously takes values within the range C ± W, the output value for an input value is uniquely determined, and its graph is point-symmetric about the point corresponding to f(x) = C. Therefore, as described later, when for example "0" is set as the initial value of the center value parameter C_c, the average of the output, that is, the average of the input to the next layer, is clearly zero in the initial stage of learning.
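Expression (1) can be sketched in NumPy as follows (an illustrative sketch added here, with a hypothetical function name, not the patent's implementation):

```python
import numpy as np

def clipped_identity(x, C, W):
    # Expression (1): f(x_c) = max((C_c - W_c), min((C_c + W_c), x_c)).
    # C and W broadcast over x; W is assumed non-negative.
    return np.maximum(C - W, np.minimum(C + W, x))

# With the initial values C = 0 and W = 1 of the embodiment, the function is
# the identity on [-1, 1] and saturates outside it, so a zero-mean, symmetric
# input distribution yields a zero-mean output at the start of learning.
x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
y = clipped_identity(x, C=0.0, W=1.0)
```

The symmetric inputs above map to symmetric outputs, so the output average is exactly zero, matching the point-symmetry property described in the text.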

The output layer processing unit 133 performs an operation combining, for example, a softmax function, a sigmoid function, a cross entropy function, and the like.

The learning unit 140 optimizes the optimization target parameters of the neural network. The learning unit 140 calculates an error using an objective function (error function) that compares the output obtained by inputting a learning image to the neural network processing unit 130 with the ground-truth value corresponding to that image. Based on the calculated error, the learning unit 140 calculates gradients with respect to the parameters by the gradient backpropagation method or the like, and updates the optimization target parameters of the neural network by the momentum method, as described in non-patent document 1. In the present embodiment, the optimization target parameters include the center value parameter C_c and the width parameter W_c in addition to the weight coefficients and biases. For example, "0" is set as the initial value of the center value parameter C_c, and "1" is set as the initial value of the width parameter W_c.

The processing performed by the learning unit 140 will now be specifically described, taking as an example the case of updating the center value parameter C_c and the width parameter W_c.

The learning unit 140 calculates, by the gradient backpropagation method, the gradients of the objective function E with respect to the center value parameter C_c and the width parameter W_c of the neural network, using the following expressions (2) and (3).

[numerical formula 2]

∂E/∂C_c = Σ_i (∂E/∂f(x_c,i)) · (∂f(x_c,i)/∂C_c) …(2)

[numerical formula 3]

∂E/∂W_c = Σ_i (∂E/∂f(x_c,i)) · (∂f(x_c,i)/∂W_c) …(3)

Here, ∂E/∂f(x_c) is the gradient backpropagated from the subsequent layer.

The learning unit 140 calculates, using the following expressions (4), (5), and (6), the gradients of the output with respect to the input x_c, the center value parameter C_c, and the width parameter W_c in each component of each intermediate layer.

[numerical formula 4]

∂f(x_c)/∂x_c = 1 (C_c - W_c < x_c < C_c + W_c), 0 (otherwise) …(4)

[numerical formula 5]

∂f(x_c)/∂C_c = 0 (C_c - W_c < x_c < C_c + W_c), 1 (otherwise) …(5)

[numerical formula 6]

∂f(x_c)/∂W_c = -1 (x_c ≤ C_c - W_c), 0 (C_c - W_c < x_c < C_c + W_c), 1 (x_c ≥ C_c + W_c) …(6)
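The piecewise gradients of expressions (4) to (6) follow directly from the piecewise definition of expression (1) and can be checked numerically. The sketch below (illustrative, with hypothetical helper names) compares them against central finite differences at points away from the kinks:

```python
import numpy as np

def clipped_identity(x, C, W):
    # Expression (1).
    return np.maximum(C - W, np.minimum(C + W, x))

def local_grads(x, C, W):
    # Expressions (4)-(6): piecewise gradients of f with respect to x, C, W.
    inside = (x > C - W) & (x < C + W)
    below = x <= C - W
    above = x >= C + W
    df_dx = inside.astype(float)                        # (4)
    df_dC = (below | above).astype(float)               # (5)
    df_dW = above.astype(float) - below.astype(float)   # (6)
    return df_dx, df_dC, df_dW

# Central finite differences at points away from the kinks of f.
x = np.array([-3.0, 0.2, 3.0])
C, W, eps = 0.0, 1.0, 1e-6
df_dx, df_dC, df_dW = local_grads(x, C, W)
num_dC = (clipped_identity(x, C + eps, W) - clipped_identity(x, C - eps, W)) / (2 * eps)
num_dW = (clipped_identity(x, C, W + eps) - clipped_identity(x, C, W - eps)) / (2 * eps)
```

Inside the band the function is the identity (gradient 1 with respect to x, 0 with respect to C and W); in the saturated regions the output tracks C ∓ W, giving gradient 1 with respect to C and ±1 with respect to W.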

Based on the calculated gradients, the learning unit 140 updates the center value parameter C_c and the width parameter W_c by the momentum method (expressions (7) and (8)).

[number formula 7]

ΔC_c ← μ · ΔC_c - η · ∂E/∂C_c,  C_c ← C_c + ΔC_c …(7)

[number formula 8]

ΔW_c ← μ · ΔW_c - η · ∂E/∂W_c,  W_c ← W_c + ΔW_c …(8)

where

μ: momentum

η: learning rate

For example, μ = 0.9 and η = 0.1 are set.

If W_c < 0 after the update, the learning unit 140 further updates it to W_c = 0.
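The momentum update of expressions (7) and (8), followed by the clipping of W_c back to a non-negative value, can be sketched as follows (illustrative code with μ = 0.9 and η = 0.1 as in the embodiment; the function name and the example gradient values are assumptions):

```python
def momentum_step(param, grad, velocity, mu=0.9, eta=0.1):
    # Expressions (7)/(8): v <- mu * v - eta * grad; param <- param + v.
    velocity = mu * velocity - eta * grad
    return param + velocity, velocity

# One update of the center value and width parameters of a single component.
C, vC = 0.0, 0.0    # initial center value used in the embodiment
W, vW = 0.05, 0.0   # a small width, about to be driven negative
C, vC = momentum_step(C, grad=0.2, velocity=vC)
W, vW = momentum_step(W, grad=1.0, velocity=vW)
W = max(W, 0.0)     # if W_c < 0 after the update, clip it back to 0
```

The clipping on the last line is the extra rule stated above: it preserves the claim that the 2nd parameter is non-negative.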

The acquisition of learning images by the acquisition unit 110, the neural network processing on those images by the neural network processing unit 130, and the updating of the optimization target parameters by the learning unit 140 are repeated, whereby the optimization target parameters are optimized.

The learning unit 140 determines whether or not learning should be ended. End conditions are, for example, that learning has been performed a predetermined number of times, that an end instruction has been received from the outside, that the average of the update amounts of the optimization target parameters has reached a predetermined value, or that the calculated error falls within a predetermined range. When an end condition is satisfied, the learning unit 140 ends the learning process. When no end condition is satisfied, the learning unit 140 returns the processing to the neural network processing unit 130.

The interpretation unit 150 interprets the output from the output layer processing unit 133, and performs image classification, object detection, or image segmentation.

The operation of the data processing system 100 according to the embodiment will be described.

Fig. 2 shows a flowchart of the learning process performed by the data processing system 100. The acquisition unit 110 acquires a plurality of learning images (S10). The neural network processing unit 130 executes processing by the neural network on each of the learning images acquired by the acquisition unit 110, and outputs output data for each of them (S12). The learning unit 140 updates the parameters based on the output data for each of the learning images and the ground-truth values for each of them (S14). In this parameter update, the center value parameter C_c and the width parameter W_c are updated as optimization target parameters in addition to the weight coefficients and biases. The learning unit 140 determines whether or not the end condition is satisfied (S16). If the end condition is not satisfied (S16: NO), the process returns to S10. When the end condition is satisfied (S16: YES), the process ends.
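The loop of S10 to S16 can be illustrated on a one-component toy problem. The sketch below is purely illustrative (full-batch plain gradient descent without momentum, synthetic 1-D data, not the patent's implementation): it learns C_c and W_c of expression (1) so that the clipping band matches a clipped target.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-3.0, 3.0, size=256)   # "learning data"
target = np.clip(x, -0.5, 1.5)         # ideal output data: band center 0.5, width 1.0

C, W = 0.0, 1.0                        # initial values of the embodiment
eta = 0.1
for _ in range(200):
    y = np.maximum(C - W, np.minimum(C + W, x))   # expression (1)
    err = y - target                              # gradient of 0.5 * MSE
    below = x <= C - W
    above = x >= C + W
    gC = np.mean(err * (below | above))           # chain rule with expression (5)
    gW = np.mean(err * (above.astype(float) - below.astype(float)))  # with (6)
    C -= eta * gC
    W = max(W - eta * gW, 0.0)                    # keep the width non-negative
```

In this toy setting the band edges C - W and C + W converge to the target's clipping points -0.5 and 1.5, i.e. C ≈ 0.5 and W ≈ 1.0.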

Fig. 3 shows a flowchart of the application process performed by the data processing system 100. The acquisition unit 110 acquires an image to be processed (S20). The neural network processing unit 130 executes, on the image acquired by the acquisition unit 110, processing based on the learned neural network whose optimization target parameters have been optimized, and outputs output data (S22). The interpretation unit 150 interprets the output data and performs image classification, object detection, or image segmentation on the target image (S24).

According to the data processing system 100 of the embodiment described above, the output of the activation function does not depend on the initial value of the input, there is no offset, the output average is zero in the initial state of the neural network, and the gradient is 1 over a fixed range of the domain. This makes it possible to speed up learning, maintain gradients, alleviate initial-value dependency, and avoid low-accuracy local solutions.

The present invention has been described above based on an embodiment. Those skilled in the art will understand that this embodiment is an example, that various modifications of the combinations of its components and processes are possible, and that such modifications are also within the scope of the present invention.

(modification 1)

In the embodiment, the case where the activation function is given by expression (1) has been described, but the activation function is not limited thereto. It may be any function such that the output value for an input value continuously takes values within the range C ± W, the output value for an input value is uniquely determined, and its graph is point-symmetric about the point corresponding to f(x) = C. For example, the activation function can also be given by the following expression (9) instead of expression (1).

[ numerical formula 9]


In this case, the gradients are given by the following expressions (10), (11), and (12) instead of expressions (4), (5), and (6).

[ numerical formula 10]

[ numerical formula 11]

[ numerical formula 12]

According to this modification, the same operational effects as those of the embodiment can be exhibited.

(modification 2)

Although not particularly mentioned in the embodiment, when the width parameter W of the activation function of a certain component is equal to or less than a predetermined threshold value, the output based on that activation function is so small that it can be considered not to affect the application process. Therefore, in that case, arithmetic processing that affects only the output of that activation function need not be executed. That is, the arithmetic processing based on the activation function, and the arithmetic processing whose output feeds only that component, may be skipped; such components may even be deleted, component by component. In this case, since unnecessary arithmetic processing is not executed, processing can be sped up and memory consumption reduced.
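This width-based skipping can be sketched as a pruning mask (illustrative code; the helper name and threshold value are assumptions, not from the patent):

```python
import numpy as np

def prune_by_width(C, W, threshold=1e-3):
    """Return the components worth keeping.

    A component whose width parameter W_c is at or below the threshold
    outputs (almost) the constant C_c, so the computation that feeds only
    that component can be skipped; its near-constant contribution could be
    folded into the next layer's bias instead.
    """
    keep = W > threshold
    return C[keep], W[keep], keep

C = np.array([0.0, 0.3, -0.1, 0.2])
W = np.array([1.0, 0.0, 0.5, 1e-6])
C_kept, W_kept, keep = prune_by_width(C, W)
```

Here the second and fourth components fall under the threshold and are dropped, halving the per-layer work in this toy example.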

Description of the reference symbols

100: a data processing system; 130: a neural network processing unit; 140: a learning unit.

Industrial applicability

The present invention relates to a data processing system and a data processing method.
