Sample feature processing method and device, electronic equipment and storage medium

Document No.: 153947  Publication date: 2021-10-26

Note: This technology, "Sample feature processing method and device, electronic equipment and storage medium" (一种样本特征的处理方法、装置、电子设备及存储介质), was designed and created by Wu Xiaolin (吴晓琳) on 2021-04-27. Its main content includes: The embodiments of the present disclosure disclose a method for processing sample features, the method comprising: acquiring first reference data in input data of an activation function in a neural network model, wherein the input data comprises feature data of sample features to be input into the activation function; inputting the input data into the activation function to obtain output data of the activation function; determining, based on the first reference data, second reference data in the output data corresponding to the first reference data; and quantizing the output data based on the second reference data. Compared with obtaining the second reference data from the output data only after the output data is obtained, without using the first reference data, determining the second reference data in this way takes less time and is more efficient. Quantizing the output data based on the second reference data thus improves operation speed and reduces the computing resources consumed.

1. A method for processing sample features, the method comprising:

acquiring first reference data in input data of an activation function in a neural network model; wherein the input data comprises feature data of sample features to be input into the activation function;

inputting the input data into the activation function to obtain output data of the activation function;

determining second reference data corresponding to the first reference data in the output data based on the first reference data;

quantizing the output data based on the second reference data.

2. The method of claim 1, wherein the first reference data and the output data are acquired simultaneously.

3. The method of claim 1, wherein the determining second reference data in the output data corresponding to the first reference data based on the first reference data comprises:

after the first reference data is input into the activation function, determining output data output by the activation function as the second reference data.

4. The method of claim 3, wherein the first reference data comprises a maximum value and a minimum value in the input data.

5. The method of claim 1, wherein the activation function is a monotonically increasing function or a monotonically decreasing function.

6. The method of claim 5, wherein the first reference data comprises a maximum value and a minimum value in the input data; the second reference data comprises a maximum value and a minimum value in the output data.

7. The method of claim 6, wherein the bit width of the output data is a first bit width; the quantizing the output data based on the second reference data comprises:

inputting the output data into a quantization mapping function to obtain quantized output data;

wherein the quantization mapping function is configured to quantize data from the first bit width to a second bit width; the first bit width is greater than the second bit width; and the quantization reference points of the quantization mapping function comprise a maximum value and a minimum value in the second reference data.

8. An apparatus for processing sample features, comprising an obtaining module, a determining module, and a quantization module; wherein:

the obtaining module is configured to: acquiring first reference data in input data of an activation function in a neural network model; wherein the input data comprises feature data of sample features to be input into the activation function; inputting the input data into the activation function to obtain output data of the activation function;

the determining module is configured to: determining second reference data corresponding to the first reference data in the output data based on the first reference data;

the quantization module is configured to: quantizing the output data based on the second reference data.

9. The apparatus of claim 8, wherein the obtaining module is further configured to acquire the first reference data and the output data simultaneously.

10. The apparatus of claim 8, wherein the determining module is further configured to:

after the first reference data is input into the activation function, determining output data output by the activation function as the second reference data.

11. The apparatus of claim 8, wherein the first reference data comprises a maximum value and a minimum value in the input data.

12. The apparatus of claim 8, wherein the activation function is a monotonically increasing function or a monotonically decreasing function.

13. The apparatus of claim 12, wherein the first reference data comprises a maximum value and a minimum value in the input data; the second reference data comprises a maximum value and a minimum value in the output data.

14. The apparatus of claim 13, wherein the bit width of the output data is a first bit width; the quantization module is further configured to: input the output data into a quantization mapping function to obtain quantized output data; wherein the quantization mapping function is configured to quantize data from the first bit width to a second bit width; the first bit width is greater than the second bit width; and the quantization reference points of the quantization mapping function comprise a maximum value and a minimum value in the second reference data.

15. An electronic device, characterized in that the electronic device comprises: a processor and a memory for storing a computer program capable of running on the processor, wherein the processor is configured to implement the method of any one of claims 1 to 7 when running the computer program.

16. A storage medium having computer-executable instructions embodied therein, the computer-executable instructions being executable by a processor to implement the method of any one of claims 1 to 7.

Technical Field

The present disclosure relates to, but is not limited to, the field of wireless communications technologies, and in particular to a method and an apparatus for processing sample features, an electronic device, and a storage medium.

Background

With the development of computer technology, deep learning has been widely used in many fields such as computer vision, natural language processing, and speech processing. However, deep learning models are becoming larger and deeper, with larger parameter scales and correspondingly slower inference speeds.

In the related art, this growth of deep learning models poses great challenges to the electronic equipment that runs them. Mobile terminal devices in particular are limited by their own hardware resources, making it difficult to run a large deep learning model directly; operation is slow and processing efficiency is low.

Disclosure of Invention

The embodiment of the disclosure discloses a sample characteristic processing method and device, electronic equipment and a storage medium.

According to a first aspect of the embodiments of the present disclosure, there is provided a method for processing sample features, the method including:

acquiring first reference data in input data of an activation function in a neural network model; wherein the input data comprises feature data of sample features to be input into the activation function;

inputting the input data into the activation function to obtain output data of the activation function;

determining second reference data corresponding to the first reference data in the output data based on the first reference data;

quantizing the output data based on the second reference data.

In one embodiment, the first reference data and the output data are acquired simultaneously.

In one embodiment, after the first reference data is input into the activation function, the output data output by the activation function is determined as the second reference data.

In one embodiment, the first reference data includes:

a maximum value and a minimum value in the input data.

In one embodiment, the activation function is a monotonically increasing function or a monotonically decreasing function.

In one embodiment, the bit width of the output data is a first bit width; the quantizing the output data based on the second reference data comprises: inputting the output data into a quantization mapping function to obtain quantized output data; wherein the quantization mapping function is configured to quantize data from the first bit width to a second bit width; the first bit width is greater than the second bit width; and the quantization reference points of the quantization mapping function include a maximum value and a minimum value in the second reference data.

According to a second aspect of the embodiments of the present disclosure, there is provided a sample feature processing apparatus, including an obtaining module, a determining module, and a quantizing module; wherein:

the obtaining module is configured to: acquiring first reference data in input data of an activation function in a neural network model; wherein the input data comprises feature data of sample features to be input into the activation function; inputting the input data into the activation function, and acquiring output data of the activation function;

the determining module is configured to: determining second reference data corresponding to the first reference data in the output data based on the first reference data;

the quantization module is configured to: quantizing the output data based on the second reference data.

In one embodiment, the obtaining module is further configured to acquire the first reference data and the output data simultaneously.

In one embodiment, the determining module is further configured to:

after the first reference data is input into the activation function, the output data output by the activation function is determined as the second reference data.

In one embodiment, the first reference data includes a maximum value and a minimum value in the input data.

In one embodiment, the activation function is a monotonically increasing function or a monotonically decreasing function.

In one embodiment, the first reference data comprises a maximum value and a minimum value in the input data; the second reference data includes a maximum value and a minimum value in the output data.

In one embodiment, the bit width of the output data is a first bit width; the quantization module is further configured to input the output data into a quantization mapping function to obtain quantized output data; wherein the quantization mapping function is configured to quantize data from the first bit width to a second bit width; the first bit width is greater than the second bit width; and the quantization reference points of the quantization mapping function comprise a maximum value and a minimum value in the second reference data.

According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus, including:

a processor;

a memory for storing the processor-executable instructions;

wherein the processor is configured to: when the executable instructions are executed, the method of any embodiment of the present disclosure is implemented.

According to a fourth aspect of embodiments of the present disclosure, there is provided a computer storage medium storing a computer executable program which, when executed by a processor, implements the method of any of the embodiments of the present disclosure.

The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:

in the embodiments of the present disclosure, first reference data in the input data of an activation function in a neural network model is obtained, where the input data comprises feature data of sample features to be input into the activation function; the input data is input into the activation function to obtain output data of the activation function; and second reference data corresponding to the first reference data in the output data is determined based on the first reference data. Compared with obtaining the second reference data from the output data only after the output data is obtained, without using the first reference data, determining the second reference data in this way takes less time and is more efficient. The output data is then quantized based on the second reference data, which improves operation speed and reduces the computing resources consumed.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.

FIG. 1 is a schematic diagram illustrating a quantization method according to an exemplary embodiment.

FIG. 2 is a schematic diagram illustrating a quantization method according to an exemplary embodiment.

FIG. 3 is a schematic diagram illustrating a quantization method according to an exemplary embodiment.

FIG. 4 is a schematic diagram illustrating a quantization method according to an exemplary embodiment.

FIG. 5 is a schematic diagram illustrating a quantization method according to an exemplary embodiment.

FIG. 6 is a flow chart illustrating a method of processing sample features in accordance with an exemplary embodiment.

FIG. 7 is a flow chart illustrating a method of processing sample features in accordance with an exemplary embodiment.

FIG. 8 is a flow chart illustrating a method of processing sample features in accordance with an exemplary embodiment.

FIG. 9 is a schematic diagram illustrating a sample feature processing device according to an exemplary embodiment.

FIG. 10 is a block diagram illustrating an electronic device in accordance with an example embodiment.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.

To facilitate understanding by those skilled in the art, the embodiments of the present disclosure exemplify a plurality of embodiments to clearly illustrate the technical aspects of the embodiments of the present disclosure. Of course, it will be understood by those skilled in the art that the embodiments provided in the present disclosure may be implemented individually, or in combination with other methods in other embodiments of the present disclosure, or in combination with other methods in other related arts; the disclosed embodiments are not limited thereto.

To facilitate an understanding of any embodiment of the present disclosure, first, a neural network model in deep learning is explained.

A neural network model comprises a large number of parameters. During training, a model with higher-precision parameters performs a large number of matrix operations, which occupy more computing resources and lower training efficiency. After training is completed, running the neural network model also consumes substantial resources, and the large amount of computation generally causes long delays that cannot meet real-time requirements. The parameters of the neural network model therefore need to be quantized, reducing their bit width to improve the model's running speed.

Model quantization belongs to the category of model compression, the purpose of which is to reduce the memory footprint of the model and accelerate model inference. Quantization mainly compresses the parameters of the neural network model to be quantized: during compression, high-bit floating-point data (for example, 32-bit) that occupies a large memory space is represented by low-bit integer data (for example, 8-bit) that occupies a small memory space. In this way, without affecting the accuracy of the quantized neural network model, the interior of the model computes with a simpler numerical type, which greatly increases calculation speed and greatly reduces the computing resources consumed.

In one embodiment, to maintain precision, the trained neural network model needs to be saved using the floating-point type float32, and model quantization refers to converting high-precision float32 values into a low-precision integer representation, such as int8 or int4. Referring to fig. 1, a float32 value is converted into an int8 value. Since the representation ranges differ greatly and integers cannot represent fractional values, quantization is usually performed not by direct type conversion but by scaling or affine transformation. Referring to fig. 2, the upper row of numbers is before quantization and the lower row is after quantization. An affine transformation is usually asymmetric, i.e., the mapping of the zero point is not necessarily at zero. For example, the zero point in fig. 2 maps to the point z after quantization.

The formulas for quantization by affine transformation include:

z = -round(β·s) - 2^(b-1);

x_q = quantize(x, b, s, z) = clip(round(s·x + z), -2^(b-1), 2^(b-1) - 1);

where s is the scaling factor, z is the integer to which the zero point of float32 is mapped, α denotes the maximum value of the vector, β denotes the minimum value of the vector, b denotes the number of bits of the integer, l is the minimum value of the data to be quantized, and u is the maximum value of the data to be quantized.

In the 8-bit case,

z = -round(β·s) - 128;

z is rounded to ensure that it can be represented by an integer.
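By way of illustration, the following is a minimal NumPy sketch of the affine quantization above. The scale computation s = (2^b - 1)/(α - β) is an assumption, since the formulas above leave it implicit, and the function name and signature are hypothetical.

```python
import numpy as np

def affine_quantize(x: np.ndarray, b: int = 8):
    """Asymmetric (affine) quantization of a float tensor to b-bit integers.

    Sketch of the formulas above; the scale s = (2**b - 1) / (alpha - beta)
    is an assumption the document leaves implicit.
    """
    alpha, beta = float(x.max()), float(x.min())  # max / min of the vector
    s = (2 ** b - 1) / (alpha - beta)             # assumed scaling factor
    z = -round(beta * s) - 2 ** (b - 1)           # zero point, per the formula above
    xq = np.clip(np.round(s * x + z), -(2 ** (b - 1)), 2 ** (b - 1) - 1)
    return xq.astype(np.int8), s, z               # int8 holds any b <= 8
```

With b = 8, the zero point reduces to z = -round(β·s) - 128 and the clip range is [-128, 127], matching the 8-bit case above.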

Referring to fig. 3, scaling is a symmetric transformation, and the zero point still maps to zero.

The formula for quantization by scaling includes:

x_q = quantize(x, b, s) = clip(round(s·x), -2^(b-1) + 1, 2^(b-1) - 1);

where α₁ denotes the maximum absolute value.
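A companion sketch of the scaling quantization, under the similarly assumed scale s = (2^(b-1) - 1)/α₁:

```python
import numpy as np

def scale_quantize(x: np.ndarray, b: int = 8):
    """Symmetric (scaling) quantization: zero maps to zero, so no zero point is needed.

    The scale s = (2**(b - 1) - 1) / alpha1 is likewise an assumption.
    """
    alpha1 = float(np.abs(x).max())  # maximum absolute value in the tensor
    s = (2 ** (b - 1) - 1) / alpha1  # assumed scaling factor
    xq = np.clip(np.round(s * x), -(2 ** (b - 1)) + 1, 2 ** (b - 1) - 1)
    return xq.astype(np.int8), s
```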

Both the above affine transform quantization method and the scaling transform quantization method require the computation of the maximum and minimum values in the tensor.

In one embodiment, the tensors in a neural network can be divided into two types: weights and activations. Weights are the values obtained through training, and activations are the outputs of the activation functions. The maximum and minimum values of the weights are fixed once training is completed, whereas the values of the activations depend on the input.

In one embodiment, the maximum and minimum values of the activations can be computed in two ways: online and offline. Referring to fig. 4, in the online mode the required maximum and minimum values are calculated during neural network inference, immediately before the quantization operation, and are then used for that operation. Referring to fig. 5, in the offline mode the maximum and minimum values of the activations of samples at various positions in the network are collected and counted offline, and recorded for use during inference. Both modes have their own disadvantages: computing the maximum and minimum values online adds extra time, which weakens the benefit brought by quantized computation, while the maximum and minimum values counted offline deviate from the actual input, which may cause a loss of accuracy.

As shown in fig. 6, the present embodiment provides a method for processing sample features, where the method includes:

step 61, acquiring first reference data in input data of an activation function in the neural network model; wherein the input data comprises feature data of sample features to be input into the activation function;

step 62, inputting the input data into the activation function to obtain output data of the activation function;

step 63, determining second reference data corresponding to the first reference data in the output data based on the first reference data;

step 64, quantizing the output data based on the second reference data.

In some embodiments, the method may be applied to a terminal, which may be, but is not limited to, a computer, a mobile phone, a wearable device, a vehicle-mounted terminal, a Road Side Unit (RSU), a smart home terminal, an industrial sensing device, and/or a medical device.

In some embodiments, the neural network model may be a model that functions as at least one of: an image processing function, a natural language processing function, and a voice processing function. Here, the sample characteristic may be at least one of the following characteristics: image features of the image data sample, text features of the text data sample, and voice features of the voice data sample. The feature data may be data obtained by performing feature extraction on features of the data sample.

In one embodiment, the neural network model is an image recognition neural network model. First reference data in the input data of an activation function in the image recognition neural network model is acquired, where the input data comprises to-be-identified image feature data of image sample features to be input into the activation function; the input data is input into the activation function to obtain output data; second reference data corresponding to the first reference data in the output data is determined based on the first reference data; and the output data is quantized based on the second reference data. Here, since the second reference data for quantizing the output data is determined based on the first reference data rather than extracted from the output data by a complicated preset algorithm, the operating efficiency of the neural network model can be improved. For example, if the first reference data is the maximum and minimum values of the input data and the second reference data is the maximum and minimum values of the output data, the method can obtain the second reference data directly from the first reference data with just two operations. Compared with determining the second reference data from the output data by algorithms such as data comparison after the output data is obtained, this reduces the amount of data computation, improves the operating efficiency of the neural network model, and reduces latency.

Here, output data obtained after the first reference data is input to the activation function may be determined as second reference data corresponding to the first reference data among the output data.

In one embodiment, the neural network model includes a plurality of neuron layers, where the neuron layer may be any one of an input layer, an output layer, and an implicit layer in a neural network, or any one of the neuron layers in the implicit layer in the neural network, and is not particularly limited herein. Here, the activation function is a function that runs on the neuron layer, the activation function being used to map an input of the neuron layer to an output. In one embodiment, the activation function may be one of: sigmoid, relu, gelu and tanh. Here, the sample feature may be a sample feature corresponding to any one of an input layer, an output layer, or a hidden layer.

In one embodiment, the input data may be data obtained by weighting and summing input feature data in the neuron. The output data may be obtained after the input data is input into the activation function.

In one embodiment, the first reference data may include a maximum value and a minimum value in the input data; here, the activation function may be a monotonically increasing or decreasing function, and thus, the maximum value and the minimum value of the output data may be obtained by inputting the first reference data into the activation function. It is noted that in one scenario embodiment, the maximum and minimum values in the input data are obtained before the input data is input to the activation function. That is, max (x) and min (x) in the input data x have been obtained before the input data is input to the activation function.

In one embodiment, the activation function is a monotonically increasing function, and the maximum value of the output data of the activation function can be expressed as:

max(activation(x)) = activation(max(x)).

here, the minimum value of the output data of the activation function may be expressed as:

min(activation(x)) = activation(min(x)).

where max (x) is the maximum value in the input data; min (x) is the minimum value in the input data; max (activation (x)) is the maximum value in the output data; min (activation (x)) is the minimum value in the output data. When the output data is good, quantization needs to be performed based on the maximum value and the minimum value of the output data.

In one embodiment, the output data is quantized using affine transform quantization or scaling transform quantization. Here, the affine transformation quantization method and the scaling transformation quantization method need to quantize the output data based on the maximum value and the minimum value of the output data.

In one embodiment, the first reference data is a and b, where a is the maximum value in the input data and b is the minimum value in the input data; and inputting a and b into the activation function to obtain output data c and d, wherein c is the maximum value in the output data, and d is the minimum value in the output data. In response to the activation function being an increasing function, a corresponds to c and b corresponds to d. After c and d are determined, all output data can be quantized based on affine transformation quantization or scaling transformation quantization.

In one embodiment, the input data may be sorted according to a size relationship, and the maximum value and the minimum value of the input data are obtained after sorting the input data. For example, the input data are sorted in the order from large to small, the first input data after sorting is the maximum value, and the last input data is the minimum value; for another example, when the input data are sorted in order from small to large, the first input data after sorting is the minimum value, and the last input data is the maximum value.

In one embodiment, adjacent data in the input data may be compared one by one to determine the maximum value and the minimum value. For example, the input data includes a, b, c, d, e, f, g, h, and i, and adjacent data in a, b, c, d, e, f, g, h, and i are compared one by one to determine the maximum value and the minimum value. For example, a is compared to b, a is greater than b, then the comparison continues with b and c, and so on.

In one embodiment, the input data is divided into a plurality of groups, a maximum value and a minimum value within each group are determined, and the maximum value and the minimum value of each group are compared in size to obtain the maximum value and the minimum value. In this way, the computational complexity for obtaining the maximum and minimum values can be reduced.
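The following is a minimal sketch of this grouped reduction (the function name and default group count are illustrative); a concrete worked example follows below.

```python
import numpy as np

def grouped_minmax(x: np.ndarray, num_groups: int = 3):
    """Find the global min/max by reducing within each group, then across groups.

    The per-group reductions are independent, so they can also run in parallel.
    """
    groups = np.array_split(x, num_groups)
    group_mins = [float(g.min()) for g in groups]
    group_maxs = [float(g.max()) for g in groups]
    return min(group_mins), max(group_maxs)
```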

For example, the input data includes a, b, c, d, e, f, g, h, and i, and is divided into 3 groups: the first group is a, b, and c; the second group is d, e, and f; and the third group is g, h, and i. The maximum value of the first group is a and its minimum value is b; the maximum value of the second group is e and its minimum value is f; the maximum value of the third group is g and its minimum value is i. Comparing the sizes of a, b, e, f, g, and i, a is the maximum value and i is the minimum value, so a is determined to be the maximum value in the input data and i the minimum value in the input data.

In one embodiment, the second reference data is needed when quantizing the output data of the activation function. The value range of the second reference data may be predetermined; for example, the second reference data is a maximum value and a minimum value. In one embodiment, the value range of the first reference data may be determined according to the value range of the second reference data; for example, if the second reference data is a maximum value and a minimum value and the activation function is a monotonically increasing or monotonically decreasing function, the first reference data is also a maximum value and a minimum value. It should be noted that the second reference data may also be any value serving as a reference point for quantization of the output data.

In one embodiment, the output data to be quantized may be floating-point numbers (e.g., 32-bit floating-point numbers) and the quantized output data may be integers. The data can thus be represented by lower-bit integers (e.g., 8-bit integers) that occupy less memory space.

In one embodiment, the 32-bit floating point number in the output data may be quantized to a 4-bit integer or an 8-bit integer.

In one embodiment, the first reference data in the input data of an activation function in a neural network model may be acquired at the same time as the input data is input into the activation function to acquire the output data. In this way, after the output data is obtained, the second reference data corresponding to the first reference data in the output data can be determined quickly based on the first reference data. Compared with determining the second reference data from the output data by algorithms such as data comparison after the output data is obtained, this reduces the amount of data computation, improves the operating efficiency of the neural network model, and reduces latency.

In an embodiment, the terminal running the neural network model is a multi-core terminal. The terminal may, in parallel, acquire the first reference data in the input data of an activation function in the neural network model and input the input data into the activation function to acquire the output data, so that the output data is obtained at the same time as the first reference data. Compared with determining the second reference data from the output data by algorithms such as data comparison after the output data is obtained, this reduces the amount of data computation, improves the operating efficiency of the neural network model, and reduces latency.
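A minimal sketch of this parallel arrangement, with a thread pool standing in for the multi-core execution and relu as the (monotonically increasing) activation; all names here are illustrative assumptions.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def relu(v):
    return np.maximum(v, 0.0)

def forward_with_reference(x: np.ndarray):
    """Compute the activation output and the first reference data concurrently."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        out_f = pool.submit(relu, x)                                   # output data
        ref_f = pool.submit(lambda: (float(x.min()), float(x.max())))  # first reference data
        output = out_f.result()
        lo, hi = ref_f.result()
    # The second reference data follows from the first, since relu is monotonic:
    return output, float(relu(lo)), float(relu(hi))
```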

In one embodiment, a maximum and a minimum of the output data are needed when quantizing the output data.

In one embodiment, the terminal acquires the minimum value and the maximum value in input data of an activation function in a neural network model; inputting the input data into the activation function to obtain output data; determining a minimum value and a maximum value corresponding to the minimum value and the maximum value respectively in the output data based on the minimum value and the maximum value in the input data; quantizing the output data based on the maximum and minimum values in the output data. Wherein the activation function may be a monotonically increasing function.

In the embodiments of the present disclosure, first reference data in the input data of an activation function in a neural network model is obtained, where the input data comprises feature data of sample features to be input into the activation function; the input data is input into the activation function to obtain output data; and second reference data corresponding to the first reference data in the output data is determined based on the first reference data. Compared with obtaining the second reference data from the output data only after the output data is obtained, without using the first reference data, determining the second reference data in this way takes less time and is more efficient. The output data is then quantized based on the second reference data, which improves operation speed and reduces the computing resources consumed.

It should be noted that, as can be understood by those skilled in the art, the methods provided in the embodiments of the present disclosure can be executed alone or together with some methods in the embodiments of the present disclosure or some methods in the related art.

In one embodiment, the first reference data and the output data are acquired simultaneously.

In one embodiment, the neural network model is an image recognition neural network model. The multi-core terminal acquires, in parallel, first reference data in the input data of an activation function in the image recognition neural network model and inputs the input data into the activation function to acquire output data, where the input data comprises to-be-identified image feature data of image sample features to be input into the activation function; second reference data corresponding to the first reference data in the output data is determined based on the first reference data; and the output data is quantized based on the second reference data. Here, since the second reference data for quantizing the output data is determined based on the first reference data rather than extracted from the output data by a complicated preset algorithm, the operating efficiency of the neural network model can be improved and images can be recognized quickly. For example, if the first reference data is the maximum and minimum values of the input data and the second reference data is the maximum and minimum values of the output data, the method can obtain the second reference data directly from the first reference data with just two operations. Compared with determining the second reference data from the output data by algorithms such as data comparison after the output data is obtained, this reduces the amount of data computation, improves the operating efficiency of the neural network model, and reduces latency.

In one embodiment, the first reference data may be the maximum value and the minimum value in the input data, and the output data may be the data obtained after the input data is input into the activation function, where the output data includes a maximum value and a minimum value.

In one embodiment, the first reference data may be obtained by sorting. For example, the input data may be sorted by magnitude, and the maximum and minimum values of the input data are obtained after sorting.

In one embodiment, adjacent data in the input data may be compared one by one to determine the maximum value and the minimum value.

In one embodiment, the first reference data may be obtained by grouping followed by comparison. For example, the input data is divided into a plurality of groups, the maximum and minimum values within each group are determined, and the global maximum and minimum values are obtained by comparing the per-group maxima and minima.

Here, since the maximum value, the minimum value, and the output data are obtained simultaneously, the maximum and minimum values of the output data may be determined from the first reference data once the output data is obtained. Compared with obtaining the maximum and minimum values of the output data by magnitude comparison after the output data is obtained, this reduces the processing delay of the terminal and increases its operation speed.

It should be noted that, as can be understood by those skilled in the art, the methods provided in the embodiments of the present disclosure can be executed alone or together with some methods in the embodiments of the present disclosure or some methods in the related art.

This embodiment provides a method for processing sample features, which includes: after the first reference data is input into the activation function, determining the output data output by the activation function as the second reference data. In one embodiment, the first reference data comprises a first maximum value and a first minimum value. If the activation function is a monotonically increasing function, the second reference data comprises a second maximum value corresponding to the first maximum value and a second minimum value corresponding to the first minimum value. If the activation function is a monotonically decreasing function, the second reference data comprises a second minimum value corresponding to the first maximum value and a second maximum value corresponding to the first minimum value. It is noted that in one scenario embodiment, the maximum and minimum values in the input data are obtained before the input data is input into the activation function; that is, max(x) and min(x) of the input data x have been obtained before the input data is input into the activation function.

In one embodiment, the second reference data is obtained quickly because the first reference data is simply input into the activation function. Having obtained the second reference data early, the neural network model can quantize the output of the activation function in real time based on the second reference data: for example, each time an output result y is produced, y can be quantized immediately using the second reference data. Compared with inputting all the input data into the activation function to obtain all the output data and then sorting all the output data to obtain the second reference data, this saves quantization time and makes the quantization processing of the neural network more efficient.
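A sketch of this streaming quantization, reusing the affine 8-bit mapping from earlier; the sigmoid activation and the scale formula are again illustrative assumptions.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

x = np.random.randn(1000).astype(np.float32)           # stand-in input data
ref_min, ref_max = sigmoid(x.min()), sigmoid(x.max())  # second reference data, known up front
s = float(255.0 / (ref_max - ref_min))                 # assumed 8-bit affine scale
z = -round(float(ref_min) * s) - 128

quantized = []
for xi in x:                        # each output y is quantized as soon as it is produced
    y = sigmoid(xi)
    quantized.append(int(np.clip(round(s * float(y) + z), -128, 127)))
```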

It should be noted that, as can be understood by those skilled in the art, the methods provided in the embodiments of the present disclosure can be executed alone or together with some methods in the embodiments of the present disclosure or some methods in the related art.

In one embodiment, the first reference data comprises a maximum value and a minimum value in the input data.

As shown in fig. 7, the present embodiment provides a method for processing sample features, where the obtaining first reference data in input data of an activation function in a neural network model includes:

step 71, acquiring the maximum value and the minimum value in the input data of the activation function in the neural network model.

In one embodiment, the terminal acquires the minimum value and the maximum value in input data of an activation function in a neural network model; inputting the input data into the activation function to obtain output data; determining a minimum value and a maximum value corresponding to the minimum value and the maximum value respectively in the output data based on the minimum value and the maximum value in the input data; quantizing the output data based on the maximum and minimum values in the output data. Wherein the activation function is a monotonically increasing function. Since the activation function is a monotonically increasing function, the maximum and minimum values of the output data may be determined from the minimum value of the input data and the maximum value of the input data.

In one embodiment, the input data may be sorted according to a size relationship, and the maximum value and the minimum value of the input data are obtained after sorting the input data. For example, the input data are sorted in the order from large to small, the first input data after sorting is the maximum value, and the last input data is the minimum value; for another example, when the input data are sorted in order from small to large, the first input data after sorting is the minimum value, and the last input data is the maximum value.

In one embodiment, adjacent data in the input data may be compared one by one to determine the maximum value and the minimum value. For example, the input data includes a, b, c, d, e, f, g, h, and i, and adjacent data in a, b, c, d, e, f, g, h, and i are compared one by one to determine the maximum value and the minimum value. For example, a is compared to b, a is greater than b, then the comparison continues with b and c, and so on.

In one embodiment, the input data is divided into a plurality of groups, a maximum value and a minimum value within each group are determined, and the maximum value and the minimum value of each group are compared in size to obtain the maximum value and the minimum value. In this way, the computational complexity for obtaining the maximum and minimum values can be reduced.

For example, the input data includes a, b, c, d, e, f, g, h, and i, and is divided into 3 groups: the first group is a, b, and c; the second group is d, e, and f; and the third group is g, h, and i. The maximum value of the first group is a and its minimum value is b; the maximum value of the second group is e and its minimum value is f; the maximum value of the third group is g and its minimum value is i. Comparing the sizes of a, b, e, f, g, and i, a is the maximum value and i is the minimum value, so a is determined to be the maximum value in the input data and i the minimum value in the input data.

It should be noted that, as can be understood by those skilled in the art, the methods provided in the embodiments of the present disclosure can be executed alone or together with some methods in the embodiments of the present disclosure or some methods in the related art.

In one embodiment, the activation function is a monotonically increasing function or a monotonically decreasing function.

For example, the activation function is a monotonically increasing function. In one embodiment, the activation function may be one of: sigmoid, relu, gelu and tanh.

In one embodiment, the activation function is a monotonically increasing function; the terminal obtains the minimum value and the maximum value in input data of an activation function in the neural network model; inputting the input data into the activation function to obtain output data; determining a minimum value and a maximum value corresponding to the minimum value and the maximum value respectively in the output data based on the minimum value and the maximum value in the input data; quantizing the output data based on the maximum and minimum values in the output data.

In one embodiment, the activation function is a monotonically decreasing function; the terminal obtains the minimum value and the maximum value in input data of an activation function in the neural network model; inputting the input data into the activation function to obtain output data; determining a maximum value and a minimum value corresponding to the minimum value and the maximum value respectively in the output data based on the minimum value and the maximum value in the input data; quantizing the output data based on the maximum and minimum values in the output data.

It should be noted that, as can be understood by those skilled in the art, the methods provided in the embodiments of the present disclosure can be executed alone or together with some methods in the embodiments of the present disclosure or some methods in the related art.

As shown in fig. 8, in the present embodiment, a sample feature processing method is provided, where a bit width of the output data is a first bit width; the method comprises the following steps:

step 81, inputting the output data into a quantization mapping function to obtain quantized output data;

the quantization mapping function is used for quantizing the data with the first bit width into the data with the second bit width; the first bit width is greater than the second bit width; the quantization reference points of the quantization mapping function comprise maxima and minima in the second reference data.

Here, the second reference data may include a maximum value and a minimum value of the activation function output. It should be noted that, when the data is quantized by using the quantization mapping function, at least two quantization reference points in the data, for example, α and β in the foregoing embodiment, need to be obtained first.

Here, the first bit width may be the number of bits occupied by data before quantization. For example, 32 bits are occupied. Here, the second bit width may be the number of bits occupied by the data after quantization. For example, 8 bits are occupied.

It should be noted that, in some scenario embodiments, the second bit width may also be defined as a quantization bit width. Here, the quantization bit width may be a number of bits occupied by the output data by the quantized data. For example, if the output data is a 32-bit floating point number, and the quantized output data is 8-bit integer data, the quantization bit width is 8 bits.
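A sketch of such a quantization mapping function, with the second reference data as the quantization reference points; the scale computation is the same assumption as in the earlier affine sketch, and the function name is hypothetical.

```python
import numpy as np

def quantization_mapping(output: np.ndarray, ref_min: float, ref_max: float, b: int = 8):
    """Map output data of a first bit width (float32) to a second bit width (b bits),
    using the second reference data (ref_min, ref_max) as quantization reference points.
    """
    s = (2 ** b - 1) / (ref_max - ref_min)  # assumed scale, as in the affine sketch above
    z = -round(ref_min * s) - 2 ** (b - 1)
    q = np.clip(np.round(s * output + z), -(2 ** (b - 1)), 2 ** (b - 1) - 1)
    return q.astype(np.int8)                # int8 holds any second bit width <= 8
```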

In one embodiment, the neural network model is a speech recognition neural network model. First reference data in the input data of an activation function in the speech recognition neural network model is acquired, where the input data comprises to-be-recognized voice feature data of voice sample features to be input into the activation function; the input data is input into the activation function to obtain output data; second reference data corresponding to the first reference data in the output data is determined based on the first reference data; and the output data is quantized by affine transformation quantization based on the second reference data. Here, since the second reference data for quantizing the output data is determined based on the first reference data rather than extracted from the output data by a complicated preset algorithm, the operating efficiency of the neural network model can be improved. For example, if the first reference data is the maximum and minimum values of the input data and the second reference data is the maximum and minimum values of the output data, the method can obtain the second reference data directly from the first reference data with just two operations. Compared with determining the second reference data from the output data by algorithms such as data comparison before quantization, this reduces the amount of data computation, improves the operating efficiency of the neural network model, and reduces latency.

In one embodiment, the neural network model is a text recognition neural network model. First reference data in the input data of an activation function in the text recognition neural network model is acquired, where the input data comprises to-be-recognized character feature data of character sample features to be input into the activation function; the input data is input into the activation function to obtain output data; second reference data corresponding to the first reference data in the output data is determined based on the first reference data; and the output data is quantized by scaling transformation quantization based on the second reference data. Here, since the second reference data for quantizing the output data is determined based on the first reference data rather than extracted from the output data by a complicated preset algorithm, the operating efficiency of the neural network model can be improved. For example, if the first reference data is the maximum and minimum values of the input data and the second reference data is the maximum and minimum values of the output data, the method can obtain the second reference data directly from the first reference data with just two operations, reducing the amount of data computation, improving the operating efficiency of the neural network model, and reducing latency.

Here, the quantization mapping function is used to quantize the output data to be quantized, and different quantization bit widths may correspond to different quantization mapping functions. For example, when the quantization bit width is 4 bits, the corresponding mapping function is a first mapping function; when the quantization bit width is 8 bits, the corresponding mapping function is a second mapping function.

In one embodiment, the bit width of quantization is determined according to the operation environment of the terminal where the neural network model is located. For example, in response to the runtime environment supporting 8-bit quantization, the quantization bit width may be 8 bits; the quantization bit width may be 4 bits in response to the execution environment supporting 4-bit quantization.

In one embodiment, the output data is stored in a 32-bit floating point type data format in a storage area.

In one embodiment, quantizing the output data in response to the execution environment supporting 8-bit quantization comprises: quantizing the 32-bit output data into 8-bit output data; quantizing the output data in response to the runtime environment supporting 4-bit quantization, including: the 32-bit output data is quantized to 4-bit output data.

In one embodiment, the bit width threshold is determined according to the information accuracy requirement of the output result.

In one embodiment, the quantization bit width is greater than a bit width threshold in response to an information accuracy requirement of the output result being less than an accuracy threshold; in response to an information accuracy requirement of the output result being greater than an accuracy threshold, the quantization bit width is less than a bit width threshold.

It should be noted that quantizing the data in the neural network model to be quantized mainly compresses that data: the compression process converts data occupying more bits (for example, 32 bits) in the model into data occupying less memory space and fewer bits (for example, 8 bits), so that, without affecting the accuracy of the quantized neural network model, calculation speed is greatly increased and the computing resources consumed are greatly reduced.

It should be noted that, as can be understood by those skilled in the art, the methods provided in the embodiments of the present disclosure can be executed alone or together with some methods in the embodiments of the present disclosure or some methods in the related art.

As shown in fig. 9, the present embodiment provides a sample feature processing apparatus, which includes an obtaining module 91, a determining module 92, and a quantizing module 93; wherein the content of the first and second substances,

the obtaining module 91 is configured to: acquiring first reference data in input data of an activation function in a neural network model; wherein the input data comprises feature data of sample features to be input into the activation function; inputting the input data into the activation function to obtain output data of the activation function;

the determining module 92 is configured to: determining second reference data corresponding to the first reference data in the output data based on the first reference data;

the quantization module 93 is configured to: quantizing the output data based on the second reference data.

In one embodiment, the obtaining module 91 is further configured to acquire the first reference data and the output data simultaneously.

In one embodiment, the determining module is further configured to:

after the first reference data is input into the activation function, the output data output by the activation function is determined as the second reference data.

In one embodiment, the first reference data includes a maximum value and a minimum value in the input data.

In an embodiment, the activation function is a monotonically increasing function or a monotonically decreasing function.

In one embodiment, the first reference data comprises a maximum value and a minimum value in the input data; the second reference data includes a maximum value and a minimum value in the output data.

In one embodiment, the quantization module 93 is further configured to quantize the output data based on a quantization mapping function determined according to a quantization bit width, using the second reference data as a quantization reference point of the output data.

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

An embodiment of the present disclosure further provides a communication device, including:

an antenna;

a memory;

and the processor, connected to the antenna and the memory respectively, is configured to control the antenna to transmit and receive wireless signals by executing the executable program stored in the memory, and can execute the steps of the method provided by any of the foregoing embodiments.

The communication device provided in this embodiment may be the aforementioned terminal or a base station. The terminal can be any of various user-carried terminals or vehicle-mounted terminals. The base station may be any of various types of base stations, such as a 4G base station or a 5G base station.

The antenna may be various types of antennas, for example, a mobile antenna such as a 3G antenna, a 4G antenna, or a 5G antenna; the antenna may further include: a WiFi antenna or a wireless charging antenna, etc.

The memory may include various types of storage media, which are non-transitory computer storage media capable of continuing to memorize information stored thereon after power is removed from the communication device.

The processor may be connected to the antenna and the memory via a bus or the like for reading an executable program stored on the memory, e.g. at least one of the methods shown in any of the embodiments of the present disclosure.

The embodiments of the present disclosure also provide a non-transitory computer-readable storage medium, which stores an executable program; when the executable program is executed by a processor, the steps of the method provided in any of the foregoing embodiments are implemented, for example, at least one of the methods shown in any of the embodiments of the present disclosure.


FIG. 10 is a block diagram of an electronic device 600, according to an example embodiment. For example, the electronic device 600 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, and the like.

Referring to fig. 10, electronic device 600 may include one or more of the following components: processing component 602, memory 604, power component 606, multimedia component 608, audio component 610, input/output (I/O) interface 612, sensor component 614, and communication component 616.

The processing component 602 generally controls overall operation of the electronic device 600, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 602 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 602 can include one or more modules that facilitate interaction between the processing component 602 and other components. For example, the processing component 602 can include a multimedia module to facilitate interaction between the multimedia component 608 and the processing component 602.

The memory 604 is configured to store various types of data to support the operation of the electronic device 600. Examples of such data include instructions for any application or method operating on the electronic device 600, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 604 may be implemented by any type of volatile or non-volatile memory device, or a combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or a magnetic or optical disk.

Power supply component 606 provides power to the various components of electronic device 600. The power components 606 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 600.

The multimedia component 608 includes a screen that provides an output interface between the electronic device 600 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 608 includes a front camera and/or a rear camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 600 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have focusing and optical zoom capabilities.

The audio component 610 is configured to output and/or input audio signals. For example, the audio component 610 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 600 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 604 or transmitted via the communication component 616. In some embodiments, audio component 610 further includes a speaker for outputting audio signals.

The I/O interface 612 provides an interface between the processing component 602 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.

The sensor component 614 includes one or more sensors for providing status assessments of various aspects of the electronic device 600. For example, the sensor component 614 may detect an open/closed state of the electronic device 600 and the relative positioning of components, such as the display and keypad of the electronic device 600. The sensor component 614 may also detect a change in position of the electronic device 600 or a component of the electronic device 600, the presence or absence of user contact with the electronic device 600, the orientation or acceleration/deceleration of the electronic device 600, and a change in the temperature of the electronic device 600. The sensor component 614 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor component 614 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 614 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 616 is configured to facilitate communications between the electronic device 600 and other devices in a wired or wireless manner. The electronic device 600 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 616 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 616 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the electronic device 600 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.

In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 604 comprising instructions, executable by the processor 820 of the electronic device 600 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.
