Machine learning device, control system, and machine learning method

Document No.: 1415483    Publication date: 2020-03-10

Note: this technology, "Machine learning device, control system, and machine learning method", was devised by 恒木亮太郎, 猪饲聪史, and 下田隆贵 on 2019-08-27. Abstract: The invention provides a machine learning device, a control system, and a machine learning method. The parameters for determining the filter characteristics can be easily set. The coefficient of a filter provided in a motor control device that controls the rotation of a motor for a machine tool, a robot, or an industrial machine is optimized based on measurement information of an external measurement device provided outside the motor control device and a control command input to the motor control device.

1. A machine learning device performs machine learning as follows:

the coefficient of a filter provided in a motor control device that controls the rotation of a motor is optimized based on measurement information of an external measurement device provided outside the motor control device and a control command input to the motor control device.

2. The machine learning apparatus of claim 1,

the measurement information of the external measurer includes at least one of a position, a velocity, and an acceleration.

3. The machine learning apparatus of claim 1 or 2,

the motor control device has at least one of a position feedback loop and a velocity feedback loop, the filter being external to the position feedback loop or the velocity feedback loop.

4. The machine learning apparatus of claim 1 or 2,

the motor control device has a feedback loop, and measurement information of the external measurement device is not used in feedback control of the feedback loop.

5. The machine learning apparatus according to any one of claims 1 to 4,

after the filter is adjusted by machine learning, the external measurer is removed.

6. The machine learning apparatus according to any one of claims 1 to 5,

the machine learning device includes:

a state information acquisition unit that acquires state information including the measurement information, the control command, and a coefficient of the filter;

a behavior information output unit that outputs behavior information to the filter, the behavior information including adjustment information of the coefficient included in the state information;

a reward output unit that outputs a reward value in reinforcement learning using an evaluation function based on a difference between the measurement information and the control command; and

a cost function update unit that updates the behavior cost function based on the reward value output by the reward output unit, the state information, and the behavior information.

7. The machine learning apparatus of claim 6,

the machine learning device further includes an optimization behavior information output unit that outputs adjustment information of the coefficient based on the cost function updated by the cost function update unit.

8. A control system, comprising:

a motor control device comprising the machine learning device according to any one of claims 1 to 7, a motor, and a filter, the motor control device controlling rotation of the motor; and

an external measuring device provided outside the motor control device.

9. A machine learning method of a machine learning device, the machine learning device performing machine learning as follows:

acquiring a coefficient of a filter provided in a motor control device that controls rotation of a motor, measurement information of an external measurement device provided outside the motor control device, and a control command input to the motor control device,

optimizing the coefficients according to the measurement information and the control instructions.

Technical Field

The present invention relates to a machine learning device that performs machine learning for optimizing a coefficient of a filter provided in a motor control device that controls rotation of a motor of a machine tool, a robot, an industrial machine, or the like, to a control system including the machine learning device, and to a machine learning method.

Background

For example, patent documents 1 and 2 describe devices for automatically adjusting the characteristics of a filter.

Patent document 1 describes the following servo actuator: in the tuning mode, an alternating current signal obtained by sweeping the frequency is superimposed on a signal of the velocity command value, the amplitude of a torque command value signal obtained from the velocity control unit as a result of the superimposition is detected, and the frequency of the torque command value signal when the amplitude change rate changes from positive to negative is set as the center frequency of a notch filter.

Patent document 2 describes a servo actuator having a velocity feedback loop for controlling a motor velocity, and a notch filter unit is inserted into the velocity feedback loop to remove machine resonance, the servo actuator including: a data collection unit that acquires data indicating frequency response characteristics of the velocity feedback loop; a moving average unit that performs moving average processing on the data acquired by the data acquisition unit; a comparison unit for comparing the data obtained by the moving average unit with the data obtained by the data collection unit to extract the resonance characteristic of the velocity feedback loop; and a notch filter setting unit for setting the frequency and the Q value of the notch filter unit based on the resonance characteristics extracted by the comparison unit.

Disclosure of Invention

The invention aims to provide a machine learning device, a control system including the machine learning device, and a machine learning method, which can easily set parameters for determining filter characteristics, and can remove an external measuring device after machine learning, thereby reducing cost and improving reliability.

(1) A machine learning device according to the present invention (for example, a machine learning unit 130 described later) optimizes a coefficient of a filter (for example, a filter 110 described later) provided in a motor control device (for example, a motor control device 100 described later) that controls rotation of a motor (for example, a servo motor 127 described later), based on measurement information of an external measurement device (for example, an acceleration sensor 300 described later) provided outside the motor control device and a control command input to the motor control device.

(2) In the machine learning device according to the above (1), the measurement information of the external measurement device may include at least one of a position, a velocity, and an acceleration.

(3) In the machine learning device according to the above (1) or (2), the motor control device may have at least one of a position feedback loop and a velocity feedback loop, and the filter may be outside the position feedback loop or the velocity feedback loop.

(4) In the machine learning device according to the above (1) or (2), the motor control device may include a feedback loop, and the measurement information of the external measurement device may not be used for feedback control of the feedback loop.

(5) In the machine learning device according to any one of the above (1) to (4), the external measurement device may be detached after the filter is adjusted by machine learning.

(6) In the machine learning device according to any one of the above (1) to (5), the machine learning device may include:

a state information acquisition unit (for example, a state information acquisition unit 131 described later) that acquires state information including the measurement information, the control command, and the coefficient of the filter;

a behavior information output unit (for example, a behavior information output unit 133 described later) that outputs behavior information including adjustment information of the coefficient included in the state information to the filter;

a reward output unit (for example, a reward output unit 1321 described later) that outputs a reward value in reinforcement learning using an evaluation function based on a difference between the measurement information and the control command; and a cost function update unit (for example, a cost function update unit 1322 described later) that updates the behavior cost function based on the reward value output by the reward output unit, the state information, and the behavior information.

(7) In the machine learning device according to the above (6), the machine learning device may include an optimization behavior information output unit (for example, an optimization behavior information output unit 135 described later) that outputs adjustment information of the coefficient based on the cost function updated by the cost function update unit.

(8) A control system of the present invention includes:

a motor control device (e.g., motor control device 100 described later) including the machine learning device (e.g., machine learning unit 130 described later), a motor (e.g., servo motor 127 described later), and a filter (e.g., filter 110 described later) of any one of the above (1) to (7), the motor control device controlling rotation of the motor; and

an external measuring device (for example, an acceleration sensor 300 described later) provided outside the motor control device.

(9) A machine learning method of a machine learning device according to the present invention performs machine learning as follows: a coefficient of a filter provided in a motor control device that controls rotation of a motor, measurement information of an external measurement device provided outside the motor control device, and a control command input to the motor control device are acquired, and the coefficient is optimized based on the measurement information and the control command.

Effects of the invention

According to the present invention, the coefficient (parameter) for determining the filter characteristics can be easily set. Further, since the external measuring device is disposed outside the motor control device, the external measuring device can be detached after the machine learning, and the cost can be reduced and the reliability can be improved.

Drawings

Fig. 1 is a block diagram showing a control system including a motor control device, a machine tool, and an acceleration sensor according to an embodiment of the present invention.

Fig. 2 is a diagram for explaining the operation of the motor when the movement locus of the table is circular.

Fig. 3 is a diagram for explaining the operation of the motor when the movement locus of the table is a square.

Fig. 4 is a diagram for explaining the operation of the motor when the movement locus of the table is octagonal.

Fig. 5 is a diagram for explaining the operation of the motor when the movement locus of the table is in a shape in which every other corner of the octagon is replaced with a circular arc.

Fig. 6 is a block diagram showing a machine learning unit according to an embodiment of the present invention.

Fig. 7 is a flowchart illustrating an operation of the machine learning unit according to the embodiment of the present invention.

Fig. 8 is a flowchart for explaining the operation of the optimization behavior information output unit of the machine learning unit according to the embodiment of the present invention.

Fig. 9 is an explanatory view showing a state in which a scale is attached to a table of a machine main body.

Fig. 10 is a block diagram showing an example in which a plurality of filters are connected in series to constitute a filter.

Fig. 11 is a block diagram showing another configuration example of the control system.

Description of the symbols

10, 10A control system

100, 100A-1 to 100A-n motor control device

110 filter

120 servo control part

121 subtracter

122 position control part

123 adder

124 position feedforward section

125 subtracter

126 speed control unit

127 servo motor

128 rotary encoder

129 integrator

130 machine learning unit

130A-1 to 130A-n machine learning device

131 state information acquiring unit

132 learning unit

133 behavior information output unit

134 cost function storage unit

135 optimization behavior information output unit

200, 200-1 to 200-n machine tool

300 acceleration sensor

400 network

Detailed Description

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

Fig. 1 is a block diagram showing a control system including a motor control device, a machine tool, and an acceleration sensor according to an embodiment of the present invention.

As shown in fig. 1, the control system 10 includes: motor control device 100, machine tool 200 controlled by motor control device 100, and acceleration sensor 300 attached to machine tool 200. The acceleration sensor 300 is an external measurement device provided outside the motor control device 100, and the measured acceleration is measurement information.

Although the machine tool is used as the control target of the motor control device 100, the control target is not limited to the machine tool, and may be, for example, a robot, an industrial machine, or the like. The motor control device 100 may be provided as a part of a control target of a machine tool, a robot, an industrial machine, or the like.

The motor control device 100 includes: the filter 110, the servo control unit 120, and the machine learning unit 130. Here, the motor control device 100 includes the servo control unit 120, which controls a servo motor, but it may instead include a control unit that controls a spindle motor without performing feedback control.

The filter 110 is a filter for the machine tool 200; for example, a notch filter, a filter for setting an acceleration/deceleration time constant, an inverse characteristic filter, or the like is used. The position command is input to the filter 110, and the filter 110 is a position command value shaper that shapes the input position command. The position command is generated by a host control device, an external input device, or the like in accordance with a predetermined machining program, and the pulse frequency is changed to change the speed of the servo motor 127. The position command is a control command. The filter 110 is provided outside the servo control unit 120, that is, outside a position feedback loop and a velocity feedback loop described later, but may instead be provided in the position feedback loop or the velocity feedback loop of the servo control unit 120. For example, the filter 110 may be connected to the output side of a speed control unit 126 of the servo control unit 120, described later, or to the output side of the adder 123. However, in order to suppress vibration outside the control loop (position feedback loop or velocity feedback loop) of the servo control unit 120, it is preferable to provide the filter outside the position feedback loop or the velocity feedback loop. In fig. 1, the filter 110 is disposed before a subtractor 121, described later, that determines the positional deviation. The structure of the filter 110 is not particularly limited, but an IIR filter of second order or higher is preferable.

The transfer function F(ρ, s) of the notch filter used as the filter 110 is expressed by the following mathematical formula 1 (hereinafter, mathematical formula 1). The parameter ρ denotes the coefficients ω, ζ, and R.

The coefficient R in mathematical formula 1 is an attenuation coefficient, the coefficient ω is a center angular frequency, and the coefficient ζ is a specific bandwidth. When the center frequency is fc and the bandwidth is fw, the coefficient ω is expressed as ω = 2πfc and the coefficient ζ as ζ = fw/fc.

[ mathematical formula 1 ]

F(ρ, s) = (s² + 2ζRωs + ω²) / (s² + 2ζωs + ω²)
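
As a quick illustration of how the three coefficients act, the following minimal Python sketch evaluates the magnitude response of a standard notch filter of this form (attenuation coefficient R, center angular frequency ω = 2πfc, specific bandwidth ζ = fw/fc); the center frequency, bandwidth, and attenuation values used are hypothetical, not values from the embodiment.

```python
import math

def notch_response(f_hz, fc, fw, R):
    """Magnitude of F(s) = (s^2 + 2*zeta*R*w*s + w^2) / (s^2 + 2*zeta*w*s + w^2)
    evaluated at s = j*2*pi*f_hz, with w = 2*pi*fc and zeta = fw/fc."""
    w = 2 * math.pi * fc          # center angular frequency (coefficient omega)
    zeta = fw / fc                # specific bandwidth (coefficient zeta)
    s = 1j * 2 * math.pi * f_hz
    num = s**2 + 2 * zeta * R * w * s + w**2
    den = s**2 + 2 * zeta * w * s + w**2
    return abs(num / den)

# Hypothetical values: 250 Hz center frequency, 40 Hz bandwidth, attenuation coefficient 0.1.
# At the center frequency the magnitude equals R; away from it the response approaches 1.
for f in (100, 200, 250, 300, 500):
    print(f, round(notch_response(f, fc=250.0, fw=40.0, R=0.1), 3))
```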

The servo control unit 120 includes: a subtractor 121, a position control unit 122, an adder 123, a position feedforward unit 124, a subtractor 125, a speed control unit 126, a servo motor 127, a rotary encoder 128 as a position detection unit associated with the servo motor 127, and an integrator 129. The subtractor 121, the position control unit 122, the adder 123, the subtractor 125, the speed control unit 126, the servo motor 127, the rotary encoder 128, and the integrator 129 constitute a position feedback loop. The subtractor 125, the speed control unit 126, the servo motor 127, and the rotary encoder 128 form a speed feedback loop.

The subtractor 121 obtains the difference between the shaped position command output from the filter 110 and the position detection value fed back as the position feedback, and outputs the difference to the position control unit 122 and the position feedforward unit 124 as the positional deviation.

The position control unit 122 outputs a value obtained by multiplying the position gain Kp by the position deviation to the adder 123 as a speed command value.

The position feedforward section 124 performs the position feedforward processing indicated by the transfer function G(s) expressed by mathematical formula 2 (hereinafter, mathematical formula 2) on a value obtained by differentiating the position command value and multiplying it by a constant α, and outputs the processing result to the adder 123 as a position feedforward term. The coefficients a_i and b_j (X ≥ i, j ≥ 0, where X is a natural number) in mathematical formula 2 are the coefficients of the transfer function G(s).

[ mathematical formula 2 ]

G(s) = (b_0 + b_1 s + b_2 s² + … + b_X s^X) / (a_0 + a_1 s + a_2 s² + … + a_X s^X)
The adder 123 adds the speed command value and the output value (position feedforward term) of the position feedforward section 124, and outputs the resultant to the subtractor 125 as a speed command value for feedforward control. The subtractor 125 obtains a difference between the output of the adder 123 and the speed detected value fed back by the speed, and outputs the difference to the speed control unit 126 as a speed deviation.

The speed control unit 126 adds a value obtained by multiplying the speed deviation by the integral gain K1v and integrating the result, and a value obtained by multiplying the speed deviation by the proportional gain K2v, and outputs the resultant as a torque command to the servo motor 127.

The rotational angle position of the servo motor 127 is detected by the rotary encoder 128, and the speed detection value is input to the subtractor 125 as speed feedback (speed FB). The speed detection value is integrated by the integrator 129 to become a position detection value, and the position detection value is input to the subtractor 121 as position feedback (position FB).

As described above, the servo control unit 120 is configured.
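
To make the signal flow of these loops concrete, the following is a minimal, illustrative Python sketch of one control cycle. The gains, the sampling period, and the omission of the position feedforward term are assumptions made for illustration, not values or structure taken from the embodiment.

```python
class ServoLoopSketch:
    """Illustrative discrete-time version of the position/velocity loops of fig. 1.

    Assumed, simplified structure: P position control (gain Kp) and PI velocity control
    (integral gain K1v, proportional gain K2v); the position feedforward term G(s) is omitted.
    """

    def __init__(self, kp=30.0, k1v=200.0, k2v=5.0, dt=0.001):
        self.kp, self.k1v, self.k2v, self.dt = kp, k1v, k2v, dt
        self.vel_error_integral = 0.0

    def step(self, shaped_position_cmd, position_fb, velocity_fb):
        # Subtractor 121: position deviation from the shaped (filtered) position command.
        pos_dev = shaped_position_cmd - position_fb
        # Position control unit 122: speed command = Kp * position deviation.
        speed_cmd = self.kp * pos_dev
        # Subtractor 125: speed deviation against the speed feedback.
        vel_dev = speed_cmd - velocity_fb
        # Speed control unit 126: torque command = K1v * integral(dev) + K2v * dev.
        self.vel_error_integral += vel_dev * self.dt
        torque_cmd = self.k1v * self.vel_error_integral + self.k2v * vel_dev
        return torque_cmd

# Example: one control cycle with hypothetical feedback values.
loop = ServoLoopSketch()
print(loop.step(shaped_position_cmd=1.0, position_fb=0.95, velocity_fb=0.2))
```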

Next, machine tool 200 and acceleration sensor 300 attached to machine tool 200 will be described before machine learning unit 130 is described.

The machine tool 200 includes: a ball screw 230 connected to a rotation shaft of the servo motor 127, a nut 240 screwed with the ball screw 230, and a machine body 250 including a table 251 connected to the nut. The nut 240 screwed with the ball screw 230 is moved in the axial direction of the ball screw 230 by the rotational drive of the servomotor 127.

In the machine tool 200, when the table 251 on which a workpiece (workpiece) is mounted moves in the X-axis direction and the Y-axis direction, the motor control device 100 shown in fig. 1 is provided for each of the X-axis direction and the Y-axis direction. When the table is moved in three or more axes, the motor control device 100 is provided for each axis.

The acceleration sensor 300 is provided outside the servo control unit 120, and is attached to the machine main body 250. The acceleration sensor is an external measurer. As the acceleration sensor, there are known one-axis, two-axis, three-axis acceleration sensors, and these acceleration sensors can be selected as needed. For example, a two-axis acceleration sensor is used when the table of the machine body 250 is moved in the X-axis direction and the Y-axis direction, and a three-axis acceleration sensor may be used when the table of the machine body 250 is moved in the X-direction, the Y-direction, and the Z-direction. It is desirable that the acceleration sensor 300 is provided at a position close to the machining point.

The acceleration sensor 300 measures the acceleration of the machine main body 250 and outputs the measured acceleration to the machine learning unit 130. When the acceleration sensor 300 is used only in the machine learning process, the coefficients of the filter 110 are adjusted by machine learning before shipment, and the acceleration sensor 300 can be detached from the machine main body 250 after the filter 110 has been adjusted. When learning is performed again after shipment, the sensor can be detached after the relearning. Although the acceleration output from the acceleration sensor 300 could also be used for feedback control by the servo control unit 120, the acceleration sensor 300 can be removed if it is not used for feedback control. In this case, the cost of the machine tool 200 can be reduced and the reliability can be improved.

< machine learning unit 130>

The machine learning unit 130 executes a preset machining program (hereinafter, also referred to as a "machining program at the time of learning"), and machine-learns the coefficients ω, ζ, and R of the transfer function of the filter 110 using the position command and the acceleration measurement value from the acceleration sensor 300 (hereinafter, referred to as learning). The machine learning unit 130 is a machine learning device. The learning by the machine learning unit 130 may be performed before shipment, or the learning may be performed again after shipment.

Here, the motor control device 100 drives the servo motor 127 by the machining program at the time of learning, and moves the table 251 in a state where no workpiece is mounted. The trajectory along which an arbitrary point of the table 251 moves in the X-axis direction and the Y-axis direction is, for example, a circle, a quadrangle, an octagon, or a shape in which every other corner of an octagon is replaced by a circular arc. Figs. 2 to 5 are diagrams for explaining the operation of the motor when the movement locus is a circle, a quadrangle, an octagon, or a shape in which every other corner of an octagon is replaced by a circular arc, respectively. In figs. 2 to 5, the table 251 moves clockwise in the X-axis and Y-axis directions.

When the movement locus of the table 251 is a circle as shown in fig. 2, the servomotor that moves the table in the Y-axis direction decelerates gradually as it approaches the position a1 at the position a1 shown in fig. 2, reverses the rotational direction at the position a1, and accelerates gradually as it passes through the position a 1. Then, the table is moved so as to gradually linearly reverse in the Y-axis direction with the position a1 interposed therebetween. On the other hand, at the position a1, the servo motor that moves the table in the X-axis direction rotates at the same speed as before and after the position a1, and the table moves at the same speed as before and after the position a1 in the X-axis direction. At a position a2 shown in fig. 2, each servo motor is controlled so that the operation of the servo motor that moves the table in the X-axis direction is opposite to the operation of the servo motor that moves the table in the Y-axis direction.

When the movement locus of the table 251 is a quadrangle shown in fig. 3, the rotation direction of the servo motor that moves the table in the X axis direction is abruptly reversed at a position B1 shown in fig. 3, and the table moves so as to be abruptly linearly reversed in the X axis direction via a position B1. On the other hand, at the position B1, the servo motor that moves the table in the Y-axis direction rotates at the same speed as before and after the position B1, and the table moves at the same speed as before and after the position B1 in the Y-axis direction. At a position B2 shown in fig. 3, each servo motor is controlled so that the operation of the servo motor that moves the table in the X-axis direction is opposite to the operation of the servo motor that moves the table in the Y-axis direction.

When the locus of movement of the table 251 is octagonal as shown in fig. 4, at the angular position C1, the motor speed at which the table moves in the Y-axis direction is made slower, and the motor speed at which the table moves in the X-axis direction is made faster.

At the angular position C2, the rotation direction of the motor that moves the table in the Y-axis direction is reversed, and the table moves so as to be linearly reversed in the Y-axis direction. Further, the motor that moves the table in the X-axis direction rotates at a constant speed in the same rotational direction from the position C1 toward the position C2 and from the position C2 toward the position C3.

At the angular position C3, the motor speed for moving the table in the Y-axis direction is increased, and the motor speed for moving the table in the X-axis direction is decreased.

At the angular position C4, the rotation direction of the motor that moves the table in the X-axis direction is reversed, and the table moves so as to be linearly reversed in the X-axis direction. Further, the motor that moves the table in the Y-axis direction rotates at a constant speed in the same rotational direction from the position C3 toward the position C4 and from the position C4 toward the position of the next corner.

As shown in fig. 5, when the locus of movement of the table 251 is such that every other corner of the octagon is replaced with a circular arc, the motor speed of the table moving in the Y-axis direction is slowed and the motor speed of the table moving in the X-axis direction is increased at the corner position D1.

At the position D2 of the arc, the motor rotation direction in which the table moves in the Y-axis direction is reversed, and the table moves so as to be linearly reversed in the Y-axis direction. Further, the motor that moves the table in the X-axis direction rotates at a constant speed in the same rotational direction from the position D1 toward the position D3. Unlike the case where the movement locus shown in fig. 4 is octagonal, the motor that moves the table in the Y-axis direction is decelerated gradually toward the position D2, stops rotating at the position D2, and increases gradually in rotation speed when passing through the position D2 so as to form a movement locus of an arc before and after the position D2.

At the angular position D3, the motor speed for moving the table in the Y-axis direction is increased, and the motor speed for moving the table in the X-axis direction is decreased.

At the position D4 of the arc, the motor rotation direction in which the table moves in the X-axis direction is reversed, and the table moves so as to be linearly reversed in the X-axis direction. Further, the motor that moves the table in the Y-axis direction rotates at the same speed in the same rotational direction from the position D3 toward the position D4 and from the position D4 toward the next corner position. The motor that moves the table in the X-axis direction is gradually decelerated toward the position D4, rotation is stopped at the position D4, and the rotation speed gradually increases when passing through the position D4, so that a moving locus of an arc is formed before and after the position D4.

In the present embodiment, the acceleration sensor 300 can be used to measure the vibration generated when one rotation direction in the X-axis direction or the Y-axis direction is reversed, from the positions a1 and a2, the positions B1 and B2, the positions C2 and C4, and the positions D2 and D4 of the movement trajectory specified by the machining program at the time of learning as described above. Further, the vibration when the rotation speed is changed in the linear control without the reverse rotation can be measured by using the acceleration sensor 300 at the positions C1 and C3 and the positions D1 and D3. As a result, the coefficients of the filter 110 can be machine-learned to control the vibration.

The machine learning unit 130 will be described in more detail below.

In the following description, a case where the machine learning unit 130 performs reinforcement learning will be described, but learning by the machine learning unit 130 is not particularly limited to reinforcement learning, and the present invention can be applied to a case where supervised learning is performed, for example.

Before describing each functional block included in the machine learning unit 130, a basic structure of reinforcement learning will first be described. An agent (corresponding to the machine learning unit 130 in the present embodiment) observes the environmental state, selects a certain behavior, and the environment changes according to the behavior. As the environment changes, some return is given, and the agent learns to make better behavior selections (decision making).

In supervised learning, a complete correct answer is given, whereas the return in reinforcement learning is often a fragmentary value based on a partial change of the environment. The agent therefore learns to select behaviors so that the total return over the future is maximized.

In this way, in reinforcement learning, an appropriate behavior, that is, a method for maximizing the return to be obtained in the future, is learned based on the interaction that the behaviors give to the environment. This means that, in the present embodiment, behavior that affects the future, such as behavior information selected for suppressing machine-end vibration, can be obtained.

Here, any learning method may be used as the reinforcement learning. In the following description, a case where Q-learning is used will be described as an example; Q-learning is a method of learning the value Q(S, A) of selecting a behavior A under a certain environmental state S.

Q learning aims to select, as an optimal behavior, a behavior a having the highest value Q (S, A) from among behaviors a that can be acquired in a certain state S.

However, at the time point when Q learning is initially started, the correct value of the value Q (S, A) is not known at all for the combination of state S and behavior a. Therefore, the agent selects various behaviors a in a certain state S, and for the behavior a at that time, selects a better behavior in accordance with the given reward, thereby continuing to learn the correct value Q (S, A).

Further, in order to maximize the sum of the returns obtained in the future, the goal is to finally achieve Q(S, A) = E[Σ(γ^t)r_t]. Here, E[·] represents an expected value, t represents time, γ represents a parameter called the discount rate described later, r_t represents the return at time t, and Σ is the sum over time t. The expected value in this expression is the expected value when the state changes according to the optimal behavior. However, since the optimal behavior is not known in the process of Q learning, reinforcement learning is performed while searching for various behaviors. The update formula of the value Q(S, A) can be expressed by, for example, the following mathematical formula 3 (hereinafter, mathematical formula 3).

[ mathematical formula 3 ]

Q(S_t, A_t) ← Q(S_t, A_t) + α( r_{t+1} + γ·max_A Q(S_{t+1}, A) − Q(S_t, A_t) )

In mathematical formula 3 above, S_t represents the environmental state at time t, and A_t represents the behavior at time t. The state changes to S_{t+1} by the behavior A_t. r_{t+1} represents the return obtained by that change of state. The term with max is the value Q multiplied by γ for the case where the behavior A having the highest value Q known at that time is selected in the state S_{t+1}. Here, γ is the discount rate, and α is a learning coefficient set in the range 0 < α ≤ 1.

Mathematical formula 3 above represents a method of updating the value Q(S_t, A_t) of the behavior A_t in the state S_t, based on the return r_{t+1} that is fed back as a result of the trial A_t.

This update formula represents the following: if the value max_A Q(S_{t+1}, A) of the optimal behavior in the next state S_{t+1} resulting from the behavior A_t is larger than the value Q(S_t, A_t) of the behavior A_t in the state S_t, Q(S_t, A_t) is increased; conversely, if it is smaller, Q(S_t, A_t) is decreased. That is, the value of a certain behavior in a certain state is brought closer to the value of the optimal behavior in the next state resulting from that behavior. Although the difference between them depends on the discount rate γ and the return r_{t+1}, this is basically a mechanism in which the best behavior value in a certain state propagates to the behavior value in the state immediately preceding it.
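
The update of mathematical formula 3 can be written directly in code. The following Python sketch shows a tabular version over discretized states and coefficient-increment behaviors; the state encoding, the behavior names, and the numeric values of α and γ are illustrative assumptions, not the implementation of the machine learning unit 130.

```python
from collections import defaultdict

# Q(S, A) stored as a table keyed by (state, behavior) pairs.
Q = defaultdict(float)

def q_update(s_t, a_t, r_next, s_next, behaviors, alpha=0.1, gamma=0.95):
    """Q(S_t, A_t) <- Q(S_t, A_t) + alpha*(r_{t+1} + gamma*max_A Q(S_{t+1}, A) - Q(S_t, A_t))."""
    best_next = max(Q[(s_next, a)] for a in behaviors)
    Q[(s_t, a_t)] += alpha * (r_next + gamma * best_next - Q[(s_t, a_t)])

# Hypothetical discretized filter-coefficient states and increment behaviors.
behaviors = ["omega+", "omega-", "zeta+", "zeta-", "R+", "R-"]
q_update(s_t="S0", a_t="omega+", r_next=1.0, s_next="S1", behaviors=behaviors)
print(Q[("S0", "omega+")])
```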

Here, there is a Q learning method in which a table of Q(S, A) for all state-behavior pairs (S, A) is prepared and learning is performed. However, the number of states may be too large to obtain the Q(S, A) values of all state-behavior pairs, and it may take a long time for Q learning to converge.

Therefore, a well-known technique called DQN (Deep Q-Network) can be utilized. Specifically, the cost function Q may be constructed using an appropriate neural network, and the value of the cost function Q(S, A) may be calculated by adjusting the parameters of the neural network so that the cost function Q is approximated by the neural network. By using DQN, the time required for Q learning to converge can be shortened. DQN is described in detail in, for example, the following non-patent document.

< non-patent document >

"Human-level control through depth retrieval for retrieval" learning ", VolodymerMniH 1, line, retrieval 1/17 in 29 years, Internet < URL: http: davidqi. com/research/source 14236. pdf)

The machine learning unit 130 performs the Q learning described above. Specifically, the machine learning unit 130 learns the following value Q: the values of the coefficients ω, ζ, and R of the transfer function of the filter 110, the measured acceleration from the acceleration sensor 300 obtained by executing the machining program at the time of learning, and the position command are set as the state S, and the adjustment of the values of the coefficients ω, ζ, and R of the transfer function of the filter 110 in the state S is selected as the behavior a.

The machine learning unit 130 observes the state information S, which includes the measured acceleration from the acceleration sensor 300 and the position command obtained by executing one or a combination of the above-described machining programs at the time of learning based on the coefficients ω, ζ, and R of the transfer function of the filter 110, and determines the behavior A. The machine learning unit 130 is given a return every time the behavior A is executed. The machine learning unit 130 searches, by trial and error, for the optimal behavior A that maximizes the total future return. In this way, the machine learning unit 130 can select the optimal behavior A (that is, the optimal coefficients ω, ζ, and R of the transfer function of the filter 110) for the state S, which includes the measured acceleration from the acceleration sensor 300 and the position command obtained by executing the machining program at the time of learning based on the coefficients ω, ζ, and R of the transfer function of the filter 110.

That is, by selecting the behavior a in which the value of Q is the largest among the behaviors a applied to the coefficients ω, ζ, and R of the transfer function of the filter 110 in a certain state S based on the merit function Q learned by the machine learning unit 130, the behavior a in which the machine-end vibration generated by executing the machining program at the time of learning is the smallest (that is, the coefficients ω, ζ, and R of the transfer function of the filter 110) can be selected.

Fig. 6 is a block diagram showing the machine learning unit 130 according to the embodiment of the present invention.

In order to perform the reinforcement learning described above, as shown in fig. 6, the machine learning unit 130 includes: a state information acquisition unit 131, a learning unit 132, a behavior information output unit 133, a cost function storage unit 134, and an optimization behavior information output unit 135. The learning unit 132 includes: a reward output unit 1321, a cost function update unit 1322, and a behavior information generation unit 1323.

The state information acquisition unit 131 acquires the state S, which includes the measured acceleration from the acceleration sensor 300 and the position command obtained by executing the machining program at the time of learning based on the coefficients ω, ζ, and R of the transfer function of the filter 110. This state information S corresponds to the environmental state S in Q learning.

The state information acquisition unit 131 outputs the acquired state information S to the learning unit 132.

Further, the coefficients ω, ζ, and R of the transfer function of the filter 110 at the time point when Q learning is first started are generated in advance by the user. In the present embodiment, the initial setting values of the coefficients ω, ζ, and R of the transfer function of the filter 110 created by the user are optimally adjusted by reinforcement learning.

When the operator adjusts the machine tool in advance, the coefficients ω, ζ, and R may be machine-learned using the adjusted values as initial values.

The learning unit 132 is a part that learns the value Q (S, A) when a certain behavior a is selected in a certain environmental state S.

The reward output unit 1321 is a part that calculates a reward when the behavior A is selected in a certain state S. Here, the measured acceleration, which is a state variable in the state S, is denoted by y(S), and the position command, which is a state variable related to the state information S, is denoted by r(S). Likewise, the measured acceleration related to the state information S' changed from the state S by the behavior information A (correction of the coefficients ω, ζ, and R of the transfer function of the filter 110) is denoted by y(S'), and the position command related to the state information S' is denoted by r(S').

The evaluation function f can be expressed, for example, by the following mathematical formula 4 (hereinafter, mathematical formula 4). Mathematical formula 4 shows that the evaluation function f is the time integral of the square of the absolute value of the difference between the second-order derivative of the position command r and the measured acceleration y.

[ mathematical formula 4 ]

f(r, y) = ∫ |d²r/dt² − y|² dt

In addition, the evaluation function may instead use, for example: the time integral of the absolute value of (d²r/dt² − y), a time integral obtained by weighting the absolute value of (d²r/dt² − y) by time t, or the maximum value of the set of absolute values of (d²r/dt² − y).
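
As an illustration of how the evaluation function of mathematical formula 4 can be computed from sampled data, the following Python sketch approximates the time integral with a sum and the second derivative of the position command with a central difference; the sample values shown are hypothetical.

```python
def evaluation_function(position_cmd, measured_accel, dt):
    """Discrete approximation of mathematical formula 4: time integral of |d^2 r/dt^2 - y|^2.

    position_cmd and measured_accel are lists sampled every dt seconds; the second derivative
    of the position command is approximated by a central difference. Illustrative only."""
    total = 0.0
    for k in range(1, len(position_cmd) - 1):
        r_ddot = (position_cmd[k + 1] - 2 * position_cmd[k] + position_cmd[k - 1]) / dt**2
        total += abs(r_ddot - measured_accel[k]) ** 2 * dt
    return total

# Hypothetical samples: a position command r(t) = t^2 (constant acceleration of 2)
# and a slightly noisy measured acceleration, sampled every 1 ms.
r = [0.0, 1e-6, 4e-6, 9e-6, 16e-6, 25e-6]
y = [2.0, 2.1, 1.9, 2.2, 2.0, 1.8]
print(evaluation_function(r, y, dt=0.001))
```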

At this time, when the evaluation function f(r(S'), y(S')) obtained when the motor control device 100 operates with the filter 110 after the correction relating to the state information S' corrected by the behavior information A is larger than the evaluation function f(r(S), y(S)) obtained when the motor control device 100 operates with the filter 110 before the correction relating to the state information S before being corrected by the behavior information A, the reward output unit 1321 sets the reward value to a negative value.

On the other hand, when the evaluation function f(r(S'), y(S')) obtained when the motor control device 100 operates with the filter 110 after the correction relating to the state information S' corrected by the behavior information A is smaller than the evaluation function f(r(S), y(S)) obtained when the motor control device 100 operates with the filter 110 before the correction relating to the state information S before being corrected by the behavior information A, the reward output unit 1321 sets the reward value to a positive value.

When the evaluation function f(r(S'), y(S')) obtained when the motor control device 100 operates with the filter 110 after the correction relating to the state information S' corrected by the behavior information A is equal to the evaluation function f(r(S), y(S)) obtained when the motor control device 100 operates with the filter 110 before the correction relating to the state information S before being corrected by the behavior information A, the reward output unit 1321 sets the reward value to zero.

Note that the negative value used when the evaluation function f(r(S'), y(S')) in the state S' after execution of the behavior A is larger than the evaluation function f(r(S), y(S)) in the previous state S may be set larger in proportion to the change. That is, the negative value may be made larger according to the degree by which the value of f(r(S'), y(S')) has increased. Conversely, the positive value used when the evaluation function f(r(S'), y(S')) in the state S' after execution of the behavior A is smaller than the evaluation function f(r(S), y(S)) in the previous state S may be set larger in proportion to the change. That is, the positive value may be made larger according to the degree by which the value of f(r(S'), y(S')) has decreased.
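
The reward rule described above can be summarized by a small function. The following sketch returns a negative, positive, or zero reward depending on whether the evaluation function increased, decreased, or stayed the same, and scales the value by the magnitude of the change, which the preceding paragraph allows as an option.

```python
def reward(f_before, f_after, scale=1.0):
    """Reward rule sketched from the description above: negative when the evaluation function
    increased, positive when it decreased, zero when unchanged. Scaling by the magnitude of
    the change is an option permitted by the text, not a prescribed formula."""
    if f_after > f_before:
        return -scale * (f_after - f_before)
    if f_after < f_before:
        return scale * (f_before - f_after)
    return 0.0

print(reward(f_before=3.2, f_after=2.5))   # improvement -> positive reward
print(reward(f_before=2.5, f_after=3.2))   # degradation -> negative reward
```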

The cost function update unit 1322 performs Q learning based on the state S, the behavior A, the state S' obtained when the behavior A is applied to the state S, and the reward value calculated as described above, thereby updating the cost function Q stored in the cost function storage unit 134.

The update of the cost function Q may be performed by online learning, batch learning, or mini-batch learning.

Online learning is a learning method in which, every time a certain behavior A is applied to the current state S and the state S transitions to a new state S', the cost function Q is updated immediately. Batch learning is a learning method in which data for learning is collected by repeatedly applying a certain behavior A to the current state S and letting the state S transition to a new state S', and the cost function Q is updated using all the collected data for learning. Mini-batch learning is a learning method intermediate between online learning and batch learning, in which the cost function Q is updated every time a certain amount of data for learning has accumulated.

The behavior information generation unit 1323 selects the behavior A in the process of Q learning for the current state S. In the process of Q learning, the behavior information generation unit 1323 generates the behavior information A for performing the operation of correcting the coefficients ω, ζ, and R of the transfer function of the filter 110 (corresponding to the behavior A in Q learning), and outputs the generated behavior information A to the behavior information output unit 133. More specifically, the behavior information generation unit 1323, for example, incrementally adds or subtracts the coefficients ω, ζ, and R of the transfer function of the filter 110 included in the behavior A with respect to the coefficients ω, ζ, and R of the transfer function of the filter 110 included in the state S.

For example, the following strategy may be adopted: when the state S transitions to the state S' by increasing or decreasing the coefficients ω, ζ, and R of the transfer function of the filter 110 and a positive return (a return of a positive value) is given, the behavior information generation unit 1323 selects, as the next behavior A', a behavior that incrementally increases or decreases the coefficients ω, ζ, and R of the transfer function of the filter 110 in the same way as in the previous operation, so that the value of the evaluation function f becomes smaller.

In addition, the following strategy can be adopted in the opposite way: when a negative return (negative return) is returned, the behavior information generation unit 1323 selects, as the next behavior a ', a behavior a' in which the evaluation function f is smaller than the previous value, for example, by decreasing or increasing the coefficients ω, ζ, and R of the transfer function of the filter 110 by an increment opposite to the previous operation.

The behavior information generation unit 1323 may also adopt a strategy of selecting the behavior A' by a well-known method, such as a greedy algorithm that selects the behavior A' having the highest value Q(S, A) among the values of the currently estimated behaviors A, or an ε-greedy algorithm that randomly selects the behavior A' with a certain small probability ε and otherwise selects the behavior A' having the highest value Q(S, A).
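
A minimal sketch of such a policy is shown below: behaviors are incremental adjustments of ω, ζ, and R, and the selection is ε-greedy over a value table. The dictionary layout, the step size, and the value of ε are illustrative assumptions, not the patent's concrete implementation.

```python
import random

def propose_behavior(coeffs, q_table, epsilon=0.1, step=0.05):
    """epsilon-greedy choice over incremental adjustments of (omega, zeta, R).

    coeffs is a dict {"omega": ..., "zeta": ..., "R": ...}; behaviors are +/- step on one
    coefficient. q_table maps (state, behavior) pairs to learned values."""
    state = tuple(round(v, 6) for v in coeffs.values())
    behaviors = [(name, sign * step) for name in coeffs for sign in (+1, -1)]
    if random.random() < epsilon:
        return random.choice(behaviors)                                      # explore
    return max(behaviors, key=lambda b: q_table.get((state, b), 0.0))        # exploit

coeffs = {"omega": 2 * 3.141592653589793 * 250.0, "zeta": 0.16, "R": 0.1}
print(propose_behavior(coeffs, q_table={}))
```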

The behavior information output unit 133 is a part that transmits the behavior information a output from the learning unit 132 to the filter 110. As described above, the filter 110 slightly corrects the current state S, i.e., the currently set coefficients ω, ζ, and R, based on the behavior information, and shifts to the next state S' (i.e., the corrected coefficients of the filter 110).

The cost function storage unit 134 is a storage device that stores the cost function Q. The cost function Q may be stored, for example, as a table (hereinafter referred to as a behavior value table) for each state S and behavior A. The cost function Q stored in the cost function storage unit 134 is updated by the cost function update unit 1322. The cost function Q stored in the cost function storage unit 134 may also be shared with another machine learning unit 130. If the cost function Q is shared among a plurality of machine learning units 130, reinforcement learning can be performed in a distributed manner by the machine learning units 130, and the efficiency of the reinforcement learning can therefore be improved.

The optimization behavior information output unit 135 generates behavior information A (hereinafter referred to as "optimization behavior information") for causing the filter 110 to perform an operation in which the value Q(S, A) is maximized, based on the cost function Q updated by the cost function update unit 1322 through Q learning.

More specifically, the optimization behavior information output unit 135 acquires the cost function Q stored in the cost function storage unit 134. As described above, this cost function Q is a function updated by the cost function update unit 1322 performing Q learning. The optimization behavior information output unit 135 then generates behavior information from the cost function Q and outputs the generated behavior information to the filter 110. This optimization behavior information includes information for correcting the coefficients ω, ζ, and R of the transfer function of the filter 110, just like the behavior information output by the behavior information output unit 133 in the process of Q learning.

In the filter 110, the coefficients ω, ζ, and R of the transfer function are corrected based on the behavior information.

By operating as described above, the machine learning unit 130 can optimize the coefficients ω, ζ, and R of the transfer function of the filter 110 and thereby suppress vibration at the machine end.

As described above, the parameter adjustment of the filter 110 can be simplified by using the machine learning unit 130 according to the present invention.

The functional blocks included in motor control device 100 have been described above.

To realize these functional blocks, the motor control device 100 includes an arithmetic processing device such as a CPU (Central Processing Unit). The motor control device 100 also includes an auxiliary storage device such as a hard disk drive (HDD) that stores various control programs such as application software and an operating system (OS), and a main storage device such as a random access memory (RAM) that stores data temporarily required when the arithmetic processing device executes the programs.

In the motor control device 100, the arithmetic processing device reads the application software and the OS from the auxiliary storage device, and performs arithmetic processing based on them while expanding the read application software and OS in the main storage device. Various kinds of hardware of each device are controlled based on the calculation results. In this way, the functional blocks of the present embodiment are realized. That is, the present embodiment can be realized by cooperation of hardware and software.

Since the amount of computation associated with machine learning is large, the machine learning unit 130 can perform high-speed processing by using, for example, a technique called GPGPU (General-Purpose computing on Graphics Processing Units), in which a GPU (Graphics Processing Unit) mounted on a personal computer is used for the computation associated with machine learning. In order to perform even higher-speed processing, a computer cluster may be constructed using a plurality of computers each equipped with such a GPU, and parallel processing may be performed by the plurality of computers included in the computer cluster.

Next, the operation of the machine learning unit 130 during Q learning in the present embodiment will be described with reference to the flowchart of fig. 7.

In step S11, the state information acquisition unit 131 acquires the state information S from the motor control device 100. The acquired state information is output to the cost function update unit 1322 or the behavior information generation unit 1323. As described above, the state information S is information corresponding to the state in Q learning, and includes the coefficients ω, ζ, and R of the transfer function of the filter 110 at the time point of step S11. In this way, the position command r(S) and the measured acceleration y(S) corresponding to the shape of the movement trajectory at the predetermined feed speed, with each coefficient of the transfer function of the filter 110 at its initial value, are obtained.

The motor control device 100 is operated by the machining program at the time of learning, whereby the position command r(S_0) and the measured acceleration y(S_0) from the acceleration sensor 300 in the state S_0 at the time point when Q learning is first started are obtained. The position command input to the motor control device 100 is a position command corresponding to a predetermined movement trajectory specified by the machining program, for example, the octagonal movement trajectories shown in figs. 4 and 5. The position command is input to the filter 110 and the machine learning unit 130. The initial values of the coefficients ω, ζ, and R of the transfer function of the filter 110 are generated in advance by the user and transmitted to the machine learning unit 130. The acceleration sensor 300 measures the acceleration y(S_0) at each of the positions C1 to C4 and D1 to D4 of the movement trajectory and outputs it to the machine learning unit 130. In the machine learning unit 130, the position command r(S_0) and the measured acceleration y(S_0) at the positions of the movement trajectory, such as the positions C1 to C4 and D1 to D4, can be extracted.

In step S12, the behavior information generation unit 1323 generates new behavior information A and outputs the generated new behavior information A to the filter 110 via the behavior information output unit 133. The behavior information generation unit 1323 outputs the new behavior information A according to the policy described above. The motor control device 100 that has received the behavior information A corrects each coefficient ω, ζ, R of the transfer function of the filter 110 relating to the current state S into the state S' based on the received behavior information, and drives the machine tool including the servo motor 127. As described above, this behavior information corresponds to the behavior A in Q learning.

In step S13, the state information acquisition unit 131 acquires the position command r(S'), the measured acceleration y(S') from the acceleration sensor 300, and the coefficients ω, ζ, and R of the transfer function of the filter 110 in the new state S'. In this way, the state information acquisition unit 131 acquires, as the state information S', the coefficients ω, ζ, and R in the state S' from the filter 110, together with the position command r(S') and the measured acceleration y(S') corresponding to the octagonal movement trajectory (specifically, the positions of the movement trajectory such as the positions C1 to C4 and D1 to D4). The acquired state information is output to the reward output unit 1321.

In step S14, the reward output unit 1321 determines the magnitude relationship between the evaluation function f(r(S'), y(S')) in the state S' and the evaluation function f(r(S), y(S)) in the state S. If f(r(S'), y(S')) > f(r(S), y(S)), the reward is set to a negative value in step S15. If f(r(S'), y(S')) < f(r(S), y(S)), the reward is set to a positive value in step S16. If f(r(S'), y(S')) = f(r(S), y(S)), the reward is set to zero in step S17. The negative and positive reward values may also be weighted. Note that, in the first iteration, the state S is the state S_0 at the time point when Q learning is started.

When any one of step S15, step S16, and step S17 is finished, the cost function update unit 1322 updates the cost function Q stored in the cost function storage unit 134 in step S18, based on the reward value calculated in that step. Then, the process returns to step S11 again and the above-described processing is repeated, whereby the cost function Q converges to an appropriate value. The processing may be terminated on the condition that it has been repeated a predetermined number of times or for a predetermined time.

In addition, step S18 illustrates online updating, but instead of online updating, batch updating or mini-batch updating may be performed.
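
Putting the steps of fig. 7 together, the following skeleton sketches the flow from S11 to S18. Every callable passed in (executing the learning-time machining program, computing the evaluation function, proposing and applying a behavior, updating the value function) is an application-specific stand-in assumed for illustration, not part of the patent.

```python
def training_loop_sketch(run_program, evaluate, propose, apply_behavior, update_q,
                         initial_coeffs, iterations=100):
    """Skeleton of the fig. 7 flow: observe the state (S11), output behavior information (S12),
    observe the new state (S13), set the reward from the evaluation functions (S14 to S17),
    and update the value function (S18)."""
    coeffs = dict(initial_coeffs)                     # initial coefficients prepared by the user
    r_cmd, y_meas = run_program(coeffs)               # S11: state at the start of Q learning
    f_prev = evaluate(r_cmd, y_meas)
    for _ in range(iterations):
        behavior = propose(coeffs)                    # S12: new behavior information A
        coeffs = apply_behavior(coeffs, behavior)
        r_cmd, y_meas = run_program(coeffs)           # S13: state S' after applying A
        f_now = evaluate(r_cmd, y_meas)
        rew = -1.0 if f_now > f_prev else (1.0 if f_now < f_prev else 0.0)   # S14 to S17
        update_q(coeffs, behavior, rew)               # S18: value function update
        f_prev = f_now
    return coeffs
```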

As described above, the following effects are obtained by the operation described with reference to fig. 7: in the present embodiment, by using the machine learning unit 130, an appropriate cost function for adjusting the coefficients ω, ζ, and R of the transfer function of the filter 110 can be obtained, and optimization of the coefficients ω, ζ, and R of the transfer function of the filter 110 can be simplified.

Next, the operation of the optimization behavior information output unit 135 in generating the optimization behavior information will be described with reference to the flow of fig. 8.

First, in step S21, the optimization behavior information output unit 135 acquires the cost function Q stored in the cost function storage unit 134. As described above, the cost function Q is a function updated by the cost function update unit 1322 through Q learning.

In step S22, the optimization behavior information output unit 135 generates optimization behavior information from the cost function Q, and outputs the generated optimization behavior information to the filter 110.
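
Conceptually, step S22 amounts to picking the behavior that maximizes the learned value for the current coefficient state, as in the following sketch with an illustrative table layout.

```python
def best_behavior(q_table, state, behaviors):
    """Pick the behavior A that maximizes Q(S, A) for the current coefficient state, which is
    what the optimization behavior information output unit 135 does conceptually.
    q_table maps (state, behavior) pairs to learned values; the structure is illustrative."""
    return max(behaviors, key=lambda a: q_table.get((state, a), float("-inf")))

q_table = {(("S*",), "omega+"): 0.8, (("S*",), "omega-"): 0.2}
print(best_behavior(q_table, state=("S*",), behaviors=["omega+", "omega-"]))
```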

By the operation described with reference to fig. 8, in the present embodiment, optimization behavior information can be generated from the cost function Q obtained through the learning by the machine learning unit 130. Based on this optimization behavior information, the adjustment of the currently set coefficients ω, ζ, and R of the transfer function of the filter 110 can be simplified, vibration at the machine end can be suppressed, and the quality of the machined surface of the workpiece can be improved. Furthermore, since the external measuring device is disposed outside the motor control device, the external measuring device can be detached after machine learning, which reduces cost and improves reliability.

The servo control unit and the machine learning unit of the motor control device described above may be implemented by hardware, software, or a combination thereof. The servo control method performed by cooperation of each component included in the motor control device may be realized by hardware, software, or a combination thereof. Here, the software implementation means an implementation in which a computer reads and executes a program.

Various types of non-transitory computer-readable recording media can be used to store and provide a program to a computer. The non-transitory computer-readable recording medium includes various types of tangible storage media. Examples of the non-transitory computer-readable recording medium include: magnetic recording media (e.g., magnetic disks, hard disk drives), magneto-optical recording media (e.g., magneto-optical disks), CD-ROM (Read Only Memory), CD-R, CD-R/W, and semiconductor memories (e.g., mask ROM, PROM (Programmable ROM), EPROM (Erasable PROM), flash ROM, RAM (Random Access Memory)).

The above embodiment is a preferred embodiment of the present invention, but the scope of the present invention is not limited to the above embodiment, and various modifications may be made without departing from the spirit of the present invention.

In the above-described embodiment, the case where an acceleration sensor is used as the external measurement device and the measurement information is acceleration information has been described. However, when acceleration information is to be obtained, a position sensor or a velocity sensor may instead be used as the external measurement device, and the acceleration information may be obtained by second-order differentiation of the measured position information or first-order differentiation of the measured velocity information, respectively.
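
As an illustration only (the sampling period and the measured signals below are assumptions), the acceleration can be recovered from sampled position or velocity measurements by numerical differentiation.

```python
# Sketch: acceleration from measured position (second-order differentiation)
# or from measured velocity (first-order differentiation); dt is assumed.
import numpy as np

dt = 0.001                                      # assumed sampling period [s]
pos = np.array([0.0, 0.1, 0.4, 0.9, 1.6])       # assumed measured positions
vel = np.gradient(pos, dt)                      # first-order differentiation
acc_from_pos = np.gradient(vel, dt)             # second-order differentiation

vel_meas = np.array([0.0, 1.0, 2.0, 3.0, 4.0])  # assumed measured velocities
acc_from_vel = np.gradient(vel_meas, dt)        # first-order differentiation
```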

The evaluation function f described above is a function of the acceleration deviation, that is, the difference between the value d²r/dt² obtained by second-order differentiation of the position command r and the measured acceleration y; a function using a position deviation or a velocity deviation may also be used.

Specifically, when the position deviation is used for the evaluation function, the machine learning unit 130 acquires, as state information, the position command and the measured position from the position sensor serving as the external measuring device, and uses as the evaluation function, for example, the time integral of the absolute value of the difference between the position command and the measured position (the position deviation), the time integral of the square of the absolute value of the position deviation, the time integral of the absolute value of the position deviation weighted by time t, or the maximum value of the set of absolute values of the position deviation.

When the velocity deviation is used for the evaluation function, the machine learning unit 130 acquires, as state information, the position command and the measured velocity from the velocity sensor serving as the external measuring device, and uses as the evaluation function, for example, the time integral of the absolute value of the difference between the first-order derivative of the position command and the measured velocity (the velocity deviation), the time integral of the square of the absolute value of the velocity deviation, the time integral of the absolute value of the velocity deviation weighted by time t, or the maximum value of the set of absolute values of the velocity deviation.

An example of an evaluation function using the position deviation, the velocity deviation, and the acceleration deviation is the time integral of

c_a × (position deviation)² + c_b × (velocity deviation)² + c_c × (acceleration deviation)²,

where the coefficients c_a, c_b, and c_c are weighting coefficients.
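
A minimal sketch of this weighted evaluation function, approximating the time integral by a discrete sum, is shown below; the deviation samples, the sampling period, and the weights are assumptions.

```python
# Sketch: time integral of ca*(position deviation)^2 + cb*(velocity deviation)^2
# + cc*(acceleration deviation)^2, approximated as a sum over sampled values.
import numpy as np

def evaluation(pos_err, vel_err, acc_err, dt, ca=1.0, cb=0.1, cc=0.01):
    """Return the discretized time integral of the weighted squared deviations."""
    integrand = ca * pos_err**2 + cb * vel_err**2 + cc * acc_err**2
    return float(np.sum(integrand) * dt)
```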

When a position sensor is used as the external measurement device, for example, a scale (linear scale) is attached to the table. Fig. 9 is an explanatory diagram showing a state in which a scale is attached to the table 251 of the machine main body 250. In this case, the position of the table 251 is detected by the scale 301, and the position information is output to the machine learning unit 130.

In the above-described embodiment, the case where one resonance point exists in the machine tool 200 has been described, but a plurality of resonance points may exist in the machine tool 200. When a plurality of resonance points exist in the machine tool 200, a filter is provided for each resonance point, and all of the resonances can be attenuated by connecting these filters in series. Fig. 10 is a block diagram showing an example in which a plurality of filters connected in series constitute the filter. In fig. 10, when there are m resonance points (m is a natural number of 2 or more), m filters 110-1 to 110-m are connected in series to constitute the filter 110. The coefficients ω, ζ, and R of the m filters 110-1 to 110-m are sequentially optimized by machine learning so as to attenuate the respective resonance points.
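
As a sketch only, the series connection can be expressed by multiplying the numerator and denominator polynomials of the individual sections; the second-order section form and the parameter values below are assumptions, not the exact filter of this disclosure.

```python
# Sketch: m filter sections connected in series; the overall filter is the
# product of the individual transfer functions. The section form is assumed.
import numpy as np
from scipy import signal

def section(omega, zeta, R):
    """One assumed second-order section parameterized by omega, zeta, R."""
    num = [1.0, 2.0 * zeta * R * omega, omega**2]
    den = [1.0, 2.0 * zeta * omega, omega**2]
    return num, den

params = [(200.0, 0.05, 0.2), (550.0, 0.04, 0.1)]     # assumed resonance settings
num, den = section(*params[0])
for p in params[1:]:
    n2, d2 = section(*p)
    num, den = np.polymul(num, n2), np.polymul(den, d2)  # series connection
overall = signal.TransferFunction(num, den)              # combined filter
```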

In the servo control unit 120 of the motor control device 100 shown in fig. 1, only the position feedforward unit 124 is shown as feedforward control, but a velocity feedforward unit may be provided in addition to the position feedforward unit 124. In that case, an adder is provided on the output side of the velocity control unit 126 shown in fig. 1, and the velocity feedforward unit is provided between the input side of this adder and the output side of the filter 110. The adder adds the output of the velocity control unit 126 and the output of the velocity feedforward unit, and outputs the sum to the servo motor 127.

The velocity feedforward unit performs velocity feedforward processing, represented by a transfer function H(s) expressed by equation 5 below (hereinafter referred to as equation 5), on a value obtained by second-order differentiation of the position command value multiplied by a constant β, and outputs the processing result to the adder as a velocity feedforward term. The coefficients c_i and d_j (X ≥ i, j ≥ 0, where X is a natural number) of equation 5 are the coefficients of the transfer function H(s). The natural number X may be the same value as the natural number X in equation 2 or a different value.

[ math figure 5 ]

(The formula image of the transfer function H(s) is not reproduced here.)
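
Because the formula of equation 5 is given only as an image, the sketch below assumes a generic rational H(s) with numerator coefficients c_i and denominator coefficients d_j, and applies it to the second-order derivative of the position command multiplied by β to obtain the velocity feedforward term.

```python
# Sketch with assumed coefficients: velocity feedforward term = H(s) applied to
# beta * d^2r/dt^2, where r is the position command.
import numpy as np
from scipy import signal

beta = 0.9                                     # assumed constant
c = [1.0, 0.002]                               # assumed numerator c_0, c_1
d = [1.0, 0.010]                               # assumed denominator d_0, d_1
H = signal.TransferFunction(c[::-1], d[::-1])  # SciPy expects highest power first

t = np.linspace(0.0, 1.0, 1001)
r = t**2                                       # assumed position command
d2r = np.gradient(np.gradient(r, t), t)        # second-order differentiation
_, vff, _ = signal.lsim(H, beta * d2r, t)      # velocity feedforward term
```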

In addition to the configuration of fig. 1, the control system may also be configured as follows.

< modification example in which the machine learning device is provided outside the motor control device >

Fig. 11 is a block diagram showing another configuration example of the control system. The control system 10A shown in fig. 11 differs from the control system 10 shown in fig. 1 in that n motor control devices 100A-1 to 100A-n (n is a natural number of 2 or more) and n machine tools 200-1 to 200-n, each provided with one of the acceleration sensors 300-1 to 300-n, are connected to machine learning devices 130A-1 to 130A-n via a network 400. Each of the motor control devices 100A-1 to 100A-n has the same configuration as the motor control device 100 of fig. 1 except that it does not include a machine learning unit. Each of the machine learning devices 130A-1 to 130A-n has the same configuration as the machine learning unit 130 shown in fig. 6.

Here, the motor control device 100A-1 and the acceleration sensor 300-1 are communicably connected to the machine learning device 130A-1 as a one-to-one set. The motor control devices 100A-2 to 100A-n, the acceleration sensors 300-2 to 300-n, and the machine learning devices 130A-2 to 130A-n are connected in the same manner as the motor control device 100A-1, the acceleration sensor 300-1, and the machine learning device 130A-1. In fig. 11, the n sets of motor control devices 100A-1 to 100A-n, acceleration sensors 300-1 to 300-n, and machine learning devices 130A-1 to 130A-n are connected via the network 400, but the motor control device, machine tool, and machine learning device of each set may instead be directly connected via a connection interface. The n sets of machine tools 200-1 to 200-n, to which the motor control devices 100A-1 to 100A-n and the acceleration sensors 300-1 to 300-n are attached, and machine learning devices 130A-1 to 130A-n may be installed in the same factory or in different factories.

The network 400 is, for example, a local area network (LAN) constructed in a factory, the Internet, a public telephone network, or a combination thereof. The specific communication method used in the network 400, whether wired or wireless, is not particularly limited.

< degree of freedom of the system configuration >

In the above-described embodiment, the motor control devices 100A-1 to 100A-n, the acceleration sensors 300-1 to 300-n, and the machine learning devices 130A-1 to 130A-n are communicably connected in one-to-one sets; however, for example, one machine learning device may be communicably connected to a plurality of motor control devices and a plurality of acceleration sensors via the network 400 and perform machine learning for each motor control device and each machine tool.

In this case, the functions of the single machine learning device may be appropriately distributed over a distributed processing system composed of a plurality of servers. Each function of the single machine learning device may also be realized by a virtual server function or the like on the cloud.

In addition, when there are n machine learning devices 130A-1 to 130A-n corresponding respectively to the motor control devices 100A-1 to 100A-n and the machine tools 200-1 to 200-n of the same model name, the same specification, or the same series, the learning results of the machine learning devices 130A-1 to 130A-n may be shared among them. In this way, a more suitable model can be constructed.
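
One possible way to share such results, sketched here as an assumption rather than a method specified in this disclosure, is to merge the value tables learned by the individual devices, for example by averaging them key by key.

```python
# Sketch only: average the Q tables learned by several machine learning devices
# for machines of the same model name, specification, or series.
from collections import defaultdict

def merge_q_tables(tables):
    """Average Q values key by key across the devices' tables."""
    sums, counts = defaultdict(float), defaultdict(int)
    for table in tables:
        for key, value in table.items():
            sums[key] += value
            counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```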
