Polishing apparatus, information processing system, polishing method, and recording medium

Document No.: 92361. Publication date: 2021-10-12.

Abstract: This technology, "Polishing apparatus, information processing system, polishing method, and recording medium", was created by 中村显, 鸟越恒男, 铃木佑多, 松尾尚典, and 神子岛隆仁 on 2021-03-17. The present invention relates to a polishing apparatus, an information processing system, a polishing method, and a recording medium. The polishing apparatus can refer to a memory storing a trained machine learning model. The model has been trained using learning data that has as input a feature quantity of a signal relating to a frictional force between a polishing member and a substrate during polishing, or a feature quantity of a temperature of the polishing member or the substrate during polishing, and that has as output data relating to a film thickness of the substrate after polishing or a parameter relating to a yield of products included in the substrate after polishing. The polishing apparatus includes a processor that generates a feature quantity based on the signal relating to the frictional force between the polishing member and the substrate during polishing, or on the temperature of the polishing member or a target substrate during polishing, inputs the generated feature quantity to the trained machine learning model, and thereby outputs, as an estimated value, either the data relating to the film thickness of the substrate after polishing or the parameter relating to the yield of products included in the substrate after polishing.

1. A polishing apparatus capable of referring to a memory storing a machine learning model that has been learned using data for learning, which has as input a characteristic amount of a signal relating to a frictional force between a polishing member and a substrate during polishing or a characteristic amount of a temperature of the polishing member or the substrate during polishing, and which has as output data relating to a film thickness of the substrate after polishing or a parameter relating to a yield of a product contained in the substrate after polishing,

the polishing apparatus being characterized by comprising:

a polishing table provided with a polishing member and configured to be rotatable;

a polishing head that faces the polishing table, is configured to be rotatable, and is capable of mounting a substrate on a surface facing the polishing table;

a control unit that performs control so that the substrate is pressed against the polishing member and polished while the polishing table and the polishing head on which the substrate is mounted are rotated; and

a processor that generates a feature quantity based on a signal relating to a frictional force between the polishing member and the substrate during polishing or a temperature of the polishing member or the target substrate during polishing, and inputs the generated feature quantity to the trained machine learning model, thereby outputting, as an estimated value, either data relating to a film thickness of the substrate after polishing or a parameter relating to a yield of products included in the substrate after polishing.

2. The polishing apparatus according to claim 1, wherein

when the output estimated value satisfies a predetermined polishing deterioration condition, the processor stops the subsequent substrate processing.

3. The polishing apparatus according to claim 1 or 2,

further comprising a film thickness measuring device that measures the film thickness of the substrate, wherein

the processor controls the film thickness measuring device to measure the film thickness of the target substrate after polishing when the output estimated value satisfies a predetermined polishing deterioration condition, and controls the film thickness measuring device not to measure the film thickness of the target substrate after polishing when the output estimated value does not satisfy the predetermined polishing deterioration condition.

4. The polishing apparatus according to claim 1, wherein

the processor outputs a maintenance timing using a trend in the estimated values output for substrates polished at a plurality of different times.

5. The polishing apparatus according to claim 1, wherein

the processor performs control in such a manner that a warning for prompting maintenance is issued when the output estimated value satisfies a predetermined polishing deterioration condition.

6. The polishing apparatus according to claim 1, wherein

the processor adjusts, based on the output estimated value, the polishing conditions for subsequent substrates so as to obtain desired data on the film thickness of the polished substrate or a desired parameter on the yield of products included in the polished substrate.

7. The polishing apparatus according to claim 1, wherein

the processor relearns the machine learning model using feature quantities obtained during operation of the polishing apparatus.

8. An information processing system capable of referring to a memory storing a machine learning model that completes learning by using data for learning that takes as input a characteristic amount of a signal relating to a frictional force between a polishing member and a substrate during polishing or a characteristic amount of a temperature of the polishing member or the substrate during polishing and takes as output data relating to a film thickness of the substrate after polishing or a parameter relating to a yield of a product contained in the substrate after polishing,

the information processing system is characterized by comprising:

a generation unit that generates a characteristic amount based on a signal relating to a frictional force between the polishing member and the substrate during polishing or a temperature of the polishing member or the target substrate during polishing; and

an estimating unit that inputs the generated feature quantity to the trained machine learning model, and outputs, as an estimated value, either data on a film thickness of the substrate after polishing or a parameter on a yield of products included in the substrate after polishing.

9. A polishing method for polishing a substrate by a polishing apparatus capable of referring to a memory in which a machine learning model learned by using data for learning having as input a characteristic amount of a signal relating to a frictional force between a polishing member and the substrate during polishing or a characteristic amount of a temperature of the polishing member or the substrate during polishing and having as output data relating to a film thickness of the substrate after polishing or a parameter relating to a yield of a product contained in the substrate after polishing is stored,

the polishing method being characterized by comprising:

polishing the substrate by pressing the substrate against the polishing member while rotating a polishing head and a polishing table on which the substrate is mounted,

measuring a signal relating to a frictional force between the polishing member and the substrate during polishing or measuring a temperature of the polishing member or the target substrate during polishing to generate a characteristic amount,

inputting the generated feature quantity into the machine learning model for which learning is completed,

and outputting, as an estimated value, either data on the film thickness of the substrate after polishing or a parameter on the yield of products included in the substrate after polishing.

10. The polishing method according to claim 9,

further comprising determining whether the output estimated value satisfies a predetermined polishing deterioration condition, and stopping processing of subsequent substrates when the predetermined polishing deterioration condition is satisfied.

11. The polishing method according to claim 9,

further comprising determining whether the output estimated value satisfies a predetermined polishing deterioration condition, measuring the film thickness of the target substrate with a film thickness measuring device after polishing when the predetermined polishing deterioration condition is satisfied, and not measuring the film thickness of the target substrate with the film thickness measuring device after polishing when the predetermined polishing deterioration condition is not satisfied.

12. The polishing method according to claim 9, wherein

a maintenance timing is output using a trend in the estimated values output for substrates polished at a plurality of different times.

13. The polishing method according to claim 9,

further comprising determining whether the output estimated value satisfies a predetermined polishing deterioration condition, and issuing a warning prompting maintenance when the predetermined polishing deterioration condition is satisfied.

14. The polishing method according to claim 9,

further comprising adjusting, based on the output estimated value, the polishing conditions for subsequent substrates so as to obtain desired data on the film thickness of the polished substrate or a desired parameter on the yield of products included in the polished substrate.

15. The polishing method according to claim 9, wherein

the machine learning model is relearned using feature quantities obtained during operation of the polishing apparatus.

16. A recording medium storing a program for causing a computer to function as a device capable of referring to a storage body storing a machine learning model learned using data for learning having as input a characteristic amount of a signal relating to a frictional force between a polishing member and a substrate during polishing or a characteristic amount of a temperature of the polishing member or the substrate during polishing and having as output data relating to a film thickness of the substrate after polishing or a parameter relating to a product yield included in the substrate after polishing,

the element includes:

Technical Field

The present technology relates to a polishing apparatus, an information processing system, a polishing method, and a program.

Background

Polishing apparatuses for polishing a substrate (e.g., a wafer) are known. For example, Patent Document 1 discloses a polishing apparatus including: a polishing table that is provided with a polishing member and is configured to be rotatable; and a polishing head that faces the polishing table, is configured to be rotatable, and can hold a substrate on its surface facing the polishing table.

The polishing performance of a polishing apparatus may deteriorate. Causes of such deterioration include wear of consumable parts of the polishing apparatus (e.g., the polishing pad, which is an example of a polishing member) and deterioration of the condition of the polishing table (platen). When the polishing condition deteriorates, the profile of the film thickness after polishing (the film thickness is also referred to as the residual film) deteriorates; for example, the variation in film thickness becomes large. To check whether a substrate is defective in that case, the film thickness or film thickness profile of every polished substrate is measured with a film thickness measuring device, which takes a lot of time. In particular, when only one film thickness measuring device is provided for a plurality of polishing apparatuses, measuring all polished substrates makes the measurement time of the film thickness measuring device a bottleneck and reduces throughput. The measurement time of the film thickness measuring device (ITM) could be shortened by measuring only sampled substrates or by reducing the number of measurement points per substrate, but this is not preferable because defective products may be missed and product yield may be affected.

Disclosure of Invention

The present technology has been made in view of the above problems, and it is desirable to provide a polishing apparatus, an information processing system, and a program that can improve throughput and yield by avoiding missing defective products.

(means for solving the problems)

A polishing apparatus according to one embodiment can refer to a memory storing a trained machine learning model, the model having been trained using learning data that has as input a feature quantity of a signal relating to a frictional force between a polishing member and a substrate during polishing or a feature quantity of a temperature of the polishing member or the substrate during polishing, and that has as output data relating to a film thickness of the substrate after polishing or a parameter relating to a yield of products included in the substrate after polishing. The polishing apparatus comprises: a polishing table provided with a polishing member and configured to be rotatable; a polishing head that faces the polishing table, is configured to be rotatable, and is capable of mounting a substrate on a surface facing the polishing table; a control unit that performs control so that the substrate is pressed against the polishing member and polished while the polishing table and the polishing head on which the substrate is mounted are rotated; and a processor that generates a feature quantity based on a signal relating to a frictional force between the polishing member and the substrate during polishing or a temperature of the polishing member or the target substrate during polishing, and inputs the generated feature quantity to the trained machine learning model, thereby outputting, as an estimated value, either data relating to a film thickness of the substrate after polishing or a parameter relating to a yield of products included in the substrate after polishing.

An information processing system according to an embodiment is an information processing system capable of referring to a memory storing a machine learning model that completes learning using data for learning that takes as input a feature quantity of a signal relating to a frictional force between a polishing member and a substrate during polishing or a feature quantity of a temperature of the polishing member or the substrate during polishing and that takes as output data relating to a film thickness of the substrate after polishing or a parameter relating to a yield of a product included in the substrate after polishing, the information processing system including: a generation unit that generates a characteristic amount based on a signal relating to a frictional force between the polishing member and the substrate during polishing or a temperature of the polishing member or the target substrate during polishing; and an estimating unit that inputs the generated feature amount to the machine learning model having completed learning, and outputs, as an estimated value, either data on a film thickness of the polished substrate or a parameter related to a yield of the product included in the polished substrate.

A polishing method according to one embodiment is a polishing method in which a substrate is polished by a polishing apparatus capable of referring to a memory storing a trained machine learning model, the model having been trained using learning data that has as input a feature quantity of a signal relating to a frictional force between a polishing member and the substrate during polishing or a feature quantity of a temperature of the polishing member or the substrate during polishing, and that has as output data relating to a film thickness of the substrate after polishing or a parameter relating to a yield of products included in the substrate after polishing. The polishing method comprises: polishing the substrate by pressing the substrate against the polishing member while rotating a polishing table and a polishing head on which the substrate is mounted; measuring a signal relating to the frictional force between the polishing member and the substrate during polishing, or a temperature of the polishing member or a target substrate during polishing, to generate a feature quantity; inputting the generated feature quantity to the trained machine learning model; and outputting, as an estimated value, either data on the film thickness of the substrate after polishing or a parameter on the yield of products included in the substrate after polishing.

A program according to an embodiment causes a computer to function as a device capable of referring to a storage storing a trained machine learning model, the model having been trained using learning data that has as input a feature quantity of a signal relating to a frictional force between a polishing member and a substrate during polishing or a feature quantity of a temperature of the polishing member or the substrate during polishing, and that has as output data relating to a film thickness of the substrate after polishing or a parameter relating to a yield of products included in the substrate after polishing. The program causes the computer to function as: a generation unit that generates a feature quantity based on a signal relating to a frictional force between the polishing member and the substrate during polishing or a temperature of the polishing member or the target substrate during polishing; and an estimating unit that inputs the generated feature quantity to the trained machine learning model, and outputs, as an estimated value, either data on a film thickness of the polished substrate or a parameter relating to a yield of products included in the polished substrate.

Drawings

Fig. 1 is a schematic configuration diagram of an information processing system according to a first embodiment.

Fig. 2 is a schematic diagram showing the overall configuration of the polishing apparatus according to the first embodiment.

Fig. 3 is a schematic configuration diagram of the AI section of the first embodiment.

Fig. 4 is an explanatory diagram of correspondence between the polishing condition of the wafer and the waveform of the TCM.

Fig. 5 is a schematic diagram for explaining a waveform of the truncated TCM.

Fig. 6 is a graph showing correlation coefficients between the maximum value of the residual film and each parameter.

Fig. 7 is a schematic diagram illustrating an example of the LightGBM.

Fig. 8 is a schematic diagram illustrating an example of the learning step and the estimation step.

Fig. 9 is a diagram comparing the measured value of the maximum film thickness with the AI estimated value in the first embodiment.

Fig. 10 is a graph comparing the measured value of the average film thickness value with the AI estimated value in the first embodiment.

Fig. 11 is a graph comparing the measured value of the film thickness range and the AI estimated value in the first embodiment.

Fig. 12 is a flowchart showing an example of a process for stopping processing of subsequent substrates when the polishing deterioration condition is satisfied.

Fig. 13 is a flowchart showing an example of a process of measuring a film thickness by a film thickness measuring device in the apparatus when a polishing deterioration condition is satisfied.

Fig. 14 is a flowchart showing an example of a process for measuring a film thickness by a film thickness measuring instrument in the apparatus when a polishing deterioration condition is satisfied.

Fig. 15 is a flowchart showing an example of a process of issuing a warning for prompting maintenance when a polishing deterioration condition is satisfied.

Fig. 16 is a schematic diagram showing the overall configuration of the polishing system according to the second embodiment.

Fig. 17 is a schematic diagram showing the overall configuration of the polishing system according to the third embodiment.

Detailed Description

Hereinafter, various embodiments will be described with reference to the drawings. However, unnecessary detailed description will be omitted. For example, detailed descriptions of already known matters and repetitive descriptions of substantially the same configuration will be omitted. This is to avoid unnecessary redundancy in the following description, which will be readily understood by those skilled in the art.

A polishing apparatus according to a first aspect of the present technology is a polishing apparatus that can refer to a memory storing a trained machine learning model, the model having been trained using learning data that has as input a feature quantity of a signal relating to a frictional force between a polishing member and a substrate during polishing or a feature quantity of a temperature of the polishing member or the substrate during polishing, and that has as output data relating to a film thickness of the substrate after polishing or a parameter relating to a yield of products included in the substrate after polishing. The polishing apparatus comprises: a polishing table provided with a polishing member and configured to be rotatable; a polishing head that faces the polishing table, is configured to be rotatable, and is capable of mounting a substrate on a surface facing the polishing table; a control unit that performs control so that the substrate is pressed against the polishing member and polished while the polishing table and the polishing head on which the substrate is mounted are rotated; and a processor that generates a feature quantity based on a signal relating to a frictional force between the polishing member and the substrate during polishing or a temperature of the polishing member or the target substrate during polishing, and inputs the generated feature quantity to the trained machine learning model, thereby outputting, as an estimated value, either data relating to a film thickness of the substrate after polishing or a parameter relating to a yield of products included in the substrate after polishing.

With this configuration, the polishing apparatus obtains, during polishing, an estimated value of data on the film thickness of the substrate after polishing or of a parameter relating to the yield of products included in the substrate after polishing, so the state of the polished substrate can be predicted without measuring the film thickness. Because the number of film thickness measurements can therefore be reduced, throughput can be improved while missed defective products are avoided; in particular, film thickness measurement can be omitted during normal polishing. Further, by estimating the parameter relating to the yield, a defect or a predicted defect can be detected. In addition, by updating the polishing parameters in accordance with the parameter relating to the yield, the yield can be improved.
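As a rough illustration of the flow described above (monitored signal, then feature quantity, then estimated value), the following Python sketch summarizes a friction-related waveform into a few statistics and passes them to a stand-in model. The choice of features, the `LinearStandInModel` class, its weights, and the sample signal are all hypothetical and are not taken from this document; a real system would load the actual trained model from the memory instead.

```python
from statistics import mean, pstdev


def generate_features(signal):
    """Summarize a friction-related waveform sampled during polishing.

    The four statistics used here are an assumption for illustration.
    """
    return {
        "mean": mean(signal),
        "std": pstdev(signal),
        "max": max(signal),
        "min": min(signal),
    }


class LinearStandInModel:
    """Placeholder for the trained model; weights are illustrative only."""

    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def predict(self, features):
        # Weighted sum of the feature quantities plus a constant offset.
        return self.bias + sum(w * features[name] for name, w in self.weights.items())


# Hypothetical weights; in practice these come from the learning step.
model = LinearStandInModel({"mean": 2.0, "std": 10.0}, bias=50.0)

signal = [1.0, 2.0, 3.0, 2.0, 1.0, 3.0]  # made-up friction-related samples
features = generate_features(signal)
estimate = model.predict(features)  # estimated post-polish film thickness value
print(features, estimate)
```

Keeping feature generation separate from the model call mirrors the split between a generation unit and an estimating unit, so either part can be replaced without touching the other.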

A polishing apparatus according to a second aspect of the present technology is the polishing apparatus according to the first aspect, wherein the processor stops processing of a subsequent substrate when the output estimated value satisfies a predetermined polishing deterioration condition.

With this configuration, when the polishing state is deteriorated, the subsequent substrate processing is stopped, so that maintenance such as replacement of the polishing member can be performed, and the polishing state can be prevented from being further deteriorated.

A polishing apparatus according to a third aspect of the present technology is the polishing apparatus according to the first or second aspect, wherein a film thickness measuring device is provided for measuring a film thickness of the substrate, the processor controls the film thickness measuring device to measure the film thickness of the target substrate after polishing when the output estimated value satisfies a predetermined polishing deterioration condition, and the processor controls the film thickness measuring device not to measure the film thickness of the target substrate after polishing when the output estimated value does not satisfy the predetermined polishing deterioration condition.

With this configuration, since the film thickness of the substrate is measured when the polishing state is deteriorated, it can be determined whether or not the polishing is performed satisfactorily, and when the polishing state is not deteriorated, the film thickness of the substrate is not measured, thereby increasing the throughput.
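The measurement gating described above can be sketched as follows. This is a minimal example under stated assumptions: the deterioration condition is taken to be "estimated film thickness range exceeds a limit", and the wafer IDs, estimates, and limit are invented for illustration; the document does not define the condition this concretely.

```python
def satisfies_deterioration_condition(estimated_range_nm, limit_nm):
    """Assumed condition: the estimated film thickness range exceeds a limit."""
    return estimated_range_nm > limit_nm


def select_wafers_to_measure(estimates_nm, limit_nm):
    """Return IDs of wafers to send to the film thickness measuring device."""
    return [
        wafer_id
        for wafer_id, estimate in estimates_nm.items()
        if satisfies_deterioration_condition(estimate, limit_nm)
    ]


# Hypothetical per-wafer estimates produced during polishing.
estimates = {"W01": 4.2, "W02": 3.9, "W03": 7.5, "W04": 4.0}
to_measure = select_wafers_to_measure(estimates, limit_nm=6.0)
skipped = len(estimates) - len(to_measure)
print(to_measure, skipped)
```

In this made-up lot only one of four wafers is routed to the measuring device, which is the source of the throughput gain the paragraph describes.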

A polishing apparatus according to a fourth aspect of the present technology is the polishing apparatus according to any one of the first to third aspects, wherein the processor outputs the maintenance timing using a tendency of the estimated value output for the substrate polished at a plurality of different timings.

With this configuration, the timing at which the polishing state deteriorates can be predicted, and maintenance such as replacement of the polishing member can be performed at that timing, so that further deterioration of the polishing state can be prevented.
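One possible way to turn a trend in the estimated values into a maintenance timing is to fit a straight line and extrapolate to a limit. The sketch below assumes a linear trend, a fixed limit, and invented data; the document does not state which trend model is used, so this is only an illustration of the idea.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_maintenance_count(wafer_counts, estimates, limit):
    """Wafer count at which the fitted trend reaches the limit, or None."""
    slope, intercept = fit_line(wafer_counts, estimates)
    if slope <= 0:
        return None  # no upward trend, so no crossing is predicted
    return (limit - intercept) / slope


# Hypothetical history: estimated film thickness range versus wafers polished.
counts = [0, 100, 200, 300]
ranges_nm = [10.0, 12.0, 14.0, 16.0]
due_at = estimate_maintenance_count(counts, ranges_nm, limit=20.0)
print(due_at)
```

Returning `None` for a flat or improving trend avoids reporting a meaningless or negative maintenance point.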

A polishing apparatus according to a fifth aspect of the present technology is the polishing apparatus according to any one of the first to fourth aspects, wherein the processor performs control so as to issue a warning for prompting maintenance when the output estimated value satisfies a predetermined polishing deterioration condition.

With this configuration, when the polishing state deteriorates, maintenance such as replacement of the polishing member can be performed, and therefore, the polishing state can be prevented from further deteriorating.

A polishing apparatus according to a sixth aspect of the present technology is the polishing apparatus according to any one of the first to fifth aspects, wherein the processor adjusts, based on the output estimated value, the polishing conditions for subsequent substrates so as to obtain desired data on the film thickness of the polished substrate or a desired parameter on the yield of products included in the polished substrate.

With this configuration, since the subsequent substrate can be polished in a good state, the good polished state can be maintained for a longer period of time.
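As one hedged illustration of adjusting the polishing conditions for the next substrate, the sketch below corrects the polishing time in proportion to the gap between the estimated and the target film thickness. Polishing time as the adjusted condition, the removal rate, the gain, and the time limits are all assumptions introduced here; the document does not specify which conditions are adjusted or how.

```python
def adjust_polishing_time(current_time_s, estimated_nm, target_nm,
                          removal_rate_nm_per_s, gain=0.5,
                          min_time_s=30.0, max_time_s=90.0):
    """Propose the polishing time for the next substrate.

    A positive error means more film remains than desired, so the time is
    lengthened; the gain damps the correction and the result is clamped.
    """
    error_nm = estimated_nm - target_nm
    proposed = current_time_s + gain * error_nm / removal_rate_nm_per_s
    return min(max(proposed, min_time_s), max_time_s)


# Hypothetical numbers: 5 nm too thick at an assumed removal rate of 2.5 nm/s.
next_time = adjust_polishing_time(60.0, estimated_nm=105.0, target_nm=100.0,
                                  removal_rate_nm_per_s=2.5)
print(next_time)
```

A gain below 1 and hard limits keep a single noisy estimate from swinging the recipe too far.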

A polishing apparatus according to a seventh aspect of the present technology is the polishing apparatus according to any one of the first to sixth aspects, wherein the processor relearns the machine learning model using the feature amount in operation of the polishing apparatus.

With this configuration, the estimation accuracy can be improved.

An information processing system according to an eighth aspect of the present technology is an information processing system capable of referring to a memory storing a machine learning model that completes learning using data for learning that takes as input a characteristic amount of a signal relating to a frictional force between a polishing member and a substrate during polishing or a characteristic amount of a temperature of the polishing member or the substrate during polishing and takes as output data relating to a film thickness of the substrate after polishing or a parameter relating to a yield of a product included in the substrate after polishing, the information processing system comprising: a generation unit that generates a characteristic amount based on a signal relating to a frictional force between the polishing member and the substrate during polishing or a temperature of the polishing member or the target substrate during polishing; and an estimating unit that inputs the generated feature amount to the machine learning model having completed learning, and outputs, as an estimated value, either data on a film thickness of the polished substrate or a parameter related to a yield of the product included in the polished substrate.

In the case of this configuration, since data on the thickness of the substrate after polishing and an estimated value of a parameter related to the yield of the product included in the substrate after polishing are obtained during polishing by the polishing apparatus, the state of the substrate after polishing can be predicted without measuring the thickness of the substrate. Thus, the state of the substrate after polishing can be grasped even if the film thickness is not measured, and the number of times of measuring the film thickness can be reduced, so that missing defective products can be avoided and the throughput can be improved.

A polishing method according to a ninth aspect of the present technology is a method of polishing a substrate by a polishing apparatus capable of referring to a memory storing a trained machine learning model, the model having been trained using learning data that has as input a feature quantity of a signal relating to a frictional force between a polishing member and a substrate during polishing or a feature quantity of a temperature of the polishing member or the substrate during polishing, and that has as output data relating to a film thickness of the substrate after polishing or a parameter relating to a yield of products included in the substrate after polishing. The method comprises: polishing the substrate by pressing the substrate against the polishing member while rotating a polishing table and a polishing head on which the substrate is mounted; measuring a signal relating to a frictional force between the polishing member and the substrate during polishing, or a temperature of the polishing member or a target substrate during polishing, to generate a feature quantity; inputting the generated feature quantity to the trained machine learning model; and outputting, as an estimated value, either data on the film thickness of the substrate after polishing or a parameter on the yield of products included in the substrate after polishing.

A program according to a tenth aspect of the present technology is a program for causing a computer to function as a device capable of referring to a storage storing a trained machine learning model, the model having been trained using learning data that has as input a feature quantity of a signal relating to a frictional force between a polishing member and a substrate during polishing or a feature quantity of a temperature of the polishing member or the substrate during polishing, and that has as output data relating to a film thickness of the substrate after polishing or a parameter relating to a yield of products included in the substrate after polishing. The program causes the computer to function as: a generation unit that generates a feature quantity based on a signal relating to a frictional force between the polishing member and the substrate during polishing or a temperature of the polishing member or the target substrate during polishing; and an estimating unit that inputs the generated feature quantity to the trained machine learning model, and outputs, as an estimated value, either data on a film thickness of the polished substrate or a parameter relating to a yield of products included in the polished substrate.

In the case of this configuration, since data on the film thickness of the substrate after polishing and an estimated value of a parameter related to the yield of the product included in the substrate after polishing are obtained during polishing by the polishing apparatus, the state of the substrate after polishing can be predicted without measuring the film thickness. Thus, the state of the substrate after polishing can be grasped even if the film thickness is not measured, and the number of times of measuring the film thickness can be reduced, so that missing defective products can be avoided and the throughput can be improved.

In addition to the above-described problems, there is a problem that it takes time to determine that the polishing condition (for example, the condition of the polishing table) has deteriorated.

The various embodiments estimate the polishing state, the film thickness after polishing (also referred to as residual film thickness), a statistical value of the film thickness (average, maximum, minimum, etc.), or the film thickness profile (also referred to as film thickness distribution) from changes in the monitoring waveform during polishing. This makes it possible to estimate and manage, in a timely manner, whether polishing is good or defective and what the polishing condition is (for example, the condition of the polishing table). Therefore, when the polishing condition is defective, the next polishing is not performed, and the condition of the table can be adjusted. As a result, defective polishing of samples can be reduced. The various embodiments are described using a wafer as an example of a substrate.

< first embodiment >

First, the first embodiment will be explained. Fig. 1 is a schematic configuration diagram of an information processing system according to a first embodiment. As shown in fig. 1, the information processing system S1 according to the first embodiment includes: a loading/unloading section 2, two polishing apparatuses 10 as an example, a cleaning section 5, and a film thickness measuring instrument 6.

The loading/unloading section 2 has two or more (four in the present embodiment) front loading sections 20 on which wafer cassettes each storing a plurality of wafers are placed. An open cassette, a SMIF (Standard Mechanical Interface) pod, or a FOUP (Front Opening Unified Pod) can be mounted on each front loading section 20. Here, the SMIF pod and the FOUP are closed containers that house wafers and, by being covered with partition walls, can maintain an environment independent of the external space. In the example described here, a FOUP 21 is mounted on one of the front loading sections 20. The wafer is transferred from the loading/unloading section 2 to the polishing apparatus 10 by a transfer robot 22 (see patent document 1).

The film thickness measuring instrument 6 measures the film thickness or the film thickness profile (also referred to as film thickness distribution) of a substrate (here, a wafer). The film thickness measuring instrument 6 is, for example, an optical film thickness measuring instrument (also referred to as ITM).

The polishing apparatus 10 includes an AI unit 4. The AI unit 4 outputs, as an estimated value, any of data on the film thickness of the substrate after polishing, a statistic of the film thickness profile of the substrate after polishing, and a parameter relating to the yield of products included in the substrate after polishing (for example, the yield itself). The AI unit 4 then causes the film thickness measuring instrument 6 to measure the film thickness of the target substrate after polishing, for example, when the estimated value does not satisfy a predetermined polishing normal condition or when it satisfies a predetermined polishing deterioration condition. For example, when the estimated value for the wafer W1 in fig. 1 does not satisfy the predetermined polishing normal condition or satisfies the predetermined polishing deterioration condition, the wafer W1 is cleaned by the cleaning unit 5 as indicated by an arrow a1, and then its film thickness is measured by the film thickness measuring instrument 6. Conversely, when the estimated value for the wafer W2 in fig. 1 satisfies the predetermined polishing normal condition or does not satisfy the predetermined polishing deterioration condition, the wafer W2 is cleaned by the cleaning unit 5 as indicated by an arrow a2, and is then returned to the FOUP 21 without its film thickness being measured by the film thickness measuring instrument 6.

In addition, the AI unit 4 may learn from normal data when the polishing normal condition is used, may learn from defective data when the polishing deterioration condition is used, or may learn from normal data and defective data mixed at a determined ratio.

The output of the AI unit 4 may be classified into three classes: normal, defective, and near-defective. When the output is near-defective, the film thickness is measured by the film thickness measuring instrument 6.

In fig. 9, for example, the AI unit 4 may determine that the wafer is close to defective, and have the film thickness measured, when the AI estimated value stays at or below the lower threshold throughout or suddenly exceeds the upper threshold.

In addition, or alternatively, the AI unit 4 may determine that the wafer is close to defective, and have the film thickness measured, when the polishing time exceeds the normal range.

The output of the AI unit 4 may be divided into normal and near-defective outputs.

Fig. 2 is a schematic diagram showing the overall configuration of the polishing apparatus according to the first embodiment. As shown in fig. 2, the polishing apparatus 10 includes: a polishing table 100; and a polishing head 1 serving as a substrate holding device that holds a substrate (here, a wafer) to be polished and presses it against the polishing surface on the polishing table 100. The polishing head 1 is also called a top ring. The polishing table 100 is connected via a table shaft 100a to a table rotation motor 102 disposed below it. The polishing table 100 is rotated about the table shaft 100a by rotation of the table rotation motor 102. A polishing pad 101 serving as a polishing member is bonded to the upper surface of the polishing table 100. The surface of the polishing pad 101 constitutes a polishing surface 101a for polishing the semiconductor wafer W. In summary, the polishing apparatus 10 includes: a polishing table 100 that is rotatable and on which a polishing member (here, the polishing pad 101 as an example) is provided; and a polishing head 1 that is rotatable, faces the polishing table 100, and holds a substrate (here, a wafer) on its surface facing the polishing table 100.

A polishing liquid supply nozzle 60 is provided above the polishing table 100. A polishing liquid (polishing slurry) Q is supplied from the polishing liquid supply nozzle 60 onto the polishing pad 101 on the polishing table 100.

The polishing head 1 basically consists of: a top ring body 2 for pressing the semiconductor wafer W against the polishing surface 101a; and a retainer ring 3 serving as a fixing member that holds the outer peripheral edge of the semiconductor wafer W and prevents the semiconductor wafer W from being ejected from the polishing head 1. The polishing head 1 is connected to the top ring shaft 111. The top ring shaft 111 is moved up and down with respect to the top ring head 110 by a vertical movement mechanism 124. The polishing head 1 is positioned vertically by moving the top ring shaft 111 up and down, which moves the entire polishing head 1 up and down with respect to the top ring head 110. A rotary joint 26 is attached to the upper end of the top ring shaft 111.

The vertical movement mechanism 124 for vertically moving the top ring shaft 111 and the polishing head 1 includes: a bridge 128 that rotatably supports the top ring shaft 111 via a bearing 126; a ball screw 132 mounted to the bridge 128; a support table 129 supported by a support column 130; and a servo motor 138 provided on the support table 129. A support table 129 supporting a servo motor 138 is fixed to the top ring head 110 via a support column 130.

The ball screw 132 includes: a screw shaft 132a connected to a servo motor 138; and a nut 132b to which the screw shaft 132a is screwed. When the servo motor 138 is driven, the bridge 128 moves up and down via the ball screw 132, and thereby the top ring shaft 111 and the polishing head 1, which move up and down integrally with the bridge 128, move up and down.

As shown in fig. 2, by rotationally driving the top ring rotary motor 114, the rotary cylinder 112 and the top ring shaft 111 rotate integrally via the timing pulley 116, the timing belt 115, and the timing pulley 113, and the polishing head 1 rotates.

The top ring head 110 is supported by a top ring head shaft 117 rotatably supported by a frame (not shown). The polishing apparatus 10 includes a control unit 500 that is connected via control lines to each device in the apparatus, such as the top ring rotation motor 114, the servo motor 138, and the table rotation motor 102, and that is configured to control each device. The control unit 500 performs control so that the polishing table 100 and the polishing head 1 on which the substrate is mounted rotate and the substrate is pressed against the polishing member (here, the polishing pad 101) and polished.

The input to the machine learning model described later is taken from the motors for table rotation, for head rotation, and for swinging the top ring head 110 (motor not shown); one or more sensor detection values (for example, motor current values), or a torque value calculated from the sensor detection values, may be used.

The polishing apparatus 10 includes an AI unit 4 connected to the control unit 500 via wiring. Fig. 3 is a schematic configuration diagram of the AI unit of the first embodiment. As shown in fig. 3, the AI unit 4 is, for example, a computer, and includes: a storage 41, a memory 42, an input unit 43, an output unit 44, and a processor 45.

The storage 41 stores a trained machine learning model that has been trained using learning data whose input is a feature amount of a signal relating to the frictional force between the polishing member (here, the polishing pad 101) and the substrate during polishing, and whose output is data on the film thickness of the substrate after polishing, a statistic of the film thickness profile of the substrate after polishing, or a parameter relating to the yield of products included in the substrate after polishing. The storage 41 also stores a program that the processor 45 reads and executes.

Here, the signal relating to the frictional force between the polishing member and the substrate is, for example, a signal of the torque current value of the table rotation motor 102 during polishing (Table Current Monitor, also referred to as TCM). The signal relating to the frictional force between the polishing member and the substrate may also be a calculated torque value converted from a motor current value. Alternatively, it may be a signal of the drive current value of the top ring rotation motor 114 for rotating the polishing head 1, or a signal of the drive current value of a motor (not shown) for rotating the top ring head 110 (i.e., the top ring head shaft 117).

In addition, the polishing apparatus 10 may further include a load cell for measuring a frictional force between the polishing member and the substrate, and in this case, the signal relating to the frictional force between the polishing member and the substrate may be a signal of the load cell. The polishing apparatus 10 may further include a strain sensor for measuring strain of the substrate, and in this case, the signal relating to the frictional force between the polishing member and the substrate may be a signal of the strain sensor.

The memory 42 is a medium that temporarily stores information.

The input unit 43 receives information from the control unit 500 and outputs the information to the processor 45.

The output unit 44 receives information from the processor 45 and outputs the information to the control unit 500.

The processor 45 reads and executes the program from the storage 41, and functions as the generation unit 451, the estimation unit 452, and the determination unit 453.

The generation unit 451 generates a feature amount from a signal relating to the frictional force between the polishing member and the substrate during polishing. Here, "during polishing" refers to, for example, the period during which the substrate is polished by being pressed against the polishing member while the polishing table 100 and the polishing head 1 on which the substrate is mounted are rotated. This process is described in detail later.

The estimation unit 452 inputs the feature amount generated by the generation unit 451 to a machine learning model in which learning is completed, and thereby outputs, as an estimation value, either data on the film thickness of the substrate after polishing or a parameter related to the yield of the product included in the substrate after polishing. This process is described in detail later. Here, the data on the thickness of the polished substrate is, for example, any one of the thickness of the polished substrate, a statistical value of a thickness profile of the polished substrate (for example, an average value, a maximum value, a minimum value, a fluctuation range, a standard deviation, and the like of a thickness distribution), a thickness profile of the polished substrate, and the like. Here, the film thickness profile is a group of film thickness data (combination of XY coordinates and film thickness) obtained by measuring a plurality of points by changing the position on the wafer.
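
As a minimal sketch of the film thickness profile and its statistics described above, a profile can be held as a list of (x, y, thickness) tuples. The function name `profile_statistics`, the units, and the sample values are assumptions made for illustration, and the population standard deviation is used only because the text does not say which form is meant.

```python
from statistics import mean, pstdev


def profile_statistics(profile):
    """Summarize a film thickness profile given as (x, y, thickness) tuples."""
    thicknesses = [t for _, _, t in profile]
    return {
        "mean": mean(thicknesses),
        "max": max(thicknesses),
        "min": min(thicknesses),
        "range": max(thicknesses) - min(thicknesses),  # fluctuation range
        "std": pstdev(thicknesses),  # population standard deviation (assumption)
    }


# Hypothetical profile: XY coordinates in mm, thickness in nm (made-up values).
example_profile = [
    (0.0, 0.0, 100.0),
    (50.0, 0.0, 104.0),
    (-50.0, 0.0, 98.0),
    (0.0, 50.0, 102.0),
]
stats = profile_statistics(example_profile)
```

Any one of these statistics, or the profile itself, could then serve as the output side of the learning data.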

A plurality of chips are present in a wafer, and a failure determination is performed for each chip, whereby a parameter relating to the yield of the chip in the wafer can be calculated. The yield of the product contained in the substrate after the polishing is, for example, the yield of the chip in the wafer.

Fig. 4 is an explanatory diagram of the correspondence between the polishing state of the wafer and the TCM waveform. The vertical axis of the graph shown in fig. 4 represents the torque current value (TCM) of the table rotation motor 102 during polishing, the horizontal axis represents time [ms], and a waveform C1 representing the change of TCM over time is shown. Since the frictional force against the polishing pad 101 varies with the ratio of the film types exposed on the wafer surface, the TCM value varies accordingly.

As shown in fig. 4, the wafer W has: a layer 51 to be polished, which is placed so as to face the polishing pad 101; and a lower layer 52 provided on the layer 51 to be polished. The layer 51 to be polished is removed by the frictional force during polishing. At point P1 on the waveform C1, little of the layer 51 to be polished has been removed. As time passes, part of the lower layer 52 is exposed at point P2 on the waveform C1, and the lower layer 52 is exposed over the entire surface at point P3 on the waveform C1. After the lower layer 52 is exposed over the entire surface, the table rotation motor 102 is stopped, and polishing is completed.

The portion indicated by arrow a15 in fig. 4 is over-polished, so that arrows a12 and a13 are correspondingly shorter than arrows a11 and a14.

The inventors of the present application have found that, when a film is polished unevenly, the timing at which the lower layer film is exposed varies within the wafer plane, and therefore the TCM signal just before the end of polishing (in the example of fig. 5, a downward-arcing waveform) correlates with the residual film thickness or the residual film profile. Therefore, in the present embodiment, the feature amount is generated from the TCM signal in a predetermined period before the end of polishing.

The following describes, with reference to fig. 5, how a feature amount is calculated from a signal obtained by extracting part of the TCM signal. Fig. 5 is a schematic diagram for explaining the extracted TCM waveform. The vertical axis of fig. 5 is TCM and the horizontal axis is time. Graph G1 shows the waveform of the entire TCM, and graph G2 shows the region R1 cut out of graph G1. In this way, the generation unit 451 according to the present embodiment extracts, for example, data in a predetermined time range from the TCM and calculates a feature amount from the extracted data. The feature amount is, for example, the values themselves over a calculation period of the extracted data (for example, the entire period, a partial period T1, or a partial period T2 after T1), a differential value of a moving average, or a statistic (for example, a maximum value, a minimum value, a standard deviation, a variance, an average value, a median value, a kurtosis, or a skewness). The kurtosis here is a number indicating the sharpness of the frequency distribution, and is calculated by a conventional calculation method. The skewness is the degree to which the data are not symmetrically distributed around the average value, and is calculated by a conventional calculation method.
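
The feature calculation described above can be sketched as follows, assuming the extracted TCM data are a plain list of samples. The function names, the dictionary keys, the window length, the boundary of period T1, and the sample values are all hypothetical. The skewness (the measure of asymmetry about the average mentioned above) is computed in its population form, which is one conventional choice among several.

```python
from statistics import mean, pstdev


def moving_average(values, window):
    """Simple moving average over `window` consecutive samples."""
    return [mean(values[i:i + window]) for i in range(len(values) - window + 1)]


def differential(values):
    """Difference between consecutive samples (a discrete derivative)."""
    return [b - a for a, b in zip(values, values[1:])]


def skewness(values):
    """Population skewness: mean cubed deviation in units of the standard deviation."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        return 0.0
    return mean([((v - m) / s) ** 3 for v in values])


def extract_features(tcm, window):
    """Statistics of the raw samples and of the differentiated moving average."""
    diff = differential(moving_average(tcm, window))
    return {
        "sum": sum(tcm),
        "max": max(tcm),
        "mean": mean(tcm),
        "len": len(tcm),
        "min_diff": min(diff),
        "std_diff": pstdev(diff),
        "range_diff": max(diff) - min(diff),
        "skew_diff": skewness(diff),
    }


# Made-up TCM samples from the extracted window; T1 is taken as the first six samples.
extracted = [10.0, 10.0, 9.8, 9.0, 7.5, 5.0, 3.0, 2.2, 2.0, 2.0]
features_all = extract_features(extracted, window=3)
features_t1 = extract_features(extracted[:6], window=3)
```

Computing the same statistics over the whole extracted window and over a sub-period gives two families of feature amounts from one signal.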

An example of the feature amounts will be described with reference to fig. 6. Fig. 6 is a graph showing the correlation coefficients between the maximum value of the residual film and each parameter. In fig. 6, the vertical axis lists the feature amounts and the horizontal axis represents the correlation coefficient. For each feature amount listed on the vertical axis, the correlation coefficient with the maximum value of the residual film is 0.5 or more, which shows that these feature amounts are related to the maximum value of the residual film.

In fig. 6, the feature amount T1_min-d_r25 is the minimum value of the differential of the 25-data moving average of the TCM in the predetermined period T1 within the period of graph G2 in fig. 5; All_min-d_r25 is the minimum value of the differential of the 25-data moving average of the TCM over the entire period of graph G2 in fig. 5; T1_min-d_r10 is the minimum value of the differential of the 10-data moving average of the TCM in the predetermined period T1; and All_min-d_r10 is the minimum value of the differential of the 10-data moving average of the TCM over the entire period. T2_sum is the total of the TCM values in the period T2 following the period T1 within the period of graph G2 in fig. 5; T1_std-d_r10 is the standard deviation of the differential of the 10-data moving average of the TCM in the predetermined period T1; All_skew-d-r10 is the skewness of the differential of the 10-data moving average of the TCM over the entire period; All_skew-d-r25 is the skewness of the differential of the 25-data moving average of the TCM over the entire period; and T1_len is the number of data points.

The parameters to be used may be chosen by fixing their number and taking them in descending order of correlation coefficient (for example, the top 10 parameters), or by setting an applicable condition.

The applicable condition may be, for example, that parameters whose correlation coefficient is equal to or greater than the average of the correlation coefficients are used, or that parameters whose correlation coefficient is equal to or greater than the average of the correlation coefficients plus their standard deviation σ are used.
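
The two selection rules above (a fixed number of top-ranked parameters, or a threshold at the average or at the average plus σ) can be sketched as follows. The function names, the feature names, and the correlation values are hypothetical, and the population standard deviation is an assumption because the text does not specify the form of σ.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def select_features(correlations, top_k=None, sigma_multiple=0.0):
    """Pick feature names by rank (top_k) or by a mean + sigma_multiple * std threshold."""
    ranked = sorted(correlations.items(), key=lambda kv: kv[1], reverse=True)
    if top_k is not None:
        return [name for name, _ in ranked[:top_k]]
    values = list(correlations.values())
    threshold = mean(values) + sigma_multiple * pstdev(values)
    return [name for name, c in ranked if c >= threshold]


# Made-up correlation coefficients for four hypothetical feature amounts.
correlations = {"feat_a": 0.9, "feat_b": 0.7, "feat_c": 0.5, "feat_d": 0.1}
top_two = select_features(correlations, top_k=2)
above_mean = select_features(correlations)
above_mean_plus_sigma = select_features(correlations, sigma_multiple=1.0)
```

Raising the threshold from the average to the average plus σ keeps fewer, more strongly correlated parameters.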

Furthermore, All_range-d_r25 is the range of the differential of the 25-data moving average of the TCM over the entire period of graph G2 in fig. 5; T1_var-d_r25 is the variance of the differential of the 25-data moving average of the TCM in the predetermined period T1 within the period of graph G2 in fig. 5; and All_range-d_r10 is the range of the differential of the 10-data moving average of the TCM over the entire period of graph G2 in fig. 5. T1_sum is the total of the TCM values in the predetermined period T1 within the period of graph G2 in fig. 5.

T1_mean-d_r10 is the average of the differential of the 10-data moving average of the TCM in the predetermined period T1 within the period of graph G2 in fig. 5, and T1_max is the maximum value of the TCM in the predetermined period T1. All_max is the maximum value of the TCM over the entire period of graph G2 in fig. 5. All_std-d_r10 is the standard deviation of the differential of the 10-data moving average of the TCM over the entire period. T1_mean is the average value of the TCM in the predetermined period T1. All_std-d_r25 is the standard deviation of the differential of the 25-data moving average of the TCM over the entire period. All_len is the number of data points.

T1_range-d_r10 is the range of the differential of the 10-data moving average of the TCM in the predetermined period T1 within the period of graph G2 in fig. 5; T1_mean-d_r25 is the average of the differential of the 25-data moving average in the predetermined period T1; T1_range-d_r25 is the range of the differential of the 25-data moving average in the predetermined period T1; All_var-d_r10 is the variance of the differential of the 10-data moving average over the entire period of graph G2 in fig. 5; T2_mean-d_r5 is the average of the differential of the 5-data moving average in the period T2 following the period T1; and All_var-d_r25 is the variance of the differential of the 25-data moving average over the entire period. All_mean is the average of the TCM over the entire period of graph G2 in fig. 5; T2_mean is the average of the TCM in the period T2 following the period T1; All_skew is the skewness of the TCM over the entire period; and T2_min is the minimum value of the TCM in the period T2 following the period T1.

The AI (artificial intelligence) model used by the estimation unit 452 of the AI unit 4 may be, for example, a Light Gradient Boosting Machine (LightGBM) shown in non-patent document 1. LightGBM is a decision tree based machine learning model.

Fig. 7 is a schematic diagram illustrating an example of the LightGBM. The example of fig. 7 has a first decision tree M1, a second decision tree M2, and a third decision tree M3. First, the first decision tree M1 is trained and its estimation results are evaluated. The second decision tree M2 is then trained using, as training data, the errors between the estimation results of the first decision tree M1 and the actual values. Similarly, the third decision tree M3 is trained using, as training data, the errors between the estimation results of the second decision tree M2 and the actual values. After training is completed, when the feature amounts are input to the first decision tree M1, the estimated value is output from the third decision tree M3. In gradient boosting training, LightGBM grows its decision trees by a method called leaf-wise tree growth, in which a tree is grown one leaf at a time.
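
The idea of training each successive tree on the errors left by the previous ones can be sketched with a deliberately simplified stand-in. This is not LightGBM: it uses one-split trees (stumps) on a single feature, has no leaf-wise growth and no learning rate, and all names and data are hypothetical. In this sketch the final estimate is formed by adding the outputs of the three trees, as is usual for gradient boosting with a squared-error loss.

```python
def fit_stump(xs, ys):
    """Fit a one-split regression tree on a single feature by least squares."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # all feature values identical: fall back to a constant
        constant = sum(ys) / len(ys)
        return lambda x: constant
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_boosted(xs, ys, n_trees=3):
    """Train trees in sequence, each on the residual errors of the ones before."""
    trees = []
    residuals = list(ys)
    for _ in range(n_trees):
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        residuals = [r - tree(x) for x, r in zip(xs, residuals)]
    return trees


def predict(trees, x):
    """Sum the outputs of all trees to obtain the estimate."""
    return sum(tree(x) for tree in trees)


def training_sse(trees, xs, ys):
    """Sum of squared errors of the ensemble on the training samples."""
    return sum((y - predict(trees, x)) ** 2 for x, y in zip(xs, ys))


# Made-up training data: one feature amount and a measured film thickness value.
features = [1.0, 2.0, 3.0, 4.0]
measured = [1.0, 1.0, 3.0, 5.0]
one_tree = fit_boosted(features, measured, n_trees=1)
three_trees = fit_boosted(features, measured, n_trees=3)
```

Comparing `training_sse(one_tree, ...)` with `training_sse(three_trees, ...)` shows the training error shrinking as trees M2 and M3 correct the errors of their predecessors.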

Fig. 8 is a schematic diagram illustrating an example of the learning step and the estimation step. As shown in fig. 8, in the learning step, the AI unit 4 trains the machine learning model using learning data having the feature amount as an input and the film thickness as an output. In the estimation step, when a feature amount is input to the trained machine learning model, the model outputs, for example, a film thickness estimation value. In this way, the AI unit 4 outputs a film thickness estimation value for the input feature amount using the trained machine learning model. The AI unit 4 may, for example, compare the film thickness estimation value with a set threshold value to determine whether the wafer is normal or close to defective, and may perform control so that the wafer is transferred to the film thickness measuring instrument 6 when the wafer is determined to be close to defective. Thus, when the wafer is determined to be close to defective, its film thickness is actually measured.

In the present embodiment, data obtained from past polishing are divided into learning data and test data, the AI is trained using only the learning data, and the trained AI then produces estimates for all of the data, which are compared with the actual measured values. The following describes the comparison results for the maximum film thickness, the average film thickness, and the film thickness range.

Fig. 9 is a graph comparing the measured value of the maximum film thickness with the AI estimated value in the first embodiment. As shown in fig. 9, the ordinate of the graph G11 is the AI estimated value of the maximum film thickness normalized by the allowable maximum film thickness, and the abscissa is the measured value of the maximum film thickness normalized by the allowable maximum film thickness. The AI estimated values are distributed around the correct answer line, indicating that the estimation works well. The ordinate of the graph G12 represents the number of data points, and the abscissa represents the estimation error (i.e., the measured value of the maximum film thickness minus the AI estimated value of the maximum film thickness). Train denotes the learning data, and Test denotes the test data. The estimation errors are all within a predetermined range.
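
The evaluation procedure (split past data, train only on the learning data, then compare estimates with measurements) can be sketched as below. A one-feature least-squares line stands in for the actual model purely for brevity; the split fraction, the seed, the function names, and the synthetic data are all assumptions. The estimation error follows the definition above, measured value minus estimated value.

```python
import random


def split(samples, test_fraction=0.25, seed=0):
    """Shuffle reproducibly and split into (learning data, test data)."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(samples):
    """Least-squares line through (feature, measured) pairs; a stand-in model."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in samples) / sxx
    return lambda x: mean_y + slope * (x - mean_x)


def estimation_errors(model, samples):
    """Estimation error per sample: measured value minus estimated value."""
    return [measured - model(feature) for feature, measured in samples]


# Synthetic (feature, measured) pairs that lie exactly on a line, for illustration only.
past_data = [(float(x), 2.0 * x + 1.0) for x in range(8)]
train, test = split(past_data)
model = fit_line(train)             # trained on the learning data only
train_errors = estimation_errors(model, train)
test_errors = estimation_errors(model, test)
```

A histogram of `train_errors` and `test_errors` corresponds to the kind of error distribution plotted per data set.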

For example, the determination unit 453 may determine that the film thickness needs to be measured when the AI estimated value of the maximum film thickness, normalized by the allowable maximum film thickness, exceeds a first threshold value. When the first threshold value is exceeded, the wafer may contain an under-polished portion whose film thickness exceeds the allowable limit, so control can be performed so that the film thickness is measured. Using the AI estimated value as the judgment value in this way makes it possible to set a film thickness measurement condition that does not overlook defective products. In this example, it was found that using the AI estimated value makes it sufficient to measure only about 25% of the substrates. Specifically, the determination unit 453 may control one or more robots (for example, the conveyor 7, the transfer robot 22, and the transfer robot 53; see patent document 1) so as to move the polished wafer to the film thickness measuring instrument 6.
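
A minimal sketch of this determination, assuming the estimate and the allowable limit are given in the same unit: the function names, the threshold of 0.9, and the estimates are hypothetical values chosen for illustration and are not taken from the embodiment.

```python
def needs_measurement(estimated_max, allowable_max, first_threshold):
    """True when the normalized AI estimate exceeds the first threshold."""
    normalized = estimated_max / allowable_max
    return normalized > first_threshold


def measurement_rate(estimates, allowable_max, first_threshold):
    """Fraction of wafers that would be sent to the measuring instrument."""
    flagged = [e for e in estimates
               if needs_measurement(e, allowable_max, first_threshold)]
    return len(flagged) / len(estimates)


# Made-up estimates of the maximum film thickness for five wafers.
estimates = [80.0, 95.0, 70.0, 60.0, 99.0]
rate = measurement_rate(estimates, allowable_max=100.0, first_threshold=0.9)
```

Setting the first threshold below 1.0 leaves a margin for estimation error, at the cost of measuring more wafers.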

Fig. 10 is a graph comparing the measured value of the average film thickness with the AI estimated value in the first embodiment. As shown in fig. 10, the ordinate of the graph G21 represents the AI estimated value of the normalized average film thickness, and the abscissa represents the measured value of the normalized average film thickness. The AI estimated values are distributed around the correct answer line, indicating that the estimation works well. The ordinate of the graph G22 represents the number of data points, and the abscissa represents the estimation error (i.e., the measured value of the average film thickness minus the AI estimated value of the average film thickness). Train denotes the learning data, and Test denotes the test data.

Fig. 11 is a graph comparing the measured value of the film thickness range with the AI estimated value in the first embodiment. As shown in fig. 11, the ordinate of the graph G31 represents the AI estimated value of the normalized film thickness range, and the abscissa represents the measured value of the normalized film thickness range. The AI estimated values are distributed around the correct answer line, indicating that the estimation works well. The ordinate of the graph G32 represents the number of data points, and the abscissa represents the estimation error (i.e., the measured value of the film thickness range minus the AI estimated value of the film thickness range). Train denotes the learning data, and Test denotes the test data.

Next, an example of processing for stopping the processing of subsequent substrates when the estimated value output from the estimation unit 452 satisfies a predetermined polishing deterioration condition will be described with reference to fig. 12. Fig. 12 is a flowchart showing an example of a process for stopping the processing of subsequent substrates when the polishing deterioration condition is satisfied.

Here, the determination unit 453 determines whether or not the estimated value output by the estimation unit 452 satisfies a predetermined polishing deterioration condition. For example, when the polishing deterioration condition is that the estimated value is outside a set range, the determination unit 453 determines whether the estimated value output from the estimation unit 452 is outside the set range. In the examples of fig. 12 to 14, the polishing deterioration condition is that "the estimated value of the standard deviation of the film thickness profile is equal to or greater than the set threshold", and the determination unit 453 determines whether or not the estimated value of the standard deviation of the film thickness profile output by the estimation unit 452 is equal to or greater than the set threshold.

(Step S110) First, the processor 45 acquires the TCM signal during polishing of the wafer.

(Step S120) Next, the generation unit 451 calculates a feature amount from the acquired TCM signal.

(Step S130) Next, the estimation unit 452 inputs the feature amount to the trained machine learning model stored in the storage 41 and outputs, for example, an estimated value of the standard deviation of the film thickness profile. Here, an example of the trained machine learning model is a model trained on learning data that has a feature amount of the TCM signal as an input and the standard deviation of the film thickness profile as an output.

(Step S140) Next, the determination unit 453 determines whether or not the estimated value of the standard deviation of the film thickness profile is equal to or greater than a set threshold value. If the estimated value is not equal to or greater than the set threshold (that is, if it is smaller than the set threshold), the process returns to step S110, and the subsequent processes are repeated.

(Step S150) When it is determined in step S140 that the estimated value of the standard deviation of the film thickness profile is equal to or greater than the set threshold, the determination unit 453 instructs the control unit 500 to stop the processing of subsequent wafers. The control unit 500 thereby stops the processing of subsequent wafers.

In this way, the processor 45 may stop the processing of subsequent substrates when the estimated value output from the estimation unit 452 satisfies the predetermined polishing deterioration condition. Because the processing of subsequent substrates is stopped when the polishing state deteriorates, maintenance such as replacement of the polishing member can be performed, and the polishing state can be prevented from deteriorating further.
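
The loop of steps S110 to S150 can be sketched as follows. The feature generation and the trained model are passed in as callables, and the stand-ins used here (signal range as the feature, a fixed scale factor as the model) are hypothetical placeholders, not the embodiment's actual processing.

```python
def first_deteriorated_wafer(tcm_signals, make_features, estimate_std, threshold):
    """Return the index of the first wafer meeting the deterioration condition, else None."""
    for index, tcm in enumerate(tcm_signals):       # S110: acquire the TCM signal
        features = make_features(tcm)                # S120: calculate the feature amount
        estimated_std = estimate_std(features)       # S130: estimate the profile std
        if estimated_std >= threshold:               # S140: compare with the threshold
            return index                             # S150: stop subsequent processing
    return None


# Made-up TCM signals for four consecutive wafers.
signals = [[1.0, 1.0], [1.0, 2.0], [1.0, 5.0], [1.0, 1.0]]
stop_index = first_deteriorated_wafer(
    signals,
    make_features=lambda tcm: max(tcm) - min(tcm),   # placeholder feature
    estimate_std=lambda feature: 0.5 * feature,      # placeholder model
    threshold=1.5,
)
```

Here the third wafer (index 2) triggers the stop, so the fourth wafer would not be processed until the table condition has been checked.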

Next, a description will be given of a process of measuring the film thickness by a film thickness measuring instrument inside or outside the apparatus when the estimated value satisfies a predetermined polishing deterioration condition. Fig. 13 is a flowchart showing an example of a process of measuring a film thickness by a film thickness measuring device in the apparatus when a polishing deterioration condition is satisfied.

(Step S210) First, the processor 45 acquires the TCM signal during polishing of the wafer.

(Step S220) Next, the generation unit 451 calculates a feature amount from the acquired TCM signal.

(Step S230) Next, the estimation unit 452 inputs the feature amount to the trained machine learning model stored in the storage 41 and outputs, for example, an estimated value of the standard deviation of the film thickness profile. Here, an example of the trained machine learning model is a model trained on learning data that has a feature amount of the TCM signal as an input and the standard deviation of the film thickness profile as an output.

(Step S240) Next, the determination unit 453 determines whether or not the estimated value of the standard deviation of the film thickness profile is equal to or greater than a set threshold value.

(Step S250) When it is determined in step S240 that the estimated value of the standard deviation of the film thickness profile is equal to or greater than the set threshold, the processor 45 controls one or more robots (for example, the conveyor 7, the transfer robot 22, and the transfer robot 53; see patent document 1) so that the film thickness of the polished wafer is measured by the film thickness measuring instrument 6.

(Step S260) When it is determined in step S240 that the estimated value of the standard deviation of the film thickness profile is smaller than the set threshold, the processor 45 controls the one or more robots (for example, the conveyor 7, the transfer robot 22, and the transfer robot 53; see patent document 1) so that the wafer is returned to the FOUP without the film thickness measuring instrument 6 measuring the film thickness profile.

In this way, the processor 45 causes the film thickness measuring instrument 6 to measure the film thickness of the target substrate after polishing when the estimated value output from the estimating unit 452 satisfies a predetermined polishing deterioration condition, and causes the film thickness measuring instrument 6 not to measure it when the estimated value does not satisfy the condition. Because the film thickness of the substrate is measured when the polishing state has deteriorated, it is possible to determine whether or not the polishing was effective.
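The branch between steps S250 and S260 amounts to a single comparison. A minimal sketch, assuming the deterioration condition is "estimate at or above a threshold" as in the example above; `Route` and `route_wafer` are hypothetical names, and the actual robot control is outside the scope of the sketch.

```python
from enum import Enum


class Route(Enum):
    MEASURE = "measure with film thickness measuring instrument 6"
    RETURN_TO_FOUP = "return to FOUP without measurement"


def route_wafer(estimated_std: float, threshold: float) -> Route:
    """Decide where a polished wafer goes next (steps S240 to S260)."""
    if estimated_std >= threshold:   # S240: deterioration condition satisfied
        return Route.MEASURE         # S250: send the wafer to the instrument
    return Route.RETURN_TO_FOUP      # S260: skip the measurement
```

Only wafers flagged by the estimate take the measurement detour, which is how the number of measurements is reduced during normal polishing.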

Next, a process of issuing a warning prompting maintenance when the estimated value satisfies a predetermined polishing deterioration condition will be described. Fig. 14 is a flowchart showing an example of a process of issuing a warning prompting maintenance when the polishing deterioration condition is satisfied.

(Step S310) First, the processor 45 acquires the TCM signal during polishing of the wafer.

Next, the generating unit 451 calculates a feature value from the acquired TCM signal (step S320).

(Step S330) Next, the estimation unit 452 inputs the feature quantity to the trained machine learning model stored in the memory 41 and outputs, for example, an estimated value of the standard deviation of the film thickness profile. Here, an example of the trained machine learning model is a model trained on learning data that has the feature quantity of the TCM signal as input and the standard deviation of the film thickness profile as output.

Next, the determination section 453 determines whether or not the estimated value of the standard deviation of the film thickness profile is equal to or greater than a set threshold value (step S340). If the estimated value of the standard deviation of the film thickness profile is not equal to or greater than the set threshold (that is, if the standard deviation of the profile is smaller than the set threshold), the process returns to step S310, and the subsequent processes are repeated.

(Step S350) When it is determined in step S340 that the estimated value of the standard deviation of the film thickness profile is equal to or greater than the set threshold, the processor 45 performs control to output a warning prompting maintenance. The warning may be a sound, may be shown on a display device, or may be a vibration. When signal lights of a plurality of colors (for example, PATLITE (registered trademark) lights in the three colors red, yellow, and green) are provided, the light of a specific color (for example, yellow) may be turned on (or off). An e-mail may also be sent automatically to the user's address. Two or more of these may be combined.

In this way, the processor 45 performs control to issue a warning prompting maintenance when the estimated value output from the estimating unit 452 satisfies a predetermined polishing deterioration condition. Because maintenance such as replacement of the polishing member can then be performed when the polishing state has deteriorated, further deterioration of the polishing state can be prevented.
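Since the warning may go out over any combination of channels, one way to picture step S350 is a table of channel callbacks that are all fired when the condition holds. This is a sketch under assumptions: `issue_warning` and the channel names are hypothetical, and the callbacks below only record a message instead of driving a real display, signal light, or mail server.

```python
from typing import Callable, Dict, List


def issue_warning(
    estimated_std: float,
    threshold: float,
    channels: Dict[str, Callable[[str], None]],
) -> List[str]:
    """Fire every configured channel if the deterioration condition holds.

    Returns the names of the channels that were triggered.
    """
    if estimated_std < threshold:          # S340: condition not satisfied
        return []
    message = (
        "Maintenance recommended: estimated film thickness std "
        f"{estimated_std:.2f} >= threshold {threshold:.2f}"
    )
    fired = []
    for name, send in channels.items():    # S350: any combination of channels
        send(message)
        fired.append(name)
    return fired


# Stand-in channels that only record what would have been sent.
log: List[str] = []
channels = {
    "display": lambda msg: log.append("display: " + msg),
    "signal_light_yellow": lambda msg: log.append("yellow light on"),
    "mail": lambda msg: log.append("mail: " + msg),
}
fired = issue_warning(6.0, 5.0, channels)
```

Keeping the channels in a dictionary makes it straightforward to enable one, several, or all of the notification methods listed above.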

Next, a process of outputting a maintenance timing based on the trend of the estimated values will be described with reference to Fig. 15. Fig. 15 is a flowchart showing an example of a process of outputting a maintenance timing.

(step S410) first, the processor 45 acquires TCM signals during wafer polishing.

Next, the generating unit 451 calculates a feature value from the acquired TCM signal (step S420).

(Step S430) Next, the estimation unit 452 inputs the feature quantity to the trained machine learning model stored in the memory 41 and outputs, for example, an estimated value of the standard deviation of the film thickness profile. Here, an example of the trained machine learning model is a model trained on learning data that has the feature quantity of the TCM signal as input and the standard deviation of the film thickness profile as output.

(step S440) next, the estimation unit 452 saves the estimation value in the memory 41.

Next, (step S450) the determination section 453 determines whether or not a predetermined number of estimated values have been stored. If the predetermined number of estimated values have not been stored, the process returns to step S410, and the subsequent processes are repeated.

(Step S460) When it is determined in step S450 that the predetermined number of estimated values have been stored, the processor 45 refers to the estimated values stored in the storage 41, which were output for a plurality of substrates polished at different times, and outputs a maintenance timing based on the trend of those estimated values. The maintenance timing may be output in a form such as "maintenance recommended in ○ hours", where ○ stands for a number of hours. The processor 45 may also notify the user of the polishing apparatus of the maintenance timing, so that the maintenance timing is notified automatically. As the notification method, the maintenance timing may be displayed on a web screen or in an application, or an e-mail may be sent to the user's address.

Alternatively, the processor 45 may notify the user of the polishing apparatus when the maintenance timing arrives. In this case as well, the notification is automatic. As the notification method, the fact that maintenance is due may be displayed on a web screen or in an application, or an e-mail may be sent to the user's address.

Specifically, for example, the processor 45 may store the estimated value of the standard deviation of the film thickness profile at a set time interval, calculate the change in the estimated value per unit time by dividing the difference between estimated values by the set time interval, and output, as the maintenance timing, the timing at which the estimated value is expected to become equal to or greater than a set threshold. The timing at which the polishing state deteriorates can thus be predicted, and maintenance such as replacement of the polishing member can be performed at that timing, so that further deterioration of the polishing state can be prevented.
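One possible reading of this calculation is a linear extrapolation from the two most recent estimates. The sketch below assumes that reading; `predict_maintenance_time` is a hypothetical name, and the choice of the last two samples (rather than, say, a fit over all stored values) is an assumption made for brevity.

```python
from typing import Optional, Sequence


def predict_maintenance_time(
    estimates: Sequence[float],
    interval_h: float,
    threshold: float,
) -> Optional[float]:
    """Hours from the latest sample until the estimate reaches the threshold.

    estimates  -- estimated std values stored at a fixed interval, oldest first
    interval_h -- the set time interval between stored values, in hours
    Returns 0.0 if the threshold is already reached, and None if the estimate
    is not increasing (no crossing can be extrapolated).
    """
    if len(estimates) < 2:
        raise ValueError("need at least two stored estimates")
    latest = estimates[-1]
    if latest >= threshold:
        return 0.0
    # Change per unit time: difference of estimates divided by the interval.
    rate = (estimates[-1] - estimates[-2]) / interval_h
    if rate <= 0:
        return None
    return (threshold - latest) / rate


hours = predict_maintenance_time([1.0, 1.5, 2.0], interval_h=10.0, threshold=3.0)
```

Here the estimate grows by 0.05 per hour and is 1.0 below the threshold, so the extrapolated maintenance timing is about 20 hours after the latest sample.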

The processor 45 may adjust the polishing conditions of the subsequent substrate so as to obtain desired data on the film thickness of the polished substrate or desired parameters on the yield of the product contained in the polished substrate, based on the estimated value output from the estimating unit 452. Thus, the polishing condition of the subsequent substrate can be changed to improve the polishing state, so that the good polishing state can be maintained for a longer time.

The processor 45 may also retrain the machine learning model using feature quantities obtained while the polishing apparatus is in operation. This improves the estimation accuracy.
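Retraining with data gathered during operation can be illustrated with a deliberately small model. The sketch assumes a single feature and an ordinary least-squares line as a stand-in for the machine learning model, which the text does not specify at this point; `fit_line` and `RetrainableModel` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


class RetrainableModel:
    """Stand-in model: feature quantity in, estimated thickness std out."""

    def __init__(self, features: Sequence[float], targets: Sequence[float]):
        self.features: List[float] = list(features)
        self.targets: List[float] = list(targets)
        self.slope, self.intercept = fit_line(self.features, self.targets)

    def predict(self, feature: float) -> float:
        return self.slope * feature + self.intercept

    def retrain(self, new_features: Sequence[float], new_targets: Sequence[float]) -> None:
        """Add operating data (feature, measured value) and refit the model."""
        self.features.extend(new_features)
        self.targets.extend(new_targets)
        self.slope, self.intercept = fit_line(self.features, self.targets)


model = RetrainableModel([0.0, 1.0], [0.0, 1.0])
model.retrain([2.0], [4.0])   # one newly measured wafer
```

The targets for retraining would have to come from actual measurements, for example wafers that were routed to the film thickness measuring instrument.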

As described above, the polishing apparatus 10 according to the first embodiment can refer to the storage 41 storing the machine learning model trained using learning data that has, as input, a feature quantity of a signal relating to the frictional force between the polishing member and the substrate during polishing or a feature quantity of the temperature of the polishing member or the substrate during polishing, and that has, as output, data relating to the film thickness of the substrate after polishing or a parameter relating to the yield of products contained in the substrate after polishing. The polishing apparatus 10 includes: a polishing table 100 provided with the polishing member and configured to be rotatable; a polishing head 1 which is rotatably provided facing the polishing table 100 and on whose surface facing the polishing table 100 a substrate can be mounted; and a control section 500 that performs control to rotate the polishing table and the polishing head on which the substrate is mounted, and to press the substrate against the polishing member to polish the substrate.

The polishing apparatus 10 includes a processor 45. The processor 45 generates a feature quantity from a signal relating to the frictional force between the polishing member and the substrate during polishing, or from the temperature of the polishing member or the target substrate during polishing, inputs the generated feature quantity to the trained machine learning model, and thereby outputs, as an estimated value, either data relating to the film thickness of the substrate after polishing or a parameter relating to the yield of products contained in the substrate after polishing.
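The signal-to-estimate path of the processor 45 can be sketched end to end. Everything concrete here is an assumption for illustration: the three summary statistics used as features, the linear stand-in model, and its weights are hypothetical and are not taken from the embodiment.

```python
import statistics
from typing import List, Sequence


def generate_features(signal: Sequence[float]) -> List[float]:
    """Hypothetical feature quantities: mean, population std, and range."""
    return [
        statistics.fmean(signal),
        statistics.pstdev(signal),
        max(signal) - min(signal),
    ]


class LinearModel:
    """Stand-in for the trained model: a weighted sum of the features."""

    def __init__(self, weights: Sequence[float], bias: float):
        self.weights = list(weights)
        self.bias = bias

    def predict(self, features: Sequence[float]) -> float:
        return self.bias + sum(w * x for w, x in zip(self.weights, features))


def estimate(signal: Sequence[float], model: LinearModel) -> float:
    """Generate features from the signal and return the model's estimate."""
    return model.predict(generate_features(signal))


model = LinearModel(weights=[1.0, 1.0, 1.0], bias=0.5)   # illustrative weights
value = estimate([1.0, 3.0], model)
```

For the two-sample signal above the features are 2.0, 1.0, and 2.0, so the stand-in model returns 5.5.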

With this configuration, the polishing apparatus obtains, during polishing, an estimated value of data relating to the film thickness of the substrate after polishing or of a parameter relating to the yield of products contained in the substrate after polishing. The state of the substrate after polishing can therefore be grasped without measuring the film thickness, and the number of film thickness measurements can be reduced. As a result, overlooking of defective products can be avoided, and throughput can be improved because the film thickness measurement can be omitted while polishing is normal. Further, by estimating the parameter relating to the yield, a defect or a predicted defect can be detected. In addition, by updating the polishing parameters in accordance with the parameter relating to the yield, the yield can be improved.

The AI unit 4 may be installed in a gateway in the factory, and the polishing apparatus may be connected to the gateway through a network line. The gateway is preferably located near the polishing apparatus. When high-speed processing is required (for example, when the sampling interval is less than 100 ms), edge computing may be performed by the AI unit 4 in the polishing apparatus or by the AI unit 4 mounted on the gateway. The AI unit 4 in the polishing apparatus may be mounted on a PC or a controller of the apparatus.

< second embodiment >

Next, the second embodiment will be described. In the first embodiment the polishing apparatus 10 includes the AI unit 4, whereas in the second embodiment the AI unit 4 is provided not in the polishing apparatus but elsewhere in the factory, for example in a factory management room or a clean room.

Fig. 16 is a schematic diagram showing the overall configuration of the polishing system according to the second embodiment. As shown in Fig. 16, a polishing system S2 according to the second embodiment includes: polishing apparatuses 10-1 to 10-N installed in a factory; and an AI unit 4 installed in the same factory, for example in a factory management room. The AI unit 4 and the polishing apparatuses 10-1 to 10-N can communicate via a local network NW1. The AI unit 4 is mounted on a computer (for example, a server or a fog computing node).

When the AI unit 4 is provided in the polishing apparatus or the gateway, the trained machine learning model is executed by edge computing, so that high-speed processing is possible. For example, processing can be performed in real time.

When the AI unit 4 is installed in a server or a fog computing node in the factory, the machine learning model may be updated by collecting data from a plurality of polishing apparatuses in the factory. In addition, data from a plurality of polishing apparatuses in the factory may be collected and analyzed, and the analysis result may be reflected in the polishing parameter settings.

< third embodiment >

Next, the third embodiment will be described. In the first embodiment the polishing apparatus 10 includes the AI unit 4, whereas in the third embodiment the AI unit 4 is provided not in the polishing apparatus but in the cloud.

Fig. 17 is a schematic diagram showing the overall configuration of the polishing system according to the third embodiment. As shown in Fig. 17, a polishing system S3 according to the third embodiment includes: polishing apparatuses 10-1 to 10-N provided in a plurality of factories; and an AI unit 4 provided in the cloud. The AI unit 4 can communicate with the polishing apparatuses 10-1 to 10-N via a global network NW2 and a local network NW1. The AI unit 4 is, for example, a computer (for example, a server).

By providing the AI unit 4 in the cloud, physically separated from the polishing apparatuses, the AI unit 4 can be shared among a plurality of factories, and the maintainability of the AI unit 4 is improved. Further, by retraining the machine learning model with the large amount of data obtained during polishing in a plurality of factories, the estimation accuracy can be improved more quickly.

In addition, the machine learning model may be updated by aggregating data (for example, a large amount of data) from a plurality of polishing apparatuses across a plurality of factories. Data (for example, a large amount of data) from a plurality of polishing apparatuses across a plurality of factories may also be analyzed collectively, and the analysis result may be reflected in the polishing parameter settings.

In addition, the AI unit 4 may be provided in an analysis center for performing centralized analysis instead of in the cloud.

The AI unit 4 may be installed in (1) the polishing apparatus, and/or (2) a gateway near the polishing apparatus, and/or (3) a computer (a PC, a server, a fog computing node, or the like) in the factory (for example, in a factory management room).

The AI unit 4 may also be installed in (1) the polishing apparatus, and/or (2) a gateway near the polishing apparatus, and/or (4) a computer in the cloud (or in an analysis center).

The AI unit 4 may also be installed in (1) the polishing apparatus, and/or (2) a gateway near the polishing apparatus, and/or (3) a computer in the factory (for example, in a factory management room), and/or (4) a computer in the cloud (or in an analysis center).

The AI unit 4 may also be distributed across (1) the polishing apparatus, and/or (2) a gateway near the polishing apparatus, and/or (3) a computer (a PC, a server, a fog computing node, or the like) in the factory (for example, in a factory management room), and/or (4) a computer in the cloud (or in an analysis center).
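The four installation locations, combined with the earlier remark that edge computing suits sampling intervals below 100 ms, suggest a simple selection rule. The sketch below is only one possible policy: `Site` and `choose_site` are hypothetical names, and preferring the most centralized site when speed is not critical is an assumption, not something the embodiments prescribe.

```python
from enum import Enum
from typing import Set


class Site(Enum):
    APPARATUS = 1   # (1) inside the polishing apparatus
    GATEWAY = 2     # (2) gateway near the polishing apparatus
    FACTORY = 3     # (3) computer in the factory
    CLOUD = 4       # (4) computer in the cloud or an analysis center


def choose_site(
    sampling_interval_ms: float,
    available: Set[Site],
    edge_limit_ms: float = 100.0,
) -> Site:
    """Pick where to run inference among the sites that host an AI unit."""
    if not available:
        raise ValueError("no AI unit is installed anywhere")
    edge_sites = [s for s in (Site.APPARATUS, Site.GATEWAY) if s in available]
    if sampling_interval_ms < edge_limit_ms and edge_sites:
        return edge_sites[0]                       # high-speed case: stay at the edge
    return max(available, key=lambda s: s.value)   # assumed: most central site
```

With a distributed AI unit, the same rule could route real-time estimation to the edge while retraining on pooled data runs centrally.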

In each of the embodiments, the input of the machine learning model is a feature quantity of a signal relating to the frictional force between the polishing member and the substrate during polishing, but the input is not limited thereto. The input may instead be a feature quantity of the temperature of the polishing member (here, the polishing pad 101) or of the substrate during polishing. This is because, when the frictional force between the polishing member and the substrate increases during polishing, the amount of heat generated at the polishing member or the substrate also increases and the temperature rises, so that the temperature of the polishing member or the substrate is positively correlated with the frictional force between them.
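The positive correlation argued here is what one would check with a Pearson coefficient before substituting temperature for the friction signal. The values below are synthetic and chosen only to illustrate the calculation; they are not measurements from the apparatus, and `pearson` is a hypothetical helper.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy)


# Synthetic, illustrative samples: friction signal vs. pad temperature.
friction = [1.0, 2.0, 3.0, 4.0]
temperature = [30.0, 32.5, 34.0, 37.0]
r = pearson(friction, temperature)
```

A coefficient close to 1 on real data would support using the temperature feature as an input in place of, or alongside, the friction feature.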

That is, the storage 41 may store a machine learning model trained using learning data that has, as input, a feature quantity of the temperature of the polishing member or the substrate during polishing, and that has, as output, data relating to the film thickness of the substrate after polishing (for example, a statistic of the film thickness profile of the substrate after polishing) or a parameter relating to the yield of products contained in the substrate after polishing.

In this case, the generating unit 451 may generate the feature quantity from a signal relating to the frictional force between the polishing member and the target substrate, or from the temperature of the polishing member or the target substrate, obtained while the target substrate is being polished by rotating the polishing table 100 and the polishing head 1 on which the target substrate is mounted and pressing the target substrate against the polishing member.

At least a part of the AI unit 4 described in the above embodiments may be configured by hardware or by software. In the case of a software configuration, a program that realizes at least a part of the functions of the AI unit 4 may be stored in a recording medium such as a flexible disk or a CD-ROM, and read and executed by a computer. The recording medium is not limited to a removable medium such as a magnetic disk or an optical disk, and may be a fixed recording medium such as a hard disk device or a memory.

Further, the program that realizes at least a part of the functions of the AI unit 4 may be distributed via a communication line such as the internet (including wireless communication). Further, the program may be distributed in an encrypted, modulated, or compressed state via a wired or wireless line such as the internet, or stored in a recording medium.

Further, the AI unit 4 may be realized by one or more information processing apparatuses. When a plurality of information processing apparatuses are used, one of them may serve as a computer that executes a predetermined program to realize the function of at least one unit of the AI unit 4.

In the method invention, all the steps may be realized by automatic control by a computer. Alternatively, each step may be performed by a computer while the progression between steps is controlled manually. Further, at least a part of all the steps may be performed manually.

As described above, the present technology is not limited to the above embodiments as they are, and the constituent elements may be modified and embodied at the implementation stage within a range not departing from the gist thereof. Further, various inventions can be formed by appropriately combining a plurality of the constituent elements disclosed in the above embodiments. For example, some constituent elements may be deleted from all the constituent elements shown in the embodiments. Further, constituent elements of different embodiments may be appropriately combined.

[Description of reference numerals]

1: polishing head

100: polishing table

100a: table shaft

101: polishing pad

101a: polishing surface

102: table rotating motor

110: top ring head

111: top ring shaft

112: rotating cylinder

113: timing pulley

114: top ring rotating motor

115: timing belt

116: timing pulley

117: top ring head shaft

124: vertical movement mechanism

126: bearing

128: bridge

129: support table

130: support post

132: ball screw

132a: screw shaft

132b: nut

138: servo motor

20: front loading section

21: FOUP

22: transfer robot

26: rotary joint

3: check ring

4: AI unit

41: storage

42: memory

43: input unit

44: output unit

45: processor

451: generating unit

452: estimating unit

453: determination section

5: cleaning section

500: control section

53: transfer robot

6: film thickness measuring instrument

7: conveyor

S1 to S3: polishing system
