Integrated circuit layout optimization method based on frequency domain electromagnetic response and capable of automatically starting and stopping process

Document No.: 971422 Publication date: 2020-11-03

Reading note: This technology, "Integrated circuit layout optimization method based on frequency domain electromagnetic response and capable of automatically starting and stopping process", was designed by 唐章宏, 邹军, 黄承清, 王芬 and 汲亚飞 on 2020-06-08. Main content: The application discloses a frequency domain electromagnetic response-based integrated circuit layout optimization method with process self-starting and self-stopping. A plurality of parallel coarse grains are first divided; the parallel coarse grains are then used to select initial frequency points and to calculate the optimal integrated circuit layout model individual, whose frequency response curve is compared with the target, and new frequency points are inserted when the requirements are met. This effectively avoids a pseudo-optimal target caused by too few selected typical optimization frequency points, and solves the problem that the typical optimization frequency points meet the predetermined target while the frequency band as a whole does not. In addition, an automatic start-stop technique is applied during the calculation. It avoids the hard disk read-write bottleneck caused by the memory peak exceeding the available physical memory during multi-process parallel calculation, allows more processes to run simultaneously without using virtual memory, reduces communication among processes, addresses the unequal complexity of calculation instances, and improves parallel calculation efficiency.

1. A process self-starting and stopping integrated circuit layout optimization method based on frequency domain electromagnetic response is characterized by comprising the following steps:

step 100, dividing all independent complete calculations of the same type in the integrated circuit frequency domain electromagnetic response calculation process into a plurality of parallel coarse grains, wherein each parallel coarse grain executes a corresponding independent calculation task;

step 200, selecting at least one initial frequency point reflecting electromagnetic response characteristics required by the whole set frequency range to form a frequency point ordered sequence;

step 300, defining a target function according to an integrated circuit layout optimization target through first parallel coarse grains, and performing iterative operation on the frequency point ordered sequence by using an optimization algorithm until the target function slowly converges or reaches the optimization target to obtain an optimal integrated circuit layout model individual;

step 400, calculating the electromagnetic response characteristics of the optimal integrated circuit layout model individual in the set frequency band through second parallel coarse grains to obtain the frequency response curve of the optimal integrated circuit layout model individual, and comparing the frequency response curve of the optimal integrated circuit layout model individual with the optimization target;

step 500, judging, through third parallel coarse grains, whether there are frequency points that do not meet the integrated circuit layout optimization target; finishing the optimization when no such frequency points exist; otherwise selecting k optimized frequency points in the set frequency band, determining kn new optimized frequency points from the k optimized frequency points, and judging whether the kn new optimized frequency points meet the condition for being added to the frequency point ordered sequence; when the condition is met, adding the kn new optimized frequency points in order into the frequency point ordered sequence and returning to step 300 with the new frequency point ordered sequence; otherwise, the optimization fails; wherein,

when the parallel coarse grain is used for calculation, the size of an allocated memory required by a process allocated with a calculation task in the current parallel coarse grain to complete the calculation task is counted, the size of the current available physical memory is detected, the calculation task is executed when the allocated memory is smaller than the available physical memory, otherwise, the operation of the process in the later preset time is suspended, and the size of the current available physical memory is detected again until the allocated memory is smaller than the available physical memory.

2. The method of claim 1, wherein the slow convergence of the objective function comprises: over a preset number of consecutive iterations, the descending speed of the objective function of the integrated circuit layout optimization is not higher than a preset speed.

3. The method of claim 1, wherein the selecting k optimized frequency points within the set frequency band and determining kn new optimized frequency points from the k optimized frequency points comprises:

selecting k optimized frequency points from a set frequency band according to the principle of minimum distance between the frequency points, then respectively calculating the deviation between the electromagnetic response curve of the k optimized frequency points and the integrated circuit layout optimization target, sequencing the calculated deviations in a descending order to obtain a deviation sequence, and selecting the first kn frequency points in the deviation sequence as new optimized frequency points.

4. The method according to claim 3, wherein the principle of minimum distance between frequency points is: after a new optimized frequency point is added into the frequency point ordered sequence, the distance between adjacent frequency points in the resulting new frequency point ordered sequence is greater than Δd, where Δd = (fmax − fmin)/C, fmax and fmin are respectively the highest frequency and the lowest frequency in the set frequency band, and C is a constant greater than the number of frequency points being optimized.

5. The method of any one of claims 1 to 4, wherein said determining whether the kn new optimized frequency points are eligible for adding the ordered sequence of frequency points comprises:

when the number kn of the selected new optimized frequency points is 0, judging whether a judgment rule of slow convergence of the target function is effective or not, if so, reducing the convergence speed of the target function optimized by the integrated circuit layout, wherein the kn new optimized frequency points accord with the condition of adding the frequency point ordered sequence, and if not, the kn new optimized frequency points do not accord with the condition of adding the frequency point ordered sequence;

and when the number kn of the selected new optimized frequencies is larger than 0, directly judging that the kn new optimized frequency points meet the condition of adding the frequency point ordered sequence.

6. The method of claim 5, wherein reducing the convergence speed of the objective function for integrated circuit layout optimization comprises: increasing the value of the iteration count M and reducing the value of the objective function descending speed of the integrated circuit layout optimization, wherein the value of M does not exceed the preset maximum iteration count Mmax, and the objective function descending speed is not less than 0.

7. The method of any of claims 1 to 6, wherein prior to allocating memory for the parallel coarse-grained process, the method further comprises: and the main process dynamically distributes each computing task in the parallel coarse grains and corresponding input parameters to all processes based on a dynamic distribution computing task strategy and a file marking strategy.

8. The method of claim 7, wherein the dynamic calculation task allocation strategy comprises: after each process completes its computing task, a new computing task is allocated to that process, and computing tasks whose computational load is higher than a threshold value are spread across different processes according to the order in which the processes are allocated computing tasks.

9. The method of claim 7 or 8, wherein the file marking policy comprises: if a certain computing task in the parallel coarse grains is allocated to a process, generating an identification file of the computing task, which is used for indicating that the computing task is already allocated, so that other processes apply for allocating other computing tasks due to the existence of corresponding identification files when applying for allocating the computing task.

10. The method of claim 9, wherein the steps of implementing the file marking comprise:

step A1, when applying for allocation of a calculation task, the process judges whether the calculation task has a corresponding flag file SFi; if the flag file SFi does not exist, step A2 is executed; if it exists, step A6 is executed;

step A2, judging whether the mark file is in a locked state, if not, executing step A3, otherwise, executing step A6;

step A3, locking the mark file;

step A4, generating the mark file;

step A5, unlocking the mark file, and completing the calculation of the calculation task;

and step A6, judging whether all calculation tasks in the parallel coarse grains are finished, if not, executing step A1 and applying for distributing the next calculation task, otherwise, finishing.

Technical Field

The application relates to the field of integrated circuit layout optimization, in particular to a frequency domain electromagnetic response-based integrated circuit layout optimization method for process self-starting and self-stopping.

Background

The trend in integrated circuits is toward ever smaller chips that contain ever more unit circuits. As semiconductor devices and the unit circuits built from them shrink, their own parasitic parameters have little influence; instead, serious parasitic effects are caused by the interconnection lines between unit circuits inside a chip, on the PCB, and between the chips of a microwave multi-chip module (MCM). Meanwhile, for both chips and modules, the package that protects the circuit and supports the whole circuit structure is unavoidable, and package structures such as feed lines or planes, ground lines or planes, chip lead-out wires or strips, and vias between multiple metal layers also significantly affect the transmission of high-speed signals. These factors require designers, while studying the interconnect and package structures of a high-speed integrated circuit system and its semiconductor unit circuits, to use an optimization design method that accounts for the electrical characteristics, over the entire operating band, of the whole system formed by the high-speed circuits and their interfaces.

However, in the actual optimization design process, when the individual objective function of each designed integrated circuit layout model is calculated, it is impossible to calculate the electromagnetic response of the integrated circuit layout model for all frequency points in the frequency band, and only a few typical optimized frequency points can be selected in the frequency band range, and the electromagnetic response characteristics of the integrated circuit individual at the optimized frequency points are calculated and compared with the preset objective to form the objective function.

The existing optimization method selects a fixed typical optimization frequency point for optimization according to a predetermined target condition, but after optimization, the selected typical optimization frequency point may only reach the predetermined target, but the whole frequency band cannot reach the predetermined target.

Meanwhile, in the electromagnetic response calculation process of the integrated circuit layout, massive large-scale numerical calculation of the same type is involved. In such large-scale numerical computation, different computation instances have different structures, so that the computation complexity of the different computation instances is unequal, and thus the current parallel computation efficiency needs to be improved.

In addition, conventional parallel computing mostly parallelizes a single computing instance, applying parallelism to the heavily looped parts of the computation. The parallel granularity is usually fine, so a large amount of data is exchanged between different processes. Because different processes progress at different rates, a large amount of waiting is inevitable whenever data must be shared and synchronized. Furthermore, a considerable part of a single instance's computation is sequential and data-dependent, so that part cannot be parallelized when a single computing instance is parallelized. All three phenomena reduce parallel efficiency.

In addition, in conventional multi-threaded parallel computing, each thread allocates its large memory directly when performing large-scale numerical calculation. When the allocated memory exceeds the available physical memory, the system automatically sets aside part of the hard disk as virtual memory, writes the memory occupied by inactive processes into it, and releases the corresponding physical memory. However, the read-write speed of a mechanical hard disk is about 80 MB/s, while physical memory is more than one hundred times faster; for example, DDR3 1333 MHz server memory reaches a data transfer rate of 10.6 GB/s. Therefore, if parallel computing is started with a large number of processes and no measures are taken, part of the hard disk is read and written as virtual memory during the calculation, and the program can run more than one hundred times slower.

Disclosure of Invention

(I) Object of the application

Based on this, in order to solve the problem that only a fixed typical optimization frequency point is selected in the optimization design process of the integrated circuit layout at the present stage, so that the obtained optimal integrated circuit layout can not be guaranteed to reach the preset target in the whole frequency band, and in order to improve the parallel efficiency and the program running speed and solve the problem that the complexity of the calculation example is not equal, the application discloses the following technical scheme.

(II) technical scheme

The application provides a frequency domain electromagnetic response-based integrated circuit layout optimization method for process self-starting and self-stopping, which comprises the following steps:

step 100, dividing all independent complete calculations of the same type in a frequency domain electromagnetic response calculation process into a plurality of parallel coarse grains, wherein each parallel coarse grain executes a corresponding independent calculation task;

step 200, selecting at least one initial frequency point reflecting electromagnetic response characteristics required by the whole set frequency range to form a frequency point ordered sequence;

step 300, defining a target function according to an integrated circuit layout optimization target through first parallel coarse grains, and performing iterative operation on the frequency point ordered sequence by using an optimization algorithm until the target function slowly converges or reaches the optimization target to obtain an optimal integrated circuit layout model individual;

step 400, calculating the electromagnetic response characteristics of the optimal integrated circuit layout model individual in the set frequency band through second parallel coarse grains to obtain the frequency response curve of the optimal integrated circuit layout model individual, and comparing the frequency response curve of the optimal integrated circuit layout model individual with the optimization target;

step 500, judging, through third parallel coarse grains, whether there are frequency points that do not meet the integrated circuit layout optimization target; finishing the optimization when no such frequency points exist; otherwise selecting k optimized frequency points in the set frequency band, determining kn new optimized frequency points from the k optimized frequency points, and judging whether the kn new optimized frequency points meet the condition for being added to the frequency point ordered sequence; when the condition is met, adding the kn new optimized frequency points in order into the frequency point ordered sequence and returning to step 300 with the new frequency point ordered sequence; otherwise, the optimization fails; wherein,

when the parallel coarse grain is used for calculation, the size of an allocated memory required by a process allocated with a calculation task in the current parallel coarse grain to complete the calculation task is counted, the size of the current available physical memory is detected, the calculation task is executed when the allocated memory is smaller than the available physical memory, otherwise, the operation of the process in the later preset time is suspended, and the size of the current available physical memory is detected again until the allocated memory is smaller than the available physical memory.

In one possible embodiment, the slow convergence of the objective function comprises: over a preset number of consecutive iterations, the descending speed of the objective function of the integrated circuit layout optimization is not higher than a preset speed.

In a possible implementation manner, the selecting k optimized frequency points within the set frequency band, and determining kn new optimized frequency points from the k optimized frequency points includes:

selecting k optimized frequency points from a set frequency band according to the principle of minimum distance between the frequency points, then respectively calculating the deviation between the electromagnetic response curve of the k optimized frequency points and the integrated circuit layout optimization target, sequencing the calculated deviations in a descending order to obtain a deviation sequence, and selecting the first kn frequency points in the deviation sequence as new optimized frequency points.

In a possible embodiment, the principle of minimum distance between frequency points is: after a new optimized frequency point is added into the frequency point ordered sequence, the distance between adjacent frequency points in the resulting new frequency point ordered sequence is greater than Δd, where Δd = (fmax − fmin)/C, fmax and fmin are respectively the highest frequency and the lowest frequency in the set frequency band, and C is a constant greater than the number of frequency points being optimized.
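The selection rule and the spacing principle above can be illustrated with a short Python sketch. It is only one possible reading: the function names, the `deviation` dictionary, and the choice to apply the spacing check greedily in order of decreasing deviation (so that newly accepted points are also spaced from each other) are assumptions made for illustration, not details given in the application.

```python
def min_spacing(f_min, f_max, c):
    """Delta d = (f_max - f_min) / C from the spacing principle."""
    return (f_max - f_min) / c


def select_new_points(candidates, deviation, current, f_min, f_max, c, kn):
    """Pick up to kn candidate frequencies with the largest deviation.

    candidates: frequencies that miss the optimization target (assumed input)
    deviation:  dict mapping a frequency to its deviation from the target
    current:    frequencies already in the ordered sequence
    Returns (new ordered sequence, accepted new points).
    """
    delta_d = min_spacing(f_min, f_max, c)
    accepted = []
    # Largest deviation first, mirroring the descending deviation sequence.
    for f in sorted(candidates, key=lambda x: deviation[x], reverse=True):
        if len(accepted) == kn:
            break
        # Accept f only if it is more than delta_d away from every point
        # already in the sequence and from every point accepted so far.
        if all(abs(f - g) > delta_d for g in list(current) + accepted):
            accepted.append(f)
    return sorted(list(current) + accepted), accepted


# Frequencies in MHz; the deviation values are made-up illustration data.
dev = {50: 5, 55: 9, 105: 20, 500: 3}
sequence, new_points = select_new_points(
    [50, 55, 105, 500], dev, [10, 100, 1000], 10, 1000, 100, 2
)
```

In this example Δd is 9.9 MHz, so 105 MHz is rejected for being too close to 100 MHz even though it has the largest deviation, and 50 MHz is rejected once 55 MHz has been accepted.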

In a possible implementation, the determining whether the kn new optimized frequency points meet the condition for adding the ordered sequence of frequency points includes:

when the number kn of the selected new optimized frequency points is 0, judging whether a judgment rule of slow convergence of the target function is effective or not, if so, reducing the convergence speed of the target function optimized by the integrated circuit layout, wherein the kn new optimized frequency points accord with the condition of adding the frequency point ordered sequence, and if not, the kn new optimized frequency points do not accord with the condition of adding the frequency point ordered sequence;

and when the number kn of the selected new optimized frequencies is larger than 0, directly judging that the kn new optimized frequency points meet the condition of adding the frequency point ordered sequence.

In a possible implementation, the reducing the convergence speed of the objective function of the integrated circuit layout optimization includes: increasing the value of the iteration times M and reducing the value of the objective function descending speed of the integrated circuit layout optimization, wherein the value of M is not more than the preset maximum iteration times Mmax, and the objective function descending speed is not less than 0.
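As a rough illustration of the two rules above (the admission decision for kn and the relaxation of the slow-convergence rule), the following Python sketch may help. The step size `m_step`, the scaling factor `speed_factor`, and the boolean `rule_effective` are hypothetical; the application only states that M increases up to Mmax and that the descending speed decreases but stays non-negative.

```python
def relax_rule(m, speed, m_max, m_step=5, speed_factor=0.5):
    """Increase M (capped at m_max) and lower the speed threshold (floored at 0).

    m_step and speed_factor are illustrative assumptions.
    """
    new_m = min(m + m_step, m_max)
    new_speed = max(speed * speed_factor, 0.0)
    return new_m, new_speed


def may_extend_sequence(kn, rule_effective):
    """kn > 0 always qualifies; kn == 0 qualifies only while the
    slow-convergence rule is still effective (and is then relaxed)."""
    return kn > 0 or rule_effective


m, speed = relax_rule(10, 0.5, m_max=12)  # M is clamped to 12, speed halves
```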

In one possible implementation, before allocating the memory for the parallel coarse-grained process, the method further includes: and the main process dynamically distributes each computing task in the parallel coarse grains and corresponding input parameters to all processes based on a dynamic distribution computing task strategy and a file marking strategy.

In one possible embodiment, the dynamic calculation task allocation strategy includes: after each process completes its computing task, a new computing task is allocated to that process, and computing tasks whose computational load is higher than a threshold value are spread across different processes according to the order in which the processes are allocated computing tasks.

In one possible embodiment, the file marking policy includes: if a certain computing task in the parallel coarse grains is allocated to a process, generating an identification file of the computing task, which is used for indicating that the computing task is already allocated, so that other processes apply for allocating other computing tasks due to the existence of corresponding identification files when applying for allocating the computing task.

In one possible embodiment, the steps of implementing the file marking include:

step A1, when applying for allocation of a calculation task, the process judges whether the calculation task has a corresponding flag file SFi; if the flag file SFi does not exist, step A2 is executed; if it exists, step A6 is executed;

step A2, judging whether the mark file is in a locked state, if not, executing step A3, otherwise, executing step A6;

step A3, locking the mark file;

step A4, generating the mark file;

step A5, unlocking the mark file, and completing the calculation of the calculation task;

and step A6, judging whether all calculation tasks in the parallel coarse grains are finished, if not, executing step A1 and applying for distributing the next calculation task, otherwise, finishing.
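The effect of steps A1 to A6 can be sketched in Python as follows. Note that this sketch swaps in a different mechanism: instead of the explicit lock, generate, unlock sequence of steps A2 to A5, it relies on exclusive file creation (`os.O_CREAT | os.O_EXCL`), which on a local file system fails if the file already exists. The names `try_claim`, `run_tasks` and `flag_dir` are hypothetical.

```python
import os


def try_claim(task_id, flag_dir):
    """Return True if this process claimed the task, False if another did.

    Exclusive creation stands in for steps A1 to A5: the flag file SF<i>
    is created only if it does not exist yet.
    """
    path = os.path.join(flag_dir, f"SF{task_id}")
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # flag file present: the task is already allocated
    os.close(fd)
    return True


def run_tasks(task_ids, flag_dir, compute):
    """Step A6 loop: walk through all tasks and compute the unclaimed ones."""
    results = []
    for task_id in task_ids:
        if try_claim(task_id, flag_dir):
            results.append(compute(task_id))
    return results
```

Each worker process would call `run_tasks` with the same `flag_dir`; a task whose flag file already exists is skipped, so no two processes compute the same task and no message passing is needed to coordinate them.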

(III) advantageous effects

The integrated circuit layout optimization method based on the frequency domain electromagnetic response and capable of automatically starting and stopping the process can effectively avoid the occurrence of a pseudo-optimal target caused by the fact that the selected typical optimization frequency point is insufficient, and solves the problem that the typical optimization frequency point meets the integrated circuit layout optimization preset target but does not meet the integrated circuit layout optimization preset target in the whole frequency band; in addition, an automatic start-stop technology is applied in the calculation process, the hard disk read-write bottleneck caused by the fact that the peak value of a memory is larger than that of an available physical memory in the multi-process parallel calculation process is avoided, meanwhile, more processes are guaranteed to be started simultaneously under the condition that the virtual memory is avoided, the communication among the processes is reduced, meanwhile, the problem that the complexity of a calculation example is not equal is solved, and the parallel calculation efficiency is improved.

Drawings

The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining and illustrating the present application and should not be construed as limiting the scope of the present application.

Fig. 1 is a schematic flowchart of an embodiment of a method for optimizing a layout of an integrated circuit based on frequency domain electromagnetic response, which is capable of automatically starting and stopping a process disclosed in the present application.

Detailed Description

In order to make the implementation objects, technical solutions and advantages of the present application clearer, the technical solutions in the embodiments of the present application will be described in more detail below with reference to the drawings in the embodiments of the present application.

An embodiment of a method for optimizing a layout of an integrated circuit based on frequency domain electromagnetic response for process self-start-stop disclosed in the present application is described in detail below with reference to fig. 1. As shown in fig. 1, the method disclosed in this embodiment includes the following steps 100 to 500.

Step 100, dividing all independent complete calculations of the same type in the integrated circuit frequency domain electromagnetic response calculation process into a plurality of parallel coarse grains, wherein each parallel coarse grain executes a corresponding independent calculation task.

Before parallel computing, the number of processes needs to be determined manually, and one process is taken as a main process.

The parallel coarse grain is defined according to the problem operation characteristics. The problem operation characteristics are different from industry to industry. For example, for large scale integrated circuit electromagnetic field distribution calculation, the problem operation characteristic is that the multilayer integrated circuit board of a certain structure has current and potential distribution of each layer board and electromagnetic field distribution among layers under the driving condition of current with different frequencies or different magnitudes.

Each parallel coarse grain is an independent execution module containing the fewest possible input/output parameters. Each calculation task in a parallel coarse grain uses a serial version, and the serial version of a complete run of the method is obtained by combining all parallel coarse grains with the processing tasks that are not computed by parallel coarse grains. An array whose length exceeds 10^6 is defined as a large array; a large array variable refers to a double-precision array of this kind within a parallel coarse grain.

The independent and complete calculation refers to a complete operation process including pre-calculation processing, dynamic allocation of large array variables for large-scale numerical calculation, calculation result sorting and release of the large array variables. The pre-calculation processing refers to processing a field problem including a complex calculation region by adopting a numerical calculation method, specifically, three-dimensional modeling is performed on the complex region to realize description of the complex calculation region, and then meshing is performed on the complex calculation region to realize region dispersion.

Specifically, assume that the frequency domain electromagnetic response calculation of the integrated circuit executes four steps b1, b2, b3 and b4 in sequence, where step b2 includes 1000 independent complete calculations, step b3 includes 500 independent complete calculations, and step b4 includes 500 independent complete calculations. Parallel coarse grains b2, b3 and b4 are then divided for steps b2, b3 and b4, respectively: b2 includes 1000 computing tasks, and b3 and b4 include 500 computing tasks each. The parallel coarse grains b2, b3 and b4 are each designed as an independent execution module containing the fewest possible input/output parameters, and a serial version of each calculation task in b2, b3 and b4 is designed. The serial version of the complete run is obtained by calling the independent execution modules of b2, b3 and b4 and combining them with processing task b1, which lies outside the parallel coarse grains.
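The b1 to b4 example can be pictured with a small Python sketch of the serial version. The class `CoarseGrain`, the function `placeholder_task` and the trivial `run_b1` are hypothetical stand-ins; a real independent complete calculation would perform modeling, meshing and numerical solution.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CoarseGrain:
    """One parallel coarse grain: many independent tasks of the same type."""
    name: str
    n_tasks: int
    task: Callable[[int], float]  # serial version of one complete calculation

    def run_serial(self) -> List[float]:
        # Each task depends only on its own index, so the tasks can later be
        # handed to different processes without exchanging data.
        return [self.task(i) for i in range(self.n_tasks)]


def placeholder_task(i: int) -> float:
    # Hypothetical stand-in for one independent complete calculation.
    return float(i)


def run_b1() -> None:
    # Processing task outside the coarse grains, executed by the main process.
    pass


def complete_serial_run() -> Dict[str, List[float]]:
    run_b1()
    grains = [
        CoarseGrain("b2", 1000, placeholder_task),
        CoarseGrain("b3", 500, placeholder_task),
        CoarseGrain("b4", 500, placeholder_task),
    ]
    return {g.name: g.run_serial() for g in grains}
```

Keeping each grain's interface down to a task index and a result is what allows the same tasks to be distributed across processes later without changing the calculation itself.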

Before parallel coarse grain is used for parallel calculation, a main process is used for executing processing tasks except for parallel coarse grain. For example, b1 needs to be executed before parallel computation of the parallel coarse grain b2, so the processing task b1 is executed by the host process.

Step 200, according to a set frequency band provided by a user and an integrated circuit layout optimization target, selecting at least one initial frequency point reflecting electromagnetic response characteristics required by the whole set frequency band range to form a frequency point ordered sequence. The initial frequency point is used to reflect the optimal target value of the frequency point.

Step 300, by means of a first parallel coarse grain: and defining a target function according to the integrated circuit layout optimization target and performing iterative operation on the frequency point ordered sequence by using an optimization algorithm until the target function slowly converges or the optimization target is reached to obtain an optimal integrated circuit layout model individual.

The optimization algorithm is an operation method for searching a global optimal point in a state space. There are many optimization algorithms that can optimize a certain target, and for an integrated circuit layout, a typical optimization target is that for some preset ports, the input impedance of the ports is minimized in a specific frequency range, so that the voltage drop caused on the metal path of the layout is minimized. The optimization algorithm implemented may thus be selected in accordance with the optimization objective and the electromagnetic response characteristics of the integrated circuit layout to be optimized.

Furthermore, the optimization target of the integrated circuit layout may be set by a user, that is, the electromagnetic response characteristic (optimization target) of the integrated circuit layout to be optimized in what frequency band (set frequency band) is set according to the user's requirement, and the objective function is generated according to the optimization target.

The objective function is a function set for achieving the optimization goal specified by the user, and is used for representing the difference between the performance of the optimization individual and the performance of the optimization goal set by the user, the larger the difference is, the larger the calculated objective function value is, and the optimization goal is to minimize the objective function value, so that the optimization individual can achieve the performance required by the user (achieve the optimization goal) as much as possible.

Suppose the user's requirement is: for a certain preset port, the input impedance of the integrated circuit layout model must be below 1 mΩ within the set frequency band of 10 MHz to 1 GHz. In step 200, 10 MHz, 100 MHz and 1 GHz are selected as initial frequency points (optimized frequency points), and the calculated input impedance of a certain optimized integrated circuit layout model individual at these 3 initial frequency points is 1 mΩ, 15 mΩ and 12 mΩ, respectively.

The objective function is defined as:

$$T = \sum_i T_i \tag{1}$$

$$T_i = \begin{cases} 0, & R(f_i) \le R_0(f_i) \\ \left(R(f_i) - R_0(f_i)\right)^2, & R(f_i) > R_0(f_i) \end{cases} \tag{2}$$

where $T$ is the value of the defined integrated circuit layout optimization objective function; $T_i$ is the objective function at each initial frequency point; $f_i$ is an optimized frequency point; $R(f_i)$ is the actual input impedance of the port for the integrated circuit layout model individual at frequency point $f_i$; and $R_0(f_i)$ is the optimization target for the port impedance of the integrated circuit layout model at frequency point $f_i$, set to 1 mΩ here. Formula (2) states that if the input impedance of the port at frequency point $f_i$ is level with or better than the optimization target, then $T_i = 0$; otherwise $T_i = (R(f_i) - R_0(f_i))^2$. Under the above user requirement, that is, when the input impedance of the port at the 3 initial frequency points is 1 mΩ, 15 mΩ and 12 mΩ, respectively, $T = 0 + (15 - 1)^2 + (12 - 1)^2 = 317$.
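The worked value can be checked with a few lines of Python. The sketch takes the per-point term as zero when the impedance is level with or better than the target and as the squared excess otherwise, which is the reading consistent with the total of 317; the function names are illustrative only.

```python
def point_objective(r, r0):
    """Per-frequency term T_i: zero if the impedance r meets the target r0,
    otherwise the squared excess (r - r0)**2."""
    return 0.0 if r <= r0 else (r - r0) ** 2


def objective(impedances, target):
    """Total objective T: the sum of T_i over all optimized frequency points."""
    return sum(point_objective(r, target) for r in impedances)


# Impedances in milliohms at 10 MHz, 100 MHz and 1 GHz; target 1 milliohm.
total = objective([1.0, 15.0, 12.0], 1.0)  # 0 + 196 + 121
```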

The optimal integrated circuit layout model individual is the integrated circuit layout model whose electromagnetic response characteristics are optimal within the set frequency band. Slow convergence of the objective function means that, over a preset number of consecutive iterations, the descent rate of the integrated circuit layout optimization objective function does not exceed a preset rate. For example, after N consecutive iterations the descent rate of the objective function is not higher than ε, where N and ε are preset values that can be determined according to the integrated circuit layout optimization target.

In step 300, the automatic start-stop technique is applied to the parallel coarse grain: memory is dynamically allocated to each process that holds a computing task, and the computing task is then carried out. With the automatic start-stop technique, the size of the available physical memory is compared with the size of the memory to be dynamically allocated whenever memory is allocated. If the memory to be allocated is smaller than the available physical memory, the memory is allocated and the calculation proceeds; otherwise the process suspends the allocation of its large arrays, and after a pause time T it reads the updated available physical memory and compares it again with the memory to be allocated. This avoids the hard disk read-write bottleneck that arises in multi-process parallel computing when the peak memory demand exceeds the available physical memory.

Accordingly, when the iterative operation of step 300 is performed through the first parallel coarse grain, the memory that each process holding a computing task in the current parallel coarse grain needs in order to complete that task is determined, and the size of the currently available physical memory is detected. The computing task is executed when the required memory is smaller than the available physical memory; otherwise the process is suspended for a preset time, after which the available physical memory is detected again, and this repeats until the required memory is smaller than the available physical memory. The preset pause time T may be 1 second.

Specifically, when the first parallel coarse grain performs parallel computation, the memory needed to complete each computing task is determined. Suppose that 6 processes currently share 16 GB of available physical memory, that processes p1 to p5 occupy 15 GB of it, and that process p6 requires 3 GB for its calculation. The required memory is then larger than the remaining available physical memory (1 GB), so process p6 cannot run at the same time as processes p1 to p5 and must be suspended. Suppose that 1 second later other processes have completed their computing tasks and released physical memory, so that 4 GB of physical memory is now available. The required memory is then smaller than the available physical memory, and process p6 can execute its computing task.
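The wait loop behind the automatic start-stop technique can be sketched as follows. This is an illustrative sketch only: the function `wait_for_memory` and its parameters are hypothetical names, and the memory probe is passed in as a callable so that the p6 example above can be replayed with simulated readings. In a real program the probe might be backed by something like `psutil.virtual_memory().available`, which is an assumption and not something the application specifies.

```python
import time

GB = 1024 ** 3


def wait_for_memory(required_bytes, get_available_bytes, pause_s=1.0, sleep=time.sleep):
    """Pause until the required memory is smaller than the available physical memory.

    Returns the number of pauses that were needed before the task may start.
    """
    pauses = 0
    # The task may run only when required < available, so wait otherwise.
    while required_bytes >= get_available_bytes():
        sleep(pause_s)  # pause time T, 1 second in the text
        pauses += 1
    return pauses


# Simulated readings for process p6: 1 GB free at first, 4 GB free one pause later.
readings = iter([1 * GB, 4 * GB])
pauses = wait_for_memory(3 * GB, lambda: next(readings), sleep=lambda s: None)
print(pauses)  # 1
```

Passing the probe and the sleep function as arguments keeps the sketch testable without real memory pressure or real delays.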

Step 400, by means of a second parallel coarse grain: the electromagnetic response characteristics of the optimal integrated circuit layout model individual within the set frequency band are calculated to obtain its frequency response curve, and this frequency response curve is compared with the optimization target. In step 400, the automatic start-stop technique is likewise applied to the calculations of the second parallel coarse grain, such as the characteristic calculation.

In the process of optimally designing the integrated circuit layout model, if all frequency points of the integrated circuit layout model within the set frequency range are observed and optimized, the required time cost is high, so that the method adopts a mode of selecting a specific frequency point and observing and optimizing the integrated circuit layout model at the frequency point.

Step 500, by means of a third parallel coarse grain: it is judged whether any frequency point fails to meet the integrated circuit layout optimization target. If no such frequency point exists, the optimization is finished. Otherwise, k optimized frequency points (k ≥ 0) are selected within the set frequency band, kn new optimized frequency points (k ≥ kn ≥ 0) are determined from these k points, and it is judged whether the kn new optimized frequency points meet the condition for being added to the frequency point ordered sequence. If the condition is met, the kn new optimized frequency points are inserted in order into the frequency point ordered sequence, and step 300 is executed again with the new frequency point ordered sequence; if the condition is not met, the optimization fails. In step 500, the automatic start-stop technique is likewise applied to the calculations of the third parallel coarse grain.

Because the selection of the objective function can influence the optimization result, the integrated circuit layout optimization objective function in the application is not fixed, but can be continuously corrected and adjusted along with the steps 400 and 500, so that the optimal integrated circuit layout model individual meeting the optimization objective is found.

After the parallel calculation of all the parallel coarse grains is completed, the main process collects the output parameters of all the processes, merges and sorts the output parameters to obtain a final result of complete operation, and performs subsequent processing on the final result.

Electromagnetic response calculation of a super-large-scale multi-layer integrated circuit model shows that the number of units generated by mesh division is greatly different due to different model structures in the same type of calculation, so that memories required by different model calculations are also greatly different.

The integrated circuit layout optimization method based on frequency domain electromagnetic response with process self-start-stop disclosed in this embodiment effectively avoids pseudo-optimal results caused by selecting too few typical optimized frequency points, and solves the problem that the typical optimized frequency points meet the preset target while the frequency band as a whole does not. In addition, the automatic start-stop technique applied during the calculation avoids the hard disk read-write bottleneck that arises in multi-process parallel computing when the peak memory demand exceeds the available physical memory. It also allows more processes to run simultaneously without resorting to virtual memory, reduces communication between processes, handles computing instances of unequal complexity, and improves parallel computing efficiency.

In one embodiment, the k optimized frequency points are selected in step 500, and the kn new optimized frequency points are determined from them, as follows. First, k optimized frequency points are selected from the set frequency band subject to the minimum frequency point spacing principle. Next, for each of the k optimized frequency points, the deviation between the electromagnetic response curve of the integrated circuit layout model and the optimization target is calculated, and the deviations are sorted in descending order to obtain a deviation sequence. Finally, the first kn frequency points in the deviation sequence are taken as the new optimized frequency points.

The minimum frequency point spacing principle is as follows: each of the k selected optimized frequency points must satisfy the condition that, after it is added to the frequency point ordered sequence, the spacing between adjacent frequency points in the resulting new ordered sequence is larger than Δd, where Δd = (fmax − fmin)/C, fmax and fmin are respectively the highest and lowest frequencies of the set frequency band, and C is a constant greater than the number of frequency points being optimized. Therefore, however many new optimized frequency points kn are determined afterwards, each of them keeps the spacing between adjacent frequency points larger than Δd once it is added to the frequency point ordered sequence.
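The selection described above can be sketched in Python as follows. This is one possible reading, not the application's implementation: the function name `select_new_points`, the dictionary of deviations and the value C = 100 are all hypothetical, and the sketch additionally checks each accepted point against the points accepted before it, which the text does not spell out. Only the 480.5 MHz deviation (0.5 Ω against a 1 mΩ target) comes from the example later in the document; the other deviation values are made up for illustration.

```python
import bisect


def select_new_points(ordered_points, deviations, f_min, f_max, C, kn):
    """Pick up to kn new frequency points, largest deviation first.

    deviations maps a candidate frequency to its deviation from the target.
    A candidate is accepted only if it stays farther than delta_d from every
    point already in the ordered sequence.
    """
    delta_d = (f_max - f_min) / C
    sequence = sorted(ordered_points)
    chosen = []
    ranked = sorted(deviations.items(), key=lambda kv: kv[1], reverse=True)
    for freq, dev in ranked:
        if len(chosen) == kn:
            break
        if dev <= 0:
            continue  # this candidate already meets the target
        if all(abs(freq - g) > delta_d for g in sequence):
            bisect.insort(sequence, freq)  # keep the sequence ordered
            chosen.append(freq)
    return chosen, sequence


# Frequencies in MHz, deviations in mOhm (mostly hypothetical values).
initial = [10.0, 257.5, 505.0, 752.5, 1000.0]
devs = {480.5: 499.0, 667.5: 3.0, 850.0: 2.0, 500.0: 1.0}
chosen, sequence = select_new_points(initial, devs, 10.0, 1000.0, C=100, kn=4)
print(chosen)    # [480.5, 667.5, 850.0]
print(sequence)
```

With C = 100 the minimum spacing is 9.9 MHz, so the candidate at 500 MHz is rejected because it lies only 5 MHz from the existing point at 505 MHz.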

In an embodiment, the determining whether the kn optimized frequency points meet the condition of adding the frequency point ordered sequence includes:

When the number kn of selected new optimized frequency points is 0, that is, when the k selected optimized frequency points are all original optimized frequency points, it is judged whether the rule for judging slow convergence of the integrated circuit layout optimization objective function is still valid. If it is valid, the rule is relaxed, that is, the convergence speed below which the objective function is judged to converge slowly is lowered, and the kn new optimized frequency points are deemed to meet the condition for being added to the frequency point ordered sequence. If it is not valid, the kn new optimized frequency points do not meet the condition and cannot be added to the frequency point ordered sequence, and the optimization fails.

When the number kn of the selected new optimized frequency points is larger than 0, namely frequency points different from the original optimized frequency points exist in the selected kn new optimized frequency points, the kn new optimized frequency points are directly judged to accord with the condition of adding the frequency point ordered sequence, and the kn new optimized frequency points can be added into the frequency point ordered sequence.

The validity requirement exists because the rule for judging slow convergence of the integrated circuit layout optimization objective function cannot be relaxed without limit; otherwise the optimization would never terminate.

Relaxing the rule for judging slow convergence of the integrated circuit layout optimization objective function means increasing the number of iterations N and decreasing the descent-rate threshold ε of the objective function, where N must not exceed the preset maximum number of iterations Nmax. The rule is valid if, after it has been relaxed several times, that is, after N = N + ΔN and ε = ε − Δε have been applied repeatedly, N ≤ Nmax and ε ≥ 0 still hold; otherwise the rule is invalid.

For example, according to the integrated circuit layout optimization target and the set frequency band, N = 10, ε = 0.1 and Nmax = 100 are initially defined according to actual conditions. When the objective function has decreased by no more than 0.1 after 10 iterations, the objective function is judged to converge slowly, and the optimization iteration stops so that the optimal integrated circuit layout model individual can be identified. It is then checked whether the port input impedance of this individual meets the optimization target over the whole set frequency band; if not, it is checked whether there is an optimized frequency point that needs to be added. If the minimum frequency point spacing principle shows that no new optimized frequency point can be added, the slow-convergence rule is relaxed and N = 20, ε = 0.05 are redefined, and so on until N = 100 and ε = 0, at which point the slow-convergence rule is no longer valid and the optimization fails.
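The slow-convergence test and its relaxation can be illustrated with a short Python sketch. Several things here are assumptions rather than statements from the application: the descent-rate threshold is written `eps`, the step sizes `dn` and `deps` are hypothetical, the rule is treated as invalid once a relaxation would push N above Nmax or the threshold below zero, and the function names `is_slow` and `relax_rule` are invented for illustration.

```python
def is_slow(history, n, eps):
    """True if the objective dropped by no more than eps over the last n iterations.

    history holds one objective function value per completed iteration.
    """
    if len(history) <= n:
        return False  # not enough iterations yet to judge
    return history[-n - 1] - history[-1] <= eps


def relax_rule(n, eps, n_max, dn, deps):
    """Return the relaxed (n, eps), or None when the rule can no longer be relaxed."""
    new_n, new_eps = n + dn, eps - deps
    if new_n > n_max or new_eps < 0:
        return None
    return new_n, new_eps


# Starting point from the example: N = 10 and a threshold of 0.1, with Nmax = 100.
print(is_slow([5.0] * 11, 10, 0.1))            # True: no descent in 10 iterations
print(relax_rule(10, 0.1, 100, 10, 0.05))      # (20, 0.05)
print(relax_rule(100, 0.05, 100, 10, 0.05))    # None
```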

A 15-layer ultra-large-scale integrated circuit is optimized using a differential evolution algorithm. Its working frequency range is 10 MHz to 1 GHz, and the user requires that the input impedance of the port be no higher than 1 mΩ within the set frequency band. Because observing and optimizing all frequencies between 10 MHz and 1 GHz is too costly in time, 5 uniformly distributed initial frequency points, 10 MHz, 257.5 MHz, 505 MHz, 752.5 MHz and 1 GHz, are selected to form the frequency point ordered sequence, and the optimization aims to keep the input impedance of the port no higher than 1 mΩ at the selected frequency points.

After 15 iterations, the input impedance of the port at the selected frequency points meets the optimization target. However, comparing the frequency response curve of the optimal integrated circuit layout model individual with the optimization target shows that, although the input impedance of the port is no higher than 1 mΩ at and near the selected frequency points, the frequency response within 400 MHz to 561 MHz, 620 MHz to 715 MHz and 810 MHz to 890 MHz deviates from the preset target and exceeds 1 mΩ. For example, 480.5 MHz was not selected as a typical frequency point, and the calculated input impedance of the port at this frequency point reaches 0.5 Ω. Therefore, according to the minimum frequency point spacing principle, the frequency points 480.5 MHz, 667.5 MHz and 850 MHz are selected and inserted into the original frequency point ordered sequence, and optimization continues; after 7 more iterations the selected frequency points meet the target. Comparing the frequency response curve of the optimal individual with the optimization target again shows that the frequency response does not meet the target within 570 MHz to 630 MHz and 710 MHz to 820 MHz, so the frequency points 600 MHz and 765 MHz are inserted and optimization continues. After 9 more iterations the selected frequency points meet the target, a further comparison of the frequency response curve of the optimal integrated circuit layout model individual with the optimization target finds no deviation, and the optimization is complete.

In one embodiment, before allocating memory for each parallel coarse-grained process, the method further comprises: and the main process dynamically distributes each computing task in the parallel coarse grains and corresponding input parameters to all the processes based on the file marking strategy and the dynamic distribution computing task strategy. The task-process distribution mode is adopted for each parallel coarse particle in each step.

Specifically, the dynamic task allocation strategy uses a first-apply, first-allocated mechanism. Under this mechanism, a new computing task is allocated to each process as soon as that process finishes its current computing task. Each process therefore receives new work whenever it becomes free, instead of all computing tasks being distributed in advance and each process working through a pre-assigned set; this avoids processes sitting idle after finishing early because the CPU times of different computing tasks differ greatly. The mechanism further includes spreading the computing tasks whose computing demand exceeds a threshold across different processes, following the order in which tasks are allocated to the processes. Computing tasks that take a large amount of CPU time are thus distributed more evenly among the processes rather than concentrated on one or a few of them, which improves parallel computing efficiency.
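The benefit of allocating tasks on demand rather than in advance can be seen in a small simulation. This is a sketch under stated assumptions: the task durations are hypothetical, round-robin assignment stands in for "distributing all tasks in advance", and the function names `dynamic_makespan` and `static_makespan` are invented for illustration.

```python
import heapq


def dynamic_makespan(durations, n_procs):
    """Finish time when each task goes to whichever process becomes free first."""
    free_at = [0.0] * n_procs
    heapq.heapify(free_at)
    for d in durations:
        t = heapq.heappop(free_at)      # earliest-free process applies first
        heapq.heappush(free_at, t + d)  # it is busy until t + d
    return max(free_at)


def static_makespan(durations, n_procs):
    """Finish time when tasks are pre-assigned round-robin before computing starts."""
    loads = [0.0] * n_procs
    for i, d in enumerate(durations):
        loads[i % n_procs] += d
    return max(loads)


# One expensive task followed by six cheap ones, shared by 2 processes.
durations = [8, 1, 1, 1, 1, 1, 1]
print(dynamic_makespan(durations, 2))  # 8.0
print(static_makespan(durations, 2))   # 11.0
```

In the dynamic case the second process absorbs all six cheap tasks while the first is busy with the expensive one, so no process waits for pre-assigned work.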

Suppose there are currently 3 processes sharing 10 GB of available physical memory. If process p1 applies first, it is allocated computing task S1 of the parallel coarse grain; if computing S1 requires 4 GB, 6 GB of memory remains available for allocation. Because computing task S1 has already been allocated, process p2 can only apply for computing task S2 of the parallel coarse grain; if S2 also requires 4 GB, process p2 can compute in parallel with process p1, and 2 GB of allocable memory remains. Because computing tasks S1 and S2 have already been allocated, process p3 can only apply for computing task S3; if S3 requires 4 GB, the memory allocation of process p3 must be suspended by the automatic start-stop technique until other processes complete their calculations and release memory.

In addition, in the multi-process parallel operation process, the chances that each process is allocated to a certain operation task are equal, so that if no measure is taken, multiple processes are allocated to the same operation task, which causes waste of operation resources, and therefore a mechanism that all operation tasks are uniquely allocated to a certain process needs to be adopted to avoid the phenomenon.

Therefore, the instant marking of the distributed tasks is adopted, namely the tasks are distributed to a certain process and marked at the same time, and other processes cannot redistribute and execute the tasks through the marking.

However, during parallel operation the variables of each process are independent, the computing tasks are unequal in complexity, and the processes are in different running states, so a task allocation recorded in a variable by one process cannot be conveyed immediately to the other processes. The method therefore uses an external, explicit marking mechanism, which ensures that all processes can see the information as soon as a computing task is marked.

The file marking strategy adopted by the application therefore comprises: when a computing task in the parallel coarse grain is allocated to a process, a flag file indicating that this computing task has already been allocated is generated, so that any other process applying for the same computing task finds the corresponding flag file and applies for a different computing task instead.

With the file marking strategy, a flag file is generated as soon as a computing task in the parallel coarse grain is allocated to a process. When a process applies for a computing task, it attempts to generate the flag file of that task; if the flag file already exists, the computing task has been allocated to another process, and the applying process automatically goes on to apply for the next computing task.

Specifically, correct allocation of the computing tasks using the file marking strategy is implemented by the following steps A1 to A6.

Step A1: when a process applies for computing task Si, it determines whether a corresponding flag file SFi exists. If the flag file SFi does not exist, step A2 is executed; if it exists, step A6 is executed.

Step A2: it is determined whether the flag file SFi is in the locked state. If it is not locked, step A3 is executed; if it is locked, step A6 is executed.

Step A3: the flag file SFi is locked.

Step A4: the flag file SFi is generated.

Step A5: the flag file SFi is unlocked, and the computing task is carried out.

Step A6: it is determined whether all computing tasks in the parallel coarse grain have been completed. If not, i is set to i + 1 and step A1 is executed again to apply for the next computing task; otherwise the procedure ends. Once the procedure has ended, all computing tasks of the current parallel coarse grain have been allocated to the processes and the allocation for this parallel coarse grain is complete; execution can then move on to the other parallel coarse grains and allocate their computing tasks.
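The effect of steps A1 to A6 can be approximated in Python with exclusive file creation. Note that this sketch swaps in a different mechanism from the one described: instead of separate existence check, lock, create and unlock steps, it relies on `os.open` with `O_CREAT | O_EXCL`, which on a local file system performs the existence check and the creation as a single atomic operation. The file naming scheme and the function names `try_claim` and `run_tasks` are hypothetical.

```python
import os
import tempfile


def try_claim(task_id, marker_dir):
    """Try to claim a task by creating its flag file; True if this process won."""
    path = os.path.join(marker_dir, f"task_{task_id}.flag")
    try:
        # O_EXCL makes creation fail if the flag file already exists,
        # so two processes can never both claim the same task.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as fh:
        fh.write(str(os.getpid()))  # record which process holds the task
    return True


def run_tasks(num_tasks, marker_dir, compute):
    """Walk through the tasks in order and compute every task this process claims."""
    results = []
    for i in range(num_tasks):
        if try_claim(i, marker_dir):
            results.append(compute(i))
        # Otherwise another process owns task i; move on to the next one.
    return results


with tempfile.TemporaryDirectory() as marker_dir:
    first = run_tasks(4, marker_dir, lambda i: i * i)
    second = run_tasks(4, marker_dir, lambda i: i * i)
    print(first, second)  # [0, 1, 4, 9] []
```

The second pass claims nothing because every flag file already exists, which is the behaviour the flag files are meant to guarantee across processes.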

The allocation of computing tasks by means of flag files described above may be implemented with a file marking technique. The file marking technique relies on file locking and unlocking, which ensures that only one process at a time can read or write the flag of a given computing task and prevents several processes from operating on the same file simultaneously and thereby computing the same task more than once. A file read-write lock allows a high degree of parallelism: several threads can hold the lock in read mode at the same time, but only one thread can hold it in write mode. The three states of the read-write lock are as follows:

1. when the read-write lock is write-locked, all threads that attempt to lock it are blocked until it is unlocked;

2. when the read-write lock is read-locked, all threads that attempt to lock it in read mode are granted access, but threads that attempt to lock it in write mode are blocked;

3. when the read-write lock is read-locked and another thread attempts to lock it in write mode, the read-write lock usually blocks subsequent read-mode lock requests, which prevents the read-mode lock from being held for a long time while the waiting write-mode request stays blocked.

Two common strategies for handling the reader-writer problem are strong reader synchronization and strong writer synchronization. In strong reader synchronization, readers are always given higher priority: a reader obtains access as long as no writer is currently writing. In strong writer synchronization, priority is given to writers: readers must wait until all waiting or executing writers have finished.

When the main process executes a parallel coarse grain, all processes, including the main process, apply for the computing tasks they need, and the main process dynamically allocates the computing tasks to the processes based on the file marking technique, the file locking and unlocking technique and the dynamic task allocation strategy. The parallel computation of the parallel coarse grain is complete once its last computing task has finished, which ensures that subsequent parallel coarse grains obtain the computing results without a long wait.

When the frequency domain electromagnetic response of an integrated circuit is calculated with a conventional method, each node can start only a small number of processes at the same time; starting more processes leads to long waits caused by the use of virtual memory. With the automatic start-stop technique, the file marking technique and the dynamic allocation technique disclosed in the application, each node can start more processes without resorting to virtual memory, which greatly reduces the calculation time.

The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
