Thread processing method and graphics processor

Document No.: 1102571 | Published: 2020-09-25

Reading note: this technology, Thread processing method and graphics processor, was created by 林焕鑫, 王卓立, 马军超, 单东方, and 沈伟锋 on 2018-02-14. A method applied to a graphics processor comprises the following steps. A first thread processor acquires first to-be-processed data, determines that the data satisfies a first branch statement, and increases the value in a counter by one step. The first thread processor determines, from the value of the counter, the number of threads among the M x N threads that need to run the first branch statement. When the first thread processor confirms that this number is greater than a threshold, it performs thread synchronization and thread data remapping. Because thread data remapping is used only when many threads are on the first branch, time and computing resources are saved.

A thread processing method applied to a graphics processor, the graphics processor being configured to process M thread bundles, each thread bundle including N threads, the graphics processor further including at least one thread bundle processor, a first thread bundle processor of the at least one thread bundle processor including an integer multiple of N thread processors, the first thread bundle processor including a first thread processor, the first thread processor being configured to run one of the N threads to process data to be processed that satisfies a judgment condition of a first branch statement or satisfies a judgment condition of a second branch statement, a counter being provided in the graphics processor, the method comprising:

the first thread processor acquires first to-be-processed data needing to be processed, determines that the first to-be-processed data meets a first branch statement, and increases the numerical value in the counter by one step;

the first thread processor determines the number of threads needing to run a first branch statement in the M x N threads according to the value of the counter;

the first thread processor performs thread synchronization and thread data remapping upon determining that the number is greater than a threshold.
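The counting and threshold steps of this claim can be sketched as a sequential Python simulation. This is only an illustration under stated assumptions: the names `BlockState`, `takes_first_branch`, and `thread_step` are hypothetical, the judgment condition (a negative value) is invented for the example, and on a real graphics processor the counter update would have to be an atomic operation on memory shared by the M x N threads.

```python
from dataclasses import dataclass


@dataclass
class BlockState:
    """Shared state for the M x N threads (hypothetical names)."""
    counter: int = 0    # accumulated steps from threads on the first branch
    step: int = 1       # the "one step" added by each such thread
    threshold: int = 1  # remap only when the thread count exceeds this


def takes_first_branch(value):
    # Assumed stand-in for the judgment condition of the first branch statement.
    return value < 0


def thread_step(state, value):
    """Simulate one thread: count itself if needed, then test the threshold.

    Returns True when this thread would go on to synchronize and remap.
    """
    if takes_first_branch(value):
        state.counter += state.step  # would be an atomic add on a GPU
    first_branch_threads = state.counter // state.step
    return first_branch_threads > state.threshold


state = BlockState(threshold=1)
decisions = [thread_step(state, v) for v in [3, -1, 4, -2, 5]]
```

With a threshold of 1, the simulated threads start requesting a remap only once a second thread has reached the first branch, which matches the claim's intent of skipping the remap when few threads diverge.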

The method of claim 1, wherein the graphics processor is further configured with a flag bit, the value of the flag bit being set to a first flag value, the first flag value indicating that no remapping is performed, the method comprising:

the first thread processor reads the flag bit before determining that the number is greater than a threshold; and

the first thread processor sets the first flag value to a second flag value after determining that the number is greater than a threshold and before performing thread synchronization, the second flag value indicating that remapping needs to be performed.

The method of claim 2, wherein, after the first thread processor performs the thread synchronization and before it performs the thread data remapping, the method further comprises:

the first thread processor clears the value in the counter.

A method according to claim 2 or 3, wherein the first thread bundle processor comprises a second thread processor, the second thread processor being configured to run one of the N threads to process data to be processed satisfying a first branch statement or satisfying a second branch statement, the method further comprising:

the second thread processor reads the flag bit, and executes thread synchronization and thread data remapping when the value of the flag bit is confirmed to be the second flag value;

and when the second thread processor confirms that the value of the flag bit is the first flag value, determining the number of threads needing to run a first branch statement in the M x N threads according to the value of the counter, and executing thread synchronization and thread data remapping under the condition that the number is confirmed to be larger than a threshold value.
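The flag-bit protocol in the preceding claims can be illustrated with a small Python sketch. It assumes a step length of 1, a dictionary standing in for memory shared by the threads, and the hypothetical names `NO_REMAP`, `NEED_REMAP`, and `thread_decides_remap`; none of these come from the source.

```python
NO_REMAP = 0    # first flag value: no remapping is performed
NEED_REMAP = 1  # second flag value: remapping needs to be performed


def thread_decides_remap(shared, threshold=1):
    """Return True if the simulated thread should synchronize and remap.

    `shared` is a dict with keys "flag" and "counter" (assumed layout).
    """
    if shared["flag"] == NEED_REMAP:
        # Some other thread already found the count above the threshold.
        return True
    if shared["counter"] > threshold:  # step length assumed to be 1
        # Publish the decision before synchronizing, so that threads
        # reading the flag later skip the counter check.
        shared["flag"] = NEED_REMAP
        return True
    return False
```

The flag lets a thread that arrives after the counter has been cleared still learn that a remap was requested.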

The method according to claim 4, wherein the first thread processor is configured to run a first thread of the N threads to process the data to be processed satisfying the determination condition of the first branch statement, the second thread processor is configured to run a second thread of the N threads to process the data to be processed satisfying the determination condition of the second branch statement, the graphics processor is further provided with a one-dimensional array, a first variable and a second variable, wherein the length of the one-dimensional array is M x N, an initial value of the first variable is 0, an initial value of the second variable is M x N-1, and the first thread processor performing the thread data remapping comprises:

the first thread processor reads the value of the second variable, writes the thread identifier of the second thread into the one-dimensional array at the position whose subscript is the value of the second variable, subtracts one from the value of the second variable, and performs the thread synchronization;

the second thread processor reads the value of the first variable, writes the thread identifier of the second thread into the one-dimensional array at the position whose subscript is the value of the first variable, adds one to the value of the first variable, and performs the thread synchronization;

after the thread synchronization is finished, the first thread processor reads a numerical value at a position in the one-dimensional array, wherein the thread identifier of the first thread is used as a subscript, and the read numerical value is used as an updated thread identifier of the first thread generated by thread data remapping;

and after the thread synchronization is finished, the second thread processor reads the numerical value at the position of the one-dimensional array with the thread identifier of the second thread as the subscript, and uses the read numerical value as the updated thread identifier of the second thread generated by thread data remapping.
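The two-ended filling of the one-dimensional array can be sketched in Python as follows. This is a sequential simulation, not the claimed implementation: the function name `build_remap` is hypothetical, the sketch assumes that each thread records its own identifier (one reading of the steps above), and on a graphics processor the read-and-update of the first and second variables would need to be atomic, with a barrier before the array is read back.

```python
def build_remap(takes_first):
    """Build the remapping array for M x N simulated threads.

    takes_first[tid] is True when thread tid holds first-branch data.
    """
    total = len(takes_first)  # M x N
    remap = [None] * total    # the one-dimensional array
    head = 0                  # first variable, initial value 0
    tail = total - 1          # second variable, initial value M x N - 1
    for tid, first in enumerate(takes_first):
        if first:
            remap[tail] = tid  # first-branch threads fill from the end
            tail -= 1
        else:
            remap[head] = tid  # second-branch threads fill from the front
            head += 1
    # After the simulated synchronization, thread tid adopts remap[tid]
    # as its updated thread identifier.
    return remap


remap = build_remap([True, False, False, True, False, False])
```

In this example the updated identifiers at positions 0 to 3 all belong to second-branch threads and those at positions 4 and 5 to first-branch threads, so with thread bundles of two threads each bundle follows a single branch.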

The method of claim 5, wherein, after the first thread processor performs the thread synchronization and before it performs the thread data remapping, the method further comprises:

the first thread processor records the first data to be processed in an index table by taking the thread identification of the first thread as an index, wherein the thread identification of the first thread and the first data to be processed have a one-to-one correspondence relationship, and the index table records the one-to-one correspondence relationship between the thread identifications of the M x N threads and the data to be processed;

after the first thread processor performs the thread data remapping, the method further comprises:

the first thread processor reads third data to be processed corresponding to the updated thread identifier of the first thread in the index table by taking the updated thread identifier of the first thread generated after the thread data remapping is executed as an index;

the first thread processor executes the first branch statement when the third to-be-processed data satisfies a judgment condition of a first branch statement, and the first thread processor executes the second branch statement when the third to-be-processed data satisfies a judgment condition of a second branch statement.
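How the index table is used after remapping can be shown with a short Python sketch. The function name `run_after_remap`, the list-based index table, and the negative-value judgment condition are assumptions made for illustration; the remapping array is passed in as an argument so the example stands on its own.

```python
def run_after_remap(data, remap):
    """Simulate every thread fetching its remapped data and branching.

    data[tid]  : data originally acquired by thread tid
    remap[tid] : updated thread identifier of thread tid
    """
    # Each thread records its data under its own identifier.
    index_table = list(data)
    results = []
    for tid in range(len(data)):
        new_tid = remap[tid]
        value = index_table[new_tid]  # the "third data to be processed"
        if value < 0:                 # assumed first-branch condition
            results.append(("first", value))
        else:
            results.append(("second", value))
    return results


results = run_after_remap([-1, 10, 20, -2, 30, 40], [1, 2, 4, 5, 3, 0])
```

Neighbouring simulated threads now take the same branch, which is the effect the remapping is meant to achieve within a thread bundle.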

The method according to any one of claims 1 to 6, wherein the threshold value is 1.

The method according to any one of claims 1 to 6, wherein the threshold value is a positive integer of 2 or more and 5 or less.

The method of any of claims 1 to 8, wherein the probability of the first thread processor executing the first branch statement is less than the probability of the first thread processor executing the second branch statement.

A thread processing method applied to a graphics processor for processing M thread bundles, each thread bundle including N threads, the graphics processor further including at least one thread bundle processor, a first thread bundle processor of the at least one thread bundle processor including an integer multiple of N thread processors, the first thread bundle processor including a first thread processor running a loop statement for running one of the N threads in one loop to process data to be processed satisfying a judgment condition of a first branch statement or satisfying a judgment condition of a second branch statement, a counter being provided in the graphics processor, the method comprising:

the first thread processor acquires first to-be-processed data needing to be processed in a first loop, determines that the first to-be-processed data meets a first branch statement, and increases the value in the counter by one step;

the first thread processor determines the number of threads needing to run a first branch statement in the M x N threads according to the value of the counter;

the first thread processor executes thread synchronization and clears the value in the counter under the condition that the number is confirmed to be larger than the threshold value;

the first thread processor performs thread data remapping.

The method of claim 10, further comprising:

the first thread processor acquires second to-be-processed data needing to be processed in a second loop, determines that the second to-be-processed data satisfies the judgment condition of the second branch statement, and decreases the value in the counter by one step.
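Read literally, claims 10 and 11 have a thread raise the counter in a loop where its data meets the first branch and lower it in a loop where its data meets the second branch. The Python sketch below traces the counter for a single simulated thread across successive loops. The name `counter_trace`, the step length of 1, and the negative-value condition are assumptions, and the sketch leaves out the clearing of the counter that accompanies a remap in claim 10. The claims do not say whether the counter may fall below zero, so the sketch does not clamp it.

```python
def counter_trace(values, step=1):
    """Return the counter value after each loop of one simulated thread."""
    counter = 0
    trace = []
    for value in values:     # one entry per loop
        if value < 0:        # assumed first-branch condition
            counter += step  # first branch: increase by one step
        else:
            counter -= step  # second branch: decrease by one step
        trace.append(counter)
    return trace


trace = counter_trace([-5, 7, -1])
```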

The method of claim 10 or 11, wherein the graphics processor is further configured with a flag bit, the value of the flag bit being set to a first flag value indicating that no remapping is performed, the method comprising:

the first thread processor reads the flag bit before determining that the number is greater than a threshold; and

the first thread processor sets the first flag value to a second flag value after determining that the number is greater than a threshold and before performing thread synchronization, the second flag value indicating that remapping needs to be performed.

The method of claim 12, wherein the first thread bundle processor comprises a second thread processor configured to run one of the N threads to process data to be processed that satisfies the first branch statement or satisfies the second branch statement, the method further comprising:

the second thread processor reads the flag bit, and executes thread synchronization and thread data remapping when the value of the flag bit is confirmed to be the second flag value;

and when the second thread processor confirms that the value of the flag bit is the first flag value, determining the number of threads needing to run a first branch statement in the M x N threads according to the value of the counter, and executing thread synchronization and thread data remapping under the condition that the number is confirmed to be larger than a threshold value.

The method according to claim 13, wherein the first thread processor is configured to run a first thread of the N threads to process the data to be processed satisfying the determination condition of the first branch statement, the second thread processor is configured to run a second thread of the N threads to process the data to be processed satisfying the determination condition of the second branch statement, the graphics processor is further provided with a one-dimensional array, a first variable and a second variable, wherein the length of the one-dimensional array is M × N, an initial value of the first variable is 0, an initial value of the second variable is M × N-1, and the first thread processor performs the thread data remapping, comprising:

the first thread processor reads the value of the second variable, writes the thread identifier of the second thread into the one-dimensional array at the position whose subscript is the value of the second variable, subtracts one from the value of the second variable, and performs the thread synchronization;

the second thread processor reads the value of the first variable, writes the thread identifier of the second thread into the one-dimensional array at the position whose subscript is the value of the first variable, adds one to the value of the first variable, and performs the thread synchronization;

after the thread synchronization is finished, the first thread processor reads a numerical value at a position in the one-dimensional array, wherein the thread identifier of the first thread is used as a subscript, and the read numerical value is used as an updated thread identifier of the first thread generated by thread data remapping;

and after the thread synchronization is finished, the second thread processor reads the numerical value at the position of the one-dimensional array with the thread identifier of the second thread as the subscript, and uses the read numerical value as the updated thread identifier of the second thread generated by thread data remapping.

The method of claim 14, wherein, after the first thread processor performs the thread synchronization and before it performs the thread data remapping, the method further comprises:

the first thread processor records the first data to be processed and the first loop variable in an index table by taking the thread identification of the first thread as an index, wherein the thread identification of the first thread and the first data to be processed have a one-to-one correspondence relationship;

after the first thread processor performs the thread data remapping, the method further comprises:

the first thread processor reads third data to be processed corresponding to the updated thread identifier of the first thread in the index table by taking the updated thread identifier of the first thread generated after the thread data remapping is executed as an index;

the first thread processor executes the first branch statement when the third to-be-processed data satisfies a judgment condition of a first branch statement, and the first thread processor executes the second branch statement when the third to-be-processed data satisfies a judgment condition of a second branch statement.

The method according to claim 15, wherein a loop variable of each thread is further recorded in the graphics processor, the loop variable is used to indicate a sequence number of a loop in which a thread is currently located, a correspondence relationship between the loop variable of the first thread and a thread identifier of the first thread and data to be processed of the first thread in the loop indicated by the loop variable is recorded in the index table, and after the first thread processor performs the thread data remapping, the method further comprises:

the first thread processor reads a loop variable corresponding to the updated thread identifier of the first thread in the index table by using the updated thread identifier of the first thread generated after the thread data remapping is executed as an index;

after executing the first branch statement or the second branch statement, the first thread processor adds one to a loop variable corresponding to the updated thread identifier of the first thread to obtain an updated loop variable, ends the first thread when the updated loop variable does not meet a loop condition specified by the loop statement, and runs a second loop of the first thread when the updated loop variable meets the loop condition specified by the loop statement.
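The handling of the loop variable after remapping can be sketched in Python. The index table is modelled here as a dictionary from thread identifier to a `(data, loop_variable)` pair, and the loop condition is assumed to be `loop_variable < loop_limit`; both choices, and the name `resume_thread`, are illustrative assumptions rather than details taken from the source.

```python
def resume_thread(index_table, new_tid, loop_limit):
    """Run one loop for a remapped thread and decide whether it continues.

    Returns (branch_taken, updated_loop_variable, keep_running).
    """
    value, loop_var = index_table[new_tid]         # looked up by updated identifier
    branch = "first" if value < 0 else "second"    # assumed judgment condition
    loop_var += 1                                  # add one after executing the branch
    keep_running = loop_var < loop_limit           # assumed loop condition
    return branch, loop_var, keep_running


table = {0: (-3, 0), 1: (8, 4)}
outcome = resume_thread(table, 1, loop_limit=5)
```

Because the loop variable travels with the data in the index table, a thread that adopts another thread's data also adopts that thread's position in the loop.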

The method of any one of claims 10 to 16, wherein the threshold is 1.

The method according to any one of claims 10 to 16, wherein the threshold value is a positive integer greater than or equal to 2 and less than or equal to 5.

The method of any of claims 10 to 18, wherein the probability of the first thread processor executing the first branch statement is less than the probability of the first thread processor executing the second branch statement.

A graphics processor for processing M thread bundles, each thread bundle comprising N threads, the graphics processor further comprising at least one thread bundle processor, a first thread bundle processor of the at least one thread bundle processor comprising an integer multiple of N thread processors, the first thread bundle processor comprising a first thread processor to run one of the N threads to process data to be processed satisfying a judgment condition of a first branch statement or satisfying a judgment condition of a second branch statement, the graphics processor having a counter disposed therein, wherein,

the first thread processor is used for acquiring first to-be-processed data needing to be processed, determining that the first to-be-processed data meets a first branch statement, and increasing the value in the counter by one step;

the first thread processor is used for determining the number of threads needing to run a first branch statement in the M x N threads according to the value of the counter;

the first thread processor is used for executing thread synchronization and thread data remapping under the condition that the number is confirmed to be larger than a threshold value.

The graphics processor of claim 20, further provided with a flag bit, the value of the flag bit being set to a first flag value indicating that no remapping is performed, wherein,

the first thread processor is configured to read the flag bit before determining that the number is greater than a threshold; and

the first thread processor is configured to set the first flag value to a second flag value after determining that the number is greater than the threshold and before performing thread synchronization, the second flag value indicating that remapping needs to be performed.

The graphics processor of claim 21, wherein the first thread processor is further configured to clear the value in the counter after performing the thread synchronization and before performing the thread data remapping.

The graphics processor of claim 21 or 22, wherein the first thread bundle processor comprises a second thread processor to run one of the N threads to process data to be processed satisfying a first branch statement or satisfying a second branch statement,

the second thread processor is used for reading the flag bit and executing thread synchronization and thread data remapping when the value of the flag bit is confirmed to be the second flag value;

and the second thread processor is used for determining the number of threads needing to run a first branch statement in the M x N threads according to the value of the counter when the value of the flag bit is confirmed to be the first flag value, and executing thread synchronization and thread data remapping under the condition that the number is confirmed to be larger than a threshold value.

The graphics processor of claim 23, wherein the first thread processor is configured to run a first thread of the N threads to process data to be processed satisfying a determination condition of a first branch statement, the second thread processor is configured to run a second thread of the N threads to process data to be processed satisfying a determination condition of a second branch statement, the graphics processor is further provided with a one-dimensional array, a first variable, and a second variable, wherein the length of the one-dimensional array is M x N, an initial value of the first variable is 0, an initial value of the second variable is M x N-1, the first thread processor performs the thread data remapping,

the first thread processor is used for reading the value of the second variable, writing the thread identifier of the second thread into the one-dimensional array at the position whose subscript is the value of the second variable, subtracting one from the value of the second variable, and performing the thread synchronization;

the second thread processor is used for reading the value of the first variable, writing the thread identifier of the second thread into the one-dimensional array at the position whose subscript is the value of the first variable, adding one to the value of the first variable, and performing the thread synchronization;

the first thread processor is configured to read a numerical value at a position in the one-dimensional array where the thread identifier of the first thread is used as a subscript after the thread synchronization is finished, and use the read numerical value as an updated thread identifier of the first thread generated by the thread data remapping;

and the second thread processor is used for reading a numerical value at a position in the one-dimensional array, which takes the thread identifier of the second thread as a subscript, after the thread synchronization is finished, and taking the read numerical value as an updated thread identifier of the second thread generated by thread data remapping.

The graphics processor of claim 24, wherein:

the first thread processor is configured to, after performing the thread synchronization and before performing the thread data remapping, record the first to-be-processed data in an index table by using the thread identifier of the first thread as an index, where the thread identifier of the first thread and the first to-be-processed data have a one-to-one correspondence relationship, and the index table records the one-to-one correspondence relationship between the thread identifiers of the M × N threads and the to-be-processed data;

the first thread processor is configured to, after performing the thread data remapping, read third to-be-processed data corresponding to an updated thread identifier of the first thread in the index table by using the updated thread identifier of the first thread generated after performing the thread data remapping as an index;

the first thread processor is configured to execute the first branch statement when the third to-be-processed data satisfies a determination condition of a first branch statement, and execute the second branch statement when the third to-be-processed data satisfies a determination condition of a second branch statement.

The graphics processor of any one of claims 20 to 25, wherein the threshold is 1.

The graphics processor according to any one of claims 20 to 25, wherein the threshold value is a positive integer of 2 or more and 5 or less.

The graphics processor of any of claims 20 to 27, wherein the probability of the first thread processor executing the first branch statement is less than the probability of the first thread processor executing the second branch statement.

A graphics processor for processing M thread bundles, each thread bundle comprising N threads, the graphics processor further comprising at least one thread bundle processor, a first thread bundle processor of the at least one thread bundle processor comprising an integer multiple of N thread processors, the first thread bundle processor comprising a first thread processor running a loop statement for running one of the N threads in one loop to process data to be processed satisfying a judgment condition of a first branch statement or satisfying a judgment condition of a second branch statement, the graphics processor having a counter disposed therein, wherein,

the first thread processor is used for acquiring first to-be-processed data needing to be processed in a first loop, determining that the first to-be-processed data meets a first branch statement, and increasing the value in the counter by one step;

the first thread processor is used for determining the number of threads needing to run a first branch statement in the M x N threads according to the value of the counter;

the first thread processor is used for executing thread synchronization and clearing the numerical value in the counter under the condition that the number is confirmed to be larger than a threshold value;

the first thread processor is configured to perform thread data remapping.

The graphics processor of claim 29,

the first thread processor is configured to acquire second to-be-processed data that needs to be processed in a second loop of the first thread, determine that the second to-be-processed data satisfies a determination condition of the second branch statement, and decrease the value in the counter by one step.

The graphics processor of claim 29 or 30, further provided with a flag bit, the value of the flag bit being set to a first flag value indicating that no remapping is performed,

the first thread processor is configured to read the flag bit before determining that the number is greater than a threshold;

the first thread processor is configured to set the first flag value to a second flag value after determining that the number is greater than the threshold and before performing thread synchronization, the second flag value indicating that remapping needs to be performed.

The graphics processor of claim 31, wherein the first thread bundle processor comprises a second thread processor to run one of the N threads to process data to be processed satisfying a first branch statement or satisfying a second branch statement,

the second thread processor is used for reading the flag bit and executing thread synchronization and thread data remapping when the value of the flag bit is confirmed to be the second flag value;

and the second thread processor is used for determining the number of threads needing to run a first branch statement in the M x N threads according to the value of the counter when the value of the flag bit is confirmed to be the first flag value, and executing thread synchronization and thread data remapping under the condition that the number is confirmed to be larger than a threshold value.

The graphics processor of claim 32, wherein the first thread processor is configured to run a first thread of the N threads to process data to be processed satisfying a determination condition of a first branch statement, the second thread processor is configured to run a second thread of the N threads to process data to be processed satisfying a determination condition of a second branch statement, the graphics processor is further provided with a one-dimensional array, a first variable, and a second variable, wherein the length of the one-dimensional array is M x N, an initial value of the first variable is 0, an initial value of the second variable is M x N-1, the first thread processor performs the thread data remapping,

the first thread processor is used for reading the value of the second variable, writing the thread identifier of the second thread into the one-dimensional array at the position whose subscript is the value of the second variable, subtracting one from the value of the second variable, and performing the thread synchronization;

the second thread processor is used for reading the value of the first variable, writing the thread identifier of the second thread into the one-dimensional array at the position whose subscript is the value of the first variable, adding one to the value of the first variable, and performing the thread synchronization;

the first thread processor is configured to read a numerical value at a position in the one-dimensional array where the thread identifier of the first thread is used as a subscript after the thread synchronization is finished, and use the read numerical value as an updated thread identifier of the first thread generated by the thread data remapping;

and the second thread processor is used for reading a numerical value at a position in the one-dimensional array, which takes the thread identifier of the second thread as a subscript, after the thread synchronization is finished, and taking the read numerical value as an updated thread identifier of the second thread generated by thread data remapping.

The graphics processor of claim 33, wherein the first thread processor runs a first thread,

the first thread processor is configured to record the first to-be-processed data in an index table by using a thread identifier of the first thread as an index after the thread synchronization is performed and before the thread data remapping is performed, where the thread identifier of the first thread and the first to-be-processed data have a one-to-one correspondence relationship;

the first thread processor is configured to, after performing the thread data remapping, read third to-be-processed data corresponding to an updated thread identifier of the first thread in the index table by using the updated thread identifier of the first thread generated after performing the thread data remapping as an index;

the first thread processor is configured to execute the first branch statement when the third to-be-processed data satisfies a determination condition of a first branch statement, and the first thread processor executes the second branch statement when the third to-be-processed data satisfies a determination condition of a second branch statement.

The graphics processor according to claim 34, wherein a loop variable of each thread is further recorded in the graphics processor, the loop variable is used to indicate a sequence number of the loop in which the thread is currently located, and the index table records a correspondence relationship among the loop variable of the first thread, the thread identifier of the first thread, and the data to be processed by the first thread in the loop indicated by the loop variable,

the first thread processor is configured to, after performing the thread data remapping, read from the index table the loop variable corresponding to the updated thread identifier of the first thread, using as an index the updated thread identifier of the first thread generated by the thread data remapping;

the first thread processor is configured to add one to a loop variable corresponding to an updated thread identifier of the first thread to obtain an updated loop variable after the first branch statement or the second branch statement is executed, end the first thread when the updated loop variable does not meet a loop condition specified by the loop statement, and run a second loop of the first thread when the updated loop variable meets the loop condition specified by the loop statement.

The graphics processor of any one of claims 29 to 35, wherein the threshold is 1.

The graphics processor according to any one of claims 29 to 35, wherein the threshold value is a positive integer greater than or equal to 2 and less than or equal to 5.

The graphics processor of any one of claims 29 to 37, wherein the probability of the first thread processor executing the first branch statement is less than the probability of the first thread processor executing the second branch statement.
