News text classification system

Document No.: 168645  Publication date: 2021-10-29

Abstract: This technology, a news text classification system, was designed by 高琦, 范德儒 and 刘畅 on 2021-08-10. The invention discloses a news text classification system comprising a front-end host, a data processing chip, a classification processing chip, an algorithm control box, a single text processing control box and a multi-text processing control box. A data interface and a first connecting cable are arranged between the data processing chip and the front-end host and between the data processing chip and the algorithm control box. A fourth connecting cable is arranged between the classification processing chip and the front-end host and between the classification processing chip and the algorithm control box. A second connecting cable is arranged between the front-end host and the single text processing control box, and a third connecting cable is arranged between the front-end host and the multi-text processing control box. In this system, prediction for a single news item takes no more than 0.2 s, GPU acceleration is used where the equipment allows, the training data exceed 1,000,000 entries with the classes evenly balanced so that results are more reliable, the front-end interface is simple and clear, batch prediction uses multiple threads, and the accuracy is high, exceeding the average human accuracy.

1. A news text classification system, comprising a front-end host (1), a data processing chip (6), a classification processing chip (13), an algorithm control box (7), a single text processing control box (10) and a multi-text processing control box (11), characterized in that: a data interface (4) and a first connecting cable (5) are arranged between the data processing chip (6) and the front-end host (1) and between the data processing chip (6) and the algorithm control box (7); a fourth connecting cable (14) is arranged between the classification processing chip (13) and the front-end host (1) and between the classification processing chip (13) and the algorithm control box (7); a second connecting cable (9) is arranged between the front-end host (1) and the single text processing control box (10); and a third connecting cable (12) is arranged between the front-end host (1) and the multi-text processing control box (11).

2. The news text classification system of claim 1, wherein: the front end of the front end host (1) is provided with a control module (2) and a display panel (3), and the front end of the algorithm control box (7) is provided with a display interface (8) and a controller (15).

3. The news text classification system of claim 1, wherein: a news text classification module is arranged inside the front-end host (1), the news text classification module is connected with a model selection module, the model selection module is connected with a batch prediction module and a single sentence prediction module, and the batch prediction module and the single sentence prediction module are both connected with a prediction output module.

4. The news text classification system of claim 1, wherein: the output end of the news text classification module is connected with the input end of the model selection module, the output end of the model selection module is connected with the input ends of the batch prediction module and the single sentence prediction module, and the output ends of the batch prediction module and the single sentence prediction module are connected with the input end of the prediction output module.

5. The news text classification system of claim 1, wherein: a QT front end module, a news data processing module, a classification result processing module, an algorithm module, a single news text module and a plurality of news file modules are arranged between the front end host (1), the data processing chip (6), the classification processing chip (13), the algorithm control box (7), the single text processing control box (10) and the multi-text processing control box (11).

6. The news text classification system of claim 1, wherein: the output end of the QT front-end module is connected with the input end of the algorithm module through the news data processing module, and the input end of the QT front-end module is connected with the output end of the algorithm module through the classification result processing module.

Technical Field

The invention relates to the field of news text classification, in particular to a news text classification system.

Background

A news text classification system is a supporting device for classifying news text data. News is an important way for people to acquire information and follow current affairs. As the news industry has become digital, text data on network platforms, such as news reports, news comments and netizen posts, have grown rapidly. Classifying this text correctly allows the information to be better organized and used. Automatic news text classification frees people from tedious manual classification and makes the task more efficient and accurate. It helps users retrieve information more efficiently, improves their reading experience, and lays a foundation for further data mining and analysis. As science and technology continue to develop, the requirements placed on the manufacturing process of news text classification systems keep rising.

Existing news text classification systems have certain disadvantages in use. First, they cannot preprocess news data resources well, so classifying the text is troublesome and inconvenient for users. In addition, few texts can be processed at one time, and both efficiency and accuracy are low. In the traditional classification mode, news content is usually checked manually and assigned to a suitable category. This consumes a large amount of human resources and is inefficient. Given large and continuously growing volumes of text, classifying massive amounts of text information manually is unrealistic, which adversely affects users. A news text classification system is therefore provided.

Disclosure of Invention

(I) Technical problem to be solved

In view of the defects of the prior art, the invention provides a news text classification system in which prediction for a single news item takes no more than 0.2 s, GPU acceleration is used where the equipment allows, the training data exceed 1,000,000 entries with the classes evenly balanced so that results are more reliable, the front-end interface is simple and clear, and batch prediction uses multiple threads. Accuracy on the test data is higher than 94%, which is above the average human accuracy, so the problems described in the background can be effectively solved.
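The accuracy and class-balance figures stated above are the kind of quantities that can be checked with a few lines of code. The following is a minimal sketch, not the implementation described in this document: the function names `accuracy` and `class_balance` are hypothetical, and the toy labels are for illustration only and are not results from the invention.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    if not y_true:
        raise ValueError("label lists must not be empty")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def class_balance(labels):
    """Ratio of the smallest class count to the largest (1.0 means perfectly uniform)."""
    counts = Counter(labels)
    return min(counts.values()) / max(counts.values())


# Toy data for illustration only; not results from the document.
y_true = ["finance", "sports", "games", "sports", "finance", "games"]
y_pred = ["finance", "sports", "games", "games", "finance", "games"]
```

A balance ratio close to 1.0 would correspond to the "evenly balanced" training data described above, and the accuracy value could be compared against the stated 94% threshold on a held-out test set.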

(II) Technical scheme

To achieve this purpose, the invention adopts the following technical scheme: a news text classification system comprising a front-end host, a data processing chip, a classification processing chip, an algorithm control box, a single text processing control box and a multi-text processing control box. A data interface and a first connecting cable are arranged between the data processing chip and the front-end host and between the data processing chip and the algorithm control box. A fourth connecting cable is arranged between the classification processing chip and the front-end host and between the classification processing chip and the algorithm control box. A second connecting cable is arranged between the front-end host and the single text processing control box, and a third connecting cable is arranged between the front-end host and the multi-text processing control box.

Preferably, the front end of the front-end host is provided with a control module and a display panel, and the front end of the algorithm control box is provided with a display interface and a controller.

Preferably, a news text classification module is arranged inside the front-end host, the news text classification module is connected with a model selection module, the model selection module is connected with a batch prediction module and a single sentence prediction module, and the batch prediction module and the single sentence prediction module are both connected with a prediction output module.

Preferably, the output end of the news text classification module is connected with the input end of the model selection module, the output end of the model selection module is connected with the input ends of the batch prediction module and the single sentence prediction module, and the output ends of the batch prediction module and the single sentence prediction module are connected with the input end of the prediction output module.

Preferably, a QT front-end module, a news data processing module, a classification result processing module, an algorithm module, a single news text module and a plurality of news file modules are arranged between the front-end host, the data processing chip, the classification processing chip, the algorithm control box, the single text processing control box and the multi-text processing control box.

Preferably, the output end of the QT front-end module is connected with the input end of the algorithm module through a news data processing module, and the input end of the QT front-end module is connected with the output end of the algorithm module through a classification result processing module.

(III) Advantageous effects

Compared with the prior art, the invention provides a news text classification system with the following beneficial effects. Prediction for a single news item takes no more than 0.2 s, GPU acceleration is used where the equipment allows, the training data exceed 1,000,000 entries with the classes evenly balanced so that results are more reliable, the front-end interface is simple and clear, and batch prediction uses multiple threads. Accuracy on the test data is higher than 94%, which is above the average human accuracy.

Text processing is carried out between the front-end host and the algorithm control box, while data processing and classification result processing are carried out at the data processing chip and the classification processing chip. The front-end host is connected with the single text processing control box and the multi-text processing control box, so one or more texts can be predicted. Automatic news text classification frees people from tedious manual classification and makes the task more efficient. It helps users retrieve information more efficiently, improves their reading experience, and helps website operators understand user needs so that information is used more effectively. Useful information can also be analyzed and mined on the basis of the classification, which lays a foundation for further data mining and analysis.

The system outputs the category of a news item from its title and body text. There are ten categories, including finance, real estate, education, science and technology, military, automobile, sports, games and entertainment. A single news item can be entered directly, and local csv/xlsx files can be uploaded so that news items are entered in batches and their categories are output. Single sentence prediction classifies a single text input, and batch prediction classifies uploaded news data files. The user either enters the title and body text or specifies the path to a csv/xlsx file or folder; the system returns the category of each news item and provides a table of batch recognition results. The method is accurate, fast, simple to operate and extensible.

The technical architecture consists of two parts: the interface design and the core algorithm. The front end provides the visual display of the system and the interface used by the user, and is built with the PyQt5 framework. The core algorithm classifies uploaded xlsx files and directly entered text, and the model is trained from a BERT pre-trained model. The whole news text classification system is simple in structure and convenient to operate, and works better than the traditional approach.
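The per-item latency limit and the optional GPU acceleration described above can be illustrated as follows. This is a minimal sketch under stated assumptions, not the system's actual code: `pick_device`, `timed_predict` and `placeholder_predict` are hypothetical names, PyTorch is assumed as the optional GPU backend (the document names BERT but not a specific library), and the placeholder classifier merely stands in for a trained model.

```python
import time

LATENCY_BUDGET_S = 0.2  # per-item limit stated in the text


def pick_device():
    """Return "cuda" when PyTorch is installed and a GPU is visible, else "cpu"."""
    try:
        import torch  # optional dependency; an assumption, not confirmed by the source
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def timed_predict(predict, title, body):
    """Run one prediction and report whether it stayed within the latency budget."""
    start = time.perf_counter()
    label = predict(title, body)
    elapsed = time.perf_counter() - start
    return label, elapsed, elapsed <= LATENCY_BUDGET_S


def placeholder_predict(title, body):
    """Hypothetical stand-in for the trained BERT classifier."""
    text = f"{title} {body}".lower()
    return "sports" if "match" in text else "finance"
```

Calling `timed_predict(placeholder_predict, title, body)` returns the label, the elapsed time in seconds and a flag indicating whether the 0.2 s budget was met; a real model would replace the placeholder and run on the device returned by `pick_device()`.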

Drawings

Fig. 1 is a schematic diagram of the overall structure of the news text classification system of the present invention.

Fig. 2 is a schematic structural diagram of a front-end host in the news text classification system of the present invention.

Fig. 3 is a schematic structural diagram of a news text classification module in the news text classification system of the present invention.

Fig. 4 is a schematic structural diagram of a classification processing module in the news text classification system of the present invention.

In the figures: 1. a front-end host; 2. a control module; 3. a display panel; 4. a data interface; 5. a first connecting cable; 6. a data processing chip; 7. an algorithm control box; 8. a display interface; 9. a second connecting cable; 10. a single text processing control box; 11. a multi-text processing control box; 12. a third connecting cable; 13. a classification processing chip; 14. a fourth connecting cable; 15. a controller.

Detailed Description

The technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings and the detailed description, but those skilled in the art will understand that the following described embodiments are some, not all, of the embodiments of the present invention, and are only used for illustrating the present invention, and should not be construed as limiting the scope of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention. The examples, in which specific conditions are not specified, were conducted under conventional conditions or conditions recommended by the manufacturer. The reagents or instruments used are not indicated by the manufacturer, and are all conventional products available commercially.

In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "linked," and "connected" are to be construed broadly. For example, a connection may be fixed, removable or integral; it may be mechanical or electrical; and it may be direct, indirect through an intervening medium, or an internal communication between two elements. Those skilled in the art can understand the specific meanings of the above terms in the present invention according to the specific case.

The first embodiment is as follows:

As shown in figs. 1-4, the news text classification system includes a front-end host 1, a data processing chip 6, a classification processing chip 13, an algorithm control box 7, a single text processing control box 10, and a multi-text processing control box 11. A data interface 4 and a first connecting cable 5 are provided between the data processing chip 6 and the front-end host 1 and between the data processing chip 6 and the algorithm control box 7. A fourth connecting cable 14 is provided between the classification processing chip 13 and the front-end host 1 and between the classification processing chip 13 and the algorithm control box 7. A second connecting cable 9 is provided between the front-end host 1 and the single text processing control box 10, and a third connecting cable 12 is provided between the front-end host 1 and the multi-text processing control box 11.

Further, the front end of the front end host 1 is provided with a control module 2 and a display panel 3, and the front end of the algorithm control box 7 is provided with a display interface 8 and a controller 15.

Further, a QT front-end module, a news data processing module, a classification result processing module, an algorithm module, a single news text module, and a plurality of news file modules are arranged between the front-end host 1, the data processing chip 6, the classification processing chip 13, the algorithm control box 7, the single text processing control box 10, and the multi-text processing control box 11.

Furthermore, the output end of the QT front-end module is connected with the input end of the algorithm module through the news data processing module, and the input end of the QT front-end module is connected with the output end of the algorithm module through the classification result processing module.
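The round trip described above (front end to news data processing, to the algorithm, to classification result processing, and back to the front end) can be sketched as a chain of functions. This is only an illustration under stated assumptions: all function names are hypothetical, the whitespace normalisation is an assumed preprocessing step, and the keyword scorer stands in for the real algorithm module.

```python
import re


def process_news_data(title, body):
    """News data processing step: join title and body and normalise whitespace."""
    text = f"{title} {body}"
    return re.sub(r"\s+", " ", text).strip()


def run_algorithm(text):
    """Hypothetical stand-in for the algorithm module; returns a score per category."""
    keywords = {"finance": ("stock", "bank"), "sports": ("match", "league")}
    lowered = text.lower()
    return {
        label: sum(lowered.count(word) for word in words)
        for label, words in keywords.items()
    }


def process_classification_result(scores):
    """Classification result processing step: turn raw scores into a display string."""
    best = max(scores, key=scores.get)
    return f"Category: {best}"


def front_end_request(title, body):
    """Front end -> data processing -> algorithm -> result processing -> front end."""
    cleaned = process_news_data(title, body)
    scores = run_algorithm(cleaned)
    return process_classification_result(scores)
```

Keeping the data processing and result processing steps separate from the algorithm mirrors the module layout in the text: the front end only ever sees cleaned input going out and a formatted result coming back.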

The second embodiment is as follows:

On the basis of the first embodiment, as shown in figs. 1-4, the news text classification system includes a front-end host 1, a data processing chip 6, a classification processing chip 13, an algorithm control box 7, a single text processing control box 10, and a multi-text processing control box 11. A data interface 4 and a first connecting cable 5 are provided between the data processing chip 6 and the front-end host 1 and between the data processing chip 6 and the algorithm control box 7. A fourth connecting cable 14 is provided between the classification processing chip 13 and the front-end host 1 and between the classification processing chip 13 and the algorithm control box 7. A second connecting cable 9 is provided between the front-end host 1 and the single text processing control box 10, and a third connecting cable 12 is provided between the front-end host 1 and the multi-text processing control box 11.

Furthermore, a news text classification module is arranged inside the front-end host 1, the news text classification module is connected with a model selection module, the model selection module is connected with a batch prediction module and a single sentence prediction module, and the batch prediction module and the single sentence prediction module are both connected with a prediction output module.

Furthermore, the output end of the news text classification module is connected with the input end of the model selection module, the output end of the model selection module is connected with the input ends of the batch prediction module and the single sentence prediction module, and the output ends of the batch prediction module and the single sentence prediction module are connected with the input end of the prediction output module.
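The module chain described above (model selection feeding either single sentence prediction or batch prediction, both feeding prediction output) can be sketched as a small dispatcher. This is a hedged illustration, not the invention's code: the registry `MODELS`, the two keyword models and every function name are hypothetical, and real models would be trained classifiers.

```python
def model_a(text):
    """Hypothetical model: a keyword rule standing in for a trained classifier."""
    return "games" if "game" in text.lower() else "education"


def model_b(text):
    """Second hypothetical model, to show that selection changes the predictor."""
    return "automobile" if "car" in text.lower() else "entertainment"


MODELS = {"model_a": model_a, "model_b": model_b}


def select_model(name):
    """Model selection module: look up the predictor by name."""
    try:
        return MODELS[name]
    except KeyError:
        raise ValueError(f"unknown model: {name!r}") from None


def predict_single(model, text):
    """Single sentence prediction module."""
    return [model(text)]


def predict_batch(model, texts):
    """Batch prediction module."""
    return [model(t) for t in texts]


def prediction_output(texts, labels):
    """Prediction output module: pair every input with its label."""
    return list(zip(texts, labels))


def classify(model_name, payload):
    """Route a string to single prediction and a list of strings to batch prediction."""
    model = select_model(model_name)
    if isinstance(payload, str):
        return prediction_output([payload], predict_single(model, payload))
    texts = list(payload)
    return prediction_output(texts, predict_batch(model, texts))
```

Both prediction paths return the same shape of result, so the output module does not need to know which path produced it.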

The working principle is as follows: the invention comprises a front-end host 1, a control module 2, a display panel 3, a data interface 4, a first connecting cable 5, a data processing chip 6, an algorithm control box 7, a display interface 8, a second connecting cable 9, a single text processing control box 10, a multi-text processing control box 11, a third connecting cable 12, a classification processing chip 13, a fourth connecting cable 14 and a controller 15. In use, text processing is carried out between the front-end host 1 and the algorithm control box 7, while data processing and classification result processing are carried out at the data processing chip 6 and the classification processing chip 13. The front-end host 1 is connected with the single text processing control box 10 and the multi-text processing control box 11, so one or more texts can be predicted.

Automatic news text classification frees people from tedious manual classification and makes the task more efficient. It helps users retrieve information more efficiently, improves their reading experience, and helps website operators understand user needs so that information is used more effectively. Useful information can also be analyzed and mined on the basis of the classification, which lays a foundation for further data mining and analysis.

The system outputs the category of a news item from its title and body text. There are ten categories, including finance, real estate, education, science and technology, military, automobile, sports, games and entertainment. A single news item can be entered directly, and local csv/xlsx files can be uploaded so that news items are entered in batches and their categories are output. Single sentence prediction classifies a single text input, and batch prediction classifies uploaded news data files. The user either enters the title and body text or specifies the path to a csv/xlsx file or folder; the system returns the category of each news item and provides a table of batch recognition results. The method is accurate, fast, simple to operate and extensible.

The technical architecture consists of two parts: the interface design and the core algorithm. The front end provides the visual display of the system and the interface used by the user, and is built with the PyQt5 framework. The core algorithm classifies uploaded xlsx files and directly entered text, and the model is trained from a BERT pre-trained model.
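Batch prediction over an uploaded csv file with multiple threads, as described above, might look like the following. This is a minimal sketch under stated assumptions rather than the actual implementation: the column names `title` and `body`, the function names and the keyword classifier are all hypothetical, and xlsx input is omitted because it would need a third-party reader.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor


def placeholder_classify(title, body):
    """Hypothetical stand-in for the trained model."""
    text = f"{title} {body}".lower()
    if "army" in text:
        return "military"
    if "school" in text:
        return "education"
    return "entertainment"


def classify_csv(csv_file, classify=placeholder_classify, max_workers=4):
    """Read rows with 'title' and 'body' columns and classify them concurrently.

    Returns a list of (title, category) rows in input order.
    """
    rows = list(csv.DictReader(csv_file))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as the inputs.
        labels = list(pool.map(lambda r: classify(r["title"], r["body"]), rows))
    return [(row["title"], label) for row, label in zip(rows, labels)]


# Toy input for illustration only.
sample = io.StringIO(
    "title,body\n"
    "Army drill,Troops trained at dawn\n"
    "New school opens,Pupils start on Monday\n"
)
results = classify_csv(sample)
```

The returned rows correspond to the batch recognition result table mentioned in the text. In Python, threads mainly help when each prediction spends its time outside the interpreter, for example waiting on file I/O or on a GPU call; whether that holds for the real model is not stated in the source.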

It is noted that, herein, relational terms such as "first" and "second" may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The foregoing shows and describes the general principles and broad features of the present invention and advantages thereof. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are described in the specification and illustrated only to illustrate the principle of the present invention, but that various changes and modifications may be made therein without departing from the spirit and scope of the present invention, which fall within the scope of the invention as claimed.
