Text processing method and device

Document No.: 1170287  Publication date: 2020-09-18

Reading note: This technology, "Text processing method and device", was designed and created by Wang Hanqi (王翰琦) on 2020-06-08. Its main content is as follows. The application discloses a text processing method and device and relates to the technical field of text data processing. A specific implementation includes: segmenting a text into a plurality of words, and classifying the words according to preset categories; determining the arrangement order of the words based on the category of each word, and generating a word sequence conforming to that order; generating a target directed acyclic graph based on the arrangement order of the words in the word sequence; and determining, in a metadata set, the metadata corresponding to each node of the target directed acyclic graph and the arrangement order of the determined metadata. By classifying the words of a text, the application generates a target directed acyclic graph that reflects word categories, so that the corresponding metadata and their arrangement order can be accurately determined in the metadata set and the text can be described efficiently and accurately. The application is applicable to any semantic-to-tag conversion task in a big-data-based tag computing system.

1. A method of processing text, the method comprising:

segmenting a text into a plurality of words, and classifying the plurality of words according to preset categories;

determining an arrangement order of the plurality of words based on the category of each word, and generating a word sequence conforming to the arrangement order;

generating a target directed acyclic graph based on the arrangement order of the words in the word sequence, wherein the primary key of each node in the target directed acyclic graph comprises at least one word of the same category in the word sequence;

determining, in a metadata set, metadata corresponding to each node of the target directed acyclic graph and an arrangement order of the determined metadata, wherein the metadata set comprises the metadata corresponding to the nodes, and the arrangement order of the determined metadata is the arrangement order of the nodes to which the determined metadata correspond.

2. The method of claim 1, wherein generating the target directed acyclic graph based on the arrangement order of the words in the word sequence comprises:

taking at least one word of the same category in the word sequence as the primary key of a single node, and determining, according to the arrangement order of the words in the word sequence, a directed acyclic graph corresponding to the word sequence as an initial directed acyclic graph;

and performing topological sorting on the nodes in the initial directed acyclic graph to obtain the target directed acyclic graph.

3. The method according to claim 1 or 2, wherein the nodes in the target directed acyclic graph and the nodes corresponding to the metadata in the metadata set each have identification information;

determining, in the metadata set, the metadata corresponding to each node of the target directed acyclic graph and the arrangement order of the determined metadata comprises:

taking each node of the target directed acyclic graph as a target node, and, for each target node, determining the metadata set corresponding to the category of the words contained in the primary key of the target node, and combining the determined metadata sets into a target metadata set;

wherein the determined metadata is the metadata in the target metadata set whose corresponding node has identification information consistent with the identification information of the target node.

4. The method of claim 1 or 2, wherein determining the arrangement order of the plurality of words based on the category of each word, and generating the word sequence conforming to the arrangement order, comprises:

generating an initial word sequence containing each of the plurality of words, in the order in which the words appear in the text;

and adjusting the arrangement order of the words in the initial word sequence based on the sorting priority of the words of each category, and generating the word sequence conforming to the adjusted arrangement order.

5. The method of claim 1, wherein segmenting the text to obtain the plurality of words comprises:

segmenting the text into at least two words using a full segmentation method;

and taking each unit word and each unit-word phrase among the at least two words as the plurality of words.

6. The method of claim 1, wherein the metadata has a preset category, and the method further comprises:

persistently storing the determined metadata, the determined arrangement order of the metadata, and the category of the determined metadata.

7. An apparatus for processing text, the apparatus comprising:

a word segmentation unit configured to segment a text into a plurality of words and to classify the plurality of words according to preset categories;

a category determination unit configured to determine an arrangement order of the plurality of words based on the category of each word, and to generate a word sequence conforming to the arrangement order;

a generating unit configured to generate a target directed acyclic graph based on the arrangement order of the words in the word sequence, wherein the primary key of each node in the target directed acyclic graph includes at least one word of the same category in the word sequence;

a metadata determining unit configured to determine, in a metadata set, metadata corresponding to each node of the target directed acyclic graph and an arrangement order of the determined metadata, wherein the metadata set includes the metadata corresponding to the nodes, and the arrangement order of the determined metadata is the arrangement order of the nodes to which the determined metadata correspond.

8. The apparatus of claim 7, wherein the generating unit is further configured to generate the target directed acyclic graph based on the arrangement order of the words in the word sequence as follows:

taking at least one word of the same category in the word sequence as the primary key of a single node, and determining, according to the arrangement order of the words in the word sequence, a directed acyclic graph corresponding to the word sequence as an initial directed acyclic graph;

and performing topological sorting on the nodes in the initial directed acyclic graph to obtain the target directed acyclic graph.

9. The apparatus according to claim 7 or 8, wherein the nodes in the target directed acyclic graph and the nodes corresponding to the metadata in the metadata set each have identification information;

the metadata determining unit is further configured to determine, in the metadata set, the metadata corresponding to each node of the target directed acyclic graph and the arrangement order of the determined metadata as follows:

taking each node of the target directed acyclic graph as a target node, and, for each target node, determining the metadata set corresponding to the category of the words contained in the primary key of the target node, and combining the determined metadata sets into a target metadata set;

wherein the determined metadata is the metadata in the target metadata set whose corresponding node has identification information consistent with the identification information of the target node.

10. The apparatus according to claim 7 or 8, wherein the category determination unit is further configured to determine the arrangement order of the plurality of words based on the category of each word, and to generate the word sequence conforming to the arrangement order, as follows:

generating an initial word sequence containing each of the plurality of words, in the order in which the words appear in the text;

and adjusting the arrangement order of the words in the initial word sequence based on the sorting priority of the words of each category, and generating the word sequence conforming to the adjusted arrangement order.

11. The apparatus of claim 7, wherein the word segmentation unit is further configured to segment the text into the plurality of words as follows:

segmenting the text into at least two words using a full segmentation method;

and taking each unit word and each unit-word phrase among the at least two words as the plurality of words.

12. The apparatus of claim 7, wherein the metadata has a preset category, and the apparatus further comprises:

a persistence unit configured to persistently store the determined metadata, the determined arrangement order of the metadata, and the category of the determined metadata.

13. An electronic device, comprising:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-6.

14. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method according to any one of claims 1-6.

Technical Field

The embodiments of the present application relate to the field of computer technology, in particular to the field of text data processing, and specifically to a text processing method and device.

Background

Text processing technology has very diverse application scenarios, for example, intelligent voice interaction. At present, text is mainly processed with Natural Language Processing (NLP) technology.

In many cases, when the volume of text to be processed is large, processing it with natural language processing techniques is likely to consume a large amount of computing time, and often relies on third-party platforms to perform the natural language processing, which further increases time and monetary costs.

Disclosure of Invention

A text processing method, a text processing device, an electronic device and a storage medium are provided.

According to a first aspect, a text processing method is provided, including: segmenting a text into a plurality of words, and classifying the plurality of words according to preset categories; determining an arrangement order of the plurality of words based on the category of each word, and generating a word sequence conforming to the arrangement order; generating a target directed acyclic graph based on the arrangement order of the words in the word sequence, wherein the primary key of each node in the target directed acyclic graph includes at least one word of the same category in the word sequence; and determining, in a metadata set, metadata corresponding to each node of the target directed acyclic graph and an arrangement order of the determined metadata, wherein the metadata set includes the metadata corresponding to the nodes, and the arrangement order of the determined metadata is the arrangement order of the nodes to which the determined metadata correspond.

According to a second aspect, a text processing apparatus is provided, including: a word segmentation unit configured to segment a text into a plurality of words and to classify the plurality of words according to preset categories; a category determination unit configured to determine an arrangement order of the plurality of words based on the category of each word, and to generate a word sequence conforming to the arrangement order; a generating unit configured to generate a target directed acyclic graph based on the arrangement order of the words in the word sequence, wherein the primary key of each node in the target directed acyclic graph includes at least one word of the same category in the word sequence; and a metadata determining unit configured to determine, in a metadata set, metadata corresponding to each node of the target directed acyclic graph and an arrangement order of the determined metadata, wherein the metadata set includes the metadata corresponding to the nodes, and the arrangement order of the determined metadata is the arrangement order of the nodes to which the determined metadata correspond.

According to a third aspect, an electronic device is provided, including: one or more processors; and a storage device storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of any embodiment of the text processing method.

According to a fourth aspect, a computer-readable storage medium is provided, having stored thereon a computer program which, when executed by a processor, implements the method of any embodiment of the text processing method.

According to the solution of the present application, classifying the words of a text makes it possible to generate a target directed acyclic graph that reflects word categories, so that the corresponding metadata and their arrangement order can be accurately determined in the metadata set, and the text can be described efficiently and accurately.

Drawings

Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:

FIG. 1 is an exemplary system architecture diagram to which some embodiments of the present application may be applied;

FIG. 2 is a flow diagram of one embodiment of a method of processing text according to the present application;

FIG. 3 is a schematic diagram of an application scenario of a method of processing text according to the present application;

FIG. 4 is a flow diagram of yet another embodiment of a method of processing text according to the present application;

FIG. 5 is a schematic structural diagram of one embodiment of a text processing apparatus according to the present application;

FIG. 6 is a block diagram of an electronic device for implementing a text processing method according to an embodiment of the present application.

Detailed Description

The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding; these details should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and constructions are omitted below for clarity and conciseness.

It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with each other. The present application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.

Fig. 1 shows an exemplary system architecture 100 to which embodiments of a method or apparatus for processing text of the present application may be applied.

As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 provides the medium for communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.

The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various communication client applications, such as video applications, live applications, instant messaging tools, mailbox clients, social platform software, and the like, may be installed on the terminal devices 101, 102, and 103.

Here, the terminal devices 101, 102, and 103 may be hardware or software. When they are hardware, they may be various electronic devices having a display screen, including but not limited to smartphones, tablet computers, e-book readers, laptop computers, and desktop computers. When they are software, they can be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. No specific limitation is imposed here.

The server 105 may be a server providing various services, such as a background server supporting the terminal devices 101, 102, 103. The background server may analyze and otherwise process received data such as text, and feed the processing result (e.g., the determined metadata and the determined arrangement order of the metadata) back to the terminal devices.

It should be noted that the text processing method provided by the embodiments of the present application may be executed by the server 105 or by the terminal devices 101, 102, and 103; accordingly, the text processing apparatus may be disposed in the server 105 or in the terminal devices 101, 102, and 103.

It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

With continued reference to FIG. 2, a flow 200 of one embodiment of a method of processing text according to the present application is shown. The text processing method comprises the following steps:

Step 201, segmenting the text to obtain a plurality of words, and classifying the plurality of words according to preset categories.

In this embodiment, the execution subject of the text processing method (for example, the server or a terminal device shown in FIG. 1) may segment the text to obtain a plurality of words. The execution subject may classify the plurality of words according to categories preset for words, so that each of the words is assigned to one of the preset categories. A word obtained by segmentation may be a single character, a word, or a whole sentence.

In practice, the plurality of words may be all of the words produced by segmenting the text, or only some of them.

Optionally, the preset categories may include at least two of the following four categories: the entity class, the operation class, the numeric class, and the merge relationship class. The category of a word may be the class to which the word itself belongs (i.e., the entity, operation, or numeric class), or, for an entity-class word, the merge relationship class that describes its relationship with other entity-class words.

Specifically, entity-class words are the entity words in the text. Operation-class words are the verbs in the text, such as "filter" and "classify". Numeric-class words indicate quantities, counts, frequencies, and the like. For example, the phrase "number of times greater than or equal to 7" contains the words "number of times", "greater than", "equal to", and "7", all of which are numeric-class words and may serve as the primary key of a node in the directed acyclic graph.
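As an illustration of the classification in step 201, the following minimal sketch assigns one preset category to each segmented word by lexicon lookup. The lexicon contents, the rule that bare integers are numeric, and the fallback to the entity class are all hypothetical assumptions, not rules stated by the application.

```python
from enum import Enum


class Category(Enum):
    """The four preset categories described above."""
    ENTITY = "entity"
    OPERATION = "operation"
    NUMERIC = "numeric"
    MERGE_RELATION = "merge_relation"  # assigned by a separate merge step, not here


# Hypothetical lexicons; a real system would load these from its own configuration.
OPERATION_WORDS = {"filter", "classify"}
NUMERIC_WORDS = {"number of times", "greater than", "equal to"}


def classify(word: str) -> Category:
    """Assign a preset category to one segmented word using simple lookup rules."""
    if word in OPERATION_WORDS:
        return Category.OPERATION
    # Count/comparison words and bare integers are treated as numeric-class words.
    if word in NUMERIC_WORDS or word.isdigit():
        return Category.NUMERIC
    # Assumption: every remaining word is treated as an entity word.
    return Category.ENTITY


words = ["user", "filter", "number of times", "greater than", "equal to", "7"]
labels = [(w, classify(w).value) for w in words]
```

A production system would more likely use a trained classifier or a curated dictionary. The lookup form is only meant to make the four-way split concrete.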

The merge relationship class describes the merge relationship between entity-class words. Words having the same field may be merged. Merging means treating at least two words as a whole, for example, combining them into a new word. Merging may follow a preset merging rule (for example, a preset merge relationship).
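One way to picture the merge relationship is sketched below. It assumes a hypothetical mapping from each entity word to the field it refers to, and a hypothetical merge rule that joins words of the same field with "+". The application itself only says that a preset merging rule is used.

```python
# Hypothetical mapping from entity words to the data field each one refers to.
FIELD_OF = {"user": "user", "user age": "user", "order": "order"}


def merge_by_field(entity_words):
    """Group entity words that share a field and merge each group into one word.

    The merge rule (joining with "+") is an illustrative assumption; the text
    only says that words with the same field may be merged by a preset rule.
    """
    groups = {}  # dicts keep insertion order, so fields stay in first-seen order
    for word in entity_words:
        field = FIELD_OF.get(word, word)  # unknown words form their own field
        groups.setdefault(field, []).append(word)
    return ["+".join(group) for group in groups.values()]


merged = merge_by_field(["user", "order", "user age"])
```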

Step 202, determining the arrangement order of each word based on the category of each word in the plurality of words, and generating a word sequence according with the arrangement order.

In this embodiment, the execution subject may determine the arrangement order of the plurality of words based on the category of each word, and generate a word sequence conforming to that order.

In practice, the execution subject may determine the arrangement order of the words from their categories in various ways. For example, each preset category may correspond to an ordering priority, where a higher priority places words earlier in the sequence; the execution subject may then generate the word sequence directly from the priority of each word's category. Words of the same category, and thus of the same ordering priority, may be arranged according to a preset rule, such as randomly or in the order in which they appear in the text.
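The priority-based ordering can be sketched with a stable sort, assuming hypothetical numeric priorities per category (smaller means earlier). Stability is what implements the "same category keeps text order" tie-breaking rule mentioned above.

```python
# Hypothetical ordering priorities: a smaller number means an earlier position.
PRIORITY = {"entity": 0, "operation": 1, "numeric": 2}


def order_by_category(classified):
    """Sort (word, category) pairs by category priority.

    Python's sort is stable, so words of the same category keep their
    original text order, which is one of the tie-breaking rules mentioned above.
    """
    return sorted(classified, key=lambda pair: PRIORITY[pair[1]])


sequence = order_by_category(
    [("filter", "operation"), ("user", "entity"), ("7", "numeric"), ("order", "entity")]
)
```

Random tie-breaking, the other rule mentioned, would instead shuffle each same-priority group before concatenating.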

Step 203, generating a target directed acyclic graph based on the arrangement order of the words in the word sequence, wherein the primary key of each node in the target directed acyclic graph includes at least one word of the same category in the word sequence.

In this embodiment, the execution subject may generate a directed acyclic graph, as the target directed acyclic graph, based on the arrangement order of the words in the word sequence. The target directed acyclic graph includes a plurality of nodes, and the primary key (key) of each node may include at least one word of the word sequence. If the primary key of a node includes more than one word, all words in that primary key belong to the same category (i.e., the same preset category).

In practice, the execution subject may generate the target directed acyclic graph from the arrangement order in various ways. For example, the execution subject may take words in the word sequence that contain the same field (i.e., a field at the same position in the word sequence) and belong to the same category as the primary key of one node. Then, according to the arrangement order of the words in the word sequence, it generates the primary key of each node in the target directed acyclic graph and the order between the nodes, that is, their directional relationships. In the word sequence, the words forming the primary key of a node are preceded by the words forming the primary key of its predecessor (i.e., the node that points to it).
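The graph construction can be sketched as follows, assuming (hypothetically) that consecutive words of one category in the word sequence form one node's primary key and that each node points to the next. Claims 2 and 8 also topologically sort the nodes of an initial graph; the sketch does that with Python's standard `graphlib` module (Python 3.9+). For the simple chain built here the sort order equals the sequence order, but the same call also works for richer edge sets.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from itertools import groupby


def build_dag(word_sequence):
    """Turn an ordered list of (word, category) pairs into DAG nodes and edges.

    Assumption: consecutive words of the same category form one node's
    primary key, and each node points to the node that follows it.
    """
    nodes = []
    for category, group in groupby(word_sequence, key=lambda pair: pair[1]):
        nodes.append({"key": tuple(word for word, _ in group), "category": category})
    edges = [(i, i + 1) for i in range(len(nodes) - 1)]
    return nodes, edges


def topological_order(nodes, edges):
    """Return node indices ordered so that every edge goes from earlier to later."""
    sorter = TopologicalSorter({i: set() for i in range(len(nodes))})
    for src, dst in edges:
        sorter.add(dst, src)  # dst has src as a predecessor
    return list(sorter.static_order())


nodes, edges = build_dag([
    ("user", "entity"), ("order", "entity"),
    ("filter", "operation"),
    ("number of times", "numeric"), ("7", "numeric"),
])
order = topological_order(nodes, edges)
```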

Step 204, determining, in the metadata set, the metadata corresponding to each node of the target directed acyclic graph and the arrangement order of the determined metadata, wherein the metadata set includes the metadata corresponding to the nodes, and the arrangement order of the determined metadata is the arrangement order of the nodes to which the determined metadata correspond.

In this embodiment, for a node (e.g., each node) in the target directed acyclic graph, the execution subject may determine the metadata corresponding to that node in a metadata set, and determine the arrangement order of the determined metadata. Metadata here is structured data that corresponds to a node derived from the text and reflects the characteristics and content of the words in the node's primary key; it is used to organize, describe, retrieve, store, and manage text resources.

The metadata set corresponds to the category of the words contained in the node's primary key and contains only metadata of that category. The categories of metadata are preset, and accordingly the metadata sets are preset as well. The execution subject or another electronic device may then perform more in-depth, category-based analysis of the resulting metadata. The execution subject may attach the same tag to the metadata determined for a text and to the arrangement order of that metadata. A metadata set contains multiple pieces of metadata, each corresponding to one node; that is, one node corresponds to one piece of metadata. The nodes corresponding to the metadata in a metadata set may include nodes of the target directed acyclic graph as well as other nodes.
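The lookup in step 204 can be sketched as below, under a hypothetical data layout: one preset metadata set per category, each keyed by primary key, so that one node maps to one piece of metadata. The shared `text_tag` mirrors the "same tag" described above, and `order` records the node position.

```python
def determine_metadata(ordered_nodes, metadata_sets, text_tag):
    """Look up one piece of metadata per node and keep the node order.

    ordered_nodes: list of (primary_key, category) in the graph's node order.
    metadata_sets: hypothetical {category: {primary_key: metadata_dict}} layout,
                   i.e. one preset metadata set per category.
    text_tag:      a tag shared by all metadata determined for the same text.
    """
    result = []
    for position, (key, category) in enumerate(ordered_nodes):
        # Only the metadata set of this node's category is searched.
        metadata = metadata_sets[category][key]
        result.append({**metadata, "tag": text_tag, "order": position})
    return result


metadata_sets = {
    "entity": {("user",): {"id": "m_user"}, ("order",): {"id": "m_order"}},
    "operation": {("filter",): {"id": "m_filter"}},
}
found = determine_metadata(
    [(("user",), "entity"), (("filter",), "operation")], metadata_sets, "text-1"
)
```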

In practice, the determined metadata may be stored in files of various formats, such as Extensible Markup Language (XML) files.
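Storing the result as XML could look like the following sketch, which uses the standard-library `xml.etree.ElementTree` module. The element and attribute names (`metadataList`, `metadata`, `order`, `category`) are hypothetical; the application does not define an XML schema.

```python
import xml.etree.ElementTree as ET


def metadata_to_xml(ordered_metadata):
    """Serialize metadata, in node order, to an XML string.

    The element and attribute names used here are illustrative assumptions.
    """
    root = ET.Element("metadataList")
    for position, item in enumerate(ordered_metadata):
        element = ET.SubElement(root, "metadata", order=str(position), category=item["category"])
        element.text = item["id"]
    return ET.tostring(root, encoding="unicode")


xml_text = metadata_to_xml([
    {"id": "m_user", "category": "entity"},
    {"id": "m_filter", "category": "operation"},
])
```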

Optionally, the execution subject or another electronic device may instantiate the nodes of the target directed acyclic graph and store the instantiation results.

Compared with prior-art natural language processing techniques, the method provided by this embodiment classifies the words of the text to generate a target directed acyclic graph that reflects word categories, and then accurately determines the corresponding metadata and their arrangement order in the metadata set, so that the text can be described more efficiently and accurately.

In some optional implementations of this embodiment, the nodes in the target directed acyclic graph and the nodes corresponding to the metadata in the metadata set each have identification information, and step 204 may include: taking each node of the target directed acyclic graph as a target node; for each target node, determining the metadata set corresponding to the category of the words contained in its primary key, and combining the determined metadata sets into a target metadata set. The determined metadata is the metadata in the target metadata set whose corresponding node has identification information consistent with that of the target node.

In these optional implementations, the execution subject may determine a metadata set for a node of the target directed acyclic graph (e.g., for each node), and then determine, in the target metadata set, the metadata corresponding to that node.

In practice, the execution subject or another electronic device may search the target metadata set for the metadata corresponding to the node, namely the metadata whose corresponding node has the same identification information as the node.

Specifically, the preset category corresponding to the determined metadata set is the category of the node. For each metadata set, the preset category of the metadata it contains is the preset category corresponding to that set. A metadata set may take various forms, such as a data table.

In practice, each node in the target directed acyclic graph has distinct identification information. The identification information may include the primary key and may also include other information, such as the category of the primary key. If the primary keys included in two pieces of identification information differ, the identification information necessarily differs; if the primary keys are the same, the identification information may be the same or different.
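A minimal sketch of matching by identification information follows. It assumes, hypothetically, that identification information is the pair (primary key, category). Under that assumption two nodes with the same primary key but different categories get different identifiers, which is consistent with the property stated above. The metadata sets for the target nodes' categories are combined into one target set before matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeId:
    """Identification information: the primary key plus the key's category."""
    primary_key: tuple
    category: str


def find_metadata(target_nodes, metadata_sets):
    """For each target node, return the metadata whose node has the same NodeId.

    metadata_sets: hypothetical {category: [(NodeId, metadata), ...]} layout.
    The sets for the target nodes' categories are combined into one target set.
    """
    target_set = {}
    for category in {node.category for node in target_nodes}:
        for node_id, metadata in metadata_sets.get(category, []):
            target_set[node_id] = metadata
    # None marks a target node with no matching metadata.
    return [target_set.get(node) for node in target_nodes]


metadata_sets = {
    "entity": [(NodeId(("user",), "entity"), "m_user")],
    "numeric": [(NodeId(("7",), "numeric"), "m_7")],
}
matches = find_metadata([NodeId(("user",), "entity"), NodeId(("7",), "numeric")], metadata_sets)
```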

These implementations can accurately find the metadata corresponding to each node in the metadata set corresponding to the node's category.

In some optional implementations of this embodiment, step 202 may include: generating an initial word sequence containing each of the plurality of words, in the order in which the words appear in the text; and adjusting the arrangement order of the words in the initial word sequence based on the ordering priority of the words of each category, and generating the word sequence conforming to the adjusted arrangement order.

In these optional implementations, the execution subject may first arrange the words in the order in which they appear in the text. Each category, i.e., each preset category of words, has a corresponding ordering priority; for example, entity words may have the highest priority. Words of the same category may keep, in the adjusted word sequence, the relative order they have in the text. The execution subject may also look for entity words that have a merge relationship, such as entity words having the same field. In response to determining that the initial word sequence includes at least two entity-class words that have a merge relationship, the execution subject may adjust these words so that they are adjacent in the arrangement order.

In practice, the execution subject may adjust the arrangement order of the words based on their ordering priority in various ways. For example, it may move the words with the highest ordering priority to the front of the word sequence and place the words with the second-highest priority immediately after them. Words of different categories may also share the same ordering priority, in which case their relative order may be arbitrary.
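The adjustment can be sketched as a single sort of the initial (text-order) sequence. The sketch assumes hypothetical category priorities and a hypothetical `merge_group` map that marks entity words having a merge relationship. Words in one merge group share the position of the group's first word as a sort anchor, which makes them adjacent; all other ties fall back to text order.

```python
# Hypothetical ordering priorities: a smaller number means an earlier position.
PRIORITY = {"entity": 0, "operation": 1, "numeric": 2}


def adjust_sequence(initial, merge_group):
    """Reorder an initial (word, category) sequence given in text order.

    merge_group: hypothetical {word: group_id} for entity words that have a
    merge relationship; words of one group end up adjacent.
    """
    first_pos = {}
    for i, (word, _) in enumerate(initial):
        group = merge_group.get(word)
        if group is not None:
            first_pos.setdefault(group, i)  # remember where each group first appears

    def sort_key(item):
        i, (word, category) = item
        group = merge_group.get(word)
        anchor = first_pos[group] if group is not None else i
        # Category priority first, then the group anchor, then text position.
        return (PRIORITY[category], anchor, i)

    return [pair for _, pair in sorted(enumerate(initial), key=sort_key)]


initial = [("user", "entity"), ("filter", "operation"), ("order", "entity"),
           ("user age", "entity"), ("7", "numeric")]
adjusted = adjust_sequence(initial, {"user": "u", "user age": "u"})
```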

Starting from the order of the words in the text, these implementations can generate a word sequence that meets a preset ordering standard according to the ordering priorities of the different categories of words.

In some optional implementations of this embodiment, segmenting the text to obtain a plurality of words in step 201 may include: segmenting the text into at least two words using a full segmentation method; and taking each unit word and each unit-word phrase among the at least two words as the plurality of words.

In these optional implementations, the execution subject may segment the text using a full segmentation method, producing at least two words. The execution subject may then take each unit word and each unit-word phrase among these words as one of the plurality of words. A unit-word phrase may be a phrase or sentence composed of at least two unit words.

In practice, the words produced by full segmentation may overlap one another. For example, full segmentation of the text "荔枝成熟" ("the litchi is ripe") may yield "荔枝" (litchi), "荔" and "枝" (branch). Only the unit word "荔枝" is valid here, so "荔枝" is taken as one of the plurality of words.
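Full segmentation enumerates every substring of the text that appears in a dictionary. The sketch below does this with a hypothetical toy dictionary. It then keeps only spans not strictly contained in a longer matched span. This containment rule is an assumed heuristic for picking the valid unit words; the application does not specify how validity is decided.

```python
def full_segment(text, dictionary):
    """Return every (start, end) span of text that is a dictionary entry."""
    spans = []
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            if text[start:end] in dictionary:
                spans.append((start, end))
    return spans


def keep_unit_words(text, spans):
    """Assumed heuristic: drop any span strictly contained in a longer span."""
    kept = []
    for s, e in spans:
        covered = any(s2 <= s and e <= e2 and (e2 - s2) > (e - s) for s2, e2 in spans)
        if not covered:
            kept.append(text[s:e])
    return kept


text = "荔枝成熟"
dictionary = {"荔枝", "荔", "枝", "成熟"}  # hypothetical toy dictionary
spans = full_segment(text, dictionary)
unit_words = keep_unit_words(text, spans)
```

Enumerating all substrings is quadratic in the text length. Practical segmenters usually walk a trie or prefix dictionary instead, but the set of matched spans is the same.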

These implementations use full segmentation to improve the recall of the words obtained by segmentation, and improve the validity and accuracy of the words by retaining unit words and unit-word phrases.

In some optional implementations of this embodiment, the metadata has a preset category, and the method further includes: persistently storing the determined metadata, the determined arrangement order of the metadata, and the category of the determined metadata.

In these optional implementations, the execution subject may persistently store the determined metadata, their arrangement order, and their categories. Specifically, each piece of determined metadata may be stored together with its position relative to the other determined metadata and with its category.
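A minimal persistence sketch using the standard-library `sqlite3` module is shown below. The table name and columns are hypothetical. An in-memory database is used only so the example runs anywhere; a file path would be passed to `sqlite3.connect` for actual persistence.

```python
import sqlite3


def persist(conn, text_id, ordered_metadata):
    """Store each metadata with its position and category (schema is an assumption)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS text_metadata ("
        "text_id TEXT, position INTEGER, metadata_id TEXT, category TEXT)"
    )
    conn.executemany(
        "INSERT INTO text_metadata VALUES (?, ?, ?, ?)",
        [(text_id, i, m["id"], m["category"]) for i, m in enumerate(ordered_metadata)],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # use a file path for real persistence
persist(conn, "t1", [{"id": "m_user", "category": "entity"},
                     {"id": "m_7", "category": "numeric"}])
rows = conn.execute(
    "SELECT position, metadata_id, category FROM text_metadata ORDER BY position"
).fetchall()
```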

These implementations persist the relevant information about the determined metadata, so that data derived from the text is saved durably.
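One way to persist the determined metadata is sketched below with Python's built-in `sqlite3` module. The table name `text_metadata`, its columns, the `text_id` key, and the metadata and category names are hypothetical; the arrangement order is stored as an integer position.

```python
import sqlite3
from typing import Iterable, List, Tuple


def persist_metadata(conn: sqlite3.Connection, text_id: str,
                     records: Iterable[Tuple[str, str]]) -> None:
    """Store (metadata_id, category) pairs; list position is the arrangement order."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS text_metadata ("
        " text_id TEXT NOT NULL,"
        " position INTEGER NOT NULL,"
        " metadata_id TEXT NOT NULL,"
        " category TEXT NOT NULL,"
        " PRIMARY KEY (text_id, position))"
    )
    rows = [(text_id, pos, meta_id, category)
            for pos, (meta_id, category) in enumerate(records)]
    conn.executemany(
        "INSERT INTO text_metadata (text_id, position, metadata_id, category)"
        " VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()


def load_metadata(conn: sqlite3.Connection, text_id: str) -> List[Tuple[str, str]]:
    """Read the metadata back in its stored arrangement order."""
    cursor = conn.execute(
        "SELECT metadata_id, category FROM text_metadata"
        " WHERE text_id = ? ORDER BY position",
        (text_id,),
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
persist_metadata(conn, "text-1", [("user_region", "region"), ("user_gender", "gender")])
print(load_metadata(conn, "text-1"))  # [('user_region', 'region'), ('user_gender', 'gender')]
```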

Continuing with Fig. 3, Fig. 3 is a schematic diagram of an application scenario of the text processing method according to this embodiment. In the application scenario of Fig. 3, the execution body 301 segments the text into a plurality of words 302 and classifies them according to N preset categories. The execution body 301 determines the arrangement order of the words based on the category of each word and generates a word sequence 303 conforming to that order. The execution body 301 then generates a target directed acyclic graph 304 based on the arrangement order of the words in the word sequence 303, where the primary key of each node in the target directed acyclic graph 304 includes at least one word of the same category from the word sequence 303. Finally, the execution body 301 determines, in a metadata set, the metadata 305 corresponding to each node of the target directed acyclic graph 304 and the arrangement order 306 between the determined metadata 305. The metadata set includes metadata corresponding to the nodes, and the arrangement order 306 is the arrangement order between the nodes to which the determined metadata correspond.

With further reference to Fig. 4, a flow 400 of yet another embodiment of the text processing method is shown. The flow 400 includes the following steps:

Step 401: segmenting the text to obtain a plurality of words, and classifying the plurality of words according to preset categories.

In this embodiment, the execution body on which the text processing method runs (for example, the server or the terminal device shown in Fig. 1) may segment the text to obtain a plurality of words. The execution body may classify the words according to the categories preset for words, assigning each of the plurality of words to one of the preset categories. A word obtained by segmentation may be a single character, a word, or a whole sentence.
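Step 401 does not fix a segmentation algorithm. The sketch below swaps in greedy forward maximum matching (not the full segmentation method described for step 201) and then classifies each word by dictionary lookup. The lexicon, the category names, and the fallback category `other` are hypothetical assumptions.

```python
from typing import Dict, List, Tuple


def forward_max_match(text: str, lexicon: Dict[str, str], max_len: int = 4) -> List[str]:
    """Greedy forward maximum matching; an unknown single character becomes its own word."""
    words: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def classify(words: List[str], lexicon: Dict[str, str],
             default: str = "other") -> List[Tuple[str, str]]:
    """Assign each word exactly one preset category; unlisted words fall into `default`."""
    return [(w, lexicon.get(w, default)) for w in words]


# Hypothetical lexicon: 北京 = Beijing, 男性 = male, 用户 = user
LEXICON = {"北京": "region", "男性": "gender", "用户": "subject"}
print(classify(forward_max_match("北京男性用户", LEXICON), LEXICON))
# [('北京', 'region'), ('男性', 'gender'), ('用户', 'subject')]
```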

Step 402: determining the arrangement order of the words based on the category of each of the plurality of words, and generating a word sequence conforming to the arrangement order.

In this embodiment, the execution body may determine the arrangement order of the plurality of words based on the category of each word, and generate a word sequence conforming to that order.

Step 403: taking at least one word of the same category in the word sequence as the primary key of a single node, and determining, according to the arrangement order of the words in the word sequence, a directed acyclic graph corresponding to the word sequence as an initial directed acyclic graph.

In this embodiment, the execution body may take one word, or at least two words of the same category, in the word sequence as the primary key of a single node, and determine the initial directed acyclic graph according to the arrangement order of the words in the word sequence. That is, the arrangement order of the words determines both the primary keys of the nodes in the initial directed acyclic graph and the order between the nodes, i.e., the direction of the edges.
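A minimal sketch of step 403 follows, assuming the word sequence is a list of `(word, category)` pairs in which words of the same category are already adjacent, and that nodes are identified by their index. Adjacent words sharing a category become one node whose primary key is the tuple of those words, and each node is linked to the next one.

```python
from itertools import groupby
from typing import Dict, List, Tuple

Node = Tuple[str, Tuple[str, ...]]  # (category, primary key = words of that category)


def build_initial_dag(word_sequence: List[Tuple[str, str]]) -> Tuple[List[Node], Dict[int, List[int]]]:
    """Group adjacent same-category words into one node and link nodes in sequence order."""
    nodes: List[Node] = [
        (category, tuple(word for word, _ in group))
        for category, group in groupby(word_sequence, key=lambda wc: wc[1])
    ]
    # Each node points to the next node; a chain is trivially acyclic.
    edges = {i: ([i + 1] if i + 1 < len(nodes) else []) for i in range(len(nodes))}
    return nodes, edges


nodes, edges = build_initial_dag(
    [("Beijing", "region"), ("Shanghai", "region"), ("male", "gender")]
)
print(nodes)  # [('region', ('Beijing', 'Shanghai')), ('gender', ('male',))]
print(edges)  # {0: [1], 1: []}
```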

Step 404: performing topological sorting on the nodes of the initial directed acyclic graph to obtain a target directed acyclic graph.

In this embodiment, the execution body may perform topological sorting on the nodes of the initial directed acyclic graph to obtain the target directed acyclic graph. The execution body may obtain a topological sorting result by applying a topological sorting rule. The execution body may use this result directly as the target directed acyclic graph; however, to ensure that the target directed acyclic graph is valid, it may first verify that the sorting result contains no cycle. If no cycle is found, the execution body uses the topological sorting result as the target directed acyclic graph.
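The topological sorting and loop-free check of step 404 can be illustrated with Kahn's algorithm, one standard topological sorting method; the application does not say which rule it uses. Edges are given as a mapping from each node to its successors, which is an assumed representation. Kahn's algorithm performs the cycle check as a by-product: if some nodes never reach in-degree zero, the graph has a cycle.

```python
from collections import deque
from typing import Dict, Hashable, List


def topological_sort(edges: Dict[Hashable, List[Hashable]]) -> List[Hashable]:
    """Kahn's algorithm; raises ValueError if the graph contains a cycle."""
    indegree: Dict[Hashable, int] = {node: 0 for node in edges}
    for targets in edges.values():
        for target in targets:
            indegree[target] = indegree.get(target, 0) + 1
    queue = deque(node for node, degree in indegree.items() if degree == 0)
    order: List[Hashable] = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for target in edges.get(node, []):
            indegree[target] -= 1
            if indegree[target] == 0:
                queue.append(target)
    # Loop-free check: a cycle leaves some nodes with a non-zero in-degree.
    if len(order) != len(indegree):
        raise ValueError("initial graph contains a cycle")
    return order


print(topological_sort({"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}))  # ['a', 'b', 'c', 'd']
```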

Step 405: determining, in a metadata set, the metadata corresponding to each node of the target directed acyclic graph and the arrangement order between the determined metadata, where the metadata set includes the metadata corresponding to the nodes, and the arrangement order between the determined metadata is the arrangement order between the nodes to which the determined metadata correspond.

In this embodiment, for a node (for example, each node) of the target directed acyclic graph, the execution body may determine the metadata corresponding to that node in a metadata set, and determine the arrangement order between the determined metadata.

In this embodiment, topological sorting adjusts the positions of the nodes of the initial directed acyclic graph, so that the ordering of the nodes in the resulting target directed acyclic graph is more stable and accurate with respect to the graph's topology.

With further reference to Fig. 5, as an implementation of the methods shown in the above figures, the present application provides an embodiment of a text processing apparatus. This apparatus embodiment corresponds to the method embodiment shown in Fig. 2 and, in addition to the features described below, may include the same or corresponding features and effects as that method embodiment. The apparatus can be applied to various electronic devices.

As shown in Fig. 5, the text processing apparatus 500 of this embodiment includes a word segmentation unit 501, a category determination unit 502, a generation unit 503, and a metadata determination unit 504. The word segmentation unit 501 is configured to segment the text to obtain a plurality of words and classify the words according to preset categories. The category determination unit 502 is configured to determine the arrangement order of the words based on the category of each of the plurality of words, and generate a word sequence conforming to the arrangement order. The generation unit 503 is configured to generate a target directed acyclic graph based on the arrangement order of the words in the word sequence, where the primary key of each node in the target directed acyclic graph includes at least one word of the same category in the word sequence. The metadata determination unit 504 is configured to determine, in a metadata set, the metadata corresponding to each node of the target directed acyclic graph and the arrangement order between the determined metadata, where the metadata set includes the metadata corresponding to the nodes, and the arrangement order between the determined metadata is the arrangement order between the nodes to which the determined metadata correspond.
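The division of apparatus 500 into four units can be sketched as a class that chains four pluggable callables, one per unit. The class and field names are hypothetical, and the lambdas in the usage lines are toy stand-ins rather than real unit implementations.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class TextProcessingApparatus:
    """Hypothetical composition of the four units of apparatus 500."""
    segment_and_classify: Callable[[str], List[Any]]   # word segmentation unit 501
    order_words: Callable[[List[Any]], List[Any]]      # category determination unit 502
    generate_dag: Callable[[List[Any]], Any]           # generation unit 503
    determine_metadata: Callable[[Any], List[Any]]     # metadata determination unit 504

    def process(self, text: str) -> List[Any]:
        # Each unit consumes the previous unit's output, mirroring steps 201 to 204.
        words = self.segment_and_classify(text)
        sequence = self.order_words(words)
        dag = self.generate_dag(sequence)
        return self.determine_metadata(dag)


apparatus = TextProcessingApparatus(
    segment_and_classify=lambda text: [(w, "word") for w in text.split()],
    order_words=lambda words: sorted(words),
    generate_dag=lambda seq: [w for w, _ in seq],
    determine_metadata=lambda nodes: [f"meta:{n}" for n in nodes],
)
print(apparatus.process("b a"))  # ['meta:a', 'meta:b']
```

Keeping each unit as an injected callable lets any one of them be replaced without touching the others.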

In this embodiment, for the specific processing performed by the word segmentation unit 501, the category determination unit 502, the generation unit 503, and the metadata determination unit 504 of the text processing apparatus 500, and for the resulting technical effects, refer to the descriptions of step 201, step 202, step 203, and step 204 in the embodiment corresponding to Fig. 2; they are not repeated here.

In some optional implementations of this embodiment, the generation unit is further configured to generate the target directed acyclic graph based on the arrangement order of the words in the word sequence as follows: taking at least one word of the same category in the word sequence as the primary key of a single node, determining, according to the arrangement order of the words in the word sequence, a directed acyclic graph corresponding to the word sequence as an initial directed acyclic graph; and performing topological sorting on the nodes of the initial directed acyclic graph to obtain the target directed acyclic graph.

In some optional implementations of this embodiment, both the nodes of the target directed acyclic graph and the nodes corresponding to the metadata in the metadata set have identification information. The metadata determination unit is further configured to determine, in the metadata set, the metadata corresponding to each node of the target directed acyclic graph and the arrangement order between the determined metadata as follows: taking the nodes of the target directed acyclic graph as target nodes; for each target node, determining the metadata set corresponding to the category of the words contained in the primary key of that target node; and merging the determined metadata sets into a target metadata set. The determined metadata are the metadata in the target metadata set whose corresponding node's identification information matches the identification information of a target node.
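A minimal sketch of the identification-based lookup described above follows. The `Node` and `Metadata` classes, the string identifiers, and the per-category dictionary of metadata sets are assumptions made for illustration. Target nodes are assumed to arrive already in the target graph's order, which the returned metadata preserves.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Node:
    node_id: str                  # identification information of the target node
    category: str                 # category of the words in the primary key
    primary_key: Tuple[str, ...]


@dataclass(frozen=True)
class Metadata:
    node_id: str                  # identification information of the corresponding node
    name: str


def determine_metadata(target_nodes: List[Node],
                       metadata_by_category: Dict[str, List[Metadata]]) -> List[Metadata]:
    # Merge the metadata sets of every target node's category into one target set.
    target_set: Dict[str, Metadata] = {}
    for node in target_nodes:
        for meta in metadata_by_category.get(node.category, []):
            target_set.setdefault(meta.node_id, meta)
    # Keep metadata whose node id matches a target node, in target-node order.
    return [target_set[node.node_id] for node in target_nodes if node.node_id in target_set]


nodes = [Node("n_region", "region", ("Beijing",)), Node("n_gender", "gender", ("male",))]
catalog = {
    "region": [Metadata("n_region", "user_region")],
    "gender": [Metadata("n_gender", "user_gender")],
    "age": [Metadata("n_age", "user_age")],
}
print([m.name for m in determine_metadata(nodes, catalog)])  # ['user_region', 'user_gender']
```

Metadata of categories that no target node uses (here `age`) never enters the target set.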

In some optional implementations of this embodiment, the category determination unit is further configured to determine the arrangement order of the words based on the category of each of the plurality of words, and generate the word sequence conforming to that order, as follows: generating an initial word sequence containing the words in the order in which they appear in the text; and adjusting the order of the words in the initial word sequence based on the ordering priority of the words of each category, and generating a word sequence conforming to the adjusted order.
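The priority-based adjustment can be sketched with a stable sort: words are ranked by the priority of their category, and because Python's `sorted` is stable, words with equal priority keep their original order in the text. The priority table is hypothetical, and placing unknown categories last is an assumption.

```python
from typing import Dict, List, Tuple


def order_by_category(words: List[Tuple[str, str]],
                      priority: Dict[str, int]) -> List[Tuple[str, str]]:
    """Reorder (word, category) pairs by category priority; ties keep text order."""
    lowest = max(priority.values(), default=-1) + 1  # unknown categories sort last
    return sorted(words, key=lambda wc: priority.get(wc[1], lowest))


# Initial word sequence in text order, then adjusted by a hypothetical priority table.
initial = [("male", "gender"), ("Beijing", "region"), ("Shanghai", "region")]
print(order_by_category(initial, {"region": 0, "gender": 1}))
# [('Beijing', 'region'), ('Shanghai', 'region'), ('male', 'gender')]
```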

In some optional implementations of this embodiment, the word segmentation unit is further configured to segment the text to obtain a plurality of words as follows: segmenting the text into at least two words by using a full segmentation method; and taking each unit word and each unit-word phrase among the at least two words as the plurality of words.

In some optional implementations of this embodiment, the metadata has a preset category, and the apparatus further includes: a persistence unit configured to persistently store the determined metadata, the determined arrangement order between the metadata, and the category of the determined metadata.

According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.

Fig. 6 is a block diagram of an electronic device for the text processing method according to an embodiment of the present application. The electronic device is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. It may also represent various forms of mobile devices, such as personal digital processors, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are examples only and are not intended to limit the implementations of the present application described and/or claimed herein.

As shown in Fig. 6, the electronic device includes one or more processors 601, a memory 602, and interfaces for connecting the components, including a high-speed interface and a low-speed interface. The components are interconnected by different buses and may be mounted on a common motherboard or in other ways as required. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory for displaying graphical information of a GUI on an external input/output device (such as a display device coupled to an interface). In other embodiments, multiple processors and/or multiple buses may be used together with multiple memories, as required. Likewise, multiple electronic devices may be connected, each providing part of the necessary operations (for example, as a server array, a group of blade servers, or a multiprocessor system). In Fig. 6, one processor 601 is taken as an example.

The memory 602 is the non-transitory computer-readable storage medium provided by the present application. The memory stores instructions executable by at least one processor, so that the at least one processor performs the text processing method provided by the present application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the text processing method provided by the present application.

The memory 602, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the text processing method in the embodiments of the present application (for example, the word segmentation unit 501, the category determination unit 502, the generation unit 503, and the metadata determination unit 504 shown in Fig. 5). By running the non-transitory software programs, instructions, and modules stored in the memory 602, the processor 601 executes the server's various functional applications and data processing, that is, implements the text processing method of the above method embodiments.

The memory 602 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created through use of the electronic device for text processing, and the like. Further, the memory 602 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 602 optionally includes memory located remotely from the processor 601, which may be connected to the electronic device for text processing over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The electronic device for the text processing method may further include an input device 603 and an output device 604. The processor 601, the memory 602, the input device 603, and the output device 604 may be connected by a bus or in other ways; Fig. 6 takes a bus connection as an example.

The input device 603 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for text processing; examples include a touch screen, keypad, mouse, track pad, touch pad, pointing stick, one or more mouse buttons, trackball, joystick, or other input device. The output device 604 may include a display device, auxiliary lighting devices (for example, LEDs), and tactile feedback devices (for example, vibration motors), among others. The display device may include, but is not limited to, a liquid crystal display (LCD), a light-emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.

Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.

These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor, which may be described as: a processor including a word segmentation unit, a category determination unit, a generation unit, and a metadata determination unit. In some cases, the names of these units do not limit the units themselves; for example, the generation unit may also be described as "a unit that generates a target directed acyclic graph based on the arrangement order of the words in a word sequence".

As another aspect, the present application also provides a computer-readable medium, which may be included in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: segment a text to obtain a plurality of words, and classify the words according to preset categories; determine the arrangement order of the words based on the category of each of the plurality of words, and generate a word sequence conforming to the arrangement order; generate a target directed acyclic graph based on the arrangement order of the words in the word sequence, where the primary key of each node in the target directed acyclic graph includes at least one word of the same category in the word sequence; and determine, in a metadata set, the metadata corresponding to each node of the target directed acyclic graph and the arrangement order between the determined metadata, where the metadata set includes the metadata corresponding to the nodes, and the arrangement order between the determined metadata is the arrangement order between the nodes to which the determined metadata correspond.

The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.
