Recommended sentence generation device, recommended sentence generation method, and computer-readable recording medium

Document No.: 1170275; Publication date: 2020-09-18

Abstract: This invention, "Recommended sentence generation device, recommended sentence generation method, and computer-readable recording medium," was designed and created by 铃木功一 on 2020-03-09. The present disclosure provides a recommended sentence generation apparatus, a recommended sentence generation method, and a computer-readable recording medium. A recommended sentence generation apparatus according to the present invention generates a recommended sentence about a facility. The recommended sentence generation apparatus is provided with: a selection unit that selects document data written about the facility based on the appearance frequency of topic words associated with the facility; and a correction unit that corrects a predetermined word included in the selected document data.

1. A recommendation sentence generation apparatus that generates a recommendation sentence about a topic, comprising:

a selecting unit that selects a document written about the topic based on an appearance frequency of topic words associated with the topic; and

a correcting unit that corrects a predetermined word included in the selected document.

2. The recommendation sentence generation apparatus according to claim 1, further comprising:

an extraction unit that extracts an important sentence from the selected document based on an importance indicating reliability of information, wherein

the correction unit corrects the predetermined word included in the important sentence.

3. The recommendation sentence generation apparatus according to claim 2, further comprising:

an importance calculating unit that calculates the importance of a sentence included in the selected document based on a word common to a plurality of sentences in the selected document.

4. The recommendation sentence generation apparatus according to claim 3, wherein

the importance calculating unit further calculates the importance of the sentence included in the selected document based on an amount of additional information associated with the topic.

5. The recommendation sentence generation apparatus according to claim 3 or 4, wherein

the importance calculating unit calculates the importance of the sentence included in the selected document using a weight corresponding to a feature word associated with the topic.

6. The recommendation sentence generation apparatus according to any one of claims 1 to 5, wherein

the correction unit performs at least one of a fixed conversion of converting the predetermined word into another predetermined word, a random conversion of converting the predetermined word into one of a plurality of other predetermined words, and an addition of adding another predetermined word to the predetermined word.

7. The recommendation sentence generation apparatus according to any one of claims 1 to 6, further comprising:

a classification unit that classifies the document into one of a plurality of topic clusters based on the topic word, wherein,

the selecting unit determines a main topic cluster from the plurality of topic clusters based on the number of classified documents, and selects a document classified into the main topic cluster.

8. The recommendation sentence generation apparatus according to claim 7, further comprising:

a total value calculation unit that quantizes each word of a predetermined part of speech included in the document and calculates a total value of the document, wherein

the classification unit classifies the document into one of the plurality of topic clusters based on the total value.

9. The recommendation sentence generation apparatus according to claim 7 or 8, wherein

the classification unit classifies the document into one of the plurality of topic clusters by using an unsupervised data classification method.

10. A recommendation sentence generation method for generating a recommendation sentence about a topic, comprising:

a step of selecting a document written about the topic based on the frequency of occurrence of topic words associated with the topic; and

a step of correcting a predetermined word included in the selected document.

11. A computer-readable recording medium storing a recommendation sentence generation program that is executed by a computer to generate a recommendation sentence about a topic, the program causing the computer to execute:

a step of selecting a document written about the topic based on the frequency of occurrence of topic words associated with the topic; and

a step of correcting a predetermined word included in the selected document.

Technical Field

The present invention relates to a recommended sentence generation device, a recommended sentence generation method, and a computer-readable recording medium.

Background

Conventionally, there is known an abstract sentence generation apparatus that deletes unnecessary words from important sentences extracted by a text shaping means and deletes one or more important sentences satisfying specific conditions (see Japanese Patent Application Laid-Open No. 7-43717 (JP 7-43717 A)).

Disclosure of Invention

However, documents propagated through a social networking service (SNS) or the like are composed of sentences written in a free style. Such documents include, for example, symbols, pictograms, Uniform Resource Locators (URLs), languages other than Japanese such as English, or sentences that contain grammatical errors. Therefore, the uncorrected sentences in such a document are not suitable as, for example, sentences for recommending a subject such as a facility.

Accordingly, an object of the present invention is to provide a recommended sentence generating apparatus, a recommended sentence generating method, and a computer-readable recording medium, which can generate a sentence suitable as a recommended sentence on a subject.

A recommended sentence generating apparatus according to an aspect of the present invention is a recommended sentence generating apparatus that generates a recommended sentence about a subject. The recommendation sentence generation apparatus is provided with: a selecting unit that selects a document written about the topic based on an appearance frequency of topic words associated with the topic; and a correction unit that corrects a predetermined word included in the selected document.

A recommendation sentence generation method according to another aspect of the present invention is a recommendation sentence generation method for generating a recommendation sentence about a subject. The recommendation sentence generation method comprises the following steps: a step of selecting a document written about the topic based on the frequency of occurrence of topic words associated with the topic; and a step of correcting a predetermined word included in the selected document.

A computer-readable recording medium according to still another aspect of the present invention stores a recommendation sentence generation program that is executed by a computer to generate a recommendation sentence for a subject by the following steps. These steps include: a step of selecting a document written on the topic based on the frequency of occurrence of topic words associated with the topic; and a step of correcting a predetermined word included in the selected document.

According to the present invention, a sentence suitable as a recommendation sentence on a subject can be generated.

Drawings

Features, advantages and technical and industrial significance of exemplary embodiments of the present invention will be described below with reference to the accompanying drawings, wherein like reference numerals denote like elements, and wherein:

fig. 1 is a configuration diagram showing a schematic configuration of a recommendation sentence generation apparatus according to one of the embodiments;

fig. 2 is a view showing a schematic configuration of the facility cluster shown in fig. 1;

fig. 3 is a view showing a schematic configuration of the topic cluster shown in fig. 1;

FIG. 4 is a view showing a data structure of the parts-of-speech table shown in FIG. 1;

fig. 5 is a view showing an example of calculation of the importance of a sentence contained in selected document data;

FIG. 6 is a view showing a data structure of a weight table shown in FIG. 1;

fig. 7 is a view showing another example of calculation of the importance of a sentence contained in selected document data;

fig. 8 is a view showing a data structure of the fixed conversion table shown in fig. 1;

fig. 9 is a view showing a data structure of the random conversion table shown in fig. 1;

FIG. 10 is a view showing a data structure of an additional table shown in FIG. 1; and

fig. 11 is a flowchart showing a schematic operation of a recommendation sentence generation apparatus according to one of the embodiments.

Detailed Description

One of the embodiments of the present invention will be described below. In the drawings to be mentioned below, the same or similar components or elements are denoted by the same or similar reference numerals. It should be noted, however, that the drawings are schematic. Further, the technical scope of the present invention should not be construed as being limited to those embodiments.

Fig. 1 to 11 are intended to propose a recommended sentence generation apparatus, a recommended sentence generation method, and a recommended sentence generation program according to one of the embodiments. First, an overall configuration of a recommendation sentence generation apparatus according to one of the embodiments will be described with reference to fig. 1 to 10. Fig. 1 is a configuration diagram showing a schematic configuration of a recommended sentence generating apparatus 100 according to one of the embodiments. Fig. 2 is a view showing a schematic configuration of the facility cluster 32 shown in fig. 1. Fig. 3 is a view showing a schematic configuration of the topic cluster 33 shown in fig. 1. Fig. 4 is a view showing a data structure of the parts-of-speech table 34 shown in fig. 1. Fig. 5 is a view showing an example of calculation of the importance of a sentence contained in selected document data. Fig. 6 is a view showing a data structure of the weight table 35 shown in fig. 1. Fig. 7 is a view showing another example of calculation of the importance of a sentence contained in selected document data. Fig. 8 is a view showing a data structure of the fixed conversion table 36 shown in fig. 1. Fig. 9 is a view showing a data structure of the random conversion table 37 shown in fig. 1. Fig. 10 is a view showing a data structure of the additional table 38 shown in fig. 1.

The recommendation sentence generation apparatus 100 is designed to generate a recommendation sentence (also referred to as a recommended sentence) about a subject such as a facility. The subject of the recommendation sentence is not necessarily a facility, and may be, for example, an event, a place, a space, or the like. Incidentally, for the sake of simple explanation, the following description will be given under the assumption that the subject of the recommendation sentence is a facility.

As shown in fig. 1, the recommended sentence generating apparatus 100 is provided with, for example, a communication unit 10, an output unit 20, a storage unit 30, and a control unit 40. In addition, the recommendation sentence generation apparatus 100 is also provided with a bus 99 configured to transfer signals and data between the respective units of the recommendation sentence generation apparatus 100.

The communication unit 10 is designed to communicate (transmit and receive) data. The communication unit 10 is configured to be able to establish communication via the network NW based on one or more predetermined communication systems. In the case where the network NW or the other network combined with the network NW is the internet, at least one of the communication systems of the communication unit 10 is a communication system conforming to the internet protocol.

The output unit 20 is configured to output information. The output unit 20 is configured to include, for example, a display device such as a liquid crystal display, an Electroluminescence (EL) display, a plasma display, or the like. In the case of this example, the output unit 20 may output information by causing a display device to display text data such as characters, numerals, symbols, and the like, image data, video data, and the like.

The storage unit 30 is configured to store programs, data, and the like. The storage unit 30 is configured to include, for example, a hard disk drive, a solid state drive, and the like. The storage unit 30 stores in advance various programs executed by the control unit 40, data necessary for executing the programs, and the like.

Further, the storage unit 30 stores the cleaned document file 31, the facility cluster 32, and the topic cluster 33.

The cleaned document file 31 is a collection of a plurality of document data. The document data are data on documents posted to an SNS. Further, the cleaned document file 31 includes only document data that have undergone data cleaning. That is, document data unnecessary for generating a recommendation sentence, for example, document data that do not include recommended content, document data unsuitable as a recommendation, document data regarded as news or notifications, document data about unnecessary content, and the like, have been removed from the cleaned document file 31.

The facility clusters 32 are designed to form groups of facilities about which similar impressions or feelings are expressed. As shown in fig. 2, the facility clusters 32 include, for example, 12 facility clusters 32-1 through 32-12. At least one facility is classified into each of the facility clusters 32-1 through 32-12. For example, the facility cluster 32-1 is a facility cluster about which a "savory" or similar impression or feeling is expressed, and the facility cluster 32-2 is a facility cluster about which a "clean" or similar impression or feeling is expressed. By thus aggregating the facilities that are the subjects of recommendation sentences into groups that each produce a similar impression, common processing can be shared and repetition reduced, so efficiency is higher than when each facility is handled individually. The facility clusters 32-1 through 32-12 will be collectively referred to hereinafter as "facility clusters 32".

The topic clusters 33 are designed to form document groups, each including topics in the same direction. As shown in fig. 3, the topic clusters 33 include, for example, 40 topic clusters 33-1 to 33-40. The topic clusters 33-1 to 33-40 are formed for each facility cluster 32. As a result, each document data included in the cleaned document file 31 is classified into one of the facility clusters 32-1 to 32-12 and also into one of the topic clusters 33-1 to 33-40 (12 × 40 = 480 classifications). For example, the topic cluster 33-1 is a topic cluster about "savory", the topic cluster 33-2 is a topic cluster about "good value/filling", and the topic cluster 33-3 is a topic cluster about "sweets/desserts". In addition, for example, the topic cluster 33-4 is a topic cluster about "crowding/reservations", and the topic cluster 33-5 is a topic cluster about "stylish/clean". By thus aggregating the document data into groups that each include topics in the same direction, a group of topics associated with a facility can be identified in the document data. Hereinafter, the topic clusters 33-1 to 33-40 are collectively referred to as "topic clusters 33".

Returning to the description of fig. 1, the storage unit 30 also stores a part-of-speech table 34, a weight table 35, a fixed conversion table 36, a random conversion table 37, and an additional table 38. These tables will be described later.

Returning to the description of fig. 1, the control unit 40 is configured to control the operations of the respective units of the recommendation sentence generation apparatus 100, such as the communication unit 10, the output unit 20, and the storage unit 30. Further, the control unit 40 is configured to realize the respective functions described later by, for example, executing a program stored in the storage unit 30. The control unit 40 is configured to include, for example, a processor such as a central processing unit (CPU), an application-specific integrated circuit (ASIC), or a field-programmable gate array (FPGA), a memory such as a read-only memory (ROM) or a random-access memory (RAM), and a buffer memory.

Further, the control unit 40 is configured with, for example, a total value calculation unit 41, a classification unit 42, a selection unit 43, an importance calculation unit 44, an extraction unit 45, and a correction unit 46 as functional configurations thereof.

The total value calculation unit 41 is configured to quantize words of a predetermined part of speech included in document data, and calculate a total value of the document data.

Specifically, the total value calculation unit 41 divides each document data included in the cleaned document file 31 into a morpheme sequence by morphological analysis, and determines the part of speech of each morpheme. Subsequently, the total value calculation unit 41 extracts, from each document data, words of predetermined parts of speech, that is, lexically meaningful parts of speech, more specifically nouns, verbs, adjectives, adverbs, and interjections, by using the part-of-speech table 34 stored in the storage unit 30. In other words, the total value calculation unit 41 removes grammatically functional words such as particles and auxiliary verbs.

As shown in fig. 4, the part-of-speech table 34 stores a quantization flag, a total flag, and an importance flag as one record for each part of speech and each piece of part-of-speech information. The total value calculation unit 41 extracts from the document data at least one word that matches a part of speech and part-of-speech information whose quantization flag is "1". In the case where there are a plurality of matching words, the total value calculation unit 41 extracts all of those words from the document data.

Returning to the description of fig. 1, the total value calculation unit 41 subsequently quantizes the meaning of each extracted word, based on the relationship between the appearance positions of adjacent words in the document data, by using a classifier (not shown) generated by machine learning. The classifier for quantizing the meaning of each word is generated by a method (also referred to as an "algorithm" or "model"; the same applies hereinafter) that represents each word as a vector, such as Word2Vec. Incidentally, the classifier may be generated by the recommendation sentence generation apparatus 100, or may be generated by another apparatus and received via the network NW and the communication unit 10.

Subsequently, the total value calculation unit 41 extracts, from each document data, words that match a part of speech and part-of-speech information whose total flag is "1", by using the part-of-speech table 34 shown in fig. 4. Subsequently, the total value calculation unit 41 calculates a total value by summing the numerical values of the extracted words in the document data. Thus, a total value is calculated for each document data, and the content mentioned in each document data is quantized.
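As an illustration, the total-value calculation described above can be sketched as follows. The tiny embedding table and the pre-tokenized input are hypothetical stand-ins for the output of a real morphological analyzer and a Word2Vec-style classifier; only the filter-and-sum logic follows the text.

```python
# Hypothetical word vectors that a trained Word2Vec-style classifier
# might produce (invented values, two dimensions for brevity).
EMBEDDINGS = {
    "food":  [0.9, 0.1],
    "tasty": [0.8, 0.2],
    "clean": [0.1, 0.9],
}

# Parts of speech whose flags are "1" in the part-of-speech table:
# content words only; particles and auxiliary verbs are excluded.
CONTENT_POS = {"noun", "verb", "adjective", "adverb", "interjection"}

def total_value(tokens):
    """Sum the vectors of the content words to quantize one document."""
    total = [0.0, 0.0]
    for word, pos in tokens:
        if pos in CONTENT_POS and word in EMBEDDINGS:
            total = [t + v for t, v in zip(total, EMBEDDINGS[word])]
    return total

# A toy "document" already split into (word, part-of-speech) morphemes.
doc = [("the", "determiner"), ("food", "noun"), ("tasty", "adjective")]
print(total_value(doc))  # the determiner is ignored; food + tasty are summed
```

In the apparatus the vectors would come from the trained classifier and the totals would be computed for every document data in the cleaned document file 31.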

Incidentally, in the present application, the term "word" means a unit at least shorter than a sentence, and is used in a meaning that includes morphemes, individual words, expressions, phrases, and the like.

The classification unit 42 is configured to classify the document data into one of the plurality of topic clusters 33-1 to 33-40 based on the topic words associated with the facilities. In the foregoing example of topic cluster 33-1, the topic word associated with the facility is "savory (having two kanji characters and two hiragana characters)" or a word similar thereto. As words similar to "savory (having two kanji characters and two hiragana characters)", there may be mentioned, for example, "savory (having two kanji characters and one hiragana character)", "savory (having four hiragana characters)", "savory (having three hiragana characters)", "sweet", "like", "best", "pleasant", "many", and the like.

More specifically, the classification unit 42 is configured to classify each document data into one of the plurality of topic clusters based on the calculated total value. By quantizing the words of predetermined parts of speech included in each document data and calculating a total value for each document data, the total values of document data that include mutually associated topic words become close to each other. Therefore, the accuracy of classifying the document data into the topic clusters 33 based on the total values can be improved.

Specifically, the classification unit 42 classifies each document data into one of the 40 topic clusters 33-1 to 33-40 shown in fig. 3 by using an unsupervised data classification method, for example, the k-means method (also referred to as k-means clustering). By using an unsupervised data classification method, no supervised training data are required, and classification of the document data into the topic clusters 33 is facilitated.
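A minimal, dependency-free sketch of this k-means classification step follows. The one-dimensional "total values", the two clusters, and the initial centers are invented for illustration; the apparatus itself clusters document totals into 40 topic clusters per facility cluster.

```python
def kmeans_1d(values, centers, iterations=10):
    """Toy k-means: assign each value to its nearest center, then
    move each center to the mean of its assigned values."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[idx].append(v)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Invented total values of six documents: three about one topic,
# three about another.
totals = [0.9, 1.1, 1.0, 5.0, 5.2, 4.8]
centers, clusters = kmeans_1d(totals, [0.0, 6.0])
print(clusters)  # documents with nearby total values end up together
```

Because the total values of documents mentioning associated topic words lie close together, the unsupervised clustering groups them without any labeled training data.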

The selection unit 43 is configured to select document data written with respect to each facility based on the frequency of occurrence of the aforementioned topic words. In this way, document data suitable as a recommended sentence for each facility can be selected based on the frequency of occurrence of the topic words associated with each facility.

More specifically, the selection unit 43 is configured to determine at least one main topic cluster from the plurality of topic clusters 33-1 to 33-40 based on the number of the classified document data, and select the document data classified into the at least one main topic cluster.

Specifically, the selection unit 43 counts, for each facility, the number of document data classified into each of the topic clusters 33-1 to 33-40, and determines as main topic clusters the top three topic clusters by document count, provided that each contains two or more document data. Then, the selection unit 43 selects the document data classified into the main topic clusters. In the case where there are a plurality of document data classified into the main topic clusters, the selection unit 43 selects all of those document data. In this way, by determining the main topic clusters from the plurality of topic clusters 33-1 to 33-40 based on the number of classified document data and selecting the document data classified into the main topic clusters, document data written about the main topics relating to each facility are selected. Thereby, document data more suitable as a recommendation sentence for each facility can be selected.
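The main-topic-cluster rule above can be sketched as follows. The cluster ids and counts are invented; the top-three and two-or-more thresholds come from the text, under the reading that both conditions must hold.

```python
from collections import Counter

def main_topic_clusters(doc_cluster_ids, top_n=3, min_docs=2):
    """Return the topic clusters holding the most documents for one
    facility, keeping at most top_n clusters with at least min_docs each."""
    counts = Counter(doc_cluster_ids)
    return [cluster for cluster, n in counts.most_common(top_n)
            if n >= min_docs]

# Topic cluster id of each document written about one facility.
docs = [1, 1, 1, 5, 5, 9, 12]
print(main_topic_clusters(docs))  # clusters 9 and 12 hold only one doc each
```

The selection unit would then gather every document classified into the returned clusters as candidates for the recommendation sentence.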

The importance calculating unit 44 is configured to calculate the importance of each sentence included in the selected document data based on a word common to a plurality of sentences in the selected document data.

It should be noted here that the importance degree indicates the reliability of the information, and is an index for extracting an important sentence from document data. The important sentence is a sentence suitable for generating a recommended sentence about the facility as a subject. For example, an important sentence is a sentence that contains highly reliable information, contains a large amount of information, and includes an impression or evaluation representing a feature of a facility.

Specifically, the importance calculating unit 44 divides the document data selected by the selecting unit 43 into sentences based on delimiters (e.g., commas, periods, exclamation marks, question marks, spaces, etc.). In the case where a sentence obtained by the division satisfies a predetermined condition, the importance calculating unit 44 generates a sentence by combining it with the following sentence when it is the first sentence in the document data, and by combining it with the immediately preceding sentence when it is not the first sentence. On the other hand, in the case where a sentence obtained by the division does not satisfy the predetermined condition, the importance calculating unit 44 uses that sentence as it is. For example, the predetermined condition is that the number of characters in the sentence is less than a predetermined value, and/or that, as a result of the morphological analysis, the sentence contains only an expression of an impression.
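This splitting-and-merging step can be sketched as follows, assuming a 10-character minimum length (the text only says "less than a predetermined value") and English punctuation in place of the Japanese delimiters.

```python
import re

def split_sentences(document, min_chars=10):
    """Split on sentence delimiters, then merge a too-short fragment:
    a short first fragment joins the following sentence, any other
    short fragment joins the immediately preceding one."""
    parts = [p.strip() for p in re.split(r"[.!?]", document) if p.strip()]
    sentences = []
    carry = ""
    for part in parts:
        part = carry + part
        carry = ""
        if len(part) < min_chars:
            if sentences:
                sentences[-1] += " " + part   # merge with preceding sentence
            else:
                carry = part + " "            # first fragment: merge forward
        else:
            sentences.append(part)
    if carry:
        sentences.append(carry.strip())
    return sentences

print(split_sentences("Wow! The castle stairs were steep but fun. Great view."))
```

Here the short exclamation "Wow" is folded into the sentence that follows it, matching the first-sentence rule described above.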

Incidentally, in the present application, the term "sentence" includes a single sentence or a meaningful series of sentences, such as two sentences combined with each other.

In addition, the importance calculating unit 44 calculates the importance of each sentence in the selected document data. The importance of each sentence is calculated by using a method, such as LexRank, in which the importance increases as the number of words common to the sentences included in the selected document data increases. In this way, by calculating the importance of each sentence included in the selected document data based on words common to a plurality of sentences in the selected document data, an importance representing the reliability of the information can be easily calculated.

Further, the importance calculating unit 44 is configured to calculate the importance of each sentence included in the selected document data further based on the amount of additional information associated with each facility.

For example, when the importance of each sentence in the document data selected for the facility "famous ancient city" is calculated, the result shown in fig. 5 is obtained. Sentences including many elements common to a plurality of sentences (e.g., "stairs", "cool", "interesting", "Inuyama Castle", etc., indicated in bold in fig. 5) are high in importance. Further, the importance of a sentence including much additional information, such as "welcome will" and "famous-ancient city structure" underlined in fig. 5, is higher than that of a sentence including only "fun". In this way, by calculating the importance of each sentence included in the selected document data further based on the amount of additional information associated with the facility, the importance of a sentence including a large amount of additional information can be made high, and the importance can be made to reflect that amount of additional information.

Further, the importance calculating unit 44 is configured to calculate the importance of each sentence included in the selected document data by using the weight corresponding to each feature word associated with each facility.

Specifically, when a feature word associated with each facility is included in a sentence in the selected document data, the importance calculating unit 44 performs weighting, i.e., multiplication by the weight corresponding to the feature word, by using the weight table 35 stored in the storage unit 30. In the present embodiment, the feature words associated with each facility are words expressing impressions and evaluations that represent features of the facilities classified into each of the facility clusters 32-1 to 32-12.

As shown in fig. 6, for each of the facility clusters 32-1 to 32-12, the value of a weight and the feature word corresponding to the weight are stored in the weight table 35. Incidentally, "facility cluster i" (i is an integer of 1 to 12) shown in fig. 6 corresponds to the facility cluster 32-i. Incidentally, weights may also be stored in the storage unit 30 for words that express a recommendation and are common to the facilities of the respective facility clusters 32-1 to 32-12.

For example, in the case where the aforementioned facility "famous ancient city" is classified into the facility cluster 32-7, the sentence numbered "1" includes the feature word "cool", which has a weight of "1.6". Therefore, the importance calculating unit 44 calculates the weighted importance "0.0268" by multiplying the unweighted importance by this weight. Likewise, the sentence numbered "2" includes the feature word "interesting", which has a weight of "1.1". Therefore, the importance calculating unit 44 calculates the weighted importance "0.0185" by multiplying the unweighted importance by this weight. On the other hand, the sentence numbered "3" does not include any of the feature words of the facility cluster 32-7. In this case, the importance calculating unit 44 calculates the weighted importance "0.0076" by multiplying the unweighted importance by, for example, the weight "0.5". In this way, by calculating the importance of each sentence contained in the selected document data using the weights corresponding to the feature words associated with the facility, the importance of a sentence containing a feature word can be made high, and the importance can be made to reflect whether there is a word indicating an impression of the facility, an evaluation of the facility, or a recommendation of the facility.
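The weighting step can be sketched as follows. The feature words and their weights mirror the facility-cluster-32-7 example above, and the 0.5 default matches the weight applied when no feature word is present.

```python
# Feature words and weights for one facility cluster (values from the
# example above); the default weight applies when no feature word matches.
FEATURE_WEIGHTS = {"cool": 1.6, "interesting": 1.1}
DEFAULT_WEIGHT = 0.5

def weighted_importance(sentence, base_importance):
    """Multiply the LexRank-style importance by the weight of the first
    feature word found in the sentence, or by the default weight."""
    for word, weight in FEATURE_WEIGHTS.items():
        if word in sentence:
            return base_importance * weight
    return base_importance * DEFAULT_WEIGHT

print(weighted_importance("the stairs were cool", 0.01))  # boosted by 1.6
print(weighted_importance("nice food", 0.01))             # damped by 0.5
```

Sentences expressing the impressions characteristic of the facility cluster are thus promoted ahead of otherwise-similar sentences.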

The extraction unit 45 is configured to extract an important sentence from the selected document data based on the importance.

Specifically, the extraction unit 45 extracts a sentence having the highest importance degree among the selected document data as an important sentence. Therefore, an important sentence having the highest importance is extracted for each facility.

The correction unit 46 is configured to correct a predetermined word included in the selected document data. It should be noted here that the inventors of the present invention have found that a sentence suitable as a recommendation sentence can be realized by correcting a predetermined word in the sentence. As a result, a sentence suitable as a recommendation sentence for a facility can be generated by correcting a predetermined word in the document data selected as suitable for the recommendation sentence for the facility.

More specifically, the correcting unit 46 is configured to correct a predetermined word included in the extracted important sentence. In this way, by correcting the predetermined word included in the extracted important sentence, it is possible to correct the important sentence whose reliability of information is high, and generate a sentence more suitable as a recommended sentence about a facility.

Specifically, if a predetermined expression is at the beginning of the important sentence, the correcting unit 46 first deletes the predetermined expression. The predetermined expressions include, for example, symbols, words of predetermined parts of speech (e.g., interjections, conjunctions, particles, etc.), and expressions relating to date and time (e.g., "yesterday", "today", "last week", "this week", etc.).

Subsequently, the correcting unit 46 converts the predetermined word included in the important sentence before correction into another predetermined word by using the fixed conversion table 36 stored in the storage unit 30.

As shown in fig. 8, the fixed conversion table 36 is a table in which pre-conversion words and post-conversion words are paired with each other. In the case where a word stored in the pre-conversion column exists in the important sentence before correction or at the end of that sentence, the correction unit 46 converts the word into the word stored in the post-conversion column of the corresponding row. For example, "just went … (a colloquial form written with two kanji)" in the important sentence before correction, or at its end, is converted into "just went … (a form written with one kanji)".
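A minimal sketch of the fixed conversion, assuming a dictionary stands in for the table of fig. 8. Because the actual table pairs Japanese surface forms, English casual-to-standard pairs are used here purely as stand-ins.

```python
# Illustrative fixed conversion table: each pre-conversion word maps
# deterministically to exactly one post-conversion word.

FIXED_CONVERSION = {"gonna": "going to", "wanna": "want to"}

def fixed_convert(sentence):
    """Replace every pre-conversion word found anywhere in the sentence
    (including at its end) with its paired post-conversion word."""
    for before, after in FIXED_CONVERSION.items():
        sentence = sentence.replace(before, after)
    return sentence

print(fixed_convert("I wanna go back"))  # I want to go back
```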

Further, the correction unit 46 randomly converts a predetermined word included in the important sentence before correction into one of a plurality of other predetermined words by using the random conversion table 37 stored in the storage unit 30.

As shown in fig. 9, the random conversion table 37 is a table in which a pre-conversion word is paired with a plurality of post-conversion words. In the case where a word stored in the pre-conversion column exists in the important sentence before correction or at the end of that sentence, the correction unit 46 randomly converts the word into the word stored in the post-conversion candidate 1 column, the post-conversion candidate 2 column, the post-conversion candidate 3 column, or the post-conversion candidate 4 column of the corresponding row. For example, "tasty (written with three hiragana)" in the important sentence before correction, or at its end, is converted into "tasty (written with two katakana and one hiragana)", "tasty (written with one kanji and one hiragana)", "tasty (written with two kanji and one hiragana)", or "tasty (written with two kanji and two hiragana)". In the case where the number of post-conversion candidates is less than four, the word is randomly converted into one of the words within the range corresponding to the number of post-conversion candidates.
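The random conversion might be sketched as follows. The candidate words are English stand-ins for the Japanese orthographic variants described in the text, and the structure (a list of up to four candidates per word) mirrors fig. 9 only schematically.

```python
import random

# Illustrative random conversion table: one pre-conversion word maps to
# up to four candidates, one of which is chosen at random.

RANDOM_CONVERSION = {
    "tasty": ["delicious", "yummy", "flavorful", "superb"],
}

def random_convert(sentence, rng=random):
    """Replace a pre-conversion word with one randomly chosen candidate;
    fewer than four candidates simply narrows the choice range."""
    for before, candidates in RANDOM_CONVERSION.items():
        if before in sentence:
            sentence = sentence.replace(before, rng.choice(candidates))
    return sentence

random.seed(0)  # seeding only to make this demonstration repeatable
print(random_convert("The soup was tasty."))
```

Varying the surface form at random in this way keeps generated recommendation sentences from all reading identically.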

Subsequently, in the case where the end of the important sentence has a question mark or a period, the correction unit 46 leaves the end of the important sentence as it is. Otherwise, the correction unit 46 adds a period to the end of the important sentence. Next, in the case where the end of the corrected important sentence has a predetermined word, the correction unit 46 adds another predetermined word thereto by using the additional table 38 stored in the storage unit 30.

As shown in fig. 10, the additional table 38 is a table in which target words and additional words are paired with each other. In the case where a word stored in the target column exists at the end of the corrected important sentence, the correction unit 46 adds thereto the word stored in the additional column of the corresponding row. For example, "(It) is very good." is added to the end of the corrected important sentence "(I) just went.", thus yielding "(I) just went. (It) is very good.". Furthermore, "(It) is very good." is added to the end of the corrected important sentence "(I) went … .", thus yielding "(I) went … . (It) is very good.". In this way, the correction unit 46 performs at least one of fixed conversion of a predetermined word into another predetermined word, random conversion of a predetermined word into one of a plurality of other predetermined words, and addition of another predetermined word to a predetermined word. As a result, a sentence suitable as a recommendation sentence about each facility can be easily generated.
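The end-of-sentence handling and the additional table can be sketched together. The table contents and the function name are hypothetical English stand-ins for fig. 10.

```python
# Sketch: add a period unless the sentence already ends with "?" or ".",
# then append the paired additional word when the sentence ends with a
# target word from the additional table.

ADDITIONAL_TABLE = {"went": "It is very good."}  # target -> addition

def finalize(sentence):
    """Normalize the sentence ending, then apply the additional table."""
    if not sentence.endswith(("?", ".")):
        sentence += "."
    for target, addition in ADDITIONAL_TABLE.items():
        if sentence.rstrip(".?").endswith(target):
            sentence += " " + addition
    return sentence

print(finalize("I just went"))  # I just went. It is very good.
print(finalize("Was it fun?"))  # Was it fun?
```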

The respective functions of the control unit 40 may be realized by a program executed by a computer (microprocessor). Accordingly, the respective functions possessed by the control unit 40 may be realized by hardware, software, or a combination of hardware and software, and should not be limited to any one of them.

Further, in the case where the respective functions of the control unit 40 are realized by software or a combination of hardware and software, their processes may be performed in a multitasking manner, a multithreading manner, or both of the multitasking manner and the multithreading manner, and are not limited to any one of them.

Incidentally, the cleaned document file 31, the facility cluster 32, the topic cluster 33, the part-of-speech table 34, the weight table 35, the fixed conversion table 36, the random conversion table 37, and the additional table 38 should not be limited to the above-described examples in structure and form. For example, each of the cleaned document file 31, the facility cluster 32, the topic cluster 33, the part-of-speech table 34, the weight table 35, the fixed conversion table 36, the random conversion table 37, and the additional table 38 may be simple data or a database. Further, in the case where at least one of the cleaned document file 31, the facility cluster 32, the topic cluster 33, the part-of-speech table 34, the weight table 35, the fixed conversion table 36, the random conversion table 37, and the additional table 38 is a database, the grouping unit of the data can be subdivided by normalization.

Next, a general operation of the recommendation sentence generation apparatus according to one of the embodiments will be described with reference to fig. 11. Fig. 11 is a flowchart showing a schematic operation of the recommendation sentence generation apparatus 100 according to one of the embodiments.

For example, when a plurality of document data included in the cleaned document file 31 are each classified into one of a plurality of topic clusters 33-1 to 33-40, the recommended sentence generating apparatus 100 executes the recommended sentence generating process S200 shown in fig. 11.

Incidentally, in the following description, it is assumed that document data are each classified into one of the plurality of topic clusters 33-1 to 33-40.

First, the selecting unit 43 determines a main topic cluster from among the plurality of topic clusters 33-1 to 33-40 based on the number of classified document data, and selects the document data classified into the main topic cluster (S201).

Subsequently, the importance calculating unit 44 calculates the importance of each sentence in the document data selected in step S201, based on words shared among the plurality of sentences in the document data selected in step S201 (S202).

Subsequently, the extraction unit 45 extracts an important sentence from the document data selected in step S201 based on the importance degree calculated in step S202 (S203).

Subsequently, the correction unit 46 corrects the predetermined word in the important sentence extracted in step S203 (S204). Thereby, a recommendation sentence about the facility is generated.

Subsequently, the correction unit 46 outputs the recommended sentence generated in step S204 to the output unit 20 (S205). Incidentally, instead of or in addition to outputting the recommended sentence generated in step S204 to the output unit 20, the correction unit 46 may transmit the recommended sentence generated in step S204 to another device via the communication unit 10 and the network NW.
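The overall flow of steps S201 through S204 can be sketched end to end. Every part of this sketch is a simplification: the selection step is approximated by topic-word containment rather than cluster assignment, the importance score by raw shared-word counts rather than the weighted scheme described earlier, and the correction step by a single fixed conversion.

```python
# Simplified end-to-end sketch of the recommendation sentence
# generation process S200 (Fig. 11); all logic is illustrative.

def generate_recommendation(documents, topic_words):
    # S201: select documents about the topic (stand-in for the
    # main-topic-cluster selection by the selecting unit 43).
    selected = [d for d in documents if any(w in d for w in topic_words)]
    # S202: score each sentence by words shared with other sentences.
    sentences = [s for d in selected for s in d.split(". ") if s]
    def importance(s):
        words = set(s.lower().split())
        return sum(len(words & set(t.lower().split()))
                   for t in sentences if t != s)
    # S203: extract the sentence with the highest importance.
    best = max(sentences, key=importance)
    # S204: correct predetermined words (stubbed as one replacement).
    return best.replace("gonna", "going to")

docs = ["The castle was cool. We gonna visit again",
        "Lunch was tasty. The castle garden was cool"]
print(generate_recommendation(docs, ["castle"]))
```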

In the present embodiment, an example has been described in which the document data included in the cleaned document file 31 are each classified into one of the plurality of topic clusters 33-1 to 33-40 before the recommendation sentence generation process S200 is started, but the present invention should not be limited thereto. The document data included in the cleaned document file 31 may instead be classified into the plurality of topic clusters 33-1 to 33-40 as a step (process) within the recommendation sentence generation process S200.

Exemplary embodiments of the present invention have been described above. With the recommended sentence generating apparatus 100, the recommended sentence generating method, and the recommended sentence generating program according to the present embodiment, document data written about a facility is selected based on the appearance frequency of topic words associated with the facility. Accordingly, document data suitable for a recommendation sentence about the facility can be selected. Further, a predetermined word included in the selected document data is corrected. It should be noted here that the inventors of the present invention have found that a more natural sentence is obtained by correcting a predetermined word in a sentence. As a result, a sentence suitable as a recommendation sentence about a facility can be generated by correcting a predetermined word in the selected document data suitable for the recommendation sentence about the facility.

The above examples are intended to facilitate the understanding of the invention and are not intended to explain the invention in any limiting manner. The respective elements provided in the embodiments and the arrangement, materials, conditions, shapes, sizes, and the like thereof are not limited to those illustrated, but may be appropriately changed. Furthermore, the configurations presented in the different embodiments may be partially replaced or combined with each other.
