Title correction method and device for multimedia information, electronic equipment and storage medium

Document No.: 1215829 | Published: 2020-09-04

Reading note: this technology, "Title correction method and device for multimedia information, electronic equipment and storage medium", was designed by 陈小帅 (Chen Xiaoshuai) on 2020-05-27. Abstract: The invention provides an artificial-intelligence-based title correction method and apparatus for multimedia information, an electronic device, and a computer-readable storage medium. The method comprises: performing type identification processing on multimedia information to obtain the type of the multimedia information; performing error identification processing on the title of the multimedia information to obtain an error position in the title; searching a candidate correction database corresponding to the type according to the text at the error position to obtain a plurality of candidate corrected texts for correcting that text; and screening the plurality of candidate corrected texts, taking a candidate corrected text that passes the screening as the corrected text, and replacing the text at the error position of the title with the corrected text to form the correct title of the multimedia information. The invention can correct titles of multimedia information automatically and accurately, improving the efficiency of title correction.

1. A title correction method for multimedia information based on artificial intelligence, the method comprising:

performing type identification processing on multimedia information to obtain the type of the multimedia information;

carrying out error identification processing on the title of the multimedia information to obtain an error position in the title;

searching a candidate correction database corresponding to the type according to the text of the error position to obtain a plurality of candidate correction texts for correcting the text of the error position;

screening the plurality of candidate corrected texts, and taking a candidate corrected text obtained after the screening as the corrected text; and

replacing the text at the error position of the title with the corrected text to form the correct title of the multimedia information.

2. The method of claim 1, wherein prior to said performing type identification processing on the multimedia information, the method further comprises:

extracting features of a plurality of modalities of the multimedia information;

wherein, when the multimedia information is a video, the features of the plurality of modalities include: a video fusion feature, an audio fusion feature, and a text feature of the title of the multimedia information.

3. The method of claim 2, wherein said extracting features of a plurality of modalities of the multimedia information comprises:

encoding each video frame in the multimedia information to obtain a vector representation of each video frame, and performing fusion processing on the vector representations of the video frames to obtain the video fusion feature;

encoding each audio frame in the multimedia information to obtain a vector representation of each audio frame, and performing fusion processing on the vector representations of the audio frames to obtain the audio fusion feature; and

encoding the text at each position in the title of the multimedia information to obtain a corresponding vector, and combining the vectors of all positions into a vector sequence that serves as the text feature of the title.
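The three extraction steps of claim 3 can be sketched as follows. This is only an illustration under stated assumptions: the claim does not say how frame vectors are fused, so element-wise averaging is assumed here, and `fuse_frames` and `encode_title` (a toy bit-pattern encoder standing in for a learned encoder) are hypothetical names.

```python
from typing import List

Vector = List[float]


def fuse_frames(frame_vectors: List[Vector]) -> Vector:
    """Fuse per-frame vectors into one vector by element-wise averaging.

    Averaging is an assumed fusion rule; the claim only requires "fusion processing".
    """
    if not frame_vectors:
        raise ValueError("at least one frame vector is required")
    dim = len(frame_vectors[0])
    count = len(frame_vectors)
    return [sum(vec[i] for vec in frame_vectors) / count for i in range(dim)]


def encode_title(title: str, dim: int = 4) -> List[Vector]:
    """Toy per-position encoder: the low `dim` bits of each character's code point."""
    return [[float((ord(ch) >> k) & 1) for k in range(dim)] for ch in title]


video_fusion_feature = fuse_frames([[1.0, 2.0], [3.0, 4.0]])  # two video frame vectors
audio_fusion_feature = fuse_frames([[0.5, 0.5], [1.5, 2.5]])  # two audio frame vectors
title_text_feature = encode_title("abc")  # vector sequence, one vector per position
```

The video and audio features collapse to one vector each, while the title keeps one vector per position, which is what later allows an error probability to be predicted per position.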

4. The method of claim 2, wherein performing type identification processing on the multimedia information to obtain the type of the multimedia information comprises:

fusing the video fusion feature, the audio fusion feature and the text feature to obtain a multi-modal fusion feature of the multimedia information;

mapping the multi-modal fusion feature to probabilities corresponding to a plurality of candidate multimedia information types; and

determining the candidate multimedia information type with the maximum probability as the type of the multimedia information.
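A minimal sketch of the type identification in claim 4, assuming that fusion is plain concatenation, that the text feature has already been pooled into a single vector, and that the mapping to probabilities is a linear layer followed by softmax. The weights in the usage below are made-up values, not learned parameters, and `classify_type` is a hypothetical name.

```python
import math
from typing import Dict, List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_type(
    video_feat: List[float],
    audio_feat: List[float],
    text_feat: List[float],
    type_weights: Dict[str, List[float]],
) -> Tuple[str, Dict[str, float]]:
    """Return the most probable candidate type and the full probability table."""
    fused = video_feat + audio_feat + text_feat  # assumed fusion: concatenation
    names = list(type_weights)
    scores = [sum(w * x for w, x in zip(type_weights[n], fused)) for n in names]
    probs = dict(zip(names, softmax(scores)))
    best = max(probs, key=probs.get)  # candidate type with the maximum probability
    return best, probs
```

For example, with `type_weights = {"movie": [1.0, 0.0, 0.0], "music": [0.0, 1.0, 0.0]}` and one-dimensional features `[0.2]`, `[0.9]`, `[0.1]`, the "music" score is the larger one, so "music" is returned.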

5. The method of claim 2, wherein said performing error identification processing on the title of the multimedia information to obtain the error position in the title comprises:

mapping the text feature of the title to an error probability for each position in the title, and determining each position whose error probability is greater than an error threshold as an error position.
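The per-position thresholding in claim 5 might look like the sketch below. The mapping from a position's vector to a probability is assumed to be a single linear unit with a sigmoid; `error_probabilities` and `find_error_positions` are hypothetical names, and the default threshold of 0.5 is an arbitrary choice.

```python
import math
from typing import List


def error_probabilities(
    text_features: List[List[float]], weights: List[float], bias: float = 0.0
) -> List[float]:
    """Map each position's vector to an error probability (assumed: linear unit + sigmoid)."""
    probs = []
    for vec in text_features:
        logit = sum(w * x for w, x in zip(weights, vec)) + bias
        probs.append(1.0 / (1.0 + math.exp(-logit)))
    return probs


def find_error_positions(error_probs: List[float], threshold: float = 0.5) -> List[int]:
    """Return the indices whose error probability is strictly greater than the threshold."""
    return [i for i, p in enumerate(error_probs) if p > threshold]
```

A position whose probability equals the threshold is not flagged, matching the "greater than" wording of the claim.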

6. The method of claim 1, wherein

said performing type identification processing on the multimedia information comprises:

performing the type identification processing by invoking a video type classification submodel of a multi-task recognition model; and

said performing error identification processing on the title of the multimedia information comprises:

performing the error identification processing by invoking an error classification submodel of the multi-task recognition model.

7. The method of claim 6, wherein before said performing type identification processing on the multimedia information, the method further comprises:

performing type identification processing on a multimedia information sample through the multi-task recognition model to obtain a predicted type of the multimedia information sample, and performing error identification processing on the title of the multimedia information sample to obtain a predicted error position in the title;

constructing a loss function of the multi-task recognition model according to the predicted type of the multimedia information sample, the multimedia information type label of the multimedia information sample, the predicted error position in the multimedia information sample, and the error position label in the multimedia information sample; and

updating the parameters of the multi-task recognition model until the loss function converges, and taking the parameters of the multi-task recognition model at convergence as the parameters of the trained multi-task recognition model.
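The claim does not specify the form of the loss function. One common choice, assumed here purely for illustration, is the sum of a cross-entropy term for the type prediction and an averaged binary cross-entropy term for the per-position error predictions; the function names and the weighting factor `alpha` are hypothetical.

```python
import math
from typing import Dict, List


def type_loss(type_probs: Dict[str, float], type_label: str) -> float:
    """Cross-entropy between the predicted type distribution and the type label."""
    return -math.log(type_probs[type_label])


def position_loss(error_probs: List[float], error_labels: List[int]) -> float:
    """Mean binary cross-entropy over title positions (label 1 = error position)."""
    eps = 1e-12  # clamp to avoid log(0)
    total = 0.0
    for p, y in zip(error_probs, error_labels):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(error_probs)


def multitask_loss(
    type_probs: Dict[str, float],
    type_label: str,
    error_probs: List[float],
    error_labels: List[int],
    alpha: float = 1.0,
) -> float:
    """Joint loss: type term plus `alpha` times the error-position term."""
    return type_loss(type_probs, type_label) + alpha * position_loss(error_probs, error_labels)
```

Because both terms depend on the same shared parameters, minimizing the sum trains the type classification and error classification submodels together.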

8. The method of claim 7, further comprising:

extracting partial text from the title of a positive multimedia information sample in a set of positive multimedia information samples;

querying a text library for an error text corresponding to the partial text;

replacing the partial text in the title with the error text to generate a negative multimedia information sample containing the error text; and

determining the position of the error text as the error position label of the negative multimedia information sample.
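The negative-sample construction of claim 8 can be sketched as below. The text library is modeled as a plain dictionary from correct text to a known erroneous form, which is an assumption; `make_negative_sample` is a hypothetical name, and the English title is a stand-in for real training data.

```python
from typing import Dict, List, Optional, Tuple


def make_negative_sample(
    title: str, text_library: Dict[str, str]
) -> Optional[Tuple[str, List[int]]]:
    """Corrupt a correct title and return (corrupted title, error position label).

    `text_library` maps a correct text fragment to an erroneous variant.
    Returns None when no fragment of the title appears in the library.
    """
    for correct, wrong in text_library.items():
        start = title.find(correct)
        if start != -1:
            corrupted = title[:start] + wrong + title[start + len(correct):]
            # The label covers every character position of the inserted error text.
            label = list(range(start, start + len(wrong)))
            return corrupted, label
    return None


sample = make_negative_sample("The Lion King trailer", {"Lion": "Loin"})
```

Since the error is injected at a known position, the label comes for free and no manual annotation of negative samples is needed.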

9. The method of claim 1, wherein searching a candidate correction database corresponding to the type according to the text of the error position to obtain a plurality of candidate correction texts for correcting the text of the error position comprises:

for a candidate correction database corresponding to the type of the multimedia information, performing at least one of the following processes:

querying candidate corrected texts corresponding to the pinyin of the text at the error position;

querying candidate corrected texts corresponding to the glyph shape of the text at the error position; and

querying candidate corrected texts corresponding to partial text within the text at the error position.
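A minimal sketch of the three lookups in claim 9 against a per-type database. Everything concrete here is assumed: the database layout (`by_pinyin`, `by_glyph`, `entries`), the toy pinyin table standing in for a real pinyin converter, and the rule that a "partial text" is any substring of at least two characters.

```python
from typing import Callable, Dict, List

# Toy pinyin table; a real system would use a full pinyin converter.
TOY_PINYIN = {"复": "fu", "仇": "chou", "愁": "chou", "者": "zhe"}


def toy_pinyin(text: str) -> str:
    """Hypothetical stand-in converter; unknown characters pass through unchanged."""
    return " ".join(TOY_PINYIN.get(ch, ch) for ch in text)


# One candidate correction database per multimedia type (assumed layout).
DATABASES = {
    "movie": {
        "by_pinyin": {"fu chou zhe": ["复仇者"]},
        "by_glyph": {"复仇着": ["复仇者"]},  # visually similar forms
        "entries": ["复仇者", "蜘蛛侠"],
    },
}


def search_candidates(
    error_text: str, media_type: str, databases: Dict, to_pinyin: Callable[[str], str]
) -> List[str]:
    """Collect candidates by pinyin, glyph shape and partial text, without duplicates."""
    db = databases[media_type]
    found: List[str] = []
    found += db["by_pinyin"].get(to_pinyin(error_text), [])
    found += db["by_glyph"].get(error_text, [])
    parts = {
        error_text[i:j]
        for i in range(len(error_text))
        for j in range(i + 2, len(error_text) + 1)
    }
    found += [entry for entry in db["entries"] if any(p in entry for p in parts)]
    return list(dict.fromkeys(found))  # de-duplicate, keep first-seen order
```

Restricting the search to the database for the identified type keeps the candidate set small, for instance only film names when the video was classified as a movie.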

10. The method of claim 1, wherein said screening the plurality of candidate corrected texts and taking a candidate corrected text obtained after the screening as the corrected text comprises:

for any one of the candidate corrected texts, performing the following processing:

replacing the text of the error position of the title with the candidate corrected text to generate a corrected title;

carrying out smoothness degree prediction processing on the title before correction through a language model to obtain the smoothness degree of the title before correction;

carrying out smoothness degree prediction processing on the corrected title through the language model to obtain the smoothness degree of the corrected title;

taking the difference value of the smoothness degrees before and after title correction as the language smoothness degree of the candidate corrected text;

and when the language smoothness degree of the candidate corrected text is greater than the threshold value of the language smoothness degree corresponding to the type of the multimedia information, taking the candidate corrected text as the corrected text of the title.
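The screening steps of claim 10 can be sketched as follows. The language model is represented by any callable that returns a smoothness score; `toy_smoothness`, which counts known word pairs, is a hypothetical stand-in and not the model described in the patent. Returning all passing candidates sorted by gain is also an assumption, since the claim only states the acceptance condition.

```python
from typing import Callable, List, Tuple

KNOWN_BIGRAMS = {("the", "lion"), ("lion", "king")}  # toy language-model knowledge


def toy_smoothness(title: str) -> float:
    """Fraction of adjacent word pairs that the toy model has seen before."""
    words = title.lower().split()
    pairs = list(zip(words, words[1:]))
    if not pairs:
        return 0.0
    return sum(pair in KNOWN_BIGRAMS for pair in pairs) / len(pairs)


def screen_candidates(
    title: str,
    start: int,
    end: int,
    candidates: List[str],
    smoothness: Callable[[str], float],
    threshold: float,
) -> List[Tuple[str, float]]:
    """Keep candidates whose smoothness gain over the original title exceeds the threshold."""
    before = smoothness(title)
    kept = []
    for cand in candidates:
        corrected = title[:start] + cand + title[end:]  # replace the error span
        gain = smoothness(corrected) - before  # difference after vs. before correction
        if gain > threshold:
            kept.append((cand, gain))
    return sorted(kept, key=lambda item: item[1], reverse=True)


accepted = screen_candidates("The Loin King", 4, 8, ["Lion", "Line"], toy_smoothness, 0.5)
```

Scoring the difference rather than the corrected title alone means a candidate is accepted only if it actually improves on the original.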

11. The method of claim 10, wherein

the language model comprises a type-personalized language model and a general language model;

said carrying out smoothness degree prediction processing on the corrected title through the language model to obtain the smoothness degree of the corrected title comprises:

carrying out smoothness degree prediction processing on the corrected title through the type-personalized language model to obtain a first smoothness degree of the corrected title;

carrying out smoothness degree prediction processing on the corrected title through the general language model to obtain a second smoothness degree of the corrected title; and

carrying out weighted summation on the first smoothness degree and the second smoothness degree to obtain the final smoothness degree of the corrected title;

the type personalized language model is obtained by training according to multimedia information samples corresponding to the types of the multimedia information, and the universal language model is obtained by training according to multimedia information samples including all types of the multimedia information.
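The weighted summation in claim 11 amounts to the short sketch below. Both models are represented by arbitrary scoring callables, and the default weights 0.6 and 0.4 are illustrative values only; the claim does not state the weights.

```python
from typing import Callable

Scorer = Callable[[str], float]


def combined_smoothness(
    title: str,
    type_model: Scorer,
    general_model: Scorer,
    type_weight: float = 0.6,
    general_weight: float = 0.4,
) -> float:
    """Weighted sum of the type-specific and general smoothness scores."""
    first = type_model(title)  # first smoothness degree
    second = general_model(title)  # second smoothness degree
    return type_weight * first + general_weight * second
```

A larger weight on the type-specific model favors phrasing typical of that category, while the general model guards against overfitting to a small per-type corpus.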

12. The method of claim 10, wherein before said taking the candidate corrected text as the corrected text of the title, the method further comprises:

performing word segmentation processing on the title before correction to obtain the number of words included in the title before correction;

performing word segmentation processing on the corrected title to obtain the number of words included in the corrected title;

taking the difference between the numbers of words included in the title before and after correction as a reference threshold value of the title; and

determining the difference between the language type threshold value corresponding to the type of the multimedia information and the reference threshold value of the title as the language smoothness degree threshold value corresponding to the type of the multimedia information.
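The threshold adjustment of claim 12 can be sketched as below. Two points are assumptions: the sign convention (word count before correction minus word count after), which the claim leaves open, and the whitespace segmenter, which stands in for a real word segmenter needed for Chinese titles. The function names are hypothetical.

```python
from typing import Callable, List


def whitespace_segment(title: str) -> List[str]:
    """Toy word segmenter; Chinese text would need a dedicated segmenter."""
    return title.split()


def smoothness_threshold(
    title_before: str,
    title_after: str,
    type_threshold: float,
    segment: Callable[[str], List[str]] = whitespace_segment,
) -> float:
    """Per-type threshold minus the change in word count (assumed: before minus after)."""
    reference = len(segment(title_before)) - len(segment(title_after))
    return type_threshold - reference
```

Under this convention, a correction that merges fragments into fewer words (for example "the lo in king" becoming "the lion king") lowers the threshold, making the candidate easier to accept.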

13. A title correction apparatus for multimedia information, the apparatus comprising:

an identification module, configured to perform type identification processing on multimedia information to obtain the type of the multimedia information, and to perform error identification processing on the title of the multimedia information to obtain an error position in the title;

a search module, configured to search a candidate correction database corresponding to the type according to the text of the error position to obtain a plurality of candidate corrected texts for correcting the text of the error position;

a screening module, configured to screen the plurality of candidate corrected texts and take a candidate corrected text obtained after the screening as the corrected text; and

a replacing module, configured to replace the text at the error position of the title with the corrected text to form the correct title of the multimedia information.

14. An electronic device, characterized in that the electronic device comprises:

a memory for storing executable instructions;

a processor for implementing the artificial-intelligence-based title correction method for multimedia information of any one of claims 1 to 12 when executing the executable instructions stored in the memory.

15. A computer-readable storage medium storing executable instructions which, when executed, cause a processor to perform the artificial-intelligence-based title correction method for multimedia information of any one of claims 1 to 12.
