News content safety monitoring method, system, device and storage medium

Document No.: 907626. Published: 2021-02-26.

Reading note: This technology, "News content safety monitoring method, system, device and storage medium" (一种新闻内容安全监测方法、系统、装置和存储介质), was created by 康维舒斌贺弘联方华孔泽平王冠华周珞肖顺红陈光林周欣霓谢宇 on 2020-11-19. Main content: The application relates to a news content safety monitoring method, system, device and storage medium. The method comprises: establishing a monitoring word list containing wrongly written characters, sensitive words and red-marked words; sequentially calling the wrongly written characters, sensitive words and red-marked words in the monitoring word list and matching them against the news content; if the news content contains any of them, issuing alarm prompts of different levels according to the monitoring words that appear, so as to prompt the editor to modify the news content; acquiring and identifying pictures and videos in the news content and performing related processing on them; and sending the modified news content to the main editing end to prompt the main editor to check it manually, the modified news content being released once it is confirmed to be correct. The application reduces the probability that wrongly written characters, sensitive words and red-marked words appear in released news content and improves the user's reading experience.

1. A news content security monitoring method is characterized by comprising the following steps:

establishing a monitoring word list, wherein the monitoring word list comprises wrongly written characters, sensitive words and red mark words;

sequentially calling wrongly-written characters, sensitive words and red mark words in the monitoring word list, and matching the wrongly-written characters, the sensitive words and the red mark words with news contents;

if the news content contains wrongly written characters, sensitive words or red-marked words from the monitoring word list, issuing alarm prompts of different levels according to the monitoring words that appear, so as to prompt the editor to modify the news content;

acquiring and identifying pictures or videos in the news content, and performing related processing according to the pictures or the videos, wherein the related processing comprises masking (pixelating) or deleting the pictures or the videos according to their categories;

and sending the modified news content to a main editing end to prompt the main editor to manually check the news content there, and releasing the modified news content after it is confirmed to be error-free.

2. The method of claim 1, wherein the obtaining and identifying the picture in the news content, and the performing the relevant processing according to the picture comprises:

preparing a corresponding preset number of sample pictures for each picture category, and calibrating the picture category corresponding to each sample;

training a recognition model of a preset type by using the sample picture;

acquiring pictures in the news content;

inputting the picture into the recognition model for recognition, and outputting a recognition result;

performing relevant processing on the picture according to the identification result;

if the recognition result is that a person's face appears in the picture, masking (pixelating) the face;

and if the recognition result is that the picture is violent, deleting the picture.

3. The method of claim 2, wherein the obtaining and identifying the picture in the news content, and the performing the relevant processing according to the picture further comprises:

acquiring the picture;

comparing the picture with a picture in a cloud server;

acquiring all websites recorded in the cloud server that contain the same picture;

acquiring the picture release dates of all websites, and selecting the author of the website picture with the earliest release date;

and adding a note below the picture in the news content stating that the picture is reposted from that author.

4. The method of claim 2, wherein the obtaining and identifying a video in the news content, and the performing the relevant processing according to the video further comprises:

acquiring all frame images of the video, and identifying corresponding pixel values of the frame images;

comparing the corresponding pixel values of adjacent frame images, and if the difference between them is larger than a preset value, extracting the later of the two adjacent frame images and defining it as a scene frame;

acquiring, in the cloud server, the websites whose videos contain all of the scene frames;

acquiring the video release date of the website, and selecting the author of the website video with the earliest release date;

and adding a note below the video in the news content stating that the video is reposted from that author.

5. The method of claim 1, wherein sending the modified news content to the main editing end comprises:

the main editing end receives the modified news content, so that the main editor manually checks the modified news content at the main editing end;

if new wrongly written characters, new sensitive words or new red-marked words appear, the main editing end marks them and adds them to the monitoring word list;

and sending the marked news content to the editing end so that the editor can modify the marked news content.

6. The method of claim 3, wherein after the editing end completes modifying the news content, the method further comprises:

extracting keywords of the news content, and matching them against the news content in the cloud server;

selecting a website with news content with matching degree larger than a preset value, and acquiring the release time of the news content in the website;

and acquiring the time at which the editing end finished modifying the news content, comparing that time with the release time, and if the difference is greater than a preset value, judging that the modified news content is not timely.

7. The method of claim 3, wherein after the editing end completes modifying the news content, the method further comprises:

extracting keywords of the modified news contents, and matching the keywords with the news contents with higher search times ranking in the cloud server;

and if the modified news content matches news content with a high search-count ranking, generating a high-heat identifier and sending it to the main editing end so that the main editing end can release the modified news content in a more prominent position on the display end.

8. A news content security monitoring system, comprising:

a creating device for establishing a monitoring word list, wherein the monitoring word list comprises wrongly written characters, sensitive words and red-marked words;

the matching device is used for sequentially calling wrongly-written characters, sensitive words and red-marked words in the monitoring word list and matching the wrongly-written characters, the sensitive words and the red-marked words with news contents;

the warning device is used for issuing alarm prompts of different levels according to the monitoring words that appear, so as to prompt the editor to modify the news content, if the news content contains wrongly written characters, sensitive words or red-marked words from the monitoring word list;

the processing device is used for acquiring and identifying pictures and videos in the news content and performing related processing according to the pictures and the videos;

the first judgment device extracts keywords of the news content and matches the news content in the cloud server according to the keywords; selecting a website with news content with matching degree larger than a preset value, and acquiring the release time of the news content in the website; acquiring the time of the editing end after the news content is modified, comparing the time with the release time, and if the difference value is greater than a preset value, judging that the modified news content does not have timeliness;

the second judgment device extracts the keywords of the modified news content and matches the news content with higher rank of search times in the cloud server according to the keywords; if the modified news content is matched with the news content with the higher rank of the search times, generating a high-heat identifier, and sending the high-heat identifier to a main editing end so that the main editing end can release the modified news content to a position where a display end is more striking;

and the confirming device is used for sending the modified news content to the main editing end so as to prompt the main editor to manually check the news content, and the modified news content is released after it is confirmed to be error-free.

9. A news content detecting apparatus, comprising:

a creating module for creating a monitoring word list, wherein the monitoring word list comprises wrongly written characters, sensitive words and red-marked words;

the matching module is used for sequentially calling wrongly written characters, sensitive words and red mark words in the monitoring word list and matching the wrongly written characters, the sensitive words and the red mark words with news contents;

the alarm module is used for issuing alarm prompts of different levels according to the monitoring words that appear, so as to prompt the editor to modify the news content, if the news content contains wrongly written characters, sensitive words or red-marked words from the monitoring word list;

the processing module is used for acquiring and identifying pictures and videos in the news content and carrying out related processing according to the pictures and the videos;

the first judgment module extracts keywords of the news content and matches the news content in the cloud server according to the keywords; selecting a website with news content with matching degree larger than a preset value, and acquiring the release time of the news content in the website; acquiring the time of the editing end after the news content is modified, comparing the time with the release time, and if the difference value is greater than a preset value, judging that the modified news content does not have timeliness;

the second judgment module is used for extracting the keywords of the modified news contents and matching the keywords with the news contents with higher search times ranking in the cloud server; if the modified news content is matched with the news content with the higher rank of the search times, generating a high-heat identifier, and sending the high-heat identifier to a main editing end so that the main editing end can release the modified news content to a position where a display end is more striking;

and the confirmation module is used for sending the modified news content to the main editing end so as to prompt the main editor to manually check the news content, and the modified news content is released after it is confirmed to be error-free.

10. A computer-readable storage medium, in which a computer program is stored which can be loaded by a processor and which executes the method of any one of claims 1 to 7.

Technical Field

The present application relates to the field of news content monitoring, and in particular, to a method, a system, an apparatus, and a storage medium for monitoring security of news content.

Background

When writing news manuscripts, editors can hardly avoid wrongly written characters, and may even carelessly release content containing sensitive words or red-marked words. At present, a manuscript is generally written and checked by the editor, then checked by the main editor, and released after its content is confirmed to be correct.

However, manual checking has a high probability of error and is affected by many factors. For example, if the editor or the main editor is in a poor mental state on the day, the check is less effective, and manuscripts containing wrongly written characters, sensitive words or red-marked words are published on the network.

The network now spreads information quickly and widely. If published news content contains wrongly written characters, sensitive words or red-marked words, it can easily cause an adverse public-opinion impact and harm the reading experience of internet users.

Disclosure of Invention

In order to reduce the probability of wrong words, sensitive words and red mark words of news release contents, the application provides a method, a system and a device for monitoring the safety of the news contents.

In a first aspect, the present application provides a news content security monitoring method, which adopts the following technical scheme:

a news content security monitoring method comprises the following steps:

establishing a monitoring word list, wherein the monitoring word list comprises wrongly written characters, sensitive words and red mark words;

The present invention in a preferred example may be further configured to: the acquiring and identifying the picture in the news content, and the performing relevant processing according to the picture comprises the following steps:

preparing a corresponding preset number of sample pictures for each picture category, and calibrating the picture category corresponding to each sample;

training a recognition model of a preset type by using the sample picture;

acquiring pictures in the news content;

inputting the picture into the recognition model for recognition, and outputting a recognition result;

performing relevant processing on the picture according to the identification result;

if the recognition result is that a person's face appears in the picture, masking (pixelating) the face;

and if the recognition result is that the picture is violent, deleting the picture.

By adopting this technical scheme, the recognition model is trained on a number of sample pictures, each labelled with its category. The pictures in the news content are then input into the recognition model to determine their category. If a picture shows a person's face, the face is masked (pixelated), which prevents the news content from infringing portrait rights; if the picture is violent, it is deleted, which prevents the editor from carelessly publishing a violent picture to the network and avoids adverse effects.
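The handling of a recognition result described above can be sketched as a small dispatch function. This is only an illustration under stated assumptions: `classify` and `mask_faces` are hypothetical callables standing in for the trained recognition model and the face-masking step, and the label strings `"violent"` and `"face"` are invented for the example.

```python
def process_picture(picture, classify, mask_faces):
    """Apply the related processing for one picture.

    classify   -- hypothetical stand-in for the trained recognition model;
                  returns a category label for the picture.
    mask_faces -- hypothetical stand-in for the face-masking step.
    Returns None when the picture must be deleted.
    """
    label = classify(picture)
    if label == "violent":
        return None                 # violent pictures are deleted
    if label == "face":
        return mask_faces(picture)  # faces are masked before release
    return picture                  # other categories pass through unchanged
```

Passing the model and the masking step in as arguments keeps the sketch independent of any particular recognition library.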

The present invention in a preferred example may be further configured to: the acquiring and identifying the picture in the news content and performing related processing according to the picture further comprise:

acquiring the picture;

comparing the picture with a picture in a cloud server;

acquiring all websites recorded in the cloud server that contain the same picture;

acquiring the picture release dates of all websites, and selecting the author of the website picture with the earliest release date;

and adding a note below the picture in the news content stating that the picture is reposted from that author.

By adopting this technical scheme, the pictures in the news content are obtained and the same pictures are searched for in the cloud server, so that all websites that have published the same picture are found. The website with the earliest picture release date is selected, and its picture is judged to be the source file. A note stating that the picture is reposted from the original author is then automatically added below the picture in the news content, which avoids infringement after the news content is released.
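The selection of the earliest publisher can be sketched as follows. This is a minimal illustration, assuming the cloud-server lookup has already returned the matching websites as `(site, author, release_date)` tuples; that tuple layout, the function names and the wording of the note are assumptions made for the example.

```python
from datetime import date


def earliest_source(matches):
    """Return the match with the earliest release date, or None if there are none.

    matches -- list of (site, author, release_date) tuples (assumed layout).
    """
    if not matches:
        return None
    return min(matches, key=lambda m: m[2])


def attribution(matches):
    """Build the note placed below the picture, or None when no source is found."""
    source = earliest_source(matches)
    return None if source is None else f"Reposted from {source[1]}"
```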

The present invention in a preferred example may be further configured to: the acquiring and identifying the video in the news content and performing relevant processing according to the video further comprise:

acquiring all frame images of the video, and identifying corresponding pixel values of the frame images;

acquiring, in the cloud server, the websites whose videos contain all of the scene frames;

acquiring the video release date of the website, and selecting the author of the website video with the earliest release date;

and adding a note below the video in the news content stating that the video is reposted from that author.

By adopting this technical scheme, all frame images of the video in the news content are obtained, and the first frame of each scene change is found by comparing the pixel values of every two adjacent frames; this frame is defined as a scene frame. All videos in the cloud server are then matched against the scene frames, and the videos that contain all of the scene frames are found. Among the identical videos found, the website that released the video earliest is obtained and its video is judged to be the source file. A note stating that the video is reposted from that website's author is automatically added below the video in the news content, which avoids infringement after the news content is released.
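Scene-frame extraction by adjacent-frame comparison might look like the sketch below. The text only says that the "difference value of the corresponding pixel values" is compared with a preset value; using the mean absolute difference over all pixels is an assumption made here, as are the function name and the representation of a frame as a flat, non-empty list of pixel values.

```python
def scene_frames(frames, threshold):
    """Return the indices of frames that start a new scene.

    frames    -- list of equally sized, non-empty frames, each a flat list
                 of pixel values (assumed representation).
    threshold -- the preset value; a frame counts as a scene frame when its
                 mean absolute pixel difference from the previous frame
                 exceeds it (the choice of metric is an assumption).
    """
    indices = []
    for i in range(1, len(frames)):
        prev, cur = frames[i - 1], frames[i]
        diff = sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        if diff > threshold:
            indices.append(i)  # keep the later frame of the adjacent pair
    return indices
```

The resulting scene frames would then serve as the fingerprint used to look up videos in the cloud server.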

The present invention in a preferred example may be further configured to: the sending of the modified news content to the main editing end includes:

the main editing end receives the modified news content, so that the main editing end carries out manual inspection on the modified news content through the main editing end;

and sending the marked news content to the editing end so that the editor can modify the marked news content.

By adopting this technical scheme, after the news content is modified, the main editor manually checks the modified news content again, so that words the monitoring missed are caught. If new wrongly written characters, sensitive words or red-marked words appear, the main editor adds them to the monitoring word list. This prevents the situation in which the editor makes the same error again and the server still fails to detect it; the monitoring word list is continuously updated, which further reduces the probability that wrongly written characters, sensitive words or red-marked words appear in the news content.
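The feedback loop that keeps the monitoring word list up to date can be illustrated with a small container. The class name, the category keys and the `supplement` method are all hypothetical; the source does not specify how the list is stored.

```python
class MonitoringList:
    """Hypothetical container for the three categories of monitoring word."""

    CATEGORIES = ("typo", "sensitive", "red_marked")

    def __init__(self):
        self.words = {category: set() for category in self.CATEGORIES}

    def supplement(self, category, word):
        """Add a word found by the main editor.

        Returns True only if the word was not already in the list.
        """
        if category not in self.words:
            raise ValueError(f"unknown category: {category}")
        if word in self.words[category]:
            return False
        self.words[category].add(word)
        return True
```

Storing each category as a set makes a repeated supplement of the same word harmless.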

The present invention in a preferred example may be further configured to: after the editing end finishes modifying the news content, the method further comprises the following steps:

extracting keywords of the news content, and matching them against the news content in the cloud server;

selecting a website with news content with matching degree larger than a preset value, and acquiring the release time of the news content in the website;

By adopting this technical scheme, keywords are extracted from the written news content; the keywords may be words that appear frequently in the title and the body. Similar news is searched for in the cloud server according to the keywords, and its release time is obtained. If the similar news was released much earlier, the news content is no longer timely and there is little point in releasing it again. The editor can archive the manuscript, and if a similar news event occurs later, its content can be retrieved and combined with the new material so that the released news content is fuller.
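A rough sketch of the timeliness judgment follows. The source says only that keywords may be frequent words and that a "matching degree" is compared with a preset value; the word-frequency extraction, the Jaccard overlap used as the matching degree, the minimum word length and all function names are assumptions chosen for illustration.

```python
import re
from collections import Counter
from datetime import datetime, timedelta


def keywords(title, body, top_n=5):
    """Pick the most frequent words of the title and body (words over 3 letters)."""
    words = re.findall(r"\w+", (title + " " + body).lower())
    counts = Counter(w for w in words if len(w) > 3)
    return {w for w, _ in counts.most_common(top_n)}


def match_degree(a, b):
    """Jaccard overlap of two keyword sets (assumed definition of matching degree)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_timely(finished_at, similar_release_times, max_age):
    """True unless some similar item was released more than max_age earlier."""
    return all(finished_at - t <= max_age for t in similar_release_times)
```

In use, only items whose `match_degree` exceeds the preset value would contribute their release times to `is_timely`.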

The present invention in a preferred example may be further configured to: after the editing end finishes modifying the news content, the method further comprises the following steps:

extracting keywords of the modified news contents, and matching the keywords with the news contents with higher search times ranking in the cloud server;

By adopting this technical scheme, the news content with the highest search counts is looked up in the cloud server and matched against the keywords of the modified news content. If they match, the news content is judged to be close to a current hot topic and the manuscript is considered worth greater exposure; the main editor is then prompted to release the news content in a more prominent position so as to increase the click rate.
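The hot-topic check can be sketched in the same spirit. Here the highly searched news items are assumed to arrive as `(query, keyword_set)` pairs ordered from most to least searched; that layout, the `top_k` cut-off and the shape of the returned identifier are illustrative assumptions only.

```python
def high_heat_identifier(article_keywords, ranked_searches, top_k=10):
    """Return a marker dict if the article matches a highly searched item, else None.

    ranked_searches -- list of (query, keyword_set) pairs, most searched first
                       (assumed layout).
    """
    for rank, (query, kws) in enumerate(ranked_searches[:top_k], start=1):
        if article_keywords & kws:  # any shared keyword counts as a match
            return {"high_heat": True, "matched_query": query, "rank": rank}
    return None
```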

In a second aspect, the present application provides a news content security monitoring system, which adopts the following technical solutions:

a news content security monitoring system, comprising:

a creating device for establishing a monitoring word list, wherein the monitoring word list comprises wrongly written characters, sensitive words and red-marked words;

the processing device is used for acquiring and identifying pictures and videos in the news content and performing related processing according to the pictures and the videos;

By adopting this technical scheme, content safety monitoring is performed while the editor writes a news manuscript: the monitoring words in the monitoring word list are called and matched against the text of the manuscript, and if any monitoring words appear, different alarm prompts are issued for the different kinds of monitoring word to remind the editor to modify the manuscript. The monitoring words comprise wrongly written characters, sensitive words and red-marked words. Sensitive words are the names of national leaders, leaders of important national institutions, leaders of provinces and cities and of institutions, certain sensitive events, and the like; red-marked words are words with sensitive political tendencies, violent tendencies, indecent content or uncivil language, and the like. If such words appear in the manuscript, the editor is prompted to modify the news content promptly once they are detected, which prevents a manuscript containing them from being released carelessly and causing an adverse public-opinion impact.

The server also performs related processing on the pictures and videos in the news content so that they do not violate regulations: if a person's face appears, it is masked (pixelated), and if a violent picture or video appears, the server reminds the editor that it must be deleted. The pictures and videos to be released are searched for in the cloud server, the website with the earliest release time is found, and the pictures and videos in the news content are automatically marked as reposted from that website, which avoids infringement.

The timeliness of the news content is then judged: if similar content has already been published and was published much earlier, releasing the news content is of little value, and the manuscript can be archived for later use. Whether the news content matches a hot topic is also checked; if it does, the news content is released in a prominent position so as to increase the click rate. After the editor has modified the manuscript, it is sent to the main editor, who checks it manually to catch any words, pictures or videos that the monitoring missed. This further reduces the occurrence of non-compliant words, pictures or videos in manuscripts, avoids adverse public-opinion impact, and improves the reading experience of readers.

In a third aspect, the news monitoring device provided by the application adopts the following technical scheme:

a news content detection apparatus, comprising:

the processing module is used for acquiring and identifying pictures and videos in the news content and carrying out related processing according to the pictures and the videos;

In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program that can be loaded by a processor and execute any one of the above methods for monitoring news content security.

In summary, the present application includes at least one of the following beneficial technical effects:

1. In this scheme, the server performs safety monitoring on the news content written by the editor: it extracts the text of the news content and matches it against the wrongly written characters, sensitive words and red-marked words in the monitoring word list, and if any of them occur, it issues the corresponding warning so that the editor can make the corresponding modification;

2. In this scheme, the server performs a first judgment on the news content modified by the editor to determine whether the manuscript is timely, and archives the news content for later use if others already released it many days earlier;

3. In this scheme, the server performs a second judgment on the news content modified by the editor to determine whether the manuscript matches a real-time hot topic, and if it does, prompts the main editor to release it in a more prominent position so as to attract readers' interest and increase the click rate.

Drawings

Fig. 1 is a flow chart of the method in a first embodiment of the present application.

Fig. 2 is a schematic diagram of a system in a second embodiment of the present application.

Fig. 3 is a block diagram of the apparatus in a third embodiment of the present application.

Detailed Description

The present application is described in further detail below with reference to figures 1-3.

This embodiment merely explains the present invention and does not limit it. After reading this specification, those skilled in the art may modify the embodiment as needed without making an inventive contribution, and all such modifications are protected by patent law as long as they fall within the scope of the claims of the present invention.

In the embodiments of the invention, sensitive words are the names of national leaders, leaders of important national institutions, leaders and institutional leaders of provinces, cities and counties, certain sensitive events, and the like; red-marked words are words with sensitive political tendencies, violent tendencies, indecent content, uncivil language, and the like.

The first embodiment is as follows:

a news content security monitoring method, referring to fig. 1, includes:

101. Establishing a monitoring word list, wherein the monitoring word list comprises wrongly written characters, sensitive words and red-marked words.

Specifically, a large amount of news text data, novel text data and magazine text data are obtained;

screening out wrongly written characters, sensitive words and red-marked words;

extracting the words and inputting the words into a monitoring word list;

the list of monitoring words is supplemented by the common errors that occur subsequently.

102. Sequentially calling the wrongly written characters, sensitive words and red-marked words in the monitoring word list and matching them against the news content.

Specifically, while the editor writes a news manuscript, the server monitors the news content: it retrieves the wrongly written characters, sensitive words and red-marked words from the monitoring word list and matches them against the news content to detect whether any of them appear in the manuscript.

103. If the news content contains wrongly written characters, sensitive words or red-marked words from the monitoring word list, issuing alarm prompts of different levels according to the monitoring words that appear, so as to prompt the editor to modify the news content.

Specifically, once wrongly written characters, sensitive words or red-marked words are detected in the news content, alarms of different levels are issued. If a wrongly written character is detected, a modification message is sent to the editing end to prompt the editor to correct it. If a sensitive word is detected, it is highlighted in yellow to draw the editor's attention to it, and the editor considers whether to modify it. If a red-marked word is detected, a prohibition message is sent to the editing end to tell the editor that releasing the news content is prohibited. The editing end may be a mobile phone or a computer.

Different alarm prompts are thus given for the different types of monitoring word. When a wrongly written character appears, a modification message is sent to the editing end, and only the character needs to be corrected. When a sensitive word appears, it is highlighted in yellow: on the one hand, the yellow highlight is conspicuous and gives a stronger warning; on the other hand, it prompts the editor to judge whether the sensitive word may be used in the news content. When a red-marked word appears, a prohibition message is sent to the editor to raise the editor's vigilance, indicating that content containing red-marked words is strictly prohibited from release, that the security level of the news content is low, and that it must be changed promptly.
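Steps 102 and 103 can be sketched as a plain substring scan that tags each occurrence with its alert level. This is a minimal illustration rather than the implementation described in the application: the `Alert` enum, the `Hit` record and the function names are invented for the example, and exact substring matching is an assumption about how "matching" is performed.

```python
from dataclasses import dataclass
from enum import Enum


class Alert(Enum):
    MODIFY = "modify"        # wrongly written character: send a modification message
    HIGHLIGHT = "highlight"  # sensitive word: highlight in yellow for the editor to judge
    FORBID = "forbid"        # red-marked word: send a prohibition message


@dataclass(frozen=True)
class Hit:
    word: str
    position: int
    alert: Alert


def scan(text, typos, sensitive, red_marked):
    """Return every monitoring-word occurrence in text, ordered by position."""
    hits = []
    for words, alert in ((typos, Alert.MODIFY),
                         (sensitive, Alert.HIGHLIGHT),
                         (red_marked, Alert.FORBID)):
        for word in words:
            start = text.find(word)
            while start != -1:
                hits.append(Hit(word, start, alert))
                start = text.find(word, start + 1)
    return sorted(hits, key=lambda h: h.position)


def may_publish(hits):
    """Release stays blocked while any red-marked word remains."""
    return all(h.alert is not Alert.FORBID for h in hits)
```

The three categories are scanned in the order the text gives (wrongly written characters, sensitive words, red-marked words), and the positions let the editing end highlight each occurrence.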

104. Acquiring and identifying pictures or videos in the news content, and performing related processing according to the pictures or the videos, wherein the related processing comprises masking (pixelating) or deleting the pictures or the videos according to their categories.

Specifically, a picture or a video in the news content is obtained, the category of the picture or the video is identified, and then relevant processing is carried out according to the category; for example, if the category of the picture or video is violent picture or video, the picture or video is deleted.

Further, preparing a preset number of sample pictures for each picture category, and labeling each sample with its picture category;

training a recognition model of a preset type with the sample pictures;

acquiring the pictures in the news content;

inputting each picture into the recognition model for recognition, and outputting a recognition result;

processing the picture according to the recognition result;

if the recognition result is that a person's face appears in the picture, pixelating the face;

and if the recognition result is that the picture is violent, deleting the picture.

Specifically, the server acquires a large number of sample pictures for each category, here violent pictures and face pictures; the more sample pictures there are, the higher the recognition accuracy tends to be. All sample pictures are labeled with their picture category, either violent picture or face picture. The recognition model is then trained continuously with the sample pictures so that its results keep improving. The server inputs the pictures in the news content into the recognition model, and preferably also inputs each frame of any video in the news content. If a picture or video is judged to be violent, it is deleted; if it is judged to contain a face, the picture or video is extracted and the face region is pixelated.

Specifically, the face region is located by face recognition technology, which determines the position and extent of the face against the background, and that region is then pixelated.
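The category-dependent handling can be sketched as follows. This is only an illustration under stated assumptions: the image is a grayscale picture held as a list of rows, the labels `"violent"` and `"face"` and the face bounding box are taken to come from a recognition model that is not shown, and block averaging is one common way to pixelate a region, not necessarily the one intended by the application.

```python
def pixelate(image, box, block=2):
    """Return a copy of a grayscale image (list of rows) with `box` mosaicked.

    `box` is (top, left, bottom, right), end-exclusive. Every block x block
    tile inside the box is replaced by the integer mean of its pixels.
    """
    top, left, bottom, right = box
    out = [row[:] for row in image]
    for y0 in range(top, bottom, block):
        for x0 in range(left, right, block):
            ys = range(y0, min(y0 + block, bottom))
            xs = range(x0, min(x0 + block, right))
            mean = sum(image[y][x] for y in ys for x in xs) // (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out


def process_picture(image, label, face_box=None):
    """Apply the category-specific handling: delete violent pictures, mask faces."""
    if label == "violent":
        return None  # the picture is removed from the news content
    if label == "face" and face_box is not None:
        return pixelate(image, face_box)
    return image
```

For a video, the same function could be applied to every frame, which matches the preference stated above for feeding each frame to the recognition model.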

The formats of the pictures and the videos are not limited.

Further, acquiring the picture;

comparing the picture with the pictures in a cloud server;

acquiring all websites in the cloud server that contain a picture identical to the picture;

acquiring the picture release dates on all of these websites, and selecting the author of the website picture with the earliest release date;

and annotating below the picture in the news content that it is reposted from that author.

Specifically, the server first obtains a picture in the news content and extracts its features. The algorithm may be a SIFT descriptor, a fingerprint function, a bundling features algorithm, a hash function, or the like; different algorithms may also be designed for different images, for example extracting image features from the local N-th order moments of the image.

The picture features are then encoded, and the image codes in the cloud server serve as a lookup table. If a picture extracted from the news content has a high resolution, it may be down-sampled first, so that feature extraction and encoding are performed at a reduced computational cost.

The encoded value of the picture in the news content is used to compute global or local similarity against the image database of an image search engine. A threshold is set according to the required robustness, pictures whose similarity exceeds the threshold are retained as candidates, and the candidates are then screened to obtain the best-matching pictures, for which a feature detection algorithm may be used.

After the best-matching pictures are selected, the websites hosting them are acquired and the picture release time on each website is captured. The release times are compared, the website that published the picture earliest is selected, and the ID of the author who published the picture on that website is acquired. The author ID is automatically captured and annotated below the picture in the news content to show that the picture is reposted from that author, which avoids infringement after the news is released.
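As a rough illustration of the encode, compare and attribute pipeline, the sketch below uses an average hash, which is one simple instance of the "hash function" option mentioned above. The index layout (`hash`, `published`, `author` keys) and the `max_distance` threshold are assumptions made for the example, not details taken from the application.

```python
from datetime import date


def average_hash(pixels):
    """Encode a small grayscale image (list of rows) as a bit string:
    '1' where a pixel is at or above the image mean, '0' elsewhere."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if p >= mean else "0" for p in flat)


def hamming(a, b):
    """Number of differing bits between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))


def earliest_source(query_pixels, index, max_distance=1):
    """Return the author of the earliest-published indexed picture whose
    hash is within max_distance bits of the query, or None if none match."""
    query_hash = average_hash(query_pixels)
    matches = [e for e in index if hamming(query_hash, e["hash"]) <= max_distance]
    if not matches:
        return None
    return min(matches, key=lambda e: e["published"])["author"]


# Hypothetical index entries standing in for the cloud server's lookup table.
INDEX = [
    {"hash": "0110", "published": date(2020, 5, 1), "author": "site-b"},
    {"hash": "0111", "published": date(2019, 3, 2), "author": "site-a"},
    {"hash": "1001", "published": date(2018, 1, 1), "author": "site-c"},
]
```

In practice the picture would be down-sampled to a fixed small size (for example 8 x 8) before hashing, which corresponds to the down-sampling step described above; a smaller `max_distance` makes matching stricter.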

Further, acquiring all frame images of the video, and identifying the pixel values of each frame image;

acquiring, in the cloud server, the websites hosting a video that contains all of the scene frames;

acquiring the video release dates on these websites, and selecting the author of the website video with the earliest release date;

and annotating below the video in the news content that it is reposted from that author.

Specifically, the video in the news content is obtained and split into its individual frames, and the pixel values of each frame are obtained. The pixel values of every pair of adjacent frames are compared; if the difference between the two frames is greater than a preset value, the later frame of the pair is extracted as a scene frame. A scene frame therefore represents the first frame after each scene change in the video.
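A minimal sketch of scene-frame extraction is given below. The application only says that the pixel values of adjacent frames are compared against a preset value; the mean absolute per-pixel difference used here is an assumed metric, and frames are modelled as equally sized grayscale images held as lists of rows.

```python
def frame_difference(a, b):
    """Mean absolute per-pixel difference between two equally sized frames."""
    total = sum(abs(p - q) for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b))
    return total / (len(a) * len(a[0]))


def scene_frame_indices(frames, threshold):
    """Indices of frames that differ from their predecessor by more than
    `threshold`, i.e. the first frame after each scene change."""
    return [
        i
        for i in range(1, len(frames))
        if frame_difference(frames[i - 1], frames[i]) > threshold
    ]
```

The threshold trades sensitivity for stability: a low value flags camera motion as scene changes, a high value may miss gradual transitions.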

Features are extracted from all scene frames and encoded in the same way as for pictures, so the details are not repeated here. The scene frames of the video are ordered as a first scene frame, a second scene frame, a third scene frame, and so on up to an nth scene frame, and are matched one after another against all videos in the cloud server. Each match narrows the set of candidates until only the videos identical to the video in the news content remain. The websites hosting these videos are acquired, the video release time on each website is captured, the website that released the video earliest is selected, and the ID of the author who published the video on that website is acquired. The author ID is automatically captured and annotated below the video in the news content to show that the video is reposted from that author, which avoids infringement after the news is released.
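The frame-by-frame screening can be sketched as a progressive set intersection. The catalog layout (a mapping from a video ID to the set of frame codes it contains) is a hypothetical stand-in for the cloud server's index, and exact code equality is assumed for simplicity where a real system would more likely apply a similarity threshold.

```python
def match_video(scene_codes, catalog):
    """Narrow the candidate set one scene frame at a time.

    scene_codes: encoded scene frames of the query video, in order.
    catalog: mapping from video id to the set of frame codes it contains.
    Returns the ids of videos that contain every scene frame.
    """
    candidates = set(catalog)
    for code in scene_codes:
        candidates = {vid for vid in candidates if code in catalog[vid]}
        if not candidates:
            break  # no video can match once the set is empty
    return candidates
```

Because each pass only examines the surviving candidates, later scene frames are checked against far fewer videos than the first one.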

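105. Extract keywords of the news content, and match them against the news content in the cloud server;

select a website with news content whose matching degree is greater than a preset value, and acquire the release time of that news content on the website;

acquire the time at which the editor terminal finished modifying the news content, compare it with the release time, and if the difference is greater than a preset value, judge that the modified news content lacks timeliness.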

Specifically, the server extracts keywords from the news content; a keyword may be the title or the most frequently occurring words in the news content. The keywords are matched against the news content in the cloud server.

During matching, a set of websites that publish news actively may be selected in advance, and matching is then performed only on these websites rather than on all news websites.

If news content consistent with the keywords is found, its release time on each website is acquired and the earliest release time is selected. This time is compared with the time at which the editor intends to release the news content; if the difference is greater than a preset value, the news content is judged to lack timeliness.

When the news content lacks timeliness, there is little value in releasing it again. The editor may archive the news manuscript; if a similar news event occurs later, the manuscript can be brought out again and combined with the previously released content to enrich the new report.
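The keyword extraction and timeliness check can be sketched as below. The tokenization (lowercase Latin words of at least four letters) is an assumption made so the example is self-contained; Chinese text would need a word segmenter instead. The `max_age` parameter plays the role of the preset value.

```python
import re
from collections import Counter
from datetime import datetime, timedelta


def top_keywords(text, k=3):
    """Return the k most frequent words of at least four letters, lowercased."""
    words = re.findall(r"[a-z]{4,}", text.lower())
    return [word for word, _ in Counter(words).most_common(k)]


def is_timely(planned_release, earlier_releases, max_age):
    """Return False when the earliest matching report is more than
    `max_age` older than the planned release time."""
    if not earlier_releases:
        return True  # nothing similar has been published yet
    return planned_release - min(earlier_releases) <= max_age
```

A manuscript for which `is_timely` returns False would be archived rather than released, as described above.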

106. Extract the keywords of the modified news content, and match them against the news content that ranks high by search count in the cloud server.

If the modified news content matches highly ranked news content, generate a high-popularity identifier and send it to the chief editor terminal, so that the chief editor can release the modified news content at a more prominent position on the display terminal.

Specifically, keywords are extracted from the news content in the same way as in step 105, so the details are not repeated here. The keywords are matched against the news content that ranks high by search count in the server, specifically against the hot-search lists of platforms such as Weibo, Baidu and Google. If the news content matches a hot search, it is judged to be hot news, and a high-popularity identifier is generated and sent to the chief editor terminal. On seeing the identifier, the chief editor releases the news content at a more prominent position on the display terminal. The display terminal may be a mobile phone or a computer, and the prominent position may be the top of its interface.
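A minimal sketch of the hot-search check follows. The hot-search list is assumed to be already fetched and ordered by rank; how it is obtained from each platform is not covered by the application and not shown here. The message fields are hypothetical.

```python
def is_hot(keywords, hot_searches, top_n=10):
    """Return True when any keyword occurs in one of the top_n hot-search entries."""
    top = [entry.lower() for entry in hot_searches[:top_n]]
    return any(keyword.lower() in entry for keyword in keywords for entry in top)


def route(article_id, keywords, hot_searches):
    """Build the message sent to the chief editor terminal."""
    hot = is_hot(keywords, hot_searches)
    return {
        "article": article_id,
        "hot": hot,  # the high-popularity identifier
        "suggested_position": "top" if hot else "default",
    }
```

The final placement decision stays with the chief editor; the identifier is only a suggestion attached to the manuscript.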

107. Send the modified news content to the chief editor terminal to prompt the chief editor to check the news content manually, and release the modified news content after it has been confirmed to be free of errors.

Specifically, after the editor has modified the news manuscript, it is sent to the chief editor terminal so that the chief editor can check it manually. This catches non-compliant words, pictures or videos that escaped automatic monitoring and further reduces their occurrence in the news manuscript. The chief editor decides whether the news content is released, which further avoids an adverse public-opinion impact and further improves the reading experience of readers. The chief editor terminal may be a mobile phone or a computer.

Further, the chief editor terminal receives the modified news content, so that the chief editor can check it manually through the chief editor terminal;

and the marked news content is sent to the editor terminal so that the editor can modify it.

Specifically, if the chief editor finds new wrongly written characters, sensitive words or red-marked words after the news content has been modified, these are added to the monitoring word list. This prevents the situation in which the editor repeats the same error and the server still fails to detect it. The monitoring word list is thus continuously updated and improved, the number of words that escape monitoring is reduced, and the probability that wrongly written characters, sensitive words or red-marked words appear in the news content is further reduced.

The implementation principle of the embodiment is as follows:

when the editor has written a news manuscript, content security monitoring is performed on it: the monitoring words in the monitoring word list are retrieved and matched against the text of the manuscript, and if monitoring words appear in the manuscript, different alarm prompts are issued for the different monitoring words to remind the editor to modify the manuscript;

the monitoring words comprise wrongly written characters, sensitive words and red-marked words; once they are detected, the editor is prompted to modify the news content in time, which avoids the adverse public-opinion impact of carelessly releasing a manuscript containing such words; the server also processes the pictures and videos in the news content so that they do not violate regulations: if a person's face appears, it is pixelated, and if violent pictures or videos appear, the server reminds the editor that they need to be deleted;

the pictures and videos to be released in the news content are searched for in the cloud server, the website with the earliest release time is found, and the pictures and videos in the news content are automatically annotated as reposted from that website, which avoids infringement;

the timeliness of the news content is judged: if similar content has already been published and its release time is much earlier, there is little value in releasing the news content, and the news manuscript can be archived for later use;

whether the news content matches a current hot topic is checked and the content is handled accordingly: if it matches, the news content is released at a prominent position to improve the click rate; once the editor has finished the news manuscript at the editor terminal, it is sent to the chief editor terminal so that the chief editor checks the news content manually, which catches non-compliant words, pictures or videos that escaped automatic monitoring, further reduces their occurrence in the news manuscript, further avoids an adverse public-opinion impact, and further improves the reading experience of readers.

Example two:

a news content security monitoring system, referring to fig. 2, comprising:

The creating device 201 is used for creating a monitoring word list, wherein the monitoring word list comprises wrongly written characters, sensitive words and red-marked words.

The matching device 202 is used for sequentially retrieving the wrongly written characters, sensitive words and red-marked words in the monitoring word list and matching them against the news content.

The alarm device 203 is used for issuing alarm prompts of different levels according to the monitoring words that appear, if the news content contains wrongly written characters, sensitive words or red-marked words in the monitoring word list, so as to prompt the editor to modify the news content.

The processing device 204 is used for acquiring and identifying the pictures and videos in the news content and processing them accordingly.

Specifically, preparing a preset number of sample pictures for each picture category, and labeling each sample with its picture category;

training a recognition model of a preset type with the sample pictures;

acquiring the pictures in the news content;

inputting each picture into the recognition model for recognition, and outputting a recognition result;

processing the picture according to the recognition result;

if the recognition result is that a person's face appears in the picture, pixelating the face;

and if the recognition result is that the picture is violent, deleting the picture.

Further, acquiring the picture;

comparing the picture with the pictures in a cloud server;

acquiring all websites in the cloud server that contain a picture identical to the picture;

acquiring the picture release dates on all of these websites, and selecting the author of the website picture with the earliest release date;

and annotating below the picture in the news content that it is reposted from that author.

Further, acquiring all frame images of the video, and identifying the pixel values of each frame image;

acquiring, in the cloud server, the websites hosting a video that contains all of the extracted frames;

acquiring the video release dates on these websites, and selecting the author of the website video with the earliest release date;

and annotating below the video in the news content that it is reposted from that author.

The first judgment device 205 is used for extracting keywords of the news content and matching them against the news content in the cloud server; selecting a website with news content whose matching degree is greater than a preset value, and acquiring the release time of that news content on the website; and acquiring the time at which the editor terminal finished modifying the news content, comparing it with the release time, and if the difference is greater than a preset value, judging that the modified news content lacks timeliness.

The second judgment device 206 is used for extracting the keywords of the modified news content and matching them against the news content that ranks high by search count in the cloud server; and if the modified news content matches highly ranked news content, generating a high-popularity identifier and sending it to the chief editor terminal, so that the chief editor can release the modified news content at a more prominent position on the display terminal.

The confirming device 207 is used for sending the modified news content to the chief editor terminal to prompt the chief editor to check the news content manually, the modified news content being released after it has been confirmed to be free of errors.

Specifically, the chief editor terminal receives the modified news content, so that the chief editor can check it manually through the chief editor terminal;

and the marked news content is sent to the editor terminal so that the editor can modify it.

Example three:

a news content security monitoring apparatus, referring to fig. 3, comprising:

The creating module 301 creates a monitoring word list, wherein the monitoring word list comprises wrongly written characters, sensitive words and red-marked words.

The matching module 302 sequentially retrieves the wrongly written characters, sensitive words and red-marked words in the monitoring word list and matches them against the news content.

The alarm module 303 issues alarm prompts of different levels according to the monitoring words that appear, if the news content contains wrongly written characters, sensitive words or red-marked words in the monitoring word list, so as to prompt the editor to modify the news content.

The processing module 304 acquires and identifies the pictures and videos in the news content and processes them accordingly.

Specifically, preparing a preset number of sample pictures for each picture category, and labeling each sample with its picture category;

training a recognition model of a preset type with the sample pictures;

acquiring the pictures in the news content;

inputting each picture into the recognition model for recognition, and outputting a recognition result;

processing the picture according to the recognition result;

if the recognition result is that a person's face appears in the picture, pixelating the face;

and if the recognition result is that the picture is violent, deleting the picture.

Further, acquiring the picture;

comparing the picture with the pictures in a cloud server;

acquiring all websites in the cloud server that contain a picture identical to the picture;

acquiring the picture release dates on all of these websites, and selecting the author of the website picture with the earliest release date;

and annotating below the picture in the news content that it is reposted from that author.

Further, acquiring all frame images of the video, and identifying the pixel values of each frame image;

acquiring, in the cloud server, the websites hosting a video that contains all of the extracted frames;

acquiring the video release dates on these websites, and selecting the author of the website video with the earliest release date;

and annotating below the video in the news content that it is reposted from that author.

The first judging module 305 extracts keywords of the news content and matches them against the news content in the cloud server; selects a website with news content whose matching degree is greater than a preset value, and acquires the release time of that news content on the website; and acquires the time at which the editor terminal finished modifying the news content, compares it with the release time, and if the difference is greater than a preset value, judges that the modified news content lacks timeliness.

The second judging module 306 extracts the keywords of the modified news content and matches them against the news content that ranks high by search count in the cloud server; and if the modified news content matches highly ranked news content, generates a high-popularity identifier and sends it to the chief editor terminal, so that the chief editor can release the modified news content at a more prominent position on the display terminal.

The confirming module 307 sends the modified news content to the chief editor terminal to prompt the chief editor to check the news content manually, and the modified news content is released after it has been confirmed to be free of errors.

Specifically, the chief editor terminal receives the modified news content, so that the chief editor can check it manually through the chief editor terminal;

and the marked news content is sent to the editor terminal so that the editor can modify it.

It should be noted that, when the news content security monitoring device and system provided by the above embodiments execute the news content security monitoring method, the division into the above functional modules is only an example. In practical applications, the above functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the method, device and system embodiments for news content security monitoring belong to the same concept; their specific implementation is described in the method embodiment and is not repeated here.

It will be appreciated that the memory in the embodiments of the subject application can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory.

The non-volatile memory may be ROM, Programmable Read Only Memory (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), or flash memory.

Volatile memory can be RAM, which acts as external cache memory. There are many different types of RAM, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DR RAM).

The processor mentioned in any of the above may be a CPU, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the programs of the above-mentioned news content security monitoring method. The processing module and the storage module may be decoupled, and are respectively disposed on different physical devices, and are connected in a wired or wireless manner to implement respective functions of the processing module and the storage module, so as to support the system chip to implement various functions in the foregoing embodiments. Alternatively, the processing module and the memory may be coupled to the same device.

The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, or the part of it that contributes over the prior art, may be embodied in the form of a software product stored in a computer-readable storage medium, which includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned computer-readable storage media comprise various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.
