Junk phone data processing method and system

Document No.: 1878433 Published: 2021-11-23

Reading note: this technology, Junk phone data processing method and system (垃圾电话数据处理方法及系统), was designed by Chen Shuai (陈帅) on 2021-08-20. Main content: The invention discloses a junk call data processing method and system, comprising: defining number risk levels and risk type labels; users marking suspicious numbers according to the risk type labels; the system quickly storing the user-marked number information in its original record form, preprocessing it, and storing the processed data in database software; calculating the risk weight of each number according to the risk level definition, and then saving or updating the number's risk weight information in the database software; and querying the risk level of a telephone number online while generating an encrypted offline database of high-risk numbers, so that users can also query risk levels offline. The method and system interface directly with users for data processing: the system updates every day from the data users report, and daily updates reduce the volume of data handled in a single update, which improves update efficiency.

1. A junk call data processing method, comprising:

acquiring and parsing number information marked by a user, the number information comprising a number, a risk label code, and a timestamp;

storing the parsed user-marked number information in database software;

for each item of user-marked number information, calculating a risk weight value for the number according to the risk label code in the number information and a preset risk level definition;

if the number is marked for the first time, recording the risk weight value of the number in the database software;

if the number is already recorded in the database software, updating the risk weight value of the number in the database software;

and querying the risk level of the number online according to the latest risk weight value, and generating an encrypted offline database of high-risk numbers so that the user can query the risk level of a telephone number offline.

2. The junk call data processing method according to claim 1, wherein the user's marking requests are stored asynchronously: each request is automatically saved to a log file by the logging system of the web server software Nginx, and the log file is uploaded to a cloud platform; the user-marked number information is obtained from the log files stored on the cloud platform;

before the user-marked number information is saved to the log file, it is encrypted with the AES encryption algorithm.

3. The junk call data processing method according to claim 1, wherein, when the risk weight of a number is calculated, the database table is queried page by page using multiple processes, the risk weight of the number is accumulated according to the score of each label, and finally the number is parsed into a country calling code and number format and stored in the data storage module together with its risk weight.

4. The junk call data processing method according to claim 3, wherein the accumulation is performed by storing the numbers in a set of key-value pairs mapping number to weight value, and finally parsing the numbers in the set into the country calling code and number format.

5. The junk call data processing method according to claim 1, wherein, when the user-marked number information is parsed, key information in the log file is filtered line by line through regular expression matching and finally written out as a JSON file.

6. The junk call data processing method according to claim 1, further comprising: the offline database module groups high-risk numbers by country, imports them into an encrypted database file, compares the result with the previously stored offline database file, and generates a differential file.

7. The junk call data processing method according to claim 1, wherein the risk labels comprise 6 types: promotion, fraud, life service, other, normal number, and one ring.

8. The junk call data processing method according to claim 1, wherein the risk levels comprise 3 levels: high risk, medium risk, and safe.

9. The junk call data processing method according to claim 1, wherein, when the risk weight value of a number is updated, the score of the label is added to or subtracted from the risk weight of the phone number.

10. A junk call data processing system, comprising:

the network transmission module is used for encrypting and uploading the suspicious number information marked by the user to the log recording module;

the log recording module is used for storing suspicious number information;

the cloud platform uploading module is used for compressing the log file stored with the suspicious number information at regular time every day and then uploading the compressed log file to the cloud platform;

the cloud platform downloading module is used for downloading the compressed log file from the cloud platform at regular time every day and then decompressing the log file;

the log preprocessing module is used for processing the decompressed log file to generate a JSON format file;

the weight calculation module is used for calculating the risk weight of the number according to the grading rule in the label module;

the tag module is used for storing a risk type tag and a risk grade;

the data storage module is used for storing the number and the risk weight value corresponding to the number;

the off-line database module is used for importing high-risk numbers into the encrypted database file according to the country classification, comparing the high-risk numbers with the off-line database file stored in the past and generating a differential file;

the interface module is used by the user to query risk numbers;

and the disaster recovery module is used for recovering the missing data after interruption.

Technical Field

The invention relates to the technical field of data processing, in particular to a junk phone data processing method and system.

Background

Junk calls are promotional, fraudulent, or other calls placed to users who are unwilling or refuse to answer them. The proliferation of junk calls seriously affects people's daily lives, the reputation of operators, and even social stability.

Although many junk call processing methods and systems exist, existing junk call data collection relies mainly on periodic batch updates of old data sets, which falls short in both efficiency and timeliness. Batch updates slow the operation of the whole system, and periodic processing leaves information out of date and junk call data stale, so people are still harassed by new junk calls in daily life.

Disclosure of Invention

Therefore, the embodiments of the invention provide a junk call data processing method and system to solve the problems of low operating efficiency and stale junk call data caused by periodic batch updates of junk call data in the prior art.

In order to achieve the above object, the embodiments of the present invention provide the following technical solutions:

In a first aspect, a junk call data processing method includes:

acquiring and analyzing number information marked by a user; the number information marked by the user comprises a number, a risk tag code and a timestamp;

storing the analyzed number information of the user mark into database software;

for each number information marked by the user, calculating a risk weight value of the number according to a risk label code and a preset risk level definition in the number information;

if the number is marked for the first time, recording the risk weight value of the number in database software;

if the number is recorded in the database software, updating the risk weight value of the number in the database software;

and inquiring the risk level of the number on line according to the latest risk weight value, and generating an encrypted high-risk number offline database for the user to perform offline inquiry on the risk level of the telephone number.

Further, for a marking request of a user, a storage mode of asynchronous processing is adopted, the marking request is automatically stored as a log file through a log recording system of website server software Nginx, and the log file is uploaded to a cloud-end platform; the number information marked by the user comes from a log file stored by the cloud platform;

before the number information marked by the user is automatically stored as a log file, the number information marked by the user is encrypted by using an AES encryption algorithm.

Furthermore, when the risk weight of the number is calculated, a database table is inquired in a multi-process paging mode, the risk weight of the number is accumulated and calculated according to the score value of each label, finally the number is analyzed into a national area code and number format, and the number and the risk weight corresponding to the number are stored into the data storage module.

Further, the accumulation calculation method is that the numbers are stored in a set with number-weight values as key value pairs, and finally the numbers in the set are analyzed into the formats of the country area code and the number.

Further, when the number information marked by the user is analyzed, the key information in the log file is filtered according to lines through regular expression matching, and finally the file in the JSON format is generated through processing.

Further, the method also comprises the following steps: and the offline database module classifies according to the country, imports the high-risk number into the encrypted database file, compares the high-risk number with the previously stored offline database file, and generates a differential file.

Further, the risk labels comprise 6 types: promotion, fraud, life service, other, normal number, and one ring.

Further, the risk levels comprise 3 levels: high risk, medium risk, and safe.

Further, when the risk weight value of the number is updated, the risk weight of the phone number is accumulated or subtracted according to the score value of the tag.

In a second aspect, a junk call data processing system includes:

the network transmission module is used for encrypting and uploading the suspicious number information marked by the user to the log recording module;

the log recording module is used for storing suspicious number information;

the cloud platform uploading module is used for compressing the log file stored with the suspicious number information at regular time every day and then uploading the compressed log file to the cloud platform;

the cloud platform downloading module is used for downloading the compressed log file from the cloud platform at regular time every day and then decompressing the log file;

the log preprocessing module is used for processing the decompressed log file to generate a JSON format file;

the weight calculation module is used for calculating the risk weight of the number according to the grading rule in the label module;

the tag module is used for storing a risk type tag and a risk grade;

the data storage module is used for storing the number and the risk weight value corresponding to the number;

the off-line database module is used for importing high-risk numbers into the encrypted database file according to the country classification, comparing the high-risk numbers with the off-line database file stored in the past and generating a differential file;

the interface module is used by the user to query risk numbers;

and the disaster recovery module is used for recovering the missing data after interruption.

The invention has at least the following beneficial effects. In the junk call data processing method and system provided by the invention, a user marks suspicious numbers according to risk type labels; the log recording module quickly stores the user-marked number information in its original record form; the cloud platform downloading module retrieves the user-marked number data, the log preprocessing module converts the number information from the log file into a JSON file that is easy to parse, and the parsed data is stored in the data storage module; the risk weight of each number is calculated according to the risk level definition, and the number's risk weight information in the data storage module is then saved or updated; and the risk level of a telephone number can be queried online, while an encrypted offline database of high-risk numbers is generated so that users can also query risk levels offline. The method and system can self-correct junk call data. They process data daily and interface directly with users: the system updates every day from the data users report, and because updates are daily, each update handles less data, which improves update efficiency; the direct connection to users also makes data updates more timely.

Drawings

In order to more clearly illustrate the prior art and the present invention, the drawings which are needed to be used in the description of the prior art and the embodiments of the present invention will be briefly described. It should be apparent that the drawings in the following description are merely exemplary, and that other drawings may be derived from the provided drawings by those of ordinary skill in the art without inventive effort.

The structures, proportions, sizes, and other dimensions shown in the specification are for illustration only and do not limit the scope of the present invention, which is defined by the claims. Any structural modification, change in proportion, or adjustment in size that does not affect the efficacy or objectives of the invention still falls within the scope covered by the technical content disclosed by the invention.

FIG. 1 is a flow chart provided by the present invention;

FIG. 2 is another flow chart provided by the present invention;

FIG. 3 is a system configuration diagram provided by the present invention.

Description of reference numerals:

1-a network transmission module; 2-a log module; 21-logging module; 22-cloud platform upload module; 3-disaster recovery module; 4-cloud platform download module; 5-a log preprocessing module; 6-a weight calculation module; 7-a label module; 8-a data storage module; 81-offline database module; 9-interface module.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

In the description of the present invention, "a plurality" means two or more unless otherwise specified. The terms "first," "second," "third," "fourth," and the like in the description and claims of the present invention and in the above-described drawings (if any) are intended to distinguish between referenced items. For a scheme with a time sequence flow, the term expression does not need to be understood as describing a specific sequence or a sequence order, and for a scheme of a device structure, the term expression does not have distinction of importance degree, position relation and the like.

Furthermore, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements specifically listed, but may include other steps or elements not expressly listed that are inherent to such process, method, article, or apparatus or that are added to a further optimization scheme based on the present inventive concept.

Referring to fig. 1 to 2, fig. 1 illustrates the present invention with a spam call data processing system as an execution main body, and fig. 2 illustrates the present invention with a client, a third party, and a spam call data processing system as main bodies.

A junk phone data processing method comprises the following steps:

the user marks the suspicious number according to the risk type label;

The risk type labels and risk levels reside in the label module, a core module of the system that determines the criteria for judging junk telephone numbers. The criteria are divided as follows:

The risk type labels comprise 6 user-selectable types: Telemarketer (promotion), Robocall (robotic call), Scam (fraud), Life service, Other, and Normal call (normal number); plus 1 automatically recorded risk type: One Ring. The scoring criteria for each type are as follows:

Table 1: Risk type scoring criteria

| Telemarketer | Robocall | Scam | Life service | Other | Normal call | One Ring |
| --- | --- | --- | --- | --- | --- | --- |
| 20 | 30 | 40 | -5 | 0 | -10 | 15 |

The risk levels are divided into 3 levels: High Risk, Medium Risk, and Low Risk (safe). The weight value range for each risk level is as follows:

Table 2: Risk level weight values

| High Risk | Medium Risk | Low Risk |
| --- | --- | --- |
| [60, +∞) | [30, 60) | (-∞, 30) |

When a user marks a risk type label on a telephone number, a weight calculation module of the system can accumulate or subtract the risk weight of the telephone number according to the score value of the label; if one number is marked by mistake, the user can mark the number with the risk label again, so that the weight calculation module can self-correct the risk number data in the data storage module.
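As an illustration only, the scoring and classification rules of Tables 1 and 2 can be sketched as follows. The function names `apply_tag` and `risk_level` are hypothetical and do not come from the source; the scores and thresholds are those in the two tables.

```python
# Scores per risk type label, taken from Table 1.
TAG_SCORES = {
    "Telemarketer": 20,
    "Robocall": 30,
    "Scam": 40,
    "Life service": -5,
    "Other": 0,
    "Normal call": -10,
    "One Ring": 15,
}


def apply_tag(weight: int, tag: str) -> int:
    """Return the number's weight after one mark with the given label.

    Negative scores lower the weight, which is how a later mark can
    correct an earlier mistaken one.
    """
    return weight + TAG_SCORES[tag]


def risk_level(weight: int) -> str:
    """Map a weight to a risk level using the ranges in Table 2."""
    if weight >= 60:
        return "High Risk"
    if weight >= 30:
        return "Medium Risk"
    return "Low Risk"
```

For example, two Scam marks give a weight of 80 (High Risk); three later Normal call marks bring it down to 50 (Medium Risk), which is the self-correction the paragraph above describes.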

The user client encrypts and uploads the marked suspicious number information to a log recording module through a network transmission module for storage;

when a user marks a number, the system sends the number, the risk label code, the timestamp and other information to the log module through the network transmission module, and the network transmission module encrypts the transmitted information through an Advanced Encryption Standard (AES) Encryption algorithm to ensure the security of data.

The cloud platform uploading module is used for compressing the log file stored with the suspicious number information at regular time every day and then uploading the compressed log file to the cloud end platform;

Specifically, to process users' network requests more quickly, the system stores them asynchronously, that is, it records first and processes later. The user-marked number information is uploaded to the log recording module of the system, which automatically saves each request to a log file through the logging system of the web server software Nginx, so the system's server side does no processing at this stage. The cloud platform uploading module then compresses the log files at a scheduled time every day and uploads them to the cloud platform.

The cloud platform downloading module downloads the compressed log file from the cloud platform regularly every day and then decompresses the log file;

the log preprocessing module processes the decompressed log file to generate a JSON format file;

the log file record form obtained after decompression is inconvenient for subsequent processing, and a log preprocessing module of the system filters key information in the log file according to lines through regular expression matching and converts the key information into a JSON format file. Then the system will keep a copy of the log and JSON file for subsequent checking and verification.
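A minimal sketch of this preprocessing step is given below. The source does not specify the Nginx log format or the request parameters, so the line layout, the parameter names `phone`, `tag`, and `ts`, and the function names are all assumptions; the sketch also assumes any AES-encrypted payload has already been decrypted.

```python
import json
import re

# Hypothetical request layout: the source does not give the real log format.
LINE_RE = re.compile(r"phone=(?P<phone>\+?\d+)&tag=(?P<tag>\w+)&ts=(?P<ts>\d+)")


def parse_log_lines(lines):
    """Filter log lines one by one, keeping only the key fields."""
    records = []
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # line carries no marking request; skip it
        records.append(
            {
                "phone": match["phone"],
                "tag": match["tag"],
                "ts": int(match["ts"]),
            }
        )
    return records


def to_json(lines) -> str:
    """Serialize the filtered records as a JSON array."""
    return json.dumps(parse_log_lines(lines))
```

Lines that do not match the pattern are dropped rather than raising an error, which suits a daily batch over raw access logs.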

The weight calculation module calculates the risk weight of the number according to the scoring rule in the tag module;

and storing the risk weight value to a data storage module.

After obtaining the data produced by the log preprocessing module, the weight calculation module of the system calculates the risk weight of each number according to the scoring rules in the label module. To increase processing speed, multiple processes query the database table page by page. The risk weight of each number is then accumulated according to the score of each label; the accumulation stores the numbers in a set of key-value pairs mapping number to weight value. Finally, the numbers in the set are parsed into a country calling code and number format and stored in the data storage module together with their risk weight values.

The database tables used by the multi-process paged query are as follows:
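The accumulation described above can be sketched as follows, under stated assumptions. Pages are processed sequentially here for simplicity, whereas the source assigns pages to multiple processes; the three-entry calling-code list and all function names are hypothetical, and a real system would need a complete numbering-plan table.

```python
# Scores per risk type label, taken from Table 1.
TAG_SCORES = {
    "Telemarketer": 20, "Robocall": 30, "Scam": 40, "Life service": -5,
    "Other": 0, "Normal call": -10, "One Ring": 15,
}

# Hypothetical, deliberately tiny list of country calling codes.
CALLING_CODES = ("1", "44", "86")


def accumulate_page(rows):
    """Sum label scores per number for one page of (number, label) rows."""
    weights = {}
    for phone, tag in rows:
        weights[phone] = weights.get(phone, 0) + TAG_SCORES[tag]
    return weights


def merge_pages(pages):
    """Merge per-page results into one number -> weight mapping."""
    total = {}
    for page in pages:
        for phone, weight in accumulate_page(page).items():
            total[phone] = total.get(phone, 0) + weight
    return total


def split_number(raw):
    """Split '+<calling code><number>' by longest matching calling code."""
    digits = raw.lstrip("+")
    for cc in sorted(CALLING_CODES, key=len, reverse=True):
        if digits.startswith(cc):
            return cc, digits[len(cc):]
    return "", digits
```

Because each page yields an independent partial mapping, the per-page work could be handed to separate worker processes and merged afterwards without changing the result.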

Table 3: raw_log (raw data table)

Table 4: phone_weight (number weight table)

| Field | Type | Description |
| --- | --- | --- |
| id | int | Auto-increment id |
| raw_phone | varchar | Original telephone number |
| cc | varchar | International calling code |
| phone | varchar | Number with the international calling code removed |
| tag_w | int | Accumulated weight of custom tags |
| risk_w | int | Third-party risk level weight; added only once |
| first_level1_tag_code | varchar | Code of the most frequently marked first-level tag |
| first_level1_tag | varchar | Most frequently marked first-level tag |
| first_level1_tag_count | int | Number of marks for the most frequently marked first-level tag |
| second_level1_tag_code | varchar | Code of the second most frequently marked first-level tag |
| second_level1_tag | varchar | Second most frequently marked first-level tag |
| second_level1_tag_count | int | Number of marks for the second most frequently marked first-level tag |
| reverse_phone | varchar | Reversed string of the original number |
| length_phone | int | String length of the original number |

Table 5: phone_level1_tag_count (per-number first-level tag count table)

Table 6: hit_rate (daily number hit table)

| Field | Type | Description |
| --- | --- | --- |
| id | int | Auto-increment id |
| log_date | varchar | Telephone number |
| all_count | int | Total number of records on the day |
| hit_count | int | Hit count in the current database |
| rate | float | Hit rate of the day |

Table 7: run_status (running status record table)

| Field | Type | Description |
| --- | --- | --- |
| run_date | varchar | Date of execution (primary key) |
| status_code | int | Code of the execution step |
| type_code | int | Sub-step within the execution step (db2weight) |
| run_index | int | Index reached during execution |

To let users query number risk information offline, the offline database module of the system groups high-risk numbers by country and imports them into an encrypted database file, then compares the result with the previously generated offline database file to generate a differential file. Users can then update incrementally, which reduces the size of the download.
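One way to picture the differential file is as a pair of added and removed numbers per country. The sketch below makes that assumption; the source does not describe the differential format, encryption is omitted, and the function names are hypothetical. The threshold of 60 is the High Risk lower bound from Table 2.

```python
def high_risk_by_country(rows, threshold=60):
    """Group numbers whose weight reaches the High Risk range by calling code.

    rows: iterable of (calling_code, phone, weight) tuples.
    """
    grouped = {}
    for cc, phone, weight in rows:
        if weight >= threshold:
            grouped.setdefault(cc, set()).add(phone)
    return grouped


def build_diff(previous, current):
    """Describe how to turn the previous number set into the current one."""
    return {
        "added": sorted(current - previous),
        "removed": sorted(previous - current),
    }


def apply_diff(previous, diff):
    """Client side: bring a stored number set up to date from a diff."""
    return (set(previous) - set(diff["removed"])) | set(diff["added"])
```

A client that holds yesterday's set only needs the two short lists in the diff, not the whole database, which is where the download saving comes from.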

The risk level of a number can also be obtained from the system: the Application Program Interface (API) interface module provides a function for users to query risk numbers. First, the system checks whether the incoming number begins with a plus sign (+); if so, it parses the number into a country calling code and a number and queries the number weight table in the database by calling code and number. If there is no plus sign, it checks whether a country code was passed in; if so, it converts the country code into a country calling code and queries the number weight table. If neither a plus sign nor a country code is present, it derives the country code from the Internet Protocol (IP) address of the request and queries the database. Finally, if the IP address is not available, the number is reversed and a fuzzy matching query is run against the database, which can improve query speed.
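The lookup order above can be sketched as follows. This is an illustration under assumptions: an in-memory dictionary stands in for the number weight table, the country code is taken to be a two-letter code, the IP-to-country step is represented by an already resolved `ip_country` argument, and the calling-code subsets and function names are hypothetical.

```python
# Hypothetical subsets; a real system needs complete tables.
CALLING_CODES = ("1", "44", "86")
COUNTRY_TO_CALLING_CODE = {"US": "1", "GB": "44", "CN": "86"}


def split_plus(raw):
    """Split '+<calling code><number>' by longest matching calling code."""
    digits = raw.lstrip("+")
    for cc in sorted(CALLING_CODES, key=len, reverse=True):
        if digits.startswith(cc):
            return cc, digits[len(cc):]
    return "", digits


def lookup(table, raw, country=None, ip_country=None):
    """Return the stored weight for a number, or None if it is unknown.

    table maps (calling_code, phone) -> weight.
    """
    if raw.startswith("+"):
        return table.get(split_plus(raw))
    code = country or ip_country  # explicit country code first, then IP
    if code is not None:
        return table.get((COUNTRY_TO_CALLING_CODE.get(code, ""), raw))
    # Last resort: compare reversed strings, so that matching the tail
    # of a stored number becomes a prefix match on its reversed form.
    reversed_raw = raw[::-1]
    for (cc, phone), weight in table.items():
        if (cc + phone)[::-1].startswith(reversed_raw):
            return weight
    return None
```

The final branch suggests why Table 4 stores a `reverse_phone` column: a prefix match on the reversed string is the kind of query an ordinary database index can often serve, although the source does not spell out this reasoning.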

The system of the invention also has disaster tolerance: the system has the function of supporting recovery of missing data after an interruption. The disaster recovery module of the system firstly obtains the last successful date during operation, then compares the date with the current date to be executed, and if the difference value is larger than 1, the disaster recovery module starts to execute from the next day of the last success. And the progress of each step is recorded in the process of execution for recovering the operation.
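The catch-up rule can be sketched as a small date calculation. The function name `dates_to_run` is hypothetical; the sketch assumes one run per calendar day and leaves out the per-step progress recording.

```python
from datetime import date, timedelta


def dates_to_run(last_success: date, today: date) -> list:
    """Return every run date still owed, oldest first.

    A gap of one day yields just today's run; a larger gap means an
    interruption, so the missed days are replayed starting from the
    day after the last success.
    """
    gap = (today - last_success).days
    return [last_success + timedelta(days=i) for i in range(1, gap + 1)]
```

For example, with a last success on 2021-08-17 and a current date of 2021-08-20, the runs for 2021-08-18, 2021-08-19, and 2021-08-20 are executed in that order.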

Referring to fig. 3, a system for processing spam call data includes:

the network transmission module 1 is used for encrypting and uploading the suspicious number information marked by the user to the log recording module 21;

the log recording module 21 is used for storing the suspicious number information;

the cloud platform uploading module 22 is used for compressing the log file stored with the suspicious number information at regular time every day and then uploading the compressed log file to the cloud platform;

the cloud platform downloading module 4 is used for downloading the compressed log files from the cloud platform at regular time every day and then decompressing the log files;

the log preprocessing module 5 is used for processing the decompressed log file to generate a JSON format file;

the weight calculation module 6 is used for calculating the risk weight of the number according to the scoring rule in the tag module 7;

a label module 7 for storing a risk type label and a risk level;

the data storage module 8 is used for storing the numbers and the risk weight values corresponding to the numbers;

the offline database module 81 is used for classifying according to the country, importing high-risk numbers into the encrypted database file, comparing the high-risk numbers with the previously stored offline database file, and generating a differential file;

the interface module 9 is used by the user to query risk numbers;

and the disaster recovery module 3 is used for recovering the missing data after the interruption.

All the technical features of the above embodiments can be arbitrarily combined (as long as there is no contradiction between the combinations of the technical features), and for brevity of description, all the possible combinations of the technical features in the above embodiments are not described; these examples, which are not explicitly described, should be considered to be within the scope of the present description.

The present invention has been described in considerable detail by the general description and the specific examples given above. It should be noted that it is obvious that several variations and modifications can be made to these specific embodiments without departing from the inventive concept, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.
