Data classification storage system and method based on big data

Document No.: 190314 | Published: 2021-11-02

Summary: This technology, "Data classification storage system and method based on big data", was designed by 姜义凡 on 2021-07-26. The invention discloses a data classification storage system and method based on big data. The data storage system comprises a database, a pre-judgment module, an investigation address dividing module, a first address analysis module and a second address analysis module. The database comprises, in order of level from high to low, a common database, a temporary database and a garbage bin. Files in the common database have no storage duration limit, files in the temporary database are stored for a first storage duration, and files in the garbage bin are stored for a second storage duration, the first storage duration being longer than the second storage duration. When a batch of files is received within a preset time period, the pre-judgment module sets the batch as the files to be classified, acquires the sending address of each file to be classified, and judges whether the sending address of a given file to be classified is in a blacklist.

1. A data classification storage system based on big data, characterized by comprising a database, a pre-judgment module, an investigation address dividing module, a first address analysis module, a second address analysis module and an unread identifier adding module, wherein: the database comprises, in order of level from high to low, a common database, a temporary database and a garbage bin; files in the common database have no storage duration limit, files in the temporary database are stored for a first storage duration, files in the garbage bin are stored for a second storage duration, and the first storage duration is longer than the second storage duration; the pre-judgment module is configured to, when a batch of files is received within a preset time period, set the batch as the files to be classified, acquire the sending address of each file to be classified, and judge whether the sending address of a given file to be classified is in a blacklist; when the sending address is in the blacklist, the file to be classified is stored in the garbage bin, and when the sending address is outside the blacklist, the sending address of the file to be classified is set as an investigation address; the investigation address dividing module divides the investigation addresses into first addresses and second addresses; the first address analysis module analyzes the files sent by the first addresses and determines the storage mode of the file to be classified corresponding to each first address; the second address analysis module selects an associated address of each second address from the first addresses, and determines the storage mode of the file to be classified corresponding to the second address according to the storage mode of the file to be classified corresponding to the associated first address; and the unread identifier adding module is configured to add a corresponding unread identifier to a file when the file is stored in the corresponding database.

2. The big data based data classification storage system according to claim 1, wherein: the investigation address dividing module comprises a first similarity obtaining module, a reference file selecting module and an address classifying module; the first similarity obtaining module is configured to collect, as a first similarity, the similarity between the sending addresses of each batch of files received in a recent period of time and the investigation addresses of the current batch of files to be classified; the reference file selecting module is configured to sort the first similarities of the batches in descending order and select the batch ranked first as the reference files; and the address classifying module is configured to take the sending addresses in the intersection of the sending addresses of the reference files and the investigation addresses as first addresses, and to take the sending addresses of the files to be classified other than the first addresses as second addresses.

3. The big data based data classification storage system according to claim 2, wherein: the first address analysis module comprises a re-click count acquisition module, a re-click count comparison module and a first storage index comparison module; the re-click count acquisition module is configured to acquire the re-click count of the reference file corresponding to each first address, the re-click count being the number of times a file is clicked and read again after it has been received and read; the re-click count comparison module is configured to judge whether the re-click count is greater than or equal to 1, and when the re-click count corresponding to a given first address is greater than or equal to 1, the file to be classified corresponding to that first address is stored in the common database; when the re-click count corresponding to a given first address is less than 1, the first storage index comparison module takes, as a first storage index, the ratio of the reading duration of the reference file sent by that first address to the total reading duration of the reference files; when the first storage index is greater than or equal to a first storage threshold, the file to be classified corresponding to the first address is stored in the common database, and when the first storage index is less than the first storage threshold, the file to be classified corresponding to the first address is stored in the temporary database.

4. The big data based data classification storage system according to claim 3, wherein: the second address analysis module comprises a first index acquisition module, a second index acquisition module, an association index calculation module, an association difference calculation module, an associated address selection module and a pre-downgrade identifier adding module; the first index acquisition module acquires, for each investigation address, the number of times Cz that files sent by the investigation address were received in a recent period of time and the number of times C0 that files sent by the investigation address were not clicked and read, so that the first index of a given investigation address is X = C0/Cz; the second index acquisition module acquires the reading record of the files sent by a given investigation address in the recent period of time to obtain the second index Y = G0/Cz of that investigation address, where G0 is the average number of unread files between two adjacent click-read files sent by that investigation address; the association index calculation module calculates, from the first index and the second index, the association index of the investigation address as P = 0.5 × C0/Cz + 0.5 × G0/Cz; the association difference calculation module is configured to calculate, as an association difference, the difference between the association index of each first address and the association index of a given second address; the associated address selection module sorts the absolute values of the association differences corresponding to a given second address in ascending order, selects the first address ranked first as the associated address of the second address, and stores the investigation file sent by the second address in the database in which the investigation file sent by the associated address is stored; and the pre-downgrade identifier adding module adds a pre-downgrade identifier to the investigation file sent by the second address when the association difference corresponding to the associated address of the second address is greater than an association threshold, wherein, once a pre-downgrade identifier has been added to an investigation file, the investigation file is moved into the database at the next lower level if the duration of its unread identifier is greater than or equal to a duration threshold.

5. A data classification storage method based on big data, characterized in that the data classification storage method comprises the following steps:

establishing a database in advance, wherein the database comprises, in order of level from high to low, a common database, a temporary database and a garbage bin, files in the common database have no storage duration limit, files in the temporary database are stored for a first storage duration, files in the garbage bin are stored for a second storage duration, and the first storage duration is longer than the second storage duration;

when a batch of files is received within a preset time period, setting the batch as the files to be classified, acquiring the sending address of each file to be classified, and storing a file to be classified in the garbage bin when its sending address is in a blacklist;

setting the sending addresses of the files to be classified that are outside the blacklist as investigation addresses, and dividing the investigation addresses into first addresses and second addresses;

analyzing the files sent by the first addresses, and determining the storage mode of the file to be classified corresponding to each first address;

and selecting an associated address of each second address from the first addresses, and determining the storage mode of the file to be classified corresponding to the second address according to the storage mode of the file to be classified corresponding to the associated first address.

6. The big data based data classification storage method according to claim 5, wherein dividing the investigation addresses into first addresses and second addresses comprises:

collecting, as a first similarity, the similarity between the sending addresses of each batch of files received in a recent period of time and the investigation addresses of the current batch of files to be classified, sorting the first similarities of the batches in descending order, and selecting the batch ranked first as the reference files;

and taking the sending addresses in the intersection of the sending addresses of the reference files and the investigation addresses as first addresses, and taking the sending addresses of the files to be classified other than the first addresses as second addresses.

7. The big data based data classification storage method according to claim 6, wherein analyzing the files sent by the first addresses comprises:

acquiring the re-click count of the reference file corresponding to each first address, and, when the re-click count is greater than or equal to 1, storing the file to be classified corresponding to that first address in the common database and adding an unread identifier, wherein the re-click count is the number of times a file is clicked and read again after it has been received and read;

otherwise, taking, as a first storage index, the ratio of the reading duration of the reference file sent by that first address to the total reading duration of the reference files;

if the first storage index is greater than or equal to a first storage threshold, storing the file to be classified corresponding to the first address in the common database and adding an unread identifier, which disappears once the file has been clicked and read;

and if the first storage index is less than the first storage threshold, storing the file to be classified corresponding to the first address in the temporary database and adding an unread identifier.

8. The big data based data classification storage method according to claim 7, wherein selecting the associated address of a second address from the first addresses comprises:

acquiring, for each investigation address, the number of times Cz that files sent by the investigation address were received in a recent period of time and the number of times C0 that files sent by the investigation address were not clicked and read, so that the first index of a given investigation address is X = C0/Cz;

acquiring the reading record of the files sent by a given investigation address in the recent period of time to obtain the second index Y = G0/Cz of that investigation address, where G0 is the average number of unread files between two adjacent click-read files sent by that investigation address;

the association index of a given investigation address then being P = 0.5 × C0/Cz + 0.5 × G0/Cz;

and calculating, as an association difference, the difference between the association index of each first address and the association index of a given second address, sorting the absolute values of the association differences corresponding to that second address in ascending order, selecting the first address ranked first as the associated address of the second address, storing the investigation file sent by the second address in the database in which the investigation file sent by the associated address is stored, and adding an unread identifier.

9. The big data based data classification storage method according to claim 8, wherein determining the storage mode of the file to be classified corresponding to the second address further comprises: when the association difference corresponding to the associated address of the second address is greater than an association threshold, adding a pre-downgrade identifier to the investigation file sent by the second address, wherein, once a pre-downgrade identifier has been added to an investigation file, the investigation file is moved into the database at the next lower level if the duration of its unread identifier is greater than or equal to a duration threshold.

Technical Field

The invention relates to the technical field of data classification storage, in particular to a data classification storage system and a data classification storage method based on big data.

Background

As society becomes increasingly information-driven, more and more enterprises carry out their work through information technologies such as the internet. Whether between departments within an enterprise or between the enterprise and outside parties, office work conducted through information technology generates a large amount of administrative file data. Some of this data is extremely important and needs to be stored for a long time, while some is inconsequential and does not matter even if it is left unprocessed. If the file data is not classified, it is liable to be stored improperly and is easily lost. In the prior art, administrative file data is classified and organized manually, but manual classification is inefficient.

Disclosure of Invention

The object of the present invention is to provide a data classification storage system and method based on big data, so as to solve the problems described in the background above.

In order to solve the above technical problems, the invention provides the following technical solution: a data classification storage system based on big data, comprising a database, a pre-judgment module, an investigation address dividing module, a first address analysis module, a second address analysis module and an unread identifier adding module. The database comprises, in order of level from high to low, a common database, a temporary database and a garbage bin. Files in the common database have no storage duration limit, files in the temporary database are stored for a first storage duration, files in the garbage bin are stored for a second storage duration, and the first storage duration is longer than the second storage duration. The pre-judgment module is configured to, when a batch of files is received within a preset time period, set the batch as the files to be classified, acquire the sending address of each file to be classified, and judge whether the sending address of a given file to be classified is in a blacklist. When the sending address is in the blacklist, the file to be classified is stored in the garbage bin; when the sending address is outside the blacklist, the sending address of the file to be classified is set as an investigation address. The investigation address dividing module divides the investigation addresses into first addresses and second addresses. The first address analysis module analyzes the files sent by the first addresses and determines the storage mode of the file to be classified corresponding to each first address. The second address analysis module selects an associated address of each second address from the first addresses, and determines the storage mode of the file to be classified corresponding to the second address according to the storage mode of the file to be classified corresponding to the associated first address. The unread identifier adding module is configured to add a corresponding unread identifier to a file when the file is stored in the corresponding database.

Further, the investigation address dividing module comprises a first similarity obtaining module, a reference file selecting module and an address classifying module. The first similarity obtaining module collects, as a first similarity, the similarity between the sending addresses of each batch of files received in a recent period of time and the investigation addresses of the current batch of files to be classified. The reference file selecting module sorts the first similarities of the batches in descending order and selects the batch ranked first as the reference files. The address classifying module takes the sending addresses in the intersection of the sending addresses of the reference files and the investigation addresses as first addresses, and takes the sending addresses of the files to be classified other than the first addresses as second addresses.

Further, the first address analysis module comprises a re-click count acquisition module, a re-click count comparison module and a first storage index comparison module. The re-click count acquisition module is configured to acquire the re-click count of the reference file corresponding to each first address, the re-click count being the number of times a file is clicked and read again after it has been received and read. The re-click count comparison module is configured to judge whether the re-click count is greater than or equal to 1. When the re-click count corresponding to a given first address is greater than or equal to 1, the file to be classified corresponding to that first address is stored in the common database. When the re-click count corresponding to a given first address is less than 1, the first storage index comparison module takes, as a first storage index, the ratio of the reading duration of the reference file sent by that first address to the total reading duration of the reference files. When the first storage index is greater than or equal to a first storage threshold, the file to be classified corresponding to the first address is stored in the common database; when the first storage index is less than the first storage threshold, the file to be classified corresponding to the first address is stored in the temporary database.

Further, the second address analysis module comprises a first index acquisition module, a second index acquisition module, an association index calculation module, an association difference calculation module, an associated address selection module and a pre-downgrade identifier adding module. The first index acquisition module acquires, for each investigation address, the number of times Cz that files sent by the investigation address were received in a recent period of time and the number of times C0 that files sent by the investigation address were not clicked and read, so that the first index of a given investigation address is X = C0/Cz. The second index acquisition module acquires the reading record of the files sent by a given investigation address in the recent period of time to obtain the second index Y = G0/Cz of that investigation address, where G0 is the average number of unread files between two adjacent click-read files sent by that investigation address. The association index calculation module calculates, from the first index and the second index, the association index of the investigation address as P = 0.5 × C0/Cz + 0.5 × G0/Cz. The association difference calculation module calculates, as an association difference, the difference between the association index of each first address and the association index of a given second address. The associated address selection module sorts the absolute values of the association differences corresponding to a given second address in ascending order, selects the first address ranked first as the associated address of the second address, and stores the investigation file sent by the second address in the database in which the investigation file sent by the associated address is stored. The pre-downgrade identifier adding module adds a pre-downgrade identifier to the investigation file sent by the second address when the association difference corresponding to the associated address of the second address is greater than an association threshold. Once a pre-downgrade identifier has been added to an investigation file, the investigation file is moved into the database at the next lower level if the duration of its unread identifier is greater than or equal to a duration threshold.

A data classification storage method based on big data comprises the following steps:

establishing a database in advance, wherein the database comprises, in order of level from high to low, a common database, a temporary database and a garbage bin, files in the common database have no storage duration limit, files in the temporary database are stored for a first storage duration, files in the garbage bin are stored for a second storage duration, and the first storage duration is longer than the second storage duration;

when a batch of files is received within a preset time period, setting the batch as the files to be classified, acquiring the sending address of each file to be classified, and storing a file to be classified in the garbage bin when its sending address is in a blacklist;

setting the sending addresses of the files to be classified that are outside the blacklist as investigation addresses, and dividing the investigation addresses into first addresses and second addresses;

analyzing the files sent by the first addresses, and determining the storage mode of the file to be classified corresponding to each first address;

and selecting an associated address of each second address from the first addresses, and determining the storage mode of the file to be classified corresponding to the second address according to the storage mode of the file to be classified corresponding to the associated first address.
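By way of illustration only, the blacklist pre-judgment step might be sketched as follows. The dictionary layout of a file record, the field name `sender`, the function name `prejudge` and the example addresses are hypothetical and are not taken from the specification.

```python
from __future__ import annotations

from typing import Iterable


def prejudge(files: Iterable[dict], blacklist: set[str]) -> tuple[list[dict], list[dict]]:
    """Split one batch into (files for the garbage bin, files still to be investigated)."""
    garbage_bin, to_investigate = [], []
    for f in files:
        # A file whose sending address is blacklisted goes straight to the garbage bin.
        if f["sender"] in blacklist:
            garbage_bin.append(f)
        else:
            to_investigate.append(f)
    return garbage_bin, to_investigate


# Hypothetical batch received within one preset time period.
batch = [
    {"name": "notice.doc", "sender": "hr@example.com"},
    {"name": "promo.doc", "sender": "spam@example.net"},
]
garbage_bin, to_investigate = prejudge(batch, blacklist={"spam@example.net"})

# Sending addresses outside the blacklist become the investigation addresses.
investigation_addresses = {f["sender"] for f in to_investigate}
```

The investigation addresses produced here are the input to the subsequent division into first and second addresses.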

Further, dividing the investigation addresses into first addresses and second addresses comprises:

collecting, as a first similarity, the similarity between the sending addresses of each batch of files received in a recent period of time and the investigation addresses of the current batch of files to be classified, sorting the first similarities of the batches in descending order, and selecting the batch ranked first as the reference files;

and taking the sending addresses in the intersection of the sending addresses of the reference files and the investigation addresses as first addresses, and taking the sending addresses of the files to be classified other than the first addresses as second addresses.
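The division step can be sketched as below. The specification does not state how the similarity between two groups of addresses is computed; this sketch assumes Jaccard similarity over address sets purely for illustration. The function names and example addresses are likewise hypothetical.

```python
from __future__ import annotations


def jaccard(a: set[str], b: set[str]) -> float:
    """Assumed similarity measure: |a ∩ b| / |a ∪ b| (not specified in the source)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def divide_addresses(
    investigation: set[str], history: list[set[str]]
) -> tuple[set[str], set[str], int | None]:
    """Return (first addresses, second addresses, index of the reference batch)."""
    if not history:
        # With no historical batch there is no reference, so every address is a second address.
        return set(), set(investigation), None
    # The historical batch with the highest first similarity is the reference batch.
    best = max(range(len(history)), key=lambda i: jaccard(history[i], investigation))
    first = history[best] & investigation   # addresses also seen in the reference batch
    second = investigation - first          # remaining investigation addresses
    return first, second, best


investigation = {"a@example.com", "b@example.com", "c@example.com"}
history = [
    {"a@example.com", "d@example.com"},                    # similarity 1/4
    {"a@example.com", "b@example.com", "e@example.com"},   # similarity 2/4
]
first, second, ref_index = divide_addresses(investigation, history)
```

In this example the second historical batch is the most similar one, so the two addresses it shares with the current batch become first addresses and the remaining address becomes a second address.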

Further, analyzing the files sent by the first addresses comprises:

acquiring the re-click count of the reference file corresponding to each first address, and, when the re-click count is greater than or equal to 1, storing the file to be classified corresponding to that first address in the common database and adding an unread identifier, wherein the re-click count is the number of times a file is clicked and read again after it has been received and read;

otherwise, taking, as a first storage index, the ratio of the reading duration of the reference file sent by that first address to the total reading duration of the reference files;

if the first storage index is greater than or equal to a first storage threshold, storing the file to be classified corresponding to the first address in the common database and adding an unread identifier, which disappears once the file has been clicked and read;

and if the first storage index is less than the first storage threshold, storing the file to be classified corresponding to the first address in the temporary database and adding an unread identifier.
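The decision for a first address can be summarized in a short sketch. The function name, the string labels for the databases and the example threshold of 0.3 are assumptions for illustration; the specification does not give a value for the first storage threshold.

```python
def classify_first_address(
    reclick_count: int,
    read_seconds: float,
    total_read_seconds: float,
    first_storage_threshold: float,
) -> str:
    """Return the database ("common" or "temporary") for a first address's file."""
    # A reference file that was opened again at least once goes to the common database.
    if reclick_count >= 1:
        return "common"
    # Otherwise compare this address's share of the total reading duration with the threshold.
    if total_read_seconds > 0:
        first_storage_index = read_seconds / total_read_seconds
    else:
        first_storage_index = 0.0  # assumption: no reading time at all counts as index 0
    return "common" if first_storage_index >= first_storage_threshold else "temporary"


# Hypothetical threshold of 0.3 and reading durations in seconds.
reopened = classify_first_address(2, 5.0, 100.0, 0.3)      # re-clicked, so "common"
well_read = classify_first_address(0, 40.0, 100.0, 0.3)    # index 0.4, so "common"
skimmed = classify_first_address(0, 10.0, 100.0, 0.3)      # index 0.1, so "temporary"
```

In every branch the stored file would also receive an unread identifier, which is omitted here for brevity.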

Further, selecting the associated address of a second address from the first addresses comprises:

acquiring, for each investigation address, the number of times Cz that files sent by the investigation address were received in a recent period of time and the number of times C0 that files sent by the investigation address were not clicked and read, so that the first index of a given investigation address is X = C0/Cz;

acquiring the reading record of the files sent by a given investigation address in the recent period of time to obtain the second index Y = G0/Cz of that investigation address, where G0 is the average number of unread files between two adjacent click-read files sent by that investigation address;

the association index of a given investigation address then being P = 0.5 × C0/Cz + 0.5 × G0/Cz;

and calculating, as an association difference, the difference between the association index of each first address and the association index of a given second address, sorting the absolute values of the association differences corresponding to that second address in ascending order, selecting the first address ranked first as the associated address of the second address, storing the investigation file sent by the second address in the database in which the investigation file sent by the associated address is stored, and adding an unread identifier.
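As a worked illustration of the association index, the sketch below derives Cz, C0 and G0 from a per-address reading record and then picks the associated address. Representing the record as a list of booleans (True meaning the file was clicked and read), treating G0 as 0 when fewer than two files were read, and all function names are assumptions made for this example.

```python
from __future__ import annotations


def mean_unread_gap(read_flags: list[bool]) -> float:
    """G0: average number of unread files between two adjacent click-read files."""
    read_positions = [i for i, was_read in enumerate(read_flags) if was_read]
    gaps = [b - a - 1 for a, b in zip(read_positions, read_positions[1:])]
    # Assumption: with fewer than two reads there is no gap to average, so use 0.
    return sum(gaps) / len(gaps) if gaps else 0.0


def association_index(read_flags: list[bool]) -> float:
    """P = 0.5 * C0/Cz + 0.5 * G0/Cz for one investigation address."""
    cz = len(read_flags)                 # files received from the address
    if cz == 0:
        return 0.0                       # assumption: no history gives index 0
    c0 = read_flags.count(False)         # files never clicked and read
    g0 = mean_unread_gap(read_flags)
    return 0.5 * c0 / cz + 0.5 * g0 / cz


def pick_associated(second_p: float, first_ps: dict[str, float]) -> tuple[str, float]:
    """Return the first address whose index is closest to second_p, with the absolute gap.

    Assumes at least one first address is available.
    """
    addr = min(first_ps, key=lambda a: abs(first_ps[a] - second_p))
    return addr, abs(first_ps[addr] - second_p)


# Six files received: Cz = 6, C0 = 3, gaps between reads are 2 and 1, so G0 = 1.5.
p_second = association_index([True, False, False, True, False, True])
# P = 0.5 * 3/6 + 0.5 * 1.5/6 = 0.25 + 0.125 = 0.375
associated, difference = pick_associated(
    p_second, {"a@example.com": 0.30, "b@example.com": 0.60}
)
```

Here the second address inherits the storage location of `a@example.com`, whose association index differs from its own by 0.075.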

Further, determining the storage mode of the file to be classified corresponding to the second address further comprises: when the association difference corresponding to the associated address of the second address is greater than an association threshold, adding a pre-downgrade identifier to the investigation file sent by the second address, wherein, once a pre-downgrade identifier has been added to an investigation file, the investigation file is moved into the database at the next lower level if the duration of its unread identifier is greater than or equal to a duration threshold.
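The pre-downgrade rule might be sketched as follows. The `StoredFile` record, the measurement of the unread duration in days, the threshold values and the choice to leave a file in the garbage bin when no lower level exists are all assumptions for illustration; the specification does not cover these details.

```python
from dataclasses import dataclass

# Databases ordered from the highest level to the lowest, as in the description.
TIERS = ["common", "temporary", "garbage_bin"]


@dataclass
class StoredFile:
    name: str
    tier: str
    unread_days: float          # how long the unread identifier has been present
    pre_downgrade: bool = False


def mark_pre_downgrade(f: StoredFile, association_diff: float, association_threshold: float) -> None:
    """Flag the file when its associated address was only a loose match."""
    if association_diff > association_threshold:
        f.pre_downgrade = True


def apply_downgrade(f: StoredFile, duration_threshold_days: float) -> bool:
    """Move a flagged, long-unread file one level down; return True if it moved."""
    idx = TIERS.index(f.tier)
    has_lower_tier = idx < len(TIERS) - 1   # assumption: the garbage bin is the floor
    if f.pre_downgrade and f.unread_days >= duration_threshold_days and has_lower_tier:
        f.tier = TIERS[idx + 1]
        return True
    return False


# Hypothetical thresholds: association threshold 0.1, duration threshold 7 days.
memo = StoredFile("memo.doc", tier="common", unread_days=10)
mark_pre_downgrade(memo, association_diff=0.2, association_threshold=0.1)
moved = apply_downgrade(memo, duration_threshold_days=7)
```

A file without the pre-downgrade identifier is never moved by this rule, however long it stays unread.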

Compared with the prior art, the invention has the following beneficial effect: the sending addresses of the currently received batch of files are compared for similarity with the sending addresses of batches received in the past, and the files of the batch with the highest similarity are selected as the reference for deciding how to store the current batch, which makes the storage mode chosen for the current batch more reasonable.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:

FIG. 1 is a block diagram of the big data based data classification storage system according to the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to FIG. 1, the present invention provides the following technical solution: a data classification storage system based on big data, comprising a database, a pre-judgment module, an investigation address dividing module, a first address analysis module, a second address analysis module and an unread identifier adding module. The database comprises, in order of level from high to low, a common database, a temporary database and a garbage bin. Files in the common database have no storage duration limit, files in the temporary database are stored for a first storage duration, files in the garbage bin are stored for a second storage duration, and the first storage duration is longer than the second storage duration. The pre-judgment module is configured to, when a batch of files is received within a preset time period, set the batch as the files to be classified, acquire the sending address of each file to be classified, and judge whether the sending address of a given file to be classified is in a blacklist. When the sending address is in the blacklist, the file to be classified is stored in the garbage bin; when the sending address is outside the blacklist, the sending address of the file to be classified is set as an investigation address. The investigation address dividing module divides the investigation addresses into first addresses and second addresses. The first address analysis module analyzes the files sent by the first addresses and determines the storage mode of the file to be classified corresponding to each first address. The second address analysis module selects an associated address of each second address from the first addresses, and determines the storage mode of the file to be classified corresponding to the second address according to the storage mode of the file to be classified corresponding to the associated first address. The unread identifier adding module is configured to add a corresponding unread identifier to a file when the file is stored in the corresponding database.

The investigation address division module comprises a first similarity acquisition module, a reference file selection module and an address classification module. The first similarity acquisition module collects, for each batch of files received in the most recent period, the similarity between the sending addresses of that batch and the investigation addresses of the batch of files to be classified, as a first similarity. The reference file selection module sorts the first similarities of the batches in descending order and selects the batch ranked first as the reference files. The address classification module takes the sending addresses in the intersection of the reference files' sending addresses and the investigation addresses as first addresses, and takes the remaining sending addresses of the files to be classified as second addresses.

The first address analysis module comprises a return-click count acquisition module, a return-click count comparison module and a first storage index comparison module. The return-click count acquisition module acquires the return-click count of the reference file corresponding to each first address, the return-click count being the number of times a file is clicked and read again after it has been received and read. The return-click count comparison module judges whether the return-click count is greater than or equal to 1. When the return-click count corresponding to a first address is greater than or equal to 1, the file to be classified corresponding to that first address is stored in the common database. When the return-click count is less than 1, the first storage index comparison module takes the ratio of the reading duration of the reference file sent by that first address to the total reading duration of the reference files as a first storage index; when the first storage index is greater than or equal to a first storage threshold, the file to be classified corresponding to the first address is stored in the common database, and when the first storage index is smaller than the first storage threshold, the file is stored in the temporary database.

The second address analysis module comprises a first index acquisition module, a second index acquisition module, an association index calculation module, an association difference calculation module, an associated address selection module and a pre-downgrade identifier adding module. The first index acquisition module acquires, for each investigation address, the number of times Cz that files sent by that address were received in the most recent period and the number of times C0 that files sent by that address were not clicked and read, so that the first index of the investigation address is X = C0/Cz. The second index acquisition module acquires the reading status of the files sent by an investigation address in the most recent period and obtains the second index of that address as Y = G0/Cz, where G0 is the average number of unread files between two adjacent clicked-and-read files sent by the investigation address. The association index calculation module calculates the association index of the investigation address from the first index and the second index as P = 0.5 × C0/Cz + 0.5 × G0/Cz. The association difference calculation module calculates the difference between the association index of each first address and the association index of a given second address as an association difference. The associated address selection module sorts the absolute values of the association differences corresponding to the second address in ascending order, selects the first address ranked first as the associated address of the second address, and stores the investigation file sent by the second address in the database in which the investigation file sent by the associated address is stored. The pre-downgrade identifier adding module adds a pre-downgrade identifier to the investigation file sent by the second address when the association difference corresponding to the associated address of the second address is greater than an association threshold; when an investigation file carries a pre-downgrade identifier and the duration of its unread identifier is greater than or equal to a duration threshold, the investigation file is moved into the database at the next lower level.

A data classification storage method based on big data comprises the following steps:

establishing a database in advance, the database comprising, in order of level from high to low, a common database, a temporary database and a garbage bin, wherein files in the common database have no storage duration limit, files in the temporary database are kept for a first storage duration, files in the garbage bin are kept for a second storage duration, and the first storage duration is longer than the second storage duration; a storage duration of any length can also be set manually for files in the common database;

when a batch of files is received within a preset time period, setting the batch as the files to be classified, acquiring the sending address of each file to be classified, and storing a file to be classified in the garbage bin when its sending address is on the blacklist. In this embodiment, a file is an e-mail and a sending address is the sender's mailbox. In a company, users sometimes log in to the mailbox only periodically to check mail; if the user had to read every mail received in that period before classifying and storing it, considerable time and effort would be consumed. The blacklist stores the sending addresses whose files are moved directly into the garbage bin;
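The pre-judgment step above can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `Tier`, `Mail` and `prejudge`, the numeric tier values and the sample address `spam@example.com` are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    """Storage levels ordered from highest to lowest (values are illustrative)."""
    COMMON = 3      # no storage duration limit
    TEMPORARY = 2   # kept for the first storage duration
    GARBAGE = 1     # kept for the second, shorter storage duration


@dataclass
class Mail:
    mail_id: str    # hypothetical identifier of the file to be classified
    sender: str     # sending address


def prejudge(batch, blacklist):
    """Split a batch into mails for the garbage bin and mails still to be analyzed."""
    to_garbage = [m for m in batch if m.sender in blacklist]
    to_investigate = [m for m in batch if m.sender not in blacklist]
    return to_garbage, to_investigate


batch = [Mail("c1", "b1"), Mail("c2", "spam@example.com"), Mail("c3", "b6")]
garbage, pending = prejudge(batch, blacklist={"spam@example.com"})

# Senders that are not blacklisted become the investigation addresses.
investigation_addresses = {m.sender for m in pending}
```

Using a set for the blacklist keeps each membership check constant-time, which matters when a batch contains many mails.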

taking the sending addresses of the files to be classified that are not on the blacklist as investigation addresses, and dividing the investigation addresses into first addresses and second addresses;

the dividing of the investigation addresses into first addresses and second addresses comprises:

collecting, for each batch of files received in the most recent period, the similarity between the sending addresses of that batch and the investigation addresses of the batch of files to be classified as a first similarity, sorting the first similarities in descending order, and selecting the batch ranked first as the reference files. In this application, each batch of files refers to the mails received in one preset time period within the most recent period. The similarity is obtained by comparing the sending addresses of a previously received batch with the sending addresses of the files to be classified. For example, suppose a previously received batch consists of files a1, a2, a3, a4 and a5, whose sending addresses are b1, b2, b3, b4 and b5 respectively, and the files to be classified are c1, c2, c3, c4 and c5, whose sending addresses are b1, b3, b5, b2 and b6 respectively. The addresses "b1, b2, b3, b4, b5" are compared with the addresses "b1, b3, b5, b2, b6"; the more addresses the two groups have in common, the higher their similarity. Assuming that the similarity between "b1, b2, b3, b4, b5" and "b1, b3, b5, b2, b6" is the highest among all batches, the files "a1, a2, a3, a4, a5" are taken as the reference files;

taking the sending addresses in the intersection of the reference files' sending addresses and the investigation addresses as first addresses, and taking the sending addresses of the files to be classified other than the first addresses as second addresses; in the above example, "b1, b2, b3, b5" are the first addresses and "b6" is the second address;
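The division can be reproduced on the example addresses with a short sketch. It assumes that the first similarity is simply the number of shared sending addresses, which the text suggests but does not state as a formula; the function names and the extra batch `batch_x` are hypothetical.

```python
def first_similarity(batch_senders, investigation_addresses):
    """Assumed similarity measure: count of sending addresses the two groups share."""
    return len(set(batch_senders) & set(investigation_addresses))


def divide_addresses(past_batches, investigation_addresses):
    """Return (reference batch id, first addresses, second addresses)."""
    investigation = set(investigation_addresses)
    # The batch with the highest first similarity supplies the reference files.
    ref_id = max(
        past_batches,
        key=lambda batch_id: first_similarity(past_batches[batch_id], investigation),
    )
    first = investigation & set(past_batches[ref_id])
    second = investigation - first
    return ref_id, first, second


past_batches = {
    "batch_a": ["b1", "b2", "b3", "b4", "b5"],  # senders of files a1..a5
    "batch_x": ["b1", "b7", "b8"],              # hypothetical older batch
}
ref_id, first, second = divide_addresses(past_batches, ["b1", "b3", "b5", "b2", "b6"])
```

With these inputs `batch_a` shares four addresses with the investigation addresses and `batch_x` only one, so `batch_a` is chosen, `first` contains b1, b2, b3 and b5, and `second` contains only b6, matching the example in the text.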

analyzing the files sent by the first addresses, and judging the storage modes of the files to be classified corresponding to the first addresses;

analyzing the file sent by the first address comprises:

acquiring the return-click count of the reference file corresponding to each first address, and, when the return-click count is greater than or equal to 1, storing the file to be classified corresponding to that first address in the common database and adding an unread identifier, wherein the return-click count is the number of times a file is clicked and read again after it has been received and read; a repeated click is counted as a return click only when the duration of that repeated reading is greater than a duration threshold; an unread identifier is added whenever a newly received file is stored in a database, and the unread identifier disappears once the corresponding file has been read; in the above example, "a1, a2, a3, a4, a5" are the reference files, their sending addresses are "b1, b2, b3, b4, b5" respectively, and the first addresses are "b1, b2, b3, b5", so the return-click counts of the files "a1, a2, a3, a5" are acquired;

otherwise, taking the ratio of the reading duration of the reference file sent by each first address to the total reading duration of the reference files as a first storage index; further, obtaining the ratio r1/k1 of the reading duration r1 of the reference file to its file size k1, and the average value e of the ratio of reading duration to file size over the files historically sent by the first address corresponding to the reference file; if r1/k1 falls outside the fluctuation range of the average value e, replacing r1 with e × k1, so that the first storage index becomes the ratio of e × k1 to the total reading duration of the reference files, the total reading duration likewise being computed with r1 replaced by e × k1;

if the first storage index is greater than or equal to a first storage threshold, storing the file to be classified corresponding to the first address in the common database and adding an unread identifier; the unread identifier disappears once the file has been clicked and read, it can also serve as a storage mark for the file in the database, and a file carrying an unread identifier is pinned to the top;

and if the first storage index is smaller than the first storage threshold, storing the file to be classified corresponding to the first address in the temporary database and adding an unread identifier. In practice, the user can also adjust where a file is stored in the database after reading it;
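The decision for first addresses can be sketched as below. Several points are assumptions rather than statements from the text: the "fluctuation range" is modeled as a relative tolerance around e, the total reading duration is taken as the sum over all reference files after correction (and is assumed to be positive), and every numeric value, including the threshold 0.25, is made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReferenceStats:
    return_clicks: int   # times the reference file was reopened after the first read
    reading_time: float  # r1, reading duration of the reference file
    size: float          # k1, file size
    avg_rate: float      # e, historical average of reading duration / file size


def corrected_reading_time(stats, tolerance):
    """Replace r1 by e * k1 when r1 / k1 lies outside the assumed range around e."""
    rate = stats.reading_time / stats.size
    if abs(rate - stats.avg_rate) > tolerance * stats.avg_rate:
        return stats.avg_rate * stats.size
    return stats.reading_time


def classify_first_addresses(stats_by_address, storage_threshold, tolerance=0.5):
    """Map each first address to 'common' or 'temporary'."""
    times = {a: corrected_reading_time(s, tolerance) for a, s in stats_by_address.items()}
    total = sum(times.values())  # total reading duration, using corrected values
    result = {}
    for address, stats in stats_by_address.items():
        if stats.return_clicks >= 1:
            result[address] = "common"
        elif times[address] / total >= storage_threshold:  # first storage index
            result[address] = "common"
        else:
            result[address] = "temporary"
    return result


stats = {
    "b1": ReferenceStats(return_clicks=2, reading_time=30.0, size=10.0, avg_rate=3.0),
    "b2": ReferenceStats(return_clicks=0, reading_time=60.0, size=20.0, avg_rate=3.0),
    "b3": ReferenceStats(return_clicks=0, reading_time=5.0, size=10.0, avg_rate=0.5),
    "b5": ReferenceStats(return_clicks=0, reading_time=300.0, size=10.0, avg_rate=1.0),
}
placement = classify_first_addresses(stats, storage_threshold=0.25)
```

In this made-up data the reading duration for b5 (300 for a file of size 10) is far from its historical rate, so it is replaced by e × k1 = 10 before the index is computed. This keeps one mail that happened to be left open from dominating the total.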

selecting an associated address for the second address from the first addresses, and determining the storage mode of the file to be classified corresponding to the second address according to the storage mode of the file to be classified corresponding to that first address; in the above example, the associated address of "b6" is selected from "b1, b2, b3, b5";

the selecting the associated address of the second address from the first address comprises:

acquiring, for each investigation address, the number of times Cz that files sent by that address were received in the most recent period and the number of times C0 that files sent by that address were not clicked and read, so that the first index of the investigation address is X = C0/Cz; in the above example, b1, b3, b5, b2 and b6 are the investigation addresses; for example, if the investigation address "b1" sent 10 files in the most recent period and the reading status of the files is, in order, read, unread, and so on, with 5 files not clicked and read, then C0 = 5 and Cz = 10;

acquiring the reading status of the files sent by an investigation address in the most recent period to obtain the second index of that address as Y = G0/Cz, wherein G0 is the average number of unread files between two adjacent clicked-and-read files sent by the investigation address; in the above example, Cz = 10 and G0 = (3 + 2 + 0 + 0)/4 = 5/4;

the association index of an investigation address is P = 0.5 × C0/Cz + 0.5 × G0/Cz; the number of unread files and the number of unread files between two adjacent reads together indicate whether the user tends to read, and tends to read frequently, the files sent by that sending address;
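The two indices and the association index can be worked through in a few lines. The reading history below is hypothetical: the text does not list the full sequence for "b1", so one was chosen that matches the stated values C0 = 5, Cz = 10 and unread gaps of 3, 2, 0 and 0. Returning G0 = 0 when fewer than two files were read is also an assumption, since the text does not cover that case.

```python
def association_index(history):
    """history: booleans in arrival order, True means the file was clicked and read."""
    cz = len(history)              # Cz: files received from this address
    c0 = history.count(False)      # C0: files never clicked and read
    read_positions = [i for i, was_read in enumerate(history) if was_read]
    # Unread files between each pair of adjacent reads.
    gaps = [later - earlier - 1 for earlier, later in zip(read_positions, read_positions[1:])]
    g0 = sum(gaps) / len(gaps) if gaps else 0.0  # assumed fallback for < 2 reads
    x = c0 / cz                    # first index X
    y = g0 / cz                    # second index Y
    return 0.5 * x + 0.5 * y       # association index P


R, U = True, False
history_b1 = [R, U, U, U, R, U, U, R, R, R]  # hypothetical sequence for address b1
p_b1 = association_index(history_b1)
```

For this sequence X = 5/10 = 0.5, G0 = 5/4 and Y = 0.125, so `p_b1` evaluates to 0.5 × 0.5 + 0.5 × 0.125 = 0.3125. A larger P means more unread mail and longer unread stretches between reads.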

calculating the difference between the association index of each first address and the association index of a given second address as an association difference, sorting the absolute values of the association differences corresponding to that second address in ascending order, selecting the first address ranked first as the associated address of the second address, storing the investigation file sent by the second address in the database in which the investigation file sent by the associated address is stored, and adding an unread identifier. A small absolute association difference between two addresses indicates that the user's reading tendencies for files sent by the two addresses are similar;

in the above example, if address "b1" is the associated address of "b6", then the file to be classified sent by b6 is stored in the common database if the file to be classified sent by b1 is stored in the common database, and in the temporary database if the file to be classified sent by b1 is stored in the temporary database;

the determining of the storage mode of the file to be classified corresponding to the second address further comprises: when the association difference corresponding to the associated address of the second address is greater than an association threshold, adding a pre-downgrade identifier to the investigation file sent by the second address, wherein, when an investigation file carries a pre-downgrade identifier and the duration of its unread identifier is greater than or equal to a duration threshold, the investigation file is moved into the database at the next lower level. For example, if file a3 stored in the common database carries a pre-downgrade identifier and the duration of its unread identifier is greater than or equal to the duration threshold, a3 is moved into the temporary database. In this application, the association difference is the association index of the second address minus the association index of the first address; a larger association index indicates that the user tends not to read the files, so a file that remains unread for a long time is moved to a lower-level database.
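Selecting the associated address and applying the pre-downgrade rule might look like the sketch below. All index values, both thresholds, the record layout and the function names are illustrative assumptions; only the rules themselves (smallest absolute difference, difference taken as second minus first, move one level down after a long unread period) come from the text.

```python
NEXT_LEVEL = {"common": "temporary", "temporary": "garbage"}


def choose_associated_address(second_index, first_indices):
    """Pick the first address whose association index is closest to the second's."""
    address = min(first_indices, key=lambda a: abs(second_index - first_indices[a]))
    # Association difference: index of the second address minus that of the first.
    return address, second_index - first_indices[address]


def store_second_address_file(second_index, first_indices, first_storage, association_threshold):
    """Store the file where the associated address's file went; flag it if needed."""
    address, difference = choose_associated_address(second_index, first_indices)
    return {
        "database": first_storage[address],
        "unread": True,
        "pre_downgrade": difference > association_threshold,
    }


def apply_pre_downgrade(record, unread_days, duration_threshold_days):
    """Move a flagged, still-unread file one level down once it has waited too long."""
    if record["pre_downgrade"] and record["unread"] and unread_days >= duration_threshold_days:
        record["database"] = NEXT_LEVEL.get(record["database"], record["database"])
    return record


first_indices = {"b1": 0.3125, "b2": 0.10, "b3": 0.60, "b5": 0.45}   # hypothetical
first_storage = {"b1": "common", "b2": "common", "b3": "temporary", "b5": "temporary"}

record = store_second_address_file(0.35, first_indices, first_storage, association_threshold=0.03)
initial_database = record["database"]
record = apply_pre_downgrade(record, unread_days=8, duration_threshold_days=7)
```

Here b1 is the closest match for b6, so the file first goes to the common database. Because b6's index exceeds b1's by more than the assumed threshold, the file is flagged and drops to the temporary database after staying unread for 8 days.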

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
