Portrait generation system based on big data

Document No.: 923970    Publication date: 2021-03-02

Reading note: this technology, "Portrait generation system based on big data" (基于大数据的画像生成系统), was designed by 张静雅, 朱金星, 葛丹妮 and 段力阁 on 2021-01-28. Main content: The invention relates to a portrait generation system based on big data, comprising a first database, a second database, a third database, a processor, and a memory storing a computer program. When the computer program is executed by the processor, the following steps are implemented: step S1, acquiring input feature information of a device id to be tested from the first database, inputting it into a device id classification model, and judging whether the device id is a target type device id, and if so, executing step S2; step S2, acquiring a target wifi ssid from the second database based on the device id to be tested, a preset first time period, and a target address corresponding to the device id to be tested; step S3, acquiring a target device id set from the second database based on the target wifi ssid and a preset second time period; step S4, generating a target portrait based on the target device id set and the first database and/or the second database and/or the third database. The invention can accurately and comprehensively acquire the feature information of small and micro enterprises and improves the accuracy of small and micro enterprise portraits.

1. A portrait generation system based on big data, comprising a first database, a second database, a third database, a processor, and a memory storing a computer program, wherein the first database is used for storing device ids together with the feature information and corresponding time information of each device id; the second database is used for storing device ids together with the wifi ssid information, wifi position information, and wifi connection time information of the wifi networks each device id has connected to; the third database is used for storing device ids together with the tag information corresponding to each device id; and when the computer program is executed by the processor, the following steps are implemented:

step S1, acquiring the feature information corresponding to a device id to be tested from the first database as input feature information, inputting the input feature information into a pre-trained device id classification model, and judging whether the device id to be tested is a target type device id, and if so, executing step S2, wherein a target type device id is the device id of a small and micro enterprise owner;

step S2, acquiring a target wifi ssid from the second database based on the device id to be tested, a preset first time period, and a target address corresponding to the device id to be tested, wherein the target address is the workplace corresponding to the device id to be tested, and the target wifi ssid is the workplace wifi ssid of the device id to be tested;

step S3, acquiring all target device ids from the second database based on the target wifi ssid and a preset second time period, and constructing a target device id set, wherein a target device id is the device id of an employee of the small and micro enterprise corresponding to the device id to be tested;

and step S4, generating a target portrait based on the target device id set and the first database and/or the second database and/or the third database, wherein the target portrait is the portrait of the small and micro enterprise corresponding to the device id to be tested.

2. The system of claim 1,

wherein, when the computer program is executed by the processor, a step S10 of training the device id classification model is further implemented, which comprises:

step S101, acquiring the device ids of a plurality of small and micro enterprise owners and the device ids of a plurality of non-owners, wherein the acquired device ids of small and micro enterprise owners are first device ids, and the acquired device ids of non-owners are second device ids;

step S102, acquiring the feature information corresponding to each first device id from the first database as input feature information and constructing a positive sample feature set, and acquiring the feature information corresponding to each second device id from the first database as input feature information and constructing a negative sample feature set;

and step S103, training on the positive sample feature set and the negative sample feature set to obtain the device id classification model.

3. The system of claim 2,

the step S101 includes:

step S111, randomly extracting the first device ids of a plurality of small and micro enterprise owners from a preset set of small and micro enterprise owner device ids;

and step S112, randomly extracting from the first database a plurality of second device ids whose feature information has a similarity to the feature information of the first device ids that is lower than a preset similarity threshold.

4. The system of claim 2,

the device id classification model is a logistic regression model.

5. The system of claim 1,

the step S2 includes:

step S21, acquiring from the second database, based on the device id to be tested and the preset first time period, all wifi ssids whose connection frequency with the device id to be tested within the first time period exceeds a preset connection frequency threshold, and forming a first wifi ssid list;

step S22, acquiring the number of connected devices of each wifi ssid in the first wifi ssid list in each preset time period;

step S23, acquiring, based on the number of connected devices of each wifi ssid in the first wifi ssid list in each preset time period and on preset working and non-working time periods, the distribution characteristics of the number of connected devices of each wifi ssid in the preset working time period and the preset non-working time period;

step S24, judging, based on those distribution characteristics, whether each wifi ssid in the first wifi ssid list is a workplace wifi ssid, and if so, storing the wifi ssid and its wifi position information in a preset second wifi ssid list;

and step S25, determining the wifi ssid in the second wifi ssid list whose wifi position information is closest to the target address corresponding to the device id to be tested as the target wifi ssid.

6. The system of claim 1,

the step S3 includes:

step S31, acquiring from the second database, based on the target wifi ssid and the preset second time period, the target device ids whose connections to the target wifi ssid within the second time period exceed a preset connection time threshold and/or a preset total connection time threshold, and constructing the target device id set.

7. The system of claim 1,

the step S4 includes:

step S41, acquiring first dimension portrait features from the first database based on the target device id set, wherein the first dimension portrait features include the number of second preset APPs corresponding to each target device id in a preset fourth time period and the activity features of the second preset APPs in the preset fourth time period, the second preset APPs being job-seeking APPs;

and/or,

acquiring second dimension portrait features from the second database based on the target device id set, wherein the second dimension portrait features include the number of target device ids connected to the target wifi ssid during preset non-working time periods within the preset fourth time period and the number of newly added device ids connected to the target wifi ssid within the fourth time period;

and/or,

acquiring third dimension portrait features from the third database based on the target device id set, wherein the third dimension portrait features include the tag information corresponding to each target device id;

and step S42, generating the target portrait based on the first dimension portrait features and/or the second dimension portrait features and/or the third dimension portrait features.

8. The system according to any one of claims 1 to 7,

the input feature information includes any one or any combination of the following features: the number of first-type preset APPs in a preset third time period, the activity features of the first-type preset APPs in the preset third time period, the device stability features in the third time period, the number of home location identifiers in the third time period, and the number of work location identifiers in the third time period, wherein the first-type preset APPs include one or more of enterprise APPs, industrial and commercial APPs, and tax APPs.

9. The system according to any one of claims 1 to 7,

the system further includes a display device for displaying the target portrait.

Technical Field

The invention relates to the technical field of computers, and in particular to a portrait generation system based on big data.

Background

The user portrait is an effective tool for outlining target users and for connecting user needs with design direction. With the continuous development of computer technology and big data technology, user portraits are widely applied in many fields. A user portrait is a tagged user model abstracted from information such as a user's social attributes, living habits, and consumption behavior. The core task in constructing a user portrait is to attach "tags" to the user, where a tag is a highly refined feature identifier obtained by analyzing the user's information.

Enterprises, as the main participants in socio-economic activity, are involved in every aspect of it. Accordingly, there is an increasing demand for enterprise portraits (i.e., enterprise-level user portraits). For medium-sized and large enterprises, comprehensive and accurate enterprise information can be obtained from big data to construct an enterprise portrait. Small and micro enterprises are an important supporting force in China's economic and social development, but accurate and comprehensive information about them is difficult to acquire, so an accurate enterprise portrait is difficult to construct. How to generate an accurate small and micro enterprise portrait based on big data is therefore a technical problem that urgently needs to be solved.

Disclosure of Invention

The invention aims to provide a portrait generation system based on big data that can accurately and comprehensively acquire the feature information of small and micro enterprises and improve the accuracy of small and micro enterprise portraits.

According to a first aspect of the present invention, there is provided a portrait generation system based on big data, including a first database, a second database, a third database, a processor, and a memory storing a computer program, wherein the first database is used for storing device ids together with the feature information and corresponding time information of each device id; the second database is used for storing device ids together with the wifi ssid information, wifi position information, and wifi connection time information of the wifi networks each device id has connected to; the third database is used for storing device ids together with the tag information corresponding to each device id; and when the computer program is executed by the processor, the following steps are implemented:

step S1, acquiring the feature information corresponding to a device id to be tested from the first database as input feature information, inputting the input feature information into a pre-trained device id classification model, and judging whether the device id to be tested is a target type device id, and if so, executing step S2, wherein a target type device id is the device id of a small and micro enterprise owner;

step S2, acquiring a target wifi ssid from the second database based on the device id to be tested, a preset first time period, and a target address corresponding to the device id to be tested, wherein the target address is the workplace corresponding to the device id to be tested, and the target wifi ssid is the workplace wifi ssid of the device id to be tested;

step S3, acquiring all target device ids from the second database based on the target wifi ssid and a preset second time period, and constructing a target device id set, wherein a target device id is the device id of an employee of the small and micro enterprise corresponding to the device id to be tested;

and step S4, generating a target portrait based on the target device id set and the first database and/or the second database and/or the third database, wherein the target portrait is the portrait of the small and micro enterprise corresponding to the device id to be tested.

Compared with the prior art, the invention has obvious advantages and beneficial effects. By the technical scheme, the portrait generation system based on big data provided by the invention can achieve considerable technical progress and practicability, has wide industrial utilization value and at least has the following advantages:

The invention provides three databases. It first judges whether a device id is the device id of a small and micro enterprise owner, then determines the workplace wifi of the small and micro enterprise from the second database using the owner's device id, then determines the employees' device ids from the second database using that workplace wifi, and finally acquires the corresponding features or tags from the first database and/or the second database and/or the third database using the employees' device ids to generate the portrait of the small and micro enterprise.

The foregoing is only an overview of the technical solutions of the present invention. So that the technical means of the invention can be understood more clearly and implemented in accordance with the description, and so that the above and other objects, features, and advantages of the invention are more readily apparent, preferred embodiments are described in detail below with reference to the accompanying drawings.

Drawings

FIG. 1 is a diagram of a portrait generation system based on big data according to an embodiment of the present invention.

Detailed Description

To further illustrate the technical means adopted by the present invention to achieve its intended objects, and their effects, an embodiment of the portrait generation system based on big data and its effects are described in detail below with reference to the accompanying drawings and preferred embodiments.

An embodiment of the present invention provides a portrait generation system based on big data, as shown in FIG. 1, including a first database, a second database, a third database, a processor, and a memory storing a computer program. The first database is used to store device ids together with the feature information and corresponding time information of each device id; the feature information corresponding to a device id may include the names of installed APPs, the activity features of installed APPs, communication card replacement information for the device, home location information, work location information, and the like. The second database is used for storing device ids together with the wifi ssid information, wifi position information, and wifi connection time information of the wifi networks each device id has connected to. The fields of each record in the second database may comprise the device id, the wifi ssid information, the wifi position information, and the wifi connection time information, wherein the wifi ssid is the unique identifier of each wifi network, the wifi position information is the geographical position corresponding to the wifi ssid (for example, geohash information), and the wifi connection time information is the connection time of that record. The third database is used for storing device ids together with the tag information corresponding to each device id, wherein the tag information consists of pre-calculated tags describing the person corresponding to the device id, for example age, gender, income level, and spending power. When the computer program is executed by the processor, the following steps are implemented:
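As a rough illustration of the three record types described above, the following sketch models one record from each database as a Python dataclass. All class and field names are hypothetical; the description only lists the kinds of information stored, not a schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class FeatureRecord:
    """One first-database record (field names are assumptions)."""
    device_id: str
    observed_at: datetime                 # the "corresponding time information"
    installed_apps: list[str] = field(default_factory=list)
    home_geohash: str = ""
    work_geohash: str = ""


@dataclass
class WifiRecord:
    """One second-database record: a single wifi connection event."""
    device_id: str
    ssid: str                             # treated as the wifi's unique identifier
    geohash: str                          # wifi position information
    connected_at: datetime                # wifi connection time information


@dataclass
class TagRecord:
    """One third-database record: pre-calculated tags for a device id."""
    device_id: str
    tags: dict[str, str] = field(default_factory=dict)
```

Keeping connection events as individual rows, rather than pre-aggregated counts, is one possible design that lets the later filtering steps apply different time windows to the same data.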

Step S1, acquiring the feature information corresponding to a device id to be tested from the first database as input feature information, inputting the input feature information into a pre-trained device id classification model, and judging whether the device id to be tested is a target type device id, and if so, executing step S2, wherein a target type device id is the device id of a small and micro enterprise owner;

wherein the input feature information includes any one or any combination of the following features: the number of first-type preset APPs in a preset third time period, the activity features of the first-type preset APPs in the preset third time period, the device stability features in the third time period, the number of home location identifiers in the third time period, and the number of work location identifiers in the third time period, wherein the first-type preset APPs include one or more of enterprise APPs, industrial and commercial APPs, and tax APPs.

It can be understood that, because comprehensive information about small and micro enterprises is hard to obtain, in many application scenarios it cannot be known directly whether a given device id belongs to a small and micro enterprise owner, so the device id to be tested is judged through step S1. Step S1 may also be applied to all device ids in the first database in order to screen out the device ids of small and micro enterprise owners in batches.
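The batch screening described above can be sketched as follows, assuming the pre-trained classifier is a logistic regression model (the embodiment later names this as the preferred choice). The weights, bias, 0.5 decision threshold, and function names are illustrative assumptions, not values from the patent.

```python
import math


def owner_probability(features, weights, bias):
    """Logistic-regression score: sigmoid of the weighted feature sum."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def screen_owner_ids(features_by_id, weights, bias, threshold=0.5):
    """Return the device ids classified as small and micro enterprise owners.

    features_by_id maps device id -> numeric input feature vector
    (e.g. counts of first-type preset APPs); the threshold is an assumption.
    """
    return [
        device_id
        for device_id, features in features_by_id.items()
        if owner_probability(features, weights, bias) >= threshold
    ]
```

Only device ids that pass this check would proceed to step S2.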

Step S2, acquiring a target wifi ssid from the second database based on the device id to be tested, a preset first time period, and a target address corresponding to the device id to be tested, wherein the target address is the workplace corresponding to the device id to be tested, and the target wifi ssid is the workplace wifi ssid of the device id to be tested;

The workplace corresponding to the device id to be tested can be obtained directly by existing techniques or methods, for example from a public information platform, or the real location of the small and micro enterprise can be obtained by offline positioning; the present invention does not limit this. The first time period may be set according to specific application requirements, for example to 3 months.

Step S3, acquiring all target device ids from the second database based on the target wifi ssid and a preset second time period, and constructing a target device id set, wherein a target device id is the device id of an employee of the small and micro enterprise corresponding to the device id to be tested;

The second time period may be the same as or different from the first time period and is set according to application requirements; preferably, the second time period may also be set to 3 months.

Step S4, generating a target portrait based on the target device id set and the first database and/or the second database and/or the third database, wherein the target portrait is the portrait of the small and micro enterprise corresponding to the device id to be tested.

The embodiment of the invention provides three databases. It first judges whether a device id is the device id of a small and micro enterprise owner, then determines the workplace wifi of the small and micro enterprise from the second database using the owner's device id, then determines the employees' device ids from the second database using that workplace wifi, and finally acquires the corresponding features or tags from the first database and/or the second database and/or the third database using the employees' device ids to generate the portrait of the small and micro enterprise.

According to the invention, the system can be physically implemented as one server or as a server group comprising a plurality of servers. Those skilled in the art will appreciate that parameters such as the model and specification of the server do not affect the scope of the present invention.

Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the steps as a sequential process, many of the steps can be performed in parallel, concurrently or simultaneously. In addition, the order of the steps may be rearranged. A process may be terminated when its operations are completed, but may have additional steps not included in the figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc.

As an embodiment, when the computer program is executed by the processor, a step S10 of training the device id classification model is further implemented, which comprises:

Step S101, acquiring the device ids of a plurality of small and micro enterprise owners and the device ids of a plurality of non-owners, wherein the acquired device ids of small and micro enterprise owners are first device ids, and the acquired device ids of non-owners are second device ids;

Step S102, acquiring the feature information corresponding to each first device id from the first database as input feature information and constructing a positive sample feature set, and acquiring the feature information corresponding to each second device id from the first database as input feature information and constructing a negative sample feature set;

Step S103, training on the positive sample feature set and the negative sample feature set to obtain the device id classification model.

The device id classification model is preferably a logistic regression model; the logistic regression model is trained directly on the positive sample feature set and the negative sample feature set to obtain the device id classification model.
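Steps S102 and S103 with a logistic regression model can be sketched in plain Python as below. The patent does not specify an optimizer; batch gradient descent, the learning rate, and the epoch count are assumptions made for illustration, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(positives, negatives, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss.

    positives / negatives are lists of numeric feature vectors
    (the positive and negative sample feature sets).
    """
    samples = [(x, 1.0) for x in positives] + [(x, 0.0) for x in negatives]
    n_features = len(samples[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in samples:
            # Prediction error for this sample under the current parameters.
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(samples)
    return weights, bias


def predict(x, weights, bias):
    """Probability that feature vector x belongs to an enterprise owner."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
```

In practice a library implementation would normally be used; the hand-written loop is shown only to make the training step concrete.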

As an embodiment, the step S101 includes:

Step S111, randomly extracting the first device ids of a plurality of small and micro enterprise owners from a preset set of small and micro enterprise owner device ids;

The preset set of small and micro enterprise owner device ids is a set of device ids known to belong to small and micro enterprise owners.

Step S112, randomly extracting from the first database a plurality of second device ids whose feature information has a similarity to the feature information of the first device ids that is lower than a preset similarity threshold.

Through step S112, based on the feature information of the owners' device ids, a plurality of second device ids whose feature information similarity to the first device ids is lower than the preset similarity threshold can be randomly selected from the mass data of the first database; these second device ids are treated as device ids of non-owners. Training the model on positive and negative sample sets obtained in this way can improve its accuracy.
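One way to realize the negative sampling of step S112 is sketched below. The patent does not name a similarity measure or say how similarity to several first device ids is combined; cosine similarity, and requiring a candidate to fall below the threshold against every positive sample, are assumptions made here.

```python
import math
import random


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def sample_negative_ids(candidates, positive_features, k, threshold=0.5, seed=0):
    """Randomly pick up to k device ids that are dissimilar to all positives.

    candidates maps device id -> feature vector from the first database.
    The threshold and the fixed seed are illustrative choices.
    """
    dissimilar = [
        device_id
        for device_id, features in candidates.items()
        if all(cosine_similarity(features, p) < threshold for p in positive_features)
    ]
    rng = random.Random(seed)
    return rng.sample(dissimilar, min(k, len(dissimilar)))
```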

As an example, the step S2 may include:

Step S21, acquiring from the second database, based on the device id to be tested and the preset first time period, all wifi ssids whose connection frequency with the device id to be tested within the first time period exceeds a preset connection frequency threshold, and forming a first wifi ssid list;

A device id may connect to many wifi ssids within a time period, and some of them are non-target wifi networks with only a few connections. Setting a connection frequency threshold filters out this noise directly, which reduces the workload of subsequent calculation and improves the accuracy of the result.

The first wifi ssid list obtained through step S21 typically still contains many wifi ssids, so it can be filtered further through steps S22 to S24.
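The frequency filter of step S21 can be sketched as a simple count over connection records. The record layout `(device_id, ssid, connected_at)` and the function name are hypothetical.

```python
from collections import Counter
from datetime import datetime


def frequent_ssids(records, device_id, start, end, min_connections):
    """Return the ssids this device connected to more than min_connections times.

    records: iterable of (device_id, ssid, connected_at) tuples (assumed layout).
    start / end: bounds of the preset first time period, end exclusive.
    """
    counts = Counter(
        ssid
        for dev, ssid, connected_at in records
        if dev == device_id and start <= connected_at < end
    )
    return [ssid for ssid, n in counts.items() if n > min_connections]
```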

Step S22, acquiring the number of connected devices of each wifi ssid in the first wifi ssid list in each preset time period;

Step S23, acquiring, based on the number of connected devices of each wifi ssid in the first wifi ssid list in each preset time period and on preset working and non-working time periods, the distribution characteristics of the number of connected devices of each wifi ssid in the preset working time period and the preset non-working time period;

Step S24, judging, based on those distribution characteristics, whether each wifi ssid in the first wifi ssid list is a workplace wifi ssid, and if so, storing the wifi ssid and its wifi position information in a preset second wifi ssid list;

It can be understood that the number of devices connected to a workplace wifi ssid is distributed in a distinctly different way in working and non-working time periods, so steps S22 to S24 can use these distribution characteristics to filter non-workplace wifi ssids out of the first wifi ssid list and obtain the workplace wifi ssids.
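A minimal sketch of steps S22 to S24 follows. The patent does not give the decision rule, so the rule used here is an assumption: an ssid counts as a workplace wifi when the number of distinct devices seen during weekday working hours is at least `min_ratio` times the number seen outside them. The working hours (09:00 to 18:00) and the record layout are also assumptions.

```python
from datetime import datetime


def is_work_ssid(records, ssid, work_hours=range(9, 18), min_ratio=2.0):
    """Judge whether an ssid looks like a workplace wifi.

    records: iterable of (device_id, ssid, connected_at) tuples (assumed layout).
    """
    work_devices, off_devices = set(), set()
    for device_id, record_ssid, connected_at in records:
        if record_ssid != ssid:
            continue
        in_work_period = connected_at.weekday() < 5 and connected_at.hour in work_hours
        (work_devices if in_work_period else off_devices).add(device_id)
    # max(..., 1) avoids accepting an ssid with no off-hours data on one device.
    return len(work_devices) >= min_ratio * max(len(off_devices), 1)


def work_ssids(records, candidate_ssids):
    """Filter the first wifi ssid list down to the second wifi ssid list."""
    return [s for s in candidate_ssids if is_work_ssid(records, s)]
```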

Step S25, determining the wifi ssid in the second wifi ssid list whose wifi position information is closest to the target address corresponding to the device id to be tested as the target wifi ssid.
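The nearest-wifi selection of step S25 can be sketched as below, assuming both the wifi positions and the target address are available as (latitude, longitude) pairs in degrees, for example after decoding geohash values. The great-circle (haversine) distance is one reasonable choice; the patent does not prescribe a distance measure.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def pick_target_ssid(ssid_positions, target_address):
    """Return the ssid from the second wifi ssid list nearest the target address.

    ssid_positions maps ssid -> (lat, lon); must be non-empty.
    """
    return min(
        ssid_positions,
        key=lambda ssid: haversine_km(ssid_positions[ssid], target_address),
    )
```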

As an example, the step S3 includes:

Step S31, acquiring from the second database, based on the target wifi ssid and the preset second time period, the target device ids whose connections to the target wifi ssid within the second time period exceed a preset connection time threshold and/or a preset total connection time threshold, and constructing the target device id set.

Within the preset second time period, the target wifi ssid may also have been connected to by device ids belonging to visitors and other non-employees. Setting a connection time threshold and/or a total connection time threshold filters out this noise, which reduces the workload of subsequent calculation and improves the accuracy of the result.
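The visitor filter of step S31 can be sketched as follows. The sketch interprets the two thresholds as a minimum number of connections and a minimum total connected duration, and it assumes each record carries a session length in minutes; both are assumptions, since the description only refers to wifi connection time information.

```python
from collections import defaultdict
from datetime import datetime


def employee_device_ids(sessions, target_ssid, start, end,
                        min_count, min_total_minutes, require_both=False):
    """Return device ids that look like employees of the target wifi.

    sessions: iterable of (device_id, ssid, connected_at, minutes) tuples
    (assumed layout). require_both selects "and" instead of "or" when
    combining the two thresholds, mirroring the "and/or" in the text.
    """
    count = defaultdict(int)
    total = defaultdict(float)
    for device_id, ssid, connected_at, minutes in sessions:
        if ssid == target_ssid and start <= connected_at < end:
            count[device_id] += 1
            total[device_id] += minutes
    combine = all if require_both else any
    return {
        device_id
        for device_id in count
        if combine((count[device_id] > min_count,
                    total[device_id] > min_total_minutes))
    }
```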

As an example, the step S4 includes:

Step S41, acquiring first dimension portrait features from the first database based on the target device id set, wherein the first dimension portrait features include the number of second preset APPs corresponding to each target device id in a preset fourth time period and the activity features of the second preset APPs in the preset fourth time period, the second preset APPs being job-seeking APPs;

and/or,

acquiring second dimension portrait features from the second database based on the target device id set, wherein the second dimension portrait features include the number of target device ids connected to the target wifi ssid during preset non-working time periods within the preset fourth time period and the number of newly added device ids connected to the target wifi ssid within the fourth time period;

and/or,

acquiring third dimension portrait features from the third database based on the target device id set, wherein the third dimension portrait features include the tag information corresponding to each target device id;

The first dimension portrait features can depict the stability of the small and micro enterprise's workforce; the second dimension portrait features can represent the overtime behavior of its employees and the number of newly joined employees; and the multi-dimensional tags of all employees in the third dimension portrait features can characterize the enterprise from multiple dimensions.

Step S42, generating the target portrait based on the first dimension portrait features and/or the second dimension portrait features and/or the third dimension portrait features.
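Steps S41 and S42 can be sketched as an aggregation over whichever feature dimensions are available. The output keys, the choice of summary statistics (share of employees with job-seeking APPs, tag value counts), and the parameter names are all hypothetical; the patent does not define the portrait's format.

```python
from collections import Counter


def build_portrait(device_ids, job_app_counts=None, overtime_ids=None,
                   new_ids=None, tags_by_id=None):
    """Combine the supplied feature dimensions into one enterprise portrait."""
    portrait = {"employee_count": len(device_ids)}
    # First dimension: job-seeking APP usage as a proxy for staff stability.
    if job_app_counts is not None:
        seekers = sum(1 for d in device_ids if job_app_counts.get(d, 0) > 0)
        portrait["job_seeking_share"] = seekers / len(device_ids) if device_ids else 0.0
    # Second dimension: off-hours connections and newly seen devices.
    if overtime_ids is not None and new_ids is not None:
        portrait["overtime_devices"] = len(set(overtime_ids) & set(device_ids))
        portrait["new_devices"] = len(set(new_ids))
    # Third dimension: distribution of each tag's values across employees.
    if tags_by_id is not None:
        distribution = {}
        for d in device_ids:
            for name, value in tags_by_id.get(d, {}).items():
                distribution.setdefault(name, Counter())[value] += 1
        portrait["tag_distribution"] = {k: dict(v) for k, v in distribution.items()}
    return portrait
```

Passing `None` for a dimension simply omits it, which mirrors the "and/or" wording of step S41.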

As one embodiment, the system may also include a display device for displaying the target portrait for viewing by a user.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.

Although the present invention has been described with reference to a preferred embodiment, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.
