Network data capturing method based on big data

Document No.: 1324304; publication date: 2020-07-14

Reading note: this technology, Network data capturing method based on big data, was designed by 张俊杰 (Zhang Junjie) and 耿雁萍 (Geng Yanping) on 2020-03-17. Its main content is as follows. The invention provides a network data capturing method based on big data, comprising: configuring a monitoring terminal as a proxy server; the target APP sends communication data to a target server through the proxy server; the proxy server simulates the target APP and sends communication data to the target server; the proxy server obtains a target field through big data analysis; and a capture rule is configured, whereby the proxy server captures data sent by the target server according to the target field. Because the monitoring terminal acts as a proxy server that simulates the target APP, and the capture rule is configured once the target field has been obtained through big data analysis, network news hotspots can be captured automatically without manual configuration, making the capture process efficient and intelligent.

1. A network data capturing method based on big data, characterized by comprising the following steps:

configuring a monitoring terminal as a proxy server;

the target APP sends communication data to a target server through the proxy server;

the proxy server simulates the target APP and sends communication data to the target server;

the proxy server obtains a target field through big data analysis;

and configuring a capture rule, whereby the proxy server captures data sent by the target server according to the target field.

2. The big data-based network data capturing method according to claim 1, wherein the step in which the proxy server simulates the target APP and sends communication data to the target server comprises:

the proxy server captures the communication data sent by the target APP to the target server N times, wherein N is a positive integer greater than or equal to 2;

comparing the communication data from each capture to identify the constant parameters and the variable parameters in the communication data;

decompiling the target APP with a decompilation tool to obtain the source code of the target APP;

using the variable parameters as keywords, searching the source code for functions containing the keywords, and defining each such function as a candidate function;

dynamically debugging the source code with the decompilation tool, and, when the output of a candidate function equals the value of a variable parameter, determining that candidate function to be the target function;

obtaining the construction method of the variable parameters from the plaintext and the encryption mode of the target function;

and, according to the construction methods of the constant parameters and the variable parameters, the proxy server simulates the target APP and sends communication data to the target server.

3. The big data-based network data capturing method according to claim 2, wherein the decompilation tool is an Android code compiler.

4. The big data-based network data capturing method according to claim 1, wherein the step in which the proxy server obtains the target field through big data analysis comprises:

the proxy server obtains hot search data through big data analysis;

the proxy server captures the hot search data actively pushed by the target server within a preset time period;

and the proxy server acquires the target field from the hot search data.

5. The big data-based network data capturing method according to claim 1, wherein configuring the capture rule comprises configuring a capture priority, a capture efficiency, and capture fields.

6. The big data-based network data capturing method according to claim 1, wherein the step in which the proxy server captures data sent by the target server according to the target field comprises:

the proxy server captures data sent by the target server M times within a preset time period, wherein M is a positive integer greater than or equal to 2;

and, for the data from each capture, checking it against the target field: if the data contains the target field, comparing it with the data stored in a database, and, if it does not duplicate any stored data, storing it in the database.

7. The big data-based network data capturing method according to claim 1, wherein the monitoring terminal comprises a Scapy framework.

Technical Field

The invention relates to the technical field of data capture, and in particular to a network data capturing method based on big data.

Background

At present, with the rapid development of the mobile internet, mobile terminal APPs (applications) have become the main arena in which people use the internet, so the demand for capturing data from mobile APPs keeps growing, for example, capturing data from news APPs such as the surf APP, the Tencent News APP, the Baidu APP, and the Toutiao (Today's Headlines) APP.

At present, the main data capture frameworks include WebCollector, Nutch, PySpider, and WebMagic, and existing capture methods use the URL of a web page directly as the entry address.

However, the inventors found that when a mobile APP communicates with its server, the request packets usually carry many signed parameters. A crawler generally cannot obtain the signature algorithms for these parameters, so it cannot simulate the requests the mobile APP sends to the server, and the data content in the mobile APP cannot be captured. In addition, mobile APPs often push content to users according to current news hotspots, but there is currently no method for capturing news hotspots automatically; capture rules therefore have to be configured manually, which is not intelligent enough.

Disclosure of Invention

In view of the shortcomings of the prior art, the invention provides a network data capturing method based on big data, which aims to solve one of the technical problems described in the background.

The invention solves the above technical problem with the following technical solution:

a network data capturing method based on big data comprises:

configuring a monitoring terminal as a proxy server;

the target APP sends communication data to a target server through the proxy server;

the proxy server simulates the target APP and sends communication data to the target server;

the proxy server obtains a target field through big data analysis;

and configuring a capture rule, whereby the proxy server captures data sent by the target server according to the target field.

As an optional implementation, the step in which the proxy server simulates the target APP and sends communication data to the target server comprises:

the proxy server captures the communication data sent by the target APP to the target server N times, wherein N is a positive integer greater than or equal to 2;

comparing the communication data from each capture to identify the constant parameters and the variable parameters in the communication data;

decompiling the target APP with a decompilation tool to obtain the source code of the target APP;

using the variable parameters as keywords, searching the source code for functions containing the keywords, and defining each such function as a candidate function;

dynamically debugging the source code with the decompilation tool, and, when the output of a candidate function equals the value of a variable parameter, determining that candidate function to be the target function;

obtaining the construction method of the variable parameters from the plaintext and the encryption mode of the target function;

and, according to the construction methods of the constant parameters and the variable parameters, the proxy server simulates the target APP and sends communication data to the target server.
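The keyword search over the decompiled source is essentially a text search. A rough sketch, assuming the decompiler has already produced Java source text (held here in a hypothetical `{filename: source}` dict) and using a simple regular expression to locate method headers instead of a real Java parser, might look like this. The function and pattern names are illustrative only.

```python
import re

# Rough Java method header: optional modifiers, a return type, the method name,
# a parameter list, an optional throws clause, then "{". Control keywords such as
# "if" or "while" are excluded as names. This is a heuristic, not a Java parser.
METHOD_HEADER = re.compile(
    r"^\s*(?:(?:public|private|protected|static|final|synchronized)\s+)*"
    r"[\w<>\[\],.]+\s+(?!(?:if|for|while|switch|catch)\b)(\w+)\s*\([^)]*\)\s*"
    r"(?:throws\s+[\w.,\s]+)?\{",
    re.MULTILINE,
)


def find_candidate_functions(sources: dict[str, str], keywords: set[str]) -> dict[str, set[str]]:
    """Map each keyword (a variable parameter name) to the "file:method" names that mention it."""
    hits: dict[str, set[str]] = {kw: set() for kw in keywords}
    for filename, text in sources.items():
        headers = list(METHOD_HEADER.finditer(text))
        for i, match in enumerate(headers):
            # Everything up to the next method header is treated as this method's body.
            end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
            chunk = text[match.start():end]
            for kw in keywords:
                # Whole-word match, so "sign" does not match "signature".
                if re.search(rf"\b{re.escape(kw)}\b", chunk):
                    hits[kw].add(f"{filename}:{match.group(1)}")
    return hits
```

Because it matches on plain text, a function that merely mentions a keyword (for example, one that only appends `sign` to a URL) is also reported as a candidate; the dynamic-debugging comparison described in the next step is what narrows the candidates down to the target function.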

As an optional implementation, the decompilation tool is an Android code compiler.

As an optional implementation, the step in which the proxy server obtains the target field through big data analysis comprises:

the proxy server obtains hot search data through big data analysis;

the proxy server captures the hot search data actively pushed by the target server within a preset time period;

and the proxy server acquires the target field from the hot search data.

As an optional implementation, configuring the capture rule includes configuring a capture priority, a capture efficiency, and capture fields.

As an optional implementation, the step in which the proxy server captures data sent by the target server according to the target field comprises:

the proxy server captures data sent by the target server M times within a preset time period, wherein M is a positive integer greater than or equal to 2;

and, for the data from each capture, checking it against the target field: if the data contains the target field, comparing it with the data stored in a database, and, if it does not duplicate any stored data, storing it in the database.

As an optional implementation, the monitoring terminal includes a Scapy framework.

The beneficial effects of the invention are as follows:

The monitoring terminal is configured as a proxy server, and the proxy server simulates the target APP and sends communication data to the target server; after the target field is obtained through big data analysis, a capture rule is configured, and the proxy server captures the data sent by the target server according to the target field. As a result, network news hotspots can be captured automatically without manual configuration, making the capture process efficient and intelligent.

Drawings

The invention is further illustrated with reference to the following figures and examples.

FIG. 1 is a logic diagram of the present embodiment;

FIG. 2 is a logic diagram of the proxy server simulating the target APP and sending communication data to the target server according to the embodiment.

Detailed Description

The following embodiments describe specific implementations of the invention, including, where applicable, the shapes, structures, relative positions, and connections of the components, their functions and working principles, and the manufacturing processes and operating methods, so as to help those skilled in the art understand the inventive concept and technical solutions of the invention more completely, accurately, and deeply.

To achieve the above object, as shown in FIG. 1, the invention provides a network data capturing method based on big data, which comprises the following steps:

S10, configuring the monitoring terminal as a proxy server;

S20, the target APP sends communication data to the target server through the proxy server;

S30, the proxy server simulates the target APP and sends communication data to the target server;

S40, the proxy server obtains a target field through big data analysis;

S50, configuring a capture rule, whereby the proxy server captures the data sent by the target server according to the target field.

The monitoring terminal is configured as a proxy server, and the proxy server simulates the target APP and sends communication data to the target server; after the target field is obtained through big data analysis, a capture rule is configured, and the proxy server captures the data sent by the target server according to the target field. As a result, network news hotspots can be captured automatically without manual configuration, making the capture process efficient and intelligent.
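Because all of the APP's requests in step S20 pass through the proxy, the proxy can record them for the later comparison steps. The following is a minimal sketch of parsing one recorded request into its parameters. It assumes the proxy already sees plain HTTP (HTTPS traffic would require TLS interception, which is outside this sketch) and handles only query strings and form-encoded bodies; the function name `parse_captured_request` is hypothetical.

```python
from urllib.parse import parse_qsl, urlsplit


def parse_captured_request(raw: bytes) -> dict:
    """Split a raw HTTP/1.1 request recorded by the proxy into method, path, headers, and params.

    Query-string and form-encoded body parameters are merged into one dict so that
    later steps can compare parameters across repeated captures.
    """
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    method, target, _version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # Requests sent to a proxy usually carry an absolute URL; urlsplit handles both forms.
    url = urlsplit(target)
    params = dict(parse_qsl(url.query, keep_blank_values=True))
    if headers.get("content-type", "").startswith("application/x-www-form-urlencoded"):
        params.update(parse_qsl(body.decode("utf-8"), keep_blank_values=True))
    return {"method": method, "path": url.path, "headers": headers, "params": params}
```

Header names are lower-cased because HTTP header names are case-insensitive, which keeps lookups such as `content-type` reliable across APPs.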

As an optional implementation, as shown in FIG. 2, the step in which the proxy server simulates the target APP and sends communication data to the target server comprises:

S31, the proxy server captures the communication data sent by the target APP to the target server N times, wherein N is a positive integer greater than or equal to 2;

S32, comparing the communication data from each capture to identify the constant parameters and the variable parameters in the communication data;

S33, decompiling the target APP with a decompilation tool to obtain the source code of the target APP;

S34, using the variable parameters as keywords, searching the source code for functions containing the keywords, and defining each such function as a candidate function;

S35, dynamically debugging the source code with the decompilation tool, and, when the output of a candidate function equals the value of a variable parameter, determining that candidate function to be the target function;

S36, obtaining the construction method of the variable parameters from the plaintext and the encryption mode of the target function;

S37, according to the construction methods of the constant parameters and the variable parameters, the proxy server simulates the target APP and sends communication data to the target server.
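Steps S31 and S32 amount to comparing the same request captured N times: a parameter whose value never changes is constant, and one whose value differs (or is missing in some capture) is variable. A minimal sketch, assuming each capture has already been reduced to a dict of request parameters; the function name is hypothetical.

```python
def classify_parameters(captures: list[dict[str, str]]) -> tuple[dict[str, str], set[str]]:
    """Split parameter names into constant (with their fixed value) and variable ones.

    `captures` holds the parameters of the same request captured N >= 2 times.
    """
    if len(captures) < 2:
        raise ValueError("need at least two captures (N >= 2)")
    names = set().union(*captures)  # every parameter name seen in any capture
    constant: dict[str, str] = {}
    variable: set[str] = set()
    for name in names:
        # A name missing from some capture yields None here, so it counts as variable.
        values = {c.get(name) for c in captures}
        if len(values) == 1:
            constant[name] = values.pop()
        else:
            variable.add(name)
    return constant, variable
```

For example, over three captures in which `uid` and `ver` stay fixed while `ts` and `sign` change, `uid` and `ver` come back as constants and `ts` and `sign` as the variable parameters to search for in step S34 (these parameter names are illustrative).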

Thus, by capturing and analyzing the communication data, the constant parameters and variable parameters in the request packets are identified. The variable parameters are then decoded by decompilation and similar means, which cracks the communication protocol between the mobile APP and the server and yields the construction method of the variable parameters. According to the construction methods of the constant parameters and the variable parameters, the proxy server simulates the target APP and sends communication data to the target server, thereby capturing the data of the mobile APP.
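The matching in S35 and the reconstruction in S36 and S37 can be illustrated with stand-ins. In this sketch, the two candidate functions, the secret key, the `ts`/`sign` parameter names, and the "sorted key=value pairs plus secret, then hash" scheme are all hypothetical; in practice the candidates are the functions found in the decompiled APP, and their outputs are observed through dynamic debugging rather than by calling Python code.

```python
import hashlib
import time
from typing import Callable, Optional

# Hypothetical secret key, assumed to have been recovered from the decompiled APP.
SECRET = "demo-secret"


def _plaintext(params: dict[str, str]) -> str:
    # Hypothetical plaintext layout: key=value pairs sorted by key, joined by "&", plus the secret.
    return "&".join(f"{k}={params[k]}" for k in sorted(params)) + SECRET


def md5_candidate(params: dict[str, str]) -> str:
    return hashlib.md5(_plaintext(params).encode("utf-8")).hexdigest()


def sha1_candidate(params: dict[str, str]) -> str:
    return hashlib.sha1(_plaintext(params).encode("utf-8")).hexdigest()


# Stand-ins for the candidate functions found by the keyword search (S34).
CANDIDATES: dict[str, Callable[[dict[str, str]], str]] = {
    "Signer.makeSign": md5_candidate,
    "Signer.altSign": sha1_candidate,
}


def find_target_function(captured: dict[str, str], param: str = "sign") -> Optional[str]:
    """S35: return the candidate whose output equals the observed value of `param`."""
    plain = {k: v for k, v in captured.items() if k != param}
    for name, fn in CANDIDATES.items():
        if fn(plain) == captured[param]:
            return name
    return None


def build_request(constants: dict[str, str], target: str,
                  ts: Optional[int] = None, param: str = "sign") -> dict[str, str]:
    """S37: rebuild a request from the constant parameters plus freshly computed variables."""
    params = dict(constants)
    params["ts"] = str(int(time.time()) if ts is None else ts)  # hypothetical timestamp parameter
    params[param] = CANDIDATES[target](params)  # sign computed over all other parameters
    return params
```

Once the target function is known, every new request can be signed the same way the APP signs it, which is what lets the proxy replay the APP's traffic with fresh variable values.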

Optionally, the decompilation tool is an Android code compiler.

As an optional implementation, the step in which the proxy server obtains the target field through big data analysis comprises:

the proxy server obtains hot search data through big data analysis;

the proxy server captures the hot search data actively pushed by the target server within a preset time period;

and the proxy server acquires the target field from the hot search data.

In this way, trending (hot search) news on the network can be acquired automatically.
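One simple way to turn hot search data into target fields is to count how many hot items mention each term. The sketch below assumes both the externally analysed hot search list and the items pushed by the target server have already been segmented into terms (Chinese text would need a word segmenter first), and it uses plain document frequency as an illustrative stand-in for the big data analysis, not as the method specified here. The function name is hypothetical.

```python
from collections import Counter
from collections.abc import Iterable


def extract_target_fields(hot_items: Iterable[list[str]], pushed_items: Iterable[list[str]],
                          top_k: int = 3, stopwords: frozenset[str] = frozenset()) -> list[str]:
    """Return the top_k terms mentioned by the most hot-search and pushed items."""
    counts: Counter[str] = Counter()
    for items in (hot_items, pushed_items):
        for terms in items:
            # Count each term once per item (document frequency), skipping stopwords.
            counts.update(set(terms) - stopwords)
    return [term for term, _ in counts.most_common(top_k)]
```

Counting a term once per item keeps a single long article that repeats a word from dominating the ranking.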

As an optional implementation, configuring the capture rule includes configuring a capture priority, a capture efficiency, and capture fields.
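The three rule settings can be held in a small configuration object. In this sketch, "capture efficiency" is interpreted as a cap on requests per minute and a lower priority number means "capture first"; both interpretations, the class name `CaptureRule`, and its field names are assumptions.

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class CaptureRule:
    # Only `priority` takes part in ordering; a lower number is captured first (assumption).
    priority: int
    name: str = field(compare=False)
    # "Capture efficiency" interpreted as a cap on requests per minute (assumption).
    requests_per_minute: int = field(default=10, compare=False)
    # Fields (keywords) this rule extracts, e.g. the target fields obtained in step S40.
    fields: tuple[str, ...] = field(default=(), compare=False)

    def __post_init__(self) -> None:
        if self.requests_per_minute <= 0:
            raise ValueError("requests_per_minute must be positive")

    def interval_seconds(self) -> float:
        """Minimum delay between two requests under this rule."""
        return 60.0 / self.requests_per_minute


def schedule(rules: list[CaptureRule]) -> list[CaptureRule]:
    """Order rules by priority; sorted() is stable, so equal priorities keep their input order."""
    return sorted(rules)
```

Keeping the rate cap on the rule itself lets each rule be throttled independently, so a high-priority hotspot rule can poll more often than a background one.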

As an optional implementation, the step in which the proxy server captures data sent by the target server according to the target field comprises:

the proxy server captures data sent by the target server M times within a preset time period, wherein M is a positive integer greater than or equal to 2;

and, for the data from each capture, checking it against the target field: if the data contains the target field, comparing it with the data stored in a database, and, if it does not duplicate any stored data, storing it in the database.
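The filter-then-deduplicate loop can be sketched with the standard-library `sqlite3` module. This assumes each captured item is a text string, that "containing the target field" means containing any target keyword as a substring, and that "not coincident" means not an exact duplicate, detected through a SHA-256 digest of the content; a real system might use fuzzier near-duplicate detection. The table and function names are hypothetical.

```python
import hashlib
import sqlite3


def store_new_items(conn: sqlite3.Connection, rounds: list[list[str]],
                    target_fields: list[str]) -> int:
    """Filter M >= 2 capture rounds by target field, store only new items, and return how many were stored."""
    if len(rounds) < 2:
        raise ValueError("M must be at least 2")
    conn.execute("CREATE TABLE IF NOT EXISTS items (digest TEXT PRIMARY KEY, body TEXT NOT NULL)")
    stored = 0
    for round_items in rounds:
        for body in round_items:
            # Skip items that contain none of the target fields.
            if not any(f in body for f in target_fields):
                continue
            # Treat "coincident" as an exact duplicate, detected by content digest.
            digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
            seen = conn.execute("SELECT 1 FROM items WHERE digest = ?", (digest,)).fetchone()
            if seen is None:
                conn.execute("INSERT INTO items (digest, body) VALUES (?, ?)", (digest, body))
                stored += 1
    conn.commit()
    return stored
```

Because the check runs against the database rather than an in-memory set, items already stored in an earlier capture window are also recognised as duplicates.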

Optionally, the monitoring terminal includes a Scapy framework.

The invention has been described above by way of example. It should be understood that the invention is not limited to the precise form disclosed: any insubstantial modification of the inventive concept and technical solutions, or any direct application of them to other scenarios without modification, falls within the protection scope of the invention. The protection scope of the invention shall be defined by the claims.
