Data cleaning method and device based on time series data self-increment characteristics

Document No.: 1671220 Publication date: 2019-12-31

Abstract: This technology, a data cleaning method and device based on time series data self-increment characteristics, was designed by 王典, 吕慧华 and 金丽娟 on 2019-09-23. An embodiment of the invention provides a data cleaning method and device based on time series data self-increment characteristics. The method comprises: for original mileage data in one-to-one correspondence with original mileage times, judging one by one, in time-series order, whether each original mileage data is abnormal jump data and, if so, updating the abnormal jump data; and, after all the original mileage data have been judged and updated, calculating the relative mileage within the preset time range or the relative mileage between any two original mileage time points. The embodiment eliminates, one by one in time-series order, the abnormal mileage data generated by the vehicle within the preset time range. This safeguards the data cleaning effect and reduces the probability of false detection and missed detection of abnormal mileage data, so that cleaner mileage data are obtained and the relative mileage within the preset time range, or between any two original mileage time points, can be calculated accurately from the cleaned mileage data.

1. A data cleaning method based on time series data auto-increment features is characterized by comprising the following steps:

acquiring original mileage data which are generated by a vehicle within a preset time range and correspond to original mileage time one by one;

and judging whether each original mileage data is abnormal jump data one by one according to the original mileage data corresponding to the original mileage time one by one, if so, updating the abnormal jump data, and after judging and updating all the original mileage data, calculating the relative mileage within the preset time range or the relative mileage between any two original mileage time points.

2. The data cleaning method based on the time-series data self-increment feature of claim 1, wherein the step of judging whether each original mileage data is abnormal jump data one by one according to the original mileage data corresponding to the original mileage time one by one, if so, updating the abnormal jump data, and after all the original mileage data are judged and updated, calculating the relative mileage within the preset time range or the relative mileage between any two original mileage time points specifically comprises the steps of:

S1, initializing the original mileage times T0, T1, …, Tn-2, Tn-1 and the original mileage data V0, V1, …, Vn-2, Vn-1 corresponding one-to-one to those times; wherein Ti < Ti+1, the minimum unit of the original mileage time is 1 s, and the minimum unit of the original mileage data is 0.01 km;

S2, defining the cumulative mileage increment inc = 0.00 km, preValue = V0, and preTime = T0;

S3, sequentially traversing the original mileage data V0, V1, …, Vn-2, Vn-1 and the original mileage times T0, T1, …, Tn-2, Tn-1 for i = 1, …, n-1;

acquiring the current i-th element, reading the original mileage data as Vi and the original mileage time as Ti;

calculating div = Vi - preValue; if div > G1 km or div < G2 km, updating inc += div; if G3 km < div < G1 km, further calculating speed = div / (Ti - preTime), and if speed > a0 km/s, judging that the original mileage data Vi has jumped abnormally and updating the cumulative mileage increment inc += div; wherein a0 km/s is the maximum vehicle speed coefficient per second, G1 is the critical threshold for judging that the mileage data has jumped severely, G2 is the critical threshold for judging that the mileage data has declined, [G3, G1] is the constraint range for judging a suspicious jump of the mileage data, G1 and G3 are both greater than zero, G2 is less than zero, and G1 > G3 > G2;

S4, updating preValue = Vi and preTime = Ti;

S5, if inc != 0.00 km, updating Vi = Vi - inc;

S6, if i < n-1, returning to step S3;

S7, calculating the relative mileage within the preset time range from T0 to Tn-1 as Vn-1 - V0, or calculating the relative mileage between any two original mileage time points Tk and Tj as Vj - Vk.

3. A data cleaning method based on time series data auto-increment features is characterized by comprising the following steps:

acquiring original oil consumption data which are generated by a vehicle within a preset time range and correspond to original oil consumption time one by one;

judging whether each original oil consumption data is abnormal jumping data one by one according to original oil consumption data corresponding to the original oil consumption time one by one, and if so, updating the abnormal jumping data; and after all the original oil consumption data are judged and updated, calculating the relative oil consumption within the preset time range or the relative oil consumption between any two original oil consumption time points.

4. The data cleaning method based on the time series data self-increment characteristic of claim 3, wherein the method comprises the steps of judging whether each original oil consumption data is abnormal jump data one by one according to the original oil consumption data corresponding to the original oil consumption time one by one, if so, updating the abnormal jump data, and after all the original oil consumption data are judged and updated, calculating the relative oil consumption in the preset time range or the relative oil consumption between any two original oil consumption time points, and specifically comprises the following steps:

S1, initializing the original oil consumption times T0, T1, …, Tn-2, Tn-1 and the original oil consumption data V0, V1, …, Vn-2, Vn-1 corresponding one-to-one to those times; wherein Ti < Ti+1, the minimum unit of the original oil consumption time is 1 s, and the minimum unit of the original oil consumption data is 0.5 L;

S2, defining the cumulative fuel consumption increment inc = 0.0 L, preValue = V0, and preTime = T0;

S3, sequentially traversing the original oil consumption data V0, V1, …, Vn-2, Vn-1 and the original oil consumption times T0, T1, …, Tn-2, Tn-1 for i = 1, …, n-1;

acquiring the current i-th element, reading the original oil consumption data as Vi and the original oil consumption time as Ti;

calculating div = Vi - preValue; if div > H1 or div < H2, updating inc += div; if H3 < div < H1, further calculating limit = b0 L/s × (Ti - preTime), and if div > limit, judging that the original oil consumption data Vi has jumped abnormally and updating the cumulative fuel consumption increment inc += div; wherein b0 L/s is the maximum oil consumption coefficient per second, H1 is the critical threshold for judging that the fuel consumption data has jumped severely, H2 is the critical threshold for judging that the fuel consumption data has declined, [H3, H1] is the constraint range for judging a suspicious jump of the fuel consumption data, H1 and H3 are both greater than zero, H2 is less than zero, and H1 > H3 > H2;

S4, updating preValue = Vi and preTime = Ti;

S5, if inc != 0.0 L, updating Vi = Vi - inc;

S6, if i < n-1, returning to step S3;

S7, calculating the relative oil consumption within the preset time range from T0 to Tn-1 as Vn-1 - V0, or calculating the relative oil consumption between any two original oil consumption time points Tk and Tj as Vj - Vk.

5. A data cleaning device based on time series data auto-increment features is characterized by comprising:

the first acquisition module is used for acquiring original mileage data which are generated by a vehicle within a preset time range and correspond to original mileage time one by one;

and the first data cleaning module is used for judging whether each original mileage data is abnormal jump data one by one according to the original mileage data corresponding to the original mileage time one by one, if so, updating the abnormal jump data, and after all the original mileage data are judged and updated, calculating the relative mileage within the preset time range or the relative mileage between any two original mileage time points.

6. The data cleaning device based on the time-series data self-increment feature of claim 5, wherein the first data cleaning module is specifically configured to execute the following processing procedures:

S1, initializing the original mileage times T0, T1, …, Tn-2, Tn-1 and the original mileage data V0, V1, …, Vn-2, Vn-1 corresponding one-to-one to those times; wherein Ti < Ti+1, the minimum unit of the original mileage time is 1 s, and the minimum unit of the original mileage data is 0.01 km;

S2, defining the cumulative mileage increment inc = 0.00 km, preValue = V0, and preTime = T0;

S3, sequentially traversing the original mileage data V0, V1, …, Vn-2, Vn-1 and the original mileage times T0, T1, …, Tn-2, Tn-1 for i = 1, …, n-1;

acquiring the current i-th element, reading the original mileage data as Vi and the original mileage time as Ti;

calculating div = Vi - preValue; if div > G1 km or div < G2 km, updating inc += div; if G3 km < div < G1 km, further calculating speed = div / (Ti - preTime), and if speed > a0 km/s, judging that the original mileage data Vi has jumped abnormally and updating the cumulative mileage increment inc += div; wherein a0 km/s is the maximum vehicle speed coefficient per second, G1 is the critical threshold for judging that the mileage data has jumped severely, G2 is the critical threshold for judging that the mileage data has declined, [G3, G1] is the constraint range for judging a suspicious jump of the mileage data, G1 and G3 are both greater than zero, G2 is less than zero, and G1 > G3 > G2;

S4, updating preValue = Vi and preTime = Ti;

S5, if inc != 0.00 km, updating Vi = Vi - inc;

S6, if i < n-1, returning to step S3;

S7, calculating the relative mileage within the preset time range from T0 to Tn-1 as Vn-1 - V0, or calculating the relative mileage between any two original mileage time points Tk and Tj as Vj - Vk.

7. A data cleaning device based on time series data auto-increment features is characterized by comprising:

the second acquisition module is used for acquiring original oil consumption data which are generated by the vehicle within a preset time range and correspond to the original oil consumption time one by one;

the second data cleaning module is used for judging whether each original oil consumption data is abnormal jumping data one by one according to the original oil consumption data corresponding to the original oil consumption time one by one, and if yes, updating the abnormal jumping data; and after all the original oil consumption data are judged and updated, calculating the relative oil consumption within the preset time range or the relative oil consumption between any two original oil consumption time points.

8. The data cleansing apparatus according to claim 7, wherein the second data cleansing module is specifically configured to perform the following processing procedures:

S1, initializing the original oil consumption times T0, T1, …, Tn-2, Tn-1 and the original oil consumption data V0, V1, …, Vn-2, Vn-1 corresponding one-to-one to those times; wherein Ti < Ti+1, the minimum unit of the original oil consumption time is 1 s, and the minimum unit of the original oil consumption data is 0.5 L;

S2, defining the cumulative fuel consumption increment inc = 0.0 L, preValue = V0, and preTime = T0;

S3, sequentially traversing the original oil consumption data V0, V1, …, Vn-2, Vn-1 and the original oil consumption times T0, T1, …, Tn-2, Tn-1 for i = 1, …, n-1;

acquiring the current i-th element, reading the original oil consumption data as Vi and the original oil consumption time as Ti;

calculating div = Vi - preValue; if div > H1 or div < H2, updating inc += div; if H3 < div < H1, further calculating limit = b0 L/s × (Ti - preTime), and if div > limit, judging that the original oil consumption data Vi has jumped abnormally and updating the cumulative fuel consumption increment inc += div; wherein b0 L/s is the maximum oil consumption coefficient per second, H1 is the critical threshold for judging that the fuel consumption data has jumped severely, H2 is the critical threshold for judging that the fuel consumption data has declined, [H3, H1] is the constraint range for judging a suspicious jump of the fuel consumption data, H1 and H3 are both greater than zero, H2 is less than zero, and H1 > H3 > H2;

S4, updating preValue = Vi and preTime = Ti;

S5, if inc != 0.0 L, updating Vi = Vi - inc;

S6, if i < n-1, returning to step S3;

S7, calculating the relative oil consumption within the preset time range from T0 to Tn-1 as Vn-1 - V0, or calculating the relative oil consumption between any two original oil consumption time points Tk and Tj as Vj - Vk.

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the data cleansing method based on the time-series data auto-increment feature according to claim 1 or 2 when executing the computer program; or, the processor, when executing the computer program, implements the steps of the data cleansing method based on the time-series data auto-increment feature according to claim 3 or 4.

10. A non-transitory computer readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the data cleaning method based on the time-series data self-increment feature according to claim 1 or 2; or, the computer program, when executed by a processor, implements the steps of the data cleaning method based on the time-series data self-increment feature according to claim 3 or 4.

Technical Field

The invention relates to the technical field of vehicle transportation, in particular to a data cleaning method and device based on time series data self-increment characteristics.

Background

When analyzing a vehicle, mileage and fuel consumption data must frequently be analyzed and fuel and gas consumption reports compiled, which requires processing pulse mileage and ECU fuel consumption data. Because many unexpected factors and emergencies arise while a vehicle is operating, abnormal values of various kinds occur easily. To obtain clean and accurate vehicle driving data, the data must therefore be cleaned to remove the abnormal values.

For the pulse mileage data in vehicle driving data, most terminal devices currently report GPS positioning information and pulse mileage data at the same time. The GPS mileage can therefore be calculated from the GPS positioning coordinates with a spherical distance formula and used in place of the pulse mileage over the same time interval. This introduces several problems. First, GPS positioning information is not continuous, because of the reporting and acquisition time intervals. The distance calculated between two adjacent points is thus the shortest distance between those points on the sphere, so the accumulated GPS distance is smaller than the length of the actual driving track; this error is typically around 5%. Second, GPS positioning itself suffers from positioning point drift and accuracy errors, and in certain militarily sensitive areas, or because of weather or geographic factors, the GPS module may fail to position accurately or at all. The GPS mileage can therefore serve only as an auxiliary reference value and cannot fully replace the pulse mileage in practical use.
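The GPS mileage calculation described above can be sketched as follows. The text only says "a spherical distance formula"; the haversine formula used here is one common choice and is an assumption, as are the function names, the Earth radius constant, and the sample coordinates. The sketch also illustrates why sparse sampling underestimates the driven distance: dropping an intermediate point replaces two track segments with one shorter great-circle arc.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, a common approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against tiny floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def gps_mileage_km(track: List[Tuple[float, float]]) -> float:
    """Sum the great-circle distances between consecutive (lat, lon) points."""
    return sum(haversine_km(*p, *q) for p, q in zip(track, track[1:]))


# Hypothetical track with a corner: east along the equator, then north.
dense_track = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
# The same trip if the terminal had reported only the first and last point.
sparse_track = [dense_track[0], dense_track[-1]]

dense_km = gps_mileage_km(dense_track)
sparse_km = gps_mileage_km(sparse_track)
```

Here `sparse_km` comes out smaller than `dense_km`, which is the underestimation effect the paragraph attributes to reporting intervals.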

For the fuel consumption data in vehicle driving data, some terminal devices calculate fuel consumption by monitoring changes in the oil level height reported by an oil level sensor. This approach has low precision. Moreover, while the vehicle is moving, external factors such as acceleration, the electromagnetic environment, angular velocity, temperature and gradient changes cause the liquid level to fluctuate, which makes accurate oil level acquisition harder. Because the reported oil level data are inaccurate, fuel consumption calculated from the drop in oil level can serve only as an auxiliary reference value and cannot fully replace the ECU fuel consumption in practical use.

Disclosure of Invention

Aiming at the problems in the prior art, the embodiment of the invention provides a data cleaning method and device based on time series data self-increment characteristics.

In a first aspect, an embodiment of the present invention provides a data cleansing method based on a time series data self-increment feature, including:

acquiring original mileage data which are generated by a vehicle within a preset time range and correspond to original mileage time one by one;

and judging whether each original mileage data is abnormal jump data one by one according to the original mileage data corresponding to the original mileage time one by one, if so, updating the abnormal jump data, and after judging and updating all the original mileage data, calculating the relative mileage within the preset time range or the relative mileage between any two original mileage time points.

Further, the step of judging whether each original mileage data is abnormal jump data one by one according to the original mileage data corresponding to the original mileage time one by one, if so, updating the abnormal jump data, and after judging and updating all the original mileage data, calculating the relative mileage within the preset time range or the relative mileage between any two original mileage time points specifically includes:

S1, initializing the original mileage times T0, T1, …, Tn-2, Tn-1 and the original mileage data V0, V1, …, Vn-2, Vn-1 corresponding one-to-one to those times; wherein Ti < Ti+1, the minimum unit of the original mileage time is 1 s, and the minimum unit of the original mileage data is 0.01 km;

S2, defining the cumulative mileage increment inc = 0.00 km, preValue = V0, and preTime = T0;

S3, sequentially traversing the original mileage data V0, V1, …, Vn-2, Vn-1 and the original mileage times T0, T1, …, Tn-2, Tn-1 for i = 1, …, n-1;

acquiring the current i-th element, reading the original mileage data as Vi and the original mileage time as Ti;

calculating div = Vi - preValue; if div > G1 km or div < G2 km, updating inc += div; if G3 km < div < G1 km, further calculating speed = div / (Ti - preTime), and if speed > a0 km/s, judging that the original mileage data Vi has jumped abnormally and updating the cumulative mileage increment inc += div; wherein a0 km/s is the maximum vehicle speed coefficient per second, G1 is the critical threshold for judging that the mileage data has jumped severely, G2 is the critical threshold for judging that the mileage data has declined, [G3, G1] is the constraint range for judging a suspicious jump of the mileage data, G1 and G3 are both greater than zero, G2 is less than zero, and G1 > G3 > G2;

S4, updating preValue = Vi and preTime = Ti;

S5, if inc != 0.00 km, updating Vi = Vi - inc;

S6, if i < n-1, returning to step S3;

S7, calculating the relative mileage within the preset time range from T0 to Tn-1 as Vn-1 - V0, or calculating the relative mileage between any two original mileage time points Tk and Tj as Vj - Vk.
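Steps S1 to S7 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `clean_mileage` and the default values chosen for G1, G2, G3 and a0 are hypothetical (the text gives only their signs and ordering), and the sketch returns a new list instead of updating Vi in place. Differences are taken between raw readings, as in S4, and the accumulated offset inc is subtracted from each later reading, as in S5.

```python
from typing import List, Sequence


def clean_mileage(
    times: Sequence[int],      # T0..Tn-1 in seconds, strictly increasing
    values: Sequence[float],   # V0..Vn-1 in km
    g1: float = 5.0,           # assumed severe-jump threshold G1 (km)
    g2: float = -0.005,        # assumed decline threshold G2 (km), below zero
    g3: float = 0.5,           # assumed lower bound G3 of the suspicious range (km)
    a0: float = 0.05,          # assumed maximum speed a0 (km/s), i.e. 180 km/h
) -> List[float]:
    """Return cleaned readings with abnormal jumps removed (sketch of S1-S7)."""
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    if not values:
        return []
    cleaned = [values[0]]
    inc = 0.0                                   # S2: cumulative increment
    pre_value, pre_time = values[0], times[0]
    for t, v in zip(times[1:], values[1:]):     # S3: i = 1 .. n-1
        div = v - pre_value
        if div > g1 or div < g2:
            inc += div                          # severe jump or decline
        elif g3 < div < g1 and div / (t - pre_time) > a0:
            inc += div                          # suspicious jump, too fast
        pre_value, pre_time = v, t              # S4: remember the raw reading
        cleaned.append(round(v - inc, 2))       # S5: remove accumulated offset
    return cleaned


# Hypothetical readings: the odometer jumps by about 1000 km at the third sample.
cleaned = clean_mileage([0, 10, 20, 30, 40],
                        [100.0, 100.2, 1100.3, 1100.5, 1100.8])
relative_km = round(cleaned[-1] - cleaned[0], 2)   # S7: Vn-1 - V0
```

Because the whole of div is added to inc when a jump is detected, any genuine distance driven during that one interval is discarded along with the jump; that follows the steps as written.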

In a second aspect, an embodiment of the present invention further provides a data cleansing method based on a time series data auto-increment feature, including:

acquiring original oil consumption data which are generated by a vehicle within a preset time range and correspond to original oil consumption time one by one;

judging whether each original oil consumption data is abnormal jumping data one by one according to original oil consumption data corresponding to the original oil consumption time one by one, and if so, updating the abnormal jumping data; and after all the original oil consumption data are judged and updated, calculating the relative oil consumption within the preset time range or the relative oil consumption between any two original oil consumption time points.

Further, the step of judging whether each original oil consumption data is abnormal jump data one by one according to the original oil consumption data corresponding to the original oil consumption time one by one, if so, updating the abnormal jump data, and after all the original oil consumption data are judged and updated, calculating the relative oil consumption within the preset time range or the relative oil consumption between any two original oil consumption time points specifically includes:

S1, initializing the original oil consumption times T0, T1, …, Tn-2, Tn-1 and the original oil consumption data V0, V1, …, Vn-2, Vn-1 corresponding one-to-one to those times; wherein Ti < Ti+1, the minimum unit of the original oil consumption time is 1 s, and the minimum unit of the original oil consumption data is 0.5 L;

S2, defining the cumulative fuel consumption increment inc = 0.0 L, preValue = V0, and preTime = T0;

S3, sequentially traversing the original oil consumption data V0, V1, …, Vn-2, Vn-1 and the original oil consumption times T0, T1, …, Tn-2, Tn-1 for i = 1, …, n-1;

acquiring the current i-th element, reading the original oil consumption data as Vi and the original oil consumption time as Ti;

calculating div = Vi - preValue; if div > H1 or div < H2, updating inc += div; if H3 < div < H1, further calculating limit = b0 L/s × (Ti - preTime), and if div > limit, judging that the original oil consumption data Vi has jumped abnormally and updating the cumulative fuel consumption increment inc += div; wherein b0 L/s is the maximum oil consumption coefficient per second, H1 is the critical threshold for judging that the fuel consumption data has jumped severely, H2 is the critical threshold for judging that the fuel consumption data has declined, [H3, H1] is the constraint range for judging a suspicious jump of the fuel consumption data, H1 and H3 are both greater than zero, H2 is less than zero, and H1 > H3 > H2;

S4, updating preValue = Vi and preTime = Ti;

S5, if inc != 0.0 L, updating Vi = Vi - inc;

S6, if i < n-1, returning to step S3;

S7, calculating the relative oil consumption within the preset time range from T0 to Tn-1 as Vn-1 - V0, or calculating the relative oil consumption between any two original oil consumption time points Tk and Tj as Vj - Vk.
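The fuel consumption variant differs from the mileage variant mainly in the suspicious-range test, which compares div with a time-scaled limit rather than computing a speed. A minimal sketch follows; the function name `clean_fuel` and the default values for H1, H2, H3 and b0 are assumptions chosen only for illustration, since the text specifies their signs and ordering but no numbers.

```python
from typing import List, Sequence


def clean_fuel(
    times: Sequence[int],      # T0..Tn-1 in seconds, strictly increasing
    values: Sequence[float],   # V0..Vn-1 in litres (0.5 L resolution)
    h1: float = 20.0,          # assumed severe-jump threshold H1 (L)
    h2: float = -0.25,         # assumed decline threshold H2 (L), below zero
    h3: float = 1.0,           # assumed lower bound H3 of the suspicious range (L)
    b0: float = 0.02,          # assumed maximum consumption b0 (L/s)
) -> List[float]:
    """Return cleaned cumulative fuel readings (sketch of S1-S7)."""
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    if not values:
        return []
    cleaned = [values[0]]
    inc = 0.0                                   # S2: cumulative increment
    pre_value, pre_time = values[0], times[0]
    for t, v in zip(times[1:], values[1:]):     # S3: i = 1 .. n-1
        div = v - pre_value
        if div > h1 or div < h2:
            inc += div                          # severe jump or decline
        elif h3 < div < h1:
            limit = b0 * (t - pre_time)         # most fuel usable in the gap
            if div > limit:
                inc += div                      # suspicious jump confirmed
        pre_value, pre_time = v, t              # S4: remember the raw reading
        cleaned.append(round(v - inc, 1))       # S5: remove accumulated offset
    return cleaned


# Hypothetical readings: the counter jumps by 300 L at the third sample.
fuel = clean_fuel([0, 60, 120, 180], [200.0, 200.5, 500.5, 501.0])
relative_l = fuel[-1] - fuel[0]                 # S7: Vn-1 - V0
```

Comparing div with `limit` avoids a division, so the test stays well defined even if a caller passes two readings with the same timestamp, although the steps themselves require Ti < Ti+1.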

In a third aspect, an embodiment of the present invention further provides a data cleansing apparatus based on a time-series data auto-increment characteristic, including:

the first acquisition module is used for acquiring original mileage data which are generated by a vehicle within a preset time range and correspond to original mileage time one by one;

and the first data cleaning module is used for judging whether each original mileage data is abnormal jump data one by one according to the original mileage data corresponding to the original mileage time one by one, if so, updating the abnormal jump data, and after all the original mileage data are judged and updated, calculating the relative mileage within the preset time range or the relative mileage between any two original mileage time points.

Further, the first data cleansing module is specifically configured to execute the following processing procedures:

S1, initializing the original mileage times T0, T1, …, Tn-2, Tn-1 and the original mileage data V0, V1, …, Vn-2, Vn-1 corresponding one-to-one to those times; wherein Ti < Ti+1, the minimum unit of the original mileage time is 1 s, and the minimum unit of the original mileage data is 0.01 km;

S2, defining the cumulative mileage increment inc = 0.00 km, preValue = V0, and preTime = T0;

S3, sequentially traversing the original mileage data V0, V1, …, Vn-2, Vn-1 and the original mileage times T0, T1, …, Tn-2, Tn-1 for i = 1, …, n-1;

acquiring the current i-th element, reading the original mileage data as Vi and the original mileage time as Ti;

calculating div = Vi - preValue; if div > G1 km or div < G2 km, updating inc += div; if G3 km < div < G1 km, further calculating speed = div / (Ti - preTime), and if speed > a0 km/s, judging that the original mileage data Vi has jumped abnormally and updating the cumulative mileage increment inc += div; wherein a0 km/s is the maximum vehicle speed coefficient per second, G1 is the critical threshold for judging that the mileage data has jumped severely, G2 is the critical threshold for judging that the mileage data has declined, [G3, G1] is the constraint range for judging a suspicious jump of the mileage data, G1 and G3 are both greater than zero, G2 is less than zero, and G1 > G3 > G2;

S4, updating preValue = Vi and preTime = Ti;

S5, if inc != 0.00 km, updating Vi = Vi - inc;

S6, if i < n-1, returning to step S3;

S7, calculating the relative mileage within the preset time range from T0 to Tn-1 as Vn-1 - V0, or calculating the relative mileage between any two original mileage time points Tk and Tj as Vj - Vk.
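One way to picture the split between the first acquisition module and the first data cleaning module is as two small classes. The sketch below is only an illustration of that division of responsibility: the class and method names, the in-memory record source, and the threshold defaults are all hypothetical and do not come from the text.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

Record = Tuple[int, float]  # (time in seconds, mileage reading in km)


class FirstAcquisitionModule:
    """Hypothetical acquisition module: selects records in a preset time range."""

    def __init__(self, source: Iterable[Record]) -> None:
        self._source = list(source)

    def acquire(self, start: int, end: int) -> List[Record]:
        # Keep records inside [start, end], ordered by time so that Ti < Ti+1
        # (timestamps are assumed to be unique).
        return sorted(r for r in self._source if start <= r[0] <= end)


@dataclass
class FirstDataCleaningModule:
    """Hypothetical cleaning module applying steps S1-S7 to acquired records."""

    g1: float = 5.0      # assumed G1 (km)
    g2: float = -0.005   # assumed G2 (km)
    g3: float = 0.5      # assumed G3 (km)
    a0: float = 0.05     # assumed a0 (km/s)

    def clean(self, records: List[Record]) -> List[Record]:
        if not records:
            return []
        cleaned = [records[0]]
        inc = 0.0
        pre_time, pre_value = records[0]
        for t, v in records[1:]:
            div = v - pre_value
            if div > self.g1 or div < self.g2:
                inc += div
            elif self.g3 < div < self.g1 and div / (t - pre_time) > self.a0:
                inc += div
            pre_time, pre_value = t, v
            cleaned.append((t, round(v - inc, 2)))
        return cleaned

    def relative_mileage(self, records: List[Record]) -> float:
        cleaned = self.clean(records)
        if not cleaned:
            return 0.0
        return round(cleaned[-1][1] - cleaned[0][1], 2)


# Hypothetical unordered reports; the record at t=99 lies outside the range.
reports = [(30, 1100.5), (0, 100.0), (10, 100.2), (20, 1100.3), (99, 5.0)]
acquired = FirstAcquisitionModule(reports).acquire(0, 30)
relative_km = FirstDataCleaningModule().relative_mileage(acquired)
```

Keeping acquisition separate from cleaning means the same cleaning logic could be reused with a different data source, which matches the way the device is described as two cooperating modules.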

In a fourth aspect, an embodiment of the present invention further provides a data cleansing apparatus based on a time series data auto-increment feature, including:

the second acquisition module is used for acquiring original oil consumption data which are generated by the vehicle within a preset time range and correspond to the original oil consumption time one by one;

the second data cleaning module is used for judging whether each original oil consumption data is abnormal jumping data one by one according to the original oil consumption data corresponding to the original oil consumption time one by one, and if yes, updating the abnormal jumping data; and after all the original oil consumption data are judged and updated, calculating the relative oil consumption within the preset time range or the relative oil consumption between any two original oil consumption time points.

Further, the second data cleansing module is specifically configured to execute the following processing procedures:

S1, initializing the original oil consumption times T0, T1, …, Tn-2, Tn-1 and the original oil consumption data V0, V1, …, Vn-2, Vn-1 corresponding one-to-one to those times; wherein Ti < Ti+1, the minimum unit of the original oil consumption time is 1 s, and the minimum unit of the original oil consumption data is 0.5 L;

S2, defining the cumulative fuel consumption increment inc = 0.0 L, preValue = V0, and preTime = T0;

S3, sequentially traversing the original oil consumption data V0, V1, …, Vn-2, Vn-1 and the original oil consumption times T0, T1, …, Tn-2, Tn-1 for i = 1, …, n-1;

acquiring the current i-th element, reading the original oil consumption data as Vi and the original oil consumption time as Ti;

calculating div = Vi - preValue; if div > H1 or div < H2, updating inc += div; if H3 < div < H1, further calculating limit = b0 L/s × (Ti - preTime), and if div > limit, judging that the original oil consumption data Vi has jumped abnormally and updating the cumulative fuel consumption increment inc += div; wherein b0 L/s is the maximum oil consumption coefficient per second, H1 is the critical threshold for judging that the fuel consumption data has jumped severely, H2 is the critical threshold for judging that the fuel consumption data has declined, [H3, H1] is the constraint range for judging a suspicious jump of the fuel consumption data, H1 and H3 are both greater than zero, H2 is less than zero, and H1 > H3 > H2;

S4, updating preValue = Vi and preTime = Ti;

S5, if inc != 0.0 L, updating Vi = Vi - inc;

S6, if i < n-1, returning to step S3;

S7, calculating the relative oil consumption within the preset time range from T0 to Tn-1 as Vn-1 - V0, or calculating the relative oil consumption between any two original oil consumption time points Tk and Tj as Vj - Vk.

In a fifth aspect, an embodiment of the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the computer program, implements the steps of the data cleansing method based on the time-series data auto-increment feature according to the first aspect; or, the processor, when executing the computer program, implements the steps of the data cleansing method based on the time-series data auto-increment feature according to the second aspect.

In a sixth aspect, the present invention further provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the data cleansing method based on the time-series data auto-increment feature according to the first aspect; or the computer program is executed by a processor to implement the steps of the data cleansing method based on the time-series data auto-increment feature according to the second aspect.

According to the above technical scheme, the data cleaning method and device based on the time series data self-increment characteristic provided by the embodiment of the invention first obtain the original mileage data that are generated by a vehicle within a preset time range and correspond one-to-one to the original mileage times. They then judge one by one, in time-series order, whether each original mileage data is abnormal jump data and, if so, update the abnormal jump data. After all the original mileage data have been judged and updated, they calculate the relative mileage within the preset time range or the relative mileage between any two original mileage time points. The method and device can therefore eliminate, one by one in time-series order, the abnormal mileage data generated by the vehicle within the preset time range. This safeguards the data cleaning effect and reduces the probability of false detection and missed detection of abnormal mileage data, so that cleaner mileage data are obtained and the relative mileage within the preset time range, or between any two original mileage time points, can be calculated accurately from the cleaned mileage data.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

FIG. 1 is a flowchart of a data cleansing method based on a time-series data auto-increment feature according to an embodiment of the present invention;

FIG. 2 is a flow chart of another method for cleansing data based on a time-series data auto-increment feature according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a data cleansing apparatus based on a time-series data auto-increment feature according to another embodiment of the present invention;

FIG. 4 is a schematic structural diagram of another data cleansing apparatus based on a time-series data auto-increment feature according to another embodiment of the present invention;

FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The pulse mileage and ECU fuel consumption data reported by a vehicle while driving have a distinctive characteristic: the values stay constant or increase by a small amount over time. It follows that: relative pulse mileage within a time interval = pulse mileage at the end of the interval - pulse mileage at the start of the interval; and relative ECU fuel consumption within a time interval = ECU fuel consumption at the end of the interval - ECU fuel consumption at the start of the interval. In an actual production environment, however, the mileage and fuel consumption data reported by the terminal contain various abnormal values: the values at two temporally adjacent points may rise or fall by a large amount, or fall by a small amount (many unexpected factors and situations arise while a vehicle is running, and these readily cause abnormal values). In such cases the relative mileage and relative fuel consumption calculated for the time interval may be negative or larger than the true value. Identifying abnormal values in the reported mileage and fuel consumption data and cleansing the vehicle driving data is therefore clearly important. To eliminate abnormal mileage data points and abnormal fuel consumption data points generated while a vehicle is driving, the present invention provides a data cleansing method and apparatus based on the time-series data auto-increment feature, which are explained in detail below through specific embodiments.
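As a small illustration of why the end-minus-start calculation needs cleansed input, the Python sketch below applies it to three made-up mileage series. The function name `naive_relative` and all readings are hypothetical and are not taken from the source.

```python
from typing import Sequence


def naive_relative(values: Sequence[float]) -> float:
    """Relative amount over an interval: end reading minus start reading."""
    return values[-1] - values[0]


normal = [1000.0, 1000.5, 1001.0]    # slowly increasing, as expected
dropped = [1000.0, 1000.5, 3.0]      # hypothetical reading that falls back
spiked = [1000.0, 1000.5, 1600.0]    # hypothetical reading that jumps ahead

print(naive_relative(normal))    # 1.0
print(naive_relative(dropped))   # -997.0 (negative relative mileage)
print(naive_relative(spiked))    # 600.0 (far larger than the distance driven)
```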

Fig. 1 shows a flowchart of a data cleansing method based on a time-series data auto-increment feature according to an embodiment of the present invention. As shown in fig. 1, a data cleansing method based on a time-series data auto-increment feature provided by an embodiment of the present invention includes the following steps:

Step 101: acquire original mileage data that is generated by the vehicle within a preset time range and corresponds one-to-one to original mileage times.

In this embodiment, the preset time range may be a week, a day, or a certain time period of a day.

Step 102: based on the original mileage data corresponding one-to-one to the original mileage times, judge one by one whether each item of original mileage data is abnormal jump data; if so, update the abnormal jump data; after all original mileage data have been judged and updated, calculate the relative mileage within the preset time range or the relative mileage between any two original mileage time points.

In this embodiment, data with the time-series data auto-increment feature refers to data whose value stays constant or increases slightly over time, such as pulse mileage data and ECU fuel consumption data. This embodiment uses pulse mileage data as the example for the detailed description.

In this embodiment, the step 102 may be specifically implemented as follows:

S1, initialize the original mileage times T0, T1, …, Tn-2, Tn-1 and the original mileage data V0, V1, …, Vn-2, Vn-1 corresponding one-to-one to them; where Ti < Ti+1, the minimum unit of the original mileage time is 1 s, and the minimum unit of the original mileage data is 0.01 km;

S2, define the cumulative mileage increment inc = 0.00 km, preValue = V0, and preTime = T0;

S3, traverse the original mileage data V0, V1, …, Vn-2, Vn-1 and the original mileage times T0, T1, …, Tn-2, Tn-1 in order, for i from 1 to n-1;

acquire the current i-th element, reading its original mileage data as Vi and its original mileage time as Ti;

calculate div = Vi - preValue; if div > G1 km or div < G2 km, then inc += div; if G3 km < div < G1 km, further calculate speed = div / (Ti - preTime), and if speed > a0 km/s, judge that the original mileage data Vi has jumped abnormally and update the cumulative mileage increment inc += div; where a0 km/s is the maximum vehicle speed coefficient per second, G1 is the critical threshold for judging that the mileage data has jumped severely, G2 is the critical threshold for judging that the mileage data has decreased, [G3, G1] is the constraint range for judging a suspicious jump in the mileage data, G1 and G3 are both greater than zero, G2 is less than zero, and G1 > G3 > G2. For example, a0 may be 0.1 km/s, G1 may be 100 km, G2 may be -0.01 km, and G3 may be 2.0 km. These parameter values are only examples; in actual applications other values may be set as needed, for example the critical threshold G1 for judging a severe jump in the mileage data may be set to 80 km. The present invention does not limit this.

S4, update preValue = Vi and preTime = Ti;

S5, if inc != 0.00 km, update Vi = Vi - inc;

S6, if i < n-1, loop back to step S3;

S7, calculate the relative mileage within the preset time range T0 to Tn-1 as Vn-1 - V0, or calculate the relative mileage between any two original mileage time points Tk and Tj as Vj - Vk.
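Steps S1 to S7 can be sketched in Python roughly as follows. This is a minimal sketch, not the patent's reference implementation: the function names `clean_mileage` and `relative_mileage`, the rounding to 0.01 km, the input validation, and the sample readings are all assumptions added for illustration. The default thresholds are the example values given above.

```python
from typing import List, Sequence


def clean_mileage(
    times: Sequence[int],      # T0..Tn-1 in seconds, strictly increasing
    values: Sequence[float],   # V0..Vn-1 in km
    g1: float = 100.0,         # G1: severe-jump threshold (km)
    g2: float = -0.01,         # G2: decrease threshold (km)
    g3: float = 2.0,           # G3: lower bound of the suspicious range (km)
    a0: float = 0.1,           # a0: maximum speed coefficient (km/s)
) -> List[float]:
    """Return cleaned mileage values; the input sequences are not modified."""
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    cleaned = list(values)
    if not cleaned:
        return cleaned

    inc = 0.0                                   # S2: cumulative abnormal increment
    pre_value, pre_time = values[0], times[0]   # S2

    for i in range(1, len(values)):             # S3 / S6: i = 1 .. n-1
        vi, ti = values[i], times[i]
        if ti <= pre_time:
            raise ValueError("times must be strictly increasing (Ti < Ti+1)")
        div = vi - pre_value
        if div > g1 or div < g2:
            inc += div                          # severe jump or decrease
        elif g3 < div < g1:
            speed = div / (ti - pre_time)       # implied speed in km/s
            if speed > a0:
                inc += div                      # suspicious jump confirmed
        pre_value, pre_time = vi, ti            # S4: keep the raw reading
        if inc != 0.0:
            # S5: shift the reading back by everything judged abnormal so far.
            # Rounding to 0.01 km is an assumption made here to hide float noise.
            cleaned[i] = round(vi - inc, 2)
    return cleaned


def relative_mileage(cleaned: Sequence[float], k: int = 0, j: int = -1) -> float:
    """S7: Vj - Vk on the cleaned series (the defaults give Vn-1 - V0)."""
    return round(cleaned[j] - cleaned[k], 2)


# Hypothetical readings: a +500 km jump at index 2 and a 0.3 km drop at index 4.
times = [0, 10, 20, 30, 40]
raw = [1000.00, 1000.20, 1500.20, 1500.40, 1500.10]
cleaned = clean_mileage(times, raw)
print(cleaned)                    # [1000.0, 1000.2, 1000.2, 1000.4, 1000.4]
print(relative_mileage(cleaned))  # 0.4
```

Note that preValue is updated with the raw reading in S4 before Vi is corrected in S5, so each difference div is always taken between two consecutive raw readings, while inc carries the total offset that must be removed from every later point.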

According to the technical scheme, the data cleaning method based on the time series data self-increment characteristic comprises the steps of firstly obtaining original mileage data which are generated by a vehicle in a preset time range and correspond to original mileage time in a one-to-one mode, then judging whether each original mileage data is abnormal jump data or not one by one on the basis of a time sequence according to the original mileage data which correspond to the original mileage time in the one-to-one mode, if yes, updating the abnormal jump data, and after all the original mileage data are judged and updated, calculating the relative mileage in the preset time range or the relative mileage between any two original mileage time points. Therefore, the data cleaning method based on the time series data self-increment characteristic provided by the embodiment of the invention can eliminate abnormal mileage data generated by the vehicle within the preset time range one by one based on the time series, thereby ensuring the data cleaning effect, reducing the probability of false detection and missed detection of the abnormal mileage data, obtaining cleaner mileage data, and further accurately calculating the relative mileage within the preset time range or the relative mileage between any two original mileage time points according to the cleaned mileage data.

In addition, it should be noted that the data cleansing method based on the time-series data auto-increment feature provided in this embodiment does not conflict with the scheme of calculating GPS mileage from GPS positioning information; the two can be used together to fill each other's gaps, to cross-reference and compare results, and to tune parameters.

Fig. 2 is a flowchart illustrating another data cleansing method based on a time-series data auto-increment feature according to an embodiment of the present invention. As shown in fig. 2, the data cleansing method based on the time-series data auto-increment feature provided by the embodiment of the present invention includes the following steps:

Step 201: acquire original fuel consumption data that is generated by the vehicle within a preset time range and corresponds one-to-one to original fuel consumption times;

Step 202: based on the original fuel consumption data corresponding one-to-one to the original fuel consumption times, judge one by one whether each item of original fuel consumption data is abnormal jump data, and if so, update the abnormal jump data; after all original fuel consumption data have been judged and updated, calculate the relative fuel consumption within the preset time range or the relative fuel consumption between any two original fuel consumption time points.

Based on the content of the foregoing embodiment, in this embodiment, the foregoing step 202 can be specifically implemented by:

S1, initialize the original fuel consumption times T0, T1, …, Tn-2, Tn-1 and the original fuel consumption data V0, V1, …, Vn-2, Vn-1 corresponding one-to-one to them; where Ti < Ti+1, the minimum unit of the original fuel consumption time is 1 s, and the minimum unit of the original fuel consumption data is 0.5 L;

S2, define the cumulative fuel consumption increment inc = 0.0 L, preValue = V0, and preTime = T0;

S3, traverse the original fuel consumption data V0, V1, …, Vn-2, Vn-1 and the original fuel consumption times T0, T1, …, Tn-2, Tn-1 in order, for i from 1 to n-1;

acquire the current i-th element, reading its original fuel consumption data as Vi and its original fuel consumption time as Ti;

calculate div = Vi - preValue; if div > H1 or div < H2, then inc += div; if H3 < div < H1, further calculate limit = b0 L/s × (Ti - preTime), and if div > limit, judge that the original fuel consumption data Vi has jumped abnormally and update the cumulative fuel consumption increment inc += div; where b0 L/s is the maximum fuel consumption coefficient per second, H1 is the critical threshold for judging that the fuel consumption data has jumped severely, H2 is the critical threshold for judging that the fuel consumption data has decreased, [H3, H1] is the constraint range for judging a suspicious jump in the fuel consumption data, H1 and H3 are both greater than zero, H2 is less than zero, and H1 > H3 > H2. For example, b0 may be 0.0417 L/s, H1 may be 50 L, H2 may be -0.5 L, and H3 may be 5.0 L. These parameter values are only examples; in actual applications other values may be set as needed, for example the critical threshold H1 for judging a severe jump in the fuel consumption data may be set to 45 L. The present invention does not limit this.

S4, update preValue = Vi and preTime = Ti;

S5, if inc != 0.0 L, update Vi = Vi - inc;

S6, if i < n-1, loop back to step S3;

S7, calculate the relative fuel consumption within the preset time range T0 to Tn-1 as Vn-1 - V0, or calculate the relative fuel consumption between any two original fuel consumption time points Tk and Tj as Vj - Vk.
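The fuel consumption variant differs from the mileage variant mainly in the suspicious-range test: instead of deriving a speed, it compares div with an allowance limit = b0 × (Ti - preTime). The Python sketch below illustrates this under stated assumptions: the function name `clean_fuel` and the sample readings are hypothetical, and the default thresholds are the example values given above.

```python
from typing import List, Sequence


def clean_fuel(
    times: Sequence[int],      # T0..Tn-1 in seconds, strictly increasing
    values: Sequence[float],   # V0..Vn-1 in litres
    h1: float = 50.0,          # H1: severe-jump threshold (L)
    h2: float = -0.5,          # H2: decrease threshold (L)
    h3: float = 5.0,           # H3: lower bound of the suspicious range (L)
    b0: float = 0.0417,        # b0: maximum fuel consumption coefficient (L/s)
) -> List[float]:
    """Return cleaned cumulative fuel readings; inputs are not modified."""
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    cleaned = list(values)
    if not cleaned:
        return cleaned

    inc = 0.0                                   # S2
    pre_value, pre_time = values[0], times[0]   # S2

    for i in range(1, len(values)):             # S3 / S6
        vi, ti = values[i], times[i]
        div = vi - pre_value
        if div > h1 or div < h2:
            inc += div                          # severe jump or decrease
        elif h3 < div < h1:
            limit = b0 * (ti - pre_time)        # most fuel that could be burned
            if div > limit:
                inc += div                      # suspicious jump confirmed
        pre_value, pre_time = vi, ti            # S4: keep the raw reading
        if inc != 0.0:
            cleaned[i] = vi - inc               # S5
    return cleaned


# Hypothetical readings sampled once a minute; index 2 jumps by 10 L.
times = [0, 60, 120, 180]
raw = [100.0, 100.5, 110.5, 111.0]
cleaned = clean_fuel(times, raw)
print(cleaned)                    # [100.0, 100.5, 100.5, 101.0]
print(cleaned[-1] - cleaned[0])   # S7: 1.0
```

The sample readings are multiples of 0.5 L, the minimum unit stated in S1, so they are represented exactly as floats and no rounding step is needed in this sketch.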

According to the above technical scheme, the data cleansing method based on the time-series data auto-increment feature provided by this embodiment first acquires original fuel consumption data that is generated by a vehicle within a preset time range and corresponds one-to-one to original fuel consumption times, then judges one by one whether each item of original fuel consumption data is abnormal jump data and, if so, updates the abnormal jump data; after all original fuel consumption data have been judged and updated, it calculates the relative fuel consumption within the preset time range or the relative fuel consumption between any two original fuel consumption time points. The method can therefore eliminate, one by one in time-series order, the abnormal fuel consumption data generated by the vehicle within the preset time range, which safeguards the cleansing effect, reduces the probability of false detection and missed detection of abnormal fuel consumption data, yields cleaner fuel consumption data, and allows the relative fuel consumption within the preset time range or between any two original fuel consumption time points to be calculated accurately from the cleansed data.

In this embodiment, it should be noted that the data cleansing method based on the time-series data auto-increment feature does not conflict with the scheme of calculating fuel consumption from fuel level data obtained through a fuel level sensor; the two can be used together to fill each other's gaps, to cross-reference and compare results, and to tune parameters.

As described above, the data cleansing method based on the time-series data auto-increment feature provided by this embodiment can effectively remove abnormal data, so that the resulting pulse mileage and ECU fuel consumption curves are smoother and more continuous, with a pronounced noise reduction effect. In addition, the method is broadly applicable: it can be used for different types of vehicle driving data that share the same characteristic (the value stays constant or increases slightly as time increases), and the idea behind it is simple and easy to understand.

Fig. 3 shows a schematic structural diagram of a data cleansing apparatus based on a time-series data auto-increment feature according to an embodiment of the present invention. As shown in fig. 3, the data cleansing apparatus based on a time-series data auto-increment feature according to an embodiment of the present invention includes: a first acquisition module 11 and a first data cleansing module 12, wherein:

the first acquisition module 11 is used for acquiring original mileage data which are generated by the vehicle within a preset time range and correspond to the original mileage time one by one;

the first data cleaning module 12 is configured to judge whether each original mileage data is abnormal jump data one by one according to original mileage data corresponding to original mileage time one by one, update the abnormal jump data if the original mileage data is abnormal jump data, and calculate a relative mileage within the preset time range or a relative mileage between any two original mileage time points after all the original mileage data are judged and updated.

Further, the first data cleansing module 12 is specifically configured to execute the following processing procedures:

S1, initialize the original mileage times T0, T1, …, Tn-2, Tn-1 and the original mileage data V0, V1, …, Vn-2, Vn-1 corresponding one-to-one to them; where Ti < Ti+1, the minimum unit of the original mileage time is 1 s, and the minimum unit of the original mileage data is 0.01 km;

S2, define the cumulative mileage increment inc = 0.00 km, preValue = V0, and preTime = T0;

S3, traverse the original mileage data V0, V1, …, Vn-2, Vn-1 and the original mileage times T0, T1, …, Tn-2, Tn-1 in order, for i from 1 to n-1;

acquire the current i-th element, reading its original mileage data as Vi and its original mileage time as Ti;

calculate div = Vi - preValue; if div > G1 km or div < G2 km, then inc += div; if G3 km < div < G1 km, further calculate speed = div / (Ti - preTime), and if speed > a0 km/s, judge that the original mileage data Vi has jumped abnormally and update the cumulative mileage increment inc += div; where a0 km/s is the maximum vehicle speed coefficient per second, G1 is the critical threshold for judging that the mileage data has jumped severely, G2 is the critical threshold for judging that the mileage data has decreased, [G3, G1] is the constraint range for judging a suspicious jump in the mileage data, G1 and G3 are both greater than zero, G2 is less than zero, and G1 > G3 > G2;

S4, update preValue = Vi and preTime = Ti;

S5, if inc != 0.00 km, update Vi = Vi - inc;

S6, if i < n-1, loop back to step S3;

S7, calculate the relative mileage within the preset time range T0 to Tn-1 as Vn-1 - V0, or calculate the relative mileage between any two original mileage time points Tk and Tj as Vj - Vk.

Since the data cleansing apparatus based on the time-series data auto-increment feature provided by this embodiment can be used to execute the data cleansing method based on the time-series data auto-increment feature described in the first embodiment, its operating principle and beneficial effects are similar and are not described in detail here; for specifics, refer to the description of the above embodiment.

Fig. 4 shows a schematic structural diagram of another data cleansing apparatus based on a time-series data auto-increment feature according to an embodiment of the present invention. As shown in fig. 4, the data cleansing apparatus based on a time-series data auto-increment feature according to an embodiment of the present invention includes: a second acquisition module 21 and a second data cleansing module 22, wherein:

the second obtaining module 21 is configured to obtain original fuel consumption data, which are generated by the vehicle within a preset time range and correspond to the original fuel consumption time one to one;

the second data cleaning module 22 is configured to judge whether each original oil consumption data is abnormally jumped data one by one according to the original oil consumption data corresponding to the original oil consumption time one by one, and if yes, update the abnormally jumped data; and after all the original oil consumption data are judged and updated, calculating the relative oil consumption within the preset time range or the relative oil consumption between any two original oil consumption time points.

Further, the second data cleansing module 22 is specifically configured to perform the following processing procedures:

S1, initialize the original fuel consumption times T0, T1, …, Tn-2, Tn-1 and the original fuel consumption data V0, V1, …, Vn-2, Vn-1 corresponding one-to-one to them; where Ti < Ti+1, the minimum unit of the original fuel consumption time is 1 s, and the minimum unit of the original fuel consumption data is 0.5 L;

S2, define the cumulative fuel consumption increment inc = 0.0 L, preValue = V0, and preTime = T0;

S3, traverse the original fuel consumption data V0, V1, …, Vn-2, Vn-1 and the original fuel consumption times T0, T1, …, Tn-2, Tn-1 in order, for i from 1 to n-1;

acquire the current i-th element, reading its original fuel consumption data as Vi and its original fuel consumption time as Ti;

calculate div = Vi - preValue; if div > H1 or div < H2, then inc += div; if H3 < div < H1, further calculate limit = b0 L/s × (Ti - preTime), and if div > limit, judge that the original fuel consumption data Vi has jumped abnormally and update the cumulative fuel consumption increment inc += div; where b0 L/s is the maximum fuel consumption coefficient per second, H1 is the critical threshold for judging that the fuel consumption data has jumped severely, H2 is the critical threshold for judging that the fuel consumption data has decreased, [H3, H1] is the constraint range for judging a suspicious jump in the fuel consumption data, H1 and H3 are both greater than zero, H2 is less than zero, and H1 > H3 > H2;

S4, update preValue = Vi and preTime = Ti;

S5, if inc != 0.0 L, update Vi = Vi - inc;

S6, if i < n-1, loop back to step S3;

S7, calculate the relative fuel consumption within the preset time range T0 to Tn-1 as Vn-1 - V0, or calculate the relative fuel consumption between any two original fuel consumption time points Tk and Tj as Vj - Vk.

Since the data cleansing apparatus based on the time-series data auto-increment feature provided by this embodiment can be used to execute the data cleansing method based on the time-series data auto-increment feature described in the second embodiment, its operating principle and beneficial effects are similar and are not described in detail here; for specifics, refer to the description of the above embodiment.
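The two-module structure of the apparatus (an acquisition module feeding a data cleansing module) might be sketched in Python as below. This is only an illustrative sketch: the class names `Thresholds`, `DataCleansingModule`, and `DataCleansingApparatus`, the callable used as the acquisition module, and the sample data are all assumptions. One cleansing class serves both mileage and fuel consumption here, because with Ti < Ti+1 the test speed > a0 is mathematically equivalent to div > a0 × (Ti - preTime), which has the same form as the fuel consumption test div > limit.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Series = Tuple[Sequence[int], Sequence[float]]  # (times in s, raw readings)


@dataclass(frozen=True)
class Thresholds:
    severe: float       # G1 or H1
    drop: float         # G2 or H2
    suspicious: float   # G3 or H3
    max_rate: float     # a0 (km/s) or b0 (L/s)


class DataCleansingModule:
    """Plays the role of the first or second data cleansing module."""

    def __init__(self, thresholds: Thresholds) -> None:
        self.t = thresholds

    def clean(self, times: Sequence[int], values: Sequence[float]) -> List[float]:
        cleaned = list(values)
        inc = 0.0
        for i in range(1, len(cleaned)):
            div = values[i] - values[i - 1]      # difference of raw readings
            dt = times[i] - times[i - 1]         # assumed > 0 (Ti < Ti+1)
            if div > self.t.severe or div < self.t.drop:
                inc += div
            elif self.t.suspicious < div < self.t.severe and div > self.t.max_rate * dt:
                inc += div
            cleaned[i] = values[i] - inc
        return cleaned


class DataCleansingApparatus:
    """Acquisition module (a callable here) plus a data cleansing module."""

    def __init__(self, acquire: Callable[[], Series], cleaner: DataCleansingModule) -> None:
        self.acquire = acquire
        self.cleaner = cleaner

    def relative(self) -> float:
        """Relative amount over the whole preset time range (needs >= 1 point)."""
        times, values = self.acquire()
        cleaned = self.cleaner.clean(times, values)
        return cleaned[-1] - cleaned[0]


# Example thresholds from the description; the series themselves are made up.
mileage = DataCleansingApparatus(
    lambda: ([0, 10, 20, 30], [1000.0, 1000.5, 1500.5, 1501.0]),
    DataCleansingModule(Thresholds(100.0, -0.01, 2.0, 0.1)),
)
fuel = DataCleansingApparatus(
    lambda: ([0, 60, 120, 180], [100.0, 100.5, 110.5, 111.0]),
    DataCleansingModule(Thresholds(50.0, -0.5, 5.0, 0.0417)),
)
print(mileage.relative())  # 1.0
print(fuel.relative())     # 1.0
```

In a real deployment the acquisition callable would read the reported data for the preset time range from wherever the terminal data is stored; that source is outside the scope of this sketch.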

Based on the same inventive concept, another embodiment of the present invention provides an electronic device, which specifically includes the following components, with reference to fig. 5: a processor 301, a memory 302, a communication interface 303, and a communication bus 304;

the processor 301, the memory 302 and the communication interface 303 complete mutual communication through the communication bus 304; the communication interface 303 is used for realizing information transmission between related devices;

the processor 301 is configured to call a computer program in the memory 302, and the processor implements all the steps of the data cleansing method based on the time-series data auto-increment feature when executing the computer program, for example, the processor implements the following processes when executing the computer program: acquiring original mileage data which are generated by a vehicle within a preset time range and correspond to original mileage time one by one; and judging whether each original mileage data is abnormal jump data one by one according to the original mileage data corresponding to the original mileage time one by one, if so, updating the abnormal jump data, and after judging and updating all the original mileage data, calculating the relative mileage within the preset time range or the relative mileage between any two original mileage time points. Or, the processor implements the following processes when executing the computer program: acquiring original oil consumption data which are generated by a vehicle within a preset time range and correspond to original oil consumption time one by one; judging whether each original oil consumption data is abnormal jumping data one by one according to original oil consumption data corresponding to the original oil consumption time one by one, and if so, updating the abnormal jumping data; and after all the original oil consumption data are judged and updated, calculating the relative oil consumption within the preset time range or the relative oil consumption between any two original oil consumption time points.

Based on the same inventive concept, another embodiment of the present invention provides a non-transitory computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements all the steps of the above data cleansing method based on time-series data auto-increment features, for example, the processor implements the following processes when executing the computer program: acquiring original mileage data which are generated by a vehicle within a preset time range and correspond to original mileage time one by one; and judging whether each original mileage data is abnormal jump data one by one according to the original mileage data corresponding to the original mileage time one by one, if so, updating the abnormal jump data, and after judging and updating all the original mileage data, calculating the relative mileage within the preset time range or the relative mileage between any two original mileage time points. Or, the processor implements the following processes when executing the computer program: acquiring original oil consumption data which are generated by a vehicle within a preset time range and correspond to original oil consumption time one by one; judging whether each original oil consumption data is abnormal jumping data one by one according to original oil consumption data corresponding to the original oil consumption time one by one, and if so, updating the abnormal jumping data; and after all the original oil consumption data are judged and updated, calculating the relative oil consumption within the preset time range or the relative oil consumption between any two original oil consumption time points.

In addition, the logic instructions in the memory may be implemented in the form of software functional units and may be stored in a computer readable storage medium when sold or used as a stand-alone product. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. Based on such understanding, the technical solutions mentioned above may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the data cleansing method based on the time-series data auto-increment feature according to various embodiments or some portions of embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
