Redundant data cleaning method for RFID data stream

Document No.: 1846136; Published: 2021-11-16

Note: this technology, "Redundant data cleaning method for RFID data stream" (一种rfid数据流的冗余数据清洗方法), was designed by 鲁建厦, 屠佳苠 and 包秦 on 2021-08-13. Abstract: The invention discloses a method for cleaning redundant data from an RFID data stream, comprising the following steps. S1: when a data item arrives, its epc is hashed k times and mapped to the Timestamp array, then hashed a further k times and mapped to the ReaderID array. S2: the next data item is obtained and tested. S21: if the Timestamp value at a hash position of its epc is 0, the item is non-redundant. S22: otherwise, each difference is computed as the item's readerID minus the ReaderID value at the corresponding epc hash position; if any two of these differences are equal and nonzero, the item is non-redundant. S23: if S22 does not hold and the item's timestamp minus the Timestamp value at its epc hash positions is less than or equal to the time threshold, the item is redundant. S24: non-redundant data is reported and the arrays are updated; redundant data is discarded.

1. A method for cleaning redundant data of an RFID data stream, characterized by comprising the following steps:

S1, when RFID data $x_i$ arrives, $k$ independent hash functions are computed on $x_i.epc$, $k \in \{1,2,3,\dots,n\}$ with $n$ a natural number, namely $\mathrm{Hash}_1(x_i.epc), \dots, \mathrm{Hash}_k(x_i.epc)$, and the results are mapped to a Timestamp array; a second round of $k$ independent hash function computations is performed, namely $\mathrm{Hash}_1[\mathrm{Hash}_1(x_i.epc)], \dots, \mathrm{Hash}_k[\mathrm{Hash}_k(x_i.epc)]$, and the results are mapped to a ReaderID array; the RFID data $x_i$ is a triple of the form $x_i = \langle epc, readerID, timestamp \rangle$, wherein $epc$ is the unique identification number of each tag, $readerID$ is the address of the reader that read the tag, which indicates the location of the tag, $timestamp$ is the time at which the reader read the tag, and $i \in \{1,2,3,\dots,n\}$ with $n$ a natural number;

S2, whether the next data $x_{i+1}$ is redundant is determined by comparing its time $x_{i+1}.timestamp$ and position $x_{i+1}.readerID$ with the values in the two-dimensional array (the Timestamp and ReaderID arrays), comprising the following steps:

S21, if there exists $\mathrm{Timestamp}[\mathrm{Hash}_k(x_{i+1}.epc)] = 0$, $k \in \{1,2,\dots,n\}$, then $x_{i+1}$ is judged to be non-redundant data;

S22, otherwise, for all $k \in \{1,2,\dots,n\}$, let $x_{i+1}.readerID - \mathrm{ReaderID}[\mathrm{Hash}_k(\mathrm{Hash}_k(x_{i+1}.epc))] = \varepsilon_k$; if any two of the differences $\varepsilon_k$ in the ReaderID array are equal and not 0, then $x_{i+1}$ is judged to be non-redundant data;

S23, if S22 does not hold, then if for all $k \in \{1,2,\dots,n\}$, $x_{i+1}.timestamp - \mathrm{Timestamp}[\mathrm{Hash}_k(x_{i+1}.epc)] \le \tau$, then $x_{i+1}$ is redundant data, otherwise $x_{i+1}$ is non-redundant data, $\tau$ being a time threshold;

S24, if $x_{i+1}$ is non-redundant data, $x_{i+1}$ is reported and the $k$ entries of the two-dimensional array are updated:

$$\mathrm{Timestamp}[\mathrm{Hash}_k(x_{i+1}.epc)] = x_{i+1}.timestamp \quad (k = 1,2,\dots,n)$$

$$\mathrm{ReaderID}[\mathrm{Hash}_k(\mathrm{Hash}_k(x_{i+1}.epc))] = x_{i+1}.readerID \quad (k = 1,2,\dots,n);$$

if $x_{i+1}$ is redundant data, it is discarded.

2. The method of claim 1, wherein the mapping consists of taking the value computed by the hash function as a position in the array, and placing the $timestamp$ of the data at that position of the Timestamp array and the $readerID$ at that position of the ReaderID array.

3. The method of claim 2, wherein the value computed by the hash function is a positive integer used to mark a position in the array, so that the value computed by the hash function corresponds to a position in the array.

4. The method according to claim 1, wherein in S1, $k$ independent hash functions are first computed on $x_i.epc$ for mapping to the Timestamp array, and a second round of $k$ independent hash function computations is then performed on the obtained hash values for mapping to the ReaderID array.

5. The method for cleaning redundant data of an RFID data stream of claim 1, wherein before S1, initialization is performed in which the Timestamp array and the ReaderID array are both set to 0.

6. The method of claim 1, wherein the reader is a logical reader, and a plurality of readers monitoring the same area are called the same logical reader.

7. The method for cleaning redundant data of an RFID data stream according to any one of claims 1 to 6, characterized in that an RFID data stream $S = \{x_1, x_2, x_3, \dots, x_n\}$ is constructed, the $x_i$ of S1 is initially $x_1$, and when S34 has finished executing, $i = i + 1$ and the method returns to S31, until S34 has completed for the last entry $x_n$ of the data stream.

Technical Field

The invention relates to the technical field of data filtering, in particular to a method for cleaning redundant data of an RFID data stream.

Background

As RFID technology spreads through discrete manufacturing workshops, the RFID networks in those workshops become more complex and the number of tagged objects grows. Most existing data cleaning algorithms are suited only to applications with a relatively uniform and simple reading environment. The RFID read/write network of a discrete manufacturing workshop, by contrast, is large, contains numerous production elements, and has a complex reading environment, so cleaning accuracy and real-time performance cannot be guaranteed.

RFID data cleaning filters unreliable data (redundant reads, extra reads and missed reads) out of the RFID data stream. Traditional RFID redundant data filtering algorithms include the Bloom Filter algorithm (BF), the Time Bloom Filter algorithm (TBF), and the Time Interval Bloom Filter algorithm (TIBF). The RFID data stream in a discrete manufacturing workshop is characterized by large data volume, high real-time requirements and strong spatiality. For filtering redundant data from massive RFID data streams, the Bloom Filter (BF) offers high time and space efficiency, but it records only a single dimension of the data and suffers from false positives. The Time Bloom Filter algorithm (TBF) was therefore proposed: it replaces the bit array with a time array, can filter massive redundant data, and has high real-time performance. However, when tags move frequently (dynamic tags), the RFID data stream becomes strongly spatial, and the error rate of the TBF algorithm rises sharply.

Therefore, in a discrete manufacturing environment with a large number of highly mobile RFID tagged objects, the cleaning effect needs to be improved, that is, the error rate needs to be reduced further.
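The limitation described above can be made concrete with a small Python sketch of a time-array Bloom filter. It is an illustration only, not the published TBF algorithm: the class name `TimeBloomFilter`, the SHA-256 based hash functions, the array length `m`, and the choice to update cells only when a read is reported are all assumptions made for this example.

```python
import hashlib


class TimeBloomFilter:
    """Illustrative time-array Bloom filter (assumed design, not the reference TBF)."""

    def __init__(self, m: int, k: int, tau: int) -> None:
        self.m, self.k, self.tau = m, k, tau
        self.cells = [0] * m  # 0 marks a cell that has never been written

    def _positions(self, epc: str) -> list[int]:
        # k hash functions derived from SHA-256 by prefixing an index (assumption).
        return [
            int.from_bytes(hashlib.sha256(f"{j}:{epc}".encode()).digest()[:8], "big") % self.m
            for j in range(self.k)
        ]

    def process(self, epc: str, timestamp: int) -> bool:
        """Return True if the read is reported, False if it is dropped as redundant."""
        pos = self._positions(epc)
        redundant = all(
            self.cells[p] != 0 and timestamp - self.cells[p] <= self.tau for p in pos
        )
        if not redundant:
            for p in pos:
                self.cells[p] = timestamp
        return not redundant


tbf = TimeBloomFilter(m=1024, k=3, tau=5)
first = tbf.process("EPC-001", timestamp=100)   # tag seen at one reader
second = tbf.process("EPC-001", timestamp=102)  # same tag, now at a different reader
```

Here `first` is reported and `second` is dropped. Because the filter stores only time, it cannot tell that the second read came from a different location, which is the kind of error the background attributes to moving tags.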

Disclosure of Invention

To overcome the shortcomings of the prior art, the invention provides, on the basis of the TBF algorithm, a Time Bloom Filter algorithm that takes the reader address into account (TRBF), in order to improve the cleaning effect and reduce the error rate. It adopts the following technical scheme:

A method for cleaning redundant data of an RFID data stream comprises the following steps:

S1, when new RFID data $x_i$ arrives, $k$ independent hash functions are computed on $x_i.epc$, $k \in \{1,2,3,\dots,n\}$ with $n$ a natural number, namely $\mathrm{Hash}_1(x_i.epc), \dots, \mathrm{Hash}_k(x_i.epc)$, and the results are mapped to a Timestamp array; a second round of $k$ independent hash function computations is performed, namely $\mathrm{Hash}_1[\mathrm{Hash}_1(x_i.epc)], \dots, \mathrm{Hash}_k[\mathrm{Hash}_k(x_i.epc)]$, and the results are mapped to a ReaderID array. The RFID data $x_i$ is a triple of the form $x_i = \langle epc, readerID, timestamp \rangle$, wherein $epc$ is the unique identification number of each tag, $readerID$ is the address of the reader that read the tag, which indicates the location of the tag, $timestamp$ is the time at which the reader read the tag, and $i \in \{1,2,3,\dots,n\}$ with $n$ a natural number. The purpose of hashing twice is to make the positions mapped in the Timestamp array and in the ReaderID array differ as far as possible, so that the time and position information of the data are staggered in the two-dimensional array, which further reduces the probability of misjudgment;

S2, whether the next data $x_{i+1}$ is redundant is determined by comparing its time $x_{i+1}.timestamp$ and position $x_{i+1}.readerID$ with the values in the two-dimensional array (the Timestamp and ReaderID arrays), comprising the following steps:

S21, if there exists $\mathrm{Timestamp}[\mathrm{Hash}_k(x_{i+1}.epc)] = 0$, $k \in \{1,2,\dots,n\}$, then $x_{i+1}$ is judged to be non-redundant data. Because the hash mappings of the same epc are always the same, if even one mapped position in the array holds 0, that position has never been assigned, so the data can be judged directly to be arriving for the first time and is therefore non-redundant;

S22, otherwise, for all $k \in \{1,2,\dots,n\}$, let $x_{i+1}.readerID - \mathrm{ReaderID}[\mathrm{Hash}_k(\mathrm{Hash}_k(x_{i+1}.epc))] = \varepsilon_k$; if any two of the differences $\varepsilon_k$ in the ReaderID array are equal and not 0, this indicates that the reader address $x_{i+1}.readerID$ of tag $x_{i+1}.epc$ has changed, that is, the tag has moved between readers. By Definition 3, $x_{i+1}$ is data generated by another reader, and it can be judged directly to be non-redundant data;

S23, if S22 does not hold, a judgment on time is made: if for all $k \in \{1,2,\dots,n\}$, $x_{i+1}.timestamp - \mathrm{Timestamp}[\mathrm{Hash}_k(x_{i+1}.epc)] \le \tau$, then $x_{i+1}$ is redundant data, otherwise $x_{i+1}$ is non-redundant data, $\tau$ being a time threshold;

S24, if $x_{i+1}$ is non-redundant data, $x_{i+1}$ is reported and the $k$ entries of the two-dimensional array are updated:

$$\mathrm{Timestamp}[\mathrm{Hash}_k(x_{i+1}.epc)] = x_{i+1}.timestamp \quad (k = 1,2,\dots,n)$$

$$\mathrm{ReaderID}[\mathrm{Hash}_k(\mathrm{Hash}_k(x_{i+1}.epc))] = x_{i+1}.readerID \quad (k = 1,2,\dots,n);$$

if $x_{i+1}$ is redundant data, it is discarded.

Specifically, the mapping consists of taking the value computed by the hash function as a position in the array, and then placing the $timestamp$ of the data at that position of the Timestamp array and the $readerID$ at that position of the ReaderID array.

Specifically, the value computed by the hash function is a positive integer used to mark a position in the array, so that the value computed by the hash function corresponds to a position in the array.

Specifically, in S1, $k$ independent hash functions are first computed on $x_i.epc$ and the hash values are mapped to the Timestamp array; a second round of $k$ independent hash function computations is then performed on the obtained hash values for mapping to the ReaderID array.

Specifically, before S1, when the TRBF algorithm is initialized, the Timestamp array and the ReaderID array are both set to 0.

Specifically, the reader is a logical reader. To improve the probability of identifying a tag, several readers are deployed in some monitoring areas, and the readers monitoring the same area are called the same logical reader.

Specifically, an RFID data stream $S = \{x_1, x_2, x_3, \dots, x_n\}$ is constructed, the $x_i$ of S1 is initially $x_1$, and when S34 has finished executing, $i = i + 1$ and the method returns to S31, until S34 has completed for the last entry $x_n$ of the data stream.

The advantages and beneficial effects of the invention are as follows:

The invention provides a time Bloom filtering method that takes the reader address into account (TRBF). It retains the advantages of the BF and TBF algorithms, namely high cleaning speed, small memory footprint and high efficiency in filtering massive data streams, and it can also clean redundant data from data streams generated by moving (dynamic) tags. Simulation results show that the TRBF algorithm has a lower error rate than comparable methods (TBF and TIBF).

Drawings

FIG. 1 is a schematic diagram of the TRBF algorithm of the present invention.

FIG. 2 is a flow chart of the method of the present invention.

Detailed Description

The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present invention, are given by way of illustration and explanation only, not limitation.

The invention concerns the filtering of redundant RFID data. For the massive RFID data streams produced in a discrete workshop production environment, it designs a Time Bloom Filter algorithm that takes the reader address into account (TRBF) for cleaning redundant data from an RFID data stream. Building on the TBF algorithm, TRBF considers another attribute of the tag data, the reader address $readerID$, which can also be understood as the physical position of the tag. The improved TRBF algorithm expands the one-dimensional time array of TBF into a two-dimensional array, namely the time array Timestamp and the reader address array ReaderID. The following definitions are used:

Definition 1: the RFID data stream generated in any time period is defined as $S = \{x_1, x_2, x_3, \dots, x_n\}$. Any RFID data $x_i$ is a triple of the form $x_i = \langle epc, readerID, timestamp \rangle$, wherein $epc$ is the unique identification number of each tag, $readerID$ is the address of the reader that read the tag, which indicates the location of the tag, $timestamp$ is the time at which the reader read the tag, and $i \in \{1,2,3,\dots,n\}$ with $n$ a natural number.

Definition 2: under the same logical reader (to improve the probability of identifying a tag, several readers are deployed in some monitoring areas; readers that cooperatively monitor the same area are called the same logical reader; unless otherwise stated, readers are taken to be different logical readers), suppose the data stream $S$ contains data $x_i, x_{i'} \in S$ ($i \neq i'$) with $x_i.epc = x_{i'}.epc$. If $x_{i'}.timestamp - x_i.timestamp \le \tau$ (where $\tau$ is a time threshold set by the user for the actual application environment), then $x_{i'}$ is considered redundant data relative to $x_i$, with $i, i' \in \{1,2,3,\dots,n\}$ and $n$ a natural number.

Definition 3: under different readers, suppose the data stream $S$ contains data from different readers $x_i, x_j \in S$ ($i \neq j$) with $x_i.epc = x_j.epc$. If $x_j.timestamp - x_i.timestamp \le \tau$ but $x_i.readerID \neq x_j.readerID$, then $x_j$ is considered not to be redundant data relative to $x_i$, with $i, j \in \{1,2,3,\dots,n\}$ and $n$ a natural number.
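Definitions 1 to 3 can be read as a pairwise test on two readings of the same tag. The Python sketch below expresses that reading; the names `Reading` and `is_redundant_pair`, the integer coding of reader addresses, and the integer time unit are assumptions made for illustration and do not come from the source.

```python
from typing import NamedTuple


class Reading(NamedTuple):
    """One RFID record <epc, readerID, timestamp> as in Definition 1."""
    epc: str        # unique tag identifier
    reader_id: int  # logical reader address (integer coding is an assumption)
    timestamp: int  # read time in arbitrary integer ticks (assumption)


def is_redundant_pair(earlier: Reading, later: Reading, tau: int) -> bool:
    """True if `later` is redundant relative to `earlier` under Definitions 2 and 3."""
    if earlier.epc != later.epc:
        return False  # the definitions only compare readings of the same tag
    if earlier.reader_id != later.reader_id:
        return False  # Definition 3: a different reader means not redundant
    return later.timestamp - earlier.timestamp <= tau  # Definition 2


a = Reading("EPC-001", reader_id=1, timestamp=100)
same_reader_soon = is_redundant_pair(a, Reading("EPC-001", 1, 103), tau=5)
other_reader_soon = is_redundant_pair(a, Reading("EPC-001", 2, 103), tau=5)
same_reader_late = is_redundant_pair(a, Reading("EPC-001", 1, 110), tau=5)
```

Only `same_reader_soon` is true. Comparing every pair directly like this would be too slow for a massive stream, which is why the method stores the last time and reader address in hashed arrays instead.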

In view of the massive volume and strong spatiality of RFID data streams in a discrete manufacturing workshop, a Time and reader ID Bloom Filter (TRBF) algorithm that takes reader addresses into account is proposed. As shown in FIG. 1, tag1, tag2 and tag3 represent different tags.

The core idea of the TRBF algorithm is to maintain two arrays, Timestamp and ReaderID. The tag identifier $epc$ is passed through $k$ random hash mappings, the results are mapped into the Timestamp array and the ReaderID array respectively, and the time information and reader address information of the tag are compared with the data in the Timestamp array and the ReaderID array respectively, in order to judge whether the data is redundant.

As shown in FIG. 2, when new RFID data $x_i$ arrives, whether $x_i$ is redundant data is determined. The TRBF algorithm performs redundancy cleaning in the following steps:

1. When the TRBF algorithm is in its initialization state, the Timestamp array and the ReaderID array are both set to 0.

2. When new data $x_i$ arrives, $k$ independent hash functions are computed on $x_i.epc$, $k \in \{1,2,3,\dots,n\}$ with $n$ a natural number, namely $\mathrm{Hash}_1(x_i.epc), \dots, \mathrm{Hash}_k(x_i.epc)$, and the results are mapped to the Timestamp array. At the same time, two rounds of hashing, $\mathrm{Hash}_1[\mathrm{Hash}_1(x_i.epc)], \dots, \mathrm{Hash}_k[\mathrm{Hash}_k(x_i.epc)]$, are computed and the results are mapped to the ReaderID array. The purpose of hashing twice is to make the positions mapped in the Timestamp array and in the ReaderID array differ as far as possible, so that the time and position information of the data are staggered in the two-dimensional array, which further reduces the probability of misjudgment.

Specifically, each piece of RFID data $x_i$ has the format $\langle epc, readerID, timestamp \rangle$ (tag identifier, reader address, timestamp). Mapping to a value in the Timestamp array means hashing the epc, taking the resulting value as a position in the Timestamp array, and then storing the $timestamp$ of the data at that position. For example, if $\mathrm{Hash}_1(x_i.epc) = 2$, then $x_i.timestamp$ is assigned to the third position of the Timestamp array (positions are numbered from 0): $\mathrm{Timestamp}[2] = x_i.timestamp$.

Values in the ReaderID array are mapped in the same way, except that two rounds of hashing are applied, $\mathrm{Hash}_1[\mathrm{Hash}_1(x_i.epc)]$, and the value stored in the array is the reader address $x_i.readerID$. Unlike a timestamp, a reader address does not increase with time, so it cannot be judged directly from a difference against a threshold; the judgment method is therefore different.
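The position mapping described above can be sketched in Python as follows. The source does not name the hash functions or the array length, so the helper `hash_j` (SHA-256 with an index prefix), the array length `m`, and the function names are assumptions chosen only to make the one-round and two-round mappings concrete.

```python
import hashlib


def hash_j(j: int, value: str, m: int) -> int:
    """Hypothetical j-th hash function: maps a string to an index in [0, m)."""
    digest = hashlib.sha256(f"{j}:{value}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % m


def timestamp_positions(epc: str, k: int, m: int) -> list[int]:
    """One round of hashing: Hash_j(epc) gives the positions in the Timestamp array."""
    return [hash_j(j, epc, m) for j in range(1, k + 1)]


def reader_positions(epc: str, k: int, m: int) -> list[int]:
    """Two rounds: Hash_j[Hash_j(epc)] gives the positions in the ReaderID array."""
    return [hash_j(j, str(hash_j(j, epc, m)), m) for j in range(1, k + 1)]


m, k = 64, 3
timestamp_array = [0] * m
reader_id_array = [0] * m

# Store one reading <epc, readerID, timestamp> at its mapped positions.
epc, reader_id, timestamp = "EPC-001", 7, 100
for p in timestamp_positions(epc, k, m):
    timestamp_array[p] = timestamp
for p in reader_positions(epc, k, m):
    reader_id_array[p] = reader_id
```

Because the second round hashes the first-round value again, the ReaderID positions generally differ from the Timestamp positions for the same tag, which is the staggering effect the text attributes to double hashing.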

3. Whether the next data $x_{i+1}$ is redundant is determined by comparing its time $x_{i+1}.timestamp$ and position $x_{i+1}.readerID$ with the values in the two-dimensional array, in the following steps:

(1) If there exists $\mathrm{Timestamp}[\mathrm{Hash}_k(x_{i+1}.epc)] = 0$, $k \in \{1,2,\dots,n\}$, then $x_{i+1}$ can be judged directly to be non-redundant data. Because the hash mappings of the same epc are always the same, if even one mapped position in the array holds 0, that position has never been assigned, so the data can be judged directly to be arriving for the first time and is therefore non-redundant.

(2) Otherwise, for all $k \in \{1,2,\dots,n\}$, let $x_{i+1}.readerID - \mathrm{ReaderID}[\mathrm{Hash}_k(\mathrm{Hash}_k(x_{i+1}.epc))] = \varepsilon_k$. If any two of the differences $\varepsilon_k$ in the ReaderID array are equal and not 0, this indicates that the reader address $x_{i+1}.readerID$ of tag $x_{i+1}.epc$ has changed, that is, the tag has moved between readers. By Definition 3, $x_{i+1}$ is data generated by another reader (the $x_j$ of Definition 3), and it can be judged directly to be non-redundant data.

(3) If (2) does not hold, a judgment on time is made: if for all $k \in \{1,2,\dots,n\}$, $x_{i+1}.timestamp - \mathrm{Timestamp}[\mathrm{Hash}_k(x_{i+1}.epc)] \le \tau$, then $x_{i+1}$ is redundant data (the $x_{i'}$ of Definition 2); otherwise $x_{i+1}$ is non-redundant data.

(4) If $x_{i+1}$ is non-redundant data, it is reported and the $k$ entries of the two-dimensional array are updated at the same time:

$$\mathrm{Timestamp}[\mathrm{Hash}_k(x_{i+1}.epc)] = x_{i+1}.timestamp \quad (k = 1,2,\dots,n)$$

$$\mathrm{ReaderID}[\mathrm{Hash}_k(\mathrm{Hash}_k(x_{i+1}.epc))] = x_{i+1}.readerID \quad (k = 1,2,\dots,n);$$

if $x_{i+1}$ is redundant data, it is discarded directly.
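Steps 1 to 3 above can be put together in a compact Python sketch. It is one possible reading of the text, not the inventors' implementation. The assumptions are: the class name `TRBF` and its method names; SHA-256 based hash functions; an array length `m`; nonzero integer reader addresses and timestamps (since 0 marks an empty cell); and the interpretation of step (2) as "at least two of the differences are equal and nonzero".

```python
import hashlib
from collections import Counter
from typing import NamedTuple


class Reading(NamedTuple):
    epc: str
    reader_id: int  # assumed nonzero, because 0 marks an empty cell
    timestamp: int  # assumed positive, because 0 marks an empty cell


class TRBF:
    """Illustrative sketch of the time and reader-ID Bloom filter described above."""

    def __init__(self, m: int, k: int, tau: int) -> None:
        self.m, self.k, self.tau = m, k, tau
        self.timestamps = [0] * m  # step 1: both arrays start at 0
        self.reader_ids = [0] * m

    def _hash(self, j: int, value: str) -> int:
        # Hypothetical j-th hash function; the source does not specify one.
        digest = hashlib.sha256(f"{j}:{value}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.m

    def _positions(self, epc: str) -> tuple[list[int], list[int]]:
        # Step 2: one round for Timestamp, a second round for ReaderID.
        t_pos = [self._hash(j, epc) for j in range(1, self.k + 1)]
        r_pos = [self._hash(j, str(p)) for j, p in enumerate(t_pos, start=1)]
        return t_pos, r_pos

    def is_redundant(self, x: Reading) -> bool:
        t_pos, r_pos = self._positions(x.epc)
        # (1) Any empty Timestamp cell: first arrival, so not redundant.
        if any(self.timestamps[p] == 0 for p in t_pos):
            return False
        # (2) Two equal, nonzero reader differences: the tag moved, so not redundant.
        diffs = Counter(x.reader_id - self.reader_ids[p] for p in r_pos)
        if any(d != 0 and count >= 2 for d, count in diffs.items()):
            return False
        # (3) Redundant only if every stored time is within tau of this read.
        return all(x.timestamp - self.timestamps[p] <= self.tau for p in t_pos)

    def process(self, x: Reading) -> bool:
        """Return True if x is reported (non-redundant), False if it is discarded."""
        if self.is_redundant(x):
            return False
        # (4) Report and update both arrays at the mapped positions.
        t_pos, r_pos = self._positions(x.epc)
        for p in t_pos:
            self.timestamps[p] = x.timestamp
        for p in r_pos:
            self.reader_ids[p] = x.reader_id
        return True


trbf = TRBF(m=1024, k=3, tau=5)
stream = [
    Reading("EPC-001", 1, 100),  # first arrival
    Reading("EPC-001", 1, 103),  # same reader, within tau
    Reading("EPC-001", 2, 104),  # tag moved to another reader
    Reading("EPC-001", 2, 120),  # same reader, but later than tau
]
reported = [trbf.process(x) for x in stream]
```

For this stream `reported` is `[True, False, True, True]`: only the second reading is dropped. The third reading falls within the time threshold but is kept because the reader address changed, which is the case a time-only filter would misjudge.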

The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.
