House type structure analysis method and device

Document No.: 192488  Publication date: 2021-11-02

Note: This technique, "House type structure analysis method and device", was designed and created by 虞秀华, 刘豪杰, 俞俊明, 费智翔, 刘安, 曾呈 and 白付壮 on 2021-07-26. Its main content is as follows.

The invention discloses a house type structure analysis method and device, the method comprising: step S1, acquiring three-dimensional object point cloud data and three-dimensional air point cloud data; step S2, identifying the furniture point cloud within the three-dimensional object point cloud, obtaining a rotation correction matrix from the two-dimensional object projection of the three-dimensional object point cloud data, and performing rotation correction on the three-dimensional object point cloud, the three-dimensional air point cloud and the furniture point cloud according to the rotation correction matrix; step S3, combining the corrected three-dimensional object point cloud with the furniture information of each point to generate an interference-resistant two-dimensional top-down object projection and normal-vector projection, and generating the two-dimensional room masks, the two-dimensional corner points of each room, and the associations between the corner points; step S4, generating the local boundaries of each room and the global boundary of the floor from the corner points, and optimizing them based on the corner points and parallel distances of the boundaries; step S5, integrating the furniture positions and passage positions to generate the two-dimensional house type structure.

1. A house type structure analysis method comprises the following steps:

step S1, scanning the room requiring structural analysis with an RGBD camera and an odometry camera to obtain RGBD camera data and odometry data, and converting these data into three-dimensional object point cloud data and three-dimensional air point cloud data;

step S2, identifying the three-dimensional object point cloud to obtain furniture point cloud data, obtaining a rotation correction matrix from the two-dimensional object projection of the three-dimensional object point cloud data, and performing rotation correction on the three-dimensional object point cloud, the three-dimensional air point cloud and the furniture point cloud according to the obtained rotation correction matrix;

step S3, combining the corrected three-dimensional object point cloud with the furniture information of each point to generate an interference-resistant two-dimensional top-down object projection and normal-vector projection, and generating, through three groups of neural networks respectively, the two-dimensional room masks, the two-dimensional corner points of each room, and the associations between the corner points;

step S4, generating the local boundaries of each room and the global boundary of the floor according to the corner points, while optimizing them based on the corner points and parallel distances of the boundaries;

and step S5, integrating the furniture position information and the passage position information to generate the final two-dimensional house type structure.

2. The house type structure analysis method of claim 1, wherein the step S2 further comprises:

step S200, processing the three-dimensional object point cloud data with sparse convolution; based on a U-Net network, deepening the channels after each down-sampling convolution of the processed data, then up-sampling symmetrically while gradually reducing the number of channels; and finally, after a fully connected layer and Softmax, taking the category with the maximum probability value as the furniture classification result, thereby obtaining the furniture point cloud data;

step S201, obtaining a rotation correction matrix from the two-dimensional object projection of the original three-dimensional object point cloud data, and performing rotation correction on the three-dimensional object point cloud, the three-dimensional air point cloud and the furniture point cloud data according to the rotation correction matrix.

3. The house type structure analysis method according to claim 2, wherein in step S201: according to the original three-dimensional object point cloud data, the ground plane is extracted as the xy plane, the z axis is rotated to be perpendicular to the ground plane, and the z coordinate of the lowest point is set to 0, yielding a first transformation matrix; the three-dimensional object point cloud data is then projected onto a two-dimensional plane to obtain a projected image, line segments are detected in the projected image by Hough transform to determine the main segment direction, the plane of the projected image is rotated about the z axis by the angle of the main segment direction, and the point with the minimum xy coordinates is taken as the coordinate origin, yielding a second transformation matrix; the final rotation correction matrix is then obtained from the first and second transformation matrices.

4. The house type structure analysis method according to claim 2, wherein in step S201, during the rotation correction, the number and length of the line segments are counted per angle, the segment direction with the longest total length is taken as the main segment direction, the rotation angle is limited to within 45 degrees, and the main segment direction is rotated onto the coordinate-axis direction or the direction perpendicular to it, thereby achieving fine adjustment.

5. A house type structure analysis method according to claim 2, characterized in that: step S3 further includes:

step S300, extracting points of interest from the corrected three-dimensional object point cloud and the furniture point cloud, and outputting a noise-resistant two-dimensional top-down object projection and a noise-resistant two-dimensional top-down normal-vector projection;

step S301, computing the feature map of the input noise-resistant two-dimensional top-down object projection with a classical Mask R-CNN network, generating candidate bounding boxes for each room with a region proposal network, performing semantic segmentation on the pixels within each bounding box, and extracting the room mask of each bounding box;

step S302, taking the two-dimensional room masks extracted in step S301 together with the noise-resistant two-dimensional top-down object projection and normal-vector projection obtained in step S300, extracting image features with an encoder built from convolutional layers with dilated convolution, passing the feature maps through a decoder based on a ResNet network structure, and feeding two parallel convolutional branches, so as to obtain the two-dimensional coordinates of the corner points and the probability confidence values that each corner point extends a wall line segment in each of several directions;

and step S303, combining the corner information obtained in step S302 with the room masks, the two-dimensional top-down object projection and the two-dimensional top-down normal-vector projection, and combining the local room information with the global floor information in a neural network, so as to predict in turn the probability value of each corner point belonging to each room.

6. The house type structure analysis method of claim 5, further comprising, after step S3:

and step S304, sorting all rooms by the number of pixels occupied by their masks, traversing the rooms to compute the overlap rate between room masks, and, for rooms whose overlap rate exceeds a preset threshold, merging the mask and corner information of the smaller room into the larger room and deleting duplicate points.

7. The house type structure analysis method of claim 6, wherein the step S4 further comprises:

step S400, for the corner points that have been associated and merged in each room, optimizing and generating, based on slope, the local boundary belonging to that room;

step S401, merging the local boundaries to generate the global boundary, and closing and merging the boundaries based on the corner points between boundary lines and the distances between parallel boundaries, respectively.

8. The house type structure analysis method of claim 7, wherein in step S400, all corner points shared by multiple rooms are locked and kept unchanged; for the local corner points of a single room, the slope values of two consecutive edges and the corresponding included angle are calculated, and if the deviation is smaller than a preset threshold, the two consecutive edges are merged into one edge.

9. The house type structure analysis method of claim 7, wherein the step S5 further comprises:

s500, fusing the optimized global boundary and channel position information in the air point cloud;

and S501, generating the final two-dimensional house type structure from the corrected three-dimensional furniture classification point cloud data and the boundary line segment information of the passages and doors merged in step S500.

10. A house type structure analysis apparatus comprising:

a point cloud data generation unit, configured to scan the room requiring structural analysis with an RGBD camera and an odometry camera to obtain RGBD camera data and odometry data, and to convert these data into three-dimensional object point cloud data and three-dimensional air point cloud data;

a rotation correction unit, configured to identify the three-dimensional object point cloud to obtain furniture point cloud data, obtain a rotation correction matrix from the two-dimensional object projection of the three-dimensional object point cloud, and perform rotation correction on the three-dimensional object point cloud, the three-dimensional air point cloud and the furniture point cloud according to the rotation correction matrix;

a room mask and corner extraction and association unit, configured to combine the corrected three-dimensional object point cloud with the furniture information of each point to generate an interference-resistant two-dimensional top-down object projection and normal-vector projection, and to generate, through three groups of neural networks respectively, the two-dimensional room masks, the two-dimensional corner points of each room, and the associations between the corner points;

a boundary generation unit, configured to generate the local boundaries of each room and the global boundary of the floor according to the corner points, and to optimize the local and global boundaries based on factors such as the corner points and parallel distances of the boundaries;

and a house type structure generation unit, configured to integrate the furniture position information and the passage position information to generate the final two-dimensional house type structure.

Technical Field

The invention relates to the technical field of computer vision and machine learning, in particular to a house type structure analysis method and device.

Background

With the popularization and broad development of computer technology, people can obtain housing information more conveniently and quickly. At present, on real-estate trading platforms, each listing is linked to a corresponding house type graph (floor plan). The house type graph describes the overall layout of the house: its pattern, plan, orientation, area and other information can be seen at a glance, making the house type graph an essential source of information for a customer getting to know a house.

In order to provide customers with authentic listing information, most real-estate trading platforms display the house type graph together with its detailed information, for example which rooms make up the layout and the specific information of each room.

At present, however, house type graphs and their detailed information are entered manually and offline. Manual offline entry is time-consuming and labor-intensive, and differences in individual measurement standards introduce inconsistencies. Moreover, given the large number of houses, completing every house type graph through manual offline entry is practically impossible.

The prior art closest to the present invention is a technical scheme published at ICCV (IEEE International Conference on Computer Vision) in 2019: "Floor-SP: Inverse CAD for Floorplans by Sequential Room-wise Shortest Path". In that scheme, room segmentation is first performed on a two-dimensional projection of the three-dimensional room scan, so that reconstructing the plan of the whole floor is reduced to reconstructing a set of single-room polygons; the mask and overall projection of each room serve as input to subsequent processing. For each room, a neural network first identifies the corner points and the associations between them; an energy function over points and lines is then constructed, and by minimizing its value the room structure is encouraged to have as few corner points and edges as possible, with corner points and edges shared by as many rooms as possible. The results of the different rooms are combined in a final loop to output the final house type structure diagram.

However, this solution has several disadvantages. When the house is not empty but contains interfering items such as furniture and residents, a simple two-dimensional projection becomes very noisy, which greatly affects the accuracy of subsequent prediction. Optimization based on an energy function carries an extremely strong prior, so the resulting structure is often too simple: for the sake of one fewer line, two parallel but close walls may be merged into a single wall. And if a heuristic algorithm is adopted, the computation consumes enormous resources and time.

In summary, the prior-art Floor-SP technical solution has the following technical problems to be solved:

1. The data projection process is too simple: the two-dimensional map used as input is easily influenced by objects such as furniture, residents and temporarily placed items, so that besides the wall projections, many stray points and lines are mixed into the projection map, greatly affecting the accuracy and reliability of subsequent prediction.

2. Necessary preprocessing before data projection is missing: three-dimensional models obtained by current mainstream modeling methods usually follow their own coordinate system rather than the world coordinate system of the scanned object, so there is no guarantee that the main wall directions of the input model are parallel to the x and y axes. If the main direction of the input model cannot be calibrated automatically, repeated predictions may give different results, undermining the reliability of the model; this also contradicts the prior assumption, used during optimization, that the main directions are mostly horizontal and vertical.

3. Energy-function optimization relies on a strong prior: the energy function is usually constructed from a prior understanding of the object, e.g. that corner points and lines should be shared as much as possible. Such a prior not only makes the final output sometimes too simple, but also makes situations that were not anticipated in advance hard to predict. Moreover, the heuristic optimization method used consumes large amounts of computing resources and time when the situation is complex.

4. The result lacks information about objects inside the rooms: in many applications of house type structures, such as house viewing and interior design, users care not only about the basic layout but also about the furniture inside the house. The prior art does not address this at all.

5. The house type lacks passage information: the connections between rooms and the positions of doors are also important parts of the house type. However, the existing method does not produce any passage structure; even if the corner points of a passage are detected, they lie on the same straight line as the corner points of the overall wall and are therefore optimized away by the energy function.

Disclosure of Invention

In order to overcome the defects of the prior art, the present invention provides a house type structure analysis method and device for realizing house type structure analysis based on computer vision and scene object recognition technology.

In order to achieve the above object, the present invention provides a house type structure analysis method, which comprises the following steps:

step S1, scanning the room requiring structural analysis with an RGBD camera and an odometry camera to obtain RGBD camera data and odometry data, and converting these data into three-dimensional object point cloud data and three-dimensional air point cloud data;

step S2, identifying the three-dimensional object point cloud to obtain furniture point cloud data, obtaining a rotation correction matrix from the two-dimensional object projection of the three-dimensional object point cloud data, and performing rotation correction on the three-dimensional object point cloud, the three-dimensional air point cloud and the furniture point cloud according to the obtained rotation correction matrix;

step S3, combining the corrected three-dimensional object point cloud with the furniture information of each point to generate an interference-resistant two-dimensional top-down object projection and normal-vector projection, and generating, through three groups of neural networks respectively, the two-dimensional room masks, the two-dimensional corner points of each room, and the associations between the corner points;

step S4, generating the local boundaries of each room and the global boundary of the floor according to the corner points, while optimizing them based on the corner points and parallel distances of the boundaries;

and step S5, integrating the furniture position information and the passage position information to generate the final two-dimensional house type structure.

Preferably, the step S2 further includes:

step S200, processing the three-dimensional object point cloud data with sparse convolution; based on a U-Net network, deepening the channels after each down-sampling convolution of the processed data, then up-sampling symmetrically while gradually reducing the number of channels; and finally, after a fully connected layer and Softmax, taking the category with the maximum probability value as the furniture classification result, thereby obtaining the furniture point cloud data;

step S201, obtaining a rotation correction matrix from the two-dimensional object projection of the original three-dimensional object point cloud data, and performing rotation correction on the three-dimensional object point cloud, the three-dimensional air point cloud and the furniture point cloud data according to the rotation correction matrix.

Preferably, in step S201: according to the original three-dimensional object point cloud data, the ground plane is extracted as the xy plane, the z axis is rotated to be perpendicular to the ground plane, and the z coordinate of the lowest point is set to 0, yielding a first transformation matrix; the three-dimensional object point cloud data is then projected onto a two-dimensional plane to obtain a projected image, line segments are detected in the projected image by Hough transform to determine the main segment direction, the plane of the projected image is rotated about the z axis by the angle of the main segment direction, and the point with the minimum xy coordinates is taken as the coordinate origin, yielding a second transformation matrix; the final rotation correction matrix is then obtained from the first and second transformation matrices.

Preferably, in step S201, during the rotation correction, the number and length of the line segments are counted per angle, the segment direction with the longest total length is taken as the main segment direction, the rotation angle is limited to within 45 degrees, and the main segment direction is rotated onto the coordinate-axis direction or the direction perpendicular to it, thereby achieving fine adjustment.
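As a minimal sketch of this fine adjustment (the 1-degree angle bins and the `segments` format are illustrative assumptions, not the patented implementation), segment angles can be folded into [0, 90) degrees so that perpendicular walls vote for the same main direction, the correction can be capped at 45 degrees, and a rotation matrix about the z axis can then be built:

```python
import numpy as np

def main_direction(segments):
    """Estimate the main wall direction from detected line segments
    (e.g. the output of a Hough transform). Angles are folded into
    [0, 90) degrees so perpendicular walls vote together; the direction
    with the longest total segment length wins."""
    length_per_angle = {}
    for (x1, y1), (x2, y2) in segments:
        angle = int(round(np.degrees(np.arctan2(y2 - y1, x2 - x1)) % 90.0))  # 1-degree bins
        length_per_angle[angle] = length_per_angle.get(angle, 0.0) + np.hypot(x2 - x1, y2 - y1)
    return max(length_per_angle, key=length_per_angle.get)

def correction_angle(main_deg):
    """Rotate the main direction onto an axis (or its perpendicular),
    limiting the rotation to within 45 degrees: fine adjustment only."""
    return -main_deg if main_deg <= 45 else 90 - main_deg

def rotation_matrix_z(deg):
    """3x3 rotation matrix about the z axis."""
    rad = np.radians(deg)
    c, s = np.cos(rad), np.sin(rad)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
```

For example, a long wall along the x axis dominates a short diagonal segment, so `main_direction` returns 0 and no rotation is applied; a main direction of 60 degrees would be corrected by +30 degrees onto the perpendicular axis rather than by -60.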

Preferably, the step S3 further includes:

step S300, extracting points of interest from the corrected three-dimensional object point cloud and the furniture point cloud, and outputting a noise-resistant two-dimensional top-down object projection and a noise-resistant two-dimensional top-down normal-vector projection;

step S301, computing the feature map of the input noise-resistant two-dimensional top-down object projection with a classical Mask R-CNN network, generating candidate bounding boxes for each room with a region proposal network, performing semantic segmentation on the pixels within each bounding box, and extracting the room mask of each bounding box;

step S302, taking the two-dimensional room masks extracted in step S301 together with the noise-resistant two-dimensional top-down object projection and normal-vector projection obtained in step S300, extracting image features with an encoder built from convolutional layers with dilated convolution, passing the feature maps through a decoder based on a ResNet network structure, and feeding two parallel convolutional branches, so as to obtain the two-dimensional coordinates of the corner points and the probability confidence values that each corner point extends a wall line segment in each of several directions;

and step S303, combining the corner information obtained in step S302 with the room masks, the two-dimensional top-down object projection and the two-dimensional top-down normal-vector projection, and combining the local room information with the global floor information in a neural network, so as to predict in turn the probability value of each corner point belonging to each room.

Preferably, after step S3, the method further includes:

and step S304, sorting all rooms by the number of pixels occupied by their masks, traversing the rooms to compute the overlap rate between room masks, and, for rooms whose overlap rate exceeds a preset threshold, merging the mask and corner information of the smaller room into the larger room and deleting duplicate points.
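A minimal sketch of this mask merging might look as follows; the overlap-rate definition (overlapping pixels divided by the smaller room's pixels) and the 0.5 threshold are assumptions for illustration only:

```python
import numpy as np

def merge_overlapping_rooms(masks, corners, threshold=0.5):
    """Sort rooms by mask area (largest first); when the overlap rate of
    a smaller room against an already-kept larger room exceeds the
    threshold, absorb the smaller room's mask and corners into the
    larger one. `masks` is a list of boolean arrays, `corners` a
    parallel list of corner-point sets."""
    order = sorted(range(len(masks)), key=lambda i: masks[i].sum(), reverse=True)
    kept = []
    for i in order:
        for j in kept:
            overlap = (masks[i] & masks[j]).sum() / max(masks[i].sum(), 1)
            if overlap > threshold:
                masks[j] |= masks[i]      # absorb the small mask
                corners[j] |= corners[i]  # set union also deletes duplicate points
                break
        else:
            kept.append(i)
    return [masks[k] for k in kept], [corners[k] for k in kept]
```

A small room fully contained in a larger one has an overlap rate of 1.0 and is therefore merged away, leaving a single room whose corner set is the union of both.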

Preferably, the step S4 further includes:

step S400, for the corner points that have been associated and merged in each room, optimizing and generating, based on slope, the local boundary belonging to that room;

step S401, merging the local boundaries to generate the global boundary, and closing and merging the boundaries based on the corner points between boundary lines and the distances between parallel boundaries, respectively.

Preferably, in step S400, all corner points shared by multiple rooms are locked and kept unchanged; for the local corner points of a single room, the slope values of two consecutive edges and the corresponding included angle are calculated, and if the deviation is smaller than a preset threshold, the two consecutive edges are merged into one edge.
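The slope-based edge merging can be sketched as follows; the 5-degree threshold and the polygon representation (a list of corner points in order) are illustrative assumptions:

```python
import math

def merge_collinear_edges(points, locked, angle_threshold_deg=5.0):
    """Walk the corner points of one room's polygon and drop any corner
    whose two adjacent edges have nearly the same slope (included-angle
    deviation below the threshold), merging those edges into one.
    Corners shared with other rooms (`locked`) are never removed."""
    def direction(a, b):
        return math.atan2(b[1] - a[1], b[0] - a[0])
    result = []
    n = len(points)
    for i in range(n):
        prev_pt, pt, next_pt = points[i - 1], points[i], points[(i + 1) % n]
        dev = abs(math.degrees(direction(pt, next_pt) - direction(prev_pt, pt))) % 180.0
        dev = min(dev, 180.0 - dev)
        if pt in locked or dev > angle_threshold_deg:
            result.append(pt)  # a real corner, or a locked shared corner
        # otherwise the two edges are merged by dropping this point
    return result
```

For a rectangle with a spurious midpoint on one side, the midpoint is dropped and the four true corners remain; if that midpoint is locked (shared with another room), it survives.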

Preferably, the step S5 further includes:

s500, fusing the optimized global boundary and channel position information in the air point cloud;

and S501, generating the final two-dimensional house type structure from the corrected three-dimensional furniture classification point cloud data and the boundary line segment information of the passages and doors merged in step S500.

In order to achieve the above object, the present invention also provides a house type structure analysis apparatus, comprising:

a point cloud data generation unit, configured to scan the room requiring structural analysis with an RGBD camera and an odometry camera to obtain RGBD camera data and odometry data, and to convert these data into three-dimensional object point cloud data and three-dimensional air point cloud data;

a rotation correction unit, configured to identify the three-dimensional object point cloud to obtain furniture point cloud data, obtain a rotation correction matrix from the two-dimensional object projection of the three-dimensional object point cloud, and perform rotation correction on the three-dimensional object point cloud, the three-dimensional air point cloud and the furniture point cloud according to the rotation correction matrix;

a room mask and corner extraction and association unit, configured to combine the corrected three-dimensional object point cloud with the furniture information of each point to generate an interference-resistant two-dimensional top-down object projection and normal-vector projection, and to generate, through three groups of neural networks respectively, the two-dimensional room masks, the two-dimensional corner points of each room, and the associations between the corner points;

a boundary generation unit, configured to generate the local boundaries of each room and the global boundary of the floor according to the corner points, and to optimize the local and global boundaries based on factors such as the corner points and parallel distances of the boundaries;

and a house type structure generation unit, configured to integrate the furniture position information and the passage position information to generate the final two-dimensional house type structure.

Compared with the prior art, the invention has the following beneficial effects:

1. The invention incorporates a semantic segmentation network based on submanifold sparse convolution. To prevent misrecognized points of other classes being mixed into an object, the floor and wall surfaces are removed from the recognition result to obtain mutually disconnected furniture clusters; the number of three-dimensional points belonging to each furniture class within each cluster is counted, and the class with the most votes is taken as the furniture classification of the cluster. The two-dimensional minimum bounding box of the cluster is then obtained, so that the type and position of the furniture appear as minimum bounding boxes in the final house type detection result, where each piece of furniture can also be replaced by an arbitrary furniture model.
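The per-cluster majority vote and bounding box can be sketched as below; the axis-aligned box and the class names are illustrative assumptions (the patent only specifies a "two-dimensional minimum bounding box"):

```python
import numpy as np
from collections import Counter

def classify_cluster(point_labels, points_xy):
    """Majority-vote the furniture class of one disconnected cluster and
    return its axis-aligned 2D bounding box (xmin, ymin, xmax, ymax)."""
    label = Counter(point_labels).most_common(1)[0][0]
    pts = np.asarray(points_xy, dtype=float)
    return label, (pts[:, 0].min(), pts[:, 1].min(), pts[:, 0].max(), pts[:, 1].max())

# A few points of the cluster were misclassified; voting absorbs them.
label, bbox = classify_cluster(
    ["sofa", "sofa", "chair", "sofa"],
    [(1.0, 2.0), (3.0, 2.5), (1.5, 4.0), (2.0, 3.0)])
# label == "sofa"; bbox == (1.0, 2.0, 3.0, 4.0)
```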

2. The invention exploits information from the three-dimensional data acquisition: besides the object points obtained by scanning the target, the space between the data collector and each data point is defined as air points. Combining the positions of the wall line segments in the house type recognition result, the positions where the proportion of nearby air points exceeds a threshold over a continuous length and width are carved out of the wall result line segments and marked as passages.
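The passage-marking rule can be sketched as a scan along a wall segment discretized into cells; the 0.8 air-point ratio and the minimum run length of 3 cells are illustrative thresholds, not values from the patent:

```python
def find_passages(air_ratio, ratio_threshold=0.8, min_length=3):
    """Scan along a discretized wall line segment and return the
    [start, end) cell ranges where the proportion of nearby air points
    exceeds the threshold for at least `min_length` consecutive cells;
    those runs are marked as passages (door openings)."""
    passages, start = [], None
    for i, r in enumerate(list(air_ratio) + [0.0]):  # sentinel closes a trailing run
        if r > ratio_threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_length:
                passages.append((start, i))
            start = None
    return passages

openings = find_passages([0.1, 0.9, 0.95, 0.9, 0.2, 0.9, 0.1])
# openings == [(1, 4)]: only the run long enough to be a passage is kept
```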

3. For local and global corner points and edges, the invention adopts a more direct optimization based on geometric features, which greatly improves computation speed while preserving more detailed features than energy-function-based optimization. The computational cost of this optimization grows only linearly with complexity factors such as the number and shape of the rooms, so it remains stable and efficient.

4. Instead of projecting the entire three-dimensional floor point cloud to two dimensions as input, the invention combines the result of the semantic segmentation network to project only the recognized wall parts onto a two-dimensional plane, and sets a threshold to filter out possible noise and prediction errors. Thus, even if objects are present in the room, the input projection contains only wall-related information, keeping it clean and accurate.

5. In the preprocessing stage, the method detects two-dimensional straight lines with the Hough transform, calculates the wall length in each direction to obtain the main direction of the three-dimensional data in the two-dimensional projection, and rotates by the main direction so that the main walls of the input data stay parallel to the coordinate axes. This eliminates the influence of angular deviations introduced during data modeling, so that inputs at different angles yield the same output, the horizontal-vertical prior assumption used for wall direction correction becomes more reliable, and the result is more robust.

Drawings

FIG. 1 is a flow chart of a prior art analysis of a house type structure;

FIG. 2 is a flow chart illustrating the steps of a method for analyzing a house type structure according to the present invention;

FIG. 3 is a flow chart of a method for house type structure analysis in an embodiment of the present invention;

FIG. 4 is a flow chart of a three-dimensional point cloud furniture identification network in an embodiment of the present invention;

FIG. 5 is a flow chart of the generation of a rotation correction matrix in an embodiment of the present invention;

FIG. 6 is a flow chart of room mask extraction according to an embodiment of the present invention;

FIG. 7 is a flow chart of single room corner extraction in an embodiment of the present invention;

FIG. 8 is a flow chart of multi-room corner association in an embodiment of the present invention;

FIG. 9 is a schematic diagram of local boundary optimization in an embodiment of the present invention;

FIG. 10 is a diagram illustrating global boundary optimization in an embodiment of the present invention;

fig. 11 is a system architecture diagram of a house type structure analysis apparatus according to the present invention.

Detailed Description

Other advantages and effects of the present invention will be readily apparent to those skilled in the art from the disclosure herein, which describes the embodiments of the present invention by way of specific examples in conjunction with the accompanying drawings. The invention may also be implemented or applied through other, different embodiments, and the details herein may be modified in various respects without departing from the spirit and scope of the present invention.

Fig. 2 is a flowchart illustrating steps of a house type structure analysis method according to an embodiment of the present invention, and fig. 3 is a flowchart illustrating a house type structure analysis method according to an embodiment of the present invention. As shown in fig. 2 and 3, the method for analyzing a house type structure of the present invention includes the following steps:

Step S1, scanning the room to be structurally analyzed with an RGBD camera and an odometer camera to obtain RGBD camera data and odometer camera data, and converting these into three-dimensional object point cloud data and three-dimensional air point cloud data.

In a specific embodiment of the present invention, the RGBD camera data and the odometer camera data are converted into three-dimensional object point cloud data by an off-the-shelf SLAM algorithm: the RGBD camera supplies one frame of point cloud per capture, the odometer camera supplies the camera position and orientation, and the SLAM algorithm fuses the frames into a multi-frame point cloud, i.e. the three-dimensional object point cloud. The three-dimensional air point cloud data, i.e. the free space between the observed position and each obstacle, is computed by the Octomap algorithm: Octomap divides the space uniformly into cubes of equal size, each cube corresponds to one air point, and the multi-frame fusion of these points is the air point cloud.
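The per-frame fusion step can be illustrated with a minimal sketch: each depth frame is back-projected through an assumed pinhole model and moved into the world frame by the pose the odometry supplies. The function name and intrinsics here are illustrative, not the patent's actual SLAM pipeline.

```python
import numpy as np

def depth_to_world(depth, fx, fy, cx, cy, pose):
    """Back-project a depth image (meters) into world-frame 3D points.

    `pose` is a 4x4 camera-to-world matrix, e.g. as supplied per frame by
    an odometry/SLAM system.  Returns an (N, 3) array of world points.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.ravel()
    valid = z > 0                      # skip missing depth readings
    x = (u.ravel() - cx) * z / fx      # pinhole back-projection
    y = (v.ravel() - cy) * z / fy
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)[valid]
    return (pose @ pts_cam.T).T[:, :3]

# Multi-frame fusion is then just concatenating the per-frame clouds:
# cloud = np.vstack([depth_to_world(d, fx, fy, cx, cy, T) for d, T in frames])
```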

Step S2, identifying the three-dimensional object point cloud to obtain furniture point cloud information, deriving a rotation correction matrix from the two-dimensional object projection of the three-dimensional object point cloud data, and performing rotation correction on the three-dimensional object point cloud, the three-dimensional air point cloud and the furniture point cloud with this matrix.

Specifically, step S2 further includes:

Step S200, processing the three-dimensional object point cloud data with sparse convolution: based on a U-Net network, the channel depth is increased by 32 after each downsampling convolution, and finally, after a fully connected layer and Softmax, the class with the maximum probability value is taken as the furniture class of each point, thereby obtaining the furniture point cloud data.

Fig. 4 is a flow chart of the three-dimensional point cloud furniture identification network in an embodiment of the present invention. As shown in fig. 4, the input of the network is the generated three-dimensional object point cloud data and the output is the corresponding furniture point cloud, with the RGB data of each point replaced by its predicted furniture category. The network first performs scale conversion on the input three-dimensional object point cloud data. Because point cloud data is sparse in space, the invention processes it with a sparse convolution that performs the convolution operation only at spatial locations where data exists; this not only improves computational efficiency but also ensures that the spatial extent of the data does not "expand" and blur after several convolutions (conv in fig. 4 therefore denotes a sparse convolution rather than an ordinary convolution). As shown in fig. 4, the method is based on a U-Net network whose backbone adopts a ResNet structure: the channel depth is increased by 32 after each downsampling convolution, upsampling is then performed symmetrically with the channel count gradually reduced by 32, and finally, after a fully connected layer and Softmax, the class with the maximum probability value is taken as the furniture classification of each point.
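The defining property of the sparse convolution described above, that outputs are produced only where input data exists so the active set never "expands", can be shown with a toy example. This is an illustrative sketch, not the patent's network; the dict-based kernel representation is an assumption made for brevity.

```python
import numpy as np

def sparse_conv2d(coords, feats, kernel):
    """Toy 'submanifold' sparse convolution on a 2D point set.

    Unlike a dense convolution, outputs are produced only at the input's
    occupied coordinates, so the set of active sites never grows as
    layers stack.  `coords` is (N, 2) int, `feats` is (N, C), and
    `kernel` maps (dy, dx) offsets to (C, C_out) weight matrices.
    """
    lut = {tuple(c): i for i, c in enumerate(map(tuple, coords))}
    c_out = next(iter(kernel.values())).shape[1]
    out = np.zeros((len(coords), c_out))
    for i, (y, x) in enumerate(map(tuple, coords)):
        for (dy, dx), w in kernel.items():
            j = lut.get((y + dy, x + dx))
            if j is not None:          # only gather where data exists
                out[i] += feats[j] @ w
    return out
```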

The relationship between the three point clouds is, briefly: the three-dimensional object point cloud is all scanned object data, the three-dimensional air point cloud is the part of the whole space not occupied by the object point cloud, and the furniture point cloud is a subset of the three-dimensional object point cloud.

Step S201, a rotation correction matrix is obtained from the two-dimensional object projection of the original three-dimensional object point cloud data, and rotation correction is performed on the three-dimensional object point cloud, the three-dimensional air point cloud and the furniture point cloud according to this matrix.

To facilitate the subsequent correction of the wall surface direction and stabilize the structure output, the invention rotates the building structure of the point cloud so that the main direction of the walls is parallel to the horizontal or vertical axis.

Specifically, as shown in fig. 5, in step S201, from the original object point cloud file, i.e. the three-dimensional object point cloud data obtained in step S1, a ground plane is extracted by RANSAC (random sample consensus) as the xy plane; the z axis is rotated to the direction perpendicular to the ground plane and the z coordinate of the lowest point is set to 0, yielding a transformation matrix T1. The three-dimensional object point cloud data is then projected onto a two-dimensional plane, line segments are detected in the projected image with a Hough transform, and the segment direction with the longest total length is taken as the main direction. The projection plane is rotated around the z axis by the angle of this main direction, i.e. until the segments corresponding to the walls are parallel to a coordinate axis (the x or y direction), and the minimum xy point is taken as the coordinate origin, yielding a transformation matrix T2. The final rotation correction matrix is then T = T2·T1.

Specifically, in this embodiment it is assumed that in the three-dimensional object point cloud the xy plane is parallel to the horizontal plane and the z axis is perpendicular to it, and the xy plane is then rotated about the z axis. For the z axis, a plane is first extracted by RANSAC; since RANSAC returns the plane with the largest number of inliers, and the floor of a storey certainly contains more points than any wall, this plane is the ground plane. Taking the normal vector of the ground plane as the z axis gives the first rotation matrix.
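The rotation that takes the estimated ground-plane normal to the z axis can be written in closed form with Rodrigues' formula. This is a minimal sketch of that one sub-step (the rotation part of T1), under the assumption that the normal has already been estimated; it is not the full RANSAC pipeline.

```python
import numpy as np

def align_normal_to_z(n):
    """Rotation matrix taking unit vector `n` (ground-plane normal) to +z.

    Uses Rodrigues' formula about the axis n x z.  This corresponds to
    the rotation part of the first transformation described above.
    """
    n = n / np.linalg.norm(n)
    z = np.array([0.0, 0.0, 1.0])
    v = np.cross(n, z)                         # rotation axis (unnormalized)
    c = float(n @ z)                           # cosine of rotation angle
    if np.linalg.norm(v) < 1e-9:               # already (anti-)aligned
        return np.eye(3) if c > 0 else np.diag([1.0, -1.0, -1.0])
    k = np.array([[0, -v[2], v[1]],
                  [v[2], 0, -v[0]],
                  [-v[1], v[0], 0]])
    return np.eye(3) + k + k @ k / (1.0 + c)
```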

For the rotation of the xy plane, the three-dimensional object point cloud data is projected onto the two-dimensional plane to obtain a projection image in which the single-channel gray value of a pixel represents the number of points at different heights at that plane coordinate, and line segments are detected in it with a Hough transform. Objects such as floors and furniture are filtered out before projection using the furniture identification result, so that only wall points remain; however, because of scanning error, noise and similar factors, the Hough result still contains many scattered short lines (filtering them by a minimum segment length would lose many short corner structures). For robustness, the invention therefore accumulates the number and length of segments per angle and takes the direction with the longest total length as the main segment direction. To preserve the original data as far as possible, the rotation angle is limited to within 45 degrees, and the main axis of the data (the main segment direction) is rotated to a coordinate axis direction (the x or y direction) as a fine adjustment. This yields the second transformation matrix, which is multiplied with the first to obtain the complete rotation correction matrix.
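The length-weighted voting over segment angles can be sketched as follows. The function is illustrative: it assumes segments have already been detected (e.g. by a Hough transform) and folds angles modulo 90 degrees so that perpendicular walls vote for the same grid orientation, keeping the correction within 45 degrees as described above.

```python
import numpy as np

def dominant_wall_angle(segments, bin_deg=1):
    """Dominant direction of a set of 2D segments, weighted by length.

    `segments` is (N, 4): x1, y1, x2, y2.  Angles are folded modulo
    90 deg so perpendicular walls reinforce each other; the returned
    correction angle lies in (-45, 45] degrees.
    """
    d = segments[:, 2:] - segments[:, :2]
    lengths = np.hypot(d[:, 0], d[:, 1])
    angles = np.degrees(np.arctan2(d[:, 1], d[:, 0])) % 90.0
    bins = (angles / bin_deg).astype(int)
    votes = np.bincount(bins, weights=lengths, minlength=int(90 / bin_deg))
    main = np.argmax(votes) * bin_deg
    return main - 90.0 if main > 45.0 else float(main)   # stay within 45 deg
```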

Finally, the rotation correction matrix thus obtained is used to perform rotation correction on the three-dimensional object point cloud, the three-dimensional air point cloud and the furniture point cloud respectively.

Step S3, combining the corrected three-dimensional object point cloud with the furniture information of each point to generate an anti-interference two-dimensional depression angle object projection and a normal vector projection, and generating, through three groups of neural networks, a two-dimensional room mask, the two-dimensional corner points of each room, and the associations between corner points.

Specifically, step S3 further includes:

Step S300, extracting points of interest from the corrected three-dimensional object point cloud using the furniture point cloud, and outputting an anti-noise two-dimensional depression angle object projection and a two-dimensional depression angle normal vector projection; the two-dimensional projection images used in subsequent prediction are these two projections.

Because the subsequent networks detect independent room information and the corresponding corner information from the two-dimensional contour, the three-dimensional point cloud data is projected onto the two-dimensional plane so that the single-channel gray value of a pixel represents the number of points at different heights at that plane coordinate, yielding a projected two-dimensional top view (after furniture identification and filtering). However, because of objects such as floors and furniture, and factors such as scanning error and noise, the invention applies a threshold (default 50) to the projected top view, filtering out non-wall points while increasing the gray value of wall points, thereby obtaining the anti-noise two-dimensional depression angle object projection.

Meanwhile, the average normal vector in the vertical direction at each plane point is computed during projection and output as another, three-channel projection image, i.e. the two-dimensional depression angle normal vector projection. The normal vectors greatly assist corner detection because wall normal vectors are parallel to the ground. In the invention, a corner point is a corner where two walls meet: the normal vectors of both walls are parallel to the ground but have different angles in the xy plane, and the place where the two sets of normal-vector lines meet is the corner.
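The count-and-threshold projection described above can be sketched in a few lines. The cell size and threshold values here are illustrative parameters, and the missing thresholding formula from the original text is approximated by the words of the surrounding description: cells with too few hits are zeroed, surviving wall cells are boosted to full intensity.

```python
import numpy as np

def topdown_projection(points, res=0.05, thresh=50):
    """Project a 3D cloud to a 2D top-view count image and suppress noise.

    Each pixel's gray value is the number of points (at various heights)
    falling in that xy cell; cells with fewer than `thresh` hits (floor
    remnants, furniture, scan noise) are zeroed and surviving wall cells
    are boosted to full intensity.
    """
    ij = np.floor(points[:, :2] / res).astype(int)
    ij -= ij.min(axis=0)
    img = np.zeros(tuple(ij.max(axis=0) + 1), dtype=np.int32)
    np.add.at(img, (ij[:, 0], ij[:, 1]), 1)      # accumulate point counts
    out = np.zeros_like(img, dtype=np.uint8)
    out[img >= thresh] = 255                     # keep and boost wall cells
    return out
```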

Step S301, computing a feature map of the input anti-noise two-dimensional depression angle object projection with a classical Mask-RCNN network, generating candidate bounding boxes of each room with a region proposal network (RPN), and performing semantic segmentation on the pixels in each bounding box to extract its room mask.

As shown in FIG. 6, room extraction is an instance segmentation task, and is performed with the classical Mask-RCNN network from that field. The Mask-RCNN network computes a feature map of the input (corrected) anti-noise two-dimensional depression angle object projection through a feature pyramid network (FPN) with ResNet101 as the backbone, then obtains region-of-interest (ROI) feature maps via a region proposal network (RPN), and, while generating each room's candidate bounding box (box regression) and classification score, performs semantic segmentation on the pixels in each bounding box to extract its room mask.

This is done before the corner computation so that the structural detection of the whole house type can be locally simplified to detection within a single room. Since most rooms are rectangular or close to a union of rectangles, detection can then be performed with high accuracy and low difficulty.

Step S302, according to the two-dimensional room mask extracted in step S301 and the anti-noise two-dimensional depression angle object projection and normal vector projection obtained in step S300, an encoder composed of dilated (hole) convolution layers extracts features from the images; the feature map is then passed through a decoder based on a ResNet structure and fed into two parallel convolutional branches, yielding the two-dimensional coordinates of all corner points together with probability confidence values for a corner extending a wall segment in each of several directions.

In the embodiment of the present invention, as shown in fig. 7, the inputs are a two-dimensional room mask, the (corrected) anti-noise two-dimensional depression angle object projection, and the (corrected) anti-noise two-dimensional depression angle normal vector projection; the output corner information comprises the two-dimensional coordinates of each corner point and probability confidence values for 36 directions (one every 10 degrees) in which the corner may extend a wall segment.

Specifically, the room mask computed in step S301 is expanded so that it completely covers the room's two-dimensional depression angle projection and normal vector projection. An encoder composed of dilated convolution layers (the conv-s1 and conv-s2 parts in the figure) extracts features from the image; the feature map then passes through a ResNet-based decoder module (the residual part in the figure) and is fed into two parallel convolutional branches. Dilated convolution is used here to obtain a large receptive field with few parameters, while different strides compress the image. The first branch predicts a set of confidence maps, each representing a different corner of the room; the second branch predicts the degree of association between corner points, i.e. which two corners are likely to be connected by a wall. Both branches constrain their outputs to probability values in 0-1 through a Sigmoid layer.
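The receptive-field property of dilated (hole) convolution mentioned above can be illustrated in one dimension: a layer with kernel size k and dilation d covers (k-1)·d+1 input samples with only k weights, so stacked layers with growing dilations see a wide context cheaply. The kernel and data below are illustrative, not the network's actual weights.

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """Valid-mode 1D dilated ('atrous'/hole) convolution.

    Each output taps len(w) inputs spaced `dilation` apart, so the
    receptive field is (len(w) - 1) * dilation + 1 while the parameter
    count stays at len(w).
    """
    k = len(w)
    span = (k - 1) * dilation + 1        # receptive field of this layer
    n = len(x) - span + 1
    return np.array([sum(w[j] * x[i + j * dilation] for j in range(k))
                     for i in range(n)])
```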

Step S303, combining the corner information obtained in step S302 with the room mask, the two-dimensional depression angle object projection and the two-dimensional depression angle normal vector projection, and fusing local room information with global floor information in a neural network, so as to predict in turn, for each room, the probability that each corner belongs to it.

In the embodiment of the present invention, as shown in fig. 8, the main body of the network's feature extraction and inference part (conv-s1 and conv-s2 in the figure) is likewise a series of dilated convolutions; the probability that each corner belongs to the room is then computed through a fully connected layer and a Sigmoid layer. Corners above a threshold (default 0.5) are considered to belong to the room, and if a corner's probability exceeds the threshold for multiple rooms, it is considered shared by them.

Step S304, sorting all rooms by the number of pixels their masks occupy, traversing the room masks to compute pairwise overlap rates, and, for rooms whose overlap rate exceeds a threshold (default 30%), merging the smaller room's mask and corner information into the larger room while deleting duplicate points.

Some rooms are not fully independent, for example a semi-open kitchen that communicates with the living room but is partially blocked; for such rooms the computed masks may overlap. The invention merges these cases into the same room: all rooms are sorted by the number of pixels their masks occupy, the pairwise overlap rates between room masks are computed by traversal, and for a pair with a high overlap rate (default 30%) the smaller room's mask and corner information are merged into the larger room while duplicate points are deleted.
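The merge rule above can be sketched with boolean masks; the 30% default is the document's, and the "overlap over the smaller room's area" criterion is an assumption made for concreteness.

```python
import numpy as np

def merge_overlapping_rooms(masks, overlap_thresh=0.3):
    """Merge overlapping room masks (e.g. semi-open kitchen vs. living room).

    Masks are boolean arrays, processed largest-first; a smaller mask
    whose overlap with an already-kept mask exceeds `overlap_thresh` of
    its own area is absorbed into the larger room.
    """
    masks = sorted(masks, key=lambda m: m.sum(), reverse=True)
    kept = []
    for m in masks:
        for big in kept:
            inter = (m & big).sum()
            if m.sum() > 0 and inter / m.sum() > overlap_thresh:
                big |= m                 # absorb the smaller room in place
                break
        else:
            kept.append(m.copy())
    return kept
```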

Step S4, generating local boundaries of each room and the global boundary of the floor from the corner points, and optimizing based on the boundary corner points, parallel distances and other factors.

Specifically, step S4 further includes:

Step S400, for the corner points associated and merged for each room, optimizing based on slope and generating the room's local boundary.

Since a single-room structure is generally simple and its corner points lie around the border, the invention computes the mask boundary of a single room with an eight-neighborhood boundary tracing algorithm, maps each corner point to its nearest point on the boundary, obtains the connection order of the corner points by traversing the boundary, and connects them in order to obtain the boundary.

The prediction of the neural network usually has some deviation, owing to prediction accuracy or the distribution of the training data; in addition, to avoid missing corners, the confidence threshold is set low, which yields redundant corner points. The corner prediction of each room therefore needs some optimization. As shown in fig. 9, all corner points shared by multiple rooms are locked and kept unchanged so that the global structure is not affected; for the local corners of a single room, the slope values k of consecutive edges and the corresponding included angles are computed (a small value is added to the denominator to avoid division by zero):

angle = arctan(k)

If the included angle between two consecutive edges is smaller than a threshold (default 30 degrees), they are merged into the same edge. In addition, since the point cloud data was corrected earlier, most walls can be assumed a priori to be parallel or perpendicular to the main direction; if the angle between a segment and the horizontal or vertical direction is smaller than a threshold (default 18 degrees), the segment is corrected to a horizontal or vertical edge.
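The axis-snapping step can be sketched for a single edge as follows. The 18-degree default is the document's; the choice of snapping to the edge's mean coordinate is an assumption made for illustration.

```python
import numpy as np

def snap_edge(p, q, axis_thresh_deg=18.0):
    """Snap an edge to horizontal/vertical when it is nearly axis-aligned.

    After rotation correction most walls are parallel or perpendicular to
    the axes, so edges within `axis_thresh_deg` of an axis are
    straightened (the horizontal-vertical prior described above).
    """
    p, q = np.asarray(p, float), np.asarray(q, float)
    dx, dy = q - p
    ang = np.degrees(np.arctan2(dy, dx)) % 180.0
    if min(ang, 180.0 - ang) < axis_thresh_deg:          # near horizontal
        mid_y = (p[1] + q[1]) / 2.0
        return (p[0], mid_y), (q[0], mid_y)
    if abs(ang - 90.0) < axis_thresh_deg:                # near vertical
        mid_x = (p[0] + q[0]) / 2.0
        return (mid_x, p[1]), (mid_x, q[1])
    return tuple(p), tuple(q)                            # leave oblique edges
```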

Step S401, merging the local boundaries into the global boundary, and closing and merging the boundaries based on the corner points between boundary lines and the distances between parallel boundaries.

As shown in fig. 10, all local edges are merged into one list and duplicate edges are removed. To compute intersection relationships efficiently, global edges are categorized as horizontal, vertical and oblique; for example, horizontal edges can only intersect vertical and oblique edges. Endpoints not shared with other lines that lie near a corner point are snapped to that corner, closing the wall structure. Meanwhile, since in most house-type structures many walls are aligned with one another, parallel and coincident edges are traversed and edges whose distance is below a threshold (default 3 pixels) are merged, keeping the structure simple.
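The merge of nearly coincident parallel edges can be sketched for the horizontal case; the (y, x_start, x_end) tuple representation and the overlap condition are assumptions made for brevity, and vertical edges would be handled symmetrically.

```python
def merge_parallel_edges(edges, dist_thresh=3.0):
    """Merge overlapping horizontal wall edges whose offsets differ by
    less than `dist_thresh` pixels.

    `edges` is a list of (y, x_start, x_end) tuples.  After sorting,
    an edge close to the previous kept edge and overlapping it in x is
    folded into it, extending the wall.
    """
    merged = []
    for y, a, b in sorted(edges):
        if merged and abs(y - merged[-1][0]) < dist_thresh and a <= merged[-1][2]:
            py, pa, pb = merged[-1]
            merged[-1] = (py, min(pa, a), max(pb, b))    # extend existing wall
        else:
            merged.append((y, a, b))
    return merged
```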

Step S5, integrating the furniture position information and the passage (door) position information to generate the final two-dimensional house type structure.

Specifically, step S5 further includes:

Step S500, fusing the optimized global boundary with the passage position information in the air point cloud.

The three-dimensional air point cloud data is processed, and positions that are entirely air within a certain height band (default 0.5 m to 2 m) are defined as passable areas. The generated wall structure data is then compared with the passable-area data: if, over a continuous length of a wall section (not below a threshold, default 0.5 m) and a continuous width (not below a threshold, default 0.5 m), the proportion of air points exceeds a threshold (default 0.5), the section is determined to be a passage structure in the wall.
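The passage test can be sketched as a scan along a wall: assuming a precomputed per-position air-occupancy ratio within the 0.5 m-2 m height band, runs of sufficiently high ratio and sufficient length are reported as openings. The sample spacing and thresholds here are illustrative.

```python
def find_passages(air_ratio, min_len=10, ratio_thresh=0.5):
    """Find door openings along a wall from air-point occupancy.

    `air_ratio[i]` is the fraction of air voxels in the passable height
    band at sample i along the wall; runs of at least `min_len` samples
    above `ratio_thresh` are reported as (start, end) passage spans.
    """
    passages, start = [], None
    for i, r in enumerate(list(air_ratio) + [0.0]):      # sentinel flushes tail
        if r > ratio_thresh and start is None:
            start = i                                    # opening begins
        elif r <= ratio_thresh and start is not None:
            if i - start >= min_len:                     # long enough for a door
                passages.append((start, i))
            start = None
    return passages
```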

Step S501, generating the final two-dimensional house type structure from the corrected three-dimensional furniture classification point cloud data and the boundary line segment information, with the passages and doors merged in step S500.

In the embodiment of the invention, the floor and walls are removed from the furniture identification result to obtain mutually disconnected clusters of furniture; the number of three-dimensional points belonging to each furniture class in every cluster is counted, the class with the most votes is taken as the cluster's furniture classification, and the two-dimensional minimum bounding box of the cluster is computed.

Fig. 11 is a system architecture diagram of a house type structure analysis apparatus according to the present invention. As shown in fig. 11, the present invention provides a house type structure analysis apparatus, including:

the point cloud data generating unit 1 is configured to scan a room to be structurally analyzed by using an RGBD camera and an odometer camera to obtain RGBD camera data and odometer camera data, and convert the RGBD camera data and the odometer camera data into three-dimensional object point cloud data and three-dimensional air point cloud data.

In a specific embodiment of the present invention, the RGBD camera data and the odometer camera data are converted into three-dimensional object point cloud data by an off-the-shelf SLAM algorithm: the RGBD camera supplies one frame of point cloud per capture, the odometer camera supplies the camera position and orientation, and the SLAM algorithm fuses the frames into a multi-frame point cloud, i.e. the three-dimensional object point cloud. The three-dimensional air point cloud data, i.e. the free space between the observed position and each obstacle, is computed by the Octomap algorithm: Octomap divides the space uniformly into cubes of equal size, each cube corresponds to one air point, and the multi-frame fusion of these points is the air point cloud.

The rotation correction unit 2 is configured to identify the three-dimensional object point cloud to obtain furniture point cloud information, derive a rotation correction matrix from the two-dimensional object projection of the three-dimensional object point cloud data, and perform rotation correction on the three-dimensional object point cloud, the three-dimensional air point cloud and the furniture point cloud according to this matrix.

Specifically, the rotation correcting unit 2 further includes:

The furniture point cloud information identification module 201 is configured to process the three-dimensional object point cloud data with sparse convolution: based on a U-Net network, the channel depth is increased by 32 after each downsampling convolution, and finally, after a fully connected layer and Softmax, the class with the maximum probability value is taken as the furniture class of each point, thereby obtaining the furniture point cloud data.

The projection and rotation correction module 202 is configured to derive a rotation correction matrix from the two-dimensional object projection of the original three-dimensional object point cloud data, and to perform rotation correction on the three-dimensional object point cloud, the three-dimensional air point cloud and the furniture point cloud according to it.

To facilitate the subsequent correction of the wall surface direction and stabilize the structure output, the invention rotates the point cloud building structure so that the main direction of the walls is parallel to the horizontal or vertical axis.

In the invention, from the three-dimensional object point cloud data, the projection and rotation correction module 202 extracts a ground plane by RANSAC (random sample consensus) as the xy plane, then rotates the z axis to the direction perpendicular to the ground plane and sets the z coordinate of the lowest point to 0, yielding a transformation matrix T1. The three-dimensional object point cloud data is then projected onto a two-dimensional plane, line segments are detected in the projected image with a Hough transform, and the segment direction with the longest total length is taken as the main direction. The projection plane is rotated around the z axis by the angle of this main direction, i.e. until the segments corresponding to the walls are parallel to a coordinate axis (the x or y direction), and the minimum xy point is taken as the coordinate origin, yielding a transformation matrix T2. The final rotation correction matrix is then T = T2·T1.

Finally, the rotation correction matrix thus obtained is used to perform rotation correction on the three-dimensional object point cloud, the three-dimensional air point cloud and the furniture point cloud respectively.

The room mask and corner point extraction association unit 3 is configured to combine the corrected three-dimensional object point cloud with the furniture information of each point to generate an anti-interference two-dimensional depression angle object projection and a normal vector projection, and to generate, through three groups of neural networks, a two-dimensional room mask, the two-dimensional corner points of each room, and the associations between corner points. The three neural networks are the room mask extraction network, the single-room corner extraction network, and the multi-room corner association network.

Specifically, the room mask and corner extraction associating unit 3 further includes:

the anti-noise processing module 301 is configured to extract points of interest through the furniture point cloud according to the corrected three-dimensional object point cloud, output two-dimensional depression object projection and two-dimensional depression normal vector projection of the anti-noise, and obtain two-dimensional projection images used in subsequent prediction.

Because the subsequent networks detect independent room information and the corresponding corner information from the two-dimensional contour, the three-dimensional point cloud data is projected onto the two-dimensional plane so that the single-channel gray value of a pixel represents the number of points at different heights at that plane coordinate, yielding a projected two-dimensional top view (after furniture identification and filtering). However, because of objects such as floors and furniture, and factors such as scanning error and noise, the invention applies a threshold (default 50) to the projected top view, filtering out non-wall points while increasing the gray value of wall points, thereby obtaining the anti-noise two-dimensional depression angle object projection.

Meanwhile, the average normal vector in the vertical direction at each plane point is computed during projection and output as another, three-channel projection image, i.e. the two-dimensional depression angle normal vector projection. The normal vectors greatly assist corner detection because wall normal vectors are parallel to the ground. In the invention, a corner point is a corner where two walls meet: the normal vectors of both walls are parallel to the ground but have different angles in the xy plane, and the place where the two sets of normal-vector lines meet is the corner.

The room mask extraction module 302 is configured to compute a feature map of the input anti-noise two-dimensional depression angle object projection with a classical Mask-RCNN network and, while generating candidate bounding boxes for each room via a region proposal network (RPN), to perform semantic segmentation on the pixels in each bounding box and extract its room mask.

This is done before the corner computation so that the structural detection of the whole house type can be locally simplified to detection within a single room. Since most rooms are rectangular or close to a union of rectangles, detection can then be performed with high accuracy and low difficulty.

The single room corner extraction module 303 is configured to extract features from the images, according to the two-dimensional room mask extracted by the room mask extraction module 302 and the anti-noise two-dimensional depression angle object projection and normal vector projection obtained by the anti-noise processing module 301, with an encoder composed of dilated (hole) convolution layers; the feature map then passes through a ResNet-based decoder and is fed into two parallel convolutional branches, yielding the two-dimensional coordinates of the corner points and probability confidence values for a corner extending a wall segment in each of several directions.

Specifically, the room mask computed by the room mask extraction module 302 is expanded so that it completely covers the room's two-dimensional depression angle projection and normal vector projection. An encoder composed of dilated convolution layers extracts features from the image; the feature map then passes through a ResNet-based decoder module and is fed into two parallel convolutional branches. Dilated convolution is used here to obtain a large receptive field with few parameters, while different strides compress the image. The first branch predicts a set of confidence maps, each representing a different corner of the room; the second branch predicts the degree of association between corner points, i.e. which two corners are likely to be connected by a wall. Both branches constrain their outputs to probability values in 0-1 through a Sigmoid layer.

The multi-room corner association module 304 is configured to combine the corner information obtained by the single-room corner extraction module 303 with the room mask, the two-dimensional overhead object projection, and the two-dimensional overhead normal vector projection, fusing local room information and global floor information in a neural network, so as to predict in sequence, for each corner, a probability value for each room.

The room information merging module 305 is configured to sort all the rooms by the number of pixels occupied by their masks, compute the overlap rate between room masks by traversal, and, for rooms whose overlap rate is high (30% by default), merge the mask and corner information of the smaller room into the larger room while deleting duplicate points.

Certain rooms may not be strictly independent. A semi-open kitchen, for example, communicates with the living room but is partially blocked off. For rooms like this, the computed masks may overlap. In the present invention such cases are merged into the same room: all the rooms are sorted by the number of pixels occupied by their masks, the overlap rate between room masks is computed by traversal, and for rooms with a high overlap rate (30% by default) the mask and corner information of the smaller room are merged into the larger room, with duplicate points deleted.
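The merging rule can be sketched as follows; the boolean-mask representation and the choice to measure the overlap rate relative to the smaller room are assumptions made for illustration.

```python
import numpy as np

def merge_overlapping_rooms(masks, corners, overlap_thresh=0.30):
    """Merge rooms whose mask overlap rate exceeds overlap_thresh (30% default).

    masks:   list of boolean (H, W) arrays, one per room.
    corners: list of corner-point lists, parallel to masks.
    Rooms are sorted by mask pixel count; a smaller room whose overlap
    with a larger one exceeds the threshold is absorbed into it, and
    duplicate corner points are dropped.
    """
    order = sorted(range(len(masks)), key=lambda i: masks[i].sum(), reverse=True)
    merged = []
    for i in order:
        absorbed = False
        for j, (big_mask, big_corners) in enumerate(merged):
            inter = np.logical_and(big_mask, masks[i]).sum()
            if inter / max(int(masks[i].sum()), 1) > overlap_thresh:
                merged[j] = (np.logical_or(big_mask, masks[i]),
                             big_corners + [p for p in corners[i]
                                            if p not in big_corners])
                absorbed = True
                break
        if not absorbed:
            merged.append((masks[i].copy(), list(corners[i])))
    return merged

# a small room overlapping a large one (e.g. a semi-open kitchen)
living = np.zeros((10, 10), bool); living[0:8, 0:8] = True
kitchen = np.zeros((10, 10), bool); kitchen[5:10, 5:10] = True
rooms = merge_overlapping_rooms([living, kitchen],
                                [[(0, 0)], [(9, 9), (0, 0)]])
print(len(rooms))   # 1 — the kitchen is absorbed into the living room
```

Note that the duplicate corner (0, 0) contributed by the kitchen is deleted during the merge, as the text requires.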

The boundary generating unit 4 is used for generating the local boundaries of the rooms and the global boundary of the floor according to the corner points, and for optimizing them based on factors such as boundary corner points and parallel distances.

Specifically, the boundary generating unit 4 further includes:

The local boundary generating module 401 is configured to generate, through slope-based optimization, the local boundary of each room from the corner points associated with and merged into that room.

Since a single-room structure is generally simple and its corner points lie along the border, the mask boundary of a single room is computed with an eight-neighborhood boundary tracing algorithm; each corner point is mapped to its nearest point on that boundary, the connection order of the corner points is obtained by traversing the boundary, and the corner points are connected in sequence to form the boundary.
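The step of mapping corner points onto the traced boundary and ordering them can be sketched as follows; the sketch assumes the eight-neighborhood boundary tracing has already emitted the boundary pixels in traversal order.

```python
import numpy as np

def order_corners_along_boundary(boundary, corners):
    """Order corner points by their position along a traced mask boundary.

    boundary: (N, 2) boundary pixels in the order an eight-neighborhood
              boundary-tracing algorithm would visit them.
    corners:  (M, 2) predicted corner points.
    Each corner snaps to its nearest boundary pixel; sorting by that
    pixel's traversal index yields the order in which to connect corners.
    """
    boundary = np.asarray(boundary, float)
    nearest = [int(np.linalg.norm(boundary - np.asarray(p, float), axis=1).argmin())
               for p in corners]
    return [corners[k] for k in np.argsort(nearest)]

# a 5x5 square mask boundary traced clockwise from the top-left pixel
square = ([(0, x) for x in range(5)] + [(y, 4) for y in range(1, 5)]
          + [(4, x) for x in range(3, -1, -1)] + [(y, 0) for y in range(3, 0, -1)])
print(order_corners_along_boundary(square, [(4, 4), (0, 0), (0, 4), (4, 0)]))
# [(0, 0), (0, 4), (4, 4), (4, 0)]
```

Connecting the returned corners in sequence (and closing back to the first) reproduces the room boundary.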

The prediction result of the neural network usually deviates somewhat, owing to limited prediction accuracy or to the distribution of the training data. In addition, to avoid missing corner points, the confidence threshold is set relatively low, which also yields redundant corner points. The corner predictions of each room therefore need some optimization. All corner points shared by multiple rooms are locked so that the global structure is not disturbed; for the local corner points of a single room, the slope k of each pair of consecutive edges and the corresponding angle are calculated (a small value is added to the denominator to avoid division by zero):

angle = arctan(k)

If the angle between two consecutive edges is smaller than a threshold (30 degrees by default), they are merged into one edge. In addition, since the point cloud data have already been rotation-corrected, most wall surfaces can be assumed a priori to be parallel or perpendicular to the main direction; if the angle between a line segment and the horizontal or vertical direction is smaller than a threshold (18 degrees by default), the segment is snapped to a horizontal or vertical edge.
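The merge-and-snap optimization can be sketched as follows, using the default 30-degree merge and 18-degree snap thresholds; the closed-polygon representation and the helper function itself are illustrative assumptions.

```python
import math

def simplify_edges(points, merge_deg=30.0, snap_deg=18.0, eps=1e-9):
    """Merge near-collinear consecutive edges, then snap near-axis edges.

    points: corner points of a closed polygon, in connection order.
    A corner is dropped when its two incident edges differ in direction
    by less than merge_deg; remaining edges within snap_deg of the
    horizontal or vertical direction are snapped onto the axis.
    """
    kept, n = [], len(points)
    for i in range(n):
        p0, p1, p2 = points[i - 1], points[i], points[(i + 1) % n]
        a1 = math.degrees(math.atan2(p1[1] - p0[1], p1[0] - p0[0] + eps))
        a2 = math.degrees(math.atan2(p2[1] - p1[1], p2[0] - p1[0] + eps))
        # keep the corner only if the edge directions genuinely differ
        if abs((a2 - a1 + 180) % 360 - 180) >= merge_deg:
            kept.append(list(p1))
    for i in range(len(kept)):                       # axis snapping
        p, q = kept[i], kept[(i + 1) % len(kept)]
        ang = math.degrees(math.atan2(q[1] - p[1], q[0] - p[0])) % 180
        if ang < snap_deg or ang > 180 - snap_deg:   # nearly horizontal
            q[1] = p[1]
        elif abs(ang - 90) < snap_deg:               # nearly vertical
            q[0] = p[0]
    return [tuple(p) for p in kept]

# a square with one spurious corner predicted on its bottom edge
poly = [(0, 0), (5, 0.2), (10, 0), (10, 10), (0, 10)]
print(simplify_edges(poly))   # [(0, 0), (10, 0), (10, 10), (0, 10)]
```

The epsilon added to the x-difference mirrors the text's "denominator plus a minimum value" guard against division by zero.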

The global boundary generating module 402 is configured to combine the local boundaries into a global boundary, and to close and merge boundaries based on the corner points shared between boundary lines and on the distances between parallel boundaries.

The house type structure generating unit 5 is used for integrating furniture position information and passage (door) position information to generate the final two-dimensional house type structure.

Specifically, the house-type structure generating unit 5 further includes:

The channel and gate extraction module 501 is configured to fuse the optimized global boundary with the passage position information contained in the air point cloud.

The three-dimensional air point cloud data are processed, and positions that contain only air within a certain height range (0.5 m to 2 m by default) are defined as passable areas. The generated wall structure data are then compared with the passable area data: if, over a section of wall of at least a threshold length (0.5 m by default) and a threshold width (0.5 m by default), the proportion of air points within a certain range exceeds a threshold (0.5 by default), that section is determined to be a passage in the wall.
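Along one rasterized wall segment, this test can be sketched as follows; the per-cell air ratio input and the assumed 0.1 m grid resolution (so 5 cells ≈ the 0.5 m length threshold) are illustrative.

```python
def find_passages(air_ratio, min_len=5, ratio_thresh=0.5):
    """Report passage runs along one wall segment.

    air_ratio:    per-cell fraction of air points between 0.5 m and 2 m
                  height near the wall (values in [0, 1]).
    min_len:      minimum run length in cells; 5 cells at an assumed
                  0.1 m resolution matches the 0.5 m default threshold.
    ratio_thresh: minimum air-point proportion (0.5 by default).
    Returns (start, end) cell-index pairs for each detected passage.
    """
    passages, run_start = [], None
    for i, r in enumerate(air_ratio):
        if r > ratio_thresh:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_len:
                passages.append((run_start, i - 1))
            run_start = None
    if run_start is not None and len(air_ratio) - run_start >= min_len:
        passages.append((run_start, len(air_ratio) - 1))
    return passages

wall = [0.1] * 4 + [0.9] * 6 + [0.1] * 4      # one doorway in the middle
print(find_passages(wall))   # [(4, 9)]
```

The detected intervals are then marked as passage structures on the corresponding wall line segments.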

The final house type structure generating module 502 is configured to generate the final two-dimensional house type structure from the corrected three-dimensional furniture classification point cloud data and the boundary line segment information of passages and doors merged by the channel and gate extraction module 501.

In the embodiment of the invention, the floor and wall points are removed from the results of the furniture identification network to obtain mutually disconnected furniture clusters; the number of three-dimensional points belonging to each furniture class within a cluster is counted, the class with the most votes is taken as the furniture classification of that cluster, and the two-dimensional minimum bounding box is computed.
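The voting and bounding-box step can be sketched as follows; for simplicity the sketch computes an axis-aligned box, whereas the "minimum bounding box" in the text may be an oriented one.

```python
import numpy as np

def classify_cluster(point_labels):
    """Majority vote over the per-point furniture labels of one cluster."""
    labels, counts = np.unique(point_labels, return_counts=True)
    return labels[counts.argmax()]

def bounding_box_2d(points_xyz):
    """Axis-aligned 2-D bounding box of a cluster's XY footprint."""
    xy = np.asarray(points_xyz, float)[:, :2]
    return xy.min(axis=0), xy.max(axis=0)

cluster_points = [[0.0, 0.0, 1.0], [2.0, 3.0, 1.0], [1.0, 1.0, 0.5]]
cluster_labels = ["chair", "chair", "table"]
print(classify_cluster(cluster_labels))       # chair
lo, hi = bounding_box_2d(cluster_points)
print(lo.tolist(), hi.tolist())               # [0.0, 0.0] [2.0, 3.0]
```

The winning label and the box footprint are what the final house type structure renders for each piece of furniture.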

Compared with the prior art, the invention has the following advantages:

1. The invention uses a point-cloud-based semantic segmentation network to identify twenty kinds of common furniture, such as tables, chairs, beds, bookshelves and sofas, and reflects the identified furniture in the final house type detection result.

2. The invention exploits the information available during three-dimensional data acquisition: the space between the data collector and the data points is defined as air points, the space occupancy and connectivity of the air points near wall positions in the house type recognition result are analyzed, and passage positions are found and marked in the resulting wall line segments.

3. In the prior art, an energy function based on strong priors is used to optimize the corner points and line segments of each room; to keep the result as concise as possible, a large number of detail features are lost, and the computational cost of the heuristic optimization method rises sharply as complexity increases.

4. In the prior art, the projection of three-dimensional data onto two dimensions lacks preprocessing, and when the original three-dimensional data contain interference such as furniture, residents and temporarily placed objects, direct projection corrupts the input data. The invention combines a point-cloud-based semantic segmentation network, projects only the identified wall parts onto the two-dimensional plane as input, and sets a threshold to filter out possible noise and prediction errors, so that the input data remain concise and correct even when objects are present in the room.

5. The invention uses two-dimensional straight-line detection to keep the main wall direction of the input data parallel to the coordinate axes. This eliminates the influence of angular deviation introduced during data modeling, makes the output result more robust, and makes the horizontal-vertical prior assumption used to correct wall directions more reliable.

The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Modifications and variations can be made to the above-described embodiments by those skilled in the art without departing from the spirit and scope of the present invention. Therefore, the scope of the invention should be determined from the following claims.
