Pose estimation method, server and network device

Document No.: 1326796 | Publication date: 2020-07-14

Note: This technology, "Pose estimation method, server and network device" (一种姿态估计方法、服务器和网络设备), was created by 周佳俊 and is dated 2019-01-07. Its main content is summarized below.

The invention provides a pose estimation method, a server, and a network device, relating to the field of communication technology. The pose estimation method comprises: acquiring at least one piece of video information sent by a network device through a mobile edge computing (MEC) server of a base station; performing video decoding according to the video information to obtain decoded video content; calculating a pose vector of the network device according to the video content and device information of the network device; and sending the pose vector to the network device. By acquiring at least one piece of video information sent by a network device through the MEC server of a base station, decoding it to obtain the decoded video content, calculating the pose vector of the network device according to the video content and the device information of the network device, and sending the pose vector to the network device, the embodiments of the invention can ensure low latency, high bandwidth, and strong processing capability.

1. A pose estimation method, applied to a service processing server, characterized by comprising the following steps:

acquiring at least one piece of video information sent by a network device through a mobile edge computing (MEC) server of a base station;

performing video decoding according to the video information to obtain decoded video content;

calculating a pose vector of the network device according to the video content and device information of the network device;

sending the pose vector to the network device.

2. The pose estimation method of claim 1, wherein the performing video decoding according to the video information comprises:

when the network device is a device comprising a single camera, decoding the video information from the single camera;

when the network device is a device comprising at least two cameras, decoding the video information from each camera separately.

3. The pose estimation method of claim 1, wherein when the network device is a device comprising a single camera, calculating the pose vector of the network device according to the video content and device information of the network device comprises:

acquiring the decoded video content of the video information and a plurality of single-frame images of the video content;

acquiring image features from the single-frame images;

estimating a pose estimation vector of the network device according to the image features;

acquiring the pose vector of the network device according to the pose estimation vector and inertial measurement unit (IMU) information in the device information;

wherein the image features comprise features from accelerated segment test (FAST) and scale-invariant feature transform (SIFT) features.

4. The pose estimation method of claim 1, wherein when the network device is a device comprising at least two cameras, calculating the pose vector of the network device according to the video content and device information of the network device comprises:

acquiring the video content obtained after decoding the video information of each camera;

acquiring a plurality of video frames at the same moment after the video contents are synchronized;

acquiring, according to the video frames at the same moment, the three-dimensional coordinates of a same object captured by at least two cameras and the two-dimensional coordinates of the object in each image, wherein the at least two cameras are cameras whose video content captured at the same moment has partially overlapping video frames;

and calculating a pose vector corresponding to each camera according to the three-dimensional coordinates, the two-dimensional coordinates, and physical parameter information in the device information.

5. The pose estimation method of claim 4, wherein after the pose vector of the network device is calculated according to the video content and the device information of the network device, the method further comprises:

calculating a relative pose vector between at least two cameras according to the relative pose relationship between the at least two cameras;

processing the video contents according to the relative pose vector to obtain processed video content;

sending the processed video content to a cloud server;

wherein the relative pose relationship is:

R = R_B · R_A^T, T = T_B − R · T_A

wherein R is the rotation matrix in the relative pose vector;

T is the translation vector in the relative pose vector;

R_A is the rotation matrix of one of the cameras;

T_A is the translation vector of that camera;

R_B is the rotation matrix of the other camera;

T_B is the translation vector of the other camera.

6. A pose estimation method, applied to a network device, characterized by comprising the following steps:

acquiring at least one piece of video information captured by the network device;

sending the video information to a service processing server through a mobile edge computing (MEC) server of a base station;

receiving a pose vector sent by the service processing server, wherein the pose vector is calculated by the service processing server according to the video content decoded from the video information and the device information of the network device.

7. A server, the server being a service processing server comprising a processor and a transceiver, wherein the processor is configured to:

acquire at least one piece of video information sent by a network device through a mobile edge computing (MEC) server of a base station;

perform video decoding according to the video information to obtain decoded video content;

calculate a pose vector of the network device according to the video content and device information of the network device;

send the pose vector to the network device.

8. The server of claim 7, wherein the processor is specifically configured to:

when the network device is a device comprising a single camera, decode the video information from the single camera;

when the network device is a device comprising at least two cameras, decode the video information from each camera separately.

9. The server of claim 7, wherein when the network device is a device comprising a single camera, the processor is specifically configured to:

acquire the decoded video content of the video information and a plurality of single-frame images of the video content;

acquire image features from the single-frame images;

estimate a pose estimation vector of the network device according to the image features;

acquire the pose vector of the network device according to the pose estimation vector and inertial measurement unit (IMU) information in the device information;

wherein the image features comprise features from accelerated segment test (FAST) and scale-invariant feature transform (SIFT) features.

10. The server of claim 7, wherein when the network device is a device comprising at least two cameras, the processor is specifically configured to:

acquire the video content obtained after decoding the video information of each camera;

acquire a plurality of video frames at the same moment after the video contents are synchronized;

acquire, according to the video frames at the same moment, the three-dimensional coordinates of a same object captured by at least two cameras and the two-dimensional coordinates of the object in each image, wherein the at least two cameras are cameras whose video content captured at the same moment has partially overlapping video frames;

and calculate a pose vector corresponding to each camera according to the three-dimensional coordinates, the two-dimensional coordinates, and physical parameter information in the device information.

11. The server of claim 10, wherein the processor is further configured to:

calculate a relative pose vector between at least two cameras according to the relative pose relationship between the at least two cameras;

process the video contents according to the relative pose vector to obtain processed video content;

send the processed video content to a cloud server;

wherein the relative pose relationship is:

R = R_B · R_A^T, T = T_B − R · T_A

wherein R is the rotation matrix in the relative pose vector;

T is the translation vector in the relative pose vector;

R_A is the rotation matrix of one of the cameras;

T_A is the translation vector of that camera;

R_B is the rotation matrix of the other camera;

T_B is the translation vector of the other camera.

12. A network device comprising a processor and a transceiver, wherein the transceiver is configured to:

acquire at least one piece of video information captured by the network device;

send the video information to a service processing server through a mobile edge computing (MEC) server of a base station;

receive a pose vector sent by the service processing server, wherein the pose vector is calculated by the service processing server according to the video content decoded from the video information and the device information of the network device.

13. A server comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the pose estimation method according to any one of claims 1 to 5.

14. A network device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the pose estimation method of claim 6.

15. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, carries out the steps of the pose estimation method according to any one of claims 1 to 5 or claim 6.

Technical Field

The present invention relates to the field of communication technology, and in particular to a pose estimation method, a server, and a network device.

Background

In the fields of AR and robotics, accurate and timely positioning and SLAM (simultaneous localization and mapping) can be achieved by capturing images with a single camera or multiple cameras and inversely computing the real-time position of the device itself. This is a fundamental capability that almost all AR devices and robotic devices must have.

In current research and applications, such computer-vision-based camera pose estimation methods fall roughly into two categories according to their processing-latency requirements. The first category is applications with real-time requirements and a small motion range over a short time, so that the variation between two consecutive frames is small: AR mobile phones, AR glasses, robots, autonomous vehicles, and other devices that move continuously and freely in space. These are extremely sensitive to pose estimation latency and must obtain their accurate position in real time. The second category is applications that are completely insensitive to latency and allow captured pictures or video to be stored and later transferred to a high-performance machine or cloud platform to compute the pose, such as the initial pose calibration of a VR panoramic camera. The pictures and video content provided by such applications often differ greatly, so the computation load is large and the demand on computing capability is high.

The problem of local pose estimation in the first category of applications is very evident: current devices cannot provide long-duration, high-reliability service. Constrained by mobility, the processors and other chips of devices such as mobile phones, AR glasses, and robots offer relatively low performance, so when the pose of every frame is computed in real time, frame-level desynchronization often occurs: the pose parameters actually obtained correspond to the device's position at an earlier moment, which introduces errors into upper-layer applications. In robot path planning and autonomous driving, such errors translate into physical position deviations that may have serious consequences. In addition, mobile devices are generally battery powered; burdened with such computation, current applications such as mobile-phone AR and AR-glasses applications can overheat the body and sharply increase power consumption within minutes, so long-term normal use cannot really be supported. In the second category of applications, a memory card must be attached to each lens of the VR panoramic camera to store the captured video; after shooting, the videos are imported together into processing software on a PC or server for pose estimation. This greatly reduces pose estimation efficiency, and because a PC or server must be used alongside the camera, front-end processing cost rises and hardware utilization indirectly falls. Furthermore, the cameras that may be involved in camera pose estimation vary widely in output video format, bitrate, and resolution, so both local and centralized processing suffer low efficiency from the large number of adaptation targets.

Therefore, a pose estimation method, a server, and a network device are needed that can ensure frame-level synchronization during pose computation, achieve low latency, and improve pose estimation efficiency.

Disclosure of Invention

The embodiments of the present invention provide a pose estimation method, a server, and a network device, to solve the problems of frame-level desynchronization when the pose of every frame is computed in real time, and of the low efficiency caused by the many adaptation targets in local or centralized processing.

In order to solve the above technical problem, an embodiment of the present invention provides a pose estimation method, applied to a service processing server, comprising:

acquiring at least one piece of video information sent by a network device through a mobile edge computing (MEC) server of a base station;

performing video decoding according to the video information to obtain decoded video content;

calculating a pose vector of the network device according to the video content and device information of the network device;

sending the pose vector to the network device.

Preferably, the performing video decoding according to the video information includes:

when the network device is a device comprising a single camera, decoding the video information from the single camera;

when the network device is a device comprising at least two cameras, decoding the video information from each camera separately.

Preferably, when the network device is a device comprising a single camera, the calculating a pose vector of the network device according to the video content and device information of the network device includes:

acquiring the decoded video content of the video information and a plurality of single-frame images of the video content;

acquiring image features from the single-frame images;

estimating a pose estimation vector of the network device according to the image features;

acquiring the pose vector of the network device according to the pose estimation vector and inertial measurement unit (IMU) information in the device information;

wherein the image features comprise features from accelerated segment test (FAST) and scale-invariant feature transform (SIFT) features.

Preferably, when the network device is a device comprising at least two cameras, the calculating a pose vector of the network device according to the video content and device information of the network device includes:

acquiring the video content obtained after decoding the video information of each camera;

acquiring a plurality of video frames at the same moment after the video contents are synchronized;

acquiring, according to the video frames at the same moment, the three-dimensional coordinates of a same object captured by at least two cameras and the two-dimensional coordinates of the object in each image, wherein the at least two cameras are cameras whose video content captured at the same moment has partially overlapping video frames;

and calculating a pose vector corresponding to each camera according to the three-dimensional coordinates, the two-dimensional coordinates, and physical parameter information in the device information.

Preferably, after the pose vector of the network device is calculated according to the video content and the device information of the network device, the method further includes:

calculating a relative pose vector between at least two cameras according to the relative pose relationship between the at least two cameras;

processing the video contents according to the relative pose vector to obtain processed video content;

sending the processed video content to a cloud server;

wherein the relative pose relationship is:

R = R_B · R_A^T, T = T_B − R · T_A

wherein R is the rotation matrix in the relative pose vector;

T is the translation vector in the relative pose vector;

R_A is the rotation matrix of one of the cameras;

T_A is the translation vector of that camera;

R_B is the rotation matrix of the other camera;

T_B is the translation vector of the other camera.

An embodiment of the present invention further provides a pose estimation method, applied to a network device, comprising the following steps:

acquiring at least one piece of video information captured by the network device;

sending the video information to a service processing server through a mobile edge computing (MEC) server of a base station;

receiving a pose vector sent by the service processing server, wherein the pose vector is calculated by the service processing server according to the video content decoded from the video information and the device information of the network device.

An embodiment of the present invention further provides a server, the server being a service processing server comprising a processor and a transceiver, wherein the processor is configured to:

acquire at least one piece of video information sent by a network device through a mobile edge computing (MEC) server of a base station;

perform video decoding according to the video information to obtain decoded video content;

calculate a pose vector of the network device according to the video content and device information of the network device;

send the pose vector to the network device.

Preferably, the processor is specifically configured to:

when the network device is a device comprising a single camera, decode the video information from the single camera;

when the network device is a device comprising at least two cameras, decode the video information from each camera separately.

Preferably, when the network device is a device comprising a single camera, the processor is specifically configured to:

acquire the decoded video content of the video information and a plurality of single-frame images of the video content;

acquire image features from the single-frame images;

estimate a pose estimation vector of the network device according to the image features;

acquire the pose vector of the network device according to the pose estimation vector and inertial measurement unit (IMU) information in the device information;

wherein the image features comprise features from accelerated segment test (FAST) and scale-invariant feature transform (SIFT) features.

Preferably, when the network device is a device comprising at least two cameras, the processor is specifically configured to:

acquire the video content obtained after decoding the video information of each camera;

acquire a plurality of video frames at the same moment after the video contents are synchronized;

acquire, according to the video frames at the same moment, the three-dimensional coordinates of a same object captured by at least two cameras and the two-dimensional coordinates of the object in each image, wherein the at least two cameras are cameras whose video content captured at the same moment has partially overlapping video frames;

and calculate a pose vector corresponding to each camera according to the three-dimensional coordinates, the two-dimensional coordinates, and physical parameter information in the device information.

Preferably, the processor is further configured to:

calculate a relative pose vector between at least two cameras according to the relative pose relationship between the at least two cameras;

process the video contents according to the relative pose vector to obtain processed video content;

send the processed video content to a cloud server;

wherein the relative pose relationship is:

R = R_B · R_A^T, T = T_B − R · T_A

wherein R is the rotation matrix in the relative pose vector;

T is the translation vector in the relative pose vector;

R_A is the rotation matrix of one of the cameras;

T_A is the translation vector of that camera;

R_B is the rotation matrix of the other camera;

T_B is the translation vector of the other camera.

An embodiment of the present invention further provides a network device, comprising a processor and a transceiver, wherein the transceiver is configured to:

acquire at least one piece of video information captured by the network device;

send the video information to a service processing server through a mobile edge computing (MEC) server of a base station;

receive a pose vector sent by the service processing server, wherein the pose vector is calculated by the service processing server according to the video content decoded from the video information and the device information of the network device.

An embodiment of the present invention further provides a server, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the pose estimation method described above is implemented.

An embodiment of the present invention further provides a network device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the pose estimation method described above is implemented.

Embodiments of the present invention further provide a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the pose estimation method described above.

Compared with the prior art, the pose estimation method, server, and network device provided by the embodiments of the present invention have at least the following beneficial effects:

by acquiring at least one piece of video information sent by a network device through the mobile edge computing (MEC) server of a base station, performing video decoding according to the video information to obtain decoded video content, calculating the pose vector of the network device according to the video content and the device information of the network device, and sending the pose vector to the network device, low latency, high bandwidth, and strong processing capability can be ensured.

Drawings

Fig. 1 is a flowchart of a pose estimation method according to an embodiment of the present invention;

Fig. 2 is a flowchart of another pose estimation method according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of camera imaging according to an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of a server according to an embodiment of the present invention;

Fig. 5 is a schematic structural diagram of a network device according to an embodiment of the present invention;

Fig. 6 is another schematic structural diagram of a server according to an embodiment of the present invention;

Fig. 7 is another schematic structural diagram of a network device according to an embodiment of the present invention;

Fig. 8 is a specific flowchart of the pose estimation method according to an embodiment of the present invention;

Fig. 9 is another specific flowchart of the pose estimation method according to an embodiment of the present invention.

Detailed Description

In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments. In the following description, specific details such as specific configurations and components are provided only to help the full understanding of the embodiments of the present invention. Thus, it will be apparent to those skilled in the art that various changes and modifications may be made to the embodiments described herein without departing from the scope and spirit of the invention. In addition, descriptions of well-known functions and constructions are omitted for clarity and conciseness.

It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

In various embodiments of the present invention, it should be understood that the sequence numbers of the following processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.

In addition, the terms "system" and "network" are often used interchangeably herein.

In the embodiments provided herein, it should be understood that "B corresponding to A" means that B is associated with A, from which B can be determined. It should also be understood that determining B from A does not mean determining B from A alone; B may also be determined from A and/or other information.

As shown in Fig. 1, an embodiment of the present invention provides a pose estimation method, applied to a service processing server, which specifically includes the following steps:

Step S11: acquiring at least one piece of video information sent by the network device through the mobile edge computing (MEC) server of a base station.

As shown in Fig. 8, the network device captures the video information and sends it over an uplink channel of the 5G network to a 5G base station; the 5G base station sends the video information over its physical cable connection to the 5G UPF (User Plane Function), the 5G UPF sends it over its physical cable connection to the MEC (mobile edge computing) server, and the MEC server sends it to the service processing server. The air-interface delay between the network device and the 5G base station (which may be about 1 ms) plus the physical-connection delay (generally optical fiber) from the 5G base station to the service processing server may be about 5 ms to 10 ms or less. Moreover, the MEC server provides low latency, high bandwidth, and strong processing capability.

Step S12: performing video decoding according to the video information to obtain decoded video content.

After receiving the at least one piece of video information, the service processing server decodes the video information to recover the compressed video, i.e. the decoded video content.
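As a minimal illustration of this decoding step, the sketch below uses OpenCV in Python to turn one piece of received video information into individual frames. The handle `stream_source` and the choice of OpenCV are assumptions for illustration only; the patent does not fix a codec, container, or decoding library.

```python
import cv2

def decode_video(stream_source):
    """Decode one piece of received video information into frames.

    stream_source is a hypothetical handle (a file path or stream URL
    handed over by the MEC server); the transport and codec are not
    specified by the patent.
    """
    capture = cv2.VideoCapture(stream_source)
    frames = []
    while True:
        ok, frame = capture.read()  # decode the next compressed frame
        if not ok:                  # end of stream
            break
        frames.append(frame)
    capture.release()
    return frames  # the "decoded video content"
```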

Step S13: calculating a pose vector of the network device according to the video content and the device information of the network device.

The device information includes IMU (inertial measurement unit) information and physical parameter information: the physical parameter information may include the camera's focal length and distortion parameters, and the IMU information may include accelerometer and gyroscope values.
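One way to bundle this device information is sketched below; the field names are hypothetical and simply mirror the symbols used later in this description (f_x, f_y, u_0, v_0, Acc, Gyro).

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class DeviceInfo:
    """Device information reported alongside the video (illustrative)."""
    # IMU information
    acc: Tuple[float, float, float]   # accelerometer vector (Acc_x, Acc_y, Acc_z)
    gyro: Tuple[float, float, float]  # gyroscope vector (Gyro_x, Gyro_y, Gyro_z)
    # physical parameter information (per camera)
    fx: float                         # lateral focal length f_x
    fy: float                         # longitudinal focal length f_y
    u0: float                         # principal-point parameter u_0
    v0: float                         # principal-point parameter v_0
    dist: Tuple[float, ...] = ()      # lens distortion coefficients, if any
```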

Step S14: sending the pose vector to the network device.

The step S12 specifically includes:

when the network device is a device comprising a single camera, decoding the video information from the single camera; when the network device is a device comprising at least two cameras, decoding the video information from each camera separately.

After the receiving-end management module of the service processing server receives the video information and the device information sent by a network device, it determines whether the network device is a single-camera device or a multi-camera device.

In the foregoing embodiment of the present invention, when the network device is a device comprising a single camera, the step S13 specifically includes:

acquiring the decoded video content of the video information and a plurality of single-frame images of the video content; acquiring image features from the single-frame images; estimating a pose estimation vector of the network device according to the image features; and acquiring the pose vector of the network device according to the pose estimation vector and inertial measurement unit (IMU) information in the device information.

In the above embodiments of the present invention, the image features include FAST (features from accelerated segment test) and SIFT (scale-invariant feature transform) features.

Once the FAST and SIFT image features have been extracted, the image features of two frames (which may be adjacent frames) are passed to a pose estimation module in the service processing server. From the displacement of the same image features between the two frames, the pose estimation module computes a rough camera-motion matrix, consisting of rotation parameters r_1~r_9 and translation parameters t_1~t_3, i.e. the pose estimation vector. The pose estimation vector and the IMU information in the device information are then input to an EKF (extended Kalman filter) in the service processing server, which performs extended-Kalman prediction and update on the pose estimation vector, the accelerometer vector, and the gyroscope vector to obtain the pose vector of the single camera, i.e. the pose vector of the network device. The IMU information includes an accelerometer vector Acc = (Acc_x, Acc_y, Acc_z) and a gyroscope vector Gyro = (Gyro_x, Gyro_y, Gyro_z).
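The sketch below illustrates this rough two-frame motion estimate with OpenCV. SIFT stands in for the FAST/SIFT features named above, and the epipolar-geometry route (essential matrix followed by pose recovery) is an assumed concrete realization, since the patent does not name a specific solver; the EKF fusion is indicated only by a closing comment.

```python
import cv2
import numpy as np

def rough_pose_from_two_frames(img1, img2, K):
    """Estimate the rotation r_1~r_9 and translation t_1~t_3 (up to
    scale) between two frames from matched SIFT features.

    K is the 3x3 intrinsic matrix built from f_x, f_y, u_0 and v_0.
    Requires OpenCV >= 4.4 for SIFT_create().
    """
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    matches = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True).match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    # (R, t) is the pose estimation vector; it would next be fused with
    # the accelerometer vector Acc and gyroscope vector Gyro in an EKF
    # (prediction and update) to obtain the final pose vector.
    return R, t
```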

In the above embodiment of the present invention, when the network device is a device comprising at least two cameras, the step S13 specifically includes:

acquiring the video content obtained after decoding the video information of each camera; acquiring a plurality of video frames at the same moment after the video contents are synchronized; acquiring, according to the video frames at the same moment, the three-dimensional coordinates of a same object captured by at least two cameras and the two-dimensional coordinates of the object in each image, wherein the at least two cameras are cameras whose video content captured at the same moment has partially overlapping video frames; and calculating a pose vector corresponding to each camera according to the three-dimensional coordinates, the two-dimensional coordinates, and physical parameter information in the device information.

After the decoded video content is obtained, the video contents are synchronized to ensure that the video frames captured by the cameras at the same moment are available, and the physical parameter information of each camera is read, including the camera's lateral focal length f_x, longitudinal focal length f_y, and the principal-point parameters u_0 and v_0. A stereo calibration algorithm can then be employed to calculate the pose vector of each camera.

The stereo calibration algorithm is described below through an embodiment:

The physical parameters f_Ax, f_Ay, u_A0, and v_A0 of camera A are known, as are the physical parameters f_Bx, f_By, u_B0, and v_B0 of camera B, and the video content captured by the two cameras at the same moment partially overlaps. The two cameras capture the same calibration board; a 7×9 checkerboard may be used, or any other board that meets the requirements. As shown in Fig. 3, P is a point (or an object) on the calibration board, and P_A and P_B are the images of P in the focal planes of cameras A and B, respectively; P may be any point on the calibration board. The three-dimensional coordinates (X, Y, Z) of a number of corner points P are obtained from the calibration board, and the two-dimensional coordinates (x_A, y_A) and (x_B, y_B) of the image of P are obtained from camera A and camera B, respectively. From the imaging formula, the rotation matrix R_A = (r_A1 ~ r_A9) and the translation vector T_A = (t_A1 ~ t_A3) of camera A relative to the origin of the coordinate system, i.e. the pose vector of camera A, can be calculated, as can the rotation matrix R_B = (r_B1 ~ r_B9) and the translation vector T_B = (t_B1 ~ t_B3) of camera B, i.e. the pose vector of camera B. From a camera's pose vector, the camera's displacement and rotation in three-dimensional space relative to its original position at any moment can be known.

The imaging formula is:

s · [x, y, 1]^T = [f_x, 0, u_0; 0, f_y, v_0; 0, 0, 1] · [r_1, r_2, r_3, t_1; r_4, r_5, r_6, t_2; r_7, r_8, r_9, t_3] · [X, Y, Z, 1]^T

wherein x and y are the two-dimensional coordinates of an object in the plane of a photograph or video frame captured by the camera;

X, Y, and Z are the three-dimensional coordinates of the corresponding object in real physical space;

s is a scale factor;

f_x is the lateral focal length of the camera;

f_y is the longitudinal focal length of the camera;

u_0 and v_0 are the principal-point parameters of the camera;

r_1 ~ r_9 form the rotation matrix;

t_1 ~ t_3 form the translation vector.
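A sketch of this per-camera pose recovery follows, using OpenCV's checkerboard detection and PnP solver as an assumed concrete realization of the imaging formula; the 7×9 board size and 25 mm square size are illustrative assumptions.

```python
import cv2
import numpy as np

def pose_from_calibration_board(image, K, dist, board=(7, 9), square=0.025):
    """Recover one camera's pose vector (rotation r_1~r_9, translation
    t_1~t_3) from a view of the shared calibration board.

    K is the intrinsic matrix from f_x, f_y, u_0 and v_0; dist holds
    the lens distortion coefficients; board and square (in metres)
    describe the checkerboard and are assumptions.
    """
    # three-dimensional corner coordinates (X, Y, Z) on the board, Z = 0
    obj = np.zeros((board[0] * board[1], 3), np.float32)
    obj[:, :2] = np.mgrid[0:board[0], 0:board[1]].T.reshape(-1, 2) * square
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, board)
    if not found:
        raise ValueError("calibration board not visible")
    # solve the imaging formula for this camera's extrinsics
    _, rvec, tvec = cv2.solvePnP(obj, corners, K, dist)
    R, _ = cv2.Rodrigues(rvec)  # rotation matrix (r_1~r_9)
    return R, tvec              # pose vector of this camera
```

Applying this to synchronized frames from cameras A and B yields the pose vectors (R_A, T_A) and (R_B, T_B) used in the relative pose relationship below.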

When the network device is a device comprising at least two cameras, the step S13 is followed by:

calculating a relative pose vector between at least two cameras according to the relative pose relationship between the at least two cameras; processing the video contents according to the relative pose vector to obtain processed video content; and sending the processed video content to a cloud server.

In the above embodiments of the present invention, the relative pose relationship is:

R = R_B · R_A^T, T = T_B − R · T_A

wherein R is the rotation matrix in the relative pose vector;

T is the translation vector in the relative pose vector;

R_A is the rotation matrix of one camera, namely camera A;

T_A is the translation vector of camera A;

R_B is the rotation matrix of the other camera, namely camera B;

T_B is the translation vector of camera B.
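The relationship above reduces to two matrix operations. A minimal NumPy sketch, assuming R_A and R_B are 3×3 rotation matrices and T_A and T_B are 3×1 translation vectors as recovered by the stereo calibration step:

```python
import numpy as np

def relative_pose(R_A, T_A, R_B, T_B):
    """Relative pose of camera B with respect to camera A, implementing
    the relative pose relationship:
        R = R_B @ R_A.T
        T = T_B - R @ T_A
    """
    R = R_B @ R_A.T
    T = T_B - R @ T_A
    return R, T
```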

As shown in Fig. 9, if there are multiple cameras, the relative pose vectors between the cameras whose video content captured at the same moment has partially overlapping video frames can be calculated in turn according to the overlap relationships among the cameras; the multiple video contents are processed according to the relative pose vectors (e.g. panorama stitching, 3D image generation, multi-view image generation) to obtain the processed video content; and the processed video content is sent through the MEC server to the 5G UPF, which forwards it to the cloud server. The cloud server may include a cloud service, a CDN (content delivery network), or a user player.

As shown in Fig. 2, an embodiment of the present invention further provides a pose estimation method, applied to a network device, which specifically includes the following steps:

Step S21: acquiring at least one piece of video information captured by the network device;

Step S22: sending the video information to a service processing server through a mobile edge computing (MEC) server of a base station;

Step S23: receiving a pose vector sent by the service processing server, wherein the pose vector is calculated by the service processing server according to the video content decoded from the video information and the device information of the network device.

The network device may include a mobile phone, a camera, a robot, a vehicle, and the like, each having camera hardware and a video capture function. After the pose vector sent by the service processing server is received, it can be provided to the network device in real time, which suits scenarios such as mobile-phone AR, AR glasses, robot path planning, and autonomous driving.
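For the device side, the sketch below shows one possible capture-send-receive loop. The endpoint name, the JPEG-over-TCP framing, and the 12-float pose encoding (r_1~r_9 then t_1~t_3) are all illustrative assumptions; the patent only requires that the device can capture video, stream it, and receive the pose vector in return.

```python
import socket
import struct
import cv2

SERVER = ("mec-gateway.example", 9000)  # hypothetical service endpoint

def stream_and_receive_poses():
    """Capture frames, stream them toward the service processing
    server, and read back the pose vector computed for each frame."""
    cam = cv2.VideoCapture(0)
    with socket.create_connection(SERVER) as sock:
        while True:
            ok, frame = cam.read()
            if not ok:
                break
            _, jpeg = cv2.imencode(".jpg", frame)
            # length-prefixed JPEG frame out, 12-float pose vector back
            sock.sendall(struct.pack("!I", len(jpeg)) + jpeg.tobytes())
            data = sock.recv(12 * 4, socket.MSG_WAITALL)
            pose = struct.unpack("!12f", data)  # r_1~r_9, t_1~t_3
            # hand `pose` to the upper-layer AR / robotics logic here
    cam.release()
```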

As shown in Fig. 4, an embodiment of the present invention further provides a server, the server being a service processing server comprising a processor 401 and a transceiver 402, wherein the processor 401 is configured to:

acquire at least one piece of video information sent by a network device through a mobile edge computing (MEC) server of a base station;

perform video decoding according to the video information to obtain decoded video content;

calculate a pose vector of the network device according to the video content and device information of the network device;

send the pose vector to the network device.

In an embodiment of the present invention, the processor 401 is specifically configured to:

when the network device is a device comprising a single camera, decode the video information from the single camera;

when the network device is a device comprising at least two cameras, decode the video information from each camera separately.

In an embodiment of the present invention, when the network device is a device comprising a single camera, the processor 401 is specifically configured to:

acquire the decoded video content of the video information and a plurality of single-frame images of the video content;

acquire image features from the single-frame images;

estimate a pose estimation vector of the network device according to the image features;

acquire the pose vector of the network device according to the pose estimation vector and inertial measurement unit (IMU) information in the device information;

wherein the image features comprise features from accelerated segment test (FAST) and scale-invariant feature transform (SIFT) features.

In an embodiment of the present invention, when the network device is a device comprising at least two cameras, the processor 401 is specifically configured to:

acquire the video content obtained after decoding the video information of each camera;

acquire a plurality of video frames at the same moment after the video contents are synchronized;

acquire, according to the video frames at the same moment, the three-dimensional coordinates of a same object captured by at least two cameras and the two-dimensional coordinates of the object in each image, wherein the at least two cameras are cameras whose video content captured at the same moment has partially overlapping video frames;

and calculate a pose vector corresponding to each camera according to the three-dimensional coordinates, the two-dimensional coordinates, and physical parameter information in the device information.

In an embodiment of the present invention, the processor 401 is further configured to:

calculate a relative pose vector between at least two cameras according to the relative pose relationship between the at least two cameras;

process the video contents according to the relative pose vector to obtain processed video content;

send the processed video content to a cloud server;

wherein the relative pose relationship is:

R = R_B · R_A^T, T = T_B − R · T_A

wherein R is the rotation matrix in the relative pose vector;

T is the translation vector in the relative pose vector;

R_A is the rotation matrix of one of the cameras;

T_A is the translation vector of that camera;

R_B is the rotation matrix of the other camera;

T_B is the translation vector of the other camera.

As shown in Fig. 5, an embodiment of the present invention further provides a network device, comprising a processor 501 and a transceiver 502, wherein the transceiver 502 is configured to:

acquire at least one piece of video information captured by the network device;

send the video information to a service processing server through a mobile edge computing (MEC) server of a base station;

receive a pose vector sent by the service processing server, wherein the pose vector is calculated by the service processing server according to the video content decoded from the video information and the device information of the network device.

As shown in Fig. 6, an embodiment of the present invention further provides another server, comprising a transceiver 601, a memory 602, a processor 600, and a computer program stored on the memory 602 and executable on the processor 600; the processor 600 calls and executes the programs and data stored in the memory 602.

The transceiver 601 receives and transmits data under the control of the processor 600. Specifically, the processor 600 reads the program in the memory 602 and may perform the following processes:

the processor 600 is configured to:

acquire at least one piece of video information sent by a network device through a mobile edge computing (MEC) server of a base station;

perform video decoding according to the video information to obtain decoded video content;

calculate a pose vector of the network device according to the video content and device information of the network device;

send the pose vector to the network device.

In an embodiment of the present invention, the processor 600 is specifically configured to:

when the network device is a device comprising a single camera, decode the video information from the single camera;

when the network device is a device comprising at least two cameras, decode the video information from each camera separately.

In an embodiment of the present invention, when the network device is a device comprising a single camera, the processor 600 is specifically configured to:

acquire the decoded video content of the video information and a plurality of single-frame images of the video content;

acquire image features from the single-frame images;

estimate a pose estimation vector of the network device according to the image features;

acquire the pose vector of the network device according to the pose estimation vector and inertial measurement unit (IMU) information in the device information;

wherein the image features comprise features from accelerated segment test (FAST) and scale-invariant feature transform (SIFT) features.

In an embodiment of the present invention, when the network device is a device comprising at least two cameras, the processor 600 is specifically configured to:

acquire the video content obtained after decoding the video information of each camera;

acquire a plurality of video frames at the same moment after the video contents are synchronized;

acquire, according to the video frames at the same moment, the three-dimensional coordinates of a same object captured by at least two cameras and the two-dimensional coordinates of the object in each image, wherein the at least two cameras are cameras whose video content captured at the same moment has partially overlapping video frames;

and calculate a pose vector corresponding to each camera according to the three-dimensional coordinates, the two-dimensional coordinates, and physical parameter information in the device information.

In an embodiment of the present invention, the processor 600 is further configured to:

calculate a relative pose vector between at least two cameras according to the relative pose relationship between the at least two cameras;

process the video contents according to the relative pose vector to obtain processed video content;

send the processed video content to a cloud server;

wherein the relative pose relationship is:

R = R_B · R_A^T, T = T_B − R · T_A

wherein R is the rotation matrix in the relative pose vector;

T is the translation vector in the relative pose vector;

R_A is the rotation matrix of one of the cameras;

T_A is the translation vector of that camera;

R_B is the rotation matrix of the other camera;

T_B is the translation vector of the other camera.

In Fig. 6, the bus architecture may include any number of interconnected buses and bridges, linking various circuits together, specifically one or more processors represented by the processor 600 and memory represented by the memory 602. The bus architecture may also link together various other circuits, such as peripherals, voltage regulators, and power management circuits, which are well known in the art and therefore are not described further herein. The bus interface provides an interface. The transceiver 601 may be a plurality of elements, including a transmitter and a receiver, providing a means for communicating with various other apparatuses over a transmission medium. The processor 600 is responsible for managing the bus architecture and general processing, and the memory 602 may store data used by the processor 600 in performing operations.

Those skilled in the art will understand that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program includes instructions for executing part or all of the steps of the above methods; and the program may be stored in a readable storage medium, which may be any form of storage medium.

As shown in Fig. 7, an embodiment of the present invention further provides another network device, comprising: a processor 701; and a memory 703 connected to the processor 701 through a bus interface 702, wherein the memory 703 is used to store the programs and data used by the processor 701 in performing operations, and the processor 701 calls and executes the programs and data stored in the memory 703.

The transceiver 704 is connected to the bus interface 702 and is configured to receive and transmit data under the control of the processor 701. Specifically, the processor 701 reads the program in the memory 703 and may perform the following processes:

the transceiver 704 is configured to:

acquire at least one piece of video information captured by the network device;

send the video information to a service processing server through a mobile edge computing (MEC) server of a base station;

receive a pose vector sent by the service processing server, wherein the pose vector is calculated by the service processing server according to the video content decoded from the video information and the device information of the network device.

It should be noted that in Fig. 7, the bus architecture may include any number of interconnected buses and bridges, linking together one or more processors represented by the processor 701 and various circuits of memory represented by the memory 703. The bus architecture may also link together various other circuits, such as peripherals, voltage regulators, and power management circuits, which are well known in the art and therefore are not described further herein. The bus interface provides an interface. The transceiver 704 may be a plurality of elements, including a transmitter and a receiver, providing a means for communicating with various other apparatuses over a transmission medium. For different terminals, the user interface 705 may also be an interface capable of connecting the required external or internal devices, including but not limited to a keypad, a display, a speaker, a microphone, a joystick, and the like. The processor 701 is responsible for managing the bus architecture and general processing, and the memory 703 may store data used by the processor 701 in performing operations.

Those skilled in the art will understand that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program includes instructions for executing part or all of the steps of the above methods; and the program may be stored in a readable storage medium, which may be any form of storage medium.

An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements each process of the above pose estimation method embodiments and can achieve the same technical effects, which are not repeated here. The computer-readable storage medium may be a ROM (read-only memory), a RAM (random access memory), a magnetic disk, or an optical disc.

In the embodiments of the present invention, the camera pose estimation process originally executed on the network device side is moved into the 5G network, and the low latency, high bandwidth, and strong processing capability of the MEC server are exploited, so that camera pose estimation and video processing are performed on the service processing server. For devices with multiple cameras, the originally separate pieces of video information can be aggregated at the service processing server and processed in real time, avoiding hardware such as multiple memory cards and storage hard disks. Because the most computation-intensive task, camera (array) pose estimation, is performed by the service processing server, the network device only needs video capture and streaming capability and does not need to be paired with a terminal-side processing server or PC; this saves hardware cost on the network device, avoids long-duration, high-percentage processor occupancy, improves the utilization efficiency of the network device, and reduces the risk of high power consumption. Moreover, real-time VR panoramic video stream generation, 3D video stream generation, AR low-level positioning, robot path planning, and autonomous driving can be supported; the latency between the network and the network device is extremely low, and the processing sits closer to the CDN, which improves video-processing efficiency and smoothness.

Furthermore, it should be noted that in the apparatus and method of the present invention, the individual components or steps can obviously be decomposed and/or recombined, and such decompositions and/or recombinations should be regarded as equivalents of the present invention. Also, the steps of the series of processes described above may naturally be performed chronologically in the order described, but they need not be performed chronologically; some steps may be performed in parallel or independently of one another. It will be understood by those skilled in the art that all or any of the steps or components of the method and apparatus of the present invention may be implemented in any computing device (including processors, storage media, etc.) or network of computing devices, in hardware, firmware, software, or a combination thereof, using basic programming skills after reading the description of the present invention.

Thus, the objects of the invention may also be achieved by running a program or a set of programs on any computing device, which may be a well-known general-purpose device; the objects of the invention may likewise be achieved merely by providing a program product containing program code for implementing the method or apparatus. That is, such a program product also constitutes the present invention, and a storage medium storing such a program product also constitutes the present invention. The storage medium may be any known storage medium or any storage medium developed in the future. It should further be noted that in the method of the present invention, the steps may obviously be decomposed and/or recombined, and such decompositions and/or recombinations should be regarded as equivalents of the present invention. Also, the steps of the series of processes described above may naturally be performed chronologically in the order described, but need not be; some steps may be performed in parallel or independently of one another.

While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.
