Apparatus and method for determining motion of an ultrasound probe

Document No.: 1342513 · Publication date: 2020-07-17

Note: This technology, "Apparatus and method for determining motion of an ultrasound probe", was created by Julian Sprung, Robert Bauer, Raphael Prevost and Wolfgang Wein on 2018-09-05. Abstract: A method of determining three-dimensional motion of a movable ultrasound probe (10). The method is performed during acquisition of an ultrasound image of the volume portion (2) by the ultrasound probe. The method comprises the following steps: receiving a stream of ultrasound image data (20) from the ultrasound probe (10) as the ultrasound probe is moved along a volume portion (2); inputting at least a subset of ultrasound image data (20, 40) representing a plurality of ultrasound image frames (22) into a machine learning module (50), wherein the machine learning module (50) has been trained to determine relative three-dimensional motion between the ultrasound image frames (22); and determining, by the machine learning module (50), a three-dimensional motion indicator (60) indicative of relative three-dimensional motion between the ultrasound image frames.

1. A method of determining a three-dimensional motion of a movable ultrasound probe (10) during acquisition of an ultrasound image of a volume portion (2) by the ultrasound probe, the method comprising:

-receiving a stream of ultrasound image data (20) from the ultrasound probe (10) as the ultrasound probe is moved along a volume portion (2);

-inputting at least a subset of ultrasound image data (20, 40) representing a plurality of ultrasound image frames (22) into a machine learning module (50), wherein

The machine learning module (50) has been trained to determine relative three-dimensional motion between the ultrasound image frames (22); and

-determining, by the machine learning module (50), a three-dimensional motion indicator (60) indicative of relative three-dimensional motion between the ultrasound image frames.

2. The method of claim 1, further comprising pre-processing ultrasound image data, the pre-processing including at least one of image filtering, image resampling, and image segmentation.

3. The method according to any of the preceding claims, wherein the machine learning module (50) comprises a neural network, preferably a convolutional neural network.

4. The method of any one of the preceding claims,

the step of inputting at least a subset of ultrasound image data (20, 40) comprises inputting local image data corresponding to a pair of ultrasound image frames (22) to the machine learning module (50), and wherein

The three-dimensional motion indicator (60) is indicative of relative three-dimensional motion between the pair of ultrasound image frames (22), and wherein

The inputting and determining steps are repeated for successive pairs or subsets of image frames.

5. The method of any one of the preceding claims,

the step of inputting at least a subset of ultrasound image data (20, 40) comprises inputting a global image data set to the machine learning module (50) spanning substantially the entire set of ultrasound image frames (22), and wherein

The three-dimensional motion indicator (60) indicates relative three-dimensional motion of each of the ultrasound image frames (22) relative to a first one of the ultrasound image frames.

6. The method of any of the preceding claims, wherein the ultrasound image data (20, 40) includes at least one of A-mode data, B-mode data, continuous harmonic imaging data, Doppler data, plane wave imaging data, and raw radio frequency data.

7. The method of any of the preceding claims, further comprising inputting further sensor data into a machine learning module (50), wherein the further sensor data is synchronized with the ultrasound image data (20, 40).

8. The method of the preceding claim, wherein the further sensor data comprises at least one of position data, e.g. obtained by a tracking system, acceleration data representing accelerations corresponding to the at least two ultrasound image frames, gyroscope data, magnetic measurement data and barometer data.

9. The method of any preceding claim, further comprising: detecting a discrepancy between the determined three-dimensional motion indicator (60) and the sensor data.

10. The method of any preceding claim, further comprising: determining a probe position and orientation of the ultrasound probe (10) for each image frame (22) from the three-dimensional motion indicator (60).

11. The method according to the preceding claim, further comprising tracking the position of the movable ultrasound probe (10) by a further tracking system to generate tracking position information, detecting whether the tracking system fails, and, if the tracking system fails, replacing the tracking position information with the probe position and orientation determined from the three-dimensional motion indicator (60).

12. The method of any preceding claim, further comprising: reconstructing a three-dimensional ultrasound image using the stream of ultrasound image data and the probe position and orientation determined from the three-dimensional motion indicator (60).

13. The method according to any one of the preceding claims, wherein the method comprises: predicting the motion of the ultrasound probe directly from the stream of ultrasound images, using the three-dimensional motion indicator (60), without using an additional tracking system.

14. An apparatus for determining a three-dimensional motion of a movable ultrasound probe (10) during acquisition of an ultrasound image of a volume portion by the ultrasound probe, the apparatus comprising:

-a probe input interface for receiving a stream of ultrasound image data (20) from an ultrasound probe (10) as the ultrasound probe is moved along a volume portion; and

-a machine learning module (50) having:

(a) an input section adapted to receive as input at least a subset of ultrasound image data (20, 40) representing a plurality of ultrasound image frames (22),

(b) a training memory portion containing a training memory that has been trained to determine relative three-dimensional motion between ultrasound image frames, wherein

The machine learning module (50) is adapted to determine, from the input and using the training memory, a three-dimensional motion indicator indicative of relative three-dimensional motion between the ultrasound image frames.

Background

Ultrasound imaging (ultrasound) is one of the main medical modalities for diagnostic and interventional applications due to its unique characteristics: affordability, usability, safety and real-time capability. However, for a long time it was not possible to acquire 3D images in a simple and reliable manner, and this limitation has narrowed the range of clinical applications of ultrasound. One solution is to acquire a series of 2D images by scanning the target region and later merging them into a single volume.

One such embodiment is described, for example, in WO 2015/191871 A1. This embodiment requires a positioning system that provides probe position information. Solutions based on external sensors (usually optical or electromagnetic tracking) are well suited to estimate the motion of the ultrasound probe and have therefore been widely used. However, these solutions come at a cost in practicality and price.

Therefore, studies have been made to estimate the motion of the ultrasound probe, i.e. the relative position and orientation of the ultrasound probe from one image to the next, without the need for additional hardware, by estimating the relative position of the two images using pure image processing algorithms. It has been found that algorithms such as "optical flow" allow for a fairly reliable estimation of in-plane motion. However, estimating out-of-plane motion (elevation displacement) remains a challenge.

One method for estimating out-of-plane motion, such as described in US6012458, has utilized the speckle noise pattern visible in the ultrasound image, and is therefore referred to as "speckle decorrelation". "speckle decorrelation" is based on the assumption that: the elevation distance may be estimated by selecting and isolating speckles from the ultrasound image, and by comparing the speckles of successive images: the higher the correlation between speckles, the lower the elevation distance. However, one challenge remains the definition of speckle and its correspondence in the image. For these reasons, existing "speckle decorrelation" methods only work well in quite special cases and may not be successful in all realistic cases.

Disclosure of Invention

The present invention aims to overcome at least some of the above problems. This object is solved by a method according to claim 1 and by an apparatus according to claim 14. Further advantages, features, aspects and details of the invention are apparent from the dependent claims, the description and the drawings.

Accordingly, a method according to one aspect of the present invention is directed to bypassing previous methods such as speckle decorrelation models based on preselected portions or features of an ultrasound image. Instead, according to this aspect, the method provides an end-to-end solution based on a full machine learning approach, using image data representing the entire ultrasound image frame as input, without the need to select any image portions or features.

Furthermore, aspects of the present invention do not require any assumptions about image content, such as the presence of speckle. Therefore, the method has a wide application range.

Drawings

The invention will be better understood by reference to the following description of embodiments of the invention taken in conjunction with the accompanying drawings, in which:

FIG. 1a schematically shows an ultrasound probe for use in a method according to an embodiment of the invention;

FIG. 1b schematically illustrates a composite three-dimensional ultrasound image obtained by the probe of FIG. 1a;

FIG. 2 schematically shows details of a method for acquiring the three-dimensional image shown in FIG. 1b;

FIG. 3a schematically illustrates image data representing a plurality of ultrasound image frames used as input in the method illustrated in FIG. 2;

FIG. 3b schematically illustrates a composite three-dimensional ultrasound image obtained by the method illustrated in FIG. 2;

FIG. 4 schematically shows an apparatus for determining three-dimensional motion of an ultrasound probe according to an embodiment of the invention;

FIGS. 5 and 6 schematically illustrate neural network architectures for a machine learning module, in accordance with various embodiments of the present invention;

FIG. 7 illustrates predictions of elevation translation according to a comparative example and according to an embodiment of the present invention, respectively; and

FIGS. 8a-8c show 3D visualizations of a tracked ultrasound scan according to comparative examples and according to an embodiment of the present invention, respectively.

Detailed Description

Fig. 1a shows an ultrasound probe 10 moving along a volume portion 2. Here, the volume portion 2 is a body portion of a patient. The movement of the probe is indicated by arrow 12, which represents the movement from a starting position (probe 10 shown on the left side of Fig. 1a) to a final position (probe 10 shown on the right side of Fig. 1a). During the movement, the probe 10 collects ultrasound image data representing successive ultrasound image frames. Each ultrasound image frame provides an ultrasound image (i.e., graphically representable information of ultrasound reflection properties) in a particular imaging region or image plane 22, i.e., in a two-dimensional or three-dimensional subspace of the volume portion 2. The imaging region 22 has a predetermined shape and position with respect to the ultrasound probe 10, and the imaging region moves together with the ultrasound probe 10. By moving the ultrasound probe 10, the imaging region 22 is moved over the volume portion 2, so that the ultrasound image frames provide ultrasound images of various portions of the volume portion 2.

Here, an ultrasound image frame is defined as a two-dimensional or three-dimensional ultrasound image taken at a given time using an ultrasound probe. The image frames represent the entire image having a predetermined size acquired by the ultrasound probe. Subsequent image frames typically have the same resolution. In contrast, a dynamically selected subset of ultrasound image frames selected according to image content and possibly having a variable size is not an image frame. Typically, a time stamp is associated with an ultrasound image frame. The probe 10 collects ultrasound image data as a data stream representing successive ultrasound image frames.

Fig. 1b shows the output of the proposed invention: a composite three-dimensional ultrasound image. The composite three-dimensional ultrasound image is a three-dimensional image representing the ultrasound reflection properties in the scanned volume portion, derived from the acquired ultrasound image frames and from the motion (position and orientation) of the ultrasound probe 10 determined for each acquired ultrasound image frame 22. If further processed using a compounding algorithm, such as the 3D reconstruction described below, the composite three-dimensional ultrasound image may be displayed, for example, as a set of image frames or as a complete 3D image positioned in space.

Fig. 2 depicts in more detail the challenging technical problem that the present invention aims to solve. During acquisition, the ultrasound probe (10) moves, and the image content of the image frames 22 therefore changes. It is an object of the invention to recover the motion 12 of the probe between two times t1 and t2 using only information from the image data I1 and I2 acquired at those times. The estimated motion may be represented as a matrix M12 that models the relative transformation between the coordinate system C1 of one frame and the coordinate system C2 of the other frame. This process can then be repeated for the entire series of images.

Typically, the motion has six degrees of freedom (three translations and three rotations), and the matrix M12 can be parameterized by 6 parameters.
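
As an illustration of this parameterization, the following sketch (not part of the patent text) composes a homogeneous 4x4 matrix M12 from three translations and three rotations, assuming an Euler-angle convention; the function names and the choice of convention are assumptions of this sketch.

```python
# Minimal sketch: one possible mapping between the 6 motion parameters and a
# 4x4 homogeneous transform; the angle convention is an assumption.
import numpy as np
from scipy.spatial.transform import Rotation


def params_to_matrix(tx, ty, tz, rx, ry, rz):
    """Compose the 4x4 transform between two frame coordinate systems."""
    M = np.eye(4)
    M[:3, :3] = Rotation.from_euler("xyz", [rx, ry, rz], degrees=True).as_matrix()
    M[:3, 3] = [tx, ty, tz]
    return M


def matrix_to_params(M):
    """Recover the six parameters (inverse of params_to_matrix)."""
    rx, ry, rz = Rotation.from_matrix(M[:3, :3]).as_euler("xyz", degrees=True)
    tx, ty, tz = M[:3, 3]
    return tx, ty, tz, rx, ry, rz
```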

Fig. 3a represents the input to the machine learning module 50, namely ultrasound data 20, which includes a time series of ultrasound image frame data representing the ultrasound image frames 22 and corresponding time information (e.g., time stamps or time indices). In addition, the ultrasound data 20 may also include metadata, e.g., indicating ultrasound settings and/or presets, such as gain, frequency, and/or dynamic range of the ultrasound image frames 22. The metadata may be provided partially or wholly as a time series. Additionally, the input to the machine learning module 50 may optionally include sensor data 24, e.g., a time series of sensor data and corresponding time information, as described in more detail with respect to Fig. 4.

Fig. 3b corresponds to fig. 1b, and the description of fig. 1b also applies to fig. 3 b.

Fig. 4 shows the general workflow of the proposed invention; optional steps are indicated by dashed lines. The primary input to the system is the image data 20 generated by the ultrasound system 11 from the probe 10. These images may be pre-processed using various algorithms 30, such as image resampling, image filtering, or other advanced analysis. The pre-processed data 40 from a plurality of frames may then be input into a machine learning module 50, which has been trained on training data 52, to produce an estimate 60 of the probe motion between the different input image frames. This process is repeated for all acquired frames, and the output of the machine learning module is then post-processed 70 to generate the final trajectory 80 of the probe.

As is known in the art, training on the training data 52 is performed before the module is used and consists in adjusting the values of the model parameters so that the model outputs are as close as possible to the desired values.

Optionally, when an external sensor 14 is mounted on the ultrasound probe, its data 24 may also be pre-processed 34 and used as a further input 44 to the machine learning module 50. For this purpose, the data 24 is synchronized with the image data 20, for example by using time stamps.

Fig. 5 shows an example of a machine learning module 50 used in an embodiment of the invention. The machine learning module 50 comprises a convolutional neural network. A two-channel image (representing two consecutive ultrasound frames) is the input to the neural network and passes through a series of convolutional layers (with 5x5 or 3x3 pixel kernels and 64 output channels), activation layers (here rectified linear units), and 2x2 pixel max pooling layers. At the end of the network, two fully connected layers aggregate the information from the entire feature maps into a final output of six numbers representing the 3 translation and 3 rotation parameters. These six numbers parameterize the matrix M12.
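
A minimal PyTorch sketch in the spirit of Fig. 5 is shown below, assuming that the two consecutive frames are resampled to 128x128 pixels; the number of convolutional layers, the feature dimensions and the input size are assumptions of this sketch, not the exact architecture of the embodiment.

```python
# Sketch of a two-channel motion-regression CNN; layer counts and sizes are illustrative.
import torch
import torch.nn as nn


class MotionCNN(nn.Module):
    def __init__(self, in_channels=2, out_params=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, 256), nn.ReLU(),
            nn.Linear(256, out_params),   # 3 translations + 3 rotations
        )

    def forward(self, x):                 # x: (batch, 2, 128, 128)
        return self.head(self.features(x))
```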

Given a set of training data (each training data sample may comprise (i) a pair of consecutive ultrasound frames, and (ii) a very accurate estimate of the probe motion between these two frames, for example obtained from a tracking system and parameterized as six numbers), the goal of the training process may be to minimize, summed over all training data samples, the squared norm of the difference vector between the 6-dimensional output of the network and the 6 parameters of the actually measured probe motion. This minimization problem may be solved by stochastic gradient descent or one of its variants, e.g. AdaGrad [John Duchi, Elad Hazan, and Yoram Singer, "Adaptive subgradient methods for online learning and stochastic optimization", Journal of Machine Learning Research, 12:2121-2159, 2011], with a momentum of 90%, a batch size of 500 and no weight decay. The initial values of the network parameters may be chosen at random, for example following a zero-mean Gaussian distribution with a small standard deviation.
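
The following sketch illustrates this training objective, assuming a PyTorch dataset that yields (two-frame tensor, six ground-truth parameters) pairs; the optimizer settings follow the values quoted above (momentum 0.9, batch size 500, no weight decay), while the learning rate and epoch count are placeholders.

```python
# Sketch of the training loop for the objective described above; hyperparameters
# other than momentum, batch size and weight decay are illustrative.
import torch
from torch.utils.data import DataLoader


def train(model, dataset, epochs=100, lr=1e-3):
    loader = DataLoader(dataset, batch_size=500, shuffle=True)
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9, weight_decay=0.0)
    for _ in range(epochs):
        for frames, gt_params in loader:      # frames: (B, 2, H, W), gt_params: (B, 6)
            pred = model(frames)
            # squared norm of the difference vector, averaged over the batch
            loss = ((pred - gt_params) ** 2).sum(dim=1).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```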

Alternatively, the estimate of in-plane translation may be pre-computed as optical flow between the two images using known techniques (see the article by Gunnar Farneback, cited further below). The pre-computed output of the optical flow is a 2D vector field, which can be encoded as 2 additional optical flow channels. These 2 further optical flow channels are used as further input channels of the neural network (in addition to the 2 image channels described above).
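
For the optional optical flow channels, a dense flow field could for instance be pre-computed with OpenCV's Farneback implementation and stacked with the two frames into a four-channel input, as sketched below; the flow parameters are illustrative defaults, not values prescribed by the embodiment.

```python
# Sketch of the optional pre-computation: dense Farneback optical flow between two
# consecutive B-mode frames, encoded as 2 extra channels of the network input.
import cv2
import numpy as np


def make_four_channel_input(frame1, frame2):
    """frame1, frame2: uint8 grayscale B-mode images of identical size."""
    flow = cv2.calcOpticalFlowFarneback(
        frame1, frame2, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    # Channels: image at t1, image at t2, flow-x, flow-y
    return np.stack([frame1.astype(np.float32),
                     frame2.astype(np.float32),
                     flow[..., 0], flow[..., 1]], axis=0)
```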

Similar to Fig. 5, Fig. 6 shows an example of a neural network architecture that takes into account not only the image data, but also some external IMU sensor information. Both architectures are basically similar, but the 9-dimensional sensor measurements are concatenated with the aggregated feature vector at the end of the network, before the final output is generated.
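
A minimal sketch of this late-fusion variant is given below, assuming an image backbone that already produces a flat feature vector; the feature dimensions and names are assumptions of this sketch.

```python
# Sketch of the Fig. 6 variant: the 9-dimensional IMU measurement is concatenated
# with the aggregated image feature vector before the final fully connected layer.
import torch
import torch.nn as nn


class MotionCNNWithIMU(nn.Module):
    def __init__(self, image_backbone, feat_dim=256, imu_dim=9, out_params=6):
        super().__init__()
        self.backbone = image_backbone              # maps frames to a (B, feat_dim) vector
        self.head = nn.Linear(feat_dim + imu_dim, out_params)

    def forward(self, frames, imu):                 # imu: (B, 9)
        feat = self.backbone(frames)
        return self.head(torch.cat([feat, imu], dim=1))
```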

Next, test results of an exemplary embodiment according to an aspect of the present invention are discussed compared to prior art embodiments. To obtain these test results, the setup described below was used.

Data set acquisition and baseline methods: all scans used in the exemplary embodiment were acquired with a Cicada-64 research ultrasound machine from Cephasonics (Santa Clara, California), using a linear 128-element probe. The probe was tuned at 9 MHz to generate the ultrasound images. The imaging depth was set to 5 cm (focus at 2 cm) for all images, and 256 scan lines were acquired for each image.

B-mode images were used without filtering or inverse scan conversion, and the samples were resampled at an isotropic resolution of 0.3 mm. The probe was equipped with an optical target that can be accurately tracked by a Stryker Navigation System III tracking system.

Using this tracking system, and after spatial and temporal image-to-sensor calibration, the inventors were able to obtain ground-truth transforms with an absolute positioning accuracy of about 0.2 mm. Thanks to the digital interface and proper clock synchronization of the ultrasound system, it was also ensured that the temporal alignment is free of jitter and drift. The ground truth for each frame therefore has sufficient accuracy.

The experiment was based on three data sets:

20 US scans were acquired on a Blue Phantom ultrasound biopsy phantom (7168 frames in total). The images contain primarily speckle, but also various hyperechoic and hypoechoic blobs;

88 in vivo scans were acquired on the forearms of 12 volunteers (41869 frames in total). Two different operators acquired at least three scans on both forearms of each participant;

another 12 in vivo scans were acquired on the calves of some of the volunteers (6647 frames in total). This last group is used to evaluate how well the network generalizes to other anatomical structures.

All scans are acquired in a fixed direction (from proximal to distal). Applying the algorithm to the reverse scan will produce a mirrored result. However, the method according to the invention is not limited to any particular scanning direction.

The algorithm according to the invention was compared with two comparison methods:

Linear motion, which corresponds to the motion intended by the operator in the scanning direction. This means that all parameters are set to their mean values over all acquisitions: rotations and in-plane translations are almost zero, while the elevation translation t_z has a constant value of about 2 cm/s;

A speckle decorrelation method according to the state of the art. In this comparison method, as described by Afsham, N., Rasoulian, A., Najafi, M., Abolmaesumi, P., Rohling, R., "Nonlocal means filter-based speckle tracking", IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control 62(8) (2015) 1501-1515, every second image is first filtered so as to enhance the speckle pattern. Each image is then divided into 15x15 patches and the corresponding cross-correlations are computed. A standard exponential model is then used to derive the corresponding z-displacement from the correlation values. Finally, a robust fit of the 6 transformation parameters to the displacement field is computed using RANSAC. These method steps follow the speckle-decorrelation literature of Prager, R.W., Gee, A.H., Treece, G.M. and colleagues.

These comparison methods were compared with two embodiments of the invention: a first implementation, referred to as "standard CNN", uses the convolutional neural network approach described above with reference to Fig. 5, with two input channels (the two images between which the relative motion is to be determined). The second implementation is referred to as "CNN with optical flow" and differs from the "standard CNN" in that it additionally uses the pre-computed optical flow, so that, as described above with reference to Fig. 5, a total of four input channels are used.

For each of these methods and data sets, a three-dimensional motion indicator (three translations t_x, t_y, t_z and three rotation angles θ_x, θ_y, θ_z) is calculated. Further, error metrics are computed by comparing these parameters to the data from the tracking system described above. The parameter errors are computed for each frame relative to the first frame of the scan and then averaged. Furthermore, a final drift is computed, defined as the distance between the center of the last image as placed by the estimated trajectory and its ground-truth position.
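
A minimal sketch of this final-drift metric is given below, assuming that the estimated and ground-truth trajectories are available as lists of 4x4 image-to-world matrices and that the frame center is expressed in homogeneous image coordinates; these conventions are assumptions of the sketch.

```python
# Sketch of the final-drift metric: distance between the center of the last frame
# as placed by the estimated trajectory and its ground-truth placement.
import numpy as np


def final_drift(estimated_poses, ground_truth_poses, frame_center_mm):
    """frame_center_mm: homogeneous (x, y, 0, 1) center of an image frame."""
    c_est = estimated_poses[-1] @ frame_center_mm
    c_gt = ground_truth_poses[-1] @ frame_center_mm
    return float(np.linalg.norm(c_est[:3] - c_gt[:3]))
```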

The results are summarized in tables 1-3 below:

When comparing the above methods, it can be seen that the linear motion method gives the worst results of the four, mainly due to the error in the out-of-plane translation t_z. This component is expected to be the most variable, because it is difficult for the operator to maintain a constant velocity. By exploiting the correlation between frames, the speckle decorrelation method significantly reduces all estimation errors; however, the out-of-plane error in t_z, and hence the overall drift, remains high.

On the other hand, the standard CNN method (without optical flow channels) already produces better results than the comparative methods. However, one may notice that the errors in t_x and t_y are somewhat high, especially on the forearm scans. This error could be reduced with additional training data, allowing the system to learn the full transformation more accurately from a larger data set. The problem is also greatly reduced by adding the optical flow as input channels (the CNN with optical flow method). Indeed, for the CNN with optical flow, the estimates of t_x and t_y are more accurate, and the estimate of t_z improves even further.

Thus, it was observed on real clinical images that the final drift was only 1.45 cm over sequences longer than 20 cm, a roughly twofold improvement over the comparative example. This ranking of the methods (from low to high accuracy: linear motion; speckle decorrelation; standard CNN; CNN with optical flow) has been confirmed by paired Wilcoxon signed-rank tests, all yielding p-values below 10^-6.

Next, the influence of noise filtering is discussed. To test the importance of speckle noise, the methods were compared on the images before and after applying the speckle filter built into the Cephasonics ultrasound system. As the last line of Table 2 above shows, training and testing on the unfiltered images yields better tracking estimates. This indicates that the speckle pattern is important for the neural network, especially for the estimation of the out-of-plane translation. On the other hand, the CNN method on the filtered images already provides better results than the comparative method. It can therefore be concluded that speckle is indeed very useful, but not absolutely necessary, for estimating out-of-plane motion.

Generalization to other anatomical structures: another interesting question is how well the machine learning method generalizes to other applications: is it really learning the motion from general statistical information, or is it overfitting to the particular anatomical structures present in the images?

The results are reported in Table 3 above. Here, the training data are based on the forearm dataset, but the reported results are for the calf dataset. These results show that the accuracy of all methods is significantly reduced compared to Table 2. For the comparison methods, this is due to incorrect calibration (since they were calibrated on the forearm dataset). For the method according to the invention, the degradation is even more severe (since the model was learned on the forearm dataset). In more detail, the in-plane displacements are still recovered with reasonable accuracy, but the error in the out-of-plane translation t_z has increased greatly.

However, the method according to the invention still generalizes to new types of images better than the other methods. This preliminary experiment shows that the accuracy depends largely on the target anatomy, but it is encouraging for the applicability of machine learning methods.

For comparison, the last row of Table 3 also reports the accuracy obtained with a CNN trained on this particular dataset, which is only slightly worse than on the forearm (due to the smaller dataset).

Next, FIG. 7 is discussed. The same methods discussed above for Tables 1-3 have been used here. To test the out-of-plane estimates in a challenging setting, the predictions of these methods are shown for a single scan with an intentionally strongly varying velocity: the average speed over the first 100 and the last 150 frames was about 0.3 mm/frame, while the speed in between was almost doubled. FIG. 7 shows the different predictions of the elevation translation.

As expected, the linear motion method assumes a constant velocity and therefore produces substantial reconstruction artifacts. The speckle decorrelation method does detect the velocity changes, but severely underestimates large motions. Only the method according to an embodiment of the invention is able to accurately follow the probe speed.

Figures 8a-8c show qualitative comparisons of the reconstructed trajectories on sample scans. In particular, Figures 8a-8c each show a 3D visualization of a tracked ultrasound scan. The ground-truth positions of the ultrasound frames are displayed and their trajectory is emphasized by a black outline. The outlines of the trajectories obtained with the other methods are displayed in other colors: red represents the linear motion method, blue an implementation of the speckle decorrelation method, and green the deep learning-based method.

Fig. 8a represents the median case in terms of performance (in particular final drift) of the proposed method, Fig. 8b corresponds to the best case on the test forearm dataset, and Fig. 8c to the worst case. They highlight the differences between the approaches in terms of tracking estimation accuracy.

Further examples of test results according to exemplary embodiments of aspects of the present invention may be found in the following publication: "3D freehand ultrasound without external tracking using deep learning", Medical Image Analysis (August 2018), Volume 48, Pages 187-202, retrievable at https://doi.org/10.1016/j.media.2018.06.003, the entire contents of which are incorporated herein by reference.

Description of other aspects:

various more general aspects of the invention are defined in more detail below. Each aspect so defined may be combined with any other embodiment or any other aspect unless clearly indicated to the contrary. The reference numerals referred to in the drawings are for illustration purposes only and are not intended to limit the various aspects to the embodiments shown in the drawings.

According to one aspect, three-dimensional motion of the ultrasound probe 10 is determined. According to one aspect, the three-dimensional motion has six degrees of freedom and includes displacement (three degrees of freedom) and rotation (three degrees of freedom). The displacement comprises in-plane displacement and elevation displacement; rotation includes in-plane rotation and out-of-plane rotation. Here, the terms in-plane and out-of-plane refer to the image plane defined by the image frames 22 acquired by the ultrasound probe 10. The three-dimensional motion indicator may be any parameterization of these degrees of freedom, or at least any parameterization of a subset of these degrees of freedom. According to one aspect, the ultrasound probe is a hand-held probe and has a full six degrees of freedom. According to another aspect, the ultrasound probe is constrained by limiting the degrees of freedom to less than six.

The method comprises the following steps: receiving a stream of ultrasound image data from the ultrasound probe 10; and inputting at least a subset of ultrasound image data representing a plurality of ultrasound image frames into a machine learning module. The (subset of the) ultrasound image data may be preprocessed, filtered or altered in any other way. The term "at least a subset" requires that information contained in ultrasound image data from the ultrasound probe is at least partially input into the machine learning module.

According to one aspect, even the entire image data, or a subset thereof, is used as the input subset. In the case of a subset, the subset is selected regardless of the image content of the ultrasound image frames, so that no image analysis is required.

Next, aspects related to the preprocessing of ultrasound image data are described. According to one aspect, the method includes preprocessing the ultrasound image data prior to inputting at least a subset of the ultrasound image data to the machine learning module. For example, the pre-processing may include pre-computing motion indicating data. An example of motion indicating data is in-plane displacement data representing in-plane displacement between at least two ultrasound images. The method may then comprise inputting the motion indicative data (e.g. in-plane displacement data) as a further input to the machine learning module. For example, the motion indicative data may be a two-dimensional dataset, such as a vector field, and may be input to the machine learning module as a further image channel.

An advantage of this aspect is that by inputting data to the machine learning module that explicitly represents some easily computed aspect of the motion, the machine learning module may be enabled to provide information about the remaining aspects more reliably and/or with less training data.

According to one aspect, the pre-calculation is performed by an "optical flow" method such as the one described in Gunnar Farnebäck, "Two-frame motion estimation based on polynomial expansion", Lecture Notes in Computer Science, 2003, (2749), 363-370.

According to another aspect, the ultrasound image data may be pre-processed using at least one of the following (a minimal preprocessing sketch is given after this list):

-resampling: ultrasound image data may be resampled to a given size or to a given resolution per pixel. This is done to make the system robust to certain settings of the ultrasound system, such as depth or number of scan lines used.

-image filtering: including any local filter (e.g., low-pass or high-pass filter), adaptive filter (e.g., speckle de-noising, enhancement, or masking), or global image transformation (e.g., histogram equalization).

-segmentation: another pre-processing would include segmenting the image, i.e. classifying all pixels into one of a plurality of classes, and using such a probability map as a further input. For example, in medical applications, one example is to segment skin, fat, muscle and bone pixels.

- Any pre-computed function: for example, as previously described, additional channels with optical flow vector fields as model inputs.
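
The sketch below illustrates two of the listed steps (resampling to a fixed isotropic resolution and a simple local filter), assuming 2D B-mode frames given as NumPy arrays; the target resolution and the choice of filter are assumptions, not values mandated by the method.

```python
# Minimal preprocessing sketch: resampling to an isotropic pixel size and a
# simple local filter; parameter values are illustrative.
import numpy as np
from scipy import ndimage


def preprocess(frame, pixel_size_mm, target_mm=0.3):
    resampled = ndimage.zoom(frame.astype(np.float32),
                             zoom=pixel_size_mm / target_mm, order=1)
    filtered = ndimage.median_filter(resampled, size=3)   # e.g. simple speckle de-noising
    return filtered
```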

According to another aspect, if additional sensor data is input, the sensor data may be pre-processed using at least one of the above.

According to an alternative aspect, no pre-processing of the ultrasound image data is performed prior to inputting at least the subset of the ultrasound image data to the machine learning module.

Next, aspects related to the machine learning module are described. According to one aspect, the machine learning module includes a neural network. In particular, the machine learning module may include a convolutional neural network.

According to another aspect, a convolutional neural network has a convolutional layer that outputs a plurality of feature maps, each feature map being a result of convolution with a particular kernel of the layer input. Throughout this application, the indefinite article "a" or "an" is used in the sense of "at least one" and specifically includes the possibility of a plurality. The convolutional neural network may have a plurality of convolutional layers, for example two, three or four convolutional layers, connected in series with each other, and optionally have pooling layers between at least some of the convolutional layers.

According to another aspect, the convolutional neural network further comprises activation layers (e.g., sigmoid or rectified linear unit layers) and/or fully connected layers that output a global feature vector or the final prediction of the network. The convolutional neural network may, for example, comprise a plurality (e.g., two) of fully connected layers that receive input from one or more convolutional and/or pooling layers and provide the motion data (e.g., six numbers representing the 3 translation and 3 rotation parameters) as output.

According to another aspect, the neural network is a recurrent neural network with dynamic temporal behavior (i.e., the network prediction for given ultrasound image data depends on the previous frames that have been input into the network). One popular architecture choice is, for example, the long short-term memory (LSTM) network.
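
A minimal sketch of such a recurrent variant is given below, assuming a per-frame encoder that maps each frame to a flat feature vector; the dimensions and the per-time-step motion output are assumptions of this sketch.

```python
# Sketch of a recurrent variant: per-frame CNN features are fed to an LSTM whose
# hidden state carries information from previous frames.
import torch
import torch.nn as nn


class RecurrentMotionNet(nn.Module):
    def __init__(self, frame_encoder, feat_dim=256, hidden=128, out_params=6):
        super().__init__()
        self.encoder = frame_encoder                    # maps one frame to a feat_dim vector
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, out_params)

    def forward(self, frames):                          # frames: (B, T, 1, H, W)
        B, T = frames.shape[:2]
        feats = self.encoder(frames.flatten(0, 1)).view(B, T, -1)
        out, _ = self.lstm(feats)
        return self.head(out)                           # motion estimate per time step
```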

Although the machine learning module according to the present invention is mainly illustrated by a neural network, it is not limited to a neural network. Instead, other types of machine learning modules may be used. For example, according to another aspect, the machine learning module can also include, for example, a random forest algorithm.

Next, aspects related to more details of the input data from the ultrasound probe are described.

According to one aspect, the method includes inputting local image data corresponding to (successive) image frame pairs (or subsets) to a machine learning module to determine relative three-dimensional motion between the ultrasound image frame pairs (subsets), and repeating this process for successive image frame pairs or subsets.

According to an alternative aspect, the method includes inputting a global image data set spanning substantially the entire image frame set to a machine learning module to determine relative three-dimensional motion between a first and a last of the ultrasound image frames. Thus, for example, the entire stream of ultrasound image data may be input into the machine learning module.

According to another aspect, the method may include skipping frames such as every second frame. Thus, the need for computing power may be reduced while still providing timely information.

According to another aspect, the method may include inputting a global image dataset spanning substantially the entire image frame set to a machine learning module. The machine learning module may then determine relative three-dimensional motion between some ultrasound image frames (e.g., the first and last of the ultrasound image frames).

According to another aspect, the image data is two-dimensional or three-dimensional, i.e. it describes a two-dimensional image frame or a three-dimensional image frame. For example, 3D image frames may be generated by using a probe capable of imaging a small 3D ultrasound volume, for example by a matrix array ultrasound transducer or a rocking ultrasound system.

According to another aspect, the image data may include data obtained by at least one ultrasound imaging mode, such as A-mode, B-mode, continuous harmonic imaging, color Doppler mode, plane wave imaging, and the like. According to another aspect, the image data may include raw radio frequency data. According to another aspect, the image data is extracted from the ultrasound system at various points in the processing pipeline, for example prior to the speckle noise filtering step.

According to another aspect, the image data may include Doppler data containing velocity information. The Doppler data may be obtained by an additional Doppler-capable ultrasound sensor.

According to another aspect, the image data may include metadata indicative of ultrasound settings, e.g., presets such as gain, frequency, and/or dynamic range.

Next, aspects related to using other (non-ultrasonic) sensor data are described.

According to one aspect, a further sensor (e.g., fixed to the ultrasound probe) may be provided, and the method may include inputting sensor data from the further sensor into the machine learning module. The aspects described above for the image data may optionally also be applied to the sensor data input to the machine learning module.

For example, the further sensor may comprise an acceleration sensor, the method comprising detecting the acceleration of the ultrasound probe by the acceleration sensor attached to the ultrasound probe; and inputting the acceleration corresponding to the at least two ultrasonic image frames into the machine learning module. The acceleration data may be pre-processed, for example, to detect abrupt motion that the machine learning module may be less able to process, and to generate an abrupt motion signal if abrupt motion is detected.

Instead of or in addition to data from acceleration sensors, any other sensor data may be used, in particular sensor data obtained from IMU sensors, such as acceleration, gyroscopes, magnetic fields, barometric data, in particular acceleration and/or gyroscopes.

According to another aspect, the additional sensor may comprise a rotation sensor for detecting rotation of the ultrasound probe.

According to another aspect, the method may include tracking the position of the ultrasound probe by a tracking system such as an optical tracking system, e.g., a fixed outside-in tracker tracking a set of markers attached to the probe, or an inside-out tracker attached to the probe and tracking a fixed set of markers. The probe motion indicator may then be compared and/or combined with the tracking data in order to identify and/or compensate for errors. Another mode of operation is to detect whether the tracking system fails (e.g., if the tracking markers are occluded) and, if the tracking system is determined to have failed, to use the determined probe motion indicator as a backup by replacing the tracking position information from the tracking system with the probe position and orientation determined from the three-dimensional motion indicator (60). The method according to this aspect may thus be used to make existing tracking systems more robust or accurate.

According to another aspect, the additional sensor comprises an optical device (e.g., a camera or a laser-based motion detection system).

According to another aspect, the method includes generating a reliability indicator of the probe motion indicator as a result of the comparison between the tracking data and the probe motion indicator. For example, the method may include detecting an inconsistency between the determined three-dimensional motion and the sensor data, and in the event that an inconsistency is detected, generating an indication that the output is unreliable.

According to another alternative aspect, no external tracker is provided.

Next, aspects related to the ultrasound probe are described. According to one aspect, an ultrasound probe includes an ultrasound transducer array for transmitting an ultrasound beam and detecting ultrasound echoes reflected from a target volume of a volumetric portion at a plurality of sample volumes in a scan plane. According to another aspect, ultrasound image data is derived from ultrasound echoes reflected by the body part from each of a plurality of scan planes.

Next, aspects related to training data and acquisition protocols are described.

According to one aspect, the machine learning module has been trained using a training image data stream obtained by a predetermined acquisition direction, and the method comprises receiving an ultrasound image data stream from the ultrasound probe when the ultrasound probe is moved along the body part according to the predetermined acquisition direction. Optionally, the sensor data is synchronized.

According to another aspect, the training data are generated by using a separate tracking system that outputs a tracked position and/or motion of the probe for each image frame; this tracked position and/or motion is input as ground truth together with the training image data. Thus, according to one aspect, the training data comprise (1) ultrasound image data, (2) tracking data as ground truth, and (3) optionally, sensor data.

The machine learning module is typically trained by solving an optimization problem for a model function using training data, i.e., inputs for which the "true" outputs are known (e.g., known motion data from an accurate tracking system). The optimization problem consists in finding a set of model parameters that minimizes a cost function, the cost function being defined as a measure of the error between the output of the model function and the ground-truth data. One example of such an error measure is the squared L2 norm, i.e., the mean squared difference between the 3 translation parameters and 3 rotation parameters predicted by the model function of the machine learning module and those computed from the tracking data.

Next, aspects related to further processing of the probe motion indicator are described. According to one aspect, the method includes determining a probe position and orientation of the ultrasound probe from the probe motion indicators (i.e., from the relative three-dimensional displacements and rotations between ultrasound image frames). The probe position and orientation may be obtained by discrete integration of a plurality of probe motion indicators.
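
A minimal sketch of this discrete integration is given below, assuming the relative motions are available as 4x4 homogeneous matrices (e.g., produced by the parameterization sketched earlier) and that the first frame defines the world coordinate system; these conventions are assumptions of the sketch.

```python
# Sketch of the discrete integration: chaining per-pair relative transforms into
# absolute probe poses, with frame 0 as the reference.
import numpy as np


def integrate_trajectory(relative_transforms):
    """relative_transforms: list of 4x4 matrices mapping frame i+1 into frame i."""
    poses = [np.eye(4)]                      # pose of frame 0
    for M in relative_transforms:
        poses.append(poses[-1] @ M)          # pose_{i+1} = pose_i @ M_{i,i+1}
    return poses
```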

According to another aspect, the method includes filtering the determined probe position and orientation. For example, the method may include further refining and normalizing the probe motion indicator or the determined position and orientation of the probe, for example by comparing and/or averaging a plurality of estimates obtained by the machine learning module.

According to another aspect, the method may include reconstructing a three-dimensional ultrasound image using the determined probe positions and orientations and the stream of ultrasound image data, for example by any known 3D ultrasound volume compounding and/or reconstruction algorithm, see [Rohling, Robert N. (1999): 3D Freehand Ultrasound: Reconstruction and Spatial Compounding].
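
For illustration, a heavily simplified forward-compounding sketch is given below: each pixel of each frame is mapped into a voxel grid using the frame's pose and the contributions are averaged. The grid geometry, spacings and the pixel-to-millimeter mapping are assumptions of this sketch; a practical reconstruction would additionally interpolate and fill holes, as described in the cited literature.

```python
# Very simplified forward compounding: splat each frame's pixels into a voxel grid
# using its pose and average; conventions (pixel/voxel spacing, grid origin) are assumed.
import numpy as np


def compound_volume(frames, poses, pixel_mm, voxel_mm, grid_shape, grid_origin_mm):
    acc = np.zeros(grid_shape, dtype=np.float32)
    cnt = np.zeros(grid_shape, dtype=np.uint32)
    for img, pose in zip(frames, poses):
        h, w = img.shape
        ys, xs = np.mgrid[0:h, 0:w]
        # Pixel positions in the frame's own coordinate system (image plane at z = 0)
        pts = np.stack([xs * pixel_mm, ys * pixel_mm,
                        np.zeros_like(xs, float), np.ones_like(xs, float)], axis=-1)
        world = pts.reshape(-1, 4) @ pose.T                       # frame plane -> world (mm)
        idx = np.round((world[:, :3] - grid_origin_mm) / voxel_mm).astype(int)
        ok = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)
        ix, iy, iz = idx[ok].T
        np.add.at(acc, (ix, iy, iz), img.astype(np.float32).reshape(-1)[ok])
        np.add.at(cnt, (ix, iy, iz), 1)
    return acc / np.maximum(cnt, 1)
```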

Next, some other aspects are described. According to one aspect, the volume portion is a body portion of the patient. For example, the body portion may include a limb portion, such as a forearm portion and/or a leg portion of the patient, e.g., for peripheral vein mapping for clinical use in bypass surgery or AV-fistula mapping.

Alternatively, the volume portion may also be a part of the article to be non-destructively inspected.

According to another aspect, the method comprises predicting the motion of the ultrasound probe directly from the ultrasound image stream without any input of an external tracking system, and optionally based on image data only, i.e. without inputting other sensor data than image data.

According to another aspect, the method is performed during acquisition of an ultrasound image of the volume portion by the ultrasound probe (i.e. in the background). This includes evaluating previously acquired and stored image data. Preferably, the method (in particular the determining step) is performed in an at least partially overlapping manner while acquiring ultrasound data.

According to another aspect, an apparatus is provided for determining a three-dimensional motion of a movable ultrasound probe 10 during acquisition of an ultrasound image of a volume portion by the ultrasound probe. The apparatus comprises a probe input interface for receiving a stream of ultrasound image data 20 from the ultrasound probe 10 while the ultrasound probe is moved along the volume portion; and a machine learning module 50. The machine learning module 50 has an input section adapted to receive as input at least a subset of the ultrasound image data 20, 40 representing a plurality of ultrasound image frames 22, and has a training memory section containing a training memory that has been trained to determine relative three-dimensional motion between the ultrasound image frames. These sections may be provided by software or hardware or by a combination of software and hardware. The machine learning module 50 is adapted to determine, from the input and using the training memory, a three-dimensional motion indicator indicative of the relative three-dimensional motion between the ultrasound image frames.

According to another aspect, the apparatus described herein, in particular the machine learning module 50, is adapted to perform a method according to any one of the embodiments and aspects described herein. Thus, the apparatus may have apparatus components (modules) for performing each of the method steps described herein. The method steps may be performed by hardware components, by a computer programmed by appropriate software, by any combination of the two, or in any other manner. Thus, in particular, the apparatus comprises a probe input interface for receiving a stream of ultrasound image data 20 from the ultrasound probe 10 as the ultrasound probe is moved along the volume portion. The apparatus further includes a machine learning module 50 having an input section adapted to receive as input at least a subset of the ultrasound image data 20, 40 representing a plurality of ultrasound image frames 22, and having a training memory section containing a training memory that has been trained to determine relative three-dimensional motion between the ultrasound image frames. The machine learning module 50 is thus adapted to determine, from the input and using the training memory, a three-dimensional motion indicator indicative of the relative three-dimensional motion between the ultrasound image frames.

Reference numerals

2 volume part/body part

10 ultrasonic probe

11 ultrasound system

12 movement of the ultrasound probe

14 sensor

20 ultrasound image data

22 image region (image plane) of image frame

24 sensor data

30 (image data) preprocessing module

34 (sensor data) preprocessing module

40 pre-processed image data

44 pre-processed sensor data

50 machine learning module

52 training data

60 motion indicator

70 post-processing module

80 post-processed trajectory data

82 determined spatial arrangement of image frames

I1, I2, ..., IN    image frames

C1, C2, ..., CN    determined spatial arrangement of the image frame coordinate systems

M12    coordinate transformation between image frame coordinate systems
