Ultrasound imaging with deep learning based beamforming and associated devices, systems, and methods

Note: This technology, "Ultrasound imaging with deep learning based beamforming and associated devices, systems, and methods," was created by F. G. G. M. Vignon, M. U. Ghani, J. S. Shin, F. C. Meral, Sheng-Wen Huang, and J-L. F-M. on 2020-02-10. The main content includes: Ultrasound image devices, systems, and methods are provided. An ultrasound imaging system includes an array of acoustic elements configured to transmit ultrasound energy into an anatomical structure and to receive ultrasound echoes associated with the anatomical structure, and a processor circuit in communication with the array of acoustic elements and configured to: receive ultrasound channel data from the array corresponding to the received ultrasound echoes; normalize the ultrasound channel data by applying a first scaling function to the ultrasound channel data; generate beamforming data by applying a predictive network to the normalized ultrasound channel data; de-normalize the beamformed data by applying a second scaling function to the beamformed data; generate an image of the anatomical structure from the beamforming data; and output the image of the anatomical structure to a display in communication with the processor circuit.

1. An ultrasound imaging system comprising:

an array of acoustic elements configured to transmit ultrasound energy into an anatomical structure and receive ultrasound echoes associated with the anatomical structure; and

a processor circuit in communication with the array of acoustic elements and configured to:

receive ultrasound channel data from the array corresponding to the received ultrasound echoes;

normalize the ultrasound channel data by applying a first scaling function to the ultrasound channel data based on a signal level of the ultrasound channel data;

generate beamforming data by applying a predictive network to the normalized ultrasound channel data;

de-normalize the beamforming data by applying a second scaling function to the beamforming data based on the signal level of the ultrasound channel data;

generate an image of the anatomical structure from the beamforming data; and

output the image of the anatomical structure to a display in communication with the processor circuit.

2. The system of claim 1, wherein the processor circuit is further configured to:

apply a time delay to the normalized ultrasound channel data based on an imaging depth.

3. The system of claim 1,

wherein the ultrasound channel data comprises a plurality of samples for a plurality of channels,

wherein the beamforming data comprises a plurality of output values,

wherein the processor circuit is further configured to select a subset of the plurality of samples based on an imaging depth,

wherein, to normalize the ultrasound channel data, the processor circuit is configured to scale a first signal level of a first sample of the subset of the plurality of samples based on a second signal level of the subset of the plurality of samples to produce a subset of the normalized ultrasound channel data, and

wherein, to generate the beamforming data, the processor circuit is configured to apply the prediction network to the subset of the normalized ultrasound channel data to produce a first output value of the plurality of output values in the beamforming data.

4. The system of claim 3, wherein the first sample and the first output value correspond to a same pixel location in the image.

5. The system of claim 4, wherein, to normalize the ultrasound channel data, the processor circuit is configured to:

scale the first signal level of the first sample based on a Root Mean Square (RMS) value of the subset of the plurality of samples.

6. The system of claim 1, wherein the array of acoustic elements comprises a first aperture size, and wherein the beamforming data is associated with a second aperture size that is larger than the first aperture size.

7. The system of claim 6, wherein the predictive network is trained by:

providing test ultrasound channel data generated based on the first aperture size and first target beamforming data generated based on the second aperture size; and

training the predictive network to generate the first target beamforming data from the test ultrasound channel data.

8. The system of claim 7, wherein the predictive network is trained by:

providing second target beamforming data generated based on the first aperture size; and

training the predictive network to generate the second target beamforming data from the test ultrasound channel data prior to training the predictive network to generate the first target beamforming data.

9. The system of claim 1, wherein the ultrasound channel data is generated from a first number of ultrasound transmit trigger events, and wherein the beamforming data is associated with a second number of ultrasound transmit trigger events that is greater than the first number of ultrasound transmit trigger events.

10. The system of claim 9, wherein the predictive network is trained by:

providing test ultrasound channel data generated based on the first number of ultrasound transmit trigger events and first target beamforming data generated based on the second number of ultrasound transmit trigger events; and

training the predictive network to generate the first target beamforming data from the test ultrasound channel data.

11. The system of claim 10, wherein the predictive network is trained by:

providing second target beamforming data generated based on the first number of ultrasound transmit trigger events; and

training the predictive network to generate the second target beamforming data from the test ultrasound channel data prior to training the predictive network to generate the first target beamforming data.

12. The system of claim 1, wherein the ultrasound channel data is associated with a first signal-to-noise ratio (SNR), and wherein the beamforming data is associated with a second SNR greater than the first SNR.

13. The system of claim 1, wherein the array of acoustic elements comprises a one-dimensional array of acoustic elements.

14. The system of claim 1, wherein the array of acoustic elements comprises a two-dimensional array of acoustic elements.

15. A method of ultrasound imaging comprising:

receiving, at a processor circuit in communication with an array of acoustic elements, ultrasound channel data corresponding to ultrasound echoes associated with an anatomical structure;

normalizing the ultrasound channel data by applying a first scaling function to the ultrasound channel data based on a signal level of the ultrasound channel data;

generating beamforming data by applying a predictive network to the normalized ultrasound channel data;

de-normalizing the beamforming data by applying a second scaling function to the beamforming data based on the signal level of the ultrasound channel data;

generating an image of the anatomical structure from the beamforming data; and

outputting the image of the anatomical structure to a display in communication with the processor circuit.

16. The method of claim 15, further comprising:

applying a time delay to the normalized ultrasound channel data based on imaging depth.

17. The method of claim 15,

wherein the ultrasound channel data comprises a plurality of samples for a plurality of channels,

wherein the beamforming data comprises a plurality of output values,

wherein the method comprises selecting a subset of the plurality of samples based on imaging depth,

wherein normalizing the ultrasound channel data comprises scaling a first signal level of a first sample of the subset of the plurality of samples based on a second signal level of the subset of the plurality of samples to generate the normalized ultrasound channel data, the first sample corresponding to a pixel location in the image, and

wherein generating the beamforming data comprises applying the prediction network to the subset of the normalized ultrasound channel data to generate a first output value of the plurality of output values in the beamforming data, the first output value corresponding to the pixel location.

18. The method of claim 15, wherein the array of acoustic elements comprises a first aperture size, and wherein the beamforming data is associated with a second aperture size that is larger than the first aperture size.

19. The method of claim 15, wherein the ultrasound channel data is generated from a first number of ultrasound transmit trigger events, and wherein the beamforming data is associated with a second number of ultrasound transmit trigger events that is greater than the first number of ultrasound transmit trigger events.

20. The method of claim 15, wherein the ultrasound channel data is associated with a first signal-to-noise ratio (SNR), and wherein the beamforming data is associated with a second SNR greater than the first SNR.

Technical Field

The present disclosure relates generally to ultrasound imaging and, in particular, to reconstructing an ultrasound image from ultrasound echo channel responses using a predictive model for beamforming.

Background

Ultrasound imaging systems are widely used for medical imaging. Conventional medical ultrasound systems may include an ultrasound transducer probe coupled to a processing system and one or more display devices. An ultrasound transducer probe may include an array of acoustic elements that transmit sound waves into a subject (e.g., a patient's body) and record the sound waves reflected from the subject. The transmission of sound waves and/or the reception of reflected sound waves or echo responses may be performed by the same set of ultrasound transducer elements or by different sets of acoustic elements. The processing system reconstructs or creates an image of the object from the echo responses received by the acoustic elements. For conventional ultrasound imaging, the processing system may perform beamforming by delaying and summing the received echo response signals to achieve receive focusing along the imaging depth. The processing system may reconstruct an image from the beamformed signals by applying signal processing and/or image processing techniques.

There is often a trade-off among resolution, contrast, penetration depth, signal-to-noise ratio (SNR), acquisition speed, and/or frame rate in conventional ultrasound imaging. For example, image quality or resolution in conventional ultrasound imaging is limited by diffraction. One way to reduce the effects of diffraction is to employ transducers with larger aperture sizes. In another example, an ultrasound imaging system may utilize unfocused ultrasound beams or diverging waves to illuminate a larger portion of a region of interest (ROI) with a single transmission in order to reduce image acquisition time. However, images obtained from a limited number of diverging waves may have lower image quality than images obtained from focused imaging. Thus, the quality of ultrasound imaging in conventional ultrasound imaging systems can be limited by the capabilities of the system and/or the acquisition process (e.g., transducer aperture size).

Disclosure of Invention

While existing ultrasound imaging has proven useful for clinical guidance and diagnosis, there remains a need for improved systems and techniques for providing high-quality ultrasound images. Embodiments of the present disclosure provide a deep learning framework that maps ultrasound echo channel signals to beamformed signals rather than performing conventional delay-and-sum (DAS) based beamforming. For example, an imaging probe including a transducer array may be used for ultrasound imaging. The transducer array may include an array of acoustic elements that transmit ultrasound pulses into an object (e.g., a patient's anatomy) and receive ultrasound channel signals corresponding to ultrasound waves reflected from the object. A prediction network (e.g., a convolutional neural network (CNN)) may be trained to map the per-channel ultrasound echo signals to beamformed signals on a pixel-by-pixel basis. In an example, the per-channel ultrasound echo signals are time-aligned and normalized before the prediction network is applied. Thus, the predictive network is trained to learn beamforming rather than amplitude mapping and/or time-delay mapping. For example, a transducer array of a particular aperture size, and/or an acquisition with a particular number of transmit excitations, may provide a particular image quality using DAS-based beamforming. In an embodiment, the predictive network may be trained to provide beamformed signals with a higher image quality or resolution than the actual transducer aperture size in use can provide. In an embodiment, the predictive network may be trained to provide beamformed signals with a higher image quality or resolution than the actual number of transmit excitations used in the acquisition can provide. The predictive network may be trained using a combination of simulated data, data acquired from phantoms in an experimental test setting, and/or data acquired from patients in a clinical setting. The disclosed embodiments are suitable for use in two-dimensional (2D) imaging, three-dimensional (3D) volumetric imaging, focused imaging, and/or unfocused imaging.

In one embodiment, an ultrasound imaging system includes: an array of acoustic elements configured to transmit ultrasound energy into an anatomical structure and receive ultrasound echoes associated with the anatomical structure; and a processor circuit in communication with the array of acoustic elements and configured to: receive ultrasound channel data from the array corresponding to the received ultrasound echoes; normalize the ultrasound channel data by applying a first scaling function to the ultrasound channel data based on a signal level of the ultrasound channel data; generate beamforming data by applying a predictive network to the normalized ultrasound channel data; de-normalize the beamforming data by applying a second scaling function to the beamforming data based on the signal level of the ultrasound channel data; generate an image of the anatomical structure from the beamforming data; and output the image of the anatomical structure to a display in communication with the processor circuit.

In some embodiments, the processor circuit is further configured to apply a time delay to the normalized ultrasound channel data based on an imaging depth. In some embodiments, the ultrasound channel data comprises a plurality of samples for a plurality of channels, the beamforming data comprises a plurality of output values, the processor circuit is further configured to select a subset of the plurality of samples based on an imaging depth, normalizing the ultrasound channel data comprises scaling a first signal level of a first sample of the subset of the plurality of samples based on a second signal level of the subset of the plurality of samples to produce a subset of the normalized ultrasound channel data, and generating the beamforming data comprises applying the prediction network to the subset of the normalized ultrasound channel data to produce a first output value of the plurality of output values in the beamforming data. In some embodiments, the first sample and the first output value correspond to a same pixel location in the image. In some embodiments, normalizing the ultrasound channel data comprises scaling the first signal level of the first sample based on a Root Mean Square (RMS) value of the subset of the plurality of samples. In some embodiments, the array of acoustic elements comprises a first aperture size, and the beamforming data is associated with a second aperture size that is larger than the first aperture size. In some embodiments, the predictive network is trained by: providing test ultrasound channel data generated based on the first aperture size and first target beamforming data generated based on the second aperture size; and training the predictive network to generate the first target beamforming data from the test ultrasound channel data. In some embodiments, the predictive network is trained by: providing second target beamforming data generated based on the first aperture size; and training the predictive network to generate the second target beamforming data from the test ultrasound channel data prior to training the predictive network to generate the first target beamforming data. In some embodiments, the ultrasound channel data is generated from a first number of ultrasound transmit trigger events, and the beamforming data is associated with a second number of ultrasound transmit trigger events that is greater than the first number of ultrasound transmit trigger events. In some embodiments, the predictive network is trained by: providing test ultrasound channel data generated based on the first number of ultrasound transmit trigger events and first target beamforming data generated based on the second number of ultrasound transmit trigger events; and training the predictive network to generate the first target beamforming data from the test ultrasound channel data. In some embodiments, the predictive network is trained by: providing second target beamforming data generated based on the first number of ultrasound transmit trigger events; and training the predictive network to generate the second target beamforming data from the test ultrasound channel data prior to training the predictive network to generate the first target beamforming data. In some embodiments, the ultrasound channel data is associated with a first signal-to-noise ratio (SNR), and the beamforming data is associated with a second SNR that is greater than the first SNR. In some embodiments, the array of acoustic elements comprises a one-dimensional array of acoustic elements. In some embodiments, the array of acoustic elements comprises a two-dimensional array of acoustic elements.

In one embodiment, a method of ultrasound imaging includes: receiving, at a processor circuit in communication with an array of acoustic elements, ultrasound channel data corresponding to ultrasound echoes associated with an anatomical structure; normalizing the ultrasound channel data by applying a first scaling function to the ultrasound channel data based on a signal level of the ultrasound channel data; generating beamforming data by applying a predictive network to the normalized ultrasound channel data; de-normalizing the beamforming data by applying a second scaling function to the beamforming data based on the signal level of the ultrasound channel data; generating an image of the anatomical structure from the beamforming data; and outputting the image of the anatomical structure to a display in communication with the processor circuit.

In some embodiments, the method further comprises applying a time delay to the normalized ultrasound channel data based on an imaging depth. In some embodiments, the ultrasound channel data comprises a plurality of samples for a plurality of channels, the beamforming data comprises a plurality of output values, the method comprises selecting a subset of the plurality of samples based on an imaging depth, normalizing the ultrasound channel data comprises scaling a first signal level of a first sample of the subset of the plurality of samples based on a second signal level of the subset of the plurality of samples to produce the normalized ultrasound channel data, the first sample corresponding to a pixel location in the image, and generating the beamforming data comprises applying the prediction network to the subset of the normalized ultrasound channel data to generate a first output value of the plurality of output values in the beamforming data, the first output value corresponding to the pixel location. In some embodiments, the array of acoustic elements comprises a first aperture size, and the beamforming data is associated with a second aperture size that is larger than the first aperture size. In some embodiments, the ultrasound channel data is generated from a first number of ultrasound transmit trigger events, and the beamforming data is associated with a second number of ultrasound transmit trigger events that is greater than the first number of ultrasound transmit trigger events. In some embodiments, the ultrasound channel data is associated with a first signal-to-noise ratio (SNR), and the beamforming data is associated with a second SNR that is greater than the first SNR.

Additional aspects, features and advantages of the present disclosure will become apparent from the detailed description that follows.

Drawings

Illustrative embodiments of the present disclosure will be described with reference to the accompanying drawings, in which:

Fig. 1 is a schematic diagram of an ultrasound imaging system according to aspects of the present disclosure.

Fig. 2 is a schematic diagram of an ultrasound imaging system implementing delay-and-sum (DAS) based beamforming in accordance with an embodiment of the present disclosure.

Fig. 3 is a schematic diagram illustrating an ultrasound transmission scheme for ultrasound imaging, in accordance with aspects of the present disclosure.

Fig. 4 is a schematic diagram illustrating an ultrasound transmission scheme for ultrasound imaging, in accordance with aspects of the present disclosure.

Fig. 5 is a schematic diagram of an ultrasound imaging system implementing deep learning based beamforming in accordance with an embodiment of the present disclosure.

Fig. 6 is a schematic diagram illustrating a normalization scheme for deep learning based beamforming, according to aspects of the present disclosure.

Fig. 7 is a schematic diagram illustrating a configuration of a deep learning network, in accordance with aspects of the present disclosure.

Fig. 8 is a schematic diagram illustrating a deep learning network training scheme, in accordance with aspects of the present disclosure.

Fig. 9 illustrates pre-scan converted images generated from DAS-based beamforming and deep learning-based beamforming in accordance with aspects of the present disclosure.

Fig. 10 is a schematic diagram illustrating a deep learning network training scheme, in accordance with aspects of the present disclosure.

Fig. 11 is a schematic diagram illustrating a deep learning network training scheme, in accordance with aspects of the present disclosure.

Fig. 12 illustrates images generated from DAS-based beamforming and deep learning-based beamforming in accordance with aspects of the present disclosure.

Fig. 13 is a schematic diagram of a processor circuit according to an embodiment of the present disclosure.

Fig. 14 is a flow diagram of a method of deep learning based ultrasound imaging, according to aspects of the present disclosure.

Detailed Description

For the purposes of promoting an understanding of the principles of the disclosure, reference will now be made to the embodiments illustrated in the drawings and specific language will be used to describe the same. It will nevertheless be understood that no limitation of the scope of the disclosure is thereby intended. Any alterations and further modifications in the described devices, systems, and methods, and any further applications of the principles of the disclosure as described herein are contemplated as would normally occur to one skilled in the art to which the disclosure relates. In particular, it is fully contemplated that the features, components, and/or steps described with respect to one embodiment may be combined with the features, components, and/or steps described with respect to other embodiments of the present disclosure. However, for the sake of brevity, many iterations of these combinations will not be described separately.

Fig. 1 is a schematic diagram of an ultrasound imaging system 100 in accordance with aspects of the present disclosure. The system 100 is used to scan a region or volume of a patient's body. The system 100 includes an ultrasound imaging probe 110 in communication with a host 130 through a communication interface or link 120. The probe 110 includes a transducer 112, an analog front end (AFE) 113, a beamformer 114, a processor circuit 116, and a communication interface 118. The host 130 includes a display 132, a processor circuit 134, a communication interface 136, and a memory 138.

The probe 110 may take any suitable form for imaging various body parts of a patient while positioned inside or outside the patient's body. In an embodiment, the probe 110 is an external ultrasound imaging device including a housing configured for handheld operation by a user. The transducer 112 may be configured to obtain ultrasound data when a user grips the housing of the probe 110 such that the transducer 112 is positioned adjacent to and/or in contact with the skin of the patient. The probe 110 is configured to obtain ultrasound data of anatomical structures within the patient's body while the probe 110 is positioned outside the patient's body. In some other embodiments, the probe 110 may take the form of a catheter, intravascular ultrasound (IVUS) catheter, intracardiac echocardiography (ICE) catheter, transesophageal echocardiography (TEE) probe, transthoracic echocardiography (TTE) probe, intracavity probe, handheld ultrasound scanner, or tile-based ultrasound device.

The transducer 112 transmits ultrasound signals towards the anatomical object 105 and receives echo signals reflected from the object 105 back to the transducer 112. The object 105 may include any anatomical structure of a patient suitable for ultrasound imaging examination (e.g., lung, blood vessel, tissue, heart, kidney, and/or liver). The ultrasound transducer 112 may include any suitable number of acoustic elements, including one or more acoustic elements and/or a plurality of acoustic elements. In some examples, the transducer 112 includes a single acoustic element. In some examples, the transducer 112 may include an array of acoustic elements having any number of acoustic elements in any suitable configuration. For example, the transducer 112 may include between 1 acoustic element and 1000 acoustic elements, including values such as 2 acoustic elements, 4 acoustic elements, 36 acoustic elements, 64 acoustic elements, 128 acoustic elements, 500 acoustic elements, 812 acoustic elements, and/or other values both larger and smaller. In some examples, the transducer 112 may include an array of acoustic elements having any number of acoustic elements in any suitable configuration, such as a linear array, a planar array, a curved array, a circumferential array, a circular array, a phased array, a matrix array, a one-dimensional (1D) array, a 1.x-dimensional array (e.g., a 1.5D array), or a two-dimensional (2D) array. The array of acoustic elements (e.g., one or more rows, one or more columns, and/or one or more orientations) can be controlled and activated in unison or independently. The transducer 112 may be configured to obtain 1D, 2D, and/or three-dimensional (3D) images of the patient's anatomy. The acoustic elements may also be referred to as transducer elements or imaging elements. In some embodiments, the transducer 112 may include a piezoelectric micromachined ultrasonic transducer (PMUT), a capacitive micromachined ultrasonic transducer (CMUT), a single crystal, lead zirconate titanate (PZT), a PZT composite, other suitable transducer types, and/or combinations thereof.

The AFE 113 is coupled to the transducer 112. The AFE 113 may include components that control the transmission of ultrasound waves at the transducer 112 and/or the reception of echo responses at the transducer 112. For example, in the transmit path, the AFE 113 may include digital-to-analog converters (DACs), filters, gain control, and/or high-voltage (HV) transmitters that drive or trigger ultrasound pulse transmission at the acoustic or transducer elements of the transducer 112. In the receive path, the AFE 113 may include gain control, filters, amplifiers, and analog-to-digital converters (ADCs) that receive the echo responses from the transducer elements of the transducer 112. The AFE 113 may also include a plurality of transmit/receive (T/R) switches that control switching between transmission and reception at the transducer elements and prevent high-voltage pulses from damaging the transducer elements of the transducer 112.

In an embodiment, the transducer 112 includes M transducer elements (e.g., acoustic elements 202 of fig. 2). In some embodiments, M may be about 2, 16, 64, 128, 192, or greater than 192. In the receive path, each transducer element may convert ultrasound energy received from the reflected ultrasound pulses into an electrical signal, thereby forming a single receive channel. In other words, the transducer 112 may generate M analog ultrasound echo channel signals 160. The AFE 113 may be coupled to the transducer 112 via M signal lines. An ADC (e.g., ADC 204 of fig. 2) in the AFE 113 may generate M digital ultrasound echo channel signals 162, each corresponding to an analog ultrasound echo channel signal 160 received at one of the transducer elements in the transducer 112. The digital ultrasound echo channel signals 162 may also be referred to as ultrasound echo data streams or ultrasound echo channel data.

The beamformer 114 is coupled to the AFE 113. The beamformer 114 may include delay elements and summing elements configured to control transmit and/or receive beamforming at the transducer 112. The beamformer 114 may apply appropriate time delays to at least a subset of the digital ultrasound echo channel signals 162 and combine the time-delayed digital ultrasound echo channel signals to form a beamformed signal 164 (e.g., a focused beam). For example, the beamformer 114 may generate L beamformed signals 164, where L is a positive integer less than M. In some embodiments, the beamformer 114 may include multiple stages of beamforming. For example, the beamformer 114 may perform partial beamforming to combine subsets of the digital ultrasound echo channel signals 162 to form partially beamformed signals, and then beamform the partially beamformed signals to produce fully beamformed signals. Although the beamformer 114 is described in the context of digital beamforming, in some embodiments, the AFE 113 may include electronics and/or dedicated hardware for analog partial beamforming.

The processor circuit 116 is coupled to the beamformer 114. The processor circuit 116 may include a Central Processing Unit (CPU), a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a controller, a Field Programmable Gate Array (FPGA) device, another hardware device, a firmware device, or any combination thereof configured to perform the operations described herein. The processor circuit 116 may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. The processor circuit 116 is configured to process the beamformed signals 164. For example, the processor circuit 116 may perform a series of coherent and/or incoherent signal processing operations, such as compounding, envelope detection, logarithmic compression, and/or non-linear image filtering, on the beamformed signals 164 to produce image signals 166.

The communication interface 118 is coupled to the processor circuit 116. Communication interface 118 may include one or more transmitters, one or more receivers, one or more transceivers, and/or circuitry for sending and/or receiving communication signals. The communication interface 118 may include hardware components and/or software components that implement a particular communication protocol suitable for transmitting signals to the host 130 over the communication link 120. The communication interface 118 may be referred to as a communication device or a communication interface module.

The communication link 120 may be any suitable communication link. For example, the communication link 120 may be a wired link, such as a Universal Serial Bus (USB) link or an Ethernet link. Alternatively, the communication link 120 may be a wireless link, such as an Ultra Wideband (UWB) link, an Institute of Electrical and Electronics Engineers (IEEE) 802.11 Wi-Fi link, or a Bluetooth link.

At the host 130, the communication interface 136 may receive image signals 166, transducer element signals (e.g., analog ultrasound echo channel signals 160), or partial beamforming signals. Communication interface 136 may be substantially similar to communication interface 118. Host 130 may be any suitable computing and display device, such as a workstation, Personal Computer (PC), laptop, tablet, or mobile phone.

The processor circuit 134 is coupled to a communication interface 136. The processor circuit 134 may be implemented as a combination of software components and hardware components. The processor circuit 134 may include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a controller, an FPGA device, another hardware device, a firmware device, or any combination thereof configured to perform the operations described herein. The processor circuit 134 may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.

The processor circuit 134 may be configured to generate or reconstruct an image 168 of the object 105 from the image signals 166 received from the probe 110, from transducer element signals (e.g., analog ultrasound echo channel signals 160), or from partially beamformed signals 164. The processor circuit 134 may also apply image processing techniques to the image signals 166. In some embodiments, the processor circuit 134 may perform scan conversion to form a 2D image or a 3D volumetric image from the image signals 166. In some embodiments, the processor circuit 134 may perform real-time processing on the image signals 166 to provide streaming video of ultrasound images 168 of the object 105. The image 168 may include morphological information, functional information, and/or quantitative measurements of the subject 105, depending on the acquisition modality used at the probe 110. The morphological information may include anatomical structure information (e.g., B-mode information) of the object 105. Examples of functional information may include tissue strain, elasticity, Doppler flow, tissue Doppler flow, and/or blood flow information associated with the object 105. Examples of quantitative measurements may include blood flow velocity, blood flow, lumen diameter, lumen area, stenosis length, plaque burden, and/or tissue elasticity. In some embodiments, the processor circuit 134 may perform image analysis on the image signals 166 to determine a clinical condition associated with the subject 105.

A display 132 is coupled to the processor circuit 134. Display 132 may be a monitor or any suitable display. The display 132 is configured to display ultrasound images, image video, and/or information associated with the object 105 under examination.

Although the system 100 is illustrated with beamforming and signal processing functions performed at the probe 110 by the beamformer 114 and the processor circuit 116, respectively, in some embodiments at least some of the beamforming and/or signal processing functions may be performed at the host 130. In other words, the probe 110 may transmit the digital ultrasound echo channel signals 162 or the beamforming signals 164 to the host 130 for processing. In some other embodiments, the probe 110 may send the analog ultrasound echo channel signals 160 to the host 130 for processing, e.g., with some gain control, filtering, and/or partial analog beamforming. In such embodiments, the host 130 may also include an ADC and a beamformer. Additionally, the communication interface 118 at the probe 110 can be an industry standard physical connector and/or a proprietary physical connector, and the communication link 120 can include any industry standard cable, coaxial cable, and/or proprietary cable. In general, system 100 may represent any type of ultrasound imaging system in which ultrasound imaging functionality may be divided in any suitable manner across a probe (e.g., including transducer 112), a host, and/or any intermediate processing subsystem between the probe and the host.

According to embodiments of the present disclosure, the system 100 uses a predictive model (e.g., a deep learning model) for beamforming instead of the delay-and-sum (DAS) based beamformer 114 described above. The system 100 may be used in various stages of ultrasound imaging. In an embodiment, the system 100 may be used to collect ultrasound images to form a training dataset 140 for training a machine learning network 142 for ultrasound beamforming. For example, the host 130 may include a memory 138, which may be any suitable storage device, such as a cache memory (e.g., of the processor circuit 134), a Random Access Memory (RAM), a magnetoresistive RAM (MRAM), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a flash memory, a solid-state memory device, a hard drive, a solid-state drive, other forms of volatile and non-volatile memory, or a combination of different types of memory. The memory 138 may be configured to store a training image dataset 140 and a machine learning network 142. For example, the training image dataset 140 may store digital ultrasound echo channel signals 162 and associated beamformed signals or analog beamformed signals generated using the system 100. In an embodiment, the system 100 may utilize the trained machine learning network 142 for beamforming instead of the DAS beamformer 114 in a clinical setting (e.g., during an ultrasound examination). Mechanisms for training a deep learning model for ultrasound beamforming and applying the trained deep learning model for ultrasound beamforming are described in more detail herein.

Fig. 2 is a schematic diagram illustrating an ultrasound imaging system 200 implementing DAS-based beamforming in accordance with an embodiment of the present disclosure. The system 200 corresponds to a portion of the system 100 and provides a more detailed view of components along a receive signal path of the system 100 (e.g., within the probe 110 and/or the host 130). As shown in fig. 2, the transducer 112 includes a plurality of acoustic elements 202. Each acoustic element 202 forms a receive channel in which an analog ultrasound echo channel signal 160 may be received when the acoustic element 202 is activated for reception after a transmit trigger. For example, the transducer 112 may include M acoustic elements 202. Thus, the receive channels may be referred to as channel (1) to channel (M). In an embodiment, the AFE 113 may include multiple ADCs 204. Each ADC 204 may be coupled to an acoustic element 202. Although not shown, the AFE 113 may additionally include other components, such as filters and amplifiers, coupled to each acoustic element 202. Each ADC 204 may sample the corresponding analog ultrasound echo channel signal 160 to form a digital ultrasound echo channel signal 162. Each digital ultrasound echo channel signal 162 includes a series of samples along the imaging depth. In some other embodiments, the AFE 113 may include fewer ADCs 204 than receive channels. In such embodiments, each ADC 204 may be coupled to a subset of the receive channels and configured to sample the analog ultrasound echo channel signals 160 from the subset of the receive channels, e.g., in a multiplexed manner.

The beamformer 114 is coupled to the ADC 204. The beamformer 114 includes a plurality of delay elements 210 coupled to a summing element 220. Each delay element 210 is configured to apply a time delay to a corresponding digital ultrasound echo channel signal 162 to produce a delayed ultrasound echo channel signal 212. Delay element 210 may be dynamically configured to apply an appropriate time delay to digital ultrasound echo channel signal 162. For example, one or more of the acoustic elements 202 may be triggered to transmit ultrasound energy into an anatomical structure (e.g., anatomical object 105), and a set of acoustic elements 202 may be activated to receive ultrasound echoes reflected from the anatomical structure as a result of the ultrasound signal transmission. Due to the different propagation paths, the received echoes may arrive at the acoustic element 202 at different times. Accordingly, delay element 210 delays ultrasound echo channel signal 162 such that ultrasound echo channel signal 162 is aligned in time. The summation element 220 is configured to combine the delayed ultrasound echo channel signals 212 to produce beamforming data 230. The beamformed data 230 corresponds to beamformed signals 164.

Generally, the goal of beamforming is to reverse the acoustic wave propagation effect so that ultrasound or acoustic energy can be focused at various locations along the major axis of the ultrasound echo signal path. For example, the delay element 210 may be dynamically configured to provide receive focusing at each echo location along the principal axis of the ultrasound echo signal path. In other words, the delay element 210 may be configured with different delays to provide focusing at different echo locations.
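To make the delay computation concrete, the following is a minimal delay-and-sum sketch in Python/NumPy for a single image point and a linear array. The array geometry, speed of sound, sampling rate, and the assumption that the transmit wavefront leaves the array plane at t = 0 are illustrative choices, not values taken from this disclosure.

```python
# Minimal DAS receive-beamforming sketch (illustrative, hypothetical geometry).
import numpy as np

def das_beamform_point(channel_data, element_x, focus_x, focus_z,
                       c=1540.0, fs=32e6):
    """Beamform one image point from per-channel RF data.

    channel_data: (M, N) array, M receive channels by N time samples.
    element_x:    (M,) lateral element positions in meters.
    focus_x, focus_z: image-point location in meters.
    c: assumed speed of sound (m/s); fs: sampling rate (Hz).
    """
    m = channel_data.shape[0]
    # Two-way time of flight: transmit path (wavefront assumed to leave the
    # array plane at t = 0) plus receive path from the focus to each element.
    t_tx = focus_z / c
    t_rx = np.sqrt((element_x - focus_x) ** 2 + focus_z ** 2) / c
    idx = np.round((t_tx + t_rx) * fs).astype(int)
    idx = np.clip(idx, 0, channel_data.shape[1] - 1)
    aligned = channel_data[np.arange(m), idx]  # per-channel delay (alignment)
    return aligned.sum()                       # sum across channels: DAS
```

Sweeping `focus_z` over the samples of a scan line and repeating per line reproduces the dynamic receive focusing described above.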

The beamforming data 230 may be further processed by the processor circuit 116 and/or the processor circuit 134, including, for example, frequency compounding, envelope detection, log compression, and/or non-linear image filtering as described above with respect to fig. 1, to produce the image 168.
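As a rough illustration of two of these post-beamforming steps, envelope detection and log compression, a short sketch follows; the Hilbert-transform envelope, peak normalization, and 60 dB display range are common choices assumed here, not parameters from this disclosure.

```python
# Hedged sketch: envelope detection + log compression of beamformed RF lines.
import numpy as np
from scipy.signal import hilbert

def to_bmode(beamformed_lines, dynamic_range_db=60.0):
    """beamformed_lines: (num_lines, num_samples) beamformed RF data."""
    envelope = np.abs(hilbert(beamformed_lines, axis=-1))  # envelope detection
    envelope /= envelope.max()                             # 0 dB at the peak
    log_img = 20.0 * np.log10(envelope + 1e-12)            # log compression
    return np.clip(log_img, -dynamic_range_db, 0.0)        # display range
```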

Some performance metrics, such as image quality or resolution and/or data acquisition rate or frame rate, may be important for ultrasound imaging. For example, image quality, resolution, or contrast may affect the clinician's ability to distinguish anatomical details within the acquired ultrasound images. The data acquisition rate or frame rate may affect the amount of time required for acquiring an ultrasound image or video, and thus the real-time imaging capabilities and ultrasound examination time.

The ultrasound imaging quality or resolution can be limited by diffraction, which is determined by the aperture size of the transducer. In other words, the image quality or resolution of the systems 100 and/or 200 may be limited by the aperture size 206 (see fig. 2) of the transducer 112 used for inspection. The aperture size 206 refers to the physical size or dimensions of the transducer 112. The aperture size 206 may correspond to the number of acoustic elements 202 in the transducer 112. One way to improve image quality or image resolution is to employ transducers with larger aperture sizes. Typically, the image resolution varies in proportion to the aperture size of the transducer. For example, a transducer having about 160 acoustic elements 202 may provide about twice the imaging resolution as compared to a transducer having about 80 acoustic elements 202.
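This proportionality follows from the standard diffraction-limit approximation (a general result from ultrasound physics, not a formula stated in this disclosure): for wavelength λ, depth z, and aperture size D, the lateral resolution δx scales as

```latex
\delta x \approx \frac{\lambda z}{D}
```

so, at a fixed depth and wavelength, doubling the aperture roughly halves the smallest resolvable lateral feature, consistent with the 160-element versus 80-element example above.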

The data acquisition rate may be a concern for 3D imaging or volumetric imaging, where a large amount of imaging data is acquired to produce a 3D image. Conventional ultrasound imaging acquisition schemes utilize focused transmit beams (shown in fig. 3). A focused transmit beam illuminates only a confined area. Thus, multiple transmit beams are typically used to sweep or illuminate the entire region of interest. As such, the use of focused transmit beams may present a time limitation for real-time volumetric imaging and/or for applications where high frame rates are important, for example, in cardiac imaging.

Fig. 3 is a schematic diagram illustrating an ultrasound transmission scheme 300 for ultrasound imaging, in accordance with aspects of the present disclosure. The scheme 300 may be employed by the systems 100 and/or 200. The scheme 300 configures the transducer 112 to emit focused ultrasound beams 320 for ultrasound imaging. As shown, a set of acoustic elements 202 is activated to emit a focused ultrasound beam 320. The focused ultrasound beam 320 has an hourglass shape with a focal point 322 at an imaging depth 324. As can be observed, multiple focused ultrasound beams 320 are required in order to sweep through a region of interest (ROI) 330, and the sweep may therefore take a certain amount of time.

To improve the frame rate or reduce the image acquisition time, faster imaging methods may use unfocused ultrasound beams (shown in fig. 4). The unfocused beam may illuminate a larger portion of the ROI 330 and, thus, may reduce the number of transmissions required to illuminate or sweep the entire ROI 330.

Fig. 4 is a schematic diagram illustrating an ultrasound transmission scheme 400 for ultrasound imaging, in accordance with aspects of the present disclosure. The scheme 400 may be employed by the systems 100 and/or 200. The scheme 400 configures the transducer 112 to emit an unfocused ultrasound beam 420 for ultrasound imaging. As shown, a set of acoustic elements 202 is activated to produce an unfocused ultrasound beam 420. The unfocused ultrasound beam 420 comprises a plane wave or a diverging wave, with a focal point 422 located behind the transducer 112. The unfocused ultrasound beam 420 may illuminate a larger portion of the ROI 330 than the focused ultrasound beam 320, and thus a smaller number of transmissions is required to scan the entire ROI 330 when using the unfocused ultrasound beam 420 than when using the focused ultrasound beam 320.

Although the diverging wave may illuminate a larger portion of the ROI 330, image quality may be degraded due to lack of transmit focusing. One way to compensate for the loss of image quality due to unfocused imaging is to repeat transmissions or increase the number of divergent wave transmissions and coherently combine the received beams from multiple transmissions. Therefore, there is a trade-off between frame rate or acquisition time and image quality.

The use of unfocused ultrasound beams 420 may have additional implications for 3D imaging. 3D imaging uses a 2D transducer array, which may include a large number of acoustic elements (e.g., acoustic elements 202), e.g., on the order of thousands of acoustic elements. However, ultrasound imaging systems may typically have a limited number of system channels or receive channels (e.g., approximately 128) for conveying ultrasound echoes received at the transducer to the processor circuitry (e.g., the processor circuit 116 and/or the host 130). One approach to overcoming the limited number of system channels is to use a microbeamformer, in which partial beamforming is performed prior to transmitting the received ultrasound echo signals to the system channels. Although a microbeamformer may provide good receive focusing performance when a focused transmit beam (e.g., the beam 320) is used, receive focusing performance may be suboptimal when the receive focus is steered away from the main axis of the transmit beam (e.g., with an unfocused beam 420). Additionally, in some instances, microbeamforming arrays may result in undersampled arrays, where the inter-element spacing (e.g., the spacing between acoustic elements 202) may exceed the grating-lobe limit of λ/2, where λ represents the wavelength of the transmit beam. As a result, grating lobes may appear in the reconstructed image. The grating lobes may not overlap a focused transmit beam and therefore may not be a problem when a focused transmit beam is used. However, grating lobes can create artifacts with wider acoustic transmissions (e.g., when the unfocused beam 420 is used).
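The λ/2 limit follows from standard phased-array theory (again, a general result rather than a formula from this disclosure): for an element pitch d and a beam steered to angle θ0, grating lobes appear at angles θm satisfying

```latex
\sin\theta_m = \sin\theta_0 - \frac{m\lambda}{d}, \qquad m = \pm 1, \pm 2, \ldots
```

Real-angle solutions (|sin θm| ≤ 1) are avoided for all steering angles only when d < λ/2, which is why an undersampled microbeamforming array (d > λ/2) can produce grating-lobe artifacts under wide, unfocused transmissions.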

Accordingly, the present disclosure provides techniques that overcome the image quality and data acquisition rate problems described above. The present disclosure uses deep learning techniques for beamforming rather than conventional DAS-based beamforming. In one embodiment, the deep learning network is trained to map per-channel ultrasound echo data (e.g., ultrasound echo channel signals 162) generated from a particular aperture size to beamformed signals with a higher resolution than that aperture size can provide. In other words, the deep learning based beamforming data has a resolution corresponding to an image generated from a larger transducer aperture size (e.g., approximately twice the aperture size of the transducer used to collect the per-channel ultrasound echo data). In one embodiment, the deep learning network is trained to map per-channel ultrasound echo data generated from unfocused transmit beams (e.g., unfocused ultrasound beams 420) having a certain number of transmit trigger events to beamforming data having a higher image quality (e.g., higher SNR, better contrast, and/or better contrast-to-noise ratio) than that number of transmit trigger events can provide. In other words, the deep learning based beamforming data has an image quality corresponding to images generated from a greater number of transmit trigger events. Accordingly, the present disclosure may improve image quality and/or reduce data acquisition time.
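One way to picture the aperture-extension training setup is sketched below: the network input is per-channel data restricted to a small (here, central half) aperture, while the target is a conventional DAS result computed over the full aperture. This is a hedged sketch under assumed data shapes; `das_beamform_fn` and the half-aperture split are hypothetical placeholders, not the disclosure's training procedure.

```python
# Hedged sketch of training-pair construction for aperture extension.
import numpy as np

def make_training_pair(full_aperture_data, das_beamform_fn):
    """full_aperture_data: (M, N) per-channel RF data for all M elements
    (e.g., simulated, phantom, or clinical, as described above)."""
    m = full_aperture_data.shape[0]
    small = full_aperture_data[m // 4 : 3 * m // 4, :]  # central half aperture
    target = das_beamform_fn(full_aperture_data)        # large-aperture target
    return small, target                                # (input, label)
```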

Fig. 5 is a schematic diagram of an ultrasound imaging system 500 implementing deep learning based beamforming in accordance with an embodiment of the present disclosure. The system 500 is substantially similar to the system 100, but utilizes a deep learning based beamformer 560 instead of the DAS based beamformer 114 for beamforming. The system 500 includes a signal conditioning component 510 and a deep learning based beamformer 560. The signal conditioning component 510 and the deep learning based beamformer 560 may be implemented by a combination of hardware and software. The deep learning based beamformer 560 includes a temporal alignment component 520, a normalization component 530, a deep learning network 540, and a de-normalization component 550.

Similar to systems 100 and 200, system 500 may include an array of transducers (e.g., transducer 112). The transducer array may include M number of acoustic elements (e.g., acoustic elements 202), which may be configured to transmit ultrasound energy into an anatomical structure (e.g., anatomical object 105) and receive ultrasound echoes reflected back from the anatomical structure to the transducer array. The ultrasound echoes may be received in the form of M number of channels, each channel carrying an ultrasound echo channel signal 502 (e.g., digital ultrasound echo channel signal 162). The ultrasound echo channel signal 502 may be a raw Radio Frequency (RF) channel signal. The ultrasound echo channel signals 502 may be referred to as per-channel ultrasound RF echo data.

The signal conditioning component 510 may include one or more filters configured to receive the ultrasound echo channel signals 502 and condition the received ultrasound echo channel signals 502 prior to beamforming. In an example, the signal conditioning component 510 can apply a band-pass filter to the ultrasound echo channel signals 502 to remove electronic noise. The band-pass filter may span all quadrature band-pass filters (QBPs) used by the system 500 for subsequent frequency compounding during image reconstruction. As an example, the transducer array may generate an ultrasound beam at a center frequency of about 2.4 MHz, and the ultrasound echo channel signals 502 are sampled (e.g., by an ADC, such as the ADC 204) at about 32 MHz. The ultrasound echo channel signals 502 may be decimated to approximately 8 MHz to reduce subsequent computational requirements. Thus, the band-pass filter may be centered at about 2.4 MHz and may have a passband between about 0 MHz and about 4 MHz. In general, decimation may be performed after time alignment, as a greater number of samples is then available, making the estimation of the delayed samples more accurate.
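A minimal sketch of this conditioning step, assuming the example numbers above (2.4 MHz center frequency, 32 MHz sampling, decimation to 8 MHz), is shown below. The 4th-order Butterworth design and the 0.4 MHz low cut are illustrative assumptions (a passband cannot extend to exactly 0 MHz), and, as noted above, the decimation may instead be deferred until after time alignment.

```python
# Hedged sketch of band-pass filtering and 4x decimation of per-channel data.
import numpy as np
from scipy.signal import butter, sosfiltfilt, decimate

def condition_channels(channel_data, fs=32e6, f_lo=0.4e6, f_hi=4.0e6):
    """channel_data: (M, N) per-channel RF samples at rate fs."""
    sos = butter(4, [f_lo, f_hi], btype="bandpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, channel_data, axis=-1)  # remove electronic noise
    return decimate(filtered, 4, axis=-1)               # 32 MHz -> 8 MHz
```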

The time alignment component 520 is coupled to the signal conditioning component 510. The time alignment component 520 is configured to time-align the conditioned ultrasound echo channel signals 512. The time alignment component 520 may include delay elements similar to the delay elements 210 and may perform time-delay operations substantially similar to those of the delay elements 210 described above with respect to fig. 2.

The normalization component 530 is coupled to the time alignment component 520. The normalization component 530 is configured to normalize the signal levels of the time-aligned per-channel ultrasound echo signals 522 by scaling the signal level or amplitude of the time-aligned per-channel ultrasound echo signals 522 by the local energy of the signals 522. The normalization component 530 performs signal-level normalization on subsets of the time-aligned samples of the per-channel ultrasound echo signals 522, as described in more detail herein.

The deep learning network 540 is coupled to the normalization component 530. The deep learning network 540 maps the normalized, time-aligned per-channel ultrasound echo signals 532 to normalized beamforming data 542. In an example, the deep learning network 540 may be a CNN. The configuration or architecture of the deep learning network 540 and the training of the deep learning network 540 are described in more detail herein.
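Purely as an illustration of the input/output relationship (the actual architecture of the deep learning network 540 is described later in this disclosure), a minimal CNN stand-in is sketched below in PyTorch. The layer sizes and the channels-by-depth-samples patch shape are assumptions chosen only to match the per-pixel mapping described above.

```python
# Hypothetical stand-in for the prediction network: maps a normalized,
# time-aligned patch of (channels x depth samples) to one beamformed value.
import torch
import torch.nn as nn

class BeamformNet(nn.Module):
    def __init__(self, channels=128, depth_samples=13):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * channels * depth_samples, 1),  # one output pixel
        )

    def forward(self, x):  # x: (batch, 1, channels, depth_samples)
        return self.net(x)
```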

Applying the deep learning network 540 to the normalized time-aligned per-channel ultrasound echo channel signals 532 may reduce the complexity of the deep learning network 540 and improve the beamforming or beam summation prediction performance of the deep learning network. For example, performing time alignment or time delay before the deep learning network 540 may allow the deep learning network 540 to be trained to learn beamforming without having to learn time alignment. The time alignment or time delay operation has a relatively low computational complexity and can therefore be performed outside the deep learning network 540 without high computational cost. Normalization prior to the deep learning network 540 may avoid having samples with large amplitudes or signal levels dominate samples with lower amplitudes or signal levels. Thus, the deep learning network 540 may be trained to learn summation operations in beamforming rather than amplitude mapping. In this way, normalization may prevent numerical imbalances in the loss function of the deep learning network 540. The loss function is a measure of how well the deep learning network 540 performs and is used as an error measure during training, as described in more detail herein.
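The loss-balance point can be made concrete with a small sketch: when the mean squared error is computed in normalized units, each depth window contributes on a comparable scale regardless of its raw echo amplitude. The function below is a hedged illustration rather than the disclosure's loss; `rms_factors` stands for the per-output normalization factors described with respect to fig. 6.

```python
# Hedged sketch: MSE loss computed in normalized units.
import torch

def training_loss(pred_norm, das_target, rms_factors):
    """pred_norm: network outputs in normalized units (one value per pixel);
    das_target: reference beamformed values in raw units;
    rms_factors: per-pixel RMS values stored during normalization."""
    target_norm = das_target / (rms_factors + 1e-12)   # same units as pred_norm
    return torch.mean((pred_norm - target_norm) ** 2)  # depth-balanced MSE
```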

The de-normalization component 550 is coupled to the deep learning network 540. The de-normalization component 550 is configured to de-normalize the beamformed data 542 based on the normalization performed at the normalization component 530. In other words, the de-normalization component 550 reverses the operation of the normalization component 530, as described in more detail herein. The de-normalization component 550 generates de-normalized beamforming data 552. The beamforming data 552 may be further processed by the processor circuit 116 and/or the processor circuit 134, including, for example, frequency compounding, envelope detection, log compression, and/or non-linear image filtering as described above with respect to fig. 1, to produce an image.

In accordance with embodiments of the present disclosure, deep learning network 540 is trained such that beamforming data 552 has a higher image quality or resolution than DAS-based beamforming data 230. As such, an image generated from the beamforming data 552 may have a higher image quality or resolution than an image generated from the DAS-based beamforming data 230.

Fig. 6 is a schematic diagram illustrating a normalization scheme 600 for deep learning based beamforming, according to aspects of the present disclosure. The scheme 600 is implemented by the normalization component 530 of fig. 5. The scheme 600 applies normalization to the M channels of the time-aligned ultrasound echo channel signals 522. The ultrasound echo channel signal 522 in each receive channel includes a plurality of time samples along the imaging depth (e.g., in the y-dimension). The time samples are shown as symbols "X" in fig. 6.

The scheme 600 divides the samples in the ultrasound echo channel signal 522 into a plurality of subsets 610 based on imaging depth. For ease of discussion and illustration, three subsets 610a, 610b, and 610c are shown, each corresponding to an imaging depth range. However, the number of subsets 610 may vary depending on the embodiment. In some examples, the imaging depth range for each subset may correspond to approximately four times the wavelength of the corresponding ultrasound transmit beam (e.g., 4 x λ).

The normalization component 530 normalizes each subset 610 by scaling the signal level or amplitude of the samples in the corresponding subset 610 based on the signal energy of the corresponding subset 610. The normalization component 530 generates a subset of samples in the normalized ultrasound echo channel signal 532 from each subset 610. For example, subset 610a is normalized to produce subset 620a of samples in the normalized ultrasound echo channel signal 532, subset 610b is normalized to produce subset 620b, and subset 610c is normalized to produce subset 620c. After normalization, the normalized ultrasound echo channel signal 532 may include signal levels between approximately -1 and 1.

The deep learning network 540 is applied to the normalized ultrasound echo channel signal 532 to produce a beamformed signal 542. As an example, the deep learning network 540 outputs beamformed output samples or pixels 632a for subset 610a, beamformed output samples or pixels 632b for subset 610b, and beamformed output samples or pixels 632c for subset 610c. Pixel 632a corresponds to the center time sample 612a of subset 610a. Pixel 632b corresponds to the center time sample 612b of subset 610b. Pixel 632c corresponds to the center time sample 612c of subset 610c. In an example, the subset 610a includes approximately 13 samples of each channel along the imaging depth, and sample 612a may correspond to the 7th sample in channel (i). The temporal sample 612a and the beamformed output pixel 632a may correspond to the same pixel location in the final image. Similarly, the temporal samples 612b and 612c and the beamformed output pixels 632b and 632c may correspond to the same pixel locations in the final image.

In an embodiment, the normalization component 530 performs scaling by dividing the samples in a subset by the Root Mean Square (RMS) of the signal levels of the samples corresponding to the beamformed output sample or pixel. For example, the normalization component 530 scales sample 612a by dividing sample 612a by the RMS of all samples in subset 610a, scales sample 612b by dividing sample 612b by the RMS of all samples in subset 610b, and scales sample 612c by dividing sample 612c by the RMS of all samples in subset 610c. Thus, each sample 612a, 612b, or 612c is scaled with respect to the signal energy in its neighborhood. As a result, the normalized echo channel signal 532 may mostly include samples having signal energies between about 0 and about 1.

Referring to fig. 5, for de-normalization at the de-normalization component 550, the normalization factor or RMS value for each subset 610 may be stored, and the de-normalization component 550 may apply the same factor or RMS value to each corresponding beamformed pixel value 632a, 632b, and 632c. In other words, the de-normalization component 550 multiplies output 632a by the RMS value of the signal level of subset 610a, multiplies output 632b by the RMS value of the signal level of subset 610b, and multiplies output 632c by the RMS value of the signal level of subset 610c.
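
A minimal numpy sketch of this subset-based normalization and de-normalization follows. The function names, the (depth, channels) array layout, and the guard against zero-energy subsets are illustrative assumptions, not details taken from the disclosure.

```python
import numpy as np

def normalize_subsets(channel_data, subset_len):
    """Normalize time-aligned per-channel data in non-overlapping depth
    subsets. Returns the normalized data and the per-subset RMS values
    needed later for de-normalization."""
    depth, _ = channel_data.shape                  # (depth samples, channels)
    normalized = np.empty_like(channel_data, dtype=float)
    rms_values = []
    for start in range(0, depth, subset_len):
        subset = channel_data[start:start + subset_len, :].astype(float)
        rms = np.sqrt(np.mean(subset ** 2))        # RMS over the whole subset
        rms = rms if rms > 0 else 1.0              # guard against silent regions
        normalized[start:start + subset_len, :] = subset / rms
        rms_values.append(rms)
    return normalized, np.asarray(rms_values)

def denormalize(beamformed_pixels, rms_values):
    """Reverse the normalization: multiply each beamformed output pixel
    (e.g., 632a-c) by the RMS of the subset it was computed from."""
    return np.asarray(beamformed_pixels) * np.asarray(rms_values)
```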

Although the subsets 610 are illustrated in fig. 6 as non-overlapping, the scheme 600 may be applied to overlapping samples along the imaging depth in a sliding window fashion. As an example, the subset 610a may include K rows (e.g., row 1 to row K) of samples along the imaging depth. A second subset 610 may be formed from the samples in row 2 to row K+1 along the imaging depth. A third subset 610 may be formed from the samples in row 3 to row K+2 along the imaging depth, and so on. For each subset 610, a normalization value (e.g., RMS) is calculated from all samples in the corresponding subset, and the sample at the center of the subset (e.g., sample 612a) is divided by the normalization value. De-normalization may be performed using a similar sliding window mechanism. Thus, after applying the sliding window to normalization and de-normalization, all samples of the final beamformed data 552 are computed.
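
The sliding-window variant can be sketched similarly, under the same illustrative assumptions as the previous sketch; edge rows without a full K-row window are simply skipped here for brevity.

```python
import numpy as np

def normalize_sliding(channel_data, window_len):
    """Sliding-window normalization: each depth row is scaled by the RMS
    of the K-row window centered on it, matching the overlapping subsets
    described above. Returns the normalized center rows and the per-row
    RMS values reused for de-normalization."""
    depth, _ = channel_data.shape
    half = window_len // 2
    rows, rms_values = [], []
    for i in range(half, depth - half):
        window = channel_data[i - half:i + half + 1, :].astype(float)
        rms = np.sqrt(np.mean(window ** 2))
        rms = rms if rms > 0 else 1.0
        rows.append(channel_data[i, :] / rms)  # normalize the center row only
        rms_values.append(rms)
    return np.asarray(rows), np.asarray(rms_values)
```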

In an embodiment, the deep learning network 540 is trained to map per-channel ultrasound echo data acquired from a transducer having a particular aperture size (e.g., aperture size 206) or a particular number of acoustic elements (e.g., acoustic elements 202) to beamforming data corresponding to a larger transducer aperture size (e.g., about twice as large) or a greater number of acoustic elements. In other words, the beamforming data 552 predicted by the deep learning network 540 has a higher image quality (e.g., higher resolution and/or reduced clutter or artifacts) than the transducer in use may provide.

Although the scheme 600 is described in the context of a 2D dataset that includes multiple channels along the x-axis and imaging depths along the y-axis, similar mechanisms may be applied to a 3D dataset that includes multiple transmit triggers or excitations along the z-axis, for example, when the deep learning network 540 is trained to map per-channel ultrasound echo data acquired from a particular number of transmit triggers to beamforming data corresponding to a greater number of transmit triggers. For example, the 3D data set is divided into 3D data subsets based on imaging depth, the normalization component 530 scales the center sample in each 3D data subset by dividing the center sample by the RMS of all samples in the corresponding 3D subset, and the deep learning network 540 maps each 3D data subset to a beamformed output sample or pixel.

It should be noted that in some other embodiments, normalization may be performed by scaling the entire set of ultrasound echo channel data (e.g., ultrasound echo channel signal 522) based on the signal energy of the set of ultrasound echo channel data rather than applying normalization based on each subset of imaging depths as in scheme 600.

Fig. 7 is a schematic diagram illustrating a configuration 700 of a deep learning network 540, in accordance with aspects of the present disclosure. The deep learning network 540 may include one or more CNNs 710. The CNN 710 may operate on per-channel ultrasound channel data 702. The CNN 710 maps the per-channel ultrasound channel data 702 to beamformed data 704. In an example, the ultrasound channel data 702 may correspond to the normalized time-aligned ultrasound echo channel signals 532 and the beamforming data 704 may correspond to the beamforming data 552 in the system 500. The CNN 710 provides a per-channel pixel-based mapping of 2D data and/or 3D data to beamformed data.

CNN 710 includes a set of N convolutional layers 720 followed by a set of K fully connected layers 730, where N and K may be any positive integers. The convolutional layers 720 are shown as 720(1) to 720(N). The fully connected layers 730 are shown as 730(1) to 730(K). In an example, the convolutional layers 720(1) to 720(N) and the fully connected layers 730(1) to 730(K-1) may utilize a rectified linear unit (ReLU) activation function. The final output layer 730(K) may utilize a linear activation function. Each convolutional layer 720 may include a set of filters 722 configured to extract features from the ultrasound channel data 702. The values N and K and the size of the filters 722 in each convolutional layer 720 may vary depending on the embodiment. It should be noted that CNN 710 does not include pooling layers, which are typically used to reduce the size of convolutional layers. The exclusion of pooling layers allows all convolutions to contribute to the output of CNN 710. Alternatively, the CNN may include only convolutional layers 720 or only fully connected layers 730.

In an example, the ultrasound channel data 702 may include a 2D data set spanning an x-dimension corresponding to a receive channel (e.g., channel (1) through channel (M) of fig. 2 and 5) and a y-dimension corresponding to an imaging depth. CNN 710 may include about five convolutional layers 720 (e.g., N=5) and about two fully connected layers 730 (e.g., K=2). The convolutional layers 720 may include 2D convolution kernels (e.g., filters 722) that span the x and y dimensions. The 2D convolution kernel size may vary depending on the embodiment. In some examples, the same 2D convolution kernel size is used for all convolutional layers 720. In some examples, different 2D convolution kernel sizes may be used for the convolutional layers 720. In some examples, the 2D convolution kernel size may depend on the ultrasound transmission configuration used to collect the ultrasound channel data 702. The first convolutional layer 720(1) may include about sixty-four filters 722 or 2D convolution kernels, the second convolutional layer 720(2) may include about thirty-two filters 722, the third convolutional layer 720(3) may include about sixteen filters 722, the fourth convolutional layer 720(4) may include about eight filters 722, and the fifth convolutional layer 720(5) may include about four filters 722. The first fully connected layer 730(1) may have a size of about 32, and the final fully connected layer 730(2) may have a size of about 1. The output at the final fully connected layer 730(2) corresponds to a single beamformed output sample or pixel (e.g., beamformed output 632a, 632b, or 632c).
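
As an illustration, the 2D configuration above can be sketched in PyTorch as follows. The 3x3 kernel size, the 13-sample by 64-channel input patch, and the use of zero padding in every layer are assumptions made for the sketch, not requirements of the disclosure.

```python
import torch
import torch.nn as nn

class BeamformingCNN2D(nn.Module):
    """Sketch of the 2D configuration: five convolutional layers with no
    pooling (64/32/16/8/4 filters), followed by two fully connected
    layers (sizes 32 and 1) producing one beamformed output pixel."""

    def __init__(self, num_channels=64, patch_depth=13, kernel=3):
        super().__init__()
        filters = [64, 32, 16, 8, 4]
        layers, in_ch = [], 1          # input: 1 x patch_depth x num_channels
        for out_ch in filters:
            layers += [nn.Conv2d(in_ch, out_ch, kernel, padding=kernel // 2),
                       nn.ReLU()]      # zero padding keeps input/output sizes equal
            in_ch = out_ch
        self.conv = nn.Sequential(*layers)
        flat = filters[-1] * patch_depth * num_channels
        self.fc1 = nn.Linear(flat, 32)  # ReLU activation
        self.fc2 = nn.Linear(32, 1)     # linear activation: one output pixel
        self.relu = nn.ReLU()

    def forward(self, x):               # x: (batch, 1, depth, channels)
        x = self.conv(x).flatten(1)
        return self.fc2(self.relu(self.fc1(x)))
```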

In another example, the ultrasound channel data 702 may include a 3D data set spanning an x-dimension corresponding to a receive channel (e.g., channel (1) through channel (M) of fig. 2 and 5), a y-dimension corresponding to an imaging depth, and a z-dimension corresponding to a transmit trigger or transmit event. CNN 710 may include about six convolutional layers 720 (e.g., N=6) and about four fully connected layers 730 (e.g., K=4). The convolutional layers 720 may include 3D convolution kernels that span the x, y, and z dimensions. The 3D convolution kernel size may vary depending on the embodiment. In some examples, the same 3D convolution kernel size is used for all convolutional layers 720. In some examples, different 3D convolution kernel sizes may be used for the convolutional layers 720. In some examples, the 3D convolution kernel size may depend on the ultrasound transmission configuration used to collect the ultrasound channel data 702. The first convolutional layer 720(1) may include about sixty-four filters 722 or 3D convolution kernels, the second convolutional layer 720(2) may include about thirty-two filters 722, the third convolutional layer 720(3) may include about sixteen filters 722, the fourth convolutional layer 720(4) may include about eight filters 722, the fifth convolutional layer 720(5) may include about four filters 722, and the sixth convolutional layer 720(6) may include about two filters 722. The first fully connected layer 730(1) may have a size of about 32, the second fully connected layer 730(2) may have a size of about 16, the third fully connected layer 730(3) may have a size of about 8, and the final fully connected layer 730(4) may have a size of about 1. The output at the final fully connected layer 730(4) corresponds to a single beamformed output sample or pixel (e.g., beamformed output 632a, 632b, or 632c).
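
The 3D configuration can be sketched analogously; again the kernel size, patch dimensions, and number of transmits are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BeamformingCNN3D(nn.Module):
    """Sketch of the 3D configuration: six Conv3d layers (64/32/16/8/4/2
    filters) followed by four fully connected layers (32/16/8/1), with a
    linear final output producing one beamformed pixel."""

    def __init__(self, num_channels=64, patch_depth=13, num_tx=5, kernel=3):
        super().__init__()
        filters = [64, 32, 16, 8, 4, 2]
        layers, in_ch = [], 1
        for out_ch in filters:
            layers += [nn.Conv3d(in_ch, out_ch, kernel, padding=kernel // 2),
                       nn.ReLU()]
            in_ch = out_ch
        self.conv = nn.Sequential(*layers)
        flat = filters[-1] * patch_depth * num_channels * num_tx
        fcs, in_f = [], flat
        for size in [32, 16, 8]:
            fcs += [nn.Linear(in_f, size), nn.ReLU()]
            in_f = size
        self.fc = nn.Sequential(*fcs, nn.Linear(in_f, 1))  # linear output

    def forward(self, x):    # x: (batch, 1, depth, channels, transmits)
        return self.fc(self.conv(x).flatten(1))
```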

In some examples, CNN 710 may include a last convolutional layer 720(N) configured to convert the convolutional portion of CNN 710 into a 1D feature vector for the subsequent fully connected layers 730. In some examples, the convolutional layers 720 may include zero padding such that the input and output sizes of each convolution or filter 722 are the same.

In some examples, CNN 710 may include an additional layer before the first convolutional layer 720(1) for normalization (e.g., including normalization operations similar to the normalization component 530) and an additional layer after the last fully connected layer 730(K) for de-normalization (e.g., including a de-normalization operation similar to the de-normalization component 550). Thus, CNN 710 may be applied to time-aligned per-channel ultrasound echo signals (e.g., signals 522) that have not been normalized, and without explicit de-normalization of the output of CNN 710. In some examples, a CNN 710 including such a normalization front layer and de-normalization back layer may be trained to perform beamforming for a particular ultrasound center frequency, as the division of ultrasound echo samples in normalization may depend on the ultrasound center frequency.

Fig. 8 is a schematic diagram illustrating a deep learning network training scheme 800, in accordance with aspects of the present disclosure. Scheme 800 may be implemented by a computer system, such as host 130. The scheme 800 may be used to train a deep learning network 540 for ultrasound beamforming. Scheme 800 trains deep learning network 540 to predict or simulate beamforming data obtained from transducers having larger aperture sizes than the transducer in use.

The scheme 800 trains the deep learning network 540 in two stages 810 and 820. In a first stage 810, the scheme 800 trains the deep learning network 540 using input-output pairs, where the inputs include ultrasound channel data 802 and the outputs include target beamforming data 812. The ultrasound channel data 802 may be normalized time-aligned ultrasound echo channel signals similar to the normalized time-aligned ultrasound echo channel signals 532. The ultrasound channel data 802 may be acquired from a transducer array (e.g., transducer 112) having an aperture size M (e.g., aperture size 206) or M number of acoustic elements (e.g., acoustic elements 202). The ultrasound channel data 802 may correspond to an ultrasound echo response received from a particular subject (e.g., object 105). The ultrasound channel data 802 may be a 2D data set having an x-dimension corresponding to the receive channel and a y-dimension corresponding to the imaging depth. The target data 812 may correspond to beamformed data generated from the ultrasound channel data 802 using a DAS-based beamformer (e.g., beamformer 114). The target data 812 is also normalized so that training does not have to learn an amplitude mapping. During training, the deep learning network 540 may be applied to the ultrasound channel data 802 using forward propagation to produce the output 804 (e.g., beamforming data 542). The coefficients of the filters 722 in the convolutional layers 720 and the weights in the fully connected layers 730 may be adjusted using back-propagation to minimize the error between the predicted or mapped output 804 and the target output 812. In some embodiments, the error function or loss function may be a Mean Square Error (MSE) function or any other suitable error metric function. In other words, the scheme 800 trains the deep learning network 540 to approximate the beamforming provided by the beamformer 114. The training or adjustment of the coefficients of the filters 722 may be repeated for multiple input-output pairs. The first stage 810 serves as an initialization of the filter coefficients and/or weights in the deep learning network 540.

In a subsequent stage 820, the scheme 800 starts with the filter coefficients and/or weights obtained from the first stage 810 for the deep learning network 540 and continues training. The scheme 800 trains the deep learning network 540 using input-output pairs, where the inputs include the ultrasound channel data 802 and the outputs include target beamforming data 822. The target data 822 may correspond to beamformed data of the same subject generated from a transducer having an aperture size larger than the aperture size M, e.g., a k × M aperture size or a k × M number of acoustic elements, where k is greater than 1. Similarly, the target data 822 is normalized data. In an example, the target data 812 may be generated for an aperture size including approximately 80 acoustic elements (e.g., acoustic elements 202), and the target data 822 may be generated for an aperture size including approximately 160 acoustic elements (e.g., based on Tukey apodization). Similar to the first stage 810, the deep learning network 540 is trained by applying it to the ultrasound channel data 802 using forward propagation to produce the output 806 (e.g., beamforming data 542). The coefficients of the filters 722 in the convolutional layers 720 and the weights in the fully connected layers 730 may be adjusted using back-propagation to minimize the error between the output 806 and the target output 822. The training or adjustment of the coefficients of the filters 722 may be repeated for multiple input-output pairs. Although the scheme 800 utilizes two training stages, in some embodiments, the scheme 800 may perform the second training stage 820 without performing the first training stage 810.
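
A compact sketch of one training stage of the scheme 800 might look as follows. The optimizer choice (Adam), learning rate, and epoch count are assumptions, and the data loaders named in the usage comments are hypothetical placeholders.

```python
import torch
import torch.nn as nn

def train_stage(model, loader, epochs=10, lr=1e-4):
    """One training stage: forward propagation, MSE loss against the
    (normalized) target beamforming data, then back-propagation."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for channel_data, target in loader:      # normalized input-output pairs
            optimizer.zero_grad()
            prediction = model(channel_data)     # forward propagation
            loss = loss_fn(prediction, target)   # error vs. target beamforming
            loss.backward()                      # back-propagation
            optimizer.step()
    return model

# Stage 810: initialize against DAS beamforming targets (aperture size M);
# stage 820: continue with targets from the larger (k x M) aperture.
# model = train_stage(model, das_target_loader)
# model = train_stage(model, large_aperture_target_loader)
```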

As can be observed, the scheme 800 trains the deep learning network 540 to map the per-channel ultrasound echo signals to beamforming data corresponding to a transducer aperture size larger than the aperture size of the transducer used to collect the ultrasound echo channel signals. Thus, the deep learning network 540 may provide higher image quality (e.g., improved resolution and/or enhanced contrast) in the final reconstructed image than conventional DAS-based beamformers (e.g., beamformer 114).

Fig. 9 illustrates pre-scan-conversion ultrasound images generated from DAS-based beamforming and deep learning-based beamforming in accordance with aspects of the present disclosure. Ultrasound images 910 and 920 are generated from the same set of per-channel ultrasound echo signals (e.g., digital ultrasound channel echo signals 162 and 502 and ultrasound channel data 702 and 802) acquired from an in vivo scan of a patient's heart in an apical four-chamber view. Ultrasound image 910 is generated by beamforming the acquired per-channel ultrasound echo signals using a conventional DAS-based beamformer (e.g., beamformer 114), while ultrasound image 920 is generated by applying a deep learning network (e.g., deep learning network 540 trained using scheme 800) to map the per-channel ultrasound echo signals to beamforming data (e.g., beamforming data 542 and 704). As can be observed, ultrasound image 920 provides improved contrast and resolution without significant loss of cardiac structure (endocardium) compared to ultrasound image 910. Thus, deep learning-based beamforming may provide higher image quality or resolution than conventional DAS beamforming.

Fig. 10 is a schematic diagram illustrating a deep learning network training scheme 1000 in accordance with aspects of the present disclosure. Scheme 1000 may be implemented by a computer system, such as host 130. The scheme 1000 may be used to train a deep learning network 540 or CNN710 for ultrasound beamforming. Scheme 1000 is substantially similar to scheme 800. However, scheme 1000 uses different types of input and/or target data. The scheme 1000 trains the deep learning network 540 to predict or model beamforming data obtained from a greater number of transmit excitations or events than the actual number of transmit excitations used in the acquisition.

In a first stage 1010, the scheme 1000 trains the deep learning network 540 using input-output pairs, where the inputs include ultrasound channel data 1002 and the outputs include target beamforming data 1012. The ultrasound channel data 1002 may be a normalized time-aligned ultrasound echo channel signal similar to the normalized time-aligned ultrasound echo channel signal 532. Ultrasound channel data 1002 may be acquired from T number of transmit events. For example, the transmission of the ultrasonic beam is repeated T times, and T sets of ultrasonic echo signals per channel are received. The ultrasound channel data 1002 may correspond to an ultrasound echo response received from a particular subject (e.g., object 105).

In some examples, the ultrasound beam is a focused beam (e.g., focused ultrasound transmit beam 320). In some other examples, the ultrasound beam is an unfocused beam or a divergent beam (e.g., unfocused ultrasound transmit beam 420).

In some examples, the ultrasound channel data 1002 may be a 3D data set having an x-dimension corresponding to a receive channel, a y-dimension corresponding to an imaging depth, and a z-dimension corresponding to a transmit event.

The target data 1012 may correspond to beamformed data generated from the ultrasound channel data 1002 using a DAS-based beamformer (e.g., beamformer 114). The target data 1012 is also normalized so that training does not have to learn an amplitude mapping. During training, the deep learning network 540 may be applied to the ultrasound channel data 1002 using forward propagation to produce the output 1004 (e.g., beamforming data 542). The coefficients of the filters 722 in the convolutional layers 720 and the weights in the fully connected layers 730 may be adjusted using back-propagation to minimize the error between the output 1004 and the target output 1012. In some embodiments, the error function may be an MSE function or any other suitable error metric function. In other words, the scheme 1000 trains the deep learning network 540 to approximate the beamforming provided by the beamformer 114. The training or adjustment of the coefficients of the filters 722 may be repeated for multiple input-output pairs. The first stage 1010 serves as an initialization of the filter coefficients and/or weights in the deep learning network 540.

In a subsequent stage 1020, the scheme 1000 starts with the filter coefficients and/or weights obtained from the first stage 1010 for the deep learning network 540 and continues training. The scheme 1000 trains the deep learning network 540 using input-output pairs, where the inputs include the ultrasound channel data 1002 and the outputs include target beamforming data 1022. The target data 1022 may correspond to beamforming data of the same subject generated from ultrasound echo channel signals collected from a greater number of transmit events (e.g., an m × T number of transmit events or triggers, where m is greater than 1). Similarly, the target data 1022 is normalized data. In an example, the target data 1012 may be generated from 5 transmit events (e.g., with 5 repeated ultrasound transmissions), and the target data 1022 may be generated from 51 transmit events. Similar to the first stage 1010, the deep learning network 540 is trained by applying it to the ultrasound channel data 1002 using forward propagation to produce the output 1006 (e.g., beamforming data 542). The coefficients of the filters 722 in the convolutional layers 720 and the weights in the fully connected layers 730 may be adjusted using back-propagation to minimize the error between the output 1006 and the target output 1022. The training or adjustment of the coefficients of the filters 722 may be repeated for multiple input-output pairs. Although the scheme 1000 utilizes two training stages, in some embodiments, the scheme 1000 may perform the second training stage 1020 without performing the first training stage 1010.

As can be observed, the scheme 1000 trains the deep learning network 540 to map the ultrasound echo signals per channel to beamforming data corresponding to a greater number of transmit events. Thus, the deep learning network 540 may provide higher image quality than conventional DAS-based beamformers (e.g., beamformer 114). Additionally, when using divergent beams for unfocused imaging, the scheme 1000 may train the deep learning network 540 to compensate for artifacts caused by using divergent beams and improve final ultrasound image quality without a significant increase in acquisition time.

Fig. 11 is a schematic diagram illustrating a deep learning network training scheme 1100, in accordance with aspects of the present disclosure. Scheme 1100 may be implemented by a computer system, such as host 130. The scheme 1100 may be used to train a deep learning network 540 or CNN710 for ultrasound beamforming.

The scheme 1100 trains the deep learning network 540 in two stages 1110 and 1120. In a first stage 1110, the scheme 1100 trains the deep learning network 540 using input-output pairs, where the inputs include ultrasound channel data 1102 and the outputs include target beamforming data 1112. The ultrasound channel data 1102 may be normalized time-aligned ultrasound echo channel signals similar to the normalized time-aligned ultrasound echo channel signals 532. The ultrasound channel data 1102 may be acquired from a patient in a clinical setting or from a phantom in a test setting. The ultrasound channel data 1102 may be a 2D data set having an x-dimension corresponding to the receive channel and a y-dimension corresponding to the imaging depth. The target data 1112 may correspond to beamformed data of the same subject generated from the ultrasound channel data 1102 using a DAS-based beamformer (e.g., beamformer 114). The target data 1112 may have a first SNR (e.g., S decibels (dB)). The target data 1112 is also normalized so that training does not have to learn an amplitude mapping. During training, the deep learning network 540 may be applied to the ultrasound channel data 1102 using forward propagation to produce the output 1104 (e.g., beamforming data 542). The coefficients of the filters 722 in the convolutional layers 720 and the weights in the fully connected layers 730 may be adjusted using back-propagation to minimize the error between the output 1104 and the target output 1112. In some embodiments, the error function may be an MSE function or any other suitable error metric function. In other words, the scheme 1100 trains the deep learning network 540 to approximate the beamforming provided by the beamformer 114. The training or adjustment of the coefficients of the filters 722 may be repeated for multiple input-output pairs. The first stage 1110 serves as an initialization of the filter coefficients and/or weights in the deep learning network 540.

In a subsequent stage 1120, the scheme 1100 starts with the filter coefficients and/or weights obtained from the first stage 1110 for the deep learning network 540 and continues training. The scheme 1100 trains the deep learning network 540 using input-output pairs, where the inputs include the ultrasound channel data 1102 and the outputs include target beamforming data 1122. The target data 1122 may correspond to beamformed data for the same subject, but with a second SNR (e.g., n × S dB, where n is greater than 1) that is higher than the first SNR. The higher SNR may be due to more advanced signal and/or image processing techniques, a larger transducer aperture size, and/or the use of a greater number of transmit excitations. Similarly, the target data 1122 is normalized data. Similar to the first stage 1110, the deep learning network 540 is trained by applying it to the ultrasound channel data 1102 using forward propagation to produce the output 1106 (e.g., beamforming data 542). The coefficients of the filters 722 in the convolutional layers 720 and the weights in the fully connected layers 730 may be adjusted using back-propagation to minimize the error between the output 1106 and the target output 1122. The training or adjustment of the coefficients of the filters 722 may be repeated for multiple input-output pairs. Although the scheme 1100 utilizes two training stages, in some embodiments, the scheme 1100 may perform the second training stage 1120 without performing the first training stage 1110.

As can be observed, the scheme 1100 trains the deep learning network 540 to map the per-channel ultrasound echo signals to beamforming data corresponding to a higher SNR than beamforming data from a conventional DAS-based beamformer (e.g., the beamformer 114).

Fig. 12 illustrates ultrasound images generated from DAS-based beamforming and deep learning-based beamforming in accordance with aspects of the present disclosure. Ultrasound images 1210, 1220, and 1230 are generated from data acquired from an in vivo scan of a patient's heart. Initially, a first set of per-channel ultrasound echo signals (e.g., digital ultrasound channel echo signals 162 and 502 and ultrasound channel data 702 and 802) is collected after 5 transmit triggers of an unfocused or divergent ultrasound beam (e.g., unfocused ultrasound beam 420). Subsequently, a second set of per-channel ultrasound echo signals is collected after 51 transmit triggers of an unfocused or divergent ultrasound beam. The second set of per-channel ultrasound echo signals from the 51 transmit triggers is beamformed using a DAS-based beamformer (e.g., beamformer 114) to generate ultrasound image 1210. The first set of per-channel ultrasound echo signals from the 5 transmit triggers is beamformed using a DAS beamformer to generate image 1220. Image 1230 is generated by applying the deep learning network 540 to map the first set of per-channel ultrasound echo signals from the 5 transmit triggers to beamforming data (e.g., beamforming data 542 and 704) corresponding to 51 transmit triggers.

Comparing image 1210 to image 1220, image 1210 from 51 transmit triggers provides higher image quality (e.g., a better contrast-to-noise ratio) than image 1220 from 5 transmit triggers, as expected. Comparing images 1210, 1220, and 1230, the deep learning-based beamforming image 1230 from 5 transmit triggers provides an image quality or resolution comparable to the DAS-based beamforming image 1210 from 51 transmit triggers. The amount of clutter or artifacts in the image 1230 generated from deep learning-based beamforming is significantly less than in the image 1220 generated from DAS-based beamforming with the same number of transmit triggers. Thus, deep learning-based beamforming may provide higher image quality or resolution than conventional DAS-based beamforming.

In general, the schemes 800, 1000, and 1100 may train the deep learning network 540 using any suitable combination of offline-generated simulation data, data acquired from a patient in a clinical setting, and data acquired from a phantom in a test setting. Given target beamforming data with high SNR, e.g., generated from a larger aperture size, an increased number of transmissions, and/or coherently compounded echo signals received from multiple transmissions, the schemes 800, 1000, and 1100 may train the deep learning network 540 to output beamforming data with higher SNR. Further, by using actual data acquired from an ultrasound system (e.g., systems 100 and 200) as input-output data pairs instead of simulated data, the deep learning network 540 may be trained to suppress clutter from noise sources, such as acoustic noise, thermal noise, electronic noise, aberrations, and/or reverberation, that are introduced due to poor acoustic conditions and that cannot otherwise be addressed along the signal path of the ultrasound system (e.g., systems 100, 200, and/or 500).

In some embodiments, the deep learning network 540 may be trained to learn to map microbeamformed data, rather than per-channel ultrasound echo data, to beamforming data. As an example, a system (e.g., systems 100 and 200) may have 80 receive channels. The system may include a microbeamformer for microbeamforming. For example, the system may group four adjacent acoustic elements (e.g., acoustic elements 202) together and apply delays to each group of acoustic elements to focus and steer the corresponding receive channels such that the microbeamforming points are along the major axis of the transmit beam. Thus, after microbeamforming, the 80 receive channels are reduced to 20 channels. The deep learning network 540 may be trained and applied to map the 20 microbeamformed channel signals to beamforming data (e.g., beamforming data 542) using a substantially similar mechanism as described above.
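
A sketch of the grouped delay-and-sum reduction described above is given below; the integer per-element sample delays and the group size of four are illustrative assumptions, as are the function and variable names.

```python
import numpy as np

def microbeamform(element_data, delays, group_size=4):
    """Delay and sum adjacent acoustic elements in groups, e.g., reducing
    80 element signals to 20 microbeamformed channels. `delays` holds a
    non-negative integer sample delay per element (assumed small relative
    to the depth dimension)."""
    depth, num_elements = element_data.shape
    num_groups = num_elements // group_size
    out = np.zeros((depth, num_groups))
    for g in range(num_groups):
        for e in range(group_size):
            idx = g * group_size + e
            d = int(delays[idx])
            shifted = np.zeros(depth)
            shifted[:depth - d] = element_data[d:, idx]  # zero-filled shift
            out[:, g] += shifted                         # sum within the group
    return out  # fed to the deep learning network in place of element data
```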

Although the error function or loss function in the schemes 800, 1000, and 1100 described above is an error or cost function between the annotated target pixel values (e.g., in the target data 812, 822, 1012, 1022, 1112, and 1122) and the pixel values predicted by the deep learning network 540 (e.g., in the outputs 804, 806, 1004, 1006, 1104, and 1106), the deep learning network 540 may be trained to predict other signal values at earlier stages in the signal path of the ultrasound systems 100 and/or 200 (e.g., prior to beamforming).

In an example, the deep learning network 540 may be trained to learn to map transmit-compounded data from a limited number of transmissions to an increased number of transmissions. Thus, the loss function for the deep learning network 540 may be the difference between annotated transmit-compounded channel data and the network-predicted compounded channel data corresponding to the greater number of transmissions. For example, the input to the deep learning network 540 may be a 3D ultrasound echo channel data set as described above, where the x dimension may correspond to a receive channel, the y dimension may correspond to an imaging depth, and the z dimension corresponds to a transmit event (e.g., T transmit events). The deep learning network 540 may be trained to output a compounded echo channel data set corresponding to m × T transmissions, where m is greater than 1. Alternatively, the 3D ultrasound echo channel data set may be converted to a 2D data set by summing the per-channel ultrasound echo signals from the T transmit events (e.g., folding in the transmit or z-dimension), and the deep learning network 540 may be trained to provide the same compounded echo channel data set corresponding to the m × T transmissions.
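
The folding of the 3D data set into a 2D data set mentioned above amounts to a sum over the transmit dimension; a short numpy sketch with placeholder array dimensions:

```python
import numpy as np

# Hypothetical 3D per-channel echo set: M channels, D depth samples,
# T transmit events (the values here are placeholders, not from the disclosure).
echo_3d = np.random.randn(128, 2048, 5)

# Fold in the transmit (z) dimension by summing the per-channel signals
# across the T transmit events; the network then maps this 2D set to
# compounded channel data corresponding to m x T transmissions.
echo_2d = echo_3d.sum(axis=2)   # shape: (128, 2048)
```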

In general, the deep learning network 540 may output compounded channel data or beamformed data in any suitable dimension or representation, and the loss function may be modified accordingly. In the example of deep learning-based transmit compounding, the deep learning network 540 may be trained to provide 1D compounded channel data folded in the transmit or z dimension and sampled in the depth or y dimension. In the example of deep learning-based beamforming, the deep learning network 540 may be trained to provide 1D DAS vectors folded in the channel or x-dimension and sampled in the depth or y-dimension, or scalar values of corresponding pixel points folded in the channel or x-dimension and the transmit or z-dimension and sampled in the depth or y-dimension.

Although the input data in the schemes 800, 1000, and 1100 described above is a 3D matrix for each pixel, a 3D matrix of aligned data for each beam may be used as input instead. A fully convolutional architecture may operate on such a larger data set using a substantially similar mechanism as described above.

Although the input data in the schemes 800, 1000, and 1100 described above is per-channel ultrasound echo data, beamforming data may be used as input. For example, input beamforming data may be generated from a limited number of transmissions and may include grating lobe artifacts. The deep learning network 540 may be trained to provide beamforming data corresponding to a greater number of transmissions and having higher image quality and resolution.

In general, aspects of the present disclosure describe the use of a machine learning network to replace one or more conventional ultrasound image processing steps, such as beamforming, that are required to generate a conventional ultrasound image. The machine learning network is applied to raw channel data obtained by the ultrasound transducer, rather than performing one or more of the conventional image processing steps on the raw channel data (e.g., beamforming and/or compounding of multiple transmissions). The machine learning network is trained using a plurality of target beamforming data. Applying the machine learning network to the raw channel data results in modified data. The processor generates an ultrasound image using the modified data, which includes features of the target image (e.g., anatomy, speckle, etc.). Although the disclosed embodiments are described in the context of mapping radio frequency (RF) ultrasound echo channel data to beamforming data using deep learning, in some embodiments similar deep learning techniques may be applied to map ultrasound echo channel data at an Intermediate Frequency (IF) or baseband (BB) to beamforming data.

Fig. 13 is a schematic diagram of a processor circuit 1300 according to an embodiment of the present disclosure. The processor circuit 1300 may be implemented in the probe 110 and/or the host 130 of fig. 1. As shown, the processor circuit 1300 may include a processor 1360, a memory 1364, and a communication module 1368. These elements may be in direct or indirect communication with each other, e.g., via one or more buses.

The processor 1360 may include a CPU, DSP, ASIC, controller, FPGA, another hardware device, firmware device, or any combination thereof, configured to perform the operations described herein. The processor 1360 may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. In an example, the processor 1360 may correspond to the processor circuit 116 of fig. 1. In an example, the processor 1360 may correspond to the processor circuit 134 of fig. 1.

The memory 1364 may include cache memory (e.g., of the processor 1360), Random Access Memory (RAM), magnetoresistive RAM (MRAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), flash memory, solid state memory devices, hard drives, other forms of volatile and non-volatile memory, or combinations of different types of memory. In an embodiment, the memory 1364 includes a non-transitory computer-readable medium. The memory 1364 may store instructions 1366. The instructions 1366 may include instructions that, when executed by the processor 1360, cause the processor 1360 to perform the operations described herein, e.g., aspects of figs. 2-8, 10-11, and 14, and with reference to the host 130 and/or the probe 110 (fig. 1). The instructions 1366 may also be referred to as code. The terms "instructions" and "code" should be construed broadly to include any type of computer-readable statement(s). For example, the terms "instructions" and "code" may refer to one or more programs, routines, subroutines, functions, procedures, and the like. "Instructions" and "code" may include a single computer-readable statement or a plurality of computer-readable statements. In an example, the memory 1364 may correspond to the memory 138 of fig. 1.

The communication module 1368 may include any electronic and/or logic circuitry that facilitates the direct or indirect communication of data between the processor circuit 1300, the probe 110, and/or the display 132. In this aspect, the communication module 1368 may be an input/output (I/O) device. In some examples, the communication module 1368 facilitates direct or indirect communication between various elements of the processor circuit 1300 and/or the host 130 (fig. 1). In some instances, communication module 1368 may correspond to communication interface 118 (fig. 1). In some instances, the communication module 1368 may correspond to the communication interface 136 (fig. 1).

Fig. 14 is a flow diagram of a method 1400 of deep learning-based ultrasound imaging in accordance with aspects of the present disclosure. The steps of the method 1400 may be performed by the systems 100, 200, and/or 500, for example, by a processor such as the processor circuit 116 or 134 or the processor 1360, a processor circuit such as the processor circuit 1300, and/or other suitable components such as the probe 110 and/or the host 130. As illustrated, the method 1400 includes a number of enumerated steps, but embodiments of the method 1400 may include additional steps before, after, and between the enumerated steps. In some embodiments, one or more of the enumerated steps may be omitted or performed in a different order.

At step 1410, the method 1400 includes receiving, at a processor circuit in communication with an array of acoustic elements, ultrasound channel data corresponding to ultrasound echoes associated with an anatomical structure. The processor circuit may be similar to the processor circuits 116 and 134 and the processor circuit 1300. The acoustic elements may be similar to the acoustic elements 202. The ultrasound channel data may be similar to the digital ultrasound echo channel signals 162 and 502 and the ultrasound channel data 702, 802, 1002, and 1102.

At step 1420, the method 1400 includes normalizing the ultrasound channel data by applying a first scaling function to the ultrasound channel data based on the signal level of the ultrasound channel data, e.g., utilizing the normalization component 530 and/or the scheme 600.

At step 1430, the method 1400 includes generating beamforming data by applying a predictive network (e.g., the deep learning network 540) to the normalized ultrasound channel data (e.g., the ultrasound echo channel signal 532).

At step 1440, the method 1400 includes de-normalizing the beamformed data by applying a second scaling function to the beamformed data based on the signal level of the ultrasound channel data, e.g., with the de-normalization component 550.

In an example, the first scaling function may include dividing the signal level of the ultrasound channel data by a first factor corresponding to a signal energy or RMS value of the ultrasound channel data. The second scaling function may include multiplying the signal level of the beamformed data by the first factor, reversing the first scaling function.

At step 1450, the method 1400 includes generating an image of the anatomical structure from the beamforming data.

At step 1460, method 1400 includes outputting an image of the anatomical structure to a display (e.g., display 132) in communication with the processor circuit.

In an embodiment, a time delay is applied to the normalized ultrasound channel data based on imaging depth, for example, using the time alignment component 520 to facilitate receive focusing.

In an embodiment, the ultrasound channel data includes a plurality of samples for a plurality of channels (e.g., receive channels 1-M of fig. 5 and 6). The beamformed data includes a plurality of output values (e.g., beamformed output samples or pixels 632). Normalization may include selecting a subset of the plurality of samples (e.g., subset 610a, 610b, or 610c) based on the imaging depth and scaling a first signal level of a first sample of the subset (e.g., sample 612a, 612b, or 612c) based on a second signal level (e.g., RMS) of the subset of the plurality of samples to produce a subset of normalized ultrasound channel data (e.g., subset 620a, 620b, or 620c). The first sample corresponds to a pixel location in the image. Generating the beamforming data includes applying the prediction network to the subset of the normalized ultrasound channel data to produce a first output value of the plurality of output values in the beamforming data, wherein the first output value corresponds to the same pixel location in the image as the first sample.

In an embodiment, the array of acoustic elements includes a first aperture size (e.g., aperture size 206), and the beamforming data is associated with a second aperture size that is larger than the first aperture size. For example, the predictive network is trained using the scheme 800.

In an embodiment, the ultrasound channel data is generated from a first number of ultrasound transmission trigger events and the beamforming data is associated with a second number of ultrasound transmission trigger events that is greater than the first number of ultrasound transmission trigger events. For example, the predictive network is trained using the scheme 1000.

In an embodiment, the ultrasound channel data is associated with a first SNR and the beamforming data is associated with a second SNR greater than the first SNR. For example, the predictive network is trained using scheme 1100.

Aspects of the present disclosure may provide several benefits. For example, beamforming raw RF channel data (e.g., ultrasound echo channel signals 162 and 502) acquired from a probe (e.g., probe 110) using a deep learning network (e.g., deep learning network 540) may provide better ultrasound image quality (e.g., improved resolution, enhanced contrast, and/or reduced sidelobes, clutter, and/or artifacts) and/or reduce image acquisition time or increase imaging frame rate as compared to conventional DAS-based beamformers. Using normalized time-aligned ultrasound echo channel signals as an input to the deep learning network allows the deep learning network to be trained for beamforming or beam summing without having to learn amplitude maps and/or time delay maps, and thus reduces the complexity of the network. In addition, using a deep learning network may provide computational cost advantages over conventional DAS-based beamformers (e.g., beamformer 114) because the operations in the inference stage of the deep learning network are primarily convolution (e.g., multiply-add) and matrix multiplication.

Those skilled in the art will recognize that the apparatus, systems, and methods described above may be modified in various ways. Thus, those of ordinary skill in the art will appreciate that the embodiments encompassed by the present disclosure are not limited to the specific exemplary embodiments described above. In this regard, while illustrative embodiments have been shown and described, a wide variety of modifications, changes, and substitutions are contemplated in the foregoing disclosure. It will be appreciated that such variations may be made to the foregoing without departing from the scope of the present disclosure. Accordingly, it is appropriate that the appended claims be construed broadly and in a manner consistent with the disclosure.
