Virtual teletransmission in a fixed environment

Document No.: 12635    Publication date: 2021-09-17

Note: This technology, "Virtual teletransmission in a fixed environment," was designed and created by B·福鲁坦保尔, S·塔加迪尔施瓦帕, and P·H·阮 on 2019-12-20. The main content is as follows: The technology disclosed herein includes a first device for receiving a communication signal from a second device, the first device comprising one or more processors configured to: receive, in the communication signal, packets representing a virtual image as part of a virtual teletransmission of one or more visual objects embedded in the virtual image. The one or more processors may be configured to: decode the packets representing the virtual image; and output the virtual image at a physical location within the fixed environment. The first device may further include a memory configured to: store the packets representing the virtual image as part of the virtual teletransmission of the one or more visual objects embedded in the virtual image.

1. A first device for receiving a communication signal from a second device, the first device comprising:

one or more processors configured to:

receive, in the communication signal, packets representing a virtual image as part of a virtual teletransmission of one or more visual objects embedded in the virtual image;

decode the packets representing the virtual image; and

output the virtual image at a physical location within a fixed environment; and

a memory configured to: store the packets representing the virtual image as part of the virtual teletransmission of the one or more visual objects embedded in the virtual image.

2. The first device of claim 1, wherein the virtual image at the physical location within the fixed environment is presented on a surface of a screen of a display device.

3. The first device of claim 2, further comprising the display device, wherein the display device is integrated into one of: a headset device, a windshield in the first device, a tablet device in the first device, a window in the first device, a digital rearview mirror in the first device, a table in the first device, and a mobile device in the first device.

4. The first device of claim 1, further comprising: two or more speakers configured to present audio signals.

5. The first device of claim 4, wherein the audio signal is a three-dimensional audio signal spatially located at an image plane of the display device.

6. The first device of claim 4, wherein the audio signal is a three-dimensional audio signal spatially located outside an image plane of the display device.

7. The first device of claim 4, wherein the audio signal is a three-dimensional audio signal comprising one or more audio objects spatially located where the virtual image appears to be physically located within the fixed environment.

8. The first device of claim 4, wherein the audio signal is a three-dimensional audio signal perceived as emanating from a direction from which the virtual image is teletransmitted.

9. The first device of claim 4, wherein the audio signal comprises a sound pattern during, before, or after the teletransmission of the virtual image.

10. The first device of claim 4, wherein the audio signal comprises a sound pattern, and wherein the sound pattern is a first sound pattern during the teletransmission of the virtual image, a second sound pattern before the teletransmission of the virtual image, and a third sound pattern after the teletransmission of the virtual image.

11. The first device of claim 1, wherein the virtual image is used to generate a mask image of one or more objects associated with the second device, wherein the mask image is based on combining the virtual image with an in-cabin image around the physical location at which the virtual image is placed within the fixed environment.

12. The first device of claim 1, further comprising a heads-up display, wherein the heads-up display comprises an optical combiner and a plurality of optical components configured to display the virtual image.

13. The first device of claim 12, wherein the heads-up display is integrated in a windshield of a vehicle and the virtual image is displayed on the windshield.

14. The first device of claim 12, wherein the heads-up display is physically separated from a windshield and the virtual image is displayed in free space in a plane behind the windshield of the vehicle.

15. The first device of claim 1, further comprising a projector, wherein the projector is configured to project the virtual image.

16. The first device of claim 15, wherein the projector is raised or lowered from within a cabin of the vehicle.

17. The first device of claim 1, wherein the virtual image comprises two-dimensional avatar data or three-dimensional avatar data.

18. The first device of claim 1, further comprising a display device configured to present the one or more visual objects embedded in the virtual image as part of a virtual teletransmission.

19. A method for receiving a communication signal at a first device from a second device, the method comprising:

receiving in the communication signal packets representing a virtual image as part of a virtual teletransmission of one or more visual objects embedded in the virtual image;

storing the packets representing the virtual image as part of the virtual teletransmission of one or more visual objects embedded in the virtual image;

decoding the packets representing the virtual image; and

outputting the virtual image at a physical location within a fixed environment.

20. The method of claim 19, wherein the virtual image at the physical location within the fixed environment is presented on a surface of a screen of a display device of the first device.

21. The method of claim 19, further comprising: presenting an audio signal.

22. The method of claim 21, wherein the audio signal is a three-dimensional audio signal and the rendering of the three-dimensional audio signal is performed with at least two or more speakers included in the first device, the three-dimensional audio signal being spatially located at an image plane of a display device in the first device.

23. The method of claim 21, wherein the audio signal is a three-dimensional audio signal, the rendering of the three-dimensional audio signal is performed with at least two or more speakers included in the first device, and the three-dimensional audio signal is spatially located outside of an image plane of a display device in the first device.

24. The method of claim 21, wherein the audio signal is a three-dimensional audio signal, the rendering of the three-dimensional audio signal is performed with at least two or more speakers included in the first device, and the three-dimensional audio signal includes one or more audio objects spatially located at positions within the fixed environment where the virtual image appears to be physically located in the first device.

25. The method of claim 21, wherein the audio signal is a three-dimensional audio signal, the rendering of the three-dimensional audio signal is performed with at least two or more speakers included in the first device, and the three-dimensional audio signal is perceived as emanating from a direction from which the virtual image is teletransmitted.

26. The method of claim 21, wherein the audio signal comprises a sound pattern during the teletransmission of the virtual image, before the teletransmission of the virtual image, or after the teletransmission of the virtual image.

27. The method of claim 21, wherein the audio signal comprises a sound pattern, wherein the sound pattern is a first sound pattern during the teletransmission of the virtual image, a second sound pattern before the teletransmission of the virtual image, and a third sound pattern after the teletransmission of the virtual image.

28. The method of claim 19, wherein the virtual image is used to generate a mask image of one or more objects associated with the second device, wherein the mask image is based on combining the virtual image with an in-cabin image around the physical location at which the virtual image is placed within the fixed environment.

29. An apparatus for receiving a communication signal at a first device from a second device, the apparatus comprising:

means for receiving in the communication signal packets representing a virtual image as part of a virtual teletransmission of one or more visual objects embedded in the virtual image;

means for storing the packets representing the virtual image as part of the virtual teletransmission of one or more visual objects embedded in the virtual image;

means for decoding the packets representing the virtual image; and

means for outputting the virtual image at a physical location within a fixed environment.

30. A non-transitory computer-readable storage medium having instructions stored thereon that, when executed, cause one or more processors of a first device to:

receive, in the communication signal, packets representing a virtual image as part of a virtual teletransmission of one or more visual objects embedded in the virtual image;

store the packets representing the virtual image as part of the virtual teletransmission of the one or more visual objects embedded in the virtual image;

decode the packets representing the virtual image; and

output the virtual image at a physical location within a fixed environment.

Technical Field

The present application relates to virtual teletransmission in a fixed environment.

Background

Wireless communication systems are widely deployed to provide various types of communication content such as voice, video, packet data, messaging, broadcast, and so on. These systems are capable of supporting communication with multiple users by sharing the available system resources (e.g., time, frequency, and power). Examples of such multiple-access systems include Code Division Multiple Access (CDMA) systems, Time Division Multiple Access (TDMA) systems, Frequency Division Multiple Access (FDMA) systems, and Orthogonal Frequency Division Multiple Access (OFDMA) systems (e.g., Long Term Evolution (LTE) systems or New Radio (NR) systems).

A wireless multiple-access communication system may include multiple base stations or access network nodes, each supporting communication for multiple communication devices (which may otherwise be referred to as User Equipment (UE)) simultaneously. Additionally, the wireless communication system may include a support network for vehicle-based communication. For example, vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communications are wireless technologies that enable the exchange of data between a vehicle and its surroundings. V2V and V2I are collectively referred to as vehicle-to-everything (V2X). V2X uses a wireless communication link for fast-moving objects (e.g., vehicles). Cellular V2X (C-V2X) has recently emerged as a term to distinguish cellular-based V2X from WLAN-based V2X.

The 5G Automotive Association (5GAA) has promoted C-V2X. C-V2X was originally defined in LTE release 14 and was designed to operate in several modes: (a) device-to-device (V2V); (b) device-to-cell-tower (V2I); and (c) device-to-network (V2N). In 3GPP release 15, C-V2X includes support for both V2V and legacy cellular-network-based communications, and this functionality is extended to support the 5G air interface standard. The PC5 interface in C-V2X allows direct communication (via a "sidelink channel") between the vehicle and other devices without the use of a base station.

Vehicle-based communication networks may provide always-on telematics, where UEs, such as vehicular UEs (V-UEs), communicate directly with the network (V2N), pedestrian UEs (V2P), infrastructure equipment (V2I), and other V-UEs (e.g., via the network). Vehicle-based communication networks may support a safe, always-connected driving experience by providing intelligent connections, where traffic signals/timing, real-time traffic and routes, pedestrian/cyclist safety alerts, collision avoidance information, etc. are exchanged.

However, such a network supporting vehicle-based communications may also be associated with various requirements (e.g., communication requirements, security and privacy requirements, etc.). Other example requirements may include, but are not limited to, reduced latency requirements, higher reliability requirements, and the like. For example, vehicle-based communication may include transmitting sensor data that may support an autonomous automobile. Sensor data may also be used between vehicles to improve the safety of an autonomous vehicle.

V2X and C-V2X allow for a variety of applications, including those described in this disclosure.

Disclosure of Invention

In general, this disclosure describes techniques related to virtual teletransmission in a fixed environment.

In one example, the present disclosure describes a first device for receiving a communication signal from a second device, the first device comprising one or more processors configured to: receive, in the communication signal, packets representing a virtual image as part of a virtual teletransmission of one or more visual objects embedded in the virtual image. The one or more processors may be configured to: decode the packets representing the virtual image; and output the virtual image at a physical location within a fixed environment. The first device may further include a memory configured to: store the packets representing the virtual image as part of the virtual teletransmission of the one or more visual objects embedded in the virtual image.

In one example, the present disclosure describes a method for receiving a communication signal at a first device from a second device, the method comprising: receiving, in the communication signal, packets representing a virtual image as part of a virtual teletransmission of one or more visual objects embedded in the virtual image; and storing the packets representing the virtual image as part of the virtual teletransmission of the one or more visual objects embedded in the virtual image. The method further comprises: decoding the packets representing the virtual image; and outputting the virtual image at a physical location within a fixed environment.

In one example, the present disclosure describes an apparatus for receiving a communication signal at a first device from a second device, the apparatus comprising: means for receiving, in the communication signal, packets representing a virtual image as part of a virtual teletransmission of one or more visual objects embedded in the virtual image; and means for storing the packets representing the virtual image as part of the virtual teletransmission of the one or more visual objects embedded in the virtual image. The apparatus further comprises: means for decoding the packets representing the virtual image; and means for outputting the virtual image at a physical location within a fixed environment.

In one example, the present disclosure describes a non-transitory computer-readable storage medium having instructions stored thereon that, when executed, cause one or more processors of a first device to: receive, in the communication signal, packets representing a virtual image as part of a virtual teletransmission of one or more visual objects embedded in the virtual image; and store the packets representing the virtual image as part of the virtual teletransmission of the one or more visual objects embedded in the virtual image. The instructions, when executed, may further cause the one or more processors to: decode the packets representing the virtual image; and output the virtual image at a physical location within a fixed environment.

The details of one or more examples of the disclosure are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of various aspects of the technology will be apparent from the description and drawings, and from the claims.

Drawings

Fig. 1a shows a conceptual diagram of a first device communicating with another device (e.g., a second device) based on detection of the selection of the other device.

Fig. 1b shows a conceptual diagram of a first device that may communicate with another device (e.g., a second device) based on detection of the selection (aided by a tracker) of the other device.

Fig. 1c shows a conceptual diagram of a head-up display (HUD) architecture.

Fig. 1d shows a conceptual diagram of projecting a person onto a passenger seat in a vehicle via a wireless connection according to the techniques described in this disclosure.

Fig. 1e illustrates a conceptual diagram of a digital display projecting a virtual rear seat occupant in a vehicle via a wireless connection according to the techniques described in this disclosure.

Fig. 1f shows a conceptual diagram of a display that overlays a passenger in the background of the display operating in accordance with the techniques described in this disclosure.

FIG. 1g illustrates a conceptual diagram of a display system in an autonomous vehicle operating in accordance with the techniques described in this disclosure.

FIG. 1h illustrates a conceptual diagram of a display system in an autonomous vehicle operating in accordance with the techniques described in this disclosure.

Fig. 2 shows a flow diagram of a process by which a first device receives packets representing a virtual image as part of a virtual teletransmission of one or more visual objects embedded in the virtual image, based on the techniques described in this disclosure.

Fig. 3a illustrates a conceptual diagram of a first vehicle with different components on or in the first vehicle operating according to the techniques described in this disclosure.

Fig. 3b illustrates a conceptual diagram of a virtual group talk experience across multiple vehicles operating in accordance with the techniques described in this disclosure.

Fig. 3c illustrates a conceptual diagram of a virtual group experience across different physical entities operating in accordance with the techniques described in this disclosure.

Fig. 4a illustrates a block diagram of a first device with different components on or in the first device operating in accordance with the techniques described in this disclosure.

Fig. 4b illustrates a block diagram of a first device with different components on or in the first device operating in accordance with the techniques described in this disclosure.

Fig. 4c illustrates a flow diagram of operations performed by a first device, with different components on or in the first device, operating in accordance with the techniques described in this disclosure.

FIG. 5 illustrates a conceptual diagram of a transformation of world coordinates to pixel coordinates according to the techniques described in this disclosure.

Fig. 6a shows a conceptual diagram of one embodiment of the estimation of distance and angle of a remote vehicle/passenger (e.g., a second vehicle).

FIG. 6b shows a conceptual diagram of the estimation of the distance and angle of the remote device in the x-y plane.

FIG. 6c shows a conceptual diagram of the estimation of distance and angle of the remote device in the y-z plane.

Fig. 7a illustrates an embodiment of an audio spatializer according to the technique described in this disclosure.

Fig. 7b shows an embodiment of an audio spatializer comprising a decoder for use according to the technique described in this disclosure.

Fig. 8 shows an embodiment in which the position of the person in the first vehicle and the selected (remote) vehicle may be in the same coordinate system.

Detailed Description

Some wireless communication systems may be used to communicate data associated with high reliability and low latency. One non-limiting example of such data includes C-V2X and V2X communications. For example, autonomous automobiles may rely on wireless communication. The autonomous vehicle may include sensors, such as light detection and ranging (LIDAR), radio detection and ranging (RADAR), cameras, etc., as line-of-sight sensors. However, C-V2X and V2X communications may include line-of-sight and non-line-of-sight wireless communications. Currently, C-V2X and V2X communications are examples of using non-line-of-sight wireless communications to handle communications between vehicles that are near a public intersection but not within line of sight of each other. C-V2X and V2X communications may be used to share sensor information between vehicles. This and other communication scenarios give rise to certain considerations. For example, for a particular location or geographic area, there may be several vehicles that sense the same information, such as an obstacle or pedestrian. This raises questions such as: which vehicles should broadcast such information (e.g., sensor data), how such information should be shared (e.g., which channel configuration provides reduced latency and improved reliability), and so forth.

The C-V2X communication system may have logical channels and transport channels. The logical and transport channels may be used as part of uplink and downlink data transmissions between a first device (e.g., a headset or a vehicle) and a base station or another intermediate node in the network. One of ordinary skill in the art will recognize that logical channels may include different types of control channels, e.g., xBCCH, xCCCH, and xDCCH. The xBCCH type channel may be used when the first device is downloading broadcast system control information from another entity, such as a server or a base station. The xCCCH control channel may be used to transmit control information between a first device (e.g., a vehicle, mobile device, or headset) and a network (e.g., a node in the network, such as a base station). The xCCCH control channel may be used when the first device (e.g., vehicle, mobile device, or headset) does not have a radio resource control connection with the network. The xDCCH control channel carries control information between the first device and the network. The xDCCH control channel is used by a first device having a radio resource control connection with the network. The xDCCH is also bidirectional, i.e., control information may be sent and received by both the first device and the network.
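
As a rough, non-authoritative sketch of the channel-usage rules just described (the function and flag names below are illustrative and not drawn from any standard API), a device might select a control channel type as follows:

    from enum import Enum

    class ControlChannel(Enum):
        XBCCH = "broadcast system control information"
        XCCCH = "control information without a radio resource control connection"
        XDCCH = "bidirectional control information with a radio resource control connection"

    def select_control_channel(is_broadcast: bool, has_rrc_connection: bool) -> ControlChannel:
        # Broadcast system information is carried on xBCCH; otherwise the choice depends
        # on whether a radio resource control (RRC) connection with the network exists.
        if is_broadcast:
            return ControlChannel.XBCCH
        return ControlChannel.XDCCH if has_rrc_connection else ControlChannel.XCCCH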

In general, some information bits conveyed in the different types of control channels mentioned above may provide an indication of the location of the data channel (or resource). Since data may span several subcarriers (depending on the amount of data transmitted), and the size of the control channel is currently fixed, this may introduce transients or gaps in time/frequency between the control channel and the corresponding data channel. This results in unused frequency/time resources of the control channel. It is possible to use these unused frequency/time resources for other purposes, such as transferring media between vehicles or between devices. It is also possible to create new channels in the V2X or C-V2X systems specifically for exchanging media between vehicles or between devices, such as for the virtual teletransmission of visual and audio objects.

Virtual teletransmission describes the real-time transmission of a representation of a real-world object (e.g., a person) from another device to a first device. The representation may be a visual representation (such as a video image of a real-world object located in or near the other device, captured in real time by one or more cameras) or avatar data obtained by scanning the real-world object with a three-dimensional scanner in or near the other device. The representation may also be audio data. The audio data may also be captured in real time by one or more microphones in or near the other device. The audio data may be processed and a location of an audio source associated with the real-world object may be determined. The audio source may be human speech, in which case the audio source is determined to be a single audio object. If there are multiple people, there may be multiple audio sources, and thus multiple audio objects. Additionally, one or more microphones located in or near the other device may capture other audio sources, such as music, road noise, or loud voices outside the vehicle or fixed environment. In such a case, the audio data may include the positions of a plurality of audio objects.
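
One possible way to picture the payload of such a virtual teletransmission is sketched below in Python. The class and field names are hypothetical and serve only to illustrate that a payload may carry a visual representation (video frames or avatar data) together with one or more audio objects, each associated with a spatial position:

    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class AudioObject:
        pcm_samples: bytes        # audio captured in real time by one or more microphones
        position_xyz: tuple       # estimated location of the audio source (e.g., a talker)

    @dataclass
    class TeleportPayload:
        video_frames: Optional[bytes] = None  # real-time camera capture of the real-world object
        avatar_data: Optional[bytes] = None   # two- or three-dimensional avatar data from a scanner
        audio_objects: List[AudioObject] = field(default_factory=list)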

As mentioned above, vehicles are using many advances from other areas to improve their safety, infotainment systems, and overall user experience.

For example, object detection algorithms incorporating sensors such as RADAR, LIDAR or computer vision may be used in a vehicle to perform object detection while traveling. These objects may include road lanes, stop signs, other vehicles, or pedestrians. Some of the V2X and C-V2X use cases contemplate a collaborative V2X system to alert a vehicle or a driver of the vehicle when a possible collision may exist between the vehicle and another object (e.g., a car, a bicycle, or a person). Due to the relatively nascent nature of the V2X and C-V2X systems, many refinements have not been envisaged.

One area for refinement is communication between people in different vehicles or between people in different stationary environments. For example, the cabin of a vehicle is a fixed environment, i.e., structures such as the positions of the seats, the dashboard, etc., are mostly static. Another example of a fixed environment is a home, an office, or a classroom, where there may be chairs, sofas, or other furniture. Although it is possible for one person in a fixed environment to communicate with another person in a different fixed environment, the communication is typically accomplished by making a telephone call. The originator of the call knows which phone number to dial to reach the other person and then dials that number.

The present disclosure contemplates refining the manner in which a first device (e.g., a vehicle) receives a communication signal from a second device (e.g., another vehicle or a headset device). The first device may include one or more processors configured to receive, in a communication signal, packets representing a virtual image as part of a virtual teletransmission of one or more visual objects embedded in the virtual image. The one or more processors may be configured to decode the packets representing the virtual image. Additionally, the one or more processors may be configured to output the virtual image at a physical location within the fixed environment.
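
A minimal, self-contained Python sketch of this receive/store/decode/output sequence is shown below. The class, method names, and the trivial "decoder" are illustrative assumptions; an actual implementation would use a real video or avatar codec and a rendering pipeline:

    from typing import List

    class FirstDevice:
        def __init__(self):
            self.memory: List[bytes] = []      # memory configured to store the packets

        def receive(self, packets: List[bytes]) -> bytes:
            self.memory.extend(packets)        # store the packets representing the virtual image
            return self.decode(packets)

        def decode(self, packets: List[bytes]) -> bytes:
            # Placeholder "decoder": a real system would run a video or avatar codec here.
            return b"".join(packets)

        def output(self, virtual_image: bytes, physical_location: str) -> None:
            # Placeholder for rendering the decoded virtual image at a physical location
            # in the fixed environment, e.g., a passenger seat or a heads-up display.
            print(f"rendering {len(virtual_image)} bytes at {physical_location}")

    device = FirstDevice()
    image = device.receive([b"pkt0", b"pkt1"])
    device.output(image, physical_location="passenger seat")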

The first device may comprise two or more speakers configured to render a three-dimensional audio signal, wherein the three-dimensional audio signal comprises one or more audio objects spatially located at the position where the virtual image of the teletransmitted object appears to be physically located within the fixed environment. For example, the virtual image of the teletransmitted object may appear to be projected at a physical location within the vehicle. Alternatively, the virtual image of the teletransmitted object may appear on a retinal projector, on a display surface of the vehicle (e.g., a table, windshield, display device, or mirror), or on a display surface of a headset (e.g., an HMD, XR, AR, or VR device).

In addition, the three-dimensional audio signal may be perceived as emanating from the direction from which the virtual image is teletransmitted. Further, the audio signal may include a sound pattern during, before, or after the teletransmission of the virtual image. The sound pattern may comprise a tone or may be a pre-recorded sound. For example, today's cellular telephones have ring tones associated with contacts. Although the sound pattern may be a ring tone or some other sound pattern, no ring tone currently occurs during, before, or after the teletransmission of a virtual image.

In addition, ring tones or other sound patterns are not currently emitted from the direction from which a virtual image is teletransmitted. For example, prior to the teletransmission of the virtual image, there may be a first sound pattern that serves as an indication that teletransmission is imminent. The first sound pattern may be a ring tone or some other sound pattern; however, the first sound pattern may also be a sound perceived as emanating from the direction from which the virtual image is about to be teletransmitted. In another example, the sound pattern need not be three-dimensional, nor does it need to be perceived as if it emanated from the direction in which the virtual image is about to be teletransmitted, is being teletransmitted, or has just been teletransmitted. The sound pattern may simply indicate that a teletransmission is imminent, that a teletransmission is occurring, or that a teletransmission has just occurred.

As one example, consider a virtual image that is teletransmitted into the vehicle and visually appears to be "coming in" from the right. The sound pattern may likewise be emitted from the right side.

In addition, there may be separate sound patterns different from the sound pattern heard before the teletransmission of the virtual image. For example, these separate sound patterns may include a second sound pattern that occurs while the teletransmission of the virtual image is occurring. In addition, there may be a third sound pattern that occurs after the teletransmission of the virtual image has occurred.
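
A simple Python sketch of associating a distinct sound pattern with each phase of the teletransmission (before, during, and after) is shown below; the phase names and sound file names are hypothetical placeholders:

    from enum import Enum

    class TeleportPhase(Enum):
        BEFORE = "teletransmission imminent"
        DURING = "teletransmission in progress"
        AFTER = "teletransmission complete"

    # Hypothetical sound files: a first, second, and third sound pattern.
    SOUND_PATTERNS = {
        TeleportPhase.BEFORE: "chime_incoming.wav",
        TeleportPhase.DURING: "whoosh.wav",
        TeleportPhase.AFTER: "chime_arrived.wav",
    }

    def sound_for(phase: TeleportPhase) -> str:
        return SOUND_PATTERNS[phase]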

Additional techniques and background are described herein with reference to the figures.

Fig. 1a shows a conceptual diagram of a first device that may communicate with another device (e.g., a second device). The conceptual diagram also includes detecting, within the first device, a selection of another device. For example, the first device may be the first vehicle 303a capable of communicating with the second device via a V2X or C-V2X communication system. First vehicle 303a may include different components or a person 111, as shown in the top circle 103. The person 111 may be driving or, if the first vehicle 303a is autonomous, the person 111 may not be driving. Person 111 may see other vehicles traveling on the road through mirror 127 or window 132 of first vehicle 303a and may wish to hear the type of music being played on a radio in another vehicle. In some configurations of the first vehicle 303a, the camera 124 of the first vehicle 303a may assist the person 111 in seeing other vehicles, which may be challenging to see through the mirror 127 or the window 132.

The person 111 may select at least one target object that is outside the vehicle, or that is outside the headset if the person 111 is wearing the headset. The target object may be the vehicle itself, i.e. the second vehicle may be the target object. Alternatively, the target object may be another person. This selection may be the result of an image detection algorithm that may be encoded in instructions executed by a processor in the first vehicle. The image detection algorithm may be assisted by an external camera mounted on the first vehicle. The image detection algorithm may detect different types of vehicles, or may detect only faces.

Additionally or alternatively, person 111 may speak a descriptor to identify the target vehicle. For example, if the second vehicle is a black Honda Accord, the person may say "Honda Accord," "black Honda Accord in front of me," "Accord to the left of me," etc., and a voice recognition algorithm may be encoded in instructions that are executed on a processor in the first vehicle to detect and/or identify phrases or keywords (e.g., the make and model of the car). Thus, the first device may select at least one target object based on detecting a command signal via keyword detection.
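
As an illustration of this kind of keyword-based selection (the data structure, field names, and example IP address below are hypothetical, and a real system would run an actual speech-recognition engine before this step), a recognized phrase might be matched against the make and model of nearby vehicles as follows:

    from typing import Optional

    def select_target_vehicle(recognized_phrase: str, nearby_vehicles: list) -> Optional[dict]:
        # Match the spoken descriptor against the make and model of nearby vehicles.
        phrase = recognized_phrase.lower()
        for vehicle in nearby_vehicles:
            keywords = (vehicle["make"].lower(), vehicle["model"].lower())
            if all(word in phrase for word in keywords):
                return vehicle
        return None

    nearby = [{"make": "Honda", "model": "Accord", "ip": "203.0.113.7"}]
    target = select_target_vehicle("the black Honda Accord in front of me", nearby)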

The processor executing the instructions for the image detection algorithm may not necessarily be the same processor executing the instructions for the speech recognition algorithm. If the processors are not the same, the processors may operate independently or in a coordinated manner, e.g., to assist in image or speech recognition by another processor. The one or more processors (which may include the same processor or different processors used in image detection or speech recognition) may be configured to detect selection of the at least one target object by the first device. That is, one or more processors may be used to detect which target object (e.g., a face, another vehicle, or a headset) is selected. The selection may initiate communication with a second device (another vehicle or headset). In some cases, a communication channel between the first device and the second device may have already been established. In some cases, the image detection algorithm may also incorporate aspects of image recognition, such as detecting vehicles and detecting a "Honda Accord." For simplicity, in the present disclosure, unless explicitly stated otherwise, the image detection algorithm may include an image recognition aspect.

As mentioned above, when two people wish to communicate and speak with each other, one calls the other by dialing a telephone number. Alternatively, two devices may be wirelessly connected to each other, and each device may register an Internet Protocol (IP) address of the other device if both devices are connected to a communication network. In fig. 1a, communication between a first device and a second device may also be established by respective IP addresses of each of these devices in a V2X, C-V2X communication network or a network with the capability of directly connecting the two devices without using a base station, for example. However, unlike instant messaging, chat, or email, communication between a first device and a second device is initiated based on a selection of a target object associated with the second device or directly based on a selection of the second device itself.

For example, the person 111 in vehicle 303a may see a second vehicle 303b or a different second vehicle 303c, and may wish to initiate communication with a person in one of these vehicles based on: image detection, image recognition or voice recognition of the vehicle.

After selecting the target object, the one or more processors in the first device may be configured to initiate communication, including based on the IP address. In the case where person 111 is the driver of the first vehicle, it is not safe to initiate messaging, email, or chat by hand through a dialog window. However, audio user interfaces for hands-free speaking are becoming increasingly popular, and in the system shown in fig. 1a it is possible to initiate communication between two devices and speak to another person based on a V2X or C-V2X communication system. The vehicles may communicate using V2V communications or using the sidelink channel of C-V2X. One advantage of the C-V2X system is that the vehicles can send communication signals between the vehicles regardless of whether the vehicles are connected to a cellular network.

It is also possible that when the vehicles are wirelessly connected to the cellular network, the vehicles communicate using V2V or C-V2X communications or a sidelink channel.

It is possible to include other data in the sidelink channel. For example, audio packets received as part of a virtual teletransmission may be received via a sidelink channel. In the case where the person 111 is not driving (either because the vehicle itself is driving or because the person 111 is a passenger), it is also possible to send instant messages between the devices in the sidelink channel. The instant message may be part of a media exchange (which may include audio packets) between the first device and the second device.

Also shown in the top circle 103 is a display device 119. The display device 119 may represent an image or icon of the vehicle. The pattern 133 may light up or may blink when a communication is initiated or during a communication between the first vehicle 303a and a second vehicle (e.g., 303b or 303c).

Further, after selecting the target object, audio packets may be received from the second device over a communication channel associated with the at least one target object external to the first device and the second device. For example, the lower circle 163 includes a processor 167, and the processor 167 may be configured to: decode an audio packet received from the second device to generate an audio signal; and output the audio signal based on the selection of the at least one target object external to the first device. That is, voice or music being played in the second vehicle (or the headset device) can be heard through playback on the speaker 169.

As explained later in this disclosure, other selection modes are possible, including gesture detection of person 111 and eye gaze detection of person 111.

Fig. 1b illustrates a conceptual diagram of a first device that may communicate with another device (e.g., a second device). The conceptual diagram also includes detecting a selection of another device within the first device assisted by the tracker.

FIG. 1b has a description similar to that associated with FIG. 1a, with the addition of other elements. For example, the top circle 104 does not show the display device 119 because the display device 119 is shown in the lower circle 129. The top circle 104 shows the vehicles outside the window 132, the mirror 127, and the interior camera 124 (which may function as described with respect to fig. 1a).

The lower circle 129 shows the display device 119. In addition to icons or images that merely represent a vehicle, the display device may also represent images of the actual vehicles that may potentially be selected by the person 111 in the first vehicle 303a. For example, images of the vehicles captured by one or more external cameras (e.g., 310b in fig. 3, 402 in fig. 4) are represented on the display device 119. The images of the vehicles may have bounding boxes 137a-137d that enclose each of the images of the vehicles. The bounding boxes may facilitate selection of a target object, e.g., one of the vehicles represented on the display device. In addition, instead of the pattern 133 between the icon and the image of the vehicle, there may be a separate pattern 149 oriented from the perspective of the person 111 toward the selected second vehicle. Thus, the bounding box 137d may show the selected second vehicle 303b, and the separate pattern 149 in that direction may light up or may blink to indicate that communication with the second vehicle 303b has been initiated or is occurring.

Further, the processor may include a tracker 151 and a feature extractor (not shown) that may perform feature extraction on the images on the display device 119. The extracted features, alone or in some configurations in combination with RADAR/LIDAR sensors, may assist in estimating the relative position of the selected vehicle (e.g., 303b). In other configurations, the tracker 151 may assist with, or operate only on, input from the GPS location of the selected vehicle, which may also be transmitted to the first vehicle 303a over a communication channel in a V2X or C-V2X system.

For example, the second vehicle 303b or another second vehicle 303c may not be visible to the camera. In such a scenario, vehicles 303b and 303c may each have a GPS receiver that detects the location of that vehicle. The location of each vehicle may be received by the first device (e.g., vehicle 303a) via assisted GPS or, if permitted by the V2X or C-V2X systems, directly through the V2X or C-V2X systems. The received location of each vehicle may be represented by GPS coordinates determined by one or more GPS satellites 160, either alone or in conjunction with a base station (as used, for example, in assisted GPS). The first device may calculate its own position relative to the other vehicles (vehicles 303b and 303c) based on knowing its own GPS coordinates via its own GPS receiver. Additionally or alternatively, the first device may calculate its own position based on the use of a RADAR sensor, a LIDAR sensor, or a camera coupled to the first device. It should be understood that the calculation may also be referred to as an estimate. Thus, the first device may estimate its own location based on a RADAR sensor, a LIDAR sensor, or a camera coupled to the first device, or based on received GPS coordinates. In addition, each vehicle or device may know its own location by using assisted GPS, i.e., having a base station or other intermediate structure receive GPS coordinates and relay them to each vehicle or device.
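
The following Python sketch, which is illustrative only and uses a simple flat-earth (equirectangular) approximation rather than any particular positioning library, shows how the GPS coordinates of the first device and a second device could be converted into a relative distance and azimuth:

    import math

    def relative_position(own_lat, own_lon, other_lat, other_lon):
        # Convert latitude/longitude differences to metres using an equirectangular
        # (flat-earth) approximation, which is adequate over short V2X distances.
        earth_radius_m = 6371000.0
        d_lat = math.radians(other_lat - own_lat)
        d_lon = math.radians(other_lon - own_lon)
        north_m = d_lat * earth_radius_m
        east_m = d_lon * earth_radius_m * math.cos(math.radians(own_lat))
        distance_m = math.hypot(north_m, east_m)
        azimuth_deg = math.degrees(math.atan2(east_m, north_m))  # 0 = north, 90 = east
        return distance_m, azimuth_deg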

Further, the display device 119 may represent an image of the second device at its position relative to the first device. That is, the outward-facing camera 310b or 402, in cooperation with the display device 119, may represent the second device at its position relative to the first device. Thus, the display device 119 may be configured to represent the relative position of the second device. Additionally, the relative position of the second device may be represented on the display device 119 as an image of the second device.

In addition, the audio engine 155, which may be integrated into one or more processors, may process the decoded audio packets based on the relative location of the devices. The audio engine 155 may be part of an audio spatializer, which may be integrated as part of the processor, and the audio engine 155 may output the audio signal as a three-dimensional spatialized audio signal based on the relative position of the second device as represented on the display device 119.
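
One very simplified way to picture how a relative position can drive spatialized output is constant-power stereo panning, sketched below in Python under assumed names. A full audio spatializer may use binaural or multi-speaker rendering techniques; the sketch only illustrates how a relative azimuth can drive left/right gains:

    import math

    def pan_by_azimuth(mono_frame, azimuth_deg):
        # Map azimuth (-90 = hard left, +90 = hard right) to a pan angle in [0, pi/2]
        # and apply constant-power gains to produce a left and a right channel.
        clamped = max(-90.0, min(90.0, azimuth_deg))
        pan = (clamped + 90.0) / 180.0 * (math.pi / 2.0)
        left_gain, right_gain = math.cos(pan), math.sin(pan)
        return ([s * left_gain for s in mono_frame],
                [s * right_gain for s in mono_frame])

    left, right = pan_by_azimuth([0.1, 0.2, -0.1], azimuth_deg=30.0)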

As discussed above, the relative position may also be based on a GPS receiver, which may be coupled to the tracker 151 and may be integrated with one or more processors, and the first device may perform assisted GPS to determine the relative position of the second device. The audio engine 155 (which may be part of an audio spatializer, which may be integrated as part of the processor) may output the audio signal as a three-dimensional spatialized audio signal based on the relative position of the second device 161 determined by assisted GPS.

Further, in some configurations, the outward-facing cameras 310b and 402 may capture devices or vehicles in front of or behind the first vehicle 303a. In such a scenario, it may be desirable for sounds emanating from vehicles or devices behind the first vehicle 303a (or behind the person wearing the headset, if the first device is a headset) to have a different spatial resolution than sounds from vehicles or devices in front of the first vehicle 303a. Thus, when the second device is located at a first position relative to the first device (e.g., in front of the first device), the three-dimensional spatialized audio signal is output at a different spatial resolution than when the second device is located at a second position relative to the first device (e.g., behind the first device).

Additionally, while the relative position of at least one target object (e.g., a second device or a second vehicle) external to the first device is being tracked, the one or more processors may be configured to: receive an updated estimate of the relative position of the at least one target object external to the first device. Based on the updated estimate, a three-dimensional spatialized audio signal may be output. Thus, the first device may render the three-dimensional spatialized audio signal through the loudspeaker 157. A person in the first vehicle 303a, or a person wearing headphones, may hear sound received from the second device (e.g., vehicle 303c at the front right of the first device) as if the audio came from the front right. If the first device is vehicle 303a, the front right is with respect to a potential driver of vehicle 303a looking outward from window 132 as if he or she were driving vehicle 303a. If the first device is a headset, the front right is with respect to the straight-ahead view of the person wearing the headset.

In some scenarios, it is possible for audio engine 155 to receive multiple audio streams, i.e., audio/voice packets from multiple devices or vehicles. That is, there may be multiple target objects that are selected. The plurality of target objects external to the first device may be vehicles, headsets, or a combination of headsets and vehicles. In such scenarios where multiple target objects are present, the speakers 157 may be configured to render three-dimensional spatialized audio signals based on the relative position of each of the multiple vehicles (e.g., 303b and 303c) or devices (e.g., headsets). It is also possible that the audio streams are mixed into one auditory channel and heard together as a multiparty conversation with at least one person in each of the other vehicles (e.g., 303b and 303c).
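
A minimal Python sketch of mixing several decoded audio streams into a single auditory channel, as described above, is shown below; the equal-length float-sample representation is an assumption made for brevity:

    def mix_streams(streams):
        # streams: a list of equal-length lists of float samples in [-1.0, 1.0],
        # one stream per remote talker; the result is a single mixed channel.
        if not streams:
            return []
        length = min(len(s) for s in streams)
        return [sum(s[i] for s in streams) / len(streams) for i in range(length)]

    mixed = mix_streams([[0.1, 0.2, 0.3], [0.0, -0.1, 0.1]])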

In some configurations, audio/voice packets may be received from each of a plurality of vehicles in a separate communication channel. That is, the first vehicle 303a may receive audio/voice packets from the second vehicle 303b in one communication channel and may also receive audio/voice packets from a different second vehicle 303c in a different communication channel. The audio/voice packets (referred to as audio packets for simplicity) may represent speech spoken by at least one person in each of these other vehicles.

In such a scenario, a passenger in the first vehicle 303a or a person wearing a headset may select two target objects using the techniques presented throughout the remainder of this disclosure. For example, the person 111 in the first vehicle 303a may click in an area on the display device 119 enclosed by the bounding boxes 137a-137d to select at least two vehicles (e.g., 303b and 303c) with which to have multiparty communication. Alternatively, person 111 may use speech recognition to select at least two vehicles (e.g., 303b and 303c) with which to have multiparty communication.

In some configurations, the one or more processors may be configured to authenticate the person in, or each of, the other vehicles to facilitate a trusted multiparty conversation between at least one person in the other vehicles (e.g., 303b and 303c) and the person 111 in the first vehicle 303a. Authentication may be based on voice recognition if a person chooses to store the other party's voice sample in their vehicle. Other authentication methods may involve facial or image recognition of the people or vehicles in a multi-party conversation.

Fig. 1c shows a conceptual diagram of a head-up display (HUD) architecture in a vehicle. The vehicle, which may be an example of a first device, may include a display device (e.g., digital display 176) for a person 111 in the vehicle to view objects teletransmitted into the fixed environment. The display device (e.g., digital display 176) may be a heads-up display as shown in FIG. 1c. In a fixed environment, the display device may be within a variable distance of the physical location where the virtual image appears to be teletransmitted. For example, if a virtual image of a teletransmitted object appears to be teletransmitted onto a passenger seat of a vehicle, and there is a projector located near the passenger seat, the variable distance between the projection of the virtual image and the projector may be within 60 centimeters. For example, the projector may be located on the roof of the vehicle, and the projection may be located near the passenger seat or rear seat. The projection may also be on the windshield 182 of the vehicle. The projection surface carrying the virtual image 184 may be within 60 centimeters of the projector. For example, the vehicle may have multiple windshields (each at a different distance from the projector). Thus, for a windshield 182 in front of the driver, the distance may be within 60 centimeters. However, if there is a windshield closer to the rear seats (e.g., a rear windshield (not drawn) or a rear side windshield), the distance between the projector and that windshield may be larger, e.g., within 90 centimeters. In a case where the cabin of the vehicle is larger than that of a car (e.g., a minivan or bus), the projected image may be within 120 cm of the projector. In this example, the HUD is located at a physical location within the cabin of the fixed environment (i.e., the vehicle). However, the virtual image 184 may be projected outside of the cabin of the vehicle. Although, technically speaking, the projection may be outside the cabin of the vehicle, the projection of the virtual image 184 is still part of the fixed environment, as the HUD, mirrors, and windshield 182 are all part of the fixed environment. The virtual image 184 may move with the vehicle. The HUD may include an optical combiner and different optical components configured to display a virtual image. Further, the HUD may be integrated into the windshield 182 of the vehicle, and the virtual image 184 may be displayed on the windshield (in an alternative configuration). In one example, the HUD may be physically separated from the windshield 182, and the virtual image 184 is displayed in free space in a plane behind the windshield 182 of the vehicle, as mentioned above. The virtual image 184 may comprise two-dimensional avatar data or three-dimensional avatar data. Further, the vehicle may include two or more speakers configured to present audio signals associated with the virtual image.

As shown in fig. 1c, the HUD may include an optical combiner and a display system having different optical components, such as a fold mirror 178 and an aspherical mirror 180. Additionally, the windshield 182 may be a combiner.

As previously described, the virtual image may be coupled to a two-dimensional audio signal or a three-dimensional audio signal. The two-dimensional audio signal or the three-dimensional audio signal may comprise one or more audio objects that appear to be spatially located at positions where the virtual image appears to be physically located within the fixed environment.

In addition, the three-dimensional audio signal may be perceived as emanating from the direction from which the virtual image is teletransmitted. For example, as further explained with respect to fig. 4a, the driver of the first vehicle or the first device may select a target object, i.e., a real-world object located at a distance and angle away from the first vehicle or the first device. When virtual teletransmission is imminent, the audio signal may include a sound pattern prior to the teletransmission of the virtual image that appears to emanate from the distance and angle of the selected target object. The sound pattern prior to the teletransmission of the teletransmitted object may comprise a tone or may be a pre-recorded sound.

Fig. 1d shows a conceptual diagram of a person projected onto a passenger seat in a vehicle via a wireless connection according to the techniques described in this disclosure. In various embodiments, the vehicle may include a projector. As shown in fig. 1d, the projector may be configured to project a virtual image. The projected virtual image may be projected onto a semi-transparent projection screen or display 186. In another embodiment, projector illumination may be utilized. In fig. 1d, the virtual image is an image of a person who appears to be a passenger. The virtual passenger may be a different passenger or driver in another vehicle or other fixed environment (e.g., a school, office, or home). The projection screen or display 186 may be raised and lowered from within the vehicle cabin. The virtual image (i.e., the virtual passenger) may include two-dimensional avatar data or three-dimensional avatar data. When the virtual passenger speaks, it sounds as if the passenger were in the passenger seat where the virtual image is located. That is, the virtual passenger may be coupled to a two-dimensional audio signal or a three-dimensional audio signal. The two-dimensional audio signal or the three-dimensional audio signal may comprise one or more audio objects (e.g., human voices) spatially located at positions where the virtual image appears to be physically located within the fixed environment. In some systems, the projector or the projection screen or display 186 may block sound waves or, due to some other technical limitation, sound may not be able to emanate from the location where the virtual image appears to be physically located within the fixed environment. Thus, to overcome such technical limitations, in different embodiments, one or more audio objects may be spatially located at a position that is different from the position at which the virtual image appears to be physically located within the fixed environment.

Further, the three-dimensional audio signal may be perceived as emanating from the direction from which the virtual image is teletransmitted. Further, the audio signal may include a sound pattern during, before, or after the teletransmission of the virtual image. The sound pattern prior to the teletransmission of the teletransmitted object may comprise a tone or may be a pre-recorded sound.

Fig. 1e illustrates a conceptual diagram of a digital display projecting a virtual rear seat occupant in a vehicle via a wireless connection according to the techniques described in this disclosure. In a vehicle, there may be a digital rearview mirror (e.g., digital display 189 acting as a mirror). The rearview mirror may be configured to display a virtual image 187 (e.g., of a virtual rear seat occupant). Additionally, the vehicle may include two or more speakers configured to present a three-dimensional audio signal spatially located at an image plane of the digital display 189 (e.g., the rearview mirror 189). The image plane of the digital rearview mirror may include a reflection of the virtual image 187.

When virtual teletransmission is imminent, the audio signal may include a sound pattern prior to the teletransmission of the virtual image that appears to emanate from the distance and angle of the selected target object. The sound pattern prior to the teletransmission of the teletransmitted object may comprise a tone or may be a pre-recorded sound.

Furthermore, after the virtual teletransmission of the virtual image, i.e., once the virtual image of the passenger is located in the fixed environment (e.g., on the rear seat of the vehicle), the voice of the virtual passenger may appear to emanate from the rear seat. The virtual rear seat passenger may be perceived as sounding as if they were on the rear seat of the vehicle, even though the real-world target object is in another location and the sound pattern appeared to come from that direction before the virtual teletransmission. Additionally, in various embodiments, even if a virtual rear seat passenger visually appears on the rear seat via the digital display 189, their voice may appear to sound as if it were emanating from the location where the digital display 189 is located.

The virtual image may be of a different passenger or driver in another vehicle or other fixed environment (e.g., a school, office, or home). The virtual image (i.e., the virtual passenger) may include two-dimensional avatar data or three-dimensional avatar data. When the virtual passenger speaks, it sounds as if the passenger were in the seat where the virtual image is located. That is, the virtual passenger may be coupled to a two-dimensional audio signal or a three-dimensional audio signal. The two-dimensional audio signal or the three-dimensional audio signal may comprise one or more audio objects (e.g., human speech) spatially located at positions where the virtual image is perceived to be physically located within the fixed environment.

A different passenger or driver in another vehicle or fixed environment may have a camera 188 (e.g., similar to the camera shown in fig. 1e, but in another vehicle, not the camera 188 shown in the first vehicle) to capture video images in real-time. Similarly, another vehicle or other fixed environment may have a three-dimensional scanner (not shown) in or near another device to capture avatar data in real-time that may be received by the first device. The receipt of real-time avatar data or video images is further discussed with respect to fig. 4b and 4 c. Additionally, there may be one or more microphones in or near another device that capture real-time audio of the virtual passenger. The reception of real-time audio is further discussed with respect to other figures, including at least fig. 4a, 4b and 4 c.

Fig. 1f shows a conceptual diagram of a display that overlays a passenger 192 on the background of the display, operating in accordance with the techniques described in this disclosure. The projection of the virtual image on the digital displays 191a, 191b may also be seen when the driver or another passenger wears the headset device 195. For example, the headset device 195 may be a head mounted display (HMD), or augmented reality (AR), mixed reality (XR), or virtual reality (VR) glasses. The projected passenger 192 may be projected onto a digital display 191a integrated in the headset device 195. In alternative embodiments, the digital display 191b may be part of a surface within the vehicle (e.g., a table or a window), or may be part of another mobile device (e.g., a tablet device, a smartphone, or a standalone display device). The projected passenger 192 may be projected onto the digital display 191b. For example, using augmented reality technology, the projection of the passenger 192 on the display 191b may be assisted by one or more cameras 193 coupled to the headset device 195. A virtual image of a teleported object may appear to be physically located within the fixed environment. For example, the virtual image of the teleported object may appear to be projected at a physical location within the vehicle (such as a table) or on another surface in the vehicle (such as display 191a). Alternatively, the virtual image of the teleported object may appear on a display surface of the headset device 195 (e.g., an HMD, XR, AR, or VR device). In such a case, with the display surface located on the headset device 195, the virtual image of the teleported object may be within less than 2 centimeters of the location of the projection device.

The virtual image may represent a passenger or driver in another vehicle or other fixed environment (e.g., a school, office, or home). The virtual image (i.e., the virtual passenger) may include two-dimensional avatar data or three-dimensional avatar data. When the virtual passenger speaks, the voice sounds as if it comes from where the virtual passenger 192 is projected, whether on the digital display 191a (i.e., on-screen) of the headset device 195 or on the digital display 191b as viewed through a camera 193 coupled to the headset device 195. That is, the virtual passenger 192 may be coupled to a two-dimensional audio signal or a three-dimensional audio signal. The two-dimensional audio signal or the three-dimensional audio signal may include one or more audio objects (e.g., human voices) spatially located where the virtual image appears relative to the digital display 191a on the headset device 195 or the screen of the digital display 191b coupled to the headset device 195. Speakers generating the two-dimensional or three-dimensional audio signals may be mounted and integrated into the headset device 195.

In addition, the three-dimensional audio signal may be perceived as emanating from the direction from which the virtual image is teletransmitted. Further, the audio signal may include a sound pattern during, before, or after the teletransmission of the virtual image.

For example, when the virtual teletransmission is imminent (i.e., just prior to the teletransmission of the virtual image), the audio signal may include a sound pattern that is rendered at the distance and angle of the selected target object. The sound pattern prior to the teletransmission of the teleported object may include tones or a pre-recorded sound.

FIG. 1g illustrates a conceptual diagram of a display system in an autonomous vehicle operating in accordance with the techniques described in this disclosure.

The cabin of the autonomous vehicle 50 may include a display device and a user interface unit 56. The display device may represent any type of passive reflective screen on which an image may be projected, or an active reflective, emissive, or transmissive display capable of presenting an image, such as a light emitting diode (LED) display, an organic LED (OLED) display, a liquid crystal display (LCD), or any other type of active display.

As shown, the display device may be integrated into a window 52 of a vehicle. Although shown as including a single display device (e.g., a single window), the autonomous vehicle may include multiple displays that may be positioned throughout the cabin of the autonomous vehicle 50.

In some examples, a passive version of the display device or some type of active version of the display device (e.g., an OLED display) may be integrated into a seat, a table, a headliner, a floor, a window (or into an interior wall in a vehicle with no or few windows), or other aspects of a cabin of an autonomous vehicle.

To determine the location of the projected virtual passenger 40 within the cabin of the autonomous vehicle 50, there may be a preconfigured cabin context that defines the geometry of the cabin of the autonomous vehicle 50 and specifies the location of the display device onto which the virtual passenger is projected. For example, as shown, the display device may be integrated into a vehicle window 52. However, the display device may be integrated into the seats 54A-54D, the user interface unit 56, the dashboard 58, the console 60, the cabin floor 62, or as part of the overhead projector 64. There may be a camera 66 coupled to the overhead projector 64 that may assist in identifying the location of a person within the cabin of the autonomous vehicle, and in identifying one or more display surfaces onto which one or more virtual passengers 40 may be projected.
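The preconfigured cabin context described above can be thought of as a small data structure. The sketch below, with hypothetical class and field names, is one possible way to represent cabin geometry, seat positions, and candidate display surfaces; the disclosure does not prescribe any particular representation.

```python
from dataclasses import dataclass, field

@dataclass
class DisplaySurface:
    name: str          # e.g., "window_52", "table", "headliner"
    center_xyz: tuple  # location in cabin coordinates (meters)
    width_m: float
    height_m: float

@dataclass
class CabinContext:
    cabin_dimensions_m: tuple                  # (length, width, height)
    seats: dict = field(default_factory=dict)  # seat id -> (x, y, z)
    displays: list = field(default_factory=list)

    def surface_for_seat(self, seat_id: str) -> DisplaySurface:
        """Pick the display surface closest to a given seat (simple heuristic)."""
        sx, sy, sz = self.seats[seat_id]
        return min(
            self.displays,
            key=lambda d: sum((a - b) ** 2 for a, b in zip(d.center_xyz, (sx, sy, sz))),
        )

# Example cabin with two seats and one window display.
context = CabinContext(
    cabin_dimensions_m=(2.8, 1.6, 1.3),
    seats={"54A": (0.5, -0.4, 0.6), "54B": (0.5, 0.4, 0.6)},
    displays=[DisplaySurface("window_52", (1.2, 0.8, 0.9), 0.9, 0.5)],
)
surface = context.surface_for_seat("54B")
```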

The speakers may be located inside the cabin of the vehicle and configured to present a three-dimensional audio signal such that an occupant in the cabin of the autonomous vehicle 50 perceives sound as if it emanates from the location where the virtual visual object appears to be physically located (e.g., where it is projected on the display device). For example, the speakers may be configured to render a three-dimensional audio signal that includes one or more audio objects spatially located where the virtual image (i.e., the virtual person) is perceived to be physically located within the fixed environment. In this embodiment, the fixed environment is the cabin of the autonomous vehicle 50. The virtual person may be represented as two-dimensional avatar data or three-dimensional avatar data.

Further, the audio signal may include a sound pattern during, before, or after the teletransmission of the virtual image.

For example, when the virtual teletransmission is imminent (i.e., just prior to the teletransmission of the virtual image), the audio signal may include a sound pattern that is rendered at the distance and angle of the selected target object. The sound pattern prior to the teletransmission of the teleported object may include tones or a pre-recorded sound.

FIG. 1h illustrates a conceptual diagram of a display system in an autonomous vehicle operating in accordance with the techniques described in this disclosure.

The display device may also include a projector 64 or other image projection device capable of projecting or otherwise rendering the image 40 on a passive display. In some embodiments, projector 64 may create a 3D hologram or other 3D view within the cabin of autonomous vehicle 50. Additionally, although not explicitly shown in fig. 1h, the display device may also represent a display in wired or wireless communication with one or more processors within the cabin of the autonomous vehicle 50. For example, a mobile device or other device may be present within an autonomous vehicle. The display of the mobile device or other device may, for example, represent a computing device, such as a laptop computer, a heads-up display, a head-mounted display, an augmented reality computing device or display (such as "smart glasses"), a virtual reality computing device or display, a mobile phone (including so-called "smart phones"), a tablet computer, a gaming system, or another type of computing device capable of acting as an extension to or in place of a display integrated into an autonomous vehicle.

User interface unit 56 may represent any type of physical or virtual interface with which a user may interact to control various functions of the autonomous vehicle. The user interface unit 56 may include physical buttons, knobs, sliders, or other physical control elements. The user interface unit 56 may also include a virtual interface whereby an occupant of the autonomous vehicle interacts with virtual buttons, knobs, sliders, or other virtual interface elements via a touch screen (as one example) or via a contactless interface. The occupant may interact with the user interface unit 56 to control one or more of: climate functions within the cabin of autonomous vehicle 50, audio playback by speakers within the cabin of autonomous vehicle 50, video playback on a display device within the cabin of autonomous vehicle 50, communications through the user interface unit 56 in the cabin of autonomous vehicle 50 (such as cellular telephone calls, video conference calls, and/or web conference calls), or any other operation that the autonomous vehicle is capable of performing in some embodiments.

The speakers may be located inside the cabin of the vehicle and configured to present a three-dimensional audio signal such that an occupant in the cabin of the autonomous vehicle 50 perceives sound as if it emanates from the location where the virtual visual object appears to be physically located (e.g., where it is projected on the display device). For example, the speakers may be configured to render a three-dimensional audio signal that includes one or more audio objects spatially located where the virtual image (i.e., the virtual person) is perceived to be physically located within the fixed environment. In this embodiment, the fixed environment is the cabin of the autonomous vehicle 50. The virtual person may be represented as two-dimensional avatar data or three-dimensional avatar data.

Further, the audio signal may include a sound pattern during, before, or after the teletransmission of the virtual image.

For example, when the virtual teletransmission is imminent (i.e., just prior to the teletransmission of the virtual image), the audio signal may include a sound pattern that is rendered at the distance and angle of the selected target object. The sound pattern prior to the teletransmission of the teleported object may include tones or a pre-recorded sound.

One of ordinary skill in the art will recognize from the various examples discussed above that the virtual image teletransmitted to the first device may be presented on the surface of the display screen of a display device. Further, the display device may be integrated into a headset device (e.g., an HMD, XR, VR, or AR device). Additionally, the display device may be integrated into a windshield or a window in the first device. Further, the display device may be integrated into a table in the first device, or into a tablet computer or another mobile device in the first device. Also, as discussed, the display device may be integrated into a digital rearview mirror in the first device. A headset device (HMD, XR, VR, or AR device) or other device (tablet, digital rearview mirror) may present the virtual image teletransmitted to the first device on the surface of its display screen. In addition, a surface (windshield, window, table) may also carry a projection or presentation of the virtual image teletransmitted to the first device.

Additionally, the first device may include two or more speakers configured to present a three-dimensional audio signal spatially located at an image plane of the display device. For example, the virtual image may appear to have sound emanating from a person's mouth, with the sound located in the same plane as the virtual image. In various embodiments, the headset device may present the virtual image on a display device integrated into the headset device. The virtual image has an image plane at the surface of the display device. However, the virtual image may appear to be some distance from the display device. In such a case, the two or more speakers may be configured to present a three-dimensional audio signal that is spatially located outside the image plane of the display device, i.e., the sound appears to be some distance away, e.g., 2-120 cm (according to the various examples of where the virtual image may appear to be located in the fixed environment).

Fig. 2 shows a flow diagram of a process by which a first device receives packets representing a virtual image as part of a virtual teletransmission of one or more visual objects embedded in the virtual image, based on the techniques described in this disclosure. A first device for receiving a communication signal from a second device may include one or more processors configured to: receive, in the communication signal, packets representing the virtual image as part of the virtual teletransmission of one or more visual objects embedded in the virtual image (210). The packets representing the virtual image are stored as part of the virtual teletransmission of one or more visual objects embedded in the virtual image (215). The one or more processors may decode the packets representing the virtual image (220) and output the virtual image at a physical location within the fixed environment (230).
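A minimal sketch of the flow of Fig. 2 is shown below. The callables and names are placeholders standing in for the receiving, storage, decoding, and output components of the first device; they are illustrative assumptions, not an implementation from the disclosure.

```python
def handle_teleport_stream(receive_packet, decode_virtual_image, output_at_location,
                           memory: list, location):
    """Sketch of the Fig. 2 flow: receive a packet carrying the virtual image,
    store it, decode it, and output the image at a physical location in the
    fixed environment. All callables are placeholders for device components."""
    packet = receive_packet()                       # 210: receive in the communication signal
    memory.append(packet)                           # 215: store the packet
    virtual_image = decode_virtual_image(packet)    # 220: decode
    output_at_location(virtual_image, location)     # 230: output at a physical location

# Toy usage with stand-in callables.
memory = []
handle_teleport_stream(
    receive_packet=lambda: b"\x01packet-bytes",
    decode_virtual_image=lambda pkt: {"pixels": pkt},
    output_at_location=lambda img, loc: print(f"render {img} at {loc}"),
    memory=memory,
    location="rear_seat",
)
```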

Fig. 3a illustrates a conceptual diagram of a first vehicle with different components on or in the first vehicle operating according to the techniques described in this disclosure. As shown in Fig. 3a, a person 111 may be traveling in vehicle 303a. A target object outside of the vehicle 303a may be directly within the driver's field of view, and its selection may be captured by an eye gaze tracker (i.e., the person 111 looking at the target object) or a gesture detector (the person 111 gesturing at, e.g., pointing at, the target object) coupled to the camera 310a within the vehicle 303a.

The first device may perform a selection of at least one target object based on detecting a command signal derived from eye gaze detection. The at least one target object may include a real-world object, for example, a passenger in a vehicle. In addition, the at least one target object may be a person wearing another headset device (e.g., a second headset device), or the at least one target object may be the second headset device itself. A person wearing the second headset device or riding in the selected vehicle may be virtually teletransmitted so as to be seen in the first headset device or the first vehicle.

The camera 310b mounted on the vehicle 303a may also assist in selecting the target object itself (e.g., vehicle 303b) or another device associated with the target object, for example if the target object is a person outside of the vehicle 303a or if there is some other recognizable image associated with the vehicle 303b.

Through a wireless local area network (WLAN), which may be part of a cellular network such as C-V2X, a coexistence of a cellular network and a Wi-Fi network, a Wi-Fi network only, or a V2X network, one or more antennas 356, optionally coupled with the depth sensor 340, may assist in determining the relative position of the target object with respect to the vehicle 303a.

It should be noted that, depending on the available bandwidth, the camera 310a mounted within the vehicle 303a, or the camera 310b mounted on the vehicle 303a, or both cameras 310a, 310b, may form a personal area network (PAN) that is part of the vehicle 303a via one or more antennas 356. Through the PAN, the camera 310a in the vehicle 303a or the camera 310b on the vehicle 303a may have an indirect wireless connection with the device associated with the target object or with the target object itself. Although the exterior camera 310b is shown near the front of the vehicle 303a, the vehicle 303a may have one or more exterior cameras 310b mounted at or near the rear of the vehicle 303a to see which devices or vehicles are behind the vehicle 303a. For example, the second device may be the vehicle 303c.

The exterior camera 310b may assist in the selection, or, as explained previously and below, GPS may also assist in determining the location of a second device, such as the second vehicle 303c.

The relative position of the second device may be represented on the display device 319. The relative position of the second device may be based on receiving the position through one or more antennas 356. In another embodiment, a depth sensor 340 may be used to assist in determining, or to determine, the location of the second device. It is also possible to determine the relative position of the second device using other position detection techniques that detect the position of the second device (e.g., GPS or assisted GPS).

The representation of the relative location of the second device may appear as a composite image, icon, or other representation associated with the second device, such that the person in the vehicle 303a may select the second device by directing their eye gaze toward the representation on the display device 319 or by making a gesture (pointing or touching) toward the representation on the display device 319.

The selection may also be made by voice recognition using one or more microphones 360 located within the vehicle 303a. When the second device is in communication with the vehicle 303a, the audio signals may be received by the (first) vehicle 303a through a transceiver coupled to one or more antennas 356 mounted in or on the vehicle 303a.

It is possible for the driver or a passenger of the first device to select the rear vehicle 303c or the front vehicle 303b and to establish communication between the first device and either of these vehicles to initiate a virtual teletransmission of one or more visual objects embedded in the teletransmitted virtual image.

Those of ordinary skill in the art will also appreciate that, as autonomous vehicles continue to advance, the driver of vehicle 303a may not actually manually command (i.e., "drive") vehicle 303a. Rather, the vehicle 303a may be autonomous for some portion of the time.

Fig. 3b illustrates a conceptual diagram of a virtual group talk experience across multiple vehicles operating in accordance with the techniques described in this disclosure.

For example, at location 4 (354), the current vehicle may include a driver wearing a headset device sitting in front of a digital display, operating in accordance with the techniques described in this disclosure. A projection or instance of one or more passengers from the vehicle at location 1 (351) may be displayed on the digital display of the headset device at location 4 (354), or alternatively projected on or in front of the windshield of the vehicle at location 4 (354) using the HUD technique described above (without the headset device). Additionally, other virtual passengers may come from other locations (e.g., locations 2 and 3, where different vehicles are shown). These different vehicles have the same capabilities: a real passenger may be captured using cameras within the vehicles at locations 2 and 3, or a personalized avatar may be sent, and the passenger teletransmitted to the current vehicle at location 4 (354).

The virtual images of the virtually teletransmitted passengers 360 may each include two-dimensional avatar data or three-dimensional avatar data. When one of the virtual passengers 360 speaks, it sounds as if that passenger is in the passenger seat, or at the display device within the vehicle, where the virtual image appears to be physically located. That is, the virtual passenger may be coupled to a two-dimensional audio signal or a three-dimensional audio signal. The two-dimensional audio signal or the three-dimensional audio signal may include one or more audio objects (e.g., human speech) spatially located where the virtual image is perceived to be physically located within the fixed environment. The speakers generating the two-dimensional or three-dimensional audio signals may be located in the current vehicle at location 4 (354), or may be mounted and integrated into a headset device worn within the current vehicle at location 4.

In addition, the three-dimensional audio signal may be perceived as emanating from the direction from which the virtual image is teletransmitted. Further, the audio signal may include a sound pattern during, before, or after the teletransmission of the virtual image. The sound pattern may include tones or may be a pre-recorded sound.

For example, when the virtual teletransmission is imminent (i.e., just prior to the teletransmission of the virtual image), the audio signal may include a sound pattern that is rendered at the distance and angle of the selected target object. The sound pattern prior to the teletransmission of the teleported object may include tones or a pre-recorded sound.

Fig. 3c illustrates a conceptual diagram of a virtual group experience across different physical entities operating in accordance with the techniques described in this disclosure. Similar to Fig. 3b, Fig. 3c depicts fixed environments at different buildings.

For example, at location 8 (378), the current building may include a person wearing a headset device sitting in front of a digital display, operating in accordance with the techniques described in this disclosure. A projection of one or more persons from the building at location 5 (375) may be displayed on a digital display of the headset device, or alternatively projected onto a projection screen, mirror, or some other digital display device (without the need for a headset device). In addition, other avatars may come from other locations (e.g., locations 6 and 7, where different buildings are shown). These different buildings have the same capabilities: real-world objects (e.g., people) may be captured using cameras within the building, or a personalized avatar scanned by a three-dimensional scanner in the building may be sent, and the avatars or videos of the one or more people teletransmitted to the current building at location 8 (378).

The virtual images of the virtually teletransmitted persons may each include two-dimensional avatar data or three-dimensional avatar data. When the virtual person speaks, it sounds as if the person is at the position where the virtual image is located. That is, the virtual person may be coupled to a two-dimensional audio signal or a three-dimensional audio signal. The two-dimensional audio signal or the three-dimensional audio signal may include one or more audio objects (e.g., human speech) spatially located where the virtual image appears to be physically located within the fixed environment. The loudspeakers generating the two-dimensional or three-dimensional audio signal may be located in the current building at location 8 (378), or may be mounted and integrated into the headset device.

In addition, the three-dimensional audio signal may be perceived as emanating from the direction from which the virtual image is teletransmitted. Further, the audio signal may include a sound pattern during, before, or after the teletransmission of the virtual image. For example, when the virtual teletransmission is imminent (i.e., just prior to the teletransmission of the virtual image), the audio signal may include a sound pattern that is rendered at a distance 420a and an angle 420b from the selected target object (see Figs. 4b, 4c). The sound pattern prior to the teletransmission of the teleported object may include tones or a pre-recorded sound.

Fig. 4a shows a block diagram 400a of a first device with different components on or in the first device operating in accordance with the techniques described in this disclosure. One or more of the different components may be integrated in one or more processors of the first device.

As shown in fig. 4a, the vehicle may include a user interface unit 56. Previously, the user interface unit 56 was described in association with the cabin of an autonomous vehicle. However, the user interface unit 56 may also be in a non-autonomous vehicle. The user interface unit 56 may include one or more of the following: a voice command detector 408, a gaze tracker 404, or a gesture detector 406. The user interface unit 56 may assist in selecting a target object external to the first device.

When a driver or other passenger in the first device (i.e., the first vehicle) selects a target object external to the device, that selection may assist in teletransporting the virtual passenger. The target object may be a person wearing a headset, or, if the person is in a second vehicle, the target object may be the second vehicle. The person may be teletransmitted as a "virtual passenger" into the first vehicle. In Fig. 4a, a driver or other person in the first vehicle may designate a nearby second device with which to initiate communication. The communication may be used to listen to a virtual passenger spatialized according to the direction and angle of the second device, or the virtual passenger may be teletransmitted into the first vehicle. The components in the user interface unit 56 may be integrated into one or more of the processors, or may be separately integrated into one or more of the processors in other configurations (as shown in Fig. 4a). That is, the components (voice command detector 408, gaze tracker 404, and gesture detector 406) may all be integrated into one processor, may each be located in a separate processor, or a subset of these components may be integrated into a different processor.

For example, the selection of the target object outside the first device may be based on an eye gaze tracker 404 that detects and tracks where the wearer of the headset is looking, or where the person 111 in the first vehicle is looking. When the target object is within the person's field of view, the eye gaze tracker 404 may detect and track the eye gaze and assist in selecting the target object via the target object selector 414. Similarly, a gesture detector 406 coupled to one or more interior-facing cameras 403 within vehicle 303a, or mounted on a headset (not shown), may detect a gesture, e.g., pointing in the direction of a target object. In addition, voice command detector 408 may assist in selecting a target object based on the person 111 speaking a phrase (e.g., "the black Honda in front of me"), as described above. The output of the voice command detector 408 may be used by the target object selector 414 to select a desired second device, such as the vehicle 303b or 303c.
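One way to picture how these detector outputs feed the target object selector 414 is the toy fusion routine below. The priority ordering and all names are illustrative assumptions; the disclosure does not specify how conflicting detector outputs are reconciled.

```python
def select_target_object(gaze_hit, gesture_hit, voice_hit, candidates):
    """Toy fusion of detector outputs feeding a target object selector.
    Each *_hit is the candidate id returned by a detector (or None). Voice is
    given priority here only as an illustrative policy."""
    for hit in (voice_hit, gaze_hit, gesture_hit):
        if hit is not None and hit in candidates:
            return candidates[hit]
    return None

# Example: the voice command detector resolved a phrase to a nearby vehicle.
candidates = {"vehicle_303b": {"type": "vehicle"}, "vehicle_303c": {"type": "vehicle"}}
target = select_target_object(gaze_hit=None, gesture_hit="vehicle_303c",
                              voice_hit="vehicle_303b", candidates=candidates)
```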

As previously mentioned, the vehicle 303a may have one or more externally facing cameras 402 mounted at or near the rear of the vehicle 303a in order to see which devices or vehicles are behind the vehicle 303a. For example, the second device may be the vehicle 303c.

A target object (e.g., a second device) may be represented relative to the first device based on an image, on features of the image, or on both, where the image is captured by one or more cameras coupled to the first device.

The exterior camera 402 may assist in the selection and in locating where the second vehicle 303c is located, for example, behind the vehicle 303a (as shown in other figures).

It is also possible that the relative location of the second device may be represented on the display device 410 based on one or more transmitter antennas 425, and possibly based on the depth sensor 340 (not shown in Fig. 4a) or other location detection technology (e.g., GPS) that detects the location of the second device. The representation of the relative location of the second device may appear as a composite image, icon, or other representation associated with the second device, such that a person in vehicle 303a may select the second device by directing their eye gaze toward the representation on the display device 410 or by making a gesture (pointing or touching) toward the representation on the display device 410.

If the selection of the remote device (i.e., the second device) is touch-based, a display device of the first device that includes a representation of the at least one target object external to the first device may be configured to select the at least one target object based on a capacitive sensor or an ultrasonic sensor on the display device changing state.

The first device may include one or more transmitter antennas 425 coupled to one or more processors. The one or more processors of the first device may be configured to send communication data to a second device associated with at least one target object external to the first device, based on the one or more processors initiating a communication channel between the first device and the second device. That is, after the second device is selected, the one or more processors may initiate a protocol or other form of communication between the first device and the second device over a communication channel using C-V2X and/or V2X communications.

The selection may also be made by voice recognition using one or more microphones (not shown in Fig. 4a) located in the vehicle 303a. When the second device is in communication with the vehicle 303a, the audio signals may be received by the (first) vehicle 303a through one or more receiver antennas 430 mounted in or on the vehicle 303a and coupled to a transceiver (e.g., a modem capable of V2X or C-V2X communication). That is, the one or more receive antennas 430 coupled to the one or more processors may be configured to receive audio packets as a result of the initiation of a communication channel between the first device and the at least one target object (e.g., the second device) external to the first device.

Additionally, the first device may include one or more externally facing cameras 402. An externally facing camera 402 may be mounted on the vehicle 303a and may assist in selecting the target object itself (e.g., vehicle 303b) or another device associated with the target object (e.g., if the target object is a person outside of the vehicle 303a, or if there is another recognizable image associated with the vehicle 303b). The one or more externally facing cameras may be coupled to one or more processors including a feature extractor (not shown) that may perform feature extraction on the image on the display device 410. The extracted features, alone or in some configurations in combination with external sensors 422 (e.g., RADAR/LIDAR sensors), may assist in estimating the relative position of the second device (e.g., the selected vehicle 303b).

The extracted features or the output of the external sensor 422 may be input to the relative position/orientation determiner 420 of the selected target object. The relative position/orientation determiner 420 of the selected target object may be integrated into one or more processors and may be part of the tracker, or may be separately integrated into one or more processors in other configurations (as shown in Fig. 4a). The tracker 151 is not shown in Fig. 4a.

The distance 420a and angle 420b may be provided by the relative position/orientation determiner 420 of the selected target object. The distance 420a and the angle 420b may be used by the audio spatializer 424 to output a three-dimensional audio signal based on the relative position of the second device. At least two speakers 440 coupled to the one or more processors may be configured to present the three-dimensional spatialized audio signal based on the relative position of the second device, or, if there are multiple second devices (e.g., multiple vehicles), the three-dimensional spatialized audio signals may be presented as described above.

After the target object selector 414 performs selection of at least one target object external to the first device, a command interpreter 416, which is integrated into one or more of the processors in the first device, establishes a communication channel between the first device and a second device associated with the at least one target object external to the first device. In response to the selection of the at least one target object external to the first device, an audio packet may be received from the second device.

The audio packets 432a from the second device may be decoded by the codec 438 to generate an audio signal. The audio signal may be output based on the selection of the at least one target object external to the first device. In some scenarios, the audio packets may represent a stream 436a from a cloud associated with the remote device (i.e., the second device). The codec 438 may decompress the audio packets, and the audio spatializer may operate on the uncompressed audio packets 432b or 436b. In other scenarios, the audio may be spatialized based on the seating position of the person making the selection of the second vehicle.

The codec may be integrated into one or more of the processors along with another component (e.g., audio spatializer 424) shown in fig. 4a, or may be separately integrated into a separate processor in other configurations.

Thus, the driver in the first device may select the vehicle (e.g., the second device) from which the virtual passenger is to be delivered. Although it is possible to communicate with another passenger in the second device (e.g., vehicle) without transmitting a virtual representation of that passenger, the driver of the first device may also initiate teletransmission of a virtual passenger, as described in more detail in Figs. 4b and 4c.

The audio codec used for the transmitted audio packets may include one or more of the following: MPEG-2/AAC Stereo, MPEG-4 BSAC Stereo, RealAudio, SBC Bluetooth, WMA, and WMA 10 Pro. Since C-V2X and V2V systems may use either a data traffic channel or a voice channel, the audio packets (which may carry voice signals) may be decompressed using one or more of the following codecs: the AMR narrowband speech codec (5.15 kbps), the AMR wideband speech codec (8.85 kbps), the G.729AB speech codec (8 kbps), the GSM-EFR speech codec (12.2 kbps), the GSM-FR speech codec (13 kbps), the GSM-HR speech codec (5.6 kbps), EVRC-NB, EVRC-WB, and Enhanced Voice Services (EVS). Speech codecs are sometimes referred to as vocoders. Vocoder packets are inserted into larger packets before being transmitted over the air. Voice is typically sent in a voice channel, but VoIP (voice over IP) can also be used to send voice in a data channel. Codec 438 may represent a voice codec, an audio codec, or a combination of functions for decoding voice packets or audio packets. Generally, for ease of explanation, the term audio packet also encompasses voice packets.

The audio packets may be transmitted in conjunction with the virtual passenger and may be received by the first device in conjunction with metadata from the second device.

It is also possible that in one configuration, the spatialization effect may be disabled after the second vehicle is a certain distance from the first vehicle.

The one or more processors included in the first device may be configured to disable the spatialization effect after the second vehicle exceeds a configurable distance from the first device. The particular distance may be configurable, for example, one-eighth of a mile. The configurable distance may be entered as a distance measurement or as a time measurement. The particular distance may also be configurable based on time (e.g., depending on the speed of the first and second vehicles). For example, instead of specifying that the spatialization effect should last for one-eighth of a mile, the separation between the two vehicles may be measured in time. For a vehicle traveling at 50 miles per hour (mph), one-eighth of a mile corresponds to 9 seconds, i.e., 0.125 miles / 50 mph = 0.0025 hours = 0.15 minutes = 9 seconds. Thus, in this example, after 9 seconds the spatialization effect may fade or stop abruptly.
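The distance-or-time threshold logic can be sketched as follows. The thresholds and function name are illustrative; the sketch simply reproduces the arithmetic in the example above (0.125 miles at 50 mph equals 9 seconds).

```python
from typing import Optional

def spatialization_enabled(separation_miles: float, speed_mph: float,
                           max_distance_miles: float = 0.125,
                           max_time_s: Optional[float] = None) -> bool:
    """Decide whether the spatialization effect stays enabled, using either a
    configurable distance (e.g., one-eighth of a mile) or the equivalent time
    gap at the current speed. Thresholds are illustrative defaults."""
    if max_time_s is not None and speed_mph > 0:
        gap_s = separation_miles / speed_mph * 3600.0  # hours -> seconds
        return gap_s <= max_time_s
    return separation_miles <= max_distance_miles

# 0.125 mi at 50 mph is a 9 s gap, matching the example in the text.
assert round(0.125 / 50 * 3600) == 9
```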

In one embodiment, teletransmission itself may also be disabled if the second vehicle is beyond a certain distance from the first vehicle. That is, the driver in the first vehicle may not be able to select the second vehicle and may not initiate aspects of the teletransmission. For example, as mentioned above, the audio signal may include a sound pattern during, before, or after the teletransmission of the virtual image. In one embodiment, if the second vehicle is too far away, the speakers rendering the sound pattern may be disabled during, before, or after the teletransmission of the virtual passenger.

Fig. 4b illustrates a block diagram 400b of a first device with different components on or in the first device that operate according to the techniques described in this disclosure. One or more of the different components may be integrated in one or more processors of the first device.

The block diagram 400b includes a transmitter 418 and a receive antenna 430. Through the receive antenna 430, the one or more processors may be configured to receive metadata 435 from a second device wirelessly connected to the first device via a sidelink channel. The one or more processors may use the metadata to directly identify the location of the received virtual object, i.e., the coordinates (visual or audio) of the virtual object may be embedded in the metadata. Additionally or alternatively, if the driver or another passenger in the vehicle is selecting a virtual object by using the target object selector 414, as described in fig. 4a, the metadata may assist in the derivation of the relative position/orientation of the selected object by the determiner 420.

The transmitter 418 may output a video stream that is compressed (i.e., in the form of a bitstream) or uncompressed. A video codec is not shown; however, there may be a video codec in the display device 410, the transmitter 418, or the teletransmission virtual object synthesizer 415. Alternatively, there may be a video codec configured to decompress the video stream (if it is compressed). The video codec may be coupled to the transmitter 418, to the teletransmission virtual object synthesizer 415, or to the display device 410. The video stream may include a representation of the virtual passenger.

The video codec may be integrated into one or more of the processors along with another component shown in Fig. 4b (e.g., the teletransmission virtual object synthesizer 415), or, in other configurations, may be separately integrated into a separate processor. Each of the transmitter 418, the teletransmission virtual object synthesizer 415, the determiner 420, and the video codec may be integrated into a processor of the one or more processors, or, in some configurations, any combination of these components may be integrated into one processor of the one or more processors.

The output of the transmitter 418 may also include virtual passenger avatar data. As previously discussed, the virtual images of a virtually teletransmitted person may each include two-dimensional avatar data or three-dimensional avatar data. The avatar data or video stream may be input into the teletransmission virtual object synthesizer 415. The output of the teletransmission virtual object synthesizer 415 may be an enhanced image represented on the display device 410. The transmitter 418 may be integrated into one or more processors. The teletransmission virtual object synthesizer 415 may be coupled to the transmitter 418 and may also be integrated into one or more processors. In some embodiments, the teletransmission virtual object synthesizer 415 may be integrated with the display device 410.

As previously discussed, when the virtual passenger speaks, it sounds as if the passenger is in the passenger seat where the virtual image is located. That is, the virtual passenger may be coupled to a two-dimensional audio signal or a three-dimensional audio signal. The two-dimensional audio signal or the three-dimensional audio signal may include one or more audio objects (e.g., human speech) spatially located where the virtual image appears to be physically located within the fixed environment. The speakers 440 generating the two-dimensional or three-dimensional audio signal may be located in the first device (e.g., a vehicle), or may be mounted and integrated into a headset device.

In addition, the three-dimensional audio signal may be perceived as emanating from the direction from which the virtual image is teletransmitted. Further, the audio signal may include a sound pattern during, before, or after the teletransmission of the virtual image. For example, when the virtual teletransmission is imminent (i.e., just prior to the teletransmission of the virtual image), the audio signal may include a sound pattern that is rendered at the distance 420a and the angle 420b from the selected target object. The sound pattern prior to the teletransmission of the teleported object may include tones or a pre-recorded sound.

As previously discussed with respect to fig. 4a, the first device may include a display device configured to represent the relative position of the second device.

After the audio packets 432a or 436a are decoded by the codec 438, the audio spatializer 424 may optionally generate a three-dimensional audio signal. In the same or alternative embodiments, the audio packets 432a associated with the virtual visual object embedded in the avatar data or video stream may be decoded by the codec 438. The codec 438 may implement an audio codec or a speech codec as described with respect to Fig. 4a. The one or more processors may be configured to output three-dimensional spatialized audio content based on where the relative position of the second device is represented on the display device 410. The output three-dimensional spatialized audio content may be rendered by two or more speakers 440 coupled to the first device.

In some configurations, regardless of whether the location of the second device is represented on the display device 410, the output of the audio signal associated with the audio object may be a three-dimensional spatialized audio signal based on the relative location of the second device. In other embodiments, the audio object may be a three-dimensional spatialized audio signal based on coordinates included in the metadata output from the transmitter 418. The coordinates may include six degrees of freedom for the audio object. For example, there may be three rotational degrees of freedom of the virtual audio object, in yaw, pitch, and roll, with respect to a fixed coordinate system. Similarly, the virtual audio object may have three translational degrees of freedom relative to the location where the virtual visual object is projected.

Fig. 4c shows a block diagram 400c of a first device with different components on or in the first device operating in accordance with the techniques described in this disclosure. One or more of the different components may be integrated in one or more processors of the first device.

As shown, the video data may be generated by a camera coupled to the second device. For example, for ease of illustration, the second device may be referred to as device B. The visual environment behind the person using device B may also be captured by a camera coupled to the second device. Alternatively, the person using device B may be represented by a two-dimensional (2D) or three-dimensional (3D) graphical representation. The 2D graphical representation may be an avatar. The avatar may be an animated cartoon character. The 3D graphical representation may be a 3D mathematical model representing the surface of a person using device B. The 3D model may be texture mapped to obtain additional surface color and detail. The texture map allows pixels of the two-dimensional image to be wrapped along the surface of the 3D model. The one or more receive antennas 430 may be configured to receive messages from device B.

There may be a handshake or transfer protocol between the first device and device B, in which messages are sent from the first device and device B using one or more transmit antennas 425. Similarly, there may be messages between the first device and device B that are received using one or more receive antennas 430. The handshake or transfer protocol may include one or more messages indicating that one or more virtual objects, or images including one or more virtual objects, are to be teletransmitted to the first device. For ease of explanation, the first device is referred to as device A. Communication interpreter 418a may receive a message from device B. Based on the message content, communication interpreter 418a may pass the packet to data extractor 418b. The packet may include a field with one or more bits indicating that the packet includes a virtual passenger video stream and/or virtual passenger avatar data. Data extractor 418b may parse the packet and output the virtual passenger video stream and/or the virtual passenger avatar data. In some embodiments, data extractor 418b is integrated into another block that includes communication interpreter 418a.
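As an illustration only, a packet with such a flags field could be parsed as sketched below. The byte layout is a hypothetical example for exposition; the disclosure does not define the wire format used by communication interpreter 418a and data extractor 418b.

```python
import struct

# Hypothetical on-the-wire layout, for illustration only: a 1-byte flags field
# (bit 0: payload contains a virtual passenger video stream, bit 1: payload
# contains virtual passenger avatar data) followed by a 4-byte payload length.
HEADER = struct.Struct("!BI")

def extract_teleport_payload(packet: bytes):
    """Minimal stand-in for a data extractor: parse the flags and return the
    payload tagged as video stream and/or avatar data."""
    flags, length = HEADER.unpack_from(packet, 0)
    payload = packet[HEADER.size:HEADER.size + length]
    return {
        "video_stream": payload if flags & 0x01 else None,
        "avatar_data": payload if flags & 0x02 else None,
    }

# Example: a packet flagged as carrying avatar data.
pkt = HEADER.pack(0x02, 5) + b"hello"
parsed = extract_teleport_payload(pkt)
```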

In one embodiment, there may be a video stream or avatar data selector 464. The selector 464 may output a video stream, avatar data, or both. There may also be a configuration in which the first device does not have the selector 464 and outputs avatar data, a video stream, or both the avatar data and the video stream.

The avatar data may include motion attributes of one or more virtual passengers. The motion attributes may be rendered as an animation by the avatar renderer 468, where the 3D model of the human body part may move. The avatar data may include a 3D model, a 2D texture map, and animation information. The avatar data may be rendered to generate an image or a sequence of color animated images of the virtual passenger. It will be appreciated by those of ordinary skill in the art that a virtual passenger may be any object, such as a virtual animal, or a virtual instrument, or a cartoon. The sequence of color animated images may be referred to as a vehicle B passenger color image.

In addition to the color images, the avatar data may be rendered using a mask (matte) generator 472a to generate one or more vehicle B passenger mask images. A mask image may be used to combine two or more images into a single final image. The mask image may be used to describe a region of interest in the color image. For example, for the vehicle B passenger color image, the region of interest is the object (e.g., a person) received from vehicle B. The region of interest in the vehicle B passenger color image may have corresponding pixels that are colored white in the vehicle B passenger mask image. Other pixels in the vehicle B passenger mask image may all be colored black. The border region between the region of interest and the rest of the vehicle B passenger mask image may have gray pixel values for a smooth transition between white and black pixels. The vehicle B passenger mask image may also be described as having transparency or alpha pixels. When so described, black indicates a transparent alpha value and white indicates an opaque alpha value.

Additionally, one or more receive antennas in vehicle A may be configured to receive a video stream from a second device (e.g., vehicle B). The second device may include its own in-vehicle camera or may be coupled to a device having a camera thereon. In one embodiment, the received video stream is output from the video stream or avatar selector 464, and the selector 464 passes the video stream to the video stream color separator 470. In an alternative embodiment, the received video stream is passed directly to the video stream color separator 470. The video stream color separator generates color images. The color image is input to a mask generator 472a. In an alternative embodiment, there may be a separate mask generator coupled to the video stream color separator 470 instead of being shared between the video stream color separator 470 and the avatar renderer 468.

The output image 475 of mask generator 472a, based on the video stream color image, may be input into compositor 482. Similarly, the output image 476 of mask generator 472a, based on the avatar renderer 468, may also be input into the compositor 482. Mask generator 472a may apply image segmentation in the region of interest to identify the person in vehicle B and generate the vehicle B passenger mask image.

In one embodiment, compositor 482 may combine the mask image output by mask generator 472a with the video stream color image 474 and generate a composite image. Additionally or alternatively, the compositor 482 may combine the mask image output by the mask generator 472a with the avatar-rendered color image 469 and generate a composite image. The composite image may be an enhanced video image. The composite image may be based on a combination of both the avatar-rendered color image and the video stream color image. Alternatively, the composite image may be based on the avatar-rendered color image without the video stream image, or on the video stream color image without the avatar-rendered color image.

Additionally, vehicle A may use an inwardly facing camera to capture in-cabin color video 428. Image segmentation techniques may be used to describe regions of interest in vehicle A. A monochrome image derived from the in-cabin color video captured by the interior-facing camera 403 may be passed to the mask generator 472b, and the vehicle A passenger mask image 484 may be generated.

Compositor 482 may receive the vehicle B passenger color images (474, 469) and the corresponding vehicle B mask images (475, 476) and perform image compositing using the vehicle A passenger color image 488 and the vehicle A mask image 484. The resulting enhanced video image is a composite of the virtual objects (e.g., virtual passengers) placed in the appropriate environment of vehicle A. For example, a virtual passenger in vehicle B may have a background with seats that are blue, while the seats and/or doors around the virtual passenger projected in vehicle A are brown. The background of the virtual passenger (including the blue seats) may not appear in the composite image.

The compositor 482 may use an "over" operator composite, where mathematically one image appears on top of another. The composite image color may be determined by the following relationship:

C_output = C_foreground + C_background * (1 - A_foreground)

where:

C_output is the pixel color of the output composite image;

C_foreground is the pixel color of the foreground image;

C_background is the pixel color of the background image;

A_foreground is the alpha (transparency) of the foreground image (i.e., the mask image).
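The relationship above can be exercised with a short sketch, assuming the foreground color image has been premultiplied by its mask (alpha) image, as is conventional for the over operator; all array shapes and values below are illustrative.

```python
import numpy as np

def over_composite(fg_rgb: np.ndarray, fg_alpha: np.ndarray,
                   bg_rgb: np.ndarray) -> np.ndarray:
    """Over-operator compositing per the relationship above, assuming the
    foreground color is premultiplied by its alpha (mask) image:
    C_output = C_foreground + C_background * (1 - A_foreground)."""
    alpha = fg_alpha[..., None]            # broadcast the HxW mask over RGB
    return fg_rgb + bg_rgb * (1.0 - alpha)

# Example: composite a (premultiplied) passenger color image over a cabin background.
h, w = 4, 4
fg = np.zeros((h, w, 3), dtype=np.float32)      # premultiplied foreground
mask = np.zeros((h, w), dtype=np.float32)       # white (1.0) = region of interest
fg[1:3, 1:3] = 0.8
mask[1:3, 1:3] = 1.0
bg = np.full((h, w, 3), 0.2, dtype=np.float32)  # cabin background
out = over_composite(fg, mask, bg)
```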

The vehicle B passenger mask image (whether it is mask image 475 based on the video stream color image 474 or mask image 476 based on the avatar-rendered color image 469) indicates, at each location, whether the teleported passenger is present or absent.

Instead of the techniques described above with respect to mask generators 472a, 472b, mask generation may be accomplished using depth-image-assisted image segmentation. A depth image is an image in which each pixel describes the distance from the camera to that point in space. Objects close to the camera have small values, or shallow depths. Objects far from the camera have large depths. Objects of interest (e.g., people) in an image may have similar, relatively shallow depth pixel values, while non-person pixels may have very different values. Two in-vehicle depth cameras (e.g., inwardly facing cameras) may be used to create the depth image. To generate depth from two cameras, a triangulation algorithm may be used to determine depth image pixel values using disparity. Disparity is the difference in image position when the same 3D point is projected under perspective onto two different cameras. For example, the 3D point location may be calculated using the following relationships:

x = xl * z / f, or x = B + xr * z / f

y = yl * z / f, or y = yr * z / f

z = f * B / (xl - xr) = f * B / d

where:

f is the focal length;

B is the baseline, i.e., the distance between the two cameras;

d = xl - xr is the disparity;

(xl, yl) is the corresponding image point in the left image and (xr, yr) is the image point in the right image.
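A small sketch of this triangulation, using the relationships above, is shown below; the units and example values are illustrative.

```python
def depth_from_disparity(xl: float, yl: float, xr: float,
                         f: float, B: float):
    """Recover the 3D point from matched left/right image points using the
    triangulation relationships above. f is the focal length and B is the
    baseline between the two depth cameras (all in consistent units)."""
    d = xl - xr                      # disparity
    if d == 0:
        raise ValueError("zero disparity: point at infinity")
    z = f * B / d
    x = xl * z / f                   # equivalently B + xr * z / f
    y = yl * z / f
    return x, y, z

# Example with image coordinates expressed in the same units as f.
point = depth_from_disparity(xl=120.0, yl=40.0, xr=100.0, f=800.0, B=0.06)
```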

In conjunction with the other figures and descriptions herein, the blocks in Fig. 4c may enable one or more persons in vehicle A to communicate with one or more persons in vehicle B. There may be scenarios in which the driver of vehicle A wishes to speak with the driver or a passenger of vehicle B and teletransmit a virtual object (i.e., the driver or passenger of vehicle B) to the front seat of vehicle A.

In one embodiment, vehicle A may not have a front seat passenger. As described in connection with Figs. 1c-1h, the display device 410 may be incorporated and used in a vehicle in different configurations.

As one example, the driver of vehicle A may simply look to his or her right side and speak with the presented view of another person from vehicle B. In some embodiments, the head, shoulders, arms, and left side of the body of the person from vehicle B may be visible to the driver (or another passenger) in vehicle A.

It is also possible to use the display device 410 as a digital rearview mirror in vehicle A (see Fig. 1e), which allows the driver of vehicle A to speak to the person in vehicle B. In this case, one or more persons in vehicle B may appear to be projected into the rear seats as a reflection in the digital rearview mirror of vehicle A. The person in vehicle B may be seated in any position in vehicle B, whether the front driver, front passenger, or a rear passenger position. The display device 410 may act as a digital rearview mirror. The display device 410 may be coupled to a camera mounted above the display device and directed toward the rear of the vehicle. Real-time video captured by the camera may be displayed on the display device 410. As with a physical mirror, the driver can see a portion of his or her own face when looking at the rearview mirror display (i.e., display device 410). Thus, a virtual passenger teletransmitted into vehicle A may appear to be behind the driver of vehicle A.

To maintain proper perspective and occlusion ordering, the video stream or avatar data output by the transmitter 418 may be displayed on the display device 410 by appropriately masking the passenger represented in the video stream or avatar data using the silhouette of the driver of vehicle A. The result is the vehicle A passenger mask image 484. Therefore, the foreground and background can be composited appropriately so that the virtual passengers appear behind the driver of vehicle A. The output of the compositor 482 may be an enhanced video stream that displays the virtual passenger from vehicle B along with any rear-seat passengers from vehicle A.

In another embodiment, a mixed reality, augmented reality, or virtual reality wearable device (e.g., glasses or an HMD) may be used to place a passenger or driver from vehicle B on any seat in vehicle A. For example, the orientation of the 3D coordinate system of the wearable device may be initialized such that the 3D avatar appears to be to the right of the driver in vehicle A and the rear passenger from vehicle B is located behind the driver of vehicle A.

There may be metadata describing the spatial location and orientation of each of the virtual passengers sent from vehicle B to vehicle A. Each packet of metadata may include a timestamp (e.g., an integer) and an X, Y, Z location as floating point numbers in three-dimensional space. Additionally, there may be rotation or orientation information, which may be expressed as Euler rotations about X, Y, Z as three floating point numbers.

Alternatively, the rotation or orientation may be expressed as a quaternion of four floating point numbers describing an angle and an axis. There may also be two integers describing each person's position in the car: a row number, where the front row is the row of the car containing the driver, and a seat number, which represents the position of the passenger from left to right along a given row. Finally, a Boolean value indicates whether a given passenger is real or virtual. A real passenger is a passenger physically sitting in a given car, while a virtual passenger is a passenger rendered as sitting in vehicle a even though the person is physically sitting in vehicle B. The metadata may be used as input to the relative position/orientation determiner 420 (see fig. 4a) for the selected target object. Alternatively, in one embodiment, the metadata may be input into the audio spatializer 424 and played through the speakers 440.
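
One possible, non-normative way to organize such a metadata packet is sketched below; the field names, the Python representation, and the byte layout are assumptions, not part of the described signaling:

import struct
from dataclasses import dataclass

@dataclass
class PassengerMetadata:
    timestamp: int   # integer timestamp
    x: float         # position in 3D space
    y: float
    z: float
    qw: float        # quaternion (angle/axis form)
    qx: float
    qy: float
    qz: float
    row: int         # 0 = front row (the row with the driver)
    seat: int        # left-to-right position along the row
    is_real: bool    # True = physically present, False = virtual

    def pack(self) -> bytes:
        # One fixed-size layout: int32, 7 x float32, 2 x int32, 1 x uint8.
        return struct.pack("<i7f2iB", self.timestamp,
                           self.x, self.y, self.z,
                           self.qw, self.qx, self.qy, self.qz,
                           self.row, self.seat, int(self.is_real))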

Fig. 5 illustrates a conceptual diagram 500 of world coordinates transformed into pixel coordinates according to the techniques described in this disclosure. An external camera (e.g., 310b in fig. 3a, 402 in fig. 4a) mounted on a first vehicle may capture an image (e.g., a video frame) and represent an object in three-dimensional (3D) world coordinates [x, y, z] 502. The world coordinates may be transformed into 3D camera coordinates [xc, yc, zc] 504. The 3D camera coordinates 504 may be projected onto the 2D x-y plane (a plane whose normal vector is perpendicular to the face of the camera (310b, 402)), and the objects in the image may be represented in pixel coordinates (xp, yp) 506. One of ordinary skill in the art will recognize that this transformation from world coordinates to pixel coordinates is based on using an input rotation matrix [R], a translation vector [t], and camera coordinates [xc, yc, zc] to transform the world coordinates [x y z]. For example, the camera coordinates may be represented as [xc, yc, zc] = [x y z]*[R] + [t], where the rotation matrix [R] is a 3x3 matrix and the translation vector [t] is a 1x3 vector.
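
To make the projection concrete, a minimal sketch is given below; the pinhole intrinsics (fx, fy, cx, cy) and the function name are assumptions chosen to match the description above, not values taken from this disclosure:

import numpy as np

def world_to_pixel(p_world, R, t, fx, fy, cx, cy):
    """Project a 3D world point to pixel coordinates (xp, yp).

    p_world: (3,) world coordinates [x, y, z]
    R:       (3, 3) rotation matrix, t: (3,) translation vector
    fx, fy:  focal lengths in pixels; cx, cy: principal point
    """
    # World -> camera coordinates, using the row-vector form described above:
    # [xc, yc, zc] = [x, y, z] * [R] + [t]
    xc, yc, zc = np.asarray(p_world) @ R + np.asarray(t)

    # Pinhole projection of the camera-coordinate point onto the image plane.
    xp = fx * xc / zc + cx
    yp = fy * yc / zc + cy
    return xp, yp

A usage such as world_to_pixel([10.0, 2.0, 0.5], np.eye(3), [0, 0, 0], 800, 800, 640, 360) would map a point ten meters ahead of the camera to a pixel location under these assumed intrinsics.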

The bounding box 512 of the region of interest (ROI) may be represented in pixel coordinates (xp, yp) on the display device 510. There may be a visual indication (e.g., a color change, an icon, or a composite pointer augmented within the bounding box 512) to alert the passenger in the vehicle that a target object (e.g., a second vehicle) has been selected to initiate communication with it.

Fig. 6a shows a conceptual diagram of one embodiment of the estimation of the distance and angle of a remote vehicle/passenger (e.g., a second vehicle). The distance may be derived from the bounding box 622d in the video frame. The distance estimator 630 may receive the sensor parameters 632a, the intrinsic and extrinsic parameters 632d of the exterior camera (310b, 402), and the size 632b of the bounding box 622d. In some embodiments, there may be a vehicle information database that includes the sizes 632c of different vehicles and may also contain certain image characteristics that assist in identifying the vehicles.

The distance and angle parameters may be estimated at the video frame rate and interpolated to match the audio frame rate. From the database of vehicles, the actual size, i.e., the width and height, of the remote vehicle can be obtained. The pixel coordinates (xp, yp) of one corner of the bounding box may correspond to a line in 3D world coordinates with a given azimuth and elevation angle.

For example, using the lower-left and lower-right corners of the bounding box, together with the width w of the vehicle, a distance 640c (d) and an azimuth angle (θ) 640a can be estimated, as shown in fig. 6b.

FIG. 6b shows a conceptual diagram of an estimation of distance 640c and angle 640a of a remote device in the x-y plane.

Point A in fig. 6b may be represented by world coordinates (a, b, c). Point B in fig. 6b may also be represented by world coordinates (x, y, z). The azimuth angle (θ) 640a may be expressed as θ = (θ1 + θ2)/2. For small angles, the distance satisfies d_xy × (sin θ1 - sin θ2) ≈ w, where w is the width of the remote device in fig. 6b. The world coordinates (x, y, z) and (a, b, c) may be expressed in terms of the width in the x-y plane, for example using the following formulas:

x = a

|y - b| = w

z = c

The pixel coordinates depicted in fig. 5 may be expressed as xp = x = a and yp = y = w ± b.
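
Under the small-angle relationship above, the x-y plane estimate might be sketched as follows (the function and argument names are hypothetical; θ1 and θ2 are the azimuth angles of the two lower bounding-box corner rays obtained from the camera model):

import math

def azimuth_and_distance_xy(theta1, theta2, vehicle_width):
    """Estimate azimuth (radians) and planar distance d_xy from two corner rays.

    theta1, theta2: azimuth angles of the lower-left and lower-right
                    bounding-box corners (radians).
    vehicle_width:  known width w of the remote vehicle (meters).
    """
    azimuth = (theta1 + theta2) / 2.0
    # Small-angle relation: d_xy * (sin(theta1) - sin(theta2)) ~= w
    d_xy = vehicle_width / (math.sin(theta1) - math.sin(theta2))
    return azimuth, abs(d_xy)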

Similarly, using the lower-left and upper-left corners of the bounding box, and knowing the height h of the second vehicle 303b, the elevation angle (φ) 640b of the second vehicle 303b and a distance d_yz of the second vehicle can be calculated, as shown in fig. 6c.

FIG. 6c shows a conceptual diagram of an estimation of distance 640c and angle 640b of a remote device in the y-z plane.

Point A in fig. 6c may be represented by world coordinates (a, b, c). Point B in fig. 6c may also be represented by world coordinates (x, y, z). The elevation angle (φ) 640b may be expressed as φ = (φ1 + φ2)/2. For small angles, the distance satisfies d_yz × (sin φ1 - sin φ2) ≈ h, where h is the height of the remote device 670 in fig. 6c. The world coordinates (x, y, z) and (a, b, c) may be expressed in terms of the height in the y-z plane, for example using the following formulas:

x = a

y = b

|z - c| = h

The pixel coordinates depicted in fig. 5 may be expressed as xp = x = a and yp = y = b.

The elevation angle 640b and azimuth angle 640a may be further adjusted for sound from the left half, right half, or middle of the remote device 670, depending on the location of the sound source. For example, if the remote device 670 is a remote vehicle (e.g., the second vehicle), the location of the sound source may depend on whether the driver is speaking or a passenger is speaking. For example, the driver-side (left) azimuth 640a for the remote vehicle may be represented as (3 × θ1 + θ2)/4. This yields the azimuth angle 640a in the left half of the vehicle, as shown in fig. 8.

The video frame rate typically does not match the audio frame rate. To compensate for the frame-rate misalignment between the two domains (audio and video), the parameters distance 640c, elevation angle (φ) 640b, and azimuth angle (θ) 640a may be computed for each audio frame as a linear interpolation of the values corresponding to the previous two video frames. Alternatively, the value from the most recent video frame may be used (sample and hold). Furthermore, these values can be smoothed by taking the median (outlier rejection) or the average of the past few video frames, at the cost of reduced responsiveness.
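
A minimal sketch of this video-to-audio frame-rate conversion is shown below; the timestamp representation and function names are assumptions made for illustration:

def interpolate_to_audio_frame(t_audio, v_prev, v_curr):
    """Linearly interpolate one parameter (distance, azimuth, or elevation).

    v_prev, v_curr: (timestamp, value) pairs from the previous two video frames.
    t_audio:        timestamp of the current audio frame.
    """
    (t0, x0), (t1, x1) = v_prev, v_curr
    if t1 == t0:
        return x1                      # degenerate case: sample and hold
    a = (t_audio - t0) / (t1 - t0)
    return x0 + a * (x1 - x0)

def smooth_median(history):
    """Outlier-rejecting smoother over the past few video-frame values."""
    s = sorted(history)
    return s[len(s) // 2]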

The distance 640c (d) shown in fig. 6a may be d_xy, or d_yz, or some combination of d_xy and d_yz (e.g., their average). In some embodiments, it may be desirable to ignore the difference in height between the first vehicle and the remote device 670, for example if the remote device 670 is at the same height as the first vehicle. Another example is a listener in the first vehicle who configures settings for receiving spatial audio by projecting the z-component of the sound field emanating from the remote device 670 onto the x-y plane. In other examples, the remote device 670 may be a drone (e.g., flying around while playing music), or there may be a device in a high-rise building that is streaming music. In such examples, it may be desirable to have the angle estimator 630 output the elevation angle 640b, or to have the other optional blocks operate on it as well. That is, the parameter smoothing 640 for video-to-audio frame rate conversion also operates on the elevation angle 640b and produces a smoother version of the elevation angle 640b.

Since the vehicle and/or remote device will likely be moving around, the relative change in sound frequency may be accounted for by the doppler estimator 650. Accordingly, it may be desirable for the listener in the first vehicle to hear the sound of the remote device 670 (e.g., the second vehicle) with an additional doppler effect. As the remote device 670 gets closer to or farther from the first vehicle, the doppler estimator 650 may increase or decrease the frequency (i.e., pitch) heard by the listener in the first vehicle. As the remote device 670 gets closer to the first vehicle, its sound (if it propagated through air) would reach the listener at a higher frequency, because a source approaching the first vehicle compresses the pressure sound waves. However, for an audio signal (or audio content) that is compressed and received as part of a radio frequency signal, there is no doppler shift perceptible to the human ear. Therefore, the doppler estimator 650 must compensate, using the distance and angle to create the doppler effect. Similarly, when the remote device 670 is moving away from the first vehicle, the pressure sound waves of the audio signal (or audio content), if propagated through air, would be stretched and produce a lower-pitched sound. The doppler estimator 650 compensates by applying this lower-frequency effect, because the audio signal (or audio content) is compressed into a bitstream, transmitted by the remote device, and received by the first vehicle over radio frequency waves (according to a modulation scheme that is part of the air interface for the C-V2X or V2X communication link). Alternatively, if the remote device 670 is not a vehicle, different types of communication links and air interfaces may be used.
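
A sketch of how a doppler estimator might synthesize the missing shift from the distance track is given below; the speed-of-sound constant, the resampling-based pitch change, and the function names are illustrative assumptions, not the disclosed implementation:

import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature

def doppler_factor(d_prev, d_curr, dt):
    """Pitch-scaling factor for a source whose distance changed by
    (d_curr - d_prev) meters over dt seconds. Greater than 1 when approaching."""
    radial_velocity = (d_curr - d_prev) / dt      # positive when receding
    return SPEED_OF_SOUND / (SPEED_OF_SOUND + radial_velocity)

def apply_doppler(frame, factor):
    """Crude doppler effect: resample one audio frame by 'factor'."""
    n = len(frame)
    src_idx = np.linspace(0.0, n - 1, num=int(round(n / factor)))
    return np.interp(src_idx, np.arange(n), frame)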

Fig. 7a illustrates an embodiment of an audio spatializer 724a according to the techniques in this disclosure. In fig. 7a, the reconstructed sound field is rendered into speaker feeds that are provided to speakers 440 or headphones or any other audio delivery mechanism. The reconstructed sound field may include spatial effects that are provided to account for the distance and azimuth/elevation angle of a device (e.g., a remote vehicle or wearable device) relative to the person 111 (or another wearable device) in the vehicle 303 a.

The distance 702a (e.g., from the distance estimator 630, the parametric smoother 650 for video-to-audio frame rate conversion, or the doppler estimator 660) may be provided to the distance compensator 720. The other input to the distance compensator 720 may be an audio signal (or audio content). The audio signal (or audio content) may be the output of the codec 438. The codec 438 may output a pulse code modulated (PCM) audio signal. The PCM audio signal may be represented in the time domain or the frequency domain. The distance effect may be added as a filtering process, e.g., a finite impulse response (FIR) or infinite impulse response (IIR) filter, with an additional attenuation that increases with distance (e.g., a gain of 1/distance may be applied). An optional gain parameter may also be applied to adjust the level upward to improve intelligibility. Furthermore, a reverberation filter is an example of a distance-simulation filter.
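
As one hedged illustration of the distance compensation described above, the sketch below applies a 1/distance gain, an optional make-up gain, and a simple comb-filter "reverberation" as the distance-simulation filter; the coefficients and function name are placeholders chosen for this example:

import numpy as np

def distance_compensate(pcm, distance_m, makeup_gain=1.0,
                        reverb_delay=480, reverb_gain=0.3):
    """Attenuate a PCM frame by 1/distance and add a toy comb-filter reverb."""
    d = max(distance_m, 1.0)                 # avoid boosting very near sources
    out = pcm.astype(np.float32) * (makeup_gain / d)

    # Minimal IIR comb filter as a stand-in for a reverberation filter.
    wet = np.copy(out)
    for n in range(reverb_delay, len(wet)):
        wet[n] += reverb_gain * wet[n - reverb_delay]
    return wet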

Another distance cue that can be modeled and added to the audio signal (or audio content) is the doppler effect described with respect to the doppler estimator 650 in fig. 6 c. The relative velocity of the remote vehicle is determined by calculating the rate of change of distance per unit time, and the distance and angle are used to provide the doppler effect as described above.

The sound field rotator 710 may use the output of the distance compensator 720 and the input angle 702b (e.g., azimuth 640a, elevation 640b, or a combination based on these angles), and may pan audio from the remote device (e.g., the second vehicle) to the desired azimuth and elevation. The input angle 702b may be converted by the parametric smoothing 650 for video-to-audio frame rate conversion so that it is output at audio frame intervals rather than video frame intervals. Another embodiment, which may include a distance-independent sound field rotator 710, is shown in fig. 7b. Panning may be achieved, among other means, by using object-based rendering techniques such as vector-base amplitude panning (VBAP), by using a panned-sound-based renderer, or by using high-resolution head-related transfer functions (HRTFs) for headphone-based spatialization and rendering.
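
As a non-limiting illustration of the VBAP approach mentioned above, the following sketch computes gains for a single loudspeaker pair; the speaker angles, coordinate convention, and function name are assumptions made for this example:

import numpy as np

def vbap_pair_gains(source_az_deg, speaker_az_deg=(-30.0, 30.0)):
    """2-D vector-base amplitude panning for a single loudspeaker pair.

    Angles are in degrees, measured from straight ahead (x axis),
    positive toward the left (y axis).
    """
    def unit(az_deg):
        a = np.radians(az_deg)
        return np.array([np.cos(a), np.sin(a)])

    p = unit(source_az_deg)                                  # source direction
    L = np.column_stack([unit(a) for a in speaker_az_deg])   # speaker matrix
    g = np.linalg.solve(L, p)                                # solve p = L @ g
    g = np.clip(g, 0.0, None)                                # outside the pair -> clamp
    norm = np.linalg.norm(g)
    return g / norm if norm > 0 else g                       # constant-power normalization

For a source straight ahead, this returns roughly equal gains of about 0.707 for both loudspeakers, i.e., a constant-power center pan.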

Fig. 7b shows an embodiment of an audio spatializer 424 that includes a decoder, for use according to the techniques described in this disclosure. In fig. 7b, the decoder 724b may utilize the distance 702a information in the decoding process. As depicted in fig. 7a, an additional distance effect may be applied. Decoder 730 may be configured to ignore the highest frequency bins during decoding when the distance is greater than a certain threshold. The distance filter would attenuate these higher frequencies anyway, so the highest fidelity need not be maintained in those frequency bins. In addition, a doppler shift may be applied in the frequency domain during the decoding process to provide a computationally efficient implementation of the doppler effect. Reverberation and other distance-filtering effects can also be implemented efficiently in the frequency domain and lend themselves to easy integration with the decoding process. During the decoding process, rendering and/or binauralization may also be applied in the time or frequency domain within the decoder to produce appropriately panned speaker feeds at the output of the decoder.
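
A hedged sketch of the bin-dropping behavior described for decoder 730 is given below; the FFT-frame representation, the distance threshold, and the cutoff frequency are illustrative assumptions rather than values from this disclosure:

import numpy as np

def decode_with_distance(spectrum, distance_m, sample_rate,
                         dist_threshold=50.0, cutoff_hz=4000.0):
    """Zero out the highest frequency bins of one decoded FFT frame when the
    source is far away, since the distance filter would remove them anyway."""
    spectrum = spectrum.copy()
    if distance_m > dist_threshold:
        n_bins = len(spectrum)                      # rfft-style half spectrum
        bin_hz = (sample_rate / 2.0) / (n_bins - 1)
        first_dropped = int(cutoff_hz / bin_hz)
        spectrum[first_dropped:] = 0.0
    # The 1/distance attenuation can be applied directly in the frequency domain.
    return spectrum / max(distance_m, 1.0)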

Decoder 730 may be a speech decoder, an audio decoder, or a combined speech/audio decoder capable of decoding audio packets including compressed speech and music. The input to decoder 730 may be a stream from a cloud server associated with one or more remote devices. That is, there may be multiple streams as inputs 432 b. The cloud server may include streaming of music or other media. The input to the decoder 730 may also be compressed speech and/or music directly from a remote device (e.g., a remote vehicle).

Fig. 8 depicts an embodiment 800 in which the location of the person 111 in the first vehicle and of the selected (remote) vehicle 810 may be expressed in the same coordinate system. The angle and distance described previously relative to the exterior camera may need to be readjusted relative to the head position 820 (X', Y', Z') of the person 111 in the first vehicle. The location (X, Y, Z) 802 of the selected remote device (e.g., remote vehicle 303b) relative to the first vehicle 303a may be calculated from the distance and the azimuth/elevation angles as follows: X = d × cos(azimuth), Y = d × sin(azimuth), and Z = d × sin(elevation). The head position 820 from the inward-facing camera 188 (of the first vehicle) may be determined and converted into the same coordinate system as the first vehicle to obtain X', Y', and Z' 820. Given (X, Y, Z) 802 and (X', Y', Z') 820, trigonometric relationships may be used to determine the updated distance and angles relative to the person 111: d = sqrt[(X - X')^2 + (Y - Y')^2 + (Z - Z')^2], azimuth = asin[(Y - Y')/d], and elevation = asin[(Z - Z')/d]. The updated d and angles can be used for finer spatialization and distance resolution and better accuracy.
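
The relationships above translate directly into code; the following is a minimal sketch (the function name and coordinate conventions are assumptions):

import math

def respatialize_to_head(d, azimuth, elevation, head_pos):
    """Recompute distance/azimuth/elevation relative to the listener's head.

    d, azimuth, elevation: remote-device estimates relative to the exterior camera.
    head_pos: (X', Y', Z') of the driver's head in the same vehicle coordinates.
    """
    X = d * math.cos(azimuth)
    Y = d * math.sin(azimuth)
    Z = d * math.sin(elevation)

    Xp, Yp, Zp = head_pos
    d_new = math.sqrt((X - Xp) ** 2 + (Y - Yp) ** 2 + (Z - Zp) ** 2)
    az_new = math.asin((Y - Yp) / d_new)
    el_new = math.asin((Z - Zp) / d_new)
    return d_new, az_new, el_new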

The techniques described herein may be implemented in hardware, software, firmware, or any combination thereof. The techniques may be implemented in any of a variety of devices, such as a general purpose computer, a wireless communication device handset, or an integrated circuit device having multiple uses, including applications in wireless communication device handsets and other devices. Any features described as modules or components may be implemented together in an integrated logic device or separately as discrete but interoperable logic devices. If implemented in software, the techniques may be realized at least in part by a computer-readable data storage medium comprising program code comprising instructions that, when executed, perform one or more of the methods described above. The computer readable data storage medium may form part of a computer program product, which may include packaging materials. The computer-readable medium may include memory or a data storage medium, such as Random Access Memory (RAM), such as Synchronous Dynamic Random Access Memory (SDRAM), read-only memory (ROM), non-volatile random access memory (NVRAM), electrically erasable programmable read-only memory (EEPROM), flash memory, a magnetic or optical data storage medium, and so forth. Additionally or alternatively, the techniques may be realized at least in part by a computer-readable communication medium that carries or communicates program code in the form of instructions or data structures and that can be accessed, read, and/or executed by a device having computing capabilities.

One of ordinary skill in the art will recognize that one or more components in a device may be implemented in circuitry of a processor, or partially or wholly as part of an Application Specific Integrated Circuit (ASIC) in one or more processors.

The program code or instructions may be executed by a processor, which may include one or more processors, such as one or more Digital Signal Processors (DSPs), general purpose microprocessors, Application Specific Integrated Circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Such a processor may be configured to perform any of the techniques described in this disclosure. A general purpose processor may be a microprocessor; but, in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. Thus, the term "processor," as used herein may refer to any one of the foregoing structure, any combination of the foregoing structure, or any other structure or means suitable for implementing the techniques described herein. In addition, in some aspects, the functions described herein may be provided within dedicated software modules or hardware modules configured for encoding and decoding or incorporated in a combined video encoder-decoder (CODEC).

The coding techniques discussed herein may be embodied in an example video encoding and decoding system. The system includes a source device that provides encoded video data to be decoded by a destination device at a later time. In particular, the source device provides video data to the destination device via a computer-readable medium. The source and destination devices may comprise any of a wide variety of devices, including desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets (such as so-called "smart" phones), so-called "smart" pads, televisions, cameras, display devices, digital media players, video game consoles, video streaming devices, and so forth. In some cases, the source device and the destination device may be equipped for wireless communication.

The destination device may receive encoded video data to be decoded via the computer-readable medium. The computer-readable medium may include any type of medium or device capable of moving encoded video data from a source device to a destination device. In one example, the computer readable medium may include a communication medium to enable a source device to transmit encoded video data directly to a destination device in real-time. The encoded video data may be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to a destination device. The communication medium may include any wireless or wired communication medium such as a Radio Frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network, such as a local area network, a wide area network, or a global network (such as the internet). The communication medium may include a router, switch, base station, or any other means that may be used to facilitate communication from a source device to a destination device.

In some examples, the encoded data may be output from the output interface to a storage device. Similarly, the encoded data may be accessed from the storage device through the input interface. The storage device may include any of a variety of distributed or locally accessed data storage media such as a hard drive, blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory, or any other suitable digital storage medium for storing encoded video data. In further examples, the storage device may correspond to a file server or another intermediate storage device that may store the encoded video generated by the source device. The destination device may access the stored video data from the storage device via streaming or download. The file server may be any type of server capable of storing encoded video data and transmitting the encoded video data to a destination device. Example file servers include web servers (e.g., for a website), FTP servers, Network Attached Storage (NAS) devices, or local disk drives. The destination device may access the encoded video data over any standard data connection, including an internet connection. This may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of both, suitable for accessing encoded video data stored on a file server. The transmission of the encoded video data from the storage device may be a streaming transmission, a download transmission, or a combination thereof.

The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an Integrated Circuit (IC), or a set of ICs (e.g., a chipset). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require implementation by different hardware units. Rather, as noted above, the various units may be combined in a codec hardware unit, or provided by a collection of interoperating hardware units (including one or more processors as noted above) in conjunction with appropriate software and/or firmware.

Specific implementations of the present disclosure are described below with reference to the accompanying drawings. In this description, common features are designated by common reference numerals throughout the drawings. As used herein, various terms are used only for the purpose of describing particular implementations and are not intended to be limiting. For example, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the terms "comprises," "comprising," "includes," and "including" are used interchangeably. In addition, it will be understood that the term "wherein" may be used interchangeably with "where." As used herein, "exemplary" may indicate an example, an implementation, and/or an aspect, and should not be construed as limiting or as indicating a preference or a preferred implementation. As used herein, an ordinal term (e.g., "first," "second," "third," etc.) used to modify an element, such as a structure, a component, an operation, etc., does not by itself indicate any priority or order of the element with respect to another element, but rather merely distinguishes the element from another element having the same name (but for use of the ordinal term). As used herein, the term "set" refers to a grouping of one or more elements, and the term "plurality" refers to multiple elements.

As used herein, "coupled" may include "communicatively coupled," "electrically coupled," or "physically coupled," and may also (or alternatively) include any combination thereof. Two devices (or components) can be directly coupled (e.g., communicatively coupled, electrically coupled, or physically coupled) or indirectly coupled via one or more other devices, components, wires, buses, networks (e.g., wired networks, wireless networks, or a combination thereof), and so forth. As an illustrative, non-limiting example, two devices (or components) that are electrically coupled may be included in the same device or different devices and may be connected via electronics, one or more connectors, or inductive coupling. In some implementations, two devices (or components) that are communicatively coupled (such as in electronic communication) may send and receive electrical signals (digital or analog signals) directly or indirectly (such as via one or more wires, buses, networks, or the like). As used herein, "directly coupled" may include two devices that are coupled (e.g., communicatively coupled, electrically coupled, or physically coupled) without intervening components.

As used herein, "integrated" may include "manufacturing or selling a device. The device may be integrated if a user purchases an enclosure that bundles or includes the device as part of the enclosure. In some descriptions, two devices may be coupled, but not necessarily integrated (e.g., different peripherals may not be integrated into a command device, but may still be "coupled"). Another example may be any of the transceivers or antennas described herein, which may be "coupled" to a processor, but not necessarily part of an enclosure that includes a video device. Other examples may be inferred from the context disclosed herein (including this paragraph) when the term "integrated" is used.

As used herein, a "wireless" connection between devices may be based on various wireless technologies, such as may be based on different cellular communication systems (such as V2X and C-V2X). C-V2X allows direct communication (via a "sidelink channel") between the vehicle and other devices without the use of a base station. In such a case, the device may "connect wirelessly via a sidelink channel.

A Long Term Evolution (LTE) system, a Code Division Multiple Access (CDMA) system, a global system for mobile communications (GSM) system, a Wireless Local Area Network (WLAN) system, or some other wireless system. A CDMA system may implement wideband CDMA (wcdma), CDMA 1X, evolution-data optimized (EVDO), time division synchronous CDMA (TD-SCDMA), or some other version of CDMA. Additionally, the two devices may be wirelessly connected based on Bluetooth, wireless fidelity (Wi-Fi), or a variation of Wi-Fi (e.g., Wi-Fi direct). The "wireless connection" may also be based on other wireless technologies such as ultrasound, infrared, pulsed radio frequency electromagnetic energy, structured light, or direction of arrival techniques used in signal processing (e.g., audio signal processing or radio frequency processing) when the two devices are in line of sight.

As used herein, a "and/or" B "means" a and B, "or" a or B, "or both" a and B "and" a or B "are applicable or acceptable.

As used herein, a unit may comprise, for example, dedicated hard-wired circuitry, software and/or firmware in combination with programmable circuitry, or a combination thereof.

The term "computing device" is used generically herein to refer to any or all of the following: servers, personal computers, laptop computers, tablet computers, mobile devices, cellular telephones, smartbooks, ultrabooks, palm-top computers, Personal Data Assistants (PDAs), wireless email receivers, multimedia internet capable cellular telephones, Global Positioning System (GPS) receivers, wireless game controllers, and similar electronic devices that include programmable processors and circuits for wirelessly transmitting and/or receiving information.

Various examples have been described. These and other examples are within the scope of the following claims.
