Gesture space dimension reduction for gesture space deformation of virtual characters

Document No. 602786 | Publication date: 2021-05-04

Description: This invention, "Gesture space dimension reduction for gesture space deformation of virtual characters," was created by S. M. Comer and G. Wedig on 2019-07-25. Abstract: Systems and methods are provided for reducing the dimension of a gesture space. A plurality of example gestures may define an input gesture space, and each example gesture may include a set of joint rotations of the virtual character. The joint rotations may be expressed using a singularity-free mathematical representation. The plurality of example gestures may then be clustered into one or more clusters, and a representative gesture may be determined for each cluster. An output gesture space of reduced dimension compared to the input gesture space may then be provided.

1. A system, comprising:

a non-transitory computer memory to store a plurality of example gestures of a skeletal system of a virtual character; and

a hardware computer processor in communication with the non-transitory computer memory, the hardware computer processor configured to reduce a dimension of an input gesture space by performing a method comprising:

clustering the plurality of example gestures into one or more clusters, the plurality of example gestures defining an input gesture space, each example gesture of the plurality of example gestures comprising a set of joint rotations, the joint rotations having a singularity-free mathematical representation;

determining a representative gesture for each cluster; and

providing an output gesture space of reduced dimensions compared to the input gesture space.

2. The system of claim 1, wherein the singularity-free mathematical representation of the joint rotation comprises a quaternion representation.

3. The system of claim 2, further comprising:

receiving the plurality of example gestures with the joint rotations expressed in an Euler angle representation; and

converting the Euler angle representation to the quaternion representation.

4. The system of claim 1, wherein the example gestures are clustered into the one or more clusters based on a metric for determining similarity between each example gesture and each cluster.

5. The system of claim 1, wherein clustering the example gestures comprises mapping each of the example gestures to a point in a multi-dimensional space.

6. The system of claim 5, wherein clustering the example gestures further comprises:

determining a centroid of each cluster;

determining a distance between the point of each example gesture and a centroid of each cluster; and

assigning each example gesture to the closest cluster.

7. The system of claim 6, further comprising: iteratively determining the centroid of each cluster and assigning each example gesture to the closest cluster.

8. The system of claim 6, wherein the representative gesture for each cluster comprises one of the example gestures assigned to the cluster or an example gesture associated with a centroid of the cluster.

9. The system of claim 1, further comprising: determining a number of clusters.

10. The system of claim 9, wherein determining a number of clusters comprises:

clustering, for each of a plurality of different candidate cluster counts, the plurality of example gestures into the one or more clusters;

calculating an error metric associated with each candidate cluster count; and

selecting one of the candidate cluster counts based on the error metric associated with that candidate cluster count.

11. The system of claim 10, wherein the error metric comprises, for all of the example gestures in the input gesture space, a sum of squared distances between a point corresponding to each example gesture and a centroid of its assigned cluster.

12. The system of claim 10, wherein selecting one of the candidate cluster counts comprises: determining whether its associated error metric meets a selected criterion.

13. The system of claim 12, wherein the criterion is that a rate of change of the error metric exceeds a specified threshold.

14. The system of claim 1, further comprising: training a gesture space deformer using the output gesture space.

15. The system of claim 14, further comprising: computing a mesh deformation of the virtual character using the gesture space deformer.

16. The system of claim 14, further comprising: controlling a plurality of blend shapes in an output deformation matrix using the gesture space deformer.

17. The system of claim 16, wherein the output deformation matrix is generated by reducing a dimension of an input deformation matrix using principal component analysis.

18. The system of claim 17, wherein reducing the dimension of the input deformation matrix comprises:

determining principal components of the input deformation matrix;

omitting one or more of the principal components to leave one or more remaining principal components; and

generating the output deformation matrix using one or more blend shapes associated with the one or more remaining principal components.

19. The system of claim 1, wherein the output gesture space is at least 30% smaller than the input gesture space.

20. The system of claim 1, wherein the system comprises a virtual reality display system, an augmented reality display system, or a mixed reality display system.

21. A method, comprising:

obtaining a plurality of example gestures of a skeletal system of a virtual character, the plurality of example gestures defining an input gesture space, each of the plurality of example gestures comprising a set of joint rotations, the joint rotations having a singularity-free mathematical representation;

clustering the plurality of example gestures into one or more clusters;

determining a representative gesture for each cluster; and

providing an output gesture space of reduced dimensions compared to the input gesture space.
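Read together, claims 1-13 and 21 describe a pipeline built from familiar pieces: expressing joint rotations as quaternions, mapping each example gesture to a point in a multi-dimensional space, clustering with centroids and nearest-cluster assignment, sweeping candidate cluster counts against a sum-of-squared-distances error metric, and returning one representative gesture per cluster. The following Python sketch is a minimal, hypothetical illustration of such a pipeline using SciPy and scikit-learn; the function names, the XYZ Euler convention, the elbow threshold, and the PCA helper for claims 17 and 18 are editorial assumptions, not language from the claims.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA


def gestures_to_points(example_gestures_euler):
    """Map each example gesture (an array of per-joint XYZ Euler angles, in
    degrees) to a point in a multi-dimensional space by converting every joint
    rotation to a unit quaternion and concatenating the quaternions."""
    points = []
    for gesture in example_gestures_euler:
        quats = R.from_euler("xyz", np.asarray(gesture), degrees=True).as_quat()
        points.append(quats.reshape(-1))  # one flat feature vector per gesture
    return np.vstack(points)


def select_cluster_count(points, candidate_counts, rate_threshold=0.25):
    """For each candidate cluster count, cluster the gesture points and record
    the error metric (sum of squared point-to-centroid distances).  Pick the
    count at which the relative improvement in the error metric falls below a
    threshold -- one simple reading of the 'rate of change' criterion."""
    errors = {}
    for k in candidate_counts:
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(points)
        errors[k] = km.inertia_  # sum of squared distances to assigned centroids
    counts = sorted(errors)
    for prev, curr in zip(counts, counts[1:]):
        improvement = (errors[prev] - errors[curr]) / max(errors[prev], 1e-12)
        if improvement < rate_threshold:
            return prev, errors
    return counts[-1], errors


def reduce_gesture_space(example_gestures_euler, candidate_counts=range(2, 10)):
    """Return the indices of one representative example gesture per cluster:
    the output gesture space, reduced relative to the input gesture space.
    Candidate counts should not exceed the number of example gestures."""
    points = gestures_to_points(example_gestures_euler)
    k, _ = select_cluster_count(points, candidate_counts)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(points)
    representatives = []
    for c in range(k):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(points[members] - km.cluster_centers_[c], axis=1)
        representatives.append(int(members[np.argmin(dists)]))  # closest to centroid
    return representatives


def reduce_deformation_matrix(input_matrix, n_keep):
    """One reading of claims 17-18: keep the leading principal components of an
    input deformation matrix (one blend shape per row) and rebuild a lower-rank
    output deformation matrix from the remaining components."""
    pca = PCA(n_components=n_keep).fit(input_matrix)
    return pca.inverse_transform(pca.transform(input_matrix))
```

In this sketch, the indices returned by reduce_gesture_space pick out the representative example gestures forming the output gesture space, which could then be used, for example, to train a gesture space deformer as in claims 14-16.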

Technical Field

The present disclosure relates to virtual reality and augmented reality, including mixed reality, imaging, and visualization systems, and more particularly to rigging systems and methods for animating virtual characters, such as avatars.

Background

Modern computing and display technologies have facilitated the development of systems for so-called "virtual reality", "augmented reality", and "mixed reality" experiences, in which digitally reproduced images are presented to a user in a manner in which they appear to be, or may be perceived to be, real. Virtual Reality (VR) scenarios typically involve the presentation of computer-generated virtual image information without transparency to other real-world visual input. Augmented Reality (AR) scenarios typically involve the presentation of virtual image information as an augmentation to the visualization of the real world around the user. Mixed Reality (MR) is a type of augmented reality in which physical and virtual objects may coexist and interact in real time. The systems and methods disclosed herein address various challenges associated with VR, AR, and MR technology.

Summary

The avatar may be a virtual representation of a real or fictitious person (or biological or anthropomorphic object) in an AR/VR/MR environment. For example, during a telepresence session in which two AR/VR/MR users interact with each other, a viewer may perceive an avatar of the other user in the viewer's environment, creating a tangible sense of the other user's presence in the viewer's environment. Avatars can also provide users with a way to interact with each other and do things together in a shared virtual environment. For example, a student attending an online class may perceive and interact with avatars of other students or teachers in a virtual classroom. As another example, a user playing a game in an AR/VR/MR environment may see and interact with avatars of other players in the game.

Embodiments of the disclosed systems and methods may provide improved avatars and more realistic interactions between a user of a wearable system and avatars in the user's environment. Although examples in this disclosure describe animating human avatars, similar techniques may be applied to animals, fictional creatures, objects, and the like.

Examples of 3D display of wearable systems

Wearable systems, also referred to herein as Augmented Reality (AR) systems, may be configured to present 2D or 3D virtual images to a user. The image may be a still image, a frame of video, or a video, in combination or the like. At least a portion of the wearable system may be implemented on a wearable device that can present a VR, AR, or MR environment, alone or in combination, for user interaction. The wearable device may be used interchangeably as an AR device (ARD). Furthermore, for purposes of this disclosure, the term "AR" may be used interchangeably with the term "MR".

FIG. 1 depicts an illustration of a mixed reality scene with certain virtual reality objects and certain physical objects viewed by a person. In fig. 1, an MR scene 100 is depicted, where a user of MR technology sees a real-world park-like setting 110 featuring people, trees, buildings in the background, and a concrete platform 120. In addition to these items, the user of MR technology may also perceive that he "sees" a robotic statue 130 standing on the real world platform 120, as well as an anthropomorphic flying cartoon-like avatar 140 that appears to be a bumblebee, although these elements do not exist in the real world.

In order for a 3D display to produce a realistic sense of depth, and more specifically, a simulated sense of surface depth, it may be desirable for each point in the display's field of view to generate an accommodative response corresponding to its virtual depth. If the accommodative response to a displayed point does not correspond to the virtual depth of that point (as determined by the binocular depth cues of convergence and stereopsis), the human eye may experience an accommodation conflict, resulting in unstable imaging, harmful eye fatigue, headaches, and, in the absence of accommodation information, an almost complete lack of surface depth.

The VR, AR, and MR experiences may be provided by a display system having a display in which images corresponding to a plurality of depth planes are provided to a viewer. The image may be different for each depth plane (e.g., providing a slightly different rendering of a scene or object) and may be focused separately by the eyes of the viewer, providing depth cues to the user based on eye accommodation required to focus different image features of the scene located on different depth planes, or based on observing different image features out of focus on different depth planes. As discussed elsewhere herein, such depth cues provide reliable depth perception.

FIG. 2 shows an example of a wearable system 200 that may be configured to provide an AR/VR/MR scenario. Wearable system 200 may also be referred to as AR system 200. Wearable system 200 includes a display 220 and various mechanical and electronic modules and systems that support the functionality of display 220. The display 220 may be coupled to a frame 230, which frame 230 may be worn by a user, wearer, or viewer 210. The display 220 may be placed in front of the eyes of the user 210. The display 220 may present AR/VR/MR content to the user. The display 220 may include a Head Mounted Display (HMD) that is worn on the head of the user.

In some embodiments, a speaker 240 is coupled to the frame 230 and is positioned adjacent to the ear canal of the user (in some embodiments, another speaker, not shown, is positioned adjacent to the other ear canal of the user to provide stereo/shapeable sound control). The display 220 may include an audio sensor (e.g., microphone) 232 for detecting an audio stream from the environment and capturing ambient sounds. In some embodiments, one or more other audio sensors, not shown, are positioned to provide stereo reception. Stereo reception can be used to determine the location of sound sources. Wearable system 200 may perform sound or speech recognition on the audio stream.

The wearable system 200 may include an outward-facing imaging system 464 (shown in fig. 4) that views the world in the user's surroundings. Wearable system 200 may also include an inward facing imaging system 462 (shown in fig. 4) that may track the user's eye movements. The inward facing imaging system may track the motion of one eye or the motion of both eyes. An inward facing imaging system 462 may be attached to frame 230 and may be in electrical communication with processing module 260 or 270, which processing module 260 or 270 may process image information acquired by the inward facing imaging system to determine eye movements or eye gestures of user 210, such as a pupil diameter or orientation of an eye. The inward-facing imaging system 462 may include one or more cameras. For example, at least one camera may be used to image each eye. The images acquired by the camera may be used to determine the pupil size or eye pose of each eye separately, allowing image information to be presented to each eye to be dynamically adjusted for that eye.

As an example, wearable system 200 may use outward facing imaging system 464 or inward facing imaging system 462 to acquire images of the user's gestures. The image may be a still image, a frame of video, or video.

The display 220 may be operatively coupled 250 to a local data processing module 260, such as by a wired lead or wireless connection, which local data processing module 260 may be mounted in various configurations, such as fixedly attached to the frame 230, fixedly attached to a helmet or hat worn by the user, embedded in headphones, or otherwise removably attached to the user 210 (e.g., in a backpack, belt-coupled configuration).

The local processing and data module 260 may include a hardware processor as well as digital memory, such as non-volatile memory (e.g., flash memory), both of which may be used to facilitate processing, caching, and storage of data. The data may include: a) data captured from sensors (which may be, for example, operatively coupled to the frame 230 or otherwise attached to the user 210), such as an image capture device (e.g., a camera in an inward facing imaging system or an outward facing imaging system), an audio sensor (e.g., a microphone), an Inertial Measurement Unit (IMU), an accelerometer, a compass, a Global Positioning System (GPS) unit, a radio, or a gyroscope; or b) data retrieved or processed using remote processing module 270 or remote data store 280, perhaps communicated to display 220 after such processing or retrieval. Local processing and data module 260 may be operatively coupled to remote processing module 270 or remote data store 280 by communication links 262 or 264 (such as via wired or wireless communication links) such that these remote modules may serve as resources for local processing and data module 260. Further, remote processing module 270 and remote data store 280 may be operatively coupled to each other.

In some embodiments, remote processing module 270 may include one or more processors configured to analyze and process data or image information. In some embodiments, remote data store 280 may include a digital data storage facility, which may be available through the internet or other networking configurations in a "cloud" resource configuration. In some embodiments, all data is stored and all computations are performed in the local processing and data module, allowing fully autonomous use from remote modules.

Example components of a wearable system

Fig. 3 schematically illustrates example components of a wearable system. Fig. 3 shows a wearable system 200, which may include a display 220 and a frame 230. Enlarged view 202 schematically shows various components of wearable system 200. In some implementations, one or more of the components shown in FIG. 3 may be part of the display 220. The various components, alone or in combination, may collect various data (such as, for example, audio or visual data) associated with the user of wearable system 200 or the user's environment. It should be understood that other embodiments may have more or fewer components depending on the application for which the wearable system is used. Nevertheless, fig. 3 provides a basic idea of some of the various components and types of data that may be collected, analyzed, and stored by the wearable system.

Fig. 3 illustrates an example wearable system 200, which example wearable system 200 may include a display 220. The display 220 may include a display lens 226 that may be mounted to a user's head or to a housing or frame 230, corresponding to the frame 230. The display lens 226 may include one or more transparent mirrors positioned by the housing 230 in front of the user's eyes 302, 304 and may be configured to launch the projected light 338 into the eyes 302, 304 and facilitate beam shaping, while also allowing transmission of at least some light from the local environment. The wavefront of the projection beam 338 may be curved or focused to coincide with a desired focal length of the projected light. As shown, two wide-field machine vision cameras 316 (also referred to as world cameras) may be coupled to the housing 230 to image the environment surrounding the user. These cameras 316 may be dual capture visible/invisible (e.g., infrared) light cameras. The camera 316 may be part of the outward facing imaging system 464 shown in fig. 4. Images acquired by the world camera 316 may be processed by the gesture processor 336. For example, the gesture processor 336 may implement one or more object recognizers 708 (e.g., shown in FIG. 7) to recognize gestures of the user or another person in the user environment, or to recognize physical objects in the user environment.

With continued reference to fig. 3, a pair of scanning laser shaped wavefront (e.g., for depth) light projector modules with display mirrors and optics are shown that are configured to project light 338 into the eyes 302, 304. The depicted view also shows two miniature infrared cameras 324 paired with infrared light (such as light emitting diodes "LEDs") configured to be able to track the user's eyes 302, 304 to support rendering and user input. The camera 324 may be part of the inward facing imaging system 462 shown in fig. 4. Wearable system 200 may further have sensor assembly 339, which may include X, Y and Z-axis accelerometer functionality as well as a magnetic compass and X, Y and Z-axis gyroscope functionality, preferably providing data at a relatively high frequency, such as 200 Hz. The sensor assembly 339 may be part of the IMU described with reference to fig. 2A. The depicted system 200 may also include a head pose processor 336, such as an ASIC (application specific integrated circuit), FPGA (field programmable gate array), or ARM processor (advanced reduced instruction set machine), which may be configured to compute real-time or near real-time user head poses from the wide field of view image information output from the capture device 316. The head pose processor 336 may be a hardware processor and may be implemented as part of the local processing and data module 260 shown in fig. 2A.

The wearable system may also include one or more depth sensors 234. The depth sensor 234 may be configured to measure the distance between an object in the environment and the wearable device. The depth sensor 234 may include a laser scanner (e.g., lidar), an ultrasonic depth sensor, or a depth sensing camera. In some implementations, where the camera 316 has depth sensing capabilities, the camera 316 may also be considered a depth sensor 234.

Also shown is a processor 332 configured to perform digital or analog processing to derive gestures from gyroscope, compass or accelerometer data from a sensor component 339. Processor 332 may be part of local processing and data module 260 shown in fig. 2. Wearable system 200 as shown in fig. 3 may also include a positioning system such as, for example, GPS 337 (global positioning system) to assist with posture and positioning analysis. Additionally, the GPS may further provide remote-based (e.g., cloud-based) information about the user's environment. This information may be used to identify objects or information in the user's environment.

The wearable system may combine the data acquired by the GPS 337 with a remote computing system (such as, for example, the remote processing module 270, another user's ARD, etc.) that may provide more information about the user's environment. As one example, the wearable system may determine the user's location based on GPS data and retrieve a world map (e.g., by communicating with remote processing module 270) that includes virtual objects associated with the user's location. As another example, the wearable system 200 may monitor the environment using the world camera 316 (which may be part of the outward facing imaging system 464 shown in fig. 4). Based on the images acquired by the world camera 316, the wearable system 200 may detect objects in the environment (e.g., by using one or more object recognizers 708 shown in fig. 7). The wearable system may further use the data acquired by GPS 337 to interpret the features.

The wearable system 200 may also include a rendering engine 334, which rendering engine 334 may be configured to provide rendering information local to the user to facilitate operation of the scanner and imaging into the user's eyes for the user to view the world. Rendering engine 334 may be implemented by a hardware processor, such as, for example, a central processing unit or a graphics processing unit. In some embodiments, the rendering engine is part of the local processing and data module 260. Rendering engine 334 may be communicatively coupled (e.g., via a wired or wireless link) to other components of wearable system 200. For example, rendering engine 334 may be coupled to eye camera 324 via communication link 274 and to projection subsystem 318 (which may project light into the user's eyes 302, 304 via scanned laser devices in a manner similar to a retinal scan display) via communication link 272. Rendering engine 334 may also communicate with other processing units (such as, for example, sensor pose processor 332 and image pose processor 336) via links 276 and 294, respectively.

A camera 324 (e.g., a miniature infrared camera) may be used to track eye gestures to support rendering and user input. Some example eye gestures may include where the user is looking or the depth at which he or she is focusing (which may be estimated by the vergence of the eyes). GPS 337, gyroscopes, compasses, and accelerometers 339 may be used to provide coarse or fast pose estimation. One or more of the cameras 316 may acquire images and gestures that may be used, along with data from associated cloud computing resources, to map the local environment and share user views with others.

The example components depicted in FIG. 3 are for illustration purposes only. A number of sensors and other functional modules are shown together for ease of illustration and description. Some embodiments may include only one or a subset of these sensors or modules. Further, the positions of these components are not limited to the positions shown in fig. 3. Some components may be mounted to or housed within other components, such as a belt-mounted component, a hand-held component, or a helmet component. As one example, the image pose processor 336, the sensor pose processor 332, and the rendering engine 334 may be placed in a belt pack and configured to communicate with other components of the wearable system via wireless communication (such as ultra-wideband, Wi-Fi, Bluetooth, etc.) or via wired communication. The depicted housing 230 is preferably head-mountable and wearable by the user. However, some components of the wearable system 200 may be worn on other parts of the user's body. For example, the speaker 240 may be inserted into the ear of the user to provide sound to the user.

With respect to the projection of light 338 into the user's eyes 302, 304, in some embodiments, the camera 324 may be used to measure where the centers of the user's eyes are geometrically verged, which generally coincides with the focus position or "depth of focus" of the eyes. The three-dimensional surface of all points the eyes verge to may be referred to as the "horopter". The focal distance may take on a finite number of depths or may vary infinitely. Light projected from the vergence distance appears to be focused to the subject eye 302, 304, while light in front of or behind the vergence distance is blurred. Examples of wearable devices and other display systems of the present disclosure are also described in U.S. Patent Publication No. 2016/0270656, which is incorporated herein by reference in its entirety.

The human visual system is complex, and providing a realistic perception of depth is challenging. A viewer of an object may perceive the object as three-dimensional due to a combination of vergence and accommodation. Vergence movements of the two eyes relative to each other (e.g., rotational movements that converge the lines of sight of the eyes toward or away from each other to fixate the pupils on an object) are closely associated with the focusing (or "accommodation") of the lenses of the eyes. Under normal conditions, changing the focus of the lenses of the eyes, or accommodating the eyes, to shift focus from one object to another object at a different distance will automatically cause a matching change in vergence to the same distance, under a relationship known as the "accommodation-vergence reflex." Likewise, under normal conditions, a change in vergence will trigger a matching change in accommodation. A display system that provides a better match between accommodation and vergence may produce a more realistic and comfortable simulation of three-dimensional images.

In addition, the human eye can correctly resolve spatially coherent light having a beam diameter of less than about 0.7 millimeters, regardless of where the eye is focused. Thus, to create the illusion of proper depth of focus, the camera 324 may be used to track the vergence of the eyes, and the rendering engine 334 and projection subsystem 318 may be used to render all objects on or near the horopter in focus and all other objects at varying degrees of defocus (e.g., using purposely created blur). Preferably, the system 220 renders to the user at a frame rate of about 60 frames per second or higher. As described above, preferably, the camera 324 may be used for eye tracking, and the software may be configured to use not only the vergence geometry but also focus position cues as user inputs. Preferably, such a display system is configured with brightness and contrast suitable for daytime or nighttime use.

In some embodiments, the display system preferably has a delay for visual object alignment of less than about 20 milliseconds, an angular alignment of less than about 0.1 degrees, and a resolution of about 1 arc minute, which is believed to be approximately the limit of the human eye, without being limited by theory. Display system 220 may be integrated with a positioning system, which may involve a GPS element, optical tracking, compass, accelerometer, or other data source to aid in position and orientation determination; the positioning information may be used to facilitate accurate rendering in the user's view of the relevant world (e.g., such information would help the glasses know their location with respect to the real world).

In some embodiments, wearable system 200 is configured to display one or more virtual images based on the accommodation of the user's eyes. In some embodiments, unlike existing 3D display methods that force the user to focus on where the image is projected, the wearable system is configured to automatically change the focus of the projected virtual content to allow for more comfortable viewing of the image or images presented to the user. For example, if the user's eye has a current focus of 1m, the image may be projected to coincide with the user's focus. If the user moves the focus to 3m, the image is projected to coincide with the new focus. Thus, the wearable system 200 of some embodiments allows the user's eyes to function in a more natural manner rather than forcing the user to a predetermined focus.

Such a wearable system 200 may eliminate or reduce the occurrence of eye strain, headaches, and other physiological symptoms typically observed with virtual reality devices. To accomplish this, various embodiments of wearable system 200 are configured to project virtual images at varying focal lengths through one or more Variable Focusing Elements (VFEs). In one or more embodiments, 3D perception may be achieved by a multi-planar focusing system that projects images at a fixed focal plane away from the user. Other embodiments employ a variable plane focus, wherein the focal plane moves back and forth in the z-direction to coincide with the user's current focus state.

In both multi-plane and variable-plane focusing systems, wearable system 200 may employ eye tracking to determine the vergence of the user's eyes, determine the user's current focus, and project a virtual image at the determined focus. In other embodiments, the wearable system 200 includes a light modulator that variably projects a variable focus light beam in a raster pattern across the retina via a fiber optic scanner or other light generating source. Thus, as further described in U.S. Patent Publication No. 2016/0270656 (the entire contents of which are incorporated herein by reference), the ability of the display of wearable system 200 to project images at varying focal lengths not only eases the user's accommodation when viewing 3D objects, but may also be used to compensate for the user's ocular abnormalities. In some other embodiments, a spatial light modulator may project the images to the user through various optical components. For example, as described further below, the spatial light modulator may project the images onto one or more waveguides, which then transmit the images to the user.

Waveguide stack assembly

Fig. 4 shows an example of a waveguide stack for outputting image information to a user. The wearable system 400 includes a stack of waveguides, or stacked waveguide assembly 480, that can be used to provide three-dimensional perception to the eye/brain using a plurality of waveguides 432b, 434b, 436b, 438b, 440b. In some embodiments, wearable system 400 may correspond to wearable system 200 of fig. 2, with fig. 4 schematically showing some portions of wearable system 200 in greater detail. For example, in some embodiments, the waveguide assembly 480 may be integrated into the display 220 of fig. 2.

With continued reference to fig. 4, the waveguide assembly 480 may also include a plurality of features 458, 456, 454, 452 between the waveguides. In some embodiments, the features 458, 456, 454, 452 may be lenses. In other embodiments, the features 458, 456, 454, 452 may not be lenses. Rather, they may simply be spacers (e.g., capping layers or structures for forming air gaps).

The waveguides 432b, 434b, 436b, 438b, 440b or the plurality of lenses 458, 456, 454, 452 can be configured to transmit image information to the eye at various levels of wavefront curvature or ray divergence. Each waveguide level may be associated with a particular depth plane and may be configured to output image information corresponding to that depth plane. Image injection devices 420, 422, 424, 426, 428 may be used to inject image information into waveguides 440b, 438b, 436b, 434b, 432b, each of which may be configured to distribute incident light across each respective waveguide for output toward eye 410. The light exits the output surfaces of the image injection devices 420, 422, 424, 426, 428 and is injected into the corresponding input edges of the waveguides 440b, 438b, 436b, 434b, 432b. In some embodiments, a single beam (e.g., a collimated beam) may be injected into each waveguide to output an entire field of cloned collimated beams that are directed toward the eye 410 at particular angles (and amounts of divergence) corresponding to the depth plane associated with the particular waveguide.

In some embodiments, image injection devices 420, 422, 424, 426, 428 are separate displays that each generate image information to inject into a corresponding waveguide 440b, 438b, 436b, 434b, 432b, respectively. In some other embodiments, the image injection devices 420, 422, 424, 426, 428 are outputs of a single multiplexed display that may communicate image information to each of the image injection devices 420, 422, 424, 426, 428, e.g., via one or more light pipes (such as optical cables).

The controller 460 controls the operation of the stacked waveguide assembly 480 and the image injection devices 420, 422, 424, 426, 428. The controller 460 includes programming (e.g., instructions in a non-transitory computer readable medium) that adjusts timing and provides image information to the waveguides 440b, 438b, 436b, 434b, 432 b. In some embodiments, controller 460 may be a single integrated device or a distributed system connected by a wired or wireless communication channel. In some embodiments, the controller 460 may be part of the processing module 260 or 270 (shown in fig. 2).

The waveguides 440b, 438b, 436b, 434b, 432b may be configured to propagate light within each respective waveguide by Total Internal Reflection (TIR). The waveguides 440b, 438b, 436b, 434b, 432b may each be planar or have another shape (e.g., curved), with major top and bottom surfaces and edges extending between those major top and bottom surfaces. In the illustrated configuration, the waveguides 440b, 438b, 436b, 434b, 432b can each include light extraction optics 440a, 438a, 436a, 434a, 432a configured to extract light out of the waveguides by redirecting light propagating within each respective waveguide to output image information to the eye 410. The extracted light may also be referred to as outcoupled light and the light extraction optical element may also be referred to as an outcoupling optical element. The extracted light beam is output by the waveguide at a location where the light propagating in the waveguide strikes the light redirecting element. The light extraction optical elements (440a, 438a, 436a, 434a, 432a) may be, for example, reflective or diffractive optical features. Although shown disposed on the bottom major surface of the waveguides 440b, 438b, 436b, 434b, 432b for ease of description and clarity, in some embodiments the light extraction optical elements 440a, 438a, 436a, 434a, 432a may be disposed on the top major surface or the bottom major surface, or may be disposed directly in the volume of the waveguides 440b, 438b, 436b, 434b, 432 b. In some embodiments, the light extraction optical elements 440a, 438a, 436a, 434a, 432a may be formed as a layer of material attached to a transparent substrate to form the waveguides 440b, 438b, 436b, 434b, 432 b. In some other embodiments, the waveguides 440b, 438b, 436b, 434b, 432b may be a single piece of material, and the light extraction optical elements 440a, 438a, 436a, 434a, 432a may be formed on a surface or in the interior of the piece of material.

With continued reference to fig. 4, as discussed herein, each waveguide 440b, 438b, 436b, 434b, 432b is configured to output light to form an image corresponding to a particular depth plane. For example, the waveguide 432b closest to the eye may be configured to deliver collimated light injected into such waveguide 432b to the eye 410. The collimated light may represent an optically infinite focal plane. The next waveguide up 434b may be configured to emit collimated light that passes through the first lens 452 (e.g., a negative lens) before it reaches the eye 410. The first lens 452 may be configured to produce a slight convex wavefront curvature such that the eye/brain interprets light from the next waveguide up 434b as coming from a first focal plane that is closer inward toward the eye 410 from optical infinity. Similarly, the third waveguide up 436b passes its output light through both the first lens 452 and the second lens 454 before reaching the eye 410. The combined optical power of the first lens 452 and the second lens 454 can be configured to produce another incremental amount of wavefront curvature such that the eye/brain interprets light from the third waveguide 436b as coming from a second focal plane that is even closer inward toward the person from optical infinity than light from the next waveguide up 434b.

The other waveguide layers (e.g., waveguides 438b, 440b) and lenses (e.g., lenses 456, 458) are similarly configured, with the highest waveguide 440b in the stack sending its output through all of the lenses between it and the eye for an aggregate focal power representative of the focal plane closest to the person. To compensate for the stack of lenses 458, 456, 454, 452 when viewing/interpreting light from the world 470 on the other side of the stacked waveguide assembly 480, a compensating lens layer 430 may be provided at the top of the stack to compensate for the total power of the underlying lens stack 458, 456, 454, 452. Such a configuration provides as many perceived focal planes as there are waveguide/lens pairs available. Both the light extraction optics of the waveguides and the focusing aspects of the lenses may be static (e.g., not dynamic or electrically active). In some alternative embodiments, one or both may be dynamic using electrically active features.
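As an editorial gloss on the lens stack just described (an approximation assumed here, not a formula stated in this disclosure): treating the lenses between a given waveguide and the eye 410 as thin lenses in contact, their optical powers add, so the collimated output of the k-th waveguide appears to originate from a depth plane at a distance of roughly

$$ d_k \approx \frac{1}{\left|\sum_{i \le k} \phi_i\right|}, $$

where the \(\phi_i\) are the (negative, diverging) powers of the intervening lenses such as 452 and 454. The compensating lens layer 430 then contributes the opposite total power so that light from the world 470 passes through the stacked waveguide assembly 480 with no net optical power applied.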

With continued reference to fig. 4, the light extraction optical elements 440a, 438a, 436a, 434a, 432a may be configured to both redirect light out of their respective waveguides and output it with an appropriate amount of divergence or collimation for the particular depth plane associated with the waveguide. As a result, waveguides having different associated depth planes may have different configurations of light extraction optics that output light having different amounts of divergence depending on the associated depth plane. In some embodiments, as discussed herein, the light extraction optical elements 440a, 438a, 436a, 434a, 432a may be volumes or surface features that may be configured to output light at a particular angle. For example, the light extraction optical elements 440a, 438a, 436a, 434a, 432a may be volume holograms, surface holograms, and/or diffraction gratings. Light extraction optical elements such as diffraction gratings are described in U.S. patent publication No. 2015/0178939, published on 25/6/2015, which is incorporated herein by reference in its entirety.

In some embodiments, the light extraction optical elements 440a, 438a, 436a, 434a, 432a are diffractive features that form a diffraction pattern, or "diffractive optical element" (also referred to herein as a "DOE"). Preferably, the DOE has a relatively low diffraction efficiency such that only a portion of the light of the beam is deflected away towards the eye 410 at each intersection of the DOE, while the rest of the light continues to move through the waveguide via total internal reflection. The light carrying the image information can thus be split into a plurality of related exit beams that exit the waveguide at a plurality of locations, and the result is a fairly uniform pattern of exit emissions toward the eye 304 for that particular collimated beam bouncing around within the waveguide.

In some embodiments, one or more DOEs may be switched between an "on" state in which they actively diffract and an "off" state in which they do not significantly diffract. For example, a switchable DOE may comprise a polymer dispersed liquid crystal layer, wherein the droplets comprise a diffraction pattern in the host medium, and the refractive index of the droplets may be switched to substantially match the refractive index of the host material (in which case the pattern does not significantly diffract incident light), or the droplets may be switched to a refractive index that does not match the refractive index of the host medium (in which case the pattern actively diffracts incident light).

In some embodiments, the number and distribution of depth planes or depths of field may be dynamically changed based on the pupil size or orientation of the viewer's eyes. The depth of field may be inversely proportional to the size of the pupil of the viewer. As a result, as the size of the pupil of the viewer's eye decreases, the depth of field increases, such that a plane that cannot be discerned because it is positioned beyond the depth of focus of the eye may become discernable and appear more and more in focus as the pupil size decreases and the depth of field correspondingly increases. Likewise, the number of spaced apart depth planes used to present different images to a viewer may be reduced as the pupil size decreases. For example, a viewer may not be able to clearly perceive details of both a first depth plane and a second depth plane at one pupil size without adjusting the accommodation of the eye from one depth plane to the other. However, these two depth planes may be sufficiently in focus to the user at the same time at another pupil size without changing accommodation.

In some embodiments, the display system may change the number of waveguides that receive image information based on a determination of pupil size or orientation, or based on receiving an electrical signal indicative of a particular pupil size or orientation. For example, if the user's eye is unable to distinguish between two depth planes associated with two waveguides, the controller 460 (which may be an embodiment of the local processing and data module 260) may be configured or programmed to stop providing image information to one of the waveguides. Advantageously, this may reduce the processing burden on the system, thereby increasing the responsiveness of the system. In embodiments where the DOE of the waveguide is switchable between on and off states, the DOE may be switched to the off state when the waveguide does receive image information.

In some embodiments, it may be desirable to have the outgoing beam satisfy the condition that the diameter is smaller than the eye diameter of the viewer. However, meeting this condition can be a challenge given the variability of the viewer's pupil size. In some embodiments, this condition is satisfied over a wide range of pupil sizes by varying the size of the exit beam in response to a determination of the pupil size of the viewer. For example, as the pupil size decreases, the size of the exiting beam may also decrease. In some embodiments, a variable aperture may be used to vary the size of the outgoing beam.

Wearable system 400 may include an outward facing imaging system 464 (e.g., a digital camera) that images a portion of the world 470. This portion of the world 470 may be referred to as the field of view (FOV) of the world camera, and the imaging system 464 is sometimes referred to as a FOV camera. The FOV of the world camera may or may not be the same as the FOV of the viewer 210, which encompasses the portion of the world 470 perceived by the viewer 210 at a given time. For example, in some cases, the FOV of the world camera may be larger than the FOV of the viewer 210 of the wearable system 400. The entire region available for viewing or imaging by a viewer may be referred to as the field of regard (FOR). The FOR may include 4π steradians of solid angle surrounding the wearable system 400, as the wearer may move his body, head, or eyes to perceive substantially any direction in space. In other contexts, the wearer's motion may be more limited, and accordingly the wearer's FOR may subtend a smaller solid angle. The images obtained from the outward facing imaging system 464 may be used to track gestures made by the user (e.g., hand or finger gestures), detect objects in the world 470 in front of the user, and so forth.

Wearable system 400 may include an audio sensor 232, such as a microphone, to capture ambient sounds. As described above, in some embodiments, one or more other audio sensors may be placed to provide stereo reception for determining the location of a speech source. As another example, the audio sensor 232 may include a directional microphone that may also provide useful directional information as to where the audio source is located. The wearable system 400 may use information from both the outward facing imaging system 464 and the audio sensor 232 in locating the speech source, or to determine an active speaker at a particular time, or the like. For example, wearable system 400 may use voice recognition alone or in combination with a reflected image of the speaker (e.g., as seen in a mirror) to determine the identity of the speaker. As another example, wearable system 400 may determine the position of a speaker in the environment based on sounds acquired from directional microphones. Wearable system 400 may employ speech recognition algorithms to parse sounds coming from the speaker's position to determine the content of the speech, and use voice recognition techniques to determine the speaker's identity (e.g., name or other demographic information).

Wearable system 400 may also include an inward facing imaging system 466 (e.g., a digital camera) that observes user motion, such as eye motion and facial motion. The inward facing imaging system 466 may be used to capture images of the eye 410 to determine the size and/or orientation of the pupil of the eye 304. The inward facing imaging system 466 may be used to obtain images for determining a direction in which the user is looking (e.g., eye gestures) or biometric recognition of the user (e.g., via iris recognition). In some embodiments, at least one camera may be used for each eye to independently determine the individual pupil size or eye pose for each eye, allowing image information to be presented to each eye to be dynamically adjusted for that eye. In some other embodiments, the pupil diameter or orientation of only a single eye 410 is determined (e.g., using only a single camera per both eyes), and it is assumed to be similar for both eyes of the user. The images obtained by the inward facing imaging system 466 may be analyzed to determine the user's eye pose or mood, which the wearable system 400 may use to determine which audio or visual content should be presented to the user. Wearable system 400 may also use sensors such as IMUs, accelerometers, gyroscopes, etc. to determine head pose (e.g., head position or head orientation).

Wearable system 400 may include a user input device 466 by which a user may input commands to controller 460 to interact with wearable system 400. For example, the user input devices 466 may include a touch pad, touch screen, joystick, multiple degree of freedom (DOF) controller, capacitive sensing device, game controller, keyboard, mouse, directional pad (D-pad), wand, haptic device, totem (e.g., acting as a virtual user input device), and so forth. The multi-DOF controller may sense user input with some or all of the controller's possible translations (e.g., left/right, forward/backward, or up/down) or rotations (e.g., yaw, pitch, or roll). A multi-DOF controller that supports translational motion may be referred to as a 3DOF, while a multi-DOF controller that supports translation and rotation may be referred to as a 6 DOF. In some cases, the user may press or slide on the touch-sensitive input device using a finger (e.g., a thumb) to provide input to the wearable system 400 (e.g., provide user input to a user interface provided by the wearable system 400). User input device 466 may be held by a user's hand during use of wearable system 400. User input device 466 may be in wired or wireless communication with wearable system 400.

Other components of a wearable system

In many implementations, the wearable system may include other components in addition to or in lieu of the components of the wearable system described above. The wearable system may, for example, include one or more haptic devices or components. The haptic device or haptic assembly may be operable to provide a tactile sensation to a user. For example, a haptic device or component may provide a tactile sensation of pressure or texture when touching virtual content (e.g., virtual objects, virtual tools, other virtual constructs). The tactile sensation may replicate the sensation of a physical object represented by a virtual object, or may replicate the sensation of an imaginary object or character (e.g., a dragon) represented by virtual content. In some implementations, the haptic device or component can be worn by a user (e.g., a user-wearable glove). In some implementations, the haptic device or component may be held by a user.

The wearable system may, for example, include one or more physical objects that may be manipulated by a user to allow input to or interaction with the wearable system. These physical objects may be referred to herein as totems. Some totems may take the form of an inanimate object, such as, for example, a piece of metal or plastic, a wall, or a table surface. In some implementations, the totem may not actually have any physical input structures (e.g., keys, triggers, joysticks, trackballs, rocker switches). Instead, the totem may simply provide a physical surface, and the wearable system may render a user interface so that it appears to the user to be on one or more surfaces of the totem. For example, the wearable system may render images of a computer keyboard and a touch pad to appear to reside on one or more surfaces of the totem. For example, the wearable system may render a virtual computer keyboard and a virtual touchpad to appear on the surface of an aluminum rectangular plate used as a totem. The rectangular plate itself does not have any physical keys, touch pads, or sensors. However, the wearable system may detect user manipulation or interaction or touch with the rectangular plate as a selection or input via the virtual keyboard or virtual touchpad. User input device 466 (shown in fig. 4) may be an embodiment of a totem, which may include a trackpad, touchpad, trigger, joystick, trackball, rocker or virtual switch, mouse, keyboard, multiple degree of freedom controller, or another physical input device. The user may use totems to interact with the wearable system or other users, alone or in combination with gestures.

Examples of haptic devices and totems that may be used with the wearable devices, HMDs, and display systems of the present disclosure are described in U.S. patent publication No.2015/0016777, which is incorporated herein by reference in its entirety.

Example Process for user interaction with wearable System

FIG. 5 is a process flow diagram of an example of a method 500 for interacting with a virtual user interface. Method 500 may be performed by a wearable system described herein. Embodiments of method 500 may be used by a wearable system to detect a person or document in the FOV of the wearable system.

At block 510, the wearable system may identify a particular UI. The type of UI may be predetermined by the user. The wearable system may recognize that a particular UI needs to be populated based on user input (e.g., gestures, visual data, audio data, sensory data, direct commands, etc.). The UI may be specific to a security scenario in which the wearer of the system is observing a user presenting a document to the wearer (e.g., at a travel checkpoint). At block 520, the wearable system may generate data for the virtual UI. For example, data associated with the boundaries, overall structure, shape, etc. of the UI may be generated. Additionally, the wearable system may determine map coordinates of the user's physical location such that the wearable system may display a UI related to the user's physical location. For example, if the UI is body-centered, the wearable system may determine coordinates of the user's body, head, or eye gestures so that a ring-shaped UI may be displayed around the user, or a planar UI may be displayed on a wall or in front of the user. In the security context described herein, the UI may be displayed as if the UI were around a traveler who is presenting a document to the wearer of the system, so that the wearer can easily view the UI while viewing the traveler and the traveler's document. If the UI is hand-centered, the map coordinates of the user's hand may be determined. These map points may be derived from data received by the FOV camera, sensory input, or any other type of collected data.

At block 530, the wearable system may send data from the cloud to the display, or may send data from a local database to the display component. At block 540, a UI is displayed to the user based on the transmitted data. For example, the light field display may project a virtual UI into one or both eyes of the user. Once the virtual UI has been created, the wearable system may simply wait for a command from the user to generate more virtual content on the virtual UI at block 550. For example, the UI may be a body-centered ring around the user's body or the body of a person (e.g., traveler) in the user's environment. The wearable system may then wait for a command (gesture, head or eye movement, voice command, input from a user input device, etc.) and, if the command is recognized (block 560), virtual content that may be associated with the command is displayed to the user (block 570).
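Read as control flow, blocks 510-570 amount to a simple loop: identify the UI, generate and display its data, then wait for and act on recognized commands. The sketch below is a hypothetical Python rendering of that flow; the `wearable` object and its methods are stand-ins for illustration, not an API defined in this disclosure.

```python
def run_virtual_ui(wearable):
    """Hypothetical control-flow sketch of method 500 (blocks 510-570)."""
    ui_type = wearable.identify_ui()               # block 510: identify the particular UI
    ui_data = wearable.generate_ui_data(ui_type)   # block 520: bounds, shape, map coordinates
    wearable.send_to_display(ui_data)              # block 530: from the cloud or a local database
    wearable.display_ui(ui_data)                   # block 540: render the virtual UI
    while True:                                    # block 550: wait for a user command
        command = wearable.await_command()         # gesture, head/eye movement, voice, input device
        if wearable.recognize(command):            # block 560: command recognized?
            wearable.display_virtual_content(command)  # block 570: show the associated content
```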

Example of avatar rendering in mixed reality

The wearable system may employ various mapping-related techniques in order to achieve a high depth of field in the rendered light field. When mapping out the virtual world, it is advantageous to know all the features and points in the real world to accurately depict the virtual objects relative to the real world. To do so, FOV images captured from wearable system users may be added to the world model by including new pictures conveying information about various points and features of the real world. For example, the wearable system may collect a set of map points (e.g., 2D points or 3D points) and find new map points to render a more accurate version of the world model. The world model of the first user may be communicated to the second user (e.g., over a network such as a cloud network) so that the second user may experience the world surrounding the first user.

Fig. 6A is a block diagram of another example of a wearable system that may include an avatar processing and rendering system 690 in a mixed reality environment. Wearable system 600 may be part of wearable system 200 shown in fig. 2. In this example, wearable system 600 may include a map 620, which map 620 may include at least a portion of the data in map database 710 (shown in fig. 7). The map may reside in part locally on the wearable system, and may reside in part in a networked storage location accessible through a wired or wireless network (e.g., in a cloud system). The gesture process 610 may be executed on the wearable computing architecture (e.g., the processing module 260 or the controller 460) and utilize data from the map 620 to determine the position and orientation of the wearable computing hardware or the user. Gesture data may be calculated from data collected on the fly while the user is experiencing the system and operating in the world. The data may include images, data from sensors (such as inertial measurement units, which typically include accelerometer and gyroscope components), and surface information related to objects in a real or virtual environment.

The sparse point representation may be the output of a simultaneous localization and mapping (e.g., SLAM or vSLAM, referring to a configuration where the input is only image/visual) process. The system may be configured to find not only the locations of the various components in the world, but also the composition of the world. Gestures may be building blocks that achieve many goals, including populating maps and using data from maps.

In one embodiment, the sparse point locations themselves may not be entirely sufficient, and further information may be needed to produce a multi-focus AR, VR, or MR experience. A dense representation, which generally refers to depth map information, may be used to at least partially fill the gap. Such information may be computed from a process known as "stereo" 640, in which depth information is determined using techniques such as triangulation or time-of-flight sensing. Image information and active patterns (such as infrared patterns created using an active projector), images acquired from an image camera, or gestures/totems 650 may be used as input to the stereo process 640. A large amount of depth map information may be fused together, and some of this information may be summarized using a surface representation. For example, a mathematically definable surface may be an efficient (e.g., relative to a large point cloud) and digestible input to other processing devices, such as a game engine. Thus, the outputs of the stereo process (e.g., depth map) 640 may be combined in the fusion process 630. Gesture 610 may also be an input to the fusion process 630, and the output of fusion 630 becomes an input to the populate map process 620. As in topographical mapping, sub-surfaces may connect to each other to form larger surfaces, and the map becomes a large hybrid of points and surfaces.
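The map/gesture/stereo/fusion pipeline described above can be summarized as one update loop: estimate the pose against the map, compute a depth map, fuse it into surfaces, and fold the surfaces back into the map. The following Python sketch is only a structural outline under that reading; the stub functions stand in for real SLAM, stereo, and fusion implementations and are not part of this disclosure.

```python
from dataclasses import dataclass, field

@dataclass
class WorldMap:
    points: list = field(default_factory=list)    # sparse map points
    surfaces: list = field(default_factory=list)  # fused surface representations

def estimate_pose(frames, imu_samples, world_map):
    """Stand-in for the gesture process 610 (SLAM/vSLAM pose estimation)."""
    return {"position": (0.0, 0.0, 0.0), "orientation": (0.0, 0.0, 0.0, 1.0)}

def stereo_depth(frames, active_patterns=None):
    """Stand-in for the "stereo" process 640 (triangulation or time-of-flight)."""
    return []  # depth map placeholder

def fuse_depth(depth_map, pose):
    """Stand-in for the fusion process 630: summarize depth as surfaces."""
    return []  # surface placeholders

def update_world_model(world_map, frames, imu_samples, active_patterns=None):
    pose = estimate_pose(frames, imu_samples, world_map)   # process 610
    depth_map = stereo_depth(frames, active_patterns)      # process 640
    surfaces = fuse_depth(depth_map, pose)                 # process 630
    world_map.surfaces.extend(surfaces)                    # populate map process 620
    return pose, world_map
```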

To address various aspects of the mixed reality process 660, various inputs may be utilized. For example, in the embodiment shown in FIG. 6A, game parameters may be inputs used to determine that a user of the system is playing a monster battle game with one or more monsters at various locations, monsters that die or flee under various conditions (such as if the user shoots a monster), walls or other objects at various locations, and the like. A world map may include information about the location of objects or semantic information about objects (e.g., classifications such as whether an object is flat or round, horizontal or vertical, a table or a lamp, etc.), and the world map may be another valuable input for mixed reality. Gestures relative to the world also become an input and play a key role in almost all interactive systems.

Control or input from the user is another input to wearable system 600. As described herein, user inputs may include visual input, gestures, totems, audio input, sensory input, and the like. To move around or play a game, for example, the user may need to instruct wearable system 600 about what he or she wants to do. Beyond just moving oneself in space, there are various forms of user control that may be utilized. In one embodiment, a totem (e.g., a user input device) or an object such as a toy gun may be held by the user and tracked by the system. The system will preferably be configured to know that the user is holding the item and to know what kind of interaction the user is having with the item (e.g., if the totem or object is a gun, the system may be configured to know its location and orientation, and whether the user is clicking a trigger or other sensing button or element that may be equipped with a sensor, such as an IMU, which may assist in determining what is happening even when such activity is not within the field of view of any camera).

Gesture tracking or recognition may also provide input information. Wearable system 600 may be configured to track and interpret hand gestures for button presses, for gesturing left or right, stop, grab, hold, and the like. For example, in one configuration, the user may want to flip through emails or a calendar, or to "fist bump" another person or player, in a non-gaming environment. Wearable system 600 may be configured to leverage a minimal set of hand gestures, which may or may not be dynamic. For example, a gesture may be a simple static gesture, such as an open hand to indicate stop, a thumbs up to indicate ok, a thumbs down to indicate not ok, or a hand flipped right or left or up/down to indicate a directional command.

Eye tracking is another input (e.g., tracking where the user is looking in order to control the display technology to render at a specific depth or range). In one embodiment, vergence of the eyes may be determined using triangulation, and then accommodation may be determined using a vergence/accommodation model developed for that particular person. Eye tracking may be performed by an eye camera to determine eye gaze (e.g., the direction or orientation of one or both eyes). Other techniques may be used for eye tracking, such as, for example, measuring electrical potentials with electrodes placed near the eye (e.g., electrooculography).

Speech tracking may be another input and may be used alone or in combination with other inputs (e.g., totem tracking, eye tracking, gesture tracking, etc.). Speech tracking may include speech recognition or voice recognition, alone or in combination. The system 600 may include an audio sensor (e.g., a microphone) that receives an audio stream from the environment. The system 600 may incorporate voice recognition techniques to determine who is speaking (e.g., whether the speech is from the ARD wearer or from another person or voice (e.g., recorded speech transmitted by a speaker in the environment)), as well as speech recognition techniques to determine what is being said. The local data and processing module 260 or the remote processing module 270 may process audio data from the microphone (or audio data in another stream, such as, for example, a video stream that the user is watching) to identify the content of the speech by applying various speech recognition algorithms, such as, for example, hidden Markov models, Dynamic Time Warping (DTW)-based speech recognition, neural networks, deep learning algorithms such as deep feedforward and recurrent neural networks, end-to-end automatic speech recognition, machine learning algorithms (described with reference to fig. 7), or other algorithms that use acoustic or language modeling, etc.

The local data and processing module 260 or the remote processing module 270 may also apply voice recognition algorithms that can identify the identity of the speaker, such as whether the speaker is user 210 of the wearable system 600 or another person with whom the user is communicating. Some example voice recognition algorithms may include frequency estimation, hidden Markov models, Gaussian mixture models, pattern matching algorithms, neural networks, matrix representations, vector quantization, speaker diarization, decision trees, and Dynamic Time Warping (DTW) techniques. Voice recognition techniques may also include anti-speaker techniques, such as cohort models and world models. Spectral features may be used to represent speaker characteristics. The local data and processing module or the remote data processing module 270 may perform voice recognition using the various machine learning algorithms described with reference to fig. 7.

Implementations of the wearable system may use these user controls or inputs via the UI. UI elements (e.g., controls, pop-up windows, bubbles, data entry fields, etc.) may be used, for example, to close the display of information (e.g., graphical or semantic information for an object).

With respect to camera systems, the example wearable system 600 shown in fig. 6A may include three pairs of cameras: a relatively wide FOV or passive SLAM pair of cameras arranged on either side of the user's face; and a different pair of cameras oriented in front of the user to handle the stereo process 640 and also to capture hand gestures and totem/object tracking in front of the user's face. The FOV cameras and the camera pair for the stereo process 640 may be part of the outward-facing imaging system 464 (shown in fig. 4). The wearable system 600 may include eye-tracking cameras (which may be part of the inward-facing imaging system 462 shown in fig. 4) oriented toward the user's eyes in order to triangulate eye vectors and other information. Wearable system 600 may also include one or more textured light projectors, such as Infrared (IR) projectors, to inject texture into a scene.

The wearable system 600 may include an avatar processing and rendering system 690. The avatar processing and rendering system 690 may be configured to generate, update, animate and render avatars based on the context information. Some or all of the avatar processing and rendering system 690 may be implemented as part of the local processing and data module 260 or the remote processing modules 262, 264, alone or in combination. In various embodiments, multiple avatar processing and rendering systems 690 (e.g., as implemented on different wearable devices) may be used to render the virtual avatar 670. For example, the wearable device of the first user may be used to determine the intent of the first user, while the wearable device of the second user may determine features of the avatar and render the avatar of the first user based on the intent received from the wearable device of the first user. As will be described with reference to fig. 9A and 9B, the wearable device of the first user and the wearable device of the second user (or other such wearable devices) may communicate, for example, via a network.

Fig. 6B illustrates an example avatar processing and rendering system 690. The example avatar processing and rendering system 690 may include, alone or in combination, a 3D model processing system 680, a context information analysis system 688, an avatar auto-scaler 692, an intent mapping system 694, an anatomical structure adjustment system 698, a stimulus response system 696. The system 690 is intended to illustrate functionality for avatar processing and rendering and is not intended to be limiting. For example, in some implementations, one or more of these systems may be part of another system. For example, portions of the contextual information analysis system 688, alone or in combination, may be part of the avatar auto scaler 692, the intent mapping system 694, the stimulus response system 696, or the anatomical structure adjustment system 698.

The contextual information analysis system 688 may be configured to determine environmental and object information based on one or more of the device sensors described with reference to figs. 2 and 3. For example, the contextual information analysis system 688 may use images acquired by the outward-facing imaging system 464 of the user or of a viewer of the user's avatar to determine the environment and objects (including physical or virtual objects) of the user's environment or of the environment in which the user's avatar is rendered. The contextual information analysis system 688 may analyze such images, alone or in combination with data obtained from location data or world maps (e.g., maps 620, 710, 910), to determine the location and layout of objects in the environment. The contextual information analysis system 688 may also access biological features of the user or of humans in general in order to realistically animate the virtual avatar 670. For example, the contextual information analysis system 688 may generate a discomfort curve that may be applied to the avatar such that a portion of the avatar's body (e.g., the head) is not placed in an uncomfortable (or unrealistic) position with respect to other portions of its body (e.g., the avatar's head is not rotated 270 degrees). In some implementations, one or more object recognizers 708 (shown in fig. 7) may be implemented as part of the contextual information analysis system 688.

The avatar auto scaler 692, the intent mapping system 694, the stimulus response system 696, and the anatomy adjustment system 698 may be configured to determine features of the avatar based on contextual information. Some example features of the avatar may include size, appearance, position, orientation, motion, pose, expression, and so forth. The avatar auto scaler 692 may be configured to automatically scale the avatar so that the user does not have to look at the avatar in an uncomfortable pose. For example, the avatar auto scaler 692 may increase or decrease the size of the avatar to bring the avatar to the user's eye level so that the user does not need to look down at the avatar or up at the avatar, respectively. The intent mapping system 694 may determine the intent of a user interaction and map the intent to the avatar based on the environment in which the avatar is rendered (rather than the exact user interaction). For example, the first user's intent may be to communicate with the second user in a telepresence session (see, e.g., FIG. 9B). Typically, two people face each other when communicating. The intent mapping system 694 of the first user's wearable system may determine that such a face-to-face intent exists during the telepresence session, and may cause the first user's wearable system to render the second user's avatar facing the first user. If the second user physically turns around, rather than rendering the second user's avatar in the turned-around position (which would cause the back of the second user's avatar to be rendered to the first user), the first user's intent mapping system 694 may continue to render the face of the second avatar to the first user, consistent with the inferred intent of the telepresence session (the face-to-face intent in this example).
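As a rough illustration of the eye-level scaling described above, a minimal sketch follows; the function name, clamp range, and heights are hypothetical and are not drawn from the avatar auto scaler 692 itself.

def auto_scale_avatar(avatar_eye_height_m, viewer_eye_height_m, min_scale=0.5, max_scale=2.0):
    """Return a uniform scale factor that brings the avatar's eyes to the viewer's
    eye level, clamped so the avatar is never resized by an extreme amount."""
    scale = viewer_eye_height_m / avatar_eye_height_m
    return max(min_scale, min(max_scale, scale))

# An avatar whose eyes are authored at 1.9 m, viewed by a seated user whose eyes
# are at 1.2 m, is scaled to roughly 0.63 of its authored size.
print(auto_scale_avatar(1.9, 1.2))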

The stimulus response system 696 may identify an object of interest in the environment and determine the avatar's response to the object of interest. For example, the stimulus response system 696 may identify a sound source in the avatar's environment and automatically turn the avatar to look at the sound source. The stimulus response system 696 may also determine a threshold termination condition. For example, the stimulus response system 696 may return the avatar to its original pose after the sound source disappears or after a period of time has elapsed.

The anatomical structure adjustment system 698 may be configured to adjust the pose of the user's avatar based on biological features. For example, the anatomy adjustment system 698 may be configured to adjust the relative position between the avatar's head and torso, or between its upper body and lower body, based on the discomfort curve.
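One simple way such a discomfort curve might be enforced is sketched below, with a single yaw limit standing in for a full curve and with hypothetical names throughout:

import numpy as np

def clamp_head_yaw(head_yaw_deg, torso_yaw_deg, max_relative_yaw_deg=70.0):
    """Clamp the avatar's head yaw so it never exceeds a comfortable range
    relative to the torso (a single limit standing in for a discomfort curve)."""
    relative = np.clip(head_yaw_deg - torso_yaw_deg, -max_relative_yaw_deg, max_relative_yaw_deg)
    return torso_yaw_deg + relative

# A head rotated 120 degrees past the torso is pulled back to 70 degrees.
print(clamp_head_yaw(head_yaw_deg=120.0, torso_yaw_deg=0.0))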

The 3D model processing system 680 may be configured to animate the virtual avatar 670 and cause the display 220 to render it. The 3D model processing system 680 may include a virtual character processing system 682 and a motion processing system 684. The virtual character processing system 682 may be configured to generate and update a 3D model of the user (for creating and animating the virtual avatar). The motion processing system 684 may be configured to animate the avatar, such as, for example, by changing the avatar's pose, by moving the avatar around in the user's environment, or by animating the avatar's facial expressions, etc. As will be further described herein, the avatar may be animated using rigging techniques. In some embodiments, the avatar is represented in two parts: a surface representation (e.g., a deformable mesh) used to render the external appearance of the virtual avatar, and a hierarchical set of interconnected joints (e.g., a core skeleton) used to animate the mesh. In some implementations, the virtual character processing system 682 may be configured to edit or generate the surface representation, while the motion processing system 684 may be used to animate the avatar by moving the avatar, deforming the mesh, and so forth.

Example of mapping user environments

Fig. 7 is a block diagram of an example of an MR environment 700. The MR environment 700 may be configured to receive input (e.g., visual input 702 from the user's wearable system, stationary input 704 such as room cameras, sensory input 706 from various sensors, gestures, totems, eye tracking, user input from the user input device 466, etc.) from one or more user wearable systems (e.g., wearable system 200 or display system 220) or stationary room systems (e.g., room cameras, etc.). The wearable systems may use various sensors (e.g., accelerometers, gyroscopes, temperature sensors, motion sensors, depth sensors, GPS sensors, inward-facing imaging systems, outward-facing imaging systems, etc.) to determine the location and various other attributes of the user's environment. This information may be further supplemented with information from stationary cameras in the room, which may provide images or various cues from different viewpoints. Image data acquired by cameras (such as room cameras and/or cameras of the outward-facing imaging system) may be reduced to a set of mapped points.

One or more object identifiers 708 may crawl through the received data (e.g., a collection of points) and recognize or map points, tag images, and attach semantic information to objects with the help of a map database 710. The map database 710 may include various points and their corresponding objects collected over time. The various devices and the map database may be connected to each other through a network (e.g., LAN, WAN, etc.) to access the cloud.

Based on this information and the set of points in the map database, the object identifiers 708a-708n may identify objects in the environment. For example, the object identifier may identify a face, a person, a window, a wall, a user input device, a television, a document (e.g., a travel ticket, a driver's license, a passport as described herein in this security example), other objects in the user's environment, and so forth. One or more object identifiers may be specific to objects having certain characteristics. For example, object identifier 708a may be used to identify a face, while another object identifier may be used to identify a document.

Object recognition may be performed using a variety of computer vision techniques. For example, the wearable system may analyze images acquired by the outward-facing imaging system 464 (shown in fig. 4) to perform scene reconstruction, event detection, video tracking, object recognition (e.g., of a person or document), object pose estimation, facial recognition (e.g., from a person in the environment or an image on a document), learning, indexing, motion estimation, or image analysis (e.g., to identify markings in a document, such as photographs, signatures, identification information, travel information, etc.), and so forth. One or more computer vision algorithms may be used to perform these tasks. Non-limiting examples of computer vision algorithms include: Scale Invariant Feature Transform (SIFT), Speeded Up Robust Features (SURF), Oriented FAST and Rotated BRIEF (ORB), Binary Robust Invariant Scalable Keypoints (BRISK), Fast Retina Keypoint (FREAK), the Viola-Jones algorithm, the Eigenfaces approach, the Lucas-Kanade algorithm, the Horn-Schunck algorithm, the mean-shift algorithm, visual simultaneous localization and mapping (vSLAM) techniques, sequential Bayesian estimators (e.g., Kalman filters, extended Kalman filters, etc.), bundle adjustment, adaptive thresholding (and other thresholding techniques), Iterative Closest Point (ICP), Semi-Global Matching (SGM), Semi-Global Block Matching (SGBM), feature point histograms, various machine learning algorithms (such as, for example, support vector machines, k-nearest neighbors, naive Bayes, neural networks (including convolutional or deep neural networks), or other supervised/unsupervised models, etc.), and the like.

Object recognition may additionally or alternatively be performed by a variety of machine learning algorithms. Once trained, the machine learning algorithm may be stored by the HMD. Some examples of machine learning algorithms may include supervised or unsupervised machine learning algorithms, including regression algorithms (such as, for example, ordinary least squares regression), instance-based algorithms (such as, for example, learning vector quantization), decision tree algorithms (such as, for example, classification and regression trees), Bayesian algorithms (such as, for example, naive Bayes), clustering algorithms (such as, for example, k-means clustering), association rule learning algorithms (such as, for example, the Apriori algorithm), artificial neural network algorithms (such as, for example, Perceptron), deep learning algorithms (such as, for example, Deep Boltzmann Machines or deep neural networks), dimensionality reduction algorithms (such as, for example, principal component analysis), ensemble algorithms (such as, for example, Stacked Generalization), and/or other machine learning algorithms. In some embodiments, separate models may be customized for each data set. For example, the wearable device may generate or store a base model. The base model may be used as a starting point to generate additional models that are specific to a type of data (e.g., a particular user in a telepresence session), a set of data (e.g., a set of additional images obtained of the user in the telepresence session), conditional situations, or other variations. In some embodiments, the wearable HMD may be configured to utilize a variety of techniques to generate models for analyzing the aggregated data. Other techniques may include using predefined thresholds or data values.

Based on this information and the collection of points in the map database, the object identifiers 708a-708n may recognize objects and supplement the objects with semantic information to give life to the objects. For example, if the object identifier recognizes a set of points as a door, the system may attach some semantic information (e.g., the door has a hinge and has 90 degrees of movement about the hinge). If the object identifier recognizes a set of points as a mirror, the system may attach semantic information that the mirror has a reflective surface that can reflect images of objects in the room. The semantic information may include affordances of the objects, as described herein. For example, the semantic information may include a normal of the object. The system may assign a vector whose direction indicates the normal of the object. Over time, the map database grows as the system (which may reside locally or may be accessible through a wireless network) accumulates more data from the world. Once the objects are recognized, the information may be transmitted to one or more wearable systems. For example, the MR environment 700 may include information about a scene taking place in California. The environment 700 may be transmitted to one or more users in New York. Based on data received from the FOV cameras and other inputs, the object identifiers and other software components may map the points collected from the various images, recognize objects, etc., such that the scene may be accurately "passed on" to a second user, who may be in a different part of the world. The environment 700 may also use a topological map for localization purposes.

FIG. 8 is a process flow diagram of an example of a method 800 of rendering virtual content with respect to recognized objects. Method 800 describes how a virtual scene may be presented to a user of a wearable system. The user may be geographically distant from the scene. For example, the user may be in New York, but may want to view a scene that is currently taking place in California, or may want to take a walk with a friend who resides in California.

At block 810, the wearable system may receive input from the user and other users regarding the user's environment. This may be achieved by various input devices and knowledge already in possession of the map database. At block 810, the user's FOV camera, sensors, GPS, eye tracking, etc. communicate information to the system. At block 820, the system may determine sparse points based on the information. Sparse points may be used to determine pose data (e.g., head pose, eye pose, body pose, or hand gestures) that may be used to display and understand the orientation and position of various objects in the user's surroundings. At block 830, the object identifiers 708a-708n may crawl through these collected points and identify one or more objects using a map database. This information may then be communicated to the user's separate wearable system at block 840, and the desired virtual scene may thus be displayed to the user at block 850. For example, a desired virtual scene (e.g., a user in California) may be displayed in an appropriate orientation, position, etc. relative to various objects and other surrounding environments of the user in New York.

Example communication between multiple wearable systems

Fig. 9A schematically illustrates an overall system view depicting multiple user devices interacting with each other. The computing environment 900 includes user devices 930a, 930b, 930c. The user devices 930a, 930b, and 930c may communicate with each other over a network 990. The user devices 930a-930c may each include a network interface to communicate via the network 990 with a remote computing system 920 (which may also include a network interface 971). The network 990 may be a LAN, WAN, peer-to-peer network, radio, Bluetooth, or any other network. The computing environment 900 may also include one or more remote computing systems 920. The remote computing system 920 may include server computer systems that are clustered and located in different geographic locations. The user devices 930a, 930b, and 930c may communicate with the remote computing system 920 via the network 990.

The remote computing system 920 may include a remote data repository 980, which remote data repository 980 may maintain information about the physical and/or virtual worlds of particular users. The data store 980 may store information relating to the user, the user's environment (e.g., a world map of the user's environment), or the configuration of the user's avatar. The remote data store may be an embodiment of the remote data store 280 shown in fig. 2. The remote computing system 920 may also include a remote processing module 970. Remote processing module 970 may be an embodiment of remote processing module 270 shown in fig. 2. The remote processing module 970 may include one or more processors that may communicate with user devices (930a, 930b, 930c) and a remote data store 980. The processor may process information obtained from the user device and other sources. In some implementations, at least a portion of the processing or storage may be provided by a local processing and data module 260 (as shown in fig. 2). The remote computing system 920 may enable a given user to share information about the particular user's own physical and/or virtual world with another user.

The user device may be a wearable device (such as an HMD or ARD), a computer, a mobile device, or any other device, alone or in combination. For example, user devices 930b and 930c may be embodiments of wearable system 200 shown in fig. 2 (or wearable system 400 shown in fig. 4) that may be configured to present AR/VR/MR content.

One or more user devices may be used with the user input device 466 shown in fig. 4. A user device may obtain information about the user and the user's environment (e.g., using the outward-facing imaging system 464 shown in fig. 4). The user devices and/or the remote computing system 1220 may use the information obtained from the user devices to construct, update, and build a collection of images, points, and other information. For example, a user device may process the acquired raw information and send the processed information to the remote computing system 1220 for further processing. A user device may also send the raw information to the remote computing system 1220 for processing. A user device may receive the processed information from the remote computing system 1220 and provide final processing before projecting it to the user. A user device may also process the obtained information and pass the processed information to other user devices. A user device may communicate with the remote data repository 1280 while processing the acquired information. Multiple user devices and/or multiple server computer systems may participate in the construction and/or processing of the acquired images.

Information about the physical world may evolve over time and may be based on information collected by different user devices. The model of the virtual world may also evolve over time and be based on input from different users. Such information and models may sometimes be referred to herein as world maps or world models. As described with reference to fig. 6 and 7, the information obtained by the user device may be used to construct a world map 910. World map 910 may include at least a portion of map 620 depicted in FIG. 6A. Various object identifiers (e.g., 708a, 708b, 708c … 708n) may be used to identify objects and tag images, as well as to attach semantic information to objects. These object identifiers are also depicted in fig. 7.

A remote data store 980 may be used to store data and facilitate construction of the world map 910. The user device may continuously update information about the user's environment and receive information about the world map 910. The world map 910 may be created by a user or others. As discussed herein, the user devices (e.g., 930a, 930b, 930c) and the remote computing system 920 may construct and/or update the world map 910, alone or in combination. For example, the user device may communicate with the remote processing module 970 and the remote data repository 980. The user device may obtain and/or process information about the user and the user's environment. The remote processing module 970 may communicate with the remote data store 980 and user devices (e.g., 930a, 930b, 930c) to process information about the user and the user's environment. The remote computing system 920 may modify information obtained by the user device (e.g., 930a, 930b, 930c), such as, for example, selectively cropping the user's image, modifying the user's background, adding virtual objects to the user's environment, annotating the user's voice with ancillary information, and so forth. The remote computing system 920 may send the processed information to the same and/or different user devices.

Examples of remote presentation sessions

Fig. 9B depicts an example in which two users of respective wearable systems are conducting a telepresence session. Two users (named Alice 912 and Bob 914 in this example) are shown in the figure. The two users are wearing their respective wearable devices 902 and 904, which wearable devices 902 and 904 may include the HMD described with reference to fig. 2 (e.g., the display device 220 of the system 200) for representing a virtual avatar of the other user in the telepresence session. The two users may use the wearable devices to conduct the telepresence session. Note that the vertical line separating the two users in fig. 9B is intended to illustrate that Alice 912 and Bob 914 may (but need not) be located in two different locations while communicating via telepresence (e.g., Alice may be in her office in Atlanta while Bob is outside of Boston).

As described with reference to fig. 9A, wearable devices 902 and 904 may communicate with each other or with other user devices and computer systems. For example, Alice's wearable device 902 may communicate with Bob's wearable device 904, e.g., via network 990 (shown in fig. 9A). Wearable devices 902 and 904 may track the user's environment and motion (e.g., via respective outward-facing imaging systems 464, or one or more location sensors) and speech (e.g., via respective audio sensors 232) in the environment. Wearable devices 902 and 904 may also track the user's eye movements or gaze based on data acquired by inward facing imaging system 462. In some cases, the wearable device may also capture or track a user's facial expression or other body movement (e.g., arm or leg movement), where the user is near a reflective surface, and the outward facing imaging system 464 may obtain a reflected image of the user to observe the user's facial expression or other body movement.

A wearable device may use information obtained from the first user and the environment to animate the avatar to be rendered by the second user's wearable device, to create a tangible sense of the first user's presence in the second user's environment. For example, the wearable devices 902 and 904 and the remote computing system 920 may, alone or in combination, process images or motion of Alice for presentation by Bob's wearable device 904, or may process images or motion of Bob for presentation by Alice's wearable device 902. As described further herein, the avatar may be rendered based on contextual information such as, for example, the user's intent, the user's environment or the environment in which the avatar is rendered, or other biological features of a person.

Although this example involves only two users, the techniques described herein should not be limited to two users. Multiple users (e.g., two, three, four, five, six, or more) using wearable devices (or other telepresence devices) may participate in a telepresence session. During the telepresence session, a particular user's wearable device may present the avatars of the other users to that particular user. Further, although the example in this figure shows the users as standing in an environment, the users are not required to stand. During the telepresence session, any of the users may stand, sit, kneel, lie down, walk, or run, or be in any position or motion. The users may also be in physical environments other than the one described in this example. The users may be in separate environments or in the same environment while the telepresence session is in progress. Not all users need to wear their respective HMDs in the telepresence session. For example, Alice 912 may use other image acquisition and display devices, such as a webcam and a computer screen, while Bob 914 wears the wearable device 904.

Examples of avatars

Fig. 10 shows an example of an avatar perceived by a user of a wearable system. The example avatar 1000 shown in FIG. 10 may be an avatar of Alice 912 (shown in FIG. 9B) standing behind a physical plant in a room. The avatar may include various features such as, for example, size, appearance (e.g., skin tone, complexion, hairstyle, clothing, facial features such as wrinkles, moles, spots, acne, dimples, etc.), location, orientation, motion, pose, expression, and the like. These features may be based on the user associated with the avatar (e.g., Alice's avatar 1000 may have some or all of the features of the actual person Alice 912). As further described herein, the avatar 1000 may be animated based on contextual information, which may include adjustments to one or more features of the avatar 1000. Although generally described herein as representing the physical appearance of a person (e.g., Alice), this is for purposes of illustration and not limitation. The avatar of Alice may represent the appearance of another real or fictitious person in addition to Alice, an anthropomorphic object, a living being, or any other real or virtual representation. Furthermore, the vegetation in fig. 10 need not be physical, but may be a virtual representation of the vegetation presented to the user by the wearable system. In addition, additional or different virtual content may also be presented to the user than that shown in FIG. 10.

Example of a fitting (rigging) System for a virtual character

An animated virtual character, such as a human avatar, may be represented in computer graphics, in whole or in part, as a polygonal mesh. A polygonal mesh, or simply "mesh," is a collection of points in a modeled three-dimensional space. The mesh may form a polyhedral object whose surfaces define the body or shape of the virtual character (or a portion thereof). Although a mesh may include any number of points (within practical limits that may be imposed by available computing power), a finer mesh with more points is generally able to depict a more realistic virtual character with finer details that may more closely approximate real-life people, animals, objects, and so forth. FIG. 10 shows an example of a mesh 1010 around the eyes of the avatar 1000.

Each point in the mesh may be defined by coordinates in the modeled three-dimensional space. The modeled three-dimensional space may be, for example, a Cartesian space addressed by (x, y, z) coordinates. The points in the mesh are the vertices of the polygons that make up the polyhedral object. Each polygon represents a surface, or face, of the polyhedral object and is defined by an ordered set of vertices, with the edges of each polygon being straight line edges connecting the ordered set of vertices. In some cases, the polygon vertices in a mesh may differ from geometric polygons in that they are not necessarily coplanar in 3D graphics. In addition, the vertices of a polygon in a mesh may be collinear, in which case the polygon has zero area (referred to as a degenerate polygon).

In some embodiments, a mesh is made up of three-vertex polygons (i.e., triangles or "tris" for short) or four-vertex polygons (i.e., quadrilaterals or "quads" for short). However, higher-order polygons may also be used in some meshes. Meshes are typically quad-based in direct content creation (DCC) applications (e.g., applications such as Maya (available from Autodesk, Inc.) or Houdini (available from Side Effects Software Inc.), which are primarily designed for creating and manipulating 3D computer graphics), whereas meshes are typically triangle-based in real-time applications.
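For readers who prefer code, a triangle mesh of this kind can be represented with two arrays, as in the minimal sketch below; the tiny tetrahedron and the variable names are illustrative only and are not part of the described system:

import numpy as np

# A minimal triangle-mesh representation: an array of vertex positions in the
# modeled three-dimensional space, plus an array of triangles given as ordered
# triples of vertex indices.
vertices = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])                                   # shape (num_vertices, 3)

triangles = np.array([
    [0, 1, 2],
    [0, 1, 3],
    [0, 2, 3],
    [1, 2, 3],
])                                   # shape (num_triangles, 3)

# Deforming the mesh means moving some or all of the vertices, e.g. translating
# every vertex by one unit along x:
deformed = vertices + np.array([1.0, 0.0, 0.0])
print(deformed[triangles[0]])        # corner positions of the first triangle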

To animate a virtual character, the mesh of the character may be deformed by moving some or all of its vertices to new positions in space at various points in time. The deformations may represent both large-scale movements (e.g., movement of the limbs) and fine movements (e.g., facial movements). These and other deformations may be based on real-world models (e.g., photogrammetric scans of real humans performing body movements, articulations, facial contortions, expressions, etc.), art-directed development (which may be based on real-world sampling), combinations of the same, or other techniques. In the early days of computer graphics, mesh deformations could be done by hand by independently setting new positions for the vertices, but given the size and complexity of modern meshes, it is usually desirable to produce deformations using automated systems and processes. The control systems, processes, and techniques for producing these deformations are referred to as rigging, or simply "the rig." The example avatar processing and rendering system 690 of FIG. 6B includes a 3D model processing system 680 that may implement rigging.

The rig for a virtual character may use skeletal systems to assist with mesh deformations. A skeletal system includes a set of joints that correspond to points of articulation of the mesh. In the context of rigging, joints are sometimes referred to as "bones," despite the difference between these terms when used in the anatomical sense. Joints in a skeletal system may move or otherwise change relative to one another according to transforms that may be applied to the joints. The transforms may include translations or rotations in space, as well as other operations. The joints may be assigned hierarchical relationships (e.g., parent-child relationships) with respect to one another. These hierarchical relationships may allow one joint to inherit transforms or other characteristics from another joint. For example, a child joint in a skeletal system may inherit a transform assigned to its parent joint so that the child joint moves together with the parent joint.

The skeletal system for a virtual character may be defined with joints at appropriate positions and with appropriate local axes of rotation, degrees of freedom, etc., to allow a desired set of mesh deformations to be carried out. Once a skeletal system has been defined for the virtual character, each joint can be assigned an amount of influence over the various vertices in the mesh in a process called "skinning." This may be done by assigning a weight value for each joint in the skeletal system to each vertex. When a transform is applied to any given joint, the vertices under its influence may be moved, or otherwise altered, automatically in accordance with that joint transform, by an amount that may depend on their respective weight values.

The assembly may include a plurality of skeletal systems. One type of skeletal system is a core skeleton (also referred to as a low-level skeleton), which may be used to control large-scale movements of a virtual character. For example, in the case of a human avatar, the core skeleton may resemble the anatomical skeleton of a human. Although the core skeleton for assembly purposes may not map precisely to an anatomically correct skeleton, it may have a subset of joints with similar orientation and motion characteristics in similar locations.

As described above, a skeletal system of joints may be hierarchical, e.g., with parent-child relationships between joints. When a transform (e.g., a change in position and/or orientation) is applied to a particular joint in the skeletal system, the same transform may be applied to all of the other, lower-level joints within the same hierarchy. For example, in the case of a rig for a human avatar, the core skeleton may include separate joints for the avatar's shoulder, elbow, and wrist. The shoulder joint may be assigned to the highest level of the hierarchy, while the elbow joint may be assigned as a child of the shoulder joint, and the wrist joint may be assigned as a child of the elbow joint. Thus, when a particular translation and/or rotation transform is applied to the shoulder joint, the same transform may also be applied to the elbow joint and the wrist joint, so that they are translated and/or rotated in the same way as the shoulder.
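A minimal sketch of how this parent-child transform inheritance can be computed follows; the joint names, the 4 × 4 matrix composition, and the recursion are illustrative assumptions rather than the exact scheme used by any particular rig:

import numpy as np

def translation(x, y, z):
    m = np.eye(4)
    m[:3, 3] = [x, y, z]
    return m

def rotation_z(angle_rad):
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    m = np.eye(4)
    m[:2, :2] = [[c, -s], [s, c]]
    return m

# Each joint stores its parent and a local 4x4 transform; the world transform of
# a joint is the composition of its ancestors' transforms, so a rotation applied
# to the shoulder automatically carries the elbow and wrist along with it.
joints = {
    "shoulder": {"parent": None,       "local": rotation_z(np.pi / 4)},
    "elbow":    {"parent": "shoulder", "local": translation(0.3, 0.0, 0.0)},
    "wrist":    {"parent": "elbow",    "local": translation(0.25, 0.0, 0.0)},
}

def world_transform(name):
    joint = joints[name]
    if joint["parent"] is None:
        return joint["local"]
    return world_transform(joint["parent"]) @ joint["local"]

print(world_transform("wrist")[:3, 3])  # the wrist position inherits the shoulder rotation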

Despite the name, a skeletal system in a rig need not represent an anatomical skeleton. In a rig, skeletal systems can represent various hierarchies used to control mesh deformations. For example, hair may be represented as a series of joints in a hierarchical chain; skin movements caused by an avatar's facial contortions (which may represent expressions such as smiling, frowning, laughing, speaking, blinking, etc.) may be represented by a series of facial joints controlled by a facial rig; muscle deformations may be modeled by joints; and the motion of clothing may be represented by a grid of joints.

The rigging of a virtual character may include multiple skeletal systems, some of which may drive the motion of other skeletons. A low-order skeleton system is a system that drives one or more high-order skeleton systems. In contrast, a high-order skeleton system is a system driven or controlled by a low-order skeleton system. For example, while the motion of the core skeleton of a character may be manually controlled by an animator, the core skeleton may in turn drive or control the motion of a high-level skeleton system. For example, higher order auxiliary joints (possibly without anatomical analogs in the physical skeleton) may be provided to improve mesh deformation caused by the motion of the core skeleton. The transforms applied to these and other joints in the higher-order skeleton system can be algorithmically derived from the transforms applied to the lower-order skeleton. The high-order skeleton may represent, for example, muscle, skin, fat, clothing, hair, or any other skeletal system that does not require direct animation control.

As already discussed, the transformations may be applied to joints in a skeletal system in order to perform mesh morphing. In the context of assembly, the transformation includes a function that accepts one or more given points in 3D space and produces an output of one or more new 3D points. For example, the transformation may accept one or more 3D points defining a joint and may output one or more new 3D points specifying the transformed joint. The joint transformation may include, for example, a translation component, a rotation component, and a scaling component.

A translation is a transform that moves a set of one or more specified points in the modeled 3D space by a specified amount, without changing the orientation or size of the set of points. A rotation is a transform that rotates a set of one or more specified points in the modeled 3D space about a specified axis by a specified amount (e.g., rotate every point in the mesh 45 degrees about the z-axis). A rigid transform (or 6 degree-of-freedom (DOF) transform) is one that includes only translation and rotation. Application of a rigid transform can be thought of as moving a set of one or more points in space without changing their size, although the orientation can change.

Meanwhile, a scaling transform is one that modifies one or more specified points in the modeled 3D space by scaling their respective coordinates by specified values. This changes the size and/or shape of the transformed set of points. A uniform scaling transform scales every coordinate by the same amount, while a non-uniform scaling transform can scale the (x, y, z) coordinates of the specified points independently. For example, a non-uniform scaling transform may be used to provide squash-and-stretch effects, such as those that may result from muscle action. Another type of transform is a shear transform. A shear transform is one that modifies a set of one or more specified points in the modeled 3D space by translating the coordinates of the points by different amounts based on how far those coordinates are from an axis.
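The sketch below builds 4 × 4 homogeneous matrices for each of these transform types and applies them to a point; the helper names and the specific angles and factors are illustrative only:

import numpy as np

def translate(tx, ty, tz):
    m = np.eye(4)
    m[:3, 3] = [tx, ty, tz]
    return m

def rotate_z(deg):
    r = np.radians(deg)
    c, s = np.cos(r), np.sin(r)
    m = np.eye(4)
    m[:2, :2] = [[c, -s], [s, c]]
    return m

def scale(sx, sy, sz):
    return np.diag([sx, sy, sz, 1.0])

def shear_xy(k):
    # Shifts points along x in proportion to their y coordinate.
    m = np.eye(4)
    m[0, 1] = k
    return m

point = np.array([1.0, 2.0, 0.0, 1.0])           # a point in homogeneous coordinates
rigid = translate(0.5, 0.0, 0.0) @ rotate_z(45)  # rotation followed by translation
print((rigid @ point)[:3])                       # moved, same size and shape
print((scale(1.0, 2.0, 1.0) @ point)[:3])        # non-uniform "stretch" along y
print((shear_xy(0.5) @ point)[:3])               # x shifted in proportion to y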

When a transform is applied to a joint to move it, the vertices under the influence of that joint are also moved. This produces the mesh deformation. As described above, the process of assigning weights to quantify the influence of each joint on each vertex is called skinning (or sometimes "weight painting" or "skin weighting"). The weights are typically values between 0 (meaning no influence) and 1 (meaning complete influence). Some vertices in the mesh may be influenced by only a single joint. In that case, those vertices are assigned a weight value of 1 for that joint, and their positions change based on the transforms assigned to that particular joint but to no other joints. Other vertices in the mesh may be influenced by multiple joints. In that case, separate weights are assigned to those vertices for all of the influencing joints, with the sum of the weights for each vertex equaling 1. The positions of these vertices change based on the transforms assigned to all of their influencing joints.

Assigning weights to all vertices in the mesh can be very laborious, especially as the number of joints increases. Balancing weights in response to transforms applied to joints to achieve a desired mesh deformation can be very difficult for even a trained artist. In the case of real-time applications, the task can be further complicated by the fact that many real-time systems also impose limits on the number of joints (typically 8 or less) that can be weighted to a particular vertex. This limitation is typically imposed by the efficiency of the Graphics Processing Unit (GPU).

The term skinning also refers to the process of actually deforming the mesh, using the assigned weights, based on the transforms applied to the joints in the skeletal system. For example, an animator may specify a series of core skeleton joint transforms to produce a desired character motion (e.g., a running motion or a dance move). When transforms are applied to one or more of the joints, new positions are calculated for the vertices under the influence of the transformed joints. The new position for any given vertex is typically calculated as a weighted average of all of the joint transforms influencing that particular vertex. Many algorithms exist for computing this weighted average, but the most common, and the one used in most real-time applications because of its simplicity, ease of use, and control, is linear blend skinning (LBS). In linear blend skinning, each joint transform for which a vertex has a non-zero weight is used to calculate a new position for that vertex. The new vertex coordinates resulting from each of these joint transforms are then averaged in proportion to the respective weights assigned to that vertex for each of the joints. In practice there are well-known limitations of LBS, and much of the work in making high-quality rigs is devoted to discovering and overcoming these limitations. Many auxiliary joint systems are designed specifically for this purpose.
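As a concrete illustration of this weighted averaging, here is a minimal linear blend skinning sketch; the array shapes and names are illustrative assumptions, and production implementations typically perform the equivalent blend on the GPU:

import numpy as np

def linear_blend_skinning(rest_vertices, joint_transforms, weights):
    """Compute skinned vertex positions with linear blend skinning.

    rest_vertices:    (V, 3) rest-pose vertex positions
    joint_transforms: (J, 4, 4) transforms carrying rest space to posed space
    weights:          (V, J) skinning weights, each row summing to 1
    For each vertex, the result is the weighted average of that vertex as
    transformed by every joint that influences it.
    """
    V = rest_vertices.shape[0]
    homogeneous = np.hstack([rest_vertices, np.ones((V, 1))])            # (V, 4)
    per_joint = np.einsum('jab,vb->jva', joint_transforms, homogeneous)  # (J, V, 4)
    skinned = np.einsum('vj,jva->va', weights, per_joint)                # blend per vertex
    return skinned[:, :3]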

In addition to skeletal systems, "blend shapes" may also be used in a rig to produce mesh deformations. A blend shape (sometimes also referred to as a "morph target" or simply a "shape") is a deformation applied to a set of vertices in the mesh, where each vertex in the set is moved by a specified amount in a specified direction based on a weight. For a particular blend shape, each vertex in the set may have its own custom motion, and moving the vertices of the set simultaneously generates the desired shape. The custom motion for each vertex in the blend shape may be specified by a "delta," which is a vector representing the amount and direction of XYZ motion applied to that vertex. To name just a few possibilities, blend shapes may be used to produce facial deformations that move, for example, the eyes, lips, eyebrows, nose, dimples, and so on.

Blend shapes can be used to deform the mesh in an art-directed way. They can offer a great deal of control, because the exact shape can be sculpted or captured from a scan of a model. However, the benefits of blend shapes come at the cost of having to store the deltas for all of the vertices in the blend shape. For animated characters with fine meshes and many blend shapes, the amount of delta data can be significant.

Each blend shape may be applied to a specified degree by using blend shape weights. These weights typically range from 0 (where the blend shape is not applied at all) to 1 (where the blend shape is fully active). For example, a blend shape that moves a character's eyes may be applied with a small weight to move the eyes by a small amount, or it may be applied with a large weight to produce a larger eye movement.

The rig may apply multiple blend shapes in combination with one another to achieve a desired, complex deformation. For example, to produce a smile, the rig may apply blend shapes to pull the corners of the lips, raise the upper lip, lower the lower lip, and move the eyes, eyebrows, nose, and dimples. A desired shape formed by combining two or more blend shapes is referred to as a combination shape (or simply a "combination").
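A minimal sketch of applying several weighted blend shapes to a neutral mesh follows; the dictionary-based layout and the names are illustrative assumptions rather than the format of any particular tool:

import numpy as np

def apply_blend_shapes(base_vertices, deltas, weights):
    """Apply weighted blend shapes to a neutral mesh.

    base_vertices: (V, 3) neutral mesh positions
    deltas:        dict mapping shape name -> (V, 3) per-vertex delta vectors
    weights:       dict mapping shape name -> weight, typically in [0, 1]
    Each active shape adds weight * delta to the neutral mesh, so several shapes
    (lip corners, upper lip, eyes, ...) combine additively into one deformation.
    """
    result = np.asarray(base_vertices, dtype=np.float64).copy()
    for name, delta in deltas.items():
        w = weights.get(name, 0.0)
        if w != 0.0:
            result += w * np.asarray(delta, dtype=np.float64)
    return result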

One problem that can result from applying two blend shapes in combination is that the blend shapes may operate on some of the same vertices. When both blend shapes are activated, the result is a doubled deformation that can pull the mesh off model. The solution to this problem is typically the corrective blend shape. A corrective blend shape is a special blend shape that represents a desired deformation relative to the currently applied deformation, rather than a desired deformation relative to the neutral shape. Corrective blend shapes (or just "correctives") may be applied based on the weights of the blend shapes they are correcting. For example, the weight of the corrective blend shape may be made proportional to the weights of the base blend shapes that trigger application of the corrective blend shape.
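One common way to drive such a corrective, which also matches the combination-shape logic discussed below, is to make its weight the product of the base weights; a minimal sketch with hypothetical names:

def corrective_weight(base_weights, scale=1.0):
    """Drive a corrective (or combination) blend shape from the product of the
    base blend shape weights it depends on, so it only engages strongly when
    all of the base shapes are active together."""
    weight = scale
    for w in base_weights:
        weight *= w
    return weight

print(corrective_weight([0.9, 0.8]))   # 0.72: both base shapes nearly fully on
print(corrective_weight([0.9, 0.1]))   # 0.09: the corrective barely engages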

Corrective blend shapes may also be used to correct skinning anomalies or to improve the quality of a deformation. For example, a joint may represent the motion of a particular muscle, but as a single transform it cannot represent all of the non-linear behaviors of the skin, fat, and muscle. Applying a corrective, or a series of correctives, as the muscle activates can produce more pleasing and convincing deformations.

Rigs are built in layers, with lower, simpler layers typically driving higher-order layers. This applies to both skeletal systems and blend shape deformations. For example, as already mentioned, the rig for an animated virtual character may include a higher-order skeletal system controlled by a lower-order skeletal system. There are many ways to control a higher-order skeleton or a blend shape based on a lower-order skeleton, including constraints, logic systems, and gesture-based deformation.

A constraint is typically a system in which a particular object or joint transform controls one or more components of a transform applied to another joint or object. There are many different types of constraints. For example, an aim constraint changes the rotation of the target transform so that it points in a particular direction or at a particular object. A parent constraint acts as a virtual parent-child relationship between a pair of transforms. A position constraint constrains a transform to a particular point or a particular object. An orientation constraint constrains a transform to a particular rotation of an object.

A logic system is a system of mathematical operations that produces certain outputs given a set of inputs. These are specified, rather than learned. For example, a blend shape value may be defined as the product of two other blend shape values (this is an example of a corrective shape known as a combination or combo shape).

Gesture-based deformation may also be used to control higher-order skeletal systems or blend shapes. The pose of a skeletal system is defined by the collection of transforms (e.g., rotations and translations) for all of the joints in that skeletal system. Poses may also be defined for a subset of the joints in a skeletal system. For example, an arm pose may be defined by the transforms applied to the shoulder, elbow, and wrist joints. A pose space deformer (PSD) is a system used to determine a deformation output for a particular pose based on one or more "distances" between that pose and defined poses. These distances may be metrics that characterize how different one pose is from another. A PSD may include a pose interpolation node, for example, which accepts a set of joint rotations (defining a pose) as input parameters and, in turn, outputs normalized per-pose weights to drive deformers, such as blend shapes. The pose interpolation node may be implemented in a variety of ways, including with radial basis functions (RBFs). An RBF can perform a machine-learned mathematical approximation of a function. RBFs may be trained using a set of inputs and their associated expected outputs. The training data may be, for example, sets of joint transforms (which define particular poses) and the corresponding blend shapes to be applied in response to those poses. Once the function is learned, new inputs (e.g., poses) can be given, and their expected outputs can be computed efficiently. RBFs are a subtype of artificial neural networks. RBFs can be used to drive higher-level components of the rig based on the state of lower-level components. For example, the pose of the core skeleton can drive auxiliary joints and correctives at higher levels.
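The following is a minimal Gaussian-RBF interpolator sketch of the kind of pose interpolation described above; the class name, kernel width, and direct linear solve are illustrative assumptions rather than the PSD implementation referenced in the text:

import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Pairwise Gaussian radial basis function values between two sets of poses."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return np.exp(-(d ** 2) / (2.0 * sigma ** 2))

class RBFPoseInterpolator:
    """Learns a mapping from example pose vectors (concatenated joint rotations)
    to output weights (e.g., blend shape weights) and evaluates new poses."""

    def __init__(self, example_poses, example_outputs, sigma=1.0):
        self.examples = np.asarray(example_poses, dtype=np.float64)       # (N, D)
        self.sigma = sigma
        K = gaussian_kernel(self.examples, self.examples, sigma)          # (N, N)
        # Solve for coefficients so each training pose reproduces its output.
        self.coeffs = np.linalg.solve(K, np.asarray(example_outputs, dtype=np.float64))

    def __call__(self, pose):
        k = gaussian_kernel(np.atleast_2d(pose), self.examples, self.sigma)  # (1, N)
        return (k @ self.coeffs)[0]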

These control systems may be chained together to carry out complex behaviors. For example, an eye rig may contain two "look around" values for horizontal and vertical rotation. These values can be passed through some logic to determine the exact rotation of the eye joint transforms, which may in turn be used as inputs to an RBF that controls blend shapes that change the shape of the eyelids to match the position of the eyes. The activation values of these shapes may be used to drive other components of a facial expression using additional logic, and so on.

The goal of a rigging system is generally to provide a mechanism that produces pleasing, high-fidelity deformations based on simple, human-understandable control systems. In the case of real-time applications, the goal is generally to provide a rigging system that is simple enough to run in real time on, for example, the VR/AR/MR system 200, while making as few compromises to the final quality as possible. In some embodiments, the 3D model processing system 680 executes a rigging system to animate the avatar in real time to be interactive (with the user of the VR/AR/MR system) in the mixed reality environment 100 and to provide appropriate, contextual avatar behavior (e.g., intent-based behavior) in the user's environment.

Gesture space dimension reduction for gesture space deformation of virtual characters

As discussed herein, rigging elements such as a deformable mesh and one or more underlying skeletal systems can be used to animate a virtual character. For example, joints of a core skeletal system may be transformed into various positions and/or orientations (e.g., using translation and/or rotation transforms) to place the virtual character in various different poses. FIG. 11 illustrates several arm poses of a virtual character; in this case, the virtual character is an avatar of a human. Each pose in FIG. 11 is defined by a set of rotation transforms applied to the respective arm joints. Each arm joint in the avatar's core skeleton may be rotated in three-dimensional space to a selected angle to transition the arm from a first pose (e.g., a base pose) to a second pose. In a similar way, a set of rotation transforms for all of the respective joints in the avatar's core skeleton may define other poses, with each unique set of rotation transforms defining a unique pose.

Euler angles can be used to specify the rotations of the respective joints for different poses of the virtual character's skeletal system. Each joint may be associated with a local coordinate system XYZ. A set of Euler angles may be used to specify the rotation of a given joint's local coordinate system XYZ relative to a reference coordinate system xyz (e.g., a world coordinate system, or the coordinate system associated with the joint when the virtual character is in a base pose). The angular orientation of the joint in three-dimensional space may be specified by a set of three Euler angles. In some embodiments, the first Euler angle α represents the angle between the x-axis and the N-axis when the so-called x convention is used (or the angle between the y-axis and the N-axis when the y convention is used), where the N-axis is the line of nodes defined by the intersection of the xy and XY planes; the second Euler angle β may represent the angle between the z-axis and the Z-axis; and the third Euler angle γ may represent the angle between the N-axis and the X-axis when the x convention is used (or the angle between the N-axis and the Y-axis when the y convention is used). Euler angles such as these can be used to specify any desired angular orientation of a joint relative to the reference coordinate system.

The vector of Euler angles (α, β, γ) for a joint may be mapped to a 3 × 3 rotation matrix. The elements of the rotation matrix correspond to the cosines of the angles between the axes of the joint's local coordinate system XYZ and the axes of the reference coordinate system xyz. Thus, the rotation matrix may be referred to as a direction cosine matrix.
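
To make the mapping concrete, the following sketch (an illustration only, not part of the described system) builds the 3 × 3 direction cosine matrix for a single joint from its Euler angles, assuming the z-x-z rotation sequence of the x convention described above; other conventions would compose different elementary rotations.

```python
import numpy as np

def rotation_matrix_zxz(alpha, beta, gamma):
    """Direction cosine matrix for Euler angles (alpha, beta, gamma),
    assuming the z-x-z ("x convention") rotation sequence."""
    def rot_z(a):
        c, s = np.cos(a), np.sin(a)
        return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

    def rot_x(a):
        c, s = np.cos(a), np.sin(a)
        return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

    # Compose the three elementary rotations: R = Rz(alpha) @ Rx(beta) @ Rz(gamma)
    return rot_z(alpha) @ rot_x(beta) @ rot_z(gamma)

# Example: orientation of one joint relative to the reference frame.
R = rotation_matrix_zxz(np.radians(30.0), np.radians(45.0), np.radians(60.0))
print(np.round(R, 3))  # each element is the cosine of the angle between two axes
```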

Multiple sets of Euler angles may be used to describe different poses of the skeletal system of the virtual character. For example, if the skeletal system includes M joints, and if the angular orientation of each joint in a pose is defined by three Euler angles (α, β, γ), then a pose may be specified by the vector x = [α_1, β_1, γ_1, …, α_M, β_M, γ_M].

Various example poses, each specified by a unique vector x, can be used in a rigging technique called pose space deformation to determine the mesh deformation of a virtual character. Pose space deformation is a rigging technique based on the assumption that the deformation of the virtual character's mesh is a function of the character's pose. Thus, pose space deformation can be used to compute a deformation of the virtual character mesh based on an input of the character pose.

Machine learning techniques may be used to train the gesture space deformer. The training data may include mesh deformations (e.g., deformations of skin, muscles, clothing, etc.) specified for a plurality of example poses of a base skeletal system of the virtual character (e.g., the character's core skeleton). For example, the mesh deformation associated with an example pose may be obtained by scanning a posed physical model of the virtual character. The example gestures collectively make up a gesture space. Once the gesture space deformer is trained using the example gestures, the deformations for other gestures (which may be specified by, for example, an animation sequence) may be calculated by interpolation (e.g., using radial basis functions). In particular, the gesture space deformer may receive inputs corresponding to the respective rotational states of the joints of the base skeletal system of the virtual character. Based on these inputs, the gesture space deformer may calculate an interpolated mesh deformation for the input gesture based on the example gestures in the gesture space.
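
As an illustration of the interpolation idea only, the sketch below fits per-vertex deltas over example poses with a radial basis function and evaluates them at a new pose. The pose features, the Gaussian kernel, and its width are assumptions made for this sketch, not the deformer described above.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian RBF on pose feature vectors (kernel choice is an assumption)."""
    return np.exp(-np.sum((a - b) ** 2) / (2.0 * sigma ** 2))

def fit_pose_space_deformer(example_poses, example_deltas, sigma=1.0):
    """example_poses: (N, P) pose feature vectors (e.g., joint rotations).
    example_deltas: (N, V, 3) per-vertex mesh deltas for each example pose.
    Returns RBF weights such that the deltas interpolate the examples."""
    n = example_poses.shape[0]
    phi = np.array([[gaussian_kernel(p, q, sigma) for q in example_poses]
                    for p in example_poses])                      # (N, N)
    d = example_deltas.reshape(n, -1)                             # (N, V*3)
    return np.linalg.solve(phi, d)                                # (N, V*3)

def evaluate_pose_space_deformer(pose, example_poses, weights, shape, sigma=1.0):
    """Interpolate mesh deltas for a new input pose."""
    phi = np.array([gaussian_kernel(pose, q, sigma) for q in example_poses])
    return (phi @ weights).reshape(shape)                         # (V, 3)

# Toy usage with random data standing in for real example poses and deltas.
rng = np.random.default_rng(0)
poses = rng.normal(size=(5, 12))          # 5 example poses, 12 pose features
deltas = rng.normal(size=(5, 100, 3))     # per-vertex deltas for a 100-vertex mesh
w = fit_pose_space_deformer(poses, deltas)
new_deltas = evaluate_pose_space_deformer(poses[0] * 0.9, poses, w, (100, 3))
```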

The gesture space may be composed of N gestures (x_1, x_2, …, x_N), where N is any positive integer. N is referred to as the dimension of the gesture space; it is the number of example gestures used to train the gesture space deformer. A high-dimensional pose space may be advantageous because it can be used to achieve realistic, high-fidelity mesh deformations. However, a high-dimensional gesture space may disadvantageously require a large amount of computing resources, such as computer memory and/or storage. The dimension of the gesture space could be reduced to conserve computing resources by simply including fewer example gestures in the gesture space, but this approach may negatively impact the fidelity of the mesh deformations computed by the pose space deformer. It would therefore be advantageous if the dimension of the gesture space could be reduced while still allowing the gesture space deformer to produce realistic mesh deformations.

Techniques are described for reducing the dimension of an input gesture space in a manner that lessens the impact of the lower dimensionality on the fidelity of the mesh deformations computed via gesture space deformation. In some embodiments, multiple example gestures that are part of the input gesture space may be clustered together, and a single representative gesture for the entire cluster may be used in an output gesture space. This clustering may be accomplished using, for example, k-means clustering techniques. k-means clustering is a technique that can divide the example gestures that make up the input gesture space into k clusters. Each of the k clusters may be characterized by a mean. Depending on the selected metric, each example gesture in the input gesture space may be considered to belong to the cluster whose mean is most similar to that particular example gesture. The process of determining the cluster means and assigning each example gesture to a cluster may be iteratively repeated.

The mean of each of the k clusters may correspond to a pose that can be considered representative of the entire cluster. Thus, the mean gesture for each cluster may be used in the output gesture space in place of all the example gestures belonging to that particular cluster.

As part of the clustering process, each example gesture in the input gesture space may be mapped to a point in a multidimensional space. As already discussed, each gesture may be represented by a vector of Euler angles, x = [α_1, β_1, γ_1, …, α_M, β_M, γ_M]. The elements of the vector x can be treated as coordinates in a multidimensional space. Thus, the vectors x_1 … x_N of the N example gestures in the input gesture space can each be used to define a point in the space for the respective example gesture. The points may be grouped into clusters using a metric that indicates the similarity of each point to the respective cluster. For example, the metric may be the distance from a point (corresponding to a particular gesture) to the center of a particular cluster in the multidimensional space. The center of the cluster may be calculated as, for example, the mean point or centroid of the cluster.

Other metrics may also be used to determine the similarity of a particular example gesture to a particular cluster. For example, in some embodiments, a weighted distance metric may be used. In such embodiments, some joints in the skeletal system may be weighted to influence the distance metric to a greater degree, while other joints may be weighted to influence the distance metric to a lesser degree.
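
For illustration, a weighted distance of this kind might look like the following sketch; the specific per-joint weights and the four-values-per-joint layout are assumptions made here, not values given in the text.

```python
import numpy as np

def weighted_pose_distance(pose_a, pose_b, joint_weights):
    """Weighted Euclidean distance between two flat pose vectors.

    joint_weights: one weight per joint; heavily weighted joints influence
    the metric more (the specific weights are illustrative assumptions).
    """
    values_per_joint = pose_a.size // joint_weights.size
    w = np.repeat(joint_weights, values_per_joint)   # expand to per-value weights
    return np.sqrt(np.sum(w * (pose_a - pose_b) ** 2))

# Example: 3 joints x 4 quaternion components; the first joint matters most.
a = np.zeros(12)
b = np.full(12, 0.1)
print(weighted_pose_distance(a, b, np.array([3.0, 1.0, 0.25])))
```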

Since the clustered gestures are similar to each other, as determined by the selected metric, the representative gesture for each cluster can be used in the output gesture space in place of all gestures belonging to that particular cluster. When a set of N example gestures is divided into k clusters, the dimensionality of the input gesture space may be reduced from N to k, where k is a positive integer less than N.

However, complications arise when mapping an example gesture to a point in a multidimensional space based on a corresponding vector of Euler angles. For example, a mathematical singularity may occur when the second Euler angle β for a given joint is 90°. Such a singularity results in multiple possible sets of three Euler angles (indeed, an infinite number of them) representing the same angular rotation of the joint. For example, the set of Euler angles (45°, 90°, 45°) represents the same rotation as the set of Euler angles (90°, 90°, 90°).
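
This degeneracy can be checked numerically. The sketch below is illustrative only and assumes an x-y-z rotation sequence, since the exact angle at which the singularity (gimbal lock) occurs depends on the chosen convention; with that assumption, the two Euler-angle sets above yield the same rotation matrix.

```python
import numpy as np
from scipy.spatial.transform import Rotation

# Two different Euler-angle sets with the middle angle at 90 degrees.
# (An extrinsic x-y-z sequence is assumed here for illustration.)
r1 = Rotation.from_euler("xyz", [45.0, 90.0, 45.0], degrees=True)
r2 = Rotation.from_euler("xyz", [90.0, 90.0, 90.0], degrees=True)

print(np.allclose(r1.as_matrix(), r2.as_matrix()))  # True: the same joint rotation
```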

Therefore, when Euler angles are used as coordinates in the space, a pose that includes a joint rotation affected by such a mathematical singularity can be mapped to any one of multiple different points in the space. Because such gestures may map to any of a number of different points, they may be grouped into different clusters depending on which of the many equivalent sets of Euler angles happens to be specified. This means that dissimilar gestures may be clustered together or, conversely, that similar gestures may be split into separate clusters. Thus, the singularities that can occur when Euler angles are used to represent angular rotations can disrupt the clustering process and, in turn, can negatively impact the fidelity of the mesh deformations computed by the pose space deformer.

To avoid the clustering complications caused by the mathematical singularities that can occur when Euler angles are used to represent joint rotations, a singularity-free representation of the angular rotation of each joint may instead be used in the various example poses that make up the input pose space. Quaternions are an example of such a singularity-free representation. Thus, in some embodiments, quaternions are used to specify the joint rotations of the example gestures in the input gesture space for clustering purposes.

A quaternion may be represented by the vector q = [x, y, z, w], where the vector v = [x, y, z] defines the axis of rotation of the joint and θ = acos(w) defines the angle of rotation about that axis. The types of mathematical singularities that affect the Euler angle representation do not affect the quaternion representation. Thus, the various example gestures that make up the input gesture space can be clustered more accurately when represented as quaternions than when the example gestures are represented using Euler angles.
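
As an illustration (again assuming an x-y-z Euler sequence, which is not specified in the text), the two equivalent Euler-angle sets from the earlier example map to a single quaternion, up to the usual sign ambiguity in which q and -q encode the same rotation, so each joint rotation corresponds to one unambiguous point for clustering purposes.

```python
import numpy as np
from scipy.spatial.transform import Rotation

# The same two equivalent Euler-angle sets as before (x-y-z sequence assumed).
q1 = Rotation.from_euler("xyz", [45.0, 90.0, 45.0], degrees=True).as_quat()  # [x, y, z, w]
q2 = Rotation.from_euler("xyz", [90.0, 90.0, 90.0], degrees=True).as_quat()

# Equivalent joint rotations collapse to a single quaternion (up to sign),
# so they map to a single point in the clustering space.
same = np.allclose(q1, q2) or np.allclose(q1, -q2)
print(same, q1)
```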

Fig. 12A illustrates a method 1200 for reducing a dimension of an input gesture space of a gesture space deformer using k-means clustering. The method 1200 may be performed by a hardware computer processor executing instructions stored in a non-transitory storage medium. FIG. 12B illustrates an example set of clusters formed using the method of FIG. 12A.

According to the method 1200, quaternions are used to represent the angular rotations of the joints in the skeletal system of the virtual character. Thus, any example gesture whose joint rotations are initially represented using Euler angles may be converted to instead use a quaternion representation, as shown in block 1205. After this conversion, each joint rotation may have the form q = [x, y, z, w], and each pose, consisting of M joint rotations, may be represented by a vector x = [x_1, y_1, z_1, w_1, …, x_M, y_M, z_M, w_M], where M is the number of joints in the skeletal system and can be any positive integer.
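
A minimal sketch of this conversion step (block 1205) is shown below; the x-y-z Euler sequence and the per-joint array layout are assumptions made for illustration.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def pose_euler_to_quaternion(euler_pose, seq="xyz", degrees=True):
    """Convert one example pose from Euler angles to quaternions (block 1205).

    euler_pose: (M, 3) array, one (alpha, beta, gamma) triple per joint.
    Returns a flat (4*M,) vector [x1, y1, z1, w1, ..., xM, yM, zM, wM].
    The rotation sequence "xyz" is an assumption for illustration.
    """
    quats = Rotation.from_euler(seq, euler_pose, degrees=degrees).as_quat()  # (M, 4)
    return quats.reshape(-1)

# Example: a pose for a 3-joint skeleton.
pose = np.array([[45.0, 90.0, 45.0],
                 [10.0, 20.0, 30.0],
                 [ 0.0,  0.0,  0.0]])
x = pose_euler_to_quaternion(pose)
print(x.shape)  # (12,)
```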

Once the Euler angles of the joint rotations have been converted to quaternions, at block 1210 each example gesture in the input gesture space may be mapped to a point in a multidimensional space. Each example gesture may be represented by a vector of the form x = [x_1, y_1, z_1, w_1, …, x_M, y_M, z_M, w_M]. The elements of the vector for each gesture may be used as coordinates to map the gesture to a point in the multidimensional space for clustering. The input gesture space may include N example gestures, and the vectors x_1 … x_N can be used to define a point in the space for each respective example gesture. FIG. 12B illustrates an example set of gestures mapped to points in space, where each point 1201 represents an example gesture.

At block 1215, the hardware computer processor may determine an initial mean point, or centroid, in the multidimensional space for each of k clusters, where k is any positive integer. (The value of k may be determined using the technique described below with reference to FIG. 13.) In some embodiments, the initial positions of the k centroids are selected at different random locations in the multidimensional space. However, other techniques may be used to determine the initial positions of the k centroids. For example, each of the k clusters may have an initial centroid position based on one of the example gestures in the input gesture space. In this case, one example gesture (selected either randomly or in a predetermined manner) may be assigned to each of the k clusters. The coordinates of the initial centroid of each cluster may then be determined by the set of quaternions of the example gesture selected for that cluster.

At block 1220, each example gesture that collectively constitutes the input gesture space may be assigned to one of k clusters. In some embodiments, the assignment may be based on a geometric distance between the cluster centroid and the point corresponding to the example gesture. For example, the hardware computer processor may calculate a geometric distance between each cluster centroid and each point corresponding to the example gesture. The hardware computer processor may then assign each example gesture to the cluster whose centroid is closest.

Once each example gesture in the gesture space has been assigned to one of the k clusters, the hardware computer processor may recalculate the centroid of each cluster at block 1225. The updated centroid of each of the k clusters can be computed from the quaternions q_i = [x_i, y_i, z_i, w_i] of the example gestures currently assigned to that cluster. According to quaternion algebra, the mean m of n quaternions q_1, q_2, …, q_n can be calculated by an expression in terms of v_i = [x_i, y_i, z_i] and θ_i = acos(w_i) for each quaternion q_i.
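
The exact averaging expression is not reproduced above. As a stand-in, the sketch below uses a common approximation: align the signs of a cluster's quaternions (since q and -q encode the same rotation), average them componentwise, and renormalize. This is adequate when the rotations within a cluster are similar, which is the situation k-means tends to produce, but it is an assumption rather than the formula referenced in the text.

```python
import numpy as np

def quaternion_mean(quats):
    """Approximate mean of similar unit quaternions (one joint, one cluster).

    quats: (n, 4) array of [x, y, z, w] quaternions. Sign-aligns to the first
    quaternion, averages componentwise, and renormalizes (an approximation).
    """
    quats = np.asarray(quats, dtype=float)
    reference = quats[0]
    # Flip any quaternion whose dot product with the reference is negative.
    signs = np.where(quats @ reference < 0.0, -1.0, 1.0)
    mean = (quats * signs[:, None]).mean(axis=0)
    return mean / np.linalg.norm(mean)

def pose_centroid(pose_vectors, num_joints):
    """Centroid of a cluster of pose vectors, averaged joint by joint (block 1225)."""
    poses = np.asarray(pose_vectors).reshape(len(pose_vectors), num_joints, 4)
    return np.stack([quaternion_mean(poses[:, j]) for j in range(num_joints)]).reshape(-1)
```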

The process of assigning the example gestures to clusters (block 1220) and the process of updating the cluster centroids (block 1225) may be repeated until the number of iterations exceeds a threshold, as shown in block 1230, or until a convergence condition is satisfied, as shown in block 1235. For example, the convergence condition may be considered satisfied if the assignment of the example gestures to the k clusters does not change from one iteration to the next.
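
Putting blocks 1215 through 1235 together, a minimal self-contained sketch of the clustering loop might look like the following. It treats each 4M-dimensional quaternion pose vector as a point, initializes centroids from randomly chosen example poses, assigns poses to the nearest centroid, recomputes centroids with the sign-aligned quaternion average described above, and stops on convergence or after a maximum number of iterations. The toy data, the initialization scheme, and the plain Euclidean distance are illustrative assumptions.

```python
import numpy as np

def quaternion_mean(quats):
    """Sign-aligned, renormalized componentwise mean (approximation)."""
    signs = np.where(quats @ quats[0] < 0.0, -1.0, 1.0)
    m = (quats * signs[:, None]).mean(axis=0)
    return m / np.linalg.norm(m)

def kmeans_pose_clustering(pose_vectors, k, num_joints, max_iters=100, seed=0):
    """Cluster (N, 4*M) quaternion pose vectors into k clusters (method 1200)."""
    rng = np.random.default_rng(seed)
    poses = np.asarray(pose_vectors, dtype=float)
    n = poses.shape[0]

    # Block 1215: initialize centroids from k randomly chosen example poses.
    centroids = poses[rng.choice(n, size=k, replace=False)].copy()
    labels = np.full(n, -1)

    for _ in range(max_iters):                         # block 1230: iteration cap
        # Block 1220: assign each pose to the cluster with the nearest centroid.
        distances = np.linalg.norm(poses[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = distances.argmin(axis=1)
        if np.array_equal(new_labels, labels):         # block 1235: convergence
            break
        labels = new_labels

        # Block 1225: recompute each centroid joint by joint.
        for c in range(k):
            members = poses[labels == c].reshape(-1, num_joints, 4)
            if len(members):
                centroids[c] = np.stack(
                    [quaternion_mean(members[:, j]) for j in range(num_joints)]
                ).reshape(-1)

    return labels, centroids

# Toy usage: 40 random example poses for a 5-joint skeleton, clustered into k = 3.
rng = np.random.default_rng(1)
raw = rng.normal(size=(40, 5, 4))
toy_poses = (raw / np.linalg.norm(raw, axis=2, keepdims=True)).reshape(40, -1)
labels, centroids = kmeans_pose_clustering(toy_poses, k=3, num_joints=5)
```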

FIG. 12B shows an example result of this clustering process for the case where k = 3. Each example gesture is represented by a point 1201. Each example gesture has been assigned to cluster A, cluster B, or cluster C. The boundaries between clusters are depicted by lines. The centroid of each cluster is represented by stars 1202a through 1202c, where the centroid of cluster A is 1202a, the centroid of cluster B is 1202b, and the centroid of cluster C is 1202c.

Once each example gesture in the input gesture space is ultimately assigned to one of the k clusters, the hardware computer processor may specify one example gesture to represent each cluster in the output gesture space. In some embodiments, the representative gesture for a particular cluster may be one of the example gestures assigned to that cluster. For example, an example gesture whose corresponding point in the multidimensional space is closest to the centroid of the cluster may be designated as a representative gesture for the entire cluster. In other embodiments, the coordinates of the centroid of the cluster may be specified as the joint rotation for the representative gesture.
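
Given the final labels and centroids, choosing the representative example pose for each cluster by a nearest-to-centroid lookup could be sketched as follows; the small arrays here are made-up stand-ins for the output of method 1200.

```python
import numpy as np

def representative_poses(pose_vectors, labels, centroids):
    """For each cluster, return the index of the example pose closest to its centroid."""
    reps = []
    for c, centroid in enumerate(centroids):
        member_idx = np.flatnonzero(labels == c)
        dists = np.linalg.norm(pose_vectors[member_idx] - centroid, axis=1)
        reps.append(member_idx[dists.argmin()])
    return np.array(reps)

# Toy usage with made-up clustering results (stand-ins for method 1200's output).
rng = np.random.default_rng(2)
poses = rng.normal(size=(10, 8))
labels = np.array([0, 0, 1, 1, 1, 2, 2, 0, 1, 2])
centroids = np.stack([poses[labels == c].mean(axis=0) for c in range(3)])
output_pose_space = poses[representative_poses(poses, labels, centroids)]  # (3, 8)
```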

In this way, the dimensionality of the input gesture space may be reduced from N example gestures to k example gestures, where k is the number of clusters. Moreover, since the example gestures of a cluster are similar to each other, the reduction in dimensionality of the gesture space may have less of an impact on the fidelity of the mesh deformation computed by the gesture space deformer than reducing the dimensionality of the gesture space in a less complex manner (e.g., by randomly selecting a subset of the example gestures from the input gesture space). The reduced dimensionality of the output pose space may enable high fidelity and real-time computation of mesh deformation in response to virtual character pose changes.

However, the effectiveness of the method 1200 shown in FIG. 12A may be sensitive to the number of clusters into which the example gestures are grouped from the input gesture space. If example gestures that collectively make up the input gesture space are clustered into a relatively large number of clusters, the fidelity of the gesture space deformation will be reduced by a relatively small amount. However, the dimensions of the input gesture space will likewise be reduced by only a relatively small amount. Conversely, if the example gestures are clustered into a relatively small number of clusters, the dimensionality of the input gesture space will be reduced by a relatively large amount, but the fidelity of the gesture space deformation may also be reduced by a relatively large amount. Since the reduction of the input gesture space dimensions depends on the number of clusters (i.e. the value of k), it may be advantageous to select the number of clusters that strike an effective balance between the reduction of the dimensions of the input gesture space and the fidelity of the resulting gesture space deformation.

Various factors can complicate the selection of the number of clusters. One of these factors is the distribution of the example gestures in the input gesture space. For example, if the various example gestures in the input gesture space are widely distributed such that the various example gestures are relatively different from one another, a larger number of clusters may be required to partition the different example gestures while still maintaining an acceptable degree of fidelity in the computational deformation of the virtual character mesh. However, if the distribution of the various example gestures in the input gesture space is so narrow that the various example gestures are relatively similar, a smaller number of clusters may be required to partition the different example gestures while still maintaining the desired degree of fidelity in mesh deformation.

When the number of clusters is appropriately selected, the dimensionality of the input gesture space can be reduced while maintaining the degree of fidelity required in the mesh deformation computed by the gesture space deformer. Indeed, in some embodiments, the fidelity of the mesh deformation calculated with the reduced-dimension output gesture space may be comparable to that which may be achieved using the higher-dimension input gesture space.

One technique for selecting the number of clusters is to perform the method 1200 shown in FIG. 12A for each of a plurality of different candidate values of k, to calculate an error metric associated with each candidate value of k, and then to select the value of k based on the error metrics associated with the candidate values. In some embodiments, the error metric is, over all the example gestures in the input gesture space, the sum of the squared distances between the point corresponding to each example gesture and the centroid of its assigned cluster. This error metric may be referred to as the sum of squared errors. However, other error metrics may be used.

FIG. 13 is a graph illustrating an example technique for selecting the number of clusters for reducing the input gesture space dimension. FIG. 13 shows an example plot 1300 of the sum of squared errors as a function of the value of k. To generate the data shown in the example plot 1300, the hardware computer processor may perform the method 1200 shown in FIG. 12A for each of a plurality of candidate values of k. In the example shown, the candidate values are k = 1 to k = 12. Other ranges of candidate values may be used in other embodiments depending on the particular application. The hardware computer processor may then compute the sum of squared errors for each candidate value.

When the number of clusters k is relatively small, the sum of squared errors is relatively large. As the value of k increases, the sum of squared errors decreases and begins to approach zero. It is often the case that the sum of squared errors initially decreases rapidly as the number of clusters increases. This is shown by the steep slope of the curve 1300 for the lower candidate values of k. The rate of decrease typically slows at some point. This decrease in the slope of the curve 1300 is evident for the higher candidate values of k. The sum of squared errors eventually becomes zero when the candidate value of k equals the number of example gestures in the input gesture space, since in that case each example gesture constitutes its own cluster. Conversely, the sum of squared errors is typically largest when k = 1, which means that all the example gestures are placed in a single cluster.

In some embodiments, the value of k used to generate the output pose space may be selected based on its error metric satisfying a selected criterion. For example, the criterion may be that the rate of change of the error metric exceeds a specified threshold. Referring to FIG. 13, the rate of change of the error metric may be determined by analyzing the slope of the curve 1300. The slope of the curve 1300 between pairs of adjacent candidate values of k may be calculated, and the candidate value of k at which the slope exceeds the specified threshold is selected as the value of k. Referring to FIG. 13, the slope of the curve 1300 may be calculated between k = 1 and k = 2, between k = 2 and k = 3, between k = 3 and k = 4, between k = 4 and k = 5, and so on. The value of k at which the slope of the curve exceeds the specified threshold (in this case k = 4, as shown by line 1305) may be selected as the number of clusters. In some embodiments, the magnitude of the threshold rate of change may be from about 0 to about 1 (i.e., from about 0° to 45°), although other thresholds may also be used.
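
For illustration, the sweep over candidate values of k can be sketched with a standard Euclidean k-means (here scikit-learn's KMeans, used as a stand-in for the quaternion-based variant above). The slope threshold, the candidate range, and the reading of "the value at which the slope exceeds the threshold" as the value just before the curve flattens are assumptions made for this sketch.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_num_clusters(points, candidate_ks, slope_threshold):
    """Sweep candidate k values, record the sum of squared errors (inertia),
    and return the first k after which the curve's slope exceeds the threshold."""
    sse = {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(points).inertia_
           for k in candidate_ks}
    for k_prev, k_next in zip(candidate_ks, candidate_ks[1:]):
        slope = (sse[k_next] - sse[k_prev]) / (k_next - k_prev)  # negative, rising toward 0
        if slope > slope_threshold:
            return k_prev, sse
    return candidate_ks[-1], sse

# Toy usage: four well-separated blobs standing in for quaternion pose vectors.
rng = np.random.default_rng(3)
points = np.vstack([rng.normal(loc=c, scale=0.1, size=(30, 8))
                    for c in (0.0, 1.0, 2.0, 3.0)])
best_k, sse_curve = select_num_clusters(points, list(range(1, 13)), slope_threshold=-1.0)
print(best_k)
```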

Table 1 shows an example of several reduced-dimension gesture spaces generated by the example method shown in fig. 12A.

TABLE 1

As shown in Table 1, an example input gesture space for computing gesture-based deformations of the deltoid and pectoral muscles of a human avatar includes 1944 gestures. Using the techniques described herein, this may be reduced to 1189 gestures in the output gesture space. Similarly, an example input gesture space for computing quadriceps femoris deformations initially includes 1296 gestures but is reduced to 757 gestures. Comparable results were obtained for various other muscles and body parts of the avatar virtual character, such as the trapezius, latissimus dorsi, biceps brachii, and calf muscles, and the hip. However, the input gesture spaces for the scapula, the sternocleidomastoid and scalene muscles, and the tendon muscles were excluded; the dimensions of these gesture spaces were not reduced because the corresponding input gesture spaces include only relatively few example gestures, so clustering would be less effective. In total, for the gesture spaces shown in Table 1, the number of input gestures is reduced from 6891 to 4193 (a reduction of roughly 39%). In some embodiments, the dimension of the input gesture space may be reduced by 30% or more, or 40% or more, or 50% or more, using the techniques described herein. Furthermore, the output gesture space may still represent 70% or more, or 80% or more, or 90% or more of the information in the input gesture space.

Compression of hybrid shapes using principal component analysis

In some embodiments, the gesture space deformer may control the weights of one or more hybrid shapes used to deform the mesh of the virtual character. The hybrid shapes may be represented as an N × M deformation matrix A, which contains the delta values for deforming the mesh. Each of the M columns of the deformation matrix may correspond to a particular vertex in the mesh, while each of the N rows may correspond to one hybrid shape (i.e., the set of vertex deltas that make up that hybrid shape). For avatars with high-resolution meshes and many hybrid shapes, the deformation matrix may become large and consume a large amount of computing resources. It would therefore be advantageous to reduce the size of the deformation matrix while still providing high-fidelity mesh deformations.
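
For concreteness, the small sketch below shows the role the deformation matrix plays: a row of weights (for example, produced by the gesture space deformer) times the N × M matrix of deltas yields the offsets added to the base mesh. The sizes, the random data, and the layout in which each vertex contributes three delta columns are assumptions made for this sketch.

```python
import numpy as np

# Illustrative sizes: N hybrid shapes, a mesh with V vertices (M = 3 * V delta values).
rng = np.random.default_rng(4)
num_shapes, num_vertices = 8, 500
A = rng.normal(scale=0.01, size=(num_shapes, num_vertices * 3))   # N x M deformation matrix
base_mesh = rng.normal(size=(num_vertices, 3))                    # rest-pose vertex positions

# Weights for each hybrid shape, e.g. produced by the gesture space deformer.
weights = rng.uniform(0.0, 1.0, size=num_shapes)

# Deformed mesh = base mesh + weighted sum of the per-vertex deltas.
deformed_mesh = base_mesh + (weights @ A).reshape(num_vertices, 3)
```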

Techniques are described herein for reducing the dimensions of an input deformation matrix (i.e., reducing the number of hybrid shapes in the deformation matrix) using Principal Component Analysis (PCA). PCA can be used to achieve this goal without losing more than a specified amount of variation or information contained in the input deformation matrix.

The column of delta values for each vertex in the mesh in the input deformation matrix may be used to define a point in an N-dimensional space, where N equals the number of hybrid shapes in the input deformation matrix. PCA can be used to compute a set of basis vectors for the input deformation matrix, meaning that each point in the N-dimensional space can be represented as a linear combination of the basis vectors. These basis vectors are the principal components of the input deformation matrix and have the following property: the first principal component accounts for the maximum possible variation (variance), while each subsequent principal component accounts for the next largest amount of variation subject to the constraint of being orthogonal to all previous principal components. For example, if all the points specified by the input deformation matrix are plotted in the N-dimensional space, the first principal component is the vector that points in the direction of maximum variation between the points. The second principal component is orthogonal to the first principal component and points in the direction of maximum variation among the points specified by the input deformation matrix, subject to the constraint of being orthogonal to the first principal component. The third principal component is orthogonal to both the first and second principal components and points in the direction of maximum variation subject to those orthogonality constraints, and so on for each of the N principal components of the input deformation matrix. Each principal component also has an associated eigenvalue that indicates the amount of variation in the input data that occurs in the direction of the associated eigenvector.

FIG. 14 shows an example of a set of plotted points of an input deformation matrix having N hybrid shapes. As just discussed, each point is a point in N-dimensional space. Although the deformation matrix may include many hybrid shapes (e.g., N equal to tens, hundreds, or thousands of hybrid shapes), for ease of illustration FIG. 14 shows the points projected onto the plane of the first two principal components. As shown in FIG. 14, the first principal component PC1 points in the direction of greatest variation between the plotted points. The second principal component PC2 points in the direction of the next largest variation, subject to the constraint of being orthogonal to the first principal component PC1. Although not shown, the same applies to the subsequent principal components. Each point in FIG. 14 can be represented as a linear combination of the principal components. The first principal component may be used to define a new hybrid shape that accounts for most of the variation in the input deformation matrix. The second principal component may be used to define another new hybrid shape that accounts for the next largest amount of variation in the input deformation matrix, and so on for the remaining principal components.

For an N-dimensional input deformation matrix, each of the N principal components may be used to define a new hybrid shape. These new hybrid shapes may be organized in an output deformation matrix A'. Each successive new hybrid shape accounts for less of the variation in the input deformation matrix than the previous one. This is shown in FIG. 15, which is an example graph of the proportion of variation accounted for by each of the 30 principal components of an input deformation matrix composed of 30 hybrid shapes. As is apparent from the figure, the proportion of variation accounted for by each principal component decreases as the principal component number increases. The last few principal components account for relatively little of the variation in the input deformation matrix. Since principal component N includes the least amount of information from the input deformation matrix, it and its corresponding hybrid shape may be omitted from the output deformation matrix in some embodiments. In a similar manner, principal components (N-1), (N-2), etc., and their corresponding hybrid shapes may also be omitted from the output deformation matrix, depending on the amount of information that is to be retained from the input deformation matrix. This process of omitting the hybrid shapes corresponding to the last principal components reduces the dimensionality of the output deformation matrix while retaining at least a specified amount of information from the input deformation matrix.

The principal components of the input deformation matrix can be calculated in a number of different ways. In some embodiments, a covariance matrix or a correlation matrix of the input deformation matrix A may be calculated. An eigendecomposition may then be performed on the covariance matrix or the correlation matrix. The eigendecomposition of the covariance matrix or correlation matrix gives a set of linearly independent eigenvectors. These eigenvectors (which can be normalized to unit length) are the principal components. The principal components may also be calculated using other techniques, such as singular value decomposition (SVD). In addition to the principal components, SVD produces a set of values called singular values. The singular value of each principal component indicates the amount of variation in the input deformation matrix that is explained by that principal component.
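
The sketch below computes the principal components of an illustrative deformation matrix with SVD and reports the proportion of variation explained by each component (analogous to FIG. 15). The matrix here is random stand-in data, and whether the data should be mean-centered first is a modeling choice left out of this sketch.

```python
import numpy as np

rng = np.random.default_rng(5)
N, M = 30, 3000                       # 30 hybrid shapes, a 1000-vertex mesh (M = 3 * 1000)
A = rng.normal(size=(N, 5)) @ rng.normal(size=(5, M)) + 0.01 * rng.normal(size=(N, M))

# Singular value decomposition of the input deformation matrix.
# Columns of U are the principal components in the N-dimensional "shape" space.
U, S, Vt = np.linalg.svd(A, full_matrices=False)

# Each squared singular value is proportional to the variation explained
# by the corresponding principal component (compare FIG. 15).
explained = S ** 2 / np.sum(S ** 2)
print(np.round(explained[:10], 4))
```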

FIG. 16 is a flow diagram depicting an example method for reducing the dimensionality of a deformation matrix using principal component analysis (PCA). The method may be performed by a hardware computer processor executing instructions stored in a non-transitory storage medium. The method begins at block 1605, where the input deformation matrix is initialized as an N × M matrix of hybrid shapes. Then, as shown in block 1610, the N principal components of the input deformation matrix are computed. As already discussed, each principal component is associated with a new hybrid shape. Depending on the percentage of data to be retained from the input deformation matrix, some principal components are omitted, as shown in block 1620. The number of principal components that are omitted may be selected based on the percentage of the variation in the input deformation matrix that is explained by each principal component. In some embodiments, the number of principal components retained may be based on a user's preference for the percentage of variation to be retained, a number of new hybrid shapes chosen for computational performance, and/or a number of new hybrid shapes judged to produce aesthetically acceptable mesh deformations. Finally, at block 1625, the hybrid shapes corresponding to the omitted principal components are also omitted, and an output deformation matrix A' is formed from the new hybrid shapes associated with the remaining principal components.
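
A minimal sketch of blocks 1605 through 1625 might look like the following: keep the leading principal components that together explain a chosen fraction of the variation, form the output deformation matrix A' from the corresponding new hybrid shapes, and map the original shape weights onto the new shapes through the retained components. The retained-variance target, the random data, and the absence of mean-centering are assumptions of this sketch.

```python
import numpy as np

def reduce_deformation_matrix(A, retain=0.95):
    """Reduce an N x M deformation matrix to k new hybrid shapes via PCA/SVD.

    Returns (U_k, A_prime): U_k maps original shape weights to the k new shapes,
    and A_prime (k x M) holds the new hybrid shapes' per-vertex deltas, so that
    A is approximately U_k @ A_prime. The retained-variance target is an assumption.
    """
    U, S, Vt = np.linalg.svd(A, full_matrices=False)            # block 1610
    explained = S ** 2 / np.sum(S ** 2)
    k = int(np.searchsorted(np.cumsum(explained), retain)) + 1  # block 1620: drop the rest
    U_k = U[:, :k]
    A_prime = S[:k, None] * Vt[:k]                              # block 1625: output matrix
    return U_k, A_prime

# Toy usage: original weights w drive the reduced matrix through the mapping U_k.
rng = np.random.default_rng(6)
A = rng.normal(size=(30, 8)) @ rng.normal(size=(8, 3000))
U_k, A_prime = reduce_deformation_matrix(A, retain=0.95)
w = rng.uniform(size=30)                                        # original hybrid shape weights
deltas_full = w @ A
deltas_reduced = (w @ U_k) @ A_prime                            # deformation from reduced matrix
rel_error = np.linalg.norm(deltas_full - deltas_reduced) / np.linalg.norm(deltas_full)
print(A_prime.shape, rel_error)   # small reconstruction error from the dropped components
```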

The hybrid shapes in the output deformation matrix A' may be controlled by a gesture space deformer. Because the techniques described herein allow a specified amount of information to be retained from the input deformation matrix, they may advantageously reduce the number of hybrid shapes required to achieve acceptable mesh deformations.

Example embodiments

1. A system, comprising: a non-transitory computer memory to store a plurality of example gestures of a skeletal system of a virtual character; and a hardware computer processor in communication with the non-transitory computer memory, the hardware computer processor configured to reduce the dimension of the input gesture space by performing a method comprising: clustering the plurality of example gestures into one or more clusters, the plurality of example gestures defining an input gesture space, each example gesture of the plurality of example gestures comprising a set of joint rotations, the joint rotations having a singular point-free mathematical representation; determining a representative gesture for each cluster; and providing an output gesture space of reduced dimensions compared to the input gesture space.

2. The system of claim 1, wherein the singularity-free mathematical representation of the joint rotation comprises a quaternion representation.

3. The system of claim 2, further comprising: receiving the plurality of example gestures for which the joint rotation has a representation of Euler angles; and converting the euler angle representation to the quaternion representation.

4. The system of claim 1, wherein the example gestures are clustered into the one or more clusters based on a metric for determining similarity between each example gesture and each cluster.

5. The system of claim 1, wherein clustering the example gestures comprises: each of the example gestures is mapped to a point in a multi-dimensional space.

6. The system of claim 5, wherein clustering the example gestures further comprises: determining a centroid of each cluster; determining a distance between the point of each example gesture and a centroid of each cluster; and assigning each example gesture to the nearest cluster.

7. The system of claim 6, further comprising: the centroid of each cluster is determined iteratively, and each example gesture is assigned to the closest cluster.

8. The system of claim 6, wherein the representative gesture for each cluster comprises one of the example gestures assigned to the cluster or an example gesture associated with a centroid of the cluster.

9. The system of claim 1, further comprising: the number of clusters is determined.

10. The system of claim 9, wherein determining a number of clusters comprises: clustering, for each of a plurality of different candidate cluster quantities, the plurality of example gestures into the one or more clusters; calculating an error metric associated with each candidate cluster quantity; and selecting one of the candidate cluster numbers based on the error metric associated with the candidate cluster number.

11. The system of claim 10, wherein the error metric comprises, for all of the example gestures in the input gesture space, a sum of squared distances between a point corresponding to each example gesture and a centroid of its assigned cluster.

12. The system of claim 10, wherein selecting one of the candidate cluster numbers comprises: it is determined whether its error metric meets a selected criterion.

13. The system of claim 12, wherein the criterion is that a rate of change of the measure of error exceeds a specified threshold.

14. The system of claim 1, further comprising: training a gesture space deformer using the output gesture space.

15. The system of claim 14, further comprising: computing a mesh deformation of the virtual character using the pose space deformer.

16. The system of claim 14, further comprising: controlling a plurality of hybrid shapes in an output deformation matrix using the gesture space deformer.

17. The system of claim 16, wherein the output deformation matrix is generated by reducing a dimension of an input deformation matrix using principal component analysis.

18. The system of claim 17, wherein reducing the dimension of the input deformation matrix comprises: determining a principal component of the input deformation matrix; omitting one or more of the principal components to leave one or more remaining principal components; generating the output deformation matrix using one or more blending shapes associated with the one or more remaining principal components.

19. The system of claim 1, wherein the output gesture space is at least 30% smaller than the input gesture space.

20. The system of claim 1, wherein the system comprises a virtual reality display system, an augmented reality display system, or a mixed reality display system.

21. A method, comprising: obtaining a plurality of example gestures of a skeletal system of a virtual character, the plurality of example gestures defining an input gesture space, each of the plurality of example gestures comprising a set of joint rotations, the joint rotations having a singular point-free mathematical representation; clustering the plurality of example gestures into one or more clusters; determining a representative gesture for each cluster; and providing an output gesture space of reduced dimensions compared to the input gesture space.

22. The method of claim 21, wherein the singularity-free mathematical representation of the joint rotation comprises a quaternion representation.

23. The method of claim 22, further comprising: receiving the plurality of example gestures for which the joint rotation has a representation of Euler angles; and converting the euler angle representation to the quaternion representation.

24. The method of claim 21, wherein the example gestures are clustered into the one or more clusters based on a metric for determining similarity between each example gesture and each cluster.

25. The method of claim 21, wherein clustering the example gestures comprises: each of the example gestures is mapped to a point in a multi-dimensional space.

26. The method of claim 25, wherein clustering the example gestures further comprises: determining a centroid of each cluster; determining a distance between the point of each example gesture and a centroid of each cluster; and assigning each example gesture to the nearest cluster.

27. The method of claim 26, further comprising: the centroid of each cluster is determined iteratively, and each example gesture is assigned to the closest cluster.

28. The method of claim 26, wherein the representative gesture for each cluster comprises one of the example gestures assigned to the cluster or an example gesture associated with a centroid of the cluster.

29. The method of claim 21, further comprising: the number of clusters is determined.

30. The method of claim 29, wherein determining a number of clusters comprises: clustering, for each of a plurality of different candidate cluster quantities, the plurality of example gestures into the one or more clusters; calculating an error metric associated with each candidate cluster quantity; and selecting one of the candidate cluster numbers based on the error metric associated with the candidate cluster number.

31. The method of claim 30, wherein the error metric comprises, for all of the example gestures in the input gesture space, a sum of squared distances between a point corresponding to each example gesture and a centroid of its assigned cluster.

32. The method of claim 30, wherein selecting one of the candidate cluster numbers comprises: it is determined whether its error metric meets a selected criterion.

33. The method as defined in claim 32, wherein the criterion is that a rate of change of the error metric exceeds a specified threshold.

34. The method of claim 21, further comprising: training a gesture space deformer using the output gesture space.

35. The method of claim 34, further comprising: computing a mesh deformation of the virtual character using the pose space deformer.

36. The method of claim 34, further comprising: controlling a plurality of hybrid shapes in an output deformation matrix using the gesture space deformer.

37. The method of claim 36, wherein the output deformation matrix is generated by reducing a dimension of an input deformation matrix using principal component analysis.

38. The method of claim 37, wherein reducing the dimension of the input deformation matrix comprises: determining a principal component of the input deformation matrix; omitting one or more of the principal components to leave one or more remaining principal components; generating the output deformation matrix using one or more blending shapes associated with the one or more remaining principal components.

39. The method of claim 21, wherein the output gesture space is at least 30% smaller than the input gesture space.

40. The method of claim 21, wherein the method is performed by a virtual reality display system, an augmented reality display system, or a mixed reality display system.

41. A non-transitory computer readable medium that, when read by a hardware computer processor, causes the hardware computer processor to perform a method, the method comprising: obtaining a plurality of example gestures of a skeletal system of a virtual character, the plurality of example gestures defining an input gesture space, each of the plurality of example gestures comprising a set of joint rotations, the joint rotations having a singular point-free mathematical representation; clustering the plurality of example gestures into one or more clusters; determining a representative gesture for each cluster; and providing an output gesture space of reduced dimensions compared to the input gesture space.

42. The computer-readable medium of claim 41, wherein the singularity-free mathematical representation of the joint rotation comprises a quaternion representation.

43. The computer-readable medium of claim 42, wherein the method further comprises: receiving the plurality of example gestures for which the joint rotation has a representation of Euler angles; and converting the euler angle representation to the quaternion representation.

44. The computer-readable medium of claim 41, wherein the example gestures are clustered into the one or more clusters based on a metric for determining similarity between each example gesture and each cluster.

45. The computer-readable medium of claim 41, wherein clustering the example gestures comprises: each of the example gestures is mapped to a point in a multi-dimensional space.

46. The computer-readable medium of claim 45, wherein clustering the example gestures further comprises: determining a centroid of each cluster; determining a distance between the point of each example gesture and a centroid of each cluster; and assigning each example gesture to the nearest cluster.

47. The computer-readable medium of claim 46, wherein the method further comprises: the centroid of each cluster is determined iteratively, and each example gesture is assigned to the closest cluster.

48. The computer-readable medium of claim 46, wherein the representative gesture for each cluster comprises one of the example gestures assigned to the cluster or an example gesture associated with a centroid of the cluster.

49. The computer-readable medium of claim 41, wherein the method further comprises: the number of clusters is determined.

50. The computer-readable medium of claim 49, wherein determining a number of clusters comprises: clustering, for each of a plurality of different candidate cluster quantities, the plurality of example gestures into the one or more clusters; calculating an error metric associated with each candidate cluster quantity; and selecting one of the candidate cluster numbers based on the error metric associated with the candidate cluster number.

51. The computer-readable medium according to claim 50, wherein the error metric includes, for all of the example gestures in the input gesture space, a sum of squared distances between a point corresponding to each example gesture and a centroid of its assigned cluster.

52. The computer-readable medium of claim 50, wherein selecting one of the candidate cluster numbers comprises: it is determined whether its error metric meets a selected criterion.

53. The computer-readable medium according to claim 52, wherein the criterion is that a rate of change of the measure of error exceeds a specified threshold.

54. The computer-readable medium of claim 41, wherein the method further comprises: training a gesture space deformer using the output gesture space.

55. The computer-readable medium of claim 54, wherein the method further comprises: computing a mesh deformation of the virtual character using the pose space deformer.

56. The computer-readable medium of claim 54, wherein the method further comprises: controlling a plurality of hybrid shapes in an output deformation matrix using the gesture space deformer.

57. The computer-readable medium of claim 56, wherein the output deformation matrix is generated by reducing a dimension of an input deformation matrix using principal component analysis.

58. The computer-readable medium of claim 57, wherein reducing the dimension of the input deformation matrix comprises: determining a principal component of the input deformation matrix; omitting one or more of the principal components to leave one or more remaining principal components; generating the output deformation matrix using one or more blending shapes associated with the one or more remaining principal components.

59. The computer-readable medium of claim 41, wherein the output gesture space is at least 30% smaller than the input gesture space.

60. The computer readable medium of claim 41, wherein the method is performed by a virtual reality display system, an augmented reality display system, or a mixed reality display system.

Other considerations

Each of the processes, methods, and algorithms described herein and/or depicted in the figures may be embodied in, and executed in whole or in part automatically by, code modules executed by one or more physical computing systems, hardware computer processors, dedicated circuitry, and/or electronic hardware configured to execute dedicated and specific computer instructions. For example, a computing system may comprise a general purpose computer (e.g., a server) programmed with specific computer instructions, or special purpose computers, special purpose circuits, or the like. Code modules may be compiled and linked into executable programs, installed in dynamically linked libraries, or written in interpreted programming languages. In some implementations, the specific operations and methods may be performed by circuitry that is specific to a given function.

Furthermore, certain implementations of the functionality of the present disclosure are sufficiently complex mathematically, computationally, or technically that dedicated hardware or one or more physical computing devices (with appropriate dedicated executable instructions) may be necessary to perform the functionality or provide the result in substantially real time, e.g., due to the amount or complexity of the computations involved. For example, an animation or video may include many frames, each having millions of pixels, and requiring specially programmed computer hardware to process the video data to provide a desired image processing task or application in a commercially reasonable amount of time.

The code modules or any type of data may be stored on any type of non-transitory computer readable medium, such as a physical computer storage device including a hard disk drive, solid state memory, Random Access Memory (RAM), Read Only Memory (ROM), optical disks, volatile or non-volatile memory devices, combinations thereof, or the like. The methods and modules (or data) may also be transmitted as generated data signals (e.g., as part of a carrier wave or other analog or digital propagated signal) on a variety of computer-readable transmission media, including wireless-based and wired/cable-based media, and may take a variety of forms (e.g., as part of a single or multiplexed analog signal, or as multiple discrete digital packets or frames). The results of the disclosed processes or process steps may be stored persistently or otherwise in any type of non-transitory tangible computer storage device, or may be transmitted via a computer-readable transmission medium.

Any process, block, state, step, or function in the flowcharts described herein and/or depicted in the figures should be understood as potentially representing a module, segment, or portion of code, which includes one or more executable instructions for implementing the specified function (e.g., logical or arithmetic) or step in the process. Various processes, blocks, states, steps or functions may be combined, rearranged, added to, deleted, modified or otherwise altered with the illustrative examples provided herein. In some embodiments, additional or different computing systems or code modules may perform some or all of the functions described herein. The methods and processes described herein are also not limited to any particular sequence, and the blocks, steps, or states associated therewith may be performed in any other suitable order, e.g., serially, in parallel, or in some other manner. Tasks or events can be added to, or removed from, the disclosed example embodiments. Moreover, the separation of various system components in the implementations described herein is for illustrative purposes, and should not be understood as requiring such separation in all implementations. It should be understood that the described program components, methods and systems can generally be integrated together in a single computer product or packaged into multiple computer products. Many implementation variations are possible.

The processes, methods, and systems may be implemented in a networked (or distributed) computing environment. Network environments include enterprise-wide computer networks, intranets, Local Area Networks (LANs), Wide Area Networks (WANs), Personal Area Networks (PANs), cloud computing networks, crowd-sourced computing networks, the internet, and the world wide web. The network may be a wired or wireless network or any other type of communication network.

The systems and methods of the present disclosure each have several innovative aspects, none of which is solely responsible for or required by the desirable attributes disclosed herein. The various features and processes described above may be used independently of one another or may be combined in various ways. All possible combinations and sub-combinations are intended to fall within the scope of the present disclosure. Various modifications to the implementations described in this disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other implementations without departing from the spirit or scope of the disclosure. Thus, the claims are not intended to be limited to the implementations shown herein but are to be accorded the widest scope consistent with the disclosure, principles and novel features disclosed herein.

Certain features that are described in this specification in the context of separate implementations can also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable subcombination. Furthermore, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination. No single feature or group of features is essential or critical to each and every embodiment.

Unless stated otherwise, or otherwise understood in the context of usage, conditional language such as "can," "could," "might," "may," "for example," and the like, as used herein, is generally intended to convey that certain embodiments include certain features, elements, and/or steps, while other embodiments do not. Thus, such conditional language is not generally intended to imply that features, elements, and/or steps are in any way required for one or more embodiments or that one or more embodiments necessarily include logic for deciding, with or without author input or prompting, whether these features, elements, and/or steps are included or are to be performed in any particular embodiment. The terms "comprising," "including," "having," and the like are synonymous and are used inclusively, in an open-ended fashion, and do not exclude additional elements, features, acts, operations, and the like. Furthermore, the term "or" is used in its inclusive sense (and not in its exclusive sense), so that when used, for example, to connect a list of elements, the term "or" means one, some, or all of the elements in the list. In addition, the articles "a," "an," and "the" as used in this application and the appended claims should be construed to mean "one or more" or "at least one" unless specified otherwise.

As used herein, a phrase referring to "at least one of" a list of items refers to any combination of those items, including single members. For example, "at least one of A, B, or C" is intended to cover: A; B; C; A and B; A and C; B and C; and A, B, and C. Conjunctive language such as the phrase "at least one of X, Y, and Z," unless specifically stated otherwise, is otherwise understood in the context as commonly used to convey that an item, term, etc., may be at least one of X, Y, or Z. Thus, such conjunctive language is not generally intended to imply that certain embodiments require at least one of X, at least one of Y, and at least one of Z to each be present.

Similarly, while operations may be depicted in the drawings in a particular order, it will be appreciated that these operations need not be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Further, the figures may schematically depict one or more example processes in the form of a flow diagram. However, other operations not shown may be included in the example methods and processes that are schematically illustrated. For example, one or more additional operations may be performed before, after, concurrently with, or between any of the illustrated operations. In addition, operations may be rearranged or reordered in other implementations. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the implementations described above should not be understood as requiring such separation in all implementations, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products. Additionally, other implementations are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results.

Drawings

The details of one or more implementations of the subject matter described in the specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages will become apparent from the description, the drawings, and the claims.

FIG. 1 depicts an illustration of a mixed reality scene with certain virtual reality objects and certain physical objects viewed by a person.

Fig. 2 schematically shows an example of a wearable system.

Fig. 3 schematically illustrates example components of a wearable system.

Fig. 4 schematically shows an example of a waveguide stack of a wearable device for outputting image information to a user.

FIG. 5 is a process flow diagram of an example of a method for interacting with a virtual user interface.

Fig. 6A is a block diagram of another example of a wearable system that may include an avatar processing and rendering system.

FIG. 6B illustrates example components of an avatar processing and rendering system.

Fig. 7 is a block diagram of an example of a wearable system including various inputs into the wearable system.

FIG. 8 is a process flow diagram of an example of a method of rendering virtual content with respect to an identified object.

Fig. 9A schematically illustrates an overall system view depicting multiple wearable systems interacting with each other.

FIG. 9B illustrates an example remote presentation session.

Fig. 10 shows an example of an avatar perceived by a user of a wearable system.

FIG. 11 illustrates several arm gestures of a virtual character, in this case, the virtual character is a human avatar.

Fig. 12A illustrates a method for reducing the dimensionality of the input gesture space of a gesture space deformer using k-means clustering.

FIG. 12B illustrates an example set of clusters formed using the method of FIG. 12A.

FIG. 13 is a graph illustrating an example technique for selecting a number of clusters for reducing a dimension of an input gesture space.

FIG. 14 shows an example set of plotted points of an input deformation matrix having N hybrid shapes.

Fig. 15 is an example graph of the change ratio occupied by each of the 30 principal components of the input deformation matrix composed of 30 mixed shapes.

FIG. 16 is a flow diagram depicting an example method for reducing the dimensionality of a deformation matrix using Principal Component Analysis (PCA).

Throughout the drawings, reference numerals may be repeated to indicate correspondence between reference elements. These drawings are provided to illustrate example embodiments described herein and are not intended to limit the scope of the present disclosure.
