System and method for gesture-based interaction

Document No.: 1785360    Publication date: 2019-12-06

Reading note: This technology, "System and method for gesture-based interaction," was designed and created by 拉尔夫·布伦纳, 安德鲁·埃米特·塞利格曼, 简欣怡, 约翰·菲利普·斯托达德, 陈柏瑞, and 埃隆·沙 on 2017-04-13. Its main content includes: Various disclosed embodiments present a depth-based user interaction system that facilitates natural and immersive user interaction. In particular, various embodiments integrate immersive visual presentations with natural and fluent gesture actions. This integration facilitates faster user adoption and more accurate user interaction. Some embodiments may utilize the particular form factors disclosed herein to accommodate user interaction. For example, a dual depth sensor arrangement in the housing atop the display of the interface may facilitate a depth field of view that accommodates gesture recognition that is more natural than otherwise possible. In some embodiments, these gestures may be organized into a framework for user-generic control of the interface as well as user-application-specific control of the interface.

1. A user interface, comprising:

one or more displays;

one or more depth sensor housing frames including one or more depth sensors, the depth sensor housing frames being above the one or more displays;

at least one processor; and

at least one memory including instructions configured to, with the at least one processor, cause the user interface to perform a method comprising:

acquiring a frame of depth data using at least one of the one or more depth sensors;

associating at least a portion of the depth frame with a classification corresponding to a first portion of a user's body;

determining a gesture based at least in part on the classification; and

notifying an application running on the user interface of the determined gesture.

2. The user interface of claim 1, wherein at least one of the one or more depth sensor housing frames comprises:

a back plate;

a top plate;

a bottom plate;

a first allowed frequency panel configured to substantially allow passage of radiation associated with a depth sensor, but substantially not allow passage of visual frequency radiation;

a second allowed frequency panel, the second allowed frequency panel configured to substantially allow passage of radiation associated with the depth sensor, but substantially not allow passage of visual frequency radiation;

one or more depth sensor mounts; and

one or more spacers located within an enclosure formed by the back plate, top plate, bottom plate, first allowed frequency panel, and second allowed frequency panel, wherein,

each of the one or more depth sensor mounts is located between at least two of the one or more spacers, and wherein,

each depth sensor mount of the one or more depth sensor mounts is configured to receive two depth sensors such that a field of view associated with each depth sensor passes through at least one of the two allowed frequency panels.

3. The user interface of claim 2, wherein the one or more displays comprise a 3 x 3 grid of substantially adjacent displays.

4. The user interface of claim 3, wherein the one or more depth sensor housing frames comprise:

a first depth sensor housing frame positioned above an upper left display of the 3 x 3 grid;

a second depth sensor housing frame positioned above an upper middle display of the 3 x 3 grid; and

a third depth sensor housing frame positioned above an upper right display of the 3 x 3 grid.

5. The user interface of claim 1, wherein the method further comprises:

detecting a position of at least a portion of the user relative to the user interface; and

adjusting a vanishing point of images presented on the one or more displays to correspond with the position.

6. The user interface of claim 1, wherein the method further comprises:

detecting a first distance between the user and the one or more displays;

determining that the first distance is greater than a threshold distance from the one or more displays;

displaying, from a first perspective, a virtual environment comprising one or more rooms based at least in part on a determination that the first distance is greater than the threshold distance;

detecting a second distance between the user and the one or more displays;

determining that the second distance is less than the threshold distance from the one or more displays; and

displaying the virtual environment from a second perspective based at least in part on a determination that the second distance is less than the threshold distance, wherein

the first perspective displays the virtual environment from a virtual location that is further from the one or more rooms than the second perspective.

7. The user interface of claim 6, wherein the method further comprises:

detecting a flick gesture while presenting the virtual environment from the second perspective; and

activating an application associated with a room displayed closest to the user on the one or more displays.

8. A computer-implemented method running on a user interface, the method comprising:

acquiring a frame of depth data using at least one depth sensor of one or more depth sensors located on the user interface;

associating at least a portion of the depth frame with a classification corresponding to a first portion of a user's body;

determining a gesture based at least in part on the classification; and

notifying an application running on the user interface of the determined gesture.

9. The computer-implemented method of claim 8, wherein the user interface comprises:

one or more depth sensor housing frames including the one or more depth sensors, the depth sensor housing frames being above a 3 x 3 grid of substantially adjacent displays.

10. The computer-implemented method of claim 8, wherein at least one of the one or more depth sensor housing frames comprises:

a back plate;

a top plate;

a bottom plate;

a first allowed frequency panel configured to substantially allow passage of radiation associated with a depth sensor, but substantially not allow passage of visual frequency radiation;

a second allowed frequency panel, the second allowed frequency panel configured to substantially allow passage of radiation associated with the depth sensor, but substantially not allow passage of visual frequency radiation;

one or more depth sensor mounts; and

one or more spacers located within an enclosure formed by the back plate, top plate, bottom plate, first allowed frequency panel, and second allowed frequency panel, wherein,

each of the one or more depth sensor mounts is located between at least two of the one or more spacers, and wherein,

each of the one or more depth sensor mounts is configured to receive two depth sensors such that a field of view associated with each depth sensor passes through at least one of the two allowed frequency panels.

11. The computer-implemented method of claim 10, wherein the one or more depth sensor housing frames comprise:

a first depth sensor housing frame positioned above an upper left display of the 3 x 3 grid;

a second depth sensor housing frame positioned above an upper middle display of the 3 x 3 grid; and

a third depth sensor housing frame positioned above an upper right display of the 3 x 3 grid.

12. The computer-implemented method of claim 9, wherein the method further comprises:

detecting a position of at least a portion of the user relative to the user interface; and

adjusting a vanishing point of images presented on the one or more displays to correspond with the position.

13. The computer-implemented method of claim 12, wherein the method further comprises:

detecting a first distance between the user and the one or more displays;

determining that the first distance is greater than a threshold distance from the one or more displays;

displaying, from a first perspective, a virtual environment comprising one or more rooms based at least in part on a determination that the first distance is greater than the threshold distance;

detecting a second distance between the user and the one or more displays;

determining that the second distance is less than the threshold distance from the one or more displays; and

displaying the virtual environment from a second perspective based at least in part on a determination that the second distance is less than the threshold distance, wherein

the first perspective displays the virtual environment from a virtual location that is further from the one or more rooms than the second perspective.

14. the computer-implemented method of claim 13, wherein the method further comprises:

detecting a flick gesture while presenting the virtual environment from the second perspective; and

activating an application associated with a room displayed closest to the user on the one or more displays.

15. A non-transitory computer-readable medium comprising instructions configured to cause a user interface to perform a method, the method comprising:

acquiring a frame of depth data using at least one depth sensor of one or more depth sensors located on the user interface;

associating at least a portion of the depth frame with a classification corresponding to a first portion of a user's body;

determining a gesture based at least in part on the classification; and

notifying an application running on the user interface of the determined gesture.

16. The non-transitory computer-readable medium of claim 15, wherein the user interface comprises:

one or more depth sensor housing frames including the one or more depth sensors, the depth sensor housing frames being above a 3 x 3 grid of substantially adjacent displays.

17. The non-transitory computer-readable medium of claim 16, wherein at least one of the one or more depth sensor housing frames comprises:

a back plate;

a top plate;

a bottom plate;

a first allowed frequency panel configured to substantially allow passage of radiation associated with a depth sensor, but substantially not allow passage of visual frequency radiation;

a second allowed frequency panel, the second allowed frequency panel configured to substantially allow passage of radiation associated with the depth sensor, but substantially not allow passage of visual frequency radiation;

one or more depth sensor mounts; and

one or more spacers located within an enclosure formed by the back plate, top plate, bottom plate, first allowed frequency panel, and second allowed frequency panel, wherein,

each of the one or more depth sensor mounts is located between at least two of the one or more spacers, and wherein,

each of the one or more depth sensor mounts is configured to receive two depth sensors such that a field of view associated with each depth sensor passes through at least one of the two allowed frequency panels.

18. The non-transitory computer-readable medium of claim 16, wherein the one or more depth sensor housing frames comprise:

a first depth sensor housing frame positioned above an upper left display of the 3 x 3 grid;

a second depth sensor housing frame positioned above an upper middle display of the 3 x 3 grid; and

a third depth sensor housing frame positioned above an upper right display of the 3 x 3 grid.

19. The non-transitory computer-readable medium of claim 16, wherein the method further comprises:

detecting a position of at least a portion of the user relative to the user interface; and

adjusting a vanishing point of an image presented on the display to correspond to the position.

20. The non-transitory computer-readable medium of claim 16, wherein the method further comprises:

detecting a first distance between the user and the display;

determining that the first distance is greater than a threshold distance from the display;

displaying, from a first perspective, a virtual environment comprising one or more rooms based at least in part on a determination that the first distance is greater than the threshold distance;

detecting a second distance between the user and the display;

determining that the second distance is less than the threshold distance from the display; and

displaying the virtual environment from a second perspective based at least in part on a determination that the second distance is less than the threshold distance, wherein

the first perspective displays the virtual environment from a virtual location that is further from the one or more rooms than the second perspective.

Technical Field

Various disclosed embodiments relate to the optimization and improvement of depth-based human-computer interaction.

Background

Human-computer interaction (HCI) systems are becoming more and more prevalent in our society. With this increasing prevalence, the nature of this interaction has evolved. Punch cards have been superseded by keyboards, keyboards have been supplemented by mice, and mice have in turn been supplemented by touch-screen displays, and so on. Even now, various machine vision methods may facilitate visual, rather than mechanical, user feedback. Machine vision allows computers to interpret images from their environment, for example, to recognize a user's face and gestures. Some machine vision systems rely on grayscale or RGB images of their surroundings to infer user behavior. Some machine vision systems may also use depth-based sensors, or rely solely on depth-based sensors, to identify user behavior (e.g., Microsoft Kinect™, Intel RealSense™, Apple PrimeSense™, Structure Sensor, Velodyne HDL-32E LiDAR™, Orbbec Astra™, etc.).

For many users, interaction with a depth-based user interface system appears unnatural. This discomfort can be particularly acute when the system fails to provide an immersive visual experience associated with natural and fluid gesture movements. Accordingly, there is a need for an improved depth-based interface that accommodates both user expectations and typical user motion. Such systems may also need to serve as a general-purpose platform from which developers can implement their own custom applications.

Drawings

The various embodiments described herein may be better understood by reference to the following detailed description when considered in connection with the accompanying drawings in which like reference numerals indicate identical or functionally similar elements:

FIG. 1 is a series of usage diagrams illustrating various scenarios in which various disclosed embodiments may be implemented;

FIG. 2 is a perspective illustration diagram illustrating example user interactions with an example display structure that may occur in some embodiments;

FIG. 3 is a series of perspective and side views of example depth data that may be used in some embodiments;

FIG. 4 is a series of views illustrating data isolation via clipping that may be applied to the depth data of FIG. 3 in some embodiments;

FIG. 5 is an example component classification that may be applied to the isolated data of FIG. 4 in some embodiments;

FIG. 6 is a flow diagram illustrating some example depth data processing operations that may be performed in some embodiments;

FIG. 7 is a hardware block diagram illustrating an example hardware implementation that may be used to perform depth data processing operations in some embodiments;

FIG. 8 is a schematic diagram of an example wide screen display with a multi-angle depth sensor housing that may be implemented in some embodiments;

FIG. 9 is a schematic diagram of an example projection display with a multi-angle depth sensor housing that can be implemented in some embodiments;

FIG. 10 is a schematic diagram of an example composite display with a multi-angle depth sensor housing that may be implemented in some embodiments;

FIG. 11 is a schematic diagram of a composite display with the multi-angle depth sensor housing of FIG. 10, including a turnaround of a modular component in the system, which may be implemented in some embodiments;

FIG. 12A is a schematic bottom view of a composite display having the multi-angle depth sensor housing of FIG. 10 that may be implemented in some embodiments; FIG. 12B is a schematic top view of a composite display having the multi-angle depth sensor housing of FIG. 10 that may be implemented in some embodiments; FIG. 12C is a side schematic view of a composite display having the multi-angle depth sensor housing of FIG. 10 that may be implemented in some embodiments;

FIG. 13A is an exploded schematic view of components in a frame of a multi-angle depth sensor housing that may be implemented in some embodiments; FIG. 13B is an assembled schematic of components in a frame of a multi-angle depth sensor housing that may be implemented in some embodiments;

FIG. 14A is a view from two perspectives of the spacer assembly of FIG. 13A that can be implemented in some embodiments; FIG. 14B is a view from two perspectives of a mirror-image spacer assembly of FIG. 13A that can be implemented in some embodiments;

FIG. 15 is a view from two perspectives of a cradle assembly of FIG. 13A that can be implemented in some embodiments;

FIG. 16 is an exploded schematic view of components in a frame of a multi-angle depth sensor housing that may be implemented in some embodiments, including bracket spacers for sensor pair reception;

FIG. 17 is a schematic diagram of a possible sensor placement configuration in a multi-angle depth sensor housing of an example user interface that may be implemented in some embodiments;

FIG. 18 is a number of schematic views of an alternative bracket mounting assembly that may be used in some embodiments;

FIG. 19A is a perspective schematic view of a portion of the alternative bracket mounting assembly of FIG. 18 in an exploded, unassembled state, as may be implemented in some embodiments; FIG. 19B is a perspective schematic view of a portion of the alternative bracket mounting assembly of FIG. 18, with a bracket support coupled with a sensor mount, that can be implemented in some embodiments; FIG. 19C is a perspective schematic view of a portion of the alternative bracket mounting assembly of FIG. 18, with a bracket support and a sensor mount coupled with an example depth sensor, that can be implemented in some embodiments;

FIG. 20A is a "perspective" view of a housing frame of a multi-angle depth sensor housing that includes a depth sensor attached via a "stand-alone mount" rather than a bracket, which may be implemented in some embodiments; FIG. 20B is a schematic view of a horizontal sensor mount that can be implemented in some embodiments; FIG. 20C is a schematic view of a vertical sensor mount that can be implemented in some embodiments; FIG. 20D is a schematic illustration of a varying depth zone field of view as may occur in some embodiments using the sensor mount of FIG. 20B or 20C;

FIG. 21A is a schematic diagram of an example multi-angle standalone depth sensor mount with transparent depth sensor representations 2110a, 2110b in their relative positions that may be implemented in some embodiments; FIG. 21B is a schematic view of a multi-angle depth sensor mount without a depth sensor that may be implemented in some embodiments;

FIG. 22 is a schematic side view of various dimensions of an example multi-angle independent depth sensor mount that may be implemented in some embodiments;

FIG. 23 is an exploded schematic view of components in a frame of a multi-angle depth sensor housing including only independently mounted sensor pairs, and a schematic top cross-sectional view of an assembled structure, as may be achieved in some embodiments;

FIG. 24 is an exploded schematic view of components in a frame of a multi-angle depth sensor housing including independently mounted and bracket mounted sensor pairs, and a schematic top cross-sectional view of an assembled structure, as may be implemented in some embodiments;

FIG. 25A is a side view in various dimensions of an example interactive system with a multi-angle depth sensor housing that may be implemented in some embodiments; FIG. 25B is a schematic side view for a combined viewing angle of the system of FIG. 25A;

FIG. 26 is a "perspective" view of a frame for a multi-angle depth sensor housing in which both a depth sensor and a visual image sensor are mounted, as may be implemented in some embodiments;

FIG. 27A is a schematic illustration of a user "trigger point" gesture that may occur in some embodiments; FIG. 27B is a schematic diagram of a correspondence that may be used in some embodiments to identify a user "trigger point" gesture;

FIG. 28A is a schematic diagram of a user "nudge" gesture that may occur in some embodiments; FIG. 28B is a schematic diagram of a correspondence that may be used in some embodiments to identify a user "nudge" gesture;

FIG. 29A is a series of schematic front and side views of a step in a "reveal" gesture that may occur in some embodiments; FIG. 29B is a schematic diagram of an elevation and plan view that may be used in some embodiments to detect various correspondences of the "reveal" gesture of FIG. 29A;

FIG. 30A is a series of schematic front, side, and top views of a step in a "swipe" gesture that may occur in some embodiments; FIG. 30B is an elevational and overhead schematic view that may be used in some embodiments to detect various correspondences in the "swipe" gesture of FIG. 30A;

FIG. 31A is a series of schematic front and side views of a step in a "circle" gesture that may occur in some embodiments; FIG. 31B is a composite front view of hand orientations associated with a correspondence that may be used to detect the "circle" gesture of FIG. 31A in some embodiments; FIG. 31C is a composite front view that may be used in some embodiments to detect a correspondence of the "circle" gesture of FIG. 31A;

FIG. 32A is a series of schematic front and side views of a step in a "crouch" gesture that may occur in some embodiments; FIG. 32B is a front and top schematic view that may be used in some embodiments to detect various correspondences of the "crouch" gesture of FIG. 32A;

FIG. 33 is a flow diagram illustrating aspects of a gesture detection process that may be implemented in some embodiments;

FIG. 34 is a flow diagram illustrating aspects of a gesture template fulfillment determination process that may be implemented in some embodiments;

FIG. 35A is an example tree diagram illustrating a correspondence between multiple gestures that may occur in some embodiments; FIG. 35B is an example tree diagram illustrating a correspondence between multiple gestures (including sub-gestures) that may occur in some embodiments;

FIG. 36 is a Venn diagram illustrating various gesture set relationships that may occur in some embodiments;

FIG. 37A is a schematic illustration of a user at a first position in front of an interface system including different interaction zones that may be implemented in some embodiments; FIG. 37B is a schematic illustration of a user at a second position in front of an interface system including different interaction zones that may be implemented in some embodiments;

FIG. 38A is a schematic view of a user in a centered position in front of an interface system running a dynamic vanishing point selection menu and a corresponding user view, as may be implemented in some embodiments; FIG. 38B is a schematic view of a user in a left position in front of an interface system running a dynamic vanishing point selection menu and a corresponding user view, as may be implemented in some embodiments; FIG. 38C is a schematic view of a user in a right position in front of an interface system running a dynamic vanishing point selection menu and a corresponding user view, as may be implemented in some embodiments;

FIG. 39A is a schematic diagram and resulting display of a user in a first position prior to engaging a contextual focus feature at the center of a user interface as may be implemented in some embodiments; FIG. 39B is a schematic view of a user at a second location after engaging a contextual focus feature at the center of a user interface and resulting display changes as may be implemented in some embodiments;

FIGS. 40A and 40B are schematic diagrams, before and after engagement, of a contextual focus feature of the user interface of FIG. 39 and the resulting display changes that may be implemented in some embodiments, where the feature is engaged not at the center of the user interface but instead toward its sides;

FIG. 41 is a schematic diagram of a user interface system running an example "trigger point" based shooting application that may be implemented in some embodiments;

FIG. 42 is a schematic diagram of a user interface system running an example calligraphy training application that may be implemented in some embodiments;

FIG. 43 is a series of schematic diagrams of a user interface system running an example obstacle course application that may be implemented in some embodiments; and

FIG. 44 is a block diagram of an example computer system that may be used in conjunction with some embodiments.

The specific examples depicted in the drawings have been selected to facilitate understanding. Therefore, the disclosed embodiments should not be limited to the specific details in the drawings or the corresponding disclosure. For example, the figures may not be drawn to scale, the dimensions of some of the elements in the figures may have been adjusted to facilitate understanding, and the operations of embodiments associated with the flow diagrams may include more, alternative, or fewer operations than those depicted herein. Thus, some components and/or operations may be separated into different blocks or combined into a single block in a manner different than that depicted. There is no intention to limit the embodiments to the specific examples described or depicted. On the contrary, the embodiments are intended to cover all modifications, equivalents, and alternatives falling within the scope of the disclosed examples.

Detailed Description

Example Use Case Overview

The various disclosed embodiments may be used in conjunction with installed or fixed depth camera systems to detect, for example, user gestures. FIG. 1 is a series of usage diagrams illustrating various scenarios 100a-100c in which various disclosed embodiments may be implemented. In scenario 100a, a user 105 is standing in front of a kiosk 125, which may include a graphical display 125a. Rather than requiring the user to physically touch an item of interest on the display 125a, the system may allow the user to "point" or "gesture" at the item and thereby interact with the kiosk 125.

The depth sensor 115a may be mounted on, connected to, or placed near the kiosk 125 such that a depth field of capture 120a (also referred to herein as a "field of view") of the depth sensor 115a encompasses gestures 110 made by the user 105. Thus, when a user points at, for example, an icon on the display 125a by making a gesture within the depth data capture field 120a, the depth sensor 115a may provide depth values to a processing system, which may infer the selected icon or an operation to be performed. The processing system may be configured to perform various operations disclosed herein, and may be specifically configured or designed to interface with the depth sensor (indeed, it may be embedded in the depth sensor). Accordingly, the processing system may include hardware, firmware, software, or a combination of these components. The processing system may be located within the depth sensor 115a, within the kiosk 125, at a remote location, etc., or may be distributed across various locations. The application running on the kiosk 125 may simply receive an indication of the selected icon and need not be specifically designed to distinguish selection via physical touch from depth-based selection. Thus, in some embodiments, the depth sensor 115a and the processing system may be separate products or devices from the kiosk 125.

In scenario 100b, the user 105 is standing in a home environment, which may include one or more depth sensors 115b, 115c, and 115d, each having its own corresponding depth capture field 120b, 120c, and 120d, respectively. The depth sensor 115b may be located on or near a television or other display 130. The depth sensor 115b may be used to capture gesture input from the user 105 and forward the depth data to an application running on or with the display 130. For example, a gaming system, computer conferencing system, or the like may operate using the display 130 and may be responsive to gesture input by the user 105. In contrast, the depth sensor 115c may passively observe the user 105 as part of a separate gesture or behavior detection application. For example, a home automation system may respond to gestures made by the user 105 alone or in conjunction with various voice commands. In some embodiments, the depth sensors 115b and 115c may share their depth data with a single application to facilitate observing the user 105 from multiple perspectives. Obstacles and non-user dynamic and static objects, such as the sofa 135, may be present in the environment and may or may not be included in the depth capture fields 120b, 120c.

Note that while a depth sensor may be placed in a location visible to the user 105 (e.g., attached on top of or mounted to the side of a television, kiosk, etc., as depicted with the sensors 115a-115c), a depth sensor may instead be integrated within another object. Such an integrated sensor may be able to collect depth data without being readily visible to the user 105. For example, the depth sensor 115d may be integrated into the television 130 behind a one-way mirror and used to collect data in place of the sensor 115b. The one-way mirror may allow the depth sensor 115d to collect data without the user 105 being aware that data is being collected. This may allow users to be less self-conscious in their movements and to behave more naturally during the interaction.

While it is possible to position the depth sensors 115a-115d parallel to a wall, or to have a depth field directed orthogonally to the normal vector from the floor, this is not always the case. In practice, the depth sensors 115a-115d may be positioned at various angles, some of which place the depth data capture fields 120a-120d at angles oblique to the floor and/or walls. For example, the depth sensor 115c may be positioned near the ceiling and directed to look down at the user 105 on the floor.

In some cases, this relationship between the depth sensor and the floor may be extreme and dynamic. For example, in scenario 100c, the depth sensor 115e is located at the rear of a truck 140. The truck may be parked in front of an inclined platform 150 for loading and unloading. The depth sensor 115e may be used to infer user gestures to direct the operation of the truck (e.g., move forward, backward) or to perform other operations (e.g., initiate a phone call). As the truck 140 periodically enters new environments, new obstacles and objects 145a, 145b may periodically enter the depth capture field 120e of the depth sensor 115e. In addition, the inclined platform 150 and irregularly elevated terrain may often place the depth sensor 115e and the corresponding depth capture field 120e at an oblique angle relative to the "floor" on which the user 105 stands. Such variations can complicate assumptions made about the depth data in a static and/or controlled environment (e.g., assumptions made about the location of the floor).

Various disclosed embodiments contemplate user interaction with a feedback system that includes two or more depth sensors. In some embodiments, the depth sensor device may also include a visual image sensor, such as an RGB sensor. For example, FIG. 2 is a perspective illustration diagram illustrating an example user interaction 200 with an example display structure 205 that may occur in some embodiments. The display structure 205 may be placed in a mall, shopping center, grocery store, front desk area, or the like. In some embodiments, the height 220a is at least as tall as, or slightly taller than, the user 210, e.g., 7-10 feet. The length 220b may be several times the width of the user 210, for example, to facilitate interaction as the user 210 walks along the length of the display structure 205.

The example display structure 205 includes a screen 230. The screen 230 may comprise a single large screen, multiple smaller screens positioned adjacent to one another, a projection, etc. In one example interaction, the user may make a gesture 215 at a portion of the screen, and the system may present visual feedback, such as a cursor 230b, at a location on the screen corresponding to the projection 225 of the gesture. The display structure 205 may monitor the movement and gestures of the user 210 using one or more of the depth sensors C1, C2, ..., CN. In the example depicted in FIG. 2, there are at least three cameras. Ellipses 245 indicate that there may be more than three cameras in some embodiments, and the length 220b of the display structure 205 may be adjusted accordingly. In this example, the sensors are evenly spaced on top of the display structure 205, but in some embodiments they may be unevenly spaced.
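
By way of illustration only, the following Python sketch shows one way a pointing gesture might be mapped to an on-screen cursor location such as 230b, by intersecting a pointing ray with the display plane. The coordinate conventions, display dimensions, and function names are assumptions introduced here for this example and do not appear in the disclosure.

"""Illustrative sketch (not the disclosed algorithm): project a pointing ray
onto a display plane to position a cursor such as 230b. Assumptions: the
display lies in the z = 0 plane, lower-left corner at the origin, metres."""
import numpy as np

DISPLAY_W_M, DISPLAY_H_M = 3.65, 2.05          # physical size (m), example values
DISPLAY_W_PX, DISPLAY_H_PX = 5760, 3240        # composite resolution, example values

def cursor_from_pointing(hand_pos, point_dir):
    """Intersect the ray hand_pos + t*point_dir with the display plane (z = 0)."""
    hand_pos = np.asarray(hand_pos, dtype=float)
    point_dir = np.asarray(point_dir, dtype=float)
    if abs(point_dir[2]) < 1e-6:
        return None                             # ray parallel to the display
    t = -hand_pos[2] / point_dir[2]
    if t <= 0:
        return None                             # user pointing away from the display
    hit = hand_pos + t * point_dir              # intersection point in metres
    px = hit[0] / DISPLAY_W_M * DISPLAY_W_PX    # pixel x, origin at top-left
    py = (1.0 - hit[1] / DISPLAY_H_M) * DISPLAY_H_PX
    if 0 <= px < DISPLAY_W_PX and 0 <= py < DISPLAY_H_PX:
        return int(px), int(py)
    return None                                 # pointing off-screen

if __name__ == "__main__":
    # User standing 2 m from the screen, pointing slightly up and to the right.
    print(cursor_from_pointing(hand_pos=[1.2, 1.3, 2.0],
                               point_dir=[0.1, 0.05, -1.0]))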

Although the terms "camera" and "sensor" may be used interchangeably in this application, it will be appreciated that a depth sensor need not capture or facilitate "camera capture" of optical images, such as RGB or grayscale images, although the depth sensor may additionally include this functionality. In some embodiments, the computer system 250 may take various forms, such as a pre-programmed chip, a circuit, a field programmable gate array (FPGA), a mini-computer, and so forth. It will be appreciated that "computer system," "processing system," and the like may be used interchangeably herein. Similarly, one will readily appreciate that the training system employed to create a system for recognizing gestures may be, but need not be, the same system as the testing system that performs field recognition. Thus, in some embodiments, a "system" may be a computer distinct from the interfaces of FIGS. 1 and 2, residing, for example, off-site from where in-situ classification occurs.

Example Depth Data

Similar to a normal optical image camera, the depth sensors 115a-115e, C1, C2, ..., CN may capture individual "frames" of depth data over time. Each "frame" may comprise a set of three-dimensional values for depths measured in the field of view (although one will readily recognize a variety of representations, such as time-of-flight analysis for depth determination). These three-dimensional values may be represented, for example, as points in three-dimensional space, as distances of light rays emitted from the depth sensor at various angles, and so forth. FIG. 3 is a series of perspective views 300a and side views 300b of example depth data 305 that may be used in some embodiments. In this example, the user points his right finger towards the depth sensor while standing in front of a wall. A table to his left is also captured in the field of view. Thus, the depth values associated with the user 310 include a portion associated with the user's head 310a and a portion associated with the user's extended right arm 310b. Similarly, the background behind the user is reflected in the depth values 320, including those values 315 associated with the table.

For ease of understanding, the side view 300b also includes a depiction of the field of view 335 of the depth sensor at the time of frame capture. The angle 330 of the depth sensor at the origin is such that the user's upper torso, but not the user's legs, has been captured in the frame. Again, this example is provided only to facilitate the reader's understanding, and the reader will understand that some embodiments may capture the entire field of view without omitting any part of the user. For example, the embodiments depicted in the scenarios of FIG. 1 may capture less than all of an interacting user, while the embodiment of FIG. 2 may capture the entire interacting user (in some embodiments, everything more than 8 cm from the ground appears in the depth field of view). Of course, what is captured depends on the orientation of the system, the depth camera, the terrain, and so on. Thus, it will be appreciated that variations on the disclosed examples are explicitly contemplated (e.g., classification with reference to torso parts is discussed below, but some embodiments will also consider classification of legs, feet, clothing, user pairings, user gestures, etc.).

Similarly, although FIG. 3 depicts the depth data as a "point cloud," one will readily recognize that the data received from a depth sensor may take many different forms. For example, a depth sensor, such as the depth sensor 115a or 115d, may include a grid-like detector array. These detectors may acquire an image of the scene from the perspective of the depth capture fields 120a and 120d, respectively. For example, some depth detectors include an "emitter" that generates electromagnetic radiation. The travel time from the emitter, to an object in the scene, and back to one of the grid cell detectors may correspond to a depth value associated with that grid cell. The depth determinations at each of these detectors may be output as a two-dimensional grid of depth values. As used herein, a "depth frame" may refer to such a two-dimensional grid, but may also refer to other representations of three-dimensional depth data acquired from a depth sensor (e.g., a point cloud, an ultrasound image, etc.).
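
As a purely illustrative sketch of the "two-dimensional grid of depth values" representation described above, the following Python example back-projects such a grid into a point cloud using pinhole camera intrinsics. The intrinsic values and function name are placeholders assumed for this example, not parameters of any sensor named herein.

"""Minimal sketch: convert a (H, W) grid of depth values into a 3-D point cloud
under an assumed pinhole camera model."""
import numpy as np

def depth_grid_to_point_cloud(depth_m, fx=580.0, fy=580.0, cx=None, cy=None):
    """depth_m: (H, W) array of depths in metres; 0 marks "no return"."""
    h, w = depth_m.shape
    cx = (w - 1) / 2.0 if cx is None else cx
    cy = (h - 1) / 2.0 if cy is None else cy
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_m
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]             # drop cells with no depth return

if __name__ == "__main__":
    fake_frame = np.full((480, 640), 2.5)       # synthetic flat surface 2.5 m away
    cloud = depth_grid_to_point_cloud(fake_frame)
    print(cloud.shape)                          # (307200, 3)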

Example Depth Data Clipping Method

Many applications seek to infer a user's gestures from the depth data 305. Achieving this from the raw depth data can be very challenging, so some embodiments apply a pre-processing procedure to isolate the depth values of interest. For example, FIG. 4 is a series of views illustrating data isolation via plane clipping that may be applied to the depth data 305 of FIG. 3 in some embodiments. In particular, a perspective view 405a and a side view 410a illustrate the depth data 305 (including a portion associated with the user 310 and a portion associated with the background 320). The perspective view 405b and the side view 410b show the depth data 305 relative to a floor plane 415. The floor plane 415 is not part of the depth frame data 305. Instead, the floor plane 415 may be estimated by the processing system or assumed based on context.
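
The following is a minimal sketch, in Python, of one way a processing system might estimate a plane such as the floor plane 415 from a point cloud, here using a simple RANSAC-style fit. It is offered only as an assumption-laden illustration and is not the estimation procedure of any particular embodiment.

"""Sketch: fit the dominant plane in a point cloud with a RANSAC-style search."""
import numpy as np

def estimate_plane_ransac(points, iters=200, inlier_dist=0.02, rng=None):
    """Return (normal, d) for the plane n·p + d = 0 with the most inliers.
    points: (N, 3) array; inlier_dist: inlier threshold in metres."""
    rng = np.random.default_rng(rng)
    best_inliers, best_plane = 0, None
    for _ in range(iters):
        sample = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(n)
        if norm < 1e-9:
            continue                             # degenerate (collinear) sample
        n /= norm
        d = -np.dot(n, sample[0])
        inliers = np.sum(np.abs(points @ n + d) < inlier_dist)
        if inliers > best_inliers:
            best_inliers, best_plane = inliers, (n, d)
    return best_plane

# Usage sketch: floor_normal, floor_d = estimate_plane_ransac(point_cloud)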

The perspective view 405c and the side view 410c introduce a wall plane 420, which may also be assumed or estimated by the processing system. The floor and wall planes may be used as "clipping planes" to exclude depth data from subsequent processing. For example, based on the assumed context in which the depth sensor is used, the processing system may place the wall plane 420 in the middle of the maximum range of the depth sensor's field of view. Depth data values behind this plane may be excluded from subsequent processing. For example, the portion 320a of the background depth data may be excluded, but the portion 320b may be retained, as shown in the perspective view 405c and the side view 410c.

Ideally, the portion 320b of the background would also be excluded from subsequent processing, as it contains no data relevant to the user. As shown in the perspective view 405d and the side view 410d, some embodiments further exclude depth data by "raising" the floor plane 415 to a position 415a based on context. This may result in the portion 320b being excluded from future processing. These clipping operations may also remove portions of the user data 310d that will not contain gestures (e.g., the lower torso). As mentioned previously, the reader will understand that this example is provided merely for ease of understanding; in some embodiments (e.g., systems such as those appearing in FIG. 2), the clipping may be omitted entirely, or may occur only very close to the floor, so as to still capture leg and even foot data. In this example, only the portion 310c remains for future processing. It will be appreciated that FIG. 4 simply depicts one possible clipping process for a given context. Different contexts may be addressed in a similar manner, such as those where a gesture includes the user's lower torso. Many such operations may still require an accurate assessment of the floor plane 415 and the wall plane 420 to perform precise clipping.
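
As an illustration of the clipping idea described above, the following Python sketch retains only points on the desired side of a set of clipping planes (e.g., a "raised" floor plane and a wall plane). The plane parameters shown are hypothetical example values, not values taken from the disclosure.

"""Sketch: exclude depth points lying behind floor/wall clipping planes."""
import numpy as np

def clip_by_planes(points, planes):
    """points: (N, 3); planes: list of (normal, d) with normals pointing toward
    the region to keep. A point p is kept if n·p + d >= 0 for every plane."""
    keep = np.ones(len(points), dtype=bool)
    for n, d in planes:
        keep &= (points @ np.asarray(n) + d) >= 0.0
    return points[keep]

# Example planes: floor at y = 0 "raised" by 8 cm (keep y >= 0.08) and a wall
# 3 m from the sensor (keep z <= 3.0). Both are illustrative values only.
raised_floor = (np.array([0.0, 1.0, 0.0]), -0.08)
wall         = (np.array([0.0, 0.0, -1.0]), 3.0)
# kept_points = clip_by_planes(point_cloud, [raised_floor, wall])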

Example Depth Data Classification Method

After isolating the depth values that may contain gesture data of interest (a step that may be omitted in some embodiments), the processing system may classify the depth values into various user portions. These portions, or "categories," may reflect particular parts of the user's body and may be used to infer gestures. FIG. 5 is an example component classification that may be applied to the isolated data of FIG. 4 in some embodiments. Initially, at 500a, the extracted data 310c may be unclassified. After classification, at 500b, each depth value may be associated with a given category. The granularity of the classification may reflect the characteristics of the gestures of interest. For example, some applications may be interested in the direction the user is looking, and thus may classify the head into a "head" category 515 and a "nose" category 520. Based on the relative orientation of the "head" category 515 and the "nose" category 520, the system may infer the direction in which the user's head is turned. Since the chest and torso are not generally related to the gestures of interest in this example, only the broad classifications "upper torso" 525 and "lower torso" 535 are used. Similarly, the details of the upper arm are not as relevant as those of other parts, so a single category "right arm" 530c and a single category "left arm" 530b may be used.

Conversely, the lower arm and hand may be very relevant for gesture determination, and finer grained classification may be used. For example, a "lower right arm" category 540, a "right wrist" category 545, a "right hand" category 555, a "right thumb" category 550, and a "right finger" category 560 may be used. Although not shown, a complementary category for the left lower arm may also be used. With these granular classifications, the system can infer, for example, the direction the user is pointing by comparing the relative orientations of the classified depth points.
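
For illustration, the following Python sketch approximates a pointing direction by comparing the centroids of two classified point sets (e.g., "right hand" and "right finger"), in the spirit of the comparison of relative orientations described above. The class names and data layout are assumptions made for this example rather than a prescribed implementation.

"""Sketch: estimate a pointing direction from per-point class labels."""
import numpy as np

def pointing_direction(points, labels, from_class="right_hand", to_class="right_finger"):
    """points: (N, 3) array; labels: length-N sequence of class names.
    Returns a unit vector from the from_class centroid to the to_class centroid."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    src = points[labels == from_class]
    dst = points[labels == to_class]
    if len(src) == 0 or len(dst) == 0:
        return None                              # classes missing in this frame
    direction = dst.mean(axis=0) - src.mean(axis=0)
    norm = np.linalg.norm(direction)
    return direction / norm if norm > 1e-9 else None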

Example Depth Data Processing Pipeline

FIG. 6 is a flow diagram illustrating some example depth data processing operations 600 that may be performed in some embodiments. At block 605, the processing system may receive a frame of depth sensor data (e.g., a frame such as frame 305). Generally, the data may then pass through "preprocessing" 610, "classification" 615, and "application" 620 stages. During "preprocessing" 610, the processing system may use the frame data to perform "plane detection" at block 625, based on assumptions or depth camera configuration details (again, preprocessing and plane detection may not be applied in some embodiments). This may yield, for example, clipping planes such as the floor plane 415 and the wall plane 420 discussed with respect to FIG. 4. These planes may be used, for example, to isolate the depth values of interest at block 630, e.g., as described above with respect to FIG. 4.

During classification 615, the system may associate groups of depth values with a category (or, in some embodiments, multiple categories) at block 635. For example, the system may determine the classification using categories as discussed with respect to FIG. 5. At block 640, the system may determine per-class statistics (e.g., the number of depth values associated with each class, the impact on ongoing system training and calibration, etc.). Example categories may include: nose, left index finger, left other fingers, left palm, left wrist, right index finger, right other fingers, right palm, right wrist, and others.

During the "application" stage 620, the system may use the category determinations to infer user behavior relevant to a particular application's goals. For example, an HCI interface may seek to determine where the user is currently pointing their hand. In this example, at block 645, the system may select and isolate the depth values classified as being associated with the "hand" and/or "fingers." From these depth values (and possibly the depth values associated with the user's arm), the system may estimate, at block 650, the direction in which the user is pointing in that particular frame (one will recognize that gestures other than the pointing example may also be determined). This data may then be published to applications, such as a kiosk operating system, a game console operating system, and the like. At block 655, the operations may be performed again for additional frames as they are received. One will recognize that this process can be used to infer gestures across frames by comparing, for example, the displacement of categories between frames (e.g., as a user moves their hand from left to right).
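
As a toy illustration of inferring a gesture across frames, the following Python sketch tracks the horizontal displacement of a hand-class centroid over a short window to report a left or right swipe. The thresholds, class handling, and gesture names are arbitrary assumptions for this example, not values from the disclosure.

"""Sketch: detect a swipe by tracking hand-centroid displacement across frames."""
import numpy as np

class SwipeDetector:
    def __init__(self, min_travel_m=0.35, max_frames=20):
        self.min_travel_m = min_travel_m   # horizontal travel needed for a swipe
        self.max_frames = max_frames       # window over which travel must occur
        self.history = []                  # recent hand-centroid x positions

    def update(self, hand_points):
        """hand_points: (N, 3) depth values classified as the hand for one frame.
        Returns "swipe_right", "swipe_left", or None."""
        if len(hand_points) == 0:
            self.history.clear()
            return None
        self.history.append(float(np.mean(np.asarray(hand_points)[:, 0])))
        self.history = self.history[-self.max_frames:]
        travel = self.history[-1] - self.history[0]
        if travel > self.min_travel_m:
            self.history.clear()
            return "swipe_right"
        if travel < -self.min_travel_m:
            self.history.clear()
            return "swipe_left"
        return None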

FIG. 7 is a hardware block diagram illustrating an example hardware implementation 705 that may be used to perform depth data processing operations in some embodiments. The frame receiving system 710 may receive depth frames from a depth sensor. The frame receiving system 710 may be firmware, software, or hardware (e.g., an FPGA implementation, a system on a chip, etc.). The frames may be passed on directly, or cached and then passed, to the pre-processing module 715. The pre-processing module 715 may also be firmware, software, or hardware (e.g., an FPGA implementation, a system on a chip, etc.). The pre-processing module may perform the pre-processing operations 610 discussed in FIG. 6. The pre-processing results (e.g., the isolated depth values 310c) may then be provided to the classification module 720. The classification module 720 may be firmware, software, or hardware (e.g., an FPGA implementation, a system on a chip, etc.). The classification module 720 may perform the classification operations 615 discussed in FIG. 6. The classified depth values may then be provided to the publishing module 725. The publishing module 725 may be configured to package the classification results into a form suitable for various different applications (e.g., as specified at 620). For example, interface specifications may be provided for a kiosk operating system, a game operating system, and the like, to receive the classified depth values and infer various gestures therefrom.
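
The following Python sketch loosely mirrors the module layout of FIG. 7, with a frame receiver feeding a preprocessing stage, a classification stage, and a publishing step that notifies subscribing applications. The class and callable names are hypothetical; as noted above, real implementations may instead be firmware, FPGA logic, or a system on a chip.

"""Sketch: a software analogue of the frame-receiving / preprocessing /
classification / publishing modules of FIG. 7."""
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class DepthPipeline:
    preprocess: Callable        # e.g., plane estimation + clipping (blocks 625/630)
    classify: Callable          # e.g., per-point body-part labelling (block 635)
    subscribers: List[Callable] = field(default_factory=list)  # applications

    def on_frame(self, depth_frame):
        """Called by the frame-receiving layer for each new depth frame."""
        isolated = self.preprocess(depth_frame)
        labelled = self.classify(isolated)
        for publish in self.subscribers:        # publishing step (cf. module 725)
            publish(labelled)

# Usage sketch (all names hypothetical):
# pipeline = DepthPipeline(preprocess=clip_by_planes_stage, classify=body_part_model)
# pipeline.subscribers.append(kiosk_os_gesture_handler)
# sensor.on_new_frame(pipeline.on_frame)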

Example Interactive System Form Factor

Various embodiments may include a housing frame for one or more depth sensors. The housing frame may be specifically designed in anticipation of user input and behavior. In some embodiments, the display system may be integrated with the housing frame to form a modular unit. FIG. 8 is a schematic diagram of an example wide-screen display with a multi-angle depth sensor housing that may be implemented in some embodiments. For example, the system may include a single large display 835, with which the user 840 may interact via various spatial, temporal, and spatiotemporal gestures 830 using, for example, their hand 845, arm, or entire body. For example, the user may direct the movement of a cursor 825 by pointing with the fingers of their hand 845. The display 835 may communicate with the computer system 805 via, for example, a direct wire connection 810a, wireless connections 815c and 815a, or any other suitable means for communicating the desired display output. Similarly, the computer system 805 may communicate with one or more depth sensors contained in the housing frames 820a-820c via a direct wire connection 810b, wireless connections 815b and 815a, or any other suitable means for communicating depth sensor data. Although shown separately in this example, in some embodiments the computer system 805 may be integrated with the housing frames 820a-820c or the display 835, or may be located off-site.

Each of the housing frames 820a-820c may contain one or more depth sensors, as described elsewhere herein. The computer system 805 may maintain transformations that may be used to associate the depth data acquired at each sensor with a global coordinate system relative to the display 835. These transformations may be determined via a calibration procedure, or may be preset, for example, to factory defaults. Although shown here as separate frames, in some embodiments the frames 820a-820c may be a single frame. The frames 820a-820c may be secured to the display 835, a nearby wall, a separate mounting platform, etc.
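
For illustration, the following Python sketch applies a per-sensor rigid transform (rotation and translation) to bring each sensor's points into a single display-relative coordinate system, in the spirit of the transformations described above. The extrinsic values shown are placeholders, not factory defaults or calibration results from the disclosure.

"""Sketch: merge per-sensor depth points into one display-relative frame."""
import numpy as np

# Hypothetical calibration for three sensors housed in frames 820a-820c.
SENSOR_EXTRINSICS = {
    "820a": (np.eye(3), np.array([-1.2, 2.05, 0.0])),
    "820b": (np.eye(3), np.array([ 0.0, 2.05, 0.0])),
    "820c": (np.eye(3), np.array([ 1.2, 2.05, 0.0])),
}

def to_display_frame(sensor_id, points):
    """points: (N, 3) in the sensor's local frame -> display coordinates."""
    R, t = SENSOR_EXTRINSICS[sensor_id]
    return np.asarray(points) @ R.T + t

def merge_sensor_clouds(clouds_by_sensor):
    """clouds_by_sensor: {sensor_id: (N_i, 3) array} -> single stacked array."""
    return np.vstack([to_display_frame(s, p) for s, p in clouds_by_sensor.items()])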

While some embodiments specifically contemplate providing a display system coupled to a housing frame, it will be readily appreciated that the system may be configured in alternative ways to achieve substantially the same functionality. For example, FIG. 9 is a schematic diagram of an example projection display having multi-angle depth sensor housing frames 920a-920c that may be implemented in some embodiments. Here, the frames 920a-920c have been secured to a wall 935, such as a wall in the user's 940 office, home, or shopping environment. A projector 950 may be positioned such that a desired image is projected onto the wall 935 (it will be understood that rear projection from behind the wall 935 may also be used in some embodiments if the material of the wall 935 is suitable). The wall 935 may extend beyond the interaction area in many directions, as indicated by the ellipses 955a-955c. In this manner, the user may again use their hand 945 to make a gesture 930 and thereby direct the movement of a cursor 925. Similarly, the projector 950 may communicate with the computer system 905 and the depth sensors in the frames 920a-920c via direct wire connections 910a, 910b, wireless connections 915a-915c, or any other suitable communication mechanism.

Although FIGS. 8 and 9 depict example embodiments having a "monolithic" display, in some embodiments the display and frame housing may be designed such that a "modular" unit is formed that may be integrated as a single unit. For example, FIG. 10 is a schematic diagram of an example composite display 1035 with a set of multi-angle depth sensor housing frames that may be implemented in some embodiments. Likewise, the user 1040 may use their hand 1045 to make a gesture 1030 and thereby interact with a displayed item, such as the cursor 1025. The computer system 1050 (shown here on-site and separate from the other components) may communicate with the depth sensors and displays via direct wire connections 1010a, 1010b, wireless communications 1015a, 1015b, 1015c, or any other suitable communication method.

However, in this example embodiment, each vertical segment of the composite system 1035 may comprise a separate module. For example, one module 1060 may include the depth sensor housing frame 1020a and three displays 1035a-1035c. The computer system 1050 may employ the individual displays of each vertical module to generate a collective composite image across one or more of them. The remaining depth sensor housing frames 1020b, 1020c may be similarly associated with their own displays. It will be understood that in some embodiments each module may have its own computer system, while in other embodiments, as shown here, a single computer system may be associated with several or all of the modules. The computer system(s) may process the depth data and provide images to the displays of their respective module(s).

Example Modular Interactive System Dimensions

FIG. 11 is a schematic diagram of a composite display with the multi-angle depth sensor housing of FIG. 10, including a turnaround of the modular component 1110c in the system, which may be implemented in some embodiments. In particular, the modular component 1110c is shown from a perspective view 1115a, a front view 1115b, and a side view 1115c. The computer system 1105, or a separate computer system, may be used to control one or more displays and to receive and process depth sensor data. The computer system 1105 may be used to control the displays and process data for all of the components 1110a-1110c, or for only a single component, such as the component 1110c.

FIGS. 12A-12C provide more detail regarding the specific dimensions of a particular example composite display. In particular, FIG. 12A is a schematic bottom view of a composite display having the multi-angle depth sensor housing of FIG. 10 that may be implemented in some embodiments. In this example, the modules are arranged together to produce a display grid 1240, the display grid 1240 having a composite width 1215d of about 365 centimeters in some embodiments and a height 1215b of about 205 centimeters in some embodiments. In some embodiments, the depth sensor housing height 1215a can be about 127 mm. Each display may have a width 1215c of about 122 centimeters in some embodiments and a height 1215f of about 69 centimeters in some embodiments. In some embodiments, the displays may be HDMI displays having a resolution of 1920 x 1080 pixels. In some embodiments, the display grid 1240 may be elevated from the ground 1225 via the support structure 1245 by a distance 1215e of about 10 centimeters. On top of the display grid 1240 may be one or more depth sensor housing frames 1205, shown here as transparent to expose one or more of the depth sensors 1210a-1210c.

FIG. 12B is a schematic top view of a composite display with the multi-angle depth sensor housing of FIG. 10 that may be implemented in some embodiments. Note that the depth sensors and the housing are again not shown, for ease of understanding. Within the region 1225d, the depth sensors may be capable of collecting depth data. Thus, the user 1235 will stand within this area when interacting with the system. In some embodiments, the area may extend a distance 1230f of about 300 centimeters in front of the display 1240 and be about the width 1215d of the display. In this embodiment, the side areas 1225a and 1225c may be excluded from the interaction. For example, the user may be advised to avoid attempting to interact within these areas because they present less than ideal relative angles to the depth sensors distributed across the system (in some embodiments, these areas may simply produce too much noise and be unreliable). The installation technician can mark or block off these areas accordingly. These areas 1225a and 1225c can have lengths 1230b, 1230g of about 350 centimeters from the wall 1250 in some embodiments and distances 1230a, 1230h of about 100 centimeters from the active area 1225d in some embodiments. The area 1225b may be provided between the support structure 1245 and the wall support structure 1250 or other barrier to provide space for one or more computing systems. Here, in some embodiments, a length 1215d may be reserved for this computing system space, with a distance 1230d of about 40 centimeters. In some embodiments, the support structure 1245 may extend throughout the area 1225b, and the computer system may reside on or within it.
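
The following Python sketch illustrates, under assumed coordinate conventions, how a system might restrict tracking to the active area 1225d and ignore the side areas 1225a and 1225c. The boundary values simply follow the approximate dimensions given above and would be adjusted per installation.

"""Sketch: gate interaction on whether a tracked user is in the active area."""
def in_active_area(x_m, z_m, display_width_m=3.65, max_depth_m=3.0):
    """x_m: lateral offset from the display's left edge, in metres;
    z_m: distance from the display plane, in metres.
    Returns True when the position falls inside an area like 1225d."""
    return 0.0 <= x_m <= display_width_m and 0.0 < z_m <= max_depth_m

# Example: a user 1.5 m in front of the display, near its centre -> tracked.
# in_active_area(1.8, 1.5)  -> True
# A user standing in a side area such as 1225a (x < 0) -> ignored.
# in_active_area(-0.5, 1.5) -> False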

FIG. 12C is a side schematic view of a composite display having the multi-angle depth sensor housing of FIG. 10 that may be implemented in some embodiments.

It will be appreciated that the example dimensions provided above are given merely in conjunction with this particular example to assist the reader in understanding a particular embodiment. The dimensions can readily be changed while achieving substantially the same purpose.

Example Depth Sensor Frame Bracket Mounting for Modular Systems

In various embodiments, the housing frame used to protect the depth sensors may take a variety of forms. FIG. 13A is an exploded schematic view of components in a frame of a multi-angle depth sensor housing that may be implemented in some embodiments. FIG. 13B is an assembled schematic view of components in a frame of a multi-angle depth sensor housing that may be implemented in some embodiments.

The frame may include a top cover 1310, a back cover 1315, a bottom plate 1340, and two sensor viewing panels 1355a and 1355b (illustrated in FIG. 13B, but not in FIG. 13A, for visual economy). The viewing panels 1355a and 1355b may be screwed into place (e.g., into one or more of the bracket spacers 1305a-1305f, e.g., with washers that secure the screws on opposite sides), clamped, or otherwise mechanically coupled to the housing, and may also be held in place by the angled portions 1335 and 1340a. The top cover 1310 may have a length 1360b of about 1214 mm in some embodiments and a width 1360a of about 178 mm in some embodiments. In some embodiments, the height 1360c may be approximately 127 mm.

The end panels 1305a and 1305f may be shaped in anticipation of the desired angles at which the top cover 1310, the back cover 1315, and the two sensor viewing panels 1355a and 1355b are oriented. In particular, in some embodiments the angle 1370a may be about 25°, in some embodiments the angle 1370b may be about 35°, and in some embodiments the angle 1370c may be about 30°. For clarity, in the depicted embodiment the top cover 1310 and the bottom plate 1340 are substantially parallel. Thus, in this example, the angle between the top cover 1310 and the back cover 1315 may be about 90°. Similarly, the angle between the bottom plate 1340 and the back cover 1315 may be about 90°. These angles not only can present a more aesthetically pleasing design, but can also promote the structural integrity of the housing as a whole by conforming to the spacer dimensions.

In some embodiments, length 1375a may be about 97 mm, length 1375b about 89 mm, length 1375c of cover ridge 1335 about 6 mm, length 1375d of sensor viewing panel 1355a about 56 mm, length 1375e of sensor viewing panel 1355b about 54 mm, and length 1375f about 10 mm.

The top cover 1310 can include a portion 1325 substantially parallel to the bottom plate 1340, an angled portion 1330, and an angled retaining portion 1335 for retaining the upper sensor viewing panel 1355a.

The back plate 1315 may include four cut-out grooves or inserts 1320a, 1320b, 1320c, and 1320d. As discussed herein, these recesses may be present in some embodiments to receive the spacers 1305b-1305e and ensure that they are secured in desired positions within the housing. It will be appreciated that the number of grooves may be the same as or different from the number of spacers, as it may be desirable to secure only some of the spacers.

The bottom panel 1345 can include an angled front portion 1340a (a tab or fold) and an angled rear portion 1340b for at least partially retaining the adjacent panels. The bottom panel 1345 may include two cut-out inserts 1350a and 1350b on its angled rear portion 1340b. This may result in "raised" portions 1345a, 1345b, and 1345c of the angled rear portion 1340b.

Within the frame may be one or more spacer brackets 1305a-1305f (also referred to simply as "spacers" or "brackets"). While the spacers 1305a and 1305f may serve as end plates, the spacers 1305b-1305e may be entirely or substantially within the housing frame. The spacer brackets 1305a-1305f need not have the same dimensions. For example, the mount bracket 1305d may have a shorter length than the spacer brackets 1305b, 1305c, 1305e. As discussed below, spacer brackets 1305a-1305c, 1305e, and 1305f may be used to ensure the structural integrity of the housing, even when, for example, a load is placed on top of portion 1325. The shorter bracket 1305d provides space for mounting a sensor pair, but may also contribute to the structural integrity of the housing. In some embodiments, bracket 1305d may be secured to the bottom plate 1340 and top cover 1310 by screws.

FIG. 14A is a view, from two perspectives 1405a and 1405b, of a spacer component such as component 1305a or 1305e of FIG. 13A, which may be implemented in some embodiments. Perspective 1405a is rotated substantially 90° relative to perspective 1405b to present the concave portion formed by the extensions 1425a-1425f. The extensions 1425a-1425f may themselves be separated by spaces 1410a-1410f. FIG. 14B is a view, from two perspectives 1420a and 1420b, of a mirror-image spacer assembly such as assembly 1305b or 1305f of FIG. 13A that may be implemented in some embodiments. Perspective 1420b is rotated substantially 90° relative to perspective 1420a to present the concave portion formed by the extensions 1430a-1430f. The extensions 1430a-1430f may themselves be separated by spaces 1415a-1415f.

FIG. 15 is a view, from two perspectives 1505a and 1505b, of a mount bracket assembly, such as assembly 1305d of FIG. 13A, which may be implemented in some embodiments. Perspective 1505a is rotated substantially 90° relative to perspective 1505b to present the concave portion formed by the extensions 1520a-1520e. The extensions 1520a-1520e may themselves be separated by spaces 1515a-1515d.

FIG. 16 is an exploded schematic view of components in a frame of a multi-angle depth sensor housing that may be implemented in some embodiments, including a mount spacer bracket that receives the sensor pair. Again, although a particular number of spacers is shown in the images to facilitate understanding, it will be understood that more or fewer spacers than depicted here may be present in different embodiments. As discussed above, the housing may include a top plate 1610, a back plate 1615, and a bottom plate 1640 (for visual economy, the sensor viewing panels are not shown). While the spacers 1605a, 1605b, 1605c, 1605e, and 1605f provide structural support for the housing, the mount spacer bracket 1605d can be shorter and recessed relative to the other spacers 1605a, 1605b, 1605c, 1605e, and 1605f to accommodate coupling with a pair of depth sensors 1660. In this example, the mount spacer bracket 1605d, shown in isolated view 1600a, has two holes 1670a, 1670b for receiving screws, nails, bolts, or other means for securing the depth sensors 1660 to the mount spacer bracket 1605d. For example, the depth sensors 1660 themselves may include mounting holes through which screws may be passed to secure each sensor in place (e.g., as is the case with some versions of the RealSense™ depth sensor system).

In some embodiments, the spacers 1605b-1605e may be free to move within the housing while the spacers 1605a and 1605f are secured at each end of the housing. In this manner, the installation technician can configure the system for the particular environment in which it is planned to be used. However, in some embodiments, the recesses 1620a-1620d may each receive one of the spacer brackets 1605b-1605e to ensure that the brackets are placed at particular positions within the housing. This predetermined positioning may be useful, for example, when the housing is shipped as one of a set of housings to be installed as part of a composite device. In some embodiments, a recess may accommodate only one particular spacer, thereby forcing a technician to install a particular configuration. However, in some embodiments, for example as shown here, each recess may be capable of receiving any one of the four spacers. In these embodiments, the technician is thus free to choose at which of the four positions the depth sensors are best placed for the task at hand. Thus, in the schematic top view 1600b shown here, the spacer 1605d and the affixed sensor pair 1660 can be located off-center within the housing.

To further clarify possible motivations for the spacer placement discussed with reference to FIG. 16, FIG. 17 is a schematic diagram of possible sensor placement configurations in the multi-angle depth sensor housings of an example user interface that may be implemented in some embodiments. In particular, user interface 1700 includes three units 1705a-1705c in series. Each unit may include three vertically adjacent displays and a corresponding sensor housing 1710a-1710c. Within each sensor housing 1710a-1710c, the spacer positions can be configured to anticipate the role of that housing's depth sensors in the overall user interface. For example, the position of the sensor pairs may vary slightly between modules because the desired optical spacing of the depth sensors differs from the spacing of the display screens.

Thus, as shown in the schematic top cross-sectional view 1715b for the middle sensor housing 1710b, the shortened mount bracket 1720b and corresponding depth sensor pair can be positioned at the center of the housing 1710b. In contrast, as shown in the schematic top cross-sectional view 1715c for the right sensor housing 1710c, the shortened mount bracket 1720c and corresponding depth sensor pair can be positioned at an offset 1725b from the center of the housing 1710c. Similarly, as shown in the schematic top cross-sectional view 1715a for the left sensor housing 1710a, the shortened mount bracket 1720a and corresponding depth sensor pair can be positioned at an offset 1725a relative to the center of the housing 1710a.

Example Depth Sensor Frame - Alternative Bracket Mounting for Modular System

FIG. 18 presents a number of schematic views of an alternative bracket mounting assembly that may be used in some embodiments. Some sensor systems, such as the RealSense™ 300, may have mounting points that are better accommodated by certain form factors. The bracket mounting assembly of FIG. 18 may better facilitate mounting such a system.

In particular, as shown in side view 1800a, spacer bracket 1805 may include multiple extensions. These can include extensions 1805a and 1805b, where extension 1805a has a lip for at least partially retaining the upper viewing panel 1825a and extension 1805b has a lip for at least partially retaining the lower viewing panel 1825b. As discussed above, these extensions may form an enclosure. A bracket support 1820 may be placed within the enclosure. The support may include a flat planar side 1820d adjacent to, or forming part of, a surface of the spacer bracket 1805. A top planar portion 1820b and a lower planar portion 1820c extending from the planar side 1820d may be used to secure the support 1820 within the spacer bracket 1805. Front view 1800b (i.e., the perspective of a person standing in front of depth sensors 1815a and 1815b) omits the spacer bracket 1805 and viewing panels 1825a, 1825b shown in side view 1800a and shows the bracket support 1820 from the front. The reader can thus more easily discern, in view 1800b, the top and lower planar portions 1820b, 1820c extending from the planar side 1820d of the bracket support 1820.

The top and lower planar portions 1820b, 1820c may be used to secure the support 1820 in various ways. For example, a screw may pass through the extension 1805a and the top planar portion 1820b, though in some embodiments friction alone may be sufficient.

The support 1820 may also include an extended planar surface 1820a. The extended planar surface 1820a may be used to couple the bracket support 1820 with the sensor mount 1810. The view 1800f of the support 1820 omits the other components (spacer bracket 1805, sensor mount 1810, viewing panels 1825a, 1825b). The surface 1820a may thus be more easily discerned in this view (as indicated by the dashed lines in 1800b, portions of the sensor mount 1810 and surface 1820a may be obscured from the front by the sensors 1815a, 1815b).

The sensor mount 1810 can provide a stable fixture for receiving the depth sensor systems 1815a and 1815b. The view 1800c is a view from the right side of the sensor mount 1810 (to the "right" when facing the portion of the sensor mount 1810 that receives the depth sensor systems 1815a and 1815b). View 1800d is a view from the left side of the sensor mount 1810. View 1800e is a view from the front of the sensor mount 1810. The sensor mount 1810 can include a plurality of holes for receiving screws or other fixation devices to join the viewing panels 1825a, 1825b, the depth sensor systems 1815a and 1815b, the sensor mount 1810, and the bracket support 1820 into a composite structure.

In particular, the depicted example has eight holes for securing the composite structure. The bracket holes 1830c and 1830d can be used to secure the sensor mount 1810 to the support 1820 via the surface 1820a. The viewing panel hole 1830a can be used to secure the upper viewing panel 1825a to the sensor mount 1810, and the viewing panel hole 1830b can be used to secure the lower viewing panel 1825b to the sensor mount 1810. Sensor holes 1830f and 1830e may be used to secure the upper depth sensor system 1815a to the sensor mount 1810. Similarly, sensor holes 1830h and 1830g may be used to secure the lower depth sensor system 1815b to the sensor mount 1810.

FIG. 19A is a perspective schematic view of portions of the alternative bracket mounting assembly of FIG. 18 in an exploded, unassembled state, as may be achieved in some embodiments. In the unassembled state, the sensor mount 1810, the bracket support 1820, and the depth sensor systems 1815a and 1815b can be unconnected. FIG. 19B illustrates how the sensor mount 1810 and the bracket support 1820 can be coupled by inserting screws, pins, or other coupling mechanisms through the holes 1830c and 1830d. FIG. 19C is a perspective schematic view of the sensor mount 1810, bracket support 1820, and depth sensor systems 1815a and 1815b all coupled together. As indicated, screws, pins, or other coupling mechanisms may be inserted through holes 1830f and 1830e into fixation mechanisms 1905a and 1905b, respectively, of depth sensor system 1815a (although not identified in FIG. 19C, screws, pins, or other coupling mechanisms may similarly be inserted through holes 1830h and 1830g, visible in FIG. 18, into the fixation mechanisms of sensor system 1815b). The upper viewing panel 1825a may then be secured by passing a screw, pin, or other coupling mechanism through the upper viewing panel 1825a and into the hole 1830a. Similarly, the lower viewing panel 1825b can be secured by passing a screw, pin, or other coupling mechanism through the viewing panel 1825b and into the hole 1830b. Friction or grooves, for example, may be used to ensure a secure fit of the respective screw, pin, or other coupling mechanism in each hole 1830a-1830h.

Example Depth Sensor Frame - "Independent" Installation for Modular System

As described above, rather than securing one or more depth sensor pairs to one or more mount brackets, various embodiments may secure the depth sensor pairs directly to the housing, or to a fixed mounting structure within the housing. For example, FIG. 20A is a "perspective" view of a housing frame 2005 of a multi-angle depth sensor housing that may be implemented in some embodiments, including depth sensors attached not via a bracket but via "stand-alone mounts". In particular, one or more paired sensor arrangements 2010a and 2010b may be placed within the frame 2005. In this example, the depth sensors again resemble the form factor of the RealSense™ depth sensor system, but one will readily appreciate variations that employ other depth sensors. As indicated by ellipses 2015, there may be more than the two illustrated sensor mounts, and the mounts may be arranged in a substantially linear arrangement.

The mount itself may typically hold two sensors at different angles via a mounting bracket. For example, FIG. 20B is a schematic view of a horizontal sensor mount that may be implemented in some embodiments. The top depth sensor system 2020 may be mounted above the depth sensor system 2030 and at an angle relative to it. Each sensor may (as in the example of a RealSense™ depth sensor system) include, for example, an infrared transmitter 2020c, an infrared receiver 2020b (e.g., some embodiments may operate in the range of about 850 nm), and a connection 2020a (e.g., a USB connection, a FireWire connection, a wireless Bluetooth connection, etc.) to the computer system that manages the overall modular unit or display system. Some depth sensors additionally include RGB sensors, as discussed in more detail below, but as illustrated here this need not be the case in all embodiments. For the horizontal mount, the extension 2025b may attach the mount 2025a to a frame housing or to a stand supporting the display system. FIG. 20C is a schematic view of a vertical sensor mount that can be implemented in some embodiments. For the vertical mount, the extension 2030b may attach the mount 2030a to a frame housing or to a stand supporting the display system. In some embodiments, screw holes for receiving either the extension 2025b or the extension 2030b may be provided in the same mount to allow the installation technician flexibility in their configuration.

FIG. 20D is a schematic view of the fields of view over varying depth regions achieved using the sensor mounts of FIG. 20B or 20C, as can be implemented in some embodiments. Although shown here as non-overlapping to facilitate understanding, it will be understood that in some embodiments, for example as shown in FIGS. 25A and 25B, the fields of view of the two sensors of a sensor pair may substantially overlap (which may be true for any of the disclosed mounting types, including, for example, bracket mounts, independent mounts, and so forth). In particular, given the depth sensors 2050a and 2050b, the angles resulting from installation using the mounts of FIG. 20B or 20C may produce corresponding fields of view 2055a and 2055b relative to the floor 2040.

FIG. 21A is a schematic diagram of an example multi-angle independent depth sensor mount 2105, with transparent depth sensor representations in their relative positions, that may be implemented in some embodiments. FIG. 21B is a schematic view of the multi-angle depth sensor mount 2105 of FIG. 21A without the depth sensors, as may be implemented in some embodiments. As illustrated, the mount 2105 may include a retaining extension 2115 having angled portions 2115a and 2115b configured to receive and retain the depth sensors 2110a, 2110b. However, as discussed herein, various embodiments may use screws, clamps, or other fixation devices to couple the sensors with the mount instead of, or in addition to, the retaining extension 2115.

For example, FIG. 22 is a schematic side view of various dimensions of an example multi-angle independent depth sensor mount that may be implemented in some embodiments. Complementing the side view 2205a of the mount 2210 is a projected view 2205b onto a plane below the mount. As indicated, the mount 2210 may be fastened to the housing via vertical screws (e.g., through the bottom of the housing) that enter the receivers 2240. Regions 2220a and 2220b indicate the locations where the first and second depth sensor systems may reside on the mount (though, as discussed herein, both locations need not be populated in every embodiment). Each sensor system may have a receiver to accept a mounting screw, pin, or other securing mechanism. The access passages 2215a and 2215b allow such securing mechanisms to reach the receivers, thereby coupling the depth sensor systems to the mount 2210. These passages may be located at positions corresponding to distances 2230a and 2230b from the mount (in this embodiment, about half the width of the depth sensor system). In some embodiments, distances 2230a and 2230b may be substantially 5 mm. In some embodiments, angle 2245 may be substantially 25°. In some embodiments, lengths 2225a and 2225d may be substantially 12 mm. In some embodiments, lengths 2225b and 2225c may be substantially 38 mm.

Thus, in some embodiments, the depth sensor systems may be mounted in pairs to independent mounts or coupled with a bracket. In some embodiments, a pair may be bolted directly to a housing panel (e.g., without a retainer). It will be appreciated that various embodiments may use only one particular sensor placement mechanism, or may use a combination of mechanisms within the housing.

For example, FIG. 23 is an exploded schematic view of components in a frame of a multi-angle depth sensor housing that includes only independently mounted sensor pairs, together with a schematic top cross-sectional view of the assembled structure, as may be implemented in some embodiments. Within the housing formed by the top cover 2310, back plate 2315, bottom plate 2340, and viewing panels (not shown), there may be one or more independently mounted pairs of depth sensors 2360a-2360e (in some embodiments, when independent mounts are used, recesses such as 1620a-1620d may be omitted from back plate 2315). These pairs may be separated by zero or more spacers 2305a-2305e. The number of intermediate spacers 2305b-2305d may be determined, for example, based on the material used for the housing and the structural integrity desired for the intended deployment environment.

In contrast to the exclusively independent mounts of FIG. 23, FIG. 24 is an exploded schematic view of the components in the frame of a multi-angle depth sensor housing that includes both independently mounted and bracket-mounted sensor pairs, together with a schematic top cross-sectional view of the assembled structure, as may be implemented in some embodiments. Likewise, the housing may include a back plate 2415, a top plate 2410, and a bottom plate 2440. Between the spacers 2405a-2405c, 2405e, 2405f may be placed independently mounted pairs of depth sensors 2460a, 2460b, 2460d and a bracket-mounted pair of depth sensors 2460c (mounted to, for example, bracket 2405d). In this example, every possible attachment point of each mount is populated with sensors; however, in some embodiments fewer than all points may be used. In addition, there is one mount per space between the spacers. It will be appreciated that in some embodiments more than one mount may be placed in a compartment, and some compartments may lack mounts. Additionally, although not shown here, it will be understood that some or all of the depth sensor pairs may be mounted directly to the sensor viewing panels 1355a and 1355b or to other parts of the housing.

FIG. 25A is a schematic side view, with various dimensions, of an example interaction system having a multi-angle depth sensor housing 2520, as may be implemented in some embodiments. FIG. 25B is a schematic side view of the combined fields of view for the system of FIG. 25A. Note that the depth fields depicted in FIGS. 25A and 25B can be achieved using depth sensor pairs mounted to the housing, to independent mounts, to mount brackets, and so forth. The angle 2505a reflects the angle of the sensor viewing panel 1355a relative to the floor plane, and in some embodiments may be about 65°. The angle 2505b reflects the angle of the sensor viewing panel 1355b relative to the floor plane, and may be about 30° in some embodiments. In some embodiments, the upper sensor within housing 2520 may have a field of view 2510a of about 60°. In some embodiments, the lower sensor within housing 2520 may also have a field of view 2510b of about 60°. The upper depth sensor may be at a height 2515a (in some embodiments, about 2.065 meters) from the floor 2530, and the lower depth sensor may be at a height 2515b (in some embodiments, about 2.019 meters) from the floor 2530. In this example, the active area for interacting with the interface system may extend a distance 2525a (in some embodiments, about 3 meters) from the display.

Together, the fields of view 2510a and 2510b may result in a composite field of view 2510, which may be about 95 ° in some embodiments.
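
The composite figure can be checked with simple interval arithmetic. The sketch below assumes, as a simplification not stated above, that each sensor's optical axis is perpendicular to its viewing panel, so that panel angles of about 65° and 30° yield axis depressions of about 25° and 60°; the union of the two 60° fields then spans about 95°.

PANEL_ANGLE_UPPER = 65.0   # degrees between upper viewing panel and floor plane
PANEL_ANGLE_LOWER = 30.0   # degrees between lower viewing panel and floor plane
SENSOR_FOV = 60.0          # per-sensor vertical field of view, degrees

def axis_depression(panel_angle_deg: float) -> float:
    """Depression of the optical axis below horizontal, assuming the axis
    is normal to the viewing panel (an assumption made for this sketch)."""
    return 90.0 - panel_angle_deg

def composite_fov(panel_a: float, panel_b: float, fov: float) -> float:
    """Angular extent of the union of the two sensors' vertical coverage."""
    axes = sorted(axis_depression(p) for p in (panel_a, panel_b))
    overlap = fov - (axes[1] - axes[0])
    if overlap < 0:
        raise ValueError("fields of view do not overlap; union is disjoint")
    # Lowest edge of the shallower field to highest edge of the steeper field.
    return (axes[1] + fov / 2.0) - (axes[0] - fov / 2.0)

if __name__ == "__main__":
    print(composite_fov(PANEL_ANGLE_UPPER, PANEL_ANGLE_LOWER, SENSOR_FOV))  # 95.0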

example depth sensor frame for Modular System-RGB Camera variants

FIG. 26 is a "perspective" view of a housing frame for a multi-angle depth sensor housing 2605 in which both a depth sensor and a visual image sensor are mounted, which may be implemented in some embodiments. For example, the depth sensor system 2620a may have a depth sensor transmitter 2625b and a depth sensor receiver 2625a (the depth sensor system 2620b may likewise have a corresponding transmitter and receiver). While the depth sensor viewing panels 2640b and 2640a may allow, for example, infrared frequencies to pass, the viewing panels 2640b and 2640a may be opaque to visual frequencies when transmitting from the depth sensor transmitter 2625b and when receiving at the depth sensor receiver 2625a (e.g., as an aesthetic or functional desire to prevent a user from seeing the depth sensor). Accordingly, some embodiments contemplate incorporating apertures or alternative materials into the depth sensor viewing panels 2640b and 2640a at locations 2635a-2635d to facilitate capture of images within the visual wavelength range. For example, as in the case of the realsense depth sensor, the RGB cameras 2630a, 2630b may already be integrated into the depth sensor system. In some embodiments, separate RGB cameras 2630c and 2630d dedicated to visual image capture may be used. Mounts may be adjusted and reused for these RGB-specific sensors to help ensure that they achieve the same or similar field of view as the various depth sensors. In some embodiments, sensors dedicated to depth capture information, such as sensor 2645, may also be present within the housing frame. Thus, as indicated by the ellipses 2610a and 2610b, there may be more mounts and sensors within the housing than are shown here.

Example gesture- "trigger Point"

Various embodiments described herein may be used to identify gestures and sub-gestures from a library (corpus) of gestures. A "gesture" itself may be represented as a sequence of one or more successive relationships between different parts of the user. A "sub-gesture" may be a sub-sequence of that sequence. For example, FIG. 27A is a schematic illustration of a user "trigger point" gesture that may occur in some embodiments. In the initial front view 2705a, the user may form a "gun" shape with their hand 2715a. Here, only the user's left hand 2715a is shown, but gestures can be performed with either or both hands. Similarly, in this example, the user's thumb need not be extended to mimic the trigger mechanism of a real gun (e.g., as illustrated in hand variation 2750). Whether thumb extension is required to recognize the gesture may depend on context (e.g., a gaming application expecting a trigger gesture may not require such specificity, whereas a general menu selection context may use it to distinguish a trigger action from ordinary pointing). In some embodiments, as shown in side view 2710a, the user's forearm may be rotated so that the thumb is at the top of the hand.

In some embodiments, the system may recognize the user's hand in the orientation of side view 2710a as establishing a "trigger point" gesture, even without any subsequent temporal change. However, by having the user extend their arm forward, or forward and upward 2720, as in front view 2705b and side view 2710b, the system can require a continuum of states to compose the gesture (e.g., as a "firing" action in a gaming environment). In particular, FIG. 27B is a schematic diagram of correspondences that may be used to identify a user "trigger point" gesture in some embodiments. For each depth frame, the system may determine a first moment, or average depth position, here the centroid point 2725, of all depth values classified as the user's head 2730a. Similarly, the system may determine a first moment 2735 of the depth values classified as part of the user's hand 2730b. The system may then determine a three-dimensional vector 2740a from one of these locations to the other. This vector, in combination with the orientation and geometry of the user's hand 2715a, may alone be sufficient to determine whether a "trigger point" gesture has occurred. However, in some embodiments, the system may compare the vector with a subsequent vector 2740b. If the comparison yields the expected relationship (e.g., the vectors translate within certain thresholds), then the system may conclude, in conjunction with the hand geometry, that the trigger action has occurred. Thus, a gesture may be recognized as a series of conditions, for example, portions of the user's anatomy fulfilling certain geometric conditions (e.g., "palm up," "extended thumb," etc.) while various vector relationships between user parts also transform within certain ranges. A minimal sketch of this centroid-and-vector computation is provided below.
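
The following sketch illustrates the centroid-and-vector computation described above, assuming depth frames arrive as an N x 3 array of points with per-point class labels; the label values, the axis convention, and the extension threshold are illustrative assumptions rather than the system's actual representation, and the hand-geometry ("gun" shape) check is assumed to happen elsewhere.

import numpy as np

# Illustrative class labels; the actual classifier's label set may differ.
HEAD, HAND = 1, 2

def centroid(points_xyz: np.ndarray, labels: np.ndarray, cls: int) -> np.ndarray:
    """First moment (mean 3D position) of the depth points with a given class."""
    return points_xyz[labels == cls].mean(axis=0)

def head_to_hand(points_xyz: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Vector from the head centroid to the hand centroid for one frame."""
    return centroid(points_xyz, labels, HAND) - centroid(points_xyz, labels, HEAD)

def trigger_fired(frame_a, frame_b, min_extension=0.15) -> bool:
    """Crude 'trigger point' test: the hand moved toward the display by at
    least min_extension meters between the two frames."""
    delta = head_to_hand(*frame_b) - head_to_hand(*frame_a)
    # Assume +z points from the user toward the display.
    return delta[2] > min_extension

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.normal(size=(200, 3))
    lbl = np.array([HEAD] * 100 + [HAND] * 100)
    pts2 = pts.copy()
    pts2[lbl == HAND, 2] += 0.3  # hand extends 30 cm toward the display
    print(trigger_fired((pts, lbl), (pts2, lbl)))  # True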

Example gesture- "nudge"

FIG. 28A is a schematic illustration of a user "nudge" gesture that may occur in some embodiments. In particular, as shown in front view 2805a, side view 2810a, and top view 2855a, the user may initially hold the palms of both hands 2815a and 2815b facing outward in front of them. At a subsequent time, the user may move both hands 2815a and 2815b forward 2820, as shown in front view 2805b, side view 2810b, and top view 2855b.

Again, the system may detect the gesture by comparing various vector relationships. In particular, FIG. 28B is a schematic diagram of correspondences that may be used to identify a user "nudge" gesture in some embodiments. At the starting position, vectors 2850a and 2840a may be taken from the centroid 2830a of the head-classified depth values 2825a to the centroids 2830b and 2830c of the user's left and right hands. Likewise, centroids 2830b and 2830c may be determined as the first moments of the pixels of the user's left hand 2825b and right hand 2825c, respectively. The change in these vectors (e.g., the transformation from vectors 2850a and 2840a to vectors 2850b and 2840b, respectively), in conjunction with the user's hand orientations, may be used as the basis for the system's determination that the gesture has occurred. It will be understood that "first moment," "mean," "centroid," and the like may be used interchangeably herein. Fixed offsets from a grouping of depth values may also be used in some embodiments.

Example gesture- "open"

FIG. 29A is a series of schematic front, side, and top views of steps in an "open" gesture that may occur in some embodiments. Initially, as shown in front view 2905a, side view 2910a, and top view 2915a, the user may bring the palms 2920a and 2920b of their hands, facing outward, close together in front of them. At a subsequent time, as shown in front view 2905b, side view 2910b, and top view 2915b, the user may move their hands 2920a and 2920b progressively farther apart, as if separating a pair of curtains, parting coats on a rack, etc.

FIG. 29B is a front and top schematic view of correspondences that may be used in some embodiments to detect the "open" gesture of FIG. 29A. In particular, the vectors 2940a and 2945a may be formed by comparing the centroid 2925a of the user's head-classified depth values 2930a with the centroids 2925b and 2925c of the user's right-hand-classified depth values 2930b and left-hand-classified depth values 2930c. One or more computer systems associated with the user interface may then compare vectors 2940a and 2945a with vectors 2940b and 2945b which, in conjunction with the user's hand orientations at the subsequent time, results in a determination that the "open" gesture has been performed.

example gesture- "swipe"

Much like the finger swipe used on some handheld devices, some embodiments may recognize a forearm gesture as corresponding to a similar "swipe" functionality. FIG. 30A is a series of schematic front, side, and top views of steps in a "swipe" gesture that may occur in some embodiments. Initially, as shown in front view 3005a, side view 3010a, and top view 3015a, the user may extend one arm substantially out from their side such that hand 3020a is parallel, or nearly parallel, to the plane of the user's torso. At a subsequent time, as shown by the front, side, and top views 3005b, 3010b, 3015b and 3005c, 3010c, 3015c, the user may move hand 3020a across their torso, as indicated by motion arrows 3025a and 3025b. In some embodiments, a small lateral movement of the hand at the user's wrist, rather than the larger arm sweep shown here, may also be recognized as a "swipe".

FIG. 30B is a front and overhead schematic view of correspondences that may be used in some embodiments to detect the "swipe" gesture of FIG. 30A. In particular, the system may determine a vector 3040a from the center 3035a of the head-classified depth values 3030a to the center 3035b of the user's hand-classified depth values 3030b. The system may then compare vector 3040a with a subsequently determined vector 3040b, in conjunction with the user's hand orientation, to determine that a "swipe" gesture has been performed.

Naturally, the swipe gesture may be performed by either hand. The gesture may be used, for example, to cycle through options in a menu or close a menu dialog. Thus, one will appreciate that gestures can be performed in the opposite direction with the other hand, in the other direction (e.g., vertically up and down), with a swipe of the back of the hand rather than the palm, and so forth.

example gesture- "circle"

FIG. 31A is a series of schematic front and side views of steps in a "circle" gesture that may occur in some embodiments. In particular, as shown in the front views 3105a-3105e and side views 3110a-3110e, the user may rotate their left and right hands 3115b and 3115a (from the user's perspective) counter-clockwise and clockwise, respectively (although one will appreciate that rotation in the opposite directions may be recognized as its own gesture). The rotation may be reflected as a series of smaller relationships between successive hand positions, as indicated by the motion arrows 3120a-3120d and 3125a-3125d. FIG. 31B is a composite front view 3130 of the hand orientations associated with correspondences that may be used to detect the "circle" gesture of FIG. 31A in some embodiments. For example, the system may decompose the circle into a series of frame positions, here represented by the numbers 1-4. Eventually, the user may return their hands to substantially the same positions as when they started the gesture (e.g., as in views 3105a and 3110a).

FIG. 31C is a composite front view 3135 of correspondences that may be used in some embodiments to detect the "circle" gesture of FIG. 31A. In particular, the entire circular motion may be divided into a series of hand positions, which themselves may be represented by a plurality of vectors 3150a-d from the center 3140 of the head-classified depth pixels 3145 to the center of the user's right-hand-classified depth pixels, and a corresponding plurality of vectors 3155a-d from the center 3140 of the head-classified depth pixels 3145 to the center of the user's left-hand-classified depth pixels.

One will appreciate that additional sub-gestures may be created in a similar manner. For example, the individual circular motion of each hand may itself be used as a "one-handed circle" gesture. In addition, the direction of rotation may be reversed. Ellipses and other arbitrary hand movements may also be detected via a series of vector relationships.

Example gesture- "squat-squat"

Not all gestures need to be performed with the user's hands. Additionally, the vectors used to recognize a gesture may be taken between consecutively captured frames, rather than between components within a single frame.

For example, FIG. 32A is a series of schematic front and side views of steps in a "squat" gesture that may occur in some embodiments. FIG. 32B is a front and top schematic view of correspondences that may be used in some embodiments to detect the "squat" gesture of FIG. 32A. Generally, as indicated in the front views 3205a, 3205b and side views 3210a, 3210b, by bending their knees and torso a user may produce one or more correspondences that the system can use to recognize a "squat" gesture.

In some embodiments, the system may use a single vector 3240, taken from the center 3220b of the user's torso-classified depth values 3215b to the center 3220a of the user's head-classified depth values 3215a from an earlier frame, to identify the performance of the "squat" gesture. For example, the vector may normally point generally upward. However, as the user lowers their head, the vector may shrink or even change direction. Such a change in direction may be used to recognize a "squat" gesture (although one will readily appreciate that other correspondences, such as between the head's positions at different times, may also be used). A brief sketch of one such cross-frame check follows.
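
The sketch below illustrates the cross-frame check just described. The class labels and the choice of which frame each centroid is drawn from are assumptions, since the description above leaves those details open; the test simply flags the moment the torso-to-head vector stops pointing upward.

import numpy as np

HEAD, TORSO = 1, 3  # illustrative class labels

def class_centroid(points, labels, cls):
    """Mean 3D position of the depth points carrying a given class label."""
    return points[labels == cls].mean(axis=0)

def squat_detected(earlier_frame, current_frame) -> bool:
    """Flag a squat when the vector from the (earlier) torso centroid to the
    (current) head centroid no longer points upward. Assumes +y is up."""
    e_pts, e_lbl = earlier_frame
    c_pts, c_lbl = current_frame
    torso_then = class_centroid(e_pts, e_lbl, TORSO)
    head_now = class_centroid(c_pts, c_lbl, HEAD)
    v = head_now - torso_then
    return bool(v[1] <= 0.0)  # vector has shrunk past horizontal / flipped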

Example Gesture Detection Method - Example Process

FIG. 33 is a flow diagram illustrating aspects of a gesture detection process 3300 that may be implemented in some embodiments. At block 3305, the computer system may receive one or more new depth frames from one or more depth sensors on the interface.

The system may then consider the newly acquired frame, and any previous frames, in conjunction with each template at block 3315, until all gesture templates have been considered at block 3310 or until a template matching the acquired frames is found at block 3320. In the event that no matching template is found after all templates have been considered, the system may continue acquiring new depth frames. A template may simply be a stored set of sequential conditions whose fulfillment the computer system recognizes as corresponding to successful completion of a gesture.

However, if a match occurs at block 3320, the system may output the gesture corresponding to the template at block 3325, for example to an application waiting for user input in the form of a recognized gesture, before resetting all templates at block 3330. In some embodiments, "resetting" a template may simply mean marking or clearing a flag so that the template does not consider frames from the currently recognized gesture in its subsequent evaluation. For example, it may be desirable for the system to start fresh after recognizing a gesture, rather than misinterpreting the end of a previous gesture as the start of a subsequent one. As discussed below, some embodiments may instead recognize a gesture along with its sub-gestures.
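
A compact sketch of this acquire-match-notify loop follows. The acquire_frame, consider, reset, and notify_application names are assumed interfaces invented for illustration, not the system's actual API.

def gesture_detection_loop(acquire_frame, templates, notify_application):
    """A minimal sketch of the loop of FIG. 33. acquire_frame yields depth
    frames, templates is a list of objects exposing consider(frame) and
    reset() (see the Template sketch below), and notify_application receives
    the name of any completed gesture."""
    while True:
        frame = acquire_frame()                      # block 3305
        for template in templates:                   # blocks 3310/3315
            if template.consider(frame):             # block 3320: template fulfilled
                notify_application(template.name)    # block 3325
                for t in templates:                  # block 3330
                    t.reset()
                break                                # resume acquiring frames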

FIG. 34 is a flow diagram illustrating aspects of a gesture template fulfillment determination process 3400 that may be implemented in some embodiments. For example, the process 3400 may occur as part of blocks 3315 and 3320. The template may include a data structure with a series of condition elements that must be fulfilled in order for the gesture to register with the system.

At block 3405, the template process may receive a new frame (e.g., the frame newly acquired at block 3305 may be made available to the template at block 3320). At block 3410, the system may determine correspondences between the frame and zero or more previous frames, depending on the gesture. For example, as discussed above, a template for a "squat" gesture may compare a vector between the center of the user's torso in a previous frame and the center of the user's head in the new frame to detect the gesture. Alternatively, the system may determine a vector between the user's head and the user's hand to see whether a sequence of such vectors fulfills the conditions for a "circle" gesture.

If the template element (e.g., a sequence of correspondences, a sequence of hand orientations, etc.) does not agree with the input frame at block 3415, the template may reset itself at block 3425. Conversely, if the frame remains consistent with the template at block 3415 (e.g., if the next unfulfilled set of elements is consistent with it), the system may record the fulfillment at block 3420. For example, a gesture may require a series of vector correspondences, which are reflected in the template elements. When the sequence is interrupted, the template may "give up" the current match and start over.

as one example, consider a "circle" gesture template. The elements of the template may require that the first frame have a left hand and a right hand of the user positioned and oriented substantially as shown in views 3105a and 3110 a. Subsequent frames should then follow the path constraints established using vectors 3150a-d and 3155a-d to fulfill the remaining elements of the template. If the user's hands are away from these constraints, the element will not be fulfilled and the template may be reset at block 3425. Conversely, if fulfillment of all elements continues until the user has then returned substantially to the locations indicated in views 3105a and 3110a at block 3430, the system may determine that the template has been fulfilled and record this in the output at block 3435 (e.g., causing the system to transition from block 3320 to block 3325).

Example Gesture Detection Method - Example Gesture Structure

Some embodiments recognize gestures only as discrete units. For example, FIG. 35A is an example tree diagram illustrating correspondences between multiple gestures that may occur in some embodiments. In particular, each edge 3510a-3510i represents the fulfillment of a sequential element constraint (or a set of constraints for a given frame). Some gestures may share initial constraints (nodes 3505a-3505e are provided only for the reader's reference, to identify mutually exclusive elements). For example, all three gestures D, E, and F (3515d-3515f) begin with the same condition 3510c. Following the examples above, the templates for both the "nudge" and "circle" gestures may begin by recognizing in a depth frame that the user has placed both hands palm outward. The structure of tree 3500a is such that gestures 3515a-3515f do not involve "sub-gestures." That is, if a gesture requires sequential elements A, B, C, and D, then a gesture consisting of only sequential elements A, B, and C is not included in the library.

In contrast, FIG. 35B is an example tree diagram illustrating correspondences between multiple gestures, including sub-gestures, that may occur in some embodiments. Unlike tree structure 3500a, structure 3500b does allow sub-gesture recognition. Specifically, each gesture 3515g-3515j includes various conditions 3510j-3510n in addition to those required by the sub-gestures 3515a and 3515b. Embodiments that allow sub-gestures may require specifying or modifying the processes of FIGS. 33 and 34 so that "fulfillment" of a gesture is not declared until the frames have satisfied one template and can no longer satisfy any other template. For example, after conditions 3510c and 3510g have been satisfied, the system may wait until a subsequent frame indicates that conditions 3510k and 3510l cannot be fulfilled before outputting that gesture D 3515d has been recognized. In contrast, if condition 3510k is satisfied immediately, the system may output that gesture I 3515i was satisfied rather than gesture D 3515d. In some embodiments, rather than outputting the most recently completed sub-gesture in the chain when one or more subsequent conditions were met before a failure condition was reached, the system may simply reset all templates and deem that no gesture was satisfied. For example, if condition 3510i is satisfied but neither condition 3510m nor 3510n is then fulfilled, the system may in some cases indicate that no gesture was recognized rather than outputting gesture D 3515d (in contrast, if condition 3510i is never fulfilled, the system may output gesture D 3515d). Whether to recognize a sub-gesture when downstream conditions fail may be decided based on context or on the application designer's specifications.
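
A small sketch of this deferred, tree-based recognition follows. The node structure, and the rule of emitting a completed sub-gesture only once no longer gesture remains possible, are modeled on the description above; the names and interfaces are otherwise assumptions.

class GestureNode:
    """One node in a gesture tree like FIG. 35B. `condition` tests a frame;
    `gesture` names a gesture completed at this node (None for interior-only
    nodes); `children` are possible continuations."""

    def __init__(self, condition, gesture=None, children=()):
        self.condition = condition
        self.gesture = gesture
        self.children = list(children)

def advance(active_nodes, frame, emit):
    """Advance the set of active tree positions by one frame. A node's gesture
    is emitted only when no child condition is satisfied by the new frame, so
    a sub-gesture is not reported while a longer gesture is still possible."""
    next_nodes = []
    for node in active_nodes:
        matched_child = False
        for child in node.children:
            if child.condition(frame):
                next_nodes.append(child)
                matched_child = True
        if not matched_child and node.gesture is not None:
            emit(node.gesture)   # the chain ended here; the sub-gesture stands
    return next_nodes

In use, the caller would re-seed the active set with the tree's root nodes on each frame so that new gestures can begin at any time; the stricter variant described above would instead emit nothing when a chain fails partway past a completed sub-gesture.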

Example Gesture Detection Method - Gesture Reservation

In some embodiments, the user interface may be used as a "general purpose" system on which application developers implement different applications. For example, a system may have a general purpose "operating system" environment in which users interact to select developer applications to run. In these embodiments, the system may need to specify certain "base gestures" with common behavior across all applications to avoid user confusion. For example, FIG. 36 is a Venn diagram illustrating various gesture set relationships that may occur in some embodiments.

Within the universe of all possible user gestures 3605, some gestures may be reserved as "base" gestures 3610. For example, holding an arm across one's body might be reserved as a generic gesture for "stop application." The base subset may thus be kept distinct from the various application-specific gesture sets 3615a-3615d (ellipses 3625 indicate that there may be more application-specific gesture sets than depicted here). Because such gestures are reserved as "base" gestures, application developers may be advised to refrain from using them in their applications (if recognized by the system, such a gesture may trigger base functionality, such as pausing the application, rather than any operation intended by the developer).

In contrast, there is no reason applications cannot share a common gesture, specific to each application's context, if, for example, those applications will not run simultaneously. This potential overlap is represented here in part by regions 3620a and 3620b. For example, a "nudge" gesture may fall in region 3620b and be used to move a virtual object in application B, to select an item to purchase in application C, and to fire a virtual weapon in application D. A short sketch of this division of gesture handling follows.
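
One possible way to enforce this split is a simple dispatch layer that routes reserved base gestures to the system shell and everything else to the currently running application; the sketch below is illustrative only, and the gesture names and handler interfaces are assumptions.

# Reserved "base" gestures handled uniformly across applications (names assumed).
BASE_GESTURES = {"stop_application", "pause_application"}

def dispatch(gesture_name, shell, active_app):
    """Base gestures always go to the system shell; any other recognized
    gesture is forwarded to whichever application is currently running."""
    if gesture_name in BASE_GESTURES:
        shell.handle(gesture_name)
    elif active_app is not None:
        active_app.handle(gesture_name)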

Example Contextual Gesture Embodiments

FIG. 37A is a schematic illustration of a user at a first location 3710a in front of an interface system 3705 that includes different interaction intervals, as may be implemented in some embodiments. In particular, gesture recognition and system behavior may depend contextually on the distance of the user from the display. Thus, as shown in FIG. 37B, when the user moves 3720 to a location 3710b in an interval closer to the screen, new behavior may result and gestures may have different effects. To this end, the system may divide the area in front of it into "intervals" 3715a-3715d. In addition to reporting detected gestures to the application, the system may therefore also report the user's distance from the screen, or the interval 3715a-3715d in which the user currently stands.
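
A minimal sketch of this interval reporting appears below; the boundary distances and callback interface are invented for illustration, since the text does not specify them.

# Far edges of the intervals, nearest to farthest from the screen (assumed values).
INTERVAL_BOUNDARIES_M = (0.75, 1.5, 2.25, 3.0)

def interval_for_distance(distance_m: float):
    """Return 0-3 for the interval the user stands in, or None when the user
    is beyond the tracked interaction area."""
    for index, far_edge in enumerate(INTERVAL_BOUNDARIES_M):
        if distance_m <= far_edge:
            return index
    return None

def report(gesture_name, distance_m, notify):
    """Report both the gesture and the interval it was performed in to the
    application callback `notify` (an assumed API)."""
    notify(gesture=gesture_name, interval=interval_for_distance(distance_m))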

The behavior of the interface may also vary with the user's lateral position in front of the display. For example, FIG. 38A is a schematic diagram, with a corresponding user view, of a user at a central location 3810a in front of an interface system 3805 running a dynamic vanishing-point selection menu that may be implemented in some embodiments. FIG. 38B is a schematic view of the user at a left position 3810b, displaced 3815a relative to the center position 3810a. FIG. 38C is a schematic view of the user at a right position 3810c, displaced 3815b relative to the center position 3810a. The dashed lines on the floor may reflect the intervals in front of the user interface, but are also provided for the reader's reference only, as in some embodiments the system may simply record the user's lateral position without explicitly determining the interval corresponding to that position.

As the user moves laterally relative to the display (as indicated by the opposing arrows 3815a and 3815b), the vanishing point of the rendered display may be adjusted to align with the user's new position. For example, the nine displays and their sensor housings are shown in phantom in views 3820a-3820c, which depict what a person viewing the display would see. In this example, the user is looking into three different rooms (coincidentally, each room is substantially the same width as each individual display). When the user is at the center position 3810a, the vanishing point of the displayed image is at the center of the display, as shown in view 3820a. However, when the user moves to the left position 3810b, the system may shift the vanishing point to the left, as shown in view 3820b, so that it again appears in front of the user. Additionally, the system may occlude the views of the other rooms, as shown, to mimic the real-world behavior that would occur when a user moves between real-world rooms. In this way, the user is less aware that they are looking at a two-dimensional display and is more likely to accept the immersive experience as an extension of their reality.

Similarly, when the user moves to the right position 3810c, as shown in view 3820c, the system may adjust the vanishing point and the field of view to correspond to the user's new position on the right. As discussed in more detail below, because the system can identify the position of the user's head, vertical movement as well as lateral movement can result in adjustment of the vanishing point and of the occlusions displayed to the user. In sum, this gives the user the impression of looking into the "real world" rather than merely at a flat display.
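
The sketch below illustrates one simple (assumed) way to derive a vanishing-point position from the tracked head position: a linear mapping from the head's lateral and vertical coordinates to screen pixels. An actual implementation might instead use a full off-axis projection.

def vanishing_point_px(head_x_m, head_y_m, display_width_m, display_height_m,
                       screen_w_px, screen_h_px):
    """Map the head position (meters, measured from the display's lower-left
    corner in the display plane) to a pixel position for the vanishing point,
    clamped to the screen. All parameter names are illustrative."""
    u = min(max(head_x_m / display_width_m, 0.0), 1.0)
    v = min(max(head_y_m / display_height_m, 0.0), 1.0)
    return round(u * screen_w_px), round((1.0 - v) * screen_h_px)

# Example: a user standing left of center with their head at 1.6 m.
print(vanishing_point_px(1.0, 1.6, 4.0, 2.5, 3840, 2160))  # (960, 778)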

In some embodiments, movement of the user toward the display may result in selective enlargement of the viewing area to invite more fine-grained user interaction. For example, FIG. 39A is a schematic view, with the resulting display, of a user in a first position prior to engaging a contextual focus feature at the center of the user interface 3905, as may be implemented in some embodiments. FIG. 39B is a schematic view of the user in a second position after engaging the contextual focus feature at the center of the user interface, and the resulting display change, as may be implemented in some embodiments. Initially, in FIG. 39A, the user may stand at position 3910a in front of the display and may perceive a view such as 3920a. If the user were standing in front of a series of rooms in the real world, as shown in schematic 3925a, the display would correspond to the user standing outside all three rooms 3930 at location 3935a.

In contrast, in FIG. 39B, when the user moves 3915 forward to a position 3910b close enough to the display, the system may interpret the movement as a "selection" of a displayed object, room, etc. Thus, if the middle room has been "selected," the system can now expand the displayed image, as shown in view 3920b. While such an expansion may mimic real-world behavior, in some embodiments the expansion may occlude more of the other rooms, and magnify more of the selected room, than the physical distance 3915 moved by the user would imply. Indeed, as shown in the schematic 3925b depicting the real-world series of rooms 3930, the displayed view is closer to the view that would appear in the real world had the user moved distance 3940 to the threshold of the room, at location 3935b. Thus, measured in real-world units, the virtual translation 3940 of the virtual camera rendering scene 3920b may be greater than the actual distance 3915 moved by the user. In this way, the system can depart from strictly realistic vanishing-point behavior to allow the user greater freedom in navigating the virtual environment. Indeed, in some embodiments, all movement relative to the center position may be scaled by some greater or lesser factor in the virtual world. For example, a logarithmic or exponential relationship may be used when the user wishes to navigate quickly between microscopic and planetary frames of reference using the same roughly four feet of linear motion in the real world.
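
The sketch below illustrates one such non-linear mapping; the exponential form and its gain constant are assumptions chosen only to show how a fixed real-world step can be amplified into a larger virtual translation.

import math

def virtual_advance(real_advance_m: float, gain: float = 1.2) -> float:
    """Exponential mapping: small steps move the virtual camera roughly
    linearly, while larger steps are amplified so that a little over a meter
    of walking can span a much larger virtual distance."""
    return math.expm1(gain * real_advance_m) / gain

# Example: compare small, medium, and ~4-foot steps.
for step_m in (0.1, 0.5, 1.2):
    print(step_m, round(virtual_advance(step_m), 3))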

To further illustrate the example of FIGS. 39A and 39B, FIGS. 40A and 40B provide schematics of the contextual focus feature of the user interface of FIG. 39 and the resulting display changes, before and after engagement, this time not at the center of the user interface but at its side, as may be implemented in some embodiments. Here, when the user moves forward by the real-world distance 4015 from location 4010a to location 4010b, the system may adjust the view from the initial view 4020a, with the vanishing point in the leftmost room, to the magnified view 4020b. View 4020a corresponds to the user standing at a virtual location 4035a relative to the rooms 4030, as shown in view 4025a. View 4020b corresponds to the user's view from location 4035b at the threshold of the leftmost room, after the user has moved forward by virtual distance 4040 relative to their original location 4035a in the virtual world, as shown in view 4025b. As discussed above, the virtual movement 4040 may be larger than the real-world translation 4015.

It will be appreciated that the room structure may be used as a "home" screen from which the user may select various applications to run. If there are more than three applications, the user may perform a "swipe" gesture (horizontally, and in some embodiments vertically or in other directions) or another suitable gesture to present additional room/application pairings. In some embodiments, a room may represent a "folder" containing several applications as objects in the room. By approaching the room, the user can enlarge it and then run one of the applications by pointing at the corresponding object.

Example Applications

FIG. 41 is a schematic diagram of a user interface system 4105 running an example "trigger point"-based shooting application that may be implemented in some embodiments. For example, the rooms of FIG. 40 may correspond to three different applications. By moving forward to the leftmost room and performing a "nudge" gesture, the user 4125 can launch the video game, in which they must use "trigger point" gestures to explode hot and cold cubes 4130 with corresponding cold 4110a and hot 4110b blasts emitted from the user's left 4120a and right 4120b hands, respectively. The system may infer the direction of each blast by extrapolating from the directions 4115a, 4115b pointed to by the user's index fingers. Performing an exit gesture may end the game and return the user to the room selection menu.
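
One plausible way to implement this extrapolation, sketched below under assumed inputs, is to cast a ray from the fingertip along the hand-to-fingertip direction and test it against each cube's axis-aligned bounding box with the standard slab method; the point positions and box extents are illustrative.

import numpy as np

def pointing_ray(hand_centroid, fingertip):
    """Ray origin and unit direction from the hand centroid through the fingertip."""
    origin = np.asarray(fingertip, dtype=float)
    direction = origin - np.asarray(hand_centroid, dtype=float)
    return origin, direction / np.linalg.norm(direction)

def ray_hits_box(origin, direction, box_min, box_max):
    """Slab-method ray/AABB intersection; True when the ray hits the box."""
    with np.errstate(divide="ignore", invalid="ignore"):
        t1 = (np.asarray(box_min, dtype=float) - origin) / direction
        t2 = (np.asarray(box_max, dtype=float) - origin) / direction
    t_near = np.nanmax(np.minimum(t1, t2))
    t_far = np.nanmin(np.maximum(t1, t2))
    return bool(t_far >= max(t_near, 0.0))

if __name__ == "__main__":
    origin, direction = pointing_ray(hand_centroid=(0.0, 1.4, 0.3),
                                     fingertip=(0.0, 1.4, 0.5))
    # A cube floating 2 m in front of the user, straight ahead of the hand.
    print(ray_hits_box(origin, direction, (-0.25, 1.2, 2.0), (0.25, 1.7, 2.5)))  # True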

As another example, FIG. 42 is a schematic diagram of a user interface system 4205 running an example calligraphy training application that may be implemented in some embodiments. Here, the user 4225 moves their hand 4220 (as part of a "trigger point" or simply a "point" gesture) to guide a brush stroke in a handwriting application. For example, as the user moves the virtual brush in accordance with the direction 4215 in which they are pointing, the unfilled portion 4210a of the character 4210b may be filled in (e.g., when the user completes the character's stroke pattern in the correct stroke order). As with the cube game described above, the user may select the calligraphy application from the room menu by approaching the corresponding room and performing a "nudge" gesture.

As another example application, FIG. 43 is a series of schematic diagrams of a user interface system 4305 running an example obstacle course application that may be implemented in some embodiments. When the user stands at the center position 4310a, they may perceive a route with a central vanishing point, as in view 4320a. Virtual footprints 4330a and 4330b, or any other suitable avatar, may be used to reflect the user's risk of collision in the obstacle environment. Once the application starts, the virtual world may begin to move past the user (e.g., the user may begin to "run in place" as their avatar progresses forward through the virtual environment).

In view 4320b, the user has jumped 4315 to a side position 4310b to avoid collision with an impending obstacle 4325a. Note that the vanishing point has been adjusted to reflect the user's new head position toward the upper right of the display. In contrast, in view 4320c, the user squats at a position 4310c on the left side to avoid an approaching obstacle 4325b. Likewise, the system has adjusted the vanishing point and perspective in view 4320c based on the user's new head position. As this example indicates, the system may continue to monitor certain user characteristics in parallel with gesture detection. For example, the computer system may continually record the user's head position and orientation in order to adjust the rendered view, even as it continues to recognize various user gestures.

Additionally, it will be appreciated that while many example applications have been described with respect to the embodiment of FIG. 10, these applications may also be used with other disclosed embodiments, such as those of FIGS. 8 and 9.

Computer System

FIG. 44 is a block diagram of an example computer system that may be used in conjunction with some embodiments. Computing system 4400 may include an interconnect 4405 connecting several components, such as, for example, one or more processors 4410, one or more memory components 4415, one or more input/output systems 4420, one or more storage systems 4425, and one or more network adapters 4430. The interconnect 4405 may be, for example, one or more bridges, traces, buses (e.g., an ISA, SCSI, PCI, I2C, or FireWire bus, etc.), wires, adapters, or controllers.

The one or more processors 4410 may include, for example, an Intel™ processor chip, a math co-processor, a graphics processor, etc. The one or more memory components 4415 may include, for example, volatile memory (RAM, SRAM, DRAM, etc.), non-volatile memory (EPROM, ROM, flash memory, etc.), or the like. The one or more input/output devices 4420 may include, for example, display devices, keyboards, pointing devices, touchscreen devices, and the like. The one or more storage devices 4425 may include, for example, cloud-based storage, removable USB storage, disk drives, and the like. In some systems, the memory components 4415 and storage devices 4425 may be the same components. The network adapters 4430 may include, for example, wired network interfaces, wireless interfaces, Bluetooth adapters, line-of-sight interfaces, and the like.

It will be appreciated that some embodiments may include only some of the components depicted in FIG. 44, alternatives to those components, or additional components beyond those depicted. Similarly, in some systems components may be combined or serve dual purposes. The components may be implemented using special-purpose hardwired circuitry such as, for example, one or more ASICs, PLDs, FPGAs, or the like. Thus, some embodiments may be implemented, for example, in programmable circuitry (e.g., one or more microprocessors) programmed with software and/or firmware, entirely in special-purpose hardwired (non-programmable) circuitry, or in a combination of these forms.

In some embodiments, data structures and message structures may be stored or transmitted via the network adapter 4430 over a data transmission medium (e.g., a signal on a communications link). Transmission may occur over various media, such as the Internet, a local area network, a wide area network, or a point-to-point dial-up connection, among others. Thus, "computer-readable media" may include computer-readable storage media (e.g., "non-transitory" computer-readable media) and computer-readable transmission media.

The one or more memory components 4415 and the one or more storage devices 4425 may be computer-readable storage media. In some embodiments, the one or more memory components 4415 or the one or more storage devices 4425 may store instructions that may perform or cause the performance of the various operations discussed herein. In some embodiments, the instructions stored in memory 4415 may be implemented as software and/or firmware. These instructions may be executed by the one or more processors 4410 to perform the processes described herein. In some embodiments, such instructions may be provided to the one or more processors 4410 by, for example, downloading the instructions from another system via the network adapter 4430.
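As a hedged illustration of this last point only, instructions might be fetched from another system over the network adapter, verified, and written to the storage device before being loaded for execution. The sketch below uses only Python standard-library calls; the URL, file path, and expected digest are placeholders rather than any part of the disclosure.

```python
# Sketch only: provisioning an instruction bundle over the network adapter,
# verifying its integrity, and persisting it to local storage.
# The URL, path, and digest below are placeholders.
import hashlib
import urllib.request

BUNDLE_URL = "https://example.invalid/interface-update.bin"   # placeholder URL
EXPECTED_SHA256 = "0000...0000"                               # placeholder digest
TARGET_PATH = "/var/lib/interface/update.bin"                 # placeholder path

def fetch_and_install(url=BUNDLE_URL, expected=EXPECTED_SHA256, path=TARGET_PATH):
    with urllib.request.urlopen(url, timeout=30) as resp:     # via network adapter
        payload = resp.read()
    digest = hashlib.sha256(payload).hexdigest()
    if digest != expected:
        raise ValueError("downloaded instructions failed integrity check")
    with open(path, "wb") as fh:                              # persist to storage
        fh.write(payload)
    return path
```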

Remarks

The foregoing description and drawings are illustrative. Accordingly, neither the description nor the drawings should be construed as limiting the disclosure. For example, headings and subheadings are provided merely for the reader's convenience and ease of understanding. Thus, headings or subheadings should not be construed as limiting the scope of the disclosure, e.g., by grouping features that are presented in a particular order or together merely as an aid to understanding. Unless otherwise defined herein, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. In case of conflict, this document, including any definitions provided herein, will control. Recitation of one or more synonyms herein does not exclude the use of other synonyms. The use of examples anywhere in this specification, including examples of any terms discussed herein, is illustrative only and is not intended to further limit the scope and meaning of the disclosure or of any example terms.

Similarly, although data structures may be specifically presented in the figures herein, those skilled in the art will appreciate that the actual data structures used to store information may differ from those shown. For example, the data structures may be organized differently, may contain more or less information than shown, may be compressed and/or encrypted, and so on. The figures and disclosure may omit general or well-known details to avoid obscuring the description. Similarly, the figures may depict a particular series of operations to facilitate understanding, and these are merely examples of a broader set of such operations. Accordingly, it will be readily appreciated that additional, alternative, or fewer operations may often be used to achieve the same purpose or effect depicted in some of the flowcharts. For example, data may be encrypted even though not so presented in the flowcharts, items may be iterated in a different loop pattern ("for" loop, "while" loop, etc.), or items may be ordered in a different manner to achieve the same or similar effect, and so on.

Reference in this specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the disclosure. Thus, the phrase "in one embodiment" in various places throughout the specification does not necessarily refer to the same embodiment in each of those places. Separate or alternative embodiments are not necessarily mutually exclusive of other embodiments. It will be recognized that various modifications may be made without departing from the scope of the embodiments.
