Playback switching between audio devices

Document No. 1926875 · Publication date: 2021-12-03

Note: This technology, "Playback switching between audio devices," was designed and created on 2020-02-28 by 戴恩·威尔伯丁, 南宥珍, 塔德奥·T·塔卢斯, 科尔·哈里斯, 帕特里克·德维纳, and 尼古拉斯·A· (surname truncated in the source). Abstract: Examples described herein relate to transitioning playback sessions between portable playback devices, such as "smart" headphones, earbuds, and handheld speakers, and playback devices of zone-based media playback systems. The example techniques facilitate continuity of playback when transitioning between locations (e.g., from home to out, or vice versa) or between listening paradigms (e.g., personalized or out-loud playback of audio content). Example implementations include: detecting an exchange trigger; determining a source playback device and a target playback device; and performing a playback session exchange between the source playback device and the target playback device.

1. A method, comprising:

detecting a playback session exchange trigger corresponding to a playback session when a first playback device plays back audio content during the playback session;

determining (a) one or more source playback devices and (b) one or more target playback devices, the one or more source playback devices including the first playback device, the one or more target playback devices including a second playback device; and

transitioning the playback session from the determined one or more source playback devices to the one or more target playback devices based on the playback session exchange trigger.

2. The method of claim 1, wherein transitioning the playback session from the determined one or more source playback devices to the one or more target playback devices comprises:

forming a synchrony group comprising the first playback device and the second playback device such that the first playback device and the second playback device play back the audio content in synchrony; and

muting the first playback device.

3. The method of claim 1 or 2, wherein transitioning the playback session from the determined one or more source playback devices to the one or more target playback devices comprises:

sending an instruction to a cloud queue server to transfer the playback session from the first playback device to the second playback device, wherein the cloud queue server transfers the playback session to the second playback device based on the instruction.

4. The method of any of the preceding claims, wherein transitioning the playback session from the determined one or more source playback devices to the one or more target playback devices comprises:

transmitting data representing (i) a Uniform Resource Identifier (URI) indicative of a source of the audio content and (ii) an offset within the audio content to the second playback device, wherein the second playback device streams the audio content from the source of the audio content and begins playing back the audio content at the offset, and wherein the first playback device stops playing back the audio content at the offset.

5. The method of any preceding claim, wherein the first playback device comprises:

at least one processor;

a data storage device;

one or more amplifiers;

one or more transducers;

one or more batteries configured to drive the one or more amplifiers and the one or more transducers; and

one or more housings carrying the at least one processor, the data storage device, the one or more amplifiers, the one or more transducers, and the one or more batteries, wherein the one or more housings are formed as at least one of (a) headphones or (b) earbuds.

6. The method of any preceding claim, wherein the first playback device is paired with a control device via a first type of wireless connection, and wherein the first playback device is connected to the second playback device via the first type of wireless connection and a second type of wireless connection between the control device and the second playback device.

7. The method of any preceding claim, wherein detecting the playback session exchange trigger comprises:

detecting, via a user interface of a control device, an input representing a command to exchange the playback session.

8. The method of any preceding claim, wherein detecting the input representing the command to exchange the playback session comprises:

detecting a touch-and-hold input to a touch-sensitive area on the first playback device, wherein a touch input performs a first action, the first action not being a swap.

9. The method of any preceding claim, wherein detecting the input representing the command to exchange the playback session comprises:

detecting a touch-and-hold input to a touch-sensitive area on the first playback device, wherein a touch input performs a first action and a touch-and-hold input performs a group action, and wherein the first action is not a swap.

10. The method of any of claims 1-5, wherein the first playback device is paired with a bridge device via a first type of wireless connection, and wherein the first playback device is connected to the second playback device via the first type of wireless connection and a second type of wireless connection between the bridge device and the second playback device.

11. The method of claim 10, wherein detecting the playback session exchange trigger comprises:

detecting, via a user interface of the bridge device, an input representing a command to exchange the playback session.

12. The method of claim 10 or 11, wherein the bridge device comprises a circular housing, and wherein the method further comprises:

detecting rotation of the circular housing; and

adjusting a playback volume of the first playback device in proportion to the rotation.

13. The method of any of claims 1-12, wherein the first playback device comprises:

at least one processor;

a data storage device;

one or more amplifiers;

one or more transducers;

one or more batteries configured to drive the one or more amplifiers and the one or more transducers; and

a housing carrying the at least one processor, the data storage device, the one or more amplifiers, the one or more transducers, and the one or more batteries, wherein the housing is formed as a handheld speaker.

14. The method of claim 13, wherein detecting the playback session exchange trigger comprises:

detecting that the housing is placed in a device base.

15. The method of any of the preceding claims, wherein the second playback device does not include a battery and draws current from a wall power source.

16. The method of any preceding claim, wherein detecting the playback session exchange trigger comprises:

detecting proximity of the second playback device to the first playback device.

17. The method of any of the preceding claims, wherein determining the one or more target playback devices comprises:

detecting proximity of the second playback device to the first playback device.

18. The method of any of the preceding claims, wherein the one or more target playback devices further comprise a third playback device, and wherein determining the one or more target playback devices comprises:

determining that the third playback device and the second playback device are configured as a synchrony group.

19. A system configured to perform the method of any one of claims 1-18.

20. An apparatus configured to perform the method of any one of claims 1-18.

21. A tangible, non-transitory computer-readable medium having stored therein instructions executable by one or more processors to perform the method of any one of claims 1-18.

22. A portable playback device comprising:

at least one processor;

a network interface;

one or more amplifiers;

one or more transducers;

one or more batteries configured to drive the one or more amplifiers and the one or more transducers; and

one or more housings formed as (a) earbuds or (b) a headset, the one or more housings carrying the at least one processor, the network interface, the one or more amplifiers, the one or more transducers, the one or more batteries, and a data storage device having stored therein instructions executable by the at least one processor to perform the method of any of claims 1-18.

Technical Field

The present disclosure relates to consumer products, and more particularly, to methods, systems, products, features, services, and other elements related to media playback or some aspect thereof.

Background

Options for accessing and listening to digital audio in an out-loud setting were limited until 2002, when SONOS, Inc. began development of a new type of playback system. Sonos then filed one of its first patent applications in 2003, entitled "Method for Synchronizing Audio Playback between Multiple Networked Devices," and began offering its first media playback systems for sale in 2005. The Sonos wireless home sound system enables people to experience music from many sources via one or more networked playback devices. Through a software control application installed on a controller (e.g., a smartphone, tablet, computer, or voice input device), one can play what one wants in any room having a networked playback device. Media content (e.g., songs, podcasts, video soundtracks) can be streamed to the playback devices such that each room with a playback device can play back corresponding different media content. In addition, rooms can be grouped together for synchronous playback of the same media content, and/or the same media content can be heard in all rooms synchronously.

Drawings

The features, aspects, and advantages of the presently disclosed technology may be better understood with regard to the following description, appended claims, and accompanying drawings, which are set forth below. Persons skilled in the relevant art will appreciate that the features shown in the drawings are for illustrative purposes and that variations including different and/or additional features and arrangements thereof are possible.

FIG. 1A is a partial cut-away view of an environment having a media playback system configured in accordance with aspects of the disclosed technology.

Fig. 1B is a schematic diagram of the media playback system and one or more networks of fig. 1A.

Fig. 1C is a block diagram of a playback device.

Fig. 1D is a block diagram of a playback device.

Fig. 1E is a block diagram of a network microphone apparatus.

Fig. 1F is a block diagram of a network microphone apparatus.

Fig. 1G is a block diagram of a playback device.

Fig. 1H is a partial schematic diagram of the control apparatus.

Figs. 1-I, 1J, 1K, and 1L are schematic diagrams of corresponding media playback system zones.

Fig. 1M is a schematic diagram of a media playback system region.

Fig. 2A is a front isometric view of a playback device configured in accordance with aspects of the disclosed technology.

Fig. 2B is a front isometric view of the playback device of fig. 2A without the grille.

Fig. 2C is an exploded view of the playback device of fig. 2A.

Fig. 3A is a front view of a network microphone apparatus configured in accordance with aspects of the disclosed technology.

Fig. 3B is a side isometric view of the network microphone apparatus of fig. 3A.

Fig. 3C is an exploded view of the network microphone apparatus of fig. 3A and 3B.

Fig. 3D is an enlarged view of a portion of fig. 3B.

Fig. 4A, 4B, 4C, and 4D are schematic diagrams of a control device at various stages of operation, in accordance with aspects of the disclosed technology.

Fig. 5 is a front view of the control device.

Fig. 6 is a message flow diagram of a media playback system.

Fig. 7A is a partial cut-away view of an environment with a media playback system configured in accordance with aspects of the disclosed technology.

Fig. 7B is a block diagram of a portable playback device configured in accordance with aspects of the disclosed technology.

Fig. 7C is a front isometric view of a portable playback device implemented as a headset configured in accordance with aspects of the disclosed technology.

Fig. 7D is a front isometric view of a portable playback device implemented as an earbud configured in accordance with aspects of the disclosed technology.

Fig. 7E is a front isometric view of a portable playback device configured in accordance with aspects of the disclosed technology.

Fig. 7F is a front isometric view of a portable playback device having a device base configured in accordance with aspects of the disclosed technology.

Fig. 7G is a schematic diagram illustrating an example pairing configuration between a portable playback device and a control device.

FIG. 8A is a schematic diagram illustrating an example push exchange in accordance with aspects of the disclosed technology.

FIG. 8B is a schematic diagram illustrating an example pull swap in accordance with aspects of the disclosed technology.

FIG. 8C is a schematic diagram illustrating an example push exchange in accordance with aspects of the disclosed technology.

FIG. 9 is a schematic diagram illustrating an example audio-based recognition technique in accordance with aspects of the disclosed technology.

FIG. 10 is a schematic diagram illustrating an example control scheme in accordance with aspects of the disclosed technology.

FIG. 11 is a schematic diagram illustrating an example feedback technique in accordance with aspects of the disclosed technology.

Fig. 12A and 12B are example messaging diagrams illustrating example playback session exchange techniques.

FIG. 13A is a method flow diagram illustrating an example exchange pull technique in accordance with aspects of the disclosed technology.

FIG. 13B is a method flow diagram illustrating an example swap push technique in accordance with aspects of the disclosed technology.

Fig. 14 is a method flow diagram illustrating an example home theater exchange technique in accordance with aspects of the disclosed technology.

Fig. 15 is a method flow diagram illustrating techniques to facilitate playback session exchange in accordance with aspects of the disclosed technology.

Fig. 16A is a schematic diagram illustrating an example pairing configuration between a portable playback device and a bridge device.

Fig. 16B is a block diagram of a bridge device configured in accordance with aspects of the disclosed technology.

Fig. 16C is a front isometric view of a bridge device configured in accordance with aspects of the disclosed technology.

Fig. 16D is a diagram of a touch-sensitive area implemented in a bridge device configured in accordance with aspects of the disclosed technology.

Fig. 16E is a front view of a bridge device configured in accordance with aspects of the disclosed technology.

Figs. 17A, 17B, 17C, 17D, 17E, and 17F are schematic diagrams of a bridge device user interface in various stages of operation in accordance with aspects of the disclosed technology.

Fig. 18A is a view of an example arrangement between a bridge device and a device base.

Fig. 18B is a view of an example arrangement between a portable playback device, a bridge device, and a device base.

Fig. 18C is a view of an example arrangement between a first portable playback device, a bridge device, a second portable playback device, and a device base.

Figs. 19A, 19B, and 19C are schematic diagrams of a control device user interface in various stages of operation in accordance with aspects of the disclosed technology.

Figs. 20A, 20B, and 20C are example messaging diagrams illustrating example playback session exchange techniques.

Figs. 21A, 21B, and 21C are schematic diagrams of a control device user interface in various stages of operation in accordance with aspects of the disclosed technology.

Figs. 22A, 22B, 22C, and 22D are diagrams illustrating example proximity-based playback session exchanges.

Fig. 23A is a front isometric view of an earbud configured in accordance with aspects of the disclosed technology.

Fig. 23B is a bottom view of a charging case configured in accordance with aspects of the disclosed technology.

Fig. 23C is a top view of the charging case.

Fig. 23D is a first side view of the charging case.

Fig. 23E is a second side view of the charging case.

Fig. 23F is a front isometric view of an earbud showing an exemplary arrangement with a charging case.

Fig. 23G is an isometric view of an earbud.

Fig. 23H is a first side view of the earbud.

Fig. 23I is a second side view of the earbud.

Fig. 23J is a third side view of the earbud.

Fig. 23K is a fourth side view of the earbud.

Fig. 23L is a fifth side view of the earbud.

Fig. 23M is a sixth side view of the earbud.

Fig. 24A is a front isometric view of a portable playback device implemented as a handheld speaker configured in accordance with aspects of the disclosed technology.

Fig. 24B is a side view of the portable playback apparatus.

Fig. 24C is a top view of the portable playback device.

Fig. 24D is a bottom view of the portable playback device.

Fig. 24E is a front isometric view showing a portable playback device with an exemplary arrangement of device bases.

Fig. 24F is a front isometric view of the portable playback device showing exemplary user input to the portable playback device.

Fig. 25A is a front view of a headset configured in accordance with aspects of the disclosed technology.

Fig. 25B is a first side view of the headset.

Fig. 25C is a second side view of the headset.

Fig. 26A is a front view of a headset configured in accordance with aspects of the disclosed technology.

Fig. 26B is a first side view of the headset.

Fig. 26C is a second side view of the headset.

The drawings are for purposes of illustrating example embodiments, but one of ordinary skill in the art will appreciate that the techniques disclosed herein are not limited to the arrangements and/or instrumentality shown in the drawings.

Detailed Description

I. Overview

Example techniques described herein relate to transitioning a playback session between a wearable playback device, such as "smart" headphones and earbuds, and a playback device of a zone-based media playback system. Other example techniques involve transitioning a playback session between a portable (e.g., battery-powered) playback device and a playback device of a zone-based media playback system. This transition is referred to herein as an "exchange" or "playback session exchange." Such techniques facilitate continuity of playback when transitioning between locations (e.g., from home to out, or vice versa) or between listening paradigms (e.g., personalized or out-loud playback). Further, some example techniques may reduce the degree of user input (or other user engagement) involved in switching playback as compared with other techniques.

In an illustrative example, a user starts a playback session on exemplary earbuds while out and about. For example, the user begins listening to KEXP Seattle radio using earbuds paired with a mobile device (e.g., a smartphone) over a wireless connection (e.g., 802.15 or 802.11, among others). In this example, the KEXP radio stream is delivered to the mobile device via the internet.

After arriving home, the user may wish to continue listening to KEXP out loud. To initiate a playback session exchange from the earbuds to a playback device in the kitchen, the user may provide an input to the earbuds. Since the earbuds are engaged in a playback session, the input designates the earbuds as the source of the playback session exchange. The target of the exchange (i.e., the kitchen zone) may have been previously specified in a predefined exchange pair with the earbuds, or may be determined after the input using proximity detection techniques (e.g., audio chirping), as described in further detail herein. The earbuds and/or the mobile device perform a playback session exchange with the kitchen zone, and playback of KEXP continues uninterrupted, out loud, on the playback device in the kitchen.

In another illustrative example, a user may begin a playback session on an exemplary portable speaker. For example, a user starts listening to WBEZ Chicago using a handheld speaker in the dining room. In this example, WBEZ Chicago is streamed to the handheld speaker via the internet over a home LAN. To meditate, the user carries the handheld speaker to the living room and asks a voice assistant service to play meditation music. The handheld speaker plays a confirmation from the voice assistant service and begins playing back a curated meditation playlist from a streaming audio service.

While the curated meditation playlist is playing, the user's friend enters the living room and suggests that the user check out a new Childish Gambino track, which the friend plays via the control application on their smartphone. To initiate a playback session exchange from the smartphone to the handheld speaker, the friend brings the smartphone into proximity with the handheld speaker to initiate a near-field communication (NFC) exchange between the smartphone and the handheld speaker. This exchange designates the smartphone as the source of the playback session exchange and the handheld speaker as the target. The smartphone performs a playback session exchange with the handheld speaker, and the Childish Gambino track continues playing out loud on the handheld speaker without interruption.

To enjoy the Childish Gambino track using a larger power amplifier and/or larger transducers, the user initiates a playback session exchange from the handheld speaker to the playback device in the living room by providing input to the handheld speaker. The input designates the handheld speaker as the source of the playback session exchange. The handheld speaker automatically designates the living room zone as the exchange target based on the detected proximity of the handheld speaker to the living room zone. The handheld speaker performs a playback session exchange with the living room zone, and the Childish Gambino track continues playing uninterrupted, out loud, on the playback device in the living room.
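One way playback can continue without interruption during such an exchange is the mechanism of claim 4 above: the source transmits a URI indicating the source of the audio content together with the current offset, the target streams the content and begins at that offset, and the source stops at the same offset. A minimal sketch, with plain dictionaries standing in for real device state and an example URL that is purely illustrative:

```python
def exchange_via_uri_and_offset(source_state: dict, target_state: dict) -> None:
    """Transfer playback by handing off (i) a URI indicating the source
    of the audio content and (ii) an offset within the content. The
    target begins playing at the offset; the source stops at the offset."""
    message = {"uri": source_state["uri"], "offset_ms": source_state["offset_ms"]}
    target_state.update(uri=message["uri"], offset_ms=message["offset_ms"], playing=True)
    source_state["playing"] = False
```

For example, a session 90 seconds into a stream hands off as `exchange_via_uri_and_offset({"uri": "http://example.com/stream", "offset_ms": 90000, "playing": True}, target)`, after which the target resumes at 90000 ms and the source is stopped.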

In a third illustrative example, at night, a user may begin a playback session in the bedroom on a soundbar device that plays back audio content from a television. Wishing to lower the volume so as not to disturb their partner, who is trying to get their baby to sleep in the adjacent room, the user initiates a playback session exchange from the soundbar device to their handheld speaker on the bedside table. Because the handheld speaker is physically closer to the user, the user can comfortably hear the television audio at a lower volume.

After getting the baby to sleep, the partner enters the bedroom and finds that the user has fallen asleep. To initiate a playback session exchange from the handheld speaker to a pair of headphones, the partner can provide input to the headphones. The input designates the headphones as the target of the playback session exchange. The source of the exchange (i.e., the handheld speaker) is determined based on context (i.e., based on the active playback session). The handheld speaker performs a playback session exchange with the headphones, and playback of the television audio continues uninterrupted on the headphones.

As described above, example techniques described herein relate to playback session exchanges. Example implementations include: detecting an exchange trigger; determining a source playback device and a target playback device; and performing a playback session exchange between the source playback device and the target playback device.
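These three steps (detect a trigger, determine source and target, perform the exchange) can be sketched end to end as follows. All class and function names are illustrative assumptions, not an actual implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Session:
    uri: str        # source of the audio content
    offset_ms: int  # current position within the content

@dataclass
class Device:
    name: str
    session: Optional[Session] = None

def perform_exchange(source: Device, target: Device) -> None:
    """Transition the playback session from source to target: the target
    resumes the session at its current offset, then the source stops."""
    if source.session is None:
        raise ValueError("source has no active playback session")
    target.session = source.session
    source.session = None

def on_trigger(trigger: str, source: Device, target: Device) -> None:
    """Entry point invoked when an exchange trigger (e.g., a touch-and-hold
    input or a proximity event) has been detected and the source and
    target have been determined."""
    perform_exchange(source, target)
```

For instance, `on_trigger("touch-and-hold", earbuds, kitchen)` would move the earbuds' session, offset and all, to the kitchen zone, so playback resumes there at the same position.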

While some examples described herein may relate to functions performed by a given actor (e.g., a "user," a "listener," and/or another entity), it should be understood that this is for purposes of explanation only. The claims should not be interpreted to require that any such example actor perform these actions unless the language of the claims themselves explicitly requires it.

Further, some functions are described herein as being performed "based on" or "in response to" another element or function. "Based on" should be understood to mean that one element or function is related to another function or element. "In response to" should be understood to mean that one element or function is a necessary result of another function or element. For the sake of brevity, functions are generally described as "based on" another function when a functional chain exists; however, such disclosure should be understood to disclose either type of functional relationship.

In the drawings, like reference numbers identify substantially similar and/or identical elements. To facilitate discussion of any particular element, one or more of the most significant digits of a reference number refer to the figure in which that element is first introduced. For example, element 110a is first introduced and discussed with reference to FIG. 1A. Many of the details, dimensions, angles, and other features shown in the figures are merely illustrative of particular embodiments of the disclosed technology. Accordingly, other embodiments may have other details, dimensions, angles, and features without departing from the spirit or scope of the disclosure. In addition, one of ordinary skill in the art will understand that other embodiments of the various disclosed techniques may be practiced without several of the details described below.

II. Suitable Operating Environment

Fig. 1A is a partial cut-away view of a media playback system 100 distributed in an environment 101 (e.g., a house). Media playback system 100 includes one or more playback devices 110 (identified as playback devices 110a-110n, respectively), one or more network microphone devices ("NMDs") 120 (identified as NMDs 120a-120c, respectively), and one or more control devices 130 (identified as control devices 130a and 130b, respectively).

As used herein, the term "playback device" may generally refer to a network device configured to receive, process, and output data of a media playback system. For example, the playback device may be a network device that receives and processes audio content. In some embodiments, the playback device includes one or more transducers or speakers powered by one or more amplifiers. However, in other embodiments, the playback device includes one (or neither) of the speaker and the amplifier. For example, the playback device may include one or more amplifiers configured to drive one or more speakers external to the playback device via respective wires or cables.

Further, as used herein, the term NMD (i.e., "network microphone device") may generally refer to a network device configured for audio detection. In some embodiments, the NMD is a standalone device configured primarily for audio detection. In other embodiments, the NMD is incorporated into the playback device (or vice versa).

The term "control device" may generally refer to a network device configured to perform functions related to facilitating user access, control, and configuration of the media playback system 100.

Each of the playback devices 110 is configured to receive audio signals or data from one or more media sources (e.g., one or more remote servers, one or more local devices) and play back the received audio signals or data as sound. The one or more NMDs 120 are configured to receive spoken commands, and the one or more control devices 130 are configured to receive user input. In response to the received spoken commands and/or user input, the media playback system 100 can play back audio via one or more of the playback devices 110. In some embodiments, the playback devices 110 are configured to begin playback of media content in response to a trigger. For example, one or more of the playback devices 110 can be configured to play back a morning playlist upon detection of an associated trigger condition (e.g., presence of a user in the kitchen, detection of a coffee machine operation). In some embodiments, for example, the media playback system 100 is configured to play back audio from a first playback device (e.g., the playback device 110a) in synchronization with a second playback device (e.g., the playback device 110b). Interactions between the playback devices 110, NMDs 120, and/or control devices 130 of the media playback system 100 configured in accordance with various embodiments of the present disclosure are described in more detail below with reference to figs. 1B-6.

In the embodiment shown in fig. 1A, the environment 101 comprises a household having several rooms, spaces, and/or playback zones, including (clockwise from upper left) a main bathroom 101a, a main bedroom 101b, a second bedroom 101c, a family room or den 101d, an office 101e, a living room 101f, a dining room 101g, a kitchen 101h, and an outdoor patio 101i. While certain embodiments and examples are described below in the context of a home environment, the technologies described herein may be implemented in other types of environments. In some embodiments, for example, the media playback system 100 can be implemented in one or more commercial settings (e.g., a restaurant, mall, airport, hotel, retail store, or other store), one or more vehicles (e.g., a sport utility vehicle, bus, car, boat, airplane), multiple environments (e.g., a combination of home and vehicle environments), and/or other suitable environments where multi-zone audio may be desirable.

The media playback system 100 may include one or more playback zones, some of which may correspond to rooms in the environment 101. The media playback system 100 may be established with one or more playback zones, after which additional zones may be added or removed to form, for example, the configuration shown in fig. 1A. Each zone may be named according to a different room or space (e.g., the office 101e, main bathroom 101a, main bedroom 101b, second bedroom 101c, kitchen 101h, dining room 101g, living room 101f, and/or patio 101i). In some aspects, a single playback zone may include multiple rooms or spaces. In some aspects, a single room or space may include multiple playback zones.

In the embodiment shown in fig. 1A, the main bathroom 101a, the second bedroom 101c, the office 101e, the living room 101f, the dining room 101g, the kitchen 101h, and the outdoor patio 101i each include one playback device 110, while the main bedroom 101b and the den 101d include a plurality of playback devices 110. In the main bedroom 101b, the playback devices 110l and 110m may be configured to play back audio content in synchronization with individual ones of the playback devices 110, as a bonded playback zone, as a merged playback device, and/or any combination thereof. Similarly, in the den 101d, the playback devices 110h-110j can be configured to play back audio content in synchronization with individual ones of the playback devices 110, one or more bonded playback devices, and/or one or more merged playback devices. Additional details regarding bonded and merged playback devices are described below with reference to figs. 1B, 1E, and 1-I through 1M.

In some aspects, one or more of the playback zones in the environment 101 may each play different audio content. For example, a user may be grilling on the patio 101i and listening to hip-hop music played by the playback device 110c, while another user is preparing food in the kitchen 101h and listening to classical music played by the playback device 110b. In another example, a playback zone may play the same audio content in synchrony with another playback zone. For instance, the user may be in the office 101e listening to the playback device 110f playing back the same hip-hop music being played back by the playback device 110c on the patio 101i. In some aspects, the playback devices 110c and 110f play back the hip-hop music in synchrony such that the user perceives the audio content as playing seamlessly (or at least substantially seamlessly) while moving between the different playback zones.

a. Suitable media playback system

Fig. 1B is a schematic diagram of the media playback system 100 and the cloud network 102. For ease of illustration, certain devices of the media playback system 100 and the cloud network 102 are omitted from fig. 1B. One or more communication links 103 (hereinafter referred to as "links 103") communicatively couple media playback system 100 and cloud network 102.

Links 103 may include, for example, one or more wired networks, one or more wireless networks, one or more Wide Area Networks (WANs), one or more Local Area Networks (LANs), one or more Personal Area Networks (PANs), one or more telecommunications networks (e.g., one or more Global System for Mobile Communications (GSM) networks, Code Division Multiple Access (CDMA) networks, Long Term Evolution (LTE) networks, 5G communication networks, and/or other suitable data transfer protocol networks), and so forth. The cloud network 102 is configured to deliver media content (e.g., audio content, video content, photos, social media content) to the media playback system 100 in response to a request sent from the media playback system 100 via the link 103. In some embodiments, the cloud network 102 is further configured to receive data (e.g., voice input data) from the media playback system 100 and to send commands and/or media content to the media playback system 100 accordingly.

The cloud network 102 includes computing devices 106 (identified as a first computing device 106a, a second computing device 106b, and a third computing device 106c, respectively). The computing devices 106 may include various computers or servers, such as media streaming service servers that store audio and/or other media content, voice service servers, social media servers, media playback system control servers, and the like. In some embodiments, one or more of the computing devices 106 comprise modules of a single computer or server. In certain embodiments, one or more of the computing devices 106 comprise one or more modules, computers, and/or servers. Further, although the cloud network 102 is described above in the context of a single cloud network, in some embodiments the cloud network 102 includes a plurality of cloud networks comprising communicatively coupled computing devices. Further, although the cloud network 102 is shown in fig. 1B as having three computing devices 106, in some embodiments the cloud network 102 includes fewer (or more) than three computing devices 106.

The media playback system 100 is configured to receive media content from the cloud network 102 via the link 103. The received media content may include, for example, a Uniform Resource Identifier (URI) and/or a Uniform Resource Locator (URL). For example, in some examples, the media playback system 100 may stream, download, or otherwise obtain data from a URI or URL corresponding to the received media content. The network 104 communicatively couples the link 103 with at least a portion of the devices of the media playback system 100 (e.g., one or more of the playback devices 110, NMDs 120, and/or control devices 130). The network 104 may include, for example, a wireless network (e.g., a WiFi network, Bluetooth, Z-Wave network, ZigBee, and/or other suitable wireless communication protocol network) and/or a wired network (e.g., a network including Ethernet, Universal Serial Bus (USB), and/or other suitable wired communication). As one of ordinary skill in the art will appreciate, as used herein, "WiFi" may refer to several different communication protocols that transmit at 2.4 gigahertz (GHz), 5 GHz, and/or other suitable frequencies, including, for example, Institute of Electrical and Electronics Engineers (IEEE) 802.11a, 802.11b, 802.11g, 802.11n, 802.11ac, 802.11ad, 802.11af, 802.11ah, 802.11ai, 802.11aj, 802.11aq, 802.11ax, 802.11ay, 802.15, and so forth.

In some embodiments, the network 104 includes a private communication network that the media playback system 100 uses to send messages between various devices and/or to and from media content sources (e.g., one or more computing devices 106). In some embodiments, the network 104 is configured to be accessible only to devices in the media playback system 100, thereby reducing interference and contention with other home devices. In some examples, the private communication network is implemented as a mesh network, where devices in the media playback system form nodes in the mesh network. One or more root nodes of the mesh network then connect the mesh network to a home WiFi network that operates in parallel with the mesh network.

However, in other embodiments, the network 104 comprises an existing home communication network (e.g., a home WiFi network). In some embodiments, link 103 and network 104 comprise one or more of the same network. In some aspects, for example, link 103 and network 104 comprise a telecommunications network (e.g., an LTE network, a 5G network). Further, in some embodiments, the media playback system 100 is implemented without the network 104, and devices comprising the media playback system 100 may communicate with one another, e.g., via one or more direct connections, a PAN, a telecommunications network, and/or other suitable communication links.

In some embodiments, audio content sources may be added to or removed from the media playback system 100 periodically. For example, in some embodiments, the media playback system 100 indexes media items as one or more media content sources are updated, added to, and/or removed from the media playback system 100. The media playback system 100 may scan some or all of the identifiable media items in folders and/or directories accessible to the playback device 110 and generate or update a media content database that includes metadata (e.g., title, artist, album, track length) and other associated information (e.g., URI, URL) for each identified media item found. For example, in some embodiments, the media content database is stored on one or more of the playback device 110, the network microphone device 120, and/or the control device 130.
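
The scan-and-index step described above can be sketched as follows; the folder layout, extension list, and metadata fields are assumptions for illustration (a real implementation would read embedded tags for title, artist, album, and track length):

```python
import os
import tempfile

# Hypothetical sketch: scan folders accessible to a playback device and
# build a media content database keyed by URI (here, a file path).
AUDIO_EXTENSIONS = {".mp3", ".flac", ".wav", ".aac"}

def index_media(root):
    database = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            base, ext = os.path.splitext(name)
            if ext.lower() in AUDIO_EXTENSIONS:
                uri = os.path.join(dirpath, name)
                # Real systems extract tag metadata; here the title is
                # simply derived from the file name for illustration.
                database[uri] = {"title": base, "ext": ext.lower()}
    return database

# Build a tiny demo library and index it.
root = tempfile.mkdtemp()
open(os.path.join(root, "hey_jude.mp3"), "w").close()
open(os.path.join(root, "notes.txt"), "w").close()  # not a media item
database = index_media(root)
```

Re-running the scan after sources are added or removed simply regenerates the database, matching the periodic re-indexing described above.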

In the embodiment shown in fig. 1B, the playback devices 110l and 110m comprise a group 107a. The playback devices 110l and 110m may be temporarily or permanently located in different rooms in a home and grouped together in the group 107a based on user input received at the control device 130a and/or another control device 130 in the media playback system 100. When arranged in the group 107a, the playback devices 110l and 110m may be configured to synchronously play back the same or similar audio content from one or more audio content sources. In some embodiments, for example, the group 107a comprises a bonded zone in which the playback devices 110l and 110m comprise the left and right audio channels, respectively, of multi-channel audio content, to create or enhance a stereo effect of the audio content. In some embodiments, the group 107a includes additional playback devices 110. However, in other embodiments, the media playback system 100 omits the group 107a and/or other groupings of the playback devices 110. Additional details regarding groups and other arrangements of playback devices are described in more detail below with reference to figs. 1-I through 1M.

The media playback system 100 includes NMDs 120a and 120d, each including one or more microphones configured to receive speech utterances from a user. In the embodiment shown in fig. 1B, the NMD 120a is a standalone device and the NMD 120d is integrated into the playback device 110n. The NMD 120a is configured to receive voice input 121 from a user 123, for example. In some embodiments, the NMD 120a sends data associated with the received voice input 121 to a Voice Assistant Service (VAS) configured to (i) process the received voice input data, and (ii) send a corresponding command to the media playback system 100. In some aspects, for example, the computing device 106c includes one or more modules and/or servers of a VAS (e.g., a VAS operated by a voice service provider). The computing device 106c may receive voice input data from the NMD 120a via the network 104 and the link 103. In response to receiving the voice input data, the computing device 106c processes the voice input data (i.e., "Play Hey Jude by The Beatles"), and determines that the processed voice input includes a command to play a song (e.g., "Hey Jude"). Accordingly, the computing device 106c sends a command to the media playback system 100 to play back "Hey Jude" by The Beatles from an appropriate media service (e.g., via one or more of the computing devices 106) on one or more playback devices 110.
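
The round trip described above, where an NMD forwards a voice input, the VAS interprets it, and a playback command comes back, might be sketched like this; the parsing is a deliberately naive stand-in for a real voice service, and all names are illustrative:

```python
# Hedged sketch of the VAS round trip: the "service" recognizes a simple
# "play <track> by <artist>" request and returns a playback command the
# media playback system could act on. Not the patent's actual protocol.
def process_voice_input(utterance):
    text = utterance.lower()
    if text.startswith("play "):
        request = utterance[5:]
        if " by " in request:
            track, artist = request.split(" by ", 1)
        else:
            track, artist = request, None
        return {"command": "play", "track": track, "artist": artist}
    # Unrecognized input: no playback command is issued.
    return {"command": "unknown"}

command = process_voice_input("play Hey Jude by The Beatles")
```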

b. Suitable playback device

Fig. 1C is a block diagram of a playback device 110a that includes an input/output 111. The input/output 111 may include analog I/O 111a (e.g., one or more wires, cables, and/or other suitable communication links configured to carry analog signals) and/or digital I/O 111b (e.g., one or more wires, cables, or other suitable communication links configured to carry digital signals). In some embodiments, the analog I/O 111a is an audio line-in connection, including, for example, an auto-detecting 3.5mm audio line-in connection. In some embodiments, the digital I/O 111b comprises a Sony/Philips Digital Interface Format (S/PDIF) communication interface and/or cable and/or a Toshiba Link (TOSLINK) cable. In some embodiments, the digital I/O 111b includes a High-Definition Multimedia Interface (HDMI) interface and/or cable. In some embodiments, the digital I/O 111b includes one or more wireless communication links including, for example, Radio Frequency (RF), infrared, WiFi, Bluetooth, or other suitable communication protocols. In some embodiments, the analog I/O 111a and the digital I/O 111b include interfaces (e.g., ports, plugs, jacks) configured to receive connectors of cables that transmit analog and digital signals, respectively, without necessarily including cables.

The playback device 110a may receive media content (e.g., audio content including music and/or other sounds) from the local audio source 105 via, for example, the input/output 111 (e.g., a cable, wire, PAN, Bluetooth connection, ad hoc wired or wireless communication network, and/or other suitable communication link). The local audio source 105 may include, for example, a mobile device (e.g., a smartphone, a tablet, a laptop) or another suitable audio component (e.g., a television, a desktop computer, an amplifier, a phonograph, a Blu-ray player, a memory storing digital media files). In some aspects, the local audio source 105 comprises a local music library on a smartphone, computer, Network-Attached Storage (NAS), and/or other suitable device configured to store media files. In certain embodiments, one or more of the playback devices 110, NMDs 120, and/or control devices 130 include the local audio source 105. However, in other embodiments, the media playback system omits the local audio source 105 altogether. In some embodiments, the playback device 110a does not include the input/output 111 and receives all audio content via the network 104.

The playback device 110a also includes an electronic device 112, a user interface 113 (e.g., one or more buttons, knobs, dials, touch-sensitive surfaces, displays, touch screens), and one or more transducers 114 (hereinafter "transducers 114"). The electronic device 112 is configured to receive audio from an audio source (e.g., the local audio source 105) via the input/output 111, or from one or more of the computing devices 106a-106c of the network 104 (fig. 1B), amplify the received audio, and output the amplified audio for playback via the one or more transducers 114. In some embodiments, the playback device 110a optionally includes one or more microphones 115 (e.g., a single microphone, multiple microphones, a microphone array) (hereinafter "microphones 115"). In some embodiments, for example, a playback device 110a having one or more optional microphones 115 may operate as an NMD configured to receive voice input from a user and perform one or more operations accordingly based on the received voice input.

In the embodiment shown in fig. 1C, the electronic device 112 includes one or more processors 112a (hereinafter "processor 112a"), a memory 112b, software components 112c, a network interface 112d, one or more audio processing components 112g (hereinafter "audio components 112g"), one or more audio amplifiers 112h (hereinafter "amplifiers 112h"), and a power supply 112i (e.g., one or more power supplies, power cords, power sockets, batteries, inductor coils, Power over Ethernet (PoE) interfaces, and/or other suitable power supplies). In some embodiments, the electronic device 112 optionally includes one or more other components 112j (e.g., one or more sensors, a video display, a touch screen).

The processor 112a may include clock-driven computing components configured to process data, and the memory 112b may include a computer-readable medium (e.g., a tangible, non-transitory computer-readable medium, a data storage device loaded with one or more of the software components 112c) configured to store instructions for performing various operations and/or functions. The processor 112a is configured to execute instructions stored on the memory 112b to perform one or more operations. The operations may include, for example, causing the playback device 110a to retrieve audio data from an audio source (e.g., one or more of the computing devices 106a-106c (fig. 1B)) and/or another playback device 110. In some embodiments, the operations further include causing the playback device 110a to transmit audio data to another playback device 110 and/or other devices (e.g., one of the NMDs 120). Some embodiments include operations to pair the playback device 110a with another of the one or more playback devices 110 to enable a multi-channel audio environment (e.g., a stereo pair, a bonded zone).

The processor 112a may also be configured to perform operations that cause the playback device 110a to synchronize playback of audio content with another of the one or more playback devices 110. As will be understood by those of ordinary skill in the art, during synchronized playback of audio content on multiple playback devices, a listener will preferably not be able to perceive a delay difference between playback of the audio content by the playback device 110a and by the one or more other playback devices 110. Additional details regarding audio playback synchronization between playback devices may be found, for example, in U.S. Patent No. 8,234,395, which is incorporated herein by reference.
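
One common way to achieve such synchronization (a hedged sketch, not the method of U.S. Patent No. 8,234,395) is for a group coordinator to schedule playback at a future instant on a clock the members share, so messaging delays do not skew start times:

```python
# Illustrative sketch: a coordinator schedules playback at a future time
# on a shared clock; each member then derives its position in the track
# from that common start time. Lead time and sample rate are assumptions.
def schedule_playback(now_ms, lead_time_ms=250):
    # Start far enough in the future for the command to reach every member.
    return now_ms + lead_time_ms

def samples_elapsed(start_ms, current_ms, sample_rate_hz=44100):
    # A member (including one joining late) computes its offset into the
    # track from the shared start time rather than starting at sample zero.
    return max(0, (current_ms - start_ms) * sample_rate_hz // 1000)

start = schedule_playback(now_ms=10_000)
offset = samples_elapsed(start, current_ms=11_250)
```

Because every member computes its position from the same reference time, no audible delay difference accumulates between devices.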

In some embodiments, the memory 112b may also be configured to store data associated with the playback device 110a, e.g., one or more zones and/or zone groups of which the playback device 110a is a member, audio sources accessible to the playback device 110a, and/or a playback queue with which the playback device 110a (and/or another of the one or more playback devices) may be associated. The stored data may include one or more state variables that are periodically updated and used to describe the state of the playback device 110a. The memory 112b may also include data associated with the state of one or more other devices of the media playback system 100 (e.g., the playback devices 110, the NMDs 120, the control devices 130). In some aspects, for example, the state data is shared during a predetermined time interval (e.g., every 5 seconds, every 10 seconds, every 60 seconds) among at least a portion of the devices of the media playback system 100 such that one or more of the devices has the most recent data associated with the media playback system 100.
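
The periodic state sharing described above might be modeled as a timestamped merge, where each device keeps the most recent value per state variable (the variable names and timestamp scheme here are assumptions):

```python
# Hypothetical sketch of state-variable sharing: each device periodically
# broadcasts (value, timestamp) pairs, and a peer keeps the newest value
# per variable so its view of the system stays up to date.
def merge_state(local, update):
    merged = dict(local)
    for key, (value, ts) in update.items():
        # Keep the incoming value only if it is newer than what we hold.
        if key not in merged or ts > merged[key][1]:
            merged[key] = (value, ts)
    return merged

local = {"volume": (30, 100), "track": ("Hey Jude", 90)}
update = {"volume": (35, 120)}          # a newer volume from a peer
state = merge_state(local, update)
```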

The network interface 112d is configured to facilitate data transfer between the playback device 110a and one or more other devices on a data network, such as the link 103 and/or the network 104 (fig. 1B). The network interface 112d is configured to transmit and receive data corresponding to media content (e.g., audio content, video content, text, photographs) and other signals (e.g., non-transitory signals) including digital packet data comprising an Internet Protocol (IP)-based source address and/or an IP-based destination address. The network interface 112d may parse the digital packet data so that the electronic device 112 properly receives and processes the data destined for the playback device 110a.

In the embodiment shown in fig. 1C, the network interface 112d includes one or more wireless interfaces 112e (hereinafter "wireless interface 112e"). The wireless interface 112e (e.g., a suitable interface including one or more antennas) may be configured to wirelessly communicate with one or more other devices (e.g., one or more of the other playback devices 110, the NMDs 120, and/or the control devices 130) communicatively coupled to the network 104 (fig. 1B) according to a suitable wireless communication protocol (e.g., WiFi, Bluetooth, LTE). In some embodiments, the network interface 112d optionally includes a wired interface 112f (e.g., an interface or receptacle configured to receive a network cable such as an Ethernet, USB-A, USB-C, and/or Thunderbolt cable), the wired interface 112f configured to communicate over a wired connection with other devices according to a suitable wired communication protocol. In certain embodiments, the network interface 112d includes the wired interface 112f and excludes the wireless interface 112e. In some embodiments, the electronic device 112 excludes the network interface 112d altogether and sends and receives media content and/or other data via another communication path (e.g., the input/output 111).

The audio components 112g are configured to process and/or filter data including media content received by the electronic device 112 (e.g., via the input/output 111 and/or the network interface 112d) to generate output audio signals. In some embodiments, the audio processing components 112g include, for example, one or more digital-to-analog converters (DACs), audio pre-processing components, audio enhancement components, Digital Signal Processors (DSPs), and/or other suitable audio processing components, modules, circuits, and the like. In certain embodiments, the one or more audio processing components 112g may comprise one or more subcomponents of the processor 112a. In some embodiments, the electronic device 112 omits the audio processing components 112g. In some aspects, for example, the processor 112a executes instructions stored on the memory 112b to perform audio processing operations to produce the output audio signals.

The amplifiers 112h are configured to receive and amplify the audio output signals generated by the audio processing components 112g and/or the processor 112a. The amplifiers 112h may include electronics and/or components configured to amplify the audio signals to a level sufficient to drive the one or more transducers 114. In some embodiments, for example, the amplifiers 112h include one or more switching or class-D power amplifiers. However, in other embodiments, the amplifiers include one or more other types of power amplifiers (e.g., linear gain power amplifiers, class-A amplifiers, class-B amplifiers, class-AB amplifiers, class-C amplifiers, class-D amplifiers, class-E amplifiers, class-F amplifiers, class-G and/or class-H amplifiers, and/or other suitable types of power amplifiers). In certain embodiments, the amplifiers 112h comprise a suitable combination of two or more of the foregoing types of power amplifiers. Further, in some embodiments, each of the amplifiers 112h corresponds to a respective one of the transducers 114. However, in other embodiments, the electronic device 112 includes a single amplifier 112h configured to output amplified audio signals to a plurality of transducers 114. In some other embodiments, the electronic device 112 omits the amplifiers 112h.

The transducers 114 (e.g., one or more speakers and/or speaker drivers) receive the amplified audio signals from the amplifiers 112h and render or output the amplified audio signals as sound (e.g., audible sound waves having a frequency between about 20 hertz (Hz) and 20 kilohertz (kHz)). In some embodiments, the transducers 114 may comprise a single transducer. However, in other embodiments, the transducers 114 include a plurality of audio transducers. In some embodiments, the transducers 114 include more than one type of transducer. For example, the transducers 114 may include one or more low-frequency transducers (e.g., subwoofers, woofers), one or more mid-frequency transducers (e.g., mid-range transducers, mid-woofers), and one or more high-frequency transducers (e.g., one or more tweeters). As used herein, "low frequency" may generally refer to audible frequencies below about 500 Hz, "mid frequency" may generally refer to audible frequencies between about 500 Hz and about 2 kHz, and "high frequency" may generally refer to audible frequencies above 2 kHz. However, in certain embodiments, the one or more transducers 114 include transducers that do not adhere to the foregoing frequency ranges. For example, one of the transducers 114 may comprise a mid-woofer transducer configured to output sound at frequencies between about 200 Hz and about 5 kHz.
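
The approximate frequency bands defined above can be expressed directly; the 500 Hz and 2 kHz boundaries are the document's own approximations rather than hard thresholds:

```python
# Sketch of the band definitions above: "low" below ~500 Hz, "mid"
# between ~500 Hz and ~2 kHz, "high" above ~2 kHz. The cutoffs mirror
# the document's approximate ranges, not fixed industry thresholds.
def band_for_frequency(hz):
    if hz < 500:
        return "low"
    if hz <= 2000:
        return "mid"
    return "high"
```

A mid-woofer spanning roughly 200 Hz to 5 kHz, as in the last example above, would straddle all three of these nominal bands.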

For example, SONOS, Inc. currently offers (or has offered) for sale certain playback devices including, for example, the "SONOS ONE," "PLAY:1," "PLAY:3," "PLAY:5," "PLAYBAR," "CONNECT:AMP," "CONNECT," and "SUB." Other suitable playback devices may additionally or alternatively be used to implement the playback devices of the example embodiments disclosed herein. Additionally, one of ordinary skill in the art will appreciate that the playback device is not limited to the examples described herein or to SONOS product offerings. In some embodiments, for example, one or more playback devices 110 include wired or wireless headphones (e.g., ear headphones, over-the-ear headphones, in-ear headphones). In other embodiments, one or more playback devices 110 include a docking station and/or an interface configured to interact with a docking station for personal mobile media playback devices. In some embodiments, the playback device may be integrated into another device or component, such as a television, a lighting fixture, or some other device for use indoors or outdoors. In some embodiments, the playback device omits the user interface and/or the one or more transducers. For example, fig. 1D is a block diagram of a playback device 110p, the playback device 110p including the input/output 111 and the electronic device 112, but not the user interface 113 or the transducers 114.

Fig. 1E is a block diagram of a bonded playback device 110q, the bonded playback device 110q including the playback device 110a (fig. 1C) sonically bonded with the playback device 110i (e.g., a subwoofer) (fig. 1A). In the illustrated embodiment, the playback devices 110a and 110i are separate ones of the playback devices 110 housed in separate housings. However, in some embodiments, the bonded playback device 110q includes a single housing that houses both playback devices 110a and 110i. The bonded playback device 110q may be configured to process and reproduce sound differently than an unbonded playback device (e.g., the playback device 110a of fig. 1C) and/or paired or bonded playback devices (e.g., the playback devices 110l and 110m of fig. 1B). In some embodiments, for example, the playback device 110a is a full-range playback device configured to render low-, mid-, and high-frequency audio content, and the playback device 110i is a subwoofer configured to render low-frequency audio content. In some aspects, when bonded with the playback device 110i, the playback device 110a is configured to render only the mid- and high-frequency components of particular audio content, while the playback device 110i renders the low-frequency components of the particular audio content. In some embodiments, the bonded playback device 110q includes additional playback devices and/or another bonded playback device. Additional playback device embodiments are described in more detail below with reference to figs. 2A-3D.
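
The division of labor in the bonded pair can be sketched as a simple crossover that routes frequency components below an assumed crossover point to the subwoofer and the rest to the full-range device; the 500 Hz crossover value is an assumption for the example, not a figure from the document:

```python
# Illustrative crossover sketch for a bonded pair: components below the
# crossover frequency go to the subwoofer, the rest to the full-range
# device, mirroring the frequency split described above.
def route_components(frequencies_hz, crossover_hz=500):
    routes = {"subwoofer": [], "full_range": []}
    for hz in frequencies_hz:
        target = "subwoofer" if hz < crossover_hz else "full_range"
        routes[target].append(hz)
    return routes

routes = route_components([60, 120, 800, 4000])
```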

c. Suitable Network Microphone Device (NMD)

Fig. 1F is a block diagram of the NMD 120a (figs. 1A and 1B). The NMD 120a includes one or more voice processing components 124 (hereinafter "voice components 124") and several components described with respect to the playback device 110a (fig. 1C), including the processor 112a, the memory 112b, and the microphone 115. The NMD 120a optionally includes other components also included in the playback device 110a (fig. 1C), such as the user interface 113 and/or the transducers 114. In some embodiments, the NMD 120a is configured as a media playback device (e.g., one or more of the playback devices 110) and further includes, for example, one or more of the audio components 112g (fig. 1C), the amplifiers 112h, and/or other playback device components. In certain embodiments, the NMD 120a comprises an Internet of Things (IoT) device, e.g., a thermostat, an alarm panel, a fire and/or smoke detector, or the like. In some embodiments, the NMD 120a comprises the microphone 115, the voice processing 124, and only a portion of the components of the electronic device 112 described above with respect to fig. 1B. In some aspects, for example, the NMD 120a includes the processor 112a and the memory 112b (fig. 1B), while one or more other components of the electronic device 112 are omitted. In some embodiments, the NMD 120a includes additional components (e.g., one or more sensors, cameras, thermometers, barometers, hygrometers).

In some embodiments, the NMD may be integrated into the playback device. Fig. 1G is a block diagram of a playback device 110r that includes an NMD 120 d. Playback device 110r may include many or all of the components of playback device 110a, and also includes microphone 115 and speech processing 124 (fig. 1F). The playback device 110r optionally includes an integrated control device 130 c. The control device 130c may include, for example, a user interface (e.g., user interface 113 of fig. 1B) configured to receive user input (e.g., touch input, voice input) without a separate control device. However, in other embodiments, the playback device 110r receives a command from another control device (e.g., the control device 130a of fig. 1B).

Referring again to fig. 1F, the microphone 115 is configured to acquire, capture, and/or receive sound from the environment (e.g., the environment 101 of fig. 1A) and/or the room in which the NMD 120a is located. The received sound may include, for example, speech utterances, audio played back by the NMD 120a and/or another playback device, background speech, ambient sound, and so forth. The microphone 115 converts received sound into electrical signals to generate microphone data. The voice processing 124 receives and analyzes the microphone data to determine whether speech input is present in the microphone data. The speech input may comprise, for example, an activation word followed by an utterance comprising a user request. As one of ordinary skill in the art will appreciate, the activation word is a word or other audio prompt signifying a user voice input. For example, in querying the AMAZON VAS, the user may speak the activation word "Alexa". Other examples include "Ok, Google" for invoking the GOOGLE VAS and "Hey, Siri" for invoking the APPLE VAS.

After detecting the activation word, the voice processing 124 monitors the microphone data for the user request accompanying the speech input. The user request may include, for example, a command to control a third-party device, such as a thermostat (e.g., a NEST thermostat), a lighting device (e.g., a PHILIPS HUE lighting device), or a media playback device (e.g., a SONOS playback device). For example, the user may speak the activation word "Alexa" followed by the utterance "set the thermostat to 68 degrees" to set the temperature in the home (e.g., the environment 101 of fig. 1A). The user may speak the same activation word followed by the utterance "turn on the living room lights" to turn on the lighting devices in the living room area of the home. The user may similarly speak an activation word followed by a request to play a particular song, album, or music playlist on a playback device in the home. Additional description regarding receiving and processing speech input data may be found in more detail below with reference to figs. 3A-3F.
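
The activation-word step described above might be sketched on a text transcript; real NMDs operate on microphone audio rather than text, and the names below are illustrative:

```python
# Hedged sketch: scan a transcript for a known activation word, then
# treat the remainder of the utterance as the user request. Sound with
# no activation word is ignored, as described above.
ACTIVATION_WORDS = ("alexa", "ok, google", "hey, siri")

def extract_request(transcript):
    lowered = transcript.lower()
    for word in ACTIVATION_WORDS:
        if lowered.startswith(word):
            # Strip the activation word and any separating punctuation.
            return transcript[len(word):].lstrip(" ,")
    return None  # no activation word present: not a voice input

request = extract_request("Alexa, set the thermostat to 68 degrees")
```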

d. Suitable control devices

Fig. 1H is a partial schematic diagram of the control device 130a (figs. 1A and 1B). As used herein, the term "control device" may be used interchangeably with "controller" or "control system". The control device 130a is configured to, among other things, receive user input related to the media playback system 100 and, in response, cause one or more devices in the media playback system 100 to perform an action or operation corresponding to the user input. In the illustrated embodiment, the control device 130a comprises a smartphone (e.g., an iPhone™, an Android phone) having media playback system controller application software installed thereon. In some embodiments, the control device 130a includes, for example, a tablet computer (e.g., an iPad™), a computer (e.g., a laptop, a desktop), and/or other suitable device (e.g., a television, a car stereo, an IoT device). In certain embodiments, the control device 130a comprises a dedicated controller for the media playback system 100. In other embodiments, as described above with respect to fig. 1G, the control device 130a is integrated into another device in the media playback system 100 (e.g., one or more of the playback devices 110, the NMDs 120, and/or other suitable devices configured to communicate over a network).

The control device 130a includes electronics 132, a user interface 133, one or more speakers 134, and one or more microphones 135. The electronics 132 include one or more processors 132a (hereinafter "processor 132a"), a memory 132b, software components 132c, and a network interface 132d. The processor 132a may be configured to perform functions related to facilitating user access, control, and configuration of the media playback system 100. The memory 132b may comprise a data storage device that may be loaded with one or more of the software components executable by the processor 132a to perform those functions. The software components 132c may include applications and/or other executable software configured to facilitate control of the media playback system 100. The memory 132b may be configured to store, for example, the software components 132c, media playback system controller application software, and/or other data associated with the media playback system 100 and the user.

The network interface 132d is configured to facilitate network communications between the control device 130a and one or more other devices in the media playback system 100 and/or one or more remote devices. In some embodiments, the network interface 132d is configured to operate according to one or more suitable communication industry standards (e.g., infrared, radio, wired standards including IEEE 802.3, wireless standards including IEEE 802.11a, 802.11b, 802.11g, 802.11n, 802.11ac, 802.15, 4G, LTE). The network interface 132d may be configured to transmit data to and/or receive data from, for example, the playback devices 110, the NMDs 120, other ones of the control devices 130, one of the computing devices 106 of fig. 1B, and devices comprising one or more other media playback systems, and the like. The transmitted and/or received data may include, for example, playback device control commands, state variables, and/or playback zone and/or zone group configurations. For example, based on user input received at the user interface 133, the network interface 132d may transmit playback device control commands (e.g., volume control, audio playback control, audio content selection) from the control device 130a to one or more of the playback devices 110. The network interface 132d may also send and/or receive configuration changes, e.g., adding/removing one or more playback devices 110 to/from a zone; adding/removing one or more zones to/from a zone group; forming a bonded or merged player; separating one or more playback devices from a bonded or merged player, and the like. Additional description of zones and groups may be found below with reference to figs. 1-I through 1M.

The user interface 133 is configured to receive user input and may facilitate control of the media playback system 100. The user interface 133 includes media content art 133a (e.g., album art, lyrics, video), a playback status indicator 133b (e.g., an elapsed and/or remaining time indicator), a media content information region 133c, a playback control region 133d, and a zone indicator 133e. The media content information region 133c may include a display of relevant information (e.g., title, artist, album, genre, year of release) about the media content currently being played and/or media content in a queue or playlist. The playback control region 133d may include selectable (e.g., via touch input and/or via a cursor or other suitable selector) icons to cause one or more playback devices in a selected playback zone or group to perform a playback action, e.g., play or pause, fast forward, fast reverse, skip to next, skip to previous, enter/exit shuffle mode, enter/exit repeat mode, enter/exit cross-fade mode, etc. The playback control region 133d may also include selectable icons for modifying equalization settings, playback volume, and/or other suitable playback actions, etc. In the illustrated embodiment, the user interface 133 comprises a display presented on a smartphone (e.g., an iPhone™, an Android phone). However, in some embodiments, other user interfaces of varying formats, styles, and interaction sequences may alternatively be implemented on one or more network devices to provide comparable control access to a media playback system.

One or more speakers 134 (e.g., one or more transducers) may be configured to output sound to a user of the control device 130 a. In some embodiments, the one or more speakers include respective transducers configured to output low, mid, and/or high frequencies, respectively. In some aspects, for example, the control device 130a is configured as a playback device (e.g., one of the playback devices 110). Similarly, in some embodiments, the control device 130a is configured as an NMD (e.g., one of the NMDs 120) that receives voice commands and other sounds via one or more microphones 135.

The one or more microphones 135 may include, for example, one or more condenser microphones, electret condenser microphones, dynamic microphones, and/or other suitable types of microphones or transducers. In some embodiments, two or more microphones 135 may be arranged to capture location information of an audio source (e.g., speech, audible sound) and/or configured to facilitate filtering of background noise. Further, in certain embodiments, the control device 130a is configured to function as a playback device and an NMD. However, in other embodiments, the control device 130a omits one or more speakers 134 and/or one or more microphones 135. For example, the control device 130a may include a device (e.g., a thermostat, an IoT device, a network device) that includes a portion of the electronic device 132 and the user interface 133 (e.g., a touch screen) without including any speakers or microphones. Additional control device embodiments are described in more detail below with reference to fig. 4A-4D and 5.

e. Suitable playback device configuration

Fig. 1-I to 1M show example configurations of playback devices in zones and zone groups. Referring first to fig. 1M, in one example, a single playback device may belong to one zone. For example, the playback device 110g in the second bedroom 101c (fig. 1A) may belong to zone C. In some implementations described below, multiple playback devices may be "bound" to form a "bound pair," which together form a single zone. For example, the playback device 110l (e.g., a left playback device) may be bound to the playback device 110m (e.g., a right playback device) to form zone A. Bound playback devices may have different playback responsibilities (e.g., channel responsibilities). In another implementation described below, multiple playback devices may be merged to form a single zone. For example, playback device 110h (e.g., a front playback device) may be merged with playback device 110i (e.g., a subwoofer) and playback devices 110j and 110k (e.g., left and right surround speakers, respectively) to form a single zone D. In another example, playback devices 110g and 110h may be merged to form a merged group or zone group 108b. The merged playback devices 110g and 110h may not be specifically assigned different playback responsibilities. That is, aside from playing audio content in synchrony, the merged playback devices 110g and 110h may each play the audio content as they would when not merged.

Each zone in the media playback system 100 may be provided as a single User Interface (UI) entity for control. For example, zone a may be provided as a single entity named the main bathroom. Zone B may be provided as a single entity named master bedroom. Zone C may be provided as a single entity named the second bedroom.

Bound playback devices may have different playback responsibilities, for example, responsibilities for certain audio channels. For example, as shown in fig. 1-I, playback devices 110l and 110m may be bound to produce or enhance the stereo effect of audio content. In this example, the playback device 110l may be configured to play a left channel audio component, while the playback device 110m may be configured to play a right channel audio component. In some embodiments, this stereo binding may be referred to as "pairing."

In addition, bound playback devices may have additional and/or different corresponding speaker drivers. As shown in fig. 1J, a playback device 110h named Front may be bound with a playback device 110i named Subwoofer (SUB). The front device 110h may be configured to render a mid-high frequency range, and the subwoofer device 110i may be configured to render low frequencies. When unbound, however, the front device 110h may be configured to render the entire frequency range. As another example, fig. 1K shows the front device 110h and the subwoofer device 110i further bound to a left playback device 110j and a right playback device 110k, respectively. In some implementations, the left device 110j and the right device 110k may be configured to form surround or "satellite" channels of a home theater system. The bound playback devices 110h, 110i, 110j, and 110k may form a single zone D (fig. 1M).

The merged playback devices may not be assigned playback responsibilities and may each render the full range of audio content that the respective playback device is capable of playing back. Nevertheless, merged devices may be represented as a single UI entity (i.e., a zone, as described above). For example, playback devices 110a and 110n in the main bathroom have the single UI entity of zone A. In one embodiment, the playback devices 110a and 110n may each output, in synchrony, the full range of audio content that each respective playback device 110a and 110n is capable of playing back.

In some embodiments, an NMD may be bound or merged with another device to form a zone. For example, the NMD 120b may be bound with the playback device 110e, which together form zone F, named "living room." In some embodiments, a stand-alone network microphone device may itself be in a zone. In other embodiments, however, a stand-alone network microphone device may not be associated with a zone. Additional details regarding associating a network microphone device and a playback device as a designated device or a default device may be found, for example, in the previously referenced U.S. patent application No. 15/438,749.

Zones of individual, bound, and/or merged devices may be grouped to form a zone group. For example, referring to fig. 1M, zone A may be grouped with zone B to form a zone group comprising the two zones. Similarly, zone G may be grouped with zone H to form the zone group 108b. As another example, zone A may be grouped with one or more other zones C-I. The zones A-I may be grouped and ungrouped in numerous ways. For example, three, four, five, or more (e.g., all) of the zones A-I may be grouped together. When grouped, the zones of individual and/or bound playback devices may play back audio in synchrony with one another, as described in previously referenced U.S. patent No. 8,234,395. Playback devices may dynamically group and ungroup to form new or different groups that play back audio content in synchrony.

In various embodiments, a zone group in the environment may be given a name that is the default name of a zone within the group or a combination of the names of the zones within the group. For example, zone group 108b may be assigned a name such as "Dining Room + Kitchen," as shown in fig. 1M. In some embodiments, a zone group may also be given a unique name selected by the user.

Some data may be stored in a memory of the playback device (e.g., memory 112C of fig. 1C) as one or more state variables that are periodically updated and used to describe the state of the playback zone, the playback device, and/or the zone group associated therewith. The memory may also include data associated with the state of other devices of the media system and is shared between the devices from time to time such that one or more of the devices has up-to-date data associated with the system.

In some embodiments, the memory may store instances of various variable types associated with the states. Variable instances may be stored with identifiers (e.g., tags) corresponding to type. For example, certain identifiers may be a first type "a1" to identify playback device(s) of a zone, a second type "b1" to identify playback device(s) that may be bound in the zone, and a third type "c1" to identify a zone group to which the zone may belong. As a related example, the identifier associated with the second bedroom 101c may indicate that the playback device is the only playback device of zone C and not in a zone group. The identifier associated with the Den may indicate that the Den is not grouped with other zones but includes bound playback devices 110h-110k. The identifier associated with the Dining Room may indicate that the Dining Room is part of the Dining Room + Kitchen zone group 108b and that devices 110b and 110d are grouped together (fig. 1L). Since the Kitchen is part of the Dining Room + Kitchen zone group 108b, the identifier associated with the Kitchen may indicate the same or similar information. Other example zone variables and identifiers are described below.
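The typed state variables above ("a1", "b1", "c1") can be sketched as a small dictionary keyed by those tags. This is illustrative only; the field values and the helper below are assumptions, not the patent's storage format.

```python
# Typed state variables for one zone, keyed by the identifier tags named in
# the text. Example values model zone C (second bedroom): a lone, unbonded,
# ungrouped playback device 110g.
state_variables = {
    "a1": "110g",   # type "a1": playback device(s) of the zone
    "b1": [],       # type "b1": playback devices bound within the zone
    "c1": None,     # type "c1": zone group the zone belongs to, if any
}

def describe_zone(state):
    """Summarize a zone's membership from its typed state variables."""
    return {
        "device": state["a1"],
        "bonded": bool(state["b1"]),          # any bound partners?
        "grouped": state["c1"] is not None,   # member of a zone group?
    }

summary = describe_zone(state_variables)
```

A zone in the Dining Room + Kitchen zone group would instead carry, e.g., `"c1": "108b"`.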

In another example, the media playback system 100 may store variables or identifiers representing other associations of zones and zone groups, e.g., identifiers associated with Areas, as shown in fig. 1M. An Area may involve a cluster of zone groups and/or zones not within a zone group. For instance, fig. 1M shows an Upper Area 109a including zones A-D and a Lower Area 109b including zones E-I. In one aspect, an Area may be used to invoke a cluster of zone groups and/or zones that share one or more zones and/or zone groups of another cluster. In this respect, an Area differs from a zone group, which does not share a zone with another zone group. Further examples of techniques for implementing Areas may be found, for example, in U.S. application No. 15/682,506, entitled "Room Association Based on Name," filed August 21, 2017, and U.S. patent No. 8,483,853, entitled "Controlling and manipulating groupings in a multi-zone media system," filed September 11, 2007. Each of these applications is incorporated herein by reference in its entirety. In some embodiments, the media playback system 100 may not implement Areas, in which case the system may not store variables associated with Areas.

In other examples, the playback devices 110 of the media playback system 100 are named and arranged according to a control hierarchy referred to as a home graph. Under the home graph hierarchy, the base unit is a "Set." A "Set" refers to an individual device or to multiple devices that operate together in performing a given function, e.g., an individual playback device 110 or a bound zone of playback devices. After the "Set," the next level of the hierarchy is a "Room." Under the home graph hierarchy, a "Room" can be considered a container for "Sets" in a given room of a home. For example, an example "Room" may correspond to the kitchen of a home, be assigned the name "Kitchen," and include one or more "Sets" (e.g., "Kitchen Island"). The next level of the example home graph hierarchy is an "Area," which includes two or more "Rooms" (e.g., "Upstairs" or "Downstairs"). The highest level of the home graph hierarchy is the "Home." A Home refers to the entire home and all of the "Sets" therein. Each level of the home graph hierarchy is assigned a human-readable name, which facilitates control via GUIs and VUIs. Additional details regarding the home graph control hierarchy may be found, for example, in U.S. patent application No. 16/216,357, entitled "Home Graph," which is incorporated herein by reference in its entirety.
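The four-level hierarchy just described (Set < Room < Area < Home) can be modeled with nested containers. This is a minimal sketch; the class and attribute names are illustrative assumptions, not the patent's data model.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Set:
    name: str                 # e.g., "Kitchen Island"
    device_ids: List[str]     # one device, or a bound zone of devices

@dataclass
class Room:
    name: str                 # e.g., "Kitchen"; a container for Sets
    sets: List[Set] = field(default_factory=list)

@dataclass
class Area:
    name: str                 # e.g., "Downstairs"; two or more Rooms
    rooms: List[Room] = field(default_factory=list)

@dataclass
class Home:
    name: str                 # the entire home and all Sets therein
    areas: List[Area] = field(default_factory=list)

    def all_sets(self) -> List[Set]:
        """Flatten the hierarchy: a Home refers to all Sets it contains."""
        return [s for a in self.areas for r in a.rooms for s in r.sets]

home = Home(
    name="My Home",
    areas=[Area("Downstairs",
                rooms=[Room("Kitchen",
                            sets=[Set("Kitchen Island", ["110b"])])])],
)
```

Because every level carries a human-readable name, a GUI or VUI could address any node in the hierarchy by name (e.g., "play jazz Downstairs").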

Example systems and devices

Fig. 2A is a front isometric view of a playback device 210 configured in accordance with aspects of the disclosed technology. Fig. 2B is a front isometric view of the playback device 210 without the grille 216e. Fig. 2C is an exploded view of the playback device 210. Referring collectively to fig. 2A-2C, the playback device 210 includes a housing 216, the housing 216 including an upper portion 216a, a right or first side portion 216b, a lower portion 216c, a left or second side portion 216d, a grille 216e, and a rear portion 216f. A plurality of fasteners 216g (e.g., one or more screws, rivets, clips) attach a frame 216h to the housing 216. A cavity 216j (fig. 2C) in the housing 216 is configured to receive the frame 216h and electronics 212. The frame 216h is configured to carry a plurality of transducers 214 (identified in fig. 2B as transducers 214a-214f, respectively). The electronics 212 (e.g., the electronics 112 of fig. 1C) are configured to receive audio content from an audio source and send electrical signals corresponding to the audio content to the transducers 214 for playback.

The transducers 214 are configured to receive electrical signals from the electronics 212 and are further configured to convert the received electrical signals into audible sound during playback. For example, the transducers 214a-214c (e.g., tweeters) may be configured to output high frequency sound (e.g., sound waves having a frequency greater than about 2 kHz). The transducers 214d-214f (e.g., mid-woofers, woofers, midrange speakers) may be configured to output sound at frequencies lower than those of the transducers 214a-214c (e.g., sound waves having a frequency below about 2 kHz). In some embodiments, the playback device 210 includes a number of transducers different from that shown in fig. 2A-2C. For example, the playback device 210 may include fewer than six transducers (e.g., one, two, three). In other embodiments, however, the playback device 210 includes more than six transducers (e.g., nine, ten). Moreover, in some embodiments, all or a portion of the transducers 214 are configured to operate as a phased array to desirably adjust (e.g., narrow or widen) the radiation pattern of the transducers 214, thereby altering a user's perception of the sound emitted from the playback device 210.

In the illustrated embodiment of fig. 2A-2C, a filter 216i is axially aligned with the transducer 214b. The filter 216i may be configured to desirably attenuate a predetermined frequency range output by the transducer 214b to improve the sound quality and the perceived sound level output collectively by the transducers 214. In some embodiments, however, the playback device 210 omits the filter 216i. In other embodiments, the playback device 210 includes one or more additional filters aligned with the transducer 214b and/or at least another of the transducers 214.

Fig. 3A and 3B are front and right isometric side views, respectively, of an NMD 320 configured in accordance with embodiments of the disclosed technology. Fig. 3C is an exploded view of the NMD 320. Fig. 3D is an enlarged view of a portion of fig. 3B including a user interface 313 of the NMD 320. Referring first to fig. 3A-3C, the NMD 320 includes a housing 316, the housing 316 comprising an upper portion 316a, a lower portion 316b, and an intermediate portion 316c (e.g., a grille). A plurality of ports, holes, or apertures 316d in the upper portion 316a allow sound to pass through to one or more microphones 315 (fig. 3C) positioned within the housing 316. The one or more microphones 315 are configured to receive sound via the apertures 316d and to produce electrical signals based on the received sound. In the illustrated embodiment, a frame 316e (fig. 3C) of the housing 316 surrounds cavities 316f and 316g configured to house, respectively, a first transducer 314a (e.g., a tweeter) and a second transducer 314b (e.g., a midrange speaker, a woofer). In other embodiments, however, the NMD 320 includes a single transducer, or more than two (e.g., two, five, six) transducers. In certain embodiments, the NMD 320 omits the transducers 314a and 314b altogether.

The electronic device 312 (fig. 3C) includes components configured to drive the transducers 314a and 314b and further configured to analyze audio data corresponding to electrical signals produced by one or more microphones 315. For example, in some embodiments, the electronic device 312 includes many or all of the components of the electronic device 112 described above with reference to fig. 1C. In certain embodiments, the electronic device 312 includes the components described above with reference to fig. 1F, e.g., the one or more processors 112a, memory 112b, software components 112c, network interface 112d, and so forth. In some embodiments, the electronic device 312 includes additional suitable components (e.g., a proximity sensor or other sensors).

Referring to fig. 3D, the user interface 313 includes a plurality of control surfaces (e.g., buttons, knobs, capacitive surfaces) including a first control surface 313a (e.g., a previous control), a second control surface 313b (e.g., a next control), and a third control surface 313c (e.g., a play and/or pause control). A fourth control surface 313d is configured to receive touch input corresponding to activation and deactivation of the one or more microphones 315. A first indicator 313e (e.g., one or more Light Emitting Diodes (LEDs) or another suitable illuminator) may be configured to illuminate only when the one or more microphones 315 are activated. A second indicator 313f (e.g., one or more LEDs) may be configured to remain solid during normal operation and to blink or otherwise change from solid to indicate detection of voice activity. In some embodiments, the user interface 313 includes additional or fewer control surfaces and illuminators. In one embodiment, for example, the user interface 313 includes the first indicator 313e and omits the second indicator 313f. Moreover, in certain embodiments, the NMD 320 comprises a playback device and a control device, and the user interface 313 comprises the user interface of the control device.

Referring to fig. 3A-3D together, the NMD 320 is configured to receive voice commands from one or more adjacent users via the one or more microphones 315. As described above with reference to fig. 1B, the one or more microphones 315 can capture or record sound in a vicinity (e.g., a region within 10 m or less of the NMD 320) and send electrical signals corresponding to the recorded sound to the electronics 312. The electronics 312 may process the electrical signals and may analyze the resulting audio data to determine the presence of one or more voice commands (e.g., one or more activation words). In some embodiments, for example, after detecting one or more suitable voice commands, the NMD 320 is configured to send a portion of the recorded audio data to another device and/or a remote server (e.g., one or more of the computing devices 106 of fig. 1B) for further analysis. The remote server may analyze the audio data, determine an appropriate action based on the voice command, and send a message to the NMD 320 to perform the appropriate action. For example, a user may say "Sonos, play Michael Jackson." The NMD 320 may record the user's voice utterance via the one or more microphones 315, determine the presence of a voice command, and send the audio data with the voice command to a remote server (e.g., one or more of the remote computing devices 106 of fig. 1B, one or more servers of a VAS and/or another suitable service). The remote server may analyze the audio data and determine an action corresponding to the command. The remote server may then send a command to the NMD 320 to perform the determined action (e.g., play back audio content related to Michael Jackson). The NMD 320 can receive the command and play back the audio content related to Michael Jackson from a media content source. As described above with reference to fig. 1B, suitable content sources can include devices or storage communicatively coupled to the NMD 320 via a LAN (e.g., the network 104 of fig. 1B), a remote server (e.g., one or more of the remote computing devices 106 of fig. 1B), or the like. In certain embodiments, however, the NMD 320 determines and/or performs one or more actions corresponding to the one or more voice commands without intervention or involvement of an external device, computer, or server.
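The voice-command flow above (detect an activation word locally, forward the captured audio to a remote service, receive an action back) can be sketched as a small pipeline. All function names here are hypothetical; the patent does not prescribe an API, and the stand-in for the remote VAS is an assumption for illustration.

```python
# Hedged sketch of the NMD voice-command flow: check locally for an
# activation word, then hand the captured audio to a remote analyzer that
# returns the action to perform.

ACTIVATION_WORDS = ("sonos",)   # assumed activation word from the example

def contains_activation_word(transcript: str) -> bool:
    """Local detection step: is an activation word present?"""
    return any(word in transcript.lower() for word in ACTIVATION_WORDS)

def handle_utterance(transcript: str, audio_bytes: bytes, remote_analyze):
    """If an activation word is present, forward the recorded audio for
    remote analysis and return the determined action; otherwise do nothing."""
    if not contains_activation_word(transcript):
        return None
    return remote_analyze(audio_bytes)

# Stand-in for the remote VAS that would analyze the audio data:
fake_vas = lambda audio: {"action": "play", "query": "Michael Jackson"}
action = handle_utterance("Sonos, play Michael Jackson", b"...", fake_vas)
```

The final sentence of the paragraph corresponds to replacing `remote_analyze` with a purely local analyzer, with no external server involved.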

Fig. 4A-4D are schematic diagrams of a control device 430 (e.g., the control device 130a of fig. 1H, a smartphone, a tablet, a dedicated control device, an IoT device, and/or another suitable device) showing corresponding user interface displays in various states of operation. A first user interface display 431a (fig. 4A) includes a display name 433a (i.e., "Rooms"). A selected group region 433b displays audio content information (e.g., artist name, track name, album art) of the audio content played back in the selected group and/or zone. Group regions 433c and 433d display the corresponding group and/or zone names, and audio content information of audio content played back or played back next in the playback queue of the respective group or zone. An audio content region 433e includes information related to audio content in the selected group and/or zone (i.e., the group and/or zone indicated in the selected group region 433b). A lower display region 433f is configured to receive touch input to display one or more other user interface displays. For example, if a user selects "Browse" in the lower display region 433f, the control device 430 can be configured to output a second user interface display 431b (fig. 4B) comprising a plurality of music services 433g (e.g., Spotify, TuneIn Radio, Apple Music, Pandora, Amazon, TV, local music, line-in) through which the user can browse media content and from which the user can select media content for playback via one or more playback devices (e.g., one of the playback devices 110 of fig. 1A). Alternatively, if the user selects "My Sonos" in the lower display region 433f, the control device 430 can be configured to output a third user interface display 431c (fig. 4C). A first media content region 433h may include graphical representations (e.g., album art) corresponding to individual albums, stations, or playlists. A second media content region 433i may include graphical representations (e.g., album art) corresponding to individual songs, tracks, or other media content. If the user selects a graphical representation 433j (fig. 4C), the control device 430 can be configured to begin playback of audio content corresponding to the graphical representation 433j and output a fourth user interface display 431d, which includes an enlarged version of the graphical representation 433j, media content information 433k (e.g., track name, artist, album), transport controls 433m (e.g., play, previous, next, pause, volume), and an indication 433n of the currently selected group and/or zone name.

Fig. 5 is a schematic diagram of a control device 530 (e.g., a laptop computer, a desktop computer). The control device 530 includes a transducer 534, a microphone 535, and a camera 536. A user interface 531 includes a transport control region 533a, a playback zone region 533b, a playback status region 533c, a playback queue region 533d, and a media content source region 533e. The transport control region 533a includes one or more controls for controlling media playback including, for example, volume, previous, play/pause, next, repeat, shuffle, track position, crossfade, balance, and the like. The media content source region 533e includes a listing of one or more media content sources from which a user may select media items for playback and/or adding to a playback queue.

The playback zone region 533b may include representations of playback zones within the media playback system 100 (fig. 1A and 1B). In some embodiments, the graphical representations of playback zones may be selectable to bring up additional selectable icons to manage or configure the playback zones in the media playback system, e.g., creation of bound zones, creation of zone groups, separation of zone groups, renaming of zone groups, etc. In the illustrated embodiment, a "group" icon may be provided within each graphical representation of a playback zone. The "group" icon provided within a graphical representation of a particular zone may be selectable to bring up options to select one or more other zones in the media playback system to be grouped with the particular zone. Once grouped, playback devices in the zones that have been grouped with the particular zone may be configured to play audio content in synchrony with the playback device(s) in the particular zone. Analogously, a "group" icon may be provided within a graphical representation of a zone group. In the illustrated embodiment, the "group" icon may be selectable to bring up options to deselect one or more zones in the zone group to be removed from the zone group. In some embodiments, the control device 530 includes other interactions and implementations for grouping and ungrouping zones via the user interface 531. In some embodiments, the representations of playback zones in the playback zone region 533b may be dynamically updated as playback zone or zone group configurations are modified.

The playback status region 533c includes graphical representations of audio content that is presently being played, previously played, or scheduled to play next in the selected playback zone or zone group. The selected playback zone or zone group may be visually distinguished on the user interface (e.g., within the playback zone region 533b and/or the playback queue region 533d). The graphical representations may include track title, artist name, album name, album year, track length, and other relevant information that may be useful for the user to know when controlling the media playback system via the user interface 531.

The playback queue region 533d may include graphical representations of audio content in a playback queue associated with the selected playback zone or zone group. In some embodiments, each playback zone or zone group may be associated with a playback queue containing information corresponding to zero or more audio items for playback by that playback zone or zone group. For instance, each audio item in the playback queue may comprise a Uniform Resource Identifier (URI), a Uniform Resource Locator (URL), or some other identifier that may be used by a playback device in the playback zone or zone group to find and/or retrieve the audio item from a local audio content source or a networked audio content source, which may then be played back by the playback device. In some embodiments, for example, a playlist may be added to the playback queue, in which case information corresponding to each audio item in the playlist may be added to the playback queue. In some embodiments, audio items in the playback queue may be saved as a playlist. In some embodiments, the playback queue may be empty, or populated but "not in use," when the playback zone or zone group is playing continuously streaming audio content (e.g., internet radio, which may continue to play until stopped) rather than discrete audio items having playback durations. In some embodiments, the playback queue may include internet radio and/or other streaming audio content items and be "in use" when the playback zone or zone group is playing those items.

When playback zones or zone groups are "grouped" or "ungrouped," playback queues associated with the affected playback zones or zone groups may be cleared or re-associated. For example, if a first playback zone including a first playback queue is grouped with a second playback zone including a second playback queue, the established zone group may have an associated playback queue that is initially empty, contains audio items from the first playback queue (e.g., if the second playback zone was added to the first playback zone), contains audio items from the second playback queue (e.g., if the first playback zone was added to the second playback zone), or contains a combination of audio items from both the first and second playback queues. Subsequently, if the established zone group is ungrouped, the resulting first playback zone may be re-associated with the previous first playback queue, or associated with a new playback queue that is empty or contains audio items from the playback queue associated with the established zone group before the established zone group was ungrouped. Similarly, the resulting second playback zone may be re-associated with the previous second playback queue, or associated with a new playback queue that is empty or contains audio items from the playback queue associated with the established zone group before the established zone group was ungrouped.
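The queue-association rules described above admit several alternatives for which queue the new group inherits. The sketch below picks one of them for illustration (the group inherits the queue of the zone being joined, and each zone re-associates with its previous queue on ungrouping); the function names and data shapes are assumptions.

```python
# Sketch of playback-queue handling when two zones are grouped/ungrouped.

def group_zones(first_zone, second_zone, zone_queues):
    """Group second_zone into first_zone. Here the established group
    inherits the first zone's queue (one of the alternatives in the text).
    Returns (group_key, group_queue)."""
    group_key = (first_zone, second_zone)
    group_queue = list(zone_queues[first_zone])   # copy, don't alias
    return group_key, group_queue

def ungroup(group_key, zone_queues):
    """On ungrouping, each resulting zone re-associates with its previous
    queue (again, one of the listed alternatives)."""
    first_zone, second_zone = group_key
    return zone_queues[first_zone], zone_queues[second_zone]

queues = {"Dining Room": ["track1", "track2"], "Kitchen": ["track3"]}
key, group_queue = group_zones("Dining Room", "Kitchen", queues)
first_q, second_q = ungroup(key, queues)
```

The other alternatives in the text (empty queue, the second zone's queue, or a merged queue) would only change the body of `group_zones`.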

Fig. 6 is a message flow diagram illustrating the exchange of data between devices of the media playback system 100 (fig. 1A-1M).

At step 650a, the media playback system 100 receives an indication of selected media content (e.g., one or more songs, albums, playlists, podcasts, videos, stations) via the control device 130a. The selected media content may comprise, for example, media items stored locally on one or more devices connected to the media playback system (e.g., the audio source 105 of fig. 1C) and/or media items stored on one or more media service servers (e.g., one or more of the remote computing devices 106 of fig. 1B). In response to receiving the indication of the selected media content, the control device 130a sends a message 651a to the playback device 110a (fig. 1A-1C) to add the selected media content to a playback queue on the playback device 110a.

At step 650b, playback device 110a receives message 651a and adds the selected media content to the playback queue for playback.

At step 650c, the control device 130a receives an input corresponding to a command to play back the selected media content. In response to receiving an input corresponding to a command to play back the selected media content, control device 130a sends a message 651b to playback device 110a, causing playback device 110a to play back the selected media content. In response to receiving message 651b, playback device 110a sends message 651c to computing device 106a to request the selected media content. In response to receiving the message 651c, the computing device 106a sends a message 651d, which message 651d includes data (e.g., audio data, video data, URLs, URIs) corresponding to the requested media content.

At step 650d, the playback device 110a receives the message 651d with data corresponding to the requested media content and plays back the associated media content.

At step 650e, playback device 110a optionally causes one or more other devices to play back the selected media content. In one example, the playback device 110a is part of a bonded zone of two or more players (fig. 1M). The playback device 110a may receive the selected media content and send all or a portion of the media content to other devices in the bonded zone. In another example, the playback device 110a is a coordinator of a group and is configured to transmit and receive timing information from one or more other devices in the group. The other one or more devices in the group may receive the selected media content from the computing device 106a and begin playback of the selected media content in response to a message from the playback device 110a, such that all devices in the group play back the selected media content synchronously.

Example synchronization techniques

Example synchronization techniques involve a group coordinator providing audio content and timing information to one or more group members to facilitate synchronized playback between the group coordinator and the group members. In some embodiments, at least some aspects of the subject technology arise from the technical structure and organization of audio information, playback timing, and clock timing information used by playback devices to play audio content from an audio source in synchronization with each other, including: how different playback devices generate playback timing based on clock timing (either local clock timing or remote clock timing), and how audio content is played based on both playback timing (either locally generated or remotely generated) and clock timing (either locally generated or remotely generated). Thus, to facilitate understanding of certain aspects of the disclosed technology, certain technical details of the audio information, playback timing and clock timing information, and how playback devices generate and/or use playback timing and clock timing to play audio content in different configurations are described below.

a.Audio content

The audio content may be any type of audio content now known or later developed. For example, in some embodiments, the audio content includes any one or more of: (i) streaming music or other audio obtained from a streaming media service (e.g., Spotify, Pandora, or other streaming media service); (ii) streaming music or other audio from a local music library (e.g., a music library stored on a user's laptop, desktop, smartphone, tablet, home server, or other computing device now known or later developed); (iii) audio content associated with video content, e.g., audio associated with a television program or movie received from a television, a set-top box, a digital video recorder, a digital video disc player, a streaming video service, or any other source of now known or later developed audiovisual media content; (iv) text-to-speech or other audible content from a Voice Assistant Service (VAS) (e.g., Amazon Alexa or other VAS services now known or later developed); (v) audio content from a doorbell or intercom system (e.g., Nest, Ring, or other doorbell or intercom systems now known or later developed); and/or (vi) audio content from a telephone, video phone, video/teleconferencing system, or other application configured to allow users to communicate with each other via audio and/or video.

In operation, a "source" playback device obtains any of the above-described types of audio content from an audio source via an interface on the playback device (e.g., one of the network interfaces of the source playback device, a "line-in" analog interface, a digital audio interface, or any other interface suitable for receiving audio content in a digital or analog format now known or later developed).

An audio source is any system, device, or application that generates, provides, or otherwise makes any of the aforementioned audio content available to a playback device. For example, in some embodiments, the audio sources include any one or more of streaming media (audio, video) services, digital media servers or other computing systems, VAS services, televisions, cable set-top boxes, streaming media players (e.g., AppleTV, Roku, game consoles), CD/DVD players, doorbells, walkie-talkies, telephones, tablets, or any other source of digital audio content.

Playback devices that receive audio content from an audio source or otherwise obtain audio content from an audio source for playback and/or distribution to other playback devices are sometimes referred to herein as "source" playback devices, "master" playback devices, or "group coordinators." One function of a "source" playback device is to process received audio content for playback and/or distribution to other playback devices. In some embodiments, the source playback device sends the processed audio content to all playback devices configured to play the audio content. In some embodiments, the source playback device sends the processed audio content to a multicast network address, and all other playback devices configured to play the audio content receive the audio content via the multicast address. In some embodiments, the source playback device alternatively transmits the processed audio content to each unicast network address of each other playback device configured to play the audio content, and each other playback device configured to play the audio content receives the audio content via its unicast address.

In some embodiments, a "source" playback device receives audio content from an audio source in digital form (e.g., as a stream of packets). In some embodiments, each packet in the stream of packets has a sequence number or other identifier that specifies the order of the packet. Packets sent over a data packet network (e.g., an ethernet, WiFi, or other packet network) may arrive out of order, so the source playback device reassembles the stream of packets in the correct order using sequence numbers or other identifiers before performing further packet processing. In some embodiments, a sequence number or other identifier specifying the order of packets is or at least includes a timestamp indicating the time at which the packet was created. The packet creation time may be used as a sequence number based on the following assumptions: the packets are created in the order in which they should be played out later.
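The reordering step described above amounts to sorting received packets by their sequence identifier (or by a creation timestamp used as one) before further processing. A minimal sketch, assuming a hypothetical representation of packets as `(sequence_number, payload)` pairs:

```python
def reassemble_in_order(packets):
    """Reorder packets that may have arrived out of order over the network.

    Each packet is a (sequence_number, payload) pair; the sequence number
    defines the order in which the payloads should later be played out.
    """
    return [payload for _, payload in sorted(packets, key=lambda p: p[0])]
```

For example, packets arriving as 2, 1, 3 are restored to payload order 1, 2, 3 before any further packet processing.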

In some embodiments, the source playback device does not change the sequence number or identifier of the received packet during packet processing. In some embodiments, the source playback device reorders at least a first set of packets in the stream of packets based on a sequence identifier of each packet, extracts audio content from the received packets, reassembles a bitstream of the audio content from the received packets, and then repackages the reassembled bitstream into a second set of packets, wherein the packets in the second set of packets have sequence numbers that are different from the sequence numbers of the packets in the first set of packets. In some embodiments, the individual packets in the second set of packets have a different length (i.e., shorter or longer) than the individual packets in the first set of packets. In some embodiments, reassembly of the bitstream from incoming packets, followed by subsequent repackaging of the reassembled bitstream into different sets of packets, facilitates uniform processing and/or transmission of audio content by the source playback device and other playback devices that receive the audio content from the source playback device. However, for some delay-sensitive audio content, reassembly and repackaging may be undesirable, and thus, in some embodiments, a source playback device may not perform reassembly and repackaging of some (or all) of the audio content it receives before playing the audio content and/or sending the audio content to other playback devices.
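The reassemble-then-repackage step can be sketched as follows, again assuming a hypothetical `(sequence_number, payload)` packet representation. Note that the second set of packets carries fresh sequence numbers and a different (here fixed) payload length, as described above:

```python
def repackage(first_set, out_len):
    """Reassemble the audio bitstream from a first set of packets, then
    repackage it into a second set of fixed-length packets.

    The first set is ordered by its sequence identifiers; the second set
    is numbered afresh, so its sequence numbers generally differ from
    those of the first set.
    """
    bitstream = b"".join(
        payload for _, payload in sorted(first_set, key=lambda p: p[0])
    )
    return [
        (seq, bitstream[i:i + out_len])
        for seq, i in enumerate(range(0, len(bitstream), out_len))
    ]
```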

In some embodiments, an audio source provides audio content in digital form to a source playback device, e.g., via a digital line input interface. In such embodiments, the source playback device packetizes the digital audio into packets of audio content prior to sending the audio content to the other playback devices. In some embodiments, individual packets of audio content include a sequence number or other identifier so that when other playback devices receive the audio content, those other playback devices will be able to reliably arrange the received packets in the correct order before performing further packet processing.

In some embodiments, an audio source provides audio content in analog form to a source playback device, e.g., via an analog line input interface. In such embodiments, the source playback device converts the received analog audio to digital audio and packetizes the digital audio into packets of audio content prior to sending the audio content to the other playback devices. In some embodiments, individual packets of audio content include a sequence number or other identifier so that when other playback devices receive the audio content, those other playback devices will be able to reliably arrange the received packets in the correct order before performing further packet processing.

After obtaining audio content from an audio source or another playback device, in some embodiments, the playback device performs one or more of the following: (i) playing audio content separately, (ii) playing content in synchronization with one or more additional playback devices, and/or (iii) sending audio content to one or more other playback devices.

b.Playback timing

The playback devices disclosed and described herein use playback timing to play audio content in synchronization with each other. Individual playback devices can generate playback timing and/or play back audio content according to playback timing based on the configuration of the playback devices in the media playback network. The source playback device that generates the playback timing of the audio content also transmits the generated playback timing to all playback devices configured to play the audio content. In some embodiments, the source playback device sends the playback timing to a multicast network address, and all other playback devices configured to play the audio content receive the playback timing via the multicast address. In some embodiments, the source playback device alternatively transmits playback timing to each unicast network address of each other playback device configured to play audio content, and each other playback device configured to play audio content receives playback timing via its unicast address.

In operation, the playback device (or computing device associated with the playback device) generates playback timing of the audio content based on clock timing (described below), which may be "local" clock timing (i.e., clock timing produced by the source playback device) or "remote" clock timing received from a different playback device (or other computing device).

In some embodiments, playback timing is generated for individual frames (or packets) of audio content. As described above, in some embodiments, the audio content is encapsulated in a series of frames (or packets), where each frame (or packet) comprises a portion of the audio content. In some embodiments, the timing of playback of the audio content includes the playback time of each frame (or packet) of the audio content. In some embodiments, the playback timing of individual frames (or packets) is included within the frame (or packet), e.g., in a header of the frame (or packet), in an extension header of the frame (or packet), and/or in a payload portion of the frame (or packet).

In some embodiments, the playback time of an individual frame (or packet) is identified within a timestamp or other indication. In such embodiments, the timestamp (or other indication) represents the time that the audio content was played within the individual frame (or packet). In operation, when the playback timing of an individual frame (or packet) is generated, the playback timing of the individual frame (or packet) is a future time with respect to a current clock time of the reference clock at the time of generating the playback timing of the individual frame (or packet). The reference clock may be a "local" clock at the playback device, or a "remote" clock at a separate network device (e.g., another playback device, a computing device, or another network device configured to provide clock timing for use by the playback device in generating playback timing and/or playing back audio content).
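As an illustrative sketch of the timestamp scheme above, assuming hypothetical names, millisecond clock times, and a fixed 10 ms frame duration (none of which are specified by the description): each frame is stamped with a playback time that lies in the future relative to the reference clock's current time at the moment the timing is generated.

```python
def generate_playback_timing(frames, clock_now_ms, advance_ms):
    """Attach a future playback time (a timestamp) to each frame.

    clock_now_ms is the reference clock's current time when the timing
    is generated, so every stamped playback time is in the future.
    Successive frames are spaced 10 ms apart (illustrative only).
    """
    return [
        {"seq": seq, "payload": frame,
         "play_at_ms": clock_now_ms + advance_ms + seq * 10}
        for seq, frame in enumerate(frames)
    ]
```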

In operation, the playback device responsible for playing particular audio content plays the portion of that audio content within an individual frame (or packet) at the playback time specified by the playback timing for that individual frame (or packet), adjusted to accommodate clock differences among the source playback device, the device providing the clock timing, and the playback device responsible for playing the audio content, as described in more detail below.

c.Clock timing

The playback devices disclosed and described herein use clock timing to generate playback timing of audio content and play the audio content based on the generated playback timing. In some embodiments, the source playback device uses clock timing from a reference clock (e.g., a device clock, a digital audio converter clock, a playback time reference clock, or any other clock) to generate playback timing for audio content that the source playback device receives from an audio source. For individual playback devices, the reference clock may be a "local" clock at the playback device, or a "remote" clock at a separate network device (e.g., another playback device, a computing device, or another network device configured to provide clock timing for use by the playback device in generating playback timing and/or playing back audio content).

In some embodiments, all playback devices responsible for synchronously playing a particular audio content play back the particular audio content using the same clock timing as the reference clock. In some embodiments, the playback device uses the same clock timing to play the audio content that was used to generate the playback timing of the audio content.

In operation, the network device generating the clock timing also transmits the clock timing to all playback devices in the network that need to use the clock timing to generate playback timing and/or playback audio content. In some embodiments, the network device generating the clock timing transmits the clock timing to a multicast network address and all other playback devices configured to generate playback timing and/or play audio content receive the clock timing via the multicast address. In some embodiments, the network device alternatively transmits the clock timing to each unicast network address of each other playback device configured to play the audio content, and each other playback device configured to play the audio content receives the clock timing via its unicast address.

d.Generating playback timing using clock timing of a local clock

In some embodiments, the source playback device (i) generates playback timing of the audio content based on clock timing from a local clock at the source playback device, and (ii) transmits the generated playback timing to all other playback devices configured to play the audio content. In operation, when generating playback timing for an individual frame (or packet), the "source" playback device adds a "timing advance" to the current clock time of the local clock of the source playback device, which the source playback device is using to generate playback timing.

In some embodiments, "timing advance" is based on an amount of time that is greater than or equal to the sum of: (i) the network transmission time required for frames and/or packets including audio content sent from the source playback device to reach all other playback devices configured to play audio content synchronously using the playback timing, and (ii) the amount of time required for all other playback devices configured to play back synchronously using the playback timing to process frames/packets received from the source playback device for playback.

In some embodiments, the source playback device determines the timing advance by sending one or more test packets to one or more (or possibly all) of the other playback devices configured to play the audio content being sent by the source device, and then receiving test response packets returned from those one or more other playback devices. In some embodiments, the source playback device and one or more other playback devices negotiate the timing advance via a plurality of test and response messages. In some embodiments with more than two additional playback devices, the source playback device determines the timing advance by exchanging test and response messages with all playback devices and then setting a timing advance sufficient for the playback device having the longest total network transmission time and packet processing time.
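The worst-case calculation described above can be sketched as follows. The names, the per-device measurement tuples, and the safety margin are hypothetical; in practice the measurements would come from the test/response packet exchanges.

```python
def determine_timing_advance(measurements, margin_ms=5):
    """Pick a timing advance that covers the worst-case receiving device.

    measurements maps each receiving device to its measured
    (network_transit_ms, processing_ms) pair; the timing advance must be
    at least the largest sum of transit and processing time, plus an
    optional safety margin.
    """
    worst = max(transit + processing
                for transit, processing in measurements.values())
    return worst + margin_ms
```

For example, with a device at 8 + 4 ms and another at 12 + 6 ms, the advance is set by the slower device (18 ms) plus the margin.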

In some embodiments, the timing advance is less than about 50 milliseconds. In some embodiments, the timing advance is less than about 20-30 milliseconds. And in other embodiments, the timing advance is less than about 10 milliseconds. In some embodiments, the timing advance remains constant after being determined. In other embodiments, the playback device that generates the playback timing may change the timing advance in response to a request from a receiving device indicating that a greater timing advance is needed (e.g., because the receiving device did not receive packets including a portion of the audio content until after other devices had played that portion) or that a shorter timing advance is sufficient (e.g., because the receiving device is buffering more packets including portions of the audio content than is necessary to provide consistent, reliable playback).

As described in more detail below, all playback devices configured to play audio content synchronously will use playback timing and clock timing to play audio content synchronously with each other.

f.Playing audio content using local playback timing and local clock timing

In some embodiments, the source playback device is configured to play audio content in synchronization with one or more other playback devices. And, if the source playback device is generating playback timing using clock timing from a local clock at the source playback device, the source playback device will play the audio content using the locally generated playback timing and the locally generated clock timing. In operation, when the local clock used by the source playback device to generate playback timing reaches a time specified in the playback timing of an individual frame (or packet), the source playback device plays that individual frame (or packet) that includes the portion of the audio content.

For example, recall that when playback timing of an individual frame (or packet) is generated, the source playback device adds a "timing advance" to the current clock time of the reference clock used to generate the playback timing. In this case, the reference clock used to generate playback timing is a local clock at the source playback device. Thus, if the timing advance of an individual frame is, for example, 30 milliseconds, the source playback device plays the portion (e.g., sample or set of samples) of the audio content in the individual frame (or packet) 30 milliseconds after creating the playback timing for that individual frame (or packet).

In this manner, the source playback device plays the audio content using locally generated playback timing and clock timing from the local reference clock. As described further below, by playing portions of the audio content of an individual frame and/or packet when the clock time of the local reference clock reaches the playback timing of the individual frame or packet, the source playback device plays portions of the audio content in the individual frame and/or packet in synchronization with other playback devices.
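A minimal sketch of this local/local playback rule, assuming a hypothetical representation of stamped frames as dicts carrying a `play_at_ms` timestamp: a frame is played once the local reference clock (the same clock that generated the timing) reaches that frame's playback time.

```python
def frames_due(stamped_frames, local_clock_ms):
    """Return payloads of frames whose playback time has been reached.

    In the local/local case, local_clock_ms comes from the very clock
    that was used to generate the playback timing, so no timing offset
    adjustment is needed before comparing.
    """
    return [f["payload"] for f in stamped_frames
            if f["play_at_ms"] <= local_clock_ms]
```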

i.Playing audio content using remote playback timing and remote clock timing

Recall that in some embodiments, a source playback device sends audio content and playback timing of the audio content to one or more other playback devices. And further recall that in some embodiments, the network device providing the clock timing may be a different device than the source playback device. A playback device that receives audio content, playback timing, and clock timing from another playback device is configured to play back the audio content using playback timing from the source playback device (i.e., remote playback timing) and clock timing from a clock at the other playback device (i.e., remote clock timing). In this manner, in this example, the receiving playback device uses the remote playback timing and the remote clock timing to play the audio content.

To play individual frames (or packets) of audio content in synchronization with each other playback device responsible for playing the audio content, the receiving playback device (i) receives frames (or packets) that include portions of the audio content from the source playback device, (ii) receives playback timing of the audio content from the source playback device (e.g., in the headers of the frames and/or packets that include the portions of the audio content, or possibly separate from those frames and/or packets), (iii) receives clock timing from another network device (e.g., another playback device, a computing device, or another network device configured to provide clock timing for use by the playback device in generating playback timing and/or playing back the audio content), and (iv) plays the portion of the audio content in an individual frame (or packet) when the local clock that the receiving playback device uses for playback reaches the playback time specified in the playback timing for that individual frame (or packet) received from the source playback device, as adjusted by a "timing offset."

In operation, after a receiving playback device receives clock timing from another network device, the receiving device determines a "timing offset" for the receiving playback device. The "timing offset" comprises (or at least corresponds to) the difference between a "reference" clock at the network device used by the network device to generate the clock timing and a "local" clock at the receiving playback device used by the receiving playback device to play the audio content. In operation, each playback device that receives clock timing from another network device calculates its own "timing offset" based on the difference between its local clock and the clock timing, and thus each playback device's determined "timing offset" is specific to that particular playback device.

In some embodiments, when playing back audio content, the receiving playback device generates new playback timings (specific to the receiving playback device) for the various frames (or packets) of audio content by adding a previously determined "timing offset" to the playback timing of each frame (or packet) received from the source playback device. In this way, the receiving playback device translates the playback timing of the audio content received from the source playback device to the "local" playback timing of the receiving playback device. Because each receiving playback device calculates its own "timing offset," the "local" playback timing of the individual frames determined by each receiving playback device is specific to that particular playback device.
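The offset computation and timing translation described in the last two paragraphs can be sketched as follows. The names and millisecond units are hypothetical, and the sign convention (local clock minus reference clock) is one reasonable choice that makes "add the offset" translate remote playback times into local ones.

```python
def timing_offset(reference_clock_ms, local_clock_ms):
    """Difference between the remote reference clock (which generated the
    clock timing) and this receiving device's local clock, sampled at
    the same instant."""
    return local_clock_ms - reference_clock_ms


def to_local_playback_time(received_play_at_ms, offset_ms):
    """Translate playback timing received from the source playback device
    into this device's own "local" playback timing by adding the
    previously determined timing offset."""
    return received_play_at_ms + offset_ms
```

Because each receiving device computes its own offset against its own local clock, the resulting "local" playback times differ per device even though all devices play the corresponding audio at the same moment.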

And when the "local" clock used by the receiving playback device to playback the audio content reaches the "local" playback time of an individual frame (or packet), the receiving playback device plays the audio content (or portion thereof) associated with that individual frame (or packet). As described above, in some embodiments, the playback timing of a particular frame (or packet) is in the header of that frame (or packet). In other embodiments, the playback timing of individual frames (or packets) is transmitted separately from the frames (or packets) comprising the audio content.

Because the receiving playback device plays frames (or packets) that include portions of the audio content according to the playback timing as adjusted by the "timing offset" relative to the clock timing, and because the source playback device generated the playback timing of those frames (or packets) relative to the clock timing, and played the same frames (or packets) that include portions of the audio content according to the playback timing and its determined "timing offset", the receiving playback device and the source playback device play the same frames (or packets) that include the same portions of the audio content synchronously (i.e., at the same time or substantially at the same time).

Additional details regarding audio playback synchronization between playback devices and/or zones may be found, for example, in U.S. patent No.8,234,395 entitled "System and method for synchronizing operations among a plurality of independently clocked digital data processing devices," the entire contents of which are incorporated herein by reference.

V. Example portable playback devices

As described above, a particular playback device implementation may be configured for portable use. These portable implementations include wearable playback devices (e.g., headphones and earpieces) that are typically designed for personalized listening by one user at a time, and portable devices that are designed for loud playback. Fig. 7A is a partial cut-away view of a media playback system 100 that includes one or more portable playback devices 710 (identified as portable playback devices 710a, 710b, and 710c, respectively). Portable playback device 710 is similar to playback device 110, but is configured for portable use. Although they are shown at home in fig. 7A, the portable playback device 710 is configured to play back audio content at home and while "out".

As shown in the block diagram of fig. 7B, portable playback device 710a includes the same or similar components as playback device 110a. However, for ease of portable use, the playback device 710a may be implemented in a wearable or handheld form factor (e.g., headphones or earpieces) and include one or more batteries 712i to provide portable power.

Referring to fig. 7B, portable playback device 710a includes input/output 711, which may include analog I/O711 a and/or digital I/O711B similar to the components of playback device 110. To facilitate portable use, the input/output 711 of the portable playback device 710a can include an interface (e.g., a bluetooth interface) to facilitate a connection with a bridge device (e.g., a mobile device) that the portable playback device 710a can use to stream audio content and otherwise communicate with the bridge device.

The playback device 710a also includes electronics 712, a user interface 713 (e.g., one or more buttons, knobs, dials, touch-sensitive surfaces, displays, touch screens), and one or more transducers 714 (hereinafter "transducers 714"). The electronics 712 are configured to receive audio from an audio source via the input/output 711 and/or from one or more of the computing devices 106a-106c via the network 104 (fig. 1B), amplify the received audio, and output the amplified audio for playback via the one or more transducers 714.

In some embodiments, the playback device 710a optionally includes one or more microphones 715 (e.g., single microphone, multiple microphones, microphone array) (hereinafter referred to as "microphone 715"). In some examples, the microphone 715 may include one or more voice microphones to facilitate voice input for phone calls and the like. In some embodiments, for example, the playback device 710a may operate as an NMD (similar to NMD 120 of fig. 1F) configured to receive voice input from a user using a voice microphone and perform one or more operations accordingly based on the received voice input. In other examples, the microphone 715 may include one or more Acoustic Noise Cancellation (ANC) microphones that, in operation, capture ambient noise in the environment to facilitate the playback device 710a in canceling the ambient noise.

In the embodiment shown in fig. 7B, the electronics 712 include one or more processors 712a (hereinafter "processor 712a"), a memory 712b, a software component 712c, a network interface 712d, one or more audio processing components 712g (hereinafter "audio component 712g"), one or more audio amplifiers 712h (hereinafter "amplifier 712h"), and a power source 712i (e.g., one or more power supplies, power cords, power sockets, batteries, inductor coils, Power over Ethernet (PoE) interfaces, and/or other suitable power sources). In some embodiments, the electronics 712 optionally include one or more other components 712j (e.g., one or more sensors, a video display, a touch screen).

Network interface 712d is configured to facilitate data transfer between playback device 710a and one or more other devices on a data network, such as link 103 and/or network 104 (fig. 1B). The network interface 712d is configured to transmit and receive data corresponding to media content (e.g., audio content, video content, text, photographs) and other signals (e.g., non-transitory signals) including digital packet data including an Internet Protocol (IP) based source address and/or an IP based destination address. The network interface 712d may parse the digital packet data so that the electronics 712 properly receive and process the data destined for the playback device 710a.

In the embodiment shown in fig. 7B, the network interface 712d includes one or more wireless interfaces 712e (hereinafter referred to as "wireless interfaces 712 e"). The wireless interface 712e (e.g., a suitable interface including one or more antennas) may be configured to wirelessly communicate with one or more other devices (e.g., one or more of the playback device 110, NMD 120, control device 130, other portable playback devices 710, and other devices disclosed herein (e.g., bridge devices)) communicatively coupled to the network 104 (fig. 1B) according to a suitable wireless communication protocol (e.g., WiFi, bluetooth, LTE). In some embodiments, the network interface 712d optionally includes a wired interface 712f (e.g., an interface or receptacle configured to receive a network cable such as an ethernet, USB-A, USB-C, and/or Thunderbolt cable), the wired interface 712f configured to communicate over wired connections with other devices according to a suitable wired communication protocol. In some embodiments, the electronic device 712 completely excludes the network interface 712d and sends and receives media content and/or other data via another communication path (e.g., input/output 711).

The audio component 712g is configured to process and/or filter data including media content received by the electronic device 712 (e.g., via the input/output 711 and/or the network interface 712d) to generate an output audio signal. In some embodiments, the audio processing component 712g includes, for example, one or more digital-to-analog converters (DACs), audio pre-processing components, audio enhancement components, Digital Signal Processors (DSPs), and/or other suitable audio processing components, modules, circuits, and/or the like. In certain embodiments, the one or more audio processing components 712g may include one or more subcomponents of the processor 712 a. In some embodiments, the electronic device 712 omits the audio processing component 712 g. In some aspects, for example, the processor 712a executes instructions stored on the memory 712b to perform audio processing operations to produce an output audio signal.

The amplifiers 712h are configured to receive and amplify the audio output signals generated by the audio processing components 712g and/or the processor 712a. The amplifiers 712h may include electronics and/or components configured to amplify the audio signals to a level sufficient for driving the one or more transducers 714. In some embodiments, for example, the amplifiers 712h include one or more switching or class-D power amplifiers. However, in other embodiments, the amplifiers include one or more other types of power amplifiers (e.g., linear gain power amplifiers, class-A amplifiers, class-B amplifiers, class-AB amplifiers, class-C amplifiers, class-D amplifiers, class-E amplifiers, class-F amplifiers, class-G and/or class-H amplifiers, and/or other suitable types of power amplifiers). In certain embodiments, the amplifiers 712h comprise a suitable combination of two or more of the foregoing types of power amplifiers. Further, in some embodiments, individual ones of the amplifiers 712h correspond to individual ones of the transducers 714. However, in other embodiments, the electronic device 712 includes a single amplifier 712h configured to output amplified audio signals to a plurality of the transducers 714.

The transducers 714 (e.g., one or more speakers and/or speaker drivers) receive the amplified audio signals from the amplifiers 712h and render or output the amplified audio signals as sound (e.g., audible sound waves having a frequency between approximately 20 hertz (Hz) and 20 kilohertz (kHz)). In some embodiments, the transducers 714 may comprise a single transducer. However, in other embodiments, the transducers 714 comprise a plurality of audio transducers. In some embodiments, the transducers 714 include more than one type of transducer. For example, the transducers 714 may include one or more low-frequency transducers (e.g., subwoofers, woofers), mid-range frequency transducers (e.g., mid-range transducers, mid-woofers), and one or more high-frequency transducers (e.g., one or more tweeters).

Fig. 7C is a front isometric view of a portable playback device 710a configured in accordance with aspects of the disclosed technology. As shown in fig. 7C, the portable playback device 710a is implemented as headphones to facilitate more private playback as compared to loud playback by the playback devices 110. As shown, the portable playback device 710a (also referred to as headphones 710a) includes a housing 716a that supports a pair of transducers 714a on the user's head over or around the user's ears.

The headphones 710a also include a user interface 713a with a touch-sensitive area to facilitate playback controls, e.g., transport and/or volume controls. The touch-sensitive area of the user interface 713a may support gesture controls. For example, swiping forward or backward across the touch-sensitive area may skip forward or backward. Other gestures include touch-and-hold and continued touch-and-hold, which may correspond to various exchange and grouping functions, as described in further detail below. In some implementations, the user interface 713a can include a respective touch-sensitive area on the exterior of each ear cup.

Fig. 7D is a front isometric view of a portable playback device 710b configured in accordance with aspects of the disclosed technology. As shown in fig. 7D, the portable playback device 710b is implemented as earbuds which, like the headphones 710a, facilitate more private playback as compared to loud playback by the playback devices 110. As shown, the portable playback device 710b (also referred to as earbuds 710b) includes a housing 716b to support a pair of transducers 714b within the user's ears. The earbuds 710b also include a user interface 713b with a touch-sensitive area to facilitate playback controls, e.g., transport and/or volume controls. The earbuds 710b may take the form of wired, wireless, or true wireless earbuds.

Fig. 7E is a front isometric view of a portable playback device 710c. The portable playback device 710c includes one or more transducers larger than those of the headphones 710a and earbuds 710b to facilitate loud playback of audio content. A speaker grille 716a covers the transducers. The portable playback device 710c may include less powerful amplifiers and/or smaller transducers relative to the playback devices 110 to balance the battery life, sound output capability, and form factor (i.e., size, shape, and weight) of the portable playback device 710c. The portable playback device 710c includes a user interface 713c with a touch-sensitive area to facilitate playback controls, e.g., transport and/or volume controls.

Some portable playback devices 710 are configured to be placed on a device base 718. To illustrate, fig. 7F is a front isometric view of a portable playback device 710d, the portable playback device 710d being configured to be placed on a device base 718a. As with the portable playback device 710c, the portable playback device 710d includes one or more transducers larger than those of the headphones 710a and earbuds 710b to facilitate loud playback of audio content. A speaker grille 716b covers the transducers. The portable playback device 710d includes a user interface 713d with a touch-sensitive area to facilitate playback controls, e.g., transport and/or volume controls.

The device base 718a includes protrusions 719a and 719b that align with recesses 717a and 717b on the portable playback device 710d. Such protrusions and recesses may facilitate placement of the portable playback device 710d on the device base 718a and may improve the stability of the playback device when it is positioned on the device base 718a.

In an example implementation, the portable playback device 710d may be rotated about the device base 718a to control the volume of the portable playback device 710d. For example, the portable playback device 710d may be rotated relative to the device base 718a, which may generate a volume control signal in a sensor of the portable playback device 710d and/or the device base 718a. In another example, a first portion of the device base 718a may be rotatable relative to a second portion of the device base 718a. When the portable playback device 710d is placed on the device base 718a, rotating the first portion relative to the second portion generates a volume control signal in a sensor of the device base 718a that controls the volume of the portable playback device 710d.
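
The rotation-to-volume behavior described above might be modeled as follows. This is a minimal sketch, not the patented implementation; the function name, the 0-100 volume scale, and the degrees-per-step granularity are illustrative assumptions:

```python
def rotation_to_volume(current_volume, delta_degrees, degrees_per_step=15):
    """Map a rotation delta (in degrees, from the rotation sensor) to a new
    volume level on a 0-100 scale. A positive delta (e.g., clockwise) raises
    the volume; a negative delta lowers it. The result is clamped to the
    valid range. The 15-degrees-per-step granularity is an assumption."""
    steps = int(delta_degrees / degrees_per_step)
    return max(0, min(100, current_volume + steps))
```

For example, a 45-degree clockwise turn from volume 50 would yield volume 53 under these assumed parameters.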

The device base 718a includes a device charging system. When the playback device 710d is placed on the device base 718a, the playback device 710d may draw current from the charging system to charge one or more of its batteries. In some examples, the charging system of the device base 718a includes inductive charging circuitry (e.g., a coil that induces a current in a corresponding coil in the playback device 710d, which wirelessly charges one or more batteries of the playback device 710d). Alternatively, the charging system of the device base 718a includes conductive terminals through which the playback device 710d may draw current from the device base 718a.

In an example, the device base 718a carries an identifier that distinguishes the device base 718a from at least some other device bases (e.g., other device bases of the media playback system 100, or perhaps device bases more generally). In some implementations, when the playback device 710d is placed on the device base 718a, the device base 718a can passively communicate the identifier to the playback device 710d. For example, the charging circuitry of the device base 718a may produce a current or voltage signature (i.e., pattern) that is unique as compared with other device bases. The playback device 710d may use the unique signature to identify the device base 718a. Alternatively, the charging circuit may superimpose a signal on the current transmitted from the device base 718a (e.g., the current from the device base 718a may include a higher-frequency signal carrying an identifier of the device base 718a). In other examples, the device base 718a includes an RFID tag, QR code, or other identification component that is read by the playback device 710d when the playback device 710d is placed on the device base 718a.

In some implementations, each device base 718 of the media playback system 100 is associated with a zone. Placing a portable playback device 710 on a device base causes the portable playback device 710 to join the associated zone. Additional details regarding the device base may be found, for example, in U.S. Patent No. 9,544,701 entitled "Base Properties in a Media Playback System," which is hereby incorporated by reference in its entirety.

In some embodiments, the device base 718a includes a control system. The example control system of the device base 718a includes one or more processors and memory. The processor may be a clock-driven computational component that processes input data according to instructions stored in the memory. Example operations include communicating with the playback device 710d via a communications interface (e.g., a Bluetooth interface), for instance sending one or more instructions to cause the playback device 710d to join the associated zone, causing the charging system to provide current to the playback device 710d, and so on.

In an example embodiment, the playback device 710 may operate in one of a first mode and a second mode. Typically, the playback device 710 operates in a first mode when in physical proximity to the media playback system 100 (e.g., at home and connected to the network 104) to facilitate interoperability with the playback devices 110a-110n of the media playback system 100 and operates in a second mode when "out," although the playback device 710 may also operate in the second mode when in physical proximity to the media playback system 100. The portable playback device 710 may switch between modes either manually (e.g., via user input to the user interface 713) or automatically (e.g., based on proximity to one or more playback devices 110a-110n, connection to the network 104, and/or based on the location of the mobile device).
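
The mode selection described above can be sketched as a small decision function. This is a hedged illustration, not the claimed logic; the mode labels, parameter names, and the precedence of manual input over automatic selection are assumptions:

```python
def select_mode(connected_to_home_wlan, manual_override=None):
    """Choose between the first (home/WLAN, interoperable) mode and the
    second (Bluetooth, "out") mode. A manual override, mimicking user input
    on the user interface 713, takes precedence; otherwise the choice falls
    back to network connectivity, per the behavior described above."""
    if manual_override in ("first", "second"):
        return manual_override
    return "first" if connected_to_home_wlan else "second"
```

Note that, as the text observes, a user at home may still select the second mode manually (e.g., `select_mode(True, "second")`).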

The playback device 710 may operate in the first mode when connected to a wireless local area network (e.g., the network 104). Through the connection to the wireless local area network, the playback device 710 can stream audio content from one or more audio sources, including local and remote (e.g., cloud) network locations. Further, in the first mode, the portable playback device 710 may interface with other devices of the media playback system 100. For example, in the first mode, the portable playback device 710 may form a synchrony group or other arrangement with the playback devices 110a-110n and/or other portable playback devices 710. Further, in the first mode, the portable playback device 710 may be controlled by the control device 130 in the same or a similar manner as the playback devices 110.

When connected to a mobile device via Bluetooth (IEEE 802.15), the playback device 710 may operate in the second mode. In some aspects, in the second mode, the portable playback device operates similarly to a conventional Bluetooth speaker or wearable device. That is, the playback device 710 may be paired with a mobile device, such as a smartphone or tablet, and play back audio output from the mobile device. Similarly, the microphones 715 of the portable playback device 710 may provide audio input to the mobile device. As described above, this mode may be utilized while "out" to facilitate playback away from the media playback system 100 (e.g., outside the range of the home network). This mode may also be used in the vicinity of the media playback system 100, which may facilitate more private use of the portable playback device 710, or provide convenient access to content on the mobile device for playback.

Fig. 7G shows an example pairing arrangement between the headphones 710a and a mobile device configured as the control device 130a. As described above, the mobile device may become the control device 130a via installation of control application software, which may also provide a bridging feature to facilitate the control device 130a operating as an interface between the headphones 710a and the media playback system 100.

The control device 130a may include communication interfaces, processing capabilities, and/or other features that need not be implemented in the portable playback device 710a. By "pairing" the portable playback device 710a with the control device 130a, the portable playback device 710a is able to take advantage of some of these features. Such an arrangement may allow the portable playback device 710a to be smaller and more portable, consume less power, and/or be less expensive, among other possible benefits.

For example, in various implementations, the portable playback device 710a may be implemented with or without a communications interface to connect to the internet (e.g., a cellular data connection) when "out." By pairing the portable playback device 710a with the control device 130a via a personal area network (e.g., Bluetooth (IEEE 802.15)) or wireless local area network (IEEE 802.11) connection, the portable playback device 710a can stream music via the control device 130a over the paired connection. In embodiments that include a wireless local area network interface, the portable playback device 710a may connect directly to a wireless local area network (e.g., the network 104 (fig. 1B)) when available.

Similarly, in various embodiments, the portable playback device 710a may be implemented with or without a wireless local area network interface. By pairing the portable playback device 710a with the control device 130a via a personal area network (e.g., Bluetooth (IEEE 802.15)) connection, the portable playback device 710a can stream music via the internet connection of the control device 130a over the paired connection. In this example, the internet connection of the bridging device 860 may be via a wireless local area network with a gateway to the internet or via a cellular data connection.

In an example implementation, the control device 130a is bound or defaults to a particular playback device (e.g., the playback device 110c), a bonded zone of playback devices (e.g., the playback devices 110l and 110m), or a group of playback devices (e.g., a "kitchen + dining room" group). Alternatively, if a household hierarchy is used, the control device 130a may bind or default to a particular set, room, or area. In this configuration, control of the bound playback devices 110 via the NMD 120 or the control device 130 also controls the paired portable playback device 710a.

Alternatively, the control device 130a may itself form a zone or set. For example, in one instance, the control device 130a may be configured as an "Annie's Portable" zone or an "Annie's Headphones" set. Configuring the control device 130a as a zone or set facilitates controlling the paired headphones 710a with the NMDs 120 and/or the control devices 130 of the media playback system 100.

In alternative embodiments, the portable playback device 710a may interface with the media playback system 100 independently as its own zone or set. Such an implementation of the portable playback device 710a may include a cellular data connection to facilitate portable streaming (i.e., streaming away from the media playback system 100 and/or the network 104). In this example, the portable playback device 710a may join the media playback system 100 as a zone or set when connected to the network 104 or otherwise in proximity to the playback devices 110.

Example exchange techniques

As described above, example techniques described herein involve the transition (or "exchange") of playback sessions between the portable playback device 710 and one or more playback devices 110. During the playback session exchange, playback of the audio content is stopped at the "source" playback device and started at the "target" playback device at the same or substantially the same offset within the audio content. For example, the media playback system 100 may exchange playback between a "source" portable playback device 710 and one or more "target" playback devices 110. In other examples, the media playback system 100 may exchange playback between one or more "source" playback devices 110 and a "target" portable playback device 710.

To illustrate, a user may begin listening to audio content via the headphones 710a or earbuds 710b while "out" and then switch playback of the audio content to one or more playback devices 110a-110n to continue listening to the audio content played back out loud at home. In another example, a user may begin listening to audio content at home via the headphones 710a or earbuds 710b (perhaps so as not to disturb another person at home) and then switch to one or more playback devices 110a-110n to continue listening to the audio content played back out loud. In a third example, a user may begin listening to audio content played back out loud via the portable playback device 710c and then switch to one or more playback devices 110a-110n, perhaps because the target playback device has greater sound output capability (e.g., due to a higher-power amplifier and/or larger transducers), is located in a different room, is configured in a synchrony group, or for any other reason.

Similarly, a user may be listening to audio content via one or more playback devices 110a-110n and switch playback of the audio content to the portable playback device 710. For example, the user may listen to television audio in the study 101d (including the playback devices 110h, 110i, 110j, and 110k) and then switch playback to the earbuds 710b for more personal listening. As another example, the user may be listening to internet radio in the kitchen 101h (including the playback device 110b) and then switch playback to the headphones 710a to continue listening while out. As a third example, the user may be listening to music in the bedroom 101c (including the playback device 110g) and exchange playback to the portable playback device 710c to bring the music to the yard.

A playback device with an ongoing playback session may maintain or access playback session data that defines and/or identifies the playback session. The playback session data may include data representing the source of the audio content (e.g., a URI or URL indicating the location of the audio content) and an offset indicating the position within the audio content at which playback begins. The offset may be defined as the time from the start of the track (e.g., in milliseconds), as a number of samples, or the like. In an example embodiment, the offset may be set slightly ahead of the current playback position within the audio content to allow time for the target device to begin buffering the audio content. The source playback device then stops playing back the audio content at the offset, and the target playback device begins playing back the audio content at the offset.
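
The playback session data and the offset handoff described above can be sketched as follows. This is an illustrative model only; the class and field names, and the 2-second buffering lead, are assumptions rather than details from the source:

```python
from dataclasses import dataclass

@dataclass
class PlaybackSession:
    """Minimal model of playback session data: the content source and an
    offset within the content. Field names are illustrative assumptions."""
    content_uri: str   # URI or URL indicating the location of the audio content
    offset_ms: int     # position within the audio content, in milliseconds

def swap(session, current_position_ms, buffer_lead_ms=2000):
    """Build the handoff record for the target device. The offset is set
    slightly ahead of the current playback position so the target has time
    to buffer (the 2-second lead is an assumption); the source stops playing
    at this offset and the target starts at it."""
    return PlaybackSession(session.content_uri,
                           current_position_ms + buffer_lead_ms)
```

For example, a session at 30 seconds into a track would hand off with an offset of 32 seconds under the assumed lead time.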

The playback session data may also include data representing the playback status. The playback state may include a playback state (e.g., play, pause, or stop) of the session. If the playback session implements a playback queue, the playback session data may include a playback queue state, e.g., a current playback position within the queue.

The playback queue state may also include a queue version. For example, in a cloud queue embodiment, the cloud queue server and media playback system 100 may use queue versions to maintain consistency. The queue version may be incremented each time the queue is modified and then shared between the media playback system 100 and the cloud queue server to indicate the latest version of the queue.
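
The queue-version bookkeeping described above might be sketched like this. The class and method names are assumptions for illustration; the point is only that the version increments on each modification and a stale client can detect that it must refetch:

```python
class CloudQueue:
    """Sketch of a versioned queue shared between a cloud queue server and
    a media playback system. Every modification bumps the version; a client
    whose cached version is behind the latest version holds a stale queue."""
    def __init__(self):
        self.version = 0
        self.tracks = []

    def modify(self, tracks):
        """Replace the queue contents and increment the queue version."""
        self.tracks = list(tracks)
        self.version += 1
        return self.version

    def is_stale(self, client_version):
        """True if a client's cached version predates the latest version."""
        return client_version < self.version
```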

The playback session data may also include authorization data, such as one or more keys and/or tokens. Such authorization data may include a token associated with the user account. During the playback session exchange, the media playback system 100 may verify that the token is authorized on both the source playback device and the target playback device. The authorization data may also include a token associated with the streaming audio service, which may enable the target playback device to access the audio content at the source. In addition, the authorization data may include a token associated with the playback session that enables the target playback device to access the session. Other example authorization data are also contemplated.
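
One hedged sketch of the authorization check described above: verifying that the tokens a session requires are held by both the source and the target playback device. The function name and token representation are assumptions; a real implementation would validate each token against the issuing service rather than compare opaque values:

```python
def authorize_swap(session_tokens, source_tokens, target_tokens):
    """Return True only if every token the session requires (e.g.,
    user-account, streaming-service, and session tokens) is present on both
    the source and the target playback device."""
    required = set(session_tokens)
    return required <= set(source_tokens) and required <= set(target_tokens)
```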

In some implementations, an input to a playback device triggers the exchange. This input may be referred to as a "playback session exchange input." In an example, the playback session exchange input may be provided to a user interface on the playback device, such as the user interface 713b of the earbuds 710b (fig. 7D) or the user interface 713a of the headphones 710a (fig. 7C). Alternatively, the playback session exchange input may be provided to a user interface on the control device 130, such as the user interface 430 (figs. 4A-4D), while the user interface 430 is controlling one or more particular playback devices (e.g., a zone or group).

The playback device that receives the playback session exchange input may be referred to as the "originating playback device". In an example, the initiating playback device is the source or target for the exchange. When the portable playback device 710 has an ongoing playback session (e.g., the portable playback device is actively playing back audio content, or has an active but paused playback session) and receives a playback session exchange input, the portable playback device 710 may assume that the user wants to "push" the playback session to a nearby playback device 110. Thus, the portable playback device 710 is identified as the source of the exchange, and the nearby playback devices 110 are identified as targets.
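
The push/pull heuristic above can be sketched as a small role-assignment function. The names are illustrative assumptions; it simply encodes that an initiating device with an ongoing session becomes the source (push) and one without becomes the target (pull):

```python
def swap_roles(initiator_has_session, initiator, nearby):
    """Assign (source, target) roles for a playback session exchange.
    An initiating playback device with an ongoing session is assumed to
    push (it is the source); one without a session is assumed to pull
    (it is the target)."""
    if initiator_has_session:
        return initiator, nearby   # push: session moves to the nearby device
    return nearby, initiator       # pull: session moves to the initiator
```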

To illustrate, fig. 8A is a schematic diagram showing an example push exchange between the headphones 710a (fig. 7C) and the playback device 110b (fig. 7A) in the kitchen 101h zone. Initially, the headphones 710a have an ongoing playback session, as shown in fig. 8A. The user then provides a playback session exchange gesture to the headphones 710a. The playback session on the headphones 710a is pushed to the playback device 110b. After the push, the kitchen 101h zone receives information about the playback session and continues the ongoing playback session on the playback device 110b.

Conversely, if the portable playback device 710 does not have an ongoing playback session and receives a playback session exchange input, the portable playback device 710 will assume that the user wants to "pull" the playback session from a nearby playback device 110. Here, the portable playback device 710 is identified as the target of the exchange, and the nearby playback device 110 is identified as the source. To illustrate, fig. 8B is a schematic diagram illustrating an example pull exchange between the headphones 710a (fig. 7C) and the playback device 110b (fig. 7A) in the kitchen 101h zone. As shown in fig. 8B, initially, the playback device 110b has an ongoing playback session. The user then provides a playback session exchange gesture to the headphones 710a. The playback session on the playback device 110b is pulled to the headphones 710a. After the pull, the headphones 710a continue the playback session.

If both the portable playback device 710 and the nearby playback device 110 have ongoing playback sessions, it may be unclear whether the user wants to push the playback session on the portable playback device 710 to the nearby playback device 110 or pull the playback session on the nearby playback device 110 to the portable playback device 710. In some implementations, the portable playback device 710 may assume that the user wants to "push" the playback session to the nearby playback device 110. To illustrate, fig. 8C is a schematic diagram showing an example push exchange between the headphones 710a (fig. 7C) and the playback device 110b (fig. 7A) in the kitchen 101h zone. Initially, both the headphones 710a and the playback device 110b have ongoing playback sessions, as shown in fig. 8C. The user then provides a playback session exchange gesture to the headphones 710a. The playback session on the headphones 710a is pushed to the playback device 110b. After the push, the kitchen 101h zone continues the playback session from the headphones 710a on the playback device 110b. If the user instead wants to "pull" the playback session on the nearby playback device 110, the user may first stop the playback session on the portable playback device 710 and then provide a playback session exchange input to the portable playback device 710. In alternative embodiments, the portable playback device 710 may be configured to make the opposite assumption.

In the example of fig. 8A-8C, the initiating device is headset 710 a. In other examples, the user may provide a playback session exchange input to one of the playback devices 110 (e.g., playback device 110 b). In this case, similar assumptions may apply to the source and target of the specified exchange. In particular, when playback device 110b has an ongoing playback session and receives a playback session exchange input, playback device 110b may assume that the user wants to "push" the playback session to a nearby portable playback device 710. Conversely, if the playback device 110b does not have an ongoing playback session, and receives a playback session exchange input, the playback device 110b will assume that the user wants to "pull" the playback session from a nearby portable playback device 710.

In an example, the initiating playback device of the exchange can identify the other playback devices in the exchange based on proximity to the initiating playback device. That is, the initiating playback device may identify one or more nearby playback devices as targets of a push exchange from the initiating playback device or as sources of a pull exchange to the initiating playback device. After or based on receiving the playback session exchange input, the initiating playback device may identify such nearby playback devices automatically (i.e., without requiring any user input beyond the playback session exchange input).

Some example techniques for identifying nearby playback devices involve audio-based identification. In an exemplary audio-based identification technique, an originating playback device requests that a playback device that is eligible for an exchange emit an identifiable sound (e.g., an audio chirp) that can be detected by one or more microphones of the originating playback device. The initiating playback device may then identify a nearby playback device based on the detected characteristics of the sound.

For illustration, fig. 9 is a schematic diagram illustrating an audio-based recognition technique using audio chirps. The audio chirp includes acoustic characteristics (e.g., one or more tones) that can identify a playback device that transmitted the audio chirp. In fig. 9, the user initiates an exchange on a portable playback device 710, here a headset 710 a. As described above, if a playback session is ongoing on the headset 710a, the headset 710a will assume that the user wishes to push the playback session to one or more nearby playback devices. Otherwise, the headphone 710a will assume that the user wishes to pull the playback session on one or more nearby playback devices back to the headphone 710 a.

After or based on receiving the playback session exchange input, the headphones 710a may identify the exchange-eligible playback devices in the media playback system 100. For a push exchange, the set of exchange-eligible playback devices may include playback devices of a particular type or playback devices assigned a particular role in the media playback system 100. For example, other portable playback devices may be configured to be ineligible for the exchange. As another example, only master devices in a bonded zone (e.g., a stereo pair or surround sound configuration) may be considered eligible for the exchange. For a pull exchange, the set of exchange-eligible playback devices includes playback devices having an ongoing playback session. The set may be further narrowed based on other factors, such as those mentioned above.

As described above in section II, the playback devices 110 in the media playback system 100 may maintain or access state variables that represent playback device state and other configuration information. The state information is updated periodically or based on events (e.g., when a state changes), such as via subscriptions to particular types of events or states (e.g., playback events, group events, topology change events, player volume events, group volume events, playback metadata events) and notifications of particular events. The subscription protocol may be UPnP-based or a proprietary controller protocol or API. The portable playback devices 710, including the headphones 710a and earbuds 710b, may similarly maintain or access these state variables and determine the set of exchange-eligible playback devices based on information in the state variables. The state variables may be received from another playback device in the media playback system and/or from state information stored in a remote computing system in the cloud. In the example of fig. 9, the headphones 710a have identified the playback device 110b, the playback device 110g, and the playback device 110n as exchange-eligible playback devices.

Upon identifying the exchange-eligible playback devices, the headphones 710a, as the initiating playback device in the exchange, cause the exchange-eligible playback devices to emit corresponding audio chirps. For example, the headphones 710a may send instructions to the playback device 110b, the playback device 110g, and the playback device 110n to cause these exchange-eligible playback devices to emit unique audio chirps. In some examples, the audio chirps may be ultrasonic (e.g., greater than 20 kHz) or near-ultrasonic (e.g., 19-20 kHz) to limit propagation of the audio chirps beyond the vicinity of the emitting playback device and/or to avoid distracting the user.

Each audio chirp may include data in the form of an encoded identifier. Each encoded identifier may be different and encoded as a set of tones known to the initiating playback device. The exchange-eligible playback devices may emit their audio chirps simultaneously, concurrently, or sequentially, or each may emit its chirp when it receives the instruction to do so. In some examples, a device in the media playback system may provide timing information indicating when each playback device is to emit its audio chirp.
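
Encoding an identifier as a set of tones might look like the following sketch. The near-ultrasonic band, tone spacing, and bit width are illustrative assumptions, not parameters from the source; each set bit of a small device identifier contributes one tone frequency:

```python
def encode_chirp_id(device_id, base_hz=19000, step_hz=100, bits=4):
    """Encode a small integer identifier as a set of near-ultrasonic tone
    frequencies: bit i of the identifier, when set, contributes the tone
    base_hz + i * step_hz."""
    return [base_hz + i * step_hz for i in range(bits) if (device_id >> i) & 1]

def decode_chirp_id(tones, base_hz=19000, step_hz=100):
    """Recover the identifier from the detected tone frequencies by mapping
    each tone back to its bit position."""
    return sum(1 << round((f - base_hz) / step_hz) for f in tones)
```

Under these assumptions, identifier 5 (binary 101) would be emitted as tones at 19000 Hz and 19200 Hz.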

After instructing the exchange-eligible playback devices to emit audio chirps, the initiating playback device in the exchange attempts to detect the emitted audio chirps via one or more microphones (e.g., the microphones 715). For example, the headphones 710a may attempt to detect the emitted audio chirps via one or more voice microphones in the housing of the headphones 710a. Alternatively, the headphones 710a may attempt to detect the emitted audio chirps via one or more ANC microphones in the housing of the headphones 710a. In some cases, particular microphones (ANC or voice) may be selected or tuned to be sensitive to the ultrasonic or near-ultrasonic range, making those microphones particularly suitable for receiving the audio chirps. Other examples are possible as well.

To identify a "nearby" playback device, the initiating playback device may compare the detected audio chirps. For example, the headphones 710a may compare various metrics (e.g., the sound pressure and signal-to-noise ratio of the detected audio chirps) to identify the "loudest" audio chirp, which may be assumed to have been emitted by the playback device physically closest to the initiating playback device. In an example implementation, the initiating playback device may rank or otherwise sort the exchange-eligible playback devices by relative signal strength (e.g., SNR) and then select the highest-ranked exchange-eligible playback device as the source or target of the exchange.
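
The ranking step above reduces to picking the device whose chirp was detected with the strongest signal. A minimal sketch, assuming the detections have already been reduced to a mapping from device identifier to SNR in dB:

```python
def pick_nearest(detected_chirps):
    """Rank exchange-eligible devices by the SNR of their detected chirps
    and return the device with the loudest chirp, which is assumed to be
    physically closest to the initiating device. detected_chirps maps a
    device id to an SNR in dB; returns None if no chirp was detected."""
    if not detected_chirps:
        return None
    return max(detected_chirps, key=detected_chirps.get)
```

In the fig. 9 example, the chirp from the playback device 110b would simply be absent from the mapping, and the device 110n would rank highest.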

As shown in fig. 9, the headphones 710a detect the audio chirps emitted by the playback devices 110n and 110g in the bathroom 101a and the bedroom 101c, respectively. However, the headphones 710a do not detect the audio chirp emitted by the playback device 110b, perhaps because the audio chirp cannot propagate from the kitchen 101h to the headphones 710a, the kitchen 101h being located on a different floor of the house relative to the other zones. In this example, the playback device 110n is determined to be the closest playback device because a comparison of the metrics of the audio chirps emitted by the playback devices 110n and 110g indicates that the chirp emitted by the playback device 110n is the "loudest."

To facilitate comparison among the detected audio chirps, the swap-eligible playback devices may emit their audio chirps at the same or substantially the same volume level. In some cases, the instruction to emit an audio chirp comprises an instruction to change to a particular volume level. Since different playback devices have different types of transducers and/or amplifiers, the volume level at which each playback device chirps may vary based on the device type. Alternatively, the playback devices may be preconfigured to emit audio chirps at a particular volume level.

The playback session exchange input may take various forms. For example, a particular input (e.g., a tap or gesture on the touch-sensitive area, or a portion thereof) to the user interface 713a of the headphones 710a (fig. 7B) may trigger an exchange. In other examples, the portable playback device 710 may include a physical button for triggering the exchange. Further, a pattern of touch inputs (e.g., short, long, short) or a traced pattern (e.g., a shape such as a zigzag or triangle) may trigger the exchange. Other types of inputs are also contemplated.

In some particular examples, a touch and hold, or a sustained touch and hold, on a particular area (e.g., a play/pause area) of the touch-sensitive area triggers an exchange. For purposes of illustration, fig. 10 is a diagram illustrating an exemplary control scheme for the portable playback device 710c, the playback device 110, and the headphones 710a. As shown in fig. 10, a user may provide a press input (also referred to as a touch) to the touch-sensitive area to perform a primary action (i.e., play or pause). If a physical button is available for swapping, the user may press and hold the physical button to invoke the swap.

If the user continues to hold the press input (touch and hold), a secondary action is performed. For the portable playback device 710c and the playback device 110, the secondary action is grouping with a nearby playback device. That is, the originating playback device (i.e., the portable playback device 710c or the playback device 110) forms a synchrony group with the nearby playback device. In contrast, for the headphones 710a, the secondary action is to perform a push swap or a pull swap, as described in connection with figs. 8A-8C. This configuration allows the user to more quickly access the swap function while using the headphones 710a. Since wearable playback devices are designed for relatively private listening as compared to the portable playback device 710c and the playback device 110, it is unlikely that a user would want to group headphones with these types of devices for synchronized playback. Other example embodiments may vary this control scheme.

If the user continues to hold the press input further (sustained touch and hold), a tertiary action is performed. For the portable playback device 710c and the playback device 110, the tertiary action is to perform a push swap or a pull swap with the nearby playback device. For the headphones 710a, no tertiary action is configured, although other example control schemes may configure the headphones 710a with a tertiary action. In some examples, continuing to hold the touch after the last action in the chain may cancel the input.
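The press/hold action chains for the two classes of device described above can be modeled as a simple lookup; the stage numbers and action names here are illustrative, not from the source:

```python
def action_for_hold(device_type: str, hold_stage: int) -> str:
    """Map a press input's hold stage to an action per the example control scheme.

    hold_stage: 0 = press and release, 1 = touch and hold,
    2 = sustained touch and hold, beyond the chain = cancel.
    """
    chains = {
        # portable speakers and stationary playback devices: group, then swap
        "portable": ["play_pause", "group_with_nearby", "swap_with_nearby"],
        "stationary": ["play_pause", "group_with_nearby", "swap_with_nearby"],
        # wearables reach the swap action one stage earlier; no tertiary action
        "headset": ["play_pause", "swap_with_nearby"],
    }
    chain = chains[device_type]
    # holding past the last action in the chain cancels the input
    return chain[hold_stage] if hold_stage < len(chain) else "cancel"
```

This makes the asymmetry explicit: on a wearable the swap is one stage closer to the initial press than on a portable or stationary device.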

For the user, the control scheme provides audible feedback for the exchange action. When the user provides the playback session exchange input to a first playback device, the user can be confident that the first playback device will be the source or target of the exchange (depending on whether the first playback device has an ongoing playback session), because they provided the trigger input to that device. However, the user may be less confident that the initiating playback device correctly identified the user's desired target (for a push exchange) or source (for a pull exchange). In particular, when using the example audio-based identification techniques described above, it is possible for an originating playback device to identify a different source or target than the user intended, perhaps because the acoustic characteristics of the environment make the audio chirp emitted from a more distant playback device appear to be the closest.

Using this control scheme, when a press and hold input is provided to the portable playback device 710c or the playback device 110, grouping between the potential swap source and target occurs (if the user continues to hold), which results in synchronized loud playback on the potential swap source and target. In particular, when the initiating playback device has an ongoing playback session, a push grouping is performed, which causes the initiating playback device and the nearby playback device to synchronously play back the ongoing playback session. In contrast, when the initiating playback device does not have an ongoing playback session, a pull grouping is performed, which causes the initiating playback device and the nearby playback device to synchronously play back the nearby playback device's ongoing playback session. This loud synchronized playback provides the user with a preview of the source and target of the swap that would occur if the user continued to hold the input. If the previewed playback devices in the group are different from the swap source or target desired by the user, the user may provide input to cancel the grouping and/or swap action.

Further, in some embodiments, the control scheme may facilitate user selection of the desired exchange source or target via additional input. In particular, in some examples, the user may cycle through the swap-eligible playback devices by providing one or more additional press and hold inputs within a threshold period of providing the previous input. As described above, the initiating playback device may list the swap-eligible playback devices by signal strength. In an example, a second press and hold input after the initial press and hold input will select the second swap-eligible playback device in the list. Similarly, a third press and hold input after the initial press and hold input will select the third swap-eligible playback device in the list. Subsequent inputs will continue to cycle through the list (if other swap-eligible playback devices are listed).
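The cycling behavior, including the reset after the threshold period elapses, might be sketched as follows; the class name, threshold value, and injectable clock are assumptions made for illustration:

```python
import time

class SwapTargetSelector:
    """Cycle through swap-eligible devices ranked closest-to-farthest.

    Each additional press-and-hold within `threshold_s` of the previous input
    advances to the next candidate; after the threshold, the sequence resets
    and the next input selects the closest device again.
    """
    def __init__(self, ranked_devices, threshold_s=5.0, clock=time.monotonic):
        self.ranked = ranked_devices
        self.threshold_s = threshold_s
        self.clock = clock          # injectable for testing
        self.index = -1
        self.last_input = None

    def press_and_hold(self):
        now = self.clock()
        if self.last_input is None or now - self.last_input > self.threshold_s:
            self.index = 0          # first input (or reset) selects the closest
        else:
            self.index = (self.index + 1) % len(self.ranked)  # loop through list
        self.last_input = now
        return self.ranked[self.index]
```

The modulo wrap implements the "subsequent inputs will continue to loop through the list" behavior.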

In some examples, the portable playback device 710c may similarly traverse a list of grouping-eligible playback devices via successive touch and hold inputs. The grouping-eligible playback devices may be the same as the swap-eligible playback devices and may be identified using the same or similar audio-based identification techniques. For example, to push/pull group with the nearest playback device 110, the user may provide a first touch and hold input to the playback device 710c. To push/pull group with the next closest playback device 110, the user may provide a second touch and hold input to the playback device 710c within a threshold period of time of the first input. Subsequent touch and hold inputs may likewise traverse the sorted list of swap- and/or grouping-eligible playback devices in order from closest to farthest. After the threshold period of time, the user will need to initiate the input sequence again to perform the grouping.

Instead of performing a push swap, the portable playback device 710c may instead be configured to ungroup if a push swap gesture is performed on the portable playback device 710c while the portable playback device 710c is already grouped with the nearest playback device 110.

To assist the user in understanding the control scheme, the source playback device and/or the target playback device may provide feedback, including audio and/or visual feedback. For purposes of illustration, fig. 11 is a diagram illustrating an exemplary feedback scheme for the portable playback device 710c and the playback device 110. As shown in fig. 11, at each stage in the control scheme, the initiating playback device (the portable playback device 710c, which is the source in this example) provides audio and/or visual feedback in conjunction with the corresponding action. In addition, the target playback device also provides audio and/or visual feedback when performing the grouping and swap actions. For example, when the portable playback device 710c and the playback device 110 are grouped together, each playback device provides corresponding tone feedback (shown in fig. 11 as two different tones, "Marco" and "Polo"), and the source playback device provides visual feedback. When the portable playback device 710c and the playback device 110 are swapped, each playback device provides both audio and visual feedback, or only the portable playback device 710c provides both audio and visual feedback.

Upon identifying the source playback device or the target playback device of a pull exchange or a push exchange, respectively, the initiating playback device transitions the playback session from the source playback device to the target playback device. In an example embodiment, the exchange involves forming a synchrony group including the source playback device and the target playback device. Exemplary synchrony groups are described in more detail in sections III and IV above. Upon forming the synchrony group, the target playback device begins playing back the audio content in synchrony with the source playback device. The source playback device may then be removed from the synchrony group, which completes the exchange. The source playback device may be removed, or ungrouped, from the synchrony group by sending an ungroup command from the source playback device to the target device.
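The group-then-ungroup sequence could be sketched as follows, using a minimal `SynchronyGroup` model that is illustrative rather than an actual API:

```python
class SynchronyGroup:
    """Minimal model of a synchrony group for illustration."""
    def __init__(self, coordinator):
        self.members = [coordinator]
        self.coordinator = coordinator

    def add(self, device):
        # the new member begins playing back in synchrony with the group
        self.members.append(device)

    def remove(self, device):
        self.members.remove(device)
        if device is self.coordinator and self.members:
            self.coordinator = self.members[0]  # a surviving member takes over

def swap_via_group(source, target):
    """Transition a playback session by grouping, then removing the source."""
    group = SynchronyGroup(coordinator=source)
    group.add(target)       # target starts synchronous playback of the session
    group.remove(source)    # source drops out, completing the swap
    return group
```

The coordinator handoff in `remove` mirrors the description below of the target assuming the source-device role once the source leaves the group.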

In another example implementation, in a pull exchange, the target device may send a request for playback session information to the source playback device. The playback session information includes playback state information, such as the current playlist, track, and offset. In yet another example embodiment, in a push exchange, the initiating device may send a command to start playback that includes the playback state information. The target playback device may then continue the playback session using the playback state information without grouping and ungrouping with the source playback device.
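The grouping-free variant, in which playback state is transferred directly, might be modeled as follows; the field names simply follow the playlist/track/offset description above:

```python
from dataclasses import dataclass

@dataclass
class PlaybackState:
    """Playback session information exchanged between source and target."""
    playlist: list    # queue of content identifiers
    track_index: int  # position within the playlist
    offset_s: float   # position within the current track, in seconds

def pull_swap(source_state: PlaybackState) -> PlaybackState:
    """Target continues the session from the transferred state, no grouping.

    A copy is taken so the target's state evolves independently of the source.
    """
    return PlaybackState(
        playlist=list(source_state.playlist),
        track_index=source_state.track_index,
        offset_s=source_state.offset_s,
    )
```

In a push exchange the same structure would travel in the opposite direction, embedded in the start-playback command.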

Returning to the example of fig. 8A, to push the playback session from the headphones 710a to the playback device 110b, the headphones 710a form a synchrony group that includes the headphones 710a and the playback device 110b, which causes the headphones 710a and the playback device 110b to play back the playback session in synchrony. To complete the push swap, the headphones 710a leave the synchrony group.

In the example of fig. 8A, since the playback session begins on the headphones 710a, the headphones 710a may initially operate as the source device for the synchrony group. As described above, the source device, or group coordinator, sources the audio for the synchrony group. After the headphones 710a leave the synchrony group, the playback device 110b can assume the role of the source device.

In some examples, to avoid disturbing the user with audio playback during the exchange, playback may be manipulated by the source playback device or the target playback device. For example, the playback session may be paused concurrently with the creation of the synchrony group and then resumed after the headphones 710a leave the synchrony group. In other examples, the headphones 710a or the playback device 110b, or both, may be muted until the exchange is complete. In yet another example, the headphones 710a may continue playback for x seconds (e.g., 1 second, 2 seconds, 3 seconds, etc.) before pausing to allow for any delay in transitioning the playback session to the target playback device. Other examples are possible.

Returning to the example of fig. 8B, to pull the playback session from the playback device 110b to the headphones 710a, the headphones 710a form a synchrony group that includes the headphones 710a and the playback device 110b, which causes the headphones 710a and the playback device 110b to play back the playback session in synchrony. To complete the pull exchange, the playback device 110b leaves the synchrony group.

In the example of fig. 8B, since the playback session begins on the playback device 110b, the playback device 110b may initially operate as the source device of the synchrony group. As described above, the source device, or group coordinator, sources the audio for the synchrony group. After the playback device 110b leaves the synchrony group, the headphones 710a can assume the role of the source device.

In an alternative embodiment, instead of leaving the synchrony group to complete the exchange, the source playback device remains in the synchrony group as the source device. While this would typically result in the source playback device and the target playback device playing content synchronously, in these examples the source playback device is placed in a mute state. Since the source playback device is muted, from the user's perspective the playback session appears to have been exchanged. This may be a true mute that disables certain components (e.g., audio amplifiers) or places them in a low-power state, thereby reducing the power consumption associated with loud playback relative to the unmuted state.

Example home theater exchange techniques

In some examples, a user may wish to transition a playback session from a soundbar type playback device to a wearable playback device to enable more private listening to audio from a television or other home theater source. Example soundbar type playback devices include the playback device 110h (figs. 1J and 1K). A soundbar type playback device is capable of receiving audio from a television, a media player (e.g., a set-top box, a streaming media playback device, or a computer), or another home theater source via an audio input interface. Furthermore, a soundbar type playback device may operate as the source device of a bonded zone that includes one or more satellites, which may play back particular channels (e.g., playback devices 110j and 110k) and/or particular frequency ranges (e.g., playback device 110i), as shown in figs. 1J and 1K, which illustrate the den 101d. While some soundbar playback devices employ a soundbar housing that carries multiple audio drivers linearly along a front surface, soundbar type playback devices do not necessarily have a soundbar housing.

An exemplary soundbar playback device may be considered to operate in one of two modes for receiving audio content, referred to herein as home theater mode and music mode. In home theater mode, the soundbar type playback device receives audio from a physically connected source (e.g., a television) via an audio input interface. When streaming audio via a network interface, the soundbar type playback device may be considered to be in music mode. Note that the streamed audio need not be music; it may be another type of streaming audio content, such as a podcast or a news program. When streaming audio content in music mode, the soundbar type playback device may perform the exchange in the same or similar manner as described in section VI.
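The two receive modes can be distinguished simply by where the audio arrives from; the source labels in this sketch are assumptions for illustration, not an enumeration from the source:

```python
def playback_mode(audio_source: str) -> str:
    """Classify the soundbar's receive mode by where audio arrives.

    Physically connected sources (e.g., a television over an audio input
    interface) imply home theater mode; audio arriving over the network
    interface, whether music, a podcast, or a news program, implies music mode.
    """
    physical_inputs = {"hdmi", "optical", "line_in"}  # hypothetical labels
    return "home_theater" if audio_source in physical_inputs else "music"
```

The distinction matters because the swap procedure differs by mode: music mode swaps follow section VI, while home theater mode requires the exchange mode described next.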

When in the home theater mode, to perform an exchange, the soundbar type playback device may enter another mode, referred to herein as "home theater exchange mode," or simply "exchange mode." When performing an exchange action with a wearable playback device using the exchange mode, the wearable playback device effectively becomes a satellite of the soundbar type playback device. In the exchange mode, if the soundbar type playback device was playing back audio from the audio input interface in the home theater mode, it serves as the source device, and the wearable playback device serves as the target playback device to receive and play back the audio from the audio input interface. Conversely, if the wearable playback device was already playing back audio from the audio input interface in the exchange mode, the soundbar type playback device serves as the target playback device.

In some cases, the wearable playback device initiates the exchange mode. Fig. 12A is an example message flow diagram illustrating instructions exchanged between the headphones 710a, a soundbar type playback device 110h, one or more satellites (shown as playback devices 110j, 110k, and 110i in fig. 1K) in a bonded zone (den 101d) with the soundbar type playback device 110h, and, if the bonded zone is grouped with any additional zones, one or more group members, in an example exchange mode initiated by the headphones 710a.

Prior to entering the exchange mode, at 1281a, the soundbar type playback device 110h is playing back audio from the audio input interface in the home theater mode. As the source device of a bonded zone including satellites, in the home theater mode the soundbar type playback device 110h distributes audio to the satellites according to their roles in the bonded zone. Furthermore, if the den 101d is grouped with one or more other zones, the soundbar type playback device 110h, as the source device for the group, distributes full-range audio content to the group members.

At 1282a, the headphones 710a receive a playback session exchange input, which may be a touch and hold input, among other examples described in connection with section VI. In this example, the headphones then identify the soundbar type playback device 110h as the source for the exchange (e.g., based on determining that the soundbar type playback device 110h is the physically closest playback device using the audio-based identification techniques).

Then, at 1283a, the headphones 710a send, to the soundbar type playback device 110h, data representing an instruction to transition to the exchange mode. The headphones 710a and the soundbar type playback device 110h may send and receive the data representing the instruction via their respective 802.11-compliant network interfaces. The headphones 710a may transmit the data based on receiving the playback session exchange input.

Upon receiving the data indicating the instruction to enter the exchange mode, the soundbar type playback device 110h switches from the home theater mode to the exchange mode. More specifically, at 1284a, the soundbar type playback device 110h adds the headphones 710a to a bonded zone, which may be the same bonded zone as the den 101d or a new bonded zone.

In some examples, in the home theater mode, the soundbar type playback device 110h and the satellites operate as nodes in a mesh network. As described above in connection with fig. 1B, in some embodiments, the network 104 may include a private communication network implemented as a mesh network. In the home theater mode, the soundbar type playback device 110h distributes playback timing information and audio to the satellites using the mesh network.

To facilitate adding the headphones 710a to the bonded zone, the soundbar type playback device 110h transitions its 802.11-compliant network interface from operating as a node in the mesh network to operating as an access point. The access point forms a first wireless local area network (LAN) in a first wireless frequency band (e.g., the 5 GHz band). The soundbar type playback device 110h then sends data representing (i) the service set identifier (SSID) of the first wireless LAN and (ii) the credentials of the first wireless LAN to the first wearable playback device via the 802.11-compliant network interface, which allows the headphones 710a to connect to the first wireless LAN. After the first wearable playback device connects to the first wireless LAN formed by the soundbar type playback device, the soundbar type playback device 110h forms a bonded zone that includes the soundbar type playback device 110h and the headphones 710a. This may be considered the same bonded zone as the den 101d or a new bonded zone. At 1285a, after connecting to the first wireless LAN, the headphones 710a send a message to the soundbar type playback device 110h to begin streaming the HT audio stream.

Further, in some examples, when in the exchange mode, the headphones 710a effectively become a satellite of the soundbar type playback device 110h. Accordingly, the soundbar type playback device 110h "parks" the satellite playback devices 110j, 110k, and 110i on a second wireless LAN in a second wireless frequency band (e.g., the 2.4 GHz band), because the satellite playback devices 110j, 110k, and 110i will not play back audio. Parking the satellites on the second LAN allows the satellites to remain reachable (e.g., to eventually re-form the bonded zone upon transitioning back to the home theater mode) and to receive updates (e.g., state variable events) regarding the state of the media playback system 100. The soundbar type playback device 110h may form the second wireless LAN using its 802.11-compliant network interface.

At 1286a, the soundbar type playback device 110h stops streaming the HT audio stream to the satellites (e.g., 110j, 110k, and 110i). This may be performed as part of, or in conjunction with, parking the satellite playback devices 110j, 110k, and 110i on the second wireless LAN. Similarly, at 1287a, the soundbar type playback device 110h may stop streaming the HT audio stream (if any) to the group members. Forming a new bonded zone at 1284a may remove the soundbar type playback device 110h from any existing group, which causes the group members to stop receiving the HT audio stream.

At 1288a, the soundbar type playback device 110h streams the HT audio stream to the headphones 710a for playback. In conjunction with the headphones 710a receiving the stream and playing back the audio, the soundbar type playback device 110h mutes to complete the exchange. When muted, the soundbar type playback device 110h can continue to process the audio data for playback in synchrony with the headphones 710a. The HT audio stream may include data representing the audio as well as playback timing information for the bonded zone. In some examples, the audio is multi-channel audio, e.g., a surround sound track. In such an example, the soundbar type playback device 110h may downmix the surround sound track into a track having fewer channels, e.g., a stereo track. The surround sound track may be downmixed to the number of channels supported by the wearable or portable playback device.
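One common way to downmix a 5.1 surround frame to stereo attenuates the center and surround channels by about 3 dB before folding them into the left and right channels; the exact coefficients a given device uses are not specified in the source, so this is a sketch of one conventional mix:

```python
def downmix_to_stereo(frame):
    """Downmix one 5.1 surround frame [L, R, C, LFE, Ls, Rs] to stereo.

    Center and surround channels are attenuated by ~3 dB (factor 1/sqrt(2));
    the LFE channel is discarded in this particular mix.
    """
    l, r, c, lfe, ls, rs = frame
    a = 0.7071  # approx. -3 dB
    left = l + a * c + a * ls
    right = r + a * c + a * rs
    return (left, right)
```

A per-device variant would choose the output channel count to match what the wearable or portable playback device supports.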

When in the exchange mode, the soundbar type playback device 110h may detect an event indicating a trigger to transition from operating in the exchange mode to operating in the home theater mode. Such events may include receiving, from the headphones 710a, data representing an instruction to transition to the home theater mode (e.g., to end the exchange mode), which the headphones 710a may send after receiving the playback session exchange input while in the exchange mode. As another example, the soundbar type playback device 110h may detect that the headphones 710a have disconnected from the first wireless LAN (and are no longer operating as a satellite) or have been paused for x amount of time. Upon detecting such an event, the soundbar type playback device 110h can transition to the home theater mode.
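The mode transitions and their triggers might be sketched as a small state function; the event names here are illustrative labels for the triggers described above:

```python
class SoundbarMode:
    HOME_THEATER = "home_theater"
    SWAP = "swap"

def next_mode(current: str, event: str) -> str:
    """Mode transitions for the soundbar, per the triggers described above.

    An exchange-mode instruction enters swap mode; an exit instruction from
    the wearable, a LAN disconnect, or a prolonged pause returns the device
    to home theater mode. Unrecognized events leave the mode unchanged.
    """
    if current == SoundbarMode.HOME_THEATER and event == "swap_instruction":
        return SoundbarMode.SWAP
    if current == SoundbarMode.SWAP and event in (
        "exit_instruction", "wearable_disconnected", "pause_timeout"
    ):
        return SoundbarMode.HOME_THEATER
    return current
```

On the swap-to-home-theater transition the device would also revert its network interface to mesh operation and re-form the bonded zone, as described next.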

The transition from the exchange mode to the home theater mode may involve the soundbar type playback device 110h transitioning its 802.11-compliant network interface from operating as an access point to operating as a node in the mesh network. In addition, the soundbar type playback device 110h may enable the satellite playback devices to reconnect to the mesh network and may re-form a bonded zone that includes the soundbar type playback device 110h and the satellite playback devices 110j, 110k, and 110i.

Further, while in the exchange mode, an additional wearable playback device may connect as a satellite to the soundbar type playback device 110h. This may allow, for example, two partners to listen to television audio using separate wearable devices in the den 101d without waking a sleeping child in the adjacent bedroom 101c. The user may cause a second wearable playback device (e.g., the earbuds 710b) to join the exchange mode by providing a playback session exchange input (e.g., touch and hold) to the second wearable playback device, which causes the second wearable playback device to send data representing an instruction to transition to the exchange mode to the soundbar type playback device 110h. The soundbar type playback device 110h then joins the second wearable device using the technique shown in fig. 12A.

In some cases, the control device initiates the exchange mode. Fig. 12B is an example message flow diagram illustrating instructions exchanged between the control device 130a, the headphones 710a, the soundbar type playback device 110h, and one or more satellites in a bonded zone (den 101d) with the soundbar type playback device 110h, in an example exchange mode initiated by the control device 130a.

Prior to entering the exchange mode, at 1281b, the soundbar type playback device 110h is playing back audio from the audio input interface in the home theater mode. As the source device of a bonded zone including satellites, in the home theater mode the soundbar type playback device 110h distributes audio to the satellites according to their roles in the bonded zone. Furthermore, if the den 101d is grouped with one or more other zones, the soundbar type playback device 110h, as the source device for the group, distributes full-range audio content to the group members.

At 1282b, the control device 130a receives the playback session exchange input. The control device 130a may receive the playback session exchange input via a user interface (e.g., the user interface 430). More specifically, a particular user interface 430 may control the headphones 710a and may include one or more controls that, when selected, correspond to the playback session exchange input.

Then, at 1283b, the control device 130a sends a swap command to the headphones 710a, and the headphones 710a send an instruction to the soundbar type playback device 110h to transition to the exchange mode. Alternatively, the control device 130a sends data representing an instruction to transition to the exchange mode directly to the soundbar type playback device 110h. The control device 130a and the soundbar type playback device 110h may send and receive the data representing the instruction via their respective 802.11-compliant network interfaces. The control device 130a may transmit the data based on receiving the playback session exchange input.

Upon receiving the data indicating the instruction to enter the exchange mode, the soundbar type playback device 110h switches from the home theater mode to the exchange mode. More specifically, at 1284b, the soundbar type playback device 110h adds the headphones 710a to a bonded zone, which may be the same bonded zone as the den 101d (e.g., identified as "Den") or a new bonded zone (e.g., identified as "Den + Headphones").

Similar to the example of fig. 12A, in some examples, in the home theater mode, the soundbar type playback device 110h and the satellites operate as nodes in a mesh network. To facilitate adding the headphones 710a to the bonded zone, the soundbar type playback device 110h transitions its 802.11-compliant network interface from operating as a node in the mesh network to operating as an access point. The access point forms a first wireless local area network (LAN) in a first wireless frequency band (e.g., the 5 GHz band). The soundbar type playback device 110h then transmits data representing (i) the service set identifier (SSID) of the first wireless LAN and (ii) the credentials of the first wireless LAN to the first wearable playback device via the 802.11-compliant network interface, thereby allowing the headphones 710a to connect to the first wireless LAN.

After the first wearable playback device connects to the first wireless LAN formed by the soundbar type playback device, the soundbar type playback device 110h forms a bonded zone that includes the soundbar type playback device 110h and the headphones 710a. This may be considered the same bonded zone as the den 101d or a new bonded zone. At 1285b, after connecting to the first wireless LAN, the headphones 710a send a message to the soundbar type playback device 110h to begin streaming the HT audio stream. At 1286b, the control device receives data indicating that the headphones 710a are ready to receive audio from the soundbar type playback device 110h.

Further, in some examples, when in the exchange mode, the headphones 710a effectively become a satellite of the soundbar type playback device 110h. Since the headphones 710a use the first wireless LAN in the first wireless frequency band, the soundbar type playback device 110h "parks" the satellite playback devices 110j, 110k, and 110i on a second wireless LAN in a second wireless frequency band (e.g., the 2.4 GHz band). Parking the satellites on the second LAN allows the satellites to remain reachable (e.g., to eventually re-form the bonded zone upon transitioning back to the home theater mode) and to receive updates (e.g., state variable events) regarding the state of the media playback system 100. The soundbar type playback device 110h may form the second wireless LAN using its 802.11-compliant network interface.

At 1287b, the soundbar type playback device 110h stops streaming the HT audio stream to the satellites. This may be performed as part of, or in conjunction with, parking the satellite playback devices 110j, 110k, and 110i on the second wireless LAN.

At 1288b, the soundbar type playback device 110h streams the HT audio stream to the headphones 710a for playback. In conjunction with the headphones 710a receiving the stream and playing back the audio, the soundbar type playback device 110h mutes to complete the exchange. The HT audio stream may include data representing the audio as well as playback timing information for the bonded zone. In some examples, the audio is multi-channel audio, e.g., a surround sound track. In such an example, the soundbar type playback device 110h may downmix the surround sound track into a track having fewer channels, e.g., a stereo track.

Example exchange methods

Methods 1300A, 1300B, 1400, and 1500, shown in figs. 13A, 13B, 14, and 15, present example swap techniques according to example embodiments described herein. These example techniques may be implemented within an operating environment that includes, for example, the media playback system 100 of fig. 7A, one or more of the playback devices 110a-110n, one or more NMDs 120, one or more control devices 130, and one or more portable playback devices 710, as well as other devices described herein and/or other suitable devices. Further, operations illustratively shown as being performed by the media playback system may be performed by any suitable device, such as a playback device or a control device of the media playback system. Methods 1300A, 1300B, 1400, and 1500 may include one or more operations, functions, or actions as illustrated by one or more of the blocks shown in figs. 13A, 13B, 14, and 15. Although the blocks are shown in sequential order, they may also be performed in parallel and/or in a different order than described herein. Also, blocks may be combined into fewer blocks, divided into additional blocks, and/or removed, depending on the desired implementation.

Additionally, the flow diagrams illustrate the functionality and operation of one possible implementation of the present embodiments, with respect to the implementations disclosed herein. In this regard, each block may represent a module, segment, or portion of program code, which comprises one or more instructions executable by a processor to implement a particular logical function or step in the process. The program code may be stored on any type of computer readable medium, such as a storage device including a diskette or hard drive. The computer readable medium may include a non-transitory computer readable medium, for example, a computer readable medium that stores data for a short time, such as register memory, processor cache, and Random Access Memory (RAM). The computer-readable medium may also include non-transitory media such as secondary or persistent long-term storage devices, e.g., Read Only Memory (ROM), optical or magnetic disks, compact disk read only memory (CD-ROM), and so forth. The computer readable medium may also be any other volatile or non-volatile storage system. The computer-readable medium may be considered a computer-readable storage medium, such as a tangible storage device. Additionally, for the embodiments disclosed herein, each block may represent circuitry wired to perform a particular logical function in the process.

a. Example Pull Exchange Method

Method 1300A illustrates an example pull exchange technique. A portable playback device (e.g., the headphones 710a, the earpiece 710b, or the portable playback device 710c) may perform the pull exchange technique to pull a playback session playing on a playback device 110 to the portable playback device.

At block 1302A, the method 1300A includes receiving a playback session exchange input. For example, the portable playback device 710 may receive data representing a first playback session exchange input. As described in connection with section VI, the playback session exchange input may initiate a pull exchange between the portable playback device 710 and one or more source playback devices when the portable playback device 710 is not currently playing audio content. In some examples, the portable playback device 710 receives the playback session exchange input via a user interface. For example, as discussed in connection with fig. 10, the headphones 710a may receive a touch-and-hold input. Alternatively, the portable playback device 710c may receive a sustained touch-and-hold input. In other examples, a control device may receive the playback session exchange input and instruct a particular wearable or portable playback device to initiate the playback session exchange.

At block 1304A, the method 1300A includes identifying one or more source playback devices within the media playback system. For example, the portable playback device 710 may identify one or more eligible playback devices 110 as source playback devices. Eligible source playback devices for pull exchange include a playback device 110 that is connected to a first wireless LAN (e.g., network 104 in fig. 1B) and also plays back audio content in a playback session. As described in section VI, the eligible set of source playback devices may be filtered using various other factors (e.g., playback device type or role).

In some examples, the portable playback device 710 identifies the one or more source playback devices via audio-based identification techniques, as described in section VI. In such an example, identifying the one or more source playback devices may include identifying a set of swap-eligible playback devices in the media playback system and then causing the swap-eligible playback devices to emit respective audio chirps, each chirp identifying the playback device that emitted it. The portable playback device 710 may then detect, via one or more microphones, the audio chirps emitted by the swap-eligible playback devices and select, from among them, the one or more source playback devices whose chirps indicate that they are physically closest to the portable playback device 710. Selecting the one or more source playback devices may include comparing one or more respective measures of the detected audio chirps to determine which swap-eligible playback devices are physically closest to the portable playback device 710.
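The chirp comparison described above can be sketched as follows. This is a hypothetical illustration, not the patented implementation: the dataclass, device identifiers, and dBFS levels are all made-up assumptions, and the "measure" compared is simply the received chirp level, with a stronger chirp taken to mean a physically closer emitter.

```python
# Hypothetical sketch: rank swap-eligible devices by the measured level of
# their identifying chirps and pick the physically closest one.
from dataclasses import dataclass

@dataclass
class ChirpDetection:
    device_id: str        # identifier encoded in the emitted chirp
    level_dbfs: float     # measured chirp level; higher means closer

def select_closest(detections):
    """Return the device whose chirp arrived strongest (closest emitter)."""
    if not detections:
        return None
    return max(detections, key=lambda d: d.level_dbfs).device_id

detections = [
    ChirpDetection("kitchen-110b", -32.0),
    ChirpDetection("den-110h", -18.5),
    ChirpDetection("bedroom-110l", -41.2),
]
print(select_closest(detections))  # den-110h (strongest chirp)
```

A real implementation would also need to decode the device identity from the chirp waveform itself; here that step is assumed to have already happened.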

At block 1306A, the method 1300A includes exchanging the playback session from the source playback device to the portable playback device. For example, the portable playback device 710 can transition the playback session from the determined one or more source playback devices to the portable playback device 710. Transitioning the playback session can include forming a first synchrony group including the portable playback device 710 and the one or more source playback devices. Forming the first synchrony group causes the portable playback device 710 to begin playing the particular audio content for the playback session.

Transitioning the playback session can also include stopping playback of the particular audio content on the one or more source playback devices. In some examples, playback of the particular audio content on the one or more source playback devices is stopped by the one or more source playback devices leaving the first synchrony group. Alternatively, playback of the particular audio content on the one or more source playback devices is stopped by muting the one or more source playback devices. Other examples are possible.
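The pull exchange of blocks 1304A-1306A can be sketched as a small state machine. The `Player` class and method names here are illustrative assumptions, not an actual device API; the sketch only shows the ordering of the steps: form the synchrony group, start the portable playing, then stop the sources either by leaving the group or by muting.

```python
# Minimal state-machine sketch of the pull exchange (blocks 1304A-1306A).
class Player:
    def __init__(self, name):
        self.name = name
        self.playing = None
        self.muted = False

def pull_exchange(portable, sources, content, stop_by="leave"):
    group = [portable] + sources          # form the first synchrony group
    for member in group:                  # members play in synchrony
        member.playing = content
    for src in sources:                   # then stop the source devices...
        if stop_by == "leave":
            group.remove(src)             # ...by leaving the group
            src.playing = None
        else:
            src.muted = True              # ...or by muting in place
    return group

portable = Player("headphones-710a")
source = Player("kitchen-110b")
group = pull_exchange(portable, [source], "track-uri")
print(portable.playing, source.playing, group == [portable])
```

In the muting variant the source stays in the group and keeps the session content, which mirrors the "muting the one or more source playback devices" alternative above.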

b. Example Push Exchange Method

Method 1300B illustrates an example push exchange technique. A portable playback device (e.g., the headphones 710a, the earpiece 710b, or the portable playback device 710c) may perform the push exchange technique to push a playback session playing on the portable playback device to a nearby playback device 110.

At block 1302B, the method 1300B includes receiving a playback session exchange input. For example, the portable playback device 710 may receive data representing a first playback session exchange input. As described in connection with section VI, the playback session exchange input may initiate a push exchange between the portable playback device 710 and one or more target playback devices when the portable playback device 710 is currently playing audio content. In some examples, the portable playback device 710 receives the playback session exchange input via a user interface. For example, as discussed in connection with fig. 10, the headphones 710a may receive a touch-and-hold input. Alternatively, the portable playback device 710c may receive a sustained touch-and-hold input. In other examples, a control device may receive the playback session exchange input and instruct a particular wearable or portable playback device to initiate the playback session exchange.

At block 1304B, the method 1300B includes identifying one or more target playback devices within the media playback system. For example, the portable playback device 710 may identify one or more eligible playback devices 110 as target playback devices. Eligible target playback devices for the push exchange include playback devices 110 that are connected to the first wireless LAN (e.g., network 104 in fig. 1B). As described in section VI, the set of eligible target playback devices may be filtered using various other factors (e.g., playback device type or role).

In some examples, the portable playback device 710 identifies the one or more target playback devices via audio-based identification techniques, as described in section VI. In such an example, identifying the one or more target playback devices may include identifying a set of swap-eligible playback devices in the media playback system and then causing the swap-eligible playback devices to emit respective audio chirps, each chirp identifying the playback device that emitted it. The portable playback device 710 may then detect, via one or more microphones, the audio chirps emitted by the swap-eligible playback devices and select, from among them, the one or more target playback devices whose chirps indicate that they are physically closest to the portable playback device 710. Selecting the one or more target playback devices may include comparing one or more respective measures of the detected audio chirps to determine which swap-eligible playback devices are physically closest to the portable playback device 710. The comparison may be performed by any device in the media playback system and/or a remote computing system.

At block 1306B, method 1300B includes exchanging the playback session from the portable playback device to one or more target playback devices. For example, the portable playback device 710 can transition its playback session to one or more target playback devices. Transitioning the playback session can include forming a first synchrony group including the portable playback device 710 and one or more target playback devices. Forming the first synchrony group enables the one or more target playback devices to begin playing the particular audio content of the playback session.

Transitioning the playback session can also include stopping playback of the particular audio content on the portable playback device 710. In some examples, playback of the particular audio content on the portable playback device 710 is stopped by removing the portable playback device 710 from the first synchrony group. Other examples are possible.
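The push exchange of blocks 1304B-1306B is the mirror image of the pull exchange and can be sketched the same way: the portable device and the targets form a synchrony group, the targets pick up the session content, and the portable then leaves the group, which stops its playback. The dictionary shape and names are illustrative assumptions.

```python
# Complementary sketch of the push exchange (blocks 1304B-1306B).
def push_exchange(portable_state, targets):
    content = portable_state["playing"]
    group = [portable_state] + targets        # form the first synchrony group
    for t in targets:
        t["playing"] = content                # targets begin the session
    group.remove(portable_state)              # portable leaves the group...
    portable_state["playing"] = None          # ...which stops its playback
    return group

portable = {"name": "710c", "playing": "podcast-uri"}
target = {"name": "110b", "playing": None}
group = push_exchange(portable, [target])
print(target["playing"], portable["playing"])  # podcast-uri None
```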

c. Example Home Theater Exchange Method

Method 1400 illustrates an example home theater exchange technique. A soundbar type playback device may perform the home theater exchange technique to cause a wearable or portable playback device to play back audio that is received by the soundbar type playback device and transmitted to the exchange target device.

At block 1402, the method 1400 includes playing back audio in a home theater mode. For example, a soundbar type playback device may play back audio in a home theater mode. In some examples, the soundbar type playback device is the master device of a first synchrony group. For example, an example soundbar type playback device is the playback device 110h, which can operate as the source device of the bonded zone of the den 101d. The bonded zone includes the playback devices 110j and 110k and/or the playback device 110i, as shown in fig. 1K and 1J.

At block 1404, the method 1400 includes receiving an instruction to transition to an exchange mode. For example, as shown in fig. 12A, the playback device 110h can receive, from a wearable playback device (e.g., the headphones 710a), data representing an instruction to transition to the exchange mode. As another example, as shown in fig. 12B, the playback device 110h may receive, from the control device 130, data representing an instruction to transition to the exchange mode.

At block 1406, the method 1400 includes transitioning from the home theater mode to the exchange mode. The soundbar type playback device may transition from the home theater mode to the exchange mode based on receiving the data representing the instruction to enter the exchange mode.

As described in connection with fig. 12A and 12B, the transition from the home theater mode to the exchange mode may include various steps. For example, to facilitate the wearable playback device connecting to the playback device 110h as a satellite, the playback device 110h may transition its 802.11-compliant network interface from operating as a node in a mesh network to operating as an access point forming a first wireless Local Area Network (LAN) in a first wireless frequency band. Further, the playback device 110h can send data representing a Service Set Identifier (SSID) of the first wireless LAN and credentials of the first wireless LAN to the wearable playback device via the 802.11-compliant network interface, which the wearable playback device can use to connect to the first wireless LAN.
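The SSID/credential handoff in this step can be sketched as a simple message exchange. The message shape, field names, SSID, and passphrase below are all made-up assumptions; the sketch only shows the sequence: the soundbar forms an access point, sends the wearable data representing the SSID and credentials, and the wearable uses them to join.

```python
# Hedged sketch of the exchange-mode network handoff: access point formation
# plus SSID/credential sharing with the wearable playback device.
import json

def make_ap(ssid, passphrase, band="5GHz"):
    # Soundbar side: form the first wireless LAN in a first frequency band.
    return {"ssid": ssid, "passphrase": passphrase, "band": band, "clients": []}

def credential_message(ap):
    # Data "representing the SSID ... and credentials" sent to the wearable.
    return json.dumps({"type": "join-lan", "ssid": ap["ssid"],
                       "credentials": ap["passphrase"]})

def connect(ap, device_id, msg):
    # Wearable side: join the LAN using the received SSID and credentials.
    payload = json.loads(msg)
    if payload["ssid"] == ap["ssid"] and payload["credentials"] == ap["passphrase"]:
        ap["clients"].append(device_id)
        return True
    return False

ap = make_ap("Soundbar-110h", "s3cret")
ok = connect(ap, "headphones-710a", credential_message(ap))
print(ok, ap["clients"])  # True ['headphones-710a']
```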

Transitioning from the home theater mode to the exchange mode can also include forming a second synchrony group including the soundbar type playback device and the wearable playback device. For example, the playback device 110h and the headphones 710a may form a second bonded zone after the headphones 710a connect to the first wireless LAN. After the second bonded zone is formed, the playback device 110h can operate as the source device of the second bonded zone. In this role, the playback device 110h sends the headphones 710a data representing the audio and the playback timing information for the second synchrony group. The headphones 710a play back the audio according to the timing information, as described in section IV. After forming the second synchrony group, the playback device 110h mutes audio playback while the headphones 710a play back the audio.
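The timing-information exchange above can be sketched numerically. This is a simplified illustration under stated assumptions, not the patented timing scheme: the source stamps each audio frame with an absolute future play-at time, and the wearable converts that group time to its local clock using a known clock offset. The millisecond values are arbitrary.

```python
# Illustrative sketch of source-device timing distribution for synchrony.
def stamp_frames(frames, now_ms, lead_ms=50, frame_ms=10):
    """Source side: pair each frame with an absolute play-at time."""
    return [(f, now_ms + lead_ms + i * frame_ms) for i, f in enumerate(frames)]

def local_deadline(play_at_ms, clock_offset_ms):
    """Wearable side: convert group time to the device's local clock."""
    return play_at_ms + clock_offset_ms

timed = stamp_frames(["f0", "f1", "f2"], now_ms=1_000)
print(timed[0], local_deadline(timed[2][1], clock_offset_ms=-3))
```

The lead time models the buffering headroom a source device gives its satellites; the per-frame spacing keeps all group members aligned on the same playback schedule.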

Transitioning from the home theater mode to the exchange mode can also include docking one or more satellite playback devices onto the second wireless LAN. For example, the playback device 110h may cause the playback devices 110j and 110k and/or the playback device 110i to connect to the second wireless LAN in a second wireless frequency band and leave the first synchrony group.

In other examples, a soundbar type playback device may add one or more additional wearable playback devices to the exchange mode alongside the first wearable playback device. For example, while in the exchange mode, the playback device 110h may receive data from a second wearable playback device (e.g., the earpiece 710b, or another instance of the headphones 710a) representing an instruction to transition to the exchange mode. Based on receiving the data representing the instruction to enter the exchange mode, the playback device 110h causes the second wearable playback device to join the second synchrony group.

Joining the second wearable playback device to the second synchrony group may include sending the second wearable playback device data representing the SSID of the first wireless LAN and the credentials of the first wireless LAN. For example, after the second wearable playback device connects to the first wireless LAN formed by the playback device 110h, the playback device 110h receives an indication that the second wearable playback device is ready for playback and adds it to the second synchrony group, which includes the playback device 110h and the headphones 710a. The playback device 110h then sends the second wearable playback device data representing the audio and the playback timing information for the second synchrony group. The second wearable playback device plays back the audio in synchrony with the first wearable playback device based on the playback timing information, as described in connection with section VI.

d. Example Exchange Method

Method 1500 illustrates another example exchange method.

At block 1502, the method 1500 includes detecting an exchange trigger. The exchange trigger may initiate a playback session exchange between one or more source playback devices and one or more target playback devices. In various implementations, the source playback device or the target playback device detects the exchange trigger and initiates the playback session exchange. Alternatively, another associated device (e.g., control device 130 or bridge device 860) detects the trigger and initiates the playback session exchange.

As described herein, some example exchange triggers involve detecting a user action, e.g., a user input. For example, a source playback device (e.g., portable playback device 710) can detect a particular input representing an exchange command and initiate a playback session exchange based on detecting the particular input. As another example, control device 130 may detect a particular input representing an exchange command and initiate a playback session exchange based on detecting the particular input. Other examples are also contemplated.

Other example exchange triggers are based on proximity. For example, some example exchange triggers involve detecting proximity between a source playback device (or a paired device, e.g., control device 130a) and a target playback device. Example exchange triggers also include detecting proximity of a source playback device (or a paired device, e.g., control device 130a) to a particular location (e.g., a home location of the media playback system 100). Other example exchange triggers are described throughout, and other suitable exchange triggers are also contemplated.
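The trigger categories described above (user input, proximity to a target device, and proximity to a home location) can be sketched as a single evaluation function. The event dictionary fields, gesture name, and the 2-meter proximity threshold are illustrative assumptions only.

```python
# Sketch of exchange-trigger detection: input, device proximity, or arrival
# at the home location can each fire a playback session exchange.
def detect_swap_trigger(event, proximity_threshold_m=2.0):
    if event.get("kind") == "input" and event.get("gesture") == "touch-and-hold":
        return "user-input"            # user-action trigger
    if (event.get("kind") == "device-proximity"
            and event.get("distance_m", float("inf")) <= proximity_threshold_m):
        return "near-target"           # proximity to a target playback device
    if event.get("kind") == "location" and event.get("at_home"):
        return "arrived-home"          # proximity to the home location
    return None                        # no trigger

print(detect_swap_trigger({"kind": "input", "gesture": "touch-and-hold"}))
print(detect_swap_trigger({"kind": "device-proximity", "distance_m": 1.2}))
print(detect_swap_trigger({"kind": "device-proximity", "distance_m": 8.0}))
```

In the described system any associated device (a playback device, control device, or bridge device) could run such a check and then initiate the exchange.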

At block 1504, method 1500 includes determining one or more source playback devices and one or more target playback devices. As described above, example embodiments relate to exchanging playback between one or more portable playback devices 710 and one or more playback devices 110. Depending on the context, the portable playback device 710 may operate as either a source playback device or a target playback device. Playback device 110 may likewise participate in a playback session exchange as either a source playback device or a target playback device.

In an example, the one or more source playback devices are determined based on context. For example, if the portable playback device 710 detects a particular input representing an exchange command, the portable playback device 710 may initiate a playback session exchange as the source playback device based on detecting that input. In another example, if the control device 130 detects a particular input representing a command to exchange playback from the playback device 110, the control device 130 may initiate a playback session exchange with the playback device 110 as the source playback device, or may send the playback device 110 data representing a command that causes the playback device 110 to initiate a playback session exchange as the source playback device.

In other examples, the context is based on proximity. For example, if the portable playback device 710 detects the proximity of one or more potential target playback devices 110, the portable playback device 710 can initiate a playback session exchange with the portable playback device 710 as the source playback device. As another example, if the paired control device 130 or bridge device 860 detects the proximity of one or more potential target playback devices 110 and the paired portable playback device 710 is playing back audio content, the paired control device 130 or bridge device 860 may initiate a playback session exchange with the paired portable playback device 710 as the source playback device, or may send data indicative of nearby playback devices 110 to the paired portable playback device 710 to cause the paired portable playback device 710 to initiate a playback session exchange as the source playback device.

As described above in section V, in some examples, the one or more target devices are determined based on a predefined exchange pair with the source playback device. For example, as shown in fig. 11A, the kitchen 101h is designated as a predefined swap pair with the headphones 710a. As described above, exchange pairs may be configured and/or reconfigured via the control device 130 or another suitable device.

Alternatively, as described above in section V, one or more target devices are determined based on proximity to the source playback device. The proximity between the source playback device and the one or more target devices may be determined using any suitable proximity detection technique, including the proximity detection techniques described in section V above. Further, as described above, "proximity" may be defined within one or more ranges, such as a location (e.g., home), a zone, an area, or an individual device.

Further, in other examples, the one or more target devices are determined based on context. For example, one or more playback devices may detect a particular input representing a command designating them as target playback devices. In other examples, the one or more target playback devices are determined based on an association between a target playback device and a device base. For example, if the device base 718a is associated with the kitchen 101h, placing the portable playback device 710c on the device base 718a may trigger a playback session exchange between the portable playback device 710c and the playback device 110b.

When a first playback device 110 is determined to be a source or a target based on context, one or more additional playback devices 110 may be determined based on synchrony groupings between the first playback device 110 and the one or more additional playback devices 110. For example, if the playback device 110l in the master bedroom 101b is determined to be a target device, the playback device 110m is also determined to be a target playback device based on the bonded pair configuration of the playback device 110m and the playback device 110l. In another example, if a kitchen + dining room group is configured and the playback device 110d in the dining room receives the swap input, then the playback device 110b is also determined to be a source playback device. This facilitates exchanging the session from all playback devices 110 participating in the playback session.
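The group-expansion rule above can be sketched as a lookup: once one device is chosen as source or target, every member of its configured grouping joins it in the exchange. The group table below mirrors the two examples (a bonded stereo pair and a room group); the identifiers are illustrative.

```python
# Sketch: expand a contextually chosen device to its full synchrony grouping.
GROUPS = {
    "master-bedroom": ["110l", "110m"],   # bonded stereo pair
    "kitchen+dining": ["110b", "110d"],   # configured room group
}

def expand_to_group(device_id):
    for members in GROUPS.values():
        if device_id in members:
            return members                # whole grouping joins the exchange
    return [device_id]                    # ungrouped device stands alone

print(expand_to_group("110l"))   # ['110l', '110m']
print(expand_to_group("110d"))   # ['110b', '110d']
print(expand_to_group("110h"))   # ['110h']
```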

At block 1506, method 1500 includes switching a playback session from one or more source playback devices to one or more target playback devices. Within examples, the method 1500 may implement any suitable techniques to exchange playback sessions, such as the example messaging, cloud queuing, and grouping techniques described in section V. Other examples are also contemplated.

IX. Example bridge devices

In some example implementations, a portable playback device (e.g., the headphones 710a, the earpiece 710b, or the portable playback device 710c) may interface with the media playback system 100 via a bridge device 860. Fig. 16A shows an example pairing arrangement between the headphones 710a and a bridge device 860a. In contrast to a general-purpose smartphone or tablet, which provides bridging features when configured as a control device 130, the bridge device 860a is purpose-built with hardware and software to interface the portable playback device 710a with the media playback system 100. The bridge device 860a may also include other features to support or enhance the media playback system 100.

As with the control device 130a, the bridge device 860a may include communication interfaces, processing capabilities, and/or other features that need not be implemented in the portable playback device 710a. When the portable playback device 710a is "paired" with the bridge device 860a, the portable playback device 710a is able to utilize some of these features. Such an arrangement may allow the portable playback device 710a to be smaller and more portable, consume less power, and/or be less expensive, among other possible benefits. For example, similar to the control device 130a, the bridge device 860a may include additional communication interfaces as compared to the portable playback device 710a. For instance, the headphones 710a may utilize the cellular data connection of the bridge device 860a to connect to the internet. As another example, the headphones 710a may connect to the playback devices 110, or to the internet via the network 104, using the wireless network interface of the bridge device 860a.

In another example, the portable playback device 710 may be paired with both a mobile device (e.g., a smartphone or tablet, perhaps implementing the control device 130 via installed controller application software) and a bridge device 860. In such embodiments, the portable playback device 710a may stream audio content from the mobile device via a first network interface (e.g., a Bluetooth network interface) and connect to the bridge device 860 via a second network interface (e.g., a wireless local area network interface). In this arrangement, the mobile device provides a connection to the internet to facilitate audio streaming, and the bridge device 860 serves as an interface to the media playback system 100.

In an example implementation, the bridge device 860a is bound to a particular playback device (e.g., playback device 110c), a bonded zone of playback devices (e.g., playback devices 110l and 110m), or a group of playback devices (e.g., a "kitchen + dining room" group). Alternatively, if a home hierarchy is utilized, the bridge device 860a may be bound to a particular set, room, or area. Control of the playback device 110 bound to the bridge device 860a via the NMD 120 or the control device 130 then also controls the paired portable playback device 710a.

Alternatively, the bridge device 860a itself may form a zone or set. For example, in one implementation, the bridge device 860a may be configured as a "local headphones" zone or a "local headphones" set. The bridge device 860a is configured to facilitate control of the paired headphones 710a with the NMD 120 and/or the control device 130 of the media playback system 100.

Fig. 16B is a block diagram of the bridge device 860a, which includes an input/output 811. The input/output 811 may include an analog I/O 811a (e.g., one or more wires, cables, and/or other suitable communication links configured to carry analog signals) and/or a digital I/O 811b. The bridge device 860a also includes the electronic device 812 and a user interface 813 (e.g., one or more buttons, knobs, dials, touch-sensitive surfaces, displays, touch screens). The bridge device 860a may optionally implement an NMD 820 and include one or more microphones 815 (e.g., a single microphone, multiple microphones, a microphone array) (hereinafter "the microphones 815") to facilitate voice input.

In the embodiment shown in fig. 16B, the electronic device 812 includes one or more processors 812a (hereinafter "the processor 812a"), memory 812b, software components 812c, a network interface 812d, and a power supply 812i. In some embodiments, the electronic device 812 optionally includes one or more other components 812j (e.g., one or more sensors, a video display, a touch screen).

In some examples, the electronic device 812 includes one or more audio processing components 812g (hereinafter "the audio components 812g"), one or more audio amplifiers 812h (hereinafter "the amplifiers 812h"), and one or more transducers 814 to facilitate voice responses from the NMD 820. However, audio playback is not the intended purpose of the bridge device, and thus its audio playback capabilities are often very limited compared to those of the playback devices 110 and the portable playback devices 710.

The processor 812a may include clock-driven computing components configured to process data, and the memory 812b may include a computer-readable medium (e.g., a tangible, non-transitory computer-readable medium or a data storage device loaded with one or more of the software components 812c) configured to store instructions for performing various operations and/or functions. The processor 812a is configured to execute instructions stored on the memory 812b to perform one or more operations. The operations may include, for example, pairing with a particular portable playback device 710 and related functionality.

The network interface 812d is configured to facilitate data transfer between the bridge device 860a and one or more other devices on a data network, such as the link 103 and/or the network 104 (fig. 1B). In the embodiment shown in fig. 16B, the network interface 812d includes one or more wireless interfaces 812e (hereinafter "the wireless interface 812e"). The wireless interface 812e (e.g., a suitable interface including one or more antennas) may be configured to wirelessly communicate with one or more other devices (e.g., one or more of the playback devices 110, the NMDs 120, the control devices 130, and/or the portable playback devices 710) communicatively coupled to the network 104 (fig. 1B) according to a suitable wireless communication protocol (e.g., WiFi, Bluetooth, LTE). In some examples, the wireless interface 812e forms an ad-hoc network with the paired portable playback device 710. In some embodiments, the network interface 812d optionally includes a wired interface 812f (e.g., an interface or receptacle configured to receive a network cable such as an Ethernet, USB-A, USB-C, and/or Thunderbolt cable) configured to communicate over a wired connection with other devices according to a suitable wired communication protocol.

Fig. 16C is a front isometric view of the bridge device 860a configured as a command device 862a of the media playback system 100, in accordance with aspects of the disclosed technology. To configure the bridge device 860a as the command device 862a, the user interface 813a of the bridge device 860a includes playback controls. Example playback controls include transport controls (e.g., play/pause, skip forward/backward) and volume controls, among others. Similar to the control device 130, inputs to these playback controls are translated into playback commands via the software components 812c and sent via the network interface 812d to one or more playback devices 110 and/or 710 to control playback.

In an example embodiment, the command device is configured to control only the paired and/or bound playback devices, as opposed to the control device 130, which generally controls the playback devices 110a-110n of the media playback system 100. For example, in the example of fig. 16A, where the bridge device 860a is paired with the portable playback device 710a, a playback command issued on the command device 862a is executed on the portable playback device 710a. In addition, when the bridge device 860a is bound to one or more playback devices 110, playback commands issued on the command device 862a are also executed on the bound playback devices 110.

The user interface 813a of the bridge device 860a includes a dial 863a to facilitate volume control of the paired playback device 710 and/or the bound playback device 110. In this example, the dial 863a is formed by a first portion of the housing 816a that rotates about the base of the housing 816a, as shown in fig. 16C. Clockwise and counterclockwise rotation of the dial 863a corresponds to volume-up and volume-down adjustments, respectively.

The user interface 813a of the bridge device 860a also includes a touch-sensitive region 864a to facilitate transport control of the paired playback device 710 and/or the bound playback device 110, as shown in fig. 16D. The touch-sensitive region 864a is formed on the top surface of the housing 816a, as shown in fig. 16C, and may be implemented as a capacitive or resistive touch-sensitive area, among other possibilities. In this example, a touch input to the center of the touch-sensitive region 864a is interpreted as a play/pause toggle. The touch-sensitive region 864a may also interpret particular inputs as skip forward and skip backward. For example, touch inputs on the right and left sides of the touch-sensitive region 864a may be interpreted as skip forward and skip backward, respectively. Alternatively, a left-to-right slide gesture on the touch-sensitive region 864a may be interpreted as skip forward, while a right-to-left slide gesture may be interpreted as skip backward.
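The gesture interpretation described above can be sketched as follows. The normalized 0.0-1.0 coordinate system, the one-third side zones, and the swipe threshold are illustrative assumptions, not dimensions from the figures.

```python
# Illustrative interpreter for the touch-sensitive region 864a: center taps
# toggle play/pause; side taps and horizontal swipes skip tracks.
def interpret_touch(x_start, x_end=None):
    if x_end is not None and abs(x_end - x_start) > 0.3:   # swipe gesture
        return "skip-forward" if x_end > x_start else "skip-backward"
    if x_start < 0.33:
        return "skip-backward"       # tap on the left side
    if x_start > 0.67:
        return "skip-forward"        # tap on the right side
    return "play-pause"              # tap in the center

print(interpret_touch(0.5))          # play-pause
print(interpret_touch(0.9))          # skip-forward
print(interpret_touch(0.8, 0.1))     # skip-backward (right-to-left swipe)
```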

In certain embodiments, the user interface 813a of the bridge device 860a is intentionally limited to a particular subset of playback commands as compared to the "full-function" control supported by the control device 130. As shown in fig. 16C and 16D, such a subset may include volume controls and transport controls (and possibly only certain transport controls). Such a simplified, minimalist user interface may enhance the user experience of the paired playback device 710 or the bound playback device 110 by reducing distraction, among other possible benefits.

In embodiments where the command device 862a excludes library and/or search controls for selecting audio content for playback, initiating playback via the command device 862a may start playback of a particular audio container. The particular audio container may be preconfigured by the user via the control device 130 or automatically selected by the media playback system. Example audio containers include playlists, internet radio stations, albums, and podcasts.

Fig. 16E is a front view of an example bridge device 860b. The housing 816b of the bridge device 860b is more rectangular than the circular housing 816a of the bridge device 860a. The user interface 813b of the bridge device 860b includes a dial 863b on the front surface of the housing 816b, a touch-sensitive area 864b on the upper surface of the housing 816b, and buttons 865a-865d on the front surface of the housing 816b. As with the dial 863a, the dial 863b facilitates volume control of the portable playback device 710 paired with the bridge device 860b and/or the playback device 110 bound to the bridge device 860b. Further, the touch-sensitive area 864b facilitates transport control of the paired portable playback device 710 and/or the bound playback device 110 in a similar manner as the touch-sensitive area 864a.

Buttons 865a-865d correspond to respective audio containers. The particular audio container may be preconfigured by the user via the control device 130 or automatically selected by the media playback system (e.g., based on user-specified preferences or listening frequency). Selection of a particular button 865 causes paired playback device 710 and/or bound playback device 110 to initiate playback of the corresponding container, similar to how a radio preset tunes the radio to the corresponding radio station.

For example, selection of button 865a causes bridge device 860b to send one or more instructions to paired playback device 710 to play back the audio container corresponding to button 865 a. The one or more instructions may include a URI indicating a location of the audio container at the computing device 106 (e.g., a content server of a streaming audio service). The paired playback device 710 then streams the audio container from the computing device 106 and plays back the audio container.
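The radio-preset behavior of buttons 865a-865d might be sketched as below. The container URIs and the instruction format are hypothetical placeholders, not actual identifiers from the system described.

```python
# Hypothetical mapping of preset buttons to audio container URIs.
PRESETS = {
    "865a": "https://content.example.com/playlists/morning-mix",
    "865b": "https://content.example.com/stations/jazz-radio",
    "865c": "https://content.example.com/podcasts/daily-news",
    "865d": "https://content.example.com/albums/favorite-album",
}

def on_button_press(button_id):
    """Return the instruction the bridge device sends to the paired device."""
    uri = PRESETS.get(button_id)
    if uri is None:
        return None
    # The paired playback device would then stream the container from this URI.
    return {"command": "play", "uri": uri}
```

Preconfiguring a preset via the control device 130 would amount to updating an entry in such a mapping.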

In some implementations, the bridge device 860 may include a graphical display. In such an example, the user interface 813 of the bridge device 860 may comprise a graphical user interface displayed on the graphical display. In some examples, the graphical display is touch-sensitive to facilitate touch input to the graphical user interface. However, the graphical user interface may have limited playback controls compared to the control devices 430 and 530, which may reduce distraction caused by the presence of a graphical display, among other possible benefits.

To illustrate, fig. 17A presents a first user interface display 1770a configured to be displayed on a bridging device having a circular touch-sensitive graphical display. For example, an example embodiment of the bridge device 860a may implement the touch-sensitive area 864a as a circular touch-sensitive graphical display. Other shapes and arrangements of touch sensitive graphical displays are also contemplated.

The first user interface display 1770a includes a plurality of regions 1771a-1771f that are similar to buttons 865a-865d (fig. 16E). In the first user interface display 1770a, the regions 1771a-1771f are selectable via touch input to the respective region. Each region 1771 corresponds to a respective audio container. The particular audio container may be preconfigured by the user via the control device 130 or automatically selected by the media playback system (e.g., based on user-specified preferences or listening frequency). Example audio containers include internet radio stations, playlists, albums, podcasts, and other streaming audio content. Selection of a particular region 1771 causes the paired playback device 710 and/or bound playback device 110 to initiate playback of the corresponding container.

By way of illustration, the region 1771a is shown in a central location of the first user interface display 1770a. Regions 1771b and 1771f are shown partially at bottom and top positions, respectively, in the first user interface display 1770a. By scrolling the first user interface display 1770a up or down using an upward or downward slide gesture, respectively, regions 1771b or 1771f may be displayed in full, and regions 1771c-1771e may also be displayed in a round-robin fashion. For illustration, fig. 17B shows an upward slide that moves region 1771b toward the center position. Fig. 17C shows region 1771b in the center position after the upward slide of fig. 17B. As shown in fig. 17C, when the region 1771b is in the center position, the regions 1771a and 1771c are partially displayed.
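The round-robin scrolling can be sketched with modular arithmetic. The region labels and function name below are illustrative assumptions.

```python
# The six regions of the first user interface display, in carousel order.
REGIONS = ["1771a", "1771b", "1771c", "1771d", "1771e", "1771f"]

def scroll(center_index, direction):
    """Return the index of the region moved into the center position.

    direction is +1 for an upward slide (advance to the next region) and
    -1 for a downward slide; indices wrap around in round-robin fashion.
    """
    return (center_index + direction) % len(REGIONS)
```

Starting from region 1771a at the center (index 0), an upward slide centers 1771b, while a downward slide wraps around to 1771f.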

When a particular region 1771 (e.g., region 1771a) is selected, the bridge device 860 causes the paired playback device 710 and/or bound playback device 110 to initiate playback of the corresponding container. When region 1771a is selected again while the container is playing back, the bridge device 860 causes the paired playback device 710 and/or bound playback device 110 to stop playback of the corresponding container. In this way, region 1771 acts as a play/pause button.

Other transport controls may be implemented via the graphical user interface. For example, as shown in fig. 17C, swipe gestures in the first user interface display 1770a correspond to skip forward and skip backward. Specifically, a leftward swipe may cause a skip forward, while a rightward swipe causes a skip backward.

Fig. 17D presents a second user interface display 1770b, which may be displayed based on a selection of area 1771 a. The second user interface display 1770b includes a region 1772 that includes a graphical representation of the audio content played back on the paired playback device 710 and/or the bound playback device 110, as well as media content information corresponding to the audio content. For example, if the selected audio container is playing a track, the metadata corresponding to the track is displayed in area 1772.

The second user interface display 1770b may also include one or more transport controls. To illustrate, the second user interface display 1770b includes a skip forward control 1773a and a skip backward control 1773b. In various embodiments, the second user interface display 1770b may also include other transport controls. For example, similar to the first user interface display 1770a, swipe gestures in the second user interface display 1770b may correspond to skip forward and skip backward.

The second user interface display 1770b may also include navigation controls. By way of example, the second user interface display 1770b includes navigation controls 1774a and 1774b. The navigation control 1774a causes the bridge device 860 to display the first user interface display 1770a. The navigation control 1774b causes the bridge device 860 to display a third user interface display 1770c that includes a queue.

To illustrate, fig. 17E presents a third user interface display 1770c that can be displayed based on a selection of a navigation control 1774 b. As shown, the third user interface display 1770c includes an interface for browsing within the audio container. Selecting a single song or other media item within the audio container results in playback of the media item. For example, if the selected audio container is a playlist, the third user interface display 1770c lists the audio tracks of the playlist. As another example, if the selected audio container is a podcast, the third user interface display 1770c may display other audio content available within the container (e.g., an episode of the podcast).

In some implementations, the graphical user interface facilitates selection of the portable playback device 710 to pair with the bridge device 860 and/or the playback device 110 to bind with the command device 862. To illustrate, fig. 17F presents a fourth user interface display 1770d having a plurality of toggle controls 1775 corresponding to respective portable playback devices 710 and zones. Toggling a toggle control 1775 pairs or binds the corresponding portable playback device 710 or playback device 110 with the bridge device 860. As shown, the toggle control 1775a corresponding to the headset 710a is toggled on such that the headset 710a is paired with the bridge device 860.

Selection of multiple zone names results in a zone group being formed among those zones (if not already formed) and the bridge device 860 pairing with the zone group (thereby controlling all playback devices 110 in the group). Selection of the "Everywhere" toggle places the media playback system 100 in a party mode (in which all playback devices 110 play music in synchrony) and pairs the bridge device 860 with all playback devices 110 in the media playback system 100.

In example embodiments, the bridge device 860 charges its one or more batteries by being placed on a device base 718. Fig. 18A illustrates placement of the bridge device 860a on the device base 718b. The bridge device 860a may interact with the device base 718b in the same or a similar manner as the portable playback device 710c. For example, if the device base 718b is associated with a zone of the media playback system 100, placing the bridge device 860a on the device base 718b may cause the bridge device 860a (and the paired portable device 710) to join the associated zone.

In example implementations, the bridge device 860a may be rotated about the device base 718b to control the volume of the portable playback device 710 that is paired with the bridge device 860a. In some implementations, rotation of the bridge device 860a about the device base 718b also controls the volume of the playback device 110 that is bound to the bridge device 860a. Similar to the device base 718a, the bridge device 860a may rotate relative to the device base 718b, which may generate volume control signals in sensors of the bridge device 860a and/or the device base 718b. In another example, a first portion of the device base 718b may be rotatable relative to a second portion of the device base 718b. When the bridge device 860a is placed on the device base 718b, rotation of the two portions generates a volume control signal in the sensor of the device base 718b that controls the volume of the paired playback device 710.

The bridge device 860 of the media playback system 100 may also support other features of the portable devices 710. For example, the bridge device 860 may support charging the portable devices 710. To illustrate, fig. 18B shows an example stacked arrangement in which a device base 718b charges a bridge device 860a, and the bridge device 860a charges the ear buds 710b via a charging case 1080. Similar to the device base 718, the bridge device 860a may charge the ear buds 710b via inductive charging or via conductive terminals. In some embodiments, the device base 718b may directly charge the ear buds 710b by placing the charging case 1080 on the device base 718b. Other form factors of the charging case 1080 may be used to charge other form factors of the portable playback device 710 (e.g., the headset 710a).

Fig. 18C illustrates another example stacked arrangement to facilitate device charging. In this example, the device base 718a charges the portable playback device 710c. The portable playback device 710c charges the bridge device 860a. The bridge device 860a charges the ear buds 710b via the charging case 1080. In this arrangement, only the device base 718a requires an external power source to charge the various stacked devices.

X. Additional Exchange Examples

In some examples, the source and target of the exchange are predefined. In a predefined exchange pair, the source is the playback device 710 or one or more playback devices 110 that are playing audio content, and the target is another playback device that is not playing audio content. A playback exchange between the source playback device and the target playback device is performed when an exchange-triggering action, such as a button press or other user input, is detected.

In some implementations, an input on the source device of the swap pair triggers the swap. For example, a particular input (e.g., a tap or gesture on the touch-sensitive area (or a portion thereof)) to the user interface 713a of the headset 710a (fig. 7B) may trigger an exchange. In other examples, the portable playback device 710 may include a physical button for triggering the exchange. Further, a pattern of touch inputs (e.g., short, long, short) or a traced pattern (e.g., a shape such as a zigzag or triangle) may trigger the exchange. Other types of inputs are also contemplated.

Additionally or alternatively, an input of the target device triggers the exchange. For example, a particular input of the user interface 113 of the playback device 110a (fig. 1C) may trigger an exchange. In other examples, playback device 110 may include a physical button that triggers the exchange. Operating a button (e.g., by selecting, touching, sliding, etc.) triggers an exchange. Other types of inputs are also contemplated.

In an example, a user interface (e.g., the user interface 133 of the control device 130a or the user interface 813 of the bridge device 860a) may facilitate defining the predefined exchange pairs. To illustrate, fig. 19A presents a first user interface display 1931a to facilitate defining a swap pair for the headphones 710a ("these headphones"). By way of example, the first user interface display 1931a is configured to be displayed on the control device 430, but may be adapted to be displayed on other example devices disclosed herein. The control device 430 may display the first user interface display 1931a during a setup process for the headset 710a. Further, the user may access the first user interface display 1931a via a settings user interface display or the like.

As shown, the first user interface display 1931a includes graphical indications of the zones (i.e., zone names) within the media playback system 100 and a toggle control corresponding to each zone. Toggling a toggle control configures the corresponding zone as an exchange pair with the headset 710a. In this example, the kitchen 101h is defined as an exchange pair with the headset 710a. Although toggle controls are shown by way of example, other types of controls may be used in alternative embodiments. An example user interface may include functionally similar user interface displays to define swap pairs for other portable playback devices 710 (e.g., the ear buds 710b and/or the portable playback device 710c) of the media playback system 100. The predefined exchange pairs may be stored in data storage of the control device 130, the playback devices 110, and/or the portable playback devices 710, possibly as one or more state variables shared between these devices.
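The shared state variable holding the predefined exchange pairs might be kept as a simple mapping, mirroring the toggle controls of fig. 19A. The structure and names below are assumptions for illustration.

```python
# Hypothetical shared state: portable device id -> its exchange-pair zone.
swap_pairs = {}

def set_swap_pair(portable_id, zone_name, enabled):
    """Toggle a zone on or off as the predefined exchange pair for a device."""
    if enabled:
        swap_pairs[portable_id] = zone_name
    else:
        swap_pairs.pop(portable_id, None)

def swap_target(portable_id):
    """Look up the target zone when an exchange trigger is detected."""
    return swap_pairs.get(portable_id)
```

In practice this state would be synchronized across the control device, playback devices, and portable playback devices rather than held in one process.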

Alternatively, if a home graph hierarchy is implemented, a similar user interface display may include graphical indications of the sets, rooms, and/or areas of the home graph configured in the media playback system 100. The user interface display may include toggle controls or other similar controls corresponding to each set, room, and/or area. In this example, toggling a toggle control configures the corresponding set, room, and/or area as an exchange pair with the headset 710a.

In some implementations, the media playback system 100 can define two or more exchange pairs for a portable playback device. To illustrate, fig. 19B presents a second user interface display 1931b to facilitate defining a plurality of exchange pairs for the ear buds 710b. As shown, each predefined swap pair corresponds to a different input (e.g., a different gesture). Providing the input corresponding to a particular predefined exchange pair triggers the exchange for that exchange pair.

The user may define custom inputs corresponding to the predefined exchange pairs. To illustrate, fig. 19C presents a third user interface display 1931c that facilitates defining a custom gesture. As shown, the third user interface display 1931c includes a prompt to provide the custom gesture. After a start control is pressed, the ear buds 710b and the playback devices 110l and 110m in the swap pair monitor their respective user interfaces 713b and 113 to detect the custom input, which is then stored in data storage.

In other examples, placing the portable playback device 710 on the charging base triggers the exchange. For example, placing portable playback device 710c on device dock 718a (fig. 7F) may trigger the exchange. In some implementations, the exchange target is predefined for the portable playback device 710 c.

Alternatively, the device base 718a may be bound to one or more specific zones. Placement of the portable playback device 710c on the device base 718a then triggers an exchange to the one or more specific zones. Additional details regarding binding zones to device bases may be found, for example, in U.S. patent No. 9,544,701 entitled "Base Properties in a Media Playback System," which, as noted above, is hereby incorporated by reference in its entirety.

In other examples, an input to the user interface of device base 718a may trigger an exchange. Example inputs include button presses (or other manipulations) or touch inputs to a touch-sensitive area, similar to the example inputs described above. For example, a particular gesture may be interpreted by device base 718a as an exchange trigger.

In other examples, an input to the user interface 113 of the NMD 120a triggers an exchange. For example, the user may speak a voice input such as "swap to Kitchen". As described above in connection with figs. 3A-3D, a user may activate a voice assistant service to process the voice input with an activation word or a button press (e.g., push-to-talk). The voice input includes a first command indicating an action ("swap") and a second command indicating a target playback device ("Kitchen") for the action. Here, the voice input is transmitted to the voice assistant service and processed as described above in connection with figs. 3A-3D. In some cases, instructions corresponding to the processed voice command are transmitted back to the source playback device or the target playback device to cause the playback session exchange to be performed. Alternatively, instructions corresponding to the processed voice command are transmitted to a server to cause the playback session exchange to be performed, as described in further detail below in connection with figs. 20B and 20C. After the exchange, the NMD 120a can confirm the exchange with a voice response, e.g., "<audio content name> is now playing in the Kitchen."
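A naive sketch of extracting the action and target from such a voice input follows. The phrasing rules are an assumption; a real voice assistant service performs far richer natural-language processing than this keyword matching.

```python
def parse_swap_voice_input(text, zone_names):
    """Extract the swap action and target zone from a transcribed voice input."""
    lowered = text.lower()
    # First command: the action ("swap" / "switch").
    if not any(phrase in lowered for phrase in ("swap to", "switch to")):
        return None
    # Second command: the target playback device (a zone name).
    for zone in zone_names:
        if zone.lower() in lowered:
            return {"action": "swap", "target": zone}
    return None
```

The parsed result would then be turned into instructions sent to the source or target playback device (or to a server) to perform the exchange.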

In some cases, when the exchange trigger is detected, both the source playback device and the target playback device are playing audio content. In such an example, the respective playback sessions of the source playback device and the target playback device can be exchanged such that the source playback device begins playing back audio content previously played on the target playback device and the target playback device begins playing back audio content previously played on the source playback device. Alternatively, the playback session of the source playback device is switched to the target playback device, and playback is stopped on the target playback device.

In example embodiments, the source playback device may facilitate the exchange by sending playback session data to the target device. The playback session data may include data representing the source of the audio content (e.g., a URI or URL indicating the location of the audio content) and an offset indicating the position within the audio content at which playback should begin. The offset may be defined as a time from the start of the track (e.g., in milliseconds) or as a number of samples, among other options. In example embodiments, the offset may be set slightly ahead of the current playback position in the audio content to allow time for the target device to begin buffering the audio content. The source playback device then stops playing back the audio content at the offset, and the target playback device begins playing back the audio content at the offset.
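Building such a handoff payload might look like the following sketch. The field names and the 500 ms buffering lead are assumptions, not values from the source.

```python
def build_session_data(track_uri, position_ms, buffer_lead_ms=500):
    """Build playback session data for a handoff.

    The offset leads the current playback position slightly so the target
    device has time to buffer before playback begins at the offset.
    """
    return {
        "uri": track_uri,                          # location of the audio content
        "offset_ms": position_ms + buffer_lead_ms, # where playback resumes
    }
```

The source device would play up to `offset_ms` and stop, while the target device starts streaming from that same offset, yielding a seamless transition.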

The playback session data may also include one or more identifiers corresponding to the playback session. For example, the playback session data may include a session identifier that distinguishes the playback session from other playback sessions. The playback session data may also include an application identifier that identifies the media playback system controller application software controlling the playback session. In addition, the playback session data can include a streaming audio service identifier that identifies the streaming audio service hosting the audio content at the source, and an audio item identifier (e.g., a unique identifier used by the streaming audio service to identify the audio content). As another example, a home identifier may be included in the playback session data to distinguish the media playback system 100 from other media playback systems. As yet another example, a group identifier may identify the devices in a zone, a bonded zone, or a zone group.

The playback session data may also include data representing the playback status. The playback status may include a playback state of the session (e.g., playing, paused, or stopped). If the playback session implements a playback queue, the playback session data may include a playback queue state, e.g., the current playback position within the queue.

The playback queue state may also include a queue version. For example, in a cloud queue embodiment, the cloud queue server and media playback system 100 may use queue versions to maintain consistency. The queue version may be incremented each time the queue is modified and then shared between the media playback system 100 and the cloud queue server to indicate the latest version of the queue.
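The version-based consistency scheme can be sketched as below; the class and field names are illustrative assumptions.

```python
class CloudQueue:
    """Toy cloud queue using a version counter to keep clients consistent."""

    def __init__(self):
        self.items = []
        self.version = 0

    def modify(self, new_items):
        # Every modification increments the version, which is then shared
        # with the media playback system to indicate the latest queue.
        self.items = list(new_items)
        self.version += 1

    def is_stale(self, client_version):
        """A client holding an older version must re-sync the queue."""
        return client_version < self.version
```

A playback device comparing its stored version against the server's version can detect that its local copy of the queue is out of date and fetch the latest items.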

The playback session data may also include authorization data, such as one or more keys and/or tokens. Such authorization data may include a token associated with the user account. During the playback session exchange, the media playback system 100 may verify that the token is authorized on both the source playback device and the target playback device. The authorization data may also include a token associated with the streaming audio service, which may enable the target playback device to access the audio content at the source. In addition, the authorization data may include a token associated with the playback session that enables the target playback device to access the session. Other example authorization data are also contemplated.

To illustrate, fig. 20A is an example message flow diagram showing instructions exchanged between a source playback device, a target playback device, and a content server during an example exchange of a playback session. Such messages are representative and may include additional or fewer messages. In some implementations, instead of sending a message from the portable playback device 710 (as either the source playback device or the target playback device), the message is sent from the paired control device 130a (fig. 7G) or the paired bridge device 860a (fig. 16A).

At 2081a, the source playback device begins a playback session. The playback session may be initiated on the source playback device, the control device 130, or the bridge device 860, among others. In some cases, the playback session may include one or more additional playback devices that play back synchronously with the source playback device as part of the group.

At 2082a, the source playback device detects an exchange trigger (e.g., any of the example exchange triggers described above). In some cases, another device (e.g., the target playback device, the control device 130, the device base 718, or the bridge device 860) detects the exchange trigger and sends data to the source playback device indicating that the exchange trigger was detected.

At 2083a, the source playback device sends playback session data to the target playback device. As shown in the example, the playback session data includes data representing a URI indicating the source of audio content currently playing in the session (e.g., a currently playing track). The playback session data also includes data representing an offset in the audio content that indicates a position in the audio content at which playback is to begin. In addition, if the source playback device is playing audio content in a queue, the playback session data may also include data representing the queue, which may include the URIs in the queue corresponding to the individual media items, as well as the order of the queued media items. In addition, the playback session data includes one or more identifiers, as described above.

At 2084a, the target playback device sends a get message to the content server to request an audio content stream from the content server. The get message may include the URI indicating the source of the audio content at the content server. The get message may also include the offset. The get message may also include other data, such as one or more identifiers and/or authorization data.

Based on the get message, at 2085a the content server streams the audio content to the target playback device for playback. The content server may begin streaming at the offset in the audio content. The target playback device then begins playing back the audio content at the offset.

At 2086a, the target playback device sends an acknowledgement message to the source playback device after receiving the playback session data. In an example implementation, the source playback device may not stop playing back the session until an acknowledgement message is received from the target playback device. The acknowledgement message may indicate that the exchange was successful.
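The direct exchange of fig. 20A, reduced to a toy sequence over plain dictionaries, is sketched below. The message shapes and field names are assumptions; a real implementation exchanges network messages between separate devices.

```python
def swap_session(source_state, content_store):
    """Move a playback session from a source to a target (fig. 20A sketch)."""
    # 2083a: the source sends playback session data to the target.
    session = {
        "uri": source_state["uri"],
        "offset_ms": source_state["position_ms"],
    }
    # 2084a/2085a: the target fetches the stream from the content server.
    audio = content_store[session["uri"]]
    target_state = {
        "uri": session["uri"],
        "position_ms": session["offset_ms"],  # playback resumes at the offset
        "playing": True,
        "buffer": audio,
    }
    # 2086a: the source stops only once the target has acknowledged.
    source_state["playing"] = False
    return target_state
```

The key invariant is that exactly one side is playing after the exchange, and the target resumes at the offset the source handed over.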

Other example embodiments utilize cloud queues to facilitate playback session exchange. In contrast to the queue in the data storage device of playback device 110 (i.e., the local queue), the cloud queue for the playback session is maintained in the cloud on computing device 106. In this embodiment, rather than controlling the playback devices 110a-110n locally via the network 104, the control device 130a controls the playback devices 110a-110n via the computing device 106 by manipulating a cloud queue on the computing device 106. The computing device 106 synchronizes the cloud queue (or a portion thereof) with the playback devices 110 participating in the playback session.

To illustrate, fig. 20B is an example message flow diagram showing instructions exchanged between a source playback device, a cloud queue server, a target playback device, and a content server during an example exchange of a playback session. Such messages are representative and may include additional or fewer messages. In some implementations, instead of sending a message from the portable playback device 710 (as either the source playback device or the target playback device), the message is sent from the paired control device 130a (fig. 7G) or the paired bridge device 860a (fig. 16A).

At 2081b, the source playback device begins a playback session. The playback session may be initiated on the source playback device, the control device 130, or the bridge device 860, among others. In some cases, the playback session may include one or more additional playback devices that play back synchronously with the source playback device as part of the group.

At 2082b, the source playback device detects an exchange trigger (e.g., any of the example exchange triggers described above). In some cases, another device (e.g., the target playback device, the control device 130, the device base 718, or the bridge device 860) detects the exchange trigger and sends data to the source playback device indicating that the exchange trigger was detected.

At 2087, the source playback device sends an exchange session message including the playback session data to the cloud queue server. The exchange session message may indicate the target playback device via one or more identifiers. In some examples, the cloud queue server may maintain the predefined exchange pairs for the media playback system 100 and determine the target playback device using a predefined exchange pair. The exchange session message may also include data representing an offset in the audio content indicating the position in the audio content at which to begin playback. In an example, the cloud queue server may also track the play position in the playback session, and may verify the play position against the position in the exchange session message. Further, the exchange session message may include a home identifier that identifies the media playback system 100 (to distinguish it from media playback systems in other homes) and one or more player identifiers that identify the source playback device and/or the target playback device.

Based on receiving the exchange session message, the cloud queue server relocates the session from the source device to the target device. For example, the cloud queue server may use the home identifier in the playback session data to identify the cloud queues of the media playback system 100 and then use the group identifier (or a queue identifier) to identify the cloud queue used in the playback session. The cloud queue server may swap the session to the target playback device by changing the cloud queue data to associate the cloud queue with the target playback device. Alternatively, the cloud queue server may mirror the cloud queue of the source device into a cloud queue for the target playback device and then set the playback state of that cloud queue to match the playback state indicated in the playback session data.

For example, at 2088, the cloud queue server sends playback session data to the target playback device. The playback session data includes data representing a URI indicating the source of the audio content currently playing in the session (e.g., the currently playing track). The playback session data also includes data representing an offset in the audio content that indicates the position in the audio content at which playback is to begin. Additionally, if the source playback device is playing back audio content from a cloud queue that includes multiple media items, the playback session data may also include data representing a window of the cloud queue. The window may indicate media items that follow the currently playing audio content and possibly media items that precede the currently playing audio content. The target playback device may store the window in a local queue to facilitate continued playback of the cloud queue in the transferred session.
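Extracting such a window around the current item can be sketched as follows; the window sizes (one item before, three after) are assumptions.

```python
def queue_window(queue_items, current_index, before=1, after=3):
    """Return a window of media items around the currently playing item.

    The window includes up to `before` items preceding the current item
    and up to `after` items following it, clipped to the queue bounds.
    """
    start = max(0, current_index - before)
    return queue_items[start:current_index + after + 1]
```

The cloud queue server would send only this window rather than the full queue, and the target device would cache it locally to keep playback flowing while further items are synchronized.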

At 2084b, the target playback device sends a get message to the content server to request an audio content stream from the content server. Based on the get message, the content server streams the audio content to the target playback device for playback at 2085 b. The target playback device then begins playing back the audio content at the offset in the audio content.

Fig. 20C is an example message flow diagram illustrating instructions exchanged between a source playback device, a target playback device, and one or more servers (e.g., cloud queue servers and/or content servers, which may be implemented by one or more cloud servers) during another example exchange of a playback session. Such messages are representative and may include additional or fewer messages. In some implementations, instead of sending a message from the portable playback device 710 (as either the source playback device or the target playback device), the message is sent from the paired control device 130a (fig. 7G) or the paired bridge device 860a (fig. 16A).

At 2081c, the source playback device begins a playback session. The playback session may be initiated on the source playback device, the control device 130, or the bridge device 860, among others. In some cases, the playback session may include one or more additional playback devices that play back synchronously with the source playback device as part of the group.

At 2082c, the source playback device detects an exchange trigger (e.g., any of the example exchange triggers described above). In some cases, another device (e.g., the target playback device, the control device 130, the device base 718, or the bridge device 860) detects the exchange trigger and sends data to the source playback device indicating that the exchange trigger was detected.

At 2083b, the source playback device sends playback session data to the target playback device. The playback session data includes one or more identifiers, e.g., a playback session identifier and a queue identifier. The playback session data may also include a URI indicating the source of the audio content and an offset within the content.

At 2089, the target playback device sends an exchange session request to one or more servers. In a cloud queue implementation, the exchange session request may be in the form of a load queue request that indicates an instruction to load onto the target playback device the current cloud queue state of the cloud queue being played back by the source playback device. To facilitate such requests, the exchange session request includes one or more identifiers corresponding to the playback session (e.g., a home identifier, a playback device identifier of the target device, a queue identifier, a playback session identifier).

Upon receiving the exchange session request, the one or more servers facilitate streaming of the audio content to the target playback device. For example, the one or more servers may create a new session on the target playback device, such as by a content server instructing a cloud queue server to create the new session on the target playback device. The request may include a home identifier, an application identifier, and a user account, among other identifiers. The playback session data may be used to mirror the playback session on the source playback device into the new session on the target playback device.

At 2085c, the content server streams the audio content to the target playback device for playback. The content server may begin streaming at the offset in the audio content. The target playback device then begins playing back the audio content at that offset.

At 2086b, the target playback device sends an acknowledgement message to the source playback device after receiving the playback session data. In an example implementation, the source playback device may not stop playing back the session until an acknowledgement message is received from the target playback device. The acknowledgement message may indicate that the exchange was successful.

In other examples, the source playback device and the target playback device perform the exchange by forming a synchrony group. As described above, the example playback device 110 and/or the playback device 710 can dynamically form and disband synchrony groups. Additional details regarding audio playback synchronization among playback devices and/or zones may be found, for example, in U.S. Pat. No. 8,234,395 entitled "System and method for synchronizing operations among a plurality of independently clocked digital data processing devices," the entire contents of which are incorporated herein by reference.

In some implementations, the source playback device forms a synchrony group with the target playback device and then mutes its own output. When the synchrony group is formed, the target playback device begins playing back the audio content of the given session in synchrony with the source device. To complete the "swap," the source device is muted. From the user's perspective, the playback session has been exchanged, even though both the source playback device and the target playback device are participating in the session. The muting may be a hidden (e.g., system) mute, which is distinct from a mute command issued via a user interface. The hidden mute may be performed by reducing the volume, or setting the volume to zero, on the source device while the user interface shows the source device as unmuted with playback paused.
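The hidden-mute swap above can be sketched with a toy device model. This is illustrative only, assuming a simple `Player` class with hypothetical method names; it is not a real playback-device API.

```python
class Player:
    """Toy playback device used to illustrate the hidden-mute swap."""

    def __init__(self, name):
        self.name = name
        self.volume = 30
        self.group = [self]        # synchrony group this player belongs to
        self.ui_state = "PLAYING"  # what the user interface reports

    def join_group(self, target):
        """Form a synchrony group; the target begins synchronous playback."""
        self.group.append(target)
        target.group = self.group
        target.ui_state = "PLAYING"

    def hidden_mute(self):
        """System-level mute: output volume goes to zero, but the user
        interface reports the device as unmuted and paused, not muted."""
        self.volume = 0
        self.ui_state = "PAUSED"
```

Swapping back is just removing the target from the group and restoring the source volume; because the source device never stopped buffering, no re-buffering is needed.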

To swap playback back to the source playback device, the target playback device is removed from the synchrony group. A possible advantage of this embodiment is that the session can be switched back to the source device with relatively little delay, since the audio content does not need to be re-buffered. Another possible advantage of this embodiment is that the source playback device maintains control over the audio stream.

In other examples, detecting proximity between a source playback device and a target playback device triggers an exchange. For example, detecting that a source playback device and a target playback device forming a predefined swap pair are in proximity may initiate an exchange of a playback session between the source playback device and the target playback device. In some implementations, the source playback device and the target playback device of the exchange are determined by the proximity of the source playback device to the target playback device. Example proximity detection may be implemented at one or more ranges, such as proximity to the media playback system 100 (i.e., the home or some other known location), proximity to a zone, or proximity to a playback device.

For example, in some implementations, proximity of the portable playback device 710 to the media playback system 100 initiates a playback session exchange with one or more target playback devices 110 in the home. In an example, when a user returns home with the portable playback device 710, proximity of the portable playback device 710 to the media playback system 100 is detected via a sensor or wireless communication interface of the portable playback device 710, the paired control device 130a (fig. 7G), or the paired bridge device 860 (fig. 16A). The proximity detection initiates a playback session exchange between the portable playback device 710 and one or more target playback devices 110 in the home.

To illustrate, in an example embodiment, the paired control device 130a (fig. 7G) detects a wireless signal indicating that the portable playback device 710 is in proximity to the playback devices 110. For example, the paired control device 130a (fig. 7G) may detect (e.g., connect to) an 802.11 network (e.g., network 104) in the home via network interface 132d. Since the playback devices 110a-110n are connected to the network 104, detection of the network indicates that the paired control device 130a (and, by proxy, the paired portable playback device 710) is near the home. Other example wireless signals include Near Field Communication (NFC) and 802.15 (e.g., Bluetooth Low Energy) signals, which may be transmitted by the playback devices 110a-110n in the home. In other examples, the paired bridge device 860 (fig. 16A) may detect such signals, or the portable playback devices 710 may detect these signals directly via their respective network interfaces.

Alternatively, the paired control device 130a (fig. 7G) detects proximity to the playback devices 110a-110n via one or more sensors. For example, the paired control device 130a may include a GPS sensor and compare its current GPS coordinates to stored GPS coordinates of the home (or another known location of the playback devices 110a-110n) to determine whether the paired control device 130a is proximate to the stored location. In other examples, the paired control device 130a may detect proximity by using a microphone to detect ultrasonic tones (or other signals) emitted by one or more of the playback devices 110a-110n. Alternatively, the paired control device 130a may use a camera to detect a known object or signal in the home. Other examples are possible.
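The GPS comparison above can be sketched as a great-circle distance check against the stored home coordinates. The 100 m threshold is an arbitrary illustrative value, and `near_home` is a hypothetical helper, not part of any device API.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in meters."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def near_home(current, home, threshold_m=100.0):
    """True if the current coordinates are within the threshold of the
    stored home coordinates (i.e., proximity is detected)."""
    return haversine_m(*current, *home) <= threshold_m
```

In practice the threshold would be tuned to the GPS sensor's accuracy and the size of the home.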

In some examples, verification from the user is required before the playback session exchange is performed based on proximity. In some examples, verification is accomplished via input to a user interface on the source portable playback device 710 (or the paired control device 130a or bridge device 860a). For example, the verification may be accomplished via a push notification (or other prompt, e.g., a widget) displayed on the paired control device 130a. To illustrate, fig. 21A presents a first user interface display 2131a that includes an example push notification 2191a. The paired control device 130a may display the first user interface display 2131a based on detecting proximity to the playback devices 110a-110n.

As shown in fig. 21A, push notification 2191A of first user interface display 2131A includes a plurality of selectable controls. A first selectable control ("swap") causes the paired control device 130a to perform a playback session exchange between the headset 710a ("native headset") and the kitchen 101h (which may be a predefined swap pair (fig. 11A)) or the nearest playback device 110, etc. A second selectable control ("cancel") cancels the proximity-based exchange.

Selectable controls 2192a and 2193a are also shown in fig. 21A. The selectable control 2192a does not transfer the ongoing playback session, but instead causes the kitchen 101h to continue a stopped playback session (e.g., playback of a podcast). In an example implementation, the selectable control 2192a may represent the last stopped playback session on the portable playback device 710a, the last stopped playback session in the kitchen 101h, or the last stopped playback session in the media playback system 100, among others. Alternatively, push notification 2191a may include multiple selectable controls 2192 to select among different last-stopped playback sessions.

The selectable control 2193a causes the kitchen 101h to start a new playback session that includes playback of a given playlist. In various implementations, the example push notification 2191 may include selectable controls 2193 to start a new playback session with various types of audio containers corresponding to the user. For example, similar to button 865 (fig. 16E) and/or region 1771 (fig. 17A), the corresponding selectable control 2193 can begin a new playback session with a favorite playlist, radio station, podcast, album, or artist, and so forth.

As further shown in fig. 21A, a third selectable control of push notification 2191a causes display of a user interface display for selecting a different swap target. To illustrate, fig. 21B presents a second user interface display 2131b that facilitates selection of an exchange target. The user interface display 2131b includes a plurality of toggle controls corresponding to the various zones of the media playback system 100 to facilitate selection of one or more target playback devices 110.

In some implementations, the proximity of the portable playback device 710 to a zone initiates a playback session exchange with the playback device 110 within the zone. Detecting that the portable playback device 710 is proximate to a given zone may involve detecting signals (e.g., wireless, ultrasonic) emitted by playback devices within that zone. In some implementations, detecting signals emitted by other smart devices within a zone may indicate proximity.

For example, the paired control device 130a may determine a profile corresponding to one or more zones. For instance, in the kitchen 101h, the paired control device 130a may detect signals emitted by the playback device 110b and other smart devices (e.g., a smart oven, smart refrigerator, or smart power outlet) and save these signals as markers of the kitchen 101h in a profile corresponding to the kitchen 101h. Further, the paired control device 130a may combine the signal data with other sensor data (e.g., altitude) captured while in the kitchen 101h. The markers in a given profile may also be weighted (e.g., the signals of playback devices in a given zone may be weighted more heavily than those of other smart devices within the zone).

Given stored profiles for multiple zones in the media playback system 100, to detect whether the portable playback device 710 is proximate to a given zone, the paired control device 130a may compare the current signals and/or sensor data to the stored profiles corresponding to the zones. For example, the paired control device 130a may determine the closest match by comparing how many markers from each profile are present in the current signals and/or sensor data. The paired control device 130a may also apply a threshold to the markers, determining proximity to a particular zone only when a predetermined number (or percentage) of the markers in that zone's stored profile are also present in the current signals and/or sensor data. While these operations are described by way of example as being performed by the paired control device 130a, other devices, such as the portable playback device 710 and/or the bridge device 860, may also determine the profiles and/or detect proximity using the stored profiles.
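The marker-matching scheme above (weighted markers per zone profile, best match, threshold) can be sketched as follows. The profile structure, weights, and threshold value are illustrative assumptions, not part of the described system.

```python
def match_zone(current_markers, zone_profiles, threshold=0.5):
    """Return the best-matching zone name, or None if no zone's weighted
    match fraction reaches the threshold.

    zone_profiles maps each zone to {marker: weight}; e.g., markers from
    playback devices in the zone may carry more weight than markers from
    other smart devices in the zone.
    """
    best_zone, best_score = None, 0.0
    for zone, markers in zone_profiles.items():
        total = sum(markers.values())
        # Sum the weights of profile markers also present in the current
        # signal/sensor data, then normalize by the profile's total weight.
        matched = sum(w for m, w in markers.items() if m in current_markers)
        score = matched / total if total else 0.0
        if score > best_score:
            best_zone, best_score = zone, score
    return best_zone if best_score >= threshold else None
```

A percentage threshold is used here; a count-based threshold, as also described above, would compare the number of matched markers instead of their weighted fraction.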

Additional techniques for facilitating determination of zone proximity may be found, for example, in U.S. Patent Application Publication No. 2016/0062606 A1, entitled "Zone Recognitions," which is hereby incorporated by reference in its entirety.

Similar to proximity to the home, the media playback system 100 may request verification that the user intends to perform an exchange before performing a proximity-based playback session exchange to a zone. To illustrate, fig. 21C presents a third user interface display 2131c that includes an example push notification 2191b. The paired control device 130a may display the third user interface display 2131c based on detecting proximity to the study 101d.

In other examples, proximity to a given zone is determined via user input to a playback device of the given zone. For example, a particular user input to the playback device 710 (or to the paired control device 130a or bridge device 860a) may initiate a playback session exchange with the playback device 710 as the source playback device. A user input to a given playback device 110 then selects that playback device (or its associated zone) as the target playback device. If the second input is detected within a predetermined time period (e.g., 5 seconds) after the first input, the pair of inputs indicates proximity between the source playback device and the target playback device, and the source playback device and the target playback device may be configured to perform the exchange.
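The two-input pairing above can be sketched as a small coordinator that remembers the first (source-selecting) input and accepts a second (target-selecting) input only within the predefined window. The class and its names are hypothetical illustrations of the timing logic.

```python
import time

class ExchangePairer:
    """Pairs two user inputs within a time window to select the source
    and target playback devices for an exchange."""

    def __init__(self, window_s=5.0):
        self.window_s = window_s
        self.pending = None  # (source_device, timestamp) of the first input

    def source_input(self, device, now=None):
        """First input: remember the source device and when it occurred."""
        self.pending = (device, time.monotonic() if now is None else now)

    def target_input(self, device, now=None):
        """Second input: if within the window, return the (source, target)
        pair for the exchange; otherwise the first input has expired."""
        now = time.monotonic() if now is None else now
        if self.pending and now - self.pending[1] <= self.window_s:
            source, _ = self.pending
            self.pending = None
            return (source, device)
        self.pending = None
        return None
```

The `now` parameter exists only to make the timing logic testable; a device implementation would simply use its monotonic clock.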

In other examples, another trigger (e.g., a button press) initiates a playback session exchange to a target playback device in proximity to the source playback device. To illustrate, fig. 22A shows an example playback session exchange between a portable playback device 710c and a playback device 110e in proximity to the portable playback device 710c. As shown, a particular exchange input (e.g., a long press on user interface 713c) triggers the playback session exchange. In this example, the source playback device (i.e., portable playback device 710c) is identified via the particular exchange input. The target playback device (i.e., playback device 110e) is identified by the portable playback device via proximity detection.

As another example, fig. 22B illustrates an example playback session exchange between the headset 710a and the playback device 110e proximate to the portable playback device 710a. As shown, a hold-close action triggers the playback session exchange. In this example, both the source playback device (i.e., headset 710a) and the target playback device (i.e., playback device 110e) are identified by the hold-close action, which results in a near field communication exchange between the headset 710a and the playback device 110e. Since near field communication has a limited range (e.g., 4 cm), the near field communication exchange indicates proximity between the headset 710a and the playback device 110e.

As another example, fig. 22C shows an example playback session exchange between the earbuds 710b and the playback device 110e. In this example, a hold-close action of the control device 130a (paired with the earbuds 710b) triggers the playback session exchange. In this example, both the source playback device (i.e., earbuds 710b) and the target playback device (i.e., playback device 110e) are identified by the hold-close action, which results in a near field communication exchange between the paired control device 130a and the playback device 110e.

In another example, fig. 22D illustrates another example playback session exchange between the earbuds 710b and the playback device 110e. In this example, a hold-close action of the bridge device 860a (paired with the earbuds 710b) triggers the playback session exchange. In this example, both the source playback device (i.e., earbuds 710b) and the target playback device (i.e., playback device 110e) are identified by the hold-close action, which results in a near field communication exchange between the paired bridge device 860a and the playback device 110e.

In some cases, the target playback device is a member of a synchrony group, such as a bonded zone (e.g., a stereo pair (e.g., master bedroom 101b) or a surround sound configuration (e.g., study 101d)) or a zone group (e.g., a "kitchen + dining room" group). As described above, example synchronization techniques involve a group coordinator providing audio content and timing information to one or more group members to facilitate synchronized playback among the group coordinator and the group members. In such an example, the target playback device may be the group coordinator (providing audio content and timing information to the group members) or a group member (receiving audio content and timing information from the group coordinator).

In an example embodiment, when the group coordinator is designated as the target playback device, the group coordinator may automatically bring the group members along during a playback session exchange because, as a result of the synchrony group arrangement, it provides the group members with audio content and timing information corresponding to the exchanged playback session. That is, because the group members receive audio content and timing information from the group coordinator, when the group coordinator starts playing back the exchanged playback session, the group members also start playing back the exchanged playback session.
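The "members come along" behavior above can be sketched with a toy coordinator that fans out audio and timing information to its members. The class and method names are hypothetical illustrations of the arrangement, not a real synchrony-group API.

```python
class GroupMember:
    """Toy group member: plays whatever the coordinator distributes."""

    def __init__(self):
        self.now_playing = None

    def follow(self, uri, timing):
        # Play the same content using the coordinator's timing reference.
        self.now_playing = (uri, timing)

class GroupCoordinator:
    """Toy group coordinator: distributes content and timing to members."""

    def __init__(self):
        self.members = []
        self.now_playing = None

    def add_member(self, member):
        self.members.append(member)

    def play_session(self, uri, timing):
        """Start the exchanged session; members follow automatically
        because they receive content and timing from the coordinator."""
        self.now_playing = (uri, timing)
        for member in self.members:
            member.follow(uri, timing)
```

When the exchanged session lands on the coordinator via `play_session`, no per-member exchange is needed: the existing fan-out carries the session to every member.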

Generally, when a playback session exchange is initiated via the GUI of the control device 130a or the VUI of the NMD 120a, a bonded zone or group is targeted as a whole by reference to the name of the bonded zone, group, or a member zone. In local implementations, the control device 130a or the NMD 120a may send one or more messages indicating the playback session exchange to the group coordinator, which then performs the exchange. In a cloud implementation, the control device 130a or the NMD 120a may send one or more messages indicating the playback session exchange to the cloud queue server, causing the cloud queue server to perform the exchange or to relay instructions to the group coordinator to perform the playback session exchange.

In other cases, a group member is targeted for the exchange (e.g., via an input to the user interface of the group member indicating an exchange command). In local implementations, the group member may send one or more messages indicating the exchange command to the group coordinator, which then performs the playback session exchange. In a cloud implementation, the group member may send one or more messages indicating the playback session exchange to the cloud queue server, causing the cloud queue server to perform the exchange or to relay instructions to the group coordinator to perform the playback session exchange. Alternatively, the group member may send one or more messages indicating the playback session exchange to the group coordinator, which causes the group coordinator to send a playback session exchange request to the cloud server.

XI. Additional Portable Playback Device Examples

Fig. 23A is a front isometric view of earbuds 2310, comprising earbuds 2310a and 2310b, configured in accordance with aspects of the disclosed technology. As shown, the earbuds 2310 are carried in a charging box 2380.

Fig. 23B is a bottom view of the charging box 2380.

Fig. 23C is a top view of the charging box 2380.

Fig. 23D is a first side view of the charging box 2380.

Fig. 23E is a second side view of the charging box 2380.

Fig. 23F is a front isometric view of earbuds 2310a and 2310b showing an exemplary arrangement with the charging box 2380.

Fig. 23G is an isometric view of the earbud 2310a.

Fig. 23H is a first side view of the earbud 2310a.

Fig. 23I is a second side view of the earbud 2310a.

Fig. 23J is a third side view of the earbud 2310a.

Fig. 23K is a fourth side view of the earbud 2310a.

Fig. 23L is a fifth side view of the earbud 2310a.

Fig. 23M is a sixth side view of the earbud 2310a.

Fig. 24A is a front isometric view of a portable playback device 2410 implemented as a handheld speaker configured in accordance with aspects of the disclosed technology.

Fig. 24B is a side view of the portable playback device 2410.

Fig. 24C is a top view of the portable playback device 2410.

Fig. 24D is a bottom view of the portable playback device 2410.

Fig. 24E is a front isometric view of the portable playback device 2410 showing an exemplary arrangement with a device base 2418.

Fig. 24F is a front isometric view of the portable playback device 2410 showing an exemplary user input to the portable playback device 2410.

Fig. 25A is a front view of a headset 2510 configured in accordance with aspects of the disclosed technology.

Fig. 25B is a first side view of the headset 2510.

Fig. 25C is a second side view of the headset 2510.

Fig. 26A is a front view of a headset 2610 configured in accordance with aspects of the disclosed technology.

Fig. 26B is a first side view of the headset 2610.

Fig. 26C is a second side view of the headset 2610.

XII. Conclusion

The above discussion of portable playback devices, control devices, playback zone configurations, and media content sources provides but a few examples of operating environments in which the functions and methods described above may be implemented. Configurations of media playback systems, playback devices, and network devices and other operating environments not explicitly described herein may also be applicable and suitable for implementation of the functions and methods.

The above description discloses, among other things, various example systems, methods, apparatus, and articles of manufacture including, among other things, firmware and/or software executed on hardware. It should be understood that these examples are illustrative only and should not be considered as limiting. For example, it is contemplated that any or all of these firmware, hardware, and/or software aspects or components could be embodied exclusively in hardware, exclusively in software, exclusively in firmware, or in any combination of hardware, software, and/or firmware. Accordingly, the examples provided are not the only way to implement such systems, methods, apparatus, and/or articles of manufacture.

Furthermore, reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one example embodiment of the invention. The appearances of this phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Thus, those skilled in the art will explicitly and implicitly appreciate that the embodiments described herein can be combined with other embodiments.

The description is presented primarily in terms of illustrative environments, systems, processes, steps, logic blocks, processing, and other symbolic representations that directly or indirectly resemble the operations of data processing devices coupled to networks. These process descriptions and representations are typically used by those skilled in the art to convey the substance of their work to others skilled in the art. Numerous specific details are set forth to provide a thorough understanding of the present disclosure. However, it will be understood by those skilled in the art that the present disclosure may be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the embodiments. The scope of the disclosure is, therefore, indicated by the appended claims rather than by the foregoing description of the embodiments.

When any of the appended claims are read to cover a purely software and/or firmware implementation, at least one element in at least one example is hereby expressly defined to include a non-transitory tangible medium such as a memory, DVD, CD, Blu-ray, etc. storing the software and/or firmware.

Example 1: a method, comprising: detecting a playback session exchange trigger corresponding to a playback session when a first playback device plays back audio content during the playback session; determining (a) one or more source playback devices and (b) one or more target playback devices, the one or more source playback devices including the first playback device, the one or more target playback devices including a second playback device; and transitioning the playback session from the determined one or more source playback devices to the one or more target playback devices based on the playback session exchange trigger.

Example 2: the method of example 1, wherein transitioning the playback session from the determined one or more source playback devices to the one or more target playback devices comprises: forming a synchronization group comprising the first playback device and the second playback device such that the first playback device and the second playback device play back the audio content in synchronization; and muting the first playback device.

Example 3: the method of example 1 or 2, wherein transitioning the playback session from the determined one or more source playback devices to the one or more target playback devices comprises: sending an instruction to a cloud queue server to transfer the playback session from the first playback device to the second playback device, wherein the cloud queue server transfers the playback session to the second playback device based on the instruction.

Example 4: the method of any of the preceding examples, wherein transitioning the playback session from the determined one or more source playback devices to the one or more target playback devices comprises: transmitting data representing (i) a Uniform Resource Identifier (URI) indicative of a source of the audio content and (ii) an offset within the audio content to the second playback device, wherein the second playback device streams the audio content from the source of the audio content and begins playing back the audio content at the offset, and wherein the first playback device stops playing back the audio content at the offset.

Example 5: the method of any of the preceding examples, wherein the first playback device comprises: at least one processor; a data storage device; one or more amplifiers; one or more transducers; one or more batteries configured to drive the one or more amplifiers and the one or more transducers; and one or more housings carrying the at least one processor, the data storage device, the one or more amplifiers, the one or more transducers, and the one or more batteries, wherein the one or more housings are formed as at least one of (a) headphones or (b) earbuds.

Example 6: the method of any of the preceding examples, wherein the first playback device is paired with a control device via a first type of wireless connection, and wherein the first playback device is connected to the second playback device via the first type of wireless connection and a second type of wireless connection between the control device and the second playback device.

Example 7: the method of any of the preceding examples, wherein detecting the playback session exchange trigger comprises: detecting, via a user interface of the control device, an input representing a command to exchange the playback session.

Example 8: the method of any of the preceding examples, wherein detecting the input representing the command to exchange the playback session comprises: detecting a touch-and-hold input to a touch-sensitive area on the first playback device, wherein a touch input performs a first action, the first action not being a swap.

Example 9: the method of any of the preceding examples, wherein detecting the input representing the command to exchange the playback session comprises: detecting a touch and hold input to a touch sensitive area on the first playback device, wherein touch input performs a first action and touch and hold performs a group action, and wherein the first action is not a swap.

Example 10: the method of any of examples 1-5, wherein the first playback device is paired with a bridge device via a first type of wireless connection, and wherein the first playback device is connected to the second playback device via the first type of wireless connection and a second type of wireless connection between the bridge device and the second playback device.

Example 11: the method of example 10, wherein detecting the playback session exchange trigger comprises: detecting, via a user interface of the bridge device, an input representing a command to exchange the playback session.

Example 12: the method of example 10 or 11, wherein the bridging device comprises a circular housing, and wherein the method further comprises: detecting rotation of the circular housing; and adjusting a playback volume of the first playback device in proportion to the rotation.

Example 13: the method of any of examples 1-12, wherein the first playback device comprises: at least one processor; a data storage device; one or more amplifiers; one or more transducers; one or more batteries configured to drive the one or more amplifiers and the one or more transducers; and a housing carrying the at least one processor, the data storage device, the one or more amplifiers, the one or more transducers, and the one or more batteries, wherein the housing is formed as a handheld speaker.

Example 14: the method of example 13, wherein detecting the playback session exchange trigger comprises: detecting that the housing is placed in a device base.

Example 15: the method of any of the preceding examples, wherein the second playback device does not include a battery and draws current from a wall power source.

Example 16: the method of any of the preceding examples, wherein detecting the playback session exchange trigger comprises: detecting proximity of the second playback device to the first playback device.

Example 17: the method of any of the preceding examples, wherein determining the one or more target playback devices comprises: detecting proximity of the second playback device to the first playback device.

Example 18: the method of any of the preceding examples, wherein the one or more target playback devices further comprise a third playback device, and wherein determining the one or more target playback devices comprises: determining that the third playback device and the second playback device are configured as a synchrony group.

Example 19: a system configured to perform the method of any of examples 1-18.

Example 20: an apparatus configured to perform the method of any one of examples 1-18.

Example 21: a tangible, non-transitory computer-readable medium having instructions stored therein that are executable by one or more processors to perform a method according to any one of examples 1-18.

Example 22: a portable playback device comprising: at least one processor; a network interface; one or more amplifiers; one or more transducers; one or more batteries configured to drive the one or more amplifiers and the one or more transducers; and one or more housings formed as (a) an earbud or (b) a headset, the one or more housings carrying the at least one processor, the network interface, the one or more amplifiers, the one or more transducers, and the one or more batteries, and a data storage device having stored therein instructions executable by the one or more processors to perform the method of any of examples 1-18.

Example 23: a method involving a wearable playback device, the method comprising: receiving data representing a first playback session exchange input; based on receiving the data representing the first playback session exchange input, identifying one or more source playback devices within a media playback system that (a) are connected to a first wireless Local Area Network (LAN) and (b) are playing back particular audio content in a playback session, wherein the wearable playback device is connected to the first wireless LAN via an 802.11-compliant network interface; and transitioning the playback session from the identified one or more source playback devices to the wearable playback device, wherein transitioning the playback session comprises (i) forming a first synchrony group that includes the wearable playback device and the one or more source playback devices, wherein forming the first synchrony group causes the wearable playback device to begin playing the particular audio content of the playback session, and (ii) stopping playback of the particular audio content on the one or more source playback devices.

Example 24: the method of example 23, wherein identifying the one or more source playback devices comprises: identifying, in the media playback system, a set of swap-eligible playback devices; causing the set of swap-eligible playback devices to emit respective audio chirps, each chirp identifying the emitting swap-eligible playback device; detecting, via one or more microphones, the audio chirps emitted by one or more of the swap-eligible playback devices; and selecting the one or more source playback devices from the one or more swap-eligible playback devices based on the audio chirps from the one or more source playback devices indicating that the one or more source playback devices are physically closest to the wearable playback device among the one or more swap-eligible playback devices.

Example 25: the method of example 24, wherein the one or more microphones comprise one or more acoustic noise cancellation microphones carried on one or more outer surfaces of one or more wearable housings, and wherein detecting the audio chirps emitted by the one or more swap-eligible playback devices comprises: detecting, via the one or more acoustic noise cancellation microphones, the audio chirps emitted by the one or more swap-eligible playback devices.

Example 26: the method of example 24 or 25, wherein selecting the one or more source playback devices from the one or more swap-eligible playback devices comprises: comparing one or more respective metrics of the detected audio chirps emitted by the one or more swap-eligible playback devices to determine that the one or more source playback devices are physically closest to the wearable playback device among the one or more swap-eligible playback devices.
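
One way to realize the metric comparison of Example 26 is to treat the received sound level of each device's chirp as a proximity proxy and select the device whose chirp arrives loudest. A minimal sketch (Python; the RMS-over-samples metric and the dictionary input format are assumptions, not specified by the examples):

```python
import math

def rms_level(samples):
    """Root-mean-square level of a captured chirp, used as a proximity
    proxy: a louder received chirp suggests a physically closer emitter."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def select_source_device(chirps):
    """chirps maps a device identifier to the microphone samples captured
    for that device's chirp; return the identifier whose chirp was
    received at the highest level."""
    return max(chirps, key=lambda device_id: rms_level(chirps[device_id]))
```

In practice a real implementation would also account for per-device emission levels and room acoustics; this sketch only shows the comparison step itself.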

Example 27: the method of any of preceding examples 23-26, further comprising: receiving data representing a second playback session exchange input while playing back the audio content in the transitioned playback session; identifying one or more target playback devices within the media playback system connected to the first wireless LAN based on the second playback session exchange input; and transitioning the playback session from the wearable playback device to the identified one or more target playback devices, wherein transitioning the playback session includes (i) forming a second synchrony group that includes the wearable playback device and the one or more target playback devices, wherein forming the second synchrony group causes the one or more target playback devices to begin playing the particular audio content of the playback session, and (ii) removing the wearable playback device from the second synchrony group.

Example 28: the method of any of preceding examples 23-27, wherein one or more wearable housings of the wearable playback device include a touch-sensitive region, and wherein receiving the data representing the playback session exchange input includes receiving input data representing a touch-and-hold input on the touch-sensitive region.

Example 29: the method of any of preceding examples 23-28, wherein receiving the data representing the playback session exchange input comprises: receiving, from a controller application on a mobile device via the 802.11-compliant network interface, data representing an instruction to perform a playback session exchange.

Example 30: the method of any of preceding examples 23-29, wherein stopping playback of the particular audio content on the one or more source playback devices comprises: after forming the first synchrony group including the wearable playback device and the one or more source playback devices, causing the one or more source playback devices to be removed from the first synchrony group.

Example 31: the method of any of preceding examples 23-30, wherein the one or more source playback devices comprise a primary playback device configured to play back multi-channel audio, and wherein transitioning the playback session comprises: transmitting, to the primary playback device via the 802.11-compliant network interface, data representing an instruction to enter a swap mode; receiving, from the primary playback device via the 802.11-compliant network interface, data representing (i) a Service Set Identifier (SSID) of a second wireless LAN formed by the primary playback device and (ii) credentials for the second wireless LAN; disconnecting, via the 802.11-compliant network interface, from the first wireless LAN and connecting to the second wireless LAN; and, while connected to the second wireless LAN, receiving, via the 802.11-compliant network interface, data representing (i) playback timing information for the first synchrony group and (ii) the multi-channel audio.
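
The handoff described here amounts to a small handshake: the wearable instructs the primary to enter a swap mode, receives the join parameters of the LAN the primary forms, then re-associates. A toy sketch of that sequence (Python; the class and message names and the in-memory stand-in for the wireless handshake are hypothetical, not part of the examples):

```python
from dataclasses import dataclass

@dataclass
class SwapModeGrant:
    ssid: str          # SSID of the second wireless LAN formed by the primary
    credentials: str   # credentials for joining that LAN

class PrimaryPlaybackDevice:
    def enter_swap_mode(self) -> SwapModeGrant:
        # The primary reconfigures its radio as an access point and hands
        # back the join parameters (the values here are placeholders).
        return SwapModeGrant(ssid="primary-swap-lan", credentials="secret")

class WearablePlaybackDevice:
    def __init__(self):
        self.connected_ssid = "home-lan"   # the first wireless LAN

    def swap_with(self, primary: PrimaryPlaybackDevice) -> str:
        grant = primary.enter_swap_mode()   # step 1: instruct + receive grant
        self.connected_ssid = grant.ssid    # step 2: leave first LAN, join second
        return self.connected_ssid          # now ready for timing + audio data
```

After `swap_with` returns, the wearable is associated with the primary's LAN and can receive the synchrony-group timing information and the audio stream over that link.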

Example 32: the method of any of preceding examples 23-31, wherein the wearable playback device comprises: one or more network interfaces, wherein the one or more network interfaces comprise the 802.11-compliant network interface; one or more transducers; one or more amplifiers configured to drive the one or more transducers; one or more batteries; one or more processors; and one or more wearable housings carrying the one or more network interfaces, the one or more transducers, the one or more amplifiers, the one or more batteries, the one or more processors, and a data storage device having stored thereon instructions executable by the one or more processors to cause the wearable playback device to perform the method of any of preceding examples 23-31.

Example 33: the method of example 32, wherein the one or more wearable housings of the wearable playback device are formed as one of (a) headphones or (b) one or more earbuds.

Example 34: a system configured to perform the method of any of examples 23-32.

Example 35: a device configured to perform the method of any of examples 23-32.

Example 36: a tangible, non-transitory computer-readable medium having instructions stored therein that are executable by one or more processors to perform a method according to any one of examples 23-32.

Example 50: a method involving a first playback device and a second playback device, the method comprising: while in a home theater mode, playing back audio received via an audio input interface, wherein the first playback device is a master device of a first synchrony group; while in the home theater mode, receiving, from the second playback device via an 802.11-compliant network interface, data representing an instruction to transition to an exchange mode; based on receiving the data representing the instruction to enter the exchange mode with the second playback device, switching from the home theater mode to the exchange mode with the second playback device, wherein switching from the home theater mode to the exchange mode comprises: transitioning the 802.11-compliant network interface from operating as a node in a mesh network to operating as an access point that forms a first wireless Local Area Network (LAN) in a first wireless frequency band; transmitting, to the second playback device via the 802.11-compliant network interface, data representing (i) a Service Set Identifier (SSID) of the first wireless LAN and (ii) credentials for the first wireless LAN; after the second playback device connects to the first wireless LAN formed by the first playback device, forming a second synchrony group including the first playback device and the second playback device; transmitting, to the second playback device via the 802.11-compliant network interface, data representing (i) playback timing information for the second synchrony group and (ii) the audio, wherein the second playback device plays back the audio; and after forming the second synchrony group, muting playback of the audio on the first playback device while the second playback device plays back the audio.

Example 51: the method of example 50, wherein the first synchrony group comprises the first playback device and one or more satellite playback devices, wherein the audio comprises multi-channel audio, and wherein playing back the multi-channel audio comprises: transmitting, to the one or more satellite playback devices via the 802.11-compliant network interface, data representing (i) playback timing information for the first synchrony group and (ii) respective channels of the multi-channel audio, and wherein transitioning from the home theater mode to the exchange mode further comprises: causing the one or more satellite playback devices to (i) connect to a second wireless LAN in a second wireless frequency band and (ii) leave the first synchrony group.

Example 52: the method of example 51, further comprising: detecting an event indicating a trigger to transition from operating in the exchange mode to operating in the home theater mode; after detecting the event, switching from the exchange mode to the home theater mode, wherein switching from the exchange mode to the home theater mode comprises: connecting the one or more satellite playback devices to the mesh network; transitioning the 802.11-compliant network interface from operating as the access point to operating as a node in the mesh network; and re-forming the first synchrony group comprising the first playback device and the one or more satellite playback devices; while operating in the home theater mode, transmitting, to the one or more satellite playback devices via the 802.11-compliant network interface, data representing (i) the playback timing information for the first synchrony group and (ii) respective channels of the multi-channel audio; and playing back one or more channels of the multi-channel audio in synchrony with the one or more satellite playback devices playing back the respective channels of the multi-channel audio.
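
Examples 50-52 describe a two-way mode switch: entering the exchange mode turns the 802.11 radio from mesh node into access point and replaces the satellite group with a group containing the wearable; leaving the exchange mode reverses both steps. A minimal state-machine sketch (Python; the state names, event method names, and device labels are illustrative, not drawn from the examples):

```python
class SoundbarModes:
    """Tracks the radio role and synchrony-group membership across the
    home-theater / exchange mode switch described in Examples 50-52."""

    def __init__(self, satellites):
        self.mode = "home_theater"
        self.radio_role = "mesh_node"
        self.group = ["soundbar"] + list(satellites)  # first synchrony group
        self._satellites = list(satellites)

    def enter_exchange_mode(self, wearable):
        self.radio_role = "access_point"     # form the first wireless LAN
        self.group = ["soundbar", wearable]  # second synchrony group
        self.mode = "exchange"

    def on_wearable_disconnected(self):
        # Example 53's trigger: the wearable left the LAN, so fall back.
        self.radio_role = "mesh_node"
        self.group = ["soundbar"] + self._satellites  # re-form first group
        self.mode = "home_theater"
```

The sketch only models bookkeeping; the actual transitions also involve re-associating the satellites and resuming multi-channel distribution, which are omitted here.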

Example 53: the method of example 52, wherein detecting the event comprises detecting that the second playback device has disconnected from the first wireless LAN.

Example 54: the method of any of preceding examples 50-53, wherein the audio received via the audio input interface comprises a surround sound track, wherein the method further comprises downmixing the surround sound track into a stereo track, and wherein transmitting the data representing the audio comprises transmitting data representing the stereo track to the second playback device.
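
The downmix in Example 54 is commonly performed by folding the center and surround channels into the left/right pair at roughly -3 dB (a gain of about 0.707), a convention the examples do not mandate; a per-sample sketch under that assumption:

```python
def downmix_51_to_stereo(left, right, center, lfe, surround_l, surround_r,
                         gain=0.7071):
    """Fold one 5.1 sample frame into a stereo pair.

    The 0.7071 (-3 dB) gain on center and surrounds, and the choice to
    drop the LFE channel, are conventional assumptions rather than
    requirements of the examples.
    """
    out_l = left + gain * center + gain * surround_l
    out_r = right + gain * center + gain * surround_r
    return out_l, out_r
```

A real implementation would apply this per sample across the stream and typically add limiting to avoid clipping after summation.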

Example 55: the method of any of preceding examples 50-54, further comprising: while in the exchange mode, receiving, from a third playback device via the 802.11-compliant network interface, data representing an instruction to transition to the exchange mode; based on receiving the data from the third playback device, joining the third playback device to the second synchrony group, wherein joining the third playback device to the second synchrony group comprises: transmitting, to the third playback device via the 802.11-compliant network interface, data representing (i) the SSID of the first wireless LAN and (ii) credentials for the first wireless LAN; after the third playback device connects to the first wireless LAN formed by the first playback device, adding the third playback device to the second synchrony group including the first playback device and the second playback device; and transmitting, to the third playback device via the 802.11-compliant network interface, data representing (i) the playback timing information for the second synchrony group and (ii) the audio, wherein the third playback device plays back the audio in synchrony with the second playback device.

Example 56: the method of any of preceding examples 50-55, further comprising: while in a music mode, playing back audio content received via the one or more network interfaces; while playing back the audio content in the music mode, receiving, from the second playback device via the 802.11-compliant network interface, data representing an instruction to form a third synchrony group with the second playback device; forming the third synchrony group with the second playback device, wherein forming the third synchrony group configures the first playback device to play the audio content in synchrony with the second playback device; and after forming the third synchrony group with the second playback device, leaving the third synchrony group, wherein the second playback device is a master device of the third synchrony group.
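
The music-mode handoff in Example 56 works by briefly grouping with the target and then leaving, so playback continues on the target alone. Sketched with a toy group model (Python; the class, method names, and device labels are hypothetical stand-ins for the actual grouping protocol):

```python
class SynchronyGroup:
    """Toy model of the form-then-leave handoff of Example 56."""

    def __init__(self, master):
        self.members = [master]

    @property
    def master(self):
        return self.members[0]

    def join(self, device):
        self.members.append(device)   # the new member plays in sync

    def leave(self, device):
        self.members.remove(device)   # remaining members keep playing

# Handoff: the group is formed around the second playback device, the
# first device joins and plays in sync, then the first device leaves,
# so the session continues on the second device alone.
group = SynchronyGroup(master="second_device")
group.join("first_device")
group.leave("first_device")
```

The key point the sketch captures is ordering: the target is already a synchronized member (here, the master) before the source leaves, so playback never stops during the handoff.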

Example 57: the method of example 56, further comprising: receiving data representing a playback session exchange trigger; based on receiving the data representing the playback session exchange trigger, identifying one or more source playback devices within a media playback system that are playing back particular audio content in a playback session, wherein identifying the one or more source playback devices comprises: identifying a set of swap-eligible playback devices in the media playback system, the set including the first playback device; causing the set of swap-eligible playback devices to emit respective audio chirps, each chirp identifying the emitting swap-eligible playback device; detecting, via one or more microphones, the audio chirps emitted by one or more swap-eligible playback devices from the set of swap-eligible playback devices, the one or more swap-eligible playback devices including the first playback device; and selecting the first playback device from the one or more swap-eligible playback devices as the one or more source playback devices based on the audio chirp from the first playback device indicating that the first playback device is physically closest to the second playback device among the one or more swap-eligible playback devices.

Example 58: the method of example 56, further comprising: while in the music mode and prior to receiving the data representing the instruction to form the third synchrony group with the second playback device, receiving, from the second playback device via the 802.11-compliant network interface, data representing an instruction to emit a particular audio chirp; and based on receiving the data representing the instruction, emitting the particular audio chirp via one or more transducers.

Example 59: the method of any of preceding examples 50-58, wherein the second playback device includes one or more housings, and wherein the one or more housings are formed as one of (a) headphones or (b) a set of earbuds.

Example 60: the method of any of preceding examples 50-59, wherein the first playback device comprises: an audio input interface; one or more network interfaces, wherein the one or more network interfaces comprise the 802.11-compliant network interface; one or more transducers; one or more amplifiers configured to drive the one or more transducers; one or more processors; and a housing carrying the audio input interface, the one or more network interfaces, the one or more transducers, the one or more amplifiers, the one or more processors, and a data storage device having stored thereon instructions executable by the one or more processors to cause the first playback device, a soundbar-type playback device, to perform the method of any of preceding examples 50-59.

Example 61: the method of any of preceding examples 50-60, wherein the second playback device comprises: one or more network interfaces, wherein the one or more network interfaces comprise the 802.11-compliant network interface; one or more transducers; one or more amplifiers configured to drive the one or more transducers; one or more batteries; one or more processors; and a housing carrying the one or more network interfaces, the one or more transducers, the one or more amplifiers, the one or more batteries, the one or more processors, and a data storage device having stored thereon instructions executable by the one or more processors to cause the second playback device to perform the method of any of preceding examples 50-60.

Example 62: a system configured to perform the method of any of examples 50-61.

Example 63: an apparatus configured to perform the method of any of examples 50-61.

Example 64: a tangible, non-transitory computer-readable medium having instructions stored therein that are executable by one or more processors to perform a method according to any of examples 50-61.
