Digitally representing user engagement with targeted content based on biometric sensor data

Document No.: 1048006 | Publication date: 2020-10-09

Abstract: This technology, "Digitally representing user engagement with targeted content based on biometric sensor data," was created by A. A. Chappell and L. S. Ostrov on 2018-09-28. A computer-implemented method for obtaining a digital representation of user engagement with audio-video content comprises: playing, by an output device, digital data including audio-video content, the output device outputting an audio-video output based on the digital data; and receiving sensor data from at least one sensor positioned to sense involuntary responses of one or more users while engaging with the audio-video output. The method also includes determining a content engagement capability (CEP) value based on the sensor data and recording a digital representation of the CEP in a computer memory. An apparatus is configured to perform the method using hardware, firmware, and/or software.

1. A computer-implemented method for digitally representing user engagement with audio-video content in a computer memory, the method comprising:

playing, by an output device, digital data comprising audio-video content, the output device outputting an audio-video output based on the digital data;

receiving, by at least one computer processor, sensor data from at least one sensor positioned to sense involuntary responses of one or more users while engaging with the audio-video output;

determining, by an algorithm executed by the at least one computer processor, at least one digital representation of a content engagement Capability (CEP) based on the sensor data; and

recording the at least one digital representation of the CEP in a computer memory.

2. The method of claim 1, wherein determining the at least one digital representation of a CEP further comprises determining an arousal value based on the sensor data and comparing a stimulated average arousal based on the sensor data to an expected average arousal.

3. The method of claim 2, wherein the sensor data comprises one or more of: electroencephalography (EEG) data, galvanic skin response (GSR) data, facial electromyography (fEMG) data, electrocardiography (EKG) data, video facial action unit (FAU) data, brain-machine interface (BMI) data, video pulse detection (VPD) data, pupil dilation data, body chemistry sensing data, functional magnetic imaging (fMRI) data, and functional near-infrared (fNIR) data.

4. The method of claim 2, further comprising determining the expected average arousal based on further sensor data measuring similar involuntary responses of the one or more users while engaging with known audio-visual stimuli.

5. The method of claim 4, further comprising playing the known audio-visual stimuli, the known audio-visual stimuli comprising known non-evocative stimuli and known evocative stimuli.

6. The method of claim 4, wherein determining the at least one digital representation of a CEP further comprises detecting one or more stimulation events based on the sensor data exceeding a threshold for a period of time.

7. The method of claim 6, further comprising computing one of a plurality of event capabilities for each of the one or more users and for each of the stimulation events, and aggregating the event capabilities.

8. The method of claim 7, further comprising assigning a weight to each of the event capabilities based on one or more source identifications of the sensor data.

9. The method of claim 7, wherein determining the expected average arousal further comprises detecting one or more stimulation events based on the further sensor data exceeding a threshold for a period of time, and computing one of a plurality of expected capabilities of the known audio-visual stimuli for the one or more users and for each of the stimulation events.

10. The method of claim 9, wherein determining the at least one digital representation of CEP comprises computing a ratio of a sum of the event capabilities to an aggregation of the expected capabilities.

11. The method of claim 2, further comprising determining an arousal error measure based on comparing the arousal value to a target emotional episode of the audio-video content.

12. The method of claim 11, wherein the target emotional episode includes a set of target arousal values, each target arousal value uniquely associated with a different interval of a continuous time series.

13. The method of claim 2, further comprising determining a digital representation of valence based on the sensor data.

14. The method of claim 13, wherein the sensor data comprises one or more of: electroencephalography (EEG) data, facial electromyography (fEMG) data, video facial action unit (FAU) data, brain-machine interface (BMI) data, functional magnetic imaging (fMRI) data, body chemistry sensing data, misreading data, and functional near-infrared (fNIR) data.

15. The method of claim 13, further comprising normalizing the digital representation of valence based on similar values collected for known audio-visual stimuli.

16. The method of claim 13, further comprising determining a valence error measure based on comparing the digital representation of valence to a target emotional episode of the audio-video content.

17. The method of claim 16, wherein the target emotional episode includes a set of target digital representations of valence, each target digital representation of valence being uniquely associated with a different interval of a continuous time series.

18. The method of claim 1, wherein the at least one digital representation of a CEP comprises a sequence of digital representations of CEPs, wherein each element of the sequence is computed based on a discrete period of the audio-video content.

19. The method of claim 1, further comprising outputting a symbolic representation of the at least one digital representation of a CEP to at least one of a display screen or an audio transducer.

20. The method of claim 19, further comprising recording the digital data of a live performance of at least one actor and outputting the digital representation to the display screen or the audio transducer arranged to be perceptible by the at least one actor.

21. An apparatus for digitally representing user engagement with audio-video content in a computer memory, comprising a processor coupled to a memory, the memory holding program instructions that, when executed by the processor, cause the apparatus to perform:

playing, by an output device, digital data comprising audio-video content, the output device outputting an audio-video output based on the digital data;

receiving sensor data from at least one sensor positioned to sense involuntary responses of one or more users while engaging with the audio-video output;

determining at least one digital representation of a content engagement capability (CEP) based on the sensor data; and

recording the at least one digital representation of the CEP in a computer memory.

22. The apparatus of claim 21, wherein the memory further retains instructions for determining the at least one digital representation of the CEP at least in part by determining an arousal value based on the sensor data and comparing a stimulated average arousal based on the sensor data to an expected average arousal.

23. The apparatus of claim 22, wherein the memory further retains instructions for: receiving the sensor data comprising one or more of electroencephalography (EEG) data, galvanic skin response (GSR) data, facial electromyography (fEMG) data, electrocardiography (EKG) data, video facial action unit (FAU) data, brain-machine interface (BMI) data, video pulse detection (VPD) data, pupil dilation data, functional magnetic imaging (fMRI) data, body chemistry sensing data, and functional near-infrared (fNIR) data.

24. The apparatus of claim 22, wherein the memory further retains instructions for: determining the expected average arousal based on further sensor data measuring similar involuntary responses of the one or more users while engaging with known audio-video stimuli.

25. The apparatus of claim 24, wherein the memory further retains instructions for: playing the known audio-video stimuli, including known non-evocative stimuli and known evocative stimuli.

26. The apparatus of claim 24, wherein the memory further retains instructions for: determining the at least one digital representation of the CEP at least in part by detecting one or more stimulation events based on the sensor data exceeding a threshold for a period of time.

27. The apparatus of claim 26, wherein the memory further retains instructions for: computing one of a plurality of event capabilities for each of the one or more users and for each of the stimulation events, and aggregating the event capabilities.

28. The apparatus of claim 27, wherein the memory further retains instructions for: assigning a weight to each of the event capabilities based on one or more source identifications of the sensor data.

29. The apparatus of claim 27, wherein the memory further retains instructions for: detecting one or more stimulation events based on the further sensor data exceeding a threshold for a period of time, and computing one of a plurality of expected capabilities of the known audio-video stimuli for the one or more users and for each of the stimulation events.

30. The apparatus of claim 29, wherein the memory further retains instructions for: determining the at least one digital representation of the CEP at least in part by computing a ratio of a sum of the event capabilities to an aggregation of the expected capabilities.

31. The apparatus of claim 22, wherein the memory further retains instructions for: determining an arousal error measure based on comparing the arousal value to a target emotional episode of the audio-video content.

32. The apparatus of claim 31, wherein the memory further retains instructions for the comparing, wherein the target emotional episode includes a set of target arousal values, each target arousal value uniquely associated with a different interval of a continuous time series.

33. The apparatus of claim 22, wherein the memory further retains instructions for: determining a digital representation of valence based on the sensor data.

34. The apparatus of claim 33, wherein the memory further retains instructions for determining the digital representation of valence, wherein the sensor data comprises one or more of: electroencephalography (EEG) data, facial electromyography (fEMG) data, video facial action unit (FAU) data, brain-machine interface (BMI) data, functional magnetic imaging (fMRI) data, body chemistry sensing data, and functional near-infrared (fNIR) data.

35. The apparatus of claim 33, wherein the memory further retains instructions for: normalizing the digital representation of valence based on similar values collected for known audio-video stimuli.

36. The apparatus of claim 33, wherein the memory further retains instructions for: determining a valence error measure based on comparing the digital representation of valence to a target emotional episode of the audio-video content.

37. The apparatus of claim 36, wherein the memory further retains instructions for determining the valence error measure based on the target emotional episode, the target emotional episode including a set of target digital representations of valence, each target digital representation of valence being uniquely associated with a different interval of a continuous time series.

38. The apparatus of claim 21, wherein the memory further retains instructions for determining the at least one digital representation of a CEP, the at least one digital representation of a CEP comprising a sequence of digital representations of CEPs, wherein each element of the sequence is computed based on a discrete period of the audio-video content.

39. The apparatus of claim 21, wherein the memory further retains instructions for: outputting a symbolic representation of the at least one digital representation of the CEP to at least one of a display screen or an audio transducer.

40. The apparatus of claim 39, wherein the memory further retains instructions for: recording the digital data of a live performance of at least one actor and outputting the digital representation to the display screen or the audio transducer arranged to be perceptible by the at least one actor.

41. An apparatus for digitally representing user engagement with audio-video content in computer memory, comprising:

means for playing digital data including audio-video content by an output device that outputs an audio-video output based on the digital data;

means for receiving sensor data from at least one sensor positioned to sense involuntary responses of one or more users while engaging with the audio-video output;

means for determining at least one digital representation of a content engagement capability (CEP) based on the sensor data; and

means for recording the at least one digital representation of the CEP in a computer memory.

Technical Field

The present disclosure relates to methods and apparatus for assessing user engagement with digitally targeted content based on sensor data indicative of an emotional state of the user.

Background

While new entertainment media and more spectacular effects bring unprecedented entertainment experiences to viewers, the basis for targeted content remains the story and the actors. A successful movie combines an appealing story with convincing actors and arrangements that are visually and audibly appealing to as broad an audience as possible, usually within the movie's genre. Decisions are based on the director's artistic and commercial sensibilities, which typically take shape years or months prior to first release. A large portion of the production budget is spent on a fixed product that most viewers see only once; everyone receives the same end product. Directors cannot offer a product that resonates with everyone, so they create a product aimed at a common denominator or a market segment.

One technique used for narrative content is branching. Branching in computer-generated audio-video entertainment dates back to the 1980s or earlier. Today's complex video games blur the boundary between narrative and interactive entertainment, mixing branching and interactive techniques. Immersive entertainment technologies such as virtual and augmented reality offer more opportunities for engaging audiences. Data mining through machine learning enables discovery of new correspondences between low-level data and various objectives, including consumer preferences and trends. The proliferation of mobile phones and Internet of Things (IoT) devices has driven explosive growth in network-connected sensors. It is now possible to collect more real-time data about content consumers than ever before.

In addition to branching, content producers use a number of techniques to gauge the appeal of proposed content before putting it into production. Once a feature is produced, surveys, focus groups, and test marketing can be used to fine-tune the content narrative and plan marketing strategies. Collecting accurate and detailed audience response data by conventional methods is difficult and time consuming.

Accordingly, it is desirable to develop new methods and other new techniques for assessing user engagement with targeted content, to overcome these and other limitations of the prior art, and to help producers provide a more engaging entertainment experience for future audiences.

Disclosure of Invention

This summary and the following detailed description should be interpreted as complementary parts of an integrated disclosure, which parts may include redundant subject matter and/or supplemental subject matter. An omission in either section does not indicate the priority or relative importance of any element described in the integrated application. Differences between the sections may include supplemental disclosure of alternative embodiments, additional details, or alternative descriptions of the same embodiments using different terminology, as should be apparent from the respective disclosures.

In one aspect of the disclosure, a computer-implemented method for digitally representing user engagement with audio-video content in a computer memory includes playing, by an output device, digital data including audio-video content, the output device outputting an audio-video output based on the digital data. "User" refers to a person who is an audience member, experiencing the targeted content as a consumer for entertainment purposes. The method also includes receiving, by at least one computer processor, sensor data from at least one sensor positioned to sense an involuntary response of one or more users while engaging with the audio-video output. For example, the sensor data may include one or more of the following received from the respective sensors: electroencephalography (EEG) data, galvanic skin response (GSR) data, facial electromyography (fEMG) data, electrocardiography (EKG) data, video facial action unit (FAU) data, brain-machine interface (BMI) data, video pulse detection (VPD) data, pupil dilation data, functional magnetic imaging (fMRI) data, body chemistry sensing data, and functional near-infrared (fNIR) data. The method may also include determining, by an algorithm executed by the at least one computer processor, at least one digital representation of a content engagement capability (CEP) based on the sensor data, and recording the at least one digital representation of the CEP in a computer memory. Suitable algorithms are described in the detailed description below.

CEP is an objective, algorithmically determined, digital electronic measure of a user's biometric state that relates to the user's engagement with a stimulus (i.e., the targeted content). Two orthogonal measures are used for CEP, namely arousal and valence. As used herein, "arousal" has its psychological meaning of a state or condition of physiological alertness, wakefulness, and attention. High arousal indicates interest and attention; low arousal indicates boredom and disinterest. "Valence" is likewise used herein in its psychological sense of attraction or aversion: a positive valence indicates attraction, and a negative valence indicates aversion.
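By way of illustration only (not part of the claimed method), an emotional state sampled at one instant can be represented as an (arousal, valence) pair. The Python sketch below is a minimal example; the field names and the zero thresholds are assumptions for clarity.

```python
from dataclasses import dataclass

@dataclass
class EmotionalSample:
    """One time-stamped biometric estimate of emotional state."""
    t: float        # seconds from the start of the audio-video content
    arousal: float  # normalized arousal; higher = more alert/attentive
    valence: float  # normalized valence; positive = attraction, negative = aversion

    def quadrant(self) -> str:
        """Coarse label in the two-dimensional arousal/valence space."""
        a = "high arousal" if self.arousal >= 0 else "low arousal"
        v = "positive valence" if self.valence >= 0 else "negative valence"
        return f"{a}, {v}"

# Example: an engaged, pleased viewer at t = 12.5 s
print(EmotionalSample(t=12.5, arousal=0.8, valence=0.4).quadrant())
```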

The at least one digital representation of the CEP may comprise a sequence of digital representations of the CEP, wherein each element of the sequence is computed based on a discrete period of the audio-video content. The method may include outputting a symbolic representation of the at least one digital representation of the CEP to at least one of a display screen or an audio transducer. In some embodiments, the method may include recording the digital data during a live performance of at least one actor and arranging the display screen or audio transducer to be perceptible by the at least one actor during the live performance.

In one aspect, determining the at least one digital representation of the CEP further comprises determining an arousal value based on the sensor data and comparing a stimulated average arousal based on the sensor data to an expected average arousal. The method may include determining the expected average arousal based on further sensor data measuring similar involuntary responses of the one or more users while engaging with known audio-video stimuli. The expected average arousal is a numerical representation of each user's arousal response to known stimuli, which is used to normalize for differences between individual users. The method may further comprise playing the known audio-visual stimuli, comprising known non-evocative stimuli and known evocative stimuli, optionally while noting materially different conditions of the individual user, such as the user's initial arousal state, mood, fatigue, rest, health, medication, or intoxication.
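The following minimal sketch illustrates one way the comparison described above could be computed, assuming arousal values have already been derived from the sensor data; the function names and the simple ratio used for normalization are illustrative assumptions, not a definitive implementation.

```python
import statistics

def expected_average_arousal(calibration_arousal: list[float]) -> float:
    """Average arousal measured while a user views known (calibration) stimuli."""
    return statistics.fmean(calibration_arousal)

def normalized_arousal(stimulated_arousal: list[float],
                       calibration_arousal: list[float]) -> float:
    """Compare the stimulated average arousal to the expected average arousal.

    Returns a unitless ratio so that responses of different users, whose raw
    sensor ranges may differ widely, can be compared on a common scale.
    """
    expected = expected_average_arousal(calibration_arousal)
    stimulated = statistics.fmean(stimulated_arousal)
    return stimulated / expected if expected else 0.0

# Example: a user whose arousal under the content runs about 25% above calibration
print(normalized_arousal([0.5, 0.6, 0.4], [0.4, 0.4, 0.4]))
```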

In another aspect, determining the at least one digital representation of the CEP may include detecting one or more stimulation events based on the sensor data exceeding a threshold for a period of time. Additionally, the method may include computing one of a plurality of event capabilities for each of the one or more users and for each of the stimulation events, and aggregating the event capabilities. The method may include assigning a weight to each of the event capabilities based on one or more source identifications of the sensor data. Determining the expected average arousal may also include detecting one or more stimulation events based on the further sensor data exceeding a threshold for a period of time, and computing one of a plurality of expected capabilities of the known audio-visual stimuli for the one or more users and for each of the stimulation events. Determining the at least one digital representation of the CEP may include calculating a ratio of a sum of the event capabilities to an aggregation of the expected capabilities.
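A minimal sketch of the ratio described in this paragraph, assuming per-event capability values have already been computed for both the measured content and the known calibration stimuli; the optional source-weighting scheme and the function names are illustrative assumptions.

```python
from typing import Optional

def content_engagement_capability(event_capabilities: list[float],
                                  expected_capabilities: list[float],
                                  source_weights: Optional[list[float]] = None) -> float:
    """Ratio of the (optionally weighted) sum of event capabilities measured
    during the content to the aggregate of expected capabilities measured
    for known calibration stimuli."""
    if source_weights is None:
        source_weights = [1.0] * len(event_capabilities)
    weighted_sum = sum(w * c for w, c in zip(source_weights, event_capabilities))
    expected_sum = sum(expected_capabilities)
    return weighted_sum / expected_sum if expected_sum else 0.0

# Example: three detected events during the content, three during calibration
print(content_engagement_capability([2.0, 3.5, 1.5], [2.0, 2.0, 2.0],
                                    source_weights=[1.0, 0.8, 1.2]))
```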

In another aspect, the method may include determining an arousal error measure based on comparing the arousal value to a target emotional episode of the audio-video content. The target emotional episode may include a set of target arousal values, each uniquely associated with a different interval of a continuous time series, and the method may include determining a digital representation of valence based on the sensor data.

As part of determining the digital representation of the CEP, or for other uses, the method may include determining a digital representation of the user's valence based on the sensor data. Sensor data suitable for valence determination may include, for example, any one or more of the following: electroencephalography (EEG) data, facial electromyography (fEMG) data, video facial action unit (FAU) data, brain-machine interface (BMI) data, functional magnetic imaging (fMRI) data, body chemistry sensing data, and functional near-infrared (fNIR) data. The method may include normalizing the digital representation of valence based on similar values collected for known audio-visual stimuli. In one aspect, the method may include determining a valence error measure based on comparing the digital representation of valence to a target emotional episode of the audio-video content. The target emotional episode may be or may include a set of target digital representations of valence and/or arousal, each uniquely associated with a different interval of a continuous time series or frame sequence of the digital audio-video content.
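The sketch below illustrates one plausible error measure (root-mean-square difference) between measured values and a target emotional episode sampled one value per interval; the disclosure does not prescribe a particular error metric, so this choice is an assumption for illustration.

```python
import math

def emotional_error(measured: list[float], target: list[float]) -> float:
    """Root-mean-square difference between measured arousal (or valence)
    values and the target emotional episode, one value per time interval.

    RMS is one plausible error measure; the disclosure only requires some
    comparison between the measured values and the target episode.
    """
    if len(measured) != len(target):
        raise ValueError("one measured value is expected per target interval")
    return math.sqrt(sum((m - t) ** 2 for m, t in zip(measured, target)) / len(target))

# Example: measured arousal tracks the target arc closely except near the climax
target_arc = [0.2, 0.4, 0.7, 0.9, 0.5]
measured_arc = [0.25, 0.35, 0.6, 0.6, 0.5]
print(emotional_error(measured_arc, target_arc))
```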

The foregoing methods may be implemented in any suitable programmable computing device by providing program instructions in a non-transitory computer-readable medium that, when executed by a computer processor, cause the device to perform the described operations. The processor may be local to the device and user, remotely located, or may comprise a combination of local and remote processors. An apparatus may include a computer or a set of connected computers used to produce targeted content for a content output device. The content output device may include, for example, a personal computer, a mobile phone, a notebook computer, a television or computer monitor, a projector, a virtual reality device, or an augmented reality device. Other elements of the apparatus may include, for example, an audio output device and a user input device, which participate in the performance of the method. An apparatus may include a virtual or augmented reality device, such as a headset or other display that reacts to movements of the user's head and other body parts. The apparatus may include biometric sensors that provide data used by a controller to determine the digital representation of the CEP.

To the accomplishment of the foregoing and related ends, the one or more examples comprise the features hereinafter fully described and particularly pointed out in the claims. The following description and the annexed drawings set forth in detail certain illustrative aspects and are indicative of but a few of the various ways in which the principles of the examples may be employed. Other advantages and novel features will become apparent from the following detailed description when considered in conjunction with the drawings and the disclosed examples, which include all such aspects and their equivalents.

Drawings

The features, nature, and advantages of the present disclosure will become more apparent from the detailed description set forth below when taken in conjunction with the drawings in which like reference characters identify correspondingly throughout.

FIG. 1 is a schematic block diagram illustrating aspects of a system and apparatus for digitally representing user engagement with audio-video content in computer memory based on biometric sensor data, the system and apparatus coupled to one or more distribution systems.

FIG. 2 is a schematic block diagram illustrating aspects of a server for digitally representing user engagement with audio-video content in computer memory based on biometric sensor data.

FIG. 3 is a schematic block diagram illustrating aspects of a client device for digitally representing user engagement with audio-video content in computer memory based on biometric sensor data.

FIG. 4 is a schematic diagram showing features of a virtual reality client for digitally representing user engagement with audio-video content in computer memory based on biometric sensor data.

FIG. 5 is a flow diagram illustrating high-level operations of a method of determining a digital representation of a CEP based on biometric sensor data collected during performance of targeted content.

FIG. 6 is a block diagram illustrating high-level aspects of a system for digitally representing user engagement with audio-video content in computer memory based on biometric sensor data.

FIG. 7A is a diagram indicating an arrangement of emotional states with respect to axes of a two-dimensional emotional space.

FIG. 7B is a diagram indicating an arrangement of emotional states with respect to axes of a three-dimensional emotional space.

FIG. 8 is a data table showing an example of output from an application for measuring facial action units (FAU).

FIG. 9 is a graph illustrating peak Galvanic Skin Response (GSR) for correlating emotional arousal.

FIG. 10 is a graph showing combined tonic and phasic GSR responses over time for correlating emotional arousal.

FIG. 11 is a table illustrating biometric input options for measuring content engagement capabilities.

FIG. 12 is a flow chart illustrating a process for determining content valence based on biometric response data.

FIG. 13 is a flow chart illustrating a process for determining valence, arousal, and content engagement capability measures.

FIG. 14 is a block diagram illustrating a data processing use case for content responsive to biometric valence and arousal data.

FIG. 15 is a flow chart illustrating a use case for producing and distributing a digital representation of user engagement with content.

FIG. 16 is a flow diagram illustrating aspects of a method for digitally representing user engagement with audio-video content in computer memory based on biometric sensor data.

FIGS. 17-18 are flow diagrams illustrating further optional aspects or operations of the method illustrated in FIG. 16.

FIG. 19 is a conceptual block diagram illustrating components of an apparatus or system for digitally representing user engagement with audio-video content in computer memory based on biometric sensor data.

Detailed Description

Various aspects are now described with reference to the drawings. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of one or more aspects. It may be evident, however, that the various aspects may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to facilitate describing these aspects.

Referring to FIG. 1, methods for digitally representing user engagement with audio-video content in computer memory based on biometric sensor data may be implemented in a client-server environment 100. Other architectures may also be suitable. In a network architecture, sensor data may be collected and processed locally and transmitted to a server that computes the digital representation of user engagement with the audio-video content based on the biometric sensor data. As used herein, "targeted content" refers to digital audio-video content that is directed, at least in part, according to a scheme designed to entertain or inform while moving the viewer's emotions according to a plan for modulating narrative tension (sometimes referred to herein as an "emotional episode"). Targeted content is not limited to content presented in a movie theater or to features of theatrical length, but may include any audio-video content, including content for mobile phones and other small screens, VR and AR content, and interactive games that contain narrative elements (even if non-narrative elements are also included). Narrative tension planned by an emotional episode may cause a user to experience sensations of opposite polarity, such as fear and confidence or aversion and attraction, at different times or in response to certain dramatic events, characters, objects, images, sounds, music, or other stimuli generated from the digital audio-video content. The intensity of these sensations is generally related to the interest and pleasure that the user gains from the experience, and may likewise be planned by an emotional episode that builds tension, reaches a climax, and then provides relief. Users of cinematic content naturally react during the experience of an emotional episode by involuntarily entering corresponding neurological or neurophysiological states (e.g., a positive or negative quality sometimes referred to as valence, and an intensity, magnitude, or strength sometimes referred to as arousal).

Targeted content may be configured to support interactive features resembling video game features, or may include no interactive features. The targeted content may branch in response to data indicative of the user's mood, or may be non-branching. In addition to content generated using a conventional script, the targeted content may include programs produced using tools other than a conventional script, such as game programs, art programs, documentaries, and reality programs.

Users of targeted content react by naturally expressing their emotions while experiencing the visual, auditory, olfactory, or tactile sensations generated by an output device receiving the targeted content signal. If the targeted content is configured to support it, the user (more precisely, a "player actor") may also actively interact with characters or other objects appearing in the targeted content. As used herein, a "player actor" is a user of a client device or interface equipped with or coupled to biometric sensors, who interacts with the targeted content by involuntarily entering a neurological or neurophysiological state (e.g., an emotional state) detected by the biometric sensors (whether or not direct input is also provided using a controller), causing the targeted content to change in response to the sensed biometric response without intentional action by the player actor. A data processing server, such as the "math" server 110, may receive sensor data from biometric sensors positioned to detect neurological, neurophysiological, or physiological responses of audience members during consumption of targeted content. The server 110 may process the sensor data to obtain a digital representation indicating, as a function of time or video frames, the neurological, neurophysiological, or physiological response (referred to for convenience as an "emotional response") of the audience to the targeted content, indicated along one or more measurement axes (e.g., arousal and valence). In an alternative embodiment, based on real-time emotional feedback, a content-adaptive artificial intelligence (AI) may adaptively adjust the targeted content to increase or maintain the player actor's engagement with the narrative.

A suitable client-server environment 100 may include various computer servers and client entities that communicate via one or more networks, such as a Wide Area Network (WAN) 102 (e.g., the Internet) and/or a Wireless Communications Network (WCN) 104 (e.g., a cellular telephone network). The web/application server 124 contains documents and application code compatible with web protocols, including for example HTML, XML, PHP, and JavaScript documents or executable scripts, and may serve an application for outputting targeted content and for collecting biometric sensor data from users experiencing the targeted content. In an alternative, the data collection application may be served from the math server 110, the cloud server 122, the blockchain entity 128, or the content data server 126.

The environment 100 may include one or more data servers 126 for storing: data, e.g., video, audio-video, audio, and graphical content components of targeted content for consumption using a client device; software for execution on or in conjunction with a client device, such as sensor control and emotion detection applications; and data collected from users or client devices. The data collected from the client device or user may include, for example, sensor data and application data. Sensor data may be collected by a background (non-user-oriented) application running on the client device and transmitted to a data receiving device (sink), such as cloud-based data server 122 or discrete data server 126. Application data refers to application state data, including but not limited to records of user interactions with the application or other application input, output, or internal states. The applications may include software for outputting targeted content, collecting biometric sensor data, and supporting functionality. Applications and data may be served from other types of servers, such as any server accessing the distributed blockchain data structure 128, or a peer-to-peer (P2P) server 116 such as may be provided by a set of client devices 118, 120 operating simultaneously as mini-servers or clients.

As used herein, a "user" is always a consumer of targeted content from which the system node collects emotional response data for determining a digital representation of participation in the targeted content. When actively participating in content via an avatar or other mechanism, the user may also be referred to herein as a "player actor". The viewer is not always the user. For example, the spectator may be a passive spectator from whom the system does not collect any emotional response data. As used herein, a "node" includes a client or server that participates in a computer network.

The network environment 100 may include various client devices, such as a mobile smartphone client 106 and a notebook client 108 connected to servers via the WCN 104 and WAN 102, or a mixed reality (e.g., virtual reality or augmented reality) client device 114 connected to servers via the router 112 and WAN 102. In general, a client device may be or may include a computer that a user uses to access targeted content provided via a server or from local storage. In one aspect, the data processing server 110 may determine a digital representation of biometric data for use in real-time or offline applications. Controlling branching or the activity of objects in narrative content is an example of a real-time application. Offline applications may include, for example, "green light" production recommendations, automated screening of production proposals prior to greenlighting, automated or semi-automated packaging of promotional content (such as trailers or video advertisements), and automated or semi-automated customized editing of content for a user or group of users.

FIG. 2 shows a data processing server 200 for digitally representing user engagement with audio-video content in a computer memory based on biometric sensor data. The data processing server 200 may operate in the environment 100, in a similar network, or as a stand-alone server. The server 200 may include one or more hardware processors 202, 214 (two of which are shown); as used herein, hardware includes firmware. Each of the one or more processors 202, 214 may be coupled via an input/output port 216 (e.g., a Universal Serial Bus port or other serial or parallel port) to a source 220 of sensor data and viewing history used to indicate an emotional state of the user. The viewing history may include a log-level record of changes from a baseline script of a content package, or an equivalent record of control decisions made in response to player actor emotional states and other inputs. The viewing history may also include content viewed on television, Netflix, and other sources. Any source containing a derived emotional episode can be used as input to an algorithm for digitally representing user engagement with audio-video content in a computer memory based on biometric sensor data. The server 200 may track player actor actions and emotional responses across multiple content titles for an individual or a cohort of peers. Some types of servers (e.g., cloud servers, server farms, or P2P servers) may include multiple instances of discrete servers 200 that cooperate to perform the functions of a single server.

The server 200 may include a network interface 218 for sending and receiving applications and data, including but not limited to sensor and application data for digitally representing user engagement with audio-video content in computer memory based on biometric sensor data. The content may be served from the server 200 to the client device or may be stored locally by the client device. If stored locally to the client device, the client and server 200 may cooperate to handle the collection of sensor data and transmission to the server 200 for processing.

Each processor 202, 214 of the server 200 may be operatively coupled to at least one memory 204, the memory 204 holding functional modules 206, 208, 210, 212 of one or more applications for performing the methods described herein. These modules may include, for example, a correlation module 206 that correlates biometric feedback with one or more metrics such as arousal or valence. The correlation module 206 may include instructions that, when executed by the processors 202 and/or 214, cause the server to correlate the biometric sensor data with one or more physiological or emotional states of the user using machine learning (ML) or other processes. An event detection module 208 may include functions for detecting events based on a measure of emotion exceeding a data threshold. The modules may also include, for example, a normalization module 210. The normalization module 210 may include instructions that, when executed by the processors 202 and/or 214, cause the server to normalize valence, arousal, or other metric values using baseline inputs. The modules may also include a calculation function 212 that, when executed by a processor, causes the server to calculate content engagement capability (CEP) values based on the sensor data and other outputs from the upstream modules. Details of determining CEP are disclosed later herein. The memory 204 may contain additional instructions, such as an operating system and support modules.

Referring to FIG. 3, a content consumption device 300 generates biometric sensor data indicative of a physiological or emotional response of a user to the output generated from a targeted content signal. The apparatus 300 may comprise, for example, a processor 302, such as a central processing unit based on the Intel™ or AMD™ 80x86 architecture, a system on a chip designed by ARM™, or any other suitable microprocessor. The processor 302 may be communicatively coupled to auxiliary devices or modules of the 3D environment apparatus 300 using a bus or other coupling. Optionally, the processor 302 and its coupled auxiliary devices or modules may be housed within or coupled to a housing 301, the housing 301 having, for example, the form factor of a television, set-top box, smartphone, wearable goggles, glasses, or visor, or another form factor.

A user interface device 324 may be coupled to the processor 302 for providing user control input to the media player and data collection process. The process may include outputting video and audio for a conventional flat screen or projection display device. In some embodiments, the targeted content control process may be or may include an audio-video output for an immersive mixed reality content display process operated by a mixed reality immersive display engine executing on processor 302.

The user control input may include, for example, selections from a graphical user interface or other input (e.g., text or directional commands) generated via a touch screen, keyboard, pointing device (e.g., game controller), microphone, motion sensor, camera, or some combination of these or other input devices, represented by block 324. Such a user interface device 324 may be coupled to the processor 302 via an input/output port 326, such as a Universal Serial Bus (USB) or equivalent port. Control input may also be provided via sensors 328 coupled to the processor 302. The sensors 328 may be or may include, for example, motion sensors (e.g., accelerometers), position sensors, cameras or camera arrays (e.g., stereoscopic arrays), biometric temperature or pulse sensors, touch (pressure) sensors, altimeters, orientation sensors (e.g., Global Positioning System (GPS) receivers and controllers), proximity sensors, smoke or vapor detectors, gyroscopic position sensors, radio receivers, multi-camera tracking sensors/controllers, eye-tracking sensors, microphones or microphone arrays, electroencephalography (EEG) sensors, galvanic skin response (GSR) sensors, facial electromyography (fEMG) sensors, electrocardiography (EKG) sensors, video facial action unit (FAU) sensors, brain-machine interface (BMI) sensors, video pulse detection (VPD) sensors, pupil dilation sensors, body chemistry sensors, functional magnetic imaging (fMRI) sensors, photoplethysmography (PPG) sensors, or functional near-infrared (fNIR) sensors. The one or more sensors 328 may detect biometric data used as indicators of the user's emotional state, such as one or more of facial expression, skin temperature, pupil dilation, respiration rate, muscle tension, nervous system activity, pulse, EEG data, GSR data, fEMG data, EKG data, FAU data, BMI data, pupil dilation data, chemical detection (e.g., oxytocin) data, fMRI data, PPG data, or fNIR data. Additionally, the sensor(s) 328 may detect the user's context, for example the identified location, size, orientation, and movement of the user's physical environment and of objects in that environment, or the motion or other state of a user interface display, such as motion of a virtual reality headset. The sensors may be built into a wearable article, or may be non-wearable, including in the display device itself or in accessory equipment such as a smartphone, smart watch, or implantable medical monitoring device. Sensors may also be placed in nearby devices, such as an Internet-connected microphone and/or camera array device used for hands-free network access.

Sensor data from one or more sensors 328 may be processed locally by CPU 302 to control display output, and/or transmitted to server 200 for real-time processing by the server, or for non-real-time processing. As used herein, "real-time" refers to processing in response to user input without any arbitrary delay between input and output; that is to say, as quickly as is technically possible. "non-real-time" or "offline" refers to batch or other use of sensor data that is not used to provide immediate control input to control the display, but may control the display after some arbitrary amount of delay.

To enable communication with another node of the computer network (e.g., the targeted content server 200), the client 300 may include a network interface 322, such as a wired or wireless Ethernet port. Network communications may be used, for example, to implement a multi-player experience, including immersive or non-immersive experiences of targeted content. The system may also be used for non-targeted multi-user applications such as social networking, group entertainment experiences, educational environments, video games, and the like. Network communications may also be used for data transfer between clients and other nodes of the network for purposes including data processing, content delivery, content control and tracking. The client may manage communications with other network nodes using a communication module 306, which communication module 306 handles application-level communication requirements and lower-level communication protocols, preferably without user management.

The display 320 may be coupled to the processor 302, for example, via a graphics processing unit 318 integrated in the processor 302 or a separate chip. Display 320 may include, for example, a flat panel color Liquid Crystal (LCD) display illuminated by a Light Emitting Diode (LED) or other lamp, a projector driven by an LCD display or by a Digital Light Processing (DLP) unit, a laser projector, or other digital display device. The display device 320 may be incorporated into a virtual reality headset or other immersive display system, or may be a computer monitor, a home theater or television screen, or a projector in a projection room or theater. Video output driven by a mixed reality display engine operating on the processor 302, or other application for coordinating user input with immersive content display and/or generating a display, may be provided to the display device 320 and output to the user as a video display. Similarly, an amplifier/speaker or other audio output transducer 316 may be coupled to the processor 302 via the audio processor 312. Audio output associated with the video output and generated by the media player module 308, the targeted content control engine, or other application may be provided to the audio transducer 316 and output to the user as audible sound. The audio processor 312 may receive analog audio signals from the microphone 314 and convert them to digital signals for processing by the processor 302. The microphone may be used as a sensor for detecting emotional states and as a device for user input of verbal commands or social verbal responses to NPCs or other player actors.

The 3D environment device 300 may also include random access memory (RAM) 304 holding program instructions and data for rapid execution or processing by the processor during control of targeted content in response to the emotional state of the user. When the device 300 is powered down or otherwise inactive, program instructions and data may be stored in long-term memory, such as a non-volatile magnetic, optical, or electronic memory storage device (not shown). Either or both of the RAM 304 or the storage device may comprise a non-transitory computer-readable medium holding program instructions that, when executed by the processor 302, cause the device 300 to perform the methods or operations described herein. The program instructions may be written in any suitable high-level language (e.g., C, C++, C#, JavaScript, PHP, or Java™) and compiled to produce machine-language code for execution by the processor.

Program instructions may be grouped into functional modules 306, 308 to facilitate coding efficiency and comprehensibility. The communication module 306 may include instructions for coordinating communication of the sensor data (and metadata, if any) to the calculation server. The sensor control module 308 may include instructions for controlling sensor operation and processing raw sensor data for transmission to the calculation server. The modules 306, 308, even though recognizable as divisions or groupings in source code, are not necessarily distinguishable as separate blocks of code in machine-level encoding. A code bundle directed toward a specific type of function may be considered to comprise a module, regardless of whether machine code in the bundle can be executed independently of other machine code; the modules may be high-level modules only. The media player module 308 may perform, in whole or in part, the operations of any of the methods described herein, and equivalents. The operations may be performed independently or in cooperation with one or more other network nodes (e.g., the server 200).

In addition to conventional 2D output, or 3D output for display on a two-dimensional (flat or curved) screen (e.g., by a television, a mobile screen, or a projector), the content control methods disclosed herein may also be used with virtual reality (VR) or augmented reality (AR) output devices. FIG. 4 is a schematic diagram illustrating one type of immersive VR stereoscopic display device 400, as a more specific form factor of the client 300. The client device 300 may be provided in a variety of form factors, of which the device 400 provides just one example. The inventive methods, apparatus, and systems described herein are not limited to a single form factor and may be used in any video output device suitable for content output. As used herein, a "targeted content signal" includes any digital signal used for audio-video output of targeted content, which may be branching and interactive, or non-interactive. In one aspect, the targeted content may change in response to a detected emotional state of the user.

The immersive VR stereoscopic display device 400 can include a tablet support structure made of an opaque lightweight structural material (e.g., rigid polymer, aluminum, or cardboard) configured to support and allow removable placement of a portable tablet computing or smartphone device including a high-resolution display screen (e.g., an LCD display). The device 400 is designed to be worn close to the face of the user so that a wide field of view can be achieved using a small screen size (such as a smartphone). The support structure 426 holds a pair of lenses 422 relative to the display screen 412. The lens may be configured to enable a user to comfortably focus on the display screen 412, and the display screen 412 may be held approximately one to three inches from the user's eyes.

The device 400 may also include a viewing shield (not shown) coupled to the support structure 426 and configured from a soft, flexible, or other suitable opaque material to shape to fit the user's face and block external light. The shield may be configured to ensure that the only visible light source for the user is the display screen 412, thereby enhancing the immersive effects of using the device 400. A screen divider may be used to separate the screen 412 into independently driven stereoscopic regions, each region being visible through only a respective one of the lenses 422. Thus, the immersive VR stereoscopic display device 400 can be used to provide stereoscopic display output, thereby providing a more realistic perception of the 3D space for the user.

The immersive VR stereoscopic display device 400 may also include a bridge (not shown) for placement over the nose of the user to facilitate accurate positioning of the lenses 422 relative to the eyes of the user. The device 400 may also include an elastic strap or band 424, or other headgear, for fitting around and holding the device 400 to the user's head.

The immersive VR stereoscopic display device 400 may include additional electronic components of a display and communication unit 402 (e.g., a tablet or smartphone) positioned in relation to the user's head 430. When the support 426 is worn, the user views the display 412 through the pair of lenses 422. The display 412 may be driven by a central processing unit (CPU) 403 and/or a graphics processing unit (GPU) 410 via an internal bus 417. The components of the display and communication unit 402 may also include, for example, one or more transmit/receive components 418 enabling wireless communication between the CPU and an external server via a wireless coupling. The transmit/receive component 418 may operate using any suitable high-bandwidth wireless technology or protocol, including, for example, cellular telephone technologies such as third, fourth, or fifth generation (3G, 4G, or 5G) 3rd Generation Partnership Project (3GPP) Long Term Evolution (LTE), Global System for Mobile communications (GSM), or Universal Mobile Telecommunications System (UMTS), and/or Wireless Local Area Network (WLAN) technology using, for example, protocols such as Institute of Electrical and Electronics Engineers (IEEE) 802.11. The one or more transmit/receive components 418 may stream video data to the display and communication unit 402 from a local or remote video server, and may uplink sensor and other data to the local or remote video server for control or audience-response techniques as described herein.

The components of the display and communication unit 402 may also include one or more sensors 414 coupled to the CPU 403, for example via the communication bus 417. Such sensors may include, for example, an accelerometer/inclinometer array providing orientation data indicative of the orientation of the display and communication unit 402. Because the display and communication unit 402 is fixed to the user's head 430, this data may also be calibrated to indicate the orientation of the head 430. The one or more sensors 414 may also include, for example, a Global Positioning System (GPS) sensor indicating the geographic location of the user. The one or more sensors 414 may also include, for example, a camera or image sensor positioned to detect the orientation of one or more of the user's eyes, or to capture video images of the user's physical environment (for VR mixed reality), or both. In some embodiments, a camera, image sensor, or other sensor configured to detect the user's eyes or eye movements may be mounted in the support structure 426 and coupled to the CPU 403 via the bus 416 and a serial bus port (not shown), such as a Universal Serial Bus (USB) or other suitable communication port. The one or more sensors 414 may also include, for example, an interferometer positioned in the support structure 404 and configured to indicate a surface contour to the user's eyes. The one or more sensors 414 may also include, for example, a microphone, microphone array, or other audio input transducer for detecting spoken user commands or verbal and non-verbal audible responses to the display output. The one or more sensors may include a subvocalization mask using electrodes, as described in a 2018 paper by Arnav Kapur, Pattie Maes, and Shreyas Kapur presented at the ACM Intelligent User Interfaces conference. Subvocalized words may be used as command inputs, as indications of arousal or valence, or both. The one or more sensors may include, for example, electrodes or a microphone for sensing heart rate, a temperature sensor configured for sensing skin or body temperature of the user, an image sensor coupled to an analysis module to detect facial expression or pupil dilation, a microphone to detect verbal and non-verbal utterances, or other biometric sensors for collecting biofeedback data, including nervous system responses that can indicate emotion via algorithmic processing, including any of the sensors already described at 328 in connection with FIG. 3.

The components of the display and communication unit 402 may also include, for example, an audio output transducer 420, such as a speaker or piezoelectric transducer in the display and communication unit 402, or an audio output port for a headset or other audio output transducer mounted in headwear 424, or the like. The audio output device may provide surround sound, multi-channel audio, so-called "object oriented audio," or other audio track output that accompanies the stereo immersive VR video display content. The components of display and communication unit 402 may also include a memory device 408 coupled to CPU403, for example, via a memory bus. The memory 408 may store, for example, program instructions that, when executed by the processor, cause the apparatus 400 to perform the operations described herein. The memory 408 may also store data (e.g., audio-video data) in a library or buffer during streaming from a network node.

Having described examples of suitable clients, servers, and networks for performing methods for digitally representing user engagement with audio-video content in computer memory based on biometric sensor data, more detailed aspects of such methods will be discussed. Fig. 5 shows an overview of a method 500 for computing content engagement Capabilities (CEP), which may include four related operations in any functional order or in parallel. These operations may be programmed as executable instructions for a server, as described herein.

Correlation operation 510 uses an algorithm to correlate the biometric data of the user or group of users with an emotional indicator. Optionally, the algorithm may be a machine learning algorithm configured to process the context indication data in addition to the biometric data, which may improve accuracy. The contextual indication data may include, for example, user orientation, user location, time of day, day of week, ambient light level, ambient noise level, and the like. For example, if the user's environment is full of interference, the significance of the biofeedback data may be different than in a quiet environment.

The emotional indicator may be a symbolic value associated with an emotional episode. The indicator may have constituent elements, which may be quantitative or non-quantitative. For example, the indicator may be designed as a multidimensional vector representing the intensity of psychological qualities such as cognitive load, arousal, and appraisal (valence). Psychological appraisal is the attractiveness or aversiveness of an event, object, or situation; appraisal is referred to as positive when the subject feels something is good or attractive, and as negative when the subject feels the object is repellent or unpleasant. Arousal is the subject's state of alertness and attention. The machine learning algorithm may include at least one Supervised Machine Learning (SML) algorithm, for example, one or more of a linear regression algorithm, a neural network algorithm, a support vector algorithm, a naive Bayes algorithm, a linear classification module, or a random forest algorithm.

Event detection operation 520 analyzes time-correlated signals from one or more sensors during output of targeted content to a user and detects events in which the signals exceed a threshold. The threshold may be a fixed predetermined value, or a variable such as a rolling average. Examples using GSR data are provided herein below. A discrete measure of emotional response may be computed for each event. Emotion may not be directly measurable, so the sensor data instead indicates emotional modulation. Affective modulation is modulation of a biometric waveform due to an emotional state or a change in emotional state. In one aspect, to obtain a baseline correlation between emotional modulation and emotional state, known visual stimuli (e.g., from a focus group test or a personal calibration session) may be displayed to the player actor to evoke a certain type of emotion. While the player actor is under stimulus, the testing module may capture biometric data and compare the stimulated biometric data to resting biometric data to identify emotional modulation in the biometric data waveform.
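By way of illustration only (this sketch is not part of the patent disclosure, and every function and parameter name in it is invented), the event detection described above might be implemented roughly as follows for a uniformly sampled one-dimensional sensor signal, using a rolling-average threshold as one variant of the variable threshold mentioned.

```python
import numpy as np

def detect_events(signal, fs, window_s=10.0, k=1.5, min_duration_s=0.5):
    """Detect stimulation events in which a sensor signal exceeds a threshold.

    The threshold here is a rolling mean of the preceding window plus k rolling
    standard deviations; events shorter than min_duration_s are discarded.
    Returns a list of (start_index, end_index) sample pairs.
    """
    signal = np.asarray(signal, dtype=float)
    n = len(signal)
    win = max(1, int(window_s * fs))
    # Pad the front so every sample has a full preceding window (simple, not optimized).
    padded = np.concatenate([np.full(win, signal[0]), signal])
    means = np.array([padded[i:i + win].mean() for i in range(n)])
    stds = np.array([padded[i:i + win].std() for i in range(n)])
    threshold = means + k * stds

    above = signal > threshold
    events, start = [], None
    for i, flag in enumerate(above):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if (i - start) / fs >= min_duration_s:
                events.append((start, i))
            start = None
    if start is not None and (n - start) / fs >= min_duration_s:
        events.append((start, n))
    return events
```

A fixed predetermined threshold could be used instead by substituting a constant array for the rolling statistics.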

A normalization operation 530 performs an arithmetic or other numerical comparison between the test data for the known stimulus and the measurement signal for the user, and normalizes the measurement values for the event. Normalization compensates for variations in the individual responses and provides a more useful output. Once the input sensor event is detected and normalized, a calculation operation 540 determines CEP values for a user or group of users and records these values in a time-dependent record in computer memory.

Machine learning (also known as AI) can be an effective tool for revealing correlations between complex phenomena. As shown in fig. 6, a system 600 responsive to sensor data 610 indicative of a user's emotional state may use a machine learning training process 630 to detect correlations between audio-video and narrative stimuli 620 and the biometric data 610. The training process 630 may receive the stimulus data 620, temporally correlated with the biometric data 610, from a media player client (e.g., client 300, 402). The data may be associated with a specific user or group of users, or may be generic. Both types of input data (user-associated and generic) may be used together. Generic input data may be used to calibrate a baseline emotional response, in order to classify baseline emotional responses to a scene or to an arrangement of cinematic elements. If most users exhibit similar biometric tells when viewing a scene within a narrative context, the scene may be classified with other scenes that provoke similar biometric data from users. Similar scenes may be collected and reviewed by a human creative producer, who may manually score the scenes on an emotional indicator index 640 with the assistance of automated analysis tools. In the alternative, the indicator data 640 may be scored by human and semi-automatic processing without classification by similar scenes. These human-scored elements become training data for the machine learning process 630. In some embodiments, the humans scoring elements of the targeted content may include the users themselves, for example via an online questionnaire. Scoring should take cultural demographics into account, and experts may inform the process about how different cultures respond to scene elements.

The ML training process 630 compares the scores of scenes or other movie elements determined by humans and machines and uses iterative machine learning methods known in the art to reduce the error between the training data and its own estimates. Creative content analysts may score data from multiple users based on their professional judgment and experience. Individual users may score their own content. For example, users who are willing to assist in training their personal "director software" to recognize their emotional state may score their emotions while viewing content. The problem with this approach is that user scoring may interfere with their normal response, misleading the machine learning algorithm. Other training methods include clinical testing of a subject's biometric response in a short segment of content, followed by a survey of the clinical subject with respect to their emotional state. A combination of these and other methods may be used to develop training data for machine learning process 630.

As used herein, biometric data provides a "story" about how users experience targeted content, i.e., whether they are engaged, in the narrative-theory sense, with its entertainment value. Content engagement capability is a measure of overall engagement across the whole user experience with targeted content, and it is monitored and scored both during the experience and after it is complete. Overall user enjoyment is measured as the difference between the expected biometric data modulation capability (measured during calibration) and the average sustained biometric data modulation capability. Measures of user engagement may also be determined by other methods and related to, or made part of, the content engagement capability score. For example, exit-interview responses or acceptance of an offer to purchase, subscribe, or follow may be included in, or used to adjust, the content engagement capability. Offer response rates may be used during or after content presentation to provide a more complete measure of user engagement.

The mood with which a user enters the interaction affects how the "story" is interpreted, so the story experience should be calibrated for mood as far as possible. If the process cannot calibrate for mood, mood can instead be accounted for in the presented emotional episode to support interaction with more positive appraisal, provided the appraisal of the player actor can be measured. The instant system and method would be most effective for healthy, calm individuals, although it would provide an interactive experience for every participant.

Fig. 7A shows an arrangement 700 of emotional states relative to the axes of a two-dimensional emotion space defined by a horizontal appraisal (valence) axis and a vertical arousal axis. The emotions illustrated in this arrangement, based on the appraisal/arousal emotion model, are shown by way of example only and not as actual or typical measurements. The media player client may measure appraisal using biometric sensors that measure facial action units, while arousal may be measured via GSR measurements, for example.

The emotion space may be characterized by more than two axes. Fig. 7B illustrates a three-dimensional model 750 of the emotion space, in which the third axis is social dominance or confidence. Model 750 shows a VAD (valence/appraisal, arousal, dominance/confidence) model. The 3D model 750 may be useful for complex emotions involving social hierarchy. In another embodiment, the engagement metric from the biometric data may be modeled as a three-dimensional vector providing cognitive workload, arousal, and appraisal, from which the processor may determine calibrated primary and secondary emotions. The engagement metric may be generalized to an N-dimensional model space, where N is one or greater. Here the CEP is placed in a two-dimensional space 700 having an appraisal axis and an arousal axis, but the CEP is not limited thereto. For example, confidence is another psychometric axis that can be added. During emotional calibration, baseline arousal and appraisal may be determined on an individual basis.

Emotion determination from the biometric sensors is based on an appraisal/arousal emotion model, in which appraisal is the direction (positive/negative) and arousal is the magnitude. With this model, the intent of the creative work can be verified by measuring narrative-theory structures such as tension (hope against fear) and rising tension (arousal increasing over time), and story elements can be changed dynamically based on the user's state of mind, as described in more detail in U.S. provisional patent application 62/614,811, filed January 8, 2018. The focus of the present disclosure is determining useful measures of appraisal and arousal (CEP) for real-time and offline applications, as described in more detail below.

In a testing environment, electrodes and other sensors may be placed manually on a subject user in a clinical setting. For consumer applications, sensor placement should be less invasive and more convenient. For example, image sensors for visible and infrared wavelengths may be built into the display equipment. Electrodes may be built into headwear, controllers, and other wearable gear so that skin conductivity, pulse, and electrical activity are measured whenever the user wears the gear or grips the controller while using the VR equipment.

The emotional ratings may be indicated and measured using facial displacement/speech spectral analysis. Face analysis using the face action unit developed by Paul Ekman is a useful method. For example, an application plug-in developed by Affectiva provides probabilities of a subject's emotions based on statistics and self-reported emotions for facial expressions. The plug-in works by generating probabilities of emotional ratings based on spectral analysis (for speech) and primarily for zygomatic/frown movements of the face. To date, Affectiva has measured over 700 million faces. It may be desirable to develop a FAU-based approach that is optimized for evaluating users consuming targeted content. Fig. 8 shows a table 800 containing sample outputs from instances of Affectiva evaluating FAU from different users. These columns contain the numerical measure of the selected FAU. Machine learning or other techniques may be used to optimize the output to identify the most convincing FAU for use in determining engagement metrics for targeted content.

Redundancy may be useful for reliability. The server may compute appraisal and arousal in several ways and use weighted voting among the redundant measurements to improve the reliability and usability of the appraisal and arousal measurements. For example, when using redundant sensor inputs, the server may set the weight of the electrodermal response to 1, the weight of heart rate variability to 0.25, and the weight of the percentage of the maximum appraisal signal to 0.25. As another example, for appraisal sensor inputs, the server may set the weight of electroencephalographic data to 1 and the weight of facial action unit data to 0.75. Several other measures of appraisal and arousal may also be useful, as outlined in table 1100 in fig. 11. The preferred inputs indicated there are based on practical considerations and may change as the technology improves. A highly weighted input is considered more accurate or more reliable than a lower-weighted input. Even in a test environment, all data sources are rarely available; nevertheless, as described below, the system designer may still use a variety of measurement options.
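The weighted combination of redundant measurements described above could be sketched as follows; the weights shown mirror the illustrative values in the preceding paragraph, and all function and key names are invented rather than taken from the patent.

```python
def fuse_measures(measures, weights):
    """Combine redundant, normalized sensor-derived measures (e.g., several
    arousal estimates on a common scale) into a single weighted value.

    Both arguments are dicts keyed by source name; missing sources are skipped,
    so the fusion degrades gracefully when a sensor is unavailable.
    """
    total = 0.0
    weight_sum = 0.0
    for source, weight in weights.items():
        if source in measures:
            total += weight * measures[source]
            weight_sum += weight
    return total / weight_sum if weight_sum else None

# Illustrative weights similar to those described above.
arousal_weights = {"gsr": 1.0, "hrv": 0.25, "peak_appraisal_pct": 0.25}
appraisal_weights = {"eeg": 1.0, "fau": 0.75}

arousal_estimate = fuse_measures({"gsr": 0.8, "hrv": 0.4}, arousal_weights)
```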

Electroencephalographic (EEG) frontal asymmetry analysis may be a useful method for measuring approach/avoidance emotional tells as an indicator of appraisal. The sensors measure voltage potentials produced by neurons in the brain, and alpha-wave power is computed for each hemisphere. Differences in power between the two hemispheres indicate approach or avoidance, i.e., positive or negative emotion. For example, a frontal asymmetry index may be computed as the logarithm of the ratio of right-hemisphere power to left-hemisphere power and related to approach versus avoidance, or positive versus negative appraisal.
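A minimal sketch of the frontal asymmetry index just described, assuming two raw frontal EEG channels sampled at fs Hz; the alpha band limits and the plain periodogram used here are common choices rather than requirements stated in the patent, and how the sign maps onto approach versus avoidance depends on the convention adopted.

```python
import numpy as np

def frontal_asymmetry_index(eeg_right, eeg_left, fs):
    """Log ratio of right- to left-hemisphere alpha-band (8-13 Hz) power,
    computed from two frontal EEG channels with a plain periodogram."""
    def alpha_power(channel):
        channel = np.asarray(channel, dtype=float)
        freqs = np.fft.rfftfreq(len(channel), d=1.0 / fs)
        psd = np.abs(np.fft.rfft(channel - channel.mean())) ** 2
        band = (freqs >= 8.0) & (freqs <= 13.0)
        return psd[band].sum()

    return float(np.log(alpha_power(eeg_right) / alpha_power(eeg_left)))
```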

The strength of an appraisal measure may also be used to measure arousal. For example, the measurement system may be used to measure the maximum facial muscle displacement during a smile; arousal is then inferred from the ratio of the current displacement to that maximum, or from another comparison operation.

Emotional arousal can be measured by a variety of methods, including the phasic (impulse) galvanic skin response. When the subject is aroused, there is a rapid change in skin resistance, which can be measured by passing a small current through the body and measuring the resistance. Fig. 9 shows the GSR peak response over time. The number of impulse responses (peaks) observed over time is related to the intensity of arousal. Thus, the simplest measure of event capability is EP = C, where "EP" is the event capability and "C" is the count of peaks in which the signal exceeds the threshold. In addition, the peak amplitude also indicates the intensity of arousal, which can be converted into an arousal capability; for example, the expression \(\sum P_a\) (where "\(P_a\)" is the peak amplitude of each event) provides a measure of the event capability of the content. An equivalent unitless measure of event capability is given by \(\sum P_a / S_{avg}\), where "\(S_{avg}\)" is the average signal value for the content. Both measures may be useful. Another useful metric may be the area integrated under the amplitude curve. For example, the area between the signal response line 1010 and a fixed threshold 1020 or a variable threshold 1030 may be calculated where the signal is above the threshold line. The event count, peak amplitude, and integrated area may be combined in any useful manner. For example, the expression \((C \cdot P_a \cdot t + A)/(S_{avg} \cdot t)\), where "A" is the total area bounded by the signal line where it is above the threshold line, "t" is a minimum time increment (e.g., 1 second), and the other symbols are as defined above, provides a unitless measure of event capability for any content segment. Other metrics may also be of interest, such as event capability per unit time, which may be derived by dividing any of the foregoing unitless event capabilities by the running length of the content.
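The event-capability measures just listed might be computed as in the following sketch (not taken from the patent; names are invented, and \(P_a\) is taken here to be the sum of peak amplitudes above the threshold), given a uniformly sampled arousal signal such as phasic GSR and a fixed or per-sample threshold.

```python
import numpy as np

def event_capability_metrics(signal, fs, threshold, t_min=1.0):
    """Compute simple event-capability measures for an arousal signal:
    peak count C, sum of peak amplitudes above the threshold (sum_pa),
    the unitless sum_pa / s_avg, the combined unitless measure
    (C * sum_pa * t_min + A) / (s_avg * t_min), and that measure divided
    by the running time. A is the area between the signal and the threshold
    where the signal exceeds it; t_min is a minimum time increment."""
    signal = np.asarray(signal, dtype=float)
    thr = np.broadcast_to(np.asarray(threshold, dtype=float), signal.shape)
    above = signal > thr
    dt = 1.0 / fs
    s_avg = float(signal.mean())

    # Area between the signal line and the threshold line, where above threshold.
    area = float(np.sum((signal - thr)[above]) * dt)

    # Group contiguous above-threshold samples into events; record peak amplitudes.
    peaks, start = [], None
    for i, flag in enumerate(above):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            peaks.append(float(np.max(signal[start:i] - thr[start:i])))
            start = None
    if start is not None:
        peaks.append(float(np.max(signal[start:] - thr[start:])))

    count = len(peaks)
    sum_pa = float(sum(peaks))
    run_time = len(signal) * dt
    combined = (count * sum_pa * t_min + area) / (s_avg * t_min) if s_avg else 0.0
    return {
        "count": count,
        "sum_peak_amplitude": sum_pa,
        "sum_pa_over_savg": sum_pa / s_avg if s_avg else 0.0,
        "combined_unitless": combined,
        "combined_per_unit_time": combined / run_time if run_time else 0.0,
    }
```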

The tonic galvanic skin response is a slower change in skin resistance, which also indicates arousal. A dominant DC shift in the signal is an indicator of arousal. Fig. 10 shows a graph 1000 of a combined tonic and phasic GSR response 1010. A fixed threshold 1020 may be used to detect the events used in computing CEP. In the alternative, a variable threshold 1030 (e.g., a rolling average) may be used to compensate for long-term variability.

The server may compute the arousal power from the peak amplitude as P(DC) = I²R, where the current I is the current supplied by the GSR circuit and R is the measured skin resistance. It may use the same equation to compute the DC power shift in arousal for the tonic response. Based on the GSR, the two can be combined to obtain an overall phasic (impulse) power response and a tonic power response for arousal.

Although fig. 10 indicates the arousal magnitude of the GSR or any other quantitative signal indicative of arousal, a similar metric may be used to compute the event capability of any desired measure of evaluative. Different ratings qualities may require different measurements.

The face action unit has been discussed in connection with fig. 8. The output of the FAU analysis may be processed using rule-based algorithms and/or machine learning to detect the degree of evaluation and arousal. For a normally expressed user, the FAU should have high reliability and a wide measurement range. Although reliability has not been proven at present, it may also be used to detect arousal.

Heart Rate Variability (HRV) is another useful measure of emotional arousal, shown in the table of fig. 11 as pulse detection (row 4). Pulses may be detected using electrodes, by audio or vibration analysis (listening for or sensing the pulse with an audio sensor or accelerometer), or by image analysis: blood pressure changes during heartbeat pulses cause visible effects on the face or limbs that can be detected using image sensors and image processors. Heart rate variability in healthy individuals includes components above 0.15 Hz; lower-frequency HRV after stimulation is attributed to the sympathetic nervous system response related to arousal. A server that determines CEP may monitor HRV during the calibration process and throughout the user experience for indications of arousal. Similarly, an increase in heart rate is a measure of arousal. The CEP server may monitor the user's heart rate during the calibration process and throughout the user experience, calculate the percentage change in heart rate, and apply a threshold to detect an arousal event. For example, the subject may be considered aroused if the heart rate is more than 10% above baseline and the heart rate variability is greater than 0.15 Hz.
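A trivial sketch of the heart-rate check just described, using the example thresholds from the paragraph above (more than 10% above baseline, HRV component above 0.15 Hz); nothing here beyond those two comparisons is taken from the patent, and the names are invented.

```python
def heart_rate_arousal_event(current_hr_bpm, baseline_hr_bpm, hrv_component_hz):
    """Flag a possible arousal event: heart rate more than 10% above baseline
    and a heart-rate-variability component above 0.15 Hz."""
    hr_elevated = current_hr_bpm > 1.10 * baseline_hr_bpm
    hrv_high = hrv_component_hz > 0.15
    return hr_elevated and hrv_high
```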

Pupil dilation (row 5) is a reliable indicator of arousal and can be detected using image analysis. Functional near infrared (fNIR, row 6) can be implemented relatively easily using infrared sensors calibrated to operate over the human skin temperature range. When used to confirm other measurements, skin temperature indicates both arousal and appraisal.

Facial electromyography (fEMG) is EMG applied to the facial muscles. It is well known to be useful for emotion detection. Contraction and relaxation of muscle fibers produce electrical signals that can be detected by electrode pairs attached to specific muscle groups. Two facial muscle groups may be particularly useful, namely the corrugator supercilii group for frowning and the zygomaticus major group for smiling. However, frowning, smiling, and other expressions may be detected using image analysis and facial action units without requiring the user to wear electrodes. In mixed reality (VR or AR) applications where the user wears headwear that blocks imaging of facial expressions, fEMG may provide an alternative method of collecting data related to emotional responses.

Subvocalization sensing can be considered a kind of fEMG that detects the activity of muscle groups related to speech. The user wears a partial mask that fits an electrode array against selected areas of the chin, neck, and jaw just below the lower lip. A neural network may be trained to detect the user's subvocalized speech with an accuracy greater than 90%. Subvocalized speech can be used for any purpose that spoken speech serves. In computing engagement capability, spontaneous subvocalized speech (e.g., exclamations) may indicate both arousal and appraisal. Unlike facial expressions, subvocalization cannot be detected using image analysis. Thus, where a user wishes to speak to a game or other entertainment process without making a sound, subvocalization sensing using a mask may be optimal.

An Electrocardiogram (EKG) measures the contraction and relaxation of the heart muscle using electrodes attached to the chest and/or back of a subject. It is used in healthcare to monitor and diagnose heart disease in patients. The EKG provides a more detailed map of heart activity that can provide more information about arousal than a simple pulse. In the alternative, a simplified form of EKG may be used, where it is more convenient than other pulse detection methods. For example, where a user wears an EKG module with one or more electrodes positioned for pulse sensing (e.g., in a smart watch, fitness tracker, or other accessory), the data collection module may receive signals from the EKG module and process the signals for pulse detection or other characteristics of the cardiac signal, such as Heart Rate Variability (HRV).

Brain-machine interface (BMI) refers to an implanted electrode or other electrical or electrochemical sensor that responds to brain activity. BMI devices may be used in a clinical setting but are unlikely to be available to consumers for the foreseeable future. BMI is mentioned here as a future possibility in case the technology becomes available in consumer applications.

Functional magnetic resonance imaging (fMRI) measures brain activity by detecting blood flow. Unlike conventional MRI, fMRI detects changes in tissue over time by detecting blood oxygen levels. Thus, fMRI is closely related to brain activity and can be used to sense activity or inactivity in different brain regions. However, current fMRI devices are bulky and impractical to use outside of a clinical setting.

Chemical detection devices vary in complexity and bulk, from spectroscopy, which is specific to a particular chemical, to micro- or nano-sensors. Initially, it may be most practical to incorporate micro- or nano-sensors into output gear. For example, a chemical sensing device incorporating one or more microelectronic chemical sensors may be placed in headwear near the nose and mouth. Alternatively or additionally, sensors may be placed on the skin to detect chemicals excreted in sweat. Chemicals and compounds of interest for detecting arousal and appraisal may include, for example, cortisol, epinephrine, norepinephrine, oxytocin, acetylcholine, dopamine, endorphins, serotonin, and pheromones. However, many such chemicals may be difficult to detect using external sensors.

Gaze direction (row 13) is easy to detect using image analysis. Depth of focus can also be detected, by measuring corneal flexion. Gaze direction and depth of focus do not indicate appraisal or arousal, but they do indicate interest. User interest may be useful for controlling content or for making better use of other sensor data. For example, if a user is looking at the "wrong" content and their data indicates an appraisal or arousal error, the weight given to that error may be reduced to account for the temporary distraction.

Targeted emotional episodes based on targeted content may be stored in a computer database as appraisal/arousal targets. The server may perform a difference operation to determine the error between the planned/predicted and measured emotional arousal and appraisal. The error may be used for content control. Once the delta between predicted and measured values exceeds a threshold, a correction is commanded by the story management software. If the user's appraisal (relative to the target emotional episode) is in the wrong direction compared to the prediction, the processor may change the content according to the following logic: if the absolute value of (predicted appraisal − measured appraisal) is > 0, change the content. The change in content may be any of several different items known specifically to the software for that player actor, or may be a trial or recommendation from the AI process. Likewise, the processor may change the content if the measured arousal falls more than 50% below the predicted arousal (absolute value of error > 0.50 × predicted value). Again, the change in content may be any of several different items known specifically to the software for that player actor, or may be a trial or recommendation from the AI process.
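The control logic above might be sketched as follows; note that the appraisal test here interprets "wrong direction" as an appraisal of opposite sign to the prediction, which is an assumption rather than something the text states, and the function and parameter names are invented.

```python
def content_change_needed(predicted_appraisal, measured_appraisal,
                          predicted_arousal, measured_arousal,
                          arousal_tolerance=0.5):
    """Return True when the story-management logic should alter content:
    either the measured appraisal points in the opposite direction from the
    predicted appraisal, or the arousal error exceeds half of the predicted
    arousal (the 50% criterion described above)."""
    appraisal_wrong_direction = (predicted_appraisal * measured_appraisal) < 0
    arousal_error = abs(predicted_arousal - measured_arousal)
    arousal_off_target = arousal_error > arousal_tolerance * abs(predicted_arousal)
    return appraisal_wrong_direction or arousal_off_target
```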

Fig. 12 illustrates a method 1200 for determining a content rating of audio-video content, including a content engagement Capability (CEP). The CEP is a ratio of the event capability "P_v" of the subject content to the expected capability "P_x" of similar content in the same type. For different subject content, and in general for different users, the same method is used to compute P_v and P_x. The sums therefore cover different total times: the event capability P_v covers a time period "t_v" equal to the sum of the "n" event capability periods Δt_v of the subject content:

\[ t_v = \sum_{i=1}^{n} \Delta t_{v,i} \qquad \text{(Equation 1)} \]

Likewise, the expected capability P_x covers a time period "t_x" equal to the sum of the "m" event capability periods Δt_x of the expected (baseline) content:

\[ t_x = \sum_{i=1}^{m} \Delta t_{x,i} \qquad \text{(Equation 2)} \]

For any given event "n" or "m", the capabilities P_v and P_x are each the dot product of a capability vector \(\vec{P}\) and a weight vector \(\vec{W}\) of dimension i, as follows:

\[ P_v = \vec{P}_v \cdot \vec{W} \qquad \text{(Equation 3)} \]

\[ P_x = \vec{P}_x \cdot \vec{W} \qquad \text{(Equation 4)} \]

In general, the capability vector \(\vec{P}\) can be defined in various ways. In any given CEP calculation, the capability vector and the expected baseline for the subject content should be defined consistently with each other, and the weight vector should be the same. The capability vector may include only arousal metrics, only appraisal values, a combination of arousal and appraisal metrics, or a combination of any of the foregoing with other metrics (e.g., a confidence metric). In one embodiment, CEP is calculated using a capability vector defined by a combination of "j" arousal measures "a_j" and "k" appraisal measures "v_k", where each measure is adjusted by a calibration offset "C" derived from a known stimulus and j and k are any non-negative integers:

\[ \vec{P} = \big[(a_1 - C_1),\ (a_2 - C_2),\ \ldots,\ (a_j - C_j),\ (v_1 - C_{j+1}),\ \ldots,\ (v_k - C_{j+k})\big] \qquad \text{(Equation 5)} \]

wherein

\[ C_j = S_j - S_j O_j = S_j (1 - O_j) \qquad \text{(Equation 6)} \]

The index "j" in Equation 6 runs from 1 to j + k, S_j represents a scale factor, and O_j represents the offset between the minimum value of the sensor data range and its true minimum value. The weight vector corresponding to the capability vector of Equation 5 can be expressed as:

\[ \vec{W} = [w_1, w_2, \ldots, w_{j+k}] \qquad \text{(Equation 7)} \]

where each weight value scales its respective factor in proportion to the relative estimated reliability of the factor.

Given the calibrated dot products of Equations 3 and 4 and the time factors given by Equations 1 and 2, the processor may calculate the content engagement Capability (CEP) of an individual user as:

\[ \mathrm{CEP} = \frac{t_x}{t_v} \cdot \frac{\sum_{n} P_v}{\sum_{m} P_x} \]

The ratio t_x/t_v normalizes the unequal time totals of the two sums and makes the ratio unitless. A user CEP value greater than 1 indicates that the user/player actor/audience member had an emotional engagement experience exceeding their expectation for the type of content. A user CEP value less than 1 indicates engagement below the user's expectation for that type of content.

CEP may also be computed for a content title across an audience of "v" users, based on the ratio of the content event capabilities of those "v" users to the expected capabilities of "x" baseline users, who are not necessarily the same users, as follows:

\[ \mathrm{CEP} = \frac{x\, t_x}{v\, t_v} \cdot \frac{\sum_{v}\sum_{n} P_v}{\sum_{x}\sum_{m} P_x} \]

The variables v and x are the number of content users and participating baseline viewers, respectively. The expected capability in the denominator represents the expectation that the audience brings to the content, and the event capability in the numerator represents the sum of arousal or appraisal events of the audience while experiencing the content. The processor sums the event capabilities over each event (n) and user (v) and the expected capabilities over each event (m) and user (x). It then calculates CEP as the ratio of event capability to expected capability, normalized for the different time totals and audience sizes by the ratio x·t_x / v·t_v. CEP is one component of the content rating. Other components of the content rating may include aggregate appraisal errors and appraisal errors for particular appraisal objectives (e.g., win, disapproval, etc.).
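Putting the expressions above together, a rough sketch of the individual and audience CEP calculations might look like the following; it assumes each event has already been reduced to a calibrated capability vector (offsets subtracted) sharing a common weight vector, and all names are invented for illustration.

```python
import numpy as np

def event_capability(capability_vector, weight_vector):
    """Dot product of a calibrated capability vector (arousal and appraisal
    metrics with calibration offsets already subtracted) and a weight vector
    reflecting the estimated reliability of each factor (Equations 3-4)."""
    return float(np.dot(capability_vector, weight_vector))

def cep_individual(event_vectors, t_v, expected_vectors, t_x, weights):
    """Individual CEP: ratio of summed event capabilities for the subject
    content (over time t_v) to summed expected capabilities for comparable
    content (over time t_x), normalized by t_x / t_v."""
    p_v = sum(event_capability(vec, weights) for vec in event_vectors)
    p_x = sum(event_capability(vec, weights) for vec in expected_vectors)
    return (t_x / t_v) * (p_v / p_x)

def cep_audience(event_caps_by_user, t_v, expected_caps_by_user, t_x):
    """Audience CEP: event_caps_by_user maps each of v content users to a list
    of event capability values over time t_v; expected_caps_by_user does the
    same for x baseline users over time t_x. Normalization uses (x*t_x)/(v*t_v)."""
    v = len(event_caps_by_user)
    x = len(expected_caps_by_user)
    p_v = sum(sum(caps) for caps in event_caps_by_user.values())
    p_x = sum(sum(caps) for caps in expected_caps_by_user.values())
    return (x * t_x) / (v * t_v) * (p_v / p_x)
```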

Equation 5 describes a calibrated capability vector composed of arousal and appraisal metrics derived from biometric sensor data. In the alternative, the processor may define a partially uncalibrated capability vector, in which the sensor data signals are scaled prior to conversion to digital values as part of lower-level digital signal processing but are not offset for the user:

\[ \vec{P}_u = [a_1, a_2, \ldots, a_j, v_1, \ldots, v_k] \]

If a partially uncalibrated capability vector is used, an aggregate calibration offset may be calculated for each factor and subtracted from the dot products given by Equations 3 and 4 before computing the content engagement Capability (CEP). For example, the aggregate calibration offset corresponding to \(\vec{P}_v\) may be given by:

\[ C_v = \vec{C} \cdot \vec{W}, \quad \text{where } \vec{C} = [C_1, C_2, \ldots, C_{j+k}] \]

In this case, the calibrated value of P_v can be calculated by:

\[ P_v = \vec{P}_u \cdot \vec{W} - C_v \]

The calibrated value of P_x may be calculated similarly.

Referring again to the method 1200 (fig. 12), in which the foregoing expressions may be used, a calibration process 1202 of the sensor data is first performed to calibrate the user's response to known stimuli, such as a known resting stimulus 1204, a known arousing stimulus 1206, a known positive-appraisal stimulus 1208, and a known negative-appraisal stimulus 1210. The known stimuli 1206-1210 may be drawn from standardized stimulus sets. For example, the International Affective Picture System (IAPS) is a database of pictures used to study emotion and attention in psychological research. To stay consistent with the content platform, images such as those found in the IAPS or similar repositories may be produced in a format consistent with the target platform for use in calibration. For example, pictures of subject matter that trigger an emotion may be produced as a video clip. Calibration ensures that the sensors operate as intended and provide consistent data between users. Inconsistent results may indicate a malfunctioning sensor or a configuration error, which may be corrected or ignored. The processor may determine one or more calibration coefficients 1216 for adjusting signal values for consistency between devices and/or users.

Calibration may have both scaling and offset characteristics. To be used as an indicator of arousal, appraisal, or other psychological states, the sensor data may need to be calibrated using both a scaling factor and an offset factor. For example, GSR may theoretically vary between zero and 1, but in practice it depends on fixed and variable conditions of human skin, which vary both between individuals and over time. In any given session, a subject's GSR may range from some GSR_min > 0 to some GSR_max < 1. Both the magnitude of the range and its scaling can be estimated by exposing the subject to known stimuli and comparing the results from those sessions to the expected range for the same type of sensor. In many cases, the reliability of the calibration may be suspect, or calibration data may not be available, necessitating estimation of the calibration factors from field data. In some embodiments, the sensor data may be pre-calibrated using an adaptive machine learning algorithm that adjusts the calibration factors for each data stream as more data is received, relieving higher-level processing of the calibration-adjustment task.
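As one possible illustration of scale-and-offset calibration (a generic linear mapping, not the patent's specific C_j = S_j(1 − O_j) parameterization; names are invented), a sensor channel could be calibrated from the range observed during known stimuli onto the expected range for that sensor type:

```python
def estimate_linear_calibration(observed_min, observed_max,
                                expected_min, expected_max):
    """Estimate scale and offset mapping [observed_min, observed_max]
    (e.g., a subject's GSR range during known stimuli) onto the expected
    range for the sensor type, [expected_min, expected_max]."""
    scale = (expected_max - expected_min) / (observed_max - observed_min)
    offset = expected_min - scale * observed_min
    return scale, offset

def apply_calibration(raw_value, scale, offset):
    """Apply the linear calibration to a raw sensor value."""
    return scale * raw_value + offset

# Example: map a session's GSR range [0.12, 0.45] onto an expected [0.0, 1.0].
s, o = estimate_linear_calibration(0.12, 0.45, 0.0, 1.0)
calibrated = apply_calibration(0.30, s, o)
```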

Once the sensors are calibrated, the system normalizes the sensor response data for differences between content types (genres) at 1212, using, for example, equation 8 or equation 9. Different types produce different appraisal and arousal scores. For example, adventure and action types have different cadences, emotional goals, and intensities. Thus, engagement capabilities cannot be compared across types unless the engagement profile of the type is considered. Type normalization scores content against content of the same type, enabling equivalent comparisons between types. Normalization 1212 may be performed on a group of subjects using the expected normalization stimuli 1214, prior to testing the main feature with an audience or focus group. For example, the audience may view one or more trailers of the same type as the main feature, and event capabilities may be calculated for the one or more trailers. In the alternative, archived data for the same user or group of users may be used to compute the expected capabilities. The expected capability is computed using the same algorithm as is used, or to be used, for the event capability, and may be adjusted using the same calibration coefficients 1216. The processor stores the expected capabilities 1218 for later use.

At 1220, the processor receives sensor data during play of the subject content and computes an event capability for each metric of interest (such as arousal and one or more appraisal qualities). At 1228, the processor may sum or otherwise aggregate the event capabilities for the content, either after play ends or on a running basis during play. At 1230, the processor computes content ratings, including the content engagement Capability (CEP) as previously described: it first applies the applicable calibration coefficients and then computes the CEP by dividing the aggregated event capabilities by the expected capabilities, as described above.

Optionally, the arithmetic function 1220 may include comparing event capabilities of each detected event or a smaller subset of detected events to reference emotional episodes defined for the content, at 1224. The reference episode may be, for example, a target episode, a predicted episode, one or more past episodes of content, or a combination of the preceding episodes defined by the creative producer. At 1226, the processor may save, increment, or otherwise accumulate error vector values describing the error of one or more variables. The error vector may include the difference between the reference episode and the measured response for each measured value (e.g., arousal and appraisal values) for a given scene, time period, or set of video frames. The error vectors and vector matrices may be used for content evaluation or content control.
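A minimal sketch of the per-interval error vectors described above, assuming the reference episode and the measurements have already been reduced to one (appraisal, arousal) pair per scene or time interval; the names are invented for illustration.

```python
def emotional_arc_errors(reference_arc, measured_arc):
    """Per-interval error vectors against a reference emotional episode.

    Both arguments are lists of (appraisal, arousal) pairs, one per scene,
    time period, or set of video frames; returns (appraisal_error,
    arousal_error) tuples computed as reference minus measured."""
    return [
        (ref_app - meas_app, ref_aro - meas_aro)
        for (ref_app, ref_aro), (meas_app, meas_aro) in zip(reference_arc, measured_arc)
    ]
```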

The error measure may include or add other indicators for content evaluation. The content engagement capabilities and error measurements may be compared to purchases, subscriptions, or other conversions related to the presented content. The system may also use standard deviation or other statistical measures to measure audience response consistency. The system can measure content engagement, ratings, and arousals for individuals, homogeneous groups, and aggregated audiences.

Referring to fig. 13, an alternative method 1300 for measuring emotional appraisal and arousal and determining content engagement capability 1326 bifurcates at 1302 into an AI-based branch and a rule-based branch. Other suitable methods may combine the AI-based scheme and the rule-based scheme in various ways. Both branches begin with sensor data 1312, 1308. For simplicity of illustration, separate flow objects 1312, 1308 are shown, but both branches may use the same sensor data derived from any one or more of the biometric sensors described herein.

On the AI branch, sensor measurement data 1312 is input to a machine learning pattern analysis process 1330 along with sensor training data 1310; the pattern analysis process 1330 outputs measurements of emotional appraisal, arousal, and event capability values to the emotional capability calculation process 1320, as described herein. When AI-derived data is used, the capability calculation 1320 may omit calibration and normalization, because calibration and normalization are handled as an integral part of the machine learning process; in a sense, normalization and calibration are embedded in the training data 1310. An example of a suitable AI process is described herein in connection with fig. 6. The output from the AI branch should be related to the rule-based output but will not be identical, because a different algorithm is used. As described above, the capability computation 1320 compares the different measures of arousal, event capability, and expected capability and derives content engagement capabilities from them. Although the content engagement capability 1336 is derived from arousal measurements, it is a unitless comparative measure for an entire content title or major sub-component, based on pulsed or continuous capability measures. CEP 1336 differs from the arousal data 1334 in that the arousal data 1334 is not comparative, but is correlated to specific times or frames in the video content for error determination or content analysis.

On the rule-based branch, baseline sensor data 1304 and expected sensor data 1306 for calibration are accessed by a rule-based process 1314 to determine calibration and normalization coefficients, as previously described. The calibration and normalization coefficients are output to the downstream rule-based algorithms, including the emotional capability algorithm 1320, the appraisal algorithm 1318, and the arousal algorithm 1316. The sensor measurement data is also input to these three calculations. The appraisal output 1324 and arousal output 1322 are time-correlated appraisal and arousal magnitudes derived from one or more sensor inputs. As described above, measurements from different sensor types may be combined and weighted according to reliability.

FIG. 14 illustrates a system 1400 for producing and distributing configurable content in response to ratings and arousal data. Production ecosystem 1410 generates a data model 1420 for content and related digital audio-video assets 1430. In addition to video and audio clips, the A/V data 1430 may also include 3D modeling and rendering data, video and digital objects for virtual reality or augmented reality, metadata, and any other digital assets required during play. The data model 1420 defines content elements such as stories, sequences, scenes, conversations, emotional episodes, and characters. The elements may be fixed or configurable. For example, a configurable story may be defined as a branching narrative. In the alternative, the narrative may be fixed and the configurable elements limited to supporting features such as character or object appearance and substitute phrases for expressing multi-line conversations. The data model 1420 may be expressed in a proprietary form 1440 and provided to a data server 1450 along with the audio-video assets 1430 required for playback. Cloud server 1450 may also be provided with an application that generates control document 1470 for story elements in a standard format suitable for use on the destination platform, such as JavaScript object notation (JSON) or extensible markup language (XML). The control document 1470 may include definitions for targeted emotional episodes of the content.
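For illustration only, a story-element control document of the kind described (JSON or XML in the patent) might carry content like the following; the structure and every field name here are hypothetical, expressed as a Python dict so it can be serialized to JSON for a client viewing application.

```python
import json

# Hypothetical story-element control document (all field names invented).
control_document = {
    "content_id": "title-001",
    "target_emotional_episode": [
        {"scene": 1, "appraisal": 0.2, "arousal": 0.3},
        {"scene": 2, "appraisal": -0.4, "arousal": 0.7},
    ],
    "configurable_elements": {
        "scene_2": {"alternate_dialogue": ["line_a", "line_b"]},
    },
}

print(json.dumps(control_document, indent=2))  # serialize for delivery to the client
```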

Cloud server 1450 provides the story element control document 1470 with audio-video content 1460 selected for the destination user and platform. For example, the assets 1460 provided for a mobile platform may differ from those for a home theater, and both may differ from the assets for a cinema or virtual reality experience. As another example, the content for an adult user may differ from the content assigned to a young child. Many other reasons for different content selections may also apply. The story element control document 1470 may be tailored to the selected content and may be configured for use by a JavaScript or similar content viewing application executing on client 1480. The viewing application 1485 on the client 1480 may be configured to receive sensor input, as described above in connection with figs. 3-4. Based on the sensor input and the story element document 1470, the application 1485 may perform a process for reducing detected appraisal and/or arousal errors by comparing the sensor input to expected values for an emotional episode. The application may use a trial-and-error method, enhanced with available information about past user responses and preferences, to select alternative story elements. The selected elements are provided to an output device 1490, which produces audible output using an audio transducer and a visual display using a screen, projector, or other electro-optical device. Other outputs may include, for example, motion, haptic, or olfactory outputs. System 1400 illustrates a use case for real-time content control by a client device. Real-time control may also be implemented at the server level for content streamed to the client.

FIG. 15 illustrates a method 1500 that includes offline use cases and real-time control cases for sensor-based content rating, including content engagement capability along with specific appraisal and arousal measures or content correlation matrices. Method 1500 begins with content design 1502, which receives as one of its inputs marketing and emotional response (e.g., content rating) data for past comparable products. The input may also include a target emotional episode; in the alternative, emotional episodes may be defined during content design 1502. Once the content design 1502 defines the desired scenes and scene sequences, preparation of draft scenes can begin at 1506. A testable draft may be shown to a focus group, which experiences the draft output while biometric emotional response sensor data is generated; the data may be processed and made ready for content rating 1508, as described above. The production team reviews the content ratings and modifies the draft scenes. Once the modifications are complete, the content rating process 1508 may be applied again to the revised draft of the content title. Once the title is ready for publication at 1510, it enters one or more distribution channels. If, at 1514, the content is not responsive to emotional feedback data, the market analysis team may use the tools and techniques described herein to test and track 1516 the emotional response as another marketing indicator. The emotional response data may be added to the marketing and content ratings data 1504 for use in the design of future products. Referring back to 1514, if the content is responsive, it is provided for play and control 1518. During play, the media player receives and processes emotional response sensor data 1524 based on measurements 1522 made by one or more biometric emotional sensors as described herein. Once each playback device finishes playing, it may process the sensor feedback to develop content ratings, or it may submit the feedback data to a server (not shown). Once the content ratings are complete, they may be added to the content ratings data 1504 for future use in product design.

In view of the foregoing, and as a further example, fig. 16-18 illustrate one or more methods 1600 and related functional aspects for digitally representing user engagement with audio-video content in computer memory based on biometric sensor data. Method 1600 may be performed by an immersive mixed reality output device or a non-immersive flat screen device, a projector or other output device including a programmable computer, by one or more computers in communication with an output device, or by a combination of an output device and one or more computers in communication with the output device.

Referring to fig. 16, a computer-implemented method for digitally representing user engagement audio-video content in computer memory based on biometric sensor data may include: at 1610, digital data including audio-video content is played by an output device that outputs an audio-video output based on the digital data. The output device may be or may include a portable or non-portable flat screen device, a digital projector, or a wearable accessory for alternate or augmented reality, in each case coupled with an audio output function and optionally with other output functions (e.g., motion, haptic, or olfactory). Playing the digital data may include, for example, saving the digital data in a cache or other memory of the output device and processing the data for output by at least one processor of the output device.

The method 1600 may include: at 1620, sensor data is received by at least one computer processor from at least one sensor positioned to sense an involuntary response of one or more users while engaged in the audio-video output. The sensor data may include any one or more of the data described herein, for example, the measurement type data summarized in fig. 11.

The method 1600 may include: at 1630, determining, by an algorithm executed by the at least one computer processor, at least one digital representation of the content engagement Capability (CEP) based on the sensor data. The algorithm may include, for example, computing a ratio of the aggregate event capabilities to the aggregate expected capabilities for comparable content of the same type. The aggregate event capability may be a sum of sensor amplitudes indicating the degree of arousal for events defined by exceeding a threshold. The aggregate expected capability for comparable content may be computed by the same method as the event capability. Further details and illustrative examples of this algorithm are described above in connection with fig. 11. The method 1600 may include: at 1640, recording the at least one digital representation of the CEP in computer memory.

The method 1600 may include any one or more of the additional operations 1700 or 1800 shown in fig. 17-18 in any operable order. Each of these additional operations need not be performed in every embodiment of the method, and the presence of any of operations 1700 or 1800 does not necessarily require that any other of these additional operations be performed as well.

Referring to fig. 17, which illustrates certain additional operations or aspects 1700 for digitally representing user engagement with audio-video content in computer memory based on biometric sensor data, the method 1600 may further comprise: at 1710, at least one digital representation of the CEP is determined, at least in part, by determining an arousal value based on the sensor data and comparing the stimulated average arousal based on the sensor data to a desired average arousal. To sense arousal, the sensor data may include any one or more of: electroencephalogram (EEG) data, Galvanic Skin Response (GSR) data, facial electromyogram (fmg) data, Electrocardiogram (EKG) data, video Facial Action Unit (FAU) data, brain-computer interface (BMI) data, Video Pulse Detection (VPD) data, pupil dilation data, body chemistry sensing data, functional magnetic imaging (fMRI) data, and functional near infrared data (fNIR).

In a related aspect, method 1600 may include: at 1720, a desired average arousal is determined based on further sensor data that measures similar involuntary responses of one or more users while participating in a known audio-video stimulus (e.g., comparable content of the same type as the content). Comparable content may include the same type of content intended for a similar audience and platform and having a similar length. In a related aspect, method 1600 may include: at 1730, known audio-visual stimuli, including known non-evoked stimuli and known evoked stimuli, are played for calibration of sensor data. At 1740, determining at least one digital representation of the CEP of method 1600 may further include detecting one or more stimulation events based on the sensor data exceeding a threshold for a period of time. In this case, the method 1600 may include computing one of a plurality of event capabilities for each of one or more users and for each of the stimulation events, and aggregating the event capabilities. In one aspect, the processor may assign a weight to each of the event capabilities based on one or more source identifications of the sensor data. Additional explanation of these operational and event capabilities is described above in connection with FIG. 10.

In a related aspect, method 1600 may include: at 1750, determining the desired average arousal at least in part by detecting one or more stimulation events based on the further sensor data exceeding a threshold for a period of time, and computing one of a plurality of desired capabilities of the known audio-video stimuli for the one or more users and for each of the stimulation events. Additionally, at 1760, determining the at least one digital representation of the CEP may include computing a ratio of the aggregated event capabilities to the aggregated desired capabilities, as shown in the expressions provided above in connection with fig. 12.

Referring to fig. 18, which illustrates certain additional operations 1800, the method 1600 may further include: at 1810, determining an arousal error measure based on comparing the arousal value to a target emotional episode of the audio-video content. The target emotional episode may be or may include a set of target digital values, each uniquely associated with a different interval of a continuous time sequence, or frame sequence, of the digital audio-video content. The error may be measured as a difference of values, a ratio of values, or a combination of difference and ratio (e.g., (target − actual)/target). In a related aspect, the method 1600 may further comprise: at 1820, determining a digital representation of the appraisal based on the sensor data. The digital representation of the appraisal may include a quantitative measure indicating the amplitude or power of the sensor signal(s) corresponding to the magnitude of the detected emotion. Suitable sensor data for appraisal may include, for example, one or more of the following: electroencephalographic (EEG) data, facial electromyographic (fEMG) data, video Facial Action Unit (FAU) data, brain-computer interface (BMI) data, functional magnetic imaging (fMRI) data, body chemistry sensing data, subvocalization data, and functional near-infrared (fNIR) data. Method 1600 may include, for example, determining the digital representation of the appraisal based on the sensor data by filtering, for example by a computer processor, to remove noise and distortion, scaling, and converting to a time-correlated list of symbolic digital values expressed in binary code.

In a related aspect, the method 1600 may further comprise: at 1830, normalizing the digital representation of the appraisal based on similar values collected for known audio-video stimuli. "Similar values" refers to values collected using the same methods and processing algorithms as the digital representation of the appraisal, or converted into comparable values using those same methods. The known stimuli may include calibration stimuli and standardized stimuli as described above. The normalization operation 1830 uses normalization stimuli from similar content of the same type as the content for which the appraisal error is computed.

In another related aspect, the method 1600 may further comprise: at 1840, a ratings error measure is determined based on comparing the digital representation of the ratings to a target emotional episode of the audio-video content. The target emotional episode may include a set of target numerical representations of the degree of evaluation, each numerical representation being uniquely associated with a continuous time series or a different interval of a sequence of frames. The frame sequence is a form of a time sequence of content running at a constant frame rate.

In another aspect, the at least one digital representation of the CEP comprises a sequence of digital representations of the CEP, wherein each component of the sequence is operated on the basis of a discrete period in the audio-video content. The discrete period may be defined by a time or frame count. The method 1600 may further include: at 1850, a symbolic representation of the at least one digital representation of the CEP is output to at least one of a display screen or an audio transducer.

Emotional feedback may also be used to control or influence live entertainment. Thus, method 1600 may include: at 1850, recording digital data comprising audio-video content of a live performance by at least one actor, and outputting a representation of the CEP and/or of the appraisal or arousal error, or an equivalent measure, to a display screen or audio transducer arranged to be perceptible to the at least one actor during the live performance. For example, the display screen may include a stage monitor, and the audio transducer may be incorporated into an earpiece. Thus, the actor may receive detailed information regarding appraisal and arousal and adjust the performance to achieve a predetermined goal.

FIG. 19 is a conceptual block diagram illustrating components and related functionality of an apparatus or system 1900 for digitally representing user engagement with audio-video content in computer memory. Device or system 1900 may include additional or more detailed components for performing the functions or process operations as described herein. For example, the processor 1910 and memory 1916 may contain examples of processes described above for representing a user's participation in audio-video content in computer memory. As depicted, apparatus or system 1900 can include functional blocks that can represent functions implemented by a processor, software, or combination thereof (e.g., firmware).

As shown in fig. 19, an apparatus or system 1900 may include an electrical component 1902 for playing digital data including audio-video content by an output device that outputs an audio-video output based on the digital data. The component 1902 may be or may comprise means for said playing. The apparatus may include a processor 1910 coupled to a memory 1916 and to the output of the at least one biometric sensor 1914 that executes an algorithm based on program instructions stored in the memory. Such algorithms may include, for example, reading metadata describing the audio-video content, opening one or more files stored on a computer-readable medium or receiving audio-video data via a streaming connection, decoding the audio-video content and producing a digital video stream and a digital audio stream from the content, and directing the streams to respective video and audio processors.

The apparatus 1900 can further include an electrical component 1904 for receiving sensor data from at least one sensor positioned to sense involuntary responses of one or more users while engaged in the audio-video output. The means 1904 may be or may comprise means for said receiving. The apparatus may include a processor 1910 coupled to a memory 1916 that executes an algorithm based on program instructions stored in the memory. Such algorithms may include a series of more detailed operations in parallel with the player component 1902, such as checking one or more ports assigned to receive sensor data, decoding data received at the assigned ports, checking data quality and optionally performing an error routine if the data quality fails the test, and saving the decoded sensor data in a cache memory location defined for use by components 1904 and 1906.

The apparatus 1900 can further include an electrical component 1906 for determining at least one content engagement Capability (CEP) value based on the sensor data. The component 1906 may be or may comprise means for said determining. The apparatus may include a processor 1910 coupled to a memory 1916 that executes an algorithm based on program instructions stored in the memory. Such an algorithm may include a series of more detailed operations, such as those described in connection with fig. 12.

The apparatus 1900 can further include an electrical component 1908 for recording at least one digital representation of the CEP in a computer memory. The component 1908 may be or may include means for said recording. The apparatus may include a processor 1910 coupled to a memory 1916 that executes an algorithm based on program instructions stored in the memory. Such an algorithm may include a series of more detailed operations, for example, connecting to an application that maintains a database or other data structure for storing CEPs and other content ratings, encoding the CEPs into messages with associated data such as content titles and time periods or frame sets per an Application Program Interface (API), and sending the messages according to the API.
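The recording step might look like the following sketch, which packages a CEP with its content title and time range and posts it to a storage service. The JSON field names and the HTTP endpoint are assumptions standing in for whatever API the storing application actually exposes.

```python
import json
import urllib.request

def record_cep(cep_value, content_title, time_range, endpoint):
    """Sketch of component 1908: encode a CEP with associated data and send it per an API."""
    message = json.dumps({
        "content_title": content_title,   # which title the CEP describes (assumed field name)
        "time_range": time_range,         # e.g. [start_frame, end_frame] or seconds (assumed)
        "cep": cep_value,
    }).encode("utf-8")
    request = urllib.request.Request(
        endpoint,
        data=message,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:   # send the message per the API
        return response.status
```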

The apparatus 1900 may optionally include a processor module 1910 having at least one processor. The processor 1910 may be in operable communication with the modules 1902-1908 via a bus 1913 or similar communication coupling. In the alternative, one or more of the modules may be instantiated as functional modules in a memory of the processor. The processor 1910 can initiate and schedule the processes or functions performed by the electrical components 1902-1908.

In a related aspect, the apparatus 1900 may include a network interface module 1912 or equivalent I/O port operable to communicate with system components over a computer network. The network interface module may be or may include, for example, an Ethernet port, a serial port (e.g., a Universal Serial Bus (USB) port), a Wi-Fi interface, or a cellular telephone interface. In further related aspects, the apparatus 1900 may optionally include means for storing information, such as, for example, the memory device 1916. The computer-readable medium or memory module 1916 may be operatively coupled to the other components of the apparatus 1900 via the bus 1913 or the like. The memory module 1916 may be adapted to store computer-readable instructions and data to effect the processes and acts of the modules 1902-1908 and their subcomponents, or of the processor 1910, the method 1600, the one or more additional operations 1700 disclosed herein, or any method performed by the media player described herein. The memory module 1916 may retain instructions for performing the functions associated with the modules 1902-1908. Although shown as being external to the memory 1916, it is to be understood that the modules 1902-1908 may exist within the memory 1916 or within on-chip memory of the processor 1910.

The apparatus 1900 may include or may be connected to one or more biometric sensors 1914, and the biometric sensors 1914 may be of any suitable type. Various examples of suitable biometric sensors are described above. In an alternative embodiment, processor 1910 may include a networked microprocessor from a device operating on a computer network. Additionally, the apparatus 1900 may be connected to output devices described herein via the I/O module 1912 or other output ports.

Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the aspects disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.

As used in this application, the terms "component," "module," "system," and the like are intended to refer to a computer-related entity, either hardware, a combination of hardware and software, or software in execution. For example, a component or module may be, but is not limited to being, a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a server and the server can be a component or module. One or more components or modules may reside within a process and/or thread of execution and a component or module may be localized on one computer and/or distributed between two or more computers.

Various aspects will be presented in terms of systems that may include several components, modules, and the like. It is to be understood and appreciated that the various systems may include additional components, modules, etc., and/or may not include all of the components, modules, etc., discussed in connection with the figures. Combinations of these approaches may also be used. Various aspects disclosed herein may be performed on electrical devices, including devices that utilize touch screen display technology, heads-up user interfaces, wearable interfaces, and/or mouse-and-keyboard type interfaces. Examples of such devices include VR output devices (e.g., VR headsets), AR output devices (e.g., AR headsets), computers (desktop and mobile), televisions, digital projectors, smart phones, Personal Digital Assistants (PDAs), and other electronic devices, both wired and wireless.

In addition, the various illustrative logical blocks, modules, and circuits described in connection with the aspects disclosed herein may be implemented or performed with a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable Logic Device (PLD) or complex PLD (CPLD), discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.

The operational aspects disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, a Digital Versatile Disk (DVD), a Blu-ray™ disc, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a client device or a server. In the alternative, the processor and the storage medium may reside as discrete components in a client device or server.

Furthermore, one or more versions may be implemented as a method, apparatus, or article of manufacture using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof to control a computer to implement the disclosed aspects. Non-transitory computer-readable media may include, but are not limited to, magnetic storage devices (e.g., hard disk, floppy disk, magnetic strips, or other formats), optical disks (e.g., Compact Disk (CD), DVD, Blu-ray™, or other formats), smart cards, and flash memory devices (e.g., card, stick, or other formats). Of course, those skilled in the art will recognize many modifications may be made to this configuration without departing from the scope of the disclosed aspects.

The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

In view of the exemplary systems described above, methodologies that may be implemented in accordance with the disclosed subject matter have been described with reference to several flow diagrams. While, for purposes of simplicity of explanation, the methodology is shown and described as a series of blocks, it is to be understood and appreciated that the claimed subject matter is not limited by the order of the blocks, as some blocks may occur in different orders and/or concurrently with other blocks from what is depicted and described herein. Moreover, not all illustrated blocks may be required to implement the methodologies described herein. Additionally, it should be further appreciated that the methodologies disclosed herein are capable of being stored on an article of manufacture to facilitate transporting and transferring such methodologies to computers.
