Media annotation with product source link

Document No. 1850873 | Publication date: 2021-11-16

Reading note: This technique, Media Annotation with Product Source Link, was designed and created by Henry Scott-Green and Anjali Malik on 2019-04-08. Abstract: Techniques for annotating and source-linking objects displayed in an image are disclosed. An example method includes detecting, by a processing device, an object in an image; associating the object in the image with a source indicator; annotating, by the processing device, the image to indicate that the object is associated with the source indicator; receiving a user selection of the object in the image; and identifying a source based on the source indicator and contextual data associated with the user selection, wherein the source includes information about the object.

1. A method, comprising:

detecting, by a processing device, an object in an image;

associating the object in the image with a source indicator;

annotating, by the processing device, the image to indicate that the object is associated with the source indicator;

receiving a user selection of the object in the image; and

identifying a source based on the source indicator and contextual data associated with the user selection, wherein the source includes information about the object.

2. The method of claim 1, wherein the image comprises one or more frames of a video, and wherein the object is displayed in the one or more frames.

3. The method of claim 2, further comprising:

detecting a set of objects in the one or more frames of the video; and

selecting the object from the set of objects based on viewership data for the video.

4. The method of claim 3, wherein the viewership data indicates preferences of one or more current viewers, future viewers, or past viewers of the image.

5. The method of any preceding claim, wherein identifying the source comprises:

determining a plurality of sources associated with the object based on the source indicator;

selecting the source from the plurality of sources based on the contextual data associated with the user selection; and

providing a source identifier for presenting the source to a viewer of the image.

6. The method of any preceding claim, wherein the contextual data comprises a geographic location of a user viewing the image, a source preference of the user, or an availability of the object at a time of selection by the user.

7. The method of any preceding claim, wherein annotating the image comprises updating a presentation of the image in a user interface to emphasize the object, wherein the updating comprises at least one of: outlining, highlighting, color changing, or brightening a portion of the image.

8. The method of any preceding claim, wherein receiving the user selection of the object in the image comprises receiving an indication that a user has selected a portion of the image in a user interface that includes the object.

9. The method of any preceding claim, wherein detecting the object in the image comprises:

performing digital image processing on image data of the image; and

identifying the object in the image based on the digital image processing.

10. A system, comprising:

a memory; and

a processing device communicatively coupled to the memory, the processing device executing instructions to perform the method of any of claims 1-9.

11. A non-transitory computer-readable storage medium having instructions stored thereon, which, when executed by a processing device, cause the processing device to perform operations comprising the method of any of claims 1 to 9.

Technical Field

The present disclosure relates to image analysis, and in particular to supplementing image content with a source that provides information about objects displayed in the image content.

Background

Many computing devices include content sharing functionality that enables users to capture, view, and share media content. The media content may be a video or still image showing the characteristics of an object. The object may be the product in focus of the media content (e.g., a primary product) or may be a secondary product in the background. A viewer may see an object in a media item and be interested in obtaining more information about the object.

Disclosure of Invention

The following presents a simplified summary of various aspects of the disclosure in order to provide a basic understanding of such aspects. This summary is not an extensive overview of the disclosure. It is intended neither to identify key or critical elements of the disclosure nor to delineate any scope of particular embodiments of the disclosure or any scope of the claims. Its sole purpose is to present some concepts of the disclosure in a simplified form as a prelude to the more detailed description that is presented later.

According to a first aspect of the present disclosure, a method is provided that includes detecting an object in an image, associating the object in the image with a source indicator, annotating the image to indicate that the object is associated with the source indicator, receiving a user selection of the object in the image, and identifying a source based on the source indicator and contextual data associated with the user selection, wherein the source includes information about the object.

In another aspect of the present disclosure, the image includes one or more frames of a video, and the object is, for example, a product displayed in the one or more frames. The detection may involve detecting a set of objects (e.g., a product set) in one or more frames of the video. The processing device may further determine viewer preferences based on the viewership data for the video and select an object from the set of objects based on the viewer preferences. The viewership data may indicate preferences of one or more current viewers, future viewers, or past viewers of the image.

In yet another aspect, the method may identify a source by determining, based on the source indicator, a plurality of sources associated with the object and selecting a source from the plurality of sources based on contextual data associated with the user selection. The contextual data may include, for example, the geographic location of the user viewing the image, the user's source preferences, or the availability of the object at the time of the user's selection. The processing device may further provide a source identifier of the source to a viewer of the image.

In yet another aspect, detecting an object in an image may include performing digital image processing on image data of the image and identifying the object in the image based on the digital image processing. The detected object may then be annotated by updating the presentation of the image to emphasize the object. In one example, the update may involve at least one of outlining, highlighting, color changing, or brightening a portion of the image.

In yet another aspect, receiving a user selection of an object in an image may include receiving an indication that a user has selected a portion of an image that includes the object.

According to a second aspect, the disclosure provides a system comprising a processing device configured to detect an object in an image, associate the object in the image with a source indicator, annotate the image to indicate that the object is associated with the source indicator, receive a user selection of the object in the image, and identify a source based on the source indicator and contextual data associated with the user selection, wherein the source includes information about the object.

According to a third aspect, the disclosure provides a computer program product configured such that, when executed by a processing device, the computer program product causes the processing device to detect an object in an image, associate the object in the image with a source indicator, annotate the image to indicate that the object is associated with the source indicator, receive a user selection of the object in the image, and identify a source based on the source indicator and contextual data associated with the user selection, wherein the source includes information about the object.

Individual features and/or combinations of features defined above or below with respect to any particular embodiment of any aspect of the present disclosure may be used individually, separately, or in combination with any other defined feature in any other aspect or embodiment. Furthermore, the present disclosure is intended to cover apparatus configured to perform any feature described herein with respect to a method, and/or a method of using or producing any apparatus feature described herein.

Drawings

The present disclosure is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which:

FIG. 1 illustrates an example system architecture in accordance with an embodiment of the present disclosure.

FIG. 2 is a block diagram illustrating a computing device having an image component and a source component in accordance with an embodiment of the present disclosure.

FIG. 3 is an exemplary user interface displaying an image with annotated objects in accordance with an embodiment of the present disclosure.

FIG. 4 is a flow diagram illustrating a method for annotating an image to emphasize an object and link the object with a particular information source in accordance with an embodiment of the present disclosure.

FIG. 5 is a block diagram illustrating an exemplary computer system in accordance with an embodiment of the present disclosure.

Detailed Description

Modern computer systems typically enable content creators to manually modify media items to change image content and include details about objects (e.g., products) shown in the media items. Modification typically requires specialized image-editing software and involves the content creator manually editing the image content to add references (e.g., arrows and text labels) to the objects. Image modifications made by the creator may be permanent and may be displayed to all users, even though some users may be more interested in other objects in the image. Editing image content to add a reference can be a time-consuming process and can add content that is obtrusive to the viewing experience. The content creator may alternatively add references to objects in the description of the media item, but these references typically remain hidden unless the user views the expanded version of the description. In addition, an added reference may include a static website address for a particular source, and that source (e.g., a retailer) may stop providing information for the object, for example if the product is no longer available (e.g., a new product is released, the product is out of stock, or geographic restrictions apply).

Aspects of the present disclosure address the above and other deficiencies by providing techniques that can enhance image content to emphasize particular objects and indicate the sources of those objects. That is, aspects of the present disclosure provide a guided human-machine interaction process to assist a user in performing technical tasks. The source may provide additional information regarding the creation, use, or sale of the object. In one example, the technique may involve detecting an object in an image using an object recognition technique. The image may be a still image or a video frame, and the identified object may be linked to a source (e.g., a web server) that provides more information about the object. The techniques may annotate the image to indicate to a viewer that the object is associated with a source indicator that can be used to identify a particular source. The annotation may emphasize the object by outlining, highlighting, or any other type of modification to the image. When a viewer selects an emphasized object, the techniques may determine the best source to provide to that particular viewer based on contextual data associated with the user selection, which may include temporal data, location data, or availability data (e.g., language availability or product availability). When one or more images (e.g., videos) include multiple sets of objects, the techniques may select a subset of the identified objects based on viewership data. Viewership data may indicate past, current, or future viewer preferences and may enable the techniques to select the objects of most interest to the viewer.

Systems and methods described herein include techniques to enhance a graphical user interface so that a viewer can more effectively identify a source that provides information about a particular object displayed in an image. In particular, aspects of the technology may provide an indication of the source of the object to the viewer in a manner that is less obtrusive than adding labels and pointers to the image and more discoverable than including a static list of websites in the description or comment fields. Aspects of the technology may also automatically (without user input) annotate images and identify sources based on current or expected viewership. This may be done for a particular viewer or set of viewers, and may be performed after the content creator has shared the media item but before, during, or after a viewer has requested consumption of the media item. For example, the techniques may determine that a user has been interested in a particular type of object and may annotate that type of object in a media item (e.g., a video) after the user has requested the media item.

Various aspects of the above-referenced methods and systems are described in detail below by way of example and not limitation. The examples provided below discuss the techniques in the context of a content sharing platform that enables end users to upload and share media items. In other examples, the techniques may be applied to enhance existing broadcast mechanisms for providing media to end users. The media items discussed below include image data; however, the teachings of the present disclosure may be applied to media forms without images (e.g., audio, executable instructions, text), in which case the annotations may be provided via any form of human-perceptible signal.

FIG. 1 illustrates an example system architecture 100 in accordance with an embodiment of the present disclosure. The system architecture 100 may include a content sharing platform 110, computing devices 120A-Z, sources 130A-Z, and a network 140.

Content sharing platform 110 may include one or more computing devices (such as a rack-mounted server, router computer, server computer, personal computer, mainframe computer, laptop computer, tablet computer, desktop computer, etc.), data stores (e.g., hard disk, memory, database), networks, software components, hardware components, or combinations thereof, which may be suitable for implementing the various features described herein. In some implementations, the content sharing platform 110 can enable users to edit uploaded media items 112, which can be associated with one or more channels (e.g., channel A, channels B-Z), playlists (not shown), or separate media items. The media items 112 can include images that can be transferred (e.g., downloaded or streamed) as image data 114 to the computing devices 120A-Z.

The computing device 120A may access the image data 114 from the content sharing platform 110 and may supplement the image data 114 to annotate objects and embed links to one or more information sources. Computing device 120A may be part of the content sharing platform 110 or a separate server, and may provide annotation and source-linking services to computing devices 120B-Z acting as clients. In the example shown in FIG. 1, computing device 120A may include an image component 122 and a source component 124. Image component 122 may be used to analyze image data 114 and identify objects represented by image data 114. Image data 114 may include data for one or more frames of a still image or video, and may be enhanced to annotate an object. Source component 124 can enable computing device 120A to supplement image data 114 with a source indicator corresponding to one of the identified objects. The source indicator may enable the computing devices 120B-Z to determine a source that provides information about the object. Many sources may be available, and the source component 124 may enable the computing device 120A to provide the source that is optimal for a particular viewer, as will be discussed in more detail below.

Sources 130A-Z may be devices that store information about at least one of the identified objects. The devices may include one or more computing devices, storage devices, other devices, or a combination thereof. A source may be accessed remotely by one of the computing devices 120A-Z via an external network (e.g., the internet), or may be accessed locally by one of the computing devices 120A-Z via an internal network (e.g., a local area network (LAN), an enterprise bus). The sources 130A-Z may be operated by the same entity that operates the computing device 120A (e.g., a content sharing entity) or may be operated by different entities (e.g., third parties). The different entities may participate in the production, distribution, design, marketing, manufacturing, maintenance, support, or sale of an object. In one example, one of the sources 130A-Z may be a web server operated by an entity that provides the object and may contain object information 132.

The object information 132 may be data describing aspects of the object. The object may be a tangible or intangible product and the object information 132 may include information about the object that may be presented to the user. The information about the object may provide details about the object or related objects, and may include descriptive information (e.g., product summary, technical specifications, model, version), availability information (e.g., release date, retailer, inventory, similar products), location information (e.g., region/country to which the object is available or may be shipped), price information (e.g., purchase cost, subscription cost, advertiser bid), other information, or combinations thereof.
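By way of a non-limiting editorial illustration, the object information 132 might be modeled as a simple record whose fields mirror the categories above. This is a minimal sketch in Python (the language used for all sketches in this description); the field names and values are hypothetical and not part of the disclosure.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ObjectInfo:
    # Descriptive information
    summary: str
    model: Optional[str] = None
    # Availability information
    release_date: Optional[str] = None
    retailers: List[str] = field(default_factory=list)
    # Location information
    regions: List[str] = field(default_factory=list)
    # Price information
    price: Optional[float] = None

info = ObjectInfo(summary="16-ounce glass bottle soda", model="Coke Zero",
                  release_date="2019-04-08", retailers=["retailer-a"],
                  regions=["US", "CA"], price=1.99)
```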

Computing devices 120B-Z may include one or more computing devices that act as clients and may consume services provided by computing device 120A, content sharing platform 110, or a combination thereof. Computing devices 120B-Z may be referred to as "client devices" or "user devices" and may include Personal Computers (PCs), laptops, smartphones, tablets, netbooks, and the like. Computing devices 120B-Z may each be associated with a personal user (e.g., viewer, owner, operator) that may use the computing device to access image data 114. Computing devices 120B-Z may each be owned and utilized by different users in different geographic locations.

Computing devices 120B-Z may include media viewers 126B-Z that provide a user interface for viewers to consume and select portions of image data 114. A media viewer may be any program that enables a computing device to present an image to a user and enables the user to select a region of interest within the image. The images may be displayed as part of one or more videos, web pages, documents, books, other media, or combinations thereof. The media viewer may be integrated with one or more other programs and may access, retrieve, render, and/or navigate content (e.g., web pages such as Hypertext Markup Language (HTML) pages, digital media items, etc.). The media viewer may render, display, and/or present the content to the viewing user. In one example, the media viewer may be embedded within an internet browser, and the image may be embedded in a web page (e.g., a web page that provides information about products sold by an online merchant). In another example, the media viewer (e.g., media viewer 126B) may be a standalone application (e.g., a mobile application) that allows a user to view media items (e.g., digital videos, digital photos, electronic books, etc.).

Network 140 may include a public network (e.g., the internet), a private network (e.g., a Local Area Network (LAN) or a Wide Area Network (WAN)), a wired network (e.g., ethernet), a wireless network (e.g., an 802.11 network or a Wi-Fi network), a cellular network (e.g., a Long Term Evolution (LTE) network), a router, a hub, a switch, a server computer, and/or combinations thereof.

FIG. 2 depicts a block diagram illustrating an exemplary computing device 120 that includes techniques for supplementing an image with source indicators for one or more objects in the image. Computing device 120 may be the same as one or more of computing devices 120A-Z of FIG. 1. The computing device may include more or fewer components or modules without loss of generality. For example, two or more components may be combined into a single component, or features of a component or module may be divided into two or more components. In one implementation, one or more components may reside on different computing devices (e.g., a server device and a client device).

In general, functions described as being performed by computing device 120 in one embodiment may be performed by multiple different computing devices 120A-Z in other embodiments. For example, computing device 120 may execute a program for one or more functions of image component 122, and a different device may execute one or more functions of source component 124. Functionality attributed to a particular component may be performed by different or multiple components operating together. In the example shown in FIG. 2, computing device 120 may include an image component 122 and a source component 124.

The image component 122 may enable the computing device 120 to analyze the image and identify one or more objects represented in the image. The image may be an artifact depicting a visual perception of the object and may be the same as or similar to a still image (photograph, picture, drawing, rendering), one or more frames of a video (e.g., a motion picture), other images, or a combination thereof. The image may be captured by a camera device and may be part of a media item (e.g., web page, video, executable file). The image may be shared or transmitted to computing device 120 via content sharing platform 110, a digital storage device, another digital transmission mechanism, or a combination thereof. Computing device 120 may receive multiple images and store them as image data 114 in data store 230.

Image data 114 may include image details and image content. Image details may include information about the image, such as title, description, comments, storage location, filename, author, source, file size, duration, format, resolution, image size, edit or creation time, other details, or a combination thereof. The image content of the image may include pixel data representing pixel values or changes to pixel values and may be used to render the image. Both the image content and the image details may be indicative of an object depicted in the image.

An object in the image may be anything that can be perceived by humans and may include tangible or intangible products, goods, services, other deliverables, or combinations thereof. Tangible objects can be touched by humans and may include physical products, goods, merchandise, or other objects. Intangible objects may be directly or indirectly perceived by humans without being touched, and may include music, computer programs, services, other intangible items, or a combination thereof.

In one example, the image component 122 may include an object detection module 210, a viewer preference module 212, an image annotation module 214, and a user selection module 216. The object detection module 210 may analyze the image data to detect objects in the image. The object detection module 210 may use image details (e.g., title, description, comments), image content (e.g., pixel values), or a combination thereof to determine objects in the image. When using the image content, the object detection module 210 may perform digital image processing on the image content of the image. Digital image processing may involve segmenting the image content into one or more segments and applying one or more object recognition techniques (e.g., an object classifier) to detect objects within the one or more segments of the image. Object detection may be performed on individual images (e.g., a particular frame) or on a series of images, and may or may not account for object motion across frames. The object detection module 210 may or may not use input from a user to identify the object. The user may provide input before an object is identified (e.g., the creator indicates the region containing the object) or after objects are identified (e.g., selecting a subset of the objects from a list).
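The following is a minimal sketch of the segment-and-classify flow described above, assuming a segmentation step has already produced candidate regions and using a stub in place of a trained object classifier; the names and the confidence threshold are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) pixel coordinates of a segment

@dataclass
class DetectedObject:
    label: str
    box: Box
    confidence: float

def detect_objects(
    segments: List[Box],
    classify: Callable[[Box], Tuple[str, float]],
    threshold: float = 0.5,
) -> List[DetectedObject]:
    """Apply an object classifier to each candidate segment and keep confident hits."""
    detections = []
    for box in segments:
        label, confidence = classify(box)  # a trained model would run here
        if confidence >= threshold:
            detections.append(DetectedObject(label, box, confidence))
    return detections

# Stub classifier standing in for a real object recognition model:
stub = lambda box: ("beverage container", 0.8)
print(detect_objects([(40, 60, 120, 200)], stub))
```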

The viewer preference module 212 may determine the preferences of one or more viewers based on the viewership associated with the image. Viewership may be based on past, current, or future viewers of the image or of other similar images. A past viewer may have viewed the image and may no longer be viewing it, whereas a current viewer may have begun viewing the image and may still be viewing it. For example, when the image is a video, the user may have started watching the video and may still be watching it. A future viewer is a viewer who has not yet seen the image but may see it in the future. A future viewer may also be referred to as an expected viewer or a potential viewer. Future viewers may be determined based on the historical behavior of a viewer or the behavior of one or more similar viewers. For example, a future viewer may have consumed or subscribed to the content of a channel or playlist and may not yet have received an image but is expected to receive and consume the image in the future. Viewership of images may be stored in data store 230 as viewership data 232.

Viewership data 232 may include data regarding the viewers and data regarding the image content being viewed. The data may include one or more measurements for a particular viewer (e.g., a current viewer) or for multiple viewers (e.g., an audience). In one example, viewership data 232 may include characteristics of a group of viewers, consumption data, other data, or a combination thereof. The characteristics of the group of viewers may provide details about the group and may include, for example, the locations, languages, and/or other similar information of the viewers. The consumption data may be viewer-specific or image-content-specific and may include duration of consumption, number of viewers, drop-off rate, portions re-watched, portions paused or zoomed in on, other measurements, or combinations thereof.

The viewer preference module 212 may determine the preferences of one or more viewers based on the viewership data 232, the image data 114, object data, other data, or a combination thereof. Preferences may indicate what types of objects are of interest to one or more viewers and may be used to determine which objects to annotate and provide with source indicators. In one example, characteristics of the viewer or audience (e.g., location, language, etc.) and historical media consumption may be used to identify preferences for particular objects or object types. In another example, consumption of a video by a particular user may indicate a preference. For example, if a user re-watches a portion of the video showing an aspect of a displayed object or zooms in to better view the object, the viewer preference module 212 may determine that the viewer or similar viewers are interested in the object. This interest may be quantified and weighted and used to select which identified objects should be annotated by the image annotation module 214 in a subsequent portion of the video or in another video that the viewer may consume.
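A minimal sketch of how consumption measurements drawn from viewership data 232 might be weighted into a preference score for one object type; the signal names and weights are hypothetical.

```python
def preference_score(consumption: dict, weights: dict) -> float:
    """Combine weighted consumption measurements into one preference score."""
    return sum(weights.get(signal, 0.0) * value
               for signal, value in consumption.items())

# Illustrative signals for one object type (e.g., "smartphone"):
consumption = {"rewatches": 3, "zoom_ins": 1, "pauses": 2}
weights = {"rewatches": 1.0, "zoom_ins": 2.0, "pauses": 0.5}
print(preference_score(consumption, weights))  # 6.0
```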

The image annotation module 214 may enable the computing device 120 to annotate the image to emphasize one or more detected objects. The image annotation may indicate to the viewer that the object is associated with additional information that can be accessed via the graphical user interface. Annotating an image may involve updating the presentation of the image by changing the image content of the object, the image content surrounding the object, or a combination thereof. The image annotation module 214 can add content to or remove content from the image to emphasize the object. In one example, annotating the image can involve outlining, highlighting, darkening, color changing, zooming in or out, cropping, or a combination thereof. The area occupied by an annotation may depend on characteristics of the respective computing devices 120B-Z, such as screen size, or on characteristics of the software used to view the images, such as window size. The annotated portion may include screen coordinates that correspond to the object in the image. In the example of a video, as the object moves during video playback, the annotated portion may move with the object.
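As one possible rendering of such annotations, the sketch below outlines or highlights a rectangular region using the Pillow imaging library, leaving the rest of the frame unchanged; the colors, styles, and coordinates are illustrative only.

```python
from PIL import Image, ImageDraw

def annotate(image: Image.Image, box, style: str = "outline") -> Image.Image:
    """Emphasize the region `box` while leaving the rest of the frame unchanged."""
    annotated = image.copy()                  # do not alter the source frame
    draw = ImageDraw.Draw(annotated, "RGBA")  # RGBA mode enables translucent fills
    if style == "outline":
        draw.rectangle(box, outline=(255, 215, 0, 255), width=4)
    elif style == "highlight":
        draw.rectangle(box, fill=(255, 215, 0, 64))
    return annotated

frame = Image.new("RGB", (640, 360), "gray")  # stand-in for a decoded video frame
emphasized = annotate(frame, (40, 60, 120, 200), style="outline")
```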

The image annotation module 214 may utilize the data of the object detection module 210 and the viewer preference module 212 to select which objects are emphasized in the image. As discussed above, an image (e.g., a still image or video) may include many objects, and some of these objects may be primary objects that are the focus of the image (e.g., a product under review), while other objects may be secondary objects in the background or foreground (e.g., other products in the video). When a set of multiple objects is identified, the image annotation module 214 can choose to annotate all of the objects in the set or only a subset (i.e., not all of the identified objects). The latter may be advantageous because annotating all objects may be obtrusive or distracting and may adversely affect the viewing experience. Furthermore, annotating all objects may increase the processing burden. Thus, there may be a tradeoff between annotating enough objects to provide an improved user interface and reducing the processing burden associated with the annotations. Determining which objects to annotate may be based on the preferences discussed above and may involve analyzing data for preferences, viewership, images, objects, other data, or a combination thereof. Additionally or alternatively, determining which objects to annotate may be based on processing power. For example, the image annotation module 214 can determine the processing capabilities of the respective computing devices 120B-Z and selectively annotate images based on those capabilities. In this way, computing devices with greater processing power may receive more annotations than computing devices with less processing power. Any other characteristic of the computing device may be used, such as the screen size or resolution, or the window size of the media player displaying the image. For example, a larger window can accommodate more annotations than a smaller one. Some or all of the data (e.g., measurements) may be weighted and used to generate a score (e.g., an object preference score), and the score of an object may be compared to a threshold. One or more objects whose scores meet the threshold (above or below it, depending on the convention) may be chosen for annotation and source linking.
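A minimal sketch of the score-and-threshold selection just described, capping the number of annotations to reflect device capability; the scores, threshold, and cap are illustrative.

```python
def select_for_annotation(scores: dict, threshold: float, max_annotations: int) -> list:
    """Keep objects whose preference score meets the threshold, then cap the
    count to reflect device capability (screen size, processing power)."""
    eligible = [obj for obj, score in scores.items() if score >= threshold]
    eligible.sort(key=scores.get, reverse=True)  # most interesting first
    return eligible[:max_annotations]

scores = {"smartphone": 6.0, "beverage": 1.5, "lamp": 0.2}
print(select_for_annotation(scores, threshold=1.0, max_annotations=2))
# ['smartphone', 'beverage']
```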

Objects may be selected and emphasized at any time before the image is presented to the viewer. For example, the image may be annotated before, during, or after being captured by the camera, edited by an author, provided to a content distributor (e.g., content sharing platform, advertiser, broadcaster), requested by a viewer device, transmitted to a viewer device, loaded by a media viewer, rendered by a media viewer, or displayed by a media viewer, at another time, or a combination thereof. In one example, an image may be annotated by modifying the image content (e.g., pixel values) of the original image. In another example, an image may be annotated by applying one or more layers to the original image without modifying the original image. The one or more layers may correspond to one or more annotated objects (e.g., one-to-one or one-to-many), and some or all of these layers may be sent to the viewer's computing device. In either example, the server or client may choose which annotations to provide based on the hardware and/or software used by the current viewer, the location of the current viewer, and/or the objects of most interest to the current viewer, which may be a subset of the identified objects.
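For the layer-based variant, a short sketch using Pillow: the annotation is drawn on a transparent layer and composited only for display, so the original frame is never modified.

```python
from PIL import Image, ImageDraw

frame = Image.new("RGBA", (640, 360), (128, 128, 128, 255))  # original frame, untouched
layer = Image.new("RGBA", frame.size, (0, 0, 0, 0))          # transparent annotation layer
ImageDraw.Draw(layer).rectangle((40, 60, 120, 200),
                                outline=(255, 215, 0, 255), width=4)
shown = Image.alpha_composite(frame, layer)                  # composited only for display
```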

The user selection module 216 may enable the computing device 120 to detect user selection of one or more emphasized objects in the annotated image. The user selection may be provided by a user (e.g., a viewer) and received by the computing device 120 in the form of user input. The user input may correspond to a region of the image and may include gestures (e.g., touch or non-touch gestures), mouse input, keyboard input, eye tracking, device movement (e.g., shaking), other user input, or a combination thereof. In response to the user input, the user selection module 216 may determine the object the user is selecting and store context data 234. In an embodiment, the user selection corresponds to the user selecting a region that includes an annotated portion. A user who wants more information about an object that has been annotated may select the object by, for example, clicking a mouse or touching the screen at the location of the annotated object. In the example of a video, as described above, the annotated portion may move with the object as the object moves relative to the screen during video playback. Accordingly, embodiments of the disclosed subject matter provide an improved user interface that assists a user in performing technical tasks.
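A minimal hit-test sketch mapping a click or touch position to the annotated portion that contains it; the coordinates are screen/image pixels and the data is illustrative.

```python
def hit_test(click, annotations):
    """Return the annotated object whose screen region contains the click, if any."""
    x, y = click
    for obj, (x0, y0, x1, y1) in annotations.items():
        if x0 <= x <= x1 and y0 <= y <= y1:
            return obj
    return None  # the selection missed every annotated portion

annotations = {"smartphone": (40, 60, 120, 200)}  # object -> screen coordinates
print(hit_test((75, 130), annotations))           # 'smartphone'
```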

Context data 234 may indicate the context of the user selection and may be based on data captured before, during, or after the user selection. The context data 234 may correspond to the geographic location of the user, the availability of the object at the time of the user's selection, the user's source preferences, other characteristics, or a combination thereof. The source component 124 can use the context data 234 to determine a particular source for the viewer.

Source component 124 can enable computing device 120 to supplement an image with a source that can provide additional information about an object in the image. The source data 236 may be used to identify a source before, during, or after a user selects an object. The source data 236 may include a source indicator, a source identifier, other data, or a combination thereof. In the example shown in FIG. 2, source component 124 can include an indicator module 220, a source parsing module 222, and a providing module 224.

The indicator module 220 may associate the identified object with the source indicator. In one example, the indicator module 220 may associate an object in the image with a source indicator by linking the source indicator with a particular annotation of the image (e.g., object summary X corresponds to source indicator Y). The data of the source indicator may then be embedded within the image, included in a media item containing the image, or transmitted by a service providing the image or media item. The data of the source indicator may be hidden from the viewer or visible to the viewer (e.g., a URL in a description or comment).

The source indicator may include data that can subsequently be used to identify the source. The source indicator may be a generic source indicator that indicates that a source exists but does not specify a particular source. The source indicator may include object identification data that corresponds to a particular object in the image. The object identification data may be a link, a unique identifier, a symbol, or an encoding corresponding to the particular object in the image, and may include digital or non-digital data. The object identification data may identify the object at any level of specificity; for example, it may indicate a category (e.g., phone, beverage, car), a type (e.g., smartphone, soda bottle, cereal box, car), a brand (e.g., Coca-Cola, General Mills), a model (e.g., iPhone X, Coke Zero, Cheerios, X7), a product line (e.g., X Plus, 16-ounce glass bottles, Honey Nut, sport packages), another level of specificity, or combinations thereof. The object identification data may be human-readable or machine-readable and may be based on Uniform Resource Locators (URLs), Universal Product Codes (UPCs), Stock Keeping Units (SKUs), bar codes, Quick Response (QR) codes, Global Trade Item Numbers (GTINs), International Article Numbers (EANs), Vehicle Identification Numbers (VINs), International Standard Book Numbers (ISBNs), other data, or combinations thereof. In one example, the source indicator may identify a particular product but not a particular source (e.g., a particular retailer) for the product. Before, during, or after the user selection has been detected, a particular source may be determined from the data of the source indicator by the source parsing module 222.
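By way of illustration, a generic source indicator might be represented as follows; the field names and resolver URL are hypothetical, and the GTIN value is a placeholder.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SourceIndicator:
    """Generic source indicator: identifies the object, not a particular source."""
    object_id: str                  # e.g. a GTIN, UPC, SKU, or URL for the product
    id_scheme: str                  # "gtin", "upc", "sku", "url", ...
    resolver: Optional[str] = None  # service that lists candidate sources

indicator = SourceIndicator(object_id="00012345678905", id_scheme="gtin",
                            resolver="https://resolver.example/sources")
```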

The source parsing module 222 may analyze the source indicator and context data associated with the user selection to identify a particular source. As discussed above, multiple sources may have information about the selected object, and the source parsing module 222 may select one of the sources to provide to the viewer. Parsing the source indicator may involve determining a set of candidate sources and selecting a subset (e.g., one or more) of the sources to provide to the user. In one example, the source indicator may include a link to an internal or external service (e.g., a source aggregator or marketplace) that provides the candidate sources. The source parsing module 222 may use the link, the object identification data, and the context data to identify the set of candidate sources and select one of them. In one example, the set of candidate sources may include multiple retailers that provide information about the object and enable viewers to purchase it. The source parsing module 222 may then use the context data 234 to select the source that best suits the viewer. This may involve selecting one of the sources based on one or more weighted or unweighted factors, such as price, inventory, delivery date, location, return policy, retailer preferences, other information, or combinations thereof.
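A minimal sketch of selecting the best-suited source from a candidate set using weighted factors and context data; the factors, weights, and candidate records are illustrative only.

```python
def pick_source(candidates, context, weights):
    """Score each candidate source on weighted factors and return the best fit."""
    def score(src):
        s = weights["price"] * -src["price"]  # cheaper is better
        s += weights["in_stock"] * (1.0 if src["in_stock"] else 0.0)
        s += weights["ships_to"] * (1.0 if context["region"] in src["ships_to"] else 0.0)
        return s
    return max(candidates, key=score)

candidates = [
    {"name": "retailer-a", "price": 19.0, "in_stock": True,  "ships_to": {"US", "CA"}},
    {"name": "retailer-b", "price": 17.0, "in_stock": False, "ships_to": {"US"}},
]
context = {"region": "US"}                                # from context data 234
weights = {"price": 0.1, "in_stock": 2.0, "ships_to": 3.0}
print(pick_source(candidates, context, weights)["name"])  # retailer-a
```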

The providing module 224 may enable the computing device 120 to provide the source to the viewer's computing device. This may involve one or more levels of abstraction or redirection. In one example, a web server may be accessed by the viewer's computing device via the generic source indicator and may return a source identifier for the particular source. The viewer's computing device may then access the source using the source identifier (e.g., a URL) to obtain and present the object information. In another example, the providing module 224 may use the source identifier to retrieve the object information from the source and may transmit the object information to the viewer's computing device (e.g., a viewer device that does not access the source directly). In either example, the object information may be accessed by the viewer's computing device and presented to the viewer.
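A sketch of the redirection level described above, assuming a hypothetical resolver endpoint that maps an object identifier to the selected source's URL and answers with an HTTP 302 redirect; the mapping, path layout, and addresses are illustrative.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical mapping from object identifier to the selected source's URL.
SOURCES = {"00012345678905": "https://retailer-a.example/product/123"}

class SourceRedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        object_id = self.path.rstrip("/").split("/")[-1]  # e.g. /source/<object_id>
        url = SOURCES.get(object_id)
        if url:
            self.send_response(302)            # redirect the viewer's device
            self.send_header("Location", url)
            self.end_headers()
        else:
            self.send_error(404, "no source known for this object")

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), SourceRedirectHandler).serve_forever()
```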

Data store 230 may include memory (e.g., random access memory), drives (e.g., hard drives, solid state drives), database systems, caching mechanisms, or other types of components or devices capable of storing data. The data store 230 may also include multiple storage components (e.g., multiple drives or multiple databases) that may span multiple computing devices (e.g., multiple server computers). In some implementations, the data store 230 can be cloud-based. One or more components may utilize data store 230 to store public and private data, and data store 230 may be configured to provide secure storage for private data.

Where the systems discussed herein collect or may make use of personal information about a user (e.g., a viewer), the user may be provided with an opportunity to control whether programs or features collect user information (e.g., information about the user's social network, social actions or activities, profession, preferences, or current location), or to control whether and/or how to receive content from a content server that may be more relevant to the user. In addition, certain data may be processed in one or more ways before it is stored or used, so that personally identifiable information is removed. For example, a user's identity may be processed so that no personally identifiable information can be determined for the user, or a user's geographic location may be generalized where location information is obtained (such as to a city, ZIP code, or state level) so that the user's particular location cannot be determined. Thus, the user may have control over how information is collected about the user and used by the content server.

FIG. 3 depicts an exemplary user interface 300 illustrating how the techniques may present image annotations to a viewer. The user interface may display an image 310 with one or more image annotations 314A-C. The image annotations 314A-C may emphasize one or more objects within the image 310, such as object 312A (e.g., a smartphone) and object 312B (e.g., a beverage container). The image annotations 314A-C may be contained within the original image 310 or may be one or more layers displayed over the image 310. The image annotations 314A-C may correspond to particular objects 312A-B and may include outlining an object (e.g., image annotation 314A), surrounding an object (e.g., image annotation 314B), filling an object (e.g., image annotation 314C), other annotations emphasizing an object, or combinations thereof.

FIG. 4 depicts a flow diagram of an example method 400 for annotating and source-linking one or more objects in an image in accordance with one or more aspects of the present disclosure. The method 400 and each of its individual functions, routines, subroutines, or operations may be performed by one or more processors of a computer device executing the method. In some implementations, the method 400 may be performed by a single computing device. Alternatively, the method 400 may be performed by two or more computing devices, each performing one or more individual functions, routines, subroutines, or operations of the method.

For simplicity of explanation, the methods of the present disclosure are depicted and described as a series of acts. However, acts in accordance with the present disclosure may occur in various orders and/or concurrently, and with other acts not presented and described herein. Moreover, not all illustrated acts may be required to implement a method in accordance with the disclosed subject matter. Further, those skilled in the art will understand and appreciate that a method could alternatively be represented as a series of interrelated states or events, such as via a state diagram. Additionally, it should be appreciated that the methods disclosed herein are capable of being stored on an article of manufacture to facilitate transporting and transferring such methods to computing devices. The term "article of manufacture," as used herein, is intended to encompass a computer program accessible from any computer-readable device or storage medium. In one embodiment, the method 400 may be performed by the image component 122 and the source component 124 of FIGS. 1 and 2.

The method 400 may be performed by a processing device of a server device or a client device and may begin at block 402. At block 402, a processing device may detect an object in an image. The object may be a product and may be displayed within multiple images (e.g., frames) of a user-generated video (e.g., a product review). In one example, detecting the object may involve performing digital image processing on the image data to identify the object in the image. In another example, detecting the object may involve receiving user input (e.g., a content creator gesture) identifying one or more locations or regions in the image that correspond to the object. The image may be a portion of a media item (e.g., a video, a web page, a mobile application, an electronic book) and may be a still image or one or more frames of a video.

In one example, a frame of a video may include multiple objects, and the processing device may annotate the objects of most interest to the viewer. This may involve the processing device detecting the set of objects and determining viewer preferences based on the viewership data of the video. The processing device may select one or more objects of likely interest based on the viewership data. The viewership data may indicate preferences of one or more current viewers, future viewers, or past viewers of the image or of related images (e.g., different images/videos from the same source).

At block 404, the processing device may associate an object in the image with a source indicator. The source indicator may be a generic source indicator that indicates that a source exists but does not identify a particular source. The source indicator may include data that can be used by the server device or the client device to identify a particular source.

At block 406, the processing device may annotate the image to indicate that the object is associated with the source indicator. Annotating the image may involve updating a presentation of the image in a user interface to emphasize one or more objects chosen from the set. The updating may involve at least one of outlining, highlighting, color changing, or brightening a portion of the image.

At block 408, the processing device may receive a user selection of an object in the image via the user interface. The user selection may be based on a user input (e.g., a gesture) identifying one of the objects. For example, the user may click or touch an object in the image. In response to the user selection, the processing device may capture contextual data including a geographic location of the user viewing the image, source preferences of the user, availability of the object at the time of the user selection, other data, or a combination thereof.

At block 410, the processing device may identify a source based on the source indicator and context data associated with the user selection. The source may include information about the object and may provide the information to the computing device for presentation to a viewer of the image in a user interface. In one example, identifying the source may involve determining a set of sources associated with the object based on the source indicator. The processing device may further select one or more sources from the set based on context data associated with the user selection. The processing device may also provide source identifiers of the selected one or more sources for presentation to a viewer of the image in a user interface. In response to completing the operations described above with reference to block 410, the method may terminate.

FIG. 5 illustrates a diagrammatic representation of a machine in the exemplary form of a computer system 500 within which a set of instructions, for causing the machine to perform any one or more of the methodologies discussed herein, may be executed. In alternative embodiments, the machine may be connected (e.g., networked) to other machines in a LAN, an intranet, an extranet, or the internet. The machine may operate in the capacity of a server or a client machine in a client-server network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. The machine may be a personal computer (PC), a tablet PC, a set-top box (STB), a personal digital assistant (PDA), a cellular telephone, a web appliance, a server, a network router, switch or bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Additionally, while only a single machine is illustrated, the term "machine" shall also be taken to include any collection of machines that individually or jointly execute a set of instructions to perform any one or more of the methodologies discussed herein. Some or all of the components of computer system 500 may be used by or illustrative of one or more of the computing devices 120A-Z.

The exemplary computer system 500 includes a processing device (processor) 502, a main memory 504 (e.g., read-only memory (ROM), flash memory, dynamic random access memory (DRAM) such as synchronous DRAM (SDRAM) or Rambus DRAM (RDRAM), etc.), a static memory 506 (e.g., flash memory, static random access memory (SRAM), etc.), and a data storage device 518, which communicate with each other via a bus 508.

Processor 502 represents one or more general-purpose processing devices such as a microprocessor, central processing unit, or the like. More particularly, the processor 502 may be a Complex Instruction Set Computing (CISC) microprocessor, Reduced Instruction Set Computing (RISC) microprocessor, Very Long Instruction Word (VLIW) microprocessor, or a processor implementing other instruction sets or processors implementing a combination of instruction sets. The processor 502 may also be one or more special-purpose processing devices such as an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), network processor, or the like. The processor 502 is configured to execute instructions 526 for performing the operations and steps discussed herein.

The computer system 500 may also include a network interface device 522. Computer system 500 may also include a video display unit 510 (e.g., a Liquid Crystal Display (LCD), Cathode Ray Tube (CRT), or touch screen), an alphanumeric input device 512 (e.g., a keyboard), a cursor control device 514 (e.g., a mouse), and a signal generation device 520 (e.g., a speaker).

The data storage 518 may include a computer-readable storage medium 524 on which is stored one or more sets of instructions 526 (e.g., software) embodying any one or more of the methodologies or functions described herein. The instructions 526 may also reside, completely or at least partially, within the main memory 504 and/or within the processor 502 during execution thereof by the computer system 500, the main memory 504 and the processor 502 also constituting computer-readable storage media. The instructions 526 may further be transmitted or received over a network 574 (e.g., the network 140) via the network interface device 522.

In one embodiment, the instructions 526 include instructions for one or more source components 124, which may correspond to the identically named components described with respect to FIGS. 1 and 2. While the computer-readable storage medium 524 is shown in an example embodiment to be a single medium, the terms "computer-readable storage medium" or "machine-readable storage medium" should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The terms "computer-readable storage medium" or "machine-readable storage medium" shall also be taken to include any medium that is capable of storing or encoding a set of instructions for execution by the machine and that causes the machine to perform any one or more of the methodologies of the present disclosure. The term "computer-readable storage medium" shall accordingly be taken to include, but not be limited to, solid-state memories, optical media, and magnetic media.

In the preceding description, numerous details have been set forth. However, it will be apparent to one having ordinary skill in the art having had the benefit of the present disclosure that the present disclosure may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the present disclosure.

Some portions of the detailed descriptions have been presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.

It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussions, it is appreciated that throughout the specification discussions utilizing terms such as "receiving," "transmitting," "generating," "causing," "adding," "subtracting," "inserting," "including," "removing," "extracting," "analyzing," "determining," "enabling," "identifying," "modifying," or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (e.g., electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.

The present disclosure also relates to an apparatus, device, or system for performing the operations herein. This apparatus, device, or system may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer- or machine-readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, compact disc read-only memories (CD-ROMs), and magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions.

The word "example" or "exemplary" is used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as "exemplary" or "exemplary" is not necessarily to be construed as preferred or advantageous over other aspects or designs. Rather, use of the word "example" or "exemplary" is intended to present concepts in a concrete fashion. As used in this application, the term "or" is intended to mean an inclusive "or" rather than an exclusive "or". That is, unless specified otherwise, or clear from context, "X includes a or B" is intended to mean any of the natural inclusive permutations. That is, if X comprises A; x comprises B; or X includes both A and B, then "X includes A or B" satisfies any of the foregoing. In addition, the articles "a" and "an" as used in this application and the appended claims should generally be construed to mean "one or more" unless specified otherwise or clear from context to be directed to a singular form. Reference throughout this specification to "an embodiment" or "one embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. Thus, the appearances of the phrase "an embodiment" or "one embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, it should be noted that the use of "A-Z" symbols with reference to certain elements of the drawings is not intended to limit a particular number of elements. Thus, "a-Z" is to be interpreted as the presence of one or more elements in a particular embodiment.

It is to be understood that the above description is intended to be illustrative, and not restrictive. Many other embodiments will be apparent to those of skill in the art upon reading and understanding the above description. The scope of the disclosure can, therefore, be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.
