Responding to machine learning requests from multiple clients

Document No.: 1821552    Publication date: 2021-11-09

Note: This technology, "Responding to machine learning requests from multiple clients", was created by D·墨菲, T·d·S·保拉, W·T·斯泰勒, J·E·卡里安, A·S·小达西尔瓦, and J·C·瓦 on 2019-03-14. Its main content is summarized as follows: A method includes receiving, with a computing device, a first client request from a first client identifying a machine learning model and a sensor. The method includes sending, with the computing device, a call to a server to apply the identified machine learning model to a data set from the identified sensor in response to the first client request. The method includes receiving, with the computing device, a second client request from a second client identifying the same machine learning model and sensor as the first client request. The method includes sending, with the computing device, response data from the identified machine learning model to both the first client and the second client without sending additional calls to the server in response to the second client request.

1. A method, comprising:

receiving, with a computing device, a first client request from a first client identifying a machine learning model and a sensor;

in response to the first client request, sending, with the computing device, a call to a model server to apply the identified machine learning model to a dataset from the identified sensor;

receiving, with the computing device, a second client request from a second client identifying the same machine learning model and sensor as the first client request; and

sending, with the computing device, response data from the identified machine learning model to both the first client and the second client without sending additional calls to the model server in response to the second client request.

2. The method of claim 1, and further comprising:

receiving, with the computing device, the data set from the identified sensor; and

performing, with the computing device, pre-processing on the data set to generate pre-processed data.

3. The method of claim 2, and further comprising:

sending, with the computing device, the pre-processed data to the model server to apply the identified machine learning model to the pre-processed data.

4. The method of claim 2, wherein the pre-processing of the data set is performed by a pre-processing pipeline in the computing device, and wherein the pre-processing pipeline comprises a plurality of processing units linked together in a manner defined by a configuration file.

5. The method of claim 4, wherein each processing unit receives an input vector, processes the input vector, and outputs an output vector.

6. The method of claim 4, wherein a first one of the processing units at the beginning of the pre-processing pipeline receives the data set from the identified sensor and a last one of the processing units at the end of the pre-processing pipeline is in communication with the model server.

7. The method of claim 6, wherein the last one of the processing units sends the pre-processed data to the model server and receives the response data.

8. The method of claim 1, wherein the computing device, first client, and second client are all part of the same local network.

9. The method of claim 8, wherein the model server is part of the same local network.

10. The method of claim 8, wherein the model server is not part of the same local network.

11. The method of claim 1, wherein the identified sensor is a camera sensor and the data set from the identified sensor is a video stream.

12. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to:

receive, from a first client, a first machine learning client request identifying a machine learning model and a sensor;

in response to the first machine learning client request, cause a machine learning model server to apply the identified machine learning model to a sensor data set from the identified sensor, and receive response inference data;

receive, from at least one additional client, at least one additional machine learning client request identifying the same machine learning model and sensor as the first machine learning client request; and

send the response inference data received in response to the first machine learning client request to both the first client and the at least one additional client.

13. The non-transitory computer-readable storage medium of claim 12, storing instructions that, when executed by a processor, further cause the processor to:

receive a sensor data set from the identified sensor;

perform pre-processing on the sensor data set to generate pre-processed data; and

send the pre-processed data to the machine learning model server to apply the identified machine learning model to the pre-processed data.

14. A system, comprising:

a server computing device comprising an interface to receive a first client request from a first client identifying a machine learning model and a sensor and to receive a second client request from a second client identifying the same machine learning model and sensor as the first client request;

a pre-processing pipeline in the server computing device to, in response to the first client request, perform pre-processing on sensor data from the identified sensor to generate pre-processed data, send the pre-processed data to a model server to apply the identified machine learning model to the pre-processed data, and receive response inference data from the model server; and

wherein the interface sends the response inference data to both the first client and the second client.

15. The system of claim 14, wherein the pre-processing pipeline comprises a plurality of processing units linked together, and wherein each processing unit receives an input vector, processes the input vector, and outputs an output vector.

Background

Deep learning is a specialized field of machine learning and artificial intelligence that can be used in different fields such as computer vision, speech recognition, and text translation. In computer vision, a computer learns how to interpret images to detect people and identify objects or scenes.

Drawings

Fig. 1 is a block diagram illustrating a machine learning system including a deep learning server according to one example.

Fig. 2 is a block diagram illustrating elements of the deep learning server shown in fig. 1 according to one example.

Fig. 3 is a block diagram illustrating an example system implementation of the machine learning system shown in fig. 1.

FIG. 4 is a diagram illustrating a pre-processing pipeline according to one example.

Fig. 5 is a diagram illustrating a pre-processing pipeline that receives a single video stream and pre-processes the video stream for two different machine learning models, according to one example.

Fig. 6 is a block diagram illustrating elements of a deep learning server coupled to a machine learning model server, according to one example.

Fig. 7 is a flow diagram illustrating a method of responding to machine learning requests from multiple clients, according to one example.

Detailed Description

In the following detailed description, reference is made to the accompanying drawings, which form a part hereof, and in which is shown by way of illustration specific examples in which the disclosure may be practiced. It is to be understood that other examples may be utilized and structural or logical changes may be made without departing from the scope of the present disclosure. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present disclosure is defined by the appended claims. It will be understood that features of the various examples described herein may be combined with each other, in part or in whole, unless specifically noted otherwise.

Deep learning is a specialized field of machine learning and artificial intelligence that can be used in different fields such as computer vision, speech recognition, and text translation. In computer vision, a computer learns how to interpret images to detect people and identify objects or scenes. Deep learning models typically use a wide range of resources, such as memory and CPU power. Having simpler clients (such as smartphones, digital assistants, robots, or even PCs with low-end graphics) run those models may limit the size, accuracy, and number of models that a user can run simultaneously. If a user wants to perform frame-by-frame analysis of several video sources, the workload may exceed the capabilities of such a device.

While machine learning/deep learning applications may be deployed in the cloud, some applications have specific concerns that motivate local deployment, such as privacy, security, data bandwidth, and real-time low-latency decisions. In terms of privacy and security, there is sometimes a concern about whether information leaves the home (e.g., video or voice from the home) or the office local network (e.g., video or voice containing sensitive information). With respect to data bandwidth and latency, when processing a video stream, continuously sending high-resolution frames to the cloud involves large bandwidth and makes it difficult to obtain real-time (or near real-time) results. The dependence on external network conditions may result in the inability to make inferences (and thus decisions) in real time.

Some edge devices may be able to process machine learning at the edge. However, such devices may be inadequate if multiple tasks are to be performed. For example, if a user wants to perform object detection, facial recognition, and semantic segmentation on multiple camera streams in a house, the edge device may be able to perform one of these tasks, but may not be able to perform all of them. Replicating such devices indefinitely can become inefficient and cumbersome.

Some examples of the disclosure are directed to a local server, called a Deep Learning Server (DLS), that provides access to several instances of machine learning and is extensible to more instances. The deep learning server system may include a plurality of computers. The deep learning server provides an interface through which many clients can request inferences from multiple different machine learning models based on sensor data from multiple different sensors, without sending data outside the local network and without relying on the bandwidth and latency of external network services. The deep learning server may have a customizable physical and logical architecture. The deep learning server may monitor several video sources on the local network and notify clients when a prediction or inference occurs for a video source. The deep learning server may be connected to several machine learning models, distributed or running on the same server, and provides a robust and flexible video pre-processing pipeline to optimize resources for several different clients. The clients may be many different types of devices, including robots, printers, mobile phones, assistants/kiosks, etc.

Some examples of deep learning servers essentially combine client requests for the same machine learning model and the same sensors to improve efficiency. When multiple clients request the same machine learning model for the same data source, the deep learning server identifies the situation and makes a single call to the model server. Some examples of deep learning servers use configuration files (e.g., JavaScript object notation (JSON) configuration files) to create a pipeline that communicates with the model server and performs pre-processing on the sensor data before the data is provided to the machine learning model. Some examples of deep learning servers run on fast HTTP/2 with the gRPC protocol, where binary data transfer is used to achieve high frame rates in prediction and inference. The gRPC protocol is an open source Remote Procedure Call (RPC) protocol that uses HTTP/2 for transmission and uses a protocol buffer as an interface description language.

Fig. 1 is a block diagram illustrating a machine learning system 100 including a deep learning server 104 according to one example. The system 100 includes client computing devices 102(1) and 102(2) (collectively clients 102), a deep learning server 104, model servers 106(1) and 106(2) (collectively model servers 106), and sensors 110(1) and 110(2) (collectively sensors 110).

The sensors 110 provide sensor data to the deep learning server 104. The sensor data may provide an explicit indication of the occurrence of an event (e.g., a door sensor provides an indication that a door has been opened), or the sensor data may be data that may be provided to a machine learning model that is trained to make inferences about the data (e.g., a video stream that is analyzed to perform face detection). The term "machine learning model" as used herein generally refers to a trained machine learning model that has previously undergone a training process and is configured to make inferences from received data. Each model server 106 includes at least one machine learning model 108. The client 102 may send a request to the deep learning server 104 to monitor certain ones of the sensors 110 and provide event notifications to the client 102 when those sensors 110 detect an event. The client 102 may also send a request to the deep learning server 104 to apply a particular one of the machine learning models 108 to sensor data from a particular one of the sensors 110 and return the results to the client 102.

Fig. 2 is a block diagram illustrating elements of the deep learning server 104 shown in fig. 1 according to one example. The deep learning server 104 includes at least one processor 202, memory 204, input device 230, output device 232, and display 234. In the illustrated example, the processor 202, the memory 204, the input device 230, the output device 232, and the display 234 are communicatively coupled to one another by a communication link 228.

Input device 230 includes a keyboard, mouse, data port, and/or other suitable device for inputting information into server 104. Output devices 232 include speakers, data ports, and/or other suitable devices for outputting information from server 104. Display 234 may be any type of display device that displays information to a user of server 104.

Processor 202 includes a Central Processing Unit (CPU) or another suitable processor. In one example, memory 204 stores machine-readable instructions executed by processor 202 for operating server 104. The memory 204 includes any suitable combination of volatile and/or nonvolatile memory, such as a combination of Random Access Memory (RAM), Read Only Memory (ROM), flash memory, and/or other suitable memory. These are examples of non-transitory computer-readable storage media. Memory 204 is non-transitory in the sense that it does not encompass a transitory signal, but instead consists of at least one memory component that stores machine-executable instructions for performing the techniques described herein.

The memory 204 stores an interface module 206, an event publish-subscribe (publish-subscribe) manager module 208, a pre-processing pipeline manager module 210, and a pre-processing pipeline 212. Processor 202 executes modules 206, 208, and 210 and instructions of pre-processing pipeline 212 to perform the techniques described herein. Note that some or all of the functionality of modules 206, 208, and 210 and pre-processing pipeline 212 may be implemented using cloud computing resources.

The interface module 206 manages communications between the server 104 and the client 102 and between the server 104 and the model server 106. The event publish-subscribe manager module 208 manages subscription requests from clients 102 to subscribe to certain event notifications and publish those event notifications to clients 102. The pre-processing pipeline manager module 210 generates a pre-processing pipeline 212 based on a received configuration file (e.g., a JSON file). The pre-processing pipeline 212 performs pre-processing on sensor data from certain ones of the sensors 110 (FIG. 1) before providing the data to the machine learning model 108. The functions performed by modules 206, 208, and 210 are described in further detail below.

In one example, various subcomponents or elements of server 104 may be embodied in a plurality of different systems, wherein different modules may be grouped or distributed across a plurality of different systems. To achieve its desired functionality, the server 104 may include various hardware components. Among these hardware components may be multiple processing devices, multiple data storage devices, multiple peripheral adapters, and multiple network adapters. These hardware components may be interconnected using multiple buses and/or network connections. The processing device may include a hardware architecture that retrieves executable code from the data storage device and executes the executable code. The executable code, when executed by a processing device, may cause the processing device to implement at least some of the functionality disclosed herein.

Fig. 3 is a block diagram illustrating an example system implementation 300 of the machine learning system 100 shown in fig. 1. System 300 includes client 302, interface 310, Real Time Streaming Protocol (RTSP) cameras 314(1) and 314(2) (collectively RTSP cameras 314), event publish-subscribe manager 316, first machine learning inference service ("Service A") 318, sensors 320(1)-320(3) (collectively sensors 320), interface 322, and second machine learning inference service ("Service B") 324. The sensors 320 include a presence sensor 320(1), a temperature sensor 320(2), and a door sensor 320(3). Interface 310 is communicatively coupled to RTSP cameras 314, event publish-subscribe manager 316, machine learning inference service 318, and interface 322 via communication link 312. The RTSP cameras 314 stream video via the RTSP protocol.

In the illustrated example, the first machine learning service 318 is a scene recognition service and the second machine learning service 324 is a face detection and image classification service. Dashed line 321 represents a network boundary indicating that a machine learning inference service, such as service 324, may be provided from outside the local network. The other elements shown in fig. 3 are within the local network. Client 302 corresponds to one of clients 102 (fig. 1). The interface 310 corresponds to the interface module 206 (fig. 2). The RTSP cameras 314 and the sensors 320 correspond to the sensors 110 (fig. 1). The event publish-subscribe manager 316 corresponds to the event publish-subscribe manager module 208 (fig. 2). Machine learning services 318 and 324 each correspond to one of model servers 106 (FIG. 1).

The interface 310 accepts connections from clients, such as the client 302, using the gRPC protocol. In one example, the client 302 and interface 310 use standard call/response definitions to make inferences. The client 302 can ask the interface 310 which machine learning models it serves and which sensors it can monitor, and the interface 310 will respond with that information. Client 302 may also ask interface 310 to monitor a particular sensor using a particular machine learning model and a set of parameters provided by client 302, and to return detections (e.g., monitor sensor A with model M and return all detections with 95% accuracy).
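
As a rough illustration of this call/response exchange, the sketch below shows how a client-side wrapper might expose the capability query and the monitor request. The class, method, and field names (DeepLearningServerClient, list_models, MonitorRequest, and so on) are illustrative assumptions; the disclosure only specifies that clients and the interface communicate over gRPC using standard call/response definitions.

# Hypothetical client-side sketch of the capability query and monitor request
# described above; class and method names are illustrative, not from the source.
from dataclasses import dataclass

@dataclass
class MonitorRequest:
    sensor_id: str         # e.g. "kitchen_camera"
    model_name: str        # e.g. "person_detection"
    min_confidence: float  # only report detections at or above this score

class DeepLearningServerClient:
    """Thin wrapper over the gRPC channel to the deep learning server (assumed API)."""
    def __init__(self, channel):
        self.channel = channel  # a grpc.Channel in a real implementation

    def list_models(self):
        ...  # RPC: ask the interface which machine learning models it serves

    def list_sensors(self):
        ...  # RPC: ask the interface which sensors it can monitor

    def monitor(self, request: MonitorRequest):
        ...  # RPC: stream back detections matching the request

# Usage sketch: "monitor sensor A with model M and return all detections with 95% accuracy"
# client = DeepLearningServerClient(grpc.insecure_channel("dls.local:50051"))
# for detection in client.monitor(MonitorRequest("sensor_a", "model_m", 0.95)):
#     handle(detection)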

The event publish-subscribe manager 316 is responsible for notifying clients such as client 302 of subscribed events from the sensors 320 and from customized rules. When one of the sensors 320 detects an event, the event publish-subscribe manager 316 sends an event notification 308 via the interface 310 to all clients that have subscribed to the event. The event may be, for example, that a person is detected by presence sensor 320(1), that the temperature sensed by temperature sensor 320(2) rises above a certain threshold, or that the door monitored by door sensor 320(3) has just opened. Client 302 may send a subscription request 306 (e.g., a subscription request for presence detection from presence sensor 320(1)) to the event publish-subscribe manager 316 via interface 310 to subscribe to a specified event. An event may be a simple check, as in the case of temperature, but may also be derived from the result of a machine learning inference. Client 302 may send a subscription request 306 to subscribe to events originating from any sensor 320, as well as events resulting from inferences performed by machine learning services 318 and 324, and interface 310 sends event notifications 308 to client 302 for all events to which client 302 has subscribed. This may be done in a publish-subscribe manner.
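
A minimal sketch of how such an event publish-subscribe manager could be organized is shown below. The class name, topic strings, and callback-based delivery are illustrative assumptions rather than details from the disclosure.

from collections import defaultdict
from typing import Any, Callable

class EventPublishSubscribeManager:
    """Minimal publish-subscribe sketch: clients subscribe to named event topics
    (e.g. "presence_detected", "temperature_above_threshold", "door_opened"),
    and every notification for a topic is fanned out to all of its subscribers."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic: str, callback: Callable[[Any], None]) -> None:
        # Called when a client sends a subscription request for a topic.
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, event: Any) -> None:
        # Called when a sensor (or a rule over an ML inference) produces an event.
        for callback in self._subscribers[topic]:
            callback(event)

# Usage sketch: notify a client whenever the presence sensor fires.
manager = EventPublishSubscribeManager()
manager.subscribe("presence_detected", lambda e: print("notify client 302:", e))
manager.publish("presence_detected", {"sensor": "presence_sensor_320_1"})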

Client 302 may also be in bi-directional communication 304 with interface 310, which includes client 302 sending an inference request to interface 310, and in response, interface 310 sending an inference result to client 302. In an inference request, client 302 may identify the video stream from RTSP camera 314(2) (or another camera) and which machine learning inference service 318 or 324 to use, and may also specify that the inference is to fall within certain parameters before the client is to be notified.

In one example, deep learning server 104 accesses the video streams identified in inference requests and event subscriptions via interface 310; captures frames from the video streams; pre-processes the captured frames using the pre-processing pipeline 212 (FIG. 2); and sends the pre-processed image data to machine learning inference services 318 and 324 via interface 310 (and interface 322 for service 324). For efficiency, the same pre-processed frame may be provided to multiple machine learning inference services 318 and 324. In response, machine learning inference services 318 and 324 apply machine learning models to the received image data and send the resulting inferences to interface 310. Deep learning server 104 then sends the inference results to client 302 (and potentially other clients) via interface 310 based on the inferences received from machine learning inference services 318 and 324.
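
A simplified sketch of this capture-and-forward loop follows. It uses OpenCV to read the RTSP stream, which is one common way to capture frames; the preprocess, infer, and notify_clients callables stand in for the pre-processing pipeline 212, the model-server call, and the client notifications, and are assumptions for illustration.

import cv2  # OpenCV, one common way to read RTSP streams frame by frame

def serve_stream(rtsp_url, preprocess, infer, notify_clients):
    """Capture frames, pre-process each frame once, send it to the model server,
    and fan the inference results out to every interested client (sketch only)."""
    capture = cv2.VideoCapture(rtsp_url)    # single connection to the camera
    while capture.isOpened():
        ok, frame = capture.read()
        if not ok:
            break
        data = preprocess(frame)            # pre-processing pipeline (e.g. resize, dtype)
        result = infer(data)                # call to the machine learning model server
        notify_clients(result, frame)       # same result reused for all interested clients
    capture.release()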

The interface 310 allows machine learning models to be served by machines other than the deep learning server 104. From the perspective of server 104, multiple machines with multiple GPUs may be arranged in such a way that models X, Y, and Z are available at specific IP addresses and ports. Thus, if more resources are desired, more machines can be added to the topology and more models can be installed on them.

The use of multiple machine learning models by multiple clients to request predictions on multiple video streams from remote cameras (e.g., RTSP cameras) creates problems such as: (1) multiple connections to the same RTSP stream may result in reduced efficiency; (2) multiple clients requiring the same prediction (e.g., image classification) for the same stream may result in reduced efficiency; and (3) a single client requiring multiple predictions for the same stream may result in reduced efficiency. The examples of the deep learning server 104 disclosed herein address these issues, including handling inferences required by multiple clients for multiple video streams by consolidating requests and using model batching to consume computing resources efficiently.

In some examples, the deep learning server 104 may use the same video stream and the same machine learning model for several different clients. For example, a client may connect to the deep learning server 104 and ask to be notified when there are people in the kitchen. The deep learning server 104 then connects to the kitchen camera, starts monitoring its RTSP stream, and evaluates each frame on the model server with a person detection machine learning model. When a person is detected, the deep learning server 104 notifies the client and sends back the frame in which the detection occurred. If a second client connects to the deep learning server 104 and requests person detection on the same camera, the deep learning server 104 may reply to the second client with the same inference results, since the inference for the given camera is already in progress.
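
The request-merging behavior described above can be sketched as a registry keyed by the (model, sensor) pair: the first request starts an inference stream, and later identical requests simply attach to it without any additional model-server call. The names below are illustrative.

class InferenceStreamRegistry:
    """Sketch of merging identical client requests: one model-server stream per
    (model, sensor) pair, shared by every client that asks for that combination."""

    def __init__(self, start_stream):
        self._streams = {}                 # (model_name, sensor_id) -> stream handle
        self._clients = {}                 # (model_name, sensor_id) -> list of clients
        self._start_stream = start_stream  # starts monitoring + model-server calls

    def request(self, client, model_name, sensor_id):
        key = (model_name, sensor_id)
        if key not in self._streams:
            # First client for this combination: make the single call to the model server.
            self._streams[key] = self._start_stream(model_name, sensor_id)
            self._clients[key] = []
        # Second and later clients: no additional model-server call, just attach.
        self._clients[key].append(client)

    def deliver(self, model_name, sensor_id, response_data):
        # Send the same response data to every client attached to this stream.
        for client in self._clients.get((model_name, sensor_id), []):
            client.send(response_data)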

Different machine learning models may operate on different types of data. For example, an object detection model may operate on 299x299 images having three color channels, normalized with a certain standard deviation. The pre-processing pipeline may be used to convert the data into a format suitable for a particular machine learning model. Because each machine learning model may involve different pre-processing, the deep learning server 104 provides the user with the ability to specify a pre-processing pipeline for any given machine learning model based on its input expectations. The process of defining the pre-processing pipeline is described in further detail below with reference to FIG. 4.

FIG. 4 is a diagram illustrating a pre-processing pipeline according to one example. The class diagram 402 of the pre-processing pipeline includes a processing unit abstract class 406 and concrete subclasses 408, 410, 412, and 414. Concrete subclass 408 is an image resizing processing unit that resizes an image. Concrete subclass 410 is a grayscale processing unit that converts an image to a grayscale image. Concrete subclass 412 is a threshold processing unit that converts an image into a binary thresholded image. Concrete subclass 414 is an additional example indicating that additional concrete subclasses may be provided. In one example, deep learning server 104 includes a set of existing subclasses that a user can use to create a pre-processing pipeline, and also provides the user with the ability to create custom subclasses.
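
A rough Python rendering of this class diagram might look like the following: an abstract processing unit and a few concrete subclasses, each taking an input array and returning an output array. The class names and the use of NumPy and OpenCV are illustrative assumptions; the disclosure describes the units only in terms of classes 406-414.

from abc import ABC, abstractmethod
import cv2
import numpy as np

class ProcessingUnit(ABC):
    """Abstract processing unit (class 406): receives an input vector,
    processes it, and outputs an output vector."""
    @abstractmethod
    def process(self, data: np.ndarray) -> np.ndarray:
        ...

class ResizeUnit(ProcessingUnit):          # concrete subclass 408
    def __init__(self, width: int, height: int):
        self.size = (width, height)
    def process(self, image: np.ndarray) -> np.ndarray:
        return cv2.resize(image, self.size)

class GrayscaleUnit(ProcessingUnit):       # concrete subclass 410
    def process(self, image: np.ndarray) -> np.ndarray:
        return cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

class ThresholdUnit(ProcessingUnit):       # concrete subclass 412
    def __init__(self, threshold: int = 127):
        self.threshold = threshold
    def process(self, image: np.ndarray) -> np.ndarray:
        _, binary = cv2.threshold(image, self.threshold, 255, cv2.THRESH_BINARY)
        return binary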

Selected ones of the concrete subclasses 408, 410, 412, and 414 can be instantiated and linked together in a specified manner to generate an instantiated pipeline 404. As shown in fig. 4, the instantiated pipeline 404 includes processing unit instances 416(1)-416(7) (collectively processing unit instances 416) linked together as shown. Each processing unit instance 416 is an instance of one of the concrete subclasses 408, 410, 412, and 414. In one example, each processing unit instance 416 receives a data vector as input, processes the input data vector, and outputs another vector to all processing unit instances connected to its output. Multiple outputs may be generated from a single input. Processing unit instance 416(1) receives the input data for the pipeline 404, and processing unit instances 416(6) and 416(7) generate the output data of the pipeline 404. The configuration of an instantiated pipeline, such as pipeline 404, can be defined by a configuration file (e.g., a JSON configuration file). The following pseudo-code example 1 provides an example of a JSON configuration file for defining a pre-processing pipeline:

Pseudo-code Example 1 (JSON)

{
    "pipeline": [
        {
            "name": "feeder",
            "input channel": "video stream",
            "type": "active feeder",
            "parameters": [
                { "name": "maximum number of threads", "value": "1" },
                { "name": "queue size", "value": "1" }
            ]
        },
        {
            "name": "image feeder",
            "type": "image feeder"
        },
        {
            "name": "prototype-640x480",
            "type": "resizing",
            "parameters": [
                { "name": "width", "value": "640" },
                { "name": "height", "value": "480" }
            ],
            "output": [
                { "name": "prototype", "type": "tensor-rgb" }
            ]
        }
    ]
}
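
As a sketch of how a configuration file like the one above might be turned into a linked pipeline, the code below maps each entry's "type" field to a processing-unit factory and chains the resulting instances in order. It reuses the illustrative ResizeUnit class from the earlier class-diagram sketch, registers only that one type, and assumes a simple linear chain; the pipeline manager described in this disclosure also supports branching outputs and feeder units.

import json

# Illustrative registry mapping the "type" field in the configuration file to
# processing-unit factories. Only "resizing" is registered here; ResizeUnit is
# the illustrative class from the earlier sketch, and other unit types
# ("active feeder", "image feeder", etc.) would be registered the same way.
UNIT_TYPES = {
    "resizing": lambda params: ResizeUnit(int(params["width"]), int(params["height"])),
}

def build_pipeline(config_text: str) -> list:
    """Build a simple linear pipeline from a JSON configuration string (sketch)."""
    config = json.loads(config_text)
    units = []
    for entry in config["pipeline"]:
        factory = UNIT_TYPES.get(entry["type"])
        if factory is None:
            continue  # unit types not registered in this sketch are skipped
        params = {p["name"]: p["value"] for p in entry.get("parameters", [])}
        units.append(factory(params))
    return units

def run_pipeline(units: list, data):
    # Pass the data vector through each linked processing unit in order.
    for unit in units:
        data = unit.process(data)
    return data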

The deep learning server 104 may dynamically reuse a pre-processing pipeline for multiple clients as long as the clients require the same machine learning model for the same sensor data. For example, if three clients require inferences on the same video stream, the deep learning server 104 may automatically use the same pipeline and attach each new client to the end of the pipeline to receive results from the machine learning model. The pre-processing pipeline makes pre-processing more efficient because processing components can be reused; by reducing resource utilization through sharing, more machine learning inferences can be provided to clients with the same hardware available at the edge. The pre-processing pipeline is also flexible enough to accept different types of data sources, such as audio or video.

Further, when multiple machine learning models work on the same video stream but with different pre-processing functions, the deep learning server 104 can use a pre-processing pipeline to handle this situation. Fig. 5 is a diagram illustrating a pre-processing pipeline 500 according to one example, the pre-processing pipeline 500 receiving a single video stream and pre-processing the video stream for two different machine learning models 512 and 518. The pre-processing pipeline 500 includes an input feeder 506, an image resizing processing unit 508, a first tensor processing unit 510, and a second tensor processing unit 516. Dashed line 511 represents the boundary between deep learning server 104 and the model servers of models 512 and 518. In the illustrated example, the input feeder 506 provides a single connection to the video stream 504 from the camera 502 and feeds the video stream to the image resizing processing unit 508. Unit 508 resizes each frame in the received video stream, and its output branches to units 510 and 516, which convert each resized frame into two different types. Each of the machine learning models 512 and 518 has its own specific input format for the image. In the illustrated example, unit 510 converts each resized frame into a uint8 type frame for machine learning model 512, and unit 516 converts each resized frame into a float32 type frame for machine learning model 518. The machine learning model 512 is an object detection model that detects objects in the received frames and outputs an image 514 that identifies the detected objects 515 (i.e., people, dogs, and two chairs). The machine learning model 518 is an image segmentation model that segments the received frame and outputs a segmented image 520.
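
The branching conversion step described here, turning the same resized frame into the two input formats the models expect, might look roughly like the following. The scaling applied in the float32 case is a common convention and is an assumption, not a detail from the disclosure.

import numpy as np

def to_uint8_tensor(frame: np.ndarray) -> np.ndarray:
    # The object detection model 512 in this example expects uint8 pixel values.
    return frame.astype(np.uint8)

def to_float32_tensor(frame: np.ndarray) -> np.ndarray:
    # The image segmentation model 518 expects float32 input; scaling to [0, 1]
    # is a common convention and is an assumption here, not from the source.
    return frame.astype(np.float32) / 255.0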

Fig. 6 is a block diagram illustrating elements of a deep learning server 104 coupled to a machine learning model server, according to one example. The deep learning server 104 includes a pre-processing pipeline 601 that includes three processing units 602, 604, and 606 at the end of the pipeline 601.

In the illustrated example, processing unit 602 is a TensorFlow Serving processing unit configured to send sensor data and inference requests to TensorFlow Serving model server 608. TensorFlow Serving model server 608 accesses TensorFlow runtime 610 and machine learning model 612 to provide the received sensor data to model 612 and return the resulting inferences to TensorFlow Serving processing unit 602. The TensorFlow Serving processing unit 602 may provide the inference results to a plurality of clients. If a client requests an inference on a different video stream, a new instance of the TensorFlow Serving processing unit may be created for that video stream.

The TensorFlow Serving model server 608 also accesses the Keras runtime 614 and the machine learning model 616 to provide the received sensor data to model 616 and return the resulting inferences to TensorFlow Serving processing unit 602. The TensorFlow Serving processing unit 602 enables the deep learning server 104 to serve TensorFlow and Keras machine learning models, as well as any C++ classes that implement a servable interface.
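
For context, TensorFlow Serving exposes both a gRPC prediction API and a REST prediction API; the REST variant is sketched below only because it is shorter to show, whereas the pipeline described in this disclosure communicates over gRPC. The host, port, and model name are placeholders.

import json
import urllib.request

def tf_serving_predict(host: str, model_name: str, instances):
    """Send a predict request to a TensorFlow Serving model server over its REST
    API (port 8501 is the default REST port; the pipeline above uses gRPC)."""
    url = f"http://{host}:8501/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["predictions"]

# Usage sketch (placeholder host and model name):
# predictions = tf_serving_predict("model-server.local", "object_detection",
#                                  preprocessed_frame.tolist())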

Processing unit 604 is a Caffe service processing unit configured to send sensor data and inference requests to Caffe service model server 618. Caffe service model server 618 provides the received sensor data to the machine learning model and returns the resulting inferences to processing unit 604. Processing unit 606 is a cloud API processing unit configured to send sensor data and inference requests to cloud API 620. Cloud API 620 provides the received sensor data to the machine learning model and returns the resulting inferences to processing unit 606. The flexibility of the pre-processing pipeline 601 allows any back-end to be inserted into the pipeline. Thus, other model service solutions may be added as they are created.

The deep learning server 104 is a flexible and extensible system having resources that allow deployment of multiple machine learning models, and it includes components that provide efficient management and communication interfaces that allow clients to access its features. Some examples disclosed herein provide for execution of machine learning methods for edge devices without the burden and risk of communicating with the cloud. Some examples of the deep learning server 104 provide the following features: (1) avoiding sending private data to the cloud; (2) avoiding the problem of establishing a secure connection with the cloud; (3) avoiding reliance on network latency for real-time decisions (e.g., for robots); (4) avoiding the high bandwidth cost of sending all data to the cloud for some inferences; (5) enabling machine learning inference for devices with limited computing resources (e.g., mobile phones, robots, televisions, printers, etc.); (6) efficiently managing multiple inference requests and notifications; (7) enabling new applications (e.g., an application in which a client requests to be notified when a person is present at the front door; in this case, the client points the server to the camera's video stream and requests person detection inferences and event notifications); (8) simplifying the deployment of new inference models; (9) an efficient communication infrastructure based on gRPC and HTTP/2; (10) efficiently managing multiple data sources, such as home/office cameras; and (11) efficient data preparation prior to model inference computation, provided by the customizable pre-processing pipeline.

One example of the present disclosure is directed to a method of responding to machine learning requests from a plurality of clients. Fig. 7 is a flow diagram illustrating a method 700 of responding to machine learning requests from multiple clients, according to one example. In one example, the deep learning server 104 is configured to perform the method 700. At 702 in method 700, a computing device receives a first client request from a first client identifying a machine learning model and a sensor. At 704, in response to the first client request, the computing device sends a call to a server to apply the identified machine learning model to a data set from the identified sensor. At 706, the computing device receives a second client request from the second client identifying the same machine learning model and sensor as the first client request. At 708, the computing device sends response data from the identified machine learning model to both the first client and the second client without sending additional calls to the server in response to the second client request.

Note that the response data sent to both the first client and the second client in method 700 may be based on client requests involving partially overlapping intervals of sensor data, and additional response data may be sent to either the first client or the second client. For example, if the sensor generates a video stream, the video stream may be separated into frames, and the first and second client requests may specify partially, but not completely, overlapping sets of frames. Response data based on the overlapping portion may be reported to both clients, while additional response data based on the non-overlapping portions may be reported only to the client that identified those portions.
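
A small sketch of how the overlapping and non-overlapping portions of two requested frame intervals might be separated is shown below. Intervals are represented as inclusive (start, end) frame numbers; the routing of results to clients is left abstract, and the representation is an assumption for illustration.

def split_frame_intervals(first, second):
    """Split two inclusive (start, end) frame intervals into the shared portion
    (results go to both clients) and each client's exclusive portions."""
    shared_start = max(first[0], second[0])
    shared_end = min(first[1], second[1])
    shared = (shared_start, shared_end) if shared_start <= shared_end else None

    def exclusive(interval):
        start, end = interval
        if shared is None:
            return [interval]
        parts = []
        if start < shared[0]:
            parts.append((start, shared[0] - 1))
        if end > shared[1]:
            parts.append((shared[1] + 1, end))
        return parts

    return shared, exclusive(first), exclusive(second)

# Example: frames 0-100 and 50-150 share 50-100; frames 0-49 go only to the
# first client and frames 101-150 only to the second.
# split_frame_intervals((0, 100), (50, 150)) -> ((50, 100), [(0, 49)], [(101, 150)])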

The method 700 may further include receiving, with the computing device, the dataset from the identified sensor; and performing, with the computing device, pre-processing on the data set to generate pre-processed data. The method 700 may further include sending, with the computing device, the pre-processed data to a model server to apply the identified machine learning model to the pre-processed data.

The preprocessing of the data set in method 700 may be performed by a preprocessing pipeline in the computing device, and the preprocessing pipeline may include a plurality of processing units linked together in a manner defined by a configuration file. Each processing unit may receive an input vector, process the input vector, and output an output vector. A first one of the processing units at the beginning of the pre-processing pipeline may receive the data set from the identified sensor, and a last one of the processing units at the end of the pre-processing pipeline may communicate with the model server. The last of the processing units may send the pre-processed data to the model server and receive the response data.

The computing device, the first client, and the second client in method 700 may all be part of the same local network. The model server may or may not be part of the same local network. The sensor identified in method 700 may be a camera sensor, and the data set from the identified sensor may be a video stream.

Another example of the present disclosure is directed to a non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to: receive, from a first client, a first machine learning client request identifying a machine learning model and a sensor; in response to the first machine learning client request, cause a machine learning model server to apply the identified machine learning model to a sensor data set from the identified sensor, and receive response inference data; receive, from at least one additional client, at least one additional machine learning client request identifying the same machine learning model and sensor as the first machine learning client request; and send the response inference data received in response to the first machine learning client request to both the first client and the at least one additional client.

The non-transitory computer-readable storage medium may further store instructions that, when executed by the processor, further cause the processor to: receive a sensor data set from the identified sensor; perform pre-processing on the sensor data set to generate pre-processed data; and send the pre-processed data to the machine learning model server to apply the identified machine learning model to the pre-processed data.

Yet another example of the present disclosure is directed to a system that includes a server computing device that includes an interface to receive a first client request from a first client identifying a machine learning model and a sensor and to receive a second client request from a second client identifying the same machine learning model and sensor as the first client request. The system includes a preprocessing pipeline in the server computing device to, in response to a first client request, perform preprocessing on sensor data from the identified sensor to generate preprocessed data, send the preprocessed data to the model server to apply the identified machine learning model to the preprocessed data set, and receive response inference data from the model server. The interface sends response inference data to both the first client and the second client. The pre-processing pipeline may include a plurality of processing units linked together, and each processing unit may receive an input vector, process the input vector, and output an output vector.

Although specific examples have been illustrated and described herein, various alternative and/or equivalent implementations may be substituted for the specific examples shown and described without departing from the scope of the present disclosure. This application is intended to cover any adaptations or variations of the specific examples discussed herein. Therefore, it is intended that this disclosure be limited only by the claims and the equivalents thereof.
