Creating and deploying packages to devices in a fleet based on operations derived from a machine learning model

Document No.: 1895041    Publication date: 2021-11-26

Note: This technology, "Creating and deploying packages to devices in a fleet based on operations derived from a machine learning model," was designed and created by C·G·卡勒 and A·穆亚尔 on 2020-03-26. Abstract: Systems and methods are provided for creating packages, and deploying packages to devices in a fleet, based on operations derived from a machine learning model or other automated feedback model. As an example, a method for creating a package comprising a payload for deployment to a set of devices is provided. The method includes receiving a payload, where the payload has an associated set of payload parameters related to deploying the payload to a set of devices. The method further includes automatically creating, using a processor, a package for deployment to the set of devices, wherein the package includes instructions for deploying the payload to the set of devices, and wherein the instructions specify at least one of a plurality of operations derived from a machine learning model based at least on a subset of the associated set of payload parameters.

1. A method for creating a package comprising a payload for deployment to a set of devices, the method comprising:

receiving the payload, wherein the payload has an associated set of payload parameters related to deploying the payload to the set of devices; and

automatically creating, using a processor, the package for deployment to the set of devices, wherein the package includes instructions for deploying the payload to the set of devices, and wherein the instructions specify at least one operation of a plurality of operations derived from a machine learning model based at least on a subset of the associated set of payload parameters.

2. The method of claim 1, wherein the automatically creating the package for deployment to the set of devices comprises processing metadata or other submission parameters associated with the payload.

3. The method of claim 1, wherein the machine learning model is trained based on training data that includes a mapping between at least the subset of the associated set of payload parameters and a set of labels that classify an impact of deploying the payload to the set of devices.

4. The method of claim 3, wherein the set of labels includes a first label that classifies the impact as an impact and a second label that classifies the impact as no impact.

5. The method of claim 1, wherein the plurality of operations comprise actions relating to: which information to monitor and any trigger thresholds associated with the monitored information.

6. The method of claim 1, wherein the plurality of operations comprise actions relating to: a schedule associated with deploying the package to the set of devices.

7. The method of claim 1, wherein the plurality of operations comprise actions relating to: a gate associated with deploying the package to the set of devices.

8. The method of claim 1, wherein the plurality of operations comprise actions relating to: a watchdog associated with deploying the package to the set of devices.

9. A method of deploying a package to a fleet, the method comprising:

evaluating the fleet to determine a set of fleet parameters associated with deploying the package to the fleet;

automatically creating, using a processor, a deployment plan for deploying the package to the fleet, wherein the deployment plan includes instructions for deploying the package to the fleet, and wherein the instructions specify at least one operation of a plurality of operations derived from a machine learning model based at least on a subset of the set of fleet parameters.

10. The method of claim 9, wherein evaluating the fleet comprises processing metadata associated with the fleet.

11. The method of claim 9, wherein the machine learning model is trained based on training data comprising a mapping between at least the subset of the fleet parameters and at least one label associated with the deployment plan.

12. The method of claim 9, wherein the machine learning model is trained based on feedback regarding deployment of the package to the fleet.

13. The method of claim 9, wherein the plurality of operations comprise actions corresponding to monitoring deployment of the package to the fleet.

14. The method of claim 9, wherein the plurality of operations comprise actions corresponding to generating information about a minimum scan tree comprising a set of devices in the fleet.

15. A system for deploying a package to a fleet, the system configured to:

evaluate the fleet to determine a set of fleet parameters associated with deploying the package to the fleet; and

automatically create, using a processor, a deployment plan for deploying the package to the fleet, wherein the deployment plan includes instructions for deploying the package to the fleet, and wherein the instructions specify at least one operation of a plurality of operations derived from a machine learning model based at least on a subset of the set of fleet parameters.

Background

It is difficult to deploy a package that includes firmware or other low-level system code to components in a fleet that contains the hardware corresponding to a cloud. The public cloud comprises a global network of servers that perform various functions, including storing and managing data, running applications, and delivering content or services, such as streaming video, email, office productivity software, or social media. Servers and other components may be located in data centers around the world. While public clouds provide services to the public over the internet, enterprises may use private or hybrid clouds. Private and hybrid clouds also include networks of servers housed in data centers.

Data centers include not only servers but also other components, such as network switches, routers, and other devices. Servers and other components may be provided by different vendors and may include different types or versions of motherboards, CPUs, memory, and other devices. In addition to compute, network, and storage components, data centers include other components, such as enclosures, racks, power supply units, and the like.

Each of these devices may require low-level system code, including firmware. It is challenging to deploy packages to such a wide variety of devices, which may be distributed across many data centers around the world. Therefore, there is a need for methods and systems for deploying packages to devices in a fleet.

Disclosure of Invention

In one example, the present disclosure is directed to a method for creating a package comprising a payload for deployment to a set of devices. The method may include receiving a payload, where the payload has an associated set of payload parameters related to deploying the payload to a set of devices. The method may further include automatically creating, using a processor, a package for deployment to the set of devices, wherein the package includes instructions for deploying the payload to the set of devices, and wherein the instructions specify at least one of a plurality of operations derived from a machine learning model based at least on a subset of the associated set of payload parameters.

In another example, the present disclosure is directed to a method of deploying a package to a fleet. The method may include evaluating the fleet to determine a set of fleet parameters associated with deploying the package to the fleet. The method may further include automatically creating, using a processor, a deployment plan for deploying the package to the fleet, wherein the deployment plan includes instructions for deploying the package to the fleet, and wherein the instructions specify at least one of a plurality of operations derived from the machine learning model based at least on a subset of the set of fleet parameters.

In yet another example, the present disclosure is directed to a system for deploying a package to a fleet. The system may be configured to evaluate the fleet to determine a set of fleet parameters associated with deploying the package to the fleet. The system may further be configured to automatically create, using a processor, a deployment plan for deploying the package to the fleet, wherein the deployment plan includes instructions for deploying the package to the fleet, and wherein the instructions specify at least one of a plurality of operations derived from a machine learning model based at least on a subset of the set of fleet parameters.

In yet another example, the present disclosure is directed to a method for creating a package comprising a payload for deployment to a set of devices. The method may include receiving a payload, where the payload has an associated set of payload parameters related to deploying the payload to a set of devices. The method may further include automatically creating, using a processor, a package for deployment to the set of devices, wherein the package includes instructions for deploying the payload to the set of devices, and wherein the instructions specify at least one of a plurality of operations derived from an automated feedback model based at least on a subset of the associated set of payload parameters.

This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

Drawings

The present disclosure is illustrated by way of example and is not limited by the accompanying figures, in which like references indicate similar elements. Elements in the figures are illustrated for simplicity and clarity and have not necessarily been drawn to scale.

FIG. 1 illustrates a diagram of a system environment for deploying a package to devices in a fleet that includes hardware in a cloud, according to one example;

FIG. 2 is a block diagram of a system including deployment and monitoring and a fleet, according to an example;

FIG. 3 is a block diagram of a cluster in a data center according to one example;

FIG. 4 illustrates a block diagram of deployment and monitoring, according to an example;

FIG. 5 illustrates a diagram including a memory 500 with modules to execute instructions for performing operations associated with deployment and monitoring;

FIG. 6 illustrates components of a package according to one example;

FIG. 7 illustrates an example of a segmented deployment according to one example;

FIG. 8 shows a flow diagram of a method for deploying a package, according to an example;

FIG. 9 shows a diagram of a scan tree of hardware in one stage according to an example;

FIG. 10 shows a diagram of a scan tree of hardware in another stage according to an example;

FIGS. 11A and 11B illustrate a flow diagram of a method for deploying a package, according to an example;

FIG. 12 illustrates an impact table according to an example;

FIG. 13 illustrates a machine learning system according to one example;

FIG. 14 illustrates a memory including instructions and data for use with the machine learning system of FIG. 13, according to one example;

FIG. 15 illustrates a flow diagram of a method for creating a package including a payload for deployment to a set of devices, according to one example;

FIG. 16 illustrates a flow chart of a method for deploying a package to a fleet, according to an example;

FIG. 17 illustrates a deployment dashboard, according to one example.

Detailed Description

Examples described in this disclosure relate to creating and deploying packages, including payloads, to a fleet. Certain examples relate to creating and deploying packages based on operations derived from a machine learning model. Deploying a package that includes firmware or other low-level system code to components in the hardware that makes up a cloud is difficult. The public cloud comprises a global network of servers that perform various functions, including storing and managing data, running applications, and delivering content or services, such as streaming video, email, office productivity software, or social media. Servers and other components may be located in data centers around the world. While public clouds provide services to the public over the internet, enterprises may use private or hybrid clouds. Private and hybrid clouds also include networks of servers housed in data centers.

Data centers include not only servers, but other components such as network switches, routers, and other devices. Servers and other components may be provided by different vendors and may include different types or versions of motherboards, CPUs, memory, and other devices.

Each of these devices may require low-level system code, including firmware. It is challenging to deploy packages to a wide variety of devices that may be distributed in many data centers around the world. This is because deployment of the package needs to be done safely, robustly, and reliably. There are several external factors that can affect the safety, robustness, and reliability goals. By way of example, deployments may often need to be approved more than once, especially when some deployments have high impact potential. Certain types of changes or targets require explicit consent from other parties to gate the deployment (e.g., because of potential power or performance impact). Furthermore, the impact of the deployment of the package may need to be monitored to ensure a safe and reliable deployment. Finally, the payload often includes firmware or other code from other companies and must be evaluated and tested to ensure safety.

To ensure safe, robust, and reliable deployment of packages, certain examples of the present disclosure relate to ensuring quality payloads, proper validation and testing, and monitoring of the impact on the fleet. Certain examples relate to using machine learning to improve the creation and deployment of packages.

FIG. 1 illustrates a diagram of a system environment 100 for deploying a package to devices in a fleet comprising hardware in a cloud, according to one example. Examples of devices include, but are not limited to, Baseboard Management Controllers (BMCs), CPUs, GPUs, FPGAs, FPGA instances, rack managers/controllers, chassis managers/controllers, power unit controllers, storage devices (e.g., SSDs or HDDs), network devices (e.g., switches, routers, firewalls, and bridges), or any other device in a data center that may need to be updated. The package may include a payload, which may include instructions, low-level system code, firmware, settings, configurations, or other information that may need to be updated. System environment 100 may include payload submission 102, package creation 110, package repository 120, deployment and monitoring 140, compute/storage/network 160, control plane 170, and data plane 180.

With continued reference to FIG. 1, package creation 110 may include scanning 112, testing 114, and packaging 116. In this example, a payload may be received via payload submission 102. Payload submission 102 may be implemented using a self-service portal for payload engineering teams. Thus, in this example, payload submission 102 may provide a graphical user interface via which any payload may be submitted. As part of payload submission 102, a variety of relevant information may be obtained and stored in a database (e.g., database 212 associated with deployment and monitoring). This may be done by presenting a questionnaire to the submitter to obtain the information. Alternatively or additionally, the information may be included as metadata associated with the payload. The information may include information related to deployment, changes, testing, and impact. The payload-related information, or any information derived from the submitted information, is referred to as the parameters associated with the payload.

Still referring to FIG. 1, scanning 112 may include scanning the payload for various parameters associated with the payload. As an example, metadata associated with the payload may be scanned and extracted as part of this process. The extracted information may be processed to determine whether it meets the submission criteria. Compliance with the submission criteria may indicate that the submission is a valid submission. After the submission is validated, the extracted information may be stored as a record in a database associated with deployment and monitoring 140. The information may also be timestamped. Scanning 112 may also include scanning the payload for any viruses or other undesirable artifacts. Testing 114 may include testing the payload by updating a particular target device with the payload to ensure that the payload will operate as intended when installed. Packaging 116 may include packaging the payload. Additional steps involved as part of this process are described in other portions of this disclosure.
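As a rough illustration only (the disclosure does not specify a data format; the field names, `REQUIRED_FIELDS`, and `SubmissionRecord` below are hypothetical), the scan of a submitted payload's metadata against submission criteria and the creation of a timestamped record could look like this sketch in Python:

```python
import time
from dataclasses import dataclass, field

# Illustrative submission criteria; in the described system these would come
# from the database associated with deployment and monitoring.
REQUIRED_FIELDS = {"payload_name", "version", "target_device_type", "impact_type"}

@dataclass
class SubmissionRecord:
    metadata: dict
    valid: bool
    timestamp: float = field(default_factory=time.time)  # record is timestamped

def scan_submission(metadata: dict) -> SubmissionRecord:
    """Check extracted payload metadata against the submission criteria."""
    missing = REQUIRED_FIELDS - metadata.keys()
    # A submission with all required fields is treated as a valid submission;
    # persisting the record to a database is omitted in this sketch.
    return SubmissionRecord(metadata=metadata, valid=not missing)

record = scan_submission({"payload_name": "bmc-fw", "version": "2.1",
                          "target_device_type": "BMC", "impact_type": "restart"})
print(record.valid, record.timestamp)
```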

With continued reference to FIG. 1, as indicated by dashed box 130, machine learning may be used with deployment and monitoring 140. Similarly, as shown by dashed box 150, machine learning may be used with compute/storage/network 160, control plane 170, and data plane 180. Additional details regarding the use of such machine learning models are provided in other portions of this disclosure.

FIG. 2 is a block diagram of a system 200 including deployment and monitoring (e.g., deployment and monitoring 140 of FIG. 1) and a fleet 210, according to one example. As used in this disclosure, the term fleet may include, but is not limited to, some or all of the data centers owned and operated by a cloud service provider, one or all of the data centers owned by a cloud service provider but operated by a customer of the service provider, any other combination of data centers, a single data center, or even a certain cluster in a particular data center. The deployment and monitoring 140 may be coupled to the fleet 210 via a data plane 220. The deployment and monitoring 140 may also be coupled to the fleet 210 via a control plane 230. The fleet 210 may include one or more data centers, which may in turn include compute clusters/storage/network devices. Thus, in this example, the fleet 210 may include cluster 1 240, cluster 2 260, and cluster Q 280, where Q may be an integer greater than 1. Cluster 1 240 may be coupled to the deployment and monitoring 140 via a bus 222; cluster 2 260 may be coupled to the deployment and monitoring 140 via a bus 224; and cluster Q 280 may be coupled to the deployment and monitoring 140 via a bus 226. Cluster 1 240 may also be coupled to the deployment and monitoring 140 via a bus 232; cluster 2 260 may also be coupled to the deployment and monitoring 140 via a bus 234; and cluster Q 280 may also be coupled to the deployment and monitoring 140 via a bus 236. As shown in FIG. 2, machine learning may be coupled to the deployment and monitoring 140. Additional details associated with machine learning are provided later in this disclosure. Although not shown in FIG. 2, deployment and monitoring 140 may be coupled to any of the clusters shown in FIG. 2 via any number of intervening networks (e.g., a wide area network, an optical network, a wireless network, a wired network, or another type of network). Thus, the term bus as used in this disclosure includes, but is not limited to, signal lines coupled via routers, switches, or other network devices, signal lines coupled via any type of network, wireless connections, combinations of signal lines and wireless connections, switching fabrics, and the like. Although FIG. 2 shows a particular number of clusters in the fleet 210 arranged in a particular manner, the fleet 210 may include more or fewer clusters. Further, while FIG. 2 illustrates a particular arrangement of the deployment and monitoring 140 in relation to the fleet 210, the deployment and monitoring 140 may be arranged differently, including being distributed across several locations and interconnected via different types of networks or buses.

FIG. 3 is a block diagram of a cluster 300 in a data center according to one example. Cluster 300 may be one of the clusters included in the fleet managed using deployment and monitoring 140. In this example, cluster 300 may include top-of-rack (TOR) switches that may be used to interconnect racks of hardware components. Thus, the cluster 300 may include TOR switches 302, 304, and 306. Each TOR switch may couple at least some of the components in the cluster 300. By way of example, TOR switch 304 may interconnect racks 310, 330, and 350. Each rack may include a rack manager and a number of chassis, which may house components such as servers, network storage, network hardware (e.g., switches, routers, and bridges), and so forth. The rack 310 may include several chassis, including, for example, chassis 314, 316, 318, and 320. Similarly, rack 330 may include several chassis, including, for example, chassis 334, 336, 338, and 340. In addition, rack 350 may include several chassis, including, for example, chassis 354, 356, 358, and 360. Each rack may include a rack manager configured to interface with a deployment and monitoring system, such as deployment and monitoring 140. Thus, rack 310 may include a rack manager 312, rack 330 may include a rack manager 332, and rack 350 may include a rack manager 352. Each chassis may include servers, such as blades, organized into groups. Some or all of the chassis may also include network and storage devices. Each chassis may also include fans to provide cooling air to the servers or other components housed within the chassis. Although FIG. 3 illustrates a particular arrangement of racks, switches, chassis, and components within a chassis, the systems and methods disclosed herein are not limited to any particular arrangement. Thus, the systems and methods are applicable to any organization of a data center, for both the data plane and the control plane.

FIG. 4 illustrates a block diagram of deployment and monitoring 400 (e.g., deployment and monitoring 140 of FIG. 1) according to an example. Deployment and monitoring 400 may include a processor 402, I/O devices 404, memory 406, display 408, sensors 410, deployment database 412, and network interface 414, which may be interconnected via a bus 420. The processor 402 may execute instructions stored in memory 406. The I/O devices 404 may include components such as a keyboard, mouse, voice recognition processor, or touch screen. The memory 406 may be any combination of non-volatile or volatile memory (e.g., flash, DRAM, SRAM, or other types of memory). The display 408 may be any type of display, such as an LCD, LED, or other type of display. The sensors 410 may include telemetry or other types of sensors configured to detect and/or receive information (e.g., conditions associated with a device). The sensors 410 may include sensors configured to sense conditions associated with a CPU, memory or other storage components, FPGAs, motherboards, baseboard management controllers, and the like. The sensors 410 may also include sensors configured to sense conditions associated with racks, chassis, fans, Power Supply Units (PSUs), and the like. The sensors 410 may also include sensors configured to sense conditions associated with Network Interface Controllers (NICs), top-of-rack (TOR) switches, middle-of-row (MOR) switches, routers, Power Distribution Units (PDUs), rack-level Uninterruptible Power Supply (UPS) systems, and the like.

With continued reference to FIG. 4, the sensor 410 may be implemented in hardware, software, or a combination of hardware and software. Some sensors 410 may be implemented using a sensor API that may allow the sensors 410 to receive information via the sensor API. Software configured to detect or listen for particular conditions or events may communicate via the sensor API any conditions associated with the device being monitored by deployment and monitoring 400. Remote sensors or other telemetry devices may be incorporated within the data center to sense conditions associated with components installed therein. Remote sensors or other telemetry may also be used to monitor other adverse signals in the data center and provide information to deployment and monitoring. As an example, if a fan cooling a rack is out of service, a sensor may sense this and report to the deployment and monitoring functions. This type of monitoring may ensure that any second order effects of deployment are detected, reported, and corrected.

Still referring to FIG. 4, a deployment database 412 may be used to store records relating to payload submissions and packages. In addition, the deployment database 412 may also store data used to generate reports related to deployments. Additional details regarding the functionality of the deployment database 412 are provided in other portions of this disclosure.

Network interface 414 may include a communication interface, such as an ethernet, cellular radio, bluetooth radio, UWB radio, or other type of wireless or wired communication interface. Bus 420 may be coupled to both the control plane and the data plane. Although fig. 4 shows deployment and monitoring 400 as including a certain number of components arranged and coupled in a certain manner, it may include fewer or more components arranged and coupled differently. Further, the functionality associated with deployment and monitoring 400 may be distributed as desired.

FIG. 5 illustrates a diagram of a memory 500 (e.g., memory 406 of FIG. 4) including modules having instructions for performing operations associated with deployment and monitoring 400 (as well as deployment and monitoring 140 of FIG. 1). The memory 500 may include a pre-scan module 502, a planning module 504, a packaging module 506, a verification module 508, a deployment module 510, and a deployment monitor 512. The pre-scan module 502 may evaluate the payload parameters provided as part of a payload submission (e.g., payload submission 102). The pre-scan module 502 can evaluate the impact of the deployment of any package that includes the payload (e.g., using predefined information from other tables in the database 412 of FIG. 4) and by evaluating the current fleet configuration. The results generated by the pre-scan module 502 may be recorded in the database 412 of FIG. 4. The planning module 504 may build on the work performed by the pre-scan module 502, determine any additional planning information or steps that may be required, and record the results back into the database 412. The planning module 504 may determine risk factors associated with the planned deployment. The planning module 504 may also determine the gates and watchdogs that may be needed to ensure safe and reliable deployment. Details relating to the gates and watchdogs may be recorded in database 412. Based on all of this information, the planning module 504 can evaluate the coverage that the various stages can provide, for deployment and validation, over the relevant areas of the current configuration of a given fleet. Automated deployment may proceed by deploying the package, as part of its verification, to a primary stage (e.g., a stage including nodes without a workload), then to a secondary stage (e.g., a stage including nodes with non-customer workloads), then to a minimum scan tree (described below), and then as a fleet deployment, the scope of which may vary based on the package and the payload. A minimum scan tree for the fleet can be automatically generated to achieve a target coverage, taking into account current fleet usage and composition. Additional details regarding the generation of the minimum scan tree for a fleet are provided with respect to FIGS. 9 and 10. Finally, if the parameters associated with the payload or package indicate that additional segmentation is recommended (e.g., one generation of devices before another generation), a suggested segmentation plan may also be generated. The generated information may be stored in database 412.

Still referring to FIG. 5, the packaging module 506 can process the submitted payload and build the deployment package. As part of this process, the packaging module 506 may generate deployment instructions based on the payload parameters. The packaging module 506 may specify a set of Test-in-Production (TIP) devices that may be targeted for deployment before the deployment is launched further. In one example, the TIP mechanism may be used to perform a minimum-coverage test to validate the package in critical configurations. Additional details of an example package are provided in FIG. 6. Thus, the example package 600 may include one or more payloads for deployment. In this example, the package 600 may include payload 1 602, payload 2 604, and payload N 606, where N may be an integer greater than 1. Package 600 may also include health monitor 608. Health monitor 608 may include information about what is monitored and the trigger thresholds associated with the monitored information. The package 600 may also include package deployment instructions 610. In one example, the package deployment instructions 610 may include the operations or actions that specify the deployment plan. The package deployment instructions may also include instructions regarding monitoring second-order effects at a more general level.
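For illustration, a package such as package 600 might be modeled as a container holding one or more payloads, a health monitor with trigger thresholds, and deployment instructions. This is a minimal sketch under assumed, hypothetical types (`Package`, `HealthMonitor`), not the package format used by the disclosed system:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class HealthMonitor:
    # What to monitor and the trigger thresholds for the monitored signals
    # (names and values are illustrative).
    thresholds: Dict[str, float] = field(default_factory=dict)

@dataclass
class Package:
    payloads: List[bytes]               # payload 1 .. payload N
    health_monitor: HealthMonitor       # e.g., health monitor 608
    deployment_instructions: List[str]  # e.g., package deployment instructions 610

pkg = Package(
    payloads=[b"firmware-blob"],
    health_monitor=HealthMonitor(thresholds={"reboot_failure_rate": 0.01}),
    deployment_instructions=["deploy to TIP targets", "wait 24h gate", "deploy to fleet"],
)
print(len(pkg.payloads), pkg.health_monitor.thresholds)
```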

Referring to FIG. 5, the verification module 508 may receive the completed package and verify the package. This process may include initiating a Test-in-Production (TIP) deployment to the target devices. Deployment monitor 512 may be implemented as a logical service that monitors the progress of active deployments. The status of each submission and subsequent deployment may be tracked, including the start time of the deployment, the end time of the deployment, the runtime of the deployment, and any delays in the deployment. Deployment monitor 512 may also track performance metrics related to the deployment. Table 1 below shows some of the primary Key Performance Indicators (KPIs) that may be tracked by deployment monitor 512.

TABLE 1

Deployment monitor 512 may also track additional KPIs, referred to as secondary KPIs in Table 2 below.

TABLE 2

FIG. 7 illustrates an example of a segmented deployment 700 according to one example. A package deployment can affect hundreds of unique components, each of which can have a corresponding SKU or other unique identifier that identifies the component type and distinguishes it from other component types. An example component may be a particular CPU version produced by Intel. In this example, deployment of firmware to Intel CPUs may affect tens of unique CPU versions produced by Intel, which may be deployed as part of a fleet receiving the firmware updates. A particular CPU version may have a corresponding SKU identifying that version of the CPU. A fleet may include thousands of CPUs with that particular CPU version. A fleet may also include many other versions of Intel CPUs and thousands (or fewer) of CPUs for each of these other versions. In this example, deployment may be done safely by following a safe deployment process. An example safe deployment process may include first scanning the fleet to determine the diversity of SKUs in the fleet. The process may include deployment and monitoring 140 continuously scanning the fleet and keeping track of each unique triple, or 3-tuple, comprising the generation associated with the hardware, the manufacturer associated with the hardware, and the SKU associated with the device. The deployment database 412 of FIG. 4 may include tables for keeping track of the unique triples.
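The following is a minimal sketch of how a fleet scan might collect the unique (generation, manufacturer, SKU) triples described above; the device dictionary keys are illustrative assumptions, not the schema used by the disclosed system:

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (generation, manufacturer, SKU)

def scan_fleet(devices: Iterable[dict]) -> Set[Triple]:
    """Collect the unique (generation, manufacturer, SKU) triples in the fleet."""
    triples: Set[Triple] = set()
    for device in devices:
        triples.add((device["generation"], device["manufacturer"], device["sku"]))
    return triples

devices = [
    {"generation": "G1", "manufacturer": "M1", "sku": "SKU1"},
    {"generation": "G1", "manufacturer": "M1", "sku": "SKU2"},
    {"generation": "G2", "manufacturer": "MM", "sku": "SKU8"},
    {"generation": "G1", "manufacturer": "M1", "sku": "SKU1"},  # same device type again
]
print(scan_fleet(devices))  # three unique triples
```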

With continued reference to FIG. 7, in one example, the deployment test setup (e.g., fleet portion 710) can be organized into stages, which can conceptually correspond to a blast radius. Thus, each successive stage may include more and more hardware, such that deploying the package to an increasing number of stages may cover more and more hardware in the fleet. Thus, STAGE 1 720 may include hardware (e.g., servers) configured to process only synthetic workloads. As an example, servers organized as part of STAGE 1 720 may host virtual machines that do not serve any customer workload or other workloads that may adversely affect a customer or another user. In this example, using the blast radius analogy, the blast radius may be zero, since any package deployed to STAGE 1 720 will not affect any actual workload. STAGE 2 730 may include hardware (e.g., servers) configured to handle live workloads which, if affected, may adversely affect the workloads of at least some users, but not of any customers. Thus, in this example, using the blast radius analogy, STAGE 2 730 may include at least some number of servers that, if impacted by the package deployment, may impact the workloads of at least a small number of users. One way to estimate the workload impact on customers is to classify servers based on the number of containers or virtual machines (VMs) they host. Thus, in one example, as part of STAGE 2 730, only those servers whose container count or VM count is below a threshold (e.g., two containers per server or two VMs per server) may be targeted for deployment. STAGE 3 740 may include a larger number of servers that, if affected, may adversely affect the workloads of at least a few customers. In one example, STAGE 3 740 may also include servers with a larger per-server container count or per-server VM count. STAGE 4 750 and STAGE 5 760 may include progressively more, and more diverse, servers that, if affected, may adversely affect more and more customer workloads. Again, in one example, more and more customer workloads may correspond to higher per-server container counts or per-server VM counts. Of course, other indicia of customer workload may be used to determine which servers or other hardware to include in one stage or another. In one example, the blast radius of a deployment may be managed by first deploying the package to as few stages as possible. Although FIG. 7 shows a particular number of stages for a safe deployment, more or fewer stages may be used.
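As an informal sketch of the staging idea (the thresholds and field names are assumptions, not values from the disclosure), servers could be assigned to stages based on whether they run only synthetic workloads and on their VM or container counts:

```python
def assign_stage(server: dict, vm_threshold: int = 2) -> int:
    """Assign a server to a deployment stage based on the workload it hosts.

    Illustrative rule: stage 1 for synthetic-only workloads, stage 2 for servers
    below the VM/container threshold, stage 3 otherwise.
    """
    if server.get("synthetic_only"):
        return 1
    if server.get("vm_count", 0) < vm_threshold:
        return 2
    return 3

servers = [
    {"name": "s1", "synthetic_only": True, "vm_count": 0},
    {"name": "s2", "synthetic_only": False, "vm_count": 1},
    {"name": "s3", "synthetic_only": False, "vm_count": 12},
]
print({s["name"]: assign_stage(s) for s in servers})  # {'s1': 1, 's2': 2, 's3': 3}
```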

FIG. 8 shows a flow diagram 800 of a method for deploying a package, according to an example. Step 802 may include receiving a submission. Upon receipt of the submission, some of the subsequent processing steps may be performed in parallel. Thus, in this example, one path may involve creating the package and verifying the package, while a parallel path may involve pre-scanning and planning the deployment. As part of the packaging path, the package may be automatically created and verified. Thus, step 804 may include creating a package that may include a payload, package deployment instructions (e.g., configuration instructions or settings), and a health monitor (e.g., package 600 of FIG. 6). In this example, this step may be performed when instructions corresponding to the packaging module 506 of FIG. 5 are executed by the processor 402 of FIG. 4. The package may also include a set of Test-in-Production (TIP) targets determined for verifying the package. Step 808 may include verifying the package. In this example, this step may be performed when instructions corresponding to the verification module 508 of FIG. 5 are executed by the processor 402 of FIG. 4. Package verification may include automatically processing the package as part of the TIP. Any results obtained by these steps may be stored in the deployment database 412 of FIG. 4.

With continued reference to FIG. 8, the planning path may involve automatically determining risk factors and any gates that may be needed. Thus, in this example, step 806 may include pre-scanning the payload and any other submission parameters to determine risk factors associated with the deployment of any package that includes the submitted payload. Risk factors and gates may be tracked in the deployment database 412 of FIG. 4. In this example, this step may be performed when instructions corresponding to the pre-scan module 502 of FIG. 5 are executed by the processor 402 of FIG. 4. Based on these parameters, the coverage that a smaller number of servers or other types of equipment organized in STAGE 1 720 of FIG. 7 and/or STAGE 2 730 of FIG. 7 would provide for the current configuration of a given fleet can be evaluated. As part of this step, deployment and monitoring (e.g., deployment and monitoring 140 of FIG. 1) may provide a suggested deployment plan (step 810). The deployment plan may include the degree of parallelism that may be used during packaging and deployment. An example level of parallelism during deployment may relate to whether all servers in a rack that are part of a single cluster may receive the package in parallel. The deployment plan may also include the level of testing and validation required before deploying the package to the fleet. The deployment plan may also take into account the importance of customers and their workloads. Thus, a particular customer may have devices critical to its operations, and any deployment of packages to those devices may require additional sign-off. Details associated with the deployment plan may be stored in the deployment database 412 of FIG. 4 or other storage. In this example, this step may be performed when instructions corresponding to the planning module 504 of FIG. 5 are executed by the processor 402 of FIG. 4. The deployment plan may or may not be approved in step 812. In one example, this determination may be made by an administrator associated with the fleet.

Still referring to FIG. 8, if the deployment plan is approved in step 812, the package may be deployed in step 814. In this example, this step may be performed when instructions corresponding to the deployment module 510 of FIG. 5 are executed by the processor 402 of FIG. 4. Additional details of deployment using the minimum scan tree approach are provided with respect to FIGS. 9 and 10. Alternatively, if the deployment plan is rejected in step 812, the submission may be rejected in step 816. As part of this step, deployment and monitoring may record the rejection of the submission. Although FIG. 8 shows a certain number of steps performed in a certain order, additional or fewer steps may be performed in the same order or a different order.
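A rough sketch of the two parallel paths of flow diagram 800 and the approval gate at step 812 might look like the following; the function names and return values are placeholders, not the disclosed implementation:

```python
from concurrent.futures import ThreadPoolExecutor

def create_and_verify_package(payload: bytes) -> dict:
    # Stand-in for steps 804 and 808 (packaging and TIP verification).
    return {"payload": payload, "verified": True}

def prescan_and_plan(payload_params: dict) -> dict:
    # Stand-in for steps 806 and 810 (risk factors, gates, suggested plan).
    return {"risk": "low", "gates": ["24h after minimum scan tree"]}

def deploy(package: dict, plan: dict) -> str:
    return "deployed"

def handle_submission(payload: bytes, payload_params: dict, approved_by_admin: bool) -> str:
    # The packaging path and the planning path may run in parallel.
    with ThreadPoolExecutor() as pool:
        pkg_future = pool.submit(create_and_verify_package, payload)
        plan_future = pool.submit(prescan_and_plan, payload_params)
        package, plan = pkg_future.result(), plan_future.result()
    # Step 812: approval gate; step 814 deploys, step 816 rejects the submission.
    return deploy(package, plan) if approved_by_admin else "submission rejected"

print(handle_submission(b"fw", {"impact_type": "restart"}, approved_by_admin=True))
```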

FIG. 9 illustrates a diagram of a scan tree 900 for the hardware in STAGE 1 720 of FIG. 7, according to one example. As previously described, STAGE 1 720 can include hardware (e.g., servers) configured to process only synthetic workloads. As an example, servers organized as part of STAGE 1 720 may host virtual machines that do not handle any customer's workload or other workloads that may adversely affect a customer or another user. In this example, FIG. 9 shows that there are two generations (G1 910 and G2 940) of hardware in STAGE 1 720. Each device with a unique SKU (or some other item identifier) that may be the target of the deployment of a payload (e.g., firmware) may be represented as an edge of the scan tree. In this example, each device may be identified by a triple (3-tuple) including: (1) the generation of the hardware in the data center in which the device is located, (2) the manufacturer of the server or other equipment containing the device, and (3) the SKU associated with the device. Thus, in this example, STAGE 1 720 may include the following triples: G1 910, M1 912, SKU 1 922; G1 910, M1 912, SKU 2 924; G1 910, M2 914, SKU 3 926; G1 910, MM 916, SKU 7 928; and G1 910, MM 916, SKU 9 930. STAGE 1 720 may also include the following triples: G2 940, M1 942, SKU 1 952; G2 940, M1 942, SKU 2 954; G2 940, M3 944, SKU 5 956; and G2 940, MM 946, SKU 8 958. Information about the triples in the scan tree 900 corresponding to STAGE 1 720 may be stored in a database (e.g., the deployment database 412 of FIG. 4). Although FIG. 9 shows a scan tree with only two generations of hardware, additional scan trees may be available for other generations of hardware. Similarly, although FIG. 9 shows a particular number of manufacturers and a particular number of SKUs, other manufacturers and SKUs may also exist.

FIG. 10 illustrates a diagram of a scan tree 1000 for the hardware in STAGE 2 730 of FIG. 7, according to one example. As previously described, STAGE 2 730 may include at least some number of servers that, if affected by the package deployment, may affect the active workloads of at least a small number of users. In this example, FIG. 10 shows that there are two generations (G1 1010 and G2 1040) of hardware in STAGE 2 730. Each device with a unique SKU (or some other identifier) that may be the target of the deployment of a payload (e.g., firmware) may be represented as an edge of the scan tree. In this example, similar to the example described with respect to FIG. 9, each device may be identified by a triple (3-tuple) including: (1) the generation of the hardware in the data center in which the device is located, (2) the manufacturer of the server or other equipment containing the device, and (3) the SKU associated with the device. Thus, in this example, STAGE 2 730 may include the following triples: G1 1010, M1 1012, SKU 1 1022; G1 1010, M1 1012, SKU 2 1024; G1 1010, M1 1012, SKU 3 1026; G1 1010, M2 1014, SKU 4 1028; G1 1010, MM 1016, SKU 5 1030; G1 1010, MM 1016, SKU 6 1032; G1 1010, MM 1016, SKU 7 1034; G1 1010, MM 1016, SKU 8 1036; and G1 1010, MM 1016, SKU 9 1038. STAGE 2 730 may also include the following triples: G2 1040, M1 1042, SKU 1 1052; G2 1040, M1 1042, SKU 2 1054; G2 1040, M1 1042, SKU 3 1056; G2 1040, M3 1044, SKU 4 1058; G2 1040, M3 1044, SKU 5 1060; G2 1040, MM 1046, SKU 7 1062; and G2 1040, MM 1046, SKU 8 1064. In this example, the scan tree for STAGE 2 730 includes a higher diversity of components. By way of example, the scan tree for STAGE 2 730 includes SKU 3 1026 as part of the G1-generation hardware provided by manufacturer M1. In addition, the scan tree for STAGE 2 730 includes additional types of components of generation G1 provided by manufacturer MM. Similarly, for the G2-generation hardware, the scan tree includes other types of components. Information about the triples in the scan tree 1000 corresponding to STAGE 2 730 may be stored in a database (e.g., the deployment database 412 of FIG. 4). Although FIG. 10 shows a scan tree with only two generations of hardware, additional scan trees may exist for other generations of hardware. Similarly, although FIG. 10 shows a particular number of manufacturers and a particular number of SKUs, other manufacturers and SKUs may exist.

FIGS. 11A and 11B illustrate a flow diagram of a method for deploying a package, according to one example. Step 1102 may include scanning hardware in the fleet to obtain information about the hardware. In one example, step 1102 can include the pre-scan module 502 of FIG. 5 scanning hardware in the fleet to obtain information about the hardware components deployed in the fleet. The information obtained by scanning the hardware may be stored in the deployment database 412 of FIG. 4. The scanned information may include information about the generation, manufacturer, and SKU associated with each hardware component in the fleet. This information may be organized in one or more tables and stored in the deployment database 412 of FIG. 4.

Step 1104 may include classifying the hardware in the fleet into deployment categories divided by volume. In one example, classifying the hardware in the fleet into deployment categories by volume can include the planning module 504 of FIG. 5 processing the scanned information about the hardware. A deployment category may include at least one type (or category) of component to which the package may need to be deployed. Thus, in one example, each deployment category may include those components that may receive the same or similar payloads. The classification information may be stored in the deployment database 412 of FIG. 4. The planning module 504 may also determine risk factors associated with the planned deployment. By way of example, the planning module 504 may determine the types of customers that may be affected by the deployment. The planning module 504 may further determine the gates and watchdogs that may be needed to ensure safe and reliable deployment. Details regarding the gates and watchdogs may also be recorded in the deployment database 412 of FIG. 4. Based on all of this information, the planning module 504 can evaluate the coverage that the various stages can provide, for deployment and validation, over the relevant areas of the current configuration of a given fleet.

Step 1106 can include mapping the package to the devices selected for deployment. As part of this step, the planning module 504 can create information (e.g., a table or a set of tables) that maps the package to the devices selected for deploying the package. This information may be stored in the deployment database 412 of FIG. 4.

Step 1108 may include scanning hardware in STAGE 1 to determine whether a selected diversity target is met. If the selected diversity target is met, flow may proceed to process STAGE A 1110. Otherwise, flow may proceed to process STAGE B 1112. In one example, as part of this step, the planning module 504 can build (or process an existing) minimum scan tree, as described with respect to FIGS. 9 and 10. The goal may be to obtain a reasonable amount of confidence that the package is being deployed in a manner consistent with operational characteristics, such as with minimal disruption to customers' workloads. By way of example, as described with respect to FIG. 9, if a package is to be deployed to FPGAs in a fleet having five different SKUs, the minimum scan tree may include a selected set of triples. STAGE 1 may provide some coverage toward the selected diversity target. The selected diversity target may be the percentage of different types of SKUs that may receive the package as part of this planning stage of the deployment. Thus, in this example, as long as 80% of the different types of FPGAs receive the package, it may be sufficient to meet the selected diversity target. Assuming, as part of this example, that STAGE 1 includes only 50% of the different types of FPGAs, the planning module 504 can conclude that the selected diversity target is not met, and processing can continue with process STAGE B 1112. Alternatively, if STAGE 1 includes 80% of the different types of FPGAs, the planning module 504 can proceed to process STAGE A 1110.

Referring to FIG. 11B, if scanning the hardware in STAGE 1 does not satisfy the selected diversity target, flow may continue from process STAGE B 1112. Accordingly, step 1114 can include scanning hardware in STAGE 2 to determine whether the selected diversity target is satisfied. In this example, the planning module 504 may construct (or process an existing) minimum scan tree, as described with respect to FIG. 10. Similar to the FPGA example with respect to FIG. 11A, the planning module 504 can scan the hardware in STAGE 2 to determine whether the selected diversity target is satisfied. Thus, if the combination of the FPGA SKUs in STAGE 1 and STAGE 2 includes 80% of the FPGA SKUs, the planning module 504 can proceed to process STAGE A 1110. Otherwise, flow may proceed to step 1116.

Step 1116 may include scanning hardware in STAGE 3 to determine whether the selected diversity target is satisfied. In this example, the planning module 504 can build (or process an existing) minimum scan tree for STAGE 3 in a similar manner as described with respect to FIG. 10. Similar to the FPGA example with respect to FIG. 11A, the planning module 504 can scan the hardware in STAGE 3 to determine whether the selected diversity target is satisfied. Thus, if the combination of the FPGA SKUs in STAGE 1, STAGE 2, and STAGE 3 includes 80% of the FPGA SKUs, the planning module 504 can proceed to process STAGE A 1110. Otherwise, flow may proceed to step 1118.

Step 1118 may include continuing to scan additional stages until the selected diversity target is met or all remaining stages have been scanned. In one example, the selected diversity target may be selected based on the package type. Alternatively or additionally, the selected diversity target may be selected based on the impact type. Thus, for a particular package type, the selected diversity target may be 75% of the SKUs, while for another package type, the selected diversity target may be 90%.
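A minimal sketch of the cumulative diversity check across stages (assuming an 80% SKU-coverage target, as in the FPGA example above; the stage and SKU names are illustrative) could look like this:

```python
from typing import Dict, Set

def stages_needed(stage_skus: Dict[str, Set[str]], fleet_skus: Set[str],
                  target: float = 0.8) -> list:
    """Accumulate stages until the combined SKUs cover the diversity target."""
    covered: Set[str] = set()
    selected = []
    for stage, skus in stage_skus.items():          # e.g., STAGE 1, STAGE 2, ...
        covered |= (skus & fleet_skus)
        selected.append(stage)
        if len(covered) / len(fleet_skus) >= target:
            break
    return selected

fleet_fpga_skus = {"SKU1", "SKU2", "SKU3", "SKU4", "SKU5"}
stage_skus = {"STAGE 1": {"SKU1", "SKU2"},          # 40% coverage
              "STAGE 2": {"SKU3", "SKU4"},          # cumulative 80% coverage
              "STAGE 3": {"SKU5"}}
print(stages_needed(stage_skus, fleet_fpga_skus))   # ['STAGE 1', 'STAGE 2']
```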

Once the state of the fleet is determined and the minimum scan tree for the deployment of a particular package or group of packages is determined, processing can proceed to the next steps. These steps may include determining the speed of the deployment. In one example, the speed of deployment may be related to the number of gates included in the deployment. Each gate may correspond to a waiting period (e.g., a particular number of hours, days, or months) that specifies how long the deployment may be delayed after a given step of the deployment process. As an example, for a particular package to be deployed to a CPU, the deployment may be gated for 24 hours after deployment to the minimum scan tree; after 24 hours, the package can be deployed to the CPUs with the associated SKUs in the remainder of the fleet. In one example, a gate may specify a longer waiting period when the deployment involves devices processed via a control plane (e.g., control plane 230 of FIG. 2) relative to devices processed via a data plane (e.g., data plane 220 of FIG. 2).
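As an illustrative sketch only (the gate durations and step names below are placeholders, not values from the disclosure), a gated deployment schedule could be computed by adding each gate's waiting period before the next step:

```python
from datetime import datetime, timedelta

def gated_schedule(start: datetime, steps_with_gates: list) -> list:
    """Compute the earliest start time of each deployment step, given gate waits.

    steps_with_gates is a list of (step_name, wait_after_hours) pairs.
    """
    schedule, t = [], start
    for step, wait_hours in steps_with_gates:
        schedule.append((step, t))
        t += timedelta(hours=wait_hours)    # gate: delay before the next step
    return schedule

plan = [("deploy to minimum scan tree", 24),    # 24-hour gate after this step
        ("deploy to remaining CPUs with associated SKUs", 0)]
for step, when in gated_schedule(datetime(2020, 3, 26, 9, 0), plan):
    print(step, when.isoformat())
```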

In another example, the speed of deployment may be related to the impact of the deployment. Thus, the number of gates and the waiting periods specified by the gates may depend on the impact of the deployment on the fleet. By way of example, certain deployments may be characterized as having an impact, while other deployments may be characterized as having no impact. Deployments may also be characterized along a sliding scale between no impact and impact. This process can include the planning module 504 considering both the package type and the impact type of the package. The information corresponding to the impact, including the impact type and the package type, may be stored in a table in a database (e.g., deployment database 412 of FIG. 4).

FIG. 12 illustrates an impact table 1200 according to one example. The impact table 1200 may be used to keep track of the impact of package deployments on various devices in the fleet. As an example, the impact table 1200 may be used to classify the impact on a device into a plurality of impact types and to classify packages into a plurality of package types. In one example, impact table 1200 may be stored in deployment database 412 of FIG. 4. In this example, the impact table 1200 may include information organized into rows and columns, including impact types 1210 in the rows and package types 1240 in the columns. Impact types 1210 may include CPU pause 1212, storage pause 1214, network pause 1216, FPGA pause 1218, restart 1220, power increase 1222, performance degradation 1224, and thermal impact 1226. Package types 1240 may include microcode 1242, Unified Extensible Firmware Interface (UEFI)/Basic Input/Output System (BIOS) 1244, Baseboard Management Controller (BMC) 1246, Solid State Drive (SSD) 1248, Hard Disk Drive (HDD) 1250, and FPGA 1252. FPGA 1252 may further include two sub-types: FPGA platform 1254 and FPGA image 1256. Although impact table 1200 shows particular information organized in a particular manner, additional or less information may be included and organized in different ways. Further, the information in impact table 1200 may be included in other types of data structures, including linked lists or other structures. As another example, the information in the impact table may be distributed such that the impact information for each package may be included with the package as metadata or another data structure associated with the package.

With continued reference to FIG. 12, in this example, as shown in the table, deploying a package with microcode may cause the CPU to pause for less than X seconds (Xs); deploying a package with UEFI/BIOS may cause the CPU to pause for less than X seconds (Xs); and deploying a package to the FPGA platform may cause the CPU to pause for less than X seconds (Xs), where X is a number. Further, in this example, deploying a package to an SSD may result in a storage pause of Y seconds (Ys), while deploying a package to an HDD may result in a storage pause of Z seconds (Zs), where each of Y and Z is a number. Further, deploying a package to the FPGA platform may cause a network pause of less than F seconds (Fs). Deploying a package to the FPGA platform may also cause the FPGA to pause for less than P seconds (Ps). On the other hand, the impact of deploying a package to an FPGA image may be image-specific.

Still referring to FIG. 12, in this example, the impact table 1200 may include information regarding the types of packages that may cause a reboot. Thus, in this example, deploying a UEFI/BIOS package or deploying a package to the FPGA platform may always cause a reboot; however, deploying a package to an SSD or HDD may cause a reboot only some of the time. The impact table 1200 may also include information regarding power changes (e.g., increased power) based on the deployment of a particular package type. For microcode, UEFI/BIOS, SSD, and HDD package types, the power increase may be small; however, for the FPGA platform package type, the power increase may be moderate, while for FPGA images, the power increase may be image-specific. The impact table 1200 may also include information regarding performance changes (e.g., performance degradation) based on the deployment of a particular package type. For microcode, UEFI/BIOS, SSD, and HDD package types, the performance degradation may be small. The impact table 1200 may also include information regarding thermal impact (e.g., higher or lower thermal impact) based on the deployment of a particular package type. For microcode, UEFI/BIOS, and BMC package types, the thermal impact may be low.

Although the impact table 1200 contains information about specific package types and impact types, the impact table 1200 may contain specific information about additional or fewer package types and impact types. As an example, the impact table 1200 may include information regarding the impact of deploying packages to Network Interface Controllers (NICs), top-of-rack (TOR) switches, middle-of-row (MOR) switches, routers, Power Distribution Units (PDUs), and rack-level Uninterruptible Power Supply (UPS) systems.
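For illustration, a portion of impact table 1200 could be encoded as a nested mapping from package type to impact type; the entries below mirror the examples above, and the symbolic limits (X, Y, Z, F, P seconds) remain placeholders rather than concrete values:

```python
# Illustrative encoding of part of impact table 1200.
IMPACT_TABLE = {
    "microcode":     {"cpu_pause": "< X s", "power_increase": "small", "thermal_impact": "low"},
    "UEFI/BIOS":     {"cpu_pause": "< X s", "restart": "always", "power_increase": "small",
                      "thermal_impact": "low"},
    "BMC":           {"thermal_impact": "low"},
    "SSD":           {"storage_pause": "Y s", "restart": "sometimes", "power_increase": "small"},
    "HDD":           {"storage_pause": "Z s", "restart": "sometimes", "power_increase": "small"},
    "FPGA platform": {"cpu_pause": "< X s", "network_pause": "< F s", "fpga_pause": "< P s",
                      "restart": "always", "power_increase": "moderate"},
    "FPGA image":    {"power_increase": "image-specific"},
}

def impacts_for(package_type: str) -> dict:
    """Look up the expected impacts for a given package type."""
    return IMPACT_TABLE.get(package_type, {})

print(impacts_for("FPGA platform"))
```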

Fig. 13 illustrates a machine learning system 1300 according to one example. The machine learning system 1300 may include a processor 1302, I/O devices 1304, memory 1306, sensors 1310, a display 1320, and a network interface 1322, which may be interconnected via a bus system 1330. Bus system 1330 may be coupled to both a data plane (e.g., data plane 220 of fig. 2) and a control plane (e.g., control plane 230 of fig. 2) via a network, including a wired network and a wireless network. Processor 1302 may execute instructions stored in memory 1306. The memory 1306 may be any combination of non-volatile memory or volatile memory (e.g., flash, DRAM, SRAM, or other types of memory). The sensors 1310 may include telemetry or other types of sensors configured to detect and/or receive information (e.g., conditions associated with a device).

With continued reference to FIG. 13, sensors 1310 may include sensors configured to sense conditions associated with a CPU, memory or other storage components, FPGAs, motherboards, baseboard management controllers, and the like. The sensors 1310 may also include sensors configured to sense conditions associated with racks, chassis, fans, Power Supply Units (PSUs), and the like. The sensors 1310 may also include sensors configured to sense conditions associated with Network Interface Controllers (NICs), top-of-rack (TOR) switches, middle-of-row (MOR) switches, routers, Power Distribution Units (PDUs), rack-level Uninterruptible Power Supply (UPS) systems, and the like. The sensors 1310 may be implemented using a sensor API that may allow the sensors 1310 to receive information via the sensor API. Software configured to detect or listen for particular conditions or events may communicate, via the sensor API, any conditions associated with the devices being monitored by deployment and monitoring 400. Remote sensors or other telemetry devices incorporated within the data center to sense conditions associated with the components installed therein may sense such conditions and provide the information to the sensors 1310 or the processor 1302. Further, deployment and monitoring may also communicate data related to such events or conditions to the sensors 1310 or the processor 1302. As an example, any event or condition sensed by the sensors 410 of FIG. 4 may be provided to the processor 1302 as needed.

Display 1320 may be any type of display, such as an LCD, LED, or other type of display. Network interface 1322 may include a communication interface, such as an ethernet, cellular radio, bluetooth radio, UWB radio, or other type of wireless or wired communication interface. Although fig. 13 illustrates machine learning system 1300 as including a particular number of components arranged and coupled in a particular manner, it may include fewer or additional components arranged and coupled differently. Further, the functionality associated with the machine learning system 1300 may be distributed as desired.

Fig. 14 illustrates a memory 1400 (e.g., memory 1306 of fig. 13) including instructions and data for use by the machine learning system 1300, according to one example. In this example, the instructions may be organized in memory 1400 in blocks or modules comprising code, data, or both. In this example, memory 1400 may include a learning-based analyzer (LBA) 1410, training data 1420, machine learning (ML) models 1430, impact tables 1440, payload parameters 1450, package parameters 1460, deployment parameters 1470, and fleet parameters 1480. Although FIG. 14 shows instructions and data organized in a particular manner, the instructions and data may be combined or distributed in various ways.

With continued reference to fig. 14, the learning-based analyzer (LBA) 1410 may implement a supervised learning algorithm that may be trained based on input data and, once trained, may make predictions or issue directives based on that training. In this example, LBA 1410 may implement techniques such as linear regression, support vector machines (SVMs), random forests, gradient-boosted trees, and neural networks, each in a regression setting. Linear regression may include modeling the past relationship between independent variables and a dependent output variable. A neural network may include artificial neurons used to create an input layer, one or more hidden layers, and an output layer. Each layer may be encoded as a matrix or vector of weights, expressed in the form of coefficients or constants, that may have been obtained via offline training of the neural network. The neural network may be implemented as a Recurrent Neural Network (RNN), a Long Short Term Memory (LSTM) neural network, or a Gated Recurrent Unit (GRU). All of the information required by a supervised learning based model may be converted into vector representations corresponding to any of these techniques. Taking LSTM as an example, an LSTM network may comprise a sequence of repeating RNN layers or other types of layers. Each layer of the LSTM network may consume an input at a given time step (e.g., the state of the layer from a previous time step) and may produce a new set of outputs or states. In the case of using an LSTM, a single chunk of content may be encoded into a single vector or multiple vectors. As an example, a word or a combination of words (e.g., a phrase, sentence, or paragraph) may be encoded as a single vector. Each chunk may be encoded into an individual layer (e.g., a particular time step) of the LSTM network. An LSTM layer may be described using a set of equations, shown below:

$$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i)$$

$$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f)$$

$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)$$

$$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_t + b_o)$$

$$h_t = o_t \odot \tanh(c_t)$$

In this example, $\sigma$ denotes the logistic sigmoid function, $\odot$ denotes element-wise multiplication, $x_t$ is the input at time step $t$, and $i_t$, $f_t$, $o_t$, $c_t$, and $h_t$ are the input gate, forget gate, output gate, cell state, and hidden state, respectively. Within each LSTM layer, the inputs and hidden states may be processed using a combination of vector operations (e.g., dot product, inner product, or vector addition) and non-linear functions, as desired.
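For illustration only, the following sketch implements the layer equations above in NumPy. The class name, dimensions, and random initialization are assumptions made for the example and are not part of the disclosure.

```python
# Minimal NumPy sketch of the peephole LSTM equations above (illustrative only).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        w = lambda rows, cols: rng.normal(0.0, 0.1, (rows, cols))
        h, d = hidden_size, input_size
        # Input gate, forget gate, cell, and output gate parameters.
        self.W_xi, self.W_hi, self.W_ci, self.b_i = w(h, d), w(h, h), w(h, h), np.zeros(h)
        self.W_xf, self.W_hf, self.W_cf, self.b_f = w(h, d), w(h, h), w(h, h), np.zeros(h)
        self.W_xc, self.W_hc, self.b_c = w(h, d), w(h, h), np.zeros(h)
        self.W_xo, self.W_ho, self.W_co, self.b_o = w(h, d), w(h, h), w(h, h), np.zeros(h)

    def step(self, x_t, h_prev, c_prev):
        """Advance the layer by one time step, returning the new hidden and cell states."""
        i_t = sigmoid(self.W_xi @ x_t + self.W_hi @ h_prev + self.W_ci @ c_prev + self.b_i)
        f_t = sigmoid(self.W_xf @ x_t + self.W_hf @ h_prev + self.W_cf @ c_prev + self.b_f)
        c_t = f_t * c_prev + i_t * np.tanh(self.W_xc @ x_t + self.W_hc @ h_prev + self.b_c)
        o_t = sigmoid(self.W_xo @ x_t + self.W_ho @ h_prev + self.W_co @ c_t + self.b_o)
        h_t = o_t * np.tanh(c_t)
        return h_t, c_t

# Example use: process one eight-dimensional input for one time step.
cell = LSTMCell(input_size=8, hidden_size=16)
h, c = cell.step(np.ones(8), np.zeros(16), np.zeros(16))
```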

Although FIG. 14 depicts LBA 1410 as including instructions, the instructions could instead be encoded as hardware corresponding to an A/I processor. In that case, some or all of the functionality associated with the learning-based analyzer may be hard-coded or otherwise provided as part of the A/I processor. As an example, the A/I processor may be implemented using an FPGA with the requisite functionality.

The training data 1420 may be data intended for training a neural network model or a similar machine learning model. In one example, the training data 1420 may be used to train a machine learning model to minimize an error function associated with the deployment of packages. In one example, minimization of the error function may be achieved by obtaining user feedback on various payload and package parameters and determining appropriate weights for the convolution operations or other types of operations performed as part of machine-based learning. As an example, users in a test environment may be provided with a set of pre-selected mapping functions having known payload and package parameters and asked to select their preferred mapping function.

The ML models 1430 may include machine learning models that may be used as part of the machine learning system 1300. The ML models 1430 may include models created through a training process. In this example, the training data 1420 may include a target attribute, such as a selected diversity target for deploying a package. A suitable machine learning algorithm included as part of LBA 1410 may find patterns in the training data 1420 that map a given set of input parameters (e.g., payload parameters and package parameters) to the selected diversity target for deploying the package. In another example, the machine learning algorithm may find patterns in the training data 1420 that map the input parameters to a deployment classification. An example deployment classification may include at least two categories: impact or no impact. Other machine learning models may also be used. As an example, the training data 1420 may be used to train a machine learning model that maps input package types to any impact associated with the deployment of those packages. The impact may be represented in a form similar to that described with respect to the impact tables 1440. Thus, the impact tables 1440 may be similar or identical to the impact table 1200 of FIG. 12.
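As a hedged illustration only, the following sketch trains a small classifier that maps a few payload and package parameters to an "impact" or "no impact" label, in the spirit of training data 1420 and ML models 1430. The feature names, example rows, and labels are invented for the example and are not taken from the disclosure.

```python
# Illustrative sketch: map payload/package parameters to an impact classification.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer

# Hypothetical training rows: a subset of payload/package parameters -> label.
training_rows = [
    {"payload_type": "UEFI/BIOS", "requires_reboot": 1, "power_increase": "small"},
    {"payload_type": "microcode", "requires_reboot": 0, "power_increase": "small"},
    {"payload_type": "FPGA platform", "requires_reboot": 1, "power_increase": "moderate"},
    {"payload_type": "SSD", "requires_reboot": 0, "power_increase": "small"},
]
labels = ["impact", "no impact", "impact", "no impact"]

vectorizer = DictVectorizer(sparse=False)   # one-hot encodes categorical parameters
X = vectorizer.fit_transform(training_rows)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)

# Classify the expected impact of a newly submitted payload (hypothetical values).
new_payload = {"payload_type": "HDD", "requires_reboot": 0, "power_increase": "small"}
print(model.predict(vectorizer.transform([new_payload]))[0])
```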

Payload parameters 1450 may include parameters associated with the payload. In one example, the payload parameters may include the type of payload, the target SKUs for the payload, the degree of change caused by deploying the payload, any preconditions, any known impact, and the required deployment time. The payload parameters 1450 may be extracted from metadata associated with the payload or otherwise obtained through a submission process, as previously described.

Package parameters 1460 may include parameters associated with the package that includes the payload. In one example, the package parameters 1460 may include information related to the type of health monitoring included with the package. The package parameters 1460 may also include the package type and the gates and watchdogs required to deploy the package.

The deployment parameters 1470 may include information about the push plan. By way of example, the deployment parameters 1470 may include an evaluation of the conditions the targets require for deployment. These conditions may include whether any device resets, node reboots, node re-marshalling, power cycling, or disk reformatting are required. These parameters may be included as part of the instructions and/or metadata associated with the package.

The fleet parameters 1480 may include information about the entire fleet, or a subset of the fleet, that may be the target of the deployment. The fleet parameters can include information related to the item types (e.g., SKUs) associated with the data centers in the fleet or the subset of the fleet. This information may include the number of devices of each SKU. In addition, the fleet parameters 1480 may include additional details regarding the data centers included in the fleet or the subset of the fleet. As examples, the information about a data center may include location information, the AC voltage supply of the data center (e.g., 120 volts or 240 volts), and operator information (e.g., whether the data center is operated by the service provider or by a customer of the service provider). The deployment module 510 of fig. 5 can be used to evaluate the fleet parameters 1480. Some of the fleet parameters 1480 may be stored in the deployment database 412 of fig. 4.
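The following is a minimal sketch of how the parameter sets described above might be represented in code. The field names and types are assumptions made for illustration and are not the disclosed schema.

```python
# Hypothetical representations of payload parameters 1450, package parameters 1460,
# deployment parameters 1470, and fleet parameters 1480 (illustrative only).
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class PayloadParameters:            # cf. payload parameters 1450
    payload_type: str               # e.g., "UEFI/BIOS", "microcode", "FPGA image"
    target_skus: List[str]
    degree_of_change: str
    preconditions: List[str] = field(default_factory=list)
    known_impact: Optional[str] = None
    required_deployment_time_hours: Optional[float] = None

@dataclass
class PackageParameters:            # cf. package parameters 1460
    package_type: str
    health_monitoring: List[str]
    gates: List[str]
    watchdogs: List[str]

@dataclass
class DeploymentParameters:         # cf. deployment parameters 1470
    requires_device_reset: bool = False
    requires_node_reboot: bool = False
    requires_power_cycle: bool = False
    requires_disk_reformat: bool = False

@dataclass
class FleetParameters:              # cf. fleet parameters 1480
    sku_counts: Dict[str, int]      # SKU -> number of devices
    datacenter_locations: List[str]
    ac_voltage: int                 # e.g., 120 or 240
    operated_by_service_provider: bool = True
```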

The ML models 1430 may include a model trained to prioritize targets with minimal impact. Thus, in one example, the ML model may learn that, when nodes need to be restarted, empty nodes should be deployed to first, as they do not run any workloads. The ML models 1430 may also include a model that may be trained to receive as input parameters associated with the payload, the package, the deployment, and the fleet, and to determine whether some of the deployment steps can be performed in parallel. Further, the ML models 1430 may include a model that may be trained to receive as input parameters associated with the payload, the package, the deployment, and the fleet, and to determine the particular gates and watchdogs that may be needed during deployment to the fleet. Further, the ML models 1430 may include a model that may be trained to receive as input parameters associated with the payload, the package, the deployment, and the fleet, and to determine the type of health monitoring that should be included as part of the deployment of the package. Finally, other automatic feedback models may also be used. As an example, such automatic feedback models may not rely on machine learning; instead, they may rely on other feedback mechanisms to allow the automatic creation of packages for deployment or the automatic creation of deployment plans for deploying packages to the fleet. Regardless, in some cases, the automatic feedback model may itself be learned using a machine learning model, such as a reinforcement learning model.

FIG. 15 shows a flowchart 1500 of a method for creating a package including a payload for deployment to a set of devices, according to one example. Step 1502 may include receiving a payload, wherein the payload has an associated set of payload parameters related to deploying the payload to the set of devices. As previously described, the payload may be received via a submission portal or otherwise. The payload parameters may be the payload parameters 1450 explained with respect to fig. 14.

Step 1504 may include automatically creating, using a processor, a package for deployment to the set of devices, wherein the package includes instructions for deploying the payload to the set of devices, and wherein the instructions specify at least one of a plurality of operations derived from a machine learning model based at least on a subset of the associated set of payload parameters. In this example, processor 1302 may execute instructions stored in memory 1306 (e.g., instructions corresponding to learning-based analyzer 1410) to perform this step. The instructions for deploying the payload may specify operations such as the number of gates and/or watchdogs required for the deployment. The operations may relate to any deployment parameters (e.g., deployment parameters 1470 of FIG. 14) related to deploying the package to the set of devices. As an example, the operations may specify the deployment schedule and scope that make up the push plan. The operations may also include health monitoring information for the package deployment. The health monitoring information may include what is monitored and the trigger thresholds associated with the monitored information.
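A simplified sketch of this step appears below. It reuses the illustrative classifier and vectorizer from the earlier sketch, and the gate, watchdog, monitoring, and schedule fields are assumptions rather than the disclosed instruction format.

```python
# Illustrative sketch of step 1504: assemble a package whose deployment
# instructions embed operations suggested by a trained model.
def create_package(payload, payload_params, model, vectorizer):
    """Build an illustrative package dict with ML-derived deployment operations."""
    features = vectorizer.transform([payload_params])
    predicted_impact = model.predict(features)[0]   # "impact" or "no impact"
    has_impact = predicted_impact == "impact"

    instructions = {
        # Gates and watchdogs scale with the predicted impact (assumption).
        "gates": ["pre-deployment health check"] if has_impact else [],
        "watchdogs": ["reboot watchdog"] if has_impact else [],
        # Health monitoring: what to monitor and its trigger threshold.
        "monitoring": {"signal": "node_error_rate", "trigger_threshold": 0.02},
        # Schedule and scope forming part of the push plan.
        "schedule": {"window": "off-peak", "max_parallel_nodes": 50},
    }
    return {"payload": payload, "instructions": instructions}
```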

In one example, automatically creating a package for deployment to the set of devices may include processing metadata or other submission parameters associated with the payload. The machine learning model may be trained based on training data that includes a mapping between at least a subset of the associated set of payload parameters and a set of labels that classify an impact of deploying the payload to the set of devices. In one example, the set of labels may include a first label that classifies the impact as having impact and a second label that classifies the impact as having no impact. Any of the ML models 1430 described with respect to FIG. 14 can be trained and used as previously explained.

FIG. 16 illustrates a flow chart 1600 of a method for deploying a package to a fleet, according to one example. Step 1602 can include evaluating the fleet to determine a set of fleet parameters associated with deploying the package to the fleet. In this example, processor 1302 may execute instructions stored in memory 1306 (e.g., instructions corresponding to learning-based analyzer 1410) to perform this step. Evaluating the fleet can include processing metadata associated with the fleet. The metadata may include information regarding the use and composition of the fleet. The metadata may be stored in the deployment database 412 of FIG. 4. In one example, as previously explained with respect to fig. 11A and 11B, evaluating the fleet can also include scanning hardware associated with the fleet.

Step 1604 may include automatically creating, using the processor, a deployment plan for deploying the package to the fleet, wherein the deployment plan includes instructions for deploying the package to the fleet, and wherein the instructions specify at least one of a plurality of operations derived from the machine learning model based at least on a subset of the set of fleet parameters. In this example, processor 1302 may execute instructions stored in memory 1306 (e.g., instructions corresponding to learning-based analyzer 1410) to perform this step. The machine learning model may be trained based on training data that includes a mapping between at least a subset of the fleet parameters and at least one label associated with the deployment plan. Further, the machine learning model may be trained based on feedback regarding the deployment of packages to the fleet. The plurality of operations may include an action corresponding to monitoring the deployment of the package to the fleet. Thus, as previously described, the deployment monitor may monitor the deployment to the fleet. Additional details regarding the deployment monitor are provided with respect to deployment monitor 512 of FIG. 5. The operations may include information about what is monitored and the trigger thresholds associated with the monitored information. Further, the operations can include an action corresponding to generating information regarding a minimum spanning tree that includes a group of devices in the fleet. Additional details related to generating information about the minimum spanning tree are provided with respect to fig. 11A and 11B.
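The following sketch illustrates, under stated assumptions, how a deployment plan with a spanning-tree-based scan order might be derived from fleet parameters and a device topology graph. The use of networkx, the edge attribute name, and the plan fields are illustrative; fleet_params follows the hypothetical FleetParameters structure sketched earlier.

```python
# Illustrative sketch of step 1604: derive a deployment plan from fleet
# parameters and a device topology graph (assumed names and fields).
import networkx as nx

def create_deployment_plan(package, fleet_params, device_graph):
    """Build an illustrative deployment plan for rolling the package out to the fleet."""
    # One derived operation: a minimum spanning tree over the targeted devices,
    # usable as a minimal scan of the hardware before and during deployment.
    mst = nx.minimum_spanning_tree(device_graph, weight="scan_cost")

    return {
        "package": package,
        # Visit devices in a tree order so each link is scanned once.
        "scan_order": list(nx.dfs_preorder_nodes(mst)),
        # Monitoring operation: which signal to watch and when to halt the rollout.
        "monitoring": {"signal": "failed_node_count", "trigger_threshold": 3},
        # Scope derived from fleet parameters (e.g., restrict to known SKUs).
        "target_skus": list(fleet_params.sku_counts.keys()),
    }
```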

In one example, packages may be rolled out to the entire fleet in segments, with minimal impact on customer workloads. Thus, a package may first be deployed to empty nodes (e.g., nodes that do not host any workloads). Next, the package may be deployed to those nodes having a small number (e.g., two) of workloads (e.g., determined based on the count of containers or virtual machines hosted by the node). Next, the package may be deployed to those nodes that have a slightly higher workload, and so on. This can limit the blast radius and help contain any damage to customer workloads if the deployment interrupts hardware functionality.
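A minimal sketch of such a segmented rollout follows; the node names and workload counts are hypothetical.

```python
# Illustrative sketch: group nodes into rollout waves ordered by workload count,
# so empty nodes are deployed to first and busier nodes last.
from itertools import groupby

def plan_waves(nodes, workload_count):
    """Group nodes into waves ordered by how many workloads they host.

    nodes          -- iterable of node identifiers
    workload_count -- callable returning the container/VM count for a node
    """
    ordered = sorted(nodes, key=workload_count)
    return [list(group) for _, group in groupby(ordered, key=workload_count)]

# Hypothetical usage: counts would come from the fleet inventory in practice.
counts = {"node-a": 0, "node-b": 0, "node-c": 2, "node-d": 2, "node-e": 7}
waves = plan_waves(counts, counts.get)
# waves[0] -> empty nodes, waves[1] -> nodes with two workloads, then busier nodes.
for i, wave in enumerate(waves):
    print(f"wave {i}: {wave}")
```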

A logical reporting service may also be implemented to keep track of deployments in real time. The service may access data stored in the deployment database 412 of FIG. 4, as well as data stored in other sources, and automatically generate reports. The reporting service may be implemented using business analytics tools, such as a BI tool. Deployment dashboards may also be implemented.

Fig. 17 illustrates a deployment dashboard 1700 according to one example. The deployment dashboard 1700 includes information relating to the current state of the deployed fleet with respect to various package types. The information can be displayed and tracked in real time. Information relating to the current state of the fleet may be stored in the deployment database 412 of FIG. 4 and may be retrieved by the deployment monitor 512 of FIG. 5 and displayed via the deployment dashboard 1700. In this example, the deployment dashboard 1700 can include freshness information that indicates the current state of the deployed fleet with respect to a particular package type. In this example, the deployment dashboard 1700 may include meters that represent the current fleet status by package type. The deployment dashboard 1700 can also indicate whether a meter is associated with a control plane device or a data plane device. Further, the deployment dashboard 1700 may indicate whether the tracked deployment has impact or no impact. Thus, the meter 1710 may indicate the current status of the deployment of packages associated with the chassis manager. The meter 1720 may indicate the current status of the deployment of packages associated with the Power Supply Unit (PSU). The meter 1730 may indicate the current status of the deployment of packages related to the CPU microcode. The meter 1740 may indicate the current status of the deployment of packages related to the Baseboard Management Controller (BMC). The meter 1750 may indicate the current status of the deployment of packages related to the UEFI/BIOS. The meter 1760 may indicate the current status of the deployment of packages related to the HDD/SSD. Although fig. 17 illustrates a deployment dashboard 1700 having a particular number of meters organized in a particular manner, the deployment dashboard 1700 may include additional or fewer meters that may be organized differently. By way of example, the deployment dashboard 1700 may include meters for tracking other package types including, for example, FPGA platforms, FPGA images, Network Interface Controllers (NICs), top-of-rack (TOR) switches, middle-of-rack (MOR) switches, routers, Power Distribution Units (PDUs), and rack-level Uninterruptible Power Supply (UPS) systems.

In addition, other dashboards may be provided, including dashboards for tracking each active deployment. Each such dashboard may display deployment progress, including the current deployment rate and the projected completion time. In addition to active deployments, pending deployments may be displayed. For a pending deployment, the dashboard may include the status of the deployment, such as submitted, packaged, tested, waiting, aborted, or completed. Additional details regarding each deployment (active or pending) may be made available by a deployment monitor (e.g., deployment monitor 512 of fig. 5). In addition to the dashboards, the deployment monitor 512 can provide reports on key results, such as deployment safety, deployment time, detection efficiency, deployment impact, and deployment parallelism.

In summary, the present disclosure relates to a method for creating a package comprising a payload for deployment to a set of devices. The method may include receiving a payload, wherein the payload has an associated set of payload parameters related to deploying the payload to the set of devices. The method may also include automatically creating, using the processor, a package for deployment to the set of devices, wherein the package includes instructions for deploying the payload to the set of devices, and wherein the instructions specify at least one operation of a plurality of operations derived from the machine learning model based at least on a subset of the associated set of payload parameters.

Automatically creating the package for deployment to the set of devices may also include processing metadata or other submission parameters associated with the payload. The machine learning model may be trained based on training data that includes a mapping between at least a subset of the associated set of payload parameters and a set of labels that classify an impact of deploying the payload to the set of devices. The set of labels may include a first label that classifies the impact as having impact and a second label that classifies the impact as having no impact.

The plurality of operations may include actions relating to a schedule associated with deploying the package to the set of devices. The plurality of operations may include actions relating to a gate associated with deploying the package to the set of devices. The plurality of operations may include actions relating to a watchdog associated with deploying the package to the set of devices.

In another example, the present disclosure is directed to a method for deploying a package to a fleet. The method can include evaluating the fleet to determine a set of fleet parameters associated with deploying the package to the fleet. The method may also include automatically creating, using a processor, a deployment plan for deploying the package to the fleet, wherein the deployment plan includes instructions for deploying the package to the fleet, and wherein the instructions specify at least one operation of a plurality of operations derived from a machine learning model based at least on a subset of the set of fleet parameters.

Evaluating the fleet can include processing metadata associated with the fleet. The machine learning model may be trained based on training data that includes a mapping between at least a subset of the fleet parameters and at least one label associated with the deployment plan. The machine learning model may be trained based on feedback regarding the deployment of the package to the fleet.

The plurality of operations may include an action corresponding to monitoring the deployment of the package to the fleet. The plurality of operations may include an action corresponding to generating information regarding a minimum spanning tree that includes a group of devices in the fleet.

In yet another example, the present disclosure is directed to a system for deploying a package to a fleet. The system can be configured to evaluate the fleet to determine a set of fleet parameters associated with deploying the package to the fleet. The system may also be configured to automatically create, using a processor, a deployment plan for deploying the package to the fleet, wherein the deployment plan includes instructions for deploying the package to the fleet, and wherein the instructions specify at least one of a plurality of operations derived from a machine learning model based at least on a subset of the set of fleet parameters.

As part of evaluating the fleet, the system may also be configured to process metadata associated with the fleet. The machine learning model may be trained based on training data that includes a mapping between at least a subset of the fleet parameters and at least one label associated with the deployment plan. The machine learning model may be trained based on feedback regarding the deployment of the package to the fleet.

The plurality of operations may include an action corresponding to monitoring the deployment of the package to the fleet. The plurality of operations may include an action corresponding to generating information regarding a minimum spanning tree that includes a group of devices in the fleet.

In yet another example, the present disclosure is directed to a method for creating a package comprising a payload for deployment to a set of devices. The method may include receiving the payload, wherein the payload has an associated set of payload parameters related to deploying the payload to the set of devices. The method may also include automatically creating, using a processor, the package for deployment to the set of devices, wherein the package includes instructions for deploying the payload to the set of devices, and wherein the instructions specify at least one of a plurality of operations derived from an automated feedback model based at least on a subset of the associated set of payload parameters.

Automatically creating the package for deployment to the set of devices may include processing metadata or other submission parameters associated with the payload. The automated feedback model may include a reinforcement learning model trained based on training data that includes a mapping between at least a subset of the associated set of payload parameters and a set of labels that classify an impact of deploying the payload to the set of devices. The set of labels may include a first label that classifies the impact as having impact and a second label that classifies the impact as having less impact.

The plurality of operations may include actions related to a schedule associated with deploying the package to the set of devices. The plurality of operations may include actions related to a gate associated with deploying the package to the set of devices. The plurality of operations may include actions related to a watchdog associated with deploying the package to the set of devices.

It should be understood that the methods, modules, and components described herein are merely exemplary. Alternatively or additionally, the functions described herein may be performed, at least in part, by one or more hardware logic components. By way of example, and not limitation, illustrative types of hardware logic components that may be used include Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like. In an abstract, but still definite sense, any arrangement of components to achieve the same functionality is effectively "associated" such that the desired functionality is achieved. Hence, any two components herein combined to achieve a particular functionality can be seen as "associated with" each other such that the desired functionality is achieved, irrespective of architectures or intermedial components. Likewise, any two components so associated can also be viewed as being "operably connected," or "coupled," to each other to achieve the desired functionality.

The functionality associated with some examples described in this disclosure may also include instructions stored in a non-transitory medium. The term "non-transitory medium" as used herein refers to any medium that stores data and/or instructions that cause a machine to operate in a specific manner. Exemplary non-transitory media include non-volatile media and/or volatile media. Non-volatile media includes, for example, hard disk, solid state drive, magnetic disk or tape, optical disk or tape, flash memory, EPROM, NVRAM, PRAM, or other such media, or a networked version of such media. Volatile media include, for example, dynamic memory, such as DRAM, SRAM, cache, or other such media. The non-transitory medium is different from, but may be used in conjunction with, a transmission medium. Transmission media are used to transmit data and/or instructions to and from a machine. Exemplary transmission media include coaxial cables, fiber optic cables, copper wire and wireless media such as the airwaves.

Furthermore, those skilled in the art will recognize that the boundaries between the functionality of the above described operations are merely illustrative. The functionality of multiple operations may be combined into a single operation, and/or the functionality of a single operation may be distributed across additional operations. Moreover, alternative embodiments may include multiple instances of a particular operation, and the order of operations may be altered in various other embodiments.

While the present disclosure provides specific examples, various modifications and changes may be made without departing from the scope of the present disclosure as set forth in the claims below. Accordingly, the specification and figures are to be regarded in an illustrative rather than a restrictive sense, and all such modifications are intended to be included within the scope of the present disclosure. Any benefits, advantages, or solutions to problems that are described herein with regard to specific examples are not intended to be construed as a critical, required, or essential feature or element of any or all of the claims.

Furthermore, the terms "a" or "an," as used herein, are defined as one or more than one. Furthermore, the use of introductory phrases such as "at least one" and "one or more" in the claims should not be construed to imply that the introduction of an element by the indefinite articles "a" or "an" limits any particular claim containing such introduced claim element to inventions containing only one such element, even if the same claim includes the introductory phrases "one or more" or "at least one" and indefinite articles such as "a" or "an". The same holds true for the use of definite articles.

Unless otherwise specified, terms such as "first" and "second" are used to arbitrarily distinguish between the elements such terms describe. Thus, these terms are not necessarily intended to indicate temporal or other prioritization of such elements.
