Container bare metal server and method and system for coping with physical environment risks thereof

Document No.: 1963626 Publication date: 2021-12-14

Reading note: This technology, Container bare metal server and method and system for coping with physical environment risks thereof, was designed by 苏令浩, 刘世奇, 李洋, 颜开 and 郭峰 on 2021-09-17. Its main content is as follows: The application provides a container bare metal server and a method and a system for coping with physical environment risks thereof. The container bare metal server is a first node of a cloud native platform, and the method comprises: an agent application collects and summarizes physical environment information of the container bare metal server to obtain monitoring index data, and a monitoring alarm module on a second node determines a warning type according to the monitoring index data sent by the agent application. When the monitoring alarm module issues a physical environment high-risk warning, a control application on the second node marks the container bare metal server as unavailable, and the cloud native platform schedules the container groups on the container bare metal server to healthy nodes of the cloud native platform. In this way, the container bare metal server can sense dangers in its surrounding physical environment in time. Once a danger that could seriously damage the server is found, the applications running on it are rescheduled, ensuring that they can continue to provide services safely and normally on the cloud native platform.

1. A method for dealing with physical environment risks of a container bare metal server, wherein the container bare metal server is a first node of a cloud native platform, and the method comprises the following steps:

an agent application collects and summarizes physical environment information of the container bare metal server to obtain monitoring index data, wherein the agent application is deployed on the container bare metal server;

a monitoring alarm module determines a warning type according to the monitoring index data sent by the agent application;

in response to a physical environment high-risk warning issued by the monitoring alarm module, a control application marks the container bare metal server as unavailable, wherein the monitoring alarm module and the control application are both deployed on a second node of the cloud native platform;

the cloud native platform schedules the container groups on the container bare metal server to healthy nodes of the cloud native platform.

2. The method for coping with physical environment risks of the container bare metal server according to claim 1, wherein the agent application collecting and summarizing the physical environment information of the container bare metal server to obtain the monitoring index data comprises:

a sensor periodically collects physical environment information of the container bare metal server and sends the physical environment information to the agent application, wherein the sensor is disposed on the container bare metal server;

the agent application receives the physical environment information collected by the sensor and summarizes it into the monitoring index data.

3. The method for addressing physical environmental risk of container bare metal servers according to claim 2, wherein the sensor comprises at least one of a temperature detection sensor, a smoke detection sensor, and an image sensor;

correspondingly, the monitoring index data comprises at least one of temperature data, smoke data and image data.

4. The method for coping with physical environment risks of the container bare metal server according to claim 1, wherein the monitoring alarm module comprises a monitoring unit and an alarm unit, and the monitoring alarm module determining the warning type according to the monitoring index data sent by the agent application comprises:

the monitoring unit analyzes the monitoring index data according to a preset risk type determination rule, determines the type of the physical environment risk, and generates a corresponding warning;

the alarm unit sends different types of warnings to different applications.

5. The method for coping with physical environment risks of the container bare metal server according to claim 4, wherein the monitoring unit analyzes the monitoring index data by using a Prometheus system to determine the type of the physical environment risk;

correspondingly,

the agent application collects and summarizes the physical environment information of the container bare metal server by using a Prometheus Exporter to obtain the monitoring index data;

the alarm unit sends different types of warnings to different applications by using an AlertManager module.

6. The method for coping with physical environment risks of the container bare metal server according to claim 1, wherein the control application marking the container bare metal server as unavailable in response to the physical environment high-risk warning issued by the monitoring alarm module comprises:

in response to the physical environment high-risk warning issued by the monitoring alarm module, the control application accesses an API-Server of the cloud native platform to mark the container bare metal server as unavailable;

the control application accesses the ETCD through the API-Server of the cloud native platform to obtain all container groups on the container bare metal server, wherein the container groups comprise at least one of a daemon container group, a mirror container group, and an application container group;

the control application deletes the application container groups on the container bare metal server.

7. The method for coping with physical environment risks of the container bare metal server according to claim 6, wherein the control application deleting the application container groups on the container bare metal server specifically comprises:

the control application, through the API-Server of the cloud native platform, controls the Kubelet component on the container bare metal server to delete the application container groups.

8. The method for coping with physical environment risks of the container bare metal server according to any one of claims 1 to 7, wherein after the cloud native platform schedules the container groups on the container bare metal server to healthy nodes of the cloud native platform, the method further comprises:

the monitoring alarm module determines, according to the monitoring index data sent by the agent application, that the warning is cancelled;

in response to physical environment danger-cleared information issued by the monitoring alarm module, the control application removes the unavailable mark from the container bare metal server.

9. A system for coping with physical environment risks of a container bare metal server, the container bare metal server being a first node of a cloud native platform, the system comprising:

a collection unit configured to have an agent application collect and summarize physical environment information of the container bare metal server to obtain monitoring index data, wherein the agent application is deployed on the container bare metal server;

a monitoring unit configured to have a monitoring alarm module determine a warning type according to the monitoring index data sent by the agent application;

a marking unit configured to have a control application mark the container bare metal server as unavailable in response to a physical environment high-risk warning issued by the monitoring alarm module, wherein the monitoring alarm module and the control application are both deployed on a second node of the cloud native platform;

a scheduling unit configured to schedule the container groups on the container bare metal server to healthy nodes of the cloud native platform.

10. A container bare metal server, applied to the method for coping with physical environment risks of the container bare metal server according to any one of claims 1 to 8, wherein an agent application is deployed on the container bare metal server, the container bare metal server further comprising:

a sensor disposed on the container bare metal server and configured to monitor physical environment information of the container bare metal server and send the physical environment information to the agent application, so that the agent application summarizes the physical environment information into monitoring index data.

Technical Field

The application relates to the field of cloud native technology, and in particular to a container bare metal server and a method and a system for coping with physical environment risks thereof.

Background

A virtualization server runs a virtualization platform on a physical server, deploys virtual machines on it, and runs an operating system in each virtual machine. A container bare metal server, by contrast, deploys neither a virtualization platform nor virtual machines on the physical server; containers run directly on the physical server, which avoids the performance loss, mutual interference between virtual machines, and similar problems caused by virtualization.

As an option that meets performance requirements while reducing operation and maintenance costs, the container bare metal server is increasingly widely used in data centers and cloud services. However, the existing node exception scheduling mechanism of the cloud native platform reschedules the container groups on a node and redeploys them on healthy nodes in the cluster only after that node has already become unavailable. While this rescheduling is under way and not yet complete, the surviving nodes come under considerable pressure, and high availability of the service is difficult to guarantee.

Therefore, how to ensure that the application can run on the container bare metal server safely and reliably has become a very important issue for enterprises.

Disclosure of Invention

The present application is directed to a container bare metal server and a method and a system for dealing with physical environment risks thereof, so as to solve or alleviate the problems in the prior art.

In order to achieve the above purpose, the present application provides the following technical solutions:

The application provides a method for coping with physical environment risks of a container bare metal server, wherein the container bare metal server is a first node of a cloud native platform, and the method comprises: an agent application collects and summarizes physical environment information of the container bare metal server to obtain monitoring index data, wherein the agent application is deployed on the container bare metal server; a monitoring alarm module determines a warning type according to the monitoring index data sent by the agent application; in response to a physical environment high-risk warning issued by the monitoring alarm module, a control application marks the container bare metal server as unavailable, wherein the monitoring alarm module and the control application are both deployed on a second node of the cloud native platform; the cloud native platform schedules the container groups on the container bare metal server to healthy nodes of the cloud native platform.

Preferably, the agent application collecting and summarizing the physical environment information of the container bare metal server to obtain the monitoring index data comprises: a sensor periodically collects physical environment information of the container bare metal server and sends it to the agent application, wherein the sensor is disposed on the container bare metal server; the agent application receives the physical environment information collected by the sensor and summarizes it into the monitoring index data.

Preferably, the sensor includes at least one of a temperature detection sensor, a smoke detection sensor, and an image sensor; correspondingly, the monitoring index data comprises at least one of temperature data, smoke data and image data.

Preferably, the monitoring alarm module comprises a monitoring unit and an alarm unit, and the monitoring alarm module determining the warning type according to the monitoring index data sent by the agent application comprises: the monitoring unit analyzes the monitoring index data according to a preset risk type determination rule, determines the type of the physical environment risk, and generates a corresponding warning; the alarm unit sends different types of warnings to different applications.

Preferably, the monitoring unit analyzes the monitoring index data by using a Prometheus system to determine the type of the physical environment risk; correspondingly, the agent application collects and summarizes the physical environment information of the container bare metal server by using a Prometheus Exporter to obtain the monitoring index data; the alarm unit sends different types of warnings to different applications by using an AlertManager module.

Preferably, the control application marking the container bare metal server as unavailable in response to the physical environment high-risk warning issued by the monitoring alarm module comprises: in response to the physical environment high-risk warning issued by the monitoring alarm module, the control application accesses an API-Server of the cloud native platform to mark the container bare metal server as unavailable; the control application accesses the ETCD through the API-Server of the cloud native platform to obtain all container groups on the container bare metal server, wherein the container groups comprise at least one of a daemon container group, a mirror container group, and an application container group; the control application deletes the application container groups on the container bare metal server.

Preferably, the control application deleting the application container groups on the container bare metal server specifically comprises: the control application, through the API-Server of the cloud native platform, controls the Kubelet component on the container bare metal server to delete the application container groups.

Preferably, after the cloud native platform schedules the container groups on the container bare metal server to healthy nodes of the cloud native platform, the method further comprises: the monitoring alarm module determines, according to the monitoring index data sent by the agent application, that the warning is cancelled; in response to physical environment danger-cleared information issued by the monitoring alarm module, the control application removes the unavailable mark from the container bare metal server.
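The marking, container group filtering, and unmarking steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the applicant's implementation: it assumes a Kubernetes-style platform in which "unavailable" is expressed through the Node field `spec.unschedulable`, daemon container groups are those owned by a `DaemonSet`, and mirror container groups carry the `kubernetes.io/config.mirror` annotation. The function names are hypothetical, and no request is actually sent to the API-Server.

```python
import json
from typing import Any, Dict, List

Pod = Dict[str, Any]  # a container group as returned by the API-Server (JSON)


def cordon_patch(unschedulable: bool) -> str:
    """Build a merge-patch body for the Node object.

    Assumption: marking a node unavailable means setting spec.unschedulable.
    True marks the node, False removes the mark once the danger is cleared.
    """
    return json.dumps({"spec": {"unschedulable": unschedulable}})


def is_daemon_pod(pod: Pod) -> bool:
    """A daemon container group is owned by a DaemonSet."""
    owners = pod.get("metadata", {}).get("ownerReferences") or []
    return any(owner.get("kind") == "DaemonSet" for owner in owners)


def is_mirror_pod(pod: Pod) -> bool:
    """A mirror container group carries the config.mirror annotation."""
    annotations = pod.get("metadata", {}).get("annotations") or {}
    return "kubernetes.io/config.mirror" in annotations


def application_pods(pods: List[Pod]) -> List[Pod]:
    """Keep only the container groups the control application would delete."""
    return [p for p in pods if not is_daemon_pod(p) and not is_mirror_pod(p)]
```

Only the application container groups are selected for deletion, so that the platform recreates them on healthy nodes while node-local daemon and mirror container groups stay in place.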

An embodiment of the present application further provides a system for coping with physical environment risks of a container bare metal server, wherein the container bare metal server is a first node of a cloud native platform, and the system comprises: a collection unit configured to have an agent application collect and summarize physical environment information of the container bare metal server to obtain monitoring index data, wherein the agent application is deployed on the container bare metal server; a monitoring unit configured to have a monitoring alarm module determine a warning type according to the monitoring index data sent by the agent application; a marking unit configured to have a control application mark the container bare metal server as unavailable in response to a physical environment high-risk warning issued by the monitoring alarm module, wherein the monitoring alarm module and the control application are both deployed on a second node of the cloud native platform; a scheduling unit configured to schedule the container groups on the container bare metal server to healthy nodes of the cloud native platform.

An embodiment of the present application further provides a container bare metal server, which is applied to the method for coping with physical environment risks of the container bare metal server according to any of the above embodiments, wherein an agent application is deployed on the container bare metal server, and the container bare metal server further comprises: a sensor disposed on the container bare metal server and configured to monitor physical environment information of the container bare metal server and send it to the agent application, so that the agent application summarizes the physical environment information into monitoring index data.

Compared with the closest prior art, the technical scheme of the embodiment of the application has the following beneficial effects:

In the technical solution provided by the embodiments of the application, the container bare metal server is a first node of a cloud native platform. An agent application deployed on the container bare metal server collects and summarizes the server's physical environment information in real time to obtain its monitoring index data, and a monitoring alarm module deployed on a second node of the cloud native platform then determines a warning type according to the monitoring index data. If the monitoring alarm module issues a physical environment high-risk warning, the control application deployed on the second node marks the container bare metal server as unavailable. Finally, the cloud native platform schedules the container groups on the container bare metal server to healthy nodes of the cloud native platform. The cloud native platform can therefore sense the external physical environment of the container bare metal server in real time and prepare accordingly before a danger in that environment materializes. Once a danger that could seriously damage the container bare metal server is found in the external physical environment, the applications running on the server are rescheduled in time, ensuring that they can continue to provide services safely and normally on the cloud native platform.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application. In the drawings:

Fig. 1 is a schematic flow chart of a method for coping with physical environment risks of a container bare metal server according to some embodiments of the present application;

Fig. 2 is a schematic diagram of monitoring with a Prometheus system deployed on a control node in a Kubernetes cluster according to some embodiments of the present application;

Fig. 3 is a schematic diagram of monitoring with a Prometheus system deployed on an alarm node in a Kubernetes cluster according to some embodiments of the present application;

Fig. 4 is a schematic diagram of monitoring with a Zabbix server component deployed on a control node in a Kubernetes cluster according to some embodiments of the present application;

Fig. 5 is a schematic diagram of monitoring with a Zabbix server component deployed on an alarm node in a Kubernetes cluster according to some embodiments of the present application;

Fig. 6 is a schematic diagram of monitoring with a Judge component deployed on a control node in a Kubernetes cluster according to some embodiments of the present application;

Fig. 7 is a schematic diagram of monitoring with a Judge component deployed on an alarm node in a Kubernetes cluster according to some embodiments of the present application;

Fig. 8 is a logic diagram of the control application scheduling container groups according to some embodiments of the present application;

Fig. 9 is a schematic structural diagram of a system for coping with physical environment risks of a container bare metal server according to some embodiments of the present application.

Detailed Description

The present application will be described in detail below with reference to the embodiments with reference to the attached drawings. The various examples are provided by way of explanation of the application and are not limiting of the application. In fact, it will be apparent to those skilled in the art that modifications and variations can be made in the present application without departing from the scope or spirit of the application. For instance, features illustrated or described as part of one embodiment, can be used with another embodiment to yield a still further embodiment. It is therefore intended that the present application cover the modifications and variations of this invention provided they come within the scope of the appended claims and their equivalents.

In the cloud native era, in order to improve application access and user experience of an enterprise data center, an enterprise deploys and manages containerized applications based on a Kubernetes platform, the Kubernetes platform can take a physical server or a virtual machine as a node to be managed, and after the containerized applications are deployed on a certain node in the platform, the Kubernetes platform can automatically schedule the applications deployed on the platform according to the condition of the node.

As a new type of server distinct from the virtualization server, the container bare metal server removes the virtualization platform and the virtual machines and runs containers directly on the physical server. This avoids the performance loss, mutual interference between virtual machines, and similar problems caused by virtualization, so performance requirements can be met while operation and maintenance costs are reduced.

However, using the container bare metal server as a node of the Kubernetes platform also brings new problems. As container bare metal servers become more common, an important question is how to ensure that the applications running on them do not suffer catastrophic damage when the server itself is physically damaged. The Kubernetes platform can only monitor the running state of a node through the Kubelet component on that node. When something goes wrong in the external physical environment, the platform cannot detect it and respond in time, so the hardware performance of the container bare metal server is degraded or the server is even physically damaged, and application performance and data suffer as a result.

For example, when the air conditioner in the machine room where the container bare metal server is located fails for some unpredictable reason, the temperature of the server's external physical environment gradually rises, but the Kubernetes platform cannot monitor the external physical environment of the node and therefore cannot take corresponding countermeasures.

Once the temperature rises to a certain value, heat dissipation of the container bare metal server is impaired, and the server reduces its heat output by automatically lowering its operating frequency, which degrades the performance of the server and of the applications deployed on it. Continuous operation in a high-temperature environment also shortens the service life of the server.

Moreover, the machine room temperature may rise not only because of an air conditioner failure but also because of a fire in the machine room. Once a fire breaks out, the container bare metal server is physically damaged and the applications running on it suffer a devastating blow.

To solve the above problems, the applicant proposes a method for coping with physical environment risks of a container bare metal server. In the embodiments of the application, the container bare metal server serves as a working node of the cloud native platform, namely the first node, on which cloud native applications are deployed to provide services externally.

Fig. 1 is a schematic flow chart of a method for coping with physical environment risks of a container bare metal server according to some embodiments of the present application. As shown in Fig. 1, the method comprises:

Step S101: the agent application collects and summarizes physical environment information of the container bare metal server to obtain monitoring index data, wherein the agent application is deployed on the container bare metal server.

Current cloud native platforms (for example, a Kubernetes cluster) judge the health state of each node that has joined the cluster through a heartbeat mechanism. Specifically, the Kubelet on a working node periodically (every 10 seconds) synchronizes the node's state information to the control node, and the Kubernetes orchestration engine periodically (every 5 seconds) checks the state information synchronized by the node's Kubelet. If a node has not synchronized any state information within a certain time range (40 seconds), the control node considers the node to be unavailable. This approach can only monitor the running state of the container groups on a node through the Kubelet component on that working node. It cannot detect an anomaly at the moment a problem first appears in the external physical environment, and so cannot take corresponding measures. As a result, dangers in the external physical environment degrade the hardware performance of the container bare metal server or even damage it physically, and the performance and data of the applications deployed on it are harmed.
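To make the timing of this heartbeat mechanism concrete, the following is a minimal sketch. The three intervals are the ones given above; the `NodeHeartbeat` class and the function names are hypothetical, and the strict "more than 40 seconds" comparison is an assumption, since the text does not say how the boundary is treated.

```python
from dataclasses import dataclass

KUBELET_SYNC_PERIOD_S = 10   # Kubelet reports node state every 10 s
CHECK_PERIOD_S = 5           # orchestration engine checks every 5 s
UNAVAILABLE_AFTER_S = 40     # no report for this long -> node unavailable


@dataclass
class NodeHeartbeat:
    name: str
    last_sync_s: float  # time of the most recent state report, in seconds


def is_unavailable(node: NodeHeartbeat, now_s: float) -> bool:
    """Return True when the node has been silent longer than the window."""
    return now_s - node.last_sync_s > UNAVAILABLE_AFTER_S


def first_detection_time(node: NodeHeartbeat, start_s: float = 0.0) -> float:
    """Step through the periodic checks until the node is first flagged.

    Assumes the node never reports again after last_sync_s.
    """
    now_s = start_s
    while not is_unavailable(node, now_s):
        now_s += CHECK_PERIOD_S
    return now_s
```

In this sketch a node whose last report arrived at t = 0 is first flagged at the t = 45 check. Nothing in the check looks at temperature, smoke, or images, which is the gap the method addresses.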

In some optional embodiments, when the agent application collects and summarizes the physical environment information of the container bare metal server to obtain the monitoring index data, a sensor arranged on the container bare metal server periodically collects the physical environment information of the server and sends it to the agent application; after receiving the physical environment information collected by the sensor, the agent application summarizes it into monitoring index data.

In the embodiment of the application, the physical environment information around the container bare metal server is collected by arranging various sensors on the container bare metal server, converted into corresponding sensor information and sent to the agent application.

In the embodiments of the application, an agent (Agent) application is deployed in containerized form on every container bare metal server in the cloud native platform. Through hardware information collection tools (such as Redfish and ipmitool), the Agent application receives the sensor information and device information sent by the sensors on the node and collects them into monitoring index data (metrics).
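One way the collection into metrics could look is sketched below. It is only an illustration: the readings are passed in as plain dictionaries rather than fetched through Redfish or ipmitool, averaging over one collection period is an assumed way of summarizing, and the metric names with the `bare_metal_env_` prefix are hypothetical. The output follows the Prometheus text exposition format, which is relevant to the Prometheus-based embodiment described later.

```python
from typing import Dict, List


def summarize(readings: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each sensor value over the readings of one collection period."""
    totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for reading in readings:
        for key, value in reading.items():
            totals[key] = totals.get(key, 0.0) + value
            counts[key] = counts.get(key, 0) + 1
    return {key: totals[key] / counts[key] for key in totals}


def render_metrics(node: str, summary: Dict[str, float]) -> str:
    """Render the summary as gauge metrics in Prometheus text format."""
    lines = []
    for name in sorted(summary):
        metric = f"bare_metal_env_{name}"  # hypothetical metric name
        lines.append(f"# TYPE {metric} gauge")
        lines.append(f'{metric}{{node="{node}"}} {summary[name]}')
    return "\n".join(lines) + "\n"
```

A caller would pass the readings gathered during one period to `summarize` and serve the string returned by `render_metrics` to the monitoring side.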

In a specific example, the sensor includes at least one of a temperature detection sensor, a smoke detection sensor, and an image sensor, and the monitoring index data includes at least one of temperature data, smoke data, and image data.

In the embodiments of the application, physical sensors arranged on the container bare metal server monitor parameters such as the temperature and smoke concentration of the physical environment in which the server is located. In addition, the surroundings of the server are monitored by long-term imaging from a fixed position, and a difference operation is performed on consecutively captured pictures; if the difference exceeds a preset range, abnormal bright light is considered to be present. The physical environment of the container bare metal server is thus monitored and judged along several different dimensions, which effectively avoids misjudging the physical environment and improves the accuracy of the response.
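The difference operation on consecutive pictures can be illustrated with a minimal sketch. The text does not specify the operation, so this assumes grayscale frames of equal size represented as nested lists, uses the mean absolute pixel difference as the measure, and picks an arbitrary threshold of 40; all names are hypothetical.

```python
from typing import List

Frame = List[List[int]]  # grayscale picture, pixel values 0-255


def mean_abs_difference(previous: Frame, current: Frame) -> float:
    """Mean absolute per-pixel difference between two same-sized frames."""
    total = 0
    pixels = 0
    for prev_row, cur_row in zip(previous, current):
        for prev_px, cur_px in zip(prev_row, cur_row):
            total += abs(cur_px - prev_px)
            pixels += 1
    return total / pixels if pixels else 0.0


def has_abnormal_light(previous: Frame, current: Frame,
                       threshold: float = 40.0) -> bool:
    """Flag abnormal bright light when the difference exceeds the threshold."""
    return mean_abs_difference(previous, current) > threshold
```

Because the camera position is fixed, an unchanged scene yields a difference near zero, so a large value points to a new light source rather than camera movement.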

Step S102: the monitoring alarm module determines the warning type according to the monitoring index data sent by the agent application.

In the embodiment of the application, the monitoring alarm module judges the physical environment of the container bare metal server according to the monitoring index data and determines the alarm type. In some optional embodiments, the monitoring alarm module includes a monitoring unit and an alarm unit, the monitoring unit and the alarm unit are respectively deployed on the second node of the cloud native platform in a containerization manner, and the monitoring unit analyzes the monitoring index data according to a preset risk type determination rule, determines the type of the physical environment risk, and generates a corresponding alarm; the alarm unit sends different types of alerts to different applications.

It should be understood that the monitoring unit should analyze the monitoring index data sent by the agent application several times according to the preset risk type determination rule before determining the type of the physical environment risk, so that an erroneous reading in the monitoring index data does not lead to a wrong determination.
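One simple way to realize this repeated analysis is to require several consecutive samples over a threshold before a risk is reported. The sketch below is an assumption about how that could be done, not a rule stated in the application; the class name and the default of three samples are hypothetical.

```python
from collections import deque


class ConsecutiveBreachDetector:
    """Report a breach only after `required` consecutive samples exceed it."""

    def __init__(self, threshold: float, required: int = 3) -> None:
        self.threshold = threshold
        self.required = required
        # Remember only the outcome of the most recent `required` samples.
        self._recent = deque(maxlen=required)

    def observe(self, value: float) -> bool:
        """Record one sample and return True if the breach is confirmed."""
        self._recent.append(value >= self.threshold)
        return len(self._recent) == self.required and all(self._recent)
```

With a threshold of 45 and three required samples, the sequence 50, 30, 50, 50, 50 is confirmed only on the last sample, so the isolated first reading does not trigger a warning.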

For example, when the temperature data in the metrics rises, the air conditioner of the machine room may have failed, or an abnormal heat source may have appeared in the surrounding environment. When the smoke concentration in the metrics rises, a large amount of smoke is present in the surrounding environment; someone may be smoking in the machine room, or a fire may have started nearby. When the image data in the metrics shows abnormal bright light, there is an abnormal light source in the surrounding environment, possibly because the surroundings are on fire.

When any of these conditions occurs, it can be determined that the physical environment of the container bare metal server is abnormal, and the machine room maintenance personnel need to be notified immediately. The monitoring unit generates an abnormality warning, the alarm unit sends it to the notification application, and the machine room maintenance personnel are notified by various means such as short message, telephone, mail, and alarm. In addition, the monitoring unit can determine the type of risk by analyzing the metrics comprehensively.

If the temperature data continuously rises to the preset temperature threshold, the smoke concentration continuously rises to the preset concentration threshold, and abnormal light exists in the surrounding environment, the monitoring unit can determine that the type of the physical environment risk is a fire hazard and belongs to a high-risk, an application running on the container bare metal server needs to be dispatched to a healthy node on the cloud native platform, the monitoring unit generates a physical environment high-risk warning after determining that the type of the risk is the high-risk, the warning unit sends the physical environment high-risk warning to a control application (Controller application), and the Controller application dispatches the application running on the container bare metal server to the healthy node on the cloud native platform.

If only the temperature data keeps rising, and the rising external temperature affects the performance of the container bare metal server, the monitoring unit can determine that the risk is slight, and the cloud native platform should be prevented from deploying new applications to the container bare metal server. After determining that the risk type is slight, the monitoring unit generates a physical environment slight warning, the warning unit sends it to the rejection application, and the rejection application prevents the cloud native platform from deploying new applications to the container bare metal server.

If only the smoke concentration keeps rising, the monitoring unit can determine that the risk type is medium risk. After making that determination, the monitoring unit generates a physical environment medium warning, the warning unit sends it to the warning application, and the warning application prevents the cloud native platform from deploying new applications to the container bare metal server and immediately notifies the personnel responsible for the building where the server is located.
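The classification rules described above can be sketched in Python as follows. This is only an illustrative sketch: the names `Risk`, `Readings` and `classify` are hypothetical, the three boolean fields stand in for "keeps rising to the preset threshold" and "abnormal light present", the performance check of the slight-risk case is omitted, and the treatment of combinations that the description does not cover (reported here as a plain abnormality) is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Risk(Enum):
    NONE = "none"
    ABNORMAL = "abnormal"  # some anomaly, but no specific rule matched (assumption)
    SLIGHT = "slight"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass
class Readings:
    temperature_rising: bool  # temperature kept rising to its preset threshold
    smoke_rising: bool        # smoke concentration kept rising to its preset threshold
    abnormal_light: bool      # abnormal bright light found in the image data


def classify(r: Readings) -> Risk:
    """Map one set of analysed readings to a risk type."""
    signals = (r.temperature_rising, r.smoke_rising, r.abnormal_light)
    if all(signals):
        return Risk.HIGH      # treated as a fire
    if signals == (True, False, False):
        return Risk.SLIGHT    # only the temperature is rising
    if signals == (False, True, False):
        return Risk.MEDIUM    # only the smoke concentration is rising
    if any(signals):
        return Risk.ABNORMAL  # combination not covered by the description
    return Risk.NONE
```

For instance, `classify(Readings(True, True, True))` yields `Risk.HIGH`, which in the method above is the case that triggers rescheduling of the applications.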

In a specific example, as shown in Fig. 2 and Fig. 3, the monitoring unit uses a Prometheus system to analyze the monitoring index data and determine the type of the physical environment risk; correspondingly, the agent application uses a Prometheus Exporter to collect and summarize the physical environment information of the container bare metal server into monitoring index data; and the warning unit uses an AlertManager module to send different types of alerts to different applications.

In this embodiment, the Prometheus system, the AlertManager module and the control (Controller) application are deployed in containerized form on a control node or an alarm node of the cloud native platform (an alarm node being any node other than the control node and the first node), and the Controller application is deployed as a Deployment. The Prometheus system obtains the metrics through the Prometheus Exporter of the Agent application on the first node and determines from the metrics whether an alert needs to be issued. If so, it passes the alert to the AlertManager module, which sends the alert to different applications according to the alert type.
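The idea of sending each alert type to a different application can be illustrated with a small routing table. The sketch below is not AlertManager configuration; it only mimics label-based routing in Python. The `severity` label, the receiver names and the `route` function are hypothetical names chosen for illustration, with the mapping taken from the alert types described earlier.

```python
# Hypothetical mapping from alert type to the application that receives it.
ROUTES = {
    "abnormal": "notification-application",
    "slight": "rejection-application",
    "medium": "warning-application",
    "high": "controller-application",
}


def route(alert: dict) -> str:
    """Return the receiver for an alert based on its 'severity' label."""
    severity = alert.get("labels", {}).get("severity")
    try:
        return ROUTES[severity]
    except KeyError:
        raise ValueError(f"no route for severity {severity!r}") from None
```

Failing loudly on an unknown type is a design choice of this sketch, so that a misconfigured alert is not silently dropped.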

Specifically, the sensors arranged on the container bare metal server periodically (for example, every 10 seconds) collect information about the surrounding physical environment, and the Prometheus system periodically acquires the corresponding metrics. To prevent false alarms, the Prometheus system decides whether a warning needs to be issued by analyzing the metrics acquired over multiple collections rather than a single sample.
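One way to picture the multi-sample analysis is a detector that fires only when the most recent samples have risen steadily and the latest one has reached the threshold. The sketch below is a Python illustration under stated assumptions: the class name `SustainedRiseDetector`, the window size of three samples and the exact "non-decreasing" criterion are all hypothetical; the description does not specify how the samples are combined. In a Prometheus deployment, a comparable debouncing effect is commonly obtained with the `for` clause of an alerting rule.

```python
from collections import deque


class SustainedRiseDetector:
    """Fire only after `window` consecutive samples rise to the threshold."""

    def __init__(self, threshold: float, window: int = 3):
        if window < 2:
            raise ValueError("window must be at least 2")
        self.threshold = threshold
        self.samples = deque(maxlen=window)  # keeps only the newest samples

    def observe(self, value: float) -> bool:
        """Record one sample and report whether an alert should be issued."""
        self.samples.append(value)
        if len(self.samples) < self.samples.maxlen:
            return False  # not enough history yet
        values = list(self.samples)
        rising = all(a <= b for a, b in zip(values, values[1:]))
        return rising and values[-1] >= self.threshold
```

With a threshold of 60 and a window of three, the sequence 40, 65, 50 never fires because the single spike is not sustained, whereas 50, 55, 61 fires on the third sample.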

In another specific example, as shown in Fig. 4 and Fig. 5, the monitoring unit uses the zabbix server component of a zabbix system to analyze the monitoring index data and determine the type of the physical environment risk; correspondingly, the agent application uses the zabbix agent component of the zabbix system to collect and summarize the physical environment information of the container bare metal server into monitoring index data.

In this embodiment, the zabbix server component, an alarm script and the Controller application are deployed in containerized form on a control node or an alarm node of the cloud native platform. The zabbix server component obtains the metrics through the zabbix agent component of the Agent application on the first node and determines from the metrics whether an alert needs to be issued. If so, it passes the alert to the alarm script, which sends the alert to different applications according to the alert type.

In another specific example, as shown in Fig. 6 and Fig. 7, the monitoring unit uses the Judge component of an open-falcon system to analyze the monitoring index data and determine the type of the physical environment risk; correspondingly, the agent application uses the falcon-agent component to collect and summarize the physical environment information of the container bare metal server into monitoring index data.

In this embodiment, the Judge component, the Alarm component and the Controller application are deployed in containerized form on a control node or an alarm node of the cloud native platform. The Judge component obtains the metrics through the falcon-agent component of the Agent application on the first node and determines from the metrics whether an alert needs to be issued. If so, it passes the alert to the Alarm component, which sends the alert to different applications according to the alert type.

Step S103: in response to the physical environment high-risk warning issued by the monitoring alarm module, the control application marks the container bare metal server as unavailable.

The monitoring alarm module and the control application (Controller application) are both deployed on a second node of the cloud native platform.

In the embodiment of the application, the physical environment high-risk warning sent by the monitoring alarm module indicates that the surrounding physical environment has seriously threatened the container bare metal server, and at this time, the Controller application marks the container bare metal server as unavailable.

Specifically, as shown in Fig. 8, in response to the physical environment high-risk warning issued by the monitoring alarm module, the control application (Controller application) accesses the API-Server of the cloud native platform to mark the container bare metal server as unavailable.
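As an illustration of what such a request could look like on a Kubernetes cluster, the sketch below builds the body and path of a PATCH request that sets the `spec.unschedulable` field of a Node object, which is the field toggled by `kubectl cordon`. Treating "marked as unavailable" as cordoning is an assumption; the description does not say which field or label the Controller application uses. The function names are hypothetical, and the actual HTTP call, authentication and error handling are left out.

```python
import json


def node_patch_path(node_name: str) -> str:
    """Path of the Node resource on the Kubernetes API server."""
    return f"/api/v1/nodes/{node_name}"


def node_schedulability_patch(unschedulable: bool) -> str:
    """JSON merge-patch body that cordons (True) or uncordons (False) a node."""
    return json.dumps({"spec": {"unschedulable": unschedulable}})
```

The same helper with `False` would produce the body for removing the mark once the danger has passed.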

Then, the control application (Controller application) accesses the ETCD through the API-Server of the cloud native platform to obtain all container groups on the container bare metal server. Specifically, the Controller application accesses the ETCD on the control node of the cloud native platform through the API-Server and obtains the list, recorded in the ETCD, of the container groups deployed on the container bare metal server.

The container group list of the container bare metal server includes at least one of a daemon container group (a container group managed by a DaemonSet), a mirror container group (Mirror Pod) and an application container group. Mirror container groups and daemon container groups are deployed on every node to run the core component applications and daemon processes of that node and keep the node's basic functions running; the Agent application of the present application is also deployed in a daemon container group. Marking a node as unavailable does not affect the mirror container groups and daemon container groups deployed on it.

Finally, the control application (Controller application) deletes the application container groups on the container bare metal server. Specifically, the Controller application controls, through the API-Server of the cloud native platform, the Kubelet component on the container bare metal server to delete the application container groups.

In this embodiment, the Controller application filters the mirror container groups and daemon container groups out of the container group list, and then, through the API-Server on the control node, controls the Kubelet component on the first node to delete, one by one, the application container groups that remain in the list.
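The filtering step can be sketched in Python over Pod objects represented as plain dictionaries, as they would appear in JSON returned by the API server. The sketch assumes a Kubernetes cluster, where mirror Pods carry the `kubernetes.io/config.mirror` annotation and DaemonSet-managed Pods list a `DaemonSet` in their `ownerReferences`; the function names are hypothetical and the deletion itself is not shown.

```python
MIRROR_ANNOTATION = "kubernetes.io/config.mirror"


def is_mirror_pod(pod: dict) -> bool:
    """Mirror Pods are marked by the kubelet with a dedicated annotation."""
    annotations = pod.get("metadata", {}).get("annotations") or {}
    return MIRROR_ANNOTATION in annotations


def is_daemonset_pod(pod: dict) -> bool:
    """DaemonSet-managed Pods are owned by a DaemonSet object."""
    owners = pod.get("metadata", {}).get("ownerReferences") or []
    return any(owner.get("kind") == "DaemonSet" for owner in owners)


def application_pods(pods: list) -> list:
    """Keep only the Pods that should be deleted and rescheduled."""
    return [p for p in pods if not is_mirror_pod(p) and not is_daemonset_pod(p)]
```

Given a list holding one mirror Pod, one DaemonSet Pod and one ordinary Pod, `application_pods` returns only the ordinary Pod, so the node's core components and the Agent application keep running.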

Step S104: the cloud native platform schedules the container groups in the container bare metal server to a healthy node of the cloud native platform.

Currently, the orchestration engine of a cloud native platform schedules and manages nodes through a node selector (nodeSelector) or node affinity (nodeAffinity). For example, in a Kubernetes cluster, the Kubernetes platform automatically tracks the health status of each node that joins the cluster and labels each node accordingly. When a node has not reported its status for a period of time, the Kubernetes platform marks the node as NotReady (unavailable), restarts on healthy nodes the container groups that were running on that node, based on the information synchronized in the ETCD, and, as soon as contact with the lost node is re-established, notifies it to delete the container groups that have already been restarted on other nodes. When the Kubernetes scheduling mechanism uses this existing method to move container groups from a completely damaged node to healthy nodes, the applications deployed in those container groups are severely degraded or even unavailable during the period in which the container groups are being restarted on the healthy nodes. As a result, the reliability of cloud native applications deployed on container bare metal servers cannot be guaranteed.

In this embodiment, after the Controller application has deleted the application container groups on the container bare metal server one by one, the cloud native platform schedules those application container groups to healthy nodes of the cloud native platform. In this way, applications keep running normally, the container bare metal server continues to provide high-performance service, and the impact of the surrounding physical environment is kept to a minimum.

In some optional embodiments, after the cloud native platform has scheduled the container groups of the container bare metal server to healthy nodes of the cloud native platform, the monitoring alarm module determines, according to the monitoring index data sent by the agent application, that the warning is lifted; in response to the physical environment danger release information issued by the monitoring alarm module, the control application (Controller application) removes the unavailable mark from the container bare metal server.

In this embodiment, the sensors arranged on the container bare metal server collect its physical environment information, and the agent application summarizes that information into monitoring index data. When the monitoring alarm module determines from the real-time monitoring index data that the warning is lifted, it issues physical environment danger release information, and the Controller application on the control node removes the unavailable mark from the container bare metal server.

For example, after the physical environment around the container bare metal server returns to normal, the Prometheus system detects this from the metrics collected by the sensors arranged on the container bare metal server and summarized by the Agent application, and the unavailable mark of the container bare metal server is removed by way of the AlertManager module, the Controller application and the API-Server, so that the Kubernetes cluster can again deploy new applications on the container bare metal server.

In this embodiment, the applications include the Controller application and the notification/alarm application. The Controller application is mainly used to mark and/or schedule the container bare metal server; the notification/alarm application notifies different personnel according to the degree of danger in the physical environment surrounding the container bare metal server. For example, when the surrounding physical environment is abnormal, the machine room maintenance personnel are notified; when the risk is medium or higher, more senior responsible personnel are notified.

In this embodiment, the container bare metal server may also be linked to alarm systems of the surrounding physical environment, for example the fire alarm system of the building where the machine room is located; when a fire alarm is raised in that building, the applications running on the container bare metal server are immediately scheduled to healthy nodes. The system can also be linked to the local geological disaster early warning center and earthquake early warning center.

In this embodiment, multiple container bare metal servers can be deployed in the same machine room, and all container bare metal servers in the same machine room can be organized into a group. When the surrounding physical environment is monitored, the information collected by the sensors on the container bare metal servers of one group can be treated as one set of metrics that the monitoring alarm module analyzes jointly; the deployment positions of the container bare metal servers within the machine room can also serve as parameters of that analysis, which improves the accuracy with which the surrounding physical environment is judged. When the analysis determines that the surrounding physical environment poses a serious threat to the container bare metal servers, all container bare metal servers in the group are marked as unavailable together.
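A minimal Python sketch of the group decision follows. The description states only that the group's metrics are analyzed jointly and that the whole group is marked together; the quorum rule used here (mark the group when at least half of its servers individually report a serious threat) and the name `servers_to_mark` are assumptions made for illustration, and deployment positions are not modeled.

```python
from typing import Dict, List


def servers_to_mark(group: Dict[str, bool], quorum: float = 0.5) -> List[str]:
    """Return every server in the group if enough of them report a threat.

    `group` maps a server name to whether its own metrics indicate a
    serious threat; `quorum` is the share of servers required (assumed).
    """
    if not group:
        return []
    share = sum(group.values()) / len(group)
    # Mark the whole group together, including servers that have not
    # yet observed the threat themselves.
    return sorted(group) if share >= quorum else []
```

Marking the servers that have not yet detected the threat is the point of grouping: their applications can be moved before the danger reaches them.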

With the method for dealing with physical environment risks of a container bare metal server provided by this embodiment, the cloud native platform can sense the external physical environment of the container bare metal server in real time and, when a danger in that environment is imminent or has already occurred, notify the relevant personnel through preset channels so that the problem can be handled in time. When a danger that may seriously damage the container bare metal server is found in the external physical environment, the applications deployed on that server are scheduled to healthy nodes in time, which ensures that the applications can continue to provide service safely and normally on the cloud native platform.

Fig. 9 is a schematic structural diagram of a system for dealing with physical environment risks of a container bare metal server according to some embodiments of the present application. As shown in Fig. 9, the system comprises an acquisition unit 901, a monitoring unit 902, a marking unit 903 and a scheduling unit 904. The acquisition unit 901 is configured so that the agent application, which is deployed on the container bare metal server, collects and summarizes the physical environment information of the container bare metal server to obtain monitoring index data. The monitoring unit 902 is configured so that the monitoring alarm module determines the warning type according to the monitoring index data sent by the agent application. The marking unit 903 is configured so that the control application (Controller application), in response to a physical environment high-risk warning issued by the monitoring alarm module, marks the container bare metal server as unavailable; the monitoring alarm module and the Controller application are deployed on a second node of the cloud native platform. The scheduling unit 904 is configured to schedule the container groups in the container bare metal server to a healthy node of the cloud native platform.

The system for dealing with physical environment risks of a container bare metal server provided by this embodiment can implement the steps and flows of any of the above methods for dealing with physical environment risks of a container bare metal server and achieves the same technical effects, which are not described again here.

An embodiment of the present application further provides a container bare metal server to which any of the above methods for dealing with physical environment risks of a container bare metal server can be applied. An agent application is deployed on the container bare metal server, and the container bare metal server further includes a sensor, which is arranged on the container bare metal server and is used to monitor the physical environment information of the container bare metal server and send it to the agent application, so that the agent application summarizes the physical environment information into monitoring index data.

When the container bare metal server provided by this embodiment is used with any of the above methods for dealing with physical environment risks of a container bare metal server, the steps and flows of that method can be implemented and the corresponding technical effects achieved, which are not described again here.

The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
