Industrial AI project development platform

Document No.: 189895    Publication date: 2021-11-02

Reading note: This technology, the industrial AI project development platform, was designed by 刘鹏, 李兵洋, 邓荣, 刘政森, 杜雨航, 沈文枫 and 陆唯佳 on 2021-07-19. Main content: The application discloses an industrial AI project development platform and relates to the field of internet technology. The industrial AI project development platform comprises a physical hardware layer, a computing resource layer, a training framework layer, an application service layer and an interaction interface layer. The interaction interface layer is located above the application service layer, the application service layer is located above the training framework layer, the training framework layer is located above the computing resource layer, and the computing resource layer is located above the physical hardware layer. The computing resource layer comprises a distributed computing resource cluster and a distributed storage cluster. The training framework layer exchanges data with the computing resource layer and comprises a deep learning framework. The application service layer provides the core functions of the industrial AI project development platform. The interaction interface layer provides a user interface that integrates the basic business functions. The platform addresses the problem that existing AI project development platforms struggle to meet the functional requirements of developers with different machine learning backgrounds, and thereby makes the industrial AI development platform more widely applicable.

1. An industrial AI project development platform, characterized by comprising a physical hardware layer, a computing resource layer, a training framework layer, an application service layer and an interaction interface layer;

the interaction interface layer is located above the application service layer, the application service layer is located above the training framework layer, the training framework layer is located above the computing resource layer, and the computing resource layer is located above the physical hardware layer;

the computing resource layer comprises a distributed computing resource cluster and a distributed storage cluster and is used for storing data, exchanging data, and executing computing tasks and storage tasks;

the training framework layer performs data interaction with the computing resource layer, and comprises a deep learning framework;

the application service layer is used for providing core functions of an industrial AI project development platform, and the core functions at least comprise model development and model training;

the interaction interface layer is used for providing a user interface integrating basic service functions, and the basic service functions at least comprise project management, development interaction, model management, data set management and user management.

2. The industrial AI project development platform of claim 1, wherein the core functionality further includes auxiliary modeling.

3. The industrial AI project development platform of claim 1 or 2, wherein the core functionality further comprises Git, Wiki, and an access control gateway.

4. The industrial AI project development platform of any of claims 1 to 3, wherein the application service layer is configured to obtain an API call request from the interaction interface layer, and to call a background service component according to the API call request to implement a core function of the industrial AI project development platform.

5. The industrial AI project development platform of claim 4, wherein when the core functionality includes model development and model training, the background service component includes a model development service and a model training service;

when the core functionality further comprises auxiliary modeling, the background service component further comprises an auxiliary modeling service;

and when the core functionality further comprises Git, Wiki and an access control gateway, the background service component further comprises a Git service, a Wiki service and an access control gateway service.

6. The industrial AI project development platform of claim 5, wherein the model development service and the model training service utilize a shared GPU container instance.

7. The industrial AI project development platform of claim 5, wherein the auxiliary modeling service is configured to implement the functions of hyper-parameter search and network architecture search.

8. The industrial AI project development platform of claim 5, wherein the access control gateway service is configured to implement functions of single sign-on, intercepting API requests, and verifying access rights to account resources.

9. The industrial AI project development platform of claim 5, wherein the Git service is configured to implement functions for code sharing and code version management based on GitLab.

10. The industrial AI project development platform of claim 5, wherein the Wiki service is configured to implement Wiki-based functions for project document sharing and project document version management.

11. The industrial AI project development platform of any of claims 1 to 3, wherein the deep learning framework comprises at least TensorFlow, PyTorch, Keras, and MXNet.

12. The industrial AI project development platform of any of claims 1 to 3, wherein the training framework layer includes at least one of an SA algorithm package, a Detectron algorithm package, and an Opt algorithm package.

13. The industrial AI project development platform of any of claims 1 to 3, wherein the computing resource layer is configured to receive computing tasks and storage tasks transmitted by the training framework layer and to execute the computing tasks and the storage tasks using the distributed computing resource cluster.

14. The industrial AI project development platform of any of claims 1, 2, 3, 13, wherein at the computing resource layer, the distributed computing resource cluster is constructed based on Kubernetes.

15. The industrial AI project development platform of any of claims 1, 2, 3, 13, wherein at the computing resource layer, the distributed computing resource cluster is constructed based on OpenShift.

16. The industrial AI project development platform of any of claims 1, 2, 3, 13, 14, 15, wherein at the computing resource layer, the runtime environment is configured as Docker.

17. The industrial AI project development platform of any of claims 1, 2, 3, 13, 14, 15, wherein at the computing resource layer, the runtime environment is configured as containerd.

18. The industrial AI project development platform of any of claims 1, 2, 3, 13, wherein at the computing resource layer, the distributed storage cluster employs GlusterFS or NFS.

19. The industrial AI project development platform of any of claims 1, 2, 3, 13, wherein at the computing resource layer, the network scheme employs Calico.

Technical Field

The present application relates to the field of internet technology, and in particular to an industrial AI project development platform.

Background

At present, when an enterprise selects an AI (Artificial Intelligence) project development platform, it generally chooses between a public cloud platform and a private cloud platform. A public cloud platform requires the enterprise to upload the large volume of data used in AI application development, which consumes considerable bandwidth and storage capacity and carries a risk of data leakage. Public-cloud-based AI project development platforms mainly include AWS SageMaker, Microsoft Azure ML Studio, Baidu AI Studio, Tencent TI-ML, Huawei ModelArts, and the like. Private-cloud-based AI project development platforms are better suited to enterprise requirements for data asset security and customization, and mainly include Kubeflow, Microsoft OpenPAI, Inspur AI Station, Sugon SothisAI, QingCloud KubeSphere, and the like.

Different AI project development platforms adopt different user interfaces to assist development and have different functional characteristics. However, since AI project development involves people from different departments and with different knowledge backgrounds, it is difficult for existing AI project development platforms to meet the functional requirements of developers with different machine learning backgrounds.

Take Kubeflow and QingCloud KubeSphere as examples. Kubeflow splits the stages of machine learning, such as model training, hyper-parameter tuning and model deployment, into separate modules and deploys them as containers, which improves the reusability and extensibility of each subsystem. Users can use these modules to perform different machine learning tasks, but the modules offer only weak support for the end-to-end scenario of industrial AI application development. QingCloud KubeSphere is a multi-tenant, enterprise-grade container platform with full-stack automated IT operations and streamlined DevOps workflows. It provides a developer-friendly wizard-style web UI, helps companies build more powerful and feature-rich platforms, and offers the functions most commonly needed for an enterprise Kubernetes strategy, such as Kubernetes resource management, DevOps (CI/CD), application lifecycle management, multi-tenant access control, GPU support and multi-cluster deployment. However, it lacks customization for the machine learning application development process.

Disclosure of Invention

In order to solve the problems in the related art, the application provides an industrial AI project development platform. The technical scheme is as follows:

In one aspect, an embodiment of the application provides an industrial AI project development platform, which comprises a physical hardware layer, a computing resource layer, a training framework layer, an application service layer and an interaction interface layer;

the interaction interface layer is located above the application service layer, the application service layer is located above the training framework layer, the training framework layer is located above the computing resource layer, and the computing resource layer is located above the physical hardware layer;

the computing resource layer comprises a distributed computing resource cluster and a distributed storage cluster and is used for storing data, exchanging data, and executing computing tasks and storage tasks;

the training framework layer performs data interaction with the computing resource layer, and comprises a deep learning framework;

the application service layer is used for providing core functions of an industrial AI project development platform, and the core functions at least comprise model development and model training;

the interaction interface layer is used for providing a user interface integrating basic service functions, and the basic service functions at least comprise project management, development interaction, model management, data set management and user management.

Optionally, the core functionality further comprises auxiliary modeling.

Optionally, the core function further includes Git, Wiki, and access control gateway.

Optionally, the application service layer is configured to obtain an API call request from the interaction interface layer, and call the background service component according to the API call request to implement a core function of the industrial AI project development platform.

Optionally, when the core function includes model development and model training, the background service component includes a model development service and a model training service;

when the core function further comprises auxiliary modeling, the background service component further comprises auxiliary modeling service;

and when the core function further comprises Git, Wiki and an access control gateway, the background service component further comprises a Git service, a Wiki service and an access control gateway service.

Optionally, the model development service and the model training service use a shared GPU container instance.

Optionally, the auxiliary modeling service is used for implementing functions of hyper-parameter search and network structure search.

Optionally, the access control gateway service is configured to implement functions of single sign-on, intercepting an API request, and verifying an access right of an account resource.

Optionally, the Git service is used to implement the functions of code sharing and code version management based on GitLab.

Optionally, the Wiki service is used to implement Wiki-based functions for project document sharing and project document version management.

Optionally, the deep learning framework includes at least TensorFlow, PyTorch, Keras, and MXNet.

Optionally, the training framework layer includes at least one of an SA algorithm package, a Detectron algorithm package, and an Opt algorithm package.

Optionally, the computing resource layer is configured to receive the computing task and the storage task transmitted by the training framework layer, and execute the computing task and the storage task by using the distributed computing resource cluster.

Optionally, at the computing resource layer, the distributed computing resource cluster is constructed based on Kubernetes.

Optionally, at the computing resource layer, the distributed computing resource cluster is constructed based on OpenShift.

Optionally, at the computing resource layer, the runtime environment is configured as Docker.

Optionally, at the computing resource layer, the runtime environment is configured as containerd.

Optionally, at the computing resource layer, the distributed storage cluster uses GlusterFS or NFS.

Optionally, at the computing resource layer, the network scheme employs Calico.

The technical scheme at least comprises the following advantages:

The industrial AI project development platform provided by the embodiments of the application comprises a physical hardware layer, a computing resource layer, a training framework layer, an application service layer and an interaction interface layer. Its architecture is designed according to the actual industrial AI project development process and meets the usage requirements of developers with different machine learning backgrounds. This solves the problem that current AI project development platforms struggle to meet the functional requirements of developers with different machine learning backgrounds, and makes the industrial AI development platform more widely applicable.

Drawings

In order to illustrate the specific embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a structural diagram of an industrial AI project development platform provided in an embodiment of the present application;

Fig. 2 is a functional module diagram of an industrial AI project development platform provided in an embodiment of the present application;

Fig. 3 is an architecture diagram of an industrial AI project development platform provided in an embodiment of the present application.

Detailed Description

The technical solutions in the present application will be described clearly and completely with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

In the description of the present application, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", and the like indicate orientations or positional relationships based on those shown in the drawings. They are used only to simplify the description and do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation, and thus should not be construed as limiting the present application. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

In the description of the present application, it is to be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "coupled," and "connected" are to be construed broadly: a connection may be fixed, removable, or integral; it may be mechanical or electrical; two elements may be connected directly or indirectly through an intermediate medium, may communicate with each other internally, and may be connected wirelessly or by wire. Those of ordinary skill in the art can understand the specific meaning of the above terms in the present application according to the specific case.

In addition, the technical features mentioned in the different embodiments of the present application described below may be combined with each other as long as they do not conflict with each other.

Referring to fig. 1, a block diagram of an industrial AI project development platform provided in an embodiment of the present application is shown, where the industrial AI project development platform includes a physical hardware layer 110, a computing resource layer 120, a training framework layer 130, an application service layer 140, and an interaction interface layer 150.

The interaction interface layer 150 is located above the application service layer 140, the application service layer 140 is located above the training framework layer 130, the training framework layer 130 is located above the computing resource layer 120, and the computing resource layer 120 is located above the physical hardware layer 110.
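The stacking order of the five layers can be written down as an ordered list. The following is a minimal Python sketch for illustration only; the names `LAYERS`, `is_above` and `layer_below` are assumptions of this sketch and do not appear in the application.

```python
# Layers listed from bottom to top, using the reference numerals from Fig. 1.
LAYERS = [
    (110, "physical hardware layer"),
    (120, "computing resource layer"),
    (130, "training framework layer"),
    (140, "application service layer"),
    (150, "interaction interface layer"),
]


def is_above(upper: int, lower: int) -> bool:
    """Return True if layer `upper` sits anywhere above layer `lower`."""
    order = [numeral for numeral, _ in LAYERS]
    return order.index(upper) > order.index(lower)


def layer_below(numeral: int):
    """Return the numeral of the layer directly beneath, or None at the bottom."""
    order = [n for n, _ in LAYERS]
    position = order.index(numeral)
    return order[position - 1] if position > 0 else None
```

For instance, `layer_below(130)` identifies the computing resource layer 120 as the layer that the training framework layer 130 hands its tasks to.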

The physical hardware layer 110 includes several hardware devices. The physical hardware layer 110 is a computer cluster formed by physical machines and/or virtual machines, and is used for providing computing, storage and network capabilities for the industrial AI project development platform.

The physical hardware layer 110 includes hardware devices such as a CPU, GPU, memory, storage, network device, and the like.

The computing resource layer 120 includes a distributed computing resource cluster and a distributed storage cluster, and is used for storing data, exchanging data, and executing computing tasks and storage tasks.

The training framework layer 130 interacts data with the computing resource layer 120.

By abstracting the hardware behind the computing resource layer 120, the physical hardware used by the industrial AI project development platform is made transparent. Without knowing the specifications of the hardware devices in the physical hardware layer, the training framework layer 130 can pass computing tasks to the computing resource layer 120 for computation and pass storage tasks to the computing resource layer 120 for storage.
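The idea of hiding the hardware behind the computing resource layer can be illustrated with an abstract interface. The sketch below is hypothetical: the class names `ComputeResourceLayer` and `InMemoryCluster`, the method names, and the toy loss computation are all assumptions made for illustration, and the in-memory class merely stands in for a real distributed cluster.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict


class ComputeResourceLayer(ABC):
    """Hypothetical interface the training framework layer programs against."""

    @abstractmethod
    def run_compute_task(self, task: Callable[[], float]) -> float: ...

    @abstractmethod
    def run_storage_task(self, key: str, payload: bytes) -> None: ...


class InMemoryCluster(ComputeResourceLayer):
    """Stand-in for a real cluster; hardware details stay hidden in here."""

    def __init__(self) -> None:
        self.stored: Dict[str, bytes] = {}

    def run_compute_task(self, task):
        return task()  # a real cluster would schedule this onto CPU/GPU nodes

    def run_storage_task(self, key, payload):
        self.stored[key] = payload  # a real cluster would write to distributed storage


def train_one_step(resources: ComputeResourceLayer) -> float:
    """Training-framework code: it never inspects the hardware specification."""
    loss = resources.run_compute_task(lambda: sum(x * x for x in (0.5, 0.25)))
    resources.run_storage_task("checkpoint/step-1", str(loss).encode())
    return loss
```

Because `train_one_step` depends only on the interface, the cluster implementation could be swapped without changing any training code, which is the transparency the paragraph describes.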

The training framework layer 130 includes a deep learning framework. It may include a plurality of deep learning frameworks and supports a plurality of computing engines.

Optionally, the training framework layer 130 further includes customized algorithm packages; the functions of a customized algorithm package are determined according to actual needs.

The application service layer 140 is used to provide core functions of an industrial AI project development platform. The core functions at least comprise model development and model training.

Optionally, the core functionality further comprises auxiliary modeling.

Optionally, the core function further includes Git, Wiki, an access control gateway, and the like.

The interaction interface layer 150 is used to provide a user interface that integrates the basic business functions. The basic business functions cover the common business logic of industrial AI project development.

The basic business functions include at least project management, development interaction, model management, data set management, user management, and the like.

When developing and managing an AI application, developers in the various departments of an enterprise need to complete a series of tasks. Accordingly, collaborative development of an AI project goes through the following stages: project requirement tracking, prototype construction, architecture design, data processing, model development, model training, model verification, pre-release, acceptance testing and formal release.

In order for the industrial AI project development platform provided by the embodiments of the present application to meet the requirements of developers with different machine learning backgrounds, the functions it implements can be divided into the following modules: the data management module 21, the model and algorithm development module 22, the model verification module 23, the model release module 24, the log document module 25, and the authority management module 26. These functional modules correspond to the flow of AI project development, as shown in Fig. 2.

The data management module 21 includes the functions of data import, data annotation and data export; the model and algorithm development module 22 includes the functions of data preprocessing, predefined algorithms, computing frameworks and a code writing environment; the model verification module 23 includes the functions of test set management, test environment images and test result visualization; the model release module 24 includes the functions of model packaging, model signing and model deployment; the log document module 25 includes the functions of model logs, system monitoring logs and document maintenance; the authority management module 26 includes the functions of unified login authentication and access control.

The user management and the authority management are combined, and the visible resources of different types of users can be effectively isolated through the authority management function, so that the use safety of the platform is improved.

After logging in to the industrial AI project development platform provided by the embodiments of the application, a user selects or executes the corresponding functions according to the navigation labels provided by the interaction interface.

Taking a development flow of a certain AI project as an example, operations performed on the industrial AI project development platform provided in the embodiments of the present application are as follows: document maintenance (creating a project homepage) → data import → data annotation → data preprocessing (writing code or preprocessing data using a deep learning framework provided in a training framework layer) → calling a predefined algorithm → training model → building a test environment → testing model → testing result visualization → model packaging → setting a signature → pre-publishing project → completing project page → publishing project.
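The example flow can be represented as an ordered list of steps, each attributed to one of the functional modules of Fig. 2. This is a minimal sketch: the step names paraphrase the flow above, and the assignment of steps that the text does not name verbatim inside a module (for example "train model" or "pre-release project") is an assumption of the sketch, as are the names `MODULES`, `WORKFLOW` and `next_step`.

```python
# Module numerals follow Fig. 2; step names follow the example flow above.
MODULES = {
    21: "data management",
    22: "model and algorithm development",
    23: "model verification",
    24: "model release",
    25: "log document",
}

# (step, module) pairs in execution order. Steps that the text does not name
# verbatim inside a module (e.g. "train model") are placed by assumption.
WORKFLOW = [
    ("create project homepage", 25),
    ("import data", 21),
    ("annotate data", 21),
    ("preprocess data", 22),
    ("call predefined algorithm", 22),
    ("train model", 22),
    ("build test environment", 23),
    ("test model", 23),
    ("visualize test results", 23),
    ("package model", 24),
    ("set signature", 24),
    ("pre-release project", 24),
    ("complete project page", 25),
    ("release project", 24),
]


def next_step(done: int):
    """Return the (step, module name) that follows `done` completed steps."""
    if done >= len(WORKFLOW):
        return None
    step, module = WORKFLOW[done]
    return step, MODULES[module]
```

A guided interface could call `next_step` with the number of completed steps to decide which module to present to the user next.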

In each stage of AI project development, each developer can log in to the industrial AI project development platform provided by the embodiments of the application according to actual task requirements, and complete the corresponding tasks by using the functions the platform provides.

In an alternative embodiment based on the embodiment shown in Fig. 1, the application service layer 140 is configured to obtain an API (Application Programming Interface) call request from the interaction interface layer 150, and to call a background service component according to the API call request to implement a core function of the industrial AI project development platform.

As shown in fig. 3, when the core function includes model development and model training, the background service component includes a model development service and a model training service; when the core function comprises Git, Wiki and access control gateway, the background service component also comprises Git service, Wiki service and access control gateway service.

When the core function includes the auxiliary modeling, the background service component further includes an auxiliary modeling service corresponding to the auxiliary modeling function.
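The relationship between API call requests and background service components can be sketched as a simple dispatch table. Everything named here (`ApplicationServiceLayer`, `register`, `handle`, the request fields `function` and `payload`, and the two toy services) is a hypothetical illustration rather than the platform's actual API.

```python
from typing import Callable, Dict


# Hypothetical background service components; each takes a request payload.
def model_development_service(payload: dict) -> str:
    return f"opened development environment for {payload['project']}"


def model_training_service(payload: dict) -> str:
    return f"started training job for {payload['project']}"


class ApplicationServiceLayer:
    """Routes API call requests to registered background service components."""

    def __init__(self) -> None:
        self._services: Dict[str, Callable[[dict], str]] = {}

    def register(self, function_name: str, service: Callable[[dict], str]) -> None:
        self._services[function_name] = service

    def handle(self, request: dict) -> str:
        service = self._services.get(request["function"])
        if service is None:
            raise KeyError(f"no background service for {request['function']!r}")
        return service(request.get("payload", {}))


layer = ApplicationServiceLayer()
layer.register("model_development", model_development_service)
layer.register("model_training", model_training_service)
# Optional core functions (auxiliary modeling, Git, Wiki, gateway) would be
# registered the same way when the platform includes them.
```

Registering services by name mirrors the conditional wording of the description: a background service exists only when the corresponding core function is part of the platform.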

The model development service and the model training service use a shared GPU container instance, which meets the requirements of the model development environment.

The auxiliary modeling service is used for implementing the functions of hyper-parameter search and network architecture search, and includes at least a Bayesian optimizer, a proximal policy optimization (reinforcement learning) optimizer, a differential evolution optimizer, a particle swarm optimization optimizer and a tabu search optimizer.
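As an illustration of one of the optimizer types named above, the following is a minimal pure-Python sketch of differential evolution applied to a hyper-parameter search. It is not the platform's implementation: the objective `validation_loss` is a toy stand-in for a real training run, and the search bounds, population size and control parameters are assumptions chosen for the example.

```python
import random


def validation_loss(params):
    """Toy stand-in for a training run; minimum at log10(lr) = -3, dropout = 0.2."""
    log_lr, dropout = params
    return (log_lr + 3.0) ** 2 + (dropout - 0.2) ** 2


def differential_evolution(objective, bounds, pop_size=20, generations=100,
                           weight=0.8, crossover=0.9, seed=0):
    """Minimal DE/rand/1/bin loop; returns (best_params, best_score)."""
    rng = random.Random(seed)
    dims = len(bounds)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(p) for p in population]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dims)  # guarantees at least one mutated gene
            trial = []
            for d in range(dims):
                if d == forced or rng.random() < crossover:
                    value = population[a][d] + weight * (population[b][d] - population[c][d])
                    lo, hi = bounds[d]
                    value = min(max(value, lo), hi)  # clip to the search bounds
                else:
                    value = population[i][d]
                trial.append(value)
            trial_score = objective(trial)
            if trial_score <= scores[i]:  # greedy selection
                population[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return population[best], scores[best]


best_params, best_score = differential_evolution(
    validation_loss, bounds=[(-5.0, -1.0), (0.0, 0.5)]
)
```

In a real search, each call to the objective would launch a training job and return a validation metric, so the number of evaluations (population size times generations) is the main cost to budget.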

The access control gateway service is used for implementing the functions of single sign-on, interception of API requests, and verification of account resource access rights.

Optionally, the access control gateway service implements the functions of single sign-on, intercepting API requests, and verifying account resource access rights based on Spring Cloud Gateway and OpenLDAP.
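The interception and verification logic can be sketched independently of any particular gateway product. The example below deliberately replaces Spring Cloud Gateway and OpenLDAP with in-memory dictionaries; the names `SESSIONS`, `PERMISSIONS`, `gateway` and `backend_service`, the token format and the status codes returned are all assumptions of this sketch.

```python
# In-memory stand-ins for the directory (users) and session store (tokens).
SESSIONS = {"token-alice": "alice"}             # issued once at single sign-on
PERMISSIONS = {"alice": {"project/demo"}}       # resources each account may access


def gateway(request: dict, backend):
    """Intercept an API request, verify identity and access rights, then forward."""
    user = SESSIONS.get(request.get("token"))
    if user is None:
        return 401, "not signed in"
    if request["resource"] not in PERMISSIONS.get(user, set()):
        return 403, "access to resource denied"
    return 200, backend(user, request["resource"])


def backend_service(user: str, resource: str) -> str:
    return f"{user} read {resource}"
```

Because every request passes through `gateway` before reaching a backend service, a single sign-on token is checked in one place and the services themselves need no authentication code.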

The Git service is used for implementing the functions of code sharing and code version management based on GitLab.

The Wiki service is used to implement Wiki-based functions for project document sharing and project document version management.

As shown in Fig. 3, the deep learning framework in the training framework layer comprises at least TensorFlow, PyTorch, Keras, and MXNet.

Optionally, the training framework layer further includes customized algorithm packages. The customized algorithm packages comprise at least one of an SA (structured data analysis) algorithm package, a Detectron algorithm package and an Opt (optimization) algorithm package.

For example: the training framework layer comprises a deep learning framework and an SA algorithm package; or the training framework layer comprises a deep learning framework and a Detectron algorithm package; or the training framework layer comprises a deep learning framework and an Opt algorithm package; or the training framework layer comprises a deep learning framework, an SA algorithm package and a Detectron algorithm package; or the training framework layer comprises a deep learning framework, an SA algorithm package and an Opt algorithm package; or the training framework layer comprises a deep learning framework, a Detectron algorithm package and an Opt algorithm package; or the training framework layer comprises a deep learning framework, an SA algorithm package, a Detectron algorithm package and an Opt algorithm package.
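The seven alternatives listed above are exactly the non-empty subsets of the three algorithm packages, each combined with the deep learning framework. A short Python sketch makes this enumeration explicit; the variable names are illustrative only.

```python
from itertools import combinations

PACKAGES = ("SA", "Detectron", "Opt")

# Every non-empty subset of the three packages, each paired with the
# deep learning framework that the layer always contains.
configurations = [
    ("deep learning framework",) + subset
    for size in range(1, len(PACKAGES) + 1)
    for subset in combinations(PACKAGES, size)
]
```

The list contains 2^3 - 1 = 7 entries, in the same order as the alternatives enumerated in the text.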

The SA algorithm package is used for processing structured data and includes the following: algorithms for automatic feature generation from time-series signals, deep learning networks for generating time-series embedding representations, time-series signal classification and regression models, feature selection tools, and the like.

The Detectron algorithm package is used for processing video and image data and includes the following: data augmentation algorithms, detection networks, semantic segmentation networks and classification networks.

The Opt algorithm package is used for handling optimization problems and includes gradient-free optimization algorithms.

It should be noted that the algorithms included in each algorithm package are existing algorithms and/or algorithms developed autonomously according to actual application requirements, and the embodiments of the present application do not limit this.

In the using process of the industrial AI project development platform provided by the embodiment of the present application, a deep learning framework and/or an algorithm package included in the training framework layer may be added or modified according to actual situations.

With the deep learning frameworks and/or algorithm packages in the training framework layer, the industrial AI project development platform provided by the embodiments of the application makes it easier for developers to write model code.

The computing resource layer 120 is configured to receive the computing tasks and the storage tasks transmitted by the training framework layer 130, and execute the computing tasks and the storage tasks by using the distributed computing resource cluster. The computing resource layer 120 automatically interacts with the physical hardware layer 110 to perform operations related to the hardware.

Optionally, at the computing resource layer 120, the distributed computing resource cluster is built based on Kubernetes.

The Kubernetes shown in Fig. 3 is merely an exemplary illustration, and the embodiments of the present application are not limited thereto.

Optionally, at the computing resource layer 120, the distributed computing resource cluster is constructed based on OpenShift.

Optionally, at the computing resource layer 120, the runtime environment is configured as Docker.

Docker resolves the problems of runtime environments and multiple versions for machine learning tasks, and ensures that the task runtime environments in the development, test and production clusters are consistent. The Docker shown in Fig. 3 is merely an exemplary illustration, and the embodiments of the present application are not limited thereto.

Optionally, at the computing resource layer 120, the runtime environment is configured as containerd.

Both Docker and containerd support distributed computing resource clusters built on Kubernetes or OpenShift.

The industrial AI project development platform provided by the embodiments of the application is designed around container images, so that different users can customize development tool sets for the development scenarios of different AI projects or machine learning projects.

At the compute resource layer 120, the distributed storage cluster employs GlusterFS or NFS.

GlusterFS provides container-native storage, giving the platform's container environment higher efficiency at lower cost. GlusterFS is used to store application data, such as log files, unstructured data, and the various files generated in big data scenarios, for example sensor data, log files generated by machine learning applications, and various rich text files.

The industrial AI project development platform provided by the embodiment of the application adopts the distributed storage cluster, and can effectively cope with multi-tenant scenes.

At the computing resource layer 120, the network scheme employs Calico.
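The technology choices described for the computing resource layer can be collected into a small validated configuration object. This is a sketch under stated assumptions: the class `ComputeStack`, the `ALLOWED` table and the chosen defaults are hypothetical, and since the description treats Kubernetes and Docker as examples rather than limits, a real deployment could extend the option sets.

```python
from dataclasses import dataclass

# Options named in the description for each part of the computing resource layer.
ALLOWED = {
    "orchestrator": {"Kubernetes", "OpenShift"},
    "runtime": {"Docker", "containerd"},
    "storage": {"GlusterFS", "NFS"},
    "network": {"Calico"},
}


@dataclass(frozen=True)
class ComputeStack:
    orchestrator: str = "Kubernetes"
    runtime: str = "Docker"
    storage: str = "GlusterFS"
    network: str = "Calico"

    def __post_init__(self) -> None:
        # Reject any value that the description does not list for that field.
        for field_name, options in ALLOWED.items():
            value = getattr(self, field_name)
            if value not in options:
                raise ValueError(f"{field_name}={value!r} is not one of {sorted(options)}")
```

For example, `ComputeStack(orchestrator="OpenShift", runtime="containerd")` describes one of the alternative combinations the description allows, while an unlisted runtime raises a `ValueError`.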

The industrial AI project development platform provided by the embodiments of the application comprises a physical hardware layer, a computing resource layer, a training framework layer, an application service layer and an interaction interface layer. Its architecture is designed according to the actual industrial AI project development process and meets the usage requirements of developers with different machine learning backgrounds. This solves the problem that current AI project development platforms struggle to meet the functional requirements of developers with different machine learning backgrounds, and makes the industrial AI development platform more widely applicable.

It should be understood that the above examples are given only for clarity of illustration and are not intended to limit the embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to enumerate all embodiments here. Obvious variations or modifications derived therefrom remain within the scope of protection of the present application.
