Enhanced storage device storage architecture for machine learning

Document No.: 1126536 | Publication date: 2020-10-02

This technology, "Enhanced storage device storage architecture for machine learning", was designed and created by L. M. Franca-Neto and V. Dubeyko on 2019-12-03. Abstract: Embodiments of a storage device architecture for processing data using machine learning are disclosed. In some embodiments, the storage device includes separate I/O and neural network cores. The storage device may create a copy of the data stream that stores the data, and the neural network core may process the copy of the data stream in a neural network, while the I/O core performs read or write functions on the data stream.

1. A data storage device configured to perform neural network computations, the device comprising:

a non-volatile memory including a first storage area configured to store data provided by a host system and a second storage area configured to store data related to neural network computations;

a controller configured to:

store data in and retrieve data from the first storage area in response to at least one data transfer command received from the host system; and

perform neural network computations in the second storage area.

2. The apparatus of claim 1, wherein the second storage area is configured to store a plurality of storage streams, each stream comprising a set of contiguous physical memory storage units of the non-volatile memory, and wherein the controller is further configured to perform neural network computations on the plurality of storage streams.

3. The apparatus of claim 2, wherein the controller is further configured to identify each of the plurality of storage streams by a common identifier.

4. The apparatus of claim 2, wherein the plurality of storage streams comprises a first storage stream and a second storage stream.

5. The apparatus of claim 2, wherein the controller is further configured to store input data for neural network computations in at least one of the plurality of storage streams.

6. The apparatus of claim 2, wherein the plurality of storage streams includes at least one input storage stream and at least one output storage stream, and wherein the controller is further configured to perform neural network computations on data stored in the at least one input storage stream and store results of the neural network computations in the at least one output storage stream.

7. The apparatus of claim 6, wherein the controller is further configured to receive the results of the neural network computations from the at least one output storage stream and provide the results to the host system.

8. The apparatus of claim 2, wherein the controller comprises a plurality of processor cores configured to process a plurality of storage streams substantially simultaneously.

9. The device of claim 1, wherein the controller comprises an I/O core, and wherein the device comprises another controller comprising a neural network core.

10. A method for performing neural network computations within a data storage device, the method comprising, by a controller of the data storage device:

receiving, from a host system, a first request to perform analysis on data stored in a storage area of a non-volatile memory of the data storage device;

locking the storage area of the non-volatile memory;

copying the storage area of the non-volatile memory;

unlocking the storage area of the non-volatile memory; and

initiating processing of the data by applying a neural network to the copied data.

11. The method of claim 10, wherein the neural network comprises a systolic flow engine.

12. The method of claim 10, wherein neural network parameters are stored in the non-volatile memory and the processing of the data via the neural network occurs within the data storage device.

13. The method of claim 10, wherein the method further comprises, by the controller:

receiving, from the host system, a second request to perform an operation on data stored in the storage area of the non-volatile memory; and

in response to determining that the storage area is locked, storing the second request in a log until the storage area becomes unlocked.

14. The method of claim 13, wherein the operation comprises a write operation.

15. The method of claim 10, wherein the method further comprises, by the controller:

deleting the copy of the storage area in response to determining that the processing of the data via the neural network has been completed.

16. A data storage device configured to perform neural network computations, the device comprising:

a non-volatile memory including a first storage area configured to store data provided by a host system and a second storage area configured to store data related to neural network computations;

a first controller configured to:

receive a first request from the host system to perform analysis on data stored in the first storage area;

set the first storage area to a locked state;

copy the data stored in the first storage area into the second storage area;

set the first storage area to an unlocked state; and

perform a neural network computation on the copy of the data stored in the second storage area; and

a second controller configured to:

receive a second request from the host system to perform an operation on data stored in the first storage area;

perform the operation in response to determining that the first storage area is in an unlocked state;

store the second request in a log in response to determining that the first storage area is in a locked state; and

perform a neural network computation on the copy of the data stored in the second storage area.

17. The apparatus of claim 16, wherein the log is configured to prevent writing to the first storage area.

18. The storage device of claim 16, wherein the first controller is further configured to:

receive a second request to perform analysis on data stored in the first storage area while the first storage area is in a locked state; and

copy the data stored in the first storage area to a third storage area without waiting for the first storage area to be in an unlocked state.

19. The apparatus of claim 16, wherein the first controller is further configured to retrieve the results of the neural network computation from an output storage stream comprising a set of contiguous physical memory storage units of the second storage area of the non-volatile memory, and provide the results to the host system.

20. The apparatus of claim 16, wherein the first controller comprises a plurality of processing cores configured to process a plurality of storage streams substantially simultaneously.

Technical Field

The present disclosure relates to storage device architectures, and more particularly, to data processing by machine learning inside storage devices.

Background

Modern computing systems often utilize machine learning techniques, such as neural networks. These techniques may run on large data sets and therefore may require a large amount of storage space. However, current storage architectures do not allow big data analytics to scale. The present disclosure addresses these and other issues.

Drawings

The innovations described in the claims each have several aspects, no single one of which is solely responsible for its desirable attributes. Without limiting the scope of the claims, some salient features of the disclosure will now be briefly described.

FIGS. 1A and 1B are examples of persistent data transferred between a DRAM and a persistent storage according to the prior art.

FIG. 2 is an example of analyzing data by an artificial intelligence model according to the prior art.

Fig. 3 is an example of combining a storage device with a neural network engine, according to some embodiments.

FIG. 4 is an example of communication between a storage device and a host, according to some embodiments.

FIG. 5 illustrates an example of a storage device including persistent space that stores I/O requests in streams, according to some embodiments.

Fig. 6 illustrates an example of a file divided into several extents, each stored as a contiguous sequence of physical memory units, according to some embodiments.

Fig. 7 illustrates an example of an input stream of data processed by a neural network, according to some embodiments.

FIG. 8 illustrates an example of a multi-core storage device according to some embodiments.

Fig. 9 illustrates an example of performing inference operations on a neural network, according to some embodiments.

Fig. 10 illustrates an example of performing inference operations on a neural network by replicating streaming data, according to some embodiments.

FIG. 11 illustrates an example of data replication management, according to some embodiments.

Detailed Description

While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of protection. Indeed, the novel methods and systems described herein may be embodied in a variety of other forms. Furthermore, various omissions, substitutions and changes in the form of the methods and systems described herein may be made without departing from the scope of protection.

Various embodiments of the present disclosure provide a data storage device configured to perform neural network computations, the device comprising: a non-volatile memory including a first storage area configured to store data provided by a host system and a second storage area configured to store data related to neural network computations; a controller configured to: store data in and retrieve data from the first storage area in response to at least one data transfer command received from the host system; and perform neural network computations in the second storage area.

In the data storage device of the preceding paragraph or any paragraph herein, the second storage area may be configured to store a plurality of storage streams, each stream comprising a set of contiguous physical memory storage units of the non-volatile memory, and the controller may be further configured to perform neural network computations on the plurality of storage streams.

In the data storage device of the preceding paragraph or any paragraph herein, the controller may be further configured to identify each of the plurality of storage streams by a common identifier.

In the data storage device of the preceding paragraph or any paragraph herein, the plurality of storage streams may include a first storage stream and a second storage stream.

In the data storage device of the preceding paragraph or any paragraph herein, the controller may be further configured to store input data for neural network computations in at least one of the plurality of storage streams.

In the data storage device of the preceding paragraph or any paragraph herein, the plurality of storage streams may include at least one input storage stream and at least one output storage stream, and wherein the controller may be further configured to perform neural network computations on data stored in the at least one input storage stream and store results of the neural network computations in the at least one output storage stream.

In the data storage device of the preceding paragraph or any paragraph herein, the controller may be further configured to receive the results of the neural network computations from the at least one output storage stream and provide the results to the host system.

In the data storage device of the preceding paragraph or any paragraph herein, the controller may comprise a plurality of processor cores configured to process the plurality of storage streams substantially simultaneously.

In the data storage device of the preceding paragraph or any paragraph herein, the controller may comprise an I/O core, and the device may further comprise another controller comprising a neural network core. The I/O core may be responsible for performing I/O operations on data, while the neural network core may be solely responsible for performing neural network computations.

Various embodiments of the present disclosure provide a method for performing neural network computations within a data storage device, the method comprising, by a controller of the data storage device: receiving, from a host system, a first request to perform analysis on data stored in a storage area of a non-volatile memory of the data storage device; locking the storage area of the non-volatile memory; copying the storage area of the non-volatile memory; unlocking the storage area of the non-volatile memory; and initiating processing of the data by applying a neural network to the copied data.

In the method of the preceding paragraph or any paragraph herein, the neural network may comprise a systolic flow engine.

In the method of the preceding paragraph or any paragraph herein, neural network parameters may be stored in the non-volatile memory and the processing of the data via the neural network may be performed within the data storage device.

The method of the preceding paragraph or any paragraph herein, further comprising, by the controller: receiving a second request from the host system to perform an operation on data stored in the storage area of the non-volatile memory; and, in response to determining that the storage area is locked, storing the second request in a log until the storage area becomes unlocked.

In the method of the preceding paragraph or any paragraph herein, the operation may comprise a write operation.

The method of the preceding paragraph or any paragraph herein, further comprising, by the controller: deleting the copy of the storage area in response to determining that the processing of the data via the neural network has been completed.

Various embodiments of the present disclosure provide a data storage device configured to perform neural network computations, the device comprising: a non-volatile memory including a first storage area configured to store data provided by a host system and a second storage area configured to store data related to neural network computations; a first controller configured to: receive a first request from the host system to perform analysis on data stored in the first storage area; set the first storage area to a locked state; copy the data stored in the first storage area into the second storage area; set the first storage area to an unlocked state; and perform a neural network computation on the copy of the data stored in the second storage area; and a second controller configured to: receive a second request from the host system to perform an operation on data stored in the first storage area; perform the operation in response to determining that the first storage area is in an unlocked state; store the second request in a log in response to determining that the first storage area is in a locked state; and perform a neural network computation on the copy of the data stored in the second storage area.

In the data storage device of the preceding paragraph or any paragraph herein, the log may prevent writing to the first storage area.

In the data storage device of the preceding paragraph or any paragraph herein, the first controller may be further configured to: receive, while the first storage area is in a locked state, a second request to perform analysis on data stored in the first storage area, and copy the data stored in the first storage area to a third storage area without waiting for the first storage area to be in an unlocked state.

In the data storage device of the preceding paragraph or any paragraph herein, the first controller may be further configured to retrieve the results of the neural network computations from an output storage stream comprising a set of contiguous physical memory storage units of the second storage area of the non-volatile memory, and provide the results to the host system.

In the data storage device of the preceding paragraph or any paragraph herein, the first controller may comprise a plurality of processing cores configured to process a plurality of storage streams substantially simultaneously.

Overview

Emerging memory technologies, such as non-volatile memory (NVM), magnetic random access memory (MRAM), resistive random access memory (ReRAM), and nano random access memory (NRAM), may have low-latency characteristics, providing an opportunity to significantly improve computer system performance. However, conventional memory architectures do not make efficient use of such non-volatile memory. In particular, conventional architectures suffer from a serious drawback: if data has not been prefetched into the page cache, the persistent data must first be transferred from persistent storage to dynamic random access memory (DRAM) before it can be processed.

FIGS. 1A and 1B are examples 100 and 150 of persistent data being transferred between DRAM and persistent storage. The host 102 may include a CPU 104 and DRAM 106. For each piece of data that must be processed, the interface circuitry of the DRAM 106 communicates with the interface circuitry of a persistent storage device, such as a solid-state drive (SSD) 108A or a hybrid SSD 108B. SSD 108A may include NAND flash memory 110A. Hybrid SSD 108B may include NAND flash memory 110A and non-volatile memory (NVM) 110B.

FIG. 2 is an example 200 of analyzing data through an artificial intelligence model. In step 202, the host may request an analysis of the data. Data may be input into the artificial intelligence model at 204, processed via the artificial intelligence model at 206, and output at 208. The user 210 may then receive the output data. In the prior art, the storage device typically waits to receive the output data, wasting time 212 and resources that would otherwise be available to perform other operations.

Furthermore, current memory chip architectures do not allow big data analytics to scale. With such an architecture, large amounts of data must be transferred back and forth between DRAM and persistent storage devices, so merely increasing the number of cores used for data processing does not solve the problems described herein. For example, the storage device may have to copy data to the host side, and the host side may have to process the data: one set of data is copied into DRAM, the CPU processes that set, and then the next set of data is copied in for processing. This creates a significant performance bottleneck, cannot be scaled for large data processing, and therefore requires a great deal of time and resources. It also results in a large amount of overhead in the software stack. Further, when individual CPU cores are each dedicated to a subset of the data (for example, each modifying its own subset), the result can be inconsistent data states across CPUs. In addition, increasing the size of the DRAM brings its own inefficiencies, such as increased power consumption. Moreover, the CPU may not be able to address DRAM beyond a particular size, so the DRAM itself is not scalable.

Generally, some embodiments of the systems and methods described herein improve memory chip architecture by processing data inside the storage device. Fig. 3 is an example 300 of combining a storage device with a neural network engine to create an improved or enhanced (sometimes referred to as "smart") storage device, according to some embodiments. The legacy storage device 302A may store persistent data 304, and the interface of the storage device 302A may send the data to the host for processing. Some embodiments of the described systems and methods relate to smart storage device 302B. Smart storage device 302B may store persistent data 304 and execute a neural network, such as systolic flow engine 306, within smart storage device 302B. The systolic flow engine is described in more detail in a patent application entitled "Systolic neural network engine capable of forward propagation" (U.S. Patent Application No. 15/981,624, filed May 16, 2018) and a patent application entitled "Reconfigurable systolic neural network engine" (U.S. Patent Application No. 16/233,968, filed December 27, 2018), the disclosures of which are hereby incorporated by reference in their entirety.

Thus, in some embodiments, the memory chip architecture may reduce or eliminate the bottleneck caused by transferring data between the storage device and DRAM (or another type of memory). Advantageously, data processing on the storage device may be scalable and capable of handling large amounts of data.

In the embodiment of FIG. 3, there may be a conflict between the I/O core (or controller or processor) and the neural network core (or controller or processor), such as where the neural network requires a significant amount of time for data processing but the host system expects fast I/O operations (such as read or write operations) from the storage device. FIG. 4 is an example 400 of communication between a storage device and a host, according to some embodiments. The smart storage device 402 may include a neural network, such as a systolic flow engine 404, and an I/O core 408 in communication with a persistent memory 406. The systolic flow engine 404 may retrieve data from the persistent memory 406 and begin processing the data, for example as part of an inference operation. However, while the data is being processed, the host system may request access to the same data. The I/O core 408 may have to wait for the systolic flow engine 404 to complete its data processing before reading and/or writing the same data requested by the host system. Thus, neural network processing time can significantly reduce the performance of I/O operations.

Data streams stored in persistent spaces

FIG. 5 illustrates an example 500 of a storage device 502 that includes persistent space 504 to store I/O requests in streams, according to some embodiments. The storage device 502 may store I/O requests in streams (such as stream #0 506A, stream #1 506B, ..., stream #N 506N) within the persistent space 504. The persistent space may include one or more persistent memories, such as non-volatile solid-state memory, magnetic memory, or the like.

In some embodiments, storage device 502 may receive a request to process a particular stream. In some embodiments, persistent space 504 may persistently store data, so that the data is retained even when the storage device 502 is powered off. Persistent space 504 may provide a data stream that may be used as an analog of a file.

In some implementations, the contiguous space corresponding to a stream may store data in the persistent space 504 that can be distinguished based on an identifier corresponding to the stream. Streams can therefore be advantageous for machine learning processes executed within the storage device, such as neural networks, because the machine learning process can operate on any data stored in a stream. The machine learning process may identify a stream by its identifier and an offset inside the stream, and then process the data inside the stream. Streams may be preferable to stored objects because objects may include metadata that is not needed for neural network processing. Typically, objects imply object-based storage, which places significant limitations on the applicability of neural networks. For example, some neural networks may not be configured to exclude metadata from the object data, or may receive data related to the metadata as input, thereby rendering the neural network inoperable. In some embodiments, a stream may store data without metadata and/or store only the data relevant to the neural network. The neural network may receive the relevant data as a byte stream from the contiguous space corresponding to a stream stored in persistent space 504. Advantageously, such a neural network can be simpler because it does not have to be trained to distinguish metadata from relevant data. In addition, whereas objects may be stored contiguously or non-contiguously, the stream approach guarantees contiguous storage.
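
For illustration only, the following minimal Python sketch shows one way a stream-oriented interface might expose raw, metadata-free bytes to a machine learning process by stream identifier and offset; the class and method names are hypothetical and are not part of the disclosed device interface.

class PersistentSpace:
    def __init__(self):
        self._streams = {}  # stream_id -> bytearray of raw data (no object metadata)

    def write_stream(self, stream_id: int, data: bytes) -> None:
        # Store only the raw bytes; no metadata accompanies the stream.
        self._streams[stream_id] = bytearray(data)

    def read_stream(self, stream_id: int, offset: int = 0, length: int = None) -> bytes:
        # The machine learning process addresses data by stream identifier and
        # offset inside the stream, then consumes the contiguous byte range.
        buf = self._streams[stream_id]
        end = len(buf) if length is None else offset + length
        return bytes(buf[offset:end])

# Example usage of this hypothetical interface:
space = PersistentSpace()
space.write_stream(42, b"raw sensor bytes with no object metadata")
chunk = space.read_stream(42, offset=4, length=6)  # -> b"sensor"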

Neural networks can efficiently implement dedicated algorithms for data processing. An artificial neural network (or connectionist system, or machine learning model) may learn to perform certain tasks based on training data, and such learning may be performed without task-specific programming. For example, a neural network may learn to identify images containing cats by analyzing example images that have been manually labeled as "cat" or "no cat". The neural network may then adjust the weights of its nodes to identify cats in other images.

The neural network engine used by the disclosed embodiments of the invention may be configured as any type of neural network. The neural network engine may define the neural network based on one or more factors, including (1) the number of nodes in a layer, (2) the number of hidden layers, (3) the type of activation function, and/or (4) the weight matrix for each connection between layer nodes. In some embodiments, the neural network may be defined based on a function, and the neural network engine may retrieve a predefined neural network corresponding to the desired function.
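
As a rough illustration of the configuration factors listed above, the following Python sketch shows how a neural network engine might be parameterized and how a predefined network could be retrieved by its function; the field names, the "person_detection" function name, and the initialization scheme are assumptions made for this example only.

import numpy as np
from dataclasses import dataclass, field

@dataclass
class NeuralNetworkConfig:
    nodes_per_layer: list    # (1) number of nodes in each layer
    num_hidden_layers: int   # (2) number of hidden layers
    activation: str          # (3) type of activation function, e.g. "relu"
    weights: list = field(default_factory=list)  # (4) weight matrix per connection between layers

PREDEFINED_NETWORKS = {
    # A predefined network may be retrieved by the function it performs.
    "person_detection": NeuralNetworkConfig([784, 128, 64, 2], 2, "relu"),
}

def get_network_for_function(function: str) -> NeuralNetworkConfig:
    cfg = PREDEFINED_NETWORKS[function]
    if not cfg.weights:
        # One weight matrix per connection between adjacent layers.
        cfg.weights = [np.random.randn(m, n)
                       for m, n in zip(cfg.nodes_per_layer, cfg.nodes_per_layer[1:])]
    return cfg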

Stream stored as a continuous sequence

The file system may allocate physical sectors of memory to store a file. However, a storage device may become fragmented, and a file may be divided into pieces stored in different areas; each piece occupies a contiguous space accessible via logical block addresses ("LBAs"), but the file as a whole may not be stored contiguously. In this case, to read from and write to the storage device, the host uses multiple logical block addresses to store and/or retrieve the data.

In some embodiments, the data is stored as a stream in a contiguous area of the smart storage device. FIG. 6 illustrates an example 600 of a file divided into extents and stored by a file system as contiguous sequences of physical memory units, according to some embodiments. Extent (or stream) #1 606A, extent #2 606B, ..., extent #N 606N (collectively referred to herein as extents 606) of file 604 may be identified by the host's file system by a starting logical block address (such as LBA #1, LBA #2, LBA #3) and the length of each extent (or a common length in the case of extents having the same length). The file system may update and/or access the file content via the logical block addresses.

In some embodiments, the host or host side 602 may request an inference operation of a neural network (such as a systolic flow engine) based on the LBAs and lengths of a stream. The host side 602 may send the starting LBA and the length of each extent that the host side 602 wishes to have processed in the neural network. In other embodiments, the host side 602 may send an extent ID. Storage device 608 may receive the starting LBA and length of each extent, may determine and/or configure one or more neural networks to process the data, and may process the stream through the neural networks on the storage device 608 side. Furthermore, a stream-based approach using contiguous sequences of physical memory units provides an efficient way to process data in a neural network on the storage device 608. In addition, the stream-based approach can support different file sizes, and files that change over time, for in-storage neural network data processing.
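
The following hedged sketch illustrates what an extent-based inference request from the host side could look like under the approach described above; the message structure, field names, and the "person_detection" function name are illustrative assumptions rather than the actual host-device protocol.

from dataclasses import dataclass

@dataclass
class Extent:
    start_lba: int   # starting logical block address of the extent
    length: int      # number of blocks in the extent

@dataclass
class InferenceRequest:
    extents: list          # extents (or an extent/stream ID) describing the input data
    network_function: str  # which (predefined) neural network to apply

def build_inference_request(file_extents):
    # The host sends only the starting LBA and length of each extent it wants
    # processed; the storage device maps these onto its internal storage streams.
    return InferenceRequest(
        extents=[Extent(start_lba=lba, length=n) for lba, n in file_extents],
        network_function="person_detection",
    )

# Example: a file fragmented into three extents.
request = build_inference_request([(1024, 256), (4096, 256), (9000, 128)])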

In some embodiments, the storage device 608 may implement a locking function so that the data in a stream remains consistent while it is used for neural network operations. Advantageously, the neural network core can then process data of a fixed size even though files may have various lengths and may change in size over time. As shown in storage device 608, the file may not be stored in a single contiguous sequence of physical memory units, but may instead be stored in a set of one or more storage device storage streams, such as stream #1 610A, stream #2 610B, ..., stream #N 610N (collectively referred to herein as storage device streams 610), distributed at different locations in the storage device. The streams may be identified by a common identifier, which may be unique. Each storage device stream may include a contiguous sequence of physical memory storage units (such as cells, pages, sectors, etc.).

The storage device 608 may perform the neural network computations on the file 604 using the extents. Advantageously, the extent-based approach allows the storage device 608 to resolve conflicts between a neural network core and another core (e.g., a core that handles I/O operations), because only the files relevant to the neural network are locked, rather than the entire memory store. In addition, because each stream is contiguous, the neural network core may process multiple streams substantially simultaneously or substantially in parallel, which may improve the efficiency of neural network processing.

Fig. 7 illustrates an example 700 of an input stream of data processed by a neural network, according to some embodiments. In some embodiments, the storage device may configure the neural network, such as by defining the type of neural network used to process the data. The storage device may identify the appropriate input stream 702 based on a stream identifier. For example, the input stream 702 may include images sent into a neural network (such as the systolic flow engine 704) that is trained to recognize people in images. The systolic flow engine 704 may produce an output stream 706 that provides an indication as to whether a person is identified in the images of the input stream 702.

Neural network and I/O core architecture

FIG. 8 illustrates an example 800 of a multi-core smart storage device 802 according to some embodiments. In some embodiments, the storage device 802 may include a neural network core 806, such as a systolic flow engine ("SFE") core, and an I/O core 808. The neural network core 806 may receive inference (and/or training) commands from a host system through the interface 804 and/or process data through one or more neural networks. The I/O core 808 may receive I/O commands from the host system through the interface 804.

In some embodiments, the neural network core 806 may copy the data 812 into the knowledge space 810 of the persistent space in order to perform the neural network computations. Advantageously, creating a copy frees the I/O core 808 to execute I/O commands on the actual data 812 while the neural network core 806 performs neural network computations in parallel or substantially in parallel. This may be useful if the neural network computations require a long period of time and/or if I/O commands are received from the host while the neural network processing is being performed. Furthermore, creating the copy allows all neural network computations to be performed on a single, stable copy of the data, rather than on data that may be modified by the I/O core 808. Moreover, in the event of an error during data processing in the neural network, the copy protects the original data 812 from corruption. In some embodiments, storage device 802 may store the relevant streams for data processing in the knowledge space 810. In some embodiments, the output of the neural network may be stored in the knowledge space 810 and/or the data space 812 of the persistent space. Data may be stored, at least in the knowledge space 810, using streams as described herein.

In some embodiments, the neural network core 806 may configure one neural network to process data at a given time. In some embodiments, the neural network core 806 may configure multiple neural networks to process the same set of data at the same or substantially the same time. For example, the neural network core 806 may configure one neural network to identify a person in an image and another neural network to identify the background location of the image. The images may be input into both neural networks for parallel or substantially parallel processing.

In some cases, the cores 806 and 808 may be implemented by different controllers or processors. For example, the I/O core 808 may be a different core than the neural network core 806. The I/O core may have a dedicated persistent data space 812 for storing persistent data. The neural network core 806 may be a stand-alone core, such as an ASIC, CPU, or FPGA, with a dedicated persistent knowledge space 810 for performing training, inference, and data processing.

In some embodiments, the I/O core 808 may communicate with the host without knowledge of the underlying data processing performed via the neural network. For example, the host may request that the I/O core 808 perform a particular operation on a set of data, such as a read/write request, while a separate request may be an inference operation of a neural network that requires a significant amount of processing resources. The I/O core 808 may then store the data to persistent space (as described herein). In some embodiments, the neural network core 806 may receive input data from the host, configure the neural network to perform one or more inference operations, process the data through the neural network, and send output data to the host. Further, the neural network core 806 may perform training and/or inference operations of the neural network in parallel or substantially in parallel with other operations performed by the I/O core 808.
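
The following simplified sketch shows one possible way to keep the two cores independent by routing read/write commands and inference commands to separate queues serviced in parallel; the queue-based structure and command format are assumptions used purely for illustration, not a description of the actual firmware.

import queue, threading

io_queue = queue.Queue()         # commands handled by the I/O core
inference_queue = queue.Queue()  # commands handled by the neural network core

def dispatch(command: dict) -> None:
    # Read/write requests go to the I/O core; inference/training requests go to
    # the neural network core, so the two can run substantially in parallel.
    if command["type"] in ("read", "write"):
        io_queue.put(command)
    else:
        inference_queue.put(command)

def io_core_loop():
    while True:
        cmd = io_queue.get()
        # ... perform the read or write on the persistent data space ...
        io_queue.task_done()

def neural_network_core_loop():
    while True:
        cmd = inference_queue.get()
        # ... configure the neural network and process data in the knowledge space ...
        inference_queue.task_done()

threading.Thread(target=io_core_loop, daemon=True).start()
threading.Thread(target=neural_network_core_loop, daemon=True).start()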

In some embodiments, when input data is pushed into the neural network, the storage device may lock the corresponding input data. The original data only needs to be locked for the period during which it is copied from the data space 812 to the knowledge area 810, so the I/O core does not need to wait for the neural network core to complete the inference operation. The host may still access the data with operations that do not modify it, such as read operations.

In some embodiments, the neural network core may push data into the neural network. Circuitry between layers may include one or more memory cells to store the output of a previous layer as the input to the next layer.

In some embodiments, data may be propagated back through the layers of the neural network for training purposes. For example, training data may be propagated forward through the neural network. Depending on the output of the neural network, the neural network core may back-propagate through each layer, increasing the weights of nodes that contribute to the desired output and decreasing the weights of nodes that do not.
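
As a generic illustration of the weight-update step described above (and not the device's actual training algorithm), a single gradient-descent update for one layer might look like the following sketch.

import numpy as np

def update_layer_weights(weights, layer_input, error_signal, learning_rate=0.01):
    # Nodes whose activity contributed to the error have their weights adjusted;
    # the sign of the gradient determines whether a weight is increased or decreased.
    gradient = np.outer(layer_input, error_signal)
    return weights - learning_rate * gradient

# Example: a 4-input layer feeding 3 outputs.
w = np.random.randn(4, 3)
w = update_layer_weights(w, np.ones(4), np.array([0.1, -0.2, 0.05]))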

Performing inference operations without copying streaming data

Fig. 9 illustrates an example 900 of performing inference operations on a neural network without replicating streaming data, according to some embodiments. The illustrated process may be performed by any of the smart storage devices described herein, for example by an I/O core and a neural network core. In step 1, the host may send a request to smart storage device 902 to write a data stream to memory. I/O core 912 may receive the request and, in step 2, store the stream in the persistent space 906 based on a stream identifier 910.

In step 3, the storage device 902 may receive parameters for defining a neural network configuration. In some embodiments, the neural network configuration is predefined, such as predefined during the manufacturing process. In other embodiments, the neural network may be configured and/or reconfigured based on certain parameters, such as the number of nodes, the number of layers, the set of weights for the nodes, the type of function, and so forth. In step 4, the storage device 902 may store the configuration of the neural network in the neural network configuration portion 908 of the persistent space 906. As shown and described herein, persistent space 906 may include one or more storage streams.

In step 5, the storage device 902 may receive a request for an inference operation. A neural network core, such as the systolic flow engine 904, may process one or more data streams at step 6 and return one or more results of the processing at step 7.

In some cases, during neural network operation, I/O core 912 may receive a request to update an existing stream at step 1A. If I/O core 912 updates the same stream that the neural network is processing, a problem arises: the neural network is no longer processing a static data set, but data that is being altered during processing. This can lead to errors in the inputs to the neural network and/or changes in the outputs of the neural network. For example, if the neural network begins processing an image of a car stored in a stream, and the I/O core receives a request to replace that image with an image of a building, the neural network may process neither the car nor the building; instead, it may process a mix of the car and building images, producing an erroneous result.

Performing inference operations with replication of streaming data

These issues may be addressed as shown in FIG. 10, which illustrates an example 1000 of performing inference operations on a neural network by replicating streaming data, according to some embodiments. The illustrated process may be performed by any of the smart storage devices described herein, for example by an I/O core and a neural network core. In step 1, a neural network core 1002, such as a systolic flow engine core, may communicate with a subsystem to issue a lock request. The neural network core 1002 and the I/O core 1006 may interact with the subsystem to access data in the persistent space 1004.

In step 2, the I/O core 1006 can create a view (or copy) of the relevant data stream by copying the data 1014 to the view 1008, all within the persistent space 1004. In step 3, the I/O core 1006 may indicate to the neural network core 1002 that the data is now unlocked. In step 4, the neural network core 1002 may process the data stored in the view 1008 through the neural network, and in step 5 may store the results in the result space 1010.

In some embodiments, the streams copied from the data 1014 to the view 1008 may be individually locked and unlocked during copying. In some embodiments, all streams required by the neural network may be locked, copied, and unlocked together. In some embodiments, the entire data store 1014 may be locked, the associated streams copied, and the entire data store 1014 then unlocked.

In step 2A, I/O operations and/or other operations received while the data is locked may be stored in the log 1012. After the data is unlocked in step 3, the requests stored in the log 1012 may be replayed and executed on the data 1014 in step 4A. Advantageously, the neural network can process the data in the view 1008 without affecting the I/O operations that need to be performed on the data 1014 and without pausing to wait for those I/O operations to complete.
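
A minimal sketch of the lock, copy, unlock, and log-replay flow described for FIG. 10 follows; the data structures and function names are hypothetical and greatly simplified relative to real controller firmware.

data = {}        # data space 1014: stream_id -> bytes
view = {}        # view space 1008: isolated copies used by the neural network
log = []         # log 1012: requests deferred while a stream is locked
locked = set()   # stream identifiers currently locked

def snapshot_for_inference(stream_id):
    locked.add(stream_id)                     # step 1: lock the stream
    view[stream_id] = bytes(data[stream_id])  # step 2: copy into the view
    locked.discard(stream_id)                 # step 3: unlock
    replay_log()                              # step 4A: replay deferred requests

def write_stream(stream_id, payload):
    if stream_id in locked:
        log.append((stream_id, payload))      # step 2A: defer writes while locked
    else:
        data[stream_id] = payload

def replay_log():
    while log:
        stream_id, payload = log.pop(0)
        data[stream_id] = payload

# Example usage of this hypothetical flow:
write_stream(1, b"image bytes v1")
snapshot_for_inference(1)           # neural network now processes view[1]
write_stream(1, b"image bytes v2")  # applied to data[1] without touching the view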

Fig. 11 illustrates an example 1100 of data replication management according to some embodiments. In step 1, a data storage device 1102 (which may be a smart storage device of any disclosed embodiment) may receive a request from a host device for a training or inference operation. In step 2, the streaming data 1108 in the persistent space may be copied to the view portion 1106 of the persistent space. In step 3, the data storage device may process the isolated view data in the neural network. After the neural network completes processing, the data storage device may store the results (e.g., knowledge) into the knowledge portion 1104 of the persistent space in step 4, and destroy the copy of the data in step 5. In step 6, the data storage device may transmit the results (e.g., knowledge) of the neural network to the host device.
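
Putting the FIG. 11 steps together, a hypothetical end-to-end handler on the neural network side might proceed roughly as follows; the names and the run_neural_network callback are illustrative assumptions, not the device's actual implementation.

view = {}        # view portion 1106: isolated copies processed by the network
knowledge = {}   # knowledge portion 1104: results of neural network processing

def handle_inference_request(stream_id, data_space, run_neural_network):
    # Step 2: copy the stream data into the view portion of the persistent space.
    view[stream_id] = bytes(data_space[stream_id])
    try:
        # Step 3: process the isolated copy with the neural network.
        result = run_neural_network(view[stream_id])
        # Step 4: store the result (knowledge) in the knowledge portion.
        knowledge[stream_id] = result
    finally:
        # Step 5: destroy the copy once processing has completed.
        view.pop(stream_id, None)
    # Step 6: transmit the result back to the host.
    return knowledge[stream_id]

# Example: a trivial stand-in for the neural network.
result = handle_inference_request(7, {7: b"\x01\x02\x03"}, run_neural_network=len)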

Other variants

Any of the embodiments disclosed herein may be used with any of the concepts disclosed in co-pending U.S. patent application entitled "enhanced storage device architecture for neural network processing" (attorney docket No. WDA-3974-US-silsp.381a), which was filed on the same day as the present patent application and is incorporated by reference herein in its entirety.

One skilled in the art will appreciate that in some embodiments, additional system components may be utilized and that disclosed system components may be combined or omitted. Although some embodiments describe video data transmission, the disclosed systems and methods may be used to transmit any type of data. Further, although some implementations utilize erasure coding, any suitable error correction scheme may be used. The actual steps taken in the disclosed processes may differ from those shown in the figures. According to an embodiment, some of the above steps may be removed and other steps may be added. Accordingly, the scope of the disclosure is intended to be limited only by reference to the appended claims.

While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of protection. Indeed, the novel methods and systems described herein may be embodied in a variety of other forms. Furthermore, various omissions, substitutions and changes in the form of the methods and systems described herein may be made without departing from the spirit of the protection. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions. For example, the systems and methods disclosed herein may be applied to hard drives, hybrid hard drives, and the like. Further, other forms of storage may additionally or alternatively be used (such as DRAM or SRAM, battery backed up volatile DRAM or SRAM devices, EPROM, EEPROM memory, etc.). As another example, the various components shown in the figures may be implemented as software and/or firmware on a processor, ASIC/FPGA, or dedicated hardware. Moreover, the features and attributes of the specific embodiments disclosed above may be combined in different ways to form additional embodiments, all of which fall within the scope of the present disclosure.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Furthermore, references herein to "a method" or "an embodiment" are not intended to mean the same method or the same embodiment, unless the context clearly indicates otherwise.

The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the various embodiments of the present disclosure has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the disclosure. The exemplary embodiments were chosen and described in order to best explain the principles of the disclosure and the practical application, and to enable others of ordinary skill in the art to understand the disclosure for various embodiments and with various modifications as are suited to the particular use contemplated.

While the present disclosure provides certain preferred embodiments and applications, other embodiments that are apparent to those of ordinary skill in the art, including embodiments that do not provide all of the features and advantages described herein, are also within the scope of the present disclosure. Accordingly, the scope of the disclosure is intended to be limited only by reference to the appended claims.
