Artificial intelligence storage device and storage system including the same

Document No.: 570118 | Publication date: 2021-05-18

Abstract: This technology, "Artificial intelligence storage device and storage system including the same", was created by 张宰薰, 孙弘乐, 薛昶圭, 邵惠晶, 吴和锡, 尹弼相, and 林真洙 on 2020-11-05. An artificial intelligence storage device and a storage system including the same are provided. The storage system includes a host device and a storage device. The host device provides first input data for a data storage function and second input data for an Artificial Intelligence (AI) function. The storage device stores the first input data from the host device, and performs an AI calculation based on the second input data to generate calculation result data. The storage device includes a first processor, a first non-volatile memory, a second processor, and a second non-volatile memory. The first processor controls operation of the storage device. The first non-volatile memory stores the first input data. The second processor performs the AI calculation and is distinct from the first processor. The second non-volatile memory stores weight data associated with the AI calculation and is distinct from the first non-volatile memory.

1. A storage system, comprising:

a host device configured to provide first input data and second input data; and

a storage device configured to store the first input data and to generate calculation result data by performing artificial intelligence calculations based on the second input data, the storage device comprising:

a first processor configured to control operation of the storage device,

a first non-volatile memory configured to store the first input data,

a second processor configured to perform the artificial intelligence calculation, the second processor being different from the first processor, and

a second non-volatile memory configured to store weight data associated with the artificial intelligence calculation, the second non-volatile memory being different from the first non-volatile memory.

2. The storage system of claim 1, wherein the second processor is configured to, in response to the second input data being received,

load the weight data stored in the second non-volatile memory,

perform the artificial intelligence calculation based on the second input data and the weight data to generate the calculation result data, and

transmit the calculation result data to the host device.

3. The storage system of claim 2, wherein the weight data is used by the storage device and is not sent to the host device.

4. The storage system of claim 2, wherein

the weight data represents a plurality of weight parameters that are pre-trained parameters and that are included in a plurality of layers of a neural network system, and

the calculation result data represents a result of a multiply-accumulate operation performed by the neural network system.

5. The storage system of claim 4, wherein the second processor is a neural processing unit configured to drive the neural network system.

6. The storage system of claim 4, wherein the neural network system comprises at least one of an artificial neural network system, a convolutional neural network system, a recurrent neural network system, and a deep neural network system.

7. The storage system of claim 1, wherein the storage device is configured to:

store the first input data in a first mode of operation, and

perform the artificial intelligence calculation based on the second input data and the weight data in a second mode of operation.

8. The storage system of claim 7, wherein the second mode of operation is enabled based on a mode setting signal provided to the storage device from the host device.

9. The storage system of claim 8, wherein the host device includes a plurality of first pins configured to exchange the first input data, the second input data, and the calculation result data with the storage device, and a second pin configured to exchange the mode setting signal with the storage device, and

the storage device further includes a plurality of third pins configured to exchange the first input data, the second input data, and the calculation result data with the host device, and a fourth pin configured to exchange the mode setting signal with the host device.

10. The storage system of claim 7, wherein

a first address of the storage device and a first storage space in the storage device corresponding to the first address are designated as a special function register area, and

the second mode of operation is enabled in response to the first address and first setting data being provided from the host device.

11. The storage system of claim 7, wherein the first processor and the first non-volatile memory are enabled in the first mode of operation and are switched to an idle state in the second mode of operation.

12. The storage system of claim 11, wherein the second processor and the second non-volatile memory are in the idle state in the first mode of operation and are enabled in the second mode of operation.

13. The storage system of claim 12, further comprising:

a trigger unit configured to enable the second processor and the second non-volatile memory in response to the operation mode of the storage device changing from the first mode of operation to the second mode of operation.

14. The storage system of claim 11, wherein

the first processor and the first non-volatile memory are included in a first clock/power domain, and

the second processor and the second non-volatile memory are included in a second clock/power domain.

15. The storage system of claim 7, wherein the host device is configured to collect a plurality of the second input data in the second mode of operation to transmit the collected second input data to the storage device.

16. The storage system of claim 15, wherein an interface between the host device and the storage device is configured to enter a sleep state when the collected second input data is not transmitted in the second mode of operation.

17. A storage device, comprising:

a first processor configured to control operation of the storage device;

a first non-volatile memory configured to store first input data;

a second non-volatile memory configured to store weight data associated with artificial intelligence calculations; and

a second processor configured to load the weight data stored in the second non-volatile memory, generate calculation result data by performing the artificial intelligence calculation based on second input data and the weight data, and output the calculation result data.

18. The storage device of claim 17, wherein the first processor and the second processor are formed as one chip.

19. The storage device of claim 17, wherein the first processor and the second processor are formed as two separate chips.

20. A storage device, comprising:

a first clock/power domain comprising a first processor and a first non-volatile memory, the first non-volatile memory configured to store first input data in a first mode of operation and the first processor configured to access the first non-volatile memory;

a second clock/power domain comprising a second processor and a second non-volatile memory, the second non-volatile memory configured to store weight data associated with artificial intelligence calculations, the second non-volatile memory being different from the first non-volatile memory, and the second processor configured to access the second non-volatile memory and perform the artificial intelligence calculations based on second input data in a second mode of operation, the second processor being different from the first processor; and

a third clock/power domain comprising a trigger unit, the third clock/power domain being different from the first clock/power domain and the second clock/power domain, the trigger unit configured to enable the first mode of operation and the second mode of operation, the trigger unit configured to: in the first mode of operation, enable the first processor and the first non-volatile memory to store the first input data and place the second processor and the second non-volatile memory in an idle state, and, in the second mode of operation, enable the second processor to load the weight data stored in the second non-volatile memory, perform the artificial intelligence calculation based on the second input data and the weight data, and output artificial intelligence calculation result data.

Technical Field

Example embodiments relate generally to semiconductor integrated circuits, and more particularly, to an Artificial Intelligence (AI) storage device and a storage system including the same.

Background

A storage system includes a host device and a storage device. The storage device may be a memory system including a memory controller and a memory device, or a memory system including only a memory device. In a storage system, the host device and the storage device are connected to each other via various interface standards, such as Universal Flash Storage (UFS), Serial Advanced Technology Attachment (SATA), Small Computer System Interface (SCSI), Serial Attached SCSI (SAS), embedded MultiMediaCard (eMMC), and the like.

In computer science, Artificial Intelligence (AI), sometimes referred to as machine intelligence, is intelligence demonstrated by a machine, in contrast to the natural intelligence displayed by humans. Colloquially, the term "AI" is commonly used to describe a machine (e.g., a computer) that mimics "cognitive" functions associated with human thinking, such as "learning" and "problem solving". For example, AI can be implemented based on machine learning, neural networks, Artificial Neural Networks (ANNs), and the like. An ANN is obtained by engineering a model of the cellular structure of the human brain, in which pattern recognition is performed. An ANN refers to a software- and/or hardware-based computational model designed to mimic biological computational capability by connecting a number of artificial neurons with connection lines. The human brain is composed of neurons, the basic units of the nervous system, and encodes or decodes information according to the different types of dense connections between these neurons. Artificial neurons in an ANN are obtained by simplifying the functions of biological neurons. An ANN performs a cognitive or learning process by interconnecting artificial neurons having connection strengths. Recently, AI- and/or ANN-based data processing has been investigated.
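The weighted-sum behavior of an artificial neuron described above can be sketched in a few lines of Python. This is a generic illustration, not code from this patent; the ReLU activation is one common choice of simplified neuron function.

```python
def artificial_neuron(inputs, weights, bias):
    # Multiply-accumulate: weighted sum of the inputs plus a bias term,
    # where each weight models the connection strength of one input line.
    acc = bias
    for x, w in zip(inputs, weights):
        acc += x * w
    # ReLU activation: the neuron "fires" only for positive sums.
    return max(0.0, acc)
```

For example, `artificial_neuron([1.0, 2.0], [0.5, 0.25], 0.0)` evaluates the weighted sum 0.5 + 0.5 and returns 1.0.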

Disclosure of Invention

At least one example embodiment of the present disclosure provides a storage system including an Artificial Intelligence (AI) storage device capable of improving or enhancing operation efficiency and reducing power consumption.

At least one example embodiment of the present disclosure provides an AI storage device capable of improving or enhancing operation efficiency and reducing power consumption.

According to an example embodiment, a storage system includes a host device and a storage device. The host device provides first input data and second input data. The storage device is configured to store the first input data and perform an AI calculation based on the second input data. The storage device includes a first processor, a first non-volatile memory, a second processor, and a second non-volatile memory. The first processor controls operation of the storage device. The first non-volatile memory stores the first input data. The second processor performs the AI calculation and is different from the first processor. The second non-volatile memory stores weight data associated with the AI calculation and is different from the first non-volatile memory.

According to an example embodiment, a storage device includes a first processor, a first non-volatile memory, a second non-volatile memory, and a second processor. The first processor is configured to control operation of the storage device. The first non-volatile memory is configured to store first input data for a data storage function. The second non-volatile memory is configured to store weight data associated with Artificial Intelligence (AI) calculations and is different from the first non-volatile memory. The second processor is configured to perform an AI function and is different from the first processor. The second processor is configured to load the weight data stored in the second nonvolatile memory, perform the AI calculation based on second input data and the weight data, and output calculation result data.

According to an example embodiment, a storage device includes a first clock/power domain including a first processor and a first non-volatile memory, a second clock/power domain including a second processor and a second non-volatile memory, and a third clock/power domain including a trigger unit. The first non-volatile memory is configured to store first input data in a first mode of operation. The second non-volatile memory is configured to store weight data associated with Artificial Intelligence (AI) calculations, and is different from the first non-volatile memory. The second processor is configured to perform the AI calculations based on second input data in a second mode of operation, and is different from the first processor. The third clock/power domain is different from the first clock/power domain and the second clock/power domain. The trigger unit is configured to enable the first mode of operation and the second mode of operation. In the first mode of operation, the trigger unit enables the first processor and the first non-volatile memory to store the first input data and places the second processor and the second non-volatile memory in an idle state. In the second mode of operation, the trigger unit enables the second processor to load the weight data stored in the second non-volatile memory, perform the AI calculation based on the second input data and the weight data, and output calculation result data.

The AI storage device and the storage system according to example embodiments include a second processor that performs the AI function and the AI calculation. The second processor may be independent of, or different from, the first processor that controls the operation of the storage device. The AI function of the storage device may be performed independently of the control of the host device, by the storage device itself. The storage device may receive only the second input data, which is the target of the AI calculation, and may output only the calculation result data, which is the result of the AI calculation. Thus, data traffic between the host device and the storage device may be reduced. In addition, the first processor and the second processor may be independent of each other, and the non-volatile memories accessed by the first processor and the second processor may also be independent of each other, thereby reducing power consumption.
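The division of work summarized above can be modeled with a short, hypothetical sketch (all class and function names are illustrative, not from this disclosure): the host transmits only the second input data and receives only the calculation result data, while the weight data never crosses the interface.

```python
class AIStorageStub:
    """Toy model of the AI storage device; the weights model data held
    in the second non-volatile memory and are never sent to the host."""

    def __init__(self, weights):
        self._weights = weights

    def ai_calculate(self, idat):
        # Models the second processor's multiply-accumulate over the weights.
        return sum(x * w for x, w in zip(idat, self._weights))


def host_offload(device, batch):
    # The host sends only the AI inputs (second input data) and receives
    # only the results, so interface traffic scales with inputs/results,
    # not with the size of the weight data.
    return [device.ai_calculate(idat) for idat in batch]
```

Used as `host_offload(AIStorageStub([1.0, 2.0]), [[1.0, 1.0], [2.0, 0.0]])`, the host obtains the two results without ever observing the weights.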

Drawings

The illustrative, non-limiting example embodiments will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings.

FIG. 1 is a block diagram illustrating a storage device and a storage system including the storage device, according to some example embodiments.

FIG. 2 is a block diagram illustrating an example of a storage controller included in a storage device, according to some example embodiments.

FIG. 3 is a block diagram illustrating an example of a non-volatile memory included in a storage device according to some example embodiments.

FIGS. 4, 5A, 5B, and 5C are diagrams for describing an operation of the storage system of FIG. 1.

FIGS. 6A, 6B, and 6C are diagrams for describing examples of network structures driven by an AI function implemented in a storage device according to some example embodiments.

FIGS. 7, 8A, and 8B are diagrams for describing an operation of the storage system of FIG. 1.

FIGS. 9 and 10 are diagrams for describing an operation of switching an operation mode in a storage system according to some example embodiments.

FIGS. 11A, 11B, and 11C are diagrams for describing an operation of transferring data in a storage system according to some example embodiments.

FIG. 12 is a block diagram illustrating a storage device and a storage system including the storage device, according to some example embodiments.

FIG. 13 is a flow chart illustrating a method of operating a storage device according to some example embodiments.

FIG. 14 is a block diagram illustrating an electronic system according to some example embodiments.

Detailed Description

Various example embodiments will be described more fully with reference to the accompanying drawings, in which embodiments are shown. This disclosure may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Like reference numerals refer to like elements throughout the application.

FIG. 1 is a block diagram illustrating a storage device and a storage system including the storage device, according to some example embodiments.

Referring to FIG. 1, a storage system 100 includes a host device 200 and a storage device 300.

The host device 200 is configured to control the overall operation of the storage system 100. The host device 200 may include an external interface (EXT I/F) 210, a host interface (HOST I/F) 220, a host memory (HOST MEM) 230, a Neural Processing Unit (NPU) 240, a Digital Signal Processor (DSP) 250, a Central Processing Unit (CPU) 260, an Image Signal Processor (ISP) 270, and a Graphics Processing Unit (GPU) 280.

The external interface 210 may be configured to exchange data, signals, events, etc. with the outside of the storage system 100. For example, the external interface 210 may include input devices, such as a keyboard, keypad, buttons, microphone, mouse, touch pad, touch screen, remote control, etc., and output devices, such as a printer, speaker, display, etc.

The host interface 220 may be configured to provide a physical connection between the host device 200 and the storage device 300. For example, the host interface 220 may provide an interface corresponding to a bus format of the host device 200 to communicate between the host device 200 and the storage device 300. In some example embodiments, the bus format of the host device 200 may be Universal Flash Storage (UFS) and/or Non-Volatile Memory Express (NVMe). In other example embodiments, the bus format of the host device 200 may be Small Computer System Interface (SCSI), Serial Attached SCSI (SAS), Universal Serial Bus (USB), Peripheral Component Interconnect Express (PCIe), Advanced Technology Attachment (ATA), Parallel ATA (PATA), Serial ATA (SATA), and the like.

NPU 240, DSP 250, CPU 260, ISP 270, and GPU 280 may be configured to control the operation of host device 200 and may process data associated with the operation of host device 200.

For example, the CPU 260 may be configured to control the overall operation of the host device 200, and may run an Operating System (OS). For example, the OS run by CPU 260 may include: a file system for file management, and a device driver configured to control peripheral devices including the storage device 300 at an OS level. The DSP 250 may be configured to process digital signals. The ISP 270 may be configured to process image signals. GPU 280 may be configured to process various data associated with graphics.

The NPU 240 may be configured to run and drive a neural network system, and may process the corresponding data. In addition to the NPU 240, at least one of the DSP 250, the CPU 260, the ISP 270, and the GPU 280 may also be configured to run and drive a neural network system. Thus, the NPU 240, DSP 250, CPU 260, ISP 270, and GPU 280 may be referred to as a plurality of Processing Elements (PEs), a plurality of resources, or a plurality of accelerators for driving a neural network system, and may include processing circuitry, such as hardware including logic circuitry; a hardware/software combination, such as a processor running software; or a combination thereof. For example, the processing circuitry may more particularly include, but is not limited to, a Central Processing Unit (CPU), an Arithmetic Logic Unit (ALU), a digital signal processor, a microcomputer, a Field-Programmable Gate Array (FPGA), a programmable logic unit, a microprocessor, an Application-Specific Integrated Circuit (ASIC), and the like.

Host memory 230 may be configured to store instructions and/or data that are executed and/or processed by NPU 240, DSP 250, CPU 260, ISP 270, and/or GPU 280. For example, the host memory 230 may include at least one of various volatile memories such as a Dynamic Random Access Memory (DRAM), a Static Random Access Memory (SRAM), and the like.

In some example embodiments, host device 200 may be an Application Processor (AP). For example, the host device 200 may be implemented in the form of a system on chip (SoC).

The host device 200 may be configured to access the storage device 300. The storage device 300 may include a memory controller 310 and a plurality of non-volatile memories (NVMs) 320a, 320b, 320c, and 320d. Although shown as including four NVMs, example embodiments are not so limited, and the storage device may include more or fewer NVMs (e.g., five or more).

The memory controller 310 may be configured to control the operation of the storage device 300 and/or the operations of the plurality of non-volatile memories 320a, 320b, 320c, and 320d based on commands, addresses, and data received from the host device 200. The configuration of the memory controller 310 will be described with reference to FIG. 2.

The plurality of nonvolatile memories 320a, 320b, 320c and 320d may be configured to store a plurality of data. For example, the plurality of nonvolatile memories 320a, 320b, 320c, and 320d may store metadata, various user data, and the like.

In some example embodiments, the plurality of non-volatile memories 320a, 320b, 320c, and 320d may each include a NAND flash memory. In other example embodiments, each of the plurality of non-volatile memories 320a, 320b, 320c, and 320d may include one of an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Phase-change Random Access Memory (PRAM), a Resistive Random Access Memory (RRAM), a Nano Floating Gate Memory (NFGM), a Polymer Random Access Memory (PoRAM), a Magnetic Random Access Memory (MRAM), a Ferroelectric Random Access Memory (FRAM), and the like.

In some example embodiments, the storage device 300 may include a Universal Flash Storage (UFS), a MultiMediaCard (MMC), or an embedded MultiMediaCard (eMMC). In other example embodiments, the storage device 300 may include one of a Solid State Drive (SSD), a Secure Digital (SD) card, a micro SD card, a memory stick, a chip card, a Universal Serial Bus (USB) card, a smart card, a Compact Flash (CF) card, and the like.

In some example embodiments, the storage device 300 may be connected to the host device 200 via a block access interface, which may include, for example, a UFS, eMMC, NVMe, SATA, SCSI, or SAS bus, or the like. The storage device 300 may be configured to provide a block access interface to the host device 200 using a block access address space corresponding to an access size of the plurality of non-volatile memories 320a, 320b, 320c, and 320d, to allow access to data stored in the plurality of non-volatile memories 320a, 320b, 320c, and 320d in units of storage blocks.

In some example embodiments, the storage system 100 may be included in at least one of various mobile systems, such as mobile phones, smart phones, tablet computers, laptop computers, Personal Digital Assistants (PDAs), Portable Multimedia Players (PMPs), digital cameras, portable game consoles, music players, camcorders, video players, navigation devices, wearable devices, Internet of Things (IoT) devices, Internet of Everything (IoE) devices, electronic book readers, Virtual Reality (VR) devices, Augmented Reality (AR) devices, robotic devices, drones, and so forth. In other example embodiments, the storage system 100 may be included in at least one of a variety of computing systems, such as a Personal Computer (PC), a server computer, a data center, a workstation, a digital television, a set-top box, a navigation system, and so forth.

The storage device 300 according to some example embodiments is implemented with or equipped with Artificial Intelligence (AI) functionality. For example, the storage device 300 may be configured to function as a storage medium that performs a data storage function, and may also be configured to function as a computing device that operates a neural network system to perform an AI function.

For example, the storage device 300 may be configured to operate in one of a first mode of operation and a second mode of operation. In the first mode of operation, the storage device 300 may perform the data storage function, for example, a write operation for storing the first input data UDAT received from the host device 200, a read operation for outputting the stored data to the host device 200, and the like. In the second mode of operation, the storage device 300 may perform the AI function, such as an AI calculation and/or operation (e.g., an arithmetic operation for the AI) based on the second input data IDAT received from the host device 200 to generate calculation result data RDAT, an operation of outputting the calculation result data RDAT to the host device 200, and the like. Although FIG. 1 shows only the data being transmitted, a command, an address, or the like corresponding to the data may also be transmitted.
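The two operation modes described above can be modeled with a hypothetical sketch (class names, the mode-setting mechanism, and the use of a plain multiply-accumulate as the AI calculation are all illustrative assumptions, not details of the patent):

```python
MODE_STORAGE = 1   # first mode of operation: data storage function
MODE_AI = 2        # second mode of operation: AI function

class ModalStorageDevice:
    def __init__(self, weights):
        self.mode = MODE_STORAGE
        self.first_nvm = {}          # models the first NVM (user data, UDAT)
        self.second_nvm = weights    # models the second NVM (weight data)

    def set_mode(self, mode):
        # Models the mode setting signal provided by the host device.
        self.mode = mode

    def write(self, address, udat):
        # Data storage function, available only in the first mode.
        if self.mode != MODE_STORAGE:
            raise RuntimeError("write allowed only in the first mode")
        self.first_nvm[address] = udat

    def ai_calculate(self, idat):
        # AI function, available only in the second mode; the result
        # data (RDAT) is a multiply-accumulate against stored weights.
        if self.mode != MODE_AI:
            raise RuntimeError("AI calculation allowed only in the second mode")
        return sum(x * w for x, w in zip(idat, self.second_nvm))
```

In this toy model, switching the mode changes which of the two functions the device will accept, mirroring how the mode setting signal enables the second mode of operation.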

The memory controller 310 may include a first processor 312 and a second processor 314. The first processor 312 may control the overall operation of the storage device 300, and may control operations associated with the data storage function in the first mode of operation. The second processor 314 may control operations, runs, or calculations associated with the AI function in the second mode of operation. In the example of FIG. 1, the first processor 312 and the second processor 314 may be formed or implemented as one chip, or may include processing circuitry, such as hardware including logic circuitry; a hardware/software combination, such as a processor running software; or a combination thereof. For example, the processing circuitry may more particularly include, but is not limited to, a Central Processing Unit (CPU), an Arithmetic Logic Unit (ALU), a digital signal processor, a microcomputer, a Field-Programmable Gate Array (FPGA), a programmable logic unit, a microprocessor, an Application-Specific Integrated Circuit (ASIC), and the like.

The plurality of non-volatile memories 320a, 320b, 320c, and 320d may include (e.g., may be divided or classified into) at least one first non-volatile memory and at least one second non-volatile memory. The first non-volatile memory may be configured to be accessed by the first processor 312, and may be designated or allocated to perform the data storage function. The second non-volatile memory may be configured to be accessed by the second processor 314, and may be designated or allocated to perform the AI function. For example, as will be described with reference to FIG. 4, the first non-volatile memory may store the first input data UDAT, and the second non-volatile memory may store the weight data associated with the AI calculation.
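As one illustration of how the second processor might use weight data loaded from the second non-volatile memory, the sketch below runs a small fully connected forward pass. This is a hypothetical example; the layer representation and the ReLU activation are assumptions for illustration, not details taken from the patent.

```python
def forward(layers, x):
    # layers: list of (weight_matrix, bias_vector) pairs, as they might
    # be loaded from the second non-volatile memory.
    for W, b in layers:
        # Each output element is a multiply-accumulate over the inputs,
        # followed by a ReLU activation.
        x = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + bi)
             for row, bi in zip(W, b)]
    return x
```

For a single layer with weights `[[1.0, 1.0]]` and bias `[0.5]`, the input `[2.0, 3.0]` produces `[5.5]`, i.e., the multiply-accumulate result the calculation result data would represent.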

According to some example embodiments, the first non-volatile memory and the second non-volatile memory may be formed or implemented as one chip, or as two separate chips. In some example embodiments, while the first non-volatile memory may be accessed only by the first processor 312 and the second non-volatile memory may be accessed only by the second processor 314, the plurality of non-volatile memories 320a, 320b, 320c, and 320d may further include a third non-volatile memory accessed by both the first processor 312 and the second processor 314.

The AI function of the storage device 300 may be performed independently of, and/or separately from, the control of the host device 200, and may be performed inside the storage device 300 and/or by the storage device 300 itself. For example, the neural network system driven inside the storage device 300 may be implemented and/or driven independently of a neural network system driven by the host device 200. The storage device 300 may perform the AI function based on the neural network system driven inside the storage device 300 without being controlled by the host device 200. According to example embodiments, the neural network system driven inside the storage device 300 and the neural network system driven by the host device 200 may be of the same type or of different types.

Conventional storage devices are not implemented or equipped with AI functions; instead, AI functions are performed using resources or accelerators included in the host device. When an AI function involving a relatively small amount of computation is to be performed, that small computation is still carried out using the relatively large resources included in the host device. In this case, power consumption may increase, and the data traffic between the host device and the storage device may be excessive relative to the amount of computation, so a bottleneck may occur. As a result, performing an AI function on the host device may be very inefficient when the AI function involves a relatively small amount of computation.

The storage device 300 according to some example embodiments may be configured to implement an AI function, and may further include the second processor 314, which performs the AI function and the AI calculation. The second processor 314 may be separate or distinct from the first processor 312, which controls the operation of the storage device 300. The AI function of the storage device 300 may be performed independently of the control of the host device 200, and may be performed by the storage device 300 itself. The storage device 300 may be configured, for example, to receive only the second input data IDAT, which is the target of the AI calculation, and to output only the calculation result data RDAT, which is the result of the AI calculation. Accordingly, data traffic between the host device 200 and the storage device 300 can be reduced. In addition, the first processor 312 and the second processor 314 may be independent of each other, and the non-volatile memories accessed by the first processor 312 and the second processor 314 may also be independent of each other, thereby reducing power consumption.

FIG. 2 is a block diagram illustrating an example of a storage controller included in a storage device, according to some example embodiments.

Referring to fig. 2, the memory controller 400 may include a first processor 410, a second processor 420, a buffer memory 430, a host interface 440, an Error Correction Code (ECC) block 450, and a memory interface 460.

The first processor 410 may be configured to control the operation of the storage controller 400 in response to commands received from a host device (e.g., the host device 200 in fig. 1) via the host interface 440. In some example embodiments, the first processor 410 may control the various components by employing firmware for operating a storage device (e.g., the storage device 300 in fig. 1).

The first processor 410 may be configured to control operations associated with a data storage function, and the second processor 420 may control operations associated with an AI function and AI computation. The first processor 410 and the second processor 420 in fig. 2 may be the same as or similar to the first processor 312 and the second processor 314, respectively, in fig. 1. For example, the first processor 410 may be a CPU and the second processor 420 may be an NPU, or may include: processing circuitry, such as hardware including logic circuitry; a hardware/software combination, such as a processor running software; or a combination thereof.

In some example embodiments, the second processor 420 may be an NPU, and the NPU included in the storage controller 400 may be smaller than the NPU included in the host device 200 (e.g., the NPU 240 in fig. 1). For example, the second processor 420 may have a data throughput, arithmetic capacity, power consumption, etc. that are lower than those of the NPU 240.

The buffer memory 430 may be configured to store instructions and data that are executed and processed by the first processor 410 and the second processor 420. For example, the buffer memory 430 may be implemented by a volatile memory (e.g., a Static Random Access Memory (SRAM), a cache memory, etc.) having a relatively small capacity and high speed.

The ECC block 450 is configured to correct errors, and may perform coded modulation by using, for example, at least one of a Bose-Chaudhuri-Hocquenghem (BCH) code, a Low Density Parity Check (LDPC) code, a turbo code, a Reed-Solomon code, a convolutional code, a Recursive Systematic Code (RSC), Trellis Coded Modulation (TCM), Block Coded Modulation (BCM), and the like, or may perform ECC encoding and ECC decoding using the above-described codes and/or other error correction codes.
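The locate-and-correct behavior these codes provide can be illustrated with a much simpler scheme. The sketch below is purely illustrative and is not the ECC block's actual algorithm (BCH/LDPC codes used in flash are far more powerful); it shows a Hamming(7,4) encoder and a decoder that finds and flips a single corrupted bit from a parity syndrome:

```python
def hamming74_encode(d):
    # d: four data bits; returns the 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4          # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4          # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(c):
    # recompute the parities; a nonzero syndrome is the (1-based)
    # position of the single flipped bit
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 * 1 + s2 * 2 + s3 * 4
    if pos:
        c[pos - 1] ^= 1        # flip the corrupted bit back
    return [c[2], c[4], c[5], c[6]]   # extract the data bits
```

A flash ECC engine performs the same conceptual step, only over much longer codewords and multi-bit error patterns.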

The host interface 440 may be configured to provide a physical connection between the host device 200 and the storage device 300. For example, the host interface 440 may provide an interface corresponding to a bus format of a host for communication between the host device 200 and the storage device 300. The bus format of host interface 440 may be the same as or similar to the bus format of host interface 220 in FIG. 1.

The memory interface 460 may be configured to exchange data with non-volatile memory (e.g., non-volatile memories 320a, 320b, 320c, and 320d in fig. 1). The memory interface 460 may be configured to transmit data to the non-volatile memories 320a, 320b, 320c and 320d and/or may be configured to receive data read from the non-volatile memories 320a, 320b, 320c and 320d. In some example embodiments, the memory interface 460 may be connected to the nonvolatile memories 320a, 320b, 320c and 320d via one channel. In other example embodiments, the memory interface 460 may be connected to the non-volatile memories 320a, 320b, 320c and 320d via two or more channels.

Fig. 3 is a block diagram illustrating an example of a non-volatile memory included in a storage device according to some example embodiments.

Referring to fig. 3, the nonvolatile memory 500 includes a memory cell array 510, a row decoder 520, a page buffer circuit 530, a data input/output (I/O) circuit 540, a voltage generator 550, and a control circuit 560.

The memory cell array 510 may be connected to the row decoder 520 via a plurality of string selection lines SSL, a plurality of word lines WL, and a plurality of ground selection lines GSL. The memory cell array 510 may also be connected to a page buffer circuit 530 via a plurality of bit lines BL. The memory cell array 510 may include a plurality of memory cells (e.g., a plurality of nonvolatile memory cells) connected to a plurality of word lines WL and a plurality of bit lines BL. The memory cell array 510 may be divided into a plurality of memory blocks BLK1, BLK2, …, BLKz each including memory cells. In addition, the plurality of memory blocks BLK1, BLK2, …, BLKz may each be divided into a plurality of pages.

In some example embodiments, the plurality of memory cells may be arranged in a two-dimensional (2D) array structure and/or a three-dimensional (3D) vertical array structure. The three-dimensional vertical array structure may include vertical cell strings vertically oriented such that at least one memory cell is located above another memory cell. The at least one memory cell may include a charge trapping layer. The following patent documents, which describe suitable configurations of memory cell arrays including a 3D vertical array structure in which the three-dimensional memory array is configured in multiple levels and word lines and/or bit lines are shared between the levels, are incorporated herein by reference in their entireties: U.S. Pat. Nos. 7,679,133; 8,553,466; 8,654,587; 8,559,235; and U.S. Patent Publication No. 2011/0233648.

The control circuit 560 may be configured to receive a command CMD and an address ADDR from the outside (e.g., the host device 200 and/or the memory controller 310 in fig. 1), and may be configured to control erase, program, and read operations of the nonvolatile memory 500 based on the command CMD and the address ADDR. The erase operation may include a series of erase loops, and the program operation may include a series of program loops. Each program loop may include a program operation and a program verify operation. Each erase loop may include an erase operation and an erase verify operation. The read operation may include a normal read operation and a data recovery read operation.

For example, the control circuit 560 may be configured to generate a control signal CON for controlling the voltage generator 550 based on the command CMD, and may generate a control signal PBC for controlling the page buffer circuit 530, and may generate a row address R _ ADDR and a column address C _ ADDR based on the address ADDR. Control circuitry 560 may provide row addresses R _ ADDR to row decoders 520 and may provide column addresses C _ ADDR to data I/O circuitry 540.

The row decoder 520 may be connected to the memory cell array 510 via a plurality of string selection lines SSL, a plurality of word lines WL, and a plurality of ground selection lines GSL.

For example, in a data erase/write/read operation, the row decoder 520 may determine at least one word line of the plurality of word lines WL as a selected word line based on the row address R _ ADDR, and may determine the remaining word lines of the plurality of word lines WL other than the selected word line as unselected word lines.

In addition, in the data erase/write/read operation, the row decoder 520 may determine at least one string selection line of the plurality of string selection lines SSL as a selected string selection line based on the row address R _ ADDR, and may determine the remaining string selection lines of the plurality of string selection lines SSL other than the selected string selection line as unselected string selection lines.

Also, in the data erase/write/read operation, the row decoder 520 may determine at least one ground selection line of the plurality of ground selection lines GSL as a selected ground selection line based on the row address R _ ADDR, and may determine the remaining ground selection lines of the plurality of ground selection lines GSL other than the selected ground selection line as unselected ground selection lines.

The voltage generator 550 may be configured to generate a voltage VS for an operation of the nonvolatile memory 500 based on the power PWR and the control signal CON. The voltage VS may be applied to a plurality of string selection lines SSL, a plurality of word lines WL, and a plurality of ground selection lines GSL via the row decoder 520. In addition, the voltage generator 550 may be configured to generate the erase voltage VERS for the data erase operation based on the power PWR and the control signal CON. The erase voltage VERS may be applied to the memory cell array 510 directly or via the bit lines BL.

For example, during an erase operation, the voltage generator 550 may apply the erase voltage VERS to the common source line and/or the bit line BL of a memory block (e.g., a selected memory block), and may apply an erase permission voltage (e.g., a ground voltage) to all or a portion of the word lines of the memory block via the row decoder 520. In addition, during the erase verify operation, the voltage generator 550 may apply the erase verify voltage to all word lines of the memory block simultaneously, or to the word lines sequentially one by one.

For example, during a program operation, the voltage generator 550 may apply a program voltage to a selected word line via the row decoder 520 and may apply a program pass voltage to unselected word lines. Further, during a program verify operation, the voltage generator 550 may apply a program verify voltage to a selected word line via the row decoder 520 and may apply a verify pass voltage to unselected word lines.

In addition, during a normal read operation, the voltage generator 550 may apply a read voltage to a selected word line via the row decoder 520 and may apply a read pass voltage to unselected word lines. During a data recovery read operation, the voltage generator 550 may apply a read voltage to a word line adjacent to a selected word line via the row decoder 520, and may apply a recovery read voltage to the selected word line.

The page buffer circuit 530 may be connected to the memory cell array 510 via a plurality of bit lines BL. The page buffer circuit 530 may include a plurality of page buffers. In some example embodiments, each page buffer may be connected to one bit line. In other example embodiments, each page buffer may be connected to two or more bit lines.

The page buffer circuit 530 may store data DAT to be programmed into the memory cell array 510 or may read data DAT sensed from the memory cell array 510. For example, the page buffer circuit 530 may function as a write driver or a sense amplifier according to an operation mode of the nonvolatile memory 500.

The data I/O circuit 540 may be connected to the page buffer circuit 530 via the data line DL. The data I/O circuit 540 may be configured to supply data DAT from the outside of the nonvolatile memory 500 to the memory cell array 510 via the page buffer circuit 530 based on the column address C _ ADDR, or may supply data DAT from the memory cell array 510 to the outside of the nonvolatile memory 500.

Fig. 4, 5A, 5B, and 5C are diagrams for describing an operation of the storage system of fig. 1. FIG. 4 illustrates the operation of the storage system 100 in a first mode of operation. Fig. 5A, 5B, and 5C illustrate operation of the storage system 100 in a second mode of operation. Components that are less relevant to the description of the illustrated example embodiments among all components of the storage device are omitted for ease of illustration.

Referring to fig. 4, the host device 200 in fig. 4 may be the same as or similar to the host device 200 in fig. 1. The storage device 300a in fig. 4 may include a host interface 440, a first processor 410, a first memory interface 462, a first non-volatile memory 322, a second processor 420, a second memory interface 464, and a second non-volatile memory 324.

The host interface 440, the first processor 410, and the second processor 420 in fig. 4 may be the same as or similar to the host interface 440, the first processor 410, and the second processor 420, respectively, in fig. 2. First memory interface 462 and second memory interface 464 may be included in memory interface 460 of fig. 2. The first nonvolatile memory 322 and the second nonvolatile memory 324 may be included in the plurality of nonvolatile memories 320a, 320b, 320c and 320d of fig. 1. The host interface 440 may be included in the first clock/power domain DM1. The first processor 410, the first memory interface 462, and the first non-volatile memory 322 may be included in a second clock/power domain DM2 that is different and distinct from the first clock/power domain DM1. The second processor 420, the second memory interface 464 and the second non-volatile memory 324 may be included in a third clock/power domain DM3 that is different and distinct from the first clock/power domain DM1 and the second clock/power domain DM2.

In the first mode of operation, the first input data UDAT may be provided from the host interface 220 of the host device 200 and the storage device 300a may receive the first input data UDAT. For example, the first input data UDAT may be any user data processed by at least one of the NPU 240, the DSP 250, the CPU 260, the ISP 270, and the GPU 280.

The storage device 300a may perform a data storage function on the first input data UDAT. For example, first input data UDAT may be sent through host interface 440 and first memory interface 462 and stored in first non-volatile memory 322. Although the data storage function is described based on a write operation, example embodiments are not limited thereto, and a read operation may be performed to provide the data UDAT stored in the first non-volatile memory 322 to the host device 200.

In the first mode of operation shown in FIG. 4, the host interface 440, the first and second processors 410, 420, the first and second memory interfaces 462, 464, and the first and second non-volatile memories 322, 324 may all be enabled or activated. However, example embodiments are not limited thereto, and the second processor 420, the second memory interface 464, and the second nonvolatile memory 324 may be in an idle state in the first operation mode, as will be described with reference to fig. 7.

Referring to fig. 5A, in the second operation mode, the second input data IDAT may be provided from the external interface 210 and the host interface 220 of the host device 200, and the storage device 300a may receive the second input data IDAT. For example, the second input data IDAT may be any inference data (inference data) that is a target of the AI calculation. For example, when the AI function is voice recognition, the second input data IDAT may be voice data received from a microphone included in the external interface 210. As another example, when the AI function is image recognition, the second input data IDAT may be image data received from a camera included in the external interface 210. The second input data IDAT may be sent to the second processor 420 via the host interface 440.

In the second mode of operation, the host interface 440, the second processor 420, the second memory interface 464, and the second non-volatile memory 324 may be enabled or may be in an active state, and the first processor 410, the first memory interface 462, and the first non-volatile memory 322 may be switched or changed from the active state to an idle state (e.g., a sleep, power saving, or power down state). In fig. 5A and subsequent figures, components in an idle state are shown in a shaded pattern. Only the first processor 410, the first memory interface 462 and the first non-volatile memory 322 may be included in the separate clock/power domain DM2 such that only the first processor 410, the first memory interface 462 and the first non-volatile memory 322 are switched to an idle state, and thus power consumption may be reduced in the second mode of operation.

Referring to fig. 5B, in the second operating mode, the second processor 420 may load the weight data WDAT stored in the second non-volatile memory 324. The weight data WDAT may be sent to the second processor 420 through the second memory interface 464. For example, the weight data WDAT may represent a plurality of weight parameters that are pre-training parameters and that are included in a plurality of layers of the neural network system. The weight data WDAT may be pre-trained to be suitable or adapted for the neural network system and may be pre-stored in the second non-volatile memory 324.

In some example embodiments, the weight data WDAT may be stored in the second non-volatile memory 324 contiguously and/or sequentially. In this example, the second processor 420 may directly load the weight data WDAT using only the start location (e.g., the start address) at which the weight data WDAT is stored and the size of the weight data WDAT, without a Flash Translation Layer (FTL) operation.
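Because the weight region is contiguous, loading reduces to a single raw read. A minimal sketch under the assumption that the non-volatile memory can be modeled as a flat byte array (the function name and model are hypothetical):

```python
def load_weights(nvm, start_addr, size):
    # Direct read of the contiguous weight region: no logical-to-physical
    # address lookup (FTL) is performed; only the start address and the
    # size of the weight data are needed.
    return bytes(nvm[start_addr:start_addr + size])
```

A real FTL-backed read would instead translate each logical page address through a mapping table; storing WDAT contiguously makes that step unnecessary.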

In some example embodiments, the neural network system includes at least one of various neural network systems and/or machine learning systems, for example: Artificial Neural Network (ANN) systems, Convolutional Neural Network (CNN) systems, Deep Neural Network (DNN) systems, deep learning systems, and the like. Such machine learning systems may include a variety of learning models, such as: convolutional neural networks (CNN), deconvolutional neural networks, recurrent neural networks (RNN) optionally including long short-term memory (LSTM) units and/or gated recurrent units (GRU), stacked neural networks (SNN), state-space dynamic neural networks (SSDNN), deep belief networks (DBN), generative adversarial networks (GAN), and/or restricted Boltzmann machines (RBM). Alternatively or additionally, such machine learning systems may include other forms of machine learning models, such as: linear and/or logistic regression, statistical clustering, Bayesian classification, decision trees, dimensionality reduction such as principal component analysis, and expert systems; and/or combinations thereof, including ensembles such as random forests. Such machine learning models may also be used to provide, for example, at least one of various services and/or applications: an image classification service, a user authentication service based on biometric information or biometric data, an Advanced Driver Assistance System (ADAS) service, a voice assistant service, an Automatic Speech Recognition (ASR) service, etc., and may be executed, run, or processed by the host device 200 and/or the storage device 300a. The configuration of the neural network system will be described with reference to fig. 6A, 6B, and 6C.

Referring to fig. 5C, in the second operation mode, the second processor 420 may perform the AI calculation based on the second input data IDAT received in fig. 5A and the weight data WDAT loaded in fig. 5B to generate calculation result data RDAT, and may transmit the calculation result data RDAT to the host device 200. The calculation result data RDAT may be transmitted to the host device 200 through the host interface 440. For example, the calculation result data RDAT may represent the result of multiply-accumulate (MAC) operations performed by the neural network system.
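As a toy illustration of the multiply-accumulate operation underlying RDAT (purely illustrative; the actual NPU performs many such operations in parallel hardware):

```python
def mac(inputs, weights, acc=0.0):
    # multiply-accumulate: sum the products of input/weight pairs
    # into a running accumulator
    for x, w in zip(inputs, weights):
        acc += x * w
    return acc
```

For example, mac([1, 2, 3], [4, 5, 6]) accumulates 1*4 + 2*5 + 3*6 = 32.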

As described with reference to fig. 5A, 5B, and 5C, the second processor 420 may be configured to perform the AI function separately and/or independently in the storage device 300a, and the weight data WDAT may be used only inside the storage device 300a and may not be transmitted to the host device 200. For example, the storage device 300a may exchange only the second input data IDAT as an AI calculation target and the calculation result data RDAT as an AI calculation result with the host device 200. In general, the size of the weight data WDAT may be much larger than the size of the second input data IDAT and the size of the calculation result data RDAT. Accordingly, since the AI function is implemented in the storage device 300a, the data traffic between the host device 200 and the storage device 300a can be reduced, and the amount of calculation of the host device 200 and the usage rate of the host memory 230 can also be reduced.

Fig. 6A, 6B, and 6C are diagrams for describing examples of network structures driven by AI functions implemented in a storage device according to some example embodiments.

Referring to fig. 6A, a general neural network (e.g., ANN) may include an input layer IL, a plurality of hidden layers HL1, HL2, …, HLn, and an output layer OL.

The input layer IL may include i input nodes x1, x2, …, xi, where i is a natural number. Input data IDAT of length i (e.g., vector input data) may be input to the input nodes x1, x2, …, xi such that each element of the input data IDAT is input to a corresponding one of the input nodes x1, x2, …, xi.

The plurality of hidden layers HL1, HL2, …, HLn may include n hidden layers, where n is a natural number, and may include a plurality of hidden nodes h11, h12, h13, …, h1m, h21, h22, h23, …, h2m, hn1, hn2, hn3, …, hnm. For example, the hidden layer HL1 may include m hidden nodes h11, h12, h13, …, h1m, the hidden layer HL2 may include m hidden nodes h21, h22, h23, …, h2m, and the hidden layer HLn may include m hidden nodes hn1, hn2, hn3, …, hnm, where m is a natural number.

The output layer OL may include j output nodes y1, y2, …, yj, where j is a natural number. Each of the output nodes y1, y2, …, yj may correspond to a respective one of the categories to be classified. The output layer OL may output an output value ODAT (e.g., a class score, or simply a score) associated with the input data IDAT for each category. The output layer OL may be referred to as a fully connected layer and may indicate, for example, the probability that the input data IDAT corresponds to a car.

The structure of the neural network shown in fig. 6A can be represented by information on branches (or connections) illustrated as lines between nodes and a weight value (not shown) assigned to each branch. Nodes within a layer may not be directly connected to each other, but nodes of different layers may be fully or partially connected to each other.

Each node (e.g., the node h11) may receive the output of a previous node (e.g., the node x1), may perform a computing operation, computation, or calculation on the received output, and may output the result of the computing operation, computation, or calculation to a subsequent node (e.g., the node h21). Each node may calculate the value to be output by applying the input to a specific function, e.g., a non-linear function.
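A single node's computation can be sketched as follows (a toy version; the sigmoid is assumed here merely as one common choice of non-linear function):

```python
import math

def node_output(inputs, weights, bias=0.0):
    # weighted sum of the outputs received from the previous layer's nodes
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # passed through a non-linear activation (sigmoid here) before being
    # forwarded to the subsequent node
    return 1.0 / (1.0 + math.exp(-z))
```

The value returned would then serve as one of the inputs to each connected node in the next layer.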

In general, the structure of a neural network is set in advance, and the weight values for the connections between the nodes are set appropriately using data whose category is already known. Such data with known answers is referred to as "training data", and the process of determining the weight values is referred to as "training". The neural network "learns" during the training process. The structure together with the independently trainable weight values is called a "model", and the process in which the model with determined weight values predicts which category the input data belongs to, and then outputs the predicted value, is called a "test" process.
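The training process described above can be sketched in miniature: a single weight is fitted to training data with known answers by repeatedly nudging it against the error gradient (a hypothetical one-parameter model y ≈ w * x, shown only to illustrate how weight values are set appropriately):

```python
def train_weight(samples, lr=0.1, epochs=100):
    # samples: (input, known answer) pairs, i.e. the "training data";
    # gradient descent on the squared error (w * x - y)^2
    w = 0.0
    for _ in range(epochs):
        for x, y in samples:
            w -= lr * 2.0 * (w * x - y) * x
    return w
```

After training, the resulting model (here, just w) is what the "test" process uses to predict outputs for new inputs.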

The general neural network shown in fig. 6A may not be suitable for processing input image data (or input sound data), because each node (e.g., the node h11) is connected to all nodes of the previous layer (e.g., the nodes x1, x2, …, xi included in the layer IL), and the number of weight values sharply increases as the size of the input image data increases. Accordingly, the CNN, which is implemented by combining a filtering technique with the general neural network, has been studied such that a two-dimensional image (e.g., the input image data) is efficiently trained by the CNN.

Referring to fig. 6B, the CNN may include a plurality of layers CONV1, RELU1, CONV2, RELU2, POOL1, CONV3, RELU3, CONV4, RELU4, POOL2, CONV5, RELU5, CONV6, RELU6, POOL3, and FC.

Unlike a general neural network, each layer of the CNN may have three dimensions of width, height, and depth, and thus data input to each layer may be volume data having three dimensions of width, height, and depth. For example, if the input image in fig. 6B has a width of 32 (e.g., 32 pixels), a height of 32, and three color channels R, G and B, the size of the input data IDAT corresponding to the input image may be 32 × 32 × 3. The input data IDAT in fig. 6B may be referred to as input volume data or input activation volume.

Convolutional layers CONV1, CONV2, CONV3, CONV4, CONV5, and CONV6 may each be configured to perform convolution operations on input volume data. For example, in image processing, a convolution operation represents an operation of: the image data is processed based on the mask having the weight value, and an output value is obtained by multiplying the input value by the weight value and adding all the product values. The mask may be referred to as a filter, window, or kernel.

The parameters of each convolutional layer may consist of a set of learnable filters. Each filter may be small spatially (along width and height), but may extend through the full depth of the input volume. For example, during the forward pass, each filter may be slid (e.g., convolved) across the width and height of the input volume, and dot products between the entries of the filter and the input may be computed at every position. As the filter is slid across the width and height of the input volume, a two-dimensional activation map that gives the responses of that filter at every spatial position may be generated. As a result, an output volume may be generated by stacking the activation maps along the depth dimension. For example, if input volume data of size 32 × 32 × 3 passes through the convolutional layer CONV1 with four filters and zero padding, the output volume data of the convolutional layer CONV1 may have a size of 32 × 32 × 12 (e.g., the depth of the volume data increases).
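The output volume size of such a convolutional layer follows directly from the filter geometry. A small helper reproducing that bookkeeping (the kernel size, stride, and padding values in the example are assumptions for illustration):

```python
def conv_output_shape(in_w, in_h, num_filters, kernel, stride=1, pad=0):
    # the spatial size shrinks with the kernel and grows back with zero
    # padding; the output depth equals the number of filters, because the
    # per-filter activation maps are stacked along the depth dimension
    out_w = (in_w - kernel + 2 * pad) // stride + 1
    out_h = (in_h - kernel + 2 * pad) // stride + 1
    return (out_w, out_h, num_filters)
```

For instance, with 3 × 3 kernels, stride 1, and one pixel of zero padding, a 32 × 32 input keeps its spatial size while the depth becomes the filter count.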

RELU layers RELU1, RELU2, RELU3, RELU4, RELU5, and RELU6 may each be configured to perform a rectified linear unit (RELU) operation corresponding to an activation function defined by, for example, the function f(x) = max(0, x) (e.g., the output is zero for all negative inputs x). For example, if input volume data of size 32 × 32 × 12 passes through the RELU layer RELU1 to perform the rectified linear unit operation, the output volume data of the RELU layer RELU1 may have a size of 32 × 32 × 12 (e.g., the size of the volume data is preserved).

The pooling layers POOL1, POOL2, and POOL3 may each be configured to perform a downsampling operation on the input volume data along the spatial dimensions of width and height. For example, four input values arranged in a 2 × 2 matrix may be converted into one output value based on a 2 × 2 filter. For example, the maximum of the four input values arranged in the 2 × 2 matrix may be selected based on 2 × 2 maximum pooling, or the average of the four input values arranged in the 2 × 2 matrix may be obtained based on 2 × 2 average pooling. For example, if input volume data of size 32 × 32 × 12 passes through the pooling layer POOL1 with a 2 × 2 filter, the output volume data of the pooling layer POOL1 may have a size of 16 × 16 × 12 (e.g., the width and height of the volume data are reduced, and the depth of the volume data is preserved).
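The 2 × 2 downsampling described above can be sketched as follows (illustrative only; a real pooling layer applies this to every 2 × 2 patch of every depth slice):

```python
def pool_2x2(block, mode="max"):
    # block is a 2x2 matrix of input values; maximum pooling keeps the
    # largest value, average pooling keeps the mean, so four input
    # values become a single output value
    vals = [v for row in block for v in row]
    return max(vals) if mode == "max" else sum(vals) / len(vals)
```

Applying this to every non-overlapping 2 × 2 patch halves the width and height while leaving the depth unchanged.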

In general, one convolutional layer (e.g., CONV1) and one RELU layer (e.g., RELU1) may form a pair of CONV/RELU layers in the CNN, pairs of CONV/RELU layers may be repeatedly arranged in the CNN, and pooling layers may be periodically inserted into the CNN, thereby reducing the spatial size of the image and extracting characteristics of the image.

The output layer or fully connected layer FC may output the result (e.g., a class score) of the input volume data IDAT for each category. For example, as the convolution operations and the downsampling operations are repeated, the input volume data IDAT corresponding to the two-dimensional image may be converted into a one-dimensional matrix or vector. For example, the fully connected layer FC may represent the probabilities that the input volume data IDAT corresponds to a car, a truck, an airplane, a ship and a horse.

The type and number of layers included in the CNN may not be limited to the example described with reference to fig. 6B, but may be changed according to an example embodiment. In addition, although not shown in fig. 6B, the CNN may further include other layers, for example, a soft maximum (softmax) layer for converting a score value corresponding to a prediction result into a probability value, a bias addition layer for adding at least one bias, and the like.

Referring to fig. 6C, the RNN may include a repetitive structure using a specific node or unit N shown at the left side of fig. 6C.

The structure shown on the right side of fig. 6C may represent that the recurrent connection of the RNN shown on the left side is unfolded (or unrolled). The term "unrolled" means that the network is written out or illustrated for the complete or entire sequence, including all of the nodes NA, NB, and NC. For example, if the sequence of interest is a sentence of three words, the RNN may be unrolled into a three-layer neural network, with one layer per word (e.g., without recurrent connections, or without cycles).

In the RNN of FIG. 6C, X represents an input of the RNN. For example, Xt may be the input at time step t, and Xt-1 and Xt+1 may be the inputs at time steps t-1 and t+1, respectively.

In the RNN of fig. 6C, S denotes a hidden state. For example, St may be the hidden state at time step t, and St-1 and St+1 may be the hidden states at time steps t-1 and t+1, respectively. The hidden state may be calculated based on the previous hidden state and the input at the current step. For example, St = f(U*Xt + W*St-1). For example, the function f may typically be a non-linear function such as tanh or RELU. S-1, which is required to calculate the first hidden state, may typically be initialized to all zeros.

In the RNN of fig. 6C, O denotes an output of the RNN. For example, Ot may be the output at time step t, and Ot-1 and Ot+1 may be the outputs at time steps t-1 and t+1, respectively. For example, if the next word in a sentence is to be predicted, the output may be a vector of probabilities across the entire vocabulary. For example, Ot = softmax(V*St).

In the RNN of fig. 6C, the hidden state may be the "memory" of the network. For example, the RNN may have a "memory" that captures information about what has been calculated so far. The hidden state St may capture information about what happened in all of the previous time steps. The output Ot may be calculated based only on the memory at the current time step t. In addition, unlike a conventional neural network that uses different parameters at each layer, the RNN may share the same parameters (e.g., U, V and W in fig. 6C) across all time steps. This may represent that the same task is performed at each step, just with different inputs. This may greatly reduce the total number of parameters to be trained or learned.
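The recurrence above can be sketched as a toy scalar step, with tanh as the non-linearity f and the softmax omitted since there is only a single output value (all parameter values in the example are illustrative assumptions):

```python
import math

def rnn_step(x_t, s_prev, U, W, V):
    # new hidden state: combines the current input with the previous
    # hidden state (the network's "memory") through a non-linear function,
    # i.e. St = f(U*Xt + W*St-1)
    s_t = math.tanh(U * x_t + W * s_prev)
    # output at this time step, computed from the current memory only
    o_t = V * s_t
    return s_t, o_t

def run_rnn(xs, U, W, V):
    # unrolled over the whole sequence; U, W, V are shared at every step
    s = 0.0                      # S-1 initialized to zero
    outs = []
    for x in xs:
        s, o = rnn_step(x, s, U, W, V)
        outs.append(o)
    return outs
```

Note that only s (one value here) carries information between steps, while the three parameters U, W, and V are reused at every step, which is exactly why the parameter count does not grow with the sequence length.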

In some example embodiments, the neural network system described with reference to fig. 6A, 6B, and 6C may execute, run, or process at least one of various services and/or applications (e.g., image classification services, biometric-information-or-biometric-data-based user authentication services, Advanced Driver Assistance System (ADAS) services, voice assistant services, Automatic Speech Recognition (ASR) services, etc.).

Fig. 7, 8A, and 8B are diagrams for describing an operation of the storage system of fig. 1. A description overlapping with fig. 4, 5A, 5B, and 5C will be omitted.

Referring to fig. 7, the host device 200 in fig. 7 may be substantially the same as or similar to the host device 200 in fig. 1. The storage device 300b may be substantially the same as or similar to the storage device 300a of fig. 4, except that the storage device 300b of fig. 7 further includes a trigger unit (TRG) 470.

The trigger unit 470 may be configured to enable and/or activate the second processor 420, the second memory interface 464, and/or the second non-volatile memory 324 when the operating mode of the storage device 300b changes from the first operating mode to the second operating mode. The trigger unit 470 may be included in a fourth clock/power domain DM4 that is different and distinct from the first, second and third clock/power domains DM1, DM2, and DM3. According to some example embodiments, the trigger unit 470 and the second processor 420 may be formed or implemented as one chip or as two separate chips. In one example, the trigger unit 470 may be configured to enable/activate/trigger the first and second operation modes.

In the first mode of operation, the first input data UDAT may be provided from the host interface 220 of the host device 200, and the storage device 300b may be configured to receive the first input data UDAT. The storage device 300b may be configured to perform a data storage function on the first input data UDAT.

In the first operating mode shown in fig. 7, the host interface 440, the first processor 410, the first memory interface 462, the first non-volatile memory 322, and the trigger unit 470 may be in an active state, and the second processor 420, the second memory interface 464, and the second non-volatile memory 324 may be in an idle state. Because only the second processor 420, the second memory interface 464, and the second non-volatile memory 324 are included in the independent clock/power domain DM3, only these components are in an idle state, and thus power consumption may be reduced in the first operation mode.

Referring to fig. 8A, in the second operation mode, the second input data IDAT may be provided from the external interface 210 and the host interface 220 of the host device 200, and the storage device 300b may be configured to receive the second input data IDAT. The second input data IDAT may be sent to the trigger unit 470 through the host interface 440.

Referring to fig. 8B, the trigger unit 470 may generate the wake signal WK to enable the second processor 420, the second memory interface 464, and the second non-volatile memory 324, and may provide the second input data IDAT to the second processor 420 after the second processor 420, the second memory interface 464, and the second non-volatile memory 324 are enabled.

In the second operating mode, the host interface 440 and the trigger unit 470 may be in an active state, and the second processor 420, the second memory interface 464, and the second non-volatile memory 324 may be switched from an idle state to an active state. In addition, the first processor 410, the first memory interface 462, and the first non-volatile memory 322 may be switched from an active state to an idle state, similar to that described with reference to fig. 5A.
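The mode switching and domain gating described above may be modeled informally as follows; the class and field names are assumptions made for this sketch, not the implementation of the storage device:

```python
from enum import Enum

class Mode(Enum):
    STORAGE = 1  # first operating mode: data storage function
    AI = 2       # second operating mode: AI calculation

class StorageDeviceModel:
    """Toy model of the domain gating around the trigger unit (names are
    assumptions for this sketch, not the device's implementation)."""
    def __init__(self):
        self.mode = Mode.STORAGE
        # Roughly following fig. 7: the AI-side blocks sit in their own
        # clock/power domain, so they can start out idle.
        self.active = {
            "host_interface": True, "trigger_unit": True,
            "first_processor": True, "first_nvm": True,
            "second_processor": False, "second_nvm": False,
        }

    def wake(self, input_data):
        """Trigger unit: enable the AI-side domain, then forward the input."""
        self.mode = Mode.AI
        self.active["second_processor"] = True
        self.active["second_nvm"] = True
        # The first-mode blocks may be idled to save power (cf. fig. 5A).
        self.active["first_processor"] = False
        self.active["first_nvm"] = False
        return input_data  # forwarded to the second processor

dev = StorageDeviceModel()
forwarded = dev.wake(b"IDAT")
print(dev.mode.name, forwarded)
```

The point of the sketch is that only the AI-side domain changes state when the wake signal arrives, which is what allows the first-mode blocks to idle independently.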

After the operation of fig. 8B, the operation of loading the weight data WDAT and generating/transmitting the calculation result data RDAT may be similar to the operation described with reference to fig. 5B and 5C.

Fig. 9 and 10 are diagrams for describing an operation of switching an operation mode in a memory system according to an example embodiment.

Referring to FIG. 9, the operating mode of the storage device 302 may be switched or a particular operating mode of the storage device 302 may be enabled or activated based on a mode setting signal MSS provided to the storage device 302 from the host device 202.

For example, the host device 202 may include a plurality of first pins P1 and a second pin P2 different from the plurality of first pins P1, and the storage device 302 may include a plurality of third pins P3 and a fourth pin P4 different from the plurality of third pins P3. A plurality of first signal lines SL1 for connecting the plurality of first pins P1 with the plurality of third pins P3 may be formed between the plurality of first pins P1 and the plurality of third pins P3, and a second signal line SL2 for connecting the second pin P2 with the fourth pin P4 may be formed between the second pin P2 and the fourth pin P4. For example, the pins may be contact pads or contact pins, but example embodiments are not limited thereto.

The plurality of first pins P1, the plurality of third pins P3, and the plurality of first signal lines SL1 may form a general interface between the host device 202 and the storage device 302, and may be configured to exchange the first input data UDAT, the second input data IDAT, and the calculation result data RDAT shown in fig. 1.

The second pin P2, the fourth pin P4, and the second signal line SL2 may be formed separately and additionally from the general interface between the host device 202 and the storage device 302, and may be physically added for the mode setting signal MSS. For example, the mode setting signal MSS may be transmitted through the second pin P2 and the fourth pin P4, and the second pin P2 and the fourth pin P4 may be used only to transmit the mode setting signal MSS. For example, each of the second pin P2 and the fourth pin P4 may be a general purpose input/output (GPIO) pin.
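A minimal sketch of the dedicated-pin scheme of fig. 9 is shown below; the pin names follow the text, while the signal polarity (a high level of MSS selecting the second operation mode) is an assumption:

```python
class ModePins:
    """Toy model of the mode setting signal MSS on dedicated pins P2/P4.
    The polarity (high level selects the AI mode) is an assumption."""
    def __init__(self):
        self.mss = 0  # level driven by the host on pin P2, seen on pin P4

    def drive_mss(self, level):
        self.mss = level  # host side (pin P2) drives the signal line SL2

    def operation_mode(self):
        # Storage side (pin P4) samples MSS to select the operation mode.
        return "AI" if self.mss else "STORAGE"

pins = ModePins()
mode_before = pins.operation_mode()
pins.drive_mss(1)  # host asserts the mode setting signal
mode_after = pins.operation_mode()
print(mode_before, mode_after)
```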

Referring to fig. 10, a specific memory space S4 of the storage device 304 may be designated or allocated as a Special Function Register (SFR) area for enabling or activating the AI function, and an operation mode of the storage device 304 may be switched, or a specific operation mode of the storage device 304 may be enabled or activated, based on an address SADDR and setting data SDAT provided from the host device 204 to the storage device 304.

For example, the host device 204 may include a plurality of first pins P1, the storage device 304 may include a plurality of third pins P3, and a plurality of first signal lines SL1 may be formed between the plurality of first pins P1 and the plurality of third pins P3. The plurality of first pins P1, the plurality of third pins P3, and the plurality of first signal lines SL1 in fig. 10 may be the same as or similar to the plurality of first pins P1, the plurality of third pins P3, and the plurality of first signal lines SL1 in fig. 9, respectively.

The storage space of the storage device 304 may include or may be divided into: a first storage space S1 for the OS, a second storage space S2 for user data, a third storage space S3 for the weight data, a fourth storage space S4 for the AI function, and the like. The second operation mode may be enabled or disabled when the address SADDR of the fourth storage space S4 and the setting data SDAT for enabling or disabling the AI function are provided to the storage device 304.
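The SFR-based switching of fig. 10 may be sketched as follows; the base address and the setting-data values are illustrative assumptions, not values defined in this disclosure:

```python
# Hypothetical register map for the SFR-based switching (the address and the
# setting-data encodings below are illustrative assumptions).
SFR_ADDR = 0x4000   # assumed address SADDR of the fourth storage space S4
AI_ENABLE = 0x01    # assumed setting data SDAT enabling the AI function
AI_DISABLE = 0x00   # assumed setting data SDAT disabling the AI function

class SfrModel:
    """Writing setting data at the SFR address toggles the second mode."""
    def __init__(self):
        self.regs = {}
        self.ai_mode_enabled = False

    def write(self, saddr, sdat):
        self.regs[saddr] = sdat
        if saddr == SFR_ADDR:  # writes elsewhere are ordinary data writes
            self.ai_mode_enabled = (sdat == AI_ENABLE)

sfr = SfrModel()
sfr.write(SFR_ADDR, AI_ENABLE)
enabled = sfr.ai_mode_enabled
sfr.write(SFR_ADDR, AI_DISABLE)
disabled = not sfr.ai_mode_enabled
print(enabled, disabled)
```

Because the mode change rides on an ordinary memory write, no extra pins are needed, which is the trade-off against the scheme of fig. 9.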

For example, when an operation mode of the memory device according to example embodiments is changed or switched, the pins P2 and P4 for the mode setting signal MSS may be physically added to a general interface (as shown in fig. 9), or a specific address and memory space may be designated and used to switch the operation mode when the general interface is used (as shown in fig. 10).

Although not shown in fig. 9 and 10, an unused command field among command fields used in the storage device may be designated and used to switch the operation mode.

Fig. 11A, 11B, and 11C are diagrams for describing an operation of transferring data in a storage system according to some example embodiments.

Referring to fig. 11A, an example of running a voice recognition service based on a neural network system is shown. For example, the second input data IDAT may be voice data VDAT received from a microphone included in the external interface 210.

Referring to fig. 11B, an interface IF1 between the host device 200 and the storage device 300 is shown when the voice data VDAT is sampled and transmitted in real time. There may initially be an idle interval TID1, and then sample data D1, D2, D3, D4, D5, D6, D7, D8, D9, D10, D11, and D12 may be transmitted sequentially. In this case, the relatively small sample data D1 through D12 are transferred at a relatively low rate (e.g., about 24kHz), and a time interval TA during which data is not transferred is shorter than a reference time; thus, the interface between the host device 200 and the storage device 300 (e.g., the host interfaces 220 and 440) does not enter a sleep state during the time interval TA.

Referring to fig. 11C, an interface IF2 between the host device 200 and the storage device 300 is shown when the voice data VDAT is sampled and the sample data D1 through D12 are collected and transmitted in groups of a predetermined number. There may initially be an idle interval TID2, and then three of the sample data D1 through D12 may be collected and transmitted at a time. In comparison with the operation of fig. 11B, a time interval TH during which data is not transmitted may be longer than the reference time, and thus the interface between the host device 200 and the storage device 300 (e.g., the host interfaces 220 and 440) may enter a sleep state during the time interval TH.
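The effect of collecting samples before transmission may be sketched with a toy timing model; the sample period, batch size, and reference time below are arbitrary illustrative values:

```python
# Toy timing model for figs. 11B/11C: collecting samples before transfer
# lengthens the idle gap so the interface can enter a sleep state. The
# sample period, batch size, and reference time are arbitrary values.
REFERENCE_TIME = 3.0  # assumed sleep threshold (arbitrary time units)

def idle_gaps(num_samples, sample_period, batch_size):
    """Idle gap between consecutive transfers when sending in batches."""
    transfers = num_samples // batch_size
    gap = sample_period * batch_size  # time spent accumulating one batch
    return [gap] * transfers

# Per-sample transfer (fig. 11B): gaps stay below the reference time.
gaps_b = idle_gaps(12, sample_period=1.0, batch_size=1)
# Batched transfer of three samples at a time (fig. 11C): gaps exceed it.
gaps_c = idle_gaps(12, sample_period=1.0, batch_size=3)

print(all(g < REFERENCE_TIME for g in gaps_b))   # interface stays awake
print(all(g >= REFERENCE_TIME for g in gaps_c))  # interface may sleep
```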

Fig. 12 is a block diagram illustrating a storage device and a storage system including the same according to example embodiments. A description overlapping with fig. 1 will be omitted.

Referring to fig. 12, the storage system 100c includes a host device 200 and a storage device 300 c.

The storage system 100c of fig. 12 may be the same as or similar to the storage system 100 of fig. 1, except that the configuration of the storage controller 310c included in the storage device 300c is changed.

The storage controller 310c includes a first processor 312. The second processor 314 is located or provided outside of the storage controller 310c. In the example of fig. 12, the first processor 312 and the second processor 314 may be formed or implemented as two separate chips.

In some example embodiments, when the memory system 100c further includes a trigger unit (e.g., the trigger unit 470 in fig. 7), the trigger unit 470 and the second processor 420 may be formed or implemented as one chip or two separate chips.

FIG. 13 is a flow chart illustrating a method of operating a storage device according to some example embodiments.

Referring to fig. 1 and 13, in a method of operating a storage device according to some example embodiments, the storage device 300 is configured to perform a data storage function in a first operation mode (step S100). For example, the first processor 312 may store user data in the first non-volatile memory 322 or may read user data stored in the first non-volatile memory 322.

The storage device 300 performs the AI function in the second operation mode. For example, the second processor 314 may receive the inference data from the host device 200 (step S210), may load the weight data from the second nonvolatile memory 324 (step S220), may perform the AI calculation based on the inference data and the weight data to generate the calculation result data (step S230), and may transmit the calculation result data to the host device 200 (step S240).
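Steps S210 through S240 may be sketched informally as follows; the function names and the stand-in dot-product "calculation" are assumptions for illustration, not the actual AI calculation:

```python
# Sketch of steps S210-S240; the stand-in dot product replaces the real
# AI calculation, and all names are assumptions for illustration.
def run_ai_function(inference_data, weight_store):
    weights = weight_store["WDAT"]                    # S220: load weights
    # S230: AI calculation (a stand-in dot product, not the real model).
    result = sum(i * w for i, w in zip(inference_data, weights))
    return result                                     # S240: result to host

weight_store = {"WDAT": [0.5, -1.0, 2.0]}  # e.g., read from the second NVM
rdat = run_ai_function([1.0, 2.0, 3.0], weight_store)  # S210: inference data
print(rdat)  # 0.5 - 2.0 + 6.0 = 4.5
```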

In some example embodiments, the host device 200 and the storage device 300 may each operate or drive a neural network system, and according to example embodiments, the operation results may be integrated in the storage system 100. For example, the storage system 100 may be used to drive and compute two or more neural network systems simultaneously to perform complex inference operations. For example, when performing sound or voice recognition, the host device 200 may drive a CNN to recognize lip movement, and the storage device 300 may drive an RNN to recognize the sound or voice itself, and then the recognition results may be integrated to improve recognition accuracy.
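The integration of the two recognition results may be sketched as a simple late fusion; the averaging rule and the scores below are illustrative assumptions, not the method of this disclosure:

```python
# Toy late fusion of two recognizers, echoing the lip-movement CNN (host)
# plus voice RNN (storage device) example. The averaging rule and the
# scores are illustrative assumptions, not the method of this disclosure.
def fuse(host_scores, device_scores, alpha=0.5):
    """Weighted average of per-class scores from the two systems."""
    return [alpha * h + (1.0 - alpha) * d
            for h, d in zip(host_scores, device_scores)]

host_cnn_scores = [0.2, 0.7, 0.1]    # e.g., lip-movement recognition
device_rnn_scores = [0.1, 0.8, 0.1]  # e.g., voice recognition
combined = fuse(host_cnn_scores, device_rnn_scores)
best_class = combined.index(max(combined))
print(best_class)  # both recognizers agree on class 1
```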

Fig. 14 is a block diagram illustrating an electronic system according to some example embodiments.

Referring to fig. 14, the electronic system 4000 includes at least one processor 4100, a communication module 4200, a display/touch module 4300, a storage device 4400, and a memory device 4500. For example, the electronic system 4000 may be any mobile system or any computing system.

The processor 4100 is configured to control the operation of the electronic system 4000. For example, the processor 4100 may run an OS and at least one application to provide an internet browser, games, videos, and the like. The communication module 4200 is configured to perform wireless or wired communication with an external system. The display/touch module 4300 is configured to display data processed by the processor 4100 and/or to receive data through a touch panel. The storage device 4400 is configured to store user data. The memory device 4500 temporarily stores data used to process operations of the electronic system 4000. The processor 4100 and the storage device 4400 may correspond to the host device 200 and the storage device 300 in fig. 1, respectively.

The inventive concept can be applied to various electronic devices and/or systems including storage devices and storage systems. For example, the inventive concept may be applied to systems such as: mobile phones, smart phones, tablet computers, laptop computers, Personal Digital Assistants (PDAs), Portable Multimedia Players (PMPs), digital cameras, portable game consoles, music players, camcorders, video players, navigation devices, wearable devices, Internet of Things (IoT) devices, Internet of Everything (IoE) devices, electronic book readers, Virtual Reality (VR) devices, Augmented Reality (AR) devices, robotic devices, drones, and so forth.

The foregoing is illustrative of some example embodiments and is not to be construed as limiting thereof. Although a few example embodiments have been described, those skilled in the art will readily appreciate that many modifications are possible in the example embodiments without materially departing from the novel teachings and advantages of the example embodiments. Accordingly, all such modifications are intended to be included within the scope of example embodiments as defined in the claims. Therefore, it is to be understood that the foregoing is illustrative of various example embodiments and is not to be construed as limited to the specific example embodiments disclosed, and that modifications to the disclosed example embodiments, as well as other example embodiments, are intended to be included within the scope of the appended claims.
