Column-based data layout techniques for clustered data systems

Document No. 1860759, published 2021-11-19.

Filed 2020-12-25 by C. Chauhan, S. Dongaonkar, R. Sundaram, J. Khan, S. Guliani, D. Sengupta, and M. Tepper.

Abstract: A media management technique for providing a column data layout for clustered data includes a device having a column addressable memory and circuitry coupled to the memory. The circuitry is configured to store data clusters of a logical matrix in the column addressable memory in a column-based format, and to read logical columns of a data cluster from the column addressable memory in a column read operation. Reading a logical column may include reading logical column data diagonally from the column addressable memory, including reading from the data cluster and from a duplicate copy of the data cluster. Reading a logical column may include reading from a plurality of complementary logical columns. Reading a logical column may include reading the logical column data diagonally using a modulo counter. Column data may be read from a partition of the column addressable memory selected based on the logical column number. Other embodiments are described and claimed.

1. An apparatus, comprising:

a column addressable memory; and

circuitry coupled to the memory, wherein the circuitry is to:

store a data cluster of a logical matrix in the column addressable memory in a column-based format; and

read a logical column of the data cluster from the column addressable memory in a column read operation, wherein the column read operation is based on a block address and a column number.

2. The apparatus of claim 1, wherein reading the logical column comprises:

determining a base address from the block address and the column number; and

reading logical column data diagonally from the column addressable memory starting at the base address, wherein reading the logical column data comprises reading from the data cluster and from a duplicate copy of the data cluster.

3. The apparatus of claim 1 or 2, wherein the circuitry is further to discard data from row addresses not included in the data cluster or a duplicate copy of the data cluster.

4. The apparatus of any one of claims 1 to 3, wherein storing the data cluster comprises:

storing the data cluster at a first row address of the column addressable memory; and

storing a duplicate copy of the data cluster at a second row address in the column addressable memory, wherein the first row address and the second row address are separated by a predetermined row offset.

5. The apparatus of any one of claims 1 to 4, wherein the circuitry is further to rotate each row of the data cluster over a partition of the column addressable memory, and wherein storing the data cluster comprises storing the data cluster in response to the rotation of each row of the data cluster.

6. The apparatus of any one of claims 1 to 5, wherein the predetermined row offset comprises a column width of a partition of the column addressable memory.

7. The apparatus of any one of claims 1 to 6, wherein reading the logical column comprises:

reading a plurality of complementary logical columns of the data cluster; and

assembling the logical column in response to reading the plurality of complementary logical columns.

8. The apparatus of any one of claims 1 to 7, wherein reading the plurality of complementary logical columns comprises performing a column read operation for each of the complementary logical columns, wherein each column read operation has a different starting address.

9. The apparatus of any one of claims 1 to 8, wherein assembling the logical column comprises assembling a multi-bit logical column based on the plurality of complementary logical columns.

10. The apparatus of any one of claims 1 to 9, wherein storing the data cluster comprises:

rotating each logical row of the data cluster over a partition of the column addressable memory to generate a rotated row, wherein the partition comprises a plurality of dies of the column addressable memory, wherein each die comprises a predetermined number of columns, and wherein each die is programmed with a predetermined row offset; and

storing each rotated row at a row address in the partition, wherein each die of the partition adds an associated predetermined row offset to the row address.

11. The apparatus of any one of claims 1 to 10, wherein reading the logical column comprises:

selecting a first partition of the column addressable memory according to the column number, wherein the column addressable memory comprises a plurality of partitions, wherein each partition comprises a plurality of dies of the column addressable memory, and wherein each die comprises a predetermined number of columns;

determining a base address based on the column number and a modulo limit of the column addressable memory; and

reading logical column data diagonally from the column addressable memory starting from the base address in the first partition.

12. The apparatus of any one of claims 1 to 11, wherein the modulo limit comprises the predetermined number of columns.

13. The apparatus of any one of claims 1 to 12, wherein reading the logical column data diagonally from the column addressable memory starting from the base address comprises, for each die in the first partition:

reading a column at the base address plus a die offset associated with the corresponding die; and

incrementing an internal address counter subject to the modulo limit.

14. The apparatus of any one of claims 1 to 13, wherein storing the data cluster comprises:

rotating each subset of logical columns within each logical row, wherein each subset of logical columns includes the predetermined number of columns;

determining a die number of the plurality of dies based on the row address of each logical row; and

storing each subset of logical columns of each logical row in a die having the die number determined for the logical row within a partition of the plurality of partitions selected based on a logical column number of the subset of logical columns.

15. The apparatus of any one of claims 1 to 14, wherein reading the logical column comprises:

determining a base address from the column number and a modulo limit of the column addressable memory, wherein the column addressable memory comprises a plurality of dies, wherein each die comprises a predetermined number of columns, and wherein the modulo limit comprises the predetermined number of columns; and

reading, using the modulo limit, logical column data diagonally from the column addressable memory starting from the base address.

16. The apparatus of any one of claims 1 to 15, wherein reading the logical column further comprises:

determining whether the column number is less than the modulo limit;

in response to determining that the column number is not less than the modulo limit, determining an additional base address in accordance with the column number and the modulo limit; and

reading, using the modulo limit, logical column data diagonally from the column addressable memory starting from the additional base address.

17. The apparatus of any one of claims 1 to 16, wherein reading the logical column further comprises assembling the logical column in response to reading the logical column data diagonally starting from the base address and in response to reading the logical column data diagonally starting from the additional base address.

18. The apparatus of any one of claims 1 to 17, wherein storing the data cluster comprises:

rotating each subset of logical columns within each logical row, wherein each subset of logical columns includes the predetermined number of columns;

in response to rotation of each subset, rotating the subset of the logical columns of each logical row over a partition of the column addressable memory, wherein the partition includes the plurality of dies; and

storing each logical row in response to a rotation of the subset of the logical columns.

19. A system, comprising:

a processor;

a column addressable memory; and

circuitry coupled to the memory, wherein the circuitry is to:

store a data cluster of a logical matrix in the column addressable memory in a column-based format; and

read a logical column of the data cluster from the column addressable memory in a column read operation, wherein the column read operation is based on a block address and a column number.

20. The system of claim 19, wherein the circuitry is included in one of a data storage device or a memory device.

21. The system of claim 19 or 20, wherein reading the logical column comprises:

reading a plurality of complementary logical columns of the data cluster; and

assembling the logical column in response to reading the plurality of complementary logical columns.

22. The system of any one of claims 19 to 21, wherein reading the logical column comprises:

selecting a first partition of the column addressable memory according to the column number, wherein the column addressable memory comprises a plurality of partitions, wherein each partition comprises a plurality of dies of the column addressable memory, and wherein each die comprises a predetermined number of columns;

determining a base address based on the column number and a modulo limit of the column addressable memory; and

reading logical column data diagonally from the column addressable memory starting from the base address in the first partition.

reading logical column data diagonally from the column addressable memory starting from the base address in the first partition.

23. The system of any one of claims 19 to 22, wherein reading the logical column comprises:

determining a base address from the column number and a modulo limit of the column addressable memory, wherein the column addressable memory comprises a plurality of dies, wherein each die comprises a predetermined number of columns, and wherein the modulo limit comprises the predetermined number of columns; and

reading, using the modulo limit, logical column data diagonally from the column addressable memory starting from the base address.

24. The system of any one of claims 19 to 23, wherein storing the data cluster comprises:

rotating each subset of logical columns within each logical row, wherein each subset of logical columns includes the predetermined number of columns;

in response to rotation of each subset, rotating the subset of the logical columns of each logical row over a partition of the column addressable memory, wherein the partition includes the plurality of dies; and

storing each logical row in response to a rotation of the subset of the logical columns.

25. At least one computer-readable storage medium comprising a plurality of instructions that, when executed on a processor, cause the processor to implement the apparatus of any one of claims 1 to 18.
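The storage and read path the claims describe can be illustrated with a small sketch. This is not from the patent itself: the memory is modeled as a plain Python grid, `WIDTH` is an illustrative partition width, and the rotation and modulo-counter logic only follow the spirit of claims 2, 5, and 13.

```python
WIDTH = 8  # illustrative partition width in columns (real parts use e.g. 1024)

def store_cluster(mem, cluster, base_row):
    """Store each logical row rotated right by its row address (claim-5-style
    rotation), so a logical column lands on a different physical column per row."""
    for i, logical_row in enumerate(cluster):
        r = base_row + i
        k = r % WIDTH
        mem[r] = logical_row[-k:] + logical_row[:-k] if k else list(logical_row)

def read_logical_column(mem, base_row, n_rows, col):
    """Diagonal column read with a modulo counter (claim-13-style): each step
    moves down one row and right one column, wrapping at WIDTH."""
    out = []
    for i in range(n_rows):
        r = base_row + i
        out.append(mem[r][(col + r) % WIDTH])  # internal counter wraps at WIDTH
    return out
```

Because row r is rotated right by r % WIDTH, the bit written at logical column c sits at physical column (c + r) % WIDTH, so a single diagonal pass touches a distinct physical column (and hence a distinct tile) on every row.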

Background

Content-based similarity search (or simply similarity search) is a key technique underpinning Machine Learning (ML) and Artificial Intelligence (AI) applications. In a similarity search, query data, such as data representing an object (e.g., an image), is used to search a database to identify data representing similar objects (e.g., similar images). However, the enormous amount and richness of the data used in large-scale similarity searches make them a challenging problem that is both compute- and storage-intensive. In some systems, hashing methods are used to perform stochastic associative searches faster than would otherwise be possible. However, hashing methods typically provide an imperfect transformation of data from one space (e.g., domain) to another, and may produce search results that are degraded (e.g., in terms of accuracy) compared to searching in the original space of the data to be searched.

Given the size of modern databases (on the order of billions of entries), the search speed of stochastic associative memories may not keep up with current throughput demands (tens or hundreds of thousands of searches per second). To increase the effective search speed, the database may be partitioned into clusters, each with an associated representative. A search query is compared against all of the cluster representatives, and only a subset of the database is then explored. Exploring the database may include gathering the database entries in the selected clusters and retrieving the most similar elements from that set.
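The two-stage scheme above can be sketched as follows. This is a hypothetical illustration, not the patent's implementation: the use of binary vectors, Hamming distance, and the `top_k` parameter are all assumptions chosen for concreteness.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def clustered_search(query, representatives, clusters, top_k=2):
    """Stage 1: compare the query against every cluster representative.
    Stage 2: rank only the entries of the closest cluster."""
    best = min(range(len(representatives)),
               key=lambda i: hamming(query, representatives[i]))
    return sorted(clusters[best], key=lambda e: hamming(query, e))[:top_k]
```

The point of the scheme is that stage 2 touches only one cluster's rows, which is what makes an efficient per-cluster column read layout worthwhile.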

Drawings

In the drawings, the concepts described herein are illustrated by way of example and not by way of limitation. For simplicity and clarity of illustration, elements illustrated in the figures have not necessarily been drawn to scale. Where considered appropriate, reference numerals have been repeated among the figures to indicate corresponding or analogous elements.

FIG. 1 is a simplified diagram of at least one embodiment of a computing device for providing a column data layout for clustered data using column-read-enabled memory;

FIG. 2 is a simplified diagram of at least one embodiment of a memory medium included in the computing device of FIG. 1;

FIG. 3 is a simplified diagram of at least one embodiment of a memory medium of the computing device of FIG. 1 in a dual in-line memory module (DIMM);

FIG. 4 is a simplified diagram of a clustered data set that may be processed by the computing device of FIGS. 1-3;

FIG. 5 is a simplified flow diagram of at least one embodiment of a method for clustered data access that may be performed by the computing device of FIG. 1;

FIG. 6 is a schematic diagram showing a memory layout of cluster data that may be accessed by the method of FIG. 5;

FIG. 7 is a simplified flow diagram of at least one embodiment of a method for clustered data access that may be performed by the computing device of FIG. 1;

FIGS. 8 and 9 are schematic diagrams illustrating a memory layout of cluster data that may be accessed using the method of FIG. 7;

FIG. 10 is a simplified flow diagram of at least one embodiment of a method for clustered data access that may be performed by the computing device of FIG. 1;

FIG. 11 is a schematic diagram showing a memory layout of cluster data that may be accessed by the method of FIG. 10;

FIG. 12 is a simplified flow diagram of at least one embodiment of a method for clustered data access that may be performed by the computing device of FIG. 1; and

FIG. 13 is a schematic diagram illustrating a memory layout of cluster data that may be accessed using the method of FIG. 12.

Detailed Description

While the concepts of the present disclosure are susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that there is no intent to limit the concepts of the present disclosure to the particular forms disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives consistent with the present disclosure and the appended claims.

References in the specification to "one embodiment," "an illustrative embodiment," etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may or may not include that particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments, whether or not explicitly described. Additionally, it should be understood that items included in a list in the form of "at least one of A, B, and C" can mean (A); (B); (C); (A and B); (A and C); (B and C); or (A, B, and C). Similarly, items listed in the form of "at least one of A, B, or C" can mean (A); (B); (C); (A and B); (A and C); (B and C); or (A, B, and C).

In some cases, the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried by or stored in a transitory or non-transitory machine-readable (e.g., computer-readable) storage medium, which may be read and executed by one or more processors. A machine-readable storage medium may be embodied as any storage device, mechanism, or other physical structure for storing or transmitting information in a form readable by a machine (e.g., volatile or non-volatile memory, media disk, or other media device).

In the drawings, some structural or methodical features may be shown in a particular arrangement and/or order. However, it is to be understood that such specific arrangement and/or order may not be required. Rather, in some embodiments, the features may be arranged in a different manner and/or order than shown in the illustrative figures. Additionally, the inclusion of a structural or methodical feature in a particular figure is not meant to imply that such feature is required in all embodiments, and in some embodiments, such feature may not be included or may be combined with other features.

Referring now to FIG. 1, a computing device 100 for providing a column data layout for clustered data using column-read-enabled memory includes a processor 102, memory 104, an input/output (I/O) subsystem 112, a data storage device 114, communication circuitry 122, and one or more accelerator devices 126. Of course, in other embodiments, the computing device 100 may include other or additional components, such as those commonly found in computers (e.g., a display, peripheral devices, etc.). Additionally, in some embodiments, one or more of the illustrative components may be incorporated into, or otherwise form a portion of, another component. Unless otherwise indicated, the term "memory," as used herein in reference to performing media management, may refer to the memory 104 and/or the data storage device 114. As explained in more detail herein, media access circuitry 108, 118 (e.g., any circuitry or device configured to access and operate on data in the corresponding memory media 110, 120), in conjunction with the corresponding memory media 110, 120 (e.g., any device or material to which data is written and from which data is read), may provide (e.g., read and/or write) various column data layouts for the cluster data. As described further below, the column data layouts may include duplicated data clusters as described in connection with FIGS. 5-6, complementary logical columns as described in connection with FIGS. 7-9, per-die rotation over multiple partitions as described in connection with FIGS. 10-11, and/or per-die rotation with preserved row read performance as described in connection with FIGS. 12-13. The column data layouts disclosed herein may improve read performance by reducing the total number of column read operations required, for example by avoiding multiple reads at cluster edges, avoiding modulo penalties, or otherwise reducing reads.

In the illustrative embodiment, the memory medium 110 has a three-dimensional cross-point architecture whose data access characteristics differ from those of other memory architectures (e.g., dynamic random access memory (DRAM)), such as the ability to access one bit per tile, with a resulting time delay between reads or writes to the same or other partitions. The media access circuitry 108 is configured to make efficient use (e.g., in terms of power usage and speed) of the architecture of the memory medium 110, for example by accessing multiple tiles in parallel within a given partition, using scratch pads (e.g., relatively small, low-latency memories) to temporarily hold and operate on data read from the memory medium 110, and broadcasting data from one partition to other portions of the memory 104 to enable matrix computations (e.g., tensor operations) to be performed in parallel within the memory 104. Additionally, in the illustrative embodiment, instead of sending a read or write request to the memory 104 to access matrix data, the processor 102 may send a higher-level request (e.g., a request for a macro operation, a top-n similarity search query, or another stochastic associative search request) along with the location of the input data to be used in the requested operation (e.g., an input query). Further, rather than returning result data to the processor 102, the memory 104 may simply send back an acknowledgement or other status indication (e.g., "done") to indicate that the requested operation has completed. In this way, many compute operations, such as artificial intelligence operations (e.g., stochastic associative searches), may be performed in memory (e.g., in the memory 104 or in the data storage device 114), with minimal transfer of data over the bus (e.g., the I/O subsystem 112) between components of the computing device 100 (e.g., between the memory 104 or the data storage device 114 and the processor 102).
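The offload pattern described above might look like the following from the host's perspective. This is a speculative sketch, not the actual device interface: the class name, the `macro_top_n` operation, and the "done" status string are invented for illustration.

```python
class InMemorySearchDevice:
    """Toy model of a memory device that executes a macro operation internally
    and returns only results plus a status, keeping bulk data off the bus."""

    def __init__(self):
        self.media = {}  # address -> list of stored bit vectors

    def write(self, addr, vectors):
        self.media[addr] = vectors

    def macro_top_n(self, addr, query, n):
        # The distance computation and sort happen "inside" the memory; only
        # the top-n results and a completion status cross back to the host.
        dist = lambda v: sum(a != b for a, b in zip(v, query))
        return sorted(self.media[addr], key=dist)[:n], "done"
```

The host issues one macro request and receives n results and a status, instead of streaming the whole matrix through the I/O subsystem and ranking it on the processor.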

In some embodiments, the media access circuitry 108 is included in the same die as the memory media 110. In other embodiments, the media access circuitry 108 is on a separate die, but in the same package as the memory media 110. In other embodiments, the media access circuit 108 is located in a separate die and separate package, but on the same dual in-line memory module (DIMM) or board as the memory medium 110.

The processor 102 may be embodied as any device or circuitry (e.g., a multi-core processor, microcontroller, or other processor or processing/control circuitry) capable of performing the operations described herein (e.g., executing applications), such as artificial intelligence related applications that may utilize neural networks or other machine learning structures to learn and make inferences. In some embodiments, the processor 102 may be embodied as, include or be coupled to an FPGA, an Application Specific Integrated Circuit (ASIC), reconfigurable hardware or hardware circuits, or other dedicated hardware to facilitate the performance of the functions described herein.

The memory 104, which may include non-volatile memory (e.g., far memory in a two-level memory scheme), includes a memory medium 110 and media access circuitry 108 (e.g., a device or circuitry such as a processor, an application-specific integrated circuit (ASIC), or another integrated circuit constructed from complementary metal-oxide-semiconductor (CMOS) or other materials) underneath (e.g., at a lower level than) and coupled to the memory medium 110. The media access circuitry 108 is also coupled to the memory controller 106, which may be embodied as any device or circuitry (e.g., a processor, a co-processor, dedicated circuitry, etc.) configured to selectively read from and/or write to the memory medium 110 in response to corresponding requests (e.g., from the processor 102, which may be executing an artificial-intelligence-related application that relies on stochastic associative searches to recognize objects, make inferences, and/or perform related operations). In some embodiments, the memory controller 106 may include a vector function unit (VFU) 130, which may be embodied as any device or circuitry (e.g., dedicated circuitry, reconfigurable circuitry, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), etc.) capable of offloading vector-based tasks from the processor 102 (e.g., comparing data read from specified columns of vectors stored in the memory medium 110, determining Hamming distances between vectors stored in the memory medium 110 and a search key, sorting the vectors according to their Hamming distances, etc.).

Referring briefly to FIG. 2, in an illustrative embodiment, the memory medium 110 includes a tile architecture, also referred to herein as a cross-point architecture (e.g., an architecture in which memory cells sit at the intersections of word lines and bit lines and are individually addressable, and in which bit storage is based on a change in bulk resistance), in which each memory cell (e.g., tile) 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240 is addressable by an x parameter and a y parameter (e.g., a column and a row). The memory medium 110 includes multiple partitions, each of which includes the tile architecture. The partitions may be stacked as layers 202, 204, 206 to form a three-dimensional cross-point architecture (e.g., Intel 3D XPoint™ memory). Unlike typical memory devices, in which only fixed-size multi-bit data structures (e.g., bytes, words, etc.) are addressable, the media access circuitry 108 is configured to read individual bits or other units of data from the memory medium 110 at the request of the memory controller 106, which may generate such a request in response to receiving a corresponding request from the processor 102.

Referring back to FIG. 1, the memory 104 may include non-volatile memory and volatile memory. The non-volatile memory may be embodied as any type of data storage capable of storing data in a persistent manner (even when power is removed from the non-volatile memory). For example, the non-volatile memory may be embodied as one or more non-volatile memory devices, which may include one or more memory devices configured in a cross-point architecture that enables bit-level addressability (e.g., the ability to read from and/or write to individual bits of data, rather than bytes or other larger units of data), and which are illustratively embodied as three-dimensional (3D) cross-point memory. In some embodiments, the non-volatile memory may additionally include other types of memory, including any combination of memory devices that use chalcogenide phase-change material (e.g., chalcogenide glass), ferroelectric transistor random access memory (FeTRAM), nanowire-based non-volatile memory, phase change memory (PCM), memory that incorporates memristor technology, magnetoresistive random access memory (MRAM), or spin transfer torque (STT) MRAM. The volatile memory may be embodied as any type of data storage capable of storing data while power is supplied to it. For example, the volatile memory may be embodied as one or more volatile memory devices, and is periodically referred to hereinafter as volatile memory, with the understanding that it may be embodied as other types of non-persistent data storage in other embodiments. The volatile memory may have an architecture that enables bit-level addressability, similar to the architecture described above.

The processor 102 and the memory 104 are communicatively coupled to other components of the computing device 100 via the I/O subsystem 112, which may be embodied as circuitry and/or components to facilitate input/output operations with the processor 102 and/or the main memory 104 and other components of the computing device 100. For example, the I/O subsystem 112 may be embodied as, or otherwise include, memory controller hubs, input/output control hubs, integrated sensor hubs, firmware devices, communication links (e.g., point-to-point links, bus links, wires, cables, light guides, printed circuit board traces, etc.), and/or other components and subsystems to facilitate the input/output operations. In some embodiments, the I/O subsystem 112 may form a portion of a system-on-a-chip (SoC) and be incorporated, along with one or more of the processor 102, the main memory 104, and other components of the computing device 100, in a single chip.

The data storage device 114 may be embodied as any type of device configured for short or long term storage of data, such as memory devices and circuits, memory cards, hard drives, solid state drives, or other data storage devices. In the illustrative embodiment, data storage device 114 includes a memory controller 116 similar to memory controller 106, a memory medium 120 (also referred to as a "storage medium") similar to memory medium 110, and a media access circuit 118 similar to media access circuit 108. In addition, memory controller 116 may also include a Vector Functional Unit (VFU)132 similar to Vector Functional Unit (VFU) 130. Data storage device 114 may include a system partition that stores data and firmware code for data storage device 114, and one or more operating system partitions that store data files and executable files for an operating system.

The communication circuitry 122 may be embodied as any communication circuit, device, or collection thereof capable of enabling communications over a network between the computing device 100 and another device. The communication circuitry 122 may be configured to use any of one or more communication technologies (e.g., wired or wireless communications) and associated protocols (e.g., Ethernet, WiMAX, etc.) to effect such communication.

The illustrative communication circuitry 122 includes a Network Interface Controller (NIC)124, which may also be referred to as a Host Fabric Interface (HFI). NIC 124 may be embodied as one or more add-in boards, daughter cards, network interface cards, controller chips, chipsets, or other devices that may be used by computing device 100 to connect with another computing device. In some embodiments, NIC 124 may be embodied as part of a system on a chip (SoC) that includes one or more processors, or included in a multi-chip package that also includes one or more processors. In some embodiments, the NIC 124 may include a local processor (not shown) and/or a local memory (not shown) that are both local to the NIC 124. In such embodiments, the local processor of NIC 124 may be capable of performing one or more functions of processor 102. Additionally or alternatively, in such embodiments, the local memory of NIC 124 may be integrated into one or more components of computing device 100 at a board level, a socket level, a chip level, and/or other levels.

The one or more accelerator devices 126 may be embodied as any device or circuitry capable of performing a set of operations faster than the general purpose processor 102. For example, the accelerator device 126 may include a graphics processing unit 128, which may be embodied as any device or circuitry (e.g., a coprocessor, an ASIC, reconfigurable circuitry, etc.) capable of performing graphics operations (e.g., matrix operations) faster than the processor 102.

Referring now to FIG. 3, in some embodiments, the computing device 100 may utilize a dual in-line memory module (DIMM) architecture 300. In the architecture 300, multiple dies of the memory medium 110 are connected to a shared command address bus 310. Thus, in operation, data is read out in parallel across all of the memory media 110 connected to the shared command address bus 310. Data may be laid out on the memory media 110 in a configuration that allows the same column to be read across all of the connected dies of the memory media 110.
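A single command reaching every die on the shared bus can be modeled as below; the function name and the die layout are illustrative assumptions, not from the document.

```python
def broadcast_column_read(dies, row_addr, col):
    """One (row, column) address goes out on the shared command address bus;
    every connected die returns its slice of that column in parallel
    (modeled here as a simple loop over the dies)."""
    return [die[row_addr][col] for die in dies]
```

This is why the data placement matters: because every die receives the same address, the layout must guarantee that the same physical (row, column) pair is meaningful on all connected dies at once.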

Referring now to FIG. 4, a diagram 400 illustrates a clustered data set that may be accessed (e.g., read and/or written) by the computing device 100 in the memory 104 and/or the data storage device 114. The clustered data set is shown as a logical matrix 402 that includes data elements (e.g., bits) arranged in rows and columns. The rows of the matrix 402 are illustratively grouped into a number of clusters 404, 406, 408, 410, 412, 414, 416, 418, 420. Each cluster comprises a contiguous group of rows, and each cluster may have a different length (i.e., number of rows). The rows of each cluster include data for each of the same columns; for example, columns 422, 424, 426 are shown extending across all of the clusters. An application may request a column read for column data contained in a particular cluster or group of clusters. For example, a column read may be performed to retrieve the data of column 424 contained in cluster 414. It should be noted that the diagram 400 illustrates a logical view of the matrix 402, including a logical view of its rows and columns. As described further below, the rows and/or columns of the matrix 402 may be arranged in different column-based formats when stored in the memory media 110, 120.
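The logical view of FIG. 4 can be mimicked with plain lists; the cluster boundaries and cell values here are made up for illustration.

```python
# Rows of the logical matrix are grouped into contiguous clusters (cf. FIG. 4).
matrix = [[(7 * r + c) % 2 for c in range(6)] for r in range(10)]
clusters = {"cluster_a": range(0, 4), "cluster_b": range(4, 10)}

def cluster_column(matrix, clusters, name, col):
    """Return one logical column restricted to the rows of one cluster."""
    return [matrix[r][col] for r in clusters[name]]
```

A column read request of the kind described above corresponds to one call, e.g. `cluster_column(matrix, clusters, "cluster_b", 2)`, which touches only that cluster's contiguous rows.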

Referring now to FIG. 5, in operation, computing device 100 may perform a method 500 for clustered data access. The method 500 is described with reference to the memory 104. However, it should be understood that the method 500 may additionally or alternatively be performed using the data storage device 114. The method 500 begins with block 502, where the computing device 100 (e.g., the media access circuitry 108) determines whether to perform a read operation or a write operation. If the computing device 100 determines to perform a read operation, the method 500 branches forward to block 514, as described below. If the computing device 100 determines to perform a write operation, the method 500 proceeds to block 504.

In block 504, the computing device 100 (e.g., the media access circuitry 108) receives a logical row write request. The logical row write request may include a row address and logical row data to store in the memory 104. The logical row of data may be a row of data included in the cluster. For example, the media access circuitry 108 may receive memory access requests from the memory controller 106 that originate from the host processor 102. In some embodiments, the memory access may be generated by the vector functional unit 130 of the memory controller 106, for example, in response to a macro command received from the host processor 102.

In block 506, the computing device 100 rotates the logical row of data over the partition of the memory medium 110. The amount of rotation may be determined by the row address. Illustratively, the computing device 100 rotates the logical row data one column (i.e., one slice) to the right for each increment of the row address. It should be understood that rotation illustratively includes shifting data bits to the right (e.g., toward higher numbered columns) and wrapping a data bit around to the left (e.g., to column number 0) if it is shifted past the last column. The amount of rotation may be taken modulo the total column width of the partition. For example, in an embodiment, each die of the memory medium may include 128 slices (columns), and each partition may include eight dies. In this embodiment, the partition includes a total of 1024 columns.

In block 508, the computing device 100 stores the rotated row at a row address in the memory medium 110. In block 510, the computing device 100 stores a copy of the rotated row in the memory medium 110 at the row address plus a predetermined row offset. Thus, the computing device 100 may store duplicate copies of data clusters separated by a predetermined number of rows in the memory medium 110. In some embodiments, in block 512, the predetermined row offset may be based on the width of the partition expressed in columns. For example, in some embodiments, the predetermined row offset may be equal to 128, 1024, or another number of slices included in the partition of the memory medium 110. After writing the data, the method 500 loops back to block 502 to perform other memory access operations.
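To make the write path of blocks 506-510 concrete, the following sketch models a partition as a dictionary of rows and stores each logical row rotated by its row address together with a replicated copy. The names (`WIDTH`, `ROW_OFFSET`, `rotate_right`, `write_logical_row`) and the small eight-slice width are illustrative assumptions, not part of the described embodiment.

```python
# Illustrative sketch only: an 8-slice partition stands in for the
# 1024-column embodiment; names are assumptions, not from the source.
WIDTH = 8        # total slices (columns) in the partition
ROW_OFFSET = 8   # predetermined row offset for the replicated copy (block 510)

def rotate_right(bits, n):
    """Rotate a row right by n slices, wrapping past the last column."""
    n %= len(bits)
    return bits[-n:] + bits[:-n] if n else list(bits)

def write_logical_row(medium, row_addr, row_bits):
    """Blocks 506-510: rotate by the row address, store, and replicate."""
    rotated = rotate_right(row_bits, row_addr % WIDTH)
    medium[row_addr] = rotated               # block 508: primary copy
    medium[row_addr + ROW_OFFSET] = rotated  # block 510: replicated copy
```

Writing logical row 1 places logical column 0 at slice 1, so each logical column later falls on a distinct diagonal.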

Referring back to block 502, if the computing device 100 performs a read operation, the method 500 branches to block 514, where the computing device 100 determines whether to perform a column read operation or a row read operation. If the computing device 100 determines to perform a column read, the method 500 branches to block 522, described below. If the computing device 100 determines to perform a row read, the method 500 proceeds to block 516.

In block 516, the computing device 100 (e.g., the media access circuitry 108) receives a logical row read request. The logical row read request may include a row address identifying the logical row of data stored in the memory 104. The logical row of data may be a row of data included in the cluster. For example, the media access circuitry 108 may receive memory access requests from the memory controller 106 that originate from the host processor 102. In some embodiments, the memory access may be generated by the vector functional unit 130 of the memory controller 106, for example, in response to a macro command received from the host processor 102.

In block 518, the computing device 100 reads the rotated row of data at the row address in the memory medium 110. The rotated row data is stored in the memory medium 110 in a rotated format, as described above in connection with block 506. In block 520, the computing device 100 un-rotates the rotated data on the partition of the memory medium 110. The computing device 100 may perform a rotation that is opposite to the rotation performed when the row data was stored in the memory medium 110. Thus, by de-rotating the data, the computing device 100 recovers the original logical row data. The amount of rotation may be determined by the row address. Illustratively, to recover the row data, the computing device 100 rotates the logical row data one column (i.e., one slice) to the left for each increment of the row address. It should be understood that rotation illustratively includes shifting data bits to the left (e.g., toward lower numbered columns) and wrapping a data bit around to the right (e.g., to the largest column number) if it is shifted past column 0. The amount of rotation may be taken modulo the total column width of the partition. For example, in one embodiment, each die of the memory medium may include 128 slices (columns), and each partition may include eight dies. In this embodiment, the partition includes a total of 1024 columns. After reading the data, the method 500 loops back to block 502 to perform other memory access operations.
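The complementary read path of blocks 518-520 may be sketched as a left rotation by the same amount, again with illustrative names and an assumed eight-slice partition:

```python
WIDTH = 8  # assumed partition width in slices; 1024 in the example embodiment

def rotate_left(bits, n):
    """Rotate a row left by n slices, wrapping past column 0."""
    n %= len(bits)
    return bits[n:] + bits[:n] if n else list(bits)

def read_logical_row(medium, row_addr):
    """Blocks 518-520: fetch the stored row and un-rotate it."""
    return rotate_left(medium[row_addr], row_addr % WIDTH)
```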

Referring back to block 514, if the computing device 100 determines to perform a column read operation, the method 500 branches to block 522 where the computing device (e.g., the media access circuitry 108) receives a logical column read request. The column read request identifies a block address and a logical column number. The block address may be a row address including a starting row and/or column (e.g., row 0) of the cluster. In block 524, the computing device 100 determines a base address based on the block address and the column number. The base address identifies a row in the memory medium 110 that includes the requested logical column number in physical column 0 in the memory medium 110. The base address may be determined using a combination of arithmetic operations (e.g., addition, subtraction, and/or modulo). For example, the base address may be the block address plus the column width of the memory partition minus the logical column number, modulo the column width.
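The arithmetic of block 524 may be written as below; applying the modulo to the column offset (rather than to the whole sum) is an assumption where the prose is ambiguous, but it reproduces the fig. 6 examples. The function name is illustrative.

```python
def column_base_address(block_addr, width, logical_col):
    """Block 524: the row whose physical slice 0 holds the requested column."""
    return block_addr + (width - logical_col) % width
```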

In block 526, the computing device 100 reads the logical column data diagonally from the memory medium 110 starting at the base address. Reading the column data diagonally includes reading a data element (e.g., a bit) from column 0 at the base address, incrementing internal counters for the row and column numbers, and continuing to read across the columns of the memory medium 110. In some embodiments, the computing device 100 may read column data in parallel on multiple dies of a partition. In these embodiments, a single command/address bus may be used to communicate the row address to each die, and each die may add a preconfigured offset to the row address. Reading from the base address reads the logical column data from the rotated rows stored as described above in connection with block 508. In some embodiments, in block 528, the computing device 100 may read at least a portion of the logical column from the copy of the rotated rows stored as described above in connection with block 510. In some embodiments, in block 530, the computing device 100 discards one or more data elements (e.g., bits) read from rows not included in the data cluster. For example, data read from rows located between a cluster and the replicated copy of the cluster may be discarded. After reading the logical column, the method 500 loops back to block 502 to perform other memory access operations.

Referring now to fig. 6, a diagram 600 illustrates one possible embodiment of a column data format accessed as described above in connection with the method of fig. 5. Diagram 600 illustrates a memory medium 602, such as the memory medium 110 and/or the memory medium 120 of fig. 1. As shown, the illustrative memory medium 602 is arranged in a number of rows 604 and slices 606 (i.e., columns 606). Each slice 606 is contained within a die 608. Illustratively, each row 604 includes two dies 608, each die 608 having four slices 606, for a total of eight slices 606; however, in other embodiments, each row may include 128 slices, 1024 slices, or another number of slices. Each row 604 is addressable by a row address and each slice 606 is addressable by a column number.

The illustrative memory medium 602 has stored therein a data cluster 610 and a replicated data cluster 612. Each of the clusters 610, 612 is a logical matrix having six rows. As shown, for each subsequent row on the memory medium 602, the columns 606 of each row 604 are rotated by one column position. For example, logical row 0, logical column 0 is stored at row address 0, slice number 0; logical row 1, logical column 0 is stored at row address 1, slice number 1; and so on. The replicated data cluster 612 is stored at a predetermined row offset from the data cluster 610. Illustratively, the predetermined row offset is eight, which is equal to the width of the memory medium 602 in slices 606, such that the starting address of the replicated data cluster 612 is 8. Thus, in the illustrative embodiment, each data cluster includes no more rows than there are columns in the memory medium 602. Of course, in other embodiments, different predetermined row offsets (e.g., 128 rows or 1024 rows) may be used.

Each logical column of the data cluster 610 may be read using a single column read operation. A column read may read physical bits from both the data cluster 610 and the replicated data cluster 612. Diagram 600 shows the data contained in logical column 7 as highlighted cells 614. An illustrative column read for logical column 7 may begin at row address 1, slice number 0, and proceed diagonally to row address 2, slice number 1, then row address 3, slice number 2, and so on, up to row address 8, slice number 7. Data read from row address 6, slice number 5, and row address 7, slice number 6 may be discarded because it is not contained in the cluster 610. Similarly, diagram 600 also shows the data contained in logical column 4 as highlighted cells 616. An illustrative column read for logical column 4 may begin at row address 4, slice number 0, and proceed diagonally to row address 5, slice number 1, then row address 6, slice number 2, and so on, through row address 11, slice number 7. Data read from row address 6, slice number 2, and row address 7, slice number 3 may be discarded because it is not contained in the cluster 610. Other logical columns of the data cluster 610 may be similarly read using a single column read operation, at the expense of storing the replicated data cluster 612.
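The fig. 6 layout and the diagonal read of blocks 526-530 can be checked with a short simulation. The 6-row, 8-slice geometry mirrors the figure; all function and constant names are illustrative assumptions, not from the source.

```python
# Simulation of the fig. 6 format: width 8, a 6-row cluster, and a
# replicated copy at row offset 8. Illustrative sketch, not firmware.
WIDTH, CLUSTER_ROWS, ROW_OFFSET = 8, 6, 8

def build_medium(cluster):
    """Store each row rotated by its address, plus the replicated copy."""
    medium = {}
    for r, row in enumerate(cluster):
        k = r % WIDTH
        rotated = row[-k:] + row[:-k] if k else list(row)
        medium[r] = rotated
        medium[r + ROW_OFFSET] = rotated
    return medium

def read_logical_column(medium, col):
    """Blocks 526-530: read diagonally from the base address, discarding
    bits from rows that fall between the cluster and its copy."""
    base = (WIDTH - col) % WIDTH
    bits = {}
    for i in range(WIDTH):
        r = base + i                                 # row and slice advance together
        logical_row = r if r < ROW_OFFSET else r - ROW_OFFSET
        if logical_row >= CLUSTER_ROWS:
            continue                                 # block 530: discard
        bits[logical_row] = medium[r][i]
    return [bits[k] for k in sorted(bits)]
```

Reading logical column 7 touches rows 1 through 8 and discards the bits at rows 6 and 7, matching the highlighted cells 614.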

Referring now to FIG. 7, in operation, computing device 100 may perform a method 700 for clustered data access. The method 700 is described with reference to the memory 104. However, it should be understood that the method 700 may additionally or alternatively be performed using the data storage device 114. The method 700 begins at block 702, where the computing device 100 (e.g., the media access circuitry 108) determines whether to perform a read operation or a write operation. If the computing device 100 determines to perform a read operation, the method 700 branches to block 712, which is described below. If the computing device 100 determines to perform a write operation, the method 700 proceeds to block 704.

In block 704, the computing device 100 (e.g., the media access circuitry 108) receives a logical row write request. The logical row write request may include a row address and logical row data to store in the memory 104. The logical row of data may be a row of data included in the cluster. For example, the media access circuitry 108 may receive memory access requests from the memory controller 106 that originate from the host processor 102. In some embodiments, the memory access may be generated by the vector functional unit 130 of the memory controller 106, for example, in response to a macro command received from the host processor 102.

In block 706, the computing device 100 rotates the logical row of data over the partition of the memory medium 110. The amount of rotation may be determined by the row address. Illustratively, the computing device 100 rotates the logical row data one column (i.e., one slice) to the right for each increment of the row address. The amount of rotation may be taken modulo the total column width of the partition. For example, in an embodiment, each die of the memory medium may include 128 slices (columns), and each partition may include eight dies. In this embodiment, the partition includes a total of 1024 columns.

In block 708, the computing device 100 adds a die offset to the row address for each column subgroup of the rotated row of data. Each column subgroup may include a number of columns equal to the number of slices included in a die of the memory medium 110. For example, in an illustrative embodiment, the rotated row of data may include eight column subgroups of 128 columns each. Each die offset is an integer number of rows added to the row address of the write request. The die offset for each die may be configured such that column data may be stored diagonally across the dies of the memory medium 110. For example, in an illustrative embodiment with dies 128 slices in width, the die offset for die number 0 may be 0, the die offset for die number 1 may be 128, and so on. The die offset for each die may be hard coded, configured, or otherwise programmed at boot time. In block 710, the computing device 100 stores the rotated row in the memory medium 110 at the row address plus the per-die offsets. In some embodiments, each die may independently apply its associated die offset. For example, each die may add its associated die offset to a row address received via the command/address bus. After writing the data, the method 700 loops back to block 702 to perform other memory access operations.
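A sketch of blocks 706-710, using the small fig. 8 geometry (four dies of four slices each, die offsets 0/4/8/12). Wrapping the target row modulo the medium height is an assumption, as are all of the names; the sketch reproduces the placements called out in fig. 8.

```python
# Illustrative sketch of blocks 706-710 on the fig. 8 geometry.
SLICES_PER_DIE, NUM_DIES = 4, 4
WIDTH = TOTAL_ROWS = SLICES_PER_DIE * NUM_DIES               # 16 slices, 16 rows
DIE_OFFSETS = [d * SLICES_PER_DIE for d in range(NUM_DIES)]  # 0, 4, 8, 12

def write_logical_row(medium, row_addr, row_bits):
    """Rotate across the whole partition, then let each die add its offset."""
    k = row_addr % WIDTH
    rotated = row_bits[-k:] + row_bits[:-k] if k else list(row_bits)
    for die in range(NUM_DIES):
        lo = die * SLICES_PER_DIE
        target = (row_addr + DIE_OFFSETS[die]) % TOTAL_ROWS  # assumed wrap
        row = medium.setdefault(target, [None] * WIDTH)
        row[lo:lo + SLICES_PER_DIE] = rotated[lo:lo + SLICES_PER_DIE]
```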

Referring back to block 702, if computing device 100 performs a read operation, method 700 branches to block 712, where computing device 100 determines whether to perform a column read operation or a row read operation. If computing device 100 determines to perform a column read, method 700 branches to block 720, described below. If the computing device 100 determines to perform a row read, the method 700 proceeds to block 714.

In block 714, the computing device 100 (e.g., the media access circuitry 108) receives a logical row read request. The logical row read request may include a row address identifying the logical row of data stored in the memory 104. The logical row of data may be a row of data included in the cluster. For example, the media access circuitry 108 may receive memory access requests from the memory controller 106 that originate from the host processor 102. In some embodiments, the memory access may be generated by the vector functional unit 130 of the memory controller 106, for example, in response to a macro command received from the host processor 102.

In block 716, the computing device 100 reads the rotated row of data from the memory medium 110 at the row address plus the per-die offsets. For example, each die may add its associated die offset to a row address received via the command/address bus. The rotated row data is stored in the memory medium 110 in a rotated format, as described above in connection with block 706. In block 718, the computing device 100 un-rotates the rotated data on the partition of the memory medium 110. The computing device 100 may perform a rotation that is opposite to the rotation performed when the row data was stored in the memory medium 110. Thus, by de-rotating the data, the computing device 100 recovers the original logical row data. The amount of rotation may be determined by the row address. Illustratively, to recover the row data, the computing device 100 rotates the logical row data one column (i.e., one slice) to the left for each increment of the row address. It should be understood that rotation illustratively includes shifting data bits to the left (e.g., toward lower numbered columns) and wrapping a data bit around to the right (e.g., to the largest column number) if it is shifted past column 0. The amount of rotation may be taken modulo the total column width of the partition. For example, in an embodiment, each die of the memory medium may include 128 slices (columns), and each partition may include eight dies. In this embodiment, the partition includes a total of 1024 columns. After reading the data, the method 700 loops back to block 702 to perform other memory access operations.

Referring back to block 712, if the computing device 100 determines to perform a column read operation, the method 700 branches to block 720 where the computing device (e.g., the media access circuitry 108) receives a logical column read request. The column read request identifies a block address and a logical column number. The block address may be a row address including a starting row and/or column (e.g., row 0) of the cluster. In some embodiments, in block 722, computing device 100 may receive a plurality of complementary column numbers for a plurality of complementary logical columns. As described further below, data from multiple complementary columns is read during a column read operation issued for a start address of one of the complementary columns. All data for complementary columns may be assembled after issuing a read for the start address of each complementary column.

In block 724, the computing device 100 determines a base address based on the block address and the column number. The base address identifies a row in memory medium 110 that includes the requested logical column number in physical column 0 in memory medium 110. The base address may be determined using a combination of arithmetic operations (e.g., addition, subtraction, and/or modulo). For example, the base address may be the block address plus the column width of the memory partition minus the logical column number, modulo the column width.

In block 726, the computing device 100 starts from the base address and reads the logical column data diagonally from the memory medium 110 using a modulo-limited counter. In block 728, each die of the memory medium 110 begins reading data at the base address plus its corresponding die offset. A single command/address bus may be used to communicate the row address to each die, and each die may add a preconfigured die offset to the row address. The computing device 100 may read column data in parallel on multiple dies of a partition. In block 730, each die reads a column data element (e.g., a bit) from the current slice at the current row address, increments internal counters for the row address and column number modulo a preconfigured modulus limit, and continues reading across the slices of the die. The modulus limit may be, for example, 128, 256, or another number.

In block 732, the computing device 100 determines whether all complementary columns have been read. For example, computing device 100 may determine whether a column read operation has been issued for the starting address of each column in a set of complementary columns. If not, the method 700 loops back to block 720 to continue performing column reads of complementary columns. If each complementary column has been read, the method 700 proceeds to block 734.

In block 734, the computing device 100 assembles a logical column from the data read in the column read operations. In some embodiments, in block 736, the computing device 100 may assemble a single logical column comprising multi-bit column entries. For example, each data point of a logical column may be a 4-bit or 8-bit value, such as genomic variant data. As described above, each bit of a multi-bit column entry corresponds to a complementary column. Thus, the computing device 100 may read a set of complementary columns without discarding any data from the multiple complementary reads. Accordingly, even in the presence of the modulus constraint, the computing device 100 may provide maximum throughput for column reads without any re-read cost. The computing device 100 may particularly improve performance for application scenarios with multi-bit data. After reading and assembling the logical columns, the method 700 loops back to block 702 to perform other memory access operations.
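Block 736's assembly of complementary 1-bit columns into multi-bit entries might look like the following; the MSB-first bit order is an assumption, since the source does not fix one, and the function name is illustrative.

```python
def assemble_multibit(columns):
    """Combine complementary 1-bit columns into one column of multi-bit
    values, most significant bit first (assumed ordering)."""
    return [int("".join(str(b) for b in bits), 2) for bits in zip(*columns)]
```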

Referring now to figs. 8-9, a diagram 800 illustrates one possible embodiment of a column data format accessed in connection with the method of fig. 7, as described above. Diagram 800 illustrates a memory medium 802, such as the memory medium 110 and/or the memory medium 120 of fig. 1. As shown, the illustrative memory medium 802 is arranged in a number of rows 804 and slices 806 (i.e., columns 806). Each slice 806 is contained within a die 808. Illustratively, each row 804 is formed by four dies 808, each comprising four slices 806, for a total of 16 slices 806. However, in other embodiments, each row may include another number of dies 808 and/or slices 806, such as eight dies with 128 slices per die, for a total of 1024 slices. Each row 804 is addressable by a row address and each slice 806 is addressable by a column number. Each die includes a preconfigured modulus limit 810, which is illustratively four. In some embodiments, the modulus limit may be a different number, such as 128 or 256.

The illustrative memory medium 802 has stored therein a data cluster that is a logical matrix having 16 logical rows and 16 logical columns. As shown, for each subsequent row on the memory medium 802, the columns 806 of each row 804 are rotated by one column position. For example, logical row 0, logical column 0 is stored at row address 0, slice number 0; logical row 1, logical column 0 is stored at row address 1, slice number 1; and so on. In addition, the logical rows are offset by the associated die offset within each die 808. Illustratively, die 0 has an offset of 0, die 1 an offset of 4, die 2 an offset of 8, and die 3 an offset of 12. Thus, logical row 0, logical column 4 is stored at row address 4, slice 4 of die 1; logical row 0, logical column 8 is stored at row address 8, slice 8 of die 2; and so on.

Each set of complementary columns of the data cluster may be read with a single set of column read operations. Each column read operation may read bits from multiple complementary columns. Diagram 800 shows a set of complementary columns including logical column 14 as highlighted cells 812, logical column 10 as highlighted cells 814, logical column 6 as highlighted cells 816, and logical column 2 as highlighted cells 818. An illustrative column read for column 14 begins at row address 2, slice 0 of die 0, which reads logical row 2, logical column 14. Due to the per-die offsets, reads may be performed simultaneously at row address 6, slice 4 of die 1 (which reads logical row 2, logical column 2), row address 10, slice 8 in die 2 (which reads logical row 2, logical column 6), and row address 14, slice 12 in die 3 (which reads logical row 2, logical column 10). After reading the current bit, each die increments one or more internal counters subject to the modulus limit, and the column read continues at row address 3, slice 1 in die 0 (which reads logical row 3, logical column 14), row address 7, slice 5 in die 1 (which reads logical row 3, logical column 2), row address 11, slice 9 in die 2 (which reads logical row 3, logical column 6), and row address 15, slice 13 in die 3 (which reads logical row 3, logical column 10). The dies increment their internal counters, and the row counters wrap around due to the modulus limit 810. Thus, the column read continues at row address 0, slice 2 in die 0 (which reads logical row 0, logical column 2), row address 4, slice 6 in die 1 (which reads logical row 0, logical column 6), row address 8, slice 10 in die 2 (which reads logical row 0, logical column 10), and row address 12, slice 14 in die 3 (which reads logical row 0, logical column 14).
The dies increment their internal counters again, and the column read continues at row address 1, slice 3 in die 0 (which reads logical row 1, logical column 2), row address 5, slice 7 in die 1 (which reads logical row 1, logical column 6), row address 9, slice 11 in die 2 (which reads logical row 1, logical column 10), and row address 13, slice 15 in die 3 (which reads logical row 1, logical column 14).

After reading all of the slices 806 of the memory medium 802, the column read is complete, and the computing device 100 has read data from each of the complementary columns 812, 814, 816, 818 (i.e., in logical columns 2, 6, 10, and 14). Computing device 100 performs other similar column reads starting from row address 6, slice 0 in die 0 (starting address for logical column 10), row address 10, slice 0 in die 0 (starting address for logical column 6), and row address 14, slice 0 in die 0 (starting address for logical column 2). Thus, after performing 4 column read operations, computing device 100 has read all of the data contained in the complementary columns 812, 814, 816, 818, and may assemble logical columns 2, 6, 10, and 14 together. As described above, the computing device 100 may access those complementary columns as four separate logical columns or as a single multi-bit logical column.
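The complete figs. 8-9 walk-through can be reproduced with a small simulation: four column read operations, each modulo-limited per die, together recover the complementary logical columns 2, 6, 10, and 14. The storage model (a wrap of the target row modulo the medium height) and all names are illustrative assumptions.

```python
# Simulation of the figs. 8-9 layout: a 16x16 cluster on four dies of four
# slices, die offsets 0/4/8/12, modulus limit 4. Sketch only.
SLICES_PER_DIE, NUM_DIES, MOD = 4, 4, 4
WIDTH = TOTAL_ROWS = SLICES_PER_DIE * NUM_DIES

def build_medium(cluster):
    """Rotate each row by its address, then shift each die's sub-row by
    that die's offset (blocks 706-710)."""
    medium = [[None] * WIDTH for _ in range(TOTAL_ROWS)]
    for r, row in enumerate(cluster):
        k = r % WIDTH
        rotated = row[-k:] + row[:-k] if k else list(row)
        for die in range(NUM_DIES):
            lo = die * SLICES_PER_DIE
            target = (r + lo) % TOTAL_ROWS           # each die adds its offset
            medium[target][lo:lo + SLICES_PER_DIE] = rotated[lo:lo + SLICES_PER_DIE]
    return medium

def column_read(medium, col):
    """One column read operation: each die reads its slices diagonally,
    wrapping its row counter modulo MOD (blocks 726-730)."""
    base = (WIDTH - col) % WIDTH
    out = []
    for die in range(NUM_DIES):
        lo = die * SLICES_PER_DIE
        start = (base + lo) % TOTAL_ROWS
        block = (start // MOD) * MOD                 # the aligned 4-row window
        for i in range(SLICES_PER_DIE):
            out.append(medium[block + (start + i) % MOD][lo + i])
    return out
```

Each single read returns four bits from each of the four complementary columns; issuing one read per starting address then yields every bit of columns 2, 6, 10, and 14 with nothing discarded.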

Referring now to FIG. 10, in operation, computing device 100 may perform method 1000 for clustered data access. The method 1000 is described with reference to the memory 104. However, it should be understood that the method 1000 may additionally or alternatively be performed using the data storage device 114. The method 1000 begins at block 1002, where the computing device 100 (e.g., the media access circuitry 108) determines whether to perform a read operation or a write operation. If the computing device 100 determines to perform a read operation, the method 1000 branches forward to block 1012, as described below. If the computing device 100 determines to perform a write operation, the method 1000 proceeds to block 1004.

In block 1004, the computing device 100 (e.g., the media access circuitry 108) receives a logical row write request. The logical row write request may include a row address and logical row data to store in the memory 104. The logical row of data may be a row of data included in the cluster. For example, the media access circuitry 108 may receive memory access requests from the memory controller 106 that originate from the host processor 102. In some embodiments, the memory access may be generated by the vector functional unit 130 of the memory controller 106, for example, in response to a macro command received from the host processor 102.

In block 1006, the computing device 100 rotates each column subgroup within the logical row of data. Each column subgroup may include a number of columns equal to the number of slices included in a die of the memory medium 110. For example, in an embodiment, each die of the memory medium may include 128 slices (columns), and each partition may include eight dies. In this embodiment, the rotated row of data may include eight column subgroups of 128 columns each. The amount of rotation may be determined by the row address. Illustratively, the computing device 100 rotates the logical row data one column (i.e., one slice) to the right for each increment of the row address, modulo the column width of each die. For example, in the illustrative embodiment, for each subsequent row, the data in column 0 is rotated one column to the right, the data in column 127 is wrapped around to column 0, the data in column 128 is rotated one column to the right, the data in column 255 is wrapped around to column 128, and so on.
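Block 1006's per-die rotation can be sketched as below; the small four-slice die width stands in for the 128-slice embodiment, and the names are illustrative.

```python
def rotate_subgroups(row_bits, row_addr, slices_per_die):
    """Block 1006: rotate each die-sized subgroup of a logical row
    independently, by the row address modulo the die width."""
    k = row_addr % slices_per_die
    out = []
    for lo in range(0, len(row_bits), slices_per_die):
        sub = row_bits[lo:lo + slices_per_die]
        out.extend(sub[-k:] + sub[:-k] if k else sub)
    return out
```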

In block 1008, the computing device 100 determines a die number based on the row address. The die number increments after each group of rows equal in number to the column width of a die. For example, in the illustrative embodiment with 128 columns in each die, rows 0-127 are included in die 0, rows 128-255 are included in die 1, and so on.

In block 1010, the computing device 100 stores the rotated row data for each logical column subgroup in a separate partition of the memory medium 110, at the row address within the die number determined as described above. For example, an illustrative embodiment may include eight partitions, each partition having eight dies, and each die having 128 slices (columns). In this embodiment, data from columns 0-127 may be stored in partition 0, data from columns 128-255 may be stored in partition 1, and so on. Continuing with the example, within partition 0, the rotated data for rows 0-127 of columns 0-127 is stored in die 0, the rotated data for rows 128-255 of columns 0-127 is stored in die 1, and so on. Similarly, within partition 1, the rotated data for rows 0-127 of columns 128-255 is stored in die 0, the rotated data for rows 128-255 of columns 128-255 is stored in die 1, and so on. Thus, storing data for an entire logical row may require activating and writing to all partitions of the memory medium 110. After writing the data, the method 1000 loops back to block 1002 to perform other memory access operations.
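Blocks 1006-1010 together define where each logical element lands. A compact address map, under the illustrative 128-slice geometry (the function name and return convention are assumptions):

```python
SLICES_PER_DIE = 128  # slices (columns) per die in the illustrative embodiment

def placement(row_addr, col):
    """Return (partition, die, slice-within-die) for one logical element."""
    partition = col // SLICES_PER_DIE                                  # block 1010
    die = row_addr // SLICES_PER_DIE                                   # block 1008
    slice_in_die = (col % SLICES_PER_DIE + row_addr) % SLICES_PER_DIE  # block 1006
    return partition, die, slice_in_die
```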

Returning to block 1002, if the computing device 100 performs a read operation, the method 1000 branches to block 1012 where the computing device 100 determines whether to perform a column read operation or a row read operation. If the computing device 100 determines to perform a column read, the method 1000 branches to block 1020, described below. If the computing device 100 determines to perform a row read, the method 1000 proceeds to block 1014.

In block 1014, the computing device 100 (e.g., the media access circuitry 108) receives a logical row read request. The logical row read request may include a row address identifying the logical row of data stored in the memory 104. The logical row of data may be a row of data included in the cluster. For example, the media access circuitry 108 may receive memory access requests from the memory controller 106 that originate from the host processor 102. In some embodiments, the memory access may be generated by the vector functional unit 130 of the memory controller 106, for example, in response to a macro command received from the host processor 102.

In block 1016, the computing device 100 reads the rotated row data for each logical column subgroup from a separate partition of the memory medium 110, at the row address within the die number determined as described above in connection with block 1008. For example, as described above, an illustrative embodiment may include eight partitions, each partition having eight dies, and each die having 128 slices (columns). In this embodiment, data from columns 0-127 may be read from partition 0, data from columns 128-255 may be read from partition 1, and so on. Continuing with the example, within partition 0, the rotated data for rows 0-127 of columns 0-127 is read from die 0, the rotated data for rows 128-255 of columns 0-127 is read from die 1, and so on. Similarly, within partition 1, the rotated data for rows 0-127 of columns 128-255 is read from die 0, the rotated data for rows 128-255 of columns 128-255 is read from die 1, and so on. Thus, reading data for an entire logical row may require activating and reading from all partitions of the memory medium 110.

In block 1018, the computing device 100 un-rotates the rotated data for each logical column subgroup. The computing device 100 may perform a rotation that is opposite to the rotation performed when the row data was stored in the memory medium 110. Thus, by de-rotating the data, the computing device 100 recovers the original logical row data. Illustratively, the computing device 100 rotates the logical row data one column (i.e., one slice) to the left for each increment of the row address, modulo the column width of each die. For example, in the illustrative embodiment, for each subsequent row, the data in column 0 wraps around to column 127, the data in column 127 rotates one column to the left, the data in column 128 wraps around to column 255, the data in column 255 rotates one column to the left, and so on. After reading the data, the method 1000 loops back to block 1002 to perform other memory access operations.

Referring back to block 1012, if the computing device 100 determines to perform a column read operation, the method 1000 branches to block 1020, in which the computing device (e.g., the media access circuitry 108) receives a logical column read request. The column read request identifies a block address and a logical column number. The block address may be a row address identifying the starting row and/or column of the cluster (e.g., row 0).

In block 1022, the computing device 100 selects a partition based on the logical column subset associated with the requested column number. As described above, each column subgroup may include a number of columns equal to the number of slices included in the die of the memory medium 110. For example, in the illustrative embodiment, each die of the memory medium includes 128 slices (columns). Partition 0 may be selected for columns 0-127, partition 1 may be selected for columns 128-255, and so on.

In block 1024, the computing device 100 determines a base address based on the block address and the column number within the logical column subgroup. The base address identifies the row in the memory medium 110 that includes the requested logical column number in physical column 0 of the memory medium 110. For example, in the illustrative embodiment, given a block address of 0, column 0 may have a base address of 0, column 1 may have a base address of 127, column 128 may have a base address of 0, column 129 may have a base address of 127, and so on.
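The example values above suggest base-address arithmetic of the following form (a hedged reconstruction from the stated examples, not a definitive implementation; the function name is hypothetical):

```python
def base_address(block_address, column_number, die_width=128):
    """Row address at which the requested logical column appears in local
    slice 0 of die 0, reconstructed from the example values above."""
    local_column = column_number % die_width
    return block_address + (die_width - local_column) % die_width
```

For instance, with a die width of 128 slices, column 1 yields base address 127 and column 128 yields base address 0, matching the examples given above.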

In block 1026, the computing device 100 reads the logical column data diagonally from the memory medium 110, starting from the base address and using a modulo-limited counter within the partition selected as described above. In block 1028, each die of the memory medium 110 begins reading data at the base address plus the corresponding die offset. A single command/address bus may be used to communicate the row address to each die, and each die may add a preconfigured die offset to the row address. The computing device 100 may read column data in parallel on multiple dies of the selected partition. In block 1030, each die reads the column data (e.g., a bit) from the current slice at the current row address, increments internal counters for the row address and column number subject to a preconfigured modulo limit, and continues reading across the slices of the die. In block 1032, the modulo limit is equal to the width of each die in slices. Thus, in the illustrative embodiment, the modulo limit is 128. Accordingly, the computing device 100 may read any column of the data cluster in a single pass, without re-reads caused by the modulo counter. The computing device 100 may thus provide maximum column read speed for all columns in a cluster having an encoding length that is less than the partition width (e.g., 1024 bits in the illustrative embodiment). Because each individual row is stored across multiple partitions, the read speed for a single row may be reduced. Alternatively, row read throughput may be maintained at the expense of the latency of multiple row reads performed together (e.g., by performing multiple partitioned reads to recover multiple rows). After reading the logical column, the method 1000 loops back to block 1002 to perform other memory access operations.
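The diagonal read within one partition can be simulated at small scale (four dies of four slices each, holding logical columns 0-3 of a 16-row cluster, as in the figure described below; a sketch under those assumptions, with hypothetical names):

```python
W, DIES = 4, 4          # slices per die (the modulo limit) and dies per partition
ROWS = W * DIES         # 16 logical rows in the cluster

# Store logical columns 0-3 in the partition: rows 0-3 in die 0, rows 4-7 in
# die 1, and so on, rotated one slice per row within each die.
mem = [[None] * (W * DIES) for _ in range(ROWS)]
for r in range(ROWS):
    for c in range(W):
        die = r // W
        mem[r][die * W + (c + r) % W] = (r, c)

def read_column(c):
    """Diagonal column read: every die starts at the base address plus its die
    offset and walks its slices while the row counter wraps modulo W."""
    base = (W - c) % W
    out = {}
    for die in range(DIES):                    # dies operate in parallel
        for s in range(W):
            row = die * W + (base + s) % W     # modulo-limited row counter
            out[row] = mem[row][die * W + s]
    return [out[r] for r in range(ROWS)]       # logical column in row order
```

A single call to `read_column` touches each slice of the partition exactly once, illustrating why any column of the cluster can be read in one pass.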

Referring now to FIG. 11, a diagram 1100 illustrates one possible embodiment of a column-accessible data format as described above in connection with the method of FIG. 10. The diagram 1100 illustrates a memory medium 1102, such as the memory medium 110 and/or the memory medium 120 of FIG. 1. As shown, the illustrative memory medium 1102 is arranged in a plurality of rows 1104 and slices 1106 (i.e., columns 1106). Each slice 1106 is contained in a die 1108. Illustratively, each row 1104 is formed of four dies 1108, each including four slices 1106, for a total of 16 slices 1106. However, in other embodiments, each row may include another number of dies 1108 and/or slices 1106, e.g., eight dies with 128 slices per die, for a total of 1024 slices. As shown, the memory medium 1102 is three-dimensional, with four dies 1108 contained in each partition 1110. The memory medium 1102 includes four partitions 1110, and each partition 1110 includes four dies 1108. In other embodiments, the memory medium 1102 may include a different number of partitions 1110, such as eight partitions. Each row 1104 is addressable by a row address, and each slice 1106 is addressable by a column number. Each die includes a preconfigured modulo limit, which is illustratively four. As described above, the modulo limit is equal to the width of each die 1108 in slices 1106.

The illustrative memory medium 1102 has stored therein a data cluster that is a logical matrix having 16 logical rows and 16 logical columns. As shown, within the column subgroup on each die 1108, the columns 1106 are rotated by one column position for each subsequent row 1104. For example, logical row 0, logical column 3 is stored at row address 0, slice number 3; logical row 1, logical column 3 is rotated and stored at row address 1, slice number 0; and so on. Additionally, each group of four rows (equal to the modulo limit) is stored in a different die 1108. As shown, rows 0-3 are stored in die 0, rows 4-7 are stored in die 1, rows 8-11 are stored in die 2, and rows 12-15 are stored in die 3. In addition, each partition 1110 stores four logical columns of data (equal to the width of each die). As shown, logical columns 0-3 are stored in partition 0, logical columns 4-7 are stored in partition 1, logical columns 8-11 are stored in partition 2, and logical columns 12-15 are stored in partition 3.
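The placement just described can be expressed as a locator function (illustrative only, at the four-slice-per-die scale of FIG. 11; the function name is hypothetical):

```python
W, DIES = 4, 4  # slices per die and dies per partition, as in FIG. 11

def locate(logical_row, logical_column):
    """Return (partition, row address, slice number within the partition)
    for one element of the logical matrix, per the FIG. 11 layout."""
    partition = logical_column // W          # columns 0-3 -> partition 0, ...
    die = (logical_row // W) % DIES          # rows 0-3 -> die 0, rows 4-7 -> die 1, ...
    slice_number = die * W + (logical_column + logical_row) % W
    return partition, logical_row, slice_number
```

For example, logical row 1, logical column 3 maps to row address 1, slice 0, consistent with the rotation described above.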

Each column of the data cluster may be read by a single column read operation. As an illustrative example, the diagram 1100 shows the data contained in logical column 2 as highlighted cells 1112. An illustrative column read for logical column 2 may begin in partition 0 at row address 2, slice 0 in die 0. Using a shared command/address bus and per-die offsets, reads may be performed simultaneously at row address 6, slice 4 in die 1; row address 10, slice 8 in die 2; and row address 14, slice 12 in die 3. After reading the current bit, each die increments one or more internal counters subject to the modulo limit, and the column read continues at row address 3, slice 1 in die 0; row address 7, slice 5 in die 1; row address 11, slice 9 in die 2; and row address 15, slice 13 in die 3. The dies increment their internal counters, and the row counters wrap around because the modulo limit is four. Thus, the column read continues at row address 0, slice 2 in die 0; row address 4, slice 6 in die 1; row address 8, slice 10 in die 2; and row address 12, slice 14 in die 3. The dies increment their internal counters again, and the column read continues at row address 1, slice 3 in die 0; row address 5, slice 7 in die 1; row address 9, slice 11 in die 2; and row address 13, slice 15 in die 3.

After reading the data on all of the slices in the partition 1110, the column read is complete, and the computing device has read all of the data of logical column 1112. By starting a column read at a different row address, the computing device 100 may read any of logical columns 0-3 from partition 0 in a single column read operation. Similarly, the computing device 100 may read any of logical columns 4-7 from partition 1, logical columns 8-11 from partition 2, and logical columns 12-15 from partition 3 in a single column read operation. Thus, the computing device 100 can read any logical column from the data cluster in a single column read operation, without re-reads caused by cluster edges or the modulo limit. Accordingly, the computing device 100 may maximize column read throughput.

Referring now to FIG. 12, in operation, computing device 100 may perform a method 1200 for clustered data access. The method 1200 is described with reference to the memory 104. However, it should be understood that method 1200 may additionally or alternatively be performed using data storage device 114. The method 1200 begins at block 1202, where the computing device 100 (e.g., the media access circuitry 108) determines whether to perform a read operation or a write operation. If the computing device 100 determines to perform a read operation, the method 1200 branches forward to block 1212, as described below. If computing device 100 determines to perform a write operation, method 1200 proceeds to block 1204.

In block 1204, the computing device 100 (e.g., the media access circuitry 108) receives a logical row write request. The logical row write request may include a row address and logical row data to store in the memory 104. The logical row of data may be a row of data included in the cluster. For example, the media access circuitry 108 may receive memory access requests from the memory controller 106 that originate from the host processor 102. In some embodiments, the memory access may be generated by the vector functional unit 130 of the memory controller 106, for example, in response to a macro command received from the host processor 102.

In block 1206, the computing device 100 rotates each column subgroup within the logical row data. Each column subgroup may include a number of columns equal to the number of slices included in a die of the memory medium 110. For example, in the illustrative embodiment, each die of the memory medium may include 128 slices (columns), and each partition may include eight dies. In this embodiment, the rotated row data may include eight column subgroups, each having 128 columns. The amount of rotation may be determined by the row address. Illustratively, the computing device 100 rotates the logical row data one column (i.e., one slice) to the right for each incremental row address, modulo the width of each die in columns. For example, in the illustrative embodiment, for each subsequent row, the data in column 0 is rotated one column to the right, the data in column 127 wraps around to column 0, the data in column 128 is rotated one column to the right, the data in column 255 wraps around to column 128, and so on.

In block 1208, the computing device 100 rotates the column subgroups across the partition based on the row address. The computing device 100 may rotate the subgroups of data based on the width of each die in columns. For example, in the illustrative embodiment, the column subgroups of rows 0-127 of the block are not rotated. Continuing with the example, the column subgroups of rows 128-255 are rotated one die (e.g., 128 columns) to the right. Thus, for rows 128-255, logical columns 0-127 are stored in physical columns 128-255, logical columns 128-255 are stored in physical columns 256-383, and so on.
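Both write-path rotations (the within-subgroup rotation of block 1206 and the cross-die rotation of block 1208), together with their inverses applied on a row read (cf. block 1218), can be sketched at small scale (four dies of four slices; the names are hypothetical):

```python
W, DIES = 4, 4  # slices per die and dies per row (128 and 8 in the illustrative embodiment)

def store_row(row_bits, row_address):
    """Rotate each W-wide subgroup right by (row_address mod W), then rotate
    the subgroups themselves right across the dies by (row_address // W)."""
    shift = row_address % W
    groups = [row_bits[i:i + W] for i in range(0, len(row_bits), W)]
    groups = [g[-shift:] + g[:-shift] if shift else g for g in groups]
    k = (row_address // W) % DIES
    groups = groups[-k:] + groups[:-k] if k else groups
    return [b for g in groups for b in g]

def load_row(stored_bits, row_address):
    """Inverse of store_row: un-rotate across the dies, then within each subgroup."""
    k = (row_address // W) % DIES
    groups = [stored_bits[i:i + W] for i in range(0, len(stored_bits), W)]
    groups = groups[k:] + groups[:k] if k else groups
    shift = row_address % W
    groups = [g[shift:] + g[:shift] if shift else g for g in groups]
    return [b for g in groups for b in g]

row = list(range(16))
stored = store_row(row, 4)  # subgroup of columns 0-3 lands in die 1 (slices 4-7)
```

Applying `load_row` with the same row address inverts both rotations and recovers the original logical row, mirroring the read path described below.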

In block 1210, the computing device 100 stores the rotated row of data at a row address. Each row is stored in a single partition of the memory medium 110 so that the entire rotated row can be stored in a single row write operation. After writing the data, the method 1200 loops back to block 1202 to perform additional memory access operations.

Referring back to block 1202, if computing device 100 performs a read operation, method 1200 branches to block 1212, where computing device 100 determines whether to perform a column read operation or a row read operation. If computing device 100 determines to perform a column read, method 1200 branches to block 1220, described below. If the computing device 100 determines to perform a line read, the method 1200 proceeds to block 1214.

In block 1214, the computing device 100 (e.g., the media access circuitry 108) receives a logical row read request. The logical row read request may include a row address identifying the logical row of data stored in the memory 104. The logical row of data may be a row of data included in the cluster. For example, the media access circuitry 108 may receive memory access requests from the memory controller 106 that originate from the host processor 102. In some embodiments, the memory access may be generated by the vector functional unit 130 of the memory controller 106, for example, in response to a macro command received from the host processor 102.

In block 1216, the computing device 100 reads the rotated row data at the row address in the memory medium 110. The rotated row data is stored in the memory medium 110 in a rotated format, as described above in connection with blocks 1206, 1208. In block 1218, the computing device 100 de-rotates the column subgroups across the partition of the memory medium 110 and then de-rotates the data within each logical column subgroup. The computing device 100 may perform rotations opposite to the rotations performed when the row data was stored in the memory medium 110. Thus, by de-rotating the data, the computing device 100 recovers the original logical row data. As described above, the rotation of the subgroups may be based on the width of each die in columns. For example, in the illustrative embodiment, the column subgroups of rows 0-127 of the block are not rotated, the column subgroups of rows 128-255 are rotated one die (e.g., 128 columns) to the left, and so on. Continuing with the example, the amount of rotation within each logical column subgroup may be determined by the row address. Illustratively, the computing device 100 rotates the logical row data to the left by one column (i.e., one slice) for each incremental row address, modulo the width of each die in columns. For example, in the illustrative embodiment, for each subsequent row, data in column 0 wraps around to column 127, data in column 127 rotates one column to the left, data in column 128 wraps around to column 255, data in column 255 rotates one column to the left, and so on. After reading the data, the method 1200 loops back to block 1202 to perform additional memory access operations.

Referring back to block 1212, if the computing device 100 determines to perform a column read operation, the method 1200 branches to block 1220, in which the computing device (e.g., the media access circuitry 108) receives a logical column read request. The column read request identifies a block address and a logical column number. The block address may be a row address identifying the starting row and/or column of the cluster (e.g., row 0).

In block 1222, the computing device 100 determines a base address based on the block address, the column number, and the modulus limit. The base address identifies a row in the memory medium 110 that includes the requested logical column number in physical column 0 in the memory medium 110. The base address may be determined using arithmetic operations (e.g., addition, subtraction, and/or modulo).
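One possible form of that arithmetic, consistent with the small-scale example described below in connection with FIG. 13 (an illustrative reconstruction; the specification does not fix a specific formula, and the function name is hypothetical):

```python
def base_address(block_address, column_number, die_width, dies):
    """Row address at which the requested logical column appears in physical
    slice 0 of die 0, reconstructed from the FIG. 13 layout."""
    subgroup = column_number // die_width
    local = column_number % die_width
    die_part = ((dies - subgroup) % dies) * die_width  # undo the cross-die rotation
    return block_address + die_part + (die_width - local) % die_width
```

At the FIG. 13 scale (four dies of four slices), logical column 2 yields base address 2 and logical column 8 yields base address 8, matching the column reads walked through below.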

In block 1224, the computing device 100 starts at the base address and reads the logical column data diagonally from the memory medium 110 using a modulo-limited counter. In block 1226, each die of the memory medium 110 begins reading data at the base address plus the corresponding die offset. A single command/address bus may be used to communicate the row address to each die, and each die may add a preconfigured die offset to the row address. The computing device 100 may read column data in parallel on multiple dies of the partition. In block 1228, each die reads the column data (e.g., a bit) from the current slice at the current row address, increments internal counters for the row address and column number subject to a preconfigured modulo limit, and continues reading across the slices of the die. In block 1230, the modulo limit is equal to the width of each die in slices. Thus, in the illustrative embodiment, the modulo limit is 128.

In block 1232, the computing device 100 determines whether the logical column number is less than the modulo limit (e.g., 128). If so, the column read is complete and the method 1200 loops back to block 1202 to perform additional memory access operations. Thus, the computing device 100 may read certain columns of the data cluster (e.g., columns 0-127 in the illustrative embodiment) in a single pass without having to re-read due to cluster edges or modulo counters. Thus, the computing device 100 may provide a maximum column read speed for certain columns in the cluster while maintaining the row read/write speed as described above.

Referring back to block 1232, if the logical column number is not less than the modulo limit, the method 1200 proceeds to block 1234, in which the computing device 100 determines an additional base address based on the block address and the logical column number. The additional base address identifies another row in the memory medium 110 that includes the requested logical column number in physical column 0. For example, the additional base address may be included in a column read block immediately preceding the requested block address. In the illustrative embodiment, the additional base address may be 1024 less than the base address determined above in connection with block 1222. In block 1236, the computing device 100 starts from the additional base address and reads the additional logical column data diagonally from the memory medium 110 using the modulo-limited counter. The logical column data read in block 1224 and the additional logical column data read in block 1236 may be assembled to form the requested logical column. Thus, the computing device 100 may read certain columns of the data cluster (e.g., columns 128-1023 in the illustrative embodiment) in two read operations, without additional re-reads caused by the modulo counter. Accordingly, the computing device 100 may provide improved column read speeds for certain columns in the cluster while preserving row read/write speeds for the entire cluster. After reading the logical column, the method 1200 loops back to block 1202 to perform additional memory access operations.
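The complete one- or two-pass column read can be simulated at the scale of FIG. 13 described below (a 16 x 16 cluster, four dies of four slices; a hedged sketch in which reads outside the cluster are discarded as described, with hypothetical names):

```python
W, DIES = 4, 4          # slices per die (the modulo limit) and dies per row
ROWS = COLS = 16        # logical matrix dimensions, as in FIG. 13

# Lay out the cluster per the write path: rotate within each subgroup by the
# row address, and rotate subgroups across the dies by (row // W).
mem = [[None] * COLS for _ in range(ROWS)]
for r in range(ROWS):
    for c in range(COLS):
        die = (c // W + r // W) % DIES
        mem[r][die * W + (c + r) % W] = (r, c)

def diagonal_pass(base, acc):
    """One pass: each die starts at the base address plus its die offset and
    walks its slices while the row counter wraps inside the die's W-row window."""
    for die in range(DIES):
        start = base + die * W
        window = (start // W) * W
        for s in range(W):
            row = window + (start % W + s) % W
            if 0 <= row < ROWS:                 # discard rows outside the cluster
                acc[row] = mem[row][die * W + s]

def read_column(c):
    subgroup, local = c // W, c % W
    base = ((DIES - subgroup) % DIES) * W + (W - local) % W
    acc = {}
    diagonal_pass(base, acc)                    # first pass (block 1224)
    if c >= W:                                  # column >= modulo limit (block 1232)
        diagonal_pass(base - DIES * W, acc)     # second pass (block 1236)
    return [acc[r] for r in range(ROWS)]
```

Columns below the modulo limit complete in the first pass; for the remaining columns, the second pass at the earlier base address supplies the rows discarded in the first pass, and the two results assemble into the full logical column.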

Referring now to FIG. 13, a diagram 1300 illustrates one possible embodiment of a column data format accessed as described above in connection with the method of FIG. 12. The diagram 1300 illustrates a memory medium 1302, such as the memory medium 110 and/or the memory medium 120 of FIG. 1. As shown, the illustrative memory medium 1302 is arranged into a plurality of rows 1304 and slices 1306 (i.e., columns 1306). Each slice 1306 is contained within a die 1308. Illustratively, each row 1304 is formed of four dies 1308, each including four slices 1306, for a total of 16 slices 1306. However, in other embodiments, each row may include another number of dies 1308 and/or slices 1306, for example eight dies with 128 slices per die, for a total of 1024 slices. The illustrative memory medium 1302 includes a single partition 1310; in other embodiments, the memory medium 1302 may include a different number of partitions 1310, such as eight partitions. Each row 1304 is addressable by a row address, and each slice 1306 is addressable by a column number. Each die includes a preconfigured modulo limit, which is illustratively four. As described above, the modulo limit is equal to the width of each die 1308 in slices 1306.

The illustrative memory medium 1302 has stored therein a data cluster that is a logical matrix having 16 logical rows and 16 logical columns. As shown, within the column subgroup on each die 1308, the columns 1306 are rotated by one column position for each subsequent row 1304. For example, logical row 0, logical column 3 is stored at row address 0, slice number 3; logical row 1, logical column 3 is rotated and stored at row address 1, slice number 0; and so on. In addition, the column subgroups are rotated across the partition 1310. For example, the sub-block comprising logical rows 4-7 and logical columns 0-3 is rotated to slices 4-7, the sub-block comprising logical rows 8-11 and logical columns 0-3 is rotated to slices 8-11, and so on.

Each column of the data cluster may be read by no more than two column read operations. As an illustrative example, the diagram 1300 shows the data contained in logical column 2 as highlighted cells 1312. An illustrative column read for logical column 2 may begin at row address 2, slice 0 in die 0. Using a shared command/address bus and per-die offsets, reads may be performed simultaneously at row address 6, slice 4 in die 1; row address 10, slice 8 in die 2; and row address 14, slice 12 in die 3. After reading the current bit, each die increments one or more internal counters subject to the modulo limit, and the column read continues at row address 3, slice 1 in die 0; row address 7, slice 5 in die 1; row address 11, slice 9 in die 2; and row address 15, slice 13 in die 3. The dies increment their internal counters, and the row counters wrap around because the modulo limit is four. Thus, the column read continues at row address 0, slice 2 in die 0; row address 4, slice 6 in die 1; row address 8, slice 10 in die 2; and row address 12, slice 14 in die 3. The dies increment their internal counters again, and the column read continues at row address 1, slice 3 in die 0; row address 5, slice 7 in die 1; row address 9, slice 11 in die 2; and row address 13, slice 15 in die 3. After reading the data on all slices in the partition 1310, the column read is complete, and the computing device has read all of the data of logical column 1312 in one pass. The computing device 100 may similarly read data from any of logical columns 0-3 (less than the modulo limit) in a single pass.

As another example, the diagram 1300 shows the data included in logical column 8 as highlighted cells 1314. An illustrative column read for logical column 8 may begin at row address 8, slice 0 in die 0. Using a shared command/address bus and per-die offsets, reads may be performed simultaneously at row address 12, slice 4 in die 1; row address 16, slice 8 in die 2; and row address 20, slice 12 in die 3. However, row addresses 16 and 20 extend beyond the end of the data cluster, and those reads may therefore be discarded by the computing device 100. After reading the current bit, each die increments one or more internal counters subject to the modulo limit, and the column read continues at row address 9, slice 1 in die 0 and row address 13, slice 5 in die 1. The dies increment their internal counters, and the column read continues at row address 10, slice 2 in die 0 and row address 14, slice 6 in die 1. The dies increment their internal counters an additional time, and the column read continues at row address 11, slice 3 in die 0 and row address 15, slice 7 in die 1. The computing device 100 then begins an additional column read eight row addresses before the block shown in FIG. 13 (e.g., at row address -8) in die 0. Using the shared command/address bus and per-die offsets, reads may be performed simultaneously at row address -4, slice 4 in die 1; row address 0, slice 8 in die 2; and row address 4, slice 12 in die 3. Similar to the reads discussed above, row addresses -8 and -4 extend beyond the beginning of the data cluster, and those reads may therefore be discarded by the computing device 100. After reading the current bit, each die increments one or more internal counters subject to the modulo limit, and the column read continues at row address 1, slice 9 in die 2 and row address 5, slice 13 in die 3. The dies increment their internal counters, and the column read continues at row address 2, slice 10 in die 2 and row address 6, slice 14 in die 3.
The dies increment their internal counters an additional time, and the column read continues at row address 3, slice 11 in die 2 and row address 7, slice 15 in die 3. After performing the second column read operation, the computing device has read all of the data of logical column 1314 in two read operations. The computing device 100 may similarly read data from any of logical columns 4-15 (greater than or equal to the modulo limit) in two passes. Thus, the computing device 100 may provide improved column read performance by avoiding modulo-counter re-reads while maintaining row read/write performance.

Examples of the invention

Illustrative examples of the techniques disclosed herein are provided below. Embodiments of the technology may include any one or more of the examples described below, and any combination thereof.

Example 1 includes an apparatus comprising: a column addressable memory; and circuitry coupled to the memory, wherein the circuitry is to: storing the data clusters of the logical matrix in a column-addressable memory in a column-based format; and reading a logical column of the data cluster from the column addressable memory in a column read operation, wherein the column read operation is based on the block address and the column number.

Example 2 includes the subject matter of example 1, and wherein reading the logical column comprises: determining a base address from the block address and the column number; and reading the logical column data diagonally from the column addressable memory starting from the base address, wherein reading the logical column data comprises reading from the data cluster and from a duplicate copy of the data cluster.

Example 3 includes the subject matter of any one of examples 1 and 2, and wherein the circuitry is further to discard data from row addresses not included in the data cluster or the replicated copy of the data cluster.

Example 4 includes the subject matter of any one of examples 1-3, and wherein storing the data cluster comprises: storing the data cluster at a first row address of a column addressable memory; and storing the replicated copy of the data cluster at a second row address in the column addressable memory, wherein the first row address and the second row address are separated by a predetermined row offset.

Example 5 includes the subject matter of any one of examples 1-4, and wherein the circuitry is further to rotate each row of the data clusters over the partition of the column addressable memory, wherein storing the data clusters comprises storing the data clusters in response to the rotation of each row of the data clusters.

Example 6 includes the subject matter of any one of examples 1-5, and wherein the predetermined row offset comprises a column width of a partition of a column addressable memory.

Example 7 includes the subject matter of any one of examples 1-6, and wherein reading the logical column comprises: reading a plurality of complementary logical columns of a data cluster; and assembling the logical column in response to reading the plurality of complementary logical columns.

Example 8 includes the subject matter of any one of examples 1-7, and wherein reading the plurality of complementary logical columns comprises: a column read operation is performed on each complementary logical column, where each column read operation has a different starting address.

Example 9 includes the subject matter of any one of examples 1-8, and wherein assembling the logical column comprises assembling a multi-bit logical column based on the plurality of complementary logical columns.

Example 10 includes the subject matter of any one of examples 1-9, and wherein storing the data cluster comprises: rotating each logical row of the data cluster over a partition of the column addressable memory to generate a rotated row, wherein the partition comprises a plurality of dies of the column addressable memory, wherein each die comprises a predetermined number of columns, and wherein each die is programmed with a predetermined row offset; and storing each rotated row at a row address in the partition, wherein each die of the partition adds an associated predetermined row offset to the row address.

Example 11 includes the subject matter of any one of examples 1-10, and wherein reading the logical column comprises: selecting a first partition of the column addressable memory according to the column number, wherein the column addressable memory comprises a plurality of partitions, wherein each partition comprises a plurality of dies of the column addressable memory, and wherein each die comprises a predetermined number of columns; determining a base address based on the column number and a modulo limit of the column addressable memory; and reading the logical column data diagonally from the column addressable memory starting from the base address of the first partition.

Example 12 includes the subject matter of any of examples 1-11, and wherein the modulus limit includes a predetermined number of columns.

Example 13 includes the subject matter of any one of examples 1-12, and wherein reading the logical column data diagonally from the column addressable memory starting at the base address comprises: for each die of the first partition, reading the column at the base address plus the die offset associated with the corresponding die; and incrementing an internal address counter subject to a modulo limit.

Example 14 includes the subject matter of any one of examples 1-13, and wherein storing the data cluster comprises: rotating each subset of logical columns within each logical row, wherein each subset of logical columns includes a predetermined number of columns; determining die numbers of a plurality of dies based on the row address of each logical row; and storing each subset of the logical columns of each logical row in a die having a die number determined for the logical row within a partition of the plurality of partitions selected based on the logical column number of the subset of logical columns.

Example 15 includes the subject matter of any one of examples 1-14, and wherein reading the logical column comprises: determining a base address based on the column number and a modulo limit of a column addressable memory, wherein the column addressable memory comprises a plurality of dies, wherein each die comprises a predetermined number of columns, and wherein the modulo limit comprises the predetermined number of columns; and using modulo restriction, reading the logical column data diagonally from the column addressable memory starting from the base address.

Example 16 includes the subject matter of any one of examples 1-15, and wherein reading the logical column further comprises: determining whether the column number is less than a modulus limit; responsive to determining that the column number is not less than the modulo limit, determining an additional base address based on the column number and the modulo limit; and reading the logical column data diagonally from the column addressable memory starting from the additional base address using modulo restriction.

Example 17 includes the subject matter of any one of examples 1-16, and wherein reading the logical column further comprises: the logical columns are assembled in response to reading the logical column data diagonally from the base address and in response to reading the logical column data diagonally from the additional base address.

Example 18 includes the subject matter of any one of examples 1-17, and wherein storing the data cluster comprises: rotating each subset of logical columns within each logical row, wherein each subset of logical columns includes a predetermined number of columns; in response to the rotation of each subset, rotating the subset of logical columns of each logical row over a partition of the column addressable memory, wherein the partition includes a plurality of dies; and storing each logical row in response to rotation of the subset of logical columns.

Example 19 includes a method comprising: storing, by a computing device, a data cluster of a logical matrix in a column-addressable memory in a column-based format; and reading, by the computing device, the logical columns of the data cluster from the column addressable memory in a column read operation, wherein the column read operation is based on the block address and the column number.

Example 20 includes the subject matter of example 19, and wherein reading the logical column comprises: determining a base address from the block address and the column number; and reading the logical column data diagonally from the column addressable memory starting from the base address, wherein reading the logical column data comprises reading from the data cluster and from a duplicate copy of the data cluster.

Example 21 includes the subject matter of any one of examples 19 and 20, further comprising: discarding data from row addresses not included in the data cluster or the replicated copy of the data cluster.

Example 22 includes the subject matter of any one of examples 19-21, and wherein storing the data cluster comprises: storing the data cluster at a first row address of a column addressable memory; and storing the replicated copy of the data cluster at a second row address in the column addressable memory, wherein the first row address and the second row address are separated by a predetermined row offset.
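The replicated-copy layout of examples 20-22, with the row offset of example 24, can be sketched in software. This is an illustration under stated assumptions, not the claimed device: rows are rotated right by their row index, the duplicate copy is appended at a row offset equal to the column width N, and a diagonal read then runs from the cluster into the copy without any modulo wrap. The helper names are hypothetical.

```python
N = 4  # assumed column width of the partition (and predetermined row offset)

def store_with_replica(matrix):
    """Store the rotated cluster followed by its duplicate copy."""
    rotated = []
    for r, row in enumerate(matrix):
        k = r % N
        rotated.append(row[-k:] + row[:-k] if k else list(row))
    return rotated + rotated  # replica begins N rows below the cluster

def read_column(stored, col):
    """Diagonal read of N elements spanning the cluster and its copy."""
    base = (N - col) % N  # base row address derived from the column number
    diag = [(base + i, stored[base + i][i]) for i in range(N)]
    # reassemble in logical-row order; rows read from the copy map back mod N
    out = [None] * N
    for row_addr, value in diag:
        out[row_addr % N] = value
    return out

matrix = [[r * N + c for c in range(N)] for r in range(N)]
print(read_column(store_with_replica(matrix), 1))  # → [1, 5, 9, 13]
```

The copy lets every diagonal stay N elements long regardless of its base address; any rows the hardware read touches outside the cluster and copy would simply be discarded, as in example 21.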

Example 23 includes the subject matter of any one of examples 19-22, further comprising: rotating each row of the data cluster over a partition of the column addressable memory, wherein storing the data cluster comprises storing the data cluster in response to rotating each row of the data cluster.

Example 24 includes the subject matter of any one of examples 19-23, and wherein the predetermined row offset comprises a column width of a partition of the column addressable memory.

Example 25 includes the subject matter of any one of examples 19-24, and wherein reading the logical column comprises: reading a plurality of complementary logical columns of a data cluster; and assembling the logical column in response to reading the plurality of complementary logical columns.

Example 26 includes the subject matter of any one of examples 19-25, and wherein reading the plurality of complementary logical columns comprises: a column read operation is performed on each complementary logical column, where each column read operation has a different starting address.

Example 27 includes the subject matter of any one of examples 19-26, and wherein assembling the logical column comprises assembling a multi-bit logical column based on the plurality of complementary logical columns.
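One way to picture the complementary-column reads of examples 25-27 is a bit-sliced layout in which each logical element is stored one bit per physical column. The examples do not fix this encoding, so the sketch below is an assumption for illustration only, with hypothetical helper names: B complementary single-bit column reads, each with its own starting address, are assembled into one multi-bit logical column.

```python
B = 4  # assumed element width in bits

def bit_slice(matrix, n_cols):
    """Store each logical column as B complementary single-bit columns."""
    n_rows = len(matrix)
    planes = [[0] * n_rows for _ in range(n_cols * B)]
    for r in range(n_rows):
        for c in range(n_cols):
            for b in range(B):
                planes[c * B + b][r] = (matrix[r][c] >> b) & 1
    return planes

def read_multibit_column(planes, col):
    """Assemble a multi-bit logical column from B complementary reads."""
    # one column read per complementary column, each at its own start address
    slices = [planes[col * B + b] for b in range(B)]
    n_rows = len(slices[0])
    # bit b of each element comes from complementary column b
    return [sum(slices[b][r] << b for b in range(B)) for r in range(n_rows)]

matrix = [[1, 14], [7, 9], [0, 15]]
print(read_multibit_column(bit_slice(matrix, 2), 1))  # → [14, 9, 15]
```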

Example 28 includes the subject matter of any one of examples 19-27, and wherein storing the data cluster comprises: rotating each logical row of the data cluster over a partition of the column addressable memory to generate a rotated row, wherein the partition comprises a plurality of dies of the column addressable memory, wherein each die comprises a predetermined number of columns, and wherein each die is programmed with a predetermined row offset; and storing each rotated row at a row address in the partition, wherein each die of the partition adds an associated predetermined row offset to the row address.

Example 29 includes the subject matter of any one of examples 19-28, and wherein reading the logical column comprises: selecting a first partition of the column addressable memory according to the column number, wherein the column addressable memory comprises a plurality of partitions, wherein each partition comprises a plurality of dies of the column addressable memory, and wherein each die comprises a predetermined number of columns; determining a base address based on the column number and a modulo limit of the column addressable memory; and reading the logical column data diagonally from the column addressable memory starting from the base address of the first partition.

Example 30 includes the subject matter of any one of examples 19-29, and wherein the modulo limit comprises the predetermined number of columns.

Example 31 includes the subject matter of any one of examples 19-30, and wherein reading the logical column data diagonally from the column addressable memory starting at the base address comprises: for each die of the first partition, reading the column at the base address plus the die offset associated with the corresponding die; and incrementing an internal address counter subject to the modulo limit.

Example 32 includes the subject matter of any one of examples 19-31, and wherein storing the data cluster comprises: rotating each subset of logical columns within each logical row, wherein each subset of logical columns includes a predetermined number of columns; determining die numbers of a plurality of dies based on the row address of each logical row; and storing each subset of the logical columns of each logical row in a die having a die number determined for the logical row within a partition of the plurality of partitions selected based on the logical column number of the subset of logical columns.
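The rotation of row subsets across the dies of a partition (examples 28 and 32) and the matching column read (examples 29-31) can be sketched together. Assumptions not fixed by the examples: D dies per partition, K columns per die, and subset s of logical row r placed in die (s + r) % D; the per-die row offsets of example 28 are folded into this index arithmetic, and the names are hypothetical.

```python
D, K = 4, 4  # assumed dies per partition and columns per die

def store_partition(matrix):
    """Rotate the D subsets of each logical row across the D dies."""
    dies = [[None] * len(matrix) for _ in range(D)]
    for r, row in enumerate(matrix):
        for s in range(D):
            subset = row[s * K:(s + 1) * K]   # K logical columns per subset
            dies[(s + r) % D][r] = subset     # rotation over the partition
    return dies

def read_logical_column(dies, col):
    """Read one logical column; the die holding it varies per row."""
    s, j = divmod(col, K)  # subset number and column within the die
    out = []
    for r in range(len(dies[0])):
        die = dies[(s + r) % D]  # die holding subset s of logical row r
        out.append(die[r][j])
    return out

matrix = [[r * D * K + c for c in range(D * K)] for r in range(4)]
print(read_logical_column(store_partition(matrix), 5))  # → [5, 21, 37, 53]
```

Because the subset of interest sits in a different die on every row, the partition can serve one element of the logical column from each die in parallel, which is the point of the rotation.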

Example 33 includes the subject matter of any one of examples 19-32, and wherein reading the logical column comprises: determining a base address based on the column number and a modulo limit of the column addressable memory, wherein the column addressable memory comprises a plurality of dies, wherein each die comprises a predetermined number of columns, and wherein the modulo limit comprises the predetermined number of columns; and reading the logical column data diagonally from the column addressable memory starting from the base address, subject to the modulo limit.

Example 34 includes the subject matter of any one of examples 19-33, and wherein reading the logical column further comprises: determining whether the column number is less than the modulo limit; responsive to determining that the column number is not less than the modulo limit, determining an additional base address based on the column number and the modulo limit; and reading the logical column data diagonally from the column addressable memory starting from the additional base address, subject to the modulo limit.

Example 35 includes the subject matter of any one of examples 19-34, and wherein reading the logical column further comprises: assembling the logical column in response to reading the logical column data diagonally from the base address and in response to reading the logical column data diagonally from the additional base address.

Example 36 includes the subject matter of any one of examples 19-35, and wherein storing the data cluster comprises: rotating each subset of logical columns within each logical row, wherein each subset of logical columns includes a predetermined number of columns; in response to rotating each subset, rotating a subset of the logical columns of each logical row over a partition of the column addressable memory, wherein the partition includes a plurality of dies; and storing each logical row in response to rotating the subset of logical columns.

Example 37 includes a system, comprising: a processor; a column addressable memory; and circuitry coupled to the memory, wherein the circuitry is to: storing the data clusters of the logical matrix in a column-addressable memory in a column-based format; and reading a logical column of the data cluster from the column addressable memory in a column read operation, wherein the column read operation is based on the block address and the column number.

Example 38 includes the subject matter of example 37, and wherein the circuitry is in a data storage device.

Example 39 includes the subject matter of any one of examples 37 and 38, and wherein the circuitry is in a memory device.

Example 40 includes the subject matter of any one of examples 37-39, and wherein reading the logical column comprises: reading a plurality of complementary logical columns of a data cluster; and assembling the logical column in response to reading the plurality of complementary logical columns.

Example 41 includes the subject matter of any one of examples 37-40, and wherein reading the logical column comprises: selecting a first partition of the column addressable memory according to the column number, wherein the column addressable memory comprises a plurality of partitions, wherein each partition comprises a plurality of dies of the column addressable memory, and wherein each die comprises a predetermined number of columns; determining a base address based on the column number and a modulo limit of the column addressable memory; and reading the logical column data diagonally from the column addressable memory starting from the base address of the first partition.

Example 42 includes the subject matter of any one of examples 37-41, and wherein reading the logical column comprises: determining a base address based on the column number and a modulo limit of the column addressable memory, wherein the column addressable memory comprises a plurality of dies, wherein each die comprises a predetermined number of columns, and wherein the modulo limit comprises the predetermined number of columns; and reading the logical column data diagonally from the column addressable memory starting from the base address, subject to the modulo limit.

Example 43 includes the subject matter of any one of examples 37-42, and wherein storing the data cluster comprises: rotating each subset of logical columns within each logical row, wherein each subset of logical columns includes a predetermined number of columns; in response to rotating each subset, rotating a subset of the logical columns of each logical row over a partition of the column addressable memory, wherein the partition includes a plurality of dies; and storing each logical row in response to rotation of the subset of logical columns.
