Method and apparatus for self-trimming memory devices

Document No.: 474723. Publication date: 2021-12-31.

Summary: Method and apparatus for self-trimming memory devices, by A. Troia and A. Mondello, dated 2019-05-31. Abstract: The present disclosure relates to an integrated memory device comprising: an array of memory cells with decoding and sensing circuitry; a memory controller; read and write circuitry associated with the sensing circuitry; a logic circuit portion in the read and write circuitry comprising at least one logic element receiving a data stream on a data input and a clock signal on a clock input; and at least one programmable or trimmable delay element or circuit upstream of said data input or said clock input for self-trimming the internal timing of said at least one logic element by temporally aligning said clock signal and/or said data stream. The disclosure further relates to a method for setting operating parameters of an integrated circuit, in particular for self-trimming the internal timing of said integrated circuit.

1. A method for setting an operating parameter of an integrated circuit, in particular for self-trimming an internal timing of the integrated circuit, the integrated circuit comprising a circuit portion receiving a data stream on a data input and a clock signal on a clock input, the method comprising:

-aligning in time said clock signal and/or said data stream by inserting upstream programmable or trimmable delay elements or circuits before one or both of said inputs.

2. The method of claim 1, wherein the temporally aligning comprises re-establishing the operating conditions that the integrated circuit had when new or factory-fresh.

3. The method of claim 1, wherein the temporally aligning comprises inserting a programmable or trimmable delay element or circuit upstream of the clock input to modify a relative distance between the data stream and a valid or leading edge of the clock signal.

4. The method of claim 1, wherein the temporally aligning comprises inserting a programmable or trimmable delay element or circuit upstream of the data input to modify a relative distance between the data stream and a valid or leading edge of the clock signal.

5. The method of claim 1, wherein the self-trimming of internal timing of the integrated circuit is performed automatically.

6. The method of claim 1, wherein the temporally aligning comprises adjusting the programmable trimmable delay element or circuit to reset a timing difference between a sampling clock signal and a sampled data signal.

7. The method of claim 1, wherein the temporally aligning comprises adjusting a setup time interval of the circuit portion.

8. The method of claim 1, wherein the temporally aligning comprises adjusting a hold time interval of the circuit portion.

9. The method of claim 1, wherein the temporally aligning comprises adjusting the programmable or trimmable delay element or circuit by a configuration signal.

10. The method of claim 1, wherein the temporally aligning comprises adjusting the programmable or trimmable delay elements or circuits through a delay chain.

11. A method for setting operating parameters of an integrated circuit, in particular for self-trimming internal timing of the integrated circuit, the integrated circuit comprising at least one circuit portion receiving at least a data stream on a data input and a clock signal on a clock input, the method comprising:

-performing a tuning phase of the setup time and/or hold time by changing the relative temporal distance between the data stream received by the data input and the active edge of the clock signal by inserting a programmable or trimmable delay element or circuit.

12. The method of claim 11, wherein the performing a tuning phase comprises adjusting the programmable or trimmable delay element or circuit upstream of the clock input.

13. The method of claim 11, wherein the performing a tuning phase comprises adjusting the programmable or trimmable delay element or circuit upstream of the data input.

14. The method of claim 11, wherein the tuning phase is performed automatically.

15. The method of claim 11, wherein the tuning phase comprises re-establishing the operating conditions that the integrated circuit had when new or factory-fresh.

16. The method of claim 11, comprising adjusting the programmable or trimmable delay element or circuit by a configuration signal.

17. The method of claim 11, wherein the programmable or trimmable delay element or circuit is implemented by a delay chain.

18. An integrated memory device, comprising:

-an array of memory cells with decoding and sensing circuitry;

-a memory controller;

-read and write circuitry associated with the sensing circuitry;

-a logic circuit portion in the read and write circuitry comprising at least one logic element receiving a data stream on a data input and a clock signal on a clock input;

-at least one programmable or trimmable delay element or circuit upstream of said data input or said clock input for self-trimming the internal timing of said at least one logic element by temporally aligning said clock signal and/or said data stream.

19. The memory device of claim 18, wherein the programmable or trimmable delay element or circuit is inserted upstream of a clock signal path relative to the clock input.

20. The memory device of claim 18, wherein the programmable or trimmable delay element or circuit is inserted upstream of a data flow path relative to the data input.

21. The memory device of claim 18, wherein the programmable or trimmable delay element or circuit is configured to automatically implement self-trimming of the internal timing.

22. The memory device of claim 18, wherein the programmable or trimmable delay element or circuit is configured to reset a timing difference between a sampling clock signal and a sampled data signal.

23. The memory device of claim 18, wherein the programmable or trimmable delay element or circuit is configured to adjust a setup time interval of the at least one logic element.

24. The memory device of claim 18, wherein the programmable or trimmable delay element or circuit is configured to adjust a hold time interval of the at least one logic element.

25. The memory device of claim 18, wherein the programmable or trimmable delay element or circuit has an input configured to receive a configuration signal.

26. The memory device of claim 18, wherein the programmable or trimmable delay element or circuit comprises a delay chain.

27. An integrated memory device structured to communicate with a host device or a system-on-chip over a communication channel having respective pads, the integrated memory device comprising:

-an array of memory cells with decoding and sensing circuitry;

-a memory controller;

-an output buffer coupled to the array of memory cells and containing a selectable final output stage coupled to the pads;

-at least one programmable or trimmable delay element or circuit in the output buffer upstream of the selectable final output stage for selecting the output impedance of the buffer.

28. The integrated memory device of claim 27 wherein the programmable or trimmable delay element or circuit is configured to adjust a path of data from a memory array to the pad.

Technical Field

The present disclosure relates to a method for setting operating parameters of an integrated circuit. More particularly, the present disclosure relates to a method for self-tuning operating parameters and internal timing of an integrated memory device.

The present disclosure further relates to a non-volatile memory device having self-trimming capability in wide temperature range applications and wide voltage range applications.

Background

One of the major problems in the operation of integrated circuits is to guarantee functionality in all process spread, supply and temperature variations.

For example, any synchronous input of an integrated circuit has setup and hold time specifications with respect to the clock input.

The setup time S is the time interval during which data received at the synchronous input of a simple D flip-flop must remain stable before the active edge of the clock signal arrives, so that the circuit captures the data correctly. Similarly, the hold time H is the time interval during which the data received at the synchronous input of the D flip-flop must remain stable after the active edge of the clock signal arrives.

The setup and hold parameters S and H must be set appropriately for the integrated circuit to function properly.

However, setup and hold are opposing parameters, in a sense that is explained in more detail below. High temperatures generally slow the circuit and low temperatures speed it up, so both intervals shift accordingly over time.

Another parameter is the sum of the two intervals, MPW = S + H, during which the data should remain stable before and after the clock input changes. The interval MPW has a minimum value that allows normal operation, but the actual stable interval may be longer, since the signal may remain stable while waiting to be sampled and until it next changes.

In any case, violations of the setup and hold timings may cause not only a single fault but also serious failures, such as metastability of the flip-flop outputs, which leaves them in an undetermined state that cannot be recovered unless a reset or a power cycle is forced.
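By way of non-limiting illustration, the setup and hold conditions described above may be expressed as a simple check. The following Python sketch is an assumption made for clarity only: the names `TimingSpec`, `check_capture` and `min_stable_window_ps`, and the use of picoseconds as the time unit, do not appear in the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimingSpec:
    """Hypothetical container for the two intervals S and H, in picoseconds."""
    setup_ps: int  # S: data must be stable this long before the active edge
    hold_ps: int   # H: data must stay stable this long after the active edge


def check_capture(spec, clock_edge_ps, data_stable_from_ps, data_stable_until_ps):
    """Classify one capture as 'ok', 'setup_violation' or 'hold_violation'."""
    # Setup: the data must already be stable S before the active clock edge.
    if data_stable_from_ps > clock_edge_ps - spec.setup_ps:
        return "setup_violation"
    # Hold: the data must remain stable until H after the active clock edge.
    if data_stable_until_ps < clock_edge_ps + spec.hold_ps:
        return "hold_violation"
    return "ok"


def min_stable_window_ps(spec):
    """Minimum stable interval MPW = S + H."""
    return spec.setup_ps + spec.hold_ps
```

In this sketch a capture is acceptable only when the data is stable over the whole interval from S before the active edge to H after it, which is the interval MPW discussed above.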

Integrated circuits are designed and simulated in view of the above problems, but for some applications, for example in the presence of large variations in power supply, temperature and processing range, it is not possible to guarantee performance and functionality within the specified range.

In an attempt to compensate for process spread, some testing and fine-tuning is done at the factory; however, this common practice is time-consuming, as it must be performed on each individual IC, that is, per die.

Drawings

FIG. 1 is a schematic diagram of a simple bi-stable circuit receiving a data input and a clock input according to a known scheme;

FIG. 1A is a comparison diagram showing data set and hold time periods compared to a known clock signal;

FIG. 2 is a schematic perspective view of a system-on-chip device having an associated memory device according to the present disclosure;

FIG. 3 is a block diagram of an example of a logic circuit portion incorporated into an integrated circuit (i.e., a memory device) according to one embodiment of the present disclosure;

FIG. 4 is a schematic diagram of a memory block formed from a plurality of rows of a memory array according to one embodiment of the present disclosure;

FIG. 5 is a schematic diagram of a portion of an integrated memory device implemented according to one embodiment of the present disclosure;

FIG. 6 is a diagram illustrating selected points of a fine tuning process performed by an optimization algorithm used in accordance with the present disclosure;

FIG. 7 schematically illustrates an example of an output buffer incorporated into a memory device of the present disclosure and including a number of tri-state drivers;

FIG. 8 is a schematic diagram of a transmission model over a wired bus involving an output buffer of a memory device and another device in communication with the memory;

FIGS. 9 and 10 are schematic diagrams reporting the voltage versus time at two nodes A and B at opposite ends of the wired bus of FIG. 8; these graphs show the waveforms on the output buffer side and on the receiver side, with and without noise, and the corresponding eye diagrams;

FIGS. 11 and 12 show a real and a schematic representation, respectively, of an eye diagram associated with an output buffer of a memory device of the present disclosure.

Detailed Description

In the following detailed description, reference is made to the accompanying drawings which form a part hereof, and in which is shown by way of illustration specific examples. In the drawings, like numerals describe substantially similar components throughout the several views. Other embodiments may be disclosed, and structural, logical, and electrical changes may be made without departing from the scope of the present disclosure. The following detailed description is, therefore, not to be taken in a limiting sense.

The present disclosure relates to a method for setting operating parameters of an integrated circuit, and more particularly, to a method for self-trimming operating parameters and internal timing of a memory device and for assisting a memory host in fine-setting a bus driver by optimizing matching of impedance and timing.

The memory device of the present disclosure is a non-volatile memory device or component, indicated by the numeral 200 in fig. 2 and the numeral 500 in fig. 5, respectively. This memory device 200 has been implemented with a particular photolithographic process as a stand-alone die and may be coupled to a host device or system-on-chip through a communication channel. The host device may be a system on chip with embedded memory components or a more complex electronic device including a system coupled to a memory device, as will appear from the description of other embodiments of the disclosure made with reference to other figures. In any case, the system-on-chip and the memory device are implemented on respective dies obtained by different lithographic processes.

Alternatively, the system may be an external controller that communicates with the system on chip, but for purposes of this disclosure reference will be made to a host device or SoC as the entity that communicates with the memory components. For example, the system 10 may be one of a plurality of electronic devices that are capable of using memory for temporarily or persistently storing information. For example, the host device may be a computing device, a mobile phone, a tablet computer, or a central processing unit of an autonomous vehicle.

Non-volatile memory may provide persistent data by retaining stored data when not powered, and may include NAND flash memory, NOR flash memory, Read Only Memory (ROM), Electrically Erasable Programmable ROM (EEPROM), Erasable Programmable ROM (EPROM), and resistance variable memory, such as Phase Change Random Access Memory (PCRAM), self-selecting chalcogenide-based memory, Resistive Random Access Memory (RRAM), 3D XPoint memory (3DXP), and Magnetoresistive Random Access Memory (MRAM), among others.

Flash memory is a type of non-volatile memory that retains stored data and is characterized by very fast access times. Furthermore, it can be erased in blocks, rather than one byte at a time. Each erasable block of memory includes a plurality of non-volatile memory cells arranged in a matrix of rows and columns. Each cell is coupled to an access line and/or a data line. The cells are programmed and erased by manipulating the voltages on the access and data lines.

As shown in fig. 2, in accordance with the present disclosure, memory device 200 is removed from the prior art SoC structure, thus allowing corresponding semiconductor regions to be used for other logic circuits and providing support for structurally independent memory components 200 that partially overlap SoC structure 210. The memory component 200 has a size that is variable depending on the size of the memory array contained therein, which is manufactured according to the needs of the user, for example, in a range of values from 128Mbit to 512Mbit or more.

Removing the embedded memory portion of the prior art also has the great advantage of freeing space, namely the semiconductor region 220 of the SoC structure 210, which allows additional user functions to be integrated and/or the total chip area to be reduced.

The result of this solution is the new SoC structure of fig. 2, which is strictly associated with a new architecturally independent memory component 200, which is coupled to SoC structure 210, for example, by a plurality of coupling elements 230 (e.g., pillars), as well as by ball-bumping, flip-chip technology, wireless interconnects (coils), and the like. In a preferred embodiment, the coupling elements are pillars arranged in a semiconductor region 220 previously dedicated to the embedded memory portion.

In one embodiment of the present disclosure, a memory component 200 for SoC fabric 210 includes at least a memory portion and a logic circuit portion for interacting with the memory portion and SoC fabric 210, where memory component 200 is a structurally independent semiconductor device coupled to and partially overlapping system-on-chip fabric 210.

Logic circuit 240 is integrated in SoC structure 210 to cooperate with the logic circuit portion of memory assembly 200.

The coupling between SoC structure 210 and memory component 200 is accomplished by interconnecting a plurality of respective pad or pin terminals that face each other in a circuit layout that maintains pad alignment even if the size of memory component 200 is modified.

In one embodiment of the present disclosure, the arrangement of the pads of the memory component 200 has been implemented on the surface of the memory component 200. More specifically, the pads are arranged on the array such that when memory component 200 is flipped, its pads face corresponding pads of SoC structure 210. The semiconductor area 220 occupied by the embedded non-volatile memory portion in the known system-on-chip device 210 is dedicated to accommodating interconnect pads corresponding to the pads of the memory component 200.

Even larger memory components may also be supported and interconnected with the pads of SoC structure 210, while preserving the location and alignment of their interconnect pads.

In the context of the present disclosure, the top side of SoC structure 210 is connected to the opposite side of memory component 200, and the pads of SoC structure 210 are aligned with the mating pads of the flipped memory component. Alternatively, architecturally independent memory components 200 may be wirelessly coupled to SoC architecture 210. If wireless coupling is employed, stacks of identically sized memory components may overlap, thereby implementing a stacked architecture in which each individual component is addressed by the logic circuitry of SoC architecture 210 by a corresponding identification address.

The semiconductor area 220 previously occupied by the embedded memory portion is now used to implement additional functions and prepare the semiconductor device for logic-on-pad technology. The expression "logic on pad" means that logic circuitry is provided that overlaps with some of the connection pads located inside the first or base layer represented by the complete semiconductor product (i.e., SoC structure 210).

Memory component 200 thus represents an upper layer coupled and interconnected to the underlying SoC structure 210. The memory component 200 overlaps a surface portion of the SoC structure that covers at least the semiconductor area 220, which in known solutions was occupied by the embedded memory portion. However, a memory component 200 with a larger capacity may cover an area larger than the semiconductor area 220.

In this regard, the overlapping memory component 200 is larger than the semiconductor region 220 of SoC structure 210 that is dedicated to the interconnect pads of the memory component 200.

Furthermore, to better operate SoC architecture 210, even the portion of the logic circuits typically incorporated into SoCs and containing finite state machines or RISC architectures may be removed and reorganized in association with memory component 200.

Thus, in accordance with the present disclosure, a modified finite state machine or RISC 240 has migrated into memory component 200 to support the write and erase phases executing on the larger memory component 200.

The separation and optimization of the logic circuit portions further allows for enhancing the functionality of the entire SoC structure 210, resulting in a stand-alone semiconductor memory component 200 coupled to the SoC structure 210. Thus, the stand-alone semiconductor memory component 200 includes at least a memory portion (e.g., a non-volatile memory portion) and an associated modified finite state machine 240, both of which are incorporated into a semiconductor product coupled to SoC fabric 210. In this case, the memory logic in SoC fabric 210 is the logic that handles memory interface communications.

In other words, both the non-volatile memory portion and the associated logic circuit portion are integrated in a separate semiconductor memory component 200 that is coupled and connected to SoC structure 210.

According to an embodiment of the present disclosure, the memory device 200 is of the non-volatile flash memory type, comprising at least the following components: I/O circuitry, a micro-sequencer including control and JTAG logic, and sense amplifiers.

The flash memory device 200 further includes a command user interface CUI, voltage and current reference generators, charge pumps, and decoding circuitry. The flash memory also contains an internal microcontroller for the execution of the erase and program algorithms.

It should be kept in mind that when two semiconductor integrated devices are coupled together, some problems may arise in handling I/O signals between the two devices. Furthermore, while the controller may be capable of operating with a clock signal of 1 GHz or more (sometimes referred to as high frequency), other circuit portions, such as the associated memory components, are designed to operate at a lower frequency, such as several hundred MHz (sometimes referred to as low frequency).

A typical situation of this kind occurs, for example, when the temperature rises during operation of a system associated with the memory. Memory components are subject to thermal drift and/or drift in power or voltage levels, which limits their ability to operate according to the original values set at the factory.

Another typical case is to adapt the memory device to a host device (e.g., a board in which the memory device is mounted or a SoC in which the memory device is hosted). This adaptation consists in impedance matching between the memory output buffer and the external bus and external signal skew. Such conditions may change during the lifetime of the device, as the external bus load may change (new device connection/disconnection, temperature changes, etc.).

To solve these problems, some examples of methods according to the present disclosure will be considered, namely methods to reset the setup and hold durations of the internal FSM and to adjust the impedance matching and the signal skew. The methods are to be considered as implemented in an integrated circuit such as the memory device or component 200 coupled to SoC 210.

This integrated memory device 200 includes a plurality of basic circuit portions, such as simple flip-flops, latches, or logic gates.

It is well known that a simple D flip-flop (e.g., the known circuit shown in FIG. 1) may suffer from metastability. The data signal must be stable for a predetermined period of time before the rising edge of the clock signal to allow proper data capture. This predetermined period of time is the so-called setup time S.

Furthermore, as previously mentioned, the hold time H is the time interval during which the data received at the synchronous input of the D flip-flop must remain stable after the arrival of the active edge of the clock signal.

FIG. 1 shows a schematic diagram of a D flip-flop 10, in which the main block represents a simple logic gate receiving a data input DATA and a clock input CLK. A reset input Reset is further provided to reset the flip-flop 10.

In this figure, a synchronous logic circuit having at least an input and an output is shown. Data input/output is governed by a clock signal. This circuit can be a simple latch or even a complex finite state machine; the figure serves as an example of the setup/hold trimming that any synchronous circuit may require.

Of course, the DATA input DATA is represented in the symbol of FIG. 1 by a plurality of digital values presented in parallel. In other words, the input DATA is the front end of the bus from the digital source.

Similarly, the data output OUT is shown as having a corresponding plurality of digital output values.

It should be noted that the setup time S is defined for a simple flip-flop 10 as well as for more complex finite state machines containing hundreds of flip-flops or logic gates.

The ability to readjust the operating parameters is highly desirable in order to avoid serious malfunctions due to possible metastability of the flip-flops or to critical races in the internal synchronous circuits. Most likely, the valid window for capturing data will drift due to inherent variations in the process: line resistances and capacitances may change, and the time constants change accordingly.

Furthermore, other operating parameters may need to be trimmed after the integrated circuit has been active for a long time, or because the board where the memory is hosted has been reconfigured or updated with new components that change the impedance of the bus. For example, the output impedance towards the system on chip may need to be re-trimmed by selecting one of the available drivers within the output buffer, so as to ensure impedance matching and optimal performance in terms of speed and shape of the generated voltage/current signals. While this fine-tuning phase concerns the output of the integrated circuit under examination (i.e., the memory device), fine-tuning of the setup and hold intervals concerns the internal activity of the integrated circuit.

According to some embodiments of the present disclosure, at least a programmable or trimmable delay element or circuit is employed that is inserted upstream of a clock input and/or a data input for tuning the relative distance between the data input and the active or leading edge of a clock signal.

This delay is added to the critical lines (clock, data, command, FSM, etc.) with the purpose of re-centering all signals affected by aging drift.

This trimming is done by feeding the FSM (or circuit to be trimmed) with a sequence read from the non-volatile area of the memory array, and then checking whether the output of the FSM (or circuit) matches an expected value.

The fine tuning explores all possible delay values in order to obtain the widest functional range around a specific condition (i.e. temperature).
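By way of non-limiting illustration, the sweep described above may be sketched as follows. The sketch assumes a hypothetical callback `run_pattern(code)` that applies a delay code, feeds the circuit with the stored sequence and returns the observed output, and it assumes that the centre of the widest run of passing codes is taken as the trimmed value. Neither the function names nor this centring rule are stated in the disclosure.

```python
def widest_passing_window(results):
    """Return (first, last) indices of the longest run of True values, or None."""
    best = None
    start = None
    # A trailing False acts as a sentinel that closes a run ending at the last code.
    for i, ok in enumerate(list(results) + [False]):
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if best is None or (i - start) > (best[1] - best[0] + 1):
                best = (start, i - 1)
            start = None
    return best


def self_trim(run_pattern, expected, num_delay_codes):
    """Try every delay code and return the centre of the widest passing window.

    Returns None when no delay code yields the expected output.
    """
    results = [run_pattern(code) == expected for code in range(num_delay_codes)]
    window = widest_passing_window(results)
    if window is None:
        return None
    return (window[0] + window[1]) // 2
```

Choosing the centre of the passing window, rather than the first passing code, leaves margin on both sides, which is one way of obtaining a wide functional range around the condition at which the trimming is run.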

Furthermore, the output buffers of the memory device 200 are also involved in the trim operations, and more specifically, those buffers are processed as follows:

-performing a proper selection of the output buffer driver in order to match the impedance between the memory buffer and the external bus as closely as possible; and

-performing a proper fine-tuning of the data path delay so as to have the correct skew between the external clock and the signals of the host device.

Both of the above processes are performed using special structures within the output buffer.

In this more specific case, the fine-tuning is performed by reading a standard data pattern stored in a particular virtual row of the memory block. The external controller measures the quality of the read operation by using the eye diagram and then provides feedback to the device about the quality of the reading. Once the optimal settings are found, the process is complete.
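By way of non-limiting illustration, the feedback loop between the memory device and the external controller may be sketched as an exhaustive search over driver selections and data path delay codes. The callback `read_and_score(driver, delay)` is hypothetical: it stands for reading the stored pattern with a given setting and returning the quality figure reported by the host (for example, an eye-opening measurement). An exhaustive search is an assumption of this sketch, not a statement of the algorithm actually used.

```python
from itertools import product


def tune_output_buffer(read_and_score, drivers, delay_codes):
    """Return ((driver, delay), score) for the best host-reported read quality.

    read_and_score is a hypothetical callback supplied by the test set-up; a
    higher score is assumed to mean a better (more open) eye.
    """
    best_setting, best_score = None, float("-inf")
    for driver, delay in product(drivers, delay_codes):
        score = read_and_score(driver, delay)
        if score > best_score:
            best_setting, best_score = (driver, delay), score
    return best_setting, best_score
```

The setting returned by such a search would then be the one retained by the device until the next trimming request.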

In other words, the clock and data lines are realigned and the operating conditions that the device had when new (factory-fresh) are re-established. According to some embodiments, the setting of the programmable delay is changed to reset a timing difference between the sampling clock signal and the sampled data signal.

According to an embodiment of the present disclosure, a method for setting operating parameters of an integrated circuit, in particular for self-trimming an internal timing of the integrated circuit, the integrated circuit comprising at least one circuit portion receiving at least a known data stream on a data input and a clock signal on a clock input, is disclosed, the method comprising:

the clock signal and/or the data stream are aligned in time by inserting an upstream programmable or trimmable delay element or circuit before one or both of the inputs.

Furthermore, the temporal alignment phase comprises re-establishing the operating conditions that the integrated circuit had when new or factory-fresh.

Temporal alignment is obtained by inserting a programmable or trimmable delay element or circuit upstream of the clock input to modify the relative distance between the data stream and the valid or leading edge of the clock signal.

The programmable or trimmable delay element or circuit is inserted upstream of a clock signal path relative to the clock input.

Alternatively, temporal alignment is obtained by inserting a programmable or trimmable delay element or circuit upstream of the data input to modify the relative distance between the data stream and the active or leading edge of the clock signal.

In the last case, a programmable or trimmable delay element or circuit is inserted upstream in the data flow path with respect to the data input. The above process may be performed at least for the first time in the factory when the device is first used, and this may be done by using a test machine. The found value is stored in a non-volatile register for future use.

In yet another embodiment, the temporal alignment of the clock signal and/or the data stream is obtained by programmable or trimmable delays upstream of both the data input and the clock input.

Another alternative is to set all delays at the factory according to instructions from the design team and leave the fine-tuning to the host.

Alternatively, the tuning phase is performed automatically, for example, upon reset of the integrated circuit.

Alternatively, the above-described process is performed at the request of a host device, in other words, at a request issued by the system-on-chip associated with the integrated circuit (i.e., the memory device).

As another alternative, the above-described process is activated upon receipt of an alert signal generated by the host device, for example after a malfunction or a read failure of the memory device has been detected, or if the host device detects a large change in the operating conditions of the components (e.g., a temperature increase or decrease).

Other warning messages may be defined as the starting point for activating the re-tuning phase of the operating parameters of the integrated circuit.
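By way of non-limiting illustration, the alternative starting points listed above may be summarised as a small decision routine. The enumeration `TrimTrigger`, the function names and the 20 degree temperature threshold are assumptions introduced for this sketch only. The disclosure does not quantify what constitutes a large change in operating conditions.

```python
from enum import Enum, auto


class TrimTrigger(Enum):
    """Hypothetical labels for the events that may start a trimming phase."""
    FACTORY_TEST = auto()  # first trimming on a test machine
    RESET = auto()         # automatic trimming upon reset
    HOST_REQUEST = auto()  # explicit request from the host / SoC
    HOST_ALERT = auto()    # alert after a failure or a large condition change


def host_should_alert(read_failed, temp_delta_c, temp_threshold_c=20.0):
    """Host-side rule: alert on a read failure or a large temperature change.

    The 20 degree default is an arbitrary illustrative threshold.
    """
    return bool(read_failed) or abs(temp_delta_c) >= temp_threshold_c


def next_trigger(at_factory=False, reset=False, host_request=False,
                 read_failed=False, temp_delta_c=0.0):
    """Return the event that starts a trimming phase, or None if none applies."""
    if at_factory:
        return TrimTrigger.FACTORY_TEST
    if reset:
        return TrimTrigger.RESET
    if host_request:
        return TrimTrigger.HOST_REQUEST
    if host_should_alert(read_failed, temp_delta_c):
        return TrimTrigger.HOST_ALERT
    return None
```

The order in which the conditions are tested is also an assumption. In practice the alternatives may be used individually or in combination.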

For example, where fine-tuning of the output impedance and/or the buffer delay is required, a specific event is provided that generates a request to re-trim the operating parameters. For example, because of load variations on the I/O pins, temperature variations, and other possible changes in operating conditions caused by a reconfiguration of the host board, it is proposed to trim the buffer driver (e.g., by selecting one of several available drivers) and the data path delay in order to have the correct skew between the external signals and the clock.

In one embodiment of the present disclosure, the setup or hold time may be tuned by inserting trimmable delay elements or circuits that change the relative distance between the data received on the data input and the active edge of the clock, as shown in fig. 3. The figure shows an illustrative generic logic circuit portion 150 comprising at least one flip-flop 10, or a latch, or even a complex FSM comprising at least one flip-flop, receiving a data stream and a clock signal on its inputs.

Usually a delay is inserted into the CLK path, but in some critical cases the delay is also used on the data path, especially if the input bus consists of signals from very different sources.

For instance, in one example, a first delay is inserted into a first data stream and a second delay is inserted into a second data stream; the respective outputs of the delay blocks are combined and applied to the data input of the component to be trimmed.

Each delay block is programmable or trimmable by a configuration command.

According to an embodiment of the present disclosure, such as the one disclosed with reference to fig. 3, a delay chain 190 is inserted in the clock signal path upstream of the clock input CLK.

The delay chain 190 is programmable or trimmable by a configuration signal CLK Delay config.

Furthermore, at least one further delay chain 170 is inserted in the data path upstream of the data input DATA.

More specifically, a first delay chain 170 is inserted between a first digital data source and the data input DATA, e.g., for a first group of 8 bits [0:7], while a second delay chain 180 is inserted between a second digital data source and the data input DATA, e.g., for a second group of 8 bits [8:15].

The respective outputs of the first and second data streams are combined into a single data input at the digital DATA terminal of the flip-flop or latch 10.

Each delay chain is programmable or trimmable by a respective configuration signal (Data Delay config #1 and Data Delay config #2).
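The effect of the delay chains of fig. 3 on the setup and hold margins can be illustrated with a simple timing model. The following is only a sketch under stated assumptions: the function names, the picosecond units, the assumption that data stays valid for exactly one clock period, and all numeric values are hypothetical and are not taken from the disclosure.

```python
def margins(data_arrival_ps, clk_arrival_ps, period_ps, t_setup_ps, t_hold_ps,
            data_delay_ps=0, clk_delay_ps=0):
    """Return (setup_margin, hold_margin) in ps for one data group.

    Arrival times are measured at the inputs of the delay chains.
    Assumption: the data stays valid for one full clock period.
    """
    data_valid_from = data_arrival_ps + data_delay_ps
    data_valid_until = data_valid_from + period_ps
    edge = clk_arrival_ps + clk_delay_ps          # active clock edge at the flip-flop
    setup_margin = (edge - data_valid_from) - t_setup_ps
    hold_margin = (data_valid_until - edge) - t_hold_ps
    return setup_margin, hold_margin


def best_clk_delay(groups, clk_arrival_ps, period_ps, t_setup_ps, t_hold_ps,
                   step_ps, n_codes):
    """Pick the clock delay code that maximizes the worst margin.

    groups: list of (data_arrival_ps, data_delay_ps), one entry per data group
            (e.g., one for bits [0:7] and one for bits [8:15]).
    Returns (code, worst_margin_ps).
    """
    best_code, best_worst = None, None
    for code in range(n_codes):
        worst = min(
            min(margins(arrival, clk_arrival_ps, period_ps, t_setup_ps, t_hold_ps,
                        data_delay_ps=delay, clk_delay_ps=code * step_ps))
            for arrival, delay in groups
        )
        if best_worst is None or worst > best_worst:
            best_code, best_worst = code, worst
    return best_code, best_worst


# Hypothetical example: two data groups arriving 100 ps apart, 1 ns clock period.
print(best_clk_delay([(0, 0), (100, 0)], clk_arrival_ps=0, period_ps=1000,
                     t_setup_ps=50, t_hold_ps=50, step_ps=50, n_codes=16))
```

In this toy model the clock edge ends up centered between the latest data arrival and the earliest data change, which is the alignment the trimmable delays are meant to achieve.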

In one embodiment of the present disclosure, the memory array is implemented as a set of sub-arrays 120 (shown in FIGS. 4 and 5).

Each sub-array 120 is independently addressable within the memory device 200. Each sub-array 120 contains a plurality of memory blocks 160, as shown in FIG. 4.

In this way, by having smaller blocks or sectors 160 than in known solutions, the access time is significantly reduced and the overall throughput of the memory component is improved. The reduction of the initial latency is obtained at block level, because the row and column lines, the latency of the read path, and the external communication have been optimized.

In the embodiments disclosed herein, the memory array is made up of a plurality of sub-arrays 120 corresponding to the number of cores of the associated SoC and thus the number of corresponding communication channels. For example, at least four memory sub-arrays 120 are provided, one for each communication channel with a corresponding SoC core.

A host device or system-on-chip typically includes more than one core, and each core is coupled to a corresponding bus or channel for receiving and transferring data to the memory device 200.

Thus, in this embodiment, each sub-array 120 has access to a corresponding channel to communicate with a corresponding core of the system-on-chip. The output of the memory blocks is driven directly to the SoC without using high-power output buffers and with an optimized path.

The advantage of this architecture is that it is highly scalable: expanding and/or reducing the density of the final device translates only into mirroring the sub-arrays and generating the connections, or into increasing the number of blocks per sub-array (i.e., the density available per core).

Further, each memory array is composed of at least four memory sub-arrays 120. This is the form factor of the device, although it may differ in other technologies and/or applications. As mentioned, it contributes to a low initial latency, matching the final word processed by the SoC in that particular application.

In one embodiment of the present disclosure, the output of a sub-array 120 is formed by combining the following sequence: data cells plus address cells plus ECC cells. In the embodiments disclosed herein, the total number of bits may involve 168 pads per channel.

Further, as schematically shown in fig. 4, each memory sub-array 120 is organized in memory blocks 160. In an embodiment of the present disclosure, each independently addressable location of a block 160 of each memory sub-array 120 addresses an extended page 130; a pair of extended pages will be referred to hereinafter as a superpage.

In other words, the 128-bit atomic page used in each sub-array 120 to fill the communication channel with the SoC device has been enlarged to also contain the stored address and the ECC.

As a non-limiting example, this extended page 130 includes a string that contains a first set of at least one hundred twenty-eight (128) bits plus a second set of at least twenty-four (24) address bits and a last or third set of at least sixteen (16) ECC bits for I/O data exchanges with the SoC device. Twenty-four (24) address bits are sufficient to address up to 2 gigabits of available memory space.

According to the present disclosure, the output of the sense amplifiers SA prepares two extended pages at a time, i.e., a superpage comprising a number of bits given by twice the combination of the above three groups of data bits, address bits, and ECC bits, depending on the size of the memory array.

In the specific but non-limiting example disclosed herein, each extended page 130 contains at least 168 bits, obtained by combining the above three groups of 128 + 24 + 16 data, address, and ECC bits, and each superpage is formed by a pair of extended pages, i.e., a group of 168 x 2 bits.

For purposes of giving a non-limiting numerical example only, each row 125 of memory block 160 includes sixteen extended pages. The resulting row thus contains 2688 bits derived from a combination of sixteen extended pages that are independently addressable, and each contains 168 bits, or in other words a combination of eight superpages.
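The figures quoted in the numerical example can be checked with a few lines of arithmetic. The sketch below uses only the bit counts given in the text; the variable names are illustrative, and the capacity figure assumes that each 24-bit address selects one 128-bit data page.

```python
DATA_BITS = 128
ADDRESS_BITS = 24
ECC_BITS = 16
PAGES_PER_ROW = 16

extended_page_bits = DATA_BITS + ADDRESS_BITS + ECC_BITS    # 168 bits
superpage_bits = 2 * extended_page_bits                      # 336 bits
row_bits = PAGES_PER_ROW * extended_page_bits                # 2688 bits
superpages_per_row = PAGES_PER_ROW // 2                      # 8 superpages

# Assumption: one address per 128-bit data page.
addressable_data_bits = (2 ** ADDRESS_BITS) * DATA_BITS      # 2**31 bits, i.e. 2 Gibit

print(extended_page_bits, superpage_bits, row_bits, superpages_per_row)
print(addressable_data_bits == 2 * 2 ** 30)
```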

The combined string of data cells + address cells + ECC cells makes it possible to implement full safety coverage of the bus as required by the standard, since the ECC covers the whole bus communication (data cells + address cells), while the presence of the address cells gives confidence that the data actually comes from the location addressed by the controller.
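A minimal sketch of how such a 168-bit string could be assembled and checked on the receiving side is given below. It is not the encoding of the disclosure: the field order (data in the most significant bits), the function names, and above all the use of a 16-bit CRC (Python's `binascii.crc_hqx`) as a stand-in for the unspecified 16-bit ECC are assumptions made for illustration. A CRC only detects errors, whereas a real ECC may also correct them.

```python
import binascii

DATA_BITS, ADDRESS_BITS, CHECK_BITS = 128, 24, 16


def check_code(data: int, address: int) -> int:
    """16-bit check value computed over data + address (CRC used as an ECC stand-in)."""
    payload = data.to_bytes(DATA_BITS // 8, "big") + address.to_bytes(ADDRESS_BITS // 8, "big")
    return binascii.crc_hqx(payload, 0)


def pack_extended_page(data: int, address: int) -> int:
    """Build the 168-bit word: data | address | check bits (assumed field order)."""
    if not 0 <= data < (1 << DATA_BITS):
        raise ValueError("data does not fit in 128 bits")
    if not 0 <= address < (1 << ADDRESS_BITS):
        raise ValueError("address does not fit in 24 bits")
    code = check_code(data, address)
    return (data << (ADDRESS_BITS + CHECK_BITS)) | (address << CHECK_BITS) | code


def unpack_and_verify(word: int, requested_address: int) -> int:
    """Return the data if the check bits and the echoed address are both consistent."""
    code = word & ((1 << CHECK_BITS) - 1)
    address = (word >> CHECK_BITS) & ((1 << ADDRESS_BITS) - 1)
    data = word >> (ADDRESS_BITS + CHECK_BITS)
    if check_code(data, address) != code:
        raise ValueError("check bits do not match data + address")
    if address != requested_address:
        raise ValueError("data does not come from the requested address")
    return data
```

The two checks mirror the two guarantees named in the text: the check bits cover both data and address, and the echoed address lets the host confirm which location the data came from.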

Thus, each row 125 includes at least 16 pages that include a memory word plus a corresponding address bit and a corresponding ECC bit. Clearly, another size may be selected and the values reported are for illustrative non-limiting example purposes only.

For the sake of completeness, it should be noted that in accordance with one embodiment of the present disclosure illustrated in FIG. 4, a dummy row or line 300 is associated with each block 160 of the memory sub-array 120.

This dummy row 300 is located outside the address space of the memory array and is used for the optimization of the trim parameters. The dummy row is inserted to monitor cell drift and, at the same time, to store the optimal settings for the different operating corners.

The primary purpose of this dummy row 300 is to track temperature, voltage, and process variations. In this way, the parameters for performing read and write operations within the array can be set according to the optimal settings stored in the dummy row.

Dummy row 300 contains a pattern that is known to the memory controller of memory device 200.

Indeed, a comparison between the expected data and the contents of dummy row 300 may provide information about the changes to be made to the trim parameters.

In this way it is possible to optimize the read operations that occur at different temperatures, voltages or process values.

To provide this possibility, different values of the trim parameters are recorded in programmable or trimmable registers for each temperature or voltage level. In other words, different read voltage values are recorded in these registers so that the read phase can be performed under different temperature or voltage conditions by varying the trim parameters.

In doing so, the known reference pattern can be read and compared at different temperature or voltage values. The known reference pattern is stored in the internal microcontroller of the memory. The controller knows the address of this known reference pattern and can adjust the internal read trimming until the optimal corner is found. At this point, the controller will use the information contained in the row, or in another location of the memory, that holds the "best" possible trimming for the read and write phases.

For a better understanding of the present disclosure, it is assumed, merely as a specific example, that a known value such as 0x55 is recorded in hexadecimal form in virtual row 300. This value is particularly suitable because it contains the same number of "0" and "1" logical values.

Since the value is known a priori, the system will perform a few read cycles, changing the trim parameters until the value is read correctly. The trim parameters that yield a correct read will correspond to the temperature or voltage setting recorded in the programmable or trimmable registers.
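The read-and-retry loop described above can be sketched as follows. This is an illustrative sketch only: `find_read_trim`, the `read_dummy_row` callback, and the integer trim codes are hypothetical names standing in for the hardware access, which the disclosure does not specify at this level.

```python
GOLDEN_PATTERN = 0x55  # known value taken from the example in the text


def find_read_trim(read_dummy_row, trim_codes, expected=GOLDEN_PATTERN):
    """Try the trim codes in order and return the first one that reads `expected`.

    read_dummy_row(code) is assumed to apply the trim code and return the byte
    read back from the dummy row. Returns None if no code reads correctly.
    """
    for code in trim_codes:
        if read_dummy_row(code) == expected:
            return code
    return None


# 0x55 is 0b01010101: four '1' bits and four '0' bits.
print(bin(GOLDEN_PATTERN).count("1"))
```

A caller would pass a function that drives the actual read path; in a simulation it can be replaced by any function that returns the pattern only for a range of codes.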

The read phase of other portions or sectors of sub-array 120 may be performed only if the read trim parameters completely correspond to a correct read of known values.

In other words, using a known string of data stored in a particular virtual row 300 of array block 120 is useful for quickly comparing the current read to the reference read phase.

It is not actually necessary to store the above known value under predetermined operating conditions, since both the pattern and its address are known to the internal controller. Thus, the controller can scan the trim parameters until the pattern is read with the best possible margin. At this point, the best possible corner has been found and the internal controller uses these settings for the other operations.

The controller needs to perform additional checks only when some data reads start showing a high ECC error level even with the best available read trim. This event may indicate that the operating conditions have changed and that a re-trimming phase is needed.

Knowing this string makes it possible to set up a calibration mode for the setup and hold time intervals and for the other parameters that must be periodically trimmed.

The specific information stored in the dummy rows 300 is read during the calibration phase with very relaxed timing, so that the read is not affected by possible setup and hold time problems.

Upon certain events (e.g., changes in temperature or in the reference voltage Vdd) or at the request of the user, the known pattern stored in the dummy row 300 is read and the internal calibration algorithm starts to trim the configuration for optimal performance and to avoid any metastability.

Optionally, the integrated circuit may be able to store the parameters obtained at each trim in the programmable or trimmable registers described above (e.g., in a look-up table), so that the parameters can be reused in the future when the same or similar conditions occur.

Thus, the controller of the integrated memory device may check whether a similar environmental condition has occurred before calculating the new trim. If a previous condition is detected by a positive result of a comparison between the data retrieved from the programmable or trimmable registers and the data stored in the reference virtual row 300, this indication can be used as a starting point for finding a new fine trim.
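One possible way of reusing previously stored trims as a starting point is sketched below. The table layout (a dictionary keyed by temperature and supply voltage), the tolerance values, and the function name are assumptions introduced for illustration; the disclosure only states that parameters may be kept in registers or in a look-up table.

```python
def nearest_stored_trim(table, temp_c, vdd_mv, max_temp_diff=10, max_vdd_diff=50):
    """Return the stored trim whose recorded condition is closest to the current one.

    table: dict mapping (temp_c, vdd_mv) -> trim code (hypothetical layout).
    Only entries within both tolerances count as a "similar condition".
    Returns None when nothing similar has been recorded.
    """
    best = None
    for (t, v), trim in table.items():
        dt, dv = abs(t - temp_c), abs(v - vdd_mv)
        if dt <= max_temp_diff and dv <= max_vdd_diff:
            score = dt / max_temp_diff + dv / max_vdd_diff  # normalized distance
            if best is None or score < best[0]:
                best = (score, trim)
    return None if best is None else best[1]


# Hypothetical stored conditions and trims.
stored = {(25, 1800): 7, (85, 1800): 9, (-40, 1700): 4}
print(nearest_stored_trim(stored, 80, 1790))
```

When the function returns None, the search would start from scratch; otherwise the returned code can seed the search for the new trim.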

Alternatively, the retrieved information may itself be used as the trim, without a subsequent fine-tuning phase, for example when an urgent operation is ongoing and no computing time is available.

In any case, detecting the necessity of a readjustment operation allows to speed up and optimize the tuning process.

Referring now more particularly to the example of FIG. 5, a trimmer block 155 has been provided within the memory device 200 as the core of the trimming method.

This block 155 receives at its inputs at least a value representative of the operating temperature and information about the level of the reference voltage Vdd. These signals may come from the outside world, that is, from the system-on-chip or from an external device.

Further, one input may be an external calibration request.

The trimmer block 155 is also configured to internally calculate the internal temperature and Vdd information using internal detectors.

The digital output of the trimmer block 155 is represented by a trim bus leading to a plurality of internal circuits indicated by the general block 150. This block is indicated as being inserted between the output of the sense amplifier and the output buffer of the memory array. This block 150 provides the clock and data to perform the trim operations. When this trim operation is performed, the I/O is turned off to isolate the memory during this calibration sequence. At the same time, a block 175, called a "probe", performs an appropriate comparison between the data from the array and the data from the trim block.

The schematic of fig. 5 shows the read and write circuitry at the output of the sense amplifiers SA, represented by a logic circuit 150 whose setup and hold time intervals must be periodically trimmed. The fine delay trimming is typically held in another block that contains the global configuration of the device.

The data contained in this block 150 are the trim values for the sense amplifiers and for the analog circuits that must work properly for the memory device to be used.

These parameters may be block-dependent, in that the stored data (containing the previously disclosed virtual pattern 0x55) may be written at any corner.

It is also contemplated that the erase phase of array block 120 will delete the dummy row 300 and its contents. A save operation should always be performed before the erase phase, so that the optimal settings are stored elsewhere in the memory and can be restored later.

Since the internal memory controller is aware of both location and content, the golden pattern (i.e., the stored 0x55 value) does not need to be saved. A second important consideration is that the erase of this block can occur at any corner, which means that the specific optimal parameters must be restored and the read must then be performed with the optimal value found using the algorithm described above.

The example of fig. 3 refers to a delay configuration that is typically stored in the factory in a special configuration block internal to the device.

It is important that the known content of the dummy row 300 is read correctly, so that any change in this content under the actual environmental conditions indicates the trim adjustments needed to bring the parameters back into a condition in which the setup and hold intervals are restored. As previously described, the fine delay adjustment is written into a configuration block of the entire memory and stored once in the factory. Clearly, the method can also be applied here; however, once the eye diagram has been analyzed and the best point has been found, this configuration block is never erased, and the best setting can be shared with the regular array blocks.

As previously described, the fine delay adjustment is stored in a configuration block, such as block 160. Each of memory subarrays 120 (or block 160, depending on the implementation) may include a virtual row 300 to store golden patterns and analog/digital calibration parameters. Thus, the trim bus at the output of the trim block 155 is active on the data and clock inputs of the logic circuit to adjust and configure the delay block disclosed with reference to FIG. 3 and to allow self-trimming of memory components intended as integrated circuits requiring periodic trimming.

Typically, the memory array and the trimmer block must be isolated from the outside world to ensure that the trimming is not affected by noise from the I/O lines. In this regard, upon receiving a user request, the trimmer block starts operating by providing a specific output command to the isolator (HiZ) block 165.

First, the trimmer 155 isolates the device from the rest of the PCB by activating the isolator 165, for example by forcing the output buffer into a high-impedance state through the HiZ block 165.

Second, a check of the 0x55 value allows the best trim to be found. Once the trim parameters are found, the look-up table is addressed to load all the appropriate settings. The appropriate settings may also include the data for performing the write and erase phases, selected so that these phases take place with the best parameters found during the 0x55 probe-and-compare.

A trim based on 0x55 will always point to a valid setting in the look-up table. This is because these settings are selected in the factory and must cover the entire operating range of the memory.

If nothing is available (or such a feature is not present), fine tuning is started from the beginning and the optimization algorithm is activated.

In this regard, the optimization algorithm operates according to the following process:

-reading the special information row (even several times) to build an eye diagram, and then selecting the best setting for the working point at which the memory is operating;

-using the golden pattern contained in the trimmer block 155. The trimmer block holds the golden pattern because the pattern must always be rewritten when the sub-array, the sub-array block, and/or the group of sub-arrays is erased.

The writing phase of the golden pattern is performed by the flash memory controller at the end of the erase operation. Basically, the erase ends with writing the golden pattern and restoring the best parameters from the local storage area and/or copying them from another similar block with a dummy row.

The trimmer block requires a read operation on the specific dummy row, but with standard timing (or with the best settings known for the particular temperature/Vdd condition). The standard timing is typically the default low-speed, safe set of parameters. In any case, the pattern is known to the controller and to the trimmer block, so the operation can be performed even at the high memory array speeds required for direct memory access by the SoC 210. The memory logic will perform several reads of the stored golden pattern, changing the read parameters to find the best ones.

If no timing is available, the trimmer uses the best parameters available from the design process. The algorithm for finding the best opening of the eye diagram shown in fig. 6 is then carried out; that is, the read phase is performed step by step using all possible configurations, and each read is compared with the golden pattern to see how far it is from it. At the end of the operation, the eye diagram is checked and the best setting is selected.
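The exhaustive sweep described above can be sketched as follows. This is an illustrative model, not the algorithm of fig. 6 itself: the configurations are assumed to form a single ordered axis, "how far a read is from the golden pattern" is measured here as a count of mismatching bits, and the best setting is taken as the center of the widest error-free run. All names are hypothetical.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def sweep_eye(read_with_config, configs, golden=0x55, reads_per_config=4):
    """Read the golden pattern several times per configuration.

    Returns {config: worst-case number of mismatching bits over the repeated reads}.
    """
    return {
        c: max(hamming(read_with_config(c), golden) for _ in range(reads_per_config))
        for c in configs
    }


def center_of_widest_opening(errors, configs):
    """Return the configuration at the center of the longest error-free run, or None."""
    best_start, best_len = None, 0
    run_start, run_len = None, 0
    for i, c in enumerate(configs):
        if errors[c] == 0:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    if best_len == 0:
        return None
    return configs[best_start + (best_len - 1) // 2]
```

Choosing the center rather than the first passing configuration leaves margin on both sides, which is the practical meaning of a wide-open eye.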

FIG. 7 shows an example of an output buffer 700 incorporated in a memory device of the present disclosure and including a number of tri-state drivers 710 as final stage. The output of the stage 710 is applied to the output pad 750, where the output signal BUFFER#N_OUT is available.

Each driver 710 may be individually selected by using a driver selector 720. Once the selector 720 is enabled by setting the select signal Drive selector config #N, the selection is provided by the input signals Data_sel[7:0].

The buffer input signal BUFFER#N_IN is received by an input stage 740 that is coupled to the series of drivers 710 through a pre-driver stage 730.

A delay block 760 is coupled between the output of the input stage 740 and the input of the pre-driver stage 730. Similarly to the example disclosed with reference to fig. 3, the delay block 760 receives as inputs a first signal Data Delay config #N and a second signal Data[7:0].

The trimming of the output buffer 700 allows the correct output impedance to be selected. The trimming of the delay block 760 allows the DATA_PATH, i.e., the data path from the memory array to the output pad, to be finely adjusted so as to optimize the setup and hold timing with respect to the SoC clock.

In this regard, fig. 8 shows a schematic diagram of a simple model of transmission over a wired bus. The pad 750 shown in fig. 7 should be considered as coupled to a transmitter TX, which may be regarded as one end A of a communication channel such as the bus line 800. Thus, the flash array output buffer communicates with the host device or SoC over line 800, and at the opposite end of this line there is a node B representing the receiver RX of another device connected to the bus.

Figs. 9 and 10 show schematic diagrams of voltage versus time at the two nodes A and B, respectively, at the opposite ends of the wired bus of fig. 8.

It will be appreciated that the voltage Va at node A has the form of a square wave, while the resulting transmitted voltage Vb is a sinusoidal wave.

On one side of each of figs. 9 and 10 an eye diagram is shown, which is a method of representing and analyzing a high-speed digital signal. The eye diagram allows the key parameters of the electrical quality of the signal to be quickly visualized and determined. A data eye is constructed from the digital waveform by folding the portion of the waveform corresponding to each individual bit into a single graph having signal amplitude on the vertical axis and time on the horizontal axis.

Thus, figs. 9 and 10 depict the waveforms and the corresponding eye diagrams on the output buffer side and on the receiver side, without and with noise.

As can be appreciated from these representations, fig. 9 represents an ideal eye diagram, while fig. 10 represents a true (or near true) eye diagram containing distortion (e.g., load-related distortion) and noise.

As is evident from these figures, there is an optimum point for sampling the output signal of the output buffer, and this point is essentially the central part or box of the eye, which defines the area where sampling is acceptable.

Of course, the goal of trimming the delay or the impedance during the tuning phase is to have the eye well open, which means that the sampling area is relatively wide.

The method disclosed above, referred to as calibration and/or training in the DRAM and PCI fields, proceeds as follows:

1. The read value is compared with the expected value.

a. If they match, the first stage (initial point) ends → see point 610 in the diagram of FIG. 6.

b. If some of the outputs do not match, the configuration is changed until they match (so the comparison of step 1 is iterated several times).

2. Once the initial point has been found, each configuration parameter is varied to find its widest functional interval (point 620 in the diagram of FIG. 6).
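The two stages listed above can be sketched for a single configuration parameter as follows. The sketch is an assumption-laden illustration: the configuration is modeled as an integer code, `passes(code)` is a hypothetical callback that returns True when the read value equals the expected value, and the optional `start` code stands for a first guess, for example one taken from a look-up table.

```python
def two_stage_calibration(passes, n_configs, start=0):
    """Two-stage search over configuration codes 0..n_configs-1.

    Stage 1 (initial point, cf. point 610): step through the codes, beginning at
    `start` and wrapping around, until one reads correctly.
    Stage 2 (widest interval, cf. point 620): widen the interval around the
    initial point in both directions while reads keep passing.
    Returns (initial, low, high, chosen) or None if no code passes.
    """
    order = [(start + k) % n_configs for k in range(n_configs)]
    initial = next((c for c in order if passes(c)), None)
    if initial is None:
        return None
    low = high = initial
    while low - 1 >= 0 and passes(low - 1):
        low -= 1
    while high + 1 < n_configs and passes(high + 1):
        high += 1
    return initial, low, high, (low + high) // 2


# Toy pass/fail function: codes 6..12 read the expected value.
print(two_stage_calibration(lambda c: 6 <= c <= 12, 16, start=9))
```

Compared with sweeping every configuration, this variant only reads around the first working point, at the cost of missing a wider window located elsewhere.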

An optimization methodology that reduces the time needed to find the best point may choose a suitable value in the look-up table according to one of the following:

-random search (bit by bit);

-binary search (bit by bit);

-gradient descent.

Such strategies are useful because searching word by word over Data[N-1:0] leads to a solution space of 2^N words.

The same method can be used for any other parameter within the device depending on Vdd/temperature/process or external conditions of the integrated circuit. This may also be useful in view of process aging and when re-tuning of internal parameters is required.

This method is substantially the same as the one disclosed above, based on the golden pattern stored in the internal flash controller, and is used as a way to find the deviation of a parameter from a known value.

For example, the method may also be used to match PCB impedances, in which case the trim selects a different buffer Zout, and the feedback for the trim is a reflected signal.

Referring now to recalibration due to, for example, thermal drift, it is not important to detect the actual temperature at which the read phase is performed. This temperature may be higher (even much higher) or lower than the temperature at which the programming phase was performed. The problem is that a change in temperature generates a drift and an enlargement of the internal level distributions. Such drift and enlargement can generate errors because the distributions overlap. This effect is more severe in an aged device than in a new one, because in an aged device the distributions are already enlarged at the outset.

With the method of the present disclosure, the system is automatically protected against any thermal drift, since the trim parameters are selected only after the known sequence stored in the dummy row 300 has been read correctly and after the trim parameters contained in the programmable or trimmable registers, used to read the same known value at different temperatures or with different reference voltage values, have been identified accordingly.

The process that identifies the most appropriate read trim parameters for a correct re-trimming phase at a certain temperature need not be repeated frequently. The process is performed when required by the external SoC and/or when a high ECC level is detected during a read operation.

Alternatively, this process may be performed periodically or, more appropriately, when a possible problem is detected through the ECC bits.

For example, a situation may occur in which an increasing number of ECC bits (the array ECC may differ from the safety bits discussed previously) report an excessive number of wrong reads from the memory device. In this case, the system may automatically start the process for detecting a possible thermal drift and the consequent need to change the trim parameters.

In some cases, the process may be performed in response to a change in external conditions (e.g., by detecting a temperature or supply voltage change with a corresponding sensor in the vehicle).
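The triggers discussed above (a host request, a rising number of ECC events, or a change in external conditions) can be combined in a small decision helper. The sketch below is hypothetical throughout: the class name, the sliding-window size, and the thresholds are illustrative assumptions, since the disclosure does not quantify them.

```python
from collections import deque


class RetrimMonitor:
    """Decide when a re-trimming phase should be started (illustrative policy)."""

    def __init__(self, window=100, max_corrected=8, max_temp_step_c=20.0):
        self.events = deque(maxlen=window)   # 1 = read needed ECC intervention
        self.max_corrected = max_corrected
        self.max_temp_step_c = max_temp_step_c
        self.temp_at_last_trim = None

    def record_trim(self, temp_c):
        """Call after a successful trim: remember the temperature, reset ECC history."""
        self.temp_at_last_trim = temp_c
        self.events.clear()

    def record_read(self, ecc_event):
        """Call after each read, flagging whether the ECC reported an error."""
        self.events.append(1 if ecc_event else 0)

    def needs_retrim(self, temp_c=None, host_request=False):
        if host_request:
            return True
        if sum(self.events) > self.max_corrected:
            return True
        if (temp_c is not None and self.temp_at_last_trim is not None
                and abs(temp_c - self.temp_at_last_trim) > self.max_temp_step_c):
            return True
        return False
```

Resetting the ECC history after each trim keeps errors observed under the old settings from immediately triggering another re-trim.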

The architecture and method of the present disclosure have several distinct advantages. For example, the possibility of following any thermal drift of the environment in which the memory device or the system of memory devices is embedded is provided. This is associated with the optimal setting. In addition, programmability of the system is provided because the virtual rows 300 in which known values are recorded can be deleted and reprogrammed as needed, even in accordance with environmental changes of the memory device. Since 0x55 may be retrieved from the internal flash controller as a location and pattern, the lookup table may be from another sub-array in this embodiment.

Although specific embodiments have been illustrated and described herein, it will be appreciated by those of ordinary skill in the art that an arrangement calculated to achieve the same results can be substituted for the specific embodiments shown. This disclosure is intended to cover adaptations or variations of various embodiments of the disclosure. It is to be understood that the above description has been made in an illustrative fashion, and not a restrictive one. Combinations of the above embodiments, and other embodiments not specifically described herein, will be apparent to those of skill in the art upon reviewing the above description. The scope of the various embodiments of the present disclosure includes other applications in which the above structures and methods are used. The scope of various embodiments of the disclosure should, therefore, be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.
