ADWPCIXRSKU20 Intel, ADWPCIXRSKU20 Datasheet - Page 38

Functional Architecture
Intel® Server Board SE7320VP2
initialization step, but also frees the processor to pursue other machine initialization and
configuration tasks.
Additional features have been added to the initialization engine to support high-speed
population and verification of a programmable memory range with one of four known data
patterns (0/F, A/5, 3/C, and 6/9). This function enables a limited, very high-speed memory test
and provides a BIOS-accessible memory zeroing capability for use by the operating
system.
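The fill-and-verify behavior described above can be sketched in software as follows. This is only an illustrative model, not the MCH's register interface: the function names are hypothetical, the byte-level reading of each pattern pair (for example, "A/5" as alternating 0xAA and 0x55 bytes) is an assumption, and zeroing is modeled as a plain zero fill.

```python
from typing import List

# Assumption: each pattern "X/Y" alternates a byte of nibble X with a byte of
# its complement Y. The source names the four patterns but not their layout.
PATTERNS = {
    "0/F": (0x00, 0xFF),
    "A/5": (0xAA, 0x55),
    "3/C": (0x33, 0xCC),
    "6/9": (0x66, 0x99),
}


def expected_byte(pattern: str, offset: int) -> int:
    """Byte expected at a given offset from the start of the filled range."""
    even, odd = PATTERNS[pattern]
    return even if offset % 2 == 0 else odd


def fill_range(memory: bytearray, start: int, length: int, pattern: str) -> None:
    """Populate memory[start:start+length] with the chosen pattern."""
    for offset in range(length):
        memory[start + offset] = expected_byte(pattern, offset)


def verify_range(memory: bytearray, start: int, length: int, pattern: str) -> List[int]:
    """Return the addresses in the range whose contents do not match the pattern."""
    return [
        start + offset
        for offset in range(length)
        if memory[start + offset] != expected_byte(pattern, offset)
    ]


def zero_range(memory: bytearray, start: int, length: int) -> None:
    """Model of the memory zeroing capability (assumed to be a plain zero fill)."""
    memory[start:start + length] = bytes(length)
```

A caller would fill a range, verify it, and treat any returned address as a failed location; an empty list means the range passed this limited test.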
3.3.6.5 DIMM Sparing Function
To provide a more fault tolerant system, the Intel E7320 MCH includes specialized hardware to
support fail-over to a spare DIMM device in case a primary DIMM exceeds a specified threshold
of runtime errors. One of the DIMMs installed per channel, equal to or larger than every other
installed DIMM, is not used but is kept in reserve. If a significant failure occurs in a particular
DIMM, that DIMM and its corresponding partner in the other channel (if applicable) will, over
time, have their data copied to the spare DIMM(s). When all data has been copied, the reserve
DIMM(s) will be put into service and the failing DIMM will be removed from service. Only one
sparing cycle is supported. If this feature is not enabled, then all DIMMs will be visible in normal
address space.
Note: The DIMM Sparing feature requires that the spare DIMM be at least the size of the largest
primary DIMM in use.
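The sizing rule in the note can be illustrated with a small planning helper. This is a sketch under stated assumptions: the function name and the list-of-sizes input are hypothetical, and the tie-break (reserving the last slot that holds the largest size) is an arbitrary choice for illustration, since the source does not say which slot is reserved.

```python
from typing import List, Optional, Tuple


def plan_sparing(sizes_mb: List[int], sparing_enabled: bool) -> Tuple[Optional[int], int]:
    """Return (spare_slot, visible_capacity_mb) for the DIMMs in one channel.

    sizes_mb[i] is the size of the DIMM in slot i (hypothetical representation).
    """
    if not sizes_mb:
        raise ValueError("no DIMMs installed")
    if not sparing_enabled:
        # Feature disabled: every DIMM is visible in normal address space.
        return None, sum(sizes_mb)
    if len(sizes_mb) < 2:
        raise ValueError("sparing needs at least one primary and one spare DIMM")

    # The spare must be at least as large as the largest primary DIMM, so the
    # only valid candidates are DIMMs of the maximum installed size.
    largest = max(sizes_mb)
    # Assumption: on a tie, reserve the highest-numbered slot of that size.
    spare_slot = max(i for i, size in enumerate(sizes_mb) if size == largest)
    return spare_slot, sum(sizes_mb) - largest
```

For example, with 512 MB, 1024 MB, and 1024 MB DIMMs and sparing enabled, one 1024 MB DIMM is held back and 1536 MB remains visible.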
Hardware additions for this feature include the implementation of a tracking register per DIMM to
maintain a history of error occurrence, and a programmable register to hold the fail-over error
threshold level. The operational model is straightforward: if the fail-over threshold register is set
to a non-zero value, the feature is enabled, and if the count of errors on any DIMM exceeds that
value, fail-over will commence. The tracking registers themselves are implemented as “leaky
buckets,” such that they do not contain an absolute cumulative count of all errors since power-
on; rather, they contain an aggregate count of the number of errors received over a running time
period. The “drip rate” of the bucket is selectable by software, so it is possible to set the
threshold to a value that will never be reached by a “healthy” memory subsystem experiencing
the rate of errors expected for the size and type of memory devices in use.
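A leaky-bucket error counter of the kind described above might be modeled as below. This is a minimal software sketch, not the register definition: the class and parameter names are hypothetical, and the drip rate is assumed to remove one count per fixed interval of abstract time units.

```python
class LeakyBucketCounter:
    """Per-DIMM error tracker that forgets old errors at a fixed drip rate."""

    def __init__(self, threshold: int, drip_interval: int) -> None:
        self.threshold = threshold          # 0 disables the feature (per the text)
        self.drip_interval = drip_interval  # assumed: time units per one-count leak
        self.count = 0
        self.last_drip = 0

    def _leak(self, now: int) -> None:
        """Remove one count for every full drip interval that has elapsed."""
        drips = (now - self.last_drip) // self.drip_interval
        if drips > 0:
            self.count = max(0, self.count - drips)
            self.last_drip += drips * self.drip_interval

    def record_error(self, now: int) -> bool:
        """Log one error at time `now`; return True if fail-over should commence."""
        self._leak(now)
        self.count += 1
        return self.threshold != 0 and self.count > self.threshold
```

With a threshold of 3 and a drip interval of 10, errors arriving every 15 time units never accumulate, while four errors within a few time units push the count past the threshold.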
The fail-over mechanism is slightly more complex. Once fail-over has been initiated, the MCH
must execute every write twice: once to the primary DIMM, and once to the spare. The MCH will
also begin tracking the progress of its built-in memory scrub engine. Once the scrub engine has
covered every location in the primary DIMM, the duplicate write function will have copied every
data location to the spare. At that point, the MCH can switch the spare into primary use, and
take the failing DIMM off-line.
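The interaction between duplicate writes and the scrub engine can be simulated in a few lines. The model below is a sketch only: the class and method names are hypothetical, memory is a plain Python list, and it assumes that each scrub step writes the location back, so that the write-back is itself duplicated to the spare.

```python
from typing import List, Optional


class SparingCopy:
    """Toy model of copying a failing primary DIMM to a spare during fail-over."""

    def __init__(self, primary: List[int]) -> None:
        self.primary = primary
        self.spare: List[Optional[int]] = [None] * len(primary)
        self.copying = False   # True while duplicate writes are active
        self.swapped = False   # True once the spare has replaced the primary
        self.scrub_ptr = 0     # next location the scrub engine will visit

    def start_failover(self) -> None:
        self.copying = True
        self.scrub_ptr = 0

    def write(self, addr: int, value: int) -> None:
        if self.swapped:
            self.spare[addr] = value       # primary is off-line
            return
        self.primary[addr] = value
        if self.copying:
            self.spare[addr] = value       # duplicate write to the spare

    def read(self, addr: int) -> Optional[int]:
        return self.spare[addr] if self.swapped else self.primary[addr]

    def scrub_step(self) -> None:
        """Scrub one location; the write-back is duplicated (assumption)."""
        if not self.copying:
            return
        self.write(self.scrub_ptr, self.primary[self.scrub_ptr])
        self.scrub_ptr += 1
        if self.scrub_ptr == len(self.primary):
            # Every location has been covered: switch the spare into service.
            self.copying = False
            self.swapped = True
```

Because ordinary writes issued during the copy also land on the spare, a location updated after the scrub pointer has passed it stays consistent without a second pass.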
Once it has been programmed and enabled, this mechanism requires no software support until
threshold detection triggers a request for a data copy. Hardware will detect the
threshold initiating fail-over and escalate the occurrence of that event as directed (signal an
SMI, generate an interrupt, or wait to be discovered via polling). A software routine responding
to the threshold detection must select a victim DIMM (if multiple DIMMs have crossed the
threshold prior to sparing invocation) and initiate the memory copy. Hardware will automatically
isolate the “failed” DIMM after the copy has completed. The data copy is accomplished by
address aliasing within the DDR control interface, thus it does not require reprogramming of the
Revision 2.1
Intel order number C91056-002