AN2203 Freescale Semiconductor / Motorola, AN2203 Datasheet - Page 37

MPC7450 RISC Microprocessor Family Software Optimization Guide
3.7.3 Store Gathering and Merging
The MPC7450 implements two techniques to improve store performance by coalescing adjacent entries in
the CSQ. Store gathering refers to coalescing adjacent cache-inhibited or write-through stores; store
merging refers to coalescing adjacent cacheable writeback stores. Note that these two techniques are used
only when the bottom CSQ entry is processing a cache miss or sending a store request to the memory
subsystem. In such a situation, the bottom entry itself is not eligible for any coalescing operations, but all
other CSQ entries are examined.
The throughput of cache-inhibited or write-through stores is usually limited by the system address bus
bandwidth. With store gathering enabled (HID0[SGE] = 1), cache-inhibited or write-through stores may be
combined into larger transactions. If the bottom entry of the CSQ is processing a cacheable store miss or
sending a store request on to the memory subsystem, the processor examines the remaining CSQ entries for
store gathering. Any set of adjacent CSQ entries is gathered into one transaction if the stores are aligned,
are the same size, target the same or adjacent addresses, are either cache-inhibited or write-through, and
produce an aligned result. When the MPC7450 is on a system bus supporting the MPX protocol, this gathering may continue
up to a 32-byte store request. On a 60x bus, the MPC7450 does not gather beyond a 64-bit transaction. Under
ideal conditions, a stream of write-through or cache-inhibited stores to sequential addresses reduces store
transactions on the system bus by a factor of four. Note that cache-inhibited guarded stores are never
gathered.
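The gathering conditions above can be sketched as a predicate over two adjacent CSQ entries. This is a minimal illustrative model, not the hardware's actual logic: the csq_entry structure, its field names, and the max_bytes parameter (32 for an MPX bus, 8 for a 60x bus) are assumptions introduced here. The source does not say whether a cache-inhibited store may gather with a write-through store, so the sketch does not model that distinction.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a CSQ entry; field names are illustrative only. */
typedef struct {
    uint32_t addr;         /* byte address of the store */
    uint32_t size;         /* access size in bytes: 1, 2, 4, or 8 */
    bool cache_inhibited;  /* store is cache-inhibited */
    bool write_through;    /* store is write-through */
    bool guarded;          /* store is to guarded memory */
} csq_entry;

static bool is_pow2(uint32_t x) { return x != 0 && (x & (x - 1)) == 0; }

/* A store is a gathering candidate if it is cache-inhibited or
 * write-through, is not cache-inhibited guarded, and is naturally aligned. */
static bool gather_candidate(const csq_entry *e) {
    if (!e->cache_inhibited && !e->write_through) return false;
    if (e->cache_inhibited && e->guarded) return false;
    return is_pow2(e->size) && (e->addr % e->size) == 0;
}

/* max_bytes is the largest gathered transaction the bus allows
 * (assumed 32 on MPX, 8 on 60x, per the text above). */
bool can_gather(const csq_entry *a, const csq_entry *b, uint32_t max_bytes) {
    if (!gather_candidate(a) || !gather_candidate(b)) return false;
    if (a->size != b->size) return false;
    if (a->addr == b->addr) return a->size <= max_bytes;  /* same address */

    uint32_t lo = a->addr < b->addr ? a->addr : b->addr;
    uint32_t hi = a->addr < b->addr ? b->addr : a->addr;
    uint32_t combined = 2 * a->size;
    if (hi - lo != a->size) return false;                 /* not adjacent */
    /* The gathered result must fit the bus limit and be aligned. */
    return combined <= max_bytes && (lo % combined) == 0;
}
```

For example, two word stores to 0x1000 and 0x1004 pass the check, while word stores to 0x1004 and 0x1008 fail because the combined doubleword would not be aligned.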
The throughput of cacheable stores that miss in the L1 is limited by the latency to the L2 or L3 caches and
the memory latency. When store gathering is enabled (HID0[SGE] = 1), cacheable writeback stores may
also be combined. If the bottom entry of the CSQ is processing a cacheable store miss or sending a store
request to the memory subsystem, any other adjacent entries in the CSQ are merged into one transaction if
they are both to the same 32-byte granule, are both cacheable and writeback, and are waiting to access the
L1 or have already missed in the L1 cache. For store merging, the size and alignment restrictions are relaxed,
because cacheable stores are always performed by writing bytes to the L1 (if the L1 data cache hits) or merging
bytes with reload data (if the L1 data cache misses).
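The merging conditions can be sketched in the same way. This is a minimal model under stated assumptions: the wb_store structure, the store_state enumeration, and the function name are hypothetical, and the sketch treats each store as lying within a single 32-byte granule. Unlike gathering, no size or alignment check appears, which reflects the relaxed restrictions described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical progress states for a store in the CSQ. */
typedef enum { WAITING_FOR_L1, MISSED_L1, OTHER_STATE } store_state;

/* Hypothetical model of a cacheable store in the CSQ. */
typedef struct {
    uint32_t addr;      /* byte address of the store */
    bool cacheable;     /* not cache-inhibited */
    bool writeback;     /* not write-through */
    store_state state;  /* progress toward the L1 data cache */
} wb_store;

static bool merge_candidate(const wb_store *s) {
    return s->cacheable && s->writeback &&
           (s->state == WAITING_FOR_L1 || s->state == MISSED_L1);
}

/* Two stores may merge if both are candidates and both fall in the
 * same 32-byte granule (address bits above the low five match). */
bool can_merge(const wb_store *a, const wb_store *b) {
    if (!merge_candidate(a) || !merge_candidate(b)) return false;
    return (a->addr >> 5) == (b->addr >> 5);
}
```

Under this model, byte stores to 0x2003 and 0x201C can merge, while stores to 0x201F and 0x2020 cannot because they fall in different granules.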
3.7.4 Load/Store Interaction
When loads and stores are intermixed, the stores normally lose arbitration for the cache to the loads. A store that
repeatedly loses arbitration can stay in the CSQ much longer than four cycles, which is not normally a
performance problem because a store in the CSQ is effectively part of the architecture-defined state.
However, in some cases, including when the CSQ fills up or when a store causes a pipeline stall (as in a partial
address alias of a store to a load), the arbiter gives higher priority to the store, guaranteeing forward progress.
Also, accesses to the data cache are pipelined (two stages) such that back-to-back loads and back-to-back
stores are fully pipelined (single-cycle throughput). However, a store followed by a load cannot be
performed in consecutive clock cycles. Loads have higher priority than stores, and the LSU store queues stage
store operations until a cache cycle is available. When the LSU store queues become full, stores take priority
over subsequent loads.
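The priority rules described in this section can be summarized as a small decision function. This is a simplified sketch, not the actual arbiter: the arb_inputs structure and its field names are hypothetical, only the two store-priority cases named in the text are modeled, and the restriction on a store followed immediately by a load is left out.

```c
#include <stdbool.h>

typedef enum { GRANT_NONE, GRANT_LOAD, GRANT_STORE } grant;

/* Hypothetical per-cycle inputs to the data cache arbiter. */
typedef struct {
    bool load_pending;           /* a load wants the cache this cycle */
    bool store_pending;          /* a queued store wants the cache */
    bool store_queue_full;       /* the store queue has no free entry */
    bool store_blocks_pipeline;  /* e.g. a load stalled on a partial alias */
} arb_inputs;

grant arbitrate(const arb_inputs *in) {
    /* A store is promoted when it must drain to guarantee forward progress. */
    if (in->store_pending &&
        (in->store_queue_full || in->store_blocks_pipeline)) {
        return GRANT_STORE;
    }
    /* Otherwise loads win, and stores use cycles that loads leave free. */
    if (in->load_pending) return GRANT_LOAD;
    if (in->store_pending) return GRANT_STORE;
    return GRANT_NONE;
}
```

The practical consequence for software is that a store can wait behind a long run of loads without harm, but once the store queue fills, subsequent loads are held up while stores drain.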
From an architectural perspective, when a load address aliases to a store address, the load must read the
store data rather than the data in the cache. A store can forward only after acquiring its data, which means
forwarding happens only from the CSQ. Additionally, the load address and size must be contained within
the store address and size for store forwarding to occur. If the alias is only partial (for example, a stb
followed by a lwz to the same address), the load stalls. Table 3-23 shows a forwardable load/store alias, where the
load stalls in E1 for three cycles until the store arrives in CSQ0 and can forward its data.
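The containment rule for forwarding can be sketched as an address-range classification. This is a minimal model with hypothetical names (alias_kind, classify_alias); it covers only the address and size comparison, not the additional requirement that the store already hold its data in the CSQ.

```c
#include <stdint.h>

typedef enum { NO_ALIAS, FORWARD, PARTIAL_ALIAS_STALL } alias_kind;

/* Classify a younger load against an older store by byte range.
 * Addresses are byte addresses; sizes are in bytes. */
alias_kind classify_alias(uint32_t st_addr, uint32_t st_size,
                          uint32_t ld_addr, uint32_t ld_size) {
    /* Use 64-bit end addresses so addr + size cannot wrap. */
    uint64_t st_end = (uint64_t)st_addr + st_size;
    uint64_t ld_end = (uint64_t)ld_addr + ld_size;

    /* Disjoint ranges: the load reads the cache as usual. */
    if (ld_end <= st_addr || st_end <= ld_addr) return NO_ALIAS;

    /* Load fully contained in the store: the store can forward. */
    if (ld_addr >= st_addr && ld_end <= st_end) return FORWARD;

    /* Overlap without containment: the load stalls. */
    return PARTIAL_ALIAS_STALL;
}
```

For instance, a stw to 0x100 followed by a lbz from 0x102 classifies as FORWARD, whereas a stb to 0x100 followed by a lwz from 0x100 classifies as PARTIAL_ALIAS_STALL. Where possible, storing with an access at least as wide as the later load avoids the partial-alias case.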