AN2203 Freescale Semiconductor / Motorola, AN2203 Datasheet - Page 37

AN2203

Manufacturer Part Number

AN2203

Description

MPC7450 RISC Microprocessor Family Software Optimization Guide

Manufacturer

Freescale Semiconductor / Motorola

Datasheet

1.AN2203.pdf (76 pages)


3.7.3 Store Gathering and Merging

The MPC7450 implements two techniques to improve store performance by coalescing adjacent entries in the CSQ. Store gathering refers to coalescing adjacent cache-inhibited or write-through stores; store merging refers to coalescing adjacent cacheable writeback stores. Note that these two techniques are used only when the bottom CSQ entry is processing a cache miss or sending a store request to the memory subsystem. In such a situation, the bottom entry itself is not eligible for any coalescing operations, but all other CSQ entries are examined.

The throughput of cache-inhibited or write-through stores is usually limited by the system address bus bandwidth. With store gathering enabled (HID0[SGE] = 1), cache-inhibited or write-through stores may be combined into larger transactions. If the bottom entry of the CSQ is processing a cacheable store miss or sending a store request on to the memory subsystem, the processor examines the remaining CSQ entries for store gathering. Any set of adjacent entries in the CSQ is gathered into one transaction if the entries are aligned, are the same size, target the same or adjacent addresses, are either cache-inhibited or write-through, and produce an aligned result. When the MPC7450 is on a system bus supporting the MPX protocol, this gathering may continue up to a 32-byte store request. On a 60x bus, the MPC7450 does not gather beyond a 64-bit transaction. Under ideal conditions, a stream of write-through or cache-inhibited stores to sequential addresses reduces store transactions on the system bus by a factor of four. Note that cache-inhibited guarded stores are never gathered.
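The gathering conditions above can be sketched as a small software model. This is only an illustration under stated assumptions, not a description of the actual hardware: the names `Store` and `count_transactions` are hypothetical, every store is assumed to be cache-inhibited or write-through, only ascending adjacent addresses are considered, and the "result is aligned" condition is approximated by requiring the gathered range to stay inside one naturally aligned block of `max_bytes`.

```python
from dataclasses import dataclass


@dataclass
class Store:
    addr: int              # byte address of the store
    size: int              # access size in bytes (1, 2, 4, or 8)
    guarded: bool = False  # cache-inhibited guarded stores are never gathered


def count_transactions(stores, max_bytes):
    """Count bus transactions after gathering neighbouring queue entries.

    max_bytes is the gather limit: 32 for an MPX bus, 8 (64 bits) for 60x.
    Assumption: a gathered range must stay inside one aligned max_bytes block.
    """
    transactions = 0
    i = 0
    while i < len(stores):
        first = stores[i]
        end = first.addr + first.size
        block = first.addr // max_bytes
        j = i + 1
        # A guarded or misaligned first entry goes out on its own.
        can_gather = not first.guarded and first.addr % first.size == 0
        while can_gather and j < len(stores):
            nxt = stores[j]
            ok = (
                not nxt.guarded
                and nxt.size == first.size           # same size
                and nxt.addr % nxt.size == 0         # aligned
                and nxt.addr == end                  # adjacent (ascending only)
                and (nxt.addr + nxt.size - 1) // max_bytes == block
            )
            if not ok:
                break
            end = nxt.addr + nxt.size
            j += 1
        transactions += 1
        i = j
    return transactions


dwords = [Store(addr, 8) for addr in range(0, 32, 8)]
mpx = count_transactions(dwords, max_bytes=32)
bus_60x = count_transactions(dwords, max_bytes=8)
```

In this model, four sequential doubleword stores collapse into a single transaction under the 32-byte MPX limit, while the 64-bit 60x limit leaves all four transactions in place.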

The throughput of cacheable stores that miss in the L1 is limited by the latency to the L2 or L3 caches and the memory latency. When store gathering is enabled (HID0[SGE] = 1), cacheable writeback stores may also be combined. If the bottom entry of the CSQ is processing a cacheable store miss or sending a store request to the memory subsystem, any other adjacent entries in the CSQ are merged into one transaction if they are both to the same 32-byte granule, are both cacheable and writeback, and are waiting to access the L1 or have already missed in the L1 cache. For store merging, the size and alignment restrictions are relaxed, because cacheable stores are always performed by writing bytes to the L1 (on an L1 data cache hit) or merging bytes with reload data (on an L1 data cache miss).
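The same-granule condition for store merging can be illustrated with a short model. It is a sketch only: the helper names `can_merge` and `byte_mask` are hypothetical, both stores are assumed to be cacheable writeback stores that do not cross a 32-byte boundary, and the byte mask is simply one way to picture "writing bytes" into a granule.

```python
GRANULE_BYTES = 32  # merging granule size stated in the text


def can_merge(store_a, store_b):
    """Return True if two (addr, size) stores fall in the same 32-byte granule.

    Assumption: both stores are cacheable writeback stores and neither one
    crosses a granule boundary. Size and alignment are deliberately ignored,
    because merging relaxes those restrictions.
    """
    addr_a, _ = store_a
    addr_b, _ = store_b
    return addr_a // GRANULE_BYTES == addr_b // GRANULE_BYTES


def byte_mask(addr, size):
    """Bit i of the result is set if byte i of the granule is written."""
    offset = addr % GRANULE_BYTES
    return ((1 << size) - 1) << offset


# A byte store and a word store of different sizes can still merge.
stb = (0x1003, 1)
stw = (0x1008, 4)
mergeable = can_merge(stb, stw)
combined = byte_mask(*stb) | byte_mask(*stw)
```

Here the byte store and the word store differ in size and are not adjacent, yet they are candidates for merging because both land in the granule that starts at 0x1000; a store at 0x1020 would belong to the next granule and would not merge with them.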

3.7.4 Load/Store Interaction

When loads and stores are intermixed, the stores normally lose arbitration to the cache. A store that repeatedly loses arbitration can stay in the CSQ much longer than four cycles, which is not normally a performance problem because a store in the CSQ is effectively part of the architecture-defined state. However, in some cases (for example, if the CSQ fills up or if a store causes a pipeline stall, as in a partial store-to-load address alias), the arbiter gives higher priority to the store, guaranteeing forward progress.

Also, accesses to the data cache are pipelined (two stages) such that back-to-back loads and back-to-back stores are fully pipelined (single-cycle throughput). However, a store followed by a load cannot be performed in consecutive clock cycles. Loads have higher priority than stores, and the LSU store queues stage store operations until a cache cycle is available. When the LSU store queues become full, stores take priority over subsequent loads.
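The priority rules described in this section can be summarized as a small decision function. This is a hedged sketch, not the actual arbiter logic: the function name `arbitrate`, the `store_blocks_load` flag (standing in for a store that is stalling the pipeline, such as a partial alias), and the `queue_capacity` parameter are all assumptions introduced for illustration, and pipeline timing is not modeled.

```python
def arbitrate(load_pending, stores_queued, queue_capacity, store_blocks_load=False):
    """Pick which access gets the data cache this cycle.

    Returns "load", "store", or "idle". queue_capacity is a hypothetical
    parameter; the text does not state the queue depth here.
    """
    # A full store queue, or a store that is stalling the pipeline,
    # is promoted so that forward progress is guaranteed.
    if stores_queued > 0 and (stores_queued >= queue_capacity or store_blocks_load):
        return "store"
    # Otherwise loads win arbitration over waiting stores.
    if load_pending:
        return "load"
    # Stores drain whenever a cache cycle is free.
    if stores_queued > 0:
        return "store"
    return "idle"


normal = arbitrate(load_pending=True, stores_queued=2, queue_capacity=5)
queue_full = arbitrate(load_pending=True, stores_queued=5, queue_capacity=5)
```

With a load pending and room left in the queue the load is chosen, whereas a full queue hands the cycle to the store.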

From an architectural perspective, when a load address aliases to a store address, the load needs to read the store data rather than the data in the cache. A store can forward only after acquiring its data, which means forwarding happens only from the CSQ. Additionally, the load address and size must be contained within the store address and size for store forwarding to occur. If the alias is only a partial alias (for example, an stb followed by an lwz), the load stalls. Table 3-23 shows a forwardable load/store alias, where the load stalls in E1 for three cycles until the store arrives in CSQ0 and can forward its data.
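The containment rule for store forwarding can be expressed as a simple range check. The sketch below is illustrative only: `classify_alias` is a hypothetical helper, it compares exact byte ranges (the hardware may compare addresses at a coarser granularity), and it does not model the cycles a forwardable load waits for the store to reach the CSQ.

```python
def classify_alias(store_addr, store_size, load_addr, load_size):
    """Classify a load against an older store by byte range.

    Returns "no_alias" if the ranges do not overlap, "forward" if the load
    is fully contained in the store, and "stall" for a partial alias.
    """
    store_end = store_addr + store_size
    load_end = load_addr + load_size
    if load_end <= store_addr or store_end <= load_addr:
        return "no_alias"
    if store_addr <= load_addr and load_end <= store_end:
        return "forward"
    return "stall"


# stw then lbz inside the stored word: the store data can be forwarded.
contained = classify_alias(0x100, 4, 0x102, 1)
# stb then lwz covering the stored byte: partial alias, the load stalls.
partial = classify_alias(0x100, 1, 0x100, 4)
```

A practical consequence is that writing data with a narrow store and reading it back shortly afterward with a wider load falls into the partial-alias case, while the reverse ordering of sizes remains forwardable.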


