Processor and Memory

The purpose of this chapter is to specify the processor and memory
requirements of this architecture. The processor architecture section addresses
differences between the processors in the PA family as well as their interface
variations and features of note. The memory architecture section addresses
coherency, minimum system memory requirements, memory controller requirements,
and cache requirements.

Processor Architecture

The Processor Architecture (PA) governs software compatibility at an
instruction set and environment level. However, each processor implementation
has unique characteristics which are described in its user’s manual. To
facilitate shrink-wrapped software, this architecture places some limitations
on the variability in processor implementations. Nonetheless, evolution of the
PA and implementations creates a need for both software and hardware developers
to stay current with its progress. The following material highlights areas
deserving special attention and provides pointers to the latest
information.

Processor Architecture Compliance

The PA is defined in .

R1--1. Platforms must incorporate only processors which comply fully with .

R1--2. For the Symmetric Multiprocessor option: Multiprocessing platforms must use only processors which implement the processor identification register.

R1--3. Platforms must incorporate only processors which implement tlbie and tlbsync, and slbie and slbia for 64-bit implementations.

R1--4. Except where specifically noted otherwise in , platforms must support all functions specified by the PA.

Hardware and Software Implementation Note: The PA and this
architecture view tlbia
as an optional performance enhancement. Processors need not
implement tlbia. Software that needs to purge the TLB should provide a sequence
of instructions that is functionally equivalent to tlbia and use the content of
the OF device tree to choose the software implementation or the hardware
instruction. See for details.
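The following is a minimal sketch, not part of this architecture, of one conventional software equivalent to tlbia on a processor that does not implement the instruction. The "tlb-sets" property name and the of_get_cpu_prop_u32() helper are illustrative assumptions; an actual OS would take the set count and page size from the cpu node of the OF device tree.

    #include <stdint.h>

    extern uint32_t of_get_cpu_prop_u32(const char *name);   /* hypothetical OF helper */

    /* Purge the entire TLB by issuing tlbie once per congruence class. */
    static void purge_tlb(void)
    {
        uint32_t sets = of_get_cpu_prop_u32("tlb-sets");      /* assumed property */
        uint32_t i;

        for (i = 0; i < sets; i++)
            /* tlbie invalidates the congruence class selected by the effective
             * address; stepping by the 4 KB page size walks every set. */
            __asm__ volatile("tlbie %0" : : "r" (i << 12) : "memory");

        __asm__ volatile("eieio; tlbsync; sync" : : : "memory");
    }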

PA Processor Differences

A complete understanding of processor differences may be obtained by studying and the user’s
manuals for the various processors. The creators of this architecture cooperate with processor
designers to maintain a list of supported differences, to be used by the OS
instead of the processor
version number (PVN),
enabling execution on future processors. OF communicates these differences via properties of the
cpu node of the OF device tree. Examples of OF device
tree properties which support these differences include “64-bit”
and “performance-monitor”. See
for a complete listing and more details.
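As an illustration only (not defined by this architecture), an OS might key its code paths off these properties rather than the PVN; the device tree helpers below are hypothetical stand-ins for whatever client interface the OS uses.

    #include <stdbool.h>
    #include <stddef.h>

    struct device_node;                                             /* opaque DT node */
    extern struct device_node *of_find_cpu_node(void);              /* hypothetical */
    extern const void *of_get_property(struct device_node *node,
                                       const char *name, int *lenp); /* hypothetical */

    static bool cpu_is_64bit;
    static bool cpu_has_perfmon;

    /* Derive the programming model from cpu node properties; the PVN is left
     * for exceptional cases such as errata handling. */
    static void probe_cpu_features(void)
    {
        struct device_node *cpu = of_find_cpu_node();

        cpu_is_64bit    = of_get_property(cpu, "64-bit", NULL) != NULL;
        cpu_has_perfmon = of_get_property(cpu, "performance-monitor", NULL) != NULL;
    }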
R1--1. The OS must use the properties of the cpu node of the OF device tree to determine the programming model of the processor implementation.

R1--2. The OS must provide an execution path which uses the properties of the cpu node of the OF device tree. The PVN is available to the platform-aware OS for exceptional cases such as performance optimization and errata handling.

R1--3. The OS must support the 64-bit page table formats defined by .

R1--4. Processors which exhibit the “64-bit” property of the cpu node of the OF device tree must also implement the “bridge architecture,” an option in .

R1--5. Platforms must restrict their choice of processors to those whose programming models may be described by the properties defined for the cpu node of the OF device tree in .

R1--6. Platform firmware must initialize the second and third pages above Base correctly for the processor in the platform prior to giving control to the OS.

R1--7. OS and application software must not alter the state of the second and third pages above Base.

R1--8. Platforms must implement the “ibm,platform-hardware-notification” property (see ) and include all PVRs that the platform may contain.

64-bit Implementations

Some 64-bit processor implementations will not support the full
virtual address allowed by . As a
result, this architecture adds a 64-bit virtual address subset to the PA and
the corresponding cpu node property “64-bit-virtual-address” to OF.

In order for an OS to make use of the increased addressability of 64-bit processor implementations:

The memory subsystem must support the addressing of memory located at or beyond 4 GB, and

Any system memory located at or beyond 4 GB must be reported via the OF device tree.

At an abstract level, the effort to support 64-bit architecture in platforms is modest. The requirements follow.

R1--1. The OS must support the 64-bit virtual address subset, but may defer support of the full 80-bit virtual address until such time as it is required.

R1--2. Firmware must report the “64-bit-virtual-address” property for processors which implement the 64-bit virtual address subset.

R1--3. RTAS must be capable of being instantiated in either 32-bit or 64-bit mode on a platform with addressable memory above 4 GB.

Software Implementation Note: A 64-bit OS need not require 64-bit client interface services in order to boot. Because of the problems that might be introduced by dynamically switching between 32-bit and 64-bit modes in OF, the configuration variable 64-bit-mode? is provided so that OF can statically configure itself to the needs of the OS.

Processor Interface Variations

Individual processor interface implementations are described in
their respective user’s manuals.

PA Features Deserving Comment

Some PA features are optional, and need not be implemented in a
platform. Usage of others may be discouraged due to their potential for poor
performance. The following sections elaborate on the disposition of these
features in regard to compliance with the PA.

Multiple Scalar Operations

The PA supports multiple scalar operations. The multiple scalar
operations are Load and Store String and Load and Store Multiple.
Due to the long-term performance disadvantage associated with multiple scalar
operations, their use by software is not recommended.

External Control Instructions (Optional)

The external control instructions
(eciwx and ecowx) are not supported
by this architecture.

cpu Node “Status” Property

See for the values of the
“status” property of the cpu
node.

Multi-Threading Processor Option

Power processors may optionally support multi-threading.

R1--1. For the Multi-threading
Processor option: The platform must supply one entry in the
ibm,ppc-interrupt-server#s property associated with the
processor for each thread that the processor supports.

Refer to for the definition of
the ibm,ppc-interrupt-server#s property.
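As a hedged illustration (the helper below is a hypothetical stand-in for the OS’s device tree accessor), the number of threads a processor supports can be derived from the length of this property, since it carries one 32-bit interrupt server number per thread:

    #include <stdint.h>

    struct device_node;
    extern const void *of_get_property(struct device_node *node,
                                       const char *name, int *lenp);  /* hypothetical */

    /* One property entry (cell) per hardware thread of the processor. */
    static int cpu_thread_count(struct device_node *cpu)
    {
        int len = 0;
        const uint32_t *servers =
            of_get_property(cpu, "ibm,ppc-interrupt-server#s", &len);

        if (servers == NULL || len < (int)sizeof(uint32_t))
            return 1;                      /* property absent: assume one thread */

        return len / (int)sizeof(uint32_t);
    }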

Memory Architecture

The Memory Architecture of an LoPAR implementation is defined by and
, which defines what platform elements
are accessed by each real (physical) system address, as well as the sections
which follow.

The PA allows implementations to incorporate such performance
enhancing features as write-back caching, non-coherent instruction caches,
pipelining, and out-of-order and speculative execution. These features
introduce the concepts of coherency (the apparent order
of storage operations to a single memory location as observed by other
processors and DMA) and consistency (the order of storage
accesses among multiple locations). In most cases, these features are
transparent to software. However, in certain circumstances, OS software
explicitly manages the order and buffering of storage operations. By
selectively eliminating ordering options, either via storage access mode bits
or the introduction of storage barrier instructions, software can force
increasingly restrictive ordering semantics upon its storage operations. Refer
to for further details.
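For illustration (a minimal sketch, not mandated by this architecture), the classic producer/consumer pattern below uses a barrier so that another processor never observes the flag before the data; lwsync is one conventional choice for ordering cacheable System Memory accesses.

    #include <stdint.h>

    volatile uint32_t shared_data;
    volatile uint32_t data_ready;

    static void producer(uint32_t value)
    {
        shared_data = value;
        __asm__ volatile("lwsync" ::: "memory");  /* order the data store before the flag store */
        data_ready = 1;
    }

    static uint32_t consumer(void)
    {
        while (data_ready == 0)
            ;                                     /* wait for the flag */
        __asm__ volatile("lwsync" ::: "memory");  /* order the flag load before the data load */
        return shared_data;
    }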

PA processor designs usually allow, under certain conditions, for caching, buffering, combining, and reordering in the platform’s memory
and I/O subsystems. The platform’s memory subsystem, system
interconnect, and processors, which cooperate through a platform implementation
specific protocol to meet the PA specified memory coherence, consistency, and
caching rules, are said to be within the platform’s coherency
domain. shows an example system.
The shaded portion is the PA coherency domain. Buses 1 through 3 lie outside
this domain. The figure shows two
I/O subsystems, each interfacing with the host system via a Host Bridge. Notice that
the domain includes portions of the Host Bridges. This symbolizes the role of
the bridge to apply PA semantics to reference streams as they enter or leave
the coherency domain, while implementing the ordering rules of the I/O bus
architecture.

Memory, other than System Memory, is not required to be coherent.
Such memory may include memory in IOAs.

Hardware Implementation Note: Components of the platform within the
coherency domain (memory controllers and in-line caches, for example)
collectively implement the PA memory model, including the ordering of
operations. Special care should be given to configurations for which multiple
paths exist between a component that accesses memory and the memory itself, if
accesses for which ordering is required are permitted to use different paths.

System Memory

System Memory normally consists of dynamic read/write random access
memory which is used for the temporary storage of programs and data being
operated on by the processor(s). A platform usually provides for the expansion
of System Memory via plug-in memory modules and/or memory boards.

R1--1. Platforms must provide at least 128 MB of System Memory. (Also see for other requirements which apply to memory within the first 32 MB of System Memory.)

R1--2. Platforms must support the expansion of System Memory to 2 GB or more.

Hardware Implementation Note: These requirements are minimum
requirements. Each OS has its own recommended configuration which may be
greater.

Software Implementation Note: System Memory will be described by
the properties of the memory node(s) of the OF
device tree.

Memory Mapped I/O (MMIO) and DMA Operations

Storage operations which cross the coherency domain boundary are
referred to as Memory Mapped I/O (MMIO) operations if they are initiated within
the coherency domain, and DMA operations
if they are initiated outside the coherency domain
and target storage within it. Accesses with targets outside the coherency
domain are assumed to be made to IOAs. These accesses are considered performed
(or complete) when they complete at the IOA’s I/O bus interface.

Bus bridges translate between bus operations on the initiator and
target buses. In some cases, there may not be a one-to-one correspondence
between initiator and target bus transactions. In these cases, the bridge
selects one or a sequence of transactions which most closely matches the
meaning of the transaction on the source bus. See also
for more details and the appropriate PCI
specifications.

For MMIO Load and Store
instructions, the software needs to set up the WIMG bits
appropriately to control Load and Store caching,
Store combining, and
speculative Load execution to I/O addresses. This
architecture does not require platform support of caching of MMIO
Load and Store instructions.
See the PA for more information.
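A minimal sketch (not taken from this architecture) of MMIO accessors for a mapping with the Caching Inhibited and Guarded WIMG bits set; the eieio barrier is one conventional way to keep a processor’s accesses to an IOA in program order:

    #include <stdint.h>

    /* Read a 32-bit IOA register through a Caching-Inhibited, Guarded mapping. */
    static inline uint32_t mmio_read32(volatile uint32_t *addr)
    {
        uint32_t val = *addr;
        __asm__ volatile("eieio" ::: "memory");   /* order against later I/O accesses */
        return val;
    }

    /* Write a 32-bit IOA register; eieio keeps Stores to the IOA in order. */
    static inline void mmio_write32(volatile uint32_t *addr, uint32_t val)
    {
        *addr = val;
        __asm__ volatile("eieio" ::: "memory");
    }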

R1--1. For MMIO Load and Store instructions, the hardware outside of the processor must not
introduce any reordering of the MMIO instructions for a processor or processor
thread which would not be allowed by the PA for the instruction stream executed
by the processor or processor thread.

Hardware Implementation Note: Requirement
may imply that hardware outside of
the processor cannot reorder MMIO instructions from the same processor or
processor thread, but this depends on the processor implementation. For
example, some processor implementations will not allow multiple
Loads to be issued when those Loads are to
Cache Inhibited and Guarded space (as are MMIO Loads) or
allow multiple Stores to be issued when those
Stores are to Cache Inhibited and Guarded space (as are MMIO
Stores). In this example, hardware external to the
processors could re-order Load instructions with respect
to other Load instructions or re-order
Store instructions with respect to other Store
instructions since they would not be from the same processor or thread.
However, hardware outside of the processor must still take care not to re-order
Loads with respect to Stores or
vice versa, unless the hardware has access to the entire instruction stream to
see explicit ordering instructions, like eieio. Hardware outside of the
processor includes, but is not limited to, buses, interconnects, bridges, and
switches, and includes hardware inside and outside of the coherency
domain.

R1--2. (Requirement Number Reserved For Compatibility)

Apart from the ordering disciplines stated in Requirements and, for PCI the ordering of MMIO
Load data return versus buffered DMA data, as defined by
Requirement , no other ordering
discipline is guaranteed by the system hardware for Load
and Store instructions performed by a processor to
locations outside the PA coherency domain. Any other ordering discipline, if
necessary, must be enforced by software via programming means.

The elements of a system outside its coherency domain are not
expected to issue explicit PA ordering operations. System hardware must
therefore take appropriate action to impose ordering disciplines on storage
accesses entering the coherency domain. In general, a strong-ordering rule is
enforced on an IOA’s accesses to the same location, and write operations
from the same source are completed in a sequentially consistent manner. The
exception to this rule is for the special protocol ordering modifiers that may
exist in certain I/O bus protocols. An example of such a protocol ordering
modifier is the PCI Relaxed Ordering bit (an optional implementation, from both the IOA and platform perspective), as indicated in the requirements below.

R1--3. Platforms must guarantee that accesses
entering the PA coherency domain that are from the same IOA and to the same
location are completed in a sequentially consistent manner, except transactions
from PCI-X and PCI Express masters may be reordered when the Relaxed Ordering
bit in the transaction is set, as specified in the
and
.

R1--4. Platforms must guarantee that multiple write operations entering
the PA coherency domain that are issued by the same IOA are completed in a
sequentially consistent manner, except transactions from PCI-X and PCI Express
masters may be reordered when the Relaxed Ordering bit in the transaction is
set, as specified in the
and
.

R1--5. Platforms must be designed to present I/O DMA writes to the coherency domain in the order required by
, except transactions from PCI-X and PCI
Express masters may be reordered when the Relaxed Ordering bit in the
transaction is set, as specified in the
and
.

Storage Ordering and I/O Interrupts

The conclusion of I/O operations is often communicated to
processors via interrupts. For example, at the end of a DMA operation that
deposits data in the System Memory, the IOA performing the operation might send
an interrupt to the processor. Arrival of the interrupt, however, may be no
guarantee that all the data has actually been deposited; some might be on its
way. The receiving program must not attempt to read the data from the memory
before ensuring that all the data has indeed been deposited. There may be
system- and I/O-subsystem-specific methods for guaranteeing this. See .
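One common method, shown here only as a hedged sketch (the register and buffer names are illustrative), is to perform an MMIO Load from the interrupting IOA before touching the buffer; under the PCI ordering rules the Load data return cannot pass the IOA’s buffered DMA writes:

    #include <stdint.h>

    extern volatile uint32_t *ioa_status_reg;   /* MMIO register on the IOA (assumed) */
    extern volatile uint8_t  *dma_buffer;       /* System Memory the IOA wrote (assumed) */

    static uint8_t first_byte_after_interrupt(void)
    {
        (void)*ioa_status_reg;                   /* MMIO Load pulls buffered DMA data ahead of it */
        __asm__ volatile("sync" ::: "memory");   /* order the Load before the following reads */
        return dma_buffer[0];
    }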

Atomic Update Model

An update of a memory location by a processor, involving a Load followed by a Store, can be
considered “atomic” if there are no intervening
Stores to that location from another processor or mechanism. The PA
provides primitives in the form of Load
And Reserve and Store
Conditional instructions which can be used to determine if the update was
indeed atomic. These primitives can be used to emulate operations such as
“atomic read-modify-write” and “atomic
fetch-and-add.” Operation of the atomic update primitives is based on
the concept of “Reservation” (see Books I and II of ),
which is supported in an LoPAR system via the coherence mechanism.
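A minimal sketch of an atomic fetch-and-add built from these primitives (lwarx/stwcx. on a word in Memory Coherence Required storage); the retry loop runs whenever the reservation is lost to an intervening store:

    #include <stdint.h>

    static uint32_t atomic_fetch_add(volatile uint32_t *addr, uint32_t delta)
    {
        uint32_t old, tmp;

        __asm__ volatile(
            "1: lwarx   %0,0,%3   \n"   /* load word and set the reservation   */
            "   add     %1,%0,%4  \n"   /* compute the new value               */
            "   stwcx.  %1,0,%3   \n"   /* store only if the reservation held  */
            "   bne-    1b        \n"   /* lost the reservation: try again     */
            : "=&r" (old), "=&r" (tmp), "+m" (*addr)
            : "r" (addr), "r" (delta)
            : "cr0", "memory");

        return old;
    }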

R1--1. Load And Reserve and Store Conditional instructions
must not be assumed to be supported for Write-Through storage.

Software Implementation Note: To emulate an
atomic read-modify-write operation, the instruction pair must access the same
storage location, and the location must have the Memory Coherence Required
attribute.

Hardware Implementation Note: The reservation
protocol is defined in Book II of the
for atomic updates to locations in the same coherency domain.

R1--2. The Load And
Reserve and Store Conditional instructions
must not be assumed to be supported for Caching-Inhibited storage.

Memory Controllers

A Memory Controller responds to the real (physical) addresses
produced by a processor or a host bridge for accesses to System Memory. It is
responsible for handling the translation from these addresses to the physical
memory modules within its configured domain of control.

R1--1. Memory controller(s) must support the accessing of System Memory as defined in .

R1--2. Memory controller(s) must be fully initialized and set to full power mode prior to the transfer of control to the OS.

R1--3. All allocations of System Memory space
among memory controllers must have been done prior to the transfer of control
to the OS.

Software Implementation Note: Memory controller(s) are described by
properties of the memory-controller node(s) of the OF device
tree.

Cache Memory

All of the PA processors include some amount of on-chip or
internal cache memory.
This architecture allows for cache memory which is external to the processor
chip, and this external
cache memory forms an extension to internal cache memory.

R1--1. If a platform implementation elects not
to cache portions of the address map in all external levels of the cache
hierarchy, the result of not doing so must be transparent to the operation of
the software, other than as a difference in performance.

R1--2. All caches must be fully
initialized and enabled, and they must have
accurate state bits prior to the transfer of control to the OS.

R1--3. If an in-line external
cache is used, it must support one reservation as
defined for the Load And Reserve and
Store Conditional instructions.

R1--4. For the Symmetric
Multiprocessor option: Platforms must implement their cache
hierarchy such that all caches at a given level in the cache hierarchy can be
flushed and disabled before any caches at the next level which may cache the
same data are flushed and disabled (that is, L1 first, then L2, and so
on).

R1--5. For the Symmetric
Multiprocessor option: If a cache implements snarfing,
then the cache must be capable of disabling the snarfing during flushing in order to implement
the RTAS stop-self function in an atomic way.

R1--6. Software must not depend on being able to
change a cache from copy-back to write-through.

Software Implementation Notes:

Each first level cache will be defined via properties of the
cpu node(s) of the OF device tree. Each higher level cache will be
defined via properties of the l2-cache node(s)
of the OF device tree. See for more details.

To ensure proper operation, cache(s) at the same level in the
cache hierarchy should be flushed and disabled before cache(s) at the next
level (that is, L1 first, then L2, and so on).
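A minimal sketch, with hypothetical helpers, of honoring this ordering when tearing down the cache hierarchy:

    struct cache;
    extern int  num_levels(void);                                        /* hypothetical */
    extern int  caches_at_level(int level, struct cache **out, int max); /* hypothetical */
    extern void flush_and_disable(struct cache *c);                      /* hypothetical */

    static void shut_down_cache_hierarchy(void)
    {
        for (int level = 1; level <= num_levels(); level++) {   /* L1 first, then L2, ... */
            struct cache *caches[8];
            int n = caches_at_level(level, caches, 8);
            for (int i = 0; i < n; i++)
                flush_and_disable(caches[i]);
        }
    }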

Memory Status Information

New OF properties are defined to support the identification of, and to contain status information on, good and bad system memory.

R1--1. Firmware must implement all of the
properties for memory modules, as specified by ,
and any other properties defined by this document which apply to memory modules.

Reserved Memory

Sections of System Memory may be reserved for usage by OS extensions, with the restrictions detailed below. Memory nodes marked with the special value of the “status” property of “reserved” are not to be used or altered by the base OS. Several different ranges of memory may be marked as “reserved”. If DLPAR of memory is to be supported and growth is expected, then an address range must be unused between these areas in order to allow growth of these areas.

Each area has its own DRC Type (starting at 0, MEM, MEM-1, MEM-2, and so on). Each area has a current and a maximum size, with the current size being the sum of the sizes of the populated DRCs for the area and the maximum being the sum of the sizes of all the DRCs for that area. The logical address space allocated is the sum of all the areas’ maximum sizes. Starting with logical real address 0, the address areas are allocated in the following order: OS, DLPAR growth space for the OS (if DLPAR is supported), reserved area (if any) followed by the DLPAR growth space for that reserved area (if DLPAR is supported), followed by the next reserved space (if any), and so on. The current memory allocation for each area is allocated contiguously from the beginning of the area.

On a boot or reboot, including hypervisor reboot, if there is any data to be preserved (that is, the “ibm,preserved-storage” property exists in the RTAS node), then the starting logical real address of each LMB is maintained through the reboot. The memory in each region can be independently increased or decreased using DLPAR memory functions, when DLPAR is supported. Changes to the current memory allocation for an area result in the addition or removal of memory at the end of the existing memory allocation.
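The following sketch (field names and the area list are illustrative assumptions, not defined by this architecture) shows the logical real address layout described above, with each area starting where the previous area’s maximum size ends:

    #include <stdint.h>
    #include <stdio.h>

    struct mem_area {
        const char *drc_type;     /* "MEM", "MEM-1", ... */
        uint64_t    current_size; /* sum of the populated DRC sizes */
        uint64_t    max_size;     /* sum of all DRC sizes, including growth space */
    };

    static void lay_out_areas(struct mem_area *areas, int count)
    {
        uint64_t base = 0;        /* start at logical real address 0 with the OS area */

        for (int i = 0; i < count; i++) {
            printf("%-6s base=0x%llx current=0x%llx max=0x%llx\n",
                   areas[i].drc_type,
                   (unsigned long long)base,
                   (unsigned long long)areas[i].current_size,
                   (unsigned long long)areas[i].max_size);
            base += areas[i].max_size;   /* the next area begins after this area's maximum */
        }
    }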

Implementation Note: If the shared memory regions are not accessed by the programs, and are just used for DMA most of the time, then the same HPFT hit rate could be achieved with a far lower ratio of HPFT entries to logical storage space.

R1--1. For the Reserved Memory option:
Memory nodes marked with the special value of the “status”
property of “reserved” must not be used or altered by the base OS.

Implementation Note: How areas get chosen to be marked as reserved is beyond the scope of this architecture.

R1--2. For the Reserved Memory option
with the LRDR option: Each unique memory area that is to be changed
independently via DLPAR must have different DRC Types (for example, MEM, MEM-1,
and so on).

Persistent Memory

Selected regions of storage (LMBs) may be optionally preserved
across client program boot cycles. See
and .