Address Map
The address map of a LoPAR platform is made up of several distinct areas. Each area is one of five basic types. Each type has its own general characteristics, such as coherency, alignment, size restrictions, variability of starting address and size, and the system action on access to the area. This chapter details some of those characteristics; other chapters define the rest. The variable characteristics of these areas are reported to the OS via properties in the OF device tree.
Address Areas
The following are the definitions of the five areas and some of their characteristics:

System Memory refers to memory which forms a coherency domain with respect to the PA processor(s) that execute application software on a system. See for details on aspects of coherence. System Memory Spaces are the one or more pieces that together form the System Memory. System Memory areas may be marked with the value “reserved” in the “status” property, which means that this memory is not for general use by the base OS but may be reserved for use by OS extensions (see ). Some System Memory areas may be preservable across boots (see ).

Peripheral Memory Space refers to a range of real addresses which are assigned to the Memory Space of a Host Bridge (HB) or System Bus attached IOA, and which are sufficient to contain all of the Load and Store address space requirements of all IOAs in the Memory Space of the I/O bus that is generated by the HB or which are encompassed by the System Bus attached IOA. The frame buffer of a graphics IOA is an example of a device which may reside in the Peripheral Memory Space. Due to space limitations in the address space below 4 GB, the HBs of platforms may split this space into two pieces: one to support IOAs that need their addresses below 4 GB (because they only support 32-bit addresses) and another to support IOAs that can have their addresses above 4 GB (because they support 64-bit addresses). In addition to a Memory Space, many types of I/O buses have a separate address space called the I/O Space. An HB which generates such I/O buses must decode another address range, the Peripheral I/O Space. A peripheral space may also include a “configuration” address space. The configuration space is abstracted by a Run-Time Abstraction Service (for example, see ).
Peripheral I/O Space refers to a range of real addresses which are assigned to the I/O Space of an HB or System Bus attached IOA, and which are sufficient to contain all of the Load and Store address space requirements of all the IOAs in the I/O Space of the I/O bus that is generated by the HB or which are encompassed by the System Bus attached IOA. A keyboard controller is an example of an IOA which may require Peripheral I/O Space addresses.

System Control Area (SCA) refers to a range of addresses which contains all reserved addresses (architected or unarchitected) which are not part of one of the other defined address spaces. Examples include the system ROM(s), unarchitected platform-dependent addresses used by firmware and Run-Time Abstraction Services for control of the platform, and architected entities such as interrupt controller addresses when those addresses are not in another defined address space.

Undefined refers to areas that are not one of the above four areas. The result of accessing one of these areas is defined in as an invalid address error.

In addition to the above definitions, it is convenient, relative to I/O operations, to define a Partitionable Endpoint. A Partitionable Endpoint (PE) is an I/O subtree that can be treated as a unit for the purposes of partitioning and error recovery. A PE may be a single or multi-function IOA, a function of a multi-function IOA, or multiple IOAs (possibly including switch and bridge structures above the multiple IOAs). See for more information about PEs.

In describing the characteristics of these various areas, it is convenient to have a nomenclature for the various boundary addresses. The Map Legend table defines the labels used in this document when describing the various address ranges. Note that “bottom” refers to the smallest address of the range and “top” refers to the largest address.

Map Legend
Label | Description
BIOn | Bottom of Peripheral I/O Space for HBn (n=0, 1, 2,...). The OF property “ranges” in the OF device tree for HBn contains the value of BIOn.
TIOn | Top of Peripheral I/O Space for HBn (n=0, 1, 2,...). TIOn is the value of BIOn plus the size of the area, both found in the OF property “ranges” in the OF device tree for HBn, minus 1. This architecture allows at most one Peripheral I/O area per HB, which may be above or below 4 GB. For any given n, BIOn to TIOn cannot span from the first 4 GB of address space to the second.
BPMn,m | Bottom of Peripheral Memory Space m (m=0, 1) for HBn (n=0, 1, 2,...), as viewed from the system side of HBn. The OF property “ranges” in the OF device tree for HBn contains the value of BPMn,m.
BPM’n,m | Bottom of Peripheral Memory Space m (m=0, 1) for HBn (n=0, 1, 2,...), as viewed from the I/O side of HBn; that is, the value to which BPMn,m is translated as it passes through the HB. The OF property “ranges” in the OF device tree for HBn contains the value of BPM’n,m. BPM’n,m may or may not equal BPMn,m.
TPMn,m | Top of Peripheral Memory Space m (m=0, 1) for HBn (n=0, 1, 2,...), as viewed from the system side of HBn. The Peripheral Memory Space address range, BPMn,m to TPMn,m, is given by the “ranges” property in the node for HBn in the OF device tree. TPMn,m is the value of BPMn,m plus the size of the area, both found in that property, minus 1. This architecture allows one or two Peripheral Memory areas per HB (hence m=0, 1). A Peripheral Memory area may be above or below 4 GB. For any given n, BPMn,m to TPMn,m cannot span from the first 4 GB of address space to the second.
TPM’n,m | Top of Peripheral Memory Space m (m=0, 1) for HBn (n=0, 1, 2,...), as viewed from the I/O side of HBn. TPM’n,m can be calculated from the values in the “ranges” property in the same way as TPMn,m. In some cases TPM’n,m is required to equal TPMn,m, and in other cases it is not. For any given n, BPM’n,m to TPM’n,m cannot span from the first 4 GB of address space to the second.
BSCAn | Bottom of System Control Area n. The corresponding top of the System Control Area is TSCAn. This architecture allows one or two SCAs per platform. The SCA below 4 GB is at the top (largest addresses) of the lower 4 GB range.
TSCAn | Top of System Control Area n. For any given n, BSCAn to TSCAn cannot span from the first 4 GB of address space to the second.
BSMn | Bottom of System Memory Space n (n=0, 1, 2,...); BSM0 = 0. The OF property “reg” in the Memory Controller’s node of the OF device tree contains the value of BSMn.
TSMn | Top of System Memory Space n (n=0, 1, 2,...). TSMn is the value of BSMn plus the size of that area, both found in the same property of the Memory Controller’s node of the OF device tree, minus 1.
BTTAn,m | Bottom of TCE Translatable Address space m (m=0, 1, 2,...) for HBn (n=0, 1, 2,...), as viewed from the I/O side of HBn. This is the bottom of an address range that is translatable by a Translation Control Entry (TCE) table. The value of BTTAn,m is obtained from the “ibm,dma-window” or “ibm,my-dma-window” property in the OF device tree.
TTTAn,m | Top of TCE Translatable Address space m (m=0, 1, 2,...) for HBn (n=0, 1, 2,...), as viewed from the I/O side of HBn. This is the top of an address range that is translatable by a TCE table. For any given n, the range BTTAn,m to TTTAn,m is not accessible by more than one PE. TTTAn,m is the value of BTTAn,m plus the size of the area, both found in the OF property “ibm,dma-window” or “ibm,my-dma-window” in the OF device tree for HBn, minus 1.
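Several legend entries derive a top address the same way (bottom plus size, minus 1), and several forbid a range from straddling the 4 GB line. The following is a minimal Python sketch of that arithmetic. It assumes the (bottom, size) pairs have already been decoded from the “ranges”, “reg”, or “ibm,dma-window” property (property parsing is not shown), and the helper names are hypothetical.

```python
FOUR_GB = 1 << 32


def top_of_range(bottom: int, size: int) -> int:
    """Top (largest) address of a range, e.g. TIOn = BIOn + size - 1."""
    if size <= 0:
        raise ValueError("size must be positive")
    return bottom + size - 1


def spans_4gb_boundary(bottom: int, top: int) -> bool:
    """True if [bottom, top] starts in the first 4 GB and ends in the second."""
    return bottom < FOUR_GB <= top
```

For instance, a hypothetical HB whose “ranges” entry gives BIO0 = 0xF8000000 and a size of 64 KB would have TIO0 = 0xF800FFFF, which stays entirely within the first 4 GB.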
The figures in the Example Address Maps section show examples of the areas referenced by the labels in the Map Legend. The OS and other software should not use fixed addresses for these various areas. A given platform may, however, make some of these addresses unchangeable. Each of these areas is defined in the OF device tree in the node of the appropriate controller. This gives platforms the most flexibility in implementing the System Address Map to meet their market requirements.

R2--1. All unavailable addresses in the Peripheral Memory and Peripheral I/O Spaces must be conveyed in the OF device tree. A “device_type” of “reserved” must be used to specify areas which are not to be used by software and are not otherwise reported by OF. Shadow aliases must be communicated as specified by the appropriate OF bus binding.

R2--2. There must not be any address generated by the system which causes the system to hang.

Hardware Implementation Note: The reason for Requirement is to reserve address space for registers used only by the firmware or addresses which are used only by the hardware.
Address Decoding (or Validating) and Translation
In general, different hardware components decode the address ranges for the various areas. In some cases a component may be required to translate the address to a new address as it passes through the component. The requirements below describe the various system address decodes (or validations) and, where appropriate, what address transforms take place outside of the processor. The HB requirements in this section refer to HBs which are defined by this architecture. Currently, the only HB defined by this architecture is the PHB. HBs which implement I/O buses other than those defined by this architecture may or may not require changes to this addressing model. The reader may want to refer to the example address maps in the Example Address Maps section while reading through the requirements of this section.
Load and Store Address Decoding and Translation
Load and Store operations may be targeted at System Memory or I/O. The latter is called Memory Mapped I/O (MMIO).

R2--1. Processor Load and Store operations must be routed and translated as shown in the following table.

Processor Bus Address Space Decoding and Translation
Address Range at Processor Bus | Route and Translation Requirements | Other Requirements and Comments
BSCAn to TSCAn (n=0, 1) | To the ROM controller or to a platform-dependent area. Translation is implementation dependent. | Areas other than ROM are reserved for firmware use, or have their addresses passed by the OF device tree.
BIOn to TIOn (n=0, 1, 2,...) | Send through HBn to the I/O Space of the I/O bus, translating by subtracting the value of BIOn from each address in this range (that is, BIOn to TIOn appears at 0 to (TIOn - BIOn) on the I/O side). | -
BPMn,m to TPMn,m (n=0, 1, 2,...) (m=0, 1) | Send through HBn to the Memory Space of the I/O bus. If BPMn,m is below 4 GB, do not translate addresses in the BPMn,m to TPMn,m range as the transaction passes through the bridge (that is, BPM’n,m = BPMn,m and TPM’n,m = TPMn,m). If BPMn,m is at or above 4 GB and BPM’n,m is to be below 4 GB (for 32-bit IOAs), translate addresses in the BPMn,m to TPMn,m range to BPM’n,m to TPM’n,m (where BPM’n,m and TPM’n,m are below 4 GB) as the transaction passes through the bridge; otherwise (for 64-bit IOAs configured at or above 4 GB), do not translate addresses in this range. | Platforms that need to support both 32-bit capable and 64-bit capable IOAs, and do not want to configure the 64-bit capable IOAs below 4 GB, need to support two Peripheral Memory Spaces per HB.
BSMm to TSMm (m>0) | To System Memory Space m, no translation. | Can be at or above 4 GB, or below BSCA0.
0 to TSM0 | To System Memory Space 0, no translation. | -
All other addresses | See . | Access is to undefined space.
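The HB rows of the processor bus table can be modeled as a lookup that returns where a processor address is sent and what address appears on the I/O side. The Python sketch below is illustrative only: the MemWindow and HostBridge classes and the route_load_store function are hypothetical, the System Memory and SCA rows are lumped together as "other", and the rule that windows below 4 GB are not translated is enforced when a window is constructed.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

FOUR_GB = 1 << 32


@dataclass(frozen=True)
class MemWindow:
    """One Peripheral Memory Space of an HB (hypothetical model)."""
    bpm: int     # BPMn,m: bottom as seen from the system side
    tpm: int     # TPMn,m: top as seen from the system side
    bpm_io: int  # BPM'n,m: bottom as seen from the I/O side

    def __post_init__(self) -> None:
        # Windows below 4 GB must pass through the HB untranslated.
        if self.bpm < FOUR_GB and self.bpm_io != self.bpm:
            raise ValueError("a Peripheral Memory Space below 4 GB must not be translated")


@dataclass(frozen=True)
class HostBridge:
    bio: int                    # BIOn
    tio: int                    # TIOn
    mem: Tuple[MemWindow, ...]  # at most two windows (m = 0, 1)


def route_load_store(
    addr: int, hbs: Sequence[HostBridge]
) -> Tuple[str, Optional[int], Optional[int]]:
    """Return (space, HB number n, I/O-side address) for a processor Load/Store."""
    for n, hb in enumerate(hbs):
        if hb.bio <= addr <= hb.tio:
            # Peripheral I/O: BIOn..TIOn appears at 0..(TIOn - BIOn) on the I/O bus.
            return ("io", n, addr - hb.bio)
        for w in hb.mem:
            if w.bpm <= addr <= w.tpm:
                # The offset within the window is preserved; BPM' may equal BPM.
                return ("mem", n, w.bpm_io + (addr - w.bpm))
    # System Memory, SCA, or undefined space: not routed to an HB.
    return ("other", None, None)
```

Treating the 64-bit-to-32-bit remap as a constant offset is the simplest reading of the table; an actual HB may implement the translation differently as long as the resulting ranges match.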
R2--2. There must be no architected address spaces (Peripheral Memory, Peripheral I/O, SCA, or System Memory) which span the (4 GB - 1) to 4 GB boundary.

R2--3. The following are the System Control Area requirements:
- The platform must have at most one System Control Area below 4 GB and at most one per platform or per NUMA node at or above 4 GB.
- The System Control Area must not overlap with the System Memory Space(s), Peripheral Memory Space(s), or the Peripheral I/O Space(s) in the platform.

R2--4. The following are the System Memory Space requirements:
- Each platform must have at least one System Memory Space.
- The System Memory Space(s) must not overlap with the Peripheral I/O Space(s), Peripheral Memory Space(s), the System Control Area, or other System Memory Space(s) in the platform.
- The first System Memory Space must start at address 0 (BSM0 = 0), must be contiguous, and must be at least 128 MB before a second System Memory Space is added.
- Each additional (optional) System Memory Space must start on a 4 KB boundary.
- Each additional (optional) System Memory Space must be contiguous within itself.
- There must be at most eight System Memory Spaces below BSCA0 and at most eight at or above 4 GB.
- If multiple System Memory Spaces exist below 4 GB, they must not have any Peripheral Memory or Peripheral I/O Spaces interspersed between them; likewise, if multiple System Memory Spaces exist above 4 GB, they must not have any Peripheral Memory or Peripheral I/O Spaces interspersed between them.

R2--5. The following are the Peripheral Memory Space requirements:
- The Peripheral Memory Space(s) must not overlap with the System Memory Space(s), Peripheral I/O Space(s), the System Control Area, or other Peripheral Memory Space(s) in the platform.
- The size of each Peripheral Memory Space (TPMn,m - BPMn,m + 1) must be a power of two for sizes up to and including 256 MB, with a minimum size of 1 MB; for sizes greater than 256 MB, it must be an integer multiple of 256 MB plus a power of two which is greater than or equal to 1 MB (for example, 1 MB, 2 MB, 4 MB, 8 MB, 16 MB, 32 MB, 64 MB, 128 MB, 256 MB, (256 + 1) MB, (256 + 2) MB,..., (512 + 1) MB,...).
- The boundary alignment for each Peripheral Memory Space must be an integer multiple of the size of the space for sizes up to and including 256 MB, and an integer multiple of 256 MB for sizes greater than 256 MB.
- There must be at most two Peripheral Memory Spaces per HB.
- If the Peripheral Memory Space for an HB is below 4 GB, then the address must not be translated as it passes through the HB from the system side to the I/O side (see ).
- If the Peripheral Memory Space for an HB is above 4 GB, then the address may or may not be translated as it passes through the HB from the system side to the I/O side, but if it is translated, the translated address range must be aligned on a boundary which is an integer multiple of the size of the Peripheral Memory Space.

Implementation Note: Relative to Requirement , not all OSs can support BPM’ to TPM’ being above 4 GB.

R2--6. The following are the Peripheral I/O Space requirements:
- The Peripheral I/O Space(s) must not overlap with the System Memory Space(s), Peripheral Memory Space(s), the System Control Area, or other Peripheral I/O Space(s) in the platform.
- The size of each Peripheral I/O Space (TIOn - BIOn + 1) must be a power of two, with a minimum size of 64 KB (that is, sizes of 64 KB, 128 KB, 256 KB, 512 KB, 1 MB, 2 MB, 4 MB, 8 MB, 16 MB, 32 MB, 64 MB, and so on, are acceptable).
- The boundary alignment for each Peripheral I/O Space must be an integer multiple of the size of the space.
- There must be at most one Peripheral I/O Space per HB.

R2--7. All System Memory must be accessible via DMA operations from all IOAs in the system, except where LPAR requirements limit the accessibility of an IOA belonging to one partition to the System Memory of another partition.

Hardware Implementation Notes:
- Memory controller and memory card designers who are designing for 64-bit platforms should consider that the amount of I/O space below 4 GB is reduced by the amount of System Memory below 4 GB. Therefore it may be prudent to design the hardware to allow the amount of System Memory below 4 GB to be minimized, in order to maximize the space for 32-bit Peripheral Memory and Peripheral I/O Spaces below 4 GB.
- The beginning addresses and sizes of the Peripheral I/O Space(s) and Peripheral Memory Space(s) are controlled by firmware. Information about the address map is reported by the OF device tree or, for items that can change, through RTAS calls (for example, for Dynamic Reconfiguration, through the ibm,configure-connector RTAS call).
- Certain System Memory addresses must be reserved in all systems for specific uses (see and for more information).
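The size and alignment rules in R2--5 and R2--6 reduce to a few bit tests. The following is a minimal Python sketch, assuming bottoms and sizes are given in bytes. The function names are hypothetical, and the overlap, count, and translation rules are not checked. For sizes above 256 MB, the remainder modulo 256 MB must be either zero (the size can then be written as (k - 1) × 256 MB + 256 MB, and 256 MB is a power of two) or a power of two of at least 1 MB.

```python
KB = 1 << 10
MB = 1 << 20
CHUNK = 256 * MB  # boundary between the two Peripheral Memory size rules


def is_power_of_two(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0


def valid_pm_size(size: int) -> bool:
    """R2--5 size rule for a Peripheral Memory Space."""
    if size <= CHUNK:
        return size >= MB and is_power_of_two(size)
    rem = size % CHUNK
    # rem == 0 means size = (k - 1) * 256 MB + 256 MB for some k >= 2.
    return rem == 0 or (rem >= MB and is_power_of_two(rem))


def peripheral_memory_ok(bottom: int, size: int) -> bool:
    """Size rule plus alignment: multiple of the size up to 256 MB, else of 256 MB."""
    align = size if size <= CHUNK else CHUNK
    # valid_pm_size() is checked first, so align is never zero here.
    return valid_pm_size(size) and bottom % align == 0


def peripheral_io_ok(bottom: int, size: int) -> bool:
    """R2--6: power of two, at least 64 KB, aligned to its own size."""
    return size >= 64 * KB and is_power_of_two(size) and bottom % size == 0
```

With these helpers, a 257 MB window at 4 GB passes (the remainder is 1 MB and 4 GB is a multiple of 256 MB), while a 2 MB window starting 1 MB past a 2 MB boundary fails the alignment test.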
DMA Address Validation and Translation
The figure “PE DMA Address Validation and Translation in the Platform” represents how the validation and translation mechanism works, along with the steps involved. At the core of the translation mechanism is the Translation and Control Entry (TCE) table.
PE DMA Address Validation and Translation in the Platform
DMA Addressing Requirements
R2--1. Upon receiving a DMA transaction to the Memory Space of an I/O bus, the HB must perform the validation and translation steps indicated in the figure “PE DMA Address Validation and Translation in the Platform” and in the following table.

DMA Address Decoding and Translation (I/O Bus Memory Space)
Address Range at I/O Side of HBn | Route and Translation Requirements | Other Requirements and Comments
BPM’n,m to TPM’n,m (n=0, 1, 2,...) (m=0, 1) (Note 1) | The HB does not respond, or responds and signals an invalid address error (see ). | -
BTTAn,m to TTTAn,m (n=0, 1, 2,...) (m=0, 1, 2,...) (Note 1) | If the PE that is trying to access this space is allowed to access it, translate via the TCE table (as specified in ) and pass the translated address through the HB; otherwise generate an invalid address or TCE extent error, as appropriate (see ). | See Notes 2 and 3.
All other addresses | Generate an invalid address error (see ). | See Note 3.

Notes:
1. n = number of the HB viewing or receiving the operation; m = number of the instance within the HB.
2. After translation, if the translated address would re-access the same HB or another HB (for example, if it is in the Peripheral Memory Space or Peripheral I/O Space of that HB or another HB), the HB generates an invalid address error (see ).
3. If the Enhanced I/O Error Handling (EEH) option is implemented and enabled, then on an error the PE enters the DMA Stopped State (see ).
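The DMA decoding table and its notes can be read as a small classifier run on each inbound address. The following Python sketch is one possible model under stated assumptions: decode_dma and in_range are hypothetical names, the TCE lookup is passed in as a callable that returns None on a TCE error, the result strings are illustrative labels rather than architected error names, an access by a PE that does not own the window is reported simply as an invalid address, and EEH state (Note 3) is not modeled.

```python
from typing import Callable, Optional, Sequence, Tuple

Range = Tuple[int, int]  # inclusive (bottom, top)


def in_range(addr: int, r: Range) -> bool:
    return r[0] <= addr <= r[1]


def decode_dma(
    addr: int,
    requester_pe: int,
    pm_io_side: Sequence[Range],                # BPM'n,m..TPM'n,m windows of this HB
    tce_windows: Sequence[Tuple[Range, int]],   # (BTTAn,m..TTTAn,m, owning PE)
    translate: Callable[[Range, int], Optional[int]],  # hypothetical TCE lookup
    hb_system_ranges: Sequence[Range],          # system-side Peripheral Mem/I/O of all HBs
) -> Tuple[str, Optional[int]]:
    """Classify a DMA address seen on the I/O side of one HB."""
    if any(in_range(addr, r) for r in pm_io_side):
        # Row 1: the HB ignores it or signals an invalid address error.
        return ("invalid address", None)
    for window, owner in tce_windows:
        if in_range(addr, window):
            if owner != requester_pe:
                return ("invalid address", None)
            real = translate(window, addr)
            if real is None:
                return ("TCE error", None)
            if any(in_range(real, r) for r in hb_system_ranges):
                # Note 2: the translated address would re-access an HB.
                return ("invalid address", None)
            return ("ok", real)
    # Row 3: all other addresses.
    return ("invalid address", None)
```

Passing the TCE lookup in as a function keeps the routing decision separate from the table format, which the TCE mechanism section defines.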
R2--2. An HB must not act as a target for operations in the I/O Space of an I/O bus.
DMA Address Translation and Control via the TCE Mechanism
This architecture defines a Translation and Control Entry (TCE) mechanism for translating and controlling DMA addresses. There are several reasons for such translation, including:
- To increase the number of addressing bits available to some IOAs. For example, IOAs which can only access up to 4 GB via DMA need a way to access memory above that limit when used in 64-bit addressing systems whose addressing requirements go beyond 4 GB.
- To provide a redirection mechanism. A redirection mechanism is needed, even for 64-bit addressing capable IOAs, to provide the protection and indirection benefits of such translation.

The following describes how the TCE table is accessed to translate a 32-bit address using a 4 KB I/O page size. The most significant 20 bits of the address (for example, AD[31:12] for PCI) are used as an offset into the PE's TCE table to select the TCE. Thus, the first TCE maps addresses BTTAn to BTTAn + 0x00000FFF of the Memory Space of the I/O bus, the second TCE controls translation of addresses BTTAn + 0x00001000 to BTTAn + 0x00001FFF, and so on. The translated real system address is generated as follows: the Real Page Number (RPN) from the TCE replaces the 20 most significant bits of the I/O bus address, and the least significant 12 bits of the I/O bus address are used as-is for the least significant 12 bits of the new address. The TCE table entries therefore have a one-to-one correspondence with consecutive 4 KB pages of the I/O bus Memory Space, starting at the BTTAn that corresponds to the TCE table. The size of the I/O bus Memory Space that can be mapped to the system address space for a particular HB depends on how much System Memory is allocated to the TCE table(s) and on how much mappable I/O bus Memory Space is unavailable because IOAs are mapped there.
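The lookup described above amounts to an index computation and a bit splice. The following is a minimal Python sketch. It assumes BTTAn is 4 KB aligned (so the low 12 bits of the bus address are the page offset) and computes the index relative to BTTAn, which matches the statement that the first TCE maps BTTAn; when BTTAn is 0, this reduces to using the top 20 bits of the address directly. The function names are hypothetical.

```python
PAGE_SHIFT = 12                    # 4 KB I/O page
PAGE_MASK = (1 << PAGE_SHIFT) - 1  # low 12 bits: offset within the page


def tce_index(bus_addr: int, btta: int) -> int:
    """Index of the TCE controlling bus_addr; TCE 0 covers BTTA..BTTA + 0xFFF."""
    if bus_addr < btta:
        raise ValueError("address is below BTTA")
    return (bus_addr - btta) >> PAGE_SHIFT


def translate(bus_addr: int, rpn: int) -> int:
    """Real address: the RPN supplies the page number, the bus address the offset."""
    return (rpn << PAGE_SHIFT) | (bus_addr & PAGE_MASK)


def mappable_bytes(num_tces: int) -> int:
    """I/O bus Memory Space covered by a table of num_tces entries."""
    return num_tces << PAGE_SHIFT
```

A table of 2^20 TCEs covers 4 GB of I/O bus Memory Space, which is why 20 bits of a 32-bit address are enough to select the entry.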
Each TCE also contains two control bits. These identify whether the page is mapped to the system address space and, if it is, whether it is mapped read/write, read only, or write only. See the TCE Definition table for a definition of these control bits. The TCE table is the analogue of the system translation tables. However, unlike with the system translation tables, dynamic page faulting of memory during an I/O operation is not required (the page fault value, 0b00, in the TCE Page Mapping and Control field is used for error detection; that is, an I/O access to an invalid TCE creates an error indication to the software). The size and location of the HB’s TCE table are set up and changed only by the firmware.

R2--1. The platform must provide the “64-bit-addressing” and “ibm,extended-address” OF properties in all HB nodes of the device tree and the “ibm,extended-address” OF property in the root node of the OF device tree.

R2--2. The bits of the TCE must be implemented as defined in the following table.

TCE Definition
Bits | Description
0 to 51 | RPN: If the page mapping and control field of the TCE indicates anything other than page fault, these bits contain the Real Page Number (RPN) to which the bus address is mapped in the system address space. In certain HB implementations not all of these bits may be required; however, enough bits must be implemented to match the largest real address in the platform.
52 to 61 | Reserved for future use.
62 to 63 | Page Mapping and Control: These bits define page mapping and read-write authority. They are coded as follows:
  00 Page fault (no access)
  01 System address space (read only)
  10 System address space (write only)
  11 System address space (read/write)
  Code point 0b00 signifies that the page is not mapped. It must be used to indicate a page fault error. Hardware must not change its state based on the value in the remaining bits of a TCE when code point 0b00 is set in this field of the TCE. For accesses to system address space with an invalid operation (a write to a read-only page or a read from a write-only page), the HB generates an error. See for more information about error handling.
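Assuming big-endian bit numbering (bit 0 is the most significant bit of the 64-bit TCE, as is conventional in Power documentation), the RPN in bits 0 to 51 is the TCE shifted right by 12, and the control field in bits 62 to 63 is its low two bits. The following is a minimal Python sketch under that reading. The helper names are hypothetical, and the sketch exploits the fact that the three mapped code points decompose into one read-permission bit (0b01) and one write-permission bit (0b10).

```python
from typing import Optional

RPN_SHIFT = 12    # bits 0..51 (bit 0 = MSB) are the high 52 bits of the TCE
CTRL_MASK = 0b11  # bits 62..63 are the low 2 bits
PAGE_FAULT, READ_ONLY, WRITE_ONLY, READ_WRITE = 0b00, 0b01, 0b10, 0b11
READ_BIT, WRITE_BIT = 0b01, 0b10  # READ_WRITE sets both


def make_tce(rpn: int, ctrl: int) -> int:
    """Build a TCE, leaving reserved bits 52..61 zero."""
    return (rpn << RPN_SHIFT) | (ctrl & CTRL_MASK)


def check_access(tce: int, is_write: bool) -> Optional[int]:
    """Return the RPN if the access is permitted, or None if the HB must signal an error."""
    ctrl = tce & CTRL_MASK
    if ctrl == PAGE_FAULT:
        return None  # page fault: the remaining bits are not examined
    needed = WRITE_BIT if is_write else READ_BIT
    if not (ctrl & needed):
        return None  # write to a read-only page, or read from a write-only page
    return tce >> RPN_SHIFT
```

Checking for 0b00 before touching any other field mirrors the rule that hardware must not act on the remaining bits of a page-fault TCE.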
R2--3. If the address that the HB would use to access the TCE table (in order to get the TCE) would fall outside of the TCE table, then the HB must create a TCE extent error (see ).

R2--4. Enough bits must be implemented in the TCE so that DMA IOAs are able to access all System Memory addresses.

R2--5. Each PE must have its own independent TCE table.

R2--6. Any non-recoverable error while an HB is accessing its TCE table must result in a TCE access error; the action to be taken by the HB is defined under the TCE access error in .

R2--7. In implementations which cache TCEs, if software changes a TCE, the platform must perform the following steps:
1. If any data associated with the page represented by that TCE is in an I/O bridge cache or buffer, the hardware must write the data, if modified, to System Memory.
2. The hardware must then invalidate the data in the cache.
3. Finally, it must invalidate the TCE in the cache.

R2--8. An IOA or an HB must never modify a TCE.

R2--9. If the page mapping and control bits in the TCE are set to 0b00, the hardware must not change its state based on the values of the remaining bits of the TCE.

R2--10. The OS must initialize all its TCEs upon receiving control from the platform.
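R2--3, R2--5, and R2--10 can be illustrated with a small per-PE table model. The following is a minimal Python sketch. The TceTable class is hypothetical, the exception types stand in for the architected error reporting, and initializing every entry to page fault (0b00) is an assumption, since R2--10 requires initialization but does not state the initial value.

```python
from typing import List

PAGE_SHIFT = 12
PAGE_FAULT = 0b00


class TceTable:
    """Hypothetical model of one PE's TCE table (R2--5: one table per PE)."""

    def __init__(self, btta: int, num_entries: int) -> None:
        self.btta = btta
        # Assumed initial value: every entry starts as a page fault (0b00).
        self.entries: List[int] = [PAGE_FAULT] * num_entries

    def fetch(self, bus_addr: int) -> int:
        """Return the TCE for bus_addr, or raise if the fetch would leave the table."""
        if bus_addr < self.btta:
            raise ValueError("address below BTTA: invalid address, not a TCE fetch")
        index = (bus_addr - self.btta) >> PAGE_SHIFT
        if index >= len(self.entries):
            raise LookupError("TCE extent error")  # R2--3
        return self.entries[index]
```

With a four-entry table at BTTA 0, addresses up to 0x3FFF return a TCE, and 0x4000 raises the extent error because its index falls past the end of the table.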
Example Address Maps
The figure “Example Address Map: One PHB, Peripheral Memory and Peripheral I/O Spaces below 4 GB” shows how to construct a simple address map with one PHB and with Peripheral Memory, Peripheral I/O, and SCA spaces below 4 GB. The figure “Example Address Map: Four PHBs, all Peripheral Memory and Peripheral I/O Spaces above 4GB” shows how to construct the address map with Peripheral Memory, Peripheral I/O, and SCA spaces above 4 GB. This configuration allows some overlap of the System Memory space and the 32-bit I/O bus memory space (with the resulting loss of TCE table space in the overlap), while moving some of the SCA spaces above 4 GB. Several things can be noted from this configuration:
- I/O bus memory areas can overlap System Memory addresses (see the memory space of PHB0). However, significant overlap of these I/O bus memory areas and the TCE table may significantly reduce the amount of TCE table space that is available for mapping I/O memory space to system address space (a potential performance impact).
- The System Memory above 4 GB is shown starting at 4 GB. This architecture also allows it to be pushed further up, with Peripheral Memory, Peripheral I/O, and SCAs existing above 4 GB and below the System Memory areas.
- BPM’n,m to TPM’n,m spaces for different PHBs (different n) are allowed to occur at the same addresses in the memory spaces of different I/O buses, but are not required to do so (and are not shown as the same in the figure). Implementations are likely to have BPM’n,m to TPM’n,m at the same address range for all n when the BPM’n,m to TPM’n,m ranges are below 4 GB.
Example Address Map: One PHB, Peripheral Memory and Peripheral I/O Spaces below 4 GB
Example Address Map: Four PHBs, all Peripheral Memory and Peripheral I/O Spaces above 4GB