A Protocol for VMC Communications
Overview

The Virtual Management Channel (VMC) is a logical device that provides an interface between the hypervisor and a management partition. This management partition is intended to provide an alternative to HMC-based system management. In the management partition, a Logical Partition Manager (LPM) application enables a system administrator to configure the system’s partitioning characteristics via a command line interface or web browser. Support for conventional HMC management may still be provided on a system; however, when an HMC is attached to the system, the hypervisor disables the VMC interface.
Logical Partition Manager

The LPM is a browser-based LPAR configuration tool provided by the management partition. System configuration, maintenance, and control functions that traditionally require an HMC can be implemented in the LPM using a combination of HMC-to-hypervisor interfaces and existing operating system methods. This tool provides a subset of the functions implemented by the HMC and enables basic partition configuration. The set of HMC-to-hypervisor messages supported by the LPM component is passed to the hypervisor over a VMC interface, which is defined below. The actual content of these messages is defined in other documentation. To remain consistent with that existing HMC documentation, this chapter generally uses HMC terminology to refer to these messages and to the LPM-to-hypervisor connections.
Virtual Management Channel (VMC)

A logical device, called the virtual management channel (VMC), is defined for communicating between the LPM application and the hypervisor. This device, similar to a VSCSI server device, is presented to a designated management partition as a virtual device, and is presented only when the system is not HMC managed. This communication device borrows aspects from both VSCSI and ILLAN devices and is implemented using the CRQ and RDMA interfaces. The initialization process for CRQs is defined in , and is not duplicated here. A three-way handshake must take place to establish that both the hypervisor and management partition sides of the channel are running before any of the protocol messages defined in this chapter are sent or received. Transport Event CRQs are also defined in , and are not duplicated here. They define the CRQ messages that are sent when the hypervisor detects that one of the peer partitions has abnormally terminated, or when one side has called H_FREE_CRQ to close its CRQ. Two new classes of CRQ messages are introduced for the VMC device. VMC Administrative messages are used by each partition on the VMC to communicate its capabilities to its partner. HMC Interface messages are used for the actual flow of HMC messages between the management partition and the hypervisor. Because most HMC messages far exceed the size of a CRQ buffer, a virtual DMA (RDMA) of the HMC message data is performed prior to each HMC Interface CRQ message. Only the management partition drives RDMA operations; the hypervisor never directly causes the movement of message data.
VMC CRQ Message Definition

For the VMC interface, all CRQ messages are defined to use the following base format:

CRQ Message Base Format

    Size   Field
    1 B    Header
    1 B    Type
    14 B   Data
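To make the base layout concrete, the 16-byte CRQ message can be viewed as a packed C structure. This is only a sketch: the structure and field names are assumptions, and only the field sizes above are architected.

    #include <stdint.h>

    /* Illustrative view of the 16-byte VMC CRQ message base format.
     * Names are hypothetical; only the sizes come from the table above. */
    struct vmc_crq_msg {
        uint8_t header;     /* 1 B: Header */
        uint8_t type;       /* 1 B: Type */
        uint8_t data[14];   /* 14 B: type-specific Data */
    } __attribute__((packed));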
Two general message formats are defined: administrative and HMC Interface. These are defined in the following sections.
Administrative Messages

The administrative message format is used to configure a VMC between the hypervisor and the management partition. The two messages defined for this format are described in the following subsections.
VMC Capabilities

VMC Capabilities Message

    Size   Field
    1 B    Header (0x80)
    1 B    Type (0x01)
    1 B    Reserved
    2 B    Reserved
    1 B    # HMC’s
    2 B    Pool Size
    4 B    MTU
    2 B    CRQ Size
    2 B    Version (Major/Minor)
The capabilities message is an administrative message sent after the CRQ initialization sequence of messages and is used to exchange VMC capabilities between the management partition and the hypervisor. The management partition must send this message and the hypervisor must respond with a VMC Capabilities Response message before HMC Interface messages can begin. Any HMC Interface messages received before the exchange of capabilities has completed are dropped. This message enables the management partition and the hypervisor to exchange the following interface parameters:

# HMC’s. Maximum number of independent HMC connections supported. Multiple connections would be required to support HMC pass-through mode.

Pool Size. Maximum number of buffers supported per HMC connection.

MTU. Maximum message size supported, in bytes.

CRQ Size. Number of entries available in the CRQ for the source partition. The target partition must limit the number of outstanding messages to one half of this value or less.

Version. Indicates the code level of the management partition or the hypervisor, with the high-order byte indicating a major version and the low-order byte indicating a minor version.
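As an illustration of the layout (the structure and field names below are assumptions, and multi-byte fields are assumed to be transmitted big-endian), the capabilities message can be modeled as a packed structure whose field sizes match the table above. The same layout serves the VMC Capabilities Response, with Type 0x81 and the Status byte in place of the first reserved byte.

    #include <stdint.h>

    /* Hypothetical C view of the 16-byte VMC Capabilities message. */
    struct vmc_capabilities_msg {
        uint8_t  header;     /* 0x80 */
        uint8_t  type;       /* 0x01 (0x81 for the response) */
        uint8_t  status;     /* Reserved in the request, Status in the response */
        uint16_t reserved;
        uint8_t  max_hmc;    /* # HMC's: max independent HMC connections */
        uint16_t pool_size;  /* max buffers per HMC connection */
        uint32_t mtu;        /* max message size, in bytes */
        uint16_t crq_size;   /* CRQ entries available on the sender's side */
        uint16_t version;    /* major version in high byte, minor in low byte */
    } __attribute__((packed));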
VMC Capabilities Response

VMC Capabilities Response Message

    Size   Field
    1 B    Header (0x80)
    1 B    Type (0x81)
    1 B    Status
    2 B    Reserved
    1 B    # HMC’s
    2 B    Pool Size
    4 B    MTU
    2 B    CRQ Size
    2 B    Version (Major/Minor)
This command is sent by the hypervisor in response to the VMC Capabilities message. This command enables the hypervisor to inform the management partition of the values it supports. Parameters are identical to the VMC Capabilities message, with the addition of the following field:

Status. Zero is success. On failure, one of the following is returned:
1 - General failure
2 - Invalid version

The hypervisor and the management partition use the minimum value supported by each side for the parameters negotiated with the capabilities message exchange.
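A minimal sketch of that minimum-value rule, reusing the hypothetical vmc_capabilities_msg structure from the previous example and assuming the fields have already been converted to host byte order:

    /* Hypothetical helper: each negotiated parameter is the minimum of the
     * value sent in VMC Capabilities and the value returned in the
     * VMC Capabilities Response. */
    static void vmc_negotiate(const struct vmc_capabilities_msg *sent,
                              const struct vmc_capabilities_msg *received,
                              struct vmc_capabilities_msg *agreed)
    {
        agreed->max_hmc   = sent->max_hmc   < received->max_hmc   ? sent->max_hmc   : received->max_hmc;
        agreed->pool_size = sent->pool_size < received->pool_size ? sent->pool_size : received->pool_size;
        agreed->mtu       = sent->mtu       < received->mtu       ? sent->mtu       : received->mtu;
        agreed->crq_size  = sent->crq_size  < received->crq_size  ? sent->crq_size  : received->crq_size;
    }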
HMC Interface Buffers

Buffers are used to transfer data between the management partition and the hypervisor. Many of the HMC Interface messages defined in the following sections indicate buffers that contain data that must be transferred. Note the following:

All buffers exist in hypervisor memory, and data is moved between the hypervisor and the management partition by the management partition issuing H_COPY_RDMA. To enable the management partition to access each buffer, the hypervisor must allocate virtual TCEs as well as the actual buffer storage.

Each buffer is at least the minimum negotiated MTU bytes long.

Buffers are always owned by either the management partition or the hypervisor. Management partition-owned buffers are used for messages (both commands and responses) sent to the hypervisor from the management partition. Hypervisor-owned buffers are used for messages (both responses and asynchronous events) sent from the hypervisor to the management partition.

Each LPM interface message carrying HMC protocol (in either direction) also carries a buffer, and ownership of this buffer transfers from sender to receiver. There are no CRQ responses to the CRQ messages carrying HMC protocol; the HMC protocol responses are carried in a message sent from the other direction.

The maximum depth of the buffer pool is the minimum value negotiated via the capabilities exchange. For each of the HMC Interface commands, Buffer ID identifies the transfer buffer and ranges from 0 to the minimum negotiated pool size - 1. There is a separate buffer pool for each LPM connection, each with the negotiated number of buffers.
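One way a management partition driver might track these ownership and pool rules is with a small per-connection bookkeeping structure. This is purely illustrative; none of the names or the fixed bound below are defined by the architecture.

    #include <stdbool.h>
    #include <stdint.h>

    #define VMC_POOL_MAX 16   /* illustrative bound; the real limit is the negotiated pool size */

    enum vmc_buf_owner { VMC_OWNER_HYPERVISOR, VMC_OWNER_MGMT_PARTITION };

    /* Hypothetical per-buffer state.  The buffer itself lives in hypervisor
     * memory; the management partition records only the LIOBA it was given
     * and which side currently owns the buffer. */
    struct vmc_buffer {
        bool               valid;   /* slot populated by an Add Buffer message */
        enum vmc_buf_owner owner;   /* ownership transfers with each HMC message */
        uint32_t           lioba;   /* I/O address used with H_COPY_RDMA */
    };

    /* Hypothetical per-LPM-connection state: one independent pool per connection. */
    struct vmc_connection {
        uint8_t           hmc_session;        /* HMC Sn */
        uint8_t           hmc_index;          /* HMC Idx */
        uint16_t          pool_size;          /* negotiated buffers for this connection */
        struct vmc_buffer pool[VMC_POOL_MAX]; /* indexed by Buffer ID */
    };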
HMC Interface Messages

There are several different HMC Interface messages, as defined in the following sections. Each CRQ message has a unique HMC Interface message type, and the HMC Interface message type defines the format for the remaining 14 bytes of data.
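For reference, the Type values assigned to the administrative messages above and to the HMC Interface messages defined in the sections that follow can be collected into a single enumeration. The identifier names are illustrative; only the numeric values come from this chapter.

    /* Every VMC CRQ message carries 0x80 in its 1-byte Header field; these are
     * hypothetical names for the 1-byte Type field values. */
    enum vmc_msg_type {
        VMC_MSG_CAPABILITIES       = 0x01,
        VMC_MSG_CAPABILITIES_RESP  = 0x81,
        VMC_MSG_OPEN               = 0x02,
        VMC_MSG_OPEN_RESP          = 0x82,
        VMC_MSG_CLOSE              = 0x03,
        VMC_MSG_CLOSE_RESP         = 0x83,
        VMC_MSG_ADD_BUFFER         = 0x04,
        VMC_MSG_ADD_BUFFER_RESP    = 0x84,
        VMC_MSG_REMOVE_BUFFER      = 0x05,
        VMC_MSG_REMOVE_BUFFER_RESP = 0x85,
        VMC_MSG_SIGNAL             = 0x06,
    };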
Interface Open

Interface Open Command Message

    Size   Field
    1 B    Header (0x80)
    1 B    Type (0x02)
    1 B    Reserved
    1 B    Reserved
    1 B    HMC Sn
    1 B    HMC Idx
    2 B    Buffer ID
    8 B    Reserved
This command is sent by the management partition as the result of a management partition device request. It causes the hypervisor to prepare a set of data buffers for the LPM connection indicated by HMC Idx (HMC index). A unique HMC Idx would be used for each management application if multiple management applications were to run concurrently. Before responding to this command, the hypervisor must provide the management partition with at least one of these new buffers (see the Add Buffer message defined below). The HMC Sn (HMC Session) field is used as a session identifier for the current VMC connection. If the management partition disconnects (for example, as the result of a crash in the LPM application), the next open of the VMC device results in the next HMC Sn value in the range from 1 to 255 being used. This message is issued after the capabilities exchange has successfully completed and the hypervisor has issued an Add Buffer command to create a buffer for the management partition to use in establishing an LPM connection. The management partition sends the unique 32-byte HMC ID to the hypervisor via an RDMA using the buffer established by the hypervisor.
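A packed-structure sketch of the Interface Open message, under the same naming and endianness assumptions as the earlier examples:

    #include <stdint.h>

    /* Hypothetical layout of the 16-byte Interface Open command message. */
    struct vmc_interface_open_msg {
        uint8_t  header;       /* 0x80 */
        uint8_t  type;         /* 0x02 */
        uint8_t  reserved1;
        uint8_t  reserved2;
        uint8_t  hmc_session;  /* HMC Sn: session identifier, 1 to 255 */
        uint8_t  hmc_index;    /* HMC Idx: which LPM connection to open */
        uint16_t buffer_id;    /* buffer holding the 32-byte HMC ID */
        uint8_t  reserved3[8];
    } __attribute__((packed));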
Interface Open Response

Interface Open Response Message

    Size   Field
    1 B    Header (0x80)
    1 B    Type (0x82)
    1 B    Status
    1 B    Reserved
    1 B    HMC Sn
    1 B    HMC Idx
    2 B    Buffer ID
    8 B    Reserved
This command is sent by the hypervisor in response to the Interface Open message. The status of the open command is returned in the Status field. Zero is success. On failure, the following is returned:
1 - General failure

When this message is received, the indicated buffer is again available for management partition use.
Interface Close

Interface Close Message

    Size   Field
    1 B    Header (0x80)
    1 B    Type (0x03)
    1 B    Reserved
    1 B    Reserved
    1 B    HMC Sn
    1 B    HMC Idx
    2 B    Reserved
    8 B    Reserved
This command is sent by the management partition to terminate an LPM-to-hypervisor connection. When this command is sent, the management partition has quiesced all I/O operations to all buffers associated with this LPM connection and has freed any storage for those buffers.
Interface Close Response

Interface Close Response Message

    Size   Field
    1 B    Header (0x80)
    1 B    Type (0x83)
    1 B    Status
    1 B    Reserved
    1 B    HMC Sn
    1 B    HMC Idx
    2 B    Reserved
    8 B    Reserved
This command is sent by the hypervisor in response to the LPM Interface Close message. The status of the close command is returned in the Status field. Zero is success. On failure, the following is returned:
1 - General failure
Add Buffer

Add Buffer Message

    Size   Field
    1 B    Header (0x80)
    1 B    Type (0x04)
    1 B    Reserved
    1 B    hypervisor
    1 B    HMC Sn
    1 B    HMC Idx
    2 B    Buffer ID
    4 B    Reserved
    4 B    Buffer LIOBA
This message transfers a buffer from hypervisor ownership to management partition ownership. The LIOBA is obtained from the virtual TCE table associated with the hypervisor side of the VMC device, and points to a buffer of size MTU (as established in the capabilities exchange). The hypervisor field is set to 0 if the buffer being added is to be used by the management partition for messages inbound to the hypervisor, and to 1 if the buffer being added is to be used for messages outbound from the hypervisor. The typical flow for adding buffers is:

1. A new LPM connection is opened by the management partition.
2. The hypervisor assigns new buffers for the traffic associated with that connection.
3. The hypervisor sends VMC Add Buffer messages to the management partition, informing it of the new buffers.
4. The hypervisor sends an HMC protocol message (to the LPM application) notifying it of the new buffers. This informs the application that it has buffers available for sending HMC commands.
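A packed-structure sketch of the Add Buffer message, with hypothetical field names:

    #include <stdint.h>

    /* Hypothetical layout of the 16-byte Add Buffer message. */
    struct vmc_add_buffer_msg {
        uint8_t  header;        /* 0x80 */
        uint8_t  type;          /* 0x04 */
        uint8_t  reserved1;
        uint8_t  hypervisor;    /* 0 = buffer for messages inbound to the hypervisor,
                                   1 = buffer for messages outbound from the hypervisor */
        uint8_t  hmc_session;   /* HMC Sn */
        uint8_t  hmc_index;     /* HMC Idx */
        uint16_t buffer_id;     /* 0 .. negotiated pool size - 1 */
        uint32_t reserved2;
        uint32_t buffer_lioba;  /* LIOBA of the MTU-sized buffer in hypervisor memory */
    } __attribute__((packed));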
Add Buffer Response

Add Buffer Response Message

    Size   Field
    1 B    Header (0x80)
    1 B    Type (0x84)
    1 B    Status
    1 B    Reserved
    1 B    HMC Sn
    1 B    HMC Idx
    2 B    Buffer ID
    8 B    Reserved
This command is sent by the management partition to the hypervisor in response to the Add Buffer message. The Status field indicates the result of the command. Zero is success. On failure, one of the following is returned:
1 - General failure
2 - Invalid HMC Index
3 - Invalid Buffer ID
4 - HMC connection has closed
Remove Buffer

Remove Buffer Message

    Size   Field
    1 B    Header (0x80)
    1 B    Type (0x05)
    1 B    Reserved
    1 B    Reserved
    1 B    HMC Sn
    1 B    HMC Idx
    2 B    Reserved
    8 B    Reserved
This message requests that an HMC buffer be transferred from management partition ownership to hypervisor ownership. The management partition may not be able to satisfy the request at a particular point in time if all of its buffers are in use. The management partition requires a depth of at least one inbound buffer to allow LPM commands to flow to the hypervisor; it is, therefore, an interface error for the hypervisor to attempt to remove the management partition's last buffer. The hypervisor is expected to manage buffer usage with the LPM application directly and inform the management partition when buffers may be removed. The typical flow for removing buffers is:

1. The LPM application no longer needs a communication path to a particular hypervisor function, and that function is closed.
2. The hypervisor and the LPM application quiesce all traffic to that function.
3. The hypervisor requests a reduction in buffer pool size.
4. The LPM application acknowledges the reduction in buffer pool size.
5. The hypervisor sends a Remove Buffer message to the management partition, informing it of the reduction in buffers.
6. The management partition verifies that it can remove the buffer. This is possible if buffers have been quiesced.
Remove Buffer Response

Remove Buffer Response Message

    Size   Field
    1 B    Header (0x80)
    1 B    Type (0x85)
    1 B    Status
    1 B    Reserved
    1 B    HMC Sn
    1 B    HMC Idx
    2 B    Buffer ID
    8 B    Reserved
This command is sent by the management partition to the hypervisor in response to the Remove Buffer message. The Buffer ID field indicates which buffer the management partition selected to remove. The Status field indicates the result of the command. Zero is success. On failure, one of the following is returned:
1 - General failure
2 - Invalid HMC Index
3 - No buffer found
Signal Message

Signal Message

    Size   Field
    1 B    Header (0x80)
    1 B    Type (0x06)
    1 B    Reserved
    1 B    Reserved
    1 B    HMC Sn
    1 B    HMC Idx
    2 B    Buffer ID
    4 B    Reserved
    4 B    Msg Len
This command is sent between the management partition and the hypervisor in order to signal the arrival of an HMC protocol message. The command can be sent by both the management partition and the hypervisor. It is used for all traffic between the LPM application and the hypervisor, regardless of who initiated the communication. There is no response to this message.
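A packed-structure sketch of the Signal Message, with hypothetical field names:

    #include <stdint.h>

    /* Hypothetical layout of the 16-byte Signal Message. */
    struct vmc_signal_msg {
        uint8_t  header;       /* 0x80 */
        uint8_t  type;         /* 0x06 */
        uint8_t  reserved1;
        uint8_t  reserved2;
        uint8_t  hmc_session;  /* HMC Sn */
        uint8_t  hmc_index;    /* HMC Idx */
        uint16_t buffer_id;    /* buffer that carries the HMC protocol message */
        uint32_t reserved3;
        uint32_t msg_len;      /* length of the HMC message in the buffer, in bytes */
    } __attribute__((packed));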
Example Management Partition VMC Driver Interface

This section provides an example for the LPM implementation where a device driver is used to interface to the VMC device. This driver presents a new device, for example /dev/lparvmc, which provides interfaces to open, close, read, write, and perform ioctl operations against the VMC device.
VMC Interface Initialization

The device driver is responsible for initializing the VMC when the driver is loaded. It first creates and initializes the CRQ. Next, an exchange of VMC capabilities is performed to indicate the code version and the number of resources available in both the management partition and the hypervisor. Finally, the hypervisor seeds the management partition with an initial pool of VMC buffers, one buffer for each possible HMC connection, to be used for LPM session initialization. Prior to completion of this initialization sequence, the device returns EBUSY to open() calls. EIO is returned for all other open() failures.
Figure: VMC Interface Initialization
VMC Interface Open

After the basic VMC channel has been initialized, an HMC session-level connection can be established. The application layer performs an open() of the VMC device and executes an ioctl() against it, indicating the HMC ID (32 bytes of data) for this session. If the VMC device is in an invalid state, EIO is returned for the ioctl(). The device driver creates a new HMC session value (ranging from 1 to 255) and an HMC index value (starting at index 0 and potentially ranging to 254 in future releases) for this HMC ID. The driver then RDMAs the HMC ID to the hypervisor and sends an Interface Open message to the hypervisor to establish the session over the VMC. After the hypervisor receives this information, it sends Add Buffer messages to the management partition to seed an initial pool of buffers for the new HMC connection. Finally, the hypervisor sends an Interface Open Response message to indicate that it is ready for normal runtime messaging. The following illustrates this VMC flow:
Figure: VMC Interface Open
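From the LPM application's side, the open sequence could look like the following sketch. The device name /dev/lparvmc comes from this example; the ioctl request code LPARVMC_IOCTL_SET_HMC_ID and the exact argument format are assumptions, since this chapter does not define the driver's ioctl interface.

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    #define LPARVMC_IOCTL_SET_HMC_ID 0x4D01   /* hypothetical request code */

    /* Open the VMC device and register a session for the given HMC ID. */
    int vmc_session_open(const char *hmc_id)
    {
        int fd = open("/dev/lparvmc", O_RDWR);
        if (fd < 0) {
            perror("open /dev/lparvmc");   /* EBUSY until VMC initialization completes */
            return -1;
        }

        char id[32] = { 0 };               /* HMC ID is 32 bytes of data */
        strncpy(id, hmc_id, sizeof(id));

        /* The driver RDMAs the HMC ID to the hypervisor and then sends the
         * Interface Open message on the application's behalf. */
        if (ioctl(fd, LPARVMC_IOCTL_SET_HMC_ID, id) < 0) {
            perror("ioctl");               /* EIO if the VMC device state is invalid */
            close(fd);
            return -1;
        }
        return fd;                         /* used for read()/write() at runtime */
    }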
VMC Interface Runtime

During normal runtime, the LPM application and the hypervisor exchange HMC messages via Signal Messages and RDMA operations. When sending data to the hypervisor, the LPM application performs a write() to the VMC device, and the driver RDMAs the data to the hypervisor and then sends a Signal Message. If a write() is attempted before VMC device buffers have been made available by the hypervisor, or if no buffers are currently available, EBUSY is returned in response to the write(). A write() returns EIO for all other errors, such as an invalid device state. When the hypervisor sends a message to the LPM, the data is placed into a VMC buffer and a Signal Message is sent to the VMC driver in the management partition. The driver RDMAs the buffer into the partition and passes the data up to the appropriate LPM application via a read() of the VMC device. The read() request blocks if there is no buffer available to read. The LPM application may use select() to wait for the VMC device to become ready with data to read.
Figure: VMC Interface Runtime
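A corresponding runtime sketch: send an HMC command with write(), wait with select(), and collect the response or asynchronous event with read(). Buffer sizing and the one-write-one-read pattern here are assumptions, not requirements of the interface.

    #include <errno.h>
    #include <stdio.h>
    #include <sys/select.h>
    #include <unistd.h>

    /* Write one HMC command to the VMC device, then block until a message
     * arrives from the hypervisor and read it back. */
    ssize_t vmc_send_and_receive(int fd, const void *cmd, size_t cmd_len,
                                 void *resp, size_t resp_len)
    {
        ssize_t n = write(fd, cmd, cmd_len);   /* driver RDMAs data, sends Signal Message */
        if (n < 0) {
            if (errno == EBUSY)
                fprintf(stderr, "no VMC buffer currently available\n");
            else
                perror("write /dev/lparvmc");
            return -1;
        }

        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        if (select(fd + 1, &rfds, NULL, NULL, NULL) < 0) {   /* wait for inbound data */
            perror("select");
            return -1;
        }

        n = read(fd, resp, resp_len);          /* one HMC protocol message per read() */
        if (n < 0)
            perror("read /dev/lparvmc");
        return n;
    }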
VMC Interface Close

HMC session-level connections are closed by the management partition when the application layer performs a close() against the device. This action results in an Interface Close message flowing to the hypervisor, which causes the session to be terminated. The device driver must free any storage allocated for buffers for this HMC connection.
Figure: VMC Interface Close