
Chapter 2. Partitioning implementation

This chapter explains the partitioning implementation on eServer p5 and OpenPower servers and includes the following sections:

- 2.1, “Partitioning enablers” on page 26
- 2.2, “Partition resources” on page 32
- 2.3, “Resource planning using LPAR Validation Tool” on page 42
- 2.4, “I/O device assignment considerations” on page 46
- 2.5, “LPAR limitations and considerations” on page 53


2.1 Partitioning enablers

With the introduction of the eServer p5 and OpenPower servers, the flexibility and adaptability of the partitioned environment has increased significantly. Each logical partition can now support both dedicated and virtualized server resources, while maintaining the strict isolation of the assigned resources from other partitions in the server. To achieve this flexibility, the hardware and firmware system components work together to support the available partitioning implementations on eServer p5 and OpenPower servers.

Note: In the remainder of this book, when we refer to available partitioning implementations, we include both dedicated and virtualized partitions in the server environment.

2.1.1 Hardware

This section discusses the hardware components that support the available partitioning environments on partitioning-capable pSeries servers.

POWER5 processor

As the next generation of dual-core 64-bit processors, the POWER5 continues to incorporate IBM-proven fabrication technologies such as silicon-on-insulator and copper interconnection. The newer design also includes increased processor performance and an enhanced computing approach through improved granularity by way of the Advanced POWER Virtualization technology features. In addition to these characteristics, POWER5 provides the following functions to support all of the available partitioned environments:

- POWER Hypervisor

  The POWER5 processor supports a special form of instructions that are used exclusively by new controlling firmware called the POWER Hypervisor. The hypervisor performs the initialization and configuration of the processors on partition activation. It provides privileged and protected access to assigned partition hardware resources and enables the use of Advanced POWER Virtualization features. The hypervisor receives and responds to requests from the partitions through specialized hypervisor calls. The hypervisor functions are discussed in “Firmware” on page 29.

Note: This book uses the term hypervisor to refer to the POWER Hypervisor.


- Simultaneous multi-threading (SMT)

  SMT allows for better utilization and greater performance of a POWER5 processor by executing two separate threads simultaneously on an SMT-enabled processor. These threads are switched if one of the threads experiences a stalled or long-latency event, such as a memory miss or disk data fetch. The SMT feature works with both dedicated and shared processor resources and provides an estimated performance increase of 30%.

- Dynamic Power Management

  To address one of the most important issues facing modern Complementary Metal Oxide Semiconductor chip designs, the POWER5 processor has newly embedded logic that reduces the amount of power that the processor consumes. The techniques developed to achieve this reduction include reduced switching power during idle clock cycles and a low-power mode of operation, which sets both processor threads to the lowest possible priority.

Note: SMT is active on the POWER5 processor when combined with AIX 5L Version 5.3 or SUSE Linux.
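On AIX 5L Version 5.3, you can check or change the SMT mode with the smtctl command. The following is a minimal sketch of typical usage; the mode changes shown are examples only:

smtctl                    # display the current SMT status of the partition
smtctl -m off -w boot     # disable SMT, effective at the next boot
smtctl -m on -w now       # enable SMT immediately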

Capacity on Demand

To enable customers to meet temporary, permanent, or unexpected computing workloads, certain eServer p5 servers provide a number of enhanced Capacity on Demand methods, primarily for processors. These include the following:

- Capacity Upgrade on Demand

  With this feature, additional processors and memory are shipped with the server and can later be activated in minimum increments of one.

- On/Off Capacity on Demand

  If ordered, this feature allows you to temporarily activate a minimum of one processor to meet peak system loads. With this feature, you must report your on and off activity to IBM at least once a month.

- Reserve Capacity on Demand

  This feature further enhances the micro-partition flexibility of the eServer p5 servers by allowing you to order a number of inactive processors, which are placed in the shared processor pool as reserved processors. When the assigned or available processor resources that are used in uncapped partitions reach 100%, additional processing days (normally a 24-hour period) are subtracted from a pre-paid number of processor days.


- Trial Capacity on Demand

  This option provides you with a one-time, no-cost processor activation for a maximum period of 30 consecutive days. It is available as a complimentary service when access to Capacity on Demand resources is required immediately.

There are basic rules which apply for each specific eServer p5 server that supports Capacity on Demand. For more information, refer to the technical overview documentation for the individual server as shown in “Related publications” on page 301.

Note: At the time of the writing of this book, only p5-550 and p5-570 systems with a 2-way 1.65 GHz POWER5 processor card or above support the Capacity Upgrade on Demand options. All p5-590 and p5-595 systems also support these features.

Interrupt controller

The interrupt controller that manages the peripheral interrupt requests to the processors works in a fashion similar to other pSeries SMP servers. In a partitioned environment, the interrupt controller supports multiple global interrupt queues, which can be individually programmed to send external interrupts only to the set of processors that are allocated to a specific partition. Therefore, the processors in a partition can only receive interrupt requests from devices inside their partition.

PCI host bridges

The PCI host bridges control the PCI slots in the I/O drawers, as in conventional eServer p5 servers. The PCI host bridges use translation control entry (TCE) tables for the I/O address to memory address translation in order to perform direct memory access (DMA) transfers between memory and PCI adapters. The TCE tables are allocated in the physical memory.

In a partitioned environment, the hypervisor controls the DMA addressing to the partition memory for all I/O devices in all partitions. The hypervisor uses central TCE tables for all I/O devices, which are located outside of the memory of the partitions. The hypervisor can manage as many TCE tables as it needs. For example, each PCI host bridge could have its own TCE table. The number of TCEs needed, and thus the number of TCE tables, is a function of the number of PCI host bridges and slots. The address mapping is protected on a per-adapter basis. The PCI host bridges that are used in eServer p5 and OpenPower servers support the control of the PCI adapter DMA by the hypervisor.


The key point is that a logical partition is only given a window of TCEs per I/O slot that are necessary to establish DMA mappings for the device in that slot. The hypervisor controls TCE accesses by the operating system to ensure that they are to a window owned by the partition.

Error handling

The basis of the Enhanced Error Handling function is to isolate and limit the impact of hardware errors or failures to a single partition. To further enhance the Reliability, Availability, and Serviceability capabilities, the POWER5 processor has improved the following features:

- Most firmware updates enable the system to remain operational.
- Error checking and correcting has been extended to inter-chip connections for the Fabric and Processor bus.
- Partial L2 cache de-allocation is possible.
- The number of L3 cache line deletes improved from two to ten for better self-healing capability.

Service processor

All the eServer p5 and OpenPower server models have an enhanced service processor compared to existing pSeries models. The two major enhancements of the eServer p5 and OpenPower service processor are:

- The Hardware Management Console (HMC) communicates with the physical server using an Ethernet network.
- The power control of the attached I/O subsystems occurs from the HMC using system power control network connection(s).

2.1.2 Firmware

Support of partitioning on eServer p5 and OpenPower servers requires the new firmware-based hypervisor, partition Open Firmware, and Run-Time Abstraction Services. See Figure 2-2 on page 32.

POWER Hypervisor

The hypervisor is a component of the global firmware. It owns the partitioning model and the resources that are required to support this model. The hypervisor enables the use of the Virtual I/O Server and other Advanced POWER Virtualization features.

When the Advanced POWER Virtualization hardware feature is specified with the initial system order, the firmware is shipped pre-activated to support Micro-Partitioning technology and the Virtual I/O Server. For upgrade orders, a key is shipped to enable the firmware, similar to the Capacity Upgrade on Demand key.

Figure 2-1 shows the HMC panel where the Virtualization Engine™ Technologies are enabled.

Figure 2-1 HMC panel to enable the Virtualization Engine Technologies

In addition to the Page Frame Table and Translation Control Entry, the hypervisor also handles the following system service calls:

- Virtual memory management
- Debugger support
- Virtual terminal support
- Processor register hypervisor resource access
- Dump support
- Memory migration support
- Performance monitor support
- Virtualization I/O interrupts
- Micro-Partitioning scheduling
- Dynamic LPAR operations
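From an AIX 5L Version 5.3 partition, you can observe how frequently hypervisor calls are being made with the lparstat command. This is a brief sketch; the statistics reported depend entirely on the workload:

lparstat -H 1 1     # one 1-second sample of per-call hypervisor statistics
                    # (number of calls and time spent for each hcall)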


The hypervisor is a callable, active, interrupt-driven service in eServer p5 and OpenPower systems. This differs from POWER4-based systems, where the hypervisor was a passive callable library.

The hypervisor resides outside of the partition system memory in the first physical memory block at physical address zero. This first physical memory block is not usable by any of the partition operating systems in a partitioned environment.

Note: For information about the hypervisor, see Chapter 6, “The POWER Hypervisor” on page 217.

Open Firmware

An eServer p5 or OpenPower server has one instance of Open Firmware, both when in the partitioned environment and when running as a full system partition.

Open Firmware has access to all devices and data in the system. Open Firmware is started when the system goes through a power-on reset. Open Firmware, which runs in addition to the hypervisor in a partitioned environment, runs in two modes: global and partition. Each mode of Open Firmware shares the same firmware binary that is stored in the flash memory.

In a partitioned environment, the partition Open Firmware runs on top of the global Open Firmware instance. The partition Open Firmware is started when a partition is activated. Each partition has its own instance of Open Firmware and has access to all the devices assigned to that partition. However, each instance of Open Firmware has no access to devices outside of the partition in which it runs.

Partition firmware resides within the partition memory and is replaced when AIX takes control. Partition firmware is needed only for the time that is necessary to load AIX into the partition system memory. The global firmware resides with the hypervisor firmware in the first 256 MB of the physical memory.

The global Open Firmware environment includes the partition manager component. That component is an application in the global Open Firmware that establishes partitions and their corresponding resources (such as CPU, memory, and I/O slots), which are defined in partition profiles. The partition manager manages the operational partitioning transactions. It responds to commands from the service processor external command interface that originate in the application that is running on the HMC.


To confirm the current firmware level, you can use the lscfg command as follows:

lscfg -vp | grep -p 'Platform Firmware:'
  Platform Firmware:
    ROM Level.(alterable).......RH020930
    Version.....................RS6K
    System Info Specific.(YL)...U1.18-P1-H2/Y1
  Physical Location: U1.18-P1-H2/Y1

This example shows firmware level RH020930.

Note: The firmware level shown in this example might be a different level from that shown on your system.

Figure 2-2 shows the POWER hypervisor environment using Open Firmware.

Figure 2-2 POWER hypervisor

2.2 Partition resources

Logical partitioning allows you to assign dedicated processors or, when you use the Micro-Partitioning feature of eServer p5 and OpenPower systems, to assign processing units to partitions. You can define a partition with a processor capacity as small as 0.10 processing units, which represents 10 percent of a physical processor. You can also assign physical memory and physical I/O devices or virtual I/O devices (SCSI or Ethernet) to partitions.

The following sections give an overview of resource assignments.

2.2.1 Partition and system profiles

The information about resources that are assigned to a partition is stored in a partition profile. Each partition can have multiple partition profiles. By switching from one partition profile to another, you can change how resources are assigned. For example, you can assign relatively small resources to small online transactions on weekdays and assign large resources to high-volume batch transactions on weekends.

To change partition profiles, you must shut down the operating system instance that is running in the partition and stop (deactivate) the partition. You can also define a system profile (for administrative purposes) as an optional task. By using a system profile, you can turn on multiple partitions in a specific order in one operation.

There are two types of profiles: partition and system.

- Partition profile

  A partition profile stores the information about the assigned resources for a specific partition, such as processor, memory, physical I/O devices, and virtual I/O devices (Ethernet, serial, and SCSI). Each partition must have a unique name and at least one partition profile. A partition can have several partition profiles, but it reads only one partition profile when it is started (activated). You select a partition profile when you activate the partition. Otherwise, the default partition profile is used. You can designate any partition profile as the default partition profile. If there is only one partition profile for a partition, it is always the default.

- System profile

  A system profile provides a collection of partition profiles that should be started at the same time. The partition profiles are activated in the order of the list that is defined in the system profile.

Both types of profiles are stored in the non-volatile random access memory (NVRAM) of the server. Although you can create many partition profiles and system profiles, the actual number possible depends on your profile configuration, because both types of profiles share the same memory area in the NVRAM.
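Profiles can also be listed from the HMC command line with the lssyscfg command. The managed system name below is an example only, and the attribute filters are illustrative; treat this as a sketch of typical usage:

lssyscfg -r prof -m p550-SN1234567 -F name,lpar_name     # partition profiles on the managed system
lssyscfg -r sysprof -m p550-SN1234567                    # system profiles on the managed system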

Figure 2-3 on page 34 summarizes the relationship among partitions, partition profiles, and system profiles. In this figure, partition A has three partition profiles, B has one, and C has two. The default partition profile for each partition is represented with a check mark. The system profile X is associated with partition profiles A1, B1, and C2, and the system profile Y is associated with partition profiles A1 and C1. Keep in mind the following points:

- You do not have to associate all the partition profiles with system profiles. In this example, the partition profiles A2 and A3 are not associated with any system profile.
- It is possible to associate a partition profile with multiple system profiles. In this example, the partition profile A1 is associated with system profiles X and Y.

Figure 2-3 Partitions, partition profiles, and system profiles

To create partition profiles and system profiles, use the IBM Hardware Management Console for eServer p5 and OpenPower systems.

2.2.2 Processors

Continuing the evolution of the partitioning technology on pSeries servers, the POWER5 processor improves its flexibility in using partitions. There are two types of partitions in eServer p5 and OpenPower servers. Partitions can have processors dedicated to them, or they can have their processors virtualized from a pool of shared physical processors. This is known as Micro-Partitioning technology. With this technology, both types of partitions can coexist in the same system at the same time.

The concept of Micro-Partitioning technology allows the resource definition of a partition to allocate fractions of processors to the partition. Physical processors have been virtualized to enable these resources to be shared by multiple partitions. There are several advantages associated with this technology, including higher resource utilization and more partitions that can exist concurrently. Micro-Partitioning technology is implemented on eServer p5 servers with AIX 5L Version 5.3 or Linux operating system environments.

A dedicated processor partition, such as the partitions that are used on POWER4 processor-based servers, has an entire processor that is assigned to a partition. These processors are owned by the partition where they are running and are not shared with other partitions. Also, the amount of processing capacity on the partition is limited by the total processing capacity of the number of processors configured in that partition, and it cannot go over this capacity (unless you add or move more processors from another partition using a dynamic LPAR operation).

By default, a powered-off logical partition using dedicated processors will have its processors available to the shared processing pool. When the processors are in the shared processing pool, an uncapped partition that needs more processing power can use the idle processing resources. However, when you turn on the dedicated partition while the uncapped partition is using the processors, the activated partition regains all of its processing resources. If you want to prevent dedicated processors from being used in the shared processing pool, you can disable this function using the logical partition profile properties panels on the Hardware Management Console (see Figure 2-4 on page 36).


Figure 2-4 Allow and disallow shared processor partition utilization authority

Micro-Partitioning technology differs from dedicated processor partitions in that physical processors are abstracted into virtual processors, which are then assigned to partitions. These virtual processors have capacities ranging from 10 percent of a physical processor up to the entire processor. Therefore, a system can have multiple partitions that share the same physical processor and that divide the processing capacity among themselves.
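From within an AIX 5L Version 5.3 partition, the lparstat command reports whether the partition uses dedicated or shared processors, its entitled capacity, and the number of online virtual processors. A brief sketch; the values shown depend on your configuration:

lparstat -i     # look for the Type, Mode, Entitled Capacity, and
                # Online Virtual CPUs fields in the output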

2.2.3 Memory

When discussing memory, it is important to highlight that the new eServer p5 and OpenPower servers and their associated virtualization features have adopted an even more dynamic memory allocation policy than the previous partition-capable pSeries servers. Also, despite this increased flexibility, the underlying fundamentals and mechanisms within a virtual or dedicated logical environment have remained relatively static.


Because the word memory is overused in various contexts, we have provided definitions of the following four terms regarding memory: virtual, physical, real, and logical memory.

The term virtual memory is used in many operating system environments to express the function that enables the operating system to act as though it were equipped with a larger memory size than it physically has.

Because each process should be isolated from the other processes, each has its own virtual memory address range, called the process address space. Each process address space is classified into several memory regions named segments. Each segment is again divided into small size memory regions, named pages.

Because not all of the virtual memory can sit in the physical memory in the system, only some portions of virtual memory are mapped to physical memory. The rest of the virtual memory is divided by page size. Each page can be mapped to a disk block in paging spaces or can reside in a block of files in the file systems. This address translation is managed by the virtual memory manager (VMM) of the operating system using hardware components, such as the hardware page frame table and translation look-aside buffer.

The term real memory is often used to represent the physical memory, especially when discussing the VMM functionality in the kernel. The modifier real comes from the real addressing mode defined in some processor architectures (including PowerPC®), where address translation is turned off. In a non-partitioned environment, because there is a one-to-one relationship between the real and physical memory, the difference between these two terms can be ignored in most cases.

The physical address space must encompass all addressable hardware components, such as memory cards, I/O ports, bus memory, and so on. Depending on the hardware implementation and restrictions, address ranges might need to be dispersed throughout the physical address space, which could result in a discontinuous physical memory address space. For example, if a PCI adapter device requires DMA, the device’s DMA address is mapped on the specific physical memory address range by a PCI host bridge. Most VMMs of modern operating systems are designed to handle non-contiguous physical memory addresses. However, operating systems require a certain amount of contiguous physical memory that can be addressed as translate-off, typically for bootstrapping, in a non-partitioned environment.

In a partitioned environment, real and physical memories must be distinguished. The physical memory address, which previously meant the real memory address, is no longer used in that way, because there is an extra level of addressing in a partitioned environment.


To support any operating system, including AIX and Linux, that requires real mode code execution and the ability to present a real address space starting at zero to each partition in the system, the logical memory concept is adopted.

Logical memory is an abstract representation that provides a contiguous memory address to a partition. Multiple non-contiguous physical memory blocks are mapped to provide a contiguous logical memory address space. The logical address space provides the isolation and security of the partition operating system from direct access to physical memory, allowing the hypervisor to police valid logical address ranges assigned to the partition. The contiguous nature of the logical address space is used more to simplify the hypervisor’s per-partition policing than because it is an operating system requirement. The operating system’s VMM handles the logical memory as though it were physical memory in a non-partitioned environment.

Important: For eServer p5 and OpenPower systems, the size of the minimum logical memory block has been reduced from 256 MB to 16 MB to facilitate smaller partitions.

In a partitioned environment, some of the physical memory areas are reserved by several system functions to enable partitioning in the partitioning-capable pSeries server. You can assign unused physical memory to a partition. You do not have to specify the precise address of the assigned physical memory in the partition profile, because the system selects the resources automatically.

From the hardware perspective, the minimum amount of physical memory for each partition is 128 MB, but in most cases AIX needs 256 MB of memory. After that, you can assign further physical memory to partitions in increments of 16 MB.

The AIX VMM manages the logical memory within a partition as it does the real memory in a stand-alone pSeries server. The hypervisor and the POWER5 processor manage access to the physical memory.
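From inside an AIX partition, the logical memory presented by the hypervisor is reported just as real memory is on a stand-alone server. A small sketch of how you might inspect it; the values returned are, of course, system specific:

lsattr -El sys0 -a realmem     # memory size, in KB, as seen by this partition
lsattr -El mem0                # size and goodsize attributes of the memory object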

Memory requirements for partitions depend on the partition configuration, I/O resources assigned, and applications used. Memory can be assigned in increments of 16 MB, 32 MB, 64 MB, 128 MB, and 256 MB. The default memory block size varies according to the amount of configurable memory in the system, as shown in Table 2-1 on page 39.


Table 2-1 Default memory block sizes

Amount of configurable memory       Default memory block size
Less than 4 GB                      16 MB
Greater than 4 GB, up to 8 GB       32 MB
Greater than 8 GB, up to 16 GB      64 MB
Greater than 16 GB, up to 32 GB     128 MB
Greater than 32 GB                  256 MB

The default memory block size can be changed by using the Logical Memory Block Size option in the Advanced System Management Interface. To change the default memory block size, you must be a user with administrator authority, and you must shut down and restart the managed system for the change to take effect. If the minimum memory amount in any partition profile on the managed system is less than the new default memory block size, you must also change the minimum memory amount in the partition profile.

Depending on the overall memory in your system and the maximum memory values that you choose for each partition, the server firmware must have enough memory to perform logical partition tasks. Each partition has a Hardware Page Table (HPT). The size of the HPT is based on an HPT ratio of 1/64 and is determined by the maximum memory values that you establish for each partition.

Server firmware requires memory to support the logical partitions on the server. The amount of memory that is required by the server firmware varies according to several factors. Factors influencing server firmware memory requirements include:

- Number of logical partitions
- Partition environments of the logical partitions
- Number of physical and virtual I/O devices used by the logical partitions
- Maximum memory values given to the logical partitions

Generally, you can estimate the amount of memory that is required by server firmware to be approximately eight percent of the system installed memory. The actual amount that is required will generally be less than eight percent. However, there are some server models that require an absolute minimum amount of memory for server firmware, regardless of the previously mentioned considerations.


When selecting the maximum memory values for each partition, consider the following:

- The maximum values affect the HPT size for each partition
- The logical memory map size for each partition
- Using the LPAR Validation Tool to provide the actual memory that is used by firmware

Note: The Hardware Page Table is created based on the maximum values that are defined in the partition profile.

2.2.4 Physical I/O slots

Physical I/O devices are assignable to partitions on a PCI slot (physical PCI connector) basis. It is not the PCI adapters in the PCI slots that are assigned as partition resources, but the PCI slots into which the PCI adapters are plugged.

When using physical I/O devices to install an operating system, you have to assign at least one adapter, typically a SCSI adapter that is able to boot the operating system, and an adapter to access the installation media. Instead of physical I/O devices, you can assign a virtual I/O device that behaves like a physical I/O device.

Once installed, you need at least one physical device adapter that is connected to the boot disk or disks. For application use and system management purposes, you also have to assign at least one physical network adapter. You can allocate physical slots in any I/O drawer on the system.

If you must add physical PCI slots to a running partition, you have two possibilities:

1. You can run a DLPAR operation from the HMC to add or to move an empty PCI slot to the partition. After successful addition, you use the AIX PCI Hot Plug Manager to add a PCI hot plug adapter (see the example after this list).

2. You can assign more PCI slots than required for the number of adapters in the partition, even if these PCI slots are not populated with PCI adapters. This provides you with the flexibility to add PCI adapters into the empty slots of an active partition, using the PCI Hot Plug insertion and removal capability.
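To identify the hot-plug PCI slots that a partition currently owns before using either procedure, you can use the AIX lsslot command. This is a sketch of typical usage only; the output depends on the slots assigned to your partition:

lsslot -c pci      # list the hot-plug PCI slots owned by this partition and any adapters in them
lsslot -c slot     # list all dynamically reconfigurable slots known to the partition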

2.2.5 Virtual I/O

Virtual I/O allows the eServer p5 and OpenPower servers to support more partitions than they have slots for I/O devices by enabling the sharing of I/O adapters between partitions.


Virtual Ethernet enables a partition to communicate with other partitions without the need for an Ethernet adapter. A shared Ethernet adapter, supported by the Virtual I/O Server, allows a shared path to an external network.
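On the Virtual I/O Server, a shared Ethernet adapter is created with the mkvdev command by bridging a physical Ethernet adapter to a virtual Ethernet adapter. The device names and VLAN ID below are examples only; treat this as a minimal sketch:

mkvdev -sea ent0 -vadapter ent2 -default ent2 -defaultid 1
# ent0 is the physical adapter, ent2 the virtual Ethernet adapter of the VIOS partition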

Virtual SCSI enables a partition to access block-level storage that is not a physical resource of that partition. With the Virtual SCSI design, the virtual storage is backed by a logical volume on a portion of a disk or an entire physical disk. These logical volumes appear to be the SCSI disks on the client partition, which gives the system administrator maximum flexibility in configuring partitions.

Virtual SCSI support is provided by a service running in an I/O server that uses two primitive functions:

- Reliable Command / Response Transport
- Logical Remote DMA to service I/O requests for an I/O client, such that the I/O client appears to enjoy the services of its own SCSI adapter

The term I/O server refers to platform partitions that are servers of requests, and I/O client refers to platform partitions that are clients of requests, usually I/O operations, that use the I/O server's I/O adapters. This allows a platform to have more I/O clients than it may have I/O adapters, because the I/O clients share I/O adapters using the I/O server.
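On the Virtual I/O Server, the mapping of backing storage to a client partition is performed with the mkvdev command and can be verified with lsmap. The logical volume, vhost adapter, and device names below are illustrative only; a sketch of typical usage, assuming the backing logical volume already exists:

mkvdev -vdev rootvg_client1 -vadapter vhost0 -dev vtscsi0   # map the backing device to the client adapter
lsmap -vadapter vhost0                                      # display the resulting virtual target device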

2.2.6 Minimum, desired, and maximum values

In a partition profile, you need to specify three kinds of values for each resource.

For memory, you must specify minimum, desired, and maximum values.

For processors, you define whether you use dedicated or shared processors. If you choose to use dedicated processors, you can specify minimum, desired, and maximum values. For shared processors, you need to specify minimum, desired, and maximum values for both processing units and virtual processors.

Note: If you define the maximum value of virtual processors as one, you cannot assign more than 1.00 processing units to that partition without changing the maximum value of virtual processors in that partition's profile on the HMC.

For physical and virtual I/O slots, you must specify the required and desired values.


Note: You cannot move or remove an I/O slot if it is defined as required in the partition profile. First, you must change its state from required to desired. After that, you must run the AIX rmdev command to remove the PCI slot from the running operating system.
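A sketch of that sequence on a running AIX partition follows; the slot and adapter names are examples only and will differ on your system:

lsslot -c pci          # identify the slot and the adapter in it (for example, ent2)
rmdev -dl ent2 -R      # unconfigure and delete the adapter and its child devices

After the devices are removed from the operating system, the slot can be moved or removed with a DLPAR operation on the HMC.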

If any of the three types of resources cannot satisfy the specified minimum and required values, the activation of a partition fails. If the available resources satisfy all the minimum and required values but do not satisfy the desired values, the activated partition gets as many of the resources as are available.

The maximum value is used to limit the maximum processor and memory resources when dynamic logical partitioning operations are performed on the partition.

Note: The maximum memory value also affects the size of the partition page table.

2.3 Resource planning using LPAR Validation Tool

The LPAR Validation Tool (LVT) is a tool that helps you to validate the resources that are assigned to LPARs. The LVT was designed specifically for the latest eServer p5 servers. As partitioned environments grow increasingly complex, the LVT should be your first resource to determine how a system can be effectively partitioned.

Note: At the time of the writing of this book, the LVT did not include the OpenPower models.

LVT is a Java™-based tool that is loaded on a Microsoft® Windows® 95 or above workstation with at least 128 MB of free memory. Its footprint on disk, at the time of the writing of this book, is about 47 MB. It includes an IBM Java Runtime Environment 1.4. The installation adds an icon to the desktop.

For information, including user’s guide and download information, see: http://www.ibm.com/servers/eserver/iseries/lpar/systemdesign.htm

During the development of this book, the LVT was receiving regular updates. Installation only required a couple of minutes, with most of the time devoted to downloading the code. An update, available as a separate file, brings the base code to the latest level and is significantly smaller in size.


2.3.1 System Selection dialog

After the tool is launched, you can create a new configuration or load an existing one, as shown in Figure 2-5.

Figure 2-5 LPAR Validation Tool, creating a new partition

When you open a new configuration, you select the basic attributes of the machine that you plan to validate, as shown in Figure 2-6 on page 44.


Figure 2-6 LPAR Validation Tool, System Selection dialog

Hold your cursor over a field, and additional information is provided, as shown in Figure 2-7.

Figure 2-7 LPAR Validation Tool, System Selection processor feature selection


2.3.2 Memory Specification dialog

After you complete the System Selection fields, you enter the memory specifications for each of the logical partitions that you previously specified (Figure 2-8).

Figure 2-8 LPAR Validation Tool, Memory Specifications dialog

As you enter the memory specifications, the unallocated memory and the amount that is required by the hypervisor are shown. These values increase as the number of virtual devices defined increases. Internal validation prevents configurations that do not have enough resources.

This tool answers a common question in planning: “What is the memory usage of the POWER Hypervisor?”


2.3.3 LPAR Validation dialog

The final dialog enables you to assign features to the various slots defined, as shown in Figure 2-9.

Figure 2-9 LPAR Validation Tool, slot assignments

From this screen, you can view a detailed report and select a validation engine to point out any errors in the configuration. If changes to the memory configuration are required, you can edit the configuration and change the values in error. When the configuration has been validated without error, you can feel confident that the resources that you selected will provide the configuration desired. At this point, if you choose, you can configure the system in an ordering tool or through a miscellaneous equipment specification upgrade of an existing machine, with the additional resources that are required to handle a new LPAR configuration.

2.4 I/O device assignment considerations

With the introduction of the Advanced POWER Virtualization feature, the number of available I/O options and scenarios has increased considerably. To ensure that the maximum benefits from all available partition environments are gained from the implementation or upgrade, you need a greater understanding of the hardware I/O architectural platform.


The hardware limitations of each eServer p5 and OpenPower server model also influence aspects of the implementation or upgrade planning.

2.4.1 Media devices

If your installation media is removable media (for example, CD-ROM, DVD-RAM, or 4 mm tape), the corresponding devices should be configured. However, the configuration of removable media devices depends on the hardware architecture of eServer p5 and OpenPower servers, as described in this section.

p5-520 and p5-550, OpenPower 720

These servers, including rack-mounted and deskside models, support three non-hot-swappable media bays which are used to accommodate additional devices.

Two media bays only accept slim-line media devices, such as IDE DVD-ROM (FC 2640) or DVD-RAM (FC 5751) drives, and one half-height bay can be used for a tape drive. However, there are several device-reassignment operations that are required on the HMC in order to use these devices as the installation media device on these models, if you use the servers in a dedicated partition environment. You can also configure the following SCSI-attached tape devices on these models:

- (FC 6120): 8 mm 80/160 GB tape drive
- (FC 6134): 8 mm 60/150 GB tape drive
- (FC 6258): 4 mm 36/72 GB tape drive

p5-570 (Rack-mounted model)

As the hardware design of this server is based on modular building blocks, each Central Electronics Complex component supports two media bays, which accept the optional slim-line media devices, such as IDE DVD-ROM (FC 2640) or DVD-RAM (FC 5751) drives.

Note: The maximum number of media devices in a fully configured p5-570 is eight.

p5-590 and p5-595

These servers can be configured with an optional storage device enclosure (FC 7212-102), which can only be mounted in a 19-inch rack. This enclosure contains two media bays, which can support any combination of the following IDE or SCSI devices:

- (FC 1103): DVD-RAM drive
- (FC 1104): 8 mm VXA-2 tape drive
- (FC 1105): 4 mm DAT72 DDS-4 tape drive kit
- (FC 1106): DVD-ROM drive

Note: One of the media bays must contain an optical drive.

Each of these servers can be configured with a USB diskette drive (FC 2591), which can be supported on either an integrated or external USB adapter. A USB adapter (FC 2738) is required for an external USB port connection.

Note: At the time of the writing of this book, media devices cannot be virtualized.

If your installation media is in the network, one of the following network adapters must be assigned to the partition:

- Ethernet
- Token ring

2.4.2 Boot device considerations

The following sections describe the boot device considerations for partitions.

Full system partition and dedicated partitions

When implementing either of these partitioning models, each partition requires its own separate boot device. Therefore, you must assign at least one boot device and a corresponding adapter per partition. The eServer p5 and OpenPower servers support boot devices connected with SCSI, SSA, and Fibre Channel adapters. Boot over network is also available as an operating system installation option.
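Once the operating system is installed, the boot device order of an AIX partition can be checked or changed with the bootlist command. The disk names below are examples only; a brief sketch:

bootlist -m normal -o                 # display the current normal-mode boot list
bootlist -m normal hdisk0 hdisk1      # set the boot list, for example to mirrored boot disks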

p5-520

Both rack-mounted and deskside models support up to eight internal SCSI disk drives, which are housed in two 4-pack disk bays. In the base configuration, each 4-pack is connected to one of the two ports on the integrated SCSI controller. To an LPAR, the entire SCSI controller (including all disks attached to both ports) will be seen as P1-T10, and therefore can only be assigned to one active LPAR at a time. To provide additional drives for a second LPAR, either virtual I/O or an optional PCI SCSI adapter feature should be used. The internal disk drive bays can be used in two different modes, depending on whether the SCSI RAID Enablement Card (FC 5709) is installed.

The other partitions must be assigned a boot adapter and disk drive from the following options:

- A boot adapter inserted in one of six PCI-X slots in the system. A bootable external disk subsystem is connected to this adapter.
- A bootable SCSI adapter inserted in PCI-X slot 7, or two SCSI adapters, one in PCI-X slot 5 and one in PCI-X slot 7, in a 7311-D20 I/O drawer connected to the system. The adapter(s) is connected to one of the 6-packs of disk bays of the drawer that houses the boot disk drive.
- A boot adapter inserted in one of seven PCI-X slots in a 7311-D20 I/O drawer connected to the system. A bootable external disk subsystem is connected to this adapter.

Note: The p5-520 models support up to four 7311-D20 I/O drawers.

p5-550 and OpenPower 720

Both rack-mounted and desk-side models support up to eight internal SCSI disk drives, which are housed in two 4-pack disk bays. To an LPAR, the entire SCSI controller (including all disks attached to both ports) will be seen as P1-T10, and therefore can only be assigned to one active LPAR at a time. To provide additional drives for a second LPAR, either virtual I/O or an optional PCI SCSI adapter feature should be used. The boot adapter and boot disk drive must be assigned using one of the following options:

- A boot adapter inserted in one of five PCI-X slots in the system. The adapter could then be connected to the second internal SCSI 4-pack in the system.
- A boot adapter inserted in one of five PCI-X slots in the system. A bootable external disk subsystem is connected to this adapter.
- A bootable SCSI adapter (which can have various features) inserted in PCI-X slot 7, or two SCSI adapters, one in PCI-X slot 5 and one in PCI-X slot 7, in a 7311-D20 I/O drawer connected to the system. The adapter(s) is connected to one of the 6-packs of disk bays of the drawer that houses the boot disk drive.
- A boot adapter inserted in one of seven PCI-X slots in a 7311-D20 I/O drawer connected to the system. A bootable external disk subsystem is connected to this adapter.

Note: The p5-550 and OpenPower 720 models support up to eight 7311-D20 I/O drawers.


p5-570

Each system drawer can contain up to six internal disks, which are housed in one split 6-pack disk drive bay. These disks are connected to two Ultra320 internal SCSI controllers with dual ports, allowing each of the 3-packs to be assigned to a unique partition. Additional partitions must be assigned a boot adapter and disk drive from the following options:

- A boot adapter inserted in one of five PCI-X slots in the system. A bootable external disk subsystem is connected to this adapter.
- 7311-D10: this drawer supports five hot-plug 64-bit 133 MHz 3.3 V PCI-X slots and one hot-plug 64-bit 33 MHz 5 V PCI slot. All slots are full length with blind-swap cassettes.
- 7311-D11: a boot adapter inserted in one of six PCI-X slots in this drawer connected to the system. A bootable external disk subsystem can be connected to this adapter. This drawer supports six hot-plug 64-bit 133 MHz 3.3 V PCI-X slots, full length, with enhanced blind-swap cassettes.
- 7311-D20: a bootable SCSI adapter (which can have various features) inserted in PCI-X slot 7, or two SCSI adapters, one in PCI-X slot 5 and one in PCI-X slot 7, in a 7311-D20 I/O drawer connected to the system. The adapter(s) is connected to one of the 6-packs of disk bays of the drawer that houses the boot disk drive.
- A boot adapter inserted in one of seven PCI-X slots in a 7311-D20 I/O drawer connected to the system. A bootable external disk subsystem is connected to this adapter.

Note: The p5-570 model can support up to a total combination of 20 7311-D10, 7311-D11, and 7311-D20 I/O drawers.

p5-590 and p5-595

Partitions must be assigned a boot adapter and disk drive from the following options:

- An internal disk drive inserted in one of the 4-pack disk bays on an I/O drawer and the SCSI controller on the drawer. The 7040-61D I/O drawer (FC 5791) can have up to 16 internal SCSI disk drives in the four 4-pack disk bays. Each of the disk bays is connected to a separate internal SCSI controller on the drawer.
- A boot adapter inserted in one of 20 PCI-X slots in a 7040-61D I/O drawer connected to the system. A bootable external disk subsystem is connected to this adapter.

You should select the adapter of the boot device from the PCI-X slots of the system or the first I/O drawer if the system is running as a full system partition, because the system locates the boot device faster. In a partitioned environment, the placement of the boot adapter does not affect the boot speed of the partition.

The following points apply to the p5-590 and p5-595 models:

- The p5-590 supports up to eight 7040-61D I/O drawers, and the p5-595 supports up to 12. The minimum hardware configurations of these models require at least one I/O drawer.
- Existing 7040-61D I/O drawers may be attached to a p5-595 server as additional I/O drawers. Each 16-way processor book includes six Remote I/O-2 attachment cards for connection of the system I/O drawers.
- The parallel ports on these models are not supported in a partitioned environment.

Virtual I/O server

The Virtual I/O Server complements the eServer p5 server's Micro-Partitioning technology. The need to adequately meet the flexible I/O requirements of up to 254 logical partitions has driven the development of Virtual SCSI and Virtual Ethernet. The following sections outline what effect the Virtual I/O Server has on the boot device capabilities of partitions in the shared resource pool.

Virtual SCSI disks

Virtual SCSI facilitates the sharing of physical disk resources (I/O adapters and devices) between the VIOS and the client partitions. Partitions must be assigned a SCSI adapter and disk drive(s) as follows:

- One or more client SCSI adapters from the available candidates on the HMC.
- One or more logical volumes, which appear as real disk devices (hdisks).

For redundancy and high availability, consider mirroring AIX and Linux partition operating system disks across multiple virtual disks.

Note: Once a virtual disk is assigned to a client partition, the Virtual I/O Server must be available before the client partitions are able to boot.
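From the client partition, a virtual SCSI disk is configured and used like any other disk. The following sketch shows how you might verify it; device names are examples, and the description text can vary by AIX level:

lsdev -Cc disk       # virtual disks appear as hdisks, typically described as Virtual SCSI Disk Drive
lscfg -vl hdisk0     # show the location code and the parent virtual SCSI client adapter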

Virtual Ethernet

When installing or maintaining partitions with a Virtual Ethernet adapter, the AIX Network Installation Manager, Network Installation Manager on Linux, and Cluster Server Manager applications operate in the same manner as they would with a dedicated Ethernet adapter assigned to the partition(s). Virtual Ethernet does not require the Virtual I/O Server.


2.4.3 Network devices

It is mandatory to assign a network adapter to each partition. In addition to providing network access to client systems of a partition, the connection is also needed to provide the capability to manage the operating system and the applications in the partition remotely, either with a telnet session or a graphical user interface, such as the Web-based System Manager. An Ethernet network connection between partitions and the HMC must be available if you want to use the following services:

- Service Agent
- Service Focal Point
- Inventory Scout
- Dynamic logical partitioning
- Partition Load Manager

These services communicate over the TCP/IP network between the partitions and the HMC.

2.4.4 Graphics console

If you need direct console access to a partition without using the network, the partition must be assigned a graphics console. A graphics console is available on a partition by configuring the following features on the partition:

- A graphics adapter (FC 2849) with a graphics display
- A USB keyboard and mouse adapter (FC 2738) with a USB keyboard and a USB mouse attached

Only one graphics console is supported per partition. The graphics console is functional only when AIX is running. For any installation or service processor support functions, you have to use the virtual terminal function on the HMC.

2.4.5 High availability

You should place redundant devices of a partition in separate I/O drawers, where possible, for highest availability. For example, if two Fibre Channel adapters support multipath I/O to one logical unit number, and if one path fails, the device driver chooses another path using another adapter in another I/O drawer automatically.

Some PCI adapters do not have enhanced error handling capabilities built in to their device drivers. If these devices fail, the PCI host bridge in which they are placed and the other adapters in this PCI host bridge are affected. Therefore, it is strongly recommended that you place all adapters without enhanced error handling capabilities on their own PCI host bridge and that you do not assign these adapters on the same PCI host bridge to different partitions.

2.5 LPAR limitations and considerations

Consider the following limitations when implementing shared processor partitions:

- The minimum size of a shared processor partition is 0.1 processing units of a physical processor. So, the number of shared processor partitions you can create for a system depends mostly on the number of processors in the system.
- The system architecture is designed to support a maximum number of 254 partitions.
- In a partition, there is a maximum number of 64 virtual processors.
- A mix of dedicated and shared processors within the same partition is not supported.
- If you dynamically remove a virtual processor, you cannot specify a particular virtual CPU to be removed. The operating system will choose the virtual CPU to be removed.
- Shared processors can make AIX affinity management less effective. AIX continues to utilize affinity domain information as provided by firmware to build associations of virtual processors to memory and continues to show preference to re-dispatching a thread to the virtual CPU that it last ran on.

You should carefully consider the capacity requirements of online virtual processors before choosing values for their attributes. Virtual processors have dispatch latency, because they are scheduled. When a virtual processor is made runnable, it is placed on a run queue by the hypervisor, where it waits until it is dispatched. The time between these two events is referred to as dispatch latency.

The dispatch latency of a virtual processor depends on the partition entitlement and the number of virtual processors that are online in the partition. The capacity entitlement is equally divided amongst these online virtual processors, so the number of online virtual processors impacts the length of each virtual processor's dispatch. The smaller the dispatch cycle, the greater the dispatch latency.

At the time of the writing of this book, the worst case virtual processor dispatch latency is 18 milliseconds, because the minimum dispatch cycle that is supported at the virtual processor level is one millisecond. This latency is based on the minimum partition entitlement of 1/10 of a physical processor and the 10 millisecond rotation period of the hypervisor's dispatch wheel. It can be easily visualized by imagining that a virtual processor is scheduled in the first and last portions of two 10 millisecond intervals. In general, if these latencies are too great, then clients may increase entitlement, minimize the number of online virtual processors without reducing entitlement, or use dedicated processor partitions.

In general, the value of the minimum, desired, and maximum virtual processor attributes should parallel those of the minimum, desired, and maximum capacity attributes in some fashion. A special allowance should be made for uncapped partitions, because they are allowed to consume more than their entitlement.

If the partition is uncapped, then the administrator may want to define the desired and maximum virtual processor attributes x percent above the corresponding entitlement attributes. The exact percentage is installation specific, but 25 to 50 percent is a reasonable number.

Table 2-2 lists several reasonable settings for the number of virtual processors, processing units, and the capped and uncapped mode.

Table 2-2 Reasonable settings for shared processor partitions

Min VPs (a)   Desired VPs   Max VPs   Min PU (b)   Desired PU   Max PU   Capped
1             2             4         0.1          2.0          4.0      Y
1             3 or 4        6 or 8    0.1          2.0          4.0      N
2             2             6         2.0          2.0          6.0      Y
2             3 or 4        8 or 10   2.0          2.0          6.0      N

a - Virtual processors
b - Processing units

Operating systems and applications that are running in shared partitions need not be aware that they are sharing processors. However, overall system performance can be significantly improved by minor operating system changes. AIX 5L Version 5.3 provides support for optimizing overall system performance of shared processor partitions.
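A simple way to watch how a shared processor partition consumes its entitlement is to run lparstat in interval mode. This is a sketch; the column names mentioned reflect typical AIX 5L Version 5.3 output:

lparstat 2 5      # five samples at 2-second intervals; the %entc column shows the percentage
                  # of entitled capacity consumed and physc the physical processors consumed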

In a shared partition, there is not a fixed relationship between the virtual processor and the physical processor. Virtual processors have the concept of a home physical processor: the hypervisor tries to use a physical processor with the same memory affinity as the virtual processor, but it is not guaranteed. If the hypervisor cannot find a physical processor with the same memory affinity, it gradually broadens its search to include processors with weaker memory affinity, until it finds one that it can use. As a consequence, memory affinity is expected to be weaker in shared processor partitions.


Workload variability is also expected to be increased in shared partitions, because there are latencies associated with the scheduling of virtual processors and interrupts. SMT may also increase variability, because it adds another level of resource sharing, which could lead to a situation where one thread interferes with the forward progress of its sibling.

Therefore, if an application is cache sensitive or cannot tolerate variability, it should be deployed in a dedicated partition with SMT disabled. In dedicated partitions, the entire processor is assigned to a partition. Processors are not shared with other partitions, and they are not scheduled by the hypervisor. Dedicated partitions must be explicitly created by the system administrator using the HMC.

Processor and memory affinity data is only provided in dedicated partitions. In a shared processor partition, all processors are considered to have the same affinity. Affinity information is provided through RSET APIs, which contain discovery and bind services.

