Dell AX-7525 Owner's Manual

Dell Integrated System for Microsoft Azure Stack HCI: End-to-End Deployment with Stretched Cluster Infrastructure

Deployment Guide

Abstract

This end-to-end deployment guide provides an overview of the Microsoft Azure Stack HCI operating system and guidance on how to deploy stretched clusters in your environment. This guide includes procedures for Day One operations.

Dell Technologies Solutions

Part Number: H19287

December 2022

Notes, cautions, and warnings

NOTE: A NOTE indicates important information that helps you make better use of your product.

CAUTION: A CAUTION indicates either potential damage to hardware or loss of data and tells you how to avoid the problem.

WARNING: A WARNING indicates a potential for property damage, personal injury, or death.

© 2022 - 2022 Dell Inc. or its subsidiaries. All rights reserved. Dell Technologies, Dell, and other trademarks are trademarks of Dell Inc. or its subsidiaries. Other trademarks may be trademarks of their respective owners.

Contents

Chapter 1: Introduction
Document overview
Audience and scope

Chapter 2: Solution overview
Introduction
Stretched clusters and Storage Replica
Solution integration and network architecture

Chapter 3: Solution deployment
Introduction
PAL and DPOR registration for Azure Stack
Install roles and features
Configuring a cluster witness
Deployment prerequisites for stretched clusters
Customer network team requirements
Design principles and best practices
Validated network topology

Chapter 4: Creating a stretched cluster
Introduction
Test-Cluster
Cluster creation
Volumes
Storage efficiency
Test-SRTopology

Chapter 5: Virtual Machines
Introduction
VM and storage affinity rules
Preferred sites

Chapter 6: Failure/Recovery from failure of Site/Node
Planned failover
Operation steps

Chapter 7: Appendices
Appendix A: Sample PowerShell cmdlets for end-to-end deployment
Appendix B: Supported hardware

Chapter 8: Day 1 operations
Day 1 operations overview
Known issues
Microsoft HCI Solutions from Dell Technologies overview
Deployment guidance
Azure onboarding for Azure Stack HCI operating system
Creating virtual disks

Chapter 9: Managing and monitoring clusters with Windows Admin Center
Overview
Install Windows Admin Center
Add the HCI cluster connection
Access the HCI cluster
View server details
View drive details
Managing and monitoring volumes
Creating volumes in Storage Spaces Direct
Managing volumes
Enabling data deduplication on Storage Spaces Direct
Monitoring and managing VMs
Managing virtual switches

Chapter 10: Dell OpenManage Integration with Windows Admin Center
Overview
Prerequisites for managing AX nodes
Installing the OMIMSWAC license
Managing Microsoft HCI-based clusters
Overview
Prerequisite checks
Health status
Inventory
Locating physical disks and viewing their status
Viewing update compliance and updating the cluster
Full Stack Cluster-Aware Offline Updating
Full Stack Cluster-Aware Updating for Azure Stack HCI clusters using the OpenManage Integration snap-in
Updating a standalone node before adding it to the cluster
Secure cluster with Secured-core
Enabling operating system features
Protect your infrastructure with infrastructure lock
Enable or disable infrastructure lock
Manage CPU cores in Azure Stack HCI clusters
Cluster expansion
Validate and remediate Azure Stack HCI clusters
View HCP compliance summary
Onboard Dell policies to Azure Arc from Windows Admin Center to manage Azure Stack HCI clusters
View recommendations for storage expansion
View node level storage configuration details
Known issues

Chapter 11: Updates and maintenance
Annual feature update for an Azure Stack HCI Solution
Recommended methods for feature update
Firmware and driver updates using the manual method
Preparing for maintenance operations
Placing an AX node in maintenance mode
Obtaining the firmware catalog for AX nodes or Ready Nodes using Dell Repository Manager
Updating the AX node by using iDRAC out of band
Updating the out-of-box drivers
Exiting the AX node from maintenance mode
Restarting a cluster node or taking a cluster node offline
Expanding the Azure Stack HCI cluster
Azure Stack HCI node expansion
Storage Spaces Direct storage expansion
Extending volumes
Performing AX node recovery
Configuring RAID for operating system drives
Operating system recovery
Manual operating system recovery
Factory operating system recovery
iDRAC Service Module (iSM) for AX nodes and Storage Spaces Direct Ready Nodes
FullPowerCycle

Chapter 1: Introduction

This chapter presents the following topics:

● Document overview

● Audience and scope

Document overview

This end-to-end deployment guide provides an overview of the Microsoft Azure Stack HCI operating system and guidance on how to deploy stretched clusters in your environment. The guide provides network topology references and best practices to consider during a stretched cluster deployment.

The Microsoft Azure Stack HCI operating system can be deployed in both standalone and stretched cluster environments.

For the deployment steps for a standalone cluster and end-to-end deployment steps with network and host configuration options, see the network integration and host network configuration options reference guide at https://infohub.delltechnologies.com/t/reference-guide-network-integration-and-host-network-configurationoptions-1/.

Dell Technologies offers integrated systems with the new Azure Stack HCI operating system. This guide applies to select configurations of the integrated systems built using AX nodes.

Audience and scope

This guide is for systems engineers, field consultants, partner engineering team members, and customers with knowledge of deploying hyperconverged infrastructures (HCIs) with Windows Server operating systems and the newly released Azure Stack HCI operating system.

Customer site-to-site networking configuration and guidance is outside the scope of this document.

Assumptions

This guide assumes that deployment personnel have:

● Knowledge of AX nodes from Dell Technologies.

● Experience of configuring BIOS and integrated Dell Remote Access Controller (iDRAC) settings.

● Advanced knowledge of deploying and configuring Windows Servers and Hyper-V infrastructure.

● Experience with deploying and configuring Storage Spaces Direct Solutions with Windows Server or Azure Stack HCI.

● Familiarity with customer site-to-site networking, including enabling and configuring the necessary static routes or inter-site bandwidth throttling (if needed) according to the stretched cluster requirement.

Chapter 2: Solution overview

This chapter presents the following topics:

● Introduction

● Solution integration and network architecture

Introduction

Dell Solutions for Azure Stack HCI offers stretched cluster solutions with AX nodes from Dell Technologies. Built using industry-leading PowerEdge servers, AX nodes offer fully validated HCI nodes for a variety of use cases. A robust set of configurations and different models allows you to customize your infrastructure for application performance, capacity, or deployment location requirements.

Stretched clusters and Storage Replica

An Azure Stack HCI stretched cluster solution is a disaster recovery solution that provides an automatic failover capability to restore production quickly, with little or no manual intervention. Storage Replica, a Windows Server technology, enables replication of volumes between servers across sites for disaster recovery. For more information, see Storage Replica overview.

A stretched cluster with Azure Stack HCI consists of servers residing at two different locations or sites, with each site having two or more servers, replicating volumes either in synchronous or asynchronous mode. For more information, see Stretched clusters overview.

A stretched cluster can be set up as either Active-Active or Active-Passive. In an Active-Active setup, both sites will actively run VMs or applications; therefore the replication is bidirectional. In an Active-Passive setup, one site is always dormant unless there is a failure or planned downtime.

Sites can be on the same campus or in different places. Stretched clusters using two sites provide disaster recovery if a site experiences an outage or failure.

The following figure shows an Active-Active setup:

Figure 1. An Active-Active setup

Sites can be logical or physical. For logical sites, a stretched cluster can exist on single or multiple racks or in different rooms in the same data center. For physical sites, the stretched cluster can be in different data centers on the same campus or in different cities or regions. Stretched clusters using two physical sites provide disaster recovery and business continuity should a site suffer an outage.

Solution integration and network architecture

Dell Solutions for Azure Stack HCI stretched clusters offer distinct network topologies that are validated with the following stretched cluster configurations:

● Basic configuration

● High throughput configuration

The basic configuration uses a network topology that requires minimal changes to a traditional single-site Azure Stack HCI configuration. This configuration uses a single network/fabric for management, VM, and replication traffic, keeping host networking simple. The customer network team must configure quality of service (QoS) on an external firewall or routers to throttle inter-site bandwidth and thereby ensure that Replica/VM traffic does not saturate the Management network.

The high throughput configuration suits dense customer environments that involve higher write IOPS than a basic configuration. This configuration requires a dedicated channel (network interface cards (NICs) or fabric) for Replica traffic (using SMB-Multichannel). This network topology should be used only if inter-site bandwidth is higher than 10 Gbps. The network team must configure multiple static routes on the host to ensure that Replica traffic uses the dedicated channel that has been created for it. If the customer environment does not use Border Gateway Protocol (BGP) at the ToR layer, static routes are needed on the L2/L3 to ensure that the Replica networks reach the intended destination. Subsequent sections of this guide provide more information about the expectations of customer networking teams.
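As a rough illustration of the host-side static routes mentioned above, the following sketch adds one route with the built-in New-NetRoute cmdlet. The subnet, adapter alias, and next-hop address are placeholder assumptions and do not come from this guide; the customer network team supplies the real values, and the number of routes depends on the chosen configuration.

```powershell
# Placeholder values (assumptions): replace with the prefix, adapter alias,
# and gateway that the customer network team provides.
$replicaRoute = @{
    DestinationPrefix = '192.168.202.0/24'   # Replica subnet at the remote site
    InterfaceAlias    = 'Replica1'            # dedicated Replica adapter on this host
    NextHop           = '192.168.201.1'       # local gateway for the Replica network
}

# Add the static route on this host.
New-NetRoute @replicaRoute

# Confirm that the route is present.
Get-NetRoute -DestinationPrefix $replicaRoute.DestinationPrefix
```

Repeat the command on each node for every Replica network that must reach the other site.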

A stretched cluster environment has two storage pools, one per site. In both topologies described in the preceding section, storage traffic requires Remote Direct Memory Access (RDMA) to transfer data between nodes within the same site. Because Storage and Replica traffic produces heavy throughput on an all-flash or NVMe configuration, we recommend that you keep the Storage traffic on separate redundant physical NICs.

This table shows the types of traffic, the protocol used, and the recommended bandwidth:

Table 1. Types of traffic

Types of traffic | Protocol used | Recommended bandwidth
Management | TCP | 1/10/25 Gb
Replica | TCP | 1/10/25 Gb
Intra-site storage | RDMA | 10/25 Gb
Compute Network | TCP | 10/25 Gb

Here are some points to consider about network configuration:

● Management traffic uses Transmission Control Protocol (TCP). Because management traffic uses minimal bandwidth, it can be combined with Storage Replica traffic or even use the LOM, OCP, or rNDC ports.

● VM Compute traffic can be combined with management traffic.

● Inter-site Live Migration traffic uses the same network as Storage Replica.

● Storage Replica uses TCP because RDMA is not supported for replica traffic over L3 or WAN links. Depending on the bandwidth and latency between sites and the throughput requirements of the cluster, consider using separate redundant physical NICs for Storage Replica traffic.

Chapter 3: Solution deployment

This chapter presents the following topics:

● Introduction

● PAL and DPOR registration for Azure Stack

● Install roles and features

● Configuring a cluster witness

● Deployment prerequisites for stretched clusters

● Customer network team requirements

● Design principles and best practices

● Validated network topology

Introduction

Stretched clusters with Dell Solutions for Azure Stack HCI can be configured using PowerShell. This guide describes the prerequisites for this deployment.

NOTE: The instructions in this guide are applicable only to the Microsoft Azure Stack HCI operating system.

Each task in this deployment guide requires running one or more PowerShell commands. On some occasions you might have to use Failover Cluster Manager or Windows Admin Center from a machine that supports Desktop Experience.

PAL and DPOR registration for Azure Stack

Partner Admin Link (PAL) and Digital Partner of Record (DPOR) are customer association mechanisms used by Microsoft to measure the value a partner delivers to Microsoft by driving customer adoption of Microsoft Cloud services.

Currently, Dell Azure projects that are not associated through either of these mechanisms are not visible to Microsoft and, therefore, Dell does not get credit. Dell technical representatives should attempt to set both PAL and DPOR, with PAL being the priority.

To register the PAL or DPOR for the Azure Stack system, refer to PAL and DPOR Registration for Azure Stack under Deployment Procedures in the Azure Stack HCI generator in SolVe.

Install roles and features

Deployment and configuration of a Windows Server 2016, Windows Server 2019, Windows Server 2022, or Azure Stack HCI operating system version cluster requires enabling specific operating system roles and features.

Enable the following roles and features:

● Hyper-V service (not required if the operating system is factory-installed)

● Failover clustering

● Data center bridging (DCB) (required only when implementing fully converged network topology with RoCE and when implementing DCB for the fully converged topology with iWARP)

● BitLocker (optional)

● File Server (optional)

● FS-Data-Deduplication module (optional)

● RSAT-AD-PowerShell module (optional)

Enable these features by running the Install-WindowsFeature PowerShell cmdlet:

Install-WindowsFeature -Name Hyper-V, Failover-Clustering, Data-Center-Bridging, BitLocker, FS-FileServer, RSAT-Clustering-PowerShell, FS-Data-Deduplication -IncludeAllSubFeature -IncludeManagementTools -Verbose

NOTE: Install the Storage-Replica feature if the Azure Stack HCI operating system is being deployed for a stretched cluster.

NOTE: Hyper-V and the optional roles installation require a system restart. Because subsequent procedures also require a restart, the required restarts are combined into one.
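For a stretched cluster, the Storage Replica note above could be carried out along the following lines. This is a minimal sketch that assumes the feature is listed under the name Storage-Replica on the node. The -Restart switch is deliberately left out so that the restart can be combined with later steps, as the preceding note describes.

```powershell
# Install Storage Replica and its management tools on the local node.
# Run this on every node that will join the stretched cluster.
Install-WindowsFeature -Name Storage-Replica -IncludeManagementTools -Verbose

# Check the installation state of the features a stretched cluster relies on.
Get-WindowsFeature -Name Hyper-V, Failover-Clustering, Storage-Replica |
    Select-Object -Property Name, InstallState
```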

Configuring a cluster witness

A cluster witness must be configured for a two-node cluster. Microsoft recommends configuring a cluster witness for a four-node Azure Stack HCI cluster. Cluster witness configuration helps maintain a cluster or storage quorum when a node or network communication fails and nodes continue to operate but can no longer communicate with one another.

A cluster witness can be either a file share or a cloud-based witness.

NOTE: If you choose to configure a file share witness, ensure that it is outside the two-node cluster.

For information about configuring a cloud-based witness, see Cloud-based witness.
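Either witness type can be set with the Set-ClusterQuorum cmdlet once the cluster exists. The following is only a sketch: the storage account name, access key, and file share path are placeholders (assumptions), and only one of the two options should be applied.

```powershell
# Option 1: cloud witness. The account name and key are placeholders for
# the Azure storage account that you use for the witness.
Set-ClusterQuorum -CloudWitness -AccountName '<storage-account-name>' -AccessKey '<storage-account-access-key>'

# Option 2: file share witness on a server outside the cluster
# (hypothetical UNC path). Uncomment to use this instead of option 1.
# Set-ClusterQuorum -FileShareWitness '\\witness-server\witness-share'

# Show the quorum configuration that is now in effect.
Get-ClusterQuorum
```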

Deployment prerequisites for stretched clusters

Dell Technologies assumes that the management services required for the operating system deployment and cluster configuration are present in the existing infrastructure. An internet connection is required to license and register the cluster with Azure. Because Microsoft Azure Stack HCI operating system is a Server Core operating system, you require a system that supports Desktop Experience to access Failover Cluster Manager and Windows Admin Center. For more information, see the Windows Admin Center FAQ.

Table 2. Deployment prerequisites for stretched clusters

Component | Requirements

Active Directory Sites & Subnets | Configure two sites and their corresponding subnets in Active Directory so that the correct sites appear in Failover Cluster Manager when the stretched cluster is configured. Configure fault domains for each cluster if the IP subnets are the same across both sites.

Network | The following requirements apply:

● If two sites have host networks in different subnets, no additional configuration is needed for creating clusters. Otherwise, manual configuration of the cluster fault domain is required.

● RDMA adapters for Storage/SMB traffic.

● RDMA is not supported for Replica traffic across WAN.

● At least a 1 Gb network between sites for Replication and inter-site Live Migration is required.

● The bandwidth between sites should be sufficient to meet the write I/Os on the primary site.

● An average latency of 5 ms or less for synchronous replication.

● There are no latency requirements or recommendations for asynchronous replication.

● There is no recommendation from Microsoft regarding the maximum distance between sites that a stretched cluster can support. Longer distances normally translate into higher network latency.

Windows Admin Center Node | Windows features required: RSAT-Clustering, RSAT-Storage-Replica

Number of nodes supported | Minimum: 4 (2 nodes per site). Maximum: 16 (8 nodes per site).

Number of drives supported | Minimum of 4 drives per node. Both sites should have the same capacity and number of drives. Dell Technologies currently supports only an All-Flash configuration for stretched clusters.

Tuning of cluster heartbeats | (Get-Cluster).SameSubnetThreshold = 10 and (Get-Cluster).CrossSubnetThreshold = 20

SDN/VM Network | SDN on multi-site clusters is not supported at this time.

For the maximum supported hardware configuration, see Review maximum supported hardware specifications.
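When both sites share the same IP subnets, the fault domains called for in Table 2 have to be defined by hand. The sketch below shows one way to do that with the failover clustering cmdlets and then applies the heartbeat thresholds from the table. The site names (Site1, Site2) and node names (Node1 to Node4) are hypothetical, and the commands assume an existing cluster and a session on one of its nodes.

```powershell
# Hypothetical site names: create one site-level fault domain per location.
New-ClusterFaultDomain -Name 'Site1' -FaultDomainType Site
New-ClusterFaultDomain -Name 'Site2' -FaultDomainType Site

# Hypothetical node names: place two nodes in each site.
Set-ClusterFaultDomain -Name 'Node1', 'Node2' -Parent 'Site1'
Set-ClusterFaultDomain -Name 'Node3', 'Node4' -Parent 'Site2'

# Heartbeat thresholds taken from Table 2.
(Get-Cluster).SameSubnetThreshold  = 10
(Get-Cluster).CrossSubnetThreshold = 20

# Review the resulting fault domains and thresholds.
Get-ClusterFaultDomain
Get-Cluster | Format-List -Property SameSubnetThreshold, CrossSubnetThreshold
```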

Customer network team requirements

Depending on the network configuration chosen, customers should ensure that the requisite end-to-end routing is enabled for inter-site communication. The environment requires a minimum of one IP route for the Basic configuration or three IP routes for the High Throughput configuration.

Depending on the network configuration, the customer network team may also need to add static routes on the switches or on Layer-3 to ensure site-to-site connectivity.

Design principles and best practices

Stretched clusters and Storage Replica

A stretched cluster setup has two sites and two storage pools. Replicating data across WAN and writes on both sites results in lower performance compared to a standalone Storage Spaces Direct Cluster. Low latency inter-site links are necessary for optimum performance of workloads. Low bandwidth and high latency between sites can result in very poor performance on the primary site in the case of both synchronous and asynchronous replication.

Synchronous replication involves data blocks being written to log files on both sites before being committed. In asynchronous replication, the remote node accepts the block of replicated data and acknowledges back to the source copy. Application performance is not affected unless the rate of change of data is faster than the bandwidth of the replica link between the sites for large periods of time. This point is critical and must be taken into consideration when you are designing the solution.

The size of the log volume has no bearing on the performance of the solution. A larger log collects and retains more write I/Os before they are wrapped out. This allows for an interruption in service between the two sites (such as a network outage or the destination site being offline) to go on for a longer period.
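To make the relationship between log size and tolerated outage concrete, the following sketch estimates how long a log can keep collecting writes while the destination site is unreachable. It is a simplified model, not a Storage Replica formula: it assumes the log fills at the sustained write rate and ignores any log overhead. The function name and the 100 MB/s write rate are hypothetical.

```powershell
# Rough estimate of how many minutes a replication log can buffer writes
# during an inter-site outage. Simplified model: the log fills at the
# sustained write rate of the replicated volume; overhead is ignored.
function Get-LogBufferMinutes {
    param(
        [Parameter(Mandatory)] [double] $LogSizeGB,      # size of the log
        [Parameter(Mandatory)] [double] $WriteRateMBps   # sustained write rate (hypothetical input)
    )
    $logSizeMB = $LogSizeGB * 1024
    $seconds   = $logSizeMB / $WriteRateMBps
    return [math]::Round($seconds / 60, 1)
}

Get-LogBufferMinutes -LogSizeGB 40 -WriteRateMBps 100   # 40 GB log
Get-LogBufferMinutes -LogSizeGB 80 -WriteRateMBps 100   # doubling the log doubles the window
```

Under these assumptions a 40 GB log covers about 6.8 minutes of writes at 100 MB/s and an 80 GB log about 13.7 minutes.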

Table 3. Disk writes

| Scenario | Writes in two-way mirrored volumes | Writes in three-way mirrored volumes |
|---|---|---|
| Standalone storage spaces | 2x | 3x |
| Replication to secondary site | 4x | 6x |
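The multipliers in Table 3 are the number of mirror copies multiplied by the number of sites that hold the data. The following sketch reproduces that arithmetic; the function name is hypothetical, and the count covers data-volume writes only, not the additional log writes.

```powershell
# Data-volume writes per application write: mirror copies x number of sites.
# Log-volume writes are not included in this count.
function Get-DataWriteCount {
    param(
        [Parameter(Mandatory)] [int] $MirrorCopies,  # 2 = two-way mirror, 3 = three-way mirror
        [int] $Sites = 1                             # 1 = standalone, 2 = stretched cluster
    )
    return $MirrorCopies * $Sites
}

foreach ($copies in 2, 3) {
    [pscustomobject]@{
        MirrorCopies = $copies
        Standalone   = Get-DataWriteCount -MirrorCopies $copies
        Stretched    = Get-DataWriteCount -MirrorCopies $copies -Sites 2
    }
}
```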

NOTE: WAN latency and additional writes to log volumes on both sites cause higher write latency. Along with writes to the log and data disks, the inter-site bandwidth and latency also limit the IOPS in the environment. For this reason, we highly recommend using all-flash configurations for stretched clusters.


NOTE: In a Storage Spaces Direct environment both data and log volumes eventually reside on the same SSD pool because multiple storage pools per site are not supported.

The following figure illustrates the difference between synchronous and asynchronous replication:

Figure 2. Synchronous and asynchronous replication

Synchronous replication: A block of data written by an application to a volume on Site A (1) is written first to the corresponding log volume on the same site (2), and is then replicated to Site B (2). At Site B, the block of data is written to the Replica log volume (3) before a commit is sent back to the application using the same route (4 and 5). The block is subsequently pushed to the data volumes on both sites. For each block of data that the application writes, the commit is issued only after data is written to the secondary site. Thus there is no data loss at the file system level in the event of a site failure. This results in lower application write performance compared to a standalone deployment.

Asynchronous replication: A block of data written by an application to a volume on Site A (1) is written first to the corresponding log volume on the same site (2). A commit is immediately sent back to the application. At the same time, the block of data is replicated to Site B and written to the Replica log volume. In the case of a site failure, the cluster ensures that no data is lost beyond the configured Recovery Point Objective (RPO). Application performance is not affected unless the rate of change of data is faster than the bandwidth of the replica link between the sites for large periods of time. This is critical and must be taken into consideration when designing the solution.

NOTE: Both replication scenarios affect application performance because each data block has to be written multiple times, assuming that all volumes are configured for replication.

NOTE: Stretched cluster with Storage Replica is not a substitute for a backup solution. Stretched cluster is a disaster recovery solution that keeps a business running in the event of a site failure. Customers should still rely on application and infrastructure backup solutions to recover lost data due to user error or application/data corruption.

CAUTION: To use the Data Deduplication feature for your Azure Stack HCI data volumes, you must install the server role on both the source and destination servers. Do not enable Data Deduplication on the destination volumes within an Azure Stack HCI stretched cluster. Data Deduplication manages the writes, so it must run only on the source cluster nodes. Destination nodes always receive Deduplicated copies of each volume.


Validated network topology

Basic configuration

This section describes the host network configuration and network cards that are required to configure a basic stretched cluster. The purpose of this topology is to keep the host and inter-site configuration simple with little or no change to a standard standalone cluster networking architecture.

Here we use two 25 GbE NICs for each host on both sites. One NIC is dedicated to intra-site storage traffic, similar to a standalone Storage Spaces Direct environment. The second NIC is used for management, compute, and Storage Replica traffic.

To ensure that management traffic is not bottlenecked by high traffic on the Replica network, ask the customer network team to throttle traffic between the two sites using firewall or router QoS rules. It is recommended that inter-site traffic is throttled to 50 percent of the combined capacity of the network cards that support the management NIC team.

The management network is the only interface between the two sites. Because only one network pipe is available between the hosts on Site A and Site B, you will see the following warning in the cluster validation. This is an expected behavior.

Node SiteANode1.Test.lab is reachable from Node SiteBNode1.Test.lab by only one pair of network interfaces. It is possible that this network path is a single point of failure for communication within the cluster. Please verify that this single path is highly available, or consider adding additional networks to the cluster.

Table 4. Sample IP address schema

| Network | Site A | Site B | Type of traffic |
|---|---|---|---|
| Management/Replica Traffic | 192.168.100.0/24 | 192.168.200.0/24 | L2/L3 |
| Intra-site Storage (RDMA) - 1 | 192.168.101.0/24 | 192.168.201.0/24 | L2 |
| Intra-site Storage (RDMA) - 2 | 192.168.102.0/24 | 192.168.202.0/24 | L2 |
| VMNetwork/Compute Network | As per customer environment | As per customer environment | L2/L3 |

The following figure shows the network topology of a basic stretched cluster:

Figure 3. Network topology for a stretched cluster (basic)


High throughput configuration

In this topology we use two 25 GbE NICs and two additional 1/10/25 GbE ports from each host to configure a high throughput stretched cluster. One NIC is dedicated for intra-site RDMA traffic, similar to a standalone Storage Spaces Direct environment.

The second NIC is used for replica traffic. SMB Multichannel distributes traffic evenly across both replica adapters, which increases network performance and availability. SMB Multichannel enables the use of multiple network connections simultaneously, and facilitates the aggregation of network bandwidth and network fault tolerance when multiple paths are available. For more information, see Manage SMB Multichannel.

The Set-SRNetworkConstraint cmdlet is used to ensure replica traffic flows only through the dedicated interfaces and not through the management interface. Run this cmdlet once for each volume.

IP Address schema

The following table shows the IP Address schema:

Table 5. IP Address schema

| Network | Site A | Site B | Type of traffic |
|---|---|---|---|
| Management | 192.168.100.0/24 | 192.168.200.0/24 | L2/L3 |
| Intra-site Storage (RDMA) - 1 | 192.168.101.0/24 | 192.168.201.0/24 | L2 |
| Intra-site Storage (RDMA) - 2 | 192.168.102.0/24 | 192.168.202.0/24 | L2 |
| Replica - 1* | 192.168.111.0/24 | 192.168.211.0/24 | L2/L3 |
| Replica - 2* | 192.168.112.0/24 | 192.168.212.0/24 | L2/L3 |
| VMNetwork | As per customer environment | As per customer environment | L2/L3 |
| Cluster IP | 192.168.100.100 | 192.168.200.100 | L2 |

*Static routes are needed on all hosts on both sites to ensure the 192.168.111.0/24 network can reach 192.168.211.0/24 and the 192.168.112.0/24 network can reach 192.168.212.0/24. Static routes are needed in this network topology because we have three network pipes between Site A and Site B. Network traffic on Management uses the default gateway to traverse the network, while the Replica network uses static routes on the hosts to reach the secondary site. If your ToR switches do not have BGP configured, static routes are needed on them also.
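The host-side static routes described in the footnote can be added with the NetTCPIP cmdlets. The following is a minimal sketch for a Site A host; the interface aliases (Replica1, Replica2) and the next-hop addresses are hypothetical and must be replaced with the values from your environment. Site B hosts need the mirror-image routes to 192.168.111.0/24 and 192.168.112.0/24.

```powershell
# Run on each Site A host. Interface aliases and next-hop gateways are
# hypothetical placeholders; substitute the values used in your network.
New-NetRoute -DestinationPrefix '192.168.211.0/24' -InterfaceAlias 'Replica1' -NextHop '192.168.111.1'
New-NetRoute -DestinationPrefix '192.168.212.0/24' -InterfaceAlias 'Replica2' -NextHop '192.168.112.1'

# List the routes to confirm that they were added
Get-NetRoute -DestinationPrefix '192.168.211.0/24', '192.168.212.0/24' |
    Format-Table -Property DestinationPrefix, InterfaceAlias, NextHop
```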

The following figure shows the network topology of an advanced stretched cluster:

Figure 4. Network topology for a stretched cluster (advanced)

Chapter 4: Creating a stretched cluster

This chapter presents the following topics:

Topics:

Introduction

Test-Cluster

Cluster creation

Volumes

Storage efficiency

Test-SRTopology

Introduction

This section outlines the steps that are needed for configuring a stretched cluster. Complete the network configuration on all nodes for the network topology applicable to you. A sample IP address schema is provided for both supported network topologies in the previous section of this guide. Consider these points before you begin:

● Ensure management IPs of all nodes are reachable from any host

● Ensure static routes are configured on all hosts for inter-site communication using the Replica network

● Ensure all nodes from Site A can reach corresponding Replica IPs on Site B using the Replica path
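The checks above can be scripted before running cluster validation. The following sketch assumes the node names used elsewhere in this guide; the Replica IP address is a hypothetical Site B address from the sample schema.

```powershell
# Node names follow the samples in this guide; adjust to your environment.
$nodes = 'SiteANode1', 'SiteANode2', 'SiteBNode1', 'SiteBNode2'

# Management reachability: ping every node from the current host
foreach ($node in $nodes) {
    $reachable = Test-NetConnection -ComputerName $node -InformationLevel Quiet
    '{0}: reachable = {1}' -f $node, $reachable
}

# Static routes toward the remote Replica networks (run on a Site A host)
Get-NetRoute -DestinationPrefix '192.168.211.0/24', '192.168.212.0/24' -ErrorAction SilentlyContinue

# Replica path: the detailed output includes the source address and interface
# that were used. 192.168.211.11 is a hypothetical Site B Replica IP.
Test-NetConnection -ComputerName '192.168.211.11' -InformationLevel Detailed
```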

Test-Cluster

Test-Cluster validates that the cluster to be created meets Microsoft recommendations for Failover Clustering and that your hardware and settings are compatible. Run Test-Cluster with all nodes and include all tests (namely, 'Storage Spaces Direct', 'Inventory', 'Network', and 'System Configuration').

Ensure that Test-Cluster passes without warnings for the High Throughput configuration. The Basic configuration returns the single-network-path warning described in the previous section of this guide.

Cluster creation

This section looks at creating a cluster using PowerShell cmdlets.

Manual cluster creation

Once Test-Cluster completes successfully, use the New-Cluster cmdlet to create a new stretched cluster. Because the nodes specified are part of different IP schemas, Enable-ClusterS2D understands that the cluster is part of a multi-site topology. It automatically creates two storage pools and corresponding ClusterPerformanceHistory volumes and their replica volumes.

After a cluster is created, you will see a warning similar to the following example. This is an expected behavior.

No matching network interface found for resource 'Cluster IP Address 172.18.160.160' IP address '192.168.200.100' (return code was '5035'). If your cluster nodes span different subnets, this may be normal.

Configure a cluster witness and enable Storage Spaces Direct on the cluster.

NOTE: Cluster witness can be either on a tertiary site or on Azure Cloud. Ensure that the "Storage Replica" feature is installed on all nodes in the cluster.

If Sites and Services with IP Subnets are configured on Active Directory, Failover Cluster Manager correctly shows the node-to-site mapping under Cluster Name >> Nodes.

The following is a sample image of IP subnets defined in an Active Directory:

Figure 5. IP subnets in an Active Directory

If both sites are in the same IP network, use the New-ClusterFaultDomain cmdlet to define the two site names. Site names defined using New-ClusterFaultDomain override the names given in Active Directory.

The following is a sample image of how sites appear in Failover Cluster Manager:

Figure 6. Sites in Failover Cluster Manager

Once a cluster is created, use Failover Cluster Manager to rename the cluster networks.


Figure 7. Cluster networks
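Cluster networks can also be renamed from PowerShell instead of Failover Cluster Manager. In the following sketch, both the default name ('Cluster Network 1') and the new name are hypothetical; list the networks first to see which subnet each one covers.

```powershell
# List the cluster networks with their subnets and roles
Get-ClusterNetwork | Format-Table -Property Name, Address, AddressMask, Role

# Rename one network; both names are hypothetical examples
(Get-ClusterNetwork -Name 'Cluster Network 1').Name = 'Management - Site A'
```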

Volumes

Replication-enabled volumes can be created using a combination of PowerShell and Failover Cluster Manager or by using Windows Admin Center.

NOTE: Install Storage Replica Module for Windows PowerShell (RSAT-Storage-Replica) on the management node with Desktop Experience that is used for installing Windows Admin Center and Failover Cluster Manager to access the cluster.

NOTE: Site-to-site volume encryption uses additional CPU resources to encrypt data, and could potentially cause cluster performance-related issues. Microsoft and Dell recommend disabling site-to-site volume encryption unless required.

For each replica-enabled volume, you need a corresponding log volume on both sites (with a minimum of 8 GB in size) and an equivalent replica volume on the secondary site. The log volume is used to serialize writes for replication.

The following table shows the volumes that you must create for a 1 TB replica volume and for a 500 GB replica volume:

Table 6. Volumes in a 1 TB replica volume

| Site A | Size | Site B | Size |
|---|---|---|---|
| VolumeA | 1 TB | VolumeA-Replica | 1 TB |
| VolumeA-Log | 40 GB | VolumeA-Replica-Log | 40 GB |
| VolumeB | 500 GB | VolumeB-Replica | 500 GB |
| VolumeB-Log | 40 GB | VolumeB-Replica-Log | 40 GB |

Once a replicated volume is created, go to Storage Replica > Partnership > Settings > Modify Partnership settings to verify the status of the replica traffic encryption.

See Appendix A for the correct PowerShell cmdlets and Failover Cluster Manager steps to create the volumes shown in this table. It is recommended that you create two-way mirrors for all volumes to improve write performance and capacity efficiency.

NOTE: For Asynchronous Replication, the RPO can be set as low as 30 s.
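The replication mode, RPO, and encryption state of a partnership can also be inspected from PowerShell. The following is a sketch only: the replication group names and node names are hypothetical, and the property and parameter names (AsyncRPO, IsEncrypted, -AsyncRPO) are given as recalled from the StorageReplica module, so confirm them with Get-Help and Get-Member before use.

```powershell
# Show mode, RPO (seconds), and encryption state for each replication group.
# Property names are as recalled from the StorageReplica module; verify with
# 'Get-SRGroup | Get-Member'.
Get-SRGroup | Format-Table -Property Name, ReplicationMode, AsyncRPO, IsEncrypted, LogSizeInBytes

# Change the RPO of an asynchronous partnership to 60 seconds.
# Node names and replication group names are hypothetical.
Set-SRPartnership -SourceComputerName 'SiteANode1' -SourceRGName 'Replication 1' `
    -DestinationComputerName 'SiteBNode1' -DestinationRGName 'Replication 2' `
    -AsyncRPO 60
```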

For a planned site failure, when the volume replication direction is reversed, the disk reservations on the secondary site for Replica Volume and Replica-log volumes are removed and moved to the primary site. Source Data and Source Log volumes are given the disk reservations and become active on the secondary site. After 10 minutes, the virtual machines residing on the primary site associated with the migrated volume automatically Live Migrate to the secondary site.

Storage efficiency

Due to high I/Os on the underlying disks, stretched clusters require an underlying infrastructure capable of delivering high I/Os with low latency. Dell Technologies recommends all-flash configurations for stretched cluster deployments.

All-flash configurations do not have a cache tier. The following table shows the difference in storage efficiency for a two-way and three-way mirror created on a single site and stretched cluster environment:

Table 7. Storage efficiency differences

| Environment | Two-Way Mirror | Three-Way Mirror |
|---|---|---|
| Single site | 50% | 33% |
| Stretched cluster | 25% | 16.5% |
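The efficiency figures in Table 7 follow from dividing usable capacity by the number of data copies kept across all sites. The following sketch reproduces that arithmetic; the function name is hypothetical.

```powershell
# Storage efficiency = 100 / (mirror copies x number of sites)
function Get-StorageEfficiencyPercent {
    param(
        [Parameter(Mandatory)] [int] $MirrorCopies,  # 2 = two-way mirror, 3 = three-way mirror
        [int] $Sites = 1                             # 1 = single site, 2 = stretched cluster
    )
    return [math]::Round([double]100 / ($MirrorCopies * $Sites), 1)
}

Get-StorageEfficiencyPercent -MirrorCopies 2            # two-way mirror, single site
Get-StorageEfficiencyPercent -MirrorCopies 2 -Sites 2   # two-way mirror, stretched cluster
Get-StorageEfficiencyPercent -MirrorCopies 3            # three-way mirror, single site
Get-StorageEfficiencyPercent -MirrorCopies 3 -Sites 2   # three-way mirror, stretched cluster
```

The calls return 50, 25, 33.3, and 16.7. The table lists 16.5% for the stretched three-way mirror because it halves the rounded single-site value of 33%.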

Test-SRTopology

This cmdlet validates a potential replication partnership. The cmdlet:

● Measures bandwidth and round trip latency

● Estimates initial sync time

● Verifies that source and destination volumes exist

● Verifies that there is sufficient physical memory to run replication

NOTE: The file server feature has to be enabled on the nodes to run this cmdlet.

Keep the generated report for future reference.

For more information about this cmdlet, see Test-SRTopology .

Chapter 5: Virtual Machines

This chapter presents the following topics:

Topics:

Introduction

VM and storage affinity rules

Preferred sites

Introduction

Virtual Machines in a stretched cluster environment can be managed using:

● PowerShell

● Failover Cluster Manager

● Windows Admin Center

For more information, see Manage VMs on Azure Stack HCI using Windows Admin Center.

In a stretched cluster environment, volumes hosting the virtual machines may or may not be replicated, depending on business requirements. A VM that is hosted on a replicated volume runs on the site where the volume is mounted. If the volume moves from the primary site to the secondary site due to a planned downtime or site failure, the VMs follow the volumes to the secondary site after 10 minutes while the cluster service balances the VMs across the nodes on the secondary site after 30 minutes. To enable faster movement of VMs after the Virtual Disk ownership is transferred to the secondary site, you can initiate Live Migration manually.

VM and storage affinity rules

You can use PowerShell to create affinity and anti-affinity rules for your VMs in a cluster. An affinity rule is one that establishes a relationship between two or more resource groups or roles, such as VMs, to keep them together in an Azure Stack HCI cluster. An anti-affinity rule does the opposite, keeping specified resource groups apart from each other.

You can use storage affinity rules to keep a VM and its associated Virtual Hard Disk v2 (VHDX) on a Cluster Shared Volume (CSV) on the same cluster node. This ensures CSV redirection does not occur and keeps application performance at optimal levels. For more information, see Storage affinity rules.
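A storage affinity rule of the kind described above might be created as follows. This is a sketch based on the affinity cmdlets documented for Azure Stack HCI; the rule name, VM group name, and CSV name are hypothetical, and the parameter names should be confirmed with Get-Help on your build.

```powershell
# Rule, VM group, and CSV names are hypothetical examples.
$ruleName = 'SqlStorageAffinity'

# SameNode keeps the rule members on the same cluster node
New-ClusterAffinityRule -Name $ruleName -RuleType SameNode

# Add the VM (cluster group) and the CSV that holds its VHDX to the rule
Add-ClusterGroupToAffinityRule -Groups 'SQLServer1' -Name $ruleName
Add-ClusterSharedVolumeToAffinityRule -ClusterSharedVolumes 'Cluster Virtual Disk (VolumeA)' -Name $ruleName

# Enable the rule and review it
Set-ClusterAffinityRule -Name $ruleName -Enabled 1
Get-ClusterAffinityRule -Name $ruleName
```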

Preferred sites

Configure preferred sites in a stretched cluster to define a location in which you want to run all your resources. This ensures that VMs and volumes come up on the preferred site after a cold start or after network connectivity issues. A dynamic quorum ensures that preferred sites survive after events such as asymmetric network connectivity failures.

In the event of a quorum split, if witnesses cannot be contacted, the preferred site is selected and the passive site drops out of the cluster membership.

VMs in a stretched cluster are placed based on the following site priority:

● Storage Affinity Site

● Group Preferred Site

● Cluster Preferred Site

Host VMs and associated Virtual Disks

VM placement can be difficult in a Hyper-V clustered environment. Always ensure that VMs that have I/O-intensive workloads are hosted on the node that owns the VM's Virtual Disk.

Chapter 6: Failure/Recovery from failure of Site/Node

This chapter presents the following topics:

Topics:

Planned failover

Operation steps

Planned failover

Windows Admin Center has a Switch Direction feature that allows you to migrate workloads from one site to the other. This must be initiated on each volume. VMs hosted on the volumes follow the volumes to the migrated site after 10 minutes. This feature is helpful in scenarios such as:

● There is a planned downtime

● A potential weather event could take the site down

To use the Switch Direction feature, go to Windows Admin Center and select Storage Replica on the left pane. Then select the SR Partnership for which you would like to change the Replication Direction. Select More and click Switch Direction.
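The replication direction can also be reversed from PowerShell with Set-SRPartnership, which is one way to script a planned failover for several volumes. In this sketch, the node names and replication group names are hypothetical; the new source is the site that currently holds the replica.

```powershell
# Review the current partnerships and their direction
Get-SRPartnership

# Reverse the direction: the current destination (Site B) becomes the source.
# Node names and replication group names are hypothetical.
Set-SRPartnership -NewSourceComputerName 'SiteBNode1' -SourceRGName 'Replication 2' `
    -DestinationComputerName 'SiteANode1' -DestinationRGName 'Replication 1'
```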

In the event of a site failure, if a volume is replicating synchronously then the data and the log volume automatically come online on the surviving site, along with VMs associated with this volume because the RPO is 0. For asynchronous replication the data and the log volume do not come online automatically because the RPO is not equal to 0.

When the failed site comes back online, the Replica and Replica-Log volume are moved to the primary site with persistent disk reservations, and replication begins again. For a synchronous replicated volume, the replication direction cannot be changed until replication is 100 percent complete.

Operation steps

The following sections describe the steps to take in the event of different failure types.

Node failure

Handling a node failure on either site in a stretched cluster environment is no different than managing one in a traditional or standalone Azure Stack HCI cluster. A complete node failure can result from operating system or HBA corruption or from a complete hardware failure on the node. In either case, restoring system functionality is the priority.

The high level steps to do this are:

1. Replace the hardware as needed.

2. Re-install the operating system on the operating system drives (if needed).

3. Join the system to the domain.

4. Ensure you assign the new node IPs specific to the site where the node is hosted.

5. Add the node to the existing stretched cluster.

6. Based on the IP subnets used or the Cluster Fault Domains added, the cluster adds the drives to the correct pool.

7. Wait for the storage jobs to complete.

8. During this process the workloads on the affected site would still be running and there should be no interruption of replication.


Site failure

A site failure in a stretched cluster topology requires rebuilding all of the nodes of the affected site. If the failure happens at the primary site, the following scenarios occur:

● All volumes hosted on the affected site and associated VMs become inaccessible.

● After a brief period, the volumes move to the secondary site.

● The VMs restart on the secondary site.

● Depending on whether synchronous or asynchronous replication is being used, you either have zero data loss or data loss within the limits of the defined RPO:

○ For the replica volumes configured with synchronous replication, the VMs are crash consistent. Application recovery depends on the available backup/recovery of the application.

○ For the replica volumes configured with asynchronous replication, the VMs are not crash consistent. The default RPO is 30 seconds. It can be configured using PowerShell or Windows Admin Center. Application recovery still depends on the available backup/recovery of the application.

Site recovery

Follow these steps to recover the nodes on the failed site:

1. Remove the failed nodes from the cluster and remove the computer names from the Active Directory.

2. Remove SRPartnership and SRGroups using PowerShell cmdlets. Replication can also be disabled from the Failover Cluster Manager.

3. Bring up all the nodes on the affected site. The node names and IPs used should be the same as those used before the crash.

4. Join the nodes to the domain.

5. Add all the nodes to the existing stretched cluster at the same time.

6. All drives in the new site will be added to a new pool.

7. Re-create and enable replication for replica volumes and associated log volumes using Failover Cluster Manager or PowerShell cmdlets.
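The PowerShell portions of the recovery steps above can be sketched as follows. The example assumes that Site B failed and uses the node names from this guide; run it from a surviving node, and treat it as an outline rather than a validated runbook.

```powershell
# Assumption: Site B failed. Node names follow the samples in this guide.
$failedNodes = 'SiteBNode1', 'SiteBNode2'

# Step 1: evict the failed nodes from the cluster
foreach ($node in $failedNodes) {
    Remove-ClusterNode -Name $node -Force
}

# Step 2: remove the replication partnerships, then the replication groups
Get-SRPartnership | Remove-SRPartnership
Get-SRGroup | Remove-SRGroup

# Step 5: after the nodes are rebuilt and joined to the domain,
# add them back to the cluster in a single operation
Add-ClusterNode -Name $failedNodes

# Step 6: confirm the pools and watch the storage jobs until they finish
Get-StoragePool
Get-StorageJob
```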

Chapter 7: Appendices

These appendices present the following topics:

Topics:

Appendix A: Sample PowerShell cmdlets for end-to-end deployment

Appendix B: Supported hardware

Appendix A: Sample PowerShell cmdlets for end-to-end deployment

Install required Windows features

Install-WindowsFeature -Name FS-FileServer, Storage-Replica, Hyper-V, Failover-Clustering, Data-Center-Bridging -IncludeAllSubFeature -IncludeManagementTools -Verbose

Create VM switches and configure host networking

For detailed QoS configuration for SMB traffic (Mellanox RDMA adapters only), see QoS policy configuration .

Test-Cluster

Test-Cluster -Node SiteANode1,SiteANode2,SiteBNode1,SiteBNode2 -Include 'Storage Spaces Direct', 'Inventory', 'Network', 'System Configuration'

New-Cluster

New-Cluster -Name R740StretchCluster -Node SiteANode1,SiteANode2,SiteBNode1,SiteBNode2 -NoStorage -StaticAddress 192.168.100.100,192.168.200.100 -IgnoreNetwork 192.168.101.0/24,192.168.201.0/24,192.168.102.0/24,192.168.202.0/24

Witness

Configure a highly available file share witness at a tertiary site or on Azure cloud .

Set-ClusterQuorum -FileShareWitness \\tertiaryShare\witness
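If the witness is hosted on Azure instead of a tertiary-site file share, a cloud witness can be configured with the same cmdlet. In this sketch, the storage account name and access key are placeholders that must be replaced with the values of your own Azure storage account.

```powershell
# Placeholders: replace with the name and access key of your Azure storage account
Set-ClusterQuorum -CloudWitness -AccountName '<StorageAccountName>' -AccessKey '<StorageAccountAccessKey>'

# Confirm the quorum configuration
Get-ClusterQuorum
```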

Active Directory sites

Ensure that you have created two sites on your Active Directory based on IP subnets. This helps with assigning the correct site names on the cluster.


If you do not have sites that are configured on Active Directory, create cluster fault domains as required. The following cmdlet overrides the site names that are specified on the Active Directory:

#This is needed if you do not have sites configured on AD

#Create Sites

New-ClusterFaultDomain -Name 'Bangalore' -Type Site

New-ClusterFaultDomain -Name 'Chennai' -Type Site

#Site membership for nodes

Set-ClusterFaultDomain -Name SiteANode1 -Parent 'Bangalore'

Set-ClusterFaultDomain -Name SiteANode2 -Parent 'Bangalore'

Set-ClusterFaultDomain -Name SiteBNode1 -Parent 'Chennai'

Set-ClusterFaultDomain -Name SiteBNode2 -Parent 'Chennai'

Preferred Sites

The Preferred Site is your primary data center site.

NOTE: Do not configure a Cluster fault domain as a Preferred Site if you plan to run active VMs on both sites, otherwise choosing to Live Migrate VMs using the "Best Possible Node" option results in all VMs moving to the Preferred Site.

#Preferred Site

(get-cluster).PreferredSite = 'Bangalore'

Preferred Sites can also be configured at cluster role and group level.

(Get-ClusterGroup -Name SQLServer1).PreferredSite = 'Bangalore'

If there is an Active-Active stretched cluster where Preferred Sites are not configured, it is highly recommended that you configure Preferred Sites for each volume. This ensures that the volumes stay at the same site if there is a single node failure on either site.

(Get-ClusterSharedVolume "Cluster Virtual Disk (ax740xds2N2)" | Get-ClusterGroup).PreferredSite = "Chennai"

Get-ClusterSharedVolume "*ax740xds2N2)" | Get-ClusterGroup |fl *

Enable Storage Spaces

Enable-ClusterS2D -confirm:$false

This step ensures that Storage Spaces Direct is enabled on the stretched cluster. Two storage pools are created, one for each site.

PS C:\Users\Administrator.TEST> Get-StoragePool

FriendlyName            OperationalStatus HealthStatus IsPrimordial IsReadOnly Size     AllocatedSize
------------            ----------------- ------------ ------------ ---------- ----     -------------
Primordial              OK                Healthy      True         False      15.94 TB 15.71 TB
Primordial              OK                Healthy      True         False      15.94 TB 15.71 TB
Pool for Site Chennai   OK                Healthy      False        False      15.7 TB  4.28 TB
Pool for Site Bangalore OK                Healthy      False        False      15.7 TB  4.28 TB

New Volume

You can create a new replicated volume by using Windows Admin Center or a mixture of PowerShell and Failover Cluster Manager.

Using PowerShell and Failover Cluster Manager


For each volume that you want to replicate across two sites, you have to create its associated Replica volume and Log volume on both sites.

NOTE: Customers can choose to enable replication on the volumes based on their business needs.

#Primary Volume

New-Volume -StoragePoolFriendlyName "Pool for Site Bangalore" -FriendlyName 'VolumeA' -FileSystem CSVFS_ReFS -Size 1TB

#Log Volume

New-Volume -StoragePoolFriendlyName "Pool for Site Bangalore" -FriendlyName 'VolumeA-Log' -FileSystem ReFS -Size 50GB

#Replica Volume

New-Volume -StoragePoolFriendlyName "Pool for Site Chennai" -FriendlyName 'VolumeA-Replica' -FileSystem ReFS -Size 1TB

#Replica Log Volume

New-Volume -StoragePoolFriendlyName "Pool for Site Chennai" -FriendlyName 'VolumeA-Replica-Log' -FileSystem ReFS -Size 50GB

NOTE: Ensure that Replica volumes and all log volumes are ReFS (not CSVFS) before enabling replication using PowerShell or Failover Cluster Manager.

Using Failover Cluster Manager to enable Volume Replica

To enable replication on volumes, go to Storage >> Disks and right-click the primary volume on which you want to enable replication. Then follow these steps:

● Select Replication and click Enable

● Select the log volume for the primary site

● Select the Replica volume and associated log volume for the secondary site

● Overwrite the destination volume unless you have a seeded disk.

● Select the mode of replication.

● Complete the wizard.

This enables replication on the volume after the initial block copy. The initial block copy process can take a few minutes to a few hours, depending on the size of the volume.
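Replication can also be enabled from PowerShell instead of the Failover Cluster Manager wizard. The following sketch uses New-SRPartnership with hypothetical node names, replication group names, and drive letters (H:, I:, J:); it assumes that the source volumes are owned by a Site A node and the replica volumes by a Site B node when the cmdlet runs.

```powershell
# Node names, replication group names, and drive letters are hypothetical.
# Assumes source volumes are online on a Site A node and replica volumes
# are online on a Site B node.
New-SRPartnership -SourceComputerName 'SiteANode1' -SourceRGName 'Replication 1' `
    -SourceVolumeName 'C:\ClusterStorage\VolumeA' -SourceLogVolumeName 'H:' `
    -DestinationComputerName 'SiteBNode1' -DestinationRGName 'Replication 2' `
    -DestinationVolumeName 'I:' -DestinationLogVolumeName 'J:' `
    -ReplicationMode Synchronous

# Track the initial block copy: bytes remaining drops to 0 when it completes
(Get-SRGroup).Replicas | Select-Object -Property DataVolume, ReplicationStatus, NumOfBytesRemaining
```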

Test-SRTopology

This cmdlet validates a potential replication partnership between source and destination systems. Follow these steps:

● Create a local CSVFS volume, for example 1 TB.

● Create a local log volume (ReFS), for example 50 GB.

● Create a Replica volume (ReFS), for example 1 TB (Ensure that the local and replica volumes are the same size).

● Create a Replica-log volume (ReFS), for example 50 GB.

###Step 1###

New-Volume -StoragePoolFriendlyName "Pool for Site Bangalore" -FriendlyName 'VolumeA' -FileSystem CSVFS_ReFS -Size 1TB

#Log Volume

New-Volume -StoragePoolFriendlyName "Pool for Site Bangalore" -FriendlyName 'VolumeA-Log' -FileSystem ReFS -Size 50GB

## Move 'Available Storage' to Site B ##

Get-ClusterGroup -Name 'Available Storage' | Get-ClusterResource | Stop-ClusterResource

Move-ClusterGroup -Name 'Available Storage' -Node ax740xds2n1

###Step 2###

#Replica Volume

New-Volume -StoragePoolFriendlyName "Pool for Site Chennai" -FriendlyName 'VolumeA-Replica' -FileSystem ReFS -Size 1TB

#Replica Log Volume

New-Volume -StoragePoolFriendlyName "Pool for Site Chennai" -FriendlyName 'VolumeA-Replica-Log' -FileSystem ReFS -Size 50GB

###Step 3###

# Create Replication Group for secondary volumes

$PathVolARep = Get-Volume -FriendlyName VolumeA-Replica | Select -ExpandProperty Path

$PathVolARepLog = Get-Volume -FriendlyName VolumeA-Replica-Log | Select -ExpandProperty Path

New-SRGroup -ComputerName ax740xds2n1 -Name Group108 -VolumeName $PathVolARep -LogVolumeName $PathVolARepLog -LogSizeInBytes 2GB

Get-ClusterGroup -Name 'Available Storage' | Get-ClusterResource | Stop-ClusterResource

Move-ClusterGroup -Name 'Available Storage' -Node ax740xds1n2

# Assign drive letters

Get-Volume -FriendlyName VolumeB-Log | Get-Partition | Set-Partition -NewDriveLetter H

Get-Volume -FriendlyName VolumeB-Replica | Get-Partition | Set-Partition -NewDriveLetter I

Get-Volume -FriendlyName VolumeB-Replica-Log | Get-Partition | Set-Partition -NewDriveLetter J

Test-SRTopology -SourceComputerName ax740xds1n2 -SourceVolumeName C:\ClusterStorage\VolumeB -SourceLogVolumeName H -DestinationComputerName ax740xds2n1 -DestinationVolumeName I -DestinationLogVolumeName J -DurationInMinutes 30 -ResultPath .\TopologyResults

The preceding cmdlet completes in 30 minutes and displays the results in HTML format.

NOTE: Run Test-SRTopology only for a single volume.

NOTE: If you choose asynchronous replication, ensure that you choose a log volume size of at least 30 GB.

NOTE: Use either PowerShell or Windows Admin Center to create volumes; do not mix the tools. Volume sizes created using PowerShell and Windows Admin Center differ slightly, which causes a failure if you try to enable replication.

Set-SRNetworkConstraint

In a network topology which has multiple routes to the secondary site, it is imperative to provide a correct path for the Replica network. Set-SRNetworkConstraint is a cmdlet that is useful for specifying an array of network interfaces to be used for replica traffic. This cmdlet has to be run once for each volume.

Set-SRNetworkConstraint -SourceRGName "Replication 2" -SourceNWInterface "SR - Site B" -DestinationRGName "Replication 1" -DestinationNWInterface "SR - Site A" -SourceComputerName SiteANode1 -DestinationComputerName SiteBNode1 -Verbose

Set-SRNetworkConstraint ensures that the Management network does not become bottlenecked because of Replica traffic.
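After the constraint is set, it can be read back to confirm that it was stored for the partnership. The following sketch reuses the node and replication group names from the sample above; the SMB Multichannel query is an additional check, offered on the assumption that replica traffic appears among the SMB client connections on the source node.

```powershell
# Read back the constraint for the partnership used in the sample above
Get-SRNetworkConstraint -SourceComputerName SiteANode1 -SourceRGName 'Replication 2' `
    -DestinationComputerName SiteBNode1 -DestinationRGName 'Replication 1'

# List active SMB Multichannel connections and the interfaces they use
Get-SmbMultichannelConnection
```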

Appendix B: Supported hardware

See the Dell support matrix for hardware and configurations validated with stretched clusters:

● Support Matrix for Dell EMC Solutions for Microsoft Azure Stack HCI

Chapter 8: Day 1 operations

After deploying the Azure Stack HCI cluster, complete Day 1 operations.

Topics:

Day 1 operations overview

Known issues

Microsoft HCI Solutions from Dell Technologies overview

Deployment guidance

Azure onboarding for Azure Stack HCI operating system

Creating virtual disks

Day 1 operations overview

This section includes an overview of Microsoft HCI Solutions from Dell Technologies, guidance to monitor and manage bare metal, and instructions for performing operations on clusters and updating the cluster-aware system. This guide is applicable only to infrastructure that is built by using the validated and certified Microsoft HCI Solutions from Dell Technologies.

Microsoft HCI Solutions from Dell Technologies refers to:

● Dell Integrated System for Azure Stack HCI (based on Azure Stack HCI operating system).

● Dell HCI Solutions for Microsoft Windows Server (based on Windows Server 2019 or 2022).

Instructions in this section are applicable only to the generally available operating system build of Windows Server 2019, Windows Server 2022, and Azure Stack HCI operating system with the latest applicable updates. These instructions are not validated with Windows Server version 1709. Microsoft HCI Solutions from Dell Technologies does not support the Windows Server Semi-Annual Channel release. Dell Technologies recommends updating the host operating system with the latest cumulative updates from Microsoft before starting the cluster creation and configuration tasks.

Known issues

Before starting the cluster deployment, see Dell EMC Solutions for Microsoft Azure Stack HCI - Known Issues for known issues and workarounds.

Microsoft HCI Solutions from Dell Technologies overview

Microsoft HCI Solutions from Dell Technologies encompass various configurations of AX nodes from Dell Technologies to power the primary compute cluster that is deployed as an HCI. The HCI that is built by using these AX nodes uses a flexible solution architecture rather than a fixed component design. The following figure illustrates one of the flexible solution architectures. It consists of a compute cluster alongside the redundant top-of-rack (ToR) switches, a separate out-of-band network, and an existing management infrastructure in the data center.


Figure 8. Hyperconverged virtualized solution using AX nodes

Deployment guidance

For deployment guidance and instructions for configuring a cluster using Microsoft HCI Solutions from Dell Technologies, see Microsoft HCI Solutions from Dell Technologies. This operations guidance is applicable only to cluster infrastructure that is built using the instructions provided in the deployment documentation for AX nodes.


Azure onboarding for Azure Stack HCI operating system

Clusters deployed using Azure Stack HCI operating system must be onboarded to Microsoft Azure for full functionality and support. For firewall requirements and for instructions on connecting Azure Stack HCI clusters to Azure, see Firewall requirements for Azure Stack HCI and Connect Azure Stack HCI to Azure, respectively.

Creating virtual disks

Creating the cluster and enabling Storage Spaces Direct on it creates only a storage pool; it does not provision any virtual disks in the storage pool. Use the New-Volume cmdlet to provision new virtual disks as the shared volumes for the cluster.

When creating volumes in the Azure Stack HCI cluster infrastructure:

● Ensure that you create multiple volumes—a multiple of the number of servers in the cluster. For optimal performance, each cluster node should own at least one virtual disk volume. Virtual machines (VMs) on each volume perform optimally when running on the volume owner node.

● Limit the number of volumes in the cluster to 32 on Windows Server 2016, and to 64 on Windows Server 2019/2022 and the Azure Stack HCI operating system.

● Ensure that the storage pool has enough reserve capacity for any in-place volume repairs due to failed disk replacement. The reserved capacity must be at least equivalent to the size of one capacity drive per server and up to four drives.

● A single node cluster supports two three-way mirror volumes and nested two-way mirror volumes.

For general guidance about planning volume creation, see Planning volumes in Storage Spaces Direct.
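The sizing rules in the list above can be checked with a few lines of arithmetic. The following Python sketch is illustrative only and is not part of any Dell or Microsoft tool: the function and constant names are hypothetical, and it assumes that "one capacity drive per server and up to four drives" means the reserve is capped at four capacity drives.

```python
# Illustrative sketch only; all names here are hypothetical.

# Maximum number of volumes per cluster, as stated in the guidance above.
MAX_VOLUMES = {
    "Windows Server 2016": 32,
    "Windows Server 2019": 64,
    "Windows Server 2022": 64,
    "Azure Stack HCI operating system": 64,
}


def valid_volume_counts(node_count: int, os_name: str) -> list[int]:
    """Return the volume counts that are a multiple of the node count
    and stay within the per-cluster volume limit for the given OS."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    limit = MAX_VOLUMES[os_name]
    return list(range(node_count, limit + 1, node_count))


def minimum_reserve_tb(node_count: int, capacity_drive_tb: float) -> float:
    """Return the minimum reserve capacity in TB: one capacity drive
    per server, capped at four drives (assumed reading of the rule)."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return min(node_count, 4) * capacity_drive_tb


if __name__ == "__main__":
    # Example: a four-node cluster with 1.92 TB capacity drives.
    print(valid_volume_counts(4, "Azure Stack HCI operating system"))
    print(minimum_reserve_tb(4, 1.92))
```

For a four-node cluster, the first function lists 4, 8, 12, and so on up to 64, so each node can own the same number of volumes.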

NOTE: Dell Technologies recommends that you use the following resiliency settings when you create virtual disks:

● On Windows Server 2016, Windows Server 2019, Windows Server 2022, and the Azure Stack HCI operating system clusters with three or more nodes—Three-way mirror.

● On Windows Server 2019, Windows Server 2022, and Azure Stack HCI operating system clusters with four or more nodes—Three-way mirror or mirror-accelerated parity.

NOTE: A physical disk is the fault domain in a single node cluster, as compared to StorageScaleUnit in a multi-node cluster.
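As a rough illustration, the resiliency recommendations in the NOTE above can be written as a small lookup. This Python sketch is an assumption-laden paraphrase of that NOTE, not an official tool; the function name is hypothetical, and it returns an empty list for cluster sizes that the NOTE does not cover.

```python
# Illustrative sketch only; names are hypothetical.

NEWER_OS = {
    "Windows Server 2019",
    "Windows Server 2022",
    "Azure Stack HCI operating system",
}
ALL_OS = NEWER_OS | {"Windows Server 2016"}


def recommended_resiliency(node_count: int, os_name: str) -> list[str]:
    """Return the resiliency settings recommended in the NOTE above.

    An empty list means the NOTE gives no recommendation for this
    combination (for example, clusters with fewer than three nodes).
    """
    if os_name not in ALL_OS:
        raise ValueError(f"unknown operating system: {os_name}")
    if node_count >= 4 and os_name in NEWER_OS:
        return ["Three-way mirror", "Mirror-accelerated parity"]
    if node_count >= 3:
        return ["Three-way mirror"]
    return []


if __name__ == "__main__":
    print(recommended_resiliency(4, "Windows Server 2022"))
    print(recommended_resiliency(3, "Windows Server 2016"))
```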


Chapter 9: Managing and monitoring clusters with Windows Admin Center

Topics:

Overview

Install Windows Admin Center

Add the HCI cluster connection

Access the HCI cluster

View server details

View drive details

Managing and monitoring volumes

Enabling data deduplication on Storage Spaces Direct

Monitoring and managing VMs

Managing virtual switches

Overview

Windows Admin Center is a browser-based management tool developed by Microsoft to monitor and manage Windows Servers, failover clusters, and hyperconverged clusters.

The AX nodes for Storage Spaces Direct offer software-defined storage building blocks for creating highly available and highly scalable HCI. The AX nodes are preconfigured with certified components and validated as a Storage Spaces Direct solution that includes Dell PowerSwitch S-Series switches, with simplified ordering and reduced deployment risks. Dell Technologies offers configuration options within these building blocks to meet different capacity and performance points. With Windows Admin Center, you can seamlessly monitor and manage the HCI clusters that are created on these building blocks.

Install Windows Admin Center

Download Windows Admin Center version 2211 or later from Microsoft download center and install it on Windows 10, Windows Server 2016, Windows Server 2019, Windows Server 2022, or Windows Server version 1709. Install Windows Admin Center directly on a managed node to manage itself. You can also install Windows Admin Center on other nodes in the infrastructure or on a separate management station to manage the AX nodes remotely. It is possible to implement high availability for Windows Admin Center by using failover clustering. When Windows Admin Center is deployed on nodes in a failover cluster, it acts as an active/passive cluster providing a highly available Windows Admin Center instance.

The Windows Admin Center installer wizard performs the configuration tasks that are required for Windows Admin Center functionality. These tasks include creating a self-signed certificate and configuring trusted hosts for remote node access.

Optionally, you can supply the certificate thumbprint that is already present in the target node local certificate store. By default, Windows Admin Center listens on port 443 (you can change the port during the installation process).

NOTE: The automatically generated self-signed certificate expires in 60 days. Ensure that you use a certificate authority (CA)-provided SSL certificate if you intend to use Windows Admin Center in a production environment.

For complete guidance about installing Windows Admin Center on Windows Server 2016 or higher with desktop experience or Server Core, see Install Windows Admin Center.

NOTE: This section assumes that you have deployed the Azure Stack HCI cluster from Dell Technologies following the deployment guidance that is available at: https://dell.com/azurestackhcimanuals.

After the installation is complete, you can access Windows Admin Center at https://managementstationname:<PortNumber>.
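As a small illustration of the two values mentioned in this section, the following Python sketch builds the gateway address from a host name and port, and computes when an automatically generated self-signed certificate would expire. The function names and the host name wac01 are hypothetical, and the sketch assumes that the 60-day validity period starts on the installation date.

```python
# Illustrative sketch only; names and sample values are hypothetical.
from datetime import date, timedelta

DEFAULT_PORT = 443            # default listening port stated above
SELF_SIGNED_VALID_DAYS = 60   # self-signed certificate lifetime stated above


def gateway_url(host: str, port: int = DEFAULT_PORT) -> str:
    """Build the Windows Admin Center address for a management station."""
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    return f"https://{host}:{port}"


def self_signed_cert_expiry(installed_on: date) -> date:
    """Return the expiry date, assuming validity starts at installation."""
    return installed_on + timedelta(days=SELF_SIGNED_VALID_DAYS)


if __name__ == "__main__":
    print(gateway_url("wac01"))
    print(self_signed_cert_expiry(date(2022, 12, 1)))
```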


Figure 9. Windows Admin Center start screen

Add the HCI cluster connection

About this task

This task applies to Azure Stack HCI and Windows Server. For monitoring and management purposes, add the hyperconverged cluster that is based on Microsoft HCI Solutions from Dell Technologies as a connection in Windows Admin Center.

Steps

1. Go to Windows Admin Center > Cluster Manager.

Figure 10. HCI cluster navigation

2. Click Add.

The Add Cluster window is displayed.

3. Enter the cluster FQDN and select Also add servers in the cluster.


Figure 11. Adding the HCI cluster

Windows Admin Center discovers the cluster and nodes that are part of the cluster.

4. Click Add.

The cluster is added to the connection list and Windows Admin Center is configured to monitor and manage the HCI cluster.

Access the HCI cluster

To view the dashboard for the HCI cluster that you have added to Windows Admin Center, click the cluster name in the Cluster Manager window.

This dashboard provides the real-time performance view from the HCI cluster. This view includes total IOPS, average latency values, throughput achieved, average CPU usage, memory usage, and storage usage from all cluster nodes. It also provides a summarized view of the Azure Stack HCI cluster with drives, volumes, and VM health.

You can examine an alert by clicking the alerts tile in the dashboard.


Figure 12. HCI dashboard in Windows Admin Center

View server details

To view the server details, click the tools pane and go to Servers > Inventory.

Figure 13. Servers: Inventory tab

NOTE: The metrics in the figure are for a four-node Azure Stack HCI cluster with all-flash drive configuration.


View drive details

About this task

View the total number of drives in the cluster, the health status of the drives, and the used, available, and reserve storage of the cluster as follows.

Steps

1. In the left pane, select Drives.

2. Click the Summary tab.

Figure 14. Drives: Summary tab

To view the drive inventory from the cluster nodes, from the left pane, select Drives, and then click the Inventory tab.


Figure 15. Drives: Inventory tab

The HCI cluster is built using four AX-740xd nodes, each with two 1.92 TB NVMe drives.

By clicking the serial number of the drive, you can view the drive information, which includes health status, slot location, size, type, firmware version, IOPS, used or available capacity, and storage pool of the drive.

Also, from the dashboard, you can set the drive options as Light On or Light Off, or Retire or Unretire from the storage pool.

Managing and monitoring volumes

You can manage and monitor the Storage Spaces Direct volumes using Windows Admin Center.

The following features are supported in Windows Admin Center:

● Create volume

● Browse volume

● Expand volume

● Delete volume

● Make volume offline or online

To access the volumes on the HCI cluster, select the cluster and, in the left pane, click Volumes. In the right pane, the Summary and Inventory tabs are displayed.

The Summary tab shows the number of volumes in the cluster and the health status of the volumes, alerts, total IOPS, latency, and throughput information of the available volumes.


Figure 16. Volumes: Summary tab

The Inventory tab provides the volume inventory from the HCI cluster nodes. You can manage and monitor the volumes.

Figure 17. Volumes: Inventory tab

Creating volumes in Storage Spaces Direct

About this task

Create volumes in Storage Spaces Direct in Windows Admin Center as follows.

Steps

1. Go to Volumes > Inventory.

2. Click Create.

The Create volume window is displayed.

3. Enter the volume name, resiliency, and size of the volume, and then click Create.

The volume is created.


Managing volumes

About this task

Open, expand, delete, or make a volume offline as follows.

Steps

1. Go to Volumes > Inventory.

2. Click the volume name.

3. Click Open to open the volume folder.

4. Click Offline or Delete to make the volume offline or to delete the volume.

5. Click Expand to expand the volume.

The Expand volume window is displayed.

6. Enter the additional size of the volume.

7. Select the volume size from the drop-down list and click Expand.

Enabling data deduplication on Storage Spaces Direct

About this task

Data deduplication helps to maximize free space on the volume by optimizing duplicated portions on the volume without compromising data fidelity or integrity.

NOTE: To enable data deduplication on an HCI cluster, ensure that the data deduplication feature is enabled on all the cluster nodes. To enable the data deduplication feature, run the following PowerShell command:

Install-WindowsFeature FS-Data-Deduplication

NOTE: To ensure that deduplication activities run in an optimized way, see the Microsoft-recommended workloads in Determine which workloads are candidates for Data Deduplication. For advanced settings, see Advanced Data Deduplication settings.

Enable data deduplication and compression on a Storage Spaces Direct volume using Windows Admin Center as follows.

Steps

1. Go to Volumes > Inventory.

2. Click the volume on which to enable data deduplication.

3. In the optional features, set the toggle to ON to enable deduplication and compression on that volume.

The Enable Deduplication window is displayed.

4. Click Start and select Hyper-V from the drop-down list.

5. Click Enable Deduplication.

Deduplication is enabled, and the Storage Spaces Direct volume is compressed.

Monitoring and managing VMs

You can use Windows Admin Center to monitor and manage the VMs that are hosted on the HCI cluster.

To access the VMs that are hosted on the HCI cluster, click the cluster name and, in the left pane, select Virtual machines. In the right pane, the Inventory and Summary tabs are displayed.

The Inventory tab provides a list of the VMs that are hosted on the HCI cluster and provides access to manage the VMs.


Figure 18. VMs: Inventory tab

The Summary tab provides the following information about the VM environment of the HCI cluster:

● Total number of VMs, their state, and alerts

● Host and guest CPU utilization

● Host and guest memory utilization

● VM total IOPS, latency, and throughput information


Figure 19. VMs: Summary tab

You can perform the following tasks from the Windows Admin Center console:

● View a list of VMs that are hosted on the HCI cluster.

● View individual VM state, host server information, virtual machine uptime, CPU, memory utilization, and so on.

● Create a new VM.

● Modify VM settings.

● Set up VM protection.

● Delete, start, turn off, shut down, save, delete saved state, pause, resume, reset, add new checkpoint, move, rename, and connect VMs.

Managing virtual switches

The virtual switches tool in Windows Admin Center enables you to manage Hyper-V virtual switches of the cluster nodes.

The virtual switches tool supports the following features:

● View existing virtual switches on the server.

● Create a new virtual switch.

● Modify virtual switch properties.

● Delete a virtual switch.


Figure 20. Virtual switches


Chapter 10: Dell OpenManage Integration with Windows Admin Center

Topics:

Overview

Prerequisites for managing AX nodes

Installing the OMIMSWAC license

Managing Microsoft HCI-based clusters

Full Stack Cluster-Aware Offline Updating

Full Stack Cluster-Aware Updating for Azure Stack HCI clusters using the OpenManage Integration snap-in

Updating a standalone node before adding it to the cluster

Secure cluster with Secured-core

Enabling operating system features

Protect your infrastructure with infrastructure lock

Manage CPU cores in Azure Stack HCI clusters

Cluster expansion

Validate and remediate Azure Stack HCI clusters

Onboard Dell policies to Azure Arc from Windows Admin Center to manage Azure Stack HCI clusters

View recommendations for storage expansion

Known issues

Overview

Dell OpenManage Integration with Windows Admin Center enables IT administrators to manage the HCI that is created by using Microsoft HCI Solutions from Dell Technologies. OpenManage Integration with Windows Admin Center simplifies the tasks of IT administrators by remotely managing the AX nodes and clusters throughout their life cycle.

For more information about the features, benefits, and installation of OpenManage Integration with Windows Admin Center, see the documentation at https://www.dell.com/support/home/product-support/product/openmanage-integration-microsoftwindows-admin-center/docs.

NOTE: For Storage Spaces Direct Ready Node, if you want to use the Cluster Aware Update premium feature to update the cluster using the Dell extension, contact a Dell sales representative to get an Azure Stack HCI license. See the Firmware and driver updates using the manual method section.

Prerequisites for managing AX nodes

The prerequisites for managing AX nodes are:

● You have installed the following:

○ Windows Admin Center version 2211 or later and you are logged in as a gateway administrator.

○ Dell OpenManage Integration with Microsoft Windows Admin Center extension version 3.0. For more information about the installation procedure, see the Dell OpenManage Integration Version 2.3 with Microsoft Windows Admin Center Installation Guide.

○ Microsoft failover cluster extension version 2.20.0 release or above.

○ An OpenManage Integration with Microsoft Windows Admin Center (OMIMSWAC) Premium License on each AX node.

● You have added the HCI cluster connection in Microsoft Windows Admin Center.

● You can access the Windows Admin Center remotely using domain administrator credentials. Otherwise, use local administrator credentials to access the Windows Admin Center locally. For more information, see What type of installation is right for you?
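The minimum versions in the list above can be compared numerically rather than as text. The following Python sketch is illustrative only; the function names are hypothetical, and it assumes that version strings contain only dot-separated integers (for example, 2211 or 2.20.0).

```python
# Illustrative sketch only; names are hypothetical.


def parse_version(text: str) -> tuple[int, ...]:
    """Split a dotted version string such as '2.20.0' into integers."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True if the installed version is the minimum or later.

    Shorter versions are padded with zeros so that '2.20' and '2.20.0'
    compare as equal.
    """
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    # Minimums taken from the prerequisites above.
    print(meets_minimum("2211", "2211"))        # Windows Admin Center
    print(meets_minimum("2.9.0", "2.20.0"))     # failover cluster extension
```

Comparing integers avoids the pitfall of plain string comparison, where "2.9.0" would sort after "2.20.0".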


Installing the OMIMSWAC license

If you are using AX nodes, skip this step, because AX nodes have a preinstalled OMIMSWAC license. Storage Spaces Direct Ready Nodes require the installation of an After Point of Sale (APOS) license.

Steps

1. Log in to iDRAC.

2. Select Configuration > Licenses.

3. Select Import, browse to and select the license, and then click Upload.

Managing Microsoft HCI-based clusters

Steps

1. In the upper left of Windows Admin Center, select Cluster Manager from the menu.

2. In the Cluster Connections window, click the cluster name.

3. In the left pane of Windows Admin Center, under EXTENSIONS, click OpenManage Integration.

4. Review the Dell Software License Agreement and Customer Notice, and select the check box to accept the terms of the license agreement.

Overview

Select View > Overview. The Overview page displays the following:

Cluster level information that includes:

● Number of cluster nodes

● Health of the cluster, shown as the total number of components with critical health across nodes, or as Health Ok if all components are healthy.

○ Green: Healthy

○ Yellow: Warning

○ Red: Critical

○ Grey: Unknown

● Secure core status:

○ Green: Enabled as all BIOS features are enabled on all nodes.

○ Yellow: Partially enabled as one or more nodes do not have all BIOS features enabled.

○ Red: Disabled as no node has all BIOS features enabled.

● Compliance status:

○ Green: Compliant as all cluster nodes are compliant with all Dell policies.

○ Yellow: Warning as one or more cluster nodes are not compliant with an optional policy.

○ Red: Error as a cluster node is not compliant with the Dell Hardware Symmetry Policy.

○ Grey: Unknown as a cluster node is not reachable.
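The compliance color rules above can be read as a simple roll-up over the cluster nodes. The following Python sketch is illustrative only and does not reflect how the extension is implemented. The class and field names are hypothetical, and the order in which the rules are evaluated (grey, then red, then yellow) is an assumption, because the guide does not state which status wins when several conditions apply.

```python
# Illustrative sketch only; names and rule precedence are assumptions.
from dataclasses import dataclass


@dataclass
class NodeCompliance:
    reachable: bool
    symmetry_policy_ok: bool     # Dell Hardware Symmetry Policy
    optional_policies_ok: bool   # all optional policies


def cluster_compliance_color(nodes: list[NodeCompliance]) -> str:
    """Roll node results up into one cluster-level color."""
    if not nodes:
        raise ValueError("at least one node is required")
    if any(not n.reachable for n in nodes):
        return "grey"      # Unknown: a node is not reachable
    if any(not n.symmetry_policy_ok for n in nodes):
        return "red"       # Error: symmetry policy not met
    if any(not n.optional_policies_ok for n in nodes):
        return "yellow"    # Warning: an optional policy not met
    return "green"         # Compliant with all policies


if __name__ == "__main__":
    nodes = [
        NodeCompliance(True, True, True),
        NodeCompliance(True, True, False),
    ]
    print(cluster_compliance_color(nodes))
```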

Azure:

● Azure integration status and option to configure cluster settings.

● Policy onboard status and option to configure policies.

HCP and Firmware Compliance:

When the extension loads for the first time, this section remains empty. This section is populated once you generate the reports from the Action menu.


● HCP Compliance: Displays the compliance status of the HCI cluster with the number of components for each status.

○ Green: Components are compliant with Dell policies.

○ Yellow: Warning as components are not compliant with an optional policy.

○ Red: Error as components are not compliant with the Dell Hardware Symmetry Policy.

● Firmware update: Firmware compliance status of the cluster.

○ Green: Compliant as cluster nodes have the same versions of BIOS, firmware, and drivers as the imported catalog.

○ Yellow: Warning as one or more cluster nodes are not compliant with an optional policy.

○ Red: Non-compliant as BIOS, firmware, and drivers need updates.

Security:

● Secure core status:

○ Green: Enabled as all BIOS features are enabled on all nodes.

○ Yellow: Partially enabled as one or more nodes do not have all BIOS features enabled.

○ Red: Disabled as no node has all BIOS features enabled.

● Infrastructure lock status:

○ Green: Enabled as all nodes of the cluster are locked.

○ Yellow: Partially enabled as one or more nodes of the cluster are unlocked.

○ Grey: Unknown as the extension is unable to connect to the cluster nodes.
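The Secured-core and infrastructure lock colors above follow an "all, some, or none" pattern. The following Python sketch is illustrative only; the function names are hypothetical, and representing an unreachable node as None is an assumption made for this example.

```python
# Illustrative sketch only; names and the None convention are assumptions.
from typing import Optional


def secured_core_color(nodes_fully_enabled: list[bool]) -> str:
    """Each entry is True if a node has all BIOS features enabled."""
    if not nodes_fully_enabled:
        raise ValueError("at least one node is required")
    if all(nodes_fully_enabled):
        return "green"    # Enabled on all nodes
    if not any(nodes_fully_enabled):
        return "red"      # Disabled: no node has all features enabled
    return "yellow"       # Partially enabled


def infrastructure_lock_color(nodes_locked: list[Optional[bool]]) -> str:
    """Each entry is True (locked), False (unlocked), or None (unreachable)."""
    if not nodes_locked:
        raise ValueError("at least one node is required")
    if any(state is None for state in nodes_locked):
        return "grey"     # Unknown: a node could not be reached
    if all(nodes_locked):
        return "green"    # All nodes are locked
    return "yellow"       # One or more nodes are unlocked


if __name__ == "__main__":
    print(secured_core_color([True, True, False, True]))
    print(infrastructure_lock_color([True, True, None, True]))
```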

Resources:

● Current storage details such as used storage and available storage.

● Current CPU core details such as current cores and available cores.

Quick Task:

Provides direct links to perform different actions using the OpenManage Integration extension. An action may be disabled if it depends on another action.

Prerequisite checks

Use stand-alone prerequisite checks to verify if the connection meets the requirements for monitoring and management operations.

About this task

When you perform any operation using OpenManage Integration extension, the extension automatically runs the prerequisite checks to verify the hardware and software requirements. If any prerequisite checks fail, the View Details button appears on the respective pages. Click View Details to see how to resolve the issues before performing the operation.

In addition, OpenManage Integration extension also provides an option for stand-alone prerequisite checks. Before you run any operation using the extension, use the stand-alone Prerequisite Checks to verify the server or cluster readiness. Use this check to identify and fix any problems for all operations together for seamless monitoring and management experience.

If any of the prerequisite checks fail, it can be resolved manually or automatically based on the nature of the check. For Auto Fix, the extension allows you to resolve the issues by clicking Resolve. For Manual fix, the extension displays recommendations on how to resolve issues manually. Once prerequisite checks are all compliant, you can use the extension to seamlessly perform server or cluster management operations.

Steps

1. Click View > Prerequisite Checks.

2. Click Select Operation to choose the operations for which you want to run the prerequisite checks. By default, all the operations are selected.

3. Click Check to run the prerequisite checks for the selected operations.

4. After the prerequisite checks are done, View Details appears next to the Select Operation button. Click View Details to see the prerequisite check report.


5. To fix the prerequisite issues with Auto Fix category, ensure that the prerequisites are selected, and then click Resolve.

6. To fix the prerequisite issues with Manual Fix category, see the recommendations on how to resolve the issue.

7. After the issues are resolved, click Rerun to see the updated prerequisite checks status report. For more information, see the list of prerequisite checks in the OMIWAC user guide.
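The split between Auto Fix and Manual Fix described above amounts to sorting failed checks into two groups. The following Python sketch is illustrative only: the data model and the check names are hypothetical and do not correspond to the actual prerequisite checks or report format of the extension.

```python
# Illustrative sketch only; the data model and check names are hypothetical.
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    auto_fixable: bool   # True if the issue falls in the Auto Fix category


def triage(results: list[CheckResult]) -> dict[str, list[str]]:
    """Group failed checks by how they can be resolved."""
    failed = [r for r in results if not r.passed]
    return {
        "auto_fix": [r.name for r in failed if r.auto_fixable],
        "manual_fix": [r.name for r in failed if not r.auto_fixable],
    }


if __name__ == "__main__":
    report = triage([
        CheckResult("example-check-a", passed=True, auto_fixable=False),
        CheckResult("example-check-b", passed=False, auto_fixable=True),
        CheckResult("example-check-c", passed=False, auto_fixable=False),
    ])
    print(report)
```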

Health status

1. To view the overall cluster health status and the health status of each node and its components, click View > Health. The Health page appears and checks the inventory prerequisites. If all the prerequisites are compliant, the page displays the health status of all hardware components. If one or more checks are not compliant, an error message appears.

Figure 21. Health page

2. Click View Details to see the prerequisites that have failed.

3. Click Resolve to fix the issues that belong to the Auto Fix category. For issues that belong to Manual Fix, see the recommendations on how to resolve them. See Prerequisites check details for more information about inventory prerequisites checks.

4. To view the latest health information, click Refresh.

Overall Health Status

A doughnut chart displays the overall health status of a cluster using a color code. Green indicates that the cluster is healthy. Yellow indicates that the cluster is not in the recommended state. Red indicates that the cluster is in a critical state. Unknown indicates that the cluster nodes are not reachable. The number of nodes in the doughnut chart indicates the number of cluster nodes that have the same health status as the cluster. In addition, different color codes show the number of nodes in each health status. To filter the health status of nodes and their components, click the respective color code in the doughnut chart. For example, click the red color to see the nodes and components that are in critical health status.
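The counting and filtering behind the doughnut chart can be sketched in a few lines. The following Python example is illustrative only; the node names, status strings, and function names are hypothetical and are not taken from the extension.

```python
# Illustrative sketch only; node names and status labels are hypothetical.
from collections import Counter


def health_summary(node_status: dict[str, str]) -> Counter:
    """Count how many nodes are in each health status."""
    return Counter(node_status.values())


def nodes_with_status(node_status: dict[str, str], status: str) -> list[str]:
    """Return the nodes in one status, as when clicking a chart color."""
    return sorted(name for name, s in node_status.items() if s == status)


if __name__ == "__main__":
    cluster = {
        "node1": "healthy",
        "node2": "critical",
        "node3": "healthy",
        "node4": "warning",
    }
    print(health_summary(cluster))
    print(nodes_with_status(cluster, "critical"))
```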

Inventory

The Inventory tab lists the servers that are part of the cluster.

1. Select View > Inventory. The Inventory page appears and checks the inventory prerequisites. If all the prerequisites are compliant, the page displays the hardware and firmware information for cluster nodes. If one or more checks are not compliant, an error message appears.

2. Click View Details to see the prerequisites that have failed.

3. Click Resolve to fix the issues that belong to the Auto Fix category.

4. For issues that belong to Manual Fix, see the recommendations on how to resolve them. See Prerequisites check for details about inventory prerequisites checks.


5. To view the latest hardware or firmware information, click Refresh.

Clicking a server name on the inventory list provides details about the following components:

● System

● Firmware

● CPUs

● Memory

● Storage controllers

● Storage enclosures

● Network devices

● Physical disks

● Power supplies

● Fans

Locating physical disks and viewing their status

The Blink and Unblink feature of Windows Admin Center enables you to locate physical disks or view disk status.

Steps

1. Under the Inventory tab, from the Components list, select Physical Disks.

2. For each physical disk, select Blink or Unblink to control the disk's LED.

Viewing update compliance and updating the cluster

About this task

To view and update the hardware compliance, select View > Compliance. Another menu button with a drop-down list appears next to the Node button. Select Hardware updates to go to the compliance page.

Use the Update tab of the OpenManage Integration with Microsoft Windows Admin Center UI to view update compliance and update the cluster.

To view the latest update compliance report and update the cluster using an offline catalog, OpenManage Integration with Windows Admin Center requires that you configure the update compliance tools.

Steps

1. At Check Compliance, select the online catalog or offline catalog to configure update tools.

If you select the online catalog, OpenManage Integration downloads the Azure Stack HCI catalog, system tools, and the required Dell Update Packages from the Internet.

To use an offline catalog, configure the update tools under Hardware updates and select the Configure DSU and IC option. The catalog file must be exported using the Dell Repository Manager and placed in a shared folder. See Obtaining the firmware catalog for AX nodes or Ready Nodes using Dell EMC Repository Manager.

2. Click Check Compliance to generate the update compliance report.

By default, all the upgrades are selected, but you can make alternate selections as needed.


Figure 22. Compliance details

3. Click Fix compliance to view the selected component details.

NOTE: Cluster Aware Update is a licensed feature. Ensure that the Azure Stack HCI license is installed before proceeding.

4. To schedule the update for a later time, click Schedule later, select Date/time, and click Next cluster aware update to download the required updates.

To use the schedule later feature, download the required updates and keep them ready to update at the specified time.

5. Click Update to begin the update process and click Yes at the prompt to enable Credential Security Service Provider (CredSSP) to update the selected components.

When the update job is completed, the compliance job is triggered automatically.

Full Stack Cluster-Aware Offline Updating

About this task

If an Internet connection is not available, run Full Stack Cluster-Aware Updating (CAU) in offline mode as follows:

Steps

1. Download the asHCISolutionSupportMatrix.json and asHCISolutionSupportMatrix.json.sign files from http://downloads.dell.com/omimswac/supportmatrix/

2. Place these files in the C:\Users\Dell\SymmetryCheck folder in the gateway system where Windows Admin Center is installed.

3. Run Full Stack CAU.

Results

For more information about CAU, see the Cluster-Aware Updating Overview.
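Before running Full Stack CAU offline, it can help to confirm that both support matrix files from the steps above are in place. The following Python sketch is illustrative only and is not part of the extension; the function name is hypothetical, and the folder is passed in as a parameter so that the sketch makes no assumption about where it runs.

```python
# Illustrative sketch only; the function name is hypothetical.
from pathlib import Path

# File names taken from step 1 above.
REQUIRED_FILES = (
    "asHCISolutionSupportMatrix.json",
    "asHCISolutionSupportMatrix.json.sign",
)


def missing_support_matrix_files(folder) -> list[str]:
    """Return the required file names that are not present in the folder."""
    folder = Path(folder)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    # Folder taken from step 2 above, on the Windows Admin Center gateway.
    missing = missing_support_matrix_files(r"C:\Users\Dell\SymmetryCheck")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("Both support matrix files are present.")
```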


Full Stack Cluster-Aware Updating for Azure Stack HCI clusters using the OpenManage Integration snap-in

About this task

Windows Admin Center with the Dell extension makes it easy to update an Azure Stack HCI cluster using the cluster aware update feature. The feature updates the operating system and Dell-qualified firmware and drivers. When an update is selected, all the updates are installed on the cluster nodes. A single reboot is required to install operating system, firmware, and driver updates per server.

NOTE: Full Stack Cluster-Aware Updating (CAU) is only available on Azure Stack HCI clusters built using the Azure Stack HCI operating system. For more information about CAU, see Cluster-Aware Updating Overview.

NOTE: Full Stack CAU is a licensed feature. Ensure that the OMIMSWAC license is installed before proceeding.

To perform both operating system updates and hardware upgrades on Azure Stack HCI cluster nodes, carry out the following steps:

Steps

1. In Windows Admin Center, select Updates from the Operations menu.

You must enable CredSSP and provide explicit credentials. When asked if CredSSP should be enabled, click Yes.

The Updates page is displayed.

2. For an operating system update, see Microsoft's Azure Stack HCI documentation.

3. On the Install updates page, review the operating system updates and select Next: Hardware updates.

4. If the Dell OpenManage Integration extension is not installed, click Install to accept the license terms and install the extension. If you have already installed the OpenManage Integration extension, click Get updates to move to the Hardware updates page.

5. On the Hardware updates page, review the prerequisites listed to ensure that all nodes are ready for hardware updates and then click Next: Update Source. Click Re-Run to run the prerequisites again.

You must meet all the prerequisites listed on the Prerequisites tab, otherwise you cannot proceed to the next step.

6. To generate a compliance report against the validated Azure Stack HCI catalog, follow these steps on the Update source page:

● Select one of these methods to download catalog files:

○ Select Online (HTTPs) - Update Catalog for Microsoft HCI Solutions to download the catalog automatically from dell.com. The online catalog option is selected by default. Online catalog support requires direct Internet connectivity from the Windows Admin Center gateway. The overall download time of a catalog depends on the network bandwidth and the number of components being updated.

NOTE: Accessing the Internet using proxy settings is not supported.

○ Select Offline - Dell Repository Manager Catalog to use the DRM catalog configured in a CIFS location.

OMIMSWAC with or without Internet access allows you to select Offline - Dell Repository Manager Catalog to generate a compliance report. You can use this option when the Internet is not available. For more information, see Obtaining the firmware catalog for AX nodes or Ready Nodes using Dell Repository Manager.

○ To use the offline catalog, select DRM Settings to ensure that the CIFS share path is configured with the DRM catalog.

● To use the Dell System Update (DSU) and Inventory Collector (IC) tools, select Advance setting and then do the following:

○ Select Manually configure DSU and IC and then select Settings to manually download and configure DSU and IC tools in a shared location. Dell Technologies recommends using this option when OMIMSWAC is not connected to the Internet. DSU and IC settings that are configured using Update Tool settings in the OpenManage Integration extension are also available under Advanced settings in the OpenManage Integration snap-in.

OMIMSWAC downloads the catalog, collects the DSU and IC tools that are configured in the Settings tab, and generates a compliance report. If DSU and IC tools are not configured in the Settings tab, then OMIMSWAC downloads them from https://downloads.dell.com to generate the compliance report.

7. On the Compliance report tab, view the compliance report. When finished, click Next: Summary .


The 'upgradable' components that are 'non-compliant' are selected by default for updating. You can clear the check box for the selected components or select the 'non-compliant,' 'downgradable' components. However, if you want to change any of the default selections, ensure that the dependencies between the corresponding component firmware and drivers are met.
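The default selection rule described above can be sketched as a simple filter. This is only an illustration: the field names `compliance` and `update_type` and the sample data are assumptions, not the format that OMIMSWAC uses for its compliance report.

```python
def default_selection(components):
    """Return the names of components that would be preselected for update.

    Assumes each component is a dict with hypothetical keys
    'name', 'compliance', and 'update_type'.
    """
    return [
        c["name"]
        for c in components
        if c["compliance"] == "non-compliant" and c["update_type"] == "upgradable"
    ]
```

Components that are non-compliant but downgradable are left out of the default selection and would have to be selected explicitly, as the paragraph above describes.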

8. On the Summary tab, review the components to be updated and then click Next: Download updates . The download task continues in the background whether the UI session is live or not. If the UI session is live, the node level progress status is displayed. OMIMSWAC creates a notification when the download task is finished.

NOTE: While the download is in progress, it is recommended that you do not exit or close the browser. If you do, the download update operation may fail.

9. If the download operation fails, check the log files stored at the following paths for troubleshooting purposes:

● Gateway system: <Windows Directory>\ServiceProfiles\NetworkService\AppData\Local\Temp\generated\logs

● Windows 10 gateway system: <Windows installed drive>\Users\<user_name>\AppData\Local\Temp\generated\logs

● After the cluster update is over, DSU logs for individual nodes can be found in the <Windows Directory>\Temp\OMIMSWAC folder on the respective nodes.

To run the compliance report again, click Re-run Compliance and repeat steps 4 to 7.
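As a convenience when collecting logs, the three locations listed in step 9 can be assembled from the placeholders in the text. The following is a minimal sketch; the function name and its parameters are illustrative and are not part of OMIMSWAC.

```python
from pathlib import PureWindowsPath


def omimswac_log_dirs(windows_dir, install_drive, user_name):
    """Build the log folders named in step 9 from the guide's placeholders.

    windows_dir   -> <Windows Directory>, for example C:\\Windows
    install_drive -> <Windows installed drive>, for example C:\\
    user_name     -> <user_name>
    """
    suffix = PureWindowsPath("AppData", "Local", "Temp", "generated", "logs")
    return {
        "gateway": PureWindowsPath(windows_dir, "ServiceProfiles", "NetworkService") / suffix,
        "windows10_gateway": PureWindowsPath(install_drive, "Users", user_name) / suffix,
        "node_dsu": PureWindowsPath(windows_dir, "Temp", "OMIMSWAC"),
    }
```

`PureWindowsPath` only manipulates path strings, so the sketch can run on any operating system; it does not check whether the folders exist.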

10. After the updates are downloaded, follow the instructions in the Windows Admin Center to install both operating system and hardware updates. If the UI session is live, the node level progress status is displayed. Windows Admin Center creates a notification once the update is completed.

Updating a standalone node before adding it to the cluster

Before creating a cluster, ensure that each node is updated with the latest versions of firmware and drivers.

Steps

1. In Windows Admin Center, in the left pane, click Add .

2. In the Windows Server tile, click Add .

3. Enter the node name and click Add .

4. Under All connections , select the server and click Manage as .

5. Select use another account for this connection , and then provide the credentials in the domain\username or hostname\username format.

6. Click Confirm .

7. In the Connections window, click the server name.

8. In the left pane of Windows Admin Center, under EXTENSIONS , click OpenManage Integration .

9. Review the Dell Software License Agreement and Customer Notice and select the check box to accept the terms of the license agreement.

10. Click View > Compliance. When another menu appears, select Hardware Updates.

11. Click Check compliance and select either the online catalog or offline catalog .

12. Click Fix Compliance and select update to update the node.

Secure cluster with Secured-core

A malicious hacker who has physical access to a system can tamper with the BIOS. A tampered BIOS code poses a high security threat and makes the system vulnerable to further attacks. With the Secured-core feature, OMIMSWAC ensures that your cluster boots only using the software that is trusted by Dell.

Prerequisites

Secured-core feature is supported on the following configurations:

● AMD processor types:


○ AMD Milan: the BIOS version of the cluster nodes must be 2.3.6 or above.

● Intel processor types:

○ Cluster nodes BIOS version must be 1.3.8 or above.

NOTE: The following Intel processor types are not supported for Secured-core feature:

○ E-23 series and Pentium SKUs such as G6605, G6505, G6505T, G6405, and G6405T.

● OS versions:

○ Windows Server 2022 and Azure Stack HCI OS 21H2 or 22H2.

● TPM V2.0 module must be installed with firmware 7.2.2.0 or above.

● OMIWAC Premium License must be installed on each cluster node.

NOTE: To ensure proper functioning of the System Guard OS feature, ensure that the TPM Hierarchy under System Security section is enabled in the BIOS settings.
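The version thresholds in these prerequisites can be checked with a plain dotted-version comparison. The sketch below is an assumption-laden illustration: the helper names are hypothetical, and the installed versions would have to be read from your own inventory source, which is not shown here.

```python
# Minimum versions taken from the prerequisites above.
MIN_BIOS = {"amd": "2.3.6", "intel": "1.3.8"}
MIN_TPM_FIRMWARE = "7.2.2.0"


def parse_version(text):
    """Turn '2.3.6' into (2, 3, 6) so that versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum):
    return parse_version(installed) >= parse_version(minimum)


def secured_core_version_gaps(vendor, bios, tpm_firmware):
    """List the version prerequisites that a node does not meet."""
    gaps = []
    if not meets_minimum(bios, MIN_BIOS[vendor]):
        gaps.append(f"BIOS {bios} is below {MIN_BIOS[vendor]}")
    if not meets_minimum(tpm_firmware, MIN_TPM_FIRMWARE):
        gaps.append(f"TPM firmware {tpm_firmware} is below {MIN_TPM_FIRMWARE}")
    return gaps
```

Comparing integer tuples rather than strings avoids ranking 2.10.0 below 2.3.6. The sketch covers only the version checks; processor model, operating system, and license prerequisites still need to be verified separately.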

About this task

Secured-core feature includes enabling BIOS and OS security features. Both Dell Technologies and Microsoft recommend enabling BIOS security features and OS security features respectively to protect infrastructure from external threats. In Windows Admin Center, use Dell OpenManage Integration with Microsoft Windows Admin Center extension to enable BIOS security features and use Security extension to enable OS security features. For more information about OS security features, see the Microsoft guidelines.

Enable BIOS security features as follows:

Steps

1. Log in to Windows Admin Center and launch Dell OpenManage Integration with Microsoft Windows Admin Center extension.

2. Select View > Security .

3. From the drop-down menu, select Secured Core. Alternatively, go to the Action menu and, under Security, select Secured Core.

4. Specify Manage as credentials if prompted.

The Dell OpenManage Integration with Microsoft Windows Admin Center validates if the following prerequisites are fulfilled on the target or cluster nodes:

● The supported platform and processor types

● The supported BIOS version

● The supported OS version

● The OMIWAC Premium License is installed

For more information, see Prerequisites.

5. If one or more prerequisites are not fulfilled, Dell OpenManage Integration with Microsoft Windows Admin Center displays the list of prerequisites with their overall status and recommendations. Review the recommendations for the prerequisites whose status indicates an issue, and resolve them. To see the prerequisites to be fulfilled for each cluster node, turn on Show Node Level Details.

After resolving the prerequisites, go to Security > Secured-core again to display the overall status. If all the prerequisites are met, OMIMSWAC displays the overall secured-core status for both BIOS and OS. The overall BIOS/OS status is the summary of all BIOS/OS feature configuration statuses for the entire cluster.

6. If infrastructure lock is enabled, click Disable . You must disable the infrastructure lock before enabling the BIOS configurations.

7. Review the status of each BIOS feature and of the corresponding OS feature. A consolidated view of all BIOS and OS feature configuration statuses is displayed in the Cluster level BIOS Features and Status and Cluster level OS Features and Status sections.

The following table lists the BIOS features and the corresponding OS features with their security functions:

Table 8. BIOS, OS feature, and security functionality

● BIOS feature: Virtualization Technology

○ Security function: Helps BIOS to enable processor virtualization features (such as protecting against exploits in user-mode drivers and applications) and provide virtualization support to the operating system (OS) through the DMAR table.

○ Corresponding OS features: Hypervisor-Protected Code Integrity (HVCI) and Virtualization-Based Security (VBS)

○ Other information: n/a

● BIOS feature: Kernel DMA Protection

○ Security function: When enabled, both BIOS and OS protect devices from Direct Memory Access attacks in early boot by leveraging the Input/Output Memory Management Unit (IOMMU).

○ Corresponding OS features: Boot DMA Protection

○ Other information: n/a

● BIOS feature: Secure Boot

○ Security function: Secure Boot ensures that the device boots with trusted, Dell signed software.

○ Corresponding OS features: Secure Boot

○ Other information: n/a

● BIOS features: Trusted Platform Module (TPM) 2.0, TPM PPI Bypass Provision, TPM PPI Bypass Clear, and TPM2 Algorithm Selection

○ Security function: Trusted Platform Module (TPM) is a dedicated microprocessor that is designed to secure hardware by integrating cryptographic keys into devices. Software can use a TPM to authenticate hardware devices.

○ Corresponding OS features: Trusted Platform Module (TPM) 2.0 and System Guard

NOTE: To ensure proper functioning of the System Guard OS feature, ensure that the TPM Hierarchy under the System Security section is enabled in the BIOS settings.

○ Other information:

NOTE: If the TPM firmware version is lower than 7.2.2.0, the Enable BIOS Configuration button is disabled. You must replace the hardware with hardware that has TPM firmware version 7.2.2.0 or above.

NOTE: TPM2 Algorithm Selection is set to SHA256.

● BIOS feature: (AMD) Dynamic Root of Trust Measurement

○ Security function: Enables AMD Dynamic Root of Trust Measurement (DRTM). Also enables AMD secure encryption features such as Secure Memory Encryption (SME) and Transparent Secure Memory Encryption (TSME).

○ Corresponding OS features: n/a

○ Other information: This feature is available for AMD-based processors.

● BIOS feature: [Intel] Trusted Execution Technology

○ Security function: Enhances platform security by using Virtualization Technology, TPM Security, and TPM2 Algorithm (must be SHA256). Intel TXT provides security against hypervisor, BIOS, firmware, and other pre-launch software-based attacks by establishing a root of trust during the boot process.

○ Corresponding OS features: n/a

○ Other information: This feature is available for Intel-based processors.

8. To configure secured core for all BIOS attributes, click Enable BIOS Configuration .

9. To apply the BIOS configuration, perform one of the following actions:

● Apply and Reboot Now : Applies the BIOS configuration changes in all cluster nodes and reboots the cluster using the Cluster-Aware Updating method (without impacting the workload).

● Apply at Next Reboot : Saves the changes and applies the BIOS configuration in all cluster nodes at the next reboot. If you choose this option, ensure that you exit the Dell OpenManage Integration with Microsoft Windows Admin Center extension and restart the cluster using Windows Admin Center before performing any cluster management operations.


10. When finished, click Apply .

The operation enables CredSSP. To improve the security, disable CredSSP after the operation is complete.

11. Click View Details to see the status of the BIOS configuration changes at node level.

Enabling operating system features

Prerequisites

1. Intel chipset driver version 10.1.18793.8276 and above should be installed for AX-650 and AX-750xd.

2. AMD chipset driver version 2.18.30.202 and above should be installed for AX-7525.

Steps

1. OS Settings can be enabled by manually modifying the registry settings or by leveraging the WAC Security Extension.

● Using registry settings

a. Run the following commands on each server in a cluster:

reg add "HKLM\SYSTEM\CurrentControlSet\Control\DeviceGuard\Scenarios\HypervisorEnforcedCodeIntegrity" /v "Enabled" /t REG_DWORD /d 1 /f

reg add "HKLM\System\CurrentControlSet\Control\DeviceGuard\Scenarios\SystemGuard" /v "Enabled" /t REG_DWORD /d 1 /f

NOTE: After you run these commands, restart the servers one at a time. See Restarting a cluster node or taking a cluster node offline.

● Using Windows Admin Center Security extension:

a. Log in to Windows Admin Center and connect to the cluster.

b. In the Extensions, click Security .

c. In the Security page, click Secured-Core .

d. Select Hypervisor Enforced Code Integrity (HVCI) and System Guard , then click Enable .

NOTE: After you enable these settings, restart the servers one at a time. See Restarting a cluster node or taking a cluster node offline.


e. After the settings are enabled and each server is restarted, the Secured-Core section on each server must show all features with the On status.

2. The BIOS and OS settings can be verified using OMIMSWAC.

3. Select View > Security . From the drop-down menu, select Secured Core .
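When the registry method in step 1 has to be repeated on several servers, it can help to generate the two `reg add` commands programmatically instead of retyping them. The following is a minimal sketch; the function name is hypothetical, and the key path and values are copied from the commands in step 1.

```python
# Registry location and scenario names taken from the commands in step 1.
DEVICE_GUARD_SCENARIOS = r"HKLM\SYSTEM\CurrentControlSet\Control\DeviceGuard\Scenarios"
SCENARIOS = ("HypervisorEnforcedCodeIntegrity", "SystemGuard")


def reg_add_commands():
    """Return one 'reg add' argument list per OS security scenario."""
    return [
        ["reg", "add", DEVICE_GUARD_SCENARIOS + "\\" + name,
         "/v", "Enabled", "/t", "REG_DWORD", "/d", "1", "/f"]
        for name in SCENARIOS
    ]
```

Each argument list can be passed to `subprocess.run(cmd, check=True)` in an elevated session on a node. The sketch only builds the commands; the servers still have to be restarted one at a time afterward, as the NOTE in step 1 states.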

Results

Dell Technologies and Microsoft recommend enabling Secured Core for Azure Stack HCI 21H2, 22H2, and Windows Server 2022. Secured Core includes both the Dell infrastructure features and the Microsoft operating system features that protect the infrastructure from external threats.

Protect your infrastructure with infrastructure lock

Infrastructure lock (also known as iDRAC lockdown mode or system lockdown mode) helps to prevent unintended changes after a system is provisioned. Infrastructure lock is applicable to both hardware configuration and firmware updates. When the infrastructure is locked, any attempt to change the system configuration is blocked. If any attempts are made to change the critical system settings, an error message is displayed. Enabling infrastructure lock also blocks server or cluster firmware updates using the OpenManage Integration extension tool.

The following table lists the functional and nonfunctional features that are affected when the infrastructure lock is enabled:

Table 9. Functional and Nonfunctional Features Affected by Infrastructure Lock

Disabled

● Full stack update on clusters

● Individual server update and Cluster-Aware Updating on clusters

● Managing CPU cores on servers and clusters

● Integrated deploy and update clusters

● Preparing nodes for cluster expansion

● Secured core

Remains functional

● Retrieving health, inventory, and iDRAC details

● Blinking and unblinking server LEDs
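For automation scripts that need to decide whether an operation can proceed, the table above can be expressed as a small lookup. The operation identifiers below are hypothetical labels chosen for this sketch; only the grouping into disabled and functional features comes from Table 9.

```python
# Hypothetical identifiers for the features listed in Table 9.
DISABLED_WHEN_LOCKED = {
    "full_stack_update",
    "server_update_and_cau",
    "manage_cpu_cores",
    "integrated_deploy_and_update",
    "prepare_nodes_for_expansion",
    "secured_core",
}
FUNCTIONAL_WHEN_LOCKED = {
    "retrieve_health_inventory_idrac",
    "blink_server_led",
}


def is_operation_allowed(operation, lock_enabled):
    """Return True if the operation can run given the lock state."""
    if operation not in DISABLED_WHEN_LOCKED | FUNCTIONAL_WHEN_LOCKED:
        raise ValueError(f"unknown operation: {operation}")
    return not (lock_enabled and operation in DISABLED_WHEN_LOCKED)
```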


Enable or disable infrastructure lock

To enable or disable the Infrastructure lock in Dell OpenManage Integration extension for better security, select View > Security . From the drop-down menu, select Infrastructure Lock , and then click Enable or Disable .

The infrastructure lock can also be disabled during server or cluster management operations such as server update, CAU, full stack update, manage CPU cores, and so on. If the lock is already enabled, OMIMSWAC displays an error during the management operation and gives an option in the same window to disable the lock and proceed. To allow OMIMSWAC to automatically lock the infrastructure after the operation is complete, select Allow OMIMSWAC to re-enable the infrastructure lock after the operation is complete.

NOTE: OMIMSWAC does not allow enabling or disabling the infrastructure lock for individual servers that are part of a cluster to maintain cluster homogeneity.

Manage CPU cores in Azure Stack HCI clusters

Prerequisites

● Ensure that the cluster contains homogeneous nodes. For example, the nodes must all have CPUs from either Intel or AMD, and from the same processor family. Clusters whose nodes mix Intel and AMD CPUs, or CPUs from different processor families, are not supported.

● OMIMSWAC Premium License for Microsoft HCI Solutions must be installed on each cluster node.

About this task

To manage workload demands, power consumption, and licensing cost, you can change the number of CPU cores allocated to a cluster by using the "Update CPU core" feature. This feature also helps you to optimize CPU cores in clusters to keep the Total Cost of Ownership (TCO) at an optimal level.

Steps

1. In Windows Admin Center, connect to a cluster.

2. In Windows Admin Center, under Extensions , click Dell OpenManage Integration .

3. In Dell OpenManage Integration , select View > Configure . Another menu with drop-down appears. Select CPU Core .

4. To manage CPU cores, click Update CPU Core .

The Update CPU Core wizard is displayed on the right.

5. In the Update CPU Core wizard, select the number of cores to be used based on workloads.

Based on the CPU core manufacturer (Intel or AMD), you can configure the cores in the following table. To maintain cluster homogeneity, OMIMSWAC applies the same configuration across all nodes in the cluster.

NOTE: Changing the number of cores impacts the overall core count of the cluster. Ensure that you are using the right number of cores to maintain the balance between power and performance.


Table 10. CPU

● CPU type: Intel CPU

○ Instructions: Select the number of cores you want to enable per CPU. Minimum number of cores that can be enabled is four. You can enable all the cores that you want to manage workloads.

● CPU type: AMD CPU

○ Instructions: N/A
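The Intel rule above (at least four cores per CPU, up to all available cores) can be sketched as a simple range check before a value is entered in the wizard. The function name and the `cores_per_cpu` parameter are assumptions for illustration; the wizard may restrict the selectable values further.

```python
MIN_INTEL_CORES = 4  # minimum number of cores that can be enabled, per the table


def validate_intel_core_count(requested, cores_per_cpu):
    """Return the requested per-CPU core count if it is within range.

    cores_per_cpu is the number of physical cores on each CPU (assumed input).
    """
    if requested < MIN_INTEL_CORES:
        raise ValueError(f"at least {MIN_INTEL_CORES} cores must be enabled")
    if requested > cores_per_cpu:
        raise ValueError(f"only {cores_per_cpu} cores are available per CPU")
    return requested
```

Because OMIMSWAC applies the same configuration to every node, a single validated value is enough for the whole cluster.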

6. Select one of the following options to apply the changes and reboot nodes:

● Apply and Reboot Now : Select this option if you want to apply the changes and reboot the cluster nodes now. Dell Technologies recommends using this option because the nodes are rebooted automatically and the cluster-aware feature takes care of the workloads.

● Apply at Next Reboot : Select this option if you want to apply the changes now and reboot the cluster nodes later. Ensure that you reboot the cluster nodes later to successfully apply the CPU core changes. Also, ensure that you take care of the workload before rebooting the nodes.

NOTE: The Apply and Reboot now process requires the CredSSP to be enabled. To improve security, disable CredSSP after the CPU configuration changes are complete.

7. To apply the changes, click Confirm .

Cluster expansion

Prerequisites

● The new node must have the same operating system installed as the current cluster.

● The host network configuration of the new node must be identical to that of the existing cluster nodes.

● New node must be added in the Windows Admin Center.

● OMIMSWAC Premium License for Microsoft HCI Solutions must be installed on each cluster node.

By using OMIMSWAC, you can prepare nodes to add to your existing Azure Stack HCI cluster to improve capacity. It is important for administrators to keep the cluster symmetrical and adhere to Dell recommendations. To automate the cluster expansion process and help customers comply with Dell recommendations, OMIMSWAC provides a feature called Expand Cluster. With the Expand Cluster feature, administrators can prepare nodes and ensure that they are compatible and follow Dell recommendations before the nodes are added to the existing cluster.

The cluster expansion process involves the following:

● High-Level Compatibility Check: Helps to identify compatible nodes to add to the cluster.

● License Availability Check: Checks for OMIMSWAC premium licenses available on new nodes and cluster nodes.

● HCI Configuration Profile Check: Helps you to validate new node and cluster nodes HCI configurations based on Dell Technologies recommendations.

● Update Compliance: Helps you to generate compliance report for both new nodes and cluster nodes and then fix the compliance only for new nodes.

About this task

To prepare nodes for cluster expansion, perform the following steps:

Steps

1. Connect to the cluster using Windows Admin Center and launch OpenManage Integration extension.

2. Select View > Expand Cluster . Another menu with drop-down appears. Select ADD nodes .

3. In the Expand Cluster window, click Select Nodes .

4. In the Cluster Expansion window, under Select compatible nodes , a list of nodes is displayed. The list fetches all nodes available on the Server Manager page in the Windows Admin Center.

a. Select any nodes that you want to add to the cluster. You can also search any node using the search box or click the select all check box to select all nodes. Ensure that new nodes are not part of the cluster.

NOTE: Total number of nodes that are supported in a cluster is 16. For example, for a cluster with existing 4 nodes, you can select up to 12 nodes for cluster expansion.


b. After nodes are selected, click Check for High Level Compatibility to validate the new nodes and cluster nodes as per Dell recommendations. The validation happens on a high level as follows:

● Both new nodes and cluster nodes must be from Dell Technologies.

NOTE: Only AX nodes from Dell Technologies are supported for HCI cluster expansion. Storage Spaces Direct Ready Nodes are not supported for HCI cluster expansion.

● New nodes and cluster nodes must be of the same model for symmetrical cluster.

● Operating system that is installed on new nodes must be supported and same as cluster nodes.

If the high-level compatibility shows:

○ Non-compliant: None of the selected nodes are compliant per Dell recommendations.

○ Partially Compliant: Some of the selected nodes are compliant as per Dell recommendations and you can proceed for License Availability check only for the compliant nodes.

○ Compliant: All the selected nodes are compliant as per Dell recommendations and you can proceed for License Availability check for all the compliant nodes.

If the high-level compatibility shows Non-compliant or Partially Compliant, click View Details to learn more about the nodes and type of noncompliance.

c. Click Check for License Availability to verify whether new nodes and cluster nodes have 'OMIMSWAC Premium License for MSFT HCI Solutions' installed. Before moving for HCI Configuration Profile check, ensure that new nodes and cluster nodes have OMIMSWAC premium license installed.

d. Click Check for HCI Configuration Profile to validate new nodes and cluster nodes against symmetrical recommendations from Dell. If an Internet connection is not available, run the HCI configuration profile check in offline mode.

If any of the nodes are not compatible, click View Details to see more information about the nodes, the reason for noncompliance, and recommendations.

NOTE: The HCI configuration profile check fails if any of the required configurations fail with a Critical error. Review the recommendations and details, resolve the issues so that the nodes pass the HCI configuration profile check, and then go to the next step.

If the configuration fails with a Warning, this means that the configuration can be supported for cluster deployment, but could result in suboptimal cluster performance. It should be reviewed. Before you go to the next step, ensure HCI configurations of all nodes are compliant as per Dell recommendations.
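Two rules from step 4 lend themselves to a quick pre-check: the 16-node limit and the three high-level compatibility outcomes. The sketch below is illustrative only; the function names and the per-node result dictionary are assumptions, and the actual compatibility evaluation is performed by OMIMSWAC.

```python
MAX_CLUSTER_NODES = 16  # total number of nodes supported in a cluster


def max_new_nodes(existing_nodes):
    """Return how many nodes can still be selected for expansion."""
    if not 0 <= existing_nodes <= MAX_CLUSTER_NODES:
        raise ValueError("existing node count must be between 0 and 16")
    return MAX_CLUSTER_NODES - existing_nodes


def high_level_status(node_results):
    """Summarize per-node results, given as {node_name: is_compliant}."""
    if not node_results:
        raise ValueError("no nodes selected")
    compliant = sum(1 for ok in node_results.values() if ok)
    if compliant == 0:
        return "Non-compliant"
    if compliant == len(node_results):
        return "Compliant"
    return "Partially Compliant"
```

For the example in the NOTE, `max_new_nodes(4)` evaluates to 12. With a Partially Compliant result, only the nodes marked compliant would proceed to the License Availability check.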

5. After you successfully complete the high-level compatibility check, license check, and HCI configuration profile check, click Next: Update compliance to check for firmware, BIOS, and drivers compliance for new nodes and cluster nodes. Using Expand Cluster flow, you can update firmware, BIOS, and drivers for new nodes only. To generate compliance report for both new nodes and cluster nodes:

a. Select one of the methods to download catalog files.

● Online catalog to download the catalog automatically from dell.com for PowerEdge servers. Online catalog is selected by default.

● Offline catalog to use the DRM catalog configured in a CIFS location. OMIMSWAC with or without Internet access allows you to select the Offline - Dell EMC Repository Manager Catalog to generate compliance report. You may use this option when the Internet is not available or to use a customized DRM catalog. When Internet is not available, before using offline catalog, ensure that the DSU and IC settings are configured on the Settings page.

● When finished, click Check Compliance .

6. The Compliance Results section shows compliance reports of cluster nodes and new nodes. Click View Details to see the compliance report or Export to export the report in CSV format.

● If cluster nodes are noncompliant, ensure that the cluster nodes are compliant before adding new nodes in the cluster. To update cluster nodes, exit the wizard and go to the Update tab for cluster update using cluster-aware updating method.

● If new nodes are noncompliant, click View Details in the Cluster Expansion Summary to verify the noncompliant components and then click Finish to update the new nodes and keep them ready for cluster expansion. Click Updating-

View Details to see the update status.

● If new nodes are compliant, click View Details in the Cluster Expansion Summary to see the list of nodes that is prepared for cluster expansion. Then click Exit .

Results

After both new nodes and cluster nodes are updated, go to the Windows Admin Center workflow to add new nodes to the existing cluster.


Validate and remediate Azure Stack HCI clusters

By using the HCP Compliance feature, you can see the cluster compliance using HCI Configuration Profile (HCP) checks, see the recommendations, and remediate non-compliant components.

Prerequisites

● Cluster must be Dell Integrated System for Microsoft Azure Stack HCI, versions 21H2 or 22H2.

● Cluster nodes must have a valid OMIWAC premium license installed.

● Cluster node models are listed as supported in the support matrix.

Steps

To run the HCP Compliance check in the OpenManage Integration extension, perform the following steps:

1. Select View > Compliance . Another menu with drop-down appears. Select HCP .

The Compliance Summary page appears. If the Check Compliance button is enabled, go to step 3.

2. If the Check Compliance button is disabled, ensure:

● Requirements mentioned in the Prerequisites are met.

● Required inputs are provided in the Configure Cluster Settings tab. If not, choose the network topology and the deployment model, and click Save to save the settings for future use. These settings are required to generate compliance for Dell OS Configuration policies.

○ Network topology

■ Fully-converged : All storage ports from the server are connected to the same network fabric. Within the host operating system, the NIC ports are used for both storage and management/VM traffic.

■ Non-converged-Physical : Storage traffic is on the physical storage network adapter ports and management/VM traffic through a SET created using network ports of the server rNDC.

■ Non-Converged-Set : Storage traffic uses virtual adapters in the host operating system connected to a SET.

○ Deployment model

■ Scalable model supports from 2 to 16 nodes in a cluster and uses top-of-rack switches for management and storage traffic networking.

■ Switchless model uses full mesh connections between the cluster nodes for storage traffic and supports from 2 to 4 nodes in a cluster.

■ Stretch cluster with Azure Stack HCI consists of servers residing at two different locations or sites, with each site having two or more servers, replicating volumes either in synchronous or asynchronous mode.
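The node-count limits stated for the three deployment models can be sketched as a small validation helper. The model identifiers and the `nodes_per_site` list are assumptions made for this illustration; the sketch encodes only the limits stated in the list above and no others.

```python
def validate_node_count(model, nodes_per_site):
    """Check node counts against the limits stated for each deployment model.

    nodes_per_site is a list with one entry per site, for example [4] or [2, 2].
    """
    total = sum(nodes_per_site)
    if model == "scalable":
        return len(nodes_per_site) == 1 and 2 <= total <= 16
    if model == "switchless":
        return len(nodes_per_site) == 1 and 2 <= total <= 4
    if model == "stretch":
        # Two sites, each with two or more servers.
        return len(nodes_per_site) == 2 and all(n >= 2 for n in nodes_per_site)
    raise ValueError(f"unknown deployment model: {model}")
```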

3. Click Check Compliance to view the HCP compliance summary.

● Optional: If an Internet connection is not available, perform the following steps to run the HCP check in offline mode:

a. Download the HCPMetaData.json and HCPMetaData.json.sign files from https://downloads.dell.com/omimswac/ase/ .

b. Place these files in C:\Users\Dell\HCPMetaData folder in the gateway system where Windows Admin Center is installed.

c. Run the HCP compliance check.

OpenManage Integration extension fetches the applicable policies, validates the cluster node(s) attributes with the policy attributes, and displays the compliance summary.
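Before running the offline check, it can be useful to confirm that both metadata files are in place on the gateway system. The following is a minimal sketch; the function name is hypothetical, and the folder is passed in as a parameter rather than fixed.

```python
from pathlib import Path

# File names taken from the offline procedure above.
REQUIRED_HCP_FILES = ("HCPMetaData.json", "HCPMetaData.json.sign")


def missing_hcp_files(folder):
    """Return the required metadata files that are not present in folder."""
    folder = Path(folder)
    return [name for name in REQUIRED_HCP_FILES if not (folder / name).is_file()]
```

On the gateway, the folder named in the procedure would be passed as a raw string, for example `missing_hcp_files(r"C:\Users\Dell\HCPMetaData")`. An empty list means that both files were found.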

4. Click Fix Compliance to fix non-compliant policies.

NOTE: Fix Compliance button is disabled if Dell Hardware Symmetry Policy is non-compliant with a Critical error.

In this case, you will not be allowed to remediate other policies. Review the recommendations and show details, and contact Dell.com/support to resolve the issue before proceeding to the next step. The critical error states that this aspect of nodes configuration is not supported. You must correct the issue before you can deploy a symmetric HCI cluster.

● Manual Fixes include policies that require physical intervention and recommendations that can be remediated manually.

NOTE: Ensure your cluster is compliant with Dell Hardware Symmetry Policy and Dell OS Configuration Policies otherwise the cluster performance is not guaranteed.

● Automatic Fixes include policies that can be remediated using OpenManage Integration extension.

5. Choose the reboot option to apply the configurations by restarting your cluster in cluster-aware updating manner if any of the policy fixing requires cluster reboot:


● Apply and Reboot Now : Applies BIOS, NIC, and iDRAC attributes concurrently to all the cluster nodes now and restarts the cluster in cluster-aware update manner.

● Apply at Next Reboot : Applies BIOS, NIC, and iDRAC attributes concurrently to all the cluster nodes when cluster nodes are restarted at the next maintenance window.

a. Click the Schedule Reboot checkbox to schedule the date and time when BIOS, NIC, and iDRAC attributes will be applied concurrently to all the cluster nodes and restart the cluster in cluster-aware update manner.

NOTE: If you choose to apply the changes at the next reboot, ensure that you disable the infrastructure lock before the next reboot starts so that the changes are applied successfully.

6. If the infrastructure lock is enabled, a window appears with an option to disable the lock and proceed. To allow the OpenManage Integration extension to automatically lock the infrastructure after the operation is complete, select Allow OMIMSWAC to re-enable the infrastructure lock after the operation is complete.

7. When finished, click Apply to trigger the update.

● If Kernel Soft Reboot is enabled for the cluster, the OpenManage Integration extension ignores this setting and performs a full reboot to apply all the BIOS related settings. See When to use Kernel Soft Reboot .

● The operation enables the CredSSP. To improve the security, disable the CredSSP after the operation is complete.

If any error occurs while applying the updates or update job fails, see the Troubleshooting section to troubleshoot issues.

8. Click View Details to see the update status and progress at node level.

After the update succeeds, the OpenManage Integration extension rechecks the compliance and displays the updated compliance summary.

9. Click Export compliance summary to export the HCP compliance report in an Excel file which can be useful while contacting the Dell Technologies support team.

View HCP compliance summary

OpenManage Integration computes the compliance using different types of Dell HCI Configuration Profile policies that are mentioned and displays the report.

● Dell Infrastructure LockDown policy : Checks whether the HCI cluster infrastructure is locked.

● Dell Hardware Symmetry policy : Checks whether cluster nodes have validated and supported hardware components and have symmetrical hardware configurations. For more information about checks, see OpenManage Integration with Microsoft Windows Admin Center .

● Dell HCI Hardware Configuration policy : Checks whether cluster nodes have Dell Technologies recommended BIOS, NIC, and iDRAC configurations.

● Dell OS Configuration policy : Checks whether cluster nodes have Dell Technologies and Microsoft recommended operating system configurations.

If the compliance summary is already generated, the last report is displayed with the timestamp. In this case, the Re-Check Compliance button is enabled. You can click Re-Check Compliance to see the latest compliance report.

The compliance summary is divided into three sections:

● Overall Compliance shows the overall compliance of the cluster such as compliance percentage, number of policies compliant, and total number of policies from the displayed policy types.

● Overall Compliance State shows overall compliance of the cluster using a doughnut chart and color codes. The color codes indicate different compliance types. You can select different color codes, if any, to filter the respective policy details and its cluster nodes. For example, when you select the red color code, policies that are not compliant and their respective cluster nodes are displayed. The color codes are explained below.

○ Compliant : Shows all cluster nodes are compliant with all Dell policies.

○ Errors : Shows if any cluster node is not compliant with Dell Hardware Symmetry policy.

○ Warnings : Shows if any cluster node is not compliant with any optional policies.

○ Unknown : Shows if any cluster node inventory could not be retrieved when the node is down or not reachable.

● Policy Summary shows compliance of each cluster node and its components for each policy type.

○ Policy Name: Name of Dell policies.

○ Description: Description of Dell policies.

○ Overall Status: Overall status of compliance of the cluster for each policy type.

○ Overall State: How many policies are compliant out of the total number of policies and its percentage.


○ Errors: Number of policies that are non-compliant and must be fixed.

○ Warnings: Number of policies that are non-compliant and must be fixed.

○ Unknown: Number of policies whose compliance is unknown because the check could not run or the data could not be fetched.

○ Details: Click Details to see more information about each node and its components and their compliance status.

Onboard Dell policies to Azure Arc from Windows Admin Center to manage Azure Stack HCI clusters

This topic explains how to deploy Dell HCI Configuration Profile (HCP) policies in Azure Arc from Windows Admin Center to monitor the compliance of Azure Stack HCI clusters.

Prerequisites

● Cluster must be Dell Integrated System for Microsoft Azure Stack HCI, versions 21H2 or 22H2.

● OMIWAC premium license must be installed on each cluster node.

● The target Windows Servers that you want to manage must have Internet connectivity to access Azure.

● Deploying Azure policies requires that you have administrator rights on the target Windows machine or server to install and configure the agent. Also, you must be a member of the Gateway users role.

● The Azure Stack HCI cluster and the WAC gateway must be registered with Azure, and you must sign in to both with the same account, which must have edit rights on the resource group. For more information about Azure registration, see Register Windows Admin Center with Azure .

● In all cluster nodes, ensure that all RDMA enabled network adapters are connected to the same NIC ports as recommended by Dell. For more information, see HCI Operations Guide: Managing and Monitoring the Solution Infrastructure Life Cycle in https://infohub.delltechnologies.com/t/guides-74/ .

About this task

Azure Arc is one of the primary tools for managing resources across cloud and hybrid platforms. Dell Technologies recommends using Dell HCP policies in Azure Arc to maintain compliance with Dell Technologies recommended configurations throughout the life cycle of the Azure Stack HCI cluster or host.

By using OpenManage Integration in Microsoft Windows Admin Center, you can deploy Dell HCP policies to Azure Arc and then use these policies in Azure Arc to monitor your cluster. In addition, if multiple clusters are onboarded under the same Azure subscription, you can use the same policy to manage and monitor all of them in Azure Arc.

Steps

Perform the following steps to deploy Dell HCP policies to Azure Arc.

1. Sign in to Windows Admin Center.

2. From the All connections page, select a cluster to connect to it.

3. From the left-hand pane, under Extensions , click Dell OpenManage Integration > Select View > Azure Integration .

4. Click Sign In . A sign-in window opens to let you sign into the Azure account. Specify your Azure subscription account details.

When finished, the button shows as Signed-in .

5. In the Onboarding Checklist section, OpenManage Integration extension verifies and displays the list of prerequisites and shows recommendations if prerequisites are not met.

● Click Show Details to see the list of prerequisites, status, and recommendations if prerequisites do not match. You cannot move to the next step until all the prerequisites match.

● Click Refresh Checklist to see the prerequisites with updated status.

6. In the Onboard Policies section:

● Click View Subscription Details to see the Azure account subscription details.

● To edit network topology and deployment model settings, click Edit Cluster Settings , specify the details, and click Save to save the settings for future use.

● If policies are not onboarded to Azure Arc, click Onboard Policies to see the details of the Dell HCP policies that are applicable for onboarding, and go to the next step.

● If policies are already onboarded to Azure Arc, click View Details to see the details about already onboarded policies. To export the onboarded policies in a CSV format, click Export Details .


7. The Onboard Dell HCI Configuration Profile policies for Azure window appears on the right side showing the available policies for onboarding or updating. It also shows the version of the previously onboarded policy, if any. All the mandatory policies are selected by default and cannot be cleared. You can clear the optional policies, if any. To onboard or update the onboarded policies, click Onboard .

8. After the policies are onboarded, the policy onboarding status appears next to the Onboard Policies button. Click View Details to see the Policy Onboarding Status page that displays the status of each policy that was onboarded to Azure Arc.

● To export the onboarded policies in a CSV format, click Export Details .

9. After the policies are successfully onboarded, to monitor your cluster using Dell HCP policies, log in to the Azure portal (Azure Arc) using the same subscription account. Click Resource Groups to find the resources you want to manage (go to step 6 to find the resource group in the subscription details).

Cluster node details are displayed.

10. To check the cluster compliance, click Policies under Settings .

The Policy page displays the cluster compliance and lists all the applicable policies including the default policies.

NOTE: Ignore the policy details of the Dell Exempted Policy as it only helps in tracking policy versions of onboarded Dell HCP policies and it does not impact cluster performance or compliance.

11. In the Search box, enter Dell to find the policies that are onboarded from the OpenManage Integration extension. Select a policy to have a deeper look at the compliance for that particular policy. For more information about viewing compliance data in Azure portal, see Microsoft Azure Policy documentation .

View recommendations for storage expansion

This topic explains how to view recommendations, using OpenManage Integration extension, that enable you to prepare nodes for expanding Azure Stack HCI cluster storage.

Prerequisites

● Cluster must be Dell Integrated System for Microsoft Azure Stack HCI, versions 21H2 or 22H2.

● Cluster nodes must have valid OMIWAC premium license installed.

● Cluster node models are supported in the support matrix.

About this task

In an HCI cluster, expanding storage by adding drives on the available slots on the cluster nodes adds storage capacity to the cluster and improves storage performance. By using the Expand Storage feature, you can view recommendations based on HCI Configuration Profile (HCP) policies that enable you to prepare nodes for expanding HCI cluster storage.

Steps

1. In Windows Admin Center, under Extensions , click Dell OpenManage Integration > Select View > Expand Cluster .

From the drop-down menu, select ADD Storage .

a. If the cluster meets the Prerequisites, OpenManage Integration extension validates the storage configuration of cluster nodes against HCP policies (Dell Hardware Symmetry Policy rules) and displays the cluster level status and drive details for each cluster node. See View node level storage configuration details .

The Expand Storage button is enabled. Click Refresh Storage Inventory to see the latest storage configuration details of cluster nodes.

b. If the Expand Storage button is disabled:

● The cluster may not fulfill all the prerequisites. See the banner message and ensure that all the prerequisites mentioned are met and then refresh the cluster inventory and start from step 1.

● The cluster may be non-compliant if its storage configuration does not comply with all the critical Dell Hardware Symmetry Policy rules. Make the cluster compliant first by going to the HCP Compliance tab and following the recommendations and then try again. To learn more about storage and disks specific rules, see OpenManage Integration with Microsoft Windows Admin Center .

2. Click Expand Storage , if enabled. Storage Expansion Summary appears on the right.

3. In the Storage Expansion Summary wizard, view the summary about the cluster storage configuration:

● Cluster name

● Node counts


● Available slot count where storage expansion is possible in that cluster. The available slot count shows a warning if any of the cluster nodes has SAS/SATA drives, because the exact number of empty slots for drives with media types such as HDD or SSD may not be derivable.

NOTE: Available empty slots are calculated from the total available slots and the total connected drives, and may not match the number of empty slots for HDD and SSD drives, because these drives are embedded in the server chassis.

● Used storage

● Available storage

4. If the cluster is non-compliant with a warning, follow the recommendations to make the cluster storage symmetric. The Recommendation section includes the following details:

● The Dell HCP policy rule that does not comply with your cluster.

● The drive details that need to be added or replaced to make your cluster compliant.

After the cluster is compliant (cluster storage is symmetric), go to step 1 and follow the instructions to see storage expansion recommendations.

5. If the cluster is compliant, choose one of the storage expansion options, specify the storage size, and then click Refresh Recommendation to see the storage expansion recommendations.

● Increase overall capacity: Overall capacity indicates the sum of all the disk storage capacity in a cluster. Specify the storage size as available to increase the overall capacity of the cluster.

● Increase usable capacity: Usable capacity indicates the actual storage size that is used by the cluster after reserving some storage for fault tolerance (resiliency). Specify the overall usable capacity as available for the entire cluster.

OpenManage Integration displays overall capacity based on resiliency type. See the Microsoft document for more information about fault tolerance and storage efficiency on HCI clusters.
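As a rough illustration of the difference between overall and usable capacity, the following sketch estimates usable capacity for mirror resiliency. The function name Get-UsableCapacityTB and the sample figures are hypothetical. The calculation assumes only the nominal number of data copies (two for a two-way mirror, three for a three-way mirror) and ignores reserve capacity and cache drives, so the values that OpenManage Integration reports may differ.

```powershell
# Hypothetical helper: estimate usable capacity from overall (raw) capacity.
function Get-UsableCapacityTB {
    param(
        [Parameter(Mandatory)] [double] $OverallCapacityTB,
        [Parameter(Mandatory)] [ValidateSet('TwoWayMirror', 'ThreeWayMirror')] [string] $Resiliency
    )
    # Number of data copies that each mirror type keeps.
    $copies = @{ TwoWayMirror = 2; ThreeWayMirror = 3 }
    [math]::Round($OverallCapacityTB / $copies[$Resiliency], 2)
}

# Example: 4 nodes x 10 drives x 3.84 TB = 153.6 TB overall capacity.
Get-UsableCapacityTB -OverallCapacityTB 153.6 -Resiliency ThreeWayMirror
```

With a three-way mirror, roughly one third of the overall capacity is usable, which is why a request to increase usable capacity generally translates into a larger drive recommendation than the same request for overall capacity.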

The Recommendation section includes the below details:

● STORAGE: Displays node name. Also, displays number of drives that are required with recommended media type, and recommended storage capacity of each drive.

● Supported models

NOTE: AGN MU in the model description indicates that the model is vendor agnostic and mixed use.

● Manufacturer

● Endurance

● Actual capacity

● Bus protocol

● Cache to capacity ratio (only available for hybrid nodes)

NOTE:

● Recommendations are derived based on possible empty slots present in the nodes. For non-uniform media types across the cluster, even after following the recommendation you may see some warnings.

● To order drives, Dell Technologies recommends using the bus protocol, model, size mentioned in the storage column, and endurance information to identify compatible drives for storage expansion.

Next steps

Insert the recommended drives and then check the updated storage configuration by running HCP compliance. Contact Dell support for any issues.

View node level storage configuration details

Cluster level status includes the types of cluster storage configuration (whether single media type or hybrid media type), total number of drives in all cluster nodes, and the compliant status below:

● : Cluster is compliant for storage expansion as it meets all the prerequisites. Expand Storage button is enabled.

● : Cluster is non-compliant as its storage configuration does not comply with the Dell Hardware Symmetry Policy rules. You can still continue to view recommendations for storage expansion. However, Dell Technologies recommends that you make the cluster compliant by following the recommendations before continuing with storage expansion.

● : Cluster is non-compliant as its storage configuration does not comply with some of the critical Dell Hardware Symmetry Policy rules. The Expand Storage button is disabled and you cannot continue with storage expansion. Dell Technologies recommends that you make the cluster compliant first by following the recommendations and then continue with storage expansion.

The following drive details for each cluster node are also displayed.

● Node status:

○ Green: Compliant, when there is no critical error or warning.

○ Yellow: Non-compliant but not critical, there is some warning.

○ Red: Non-compliant (Critical), there is critical error.

● Serial number

● Slot number (if applicable)

● Media type

● Bus protocol

● Model: AGN MU in the model description, if present, indicates that the drive is vendor agnostic and of mixed use.

● Manufacturer

● Capacity

● Endurance

● Used for cache or capacity type

● Cache/capacity ratio (available only for hybrid media types)

Known issues

The following table lists known issues and workarounds related to OpenManage Integration with Microsoft Windows Admin Center with Microsoft HCI Solutions from Dell Technologies clusters.

NOTE: For details about troubleshooting steps and known issues, see the Dell EMC OpenManage Integration Version 2.3 with Microsoft Windows Admin Center User’s Guide at https://www.dell.com/support/home/product-support/product/openmanage-integration-microsoft-windows-admin-center/docs .

Table 11. Known issues

Issue: Running Test-Cluster fails with network communication errors. With USB NIC enabled in iDRAC, if you run the Test-Cluster command to verify the cluster creation readiness or cluster health, the validation report includes an error indicating that the IPv4 addresses assigned to the host operating system USB NIC cannot be used to communicate with the other cluster networks.

Resolution or workaround: This error can be safely ignored. To avoid the error, temporarily disable the USB NIC (labeled as Ethernet, by default) before running the Test-Cluster command.

Issue: The USB NIC network appears as a partitioned cluster network. When the USB NIC is enabled in iDRAC, cluster networks in the failover cluster manager show the networks associated with the USB NIC as partitioned. This issue occurs because cluster communication is enabled by default on all network adapters, and USB NIC IPv4 addresses cannot be used to communicate externally, which breaks cluster communication on those NICs.

Resolution or workaround: Remove the USB NIC from any cluster communication by using the following script:

$rndisAdapter = Get-NetAdapter -InterfaceDescription 'Remote NDIS Compatible Device' -ErrorAction SilentlyContinue

if ($rndisAdapter)
{
    Write-Host 'Remote NDIS found on the system. Cluster communication will be disabled on this adapter.'

    # Get the network adapter and associated cluster network
    $adapterId = [Regex]::Matches($rndisAdapter.InstanceID, '(?<={)(.*?)(?=})').Value
    $usbNICInterface = (Get-ClusterNetworkInterface).Where({$_.AdapterId -eq $adapterId})
    $usbNICClusterNetwork = $usbNICInterface.Network

    # Disable cluster communication on the identified cluster network
    (Get-ClusterNetwork -Name $usbNICClusterNetwork.ToString()).Role = 0
}

Issue: While triggering full stack updates, the Tests Summary page might appear.

Resolution or workaround: Verify whether pre-update or post-update scripts are part of the cluster role. If they are present, remove the scripts from the cluster node by running the following command in PowerShell:

Set-CauClusterRole -PreUpdateScript $null -PostUpdateScript $null

Issue: The update status takes a long time to refresh.

Resolution or workaround: During full stack cluster updates, the update status shown in the Updates page might take a long time to refresh. If this issue occurs, it is recommended that you stay on the Updates page and wait for the update to complete. The update status is displayed automatically once the update is complete.

Issue: When using CredSSP authentication to run scripts on a remote machine, the update job might fail with an error. This failure occurs because CredSSP has been disabled in the gateway machine. For more information about the prerequisites required for a cluster update, see Update Azure Stack HCI clusters .

Resolution or workaround: To resolve the issue, follow these steps:

1. From the PowerShell window, run gpedit .

2. In the Group Policy Editor window, browse to Computer Configuration > Administrative Templates > System > Credentials Delegation .

3. Select Allow delegating fresh credentials with NTLM-only server authentication and enable it.

4. Run gpupdate /force in PowerShell.


11

Updates and maintenance

Topics:

Annual feature update for an Azure Stack HCI Solution

Firmware and driver updates using the manual method

Restarting a cluster node or taking a cluster node offline

Expanding the Azure Stack HCI cluster

Extending volumes

Performing AX node recovery

Operating system recovery

Annual feature update for an Azure Stack HCI Solution

Azure Stack HCI is a hyperconverged infrastructure (HCI) operating system that is delivered as an Azure service. It provides the latest security, performance, and feature updates at regular intervals, in the form of monthly quality updates and annual feature updates. The annual feature update is a major update and requires adequate resources and planning. Dell Engineering has worked with Microsoft to compile the prerequisites that must be considered and met before you update your Azure Stack HCI Solution, and the supported methods of updating your solution. We recommend that you go through this section before you decide which method to choose for updating your infrastructure.

Prerequisites

Following are the prerequisites to consider before you run annual updates on your solution.

1. Perform this activity during off-peak hours where possible, because the upgrade process might require multiple CAU runs, and heavy I/O on the subsystem lengthens repair times.

2. Ensure that the servers in use are listed in the compatibility matrix as supported with the version of Azure Stack HCI that you intend to upgrade to.

3. Ensure that all nodes in the cluster are updated to the latest Windows operating system patches and hotfixes (quality updates) and hardware updates as per the latest support matrix or catalog. Drivers and firmware that are specific to the newer version of the Azure Stack HCI OS must still be updated after the upgrade.

4. All nodes in the cluster should be available and running.

5. The storage pool and all virtual disks should be healthy.

6. Physical disks should not be in storage maintenance mode.

7. If the cluster is a stretch cluster, see the section Stretch cluster scenarios for further details.

8. Any feature upgrade requires the CAU process to download a large amount of data, usually more than 2 to 3 GB.

NOTE: At the time of writing this document, a CAU upgrade downloads all the updates once for each node.
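Prerequisites 4 through 6 can be checked from PowerShell before you start. The following is a minimal sketch, not a Dell-validated script; it assumes that it is run in an elevated session on a cluster node where the FailoverClusters and Storage modules are available, and the message texts are illustrative only.

```powershell
# Sketch: report anything that violates prerequisites 4 through 6.
$issues = @()

# Prerequisite 4: every node is available and running.
$issues += @(Get-ClusterNode | Where-Object State -ne 'Up' |
    ForEach-Object { "Node $($_.Name) is $($_.State)" })

# Prerequisite 5: the storage pool and all virtual disks are healthy.
$issues += @(Get-StoragePool -IsPrimordial $false | Where-Object HealthStatus -ne 'Healthy' |
    ForEach-Object { "Storage pool $($_.FriendlyName) is $($_.HealthStatus)" })
$issues += @(Get-VirtualDisk | Where-Object HealthStatus -ne 'Healthy' |
    ForEach-Object { "Virtual disk $($_.FriendlyName) is $($_.HealthStatus)" })

# Prerequisite 6: no physical disk is in storage maintenance mode.
$issues += @(Get-PhysicalDisk | Where-Object OperationalStatus -eq 'In Maintenance Mode' |
    ForEach-Object { "Physical disk $($_.SerialNumber) is in maintenance mode" })

if ($issues) { $issues } else { 'No blocking issues found.' }
```

An empty result covers only these three prerequisites; the compatibility matrix, patch level, and download capacity still have to be verified separately.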


Recommended methods for feature update

Dell Engineering has validated and tested the following update paths, and it is highly recommended that you follow the process that is given in these sections.

Connected scenarios

Windows Admin Center based upgrades

All Full Stack updates that are initiated from Windows Admin Center involve a CAU process, running both OS quality and feature updates, and also updating the subsystem components like firmware and drivers. To learn how to run a Full Stack update using Windows Admin Center and to take advantage of the Dell OpenManage Integration with Microsoft Windows Admin Center extension, see Full Stack Cluster-Aware Updating for Azure Stack HCI clusters using the OpenManage Integration snap-in .

Once prerequisites to perform a feature update are met, the process of running a WAC full stack update is no different than running any regular end-to-end update. Go to the Updates section and select Feature Updates , to update the cluster to a higher version of Azure Stack HCI.

PowerShell based upgrades

About this task

An update to Azure Stack HCI OS can also be done by using the Invoke-CauRun PowerShell cmdlet.

After all prerequisites are met, perform the following steps:

Steps

1. Run the following commands on all nodes to enable the firewall rule and allow automatic restarts:

Set-WSManQuickConfig

Enable-PSRemoting

Set-NetFirewallRule -Group "@firewallapi.dll,-36751" -Profile Domain -Enabled true

2. Run the following command to add a CAU cluster role in the cluster:

Add-CauClusterRole -CauPluginName Microsoft.WindowsUpdatePlugin, Microsoft.HotfixPlugin, Microsoft.RollingUpgradePlugin

3. Run the following command to test the CAU setup:

Test-CauSetup -ClusterName <ClusterName>

4. You might see an error message The machine proxy on each failover cluster node should be set to a local proxy server . This is normal for nodes that are connected to the internet.

5. Run the following command:

NOTE: The Invoke-CauScan cmdlet performs the scanning of cluster nodes for applicable updates and gets a list of the initial set of updates that are applied to each node in a specific cluster.

Invoke-CauScan -ClusterName <clustername> -CauPluginName "Microsoft.RollingUpgradePlugin" -CauPluginArguments @{'WuConnected'='true';} -Verbose | fl *


6. The Invoke-CauRun cmdlet can be run on any of the nodes of the cluster or on a remote server running Windows Server 2022 or Azure Stack HCI.

Invoke-CauRun -ClusterName <clusterName> -CauPluginName "Microsoft.RollingUpgradePlugin" -CauPluginArguments @{'WuConnected'='true';} -Verbose -EnableFirewallRules -Force -RequireAllNodesOnline -ForceSelfUpdate

7. When the cluster is not connected to the Internet, CAU can be used to upgrade the cluster using extracted Azure Stack HCI media.

Invoke-CauRun -ClusterName <cluster_name> -CauPluginName Microsoft.RollingUpgradePlugin -CauPluginArguments @{ 'WuConnected'='false';'PathToSetupMedia'='\some\path\'; 'UpdateClusterFunctionalLevel'='true'; } -Force

8. The ForceSelfUpdate switch is needed, if you run the CAU process from a node within the cluster.

Post Update tasks

After an Azure Stack HCI cluster upgrade, perform the following steps:

1. Run the following command to update the cluster functional level:

Update-ClusterFunctionalLevel

2. Run the following command to update the storage pool:

Update-StoragePool -FriendlyName "S2D on hci-cluster"

3. Upgrade VM configuration levels.

You may optionally upgrade VM configuration levels by stopping each VM and using the Update-VMVersion cmdlet.

4. Ensure that the Azure Stack HCI is still connected to Azure.

After Microsoft Azure registration, use the Get-AzureStackHCI command to confirm the cluster registration and connection status.
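Steps 3 and 4 can be sketched as follows. This is an illustration under stated assumptions rather than a validated procedure: it runs on one node at a time, handles only VMs that are running or off, and shuts running VMs down, so schedule it accordingly. Upgrading a VM configuration version cannot be reversed.

```powershell
# Sketch: upgrade the configuration version of the VMs on the local node.
foreach ($vm in Get-VM) {
    # Skip saved or paused VMs; they must be brought to a clean Off state manually.
    if ($vm.State -notin 'Running', 'Off') { continue }

    $wasRunning = $vm.State -eq 'Running'
    if ($wasRunning) { Stop-VM -VM $vm }          # graceful guest shutdown
    Update-VMVersion -VM $vm -Confirm:$false
    if ($wasRunning) { Start-VM -VM $vm }
}

# Confirm that the cluster is still registered and connected to Azure.
Get-AzureStackHCI | Select-Object ClusterStatus, RegistrationStatus, ConnectionStatus
```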

Stretch cluster scenarios

The upgrade procedure for standalone and stretch clusters is exactly the same.

Ensure that the following steps are performed before attempting a stretch cluster upgrade from 21H2 to 22H2:

1. All the nodes in both sites have access to the internet to download the 22H2 update.

2. All the nodes in the cluster (both sites) are available and running, and no nodes or disks are in maintenance mode.

3. All the resources (Virtual Machines and Virtual Disks) have preferred site set so that they do not live migrate between sites during the upgrade.
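The preferred-site check in item 3 can be sketched for clustered roles as follows. The names VM01 and Site1 are placeholders, and the sketch assumes that sites are already defined as cluster fault domains; it covers cluster groups such as VM roles only, not the volumes themselves.

```powershell
# Sketch: list the sites, then the clustered roles that have no preferred site.
Get-ClusterFaultDomain -Type Site

Get-ClusterGroup | Where-Object { -not $_.PreferredSite } |
    Select-Object Name, GroupType, OwnerNode

# Assign a preferred site to one role (placeholder names).
(Get-ClusterGroup -Name 'VM01').PreferredSite = 'Site1'
```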

Final notes

After upgrading Azure Stack HCI, do the following:

● Run the CAU process again to update all the nodes to the latest Microsoft CU and latest firmware and drivers.

● Ensure that the BitLocker feature is installed on the cluster nodes. If it is not, install it by using the following PowerShell command:

Install-WindowsFeature -Name *BitLocker* -IncludeAllSubFeature -IncludeManagementTools -Verbose


Firmware and driver updates using the manual method

These procedures describe how to prepare and update firmware and drivers on an Azure Stack HCI cluster manually.

Preparing for maintenance operations

About this task

Use the following PowerShell commands to ensure that all the requirements are met before proceeding with the maintenance operation of an AX node in an Azure Stack HCI cluster. These steps ensure that all the requirements are met and that no faults exist before placing an AX node into maintenance mode.

Steps

1. To verify that all nodes in the cluster are available, run the Get-ClusterNode command.

2. To verify that all cluster networks are available, run the Get-ClusterNetwork command.

3. Verify that the cluster status is healthy. Run the following commands:

● Get-ClusterS2D

● Get-StoragePool

● Get-StorageSubSystem -FriendlyName *Cluster* | Get-StorageHealthReport

4. Verify that all the physical and virtual drives are healthy. Run the following commands:

● Get-PhysicalDisk

● Get-VirtualDisk

5. Run the Get-StorageJob command to verify that no back-end repair jobs are running.

Placing an AX node in maintenance mode

About this task

After ensuring that the prerequisites are met and before performing the platform updates, place the AX node in maintenance mode (pause and drain). You can move roles or VMs and gracefully flush and commit data in the AX node.

Steps

1. Run the following command to put the node in maintenance mode (pause and drain). Verify that all the roles and virtual drives are drained properly and operational in other nodes after they are moved:

Suspend-ClusterNode -Name "<Hostname>" -Drain

2. Place the target node in storage maintenance mode:

Get-StorageFaultDomain -Type StorageScaleUnit | Where-Object {$_.FriendlyName -eq "<Hostname>"} | Enable-StorageMaintenanceMode

3. Run the Get-PhysicalDisk command and ensure that the OperationalStatus value is In Maintenance Mode for the drives that belong to that server.

You can also run the following command and verify that the drives all belong to the paused node:

Get-StoragePool -IsPrimordial 0 | Get-PhysicalDisk | ? OperationalStatus -eq 'In Maintenance Mode' | Get-StorageNode -PhysicallyConnected

4. Turn off the System Lockdown mode.

5. Suspend BitLocker if enabled using the following command.

Suspend-BitLocker -MountPoint "C:" -RebootCount 0


Obtaining the firmware catalog for AX nodes or Ready Nodes using Dell Repository Manager

About this task

For a qualified set of firmware and drivers for AX nodes or Ready Nodes, Dell Technologies recommends that you use an Azure Stack HCI catalog.

You can generate the firmware catalog along with the firmware and drivers by using Dell Repository Manager (DRM) and copy it to a shared path.

Steps

1. Install DRM version 3.0.1.423 or later.

2. On the DRM home page, click the Dell EMC Repository Manager drop-down list.

3. In the Manage section, click Application Preferences .

The Preferences window is displayed.

4. Click Plug-ins .

5. Select all the plug-ins and click Update .

A message is displayed about the successful completion of the update.

6. Click Catalogs .

7. Select all the catalogs and click Update .

8. Click Close to close the Preferences window.

9. On the home page, click Add Repository .

The Add Repository window is displayed.

10. Enter the Repository name and Description .

11. Select Index Catalog-<version> from the Base Catalog drop-down menu.

12. Select Update Catalog for Microsoft HCI Solutions from the Catalog Group .

13. Select the latest catalog from the Catalogs section.

14. Click Save .

The Update Catalog for Microsoft HCI Solutions is populated in the Base Catalog section.

15. In the Manual Repository Type, click All systems in base catalog and then click Add .

The repository is displayed on the repository dashboard available in the home page.

16. Select the repository and click Export .

The Export Deployment Tools window is displayed.

17. Select the location to export files and click Export .

The files are exported to the specified location.

Updating the AX node by using iDRAC out of band

About this task

AX nodes offer device firmware updates remotely through iDRAC. For Azure Stack HCI clusters, the recommended option is to use an Azure Stack HCI catalog for a qualified set of firmware and BIOS. Generate the latest Dell Azure Stack HCI catalog file through Dell Repository Manager (DRM) and copy the file to a network location before proceeding with the update process.

Steps

1. Log in to the iDRAC web interface.

2. Click Maintenance > System Update .

The Firmware Update page is displayed.

3. On the Update tab, select Network Share as the file location.

4. Provide the details of the network share:


Figure 23. Check for updates

5. Click Check for updates .

A list of available updates is displayed:

Figure 24. Select updates

6. Select the updates and click Install Next Reboot to install and reboot the system.

Updating the out-of-box drivers

For certain system components, you might need to update the drivers to the latest Dell supported versions, which are listed in the Supported Firmware and Software Matrix.

Run the following PowerShell command to retrieve the list of all driver versions that are installed on the local system:

Get-PnpDevice | Select-Object Name, @{l='DriverVersion';e={(Get-PnpDeviceProperty -InstanceId $_.InstanceId -KeyName 'DEVPKEY_Device_DriverVersion').Data}} -Unique | Where-Object {($_.Name -like "*HBA*") -or ($_.Name -like "*mellanox*") -or ($_.Name -like "*Qlogic*") -or ($_.Name -like "*X710*") -or ($_.Name -like "*Broadcom*") -or ($_.Name -like "*marvell*")}

Run the following PowerShell command to check the chipset driver installation status. If there is an error, install the chipset driver:

Get-PnpDevice -PresentOnly | Where-Object {($_.Status -ne 'OK') -and ($_.Problem -ne 'CM_PROB_NONE' -and $_.Problem -ne 'CM_PROB_DISABLED')}


After you identify the required driver version, including for the chipset and the HBA, download the driver installers from https://www.dell.com/support or by using the Dell Repository Manager (DRM) as described in Obtaining the firmware catalog for AX nodes or Ready Nodes using Dell Repository Manager .

After the drivers are downloaded, copy the identified drivers to AX nodes from where you can manually run the driver DUP files to install the drivers and restart the node.

Alternatively, to install the drivers silently, go to the folder and run the following command:

<DriverUpdate>.EXE /s /f
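When several driver packages have to be installed, the silent command above can be applied to each one in turn. The following sketch assumes that the DUP files were copied to C:\Temp\Drivers, which is a placeholder path, and it only reports each exit code; see the DUP documentation for the meaning of the codes and whether a restart is required.

```powershell
# Sketch: run every DUP in a staging folder silently and report its exit code.
Get-ChildItem -Path 'C:\Temp\Drivers' -Filter '*.exe' | ForEach-Object {
    $proc = Start-Process -FilePath $_.FullName -ArgumentList '/s', '/f' -Wait -PassThru
    [pscustomobject]@{ Package = $_.Name; ExitCode = $proc.ExitCode }
}
```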

Exiting the AX node from maintenance mode

After updating the AX node, exit the storage maintenance mode and node maintenance mode by running the following commands:

Get-StorageFaultDomain -Type StorageScaleUnit | Where-Object {$_.FriendlyName -eq "<Hostname>"} | Disable-StorageMaintenanceMode

Resume-ClusterNode -Name "<Hostname>" -Failback Immediate

Resume-BitLocker -MountPoint "C:"

These commands initiate the operation of rebuilding and rebalancing the data to ensure load balancing.

For the remaining cluster nodes, repeat the preceding procedures for conducting maintenance operations.
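Before repeating the procedure on the next node, it is prudent to let the rebuild finish. The following loop is a minimal sketch of such a wait; the 30-second polling interval is arbitrary, and the loop assumes that a completed rebuild shows no unfinished storage jobs and only healthy virtual disks.

```powershell
# Sketch: wait until repair jobs finish and all virtual disks are healthy again.
do {
    Start-Sleep -Seconds 30
    $pendingJobs = @(Get-StorageJob | Where-Object JobState -ne 'Completed')
    $unhealthy   = @(Get-VirtualDisk | Where-Object HealthStatus -ne 'Healthy')
    Write-Host "Pending storage jobs: $($pendingJobs.Count); unhealthy virtual disks: $($unhealthy.Count)"
} while ($pendingJobs.Count -gt 0 -or $unhealthy.Count -gt 0)
```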

Restarting a cluster node or taking a cluster node offline

About this task

Use the following procedure to restart a cluster node or to take a cluster node offline for maintenance:

Steps

1. Verify the health status of your cluster and volumes:

● Get-StorageSubSystem -FriendlyName *Cluster* | Get-StorageHealthReport

● Get-PhysicalDisk

● Get-VirtualDisk

2. Suspend the cluster node:

● Suspend-ClusterNode -Name "<Hostname>" -Drain

3. Enable storage maintenance mode:

● Get-StorageFaultDomain -Type StorageScaleUnit | Where-Object {$_.FriendlyName -eq "<Hostname>"} | Enable-StorageMaintenanceMode

4. Suspend the BitLocker:

● Suspend-BitLocker -MountPoint "C:" -RebootCount 0

5. Restart the server or shut it down for maintenance.

6. Disable storage maintenance mode:

● Get-StorageFaultDomain -Type StorageScaleUnit | Where-Object {$_.FriendlyName -eq "<Hostname>"} | Disable-StorageMaintenanceMode

7. Resume the cluster node and BitLocker protection:

● Resume-ClusterNode -Name "<Hostname>" -Failback Immediate

● Resume-BitLocker -MountPoint "C:"

Results

For more information, see Taking a Storage Spaces Direct server offline for maintenance.
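
The health checks in step 1 can be combined into a single gate that is evaluated before a node is suspended. The following is a minimal sketch and not part of the documented procedure: Test-ReadyForMaintenance is a hypothetical helper name, and the values it compares ('Healthy', 'OK', 'Running') are assumed to match what Get-VirtualDisk and Get-StorageJob report on your cluster.

```powershell
# Sketch only: returns $true when every virtual disk is healthy and no
# storage job is running. Function name and thresholds are illustrative.
function Test-ReadyForMaintenance {
    param(
        # Objects exposing HealthStatus and OperationalStatus (default: Get-VirtualDisk)
        [object[]]$VirtualDisk = @(Get-VirtualDisk),
        # Objects exposing JobState (default: Get-StorageJob)
        [object[]]$StorageJob = @(Get-StorageJob)
    )

    # Any volume that is not Healthy/OK blocks maintenance
    $unhealthy = @($VirtualDisk | Where-Object {
        $_.HealthStatus -ne 'Healthy' -or $_.OperationalStatus -ne 'OK'
    })

    # A running repair or rebalance job also blocks maintenance
    $running = @($StorageJob | Where-Object { $_.JobState -eq 'Running' })

    return ($unhealthy.Count -eq 0 -and $running.Count -eq 0)
}
```

Calling Test-ReadyForMaintenance with no arguments on a cluster node queries the live cluster. Proceed to the suspend step only when it returns True.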


Expanding the Azure Stack HCI cluster

Expanding cluster compute and storage capacity are tasks performed during cluster operations. This section provides instructions for performing these tasks.

Figure 25. Expanding the Azure Stack HCI cluster

Azure Stack HCI node expansion

In an HCI cluster, adding server nodes increases the storage capacity, improves the overall storage performance of the cluster, and provides more compute resources to add VMs. Before adding new server nodes to an HCI cluster, complete the following requirements:

● Verify that the processor model, HBA, and NICs have the same configuration, including PCIe slot placement, as the current nodes in the cluster.

● Ensure that the disk types and the number of disks in each new node match those of the nodes in use. Do not combine two different disk types in the same cluster or node. For example, you cannot combine SATA and SAS HDD/SSD drives in the same node or cluster. The following table lists the supported options for expanding storage capacity of the cluster.

Table 12. Options to expand storage capacity of the cluster

| Option 1 conditions | Option 2 conditions |
|---|---|
| Drive is listed in the Support Matrix | Drive is listed in the Support Matrix |
| Same drive manufacturer | Different drive manufacturer |
| Same capacity and endurance | Same capacity and endurance |
| Latest model | Different model |
| Latest firmware | Different firmware |

● Ensure that the BIOS, drivers, firmware, and chipset are as listed in the support matrix.

● Apply the BIOS configuration to the node and configure iDRAC. For more information about configuring the node, see Microsoft HCI Solutions from Dell Technologies: End-to-End Deployment with Switchless Networking or Microsoft HCI Solutions from Dell Technologies: End-to-End Deployment with Scalable Networking. Do not run the PowerShell commands in the following sections of the deployment guide again because the cluster is already created, Storage Spaces Direct is already enabled, and the management network is already excluded:

○ Creating the host cluster

○ Enabling Storage Spaces Direct

○ Configuring the host management network as a lower priority network for live migration

● Ensure that the following tasks are completed:

1. Pass cluster validation and SES device compliance tests.

2. Verify that the nodes are compliant with the firmware baseline.

3. Update the hardware timeout configuration for the Spaces port.

4. After the node configuration, update Microsoft Windows to bring the node to the same level as the cluster.

Adding server nodes manually

NOTE: This procedure applies only if the cluster and Storage Spaces Direct were configured manually.

To manually add server nodes to the cluster, see https://technet.microsoft.com/windows-server-docs/storage/storagespaces/add-nodes.
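
As a rough illustration of what the linked procedure involves, the following sketch validates the existing nodes together with the new node and joins the new node only when explicitly requested. It assumes the FailoverClusters module is available and that the sketch is run from a cluster node. Add-ValidatedClusterNode and its parameters are hypothetical names used only for this example, whereas Test-Cluster and Add-ClusterNode are the standard cmdlets.

```powershell
# Sketch only: validate first, join second. Helper name is illustrative.
function Add-ValidatedClusterNode {
    param(
        [Parameter(Mandatory)][string[]]$ExistingNode,  # current cluster members
        [Parameter(Mandatory)][string]$NewNode,         # node being added
        [switch]$Join                                   # add the node after validation
    )

    # Run cluster validation across the current members plus the candidate node
    Test-Cluster -Node ($ExistingNode + $NewNode) `
        -Include 'Storage Spaces Direct', 'Inventory', 'Network', 'System Configuration'

    # Join only when requested, so the validation report can be reviewed first
    if ($Join) {
        Add-ClusterNode -Name $NewNode
    }
}
```

For example, Add-ValidatedClusterNode -ExistingNode "<Node1>","<Node2>" -NewNode "<NewNode>" runs validation only. Review the report, and then repeat the call with -Join.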

Storage Spaces Direct storage expansion

In an HCI cluster, adding drives to the available slots on the cluster nodes increases the storage capacity of the cluster and improves storage performance. Before expanding the storage, ensure that the disk types and the number of disks are the same in every node. Do not combine two different disk types in the same cluster or node. For example, you cannot combine SATA and SAS HDD/SSD drives in the same node or cluster.

The following options for expanding the storage capacity of the cluster are supported:

● Option 1: Expand the storage with drives from the same manufacturer that have the same capacity and endurance, the latest model, and the latest firmware. Verify that the drive is listed in the AX node support matrix.

● Option 2: Expand the storage with drives of a different manufacturer, model, and firmware but the same capacity and endurance. Verify that the drive is listed in the AX node support matrix.

When new disks are added to extend the overall storage capacity per node, the Azure Stack HCI cluster starts claiming the physical disks into an existing storage pool.

After the drives are added, they are shown as available for pooling (CanPool set to True) in the output of the Get-PhysicalDisk command.

Within a few minutes, the newly added disks are claimed in the existing pool and Storage Spaces Direct starts the rebalance job.

Run the following command to verify that the new disks are a part of the existing pool:

PS C:\> Get-StorageSubSystem -FriendlyName *Cluster* | Get-StorageHealthReport

CPUUsageAverage : 2.66 %

CapacityPhysicalPooledAvailable : 8.01 TB

CapacityPhysicalPooledTotal : 69.86 TB

CapacityPhysicalTotal : 69.86 TB

CapacityPhysicalUnpooled : 0 B

CapacityVolumesAvailable : 15.09 TB

CapacityVolumesTotal : 16.88 TB

IOLatencyAverage : 908.13 us

IOLatencyRead : 0 ns

IOLatencyWrite : 908.13 us

IOPSRead : 0 /S

IOPSTotal : 1 /S

IOPSWrite : 1 /S

IOThroughputRead : 0 B/S

IOThroughputTotal : 11.98 KB/S

IOThroughputWrite : 11.98 KB/S

MemoryAvailable : 472.87 GB

MemoryTotal : 768 GB

After all available disks are claimed in the storage pool, CapacityPhysicalUnpooled is 0 B.
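
Another way to confirm that no disks are left unclaimed is to list the disks that still report CanPool as True. The following is a minimal sketch under that assumption. Get-UnclaimedDisk is a hypothetical helper name, and the selected properties are the usual Get-PhysicalDisk properties.

```powershell
# Sketch only: list disks that are still eligible for pooling.
# An empty result means every disk has been claimed by the pool.
function Get-UnclaimedDisk {
    param(
        # Objects exposing CanPool (default: Get-PhysicalDisk)
        [object[]]$PhysicalDisk = @(Get-PhysicalDisk)
    )

    $PhysicalDisk |
        Where-Object { $_.CanPool -eq $true } |
        Select-Object FriendlyName, SerialNumber, MediaType, Size
}
```

If Get-UnclaimedDisk still returns disks several minutes after the drives were added, check the drives against the support matrix before investigating further.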


The storage rebalance job might take a few minutes. You can monitor the process by using the Get-StorageJob cmdlet.
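
Monitoring with Get-StorageJob can be scripted as a simple polling loop. The sketch below is illustrative only. Wait-StorageJobIdle, its parameters, and the polling interval are assumptions, and it treats any job whose JobState is not 'Completed' as still active.

```powershell
# Sketch only: poll storage jobs until none are active or the poll limit is reached.
function Wait-StorageJobIdle {
    param(
        [scriptblock]$GetJobs = { Get-StorageJob },  # source of job objects
        [int]$PollSeconds = 30,                      # delay between polls
        [int]$MaxPolls = 120                         # give up after this many polls
    )

    for ($i = 0; $i -lt $MaxPolls; $i++) {
        # Collect jobs that have not finished yet
        $active = @(& $GetJobs | Where-Object { $_.JobState -ne 'Completed' })
        if ($active.Count -eq 0) { return $true }

        # Report progress without adding to the function's return value
        $active | ForEach-Object {
            Write-Host ("{0}: {1}% complete" -f $_.Name, $_.PercentComplete)
        }
        Start-Sleep -Seconds $PollSeconds
    }
    return $false
}
```

The function returns True when no active jobs remain and False when the poll limit is reached first, so a calling script can decide whether to continue or to alert an operator.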

Extending volumes

You can resize volumes that are created in Storage Spaces Direct storage pools by using the Resize-VirtualDisk cmdlet. For more information, see https://technet.microsoft.com/windows-server-docs/storage/storage-spaces/resize-volumes.
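
Extending a volume typically takes two steps: growing the virtual disk, and then growing the partition on it. The following sketch assumes a volume without storage tiers whose data partition is partition number 2. Confirm both assumptions against the linked Microsoft documentation before use. Expand-ClusterVolume is a hypothetical helper name.

```powershell
# Sketch only: grow a non-tiered virtual disk, then extend its data partition.
function Expand-ClusterVolume {
    param(
        [Parameter(Mandatory)][string]$FriendlyName,  # virtual disk friendly name
        [Parameter(Mandatory)][uint64]$Size           # new virtual disk size in bytes
    )

    # Step 1: resize the virtual disk
    Get-VirtualDisk -FriendlyName $FriendlyName | Resize-VirtualDisk -Size $Size

    # Step 2: locate the data partition (assumed to be partition number 2)
    $partition = Get-VirtualDisk -FriendlyName $FriendlyName |
        Get-Disk | Get-Partition | Where-Object PartitionNumber -eq 2

    # Step 3: extend the partition to the largest size the disk now supports
    $maxSize = ($partition | Get-PartitionSupportedSize).SizeMax
    $partition | Resize-Partition -Size $maxSize
}
```

For example, Expand-ClusterVolume -FriendlyName "<VolumeName>" -Size 2TB requests a 2 TB virtual disk. The storage pool must have enough free capacity for the resiliency setting of the volume.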

Performing AX node recovery

If a cluster node fails, perform node operating system recovery in a systematic manner to ensure that the node is brought up with the configuration that is consistent with other cluster nodes.

The following sections provide details about operating system recovery and post-recovery configuration that is required to bring the node into an existing Azure Stack HCI cluster.

NOTE: To perform node recovery, ensure that the operating system is reinstalled.

Configuring RAID for operating system drives

Prerequisites

The Dell PowerEdge servers offer the Boot Optimized Storage Solution (BOSS) controller as an efficient and economical way to separate the operating system and data on the internal storage of the server. The BOSS solution in the latest generation of PowerEdge servers uses one or two BOSS M.2 SATA devices to provide RAID 1 capability for the operating system drive.

NOTE: All Microsoft HCI Solutions from Dell Technologies are configured with hardware RAID 1 for the operating system drives on BOSS M.2 SATA SSD devices. The steps in this section are required only when recovering a failed cluster node.

Before creating a new RAID, the existing or failed RAID must be deleted.

About this task

This procedure describes the process of creating operating system volumes.

Steps

1. Log in to the iDRAC web interface.

2. Go to Storage > Controllers.

Figure 26. View controllers

3. Go to Configuration > Storage Configuration > Virtual Disk Configuration, and then click Create Virtual Disk.


Figure 27. Create a virtual disk

4. Provide a virtual disk name and select BOSS M.2 devices in the physical disks.

Figure 28. Provide virtual disk name

Figure 29. Set physical disks


5. Click Add Pending Operations.

6. Go to Configuration > Storage Configuration > Virtual Disk Configuration.

Figure 30. Initialize configuration

7. Select the virtual disk and then select Initialize: Fast in Virtual Disk Actions.

8. Reboot the server.

NOTE: The virtual disk creation process might take several minutes to complete.

9. After the initialization is completed successfully, the virtual disk health status is displayed.

Figure 31. Virtual disk health status

Operating system recovery

This section provides an overview of the steps involved in operating system recovery on the Microsoft HCI Solutions from Dell Technologies.

NOTE: Ensure that the RAID 1 VD created on the BOSS M.2 drives is reinitialized.

NOTE: Do not reinitialize or clear the data on the disks that were a part of the Storage Spaces Direct storage pool. This helps to reduce repair times when the node is added back to the same cluster after recovery.

Manual operating system recovery

For manually deployed nodes, you can recover the operating system on the node by using any of the methods that were used for operating system deployment.


Factory operating system recovery

For the factory-installed OEM license of the operating system, Dell Technologies recommends that you use the operating system recovery media that shipped with the PowerEdge server. Using this media for operating system recovery ensures that the operating system stays activated after the recovery. Using any other operating system media triggers the need for activation after operating system deployment. Operating system deployment using the recovery media is the same as either retail or other operating system media-based installation.

After completing the operating system deployment using the recovery media, perform the following steps to bring the node into an existing Azure Stack HCI cluster:

1. Update CPU chipset, network, and storage drivers.

2. Configure host networking.

3. Change the hostname.

4. Perform AD Domain Join.

5. Configure the QoS policy (required only when RDMA uses RoCE).

6. Configure RDMA.

7. Configure the firewall.

8. Perform Day 0 operating system updates.

9. Add server nodes to the cluster.

For instructions on steps 1 through 7, see Microsoft HCI Solutions from Dell Technologies: End-to-End Deployment with Switchless Networking or Microsoft HCI Solutions from Dell Technologies: End-to-End Deployment with Scalable Networking.

iDRAC Service Module (iSM) for AX nodes and Storage Spaces Direct Ready Nodes

The iDRAC Service Module (iSM) is a lightweight software module that you can install on AX nodes to complement the iDRAC interfaces—the user interface (UI), RACADM CLI, Redfish, and Web Services-Management (WS-Man)—with additional monitoring data.

About this task

To install iSM on the operating system, perform the following steps.

NOTE: The iSM application package is installed as part of firmware and driver updates using the ASHCI catalog.

Steps

1. Go to iDRAC > iDRAC Settings > Settings > iDRAC Service Module Setup .

2. Start the virtual console.

3. Log in to the host operating system as an administrator.

4. From the device list, select the mounted volume that is identified as SMINST, and then click the ISM_Win.bat script to start the installation.

Results

After the installation is completed, iDRAC indicates that the iSM is installed and specifies the latest installation date.

FullPowerCycle

FullPowerCycle is a calling interface function that provides a way to reset the server auxiliary power. An increasing amount of server hardware runs on server auxiliary power. Troubleshooting some server issues requires you to physically unplug the server power cable to reset the hardware running on auxiliary power.

The FullPowerCycle feature enables the administrator to connect or disconnect auxiliary power remotely without visiting the data center. This feature is supported on AX nodes and Storage Spaces Direct Ready Nodes.

These are the relevant commands to run in the PowerShell console:

● To request FullPowerCycle on your system: Invoke-FullPowerCycle -status request

● To get the status of FullPowerCycle on your system: Invoke-FullPowerCycle -status Get

● To cancel FullPowerCycle on your system: Invoke-FullPowerCycle -status cancel
