Solution deployment

This chapter presents the following topics:

Introduction

PAL and DPOR registration for Azure Stack

Install roles and features

Configuring a cluster witness

Deployment prerequisites for stretched clusters

Customer network team requirements

Design principles and best practices

Validated network topology

Introduction

Stretched clusters with Dell Solutions for Azure Stack HCI can be configured using PowerShell. This guide describes the prerequisites for this deployment.

NOTE: The instructions in this guide apply only to the Microsoft Azure Stack HCI operating system.

Each task in this deployment guide requires running one or more PowerShell commands. On some occasions you might have to use Failover Cluster Manager or Windows Admin Center from a machine that supports Desktop Experience.

PAL and DPOR registration for Azure Stack

Partner Admin Link (PAL) and Digital Partner of Record (DPOR) are customer association mechanisms used by Microsoft to measure the value a partner delivers to Microsoft by driving customer adoption of Microsoft Cloud services.

Currently, Dell Azure projects that are not associated through either of these mechanisms are not visible to Microsoft and, therefore, Dell does not get credit. Dell technical representatives should attempt to set both PAL and DPOR, with PAL being the priority.

To register the PAL or DPOR for the Azure Stack system, refer to PAL and DPOR Registration for Azure Stack under Deployment Procedures in the Azure Stack HCI generator in SolVe.

Install roles and features

Deploying and configuring a cluster running Windows Server 2016, Windows Server 2019, Windows Server 2022, or the Azure Stack HCI operating system requires enabling specific operating system roles and features.

Enable the following roles and features:

● Hyper-V service (not required if the operating system is factory-installed)

● Failover clustering

● Data Center Bridging (DCB) (required only when implementing the fully converged network topology with RoCE, or when implementing DCB for the fully converged topology with iWARP)

● BitLocker (optional)

● File Server (optional)

● FS-Data-Deduplication module (optional)

● RSAT-AD-PowerShell module (optional)

Enable these features by running the Install-WindowsFeature PowerShell cmdlet:


Install-WindowsFeature -Name Hyper-V, Failover-Clustering, Data-Center-Bridging, BitLocker, FS-FileServer, RSAT-Clustering-PowerShell, FS-Data-Deduplication -IncludeAllSubFeature -IncludeManagementTools -Verbose

NOTE: Install the Storage-Replica feature if the Azure Stack HCI operating system is being deployed for a stretched cluster.
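For a stretched cluster node, the feature can be added with a command like this minimal sketch (the restart can be deferred and combined with the one described in the next note):

# Add the Storage Replica feature on each stretched-cluster node
Install-WindowsFeature -Name Storage-Replica -IncludeAllSubFeature -IncludeManagementTools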

NOTE: Installing Hyper-V and the optional roles requires a system restart. Because subsequent procedures also require restarts, the required restarts are combined into one.

Configuring a cluster witness

A cluster witness must be configured for a two-node cluster, and Microsoft recommends configuring one for a four-node Azure Stack HCI cluster as well. A cluster witness helps maintain cluster or storage quorum when a node or network communication fails and nodes continue to operate but can no longer communicate with one another.

A cluster witness can be either a file share or a cloud-based witness.

NOTE: If you choose to configure a file share witness, ensure that it is outside the two-node cluster.

For information about configuring a cloud-based witness, see Cloud-based witness.
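As a minimal sketch, assuming a cluster named S2DCluster and a hypothetical file share and storage account, either witness type can be configured with Set-ClusterQuorum:

# File share witness hosted outside the cluster
Set-ClusterQuorum -Cluster "S2DCluster" -FileShareWitness "\\witness-server\witness-share"

# Cloud witness backed by an Azure storage account
Set-ClusterQuorum -Cluster "S2DCluster" -CloudWitness -AccountName "mystorageaccount" -AccessKey "<storage-account-access-key>"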

Deployment prerequisites for stretched clusters

Dell Technologies assumes that the management services required for operating system deployment and cluster configuration are present in the existing infrastructure. An internet connection is required to license and register the cluster with Azure. Because the Microsoft Azure Stack HCI operating system is a Server Core operating system, you need a system that supports Desktop Experience to access Failover Cluster Manager and Windows Admin Center. For more information, see the Windows Admin Center FAQ.

Table 2. Deployment prerequisites for stretched clusters

Active Directory Sites & Subnets:
● Configure two sites and their corresponding subnets in Active Directory so that the correct sites appear in Failover Cluster Manager when the stretched cluster is configured.
● Configure fault domains for each cluster if the IP subnets are the same across both sites.

Network:
● If the two sites have host networks in different subnets, no additional configuration is needed to create the cluster; otherwise, manual configuration of the cluster fault domain is required.
● RDMA adapters are used for Storage/SMB traffic.
● RDMA is not supported for Replica traffic across the WAN.
● At least a 1 Gb network between sites is required for replication and inter-site live migration.
● The bandwidth between sites should be sufficient to sustain the write I/Os on the primary site.
● An average latency of 5 ms or less is required for synchronous replication.
● There are no latency requirements or recommendations for asynchronous replication.
● Microsoft makes no recommendation about the maximum distance between sites that a stretched cluster can support; longer distances normally translate into higher network latency.

Windows Admin Center node:
● Windows features required: RSAT-Clustering and RSAT-Storage-Replica.

Number of nodes supported:
● Minimum: 4 (2 nodes per site)
● Maximum: 16 (8 nodes per site)

Number of drives supported:
● Minimum of 4 drives per node. Both sites should have the same capacity and number of drives. Dell Technologies currently supports only an all-flash configuration for stretched clusters.

Tuning of cluster heartbeats:
(Get-Cluster).SameSubnetThreshold = 10
(Get-Cluster).CrossSubnetThreshold = 20

SDN/VM Network:
● SDN on multi-site clusters is not supported at this time.

For the maximum supported hardware configuration, see Review maximum supported hardware specifications.
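When both sites share the same IP subnets, the site fault domains called out in Table 2 can be created manually. A minimal sketch, assuming hypothetical site and node names:

# Create a site fault domain for each location
New-ClusterFaultDomain -Name "SiteA" -FaultDomainType Site
New-ClusterFaultDomain -Name "SiteB" -FaultDomainType Site

# Assign each cluster node to its site
Set-ClusterFaultDomain -Name "SiteANode1" -Parent "SiteA"
Set-ClusterFaultDomain -Name "SiteBNode1" -Parent "SiteB"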

Customer network team requirements

Depending on the network configuration chosen, customers should ensure that the requisite end-to-end routing is enabled for inter-site communication. A minimum of one IP route (Basic configuration) or three IP routes (High Throughput configuration) is required for the environment.

Depending on the network configuration, the customer network team may also need to add static routes on the switches or other Layer-3 devices to ensure site-to-site connectivity.

Design principles and best practices

Stretched clusters and Storage Replica

A stretched cluster setup has two sites and two storage pools. Replicating data across the WAN and committing writes at both sites results in lower performance than a standalone Storage Spaces Direct cluster. Low-latency inter-site links are necessary for optimum workload performance. Low bandwidth and high latency between sites can result in very poor performance on the primary site with both synchronous and asynchronous replication.

Synchronous replication involves data blocks being written to log files on both sites before being committed. In asynchronous replication, the remote node accepts the block of replicated data and acknowledges back to the source copy. Application performance is not affected unless the rate of change of data exceeds the bandwidth of the replica link between the sites for long periods of time. This point is critical and must be taken into consideration when you design the solution.

The size of the log volume has no bearing on the performance of the solution. A larger log collects and retains more write I/Os before they are wrapped (overwritten), which allows an interruption in service between the two sites (such as a network outage or the destination site being offline) to go on for a longer period.
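The log size is set per replication group. As a sketch, with a hypothetical group name and size:

# Grow the Storage Replica log to retain more write I/Os during inter-site outages
Set-SRGroup -Name "Replication01" -LogSizeInBytes 32GB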

Table 3. Disk writes

Scenario                         Writes in two-way mirrored volumes    Writes in three-way mirrored volumes
Standalone storage spaces        2x                                    3x
Replication to secondary site    4x                                    6x

NOTE: WAN latency and additional writes to the log volumes on both sites cause higher write latency. Along with writes to the log and data disks, the inter-site bandwidth and latency also play a role in limiting the IOPS in the environment. For this reason, we highly recommend using all-flash configurations for stretched clusters.


NOTE: In a Storage Spaces Direct environment, both data and log volumes eventually reside on the same SSD pool because multiple storage pools per site are not supported.

The following figure illustrates the difference between synchronous and asynchronous replication:

Figure 2. Synchronous and asynchronous replication

Synchronous replication: A block of data written by an application to a volume on Site A (1) is first written to the corresponding log volume on the same site (2) and is then replicated to Site B. At Site B, the block of data is written to the Replica log volume (3) before a commit is sent back to the application over the same route (4 and 5). The block is subsequently pushed to the data volumes on both sites. For each block of data that the application writes, the commit is issued only after the data is written to the secondary site. Thus there is no data loss at the file-system level in the event of a site failure. This results in lower application write performance compared to a standalone deployment.

Asynchronous replication: A block of data written by an application to a volume on Site A (1) is first written to the corresponding log volume on the same site (2). A commit is immediately sent back to the application. At the same time, the block of data is replicated to Site B and written to the Replica log volume there. If a site fails, the cluster ensures that no data is lost beyond the configured Recovery Point Objective (RPO). Application performance is not affected unless the rate of change of data exceeds the bandwidth of the replica link between the sites for long periods of time. This is critical and must be taken into consideration when designing the solution.
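The replication mode is selected when the Storage Replica partnership is created. The following is a minimal sketch with hypothetical node, replication group, and volume names:

# Create a synchronous replication partnership between the two sites
New-SRPartnership -SourceComputerName "SiteANode1" -SourceRGName "rg01" `
    -SourceVolumeName "D:" -SourceLogVolumeName "L:" `
    -DestinationComputerName "SiteBNode1" -DestinationRGName "rg02" `
    -DestinationVolumeName "D:" -DestinationLogVolumeName "L:" `
    -ReplicationMode Synchronous

# For asynchronous replication, specify the mode and a Recovery Point Objective in seconds:
#   -ReplicationMode Asynchronous -AsyncRPO 30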

NOTE: Both replication scenarios affect application performance because each data block has to be written multiple times, assuming that all volumes are configured for replication.

NOTE: Stretched cluster with Storage Replica is not a substitute for a backup solution. Stretched cluster is a disaster recovery solution that keeps a business running in the event of a site failure. Customers should still rely on application and infrastructure backup solutions to recover lost data due to user error or application/data corruption.

CAUTION: To use the Data Deduplication feature for your Azure Stack HCI data volumes, you must install the server role on both the source and destination servers. Do not enable Data Deduplication on the destination volumes within an Azure Stack HCI stretched cluster. Data Deduplication manages the writes, so it must run only on the source cluster nodes. Destination nodes always receive deduplicated copies of each volume.


Validated network topology

Basic configuration

This section describes the host network configuration and network cards that are required to configure a basic stretched cluster. The purpose of this topology is to keep the host and inter-site configuration simple with little or no change to a standard standalone cluster networking architecture.

Here we use two 25 GbE NICs for each host on both sites. One NIC is dedicated to intra-site storage traffic, similar to a standalone Storage Spaces Direct environment. The second NIC is used for management, compute, and Storage Replica traffic.

To ensure that management traffic is not bottlenecked by high traffic on the Replica network, we ask the customer network team to throttle traffic between the two sites using firewall or router QoS rules. We recommend throttling the network to 50 percent of the capacity of the total number of network cards supporting the management NIC team.

The management network is the only interface between the two sites. Because only one network pipe is available between the hosts on Site A and Site B, you will see the following warning in the cluster validation report. This is expected behavior.

Node SiteANode1.Test.lab is reachable from Node SiteBNode1.Test.lab by only one pair of network interfaces. It is possible that this network path is a single point of failure for communication within the cluster. Please verify that this single path is highly available, or consider adding additional networks to the cluster.
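This warning surfaces in the report generated by cluster validation, which can be run as follows (node names are hypothetical):

# Validate a stretched cluster spanning both sites
Test-Cluster -Node "SiteANode1", "SiteANode2", "SiteBNode1", "SiteBNode2" `
    -Include "System Configuration", "Network", "Inventory", "Storage Spaces Direct"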

Table 4. Sample IP address schema

Traffic                          Site A                         Site B                         Type of traffic
Management/Replica Traffic       192.168.100.0/24               192.168.200.0/24               L2/L3
Intra-site Storage (RDMA) - 1    192.168.101.0/24               192.168.201.0/24               L2
Intra-site Storage (RDMA) - 2    192.168.102.0/24               192.168.202.0/24               L2
VM Network/Compute Network       As per customer environment    As per customer environment    L2/L3

The following figure shows the network topology of a basic stretched cluster:

Figure 3. Network topology for a stretched cluster (basic)


High throughput configuration

In this topology, we use two 25 GbE NICs and two additional 1/10/25 GbE ports on each host to configure a high throughput stretched cluster. One NIC is dedicated to intra-site RDMA traffic, similar to a standalone Storage Spaces Direct environment.

The second NIC is used for Replica traffic. SMB Multichannel distributes traffic evenly across both Replica adapters, increasing network performance and availability. SMB Multichannel enables the use of multiple network connections simultaneously and facilitates the aggregation of network bandwidth and network fault tolerance when multiple paths are available. For more information, see Manage SMB Multichannel.
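To verify that SMB Multichannel is spreading Replica traffic across both adapters, the active connections can be inspected from any node; this is just a quick check:

# List active SMB connections and the interfaces carrying them
Get-SmbMultichannelConnection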

The Set-SRNetworkConstraint cmdlet is used to ensure replica traffic flows only through the dedicated interfaces and not through the management interface. Run this cmdlet once for each volume.
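A minimal sketch of the constraint, assuming hypothetical computer names, replication group names, and interface indexes (interface indexes can be listed with Get-NetAdapter):

# Restrict Storage Replica traffic to the dedicated Replica interfaces
Set-SRNetworkConstraint -SourceComputerName "SiteANode1" -SourceRGName "rg01" `
    -SourceNWInterface 2, 3 `
    -DestinationComputerName "SiteBNode1" -DestinationRGName "rg02" `
    -DestinationNWInterface 2, 3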

IP Address schema

The following table shows the IP Address schema:

Table 5. IP address schema

Traffic                          Site A                         Site B                         Type of traffic
Management                       192.168.100.0/24               192.168.200.0/24               L2/L3
Intra-site Storage (RDMA) - 1    192.168.101.0/24               192.168.201.0/24               L2
Intra-site Storage (RDMA) - 2    192.168.102.0/24               192.168.202.0/24               L2
Replica - 1*                     192.168.111.0/24               192.168.211.0/24               L2/L3
Replica - 2*                     192.168.112.0/24               192.168.212.0/24               L2/L3
VM Network                       As per customer environment    As per customer environment    L2/L3
Cluster IP                       192.168.100.100                192.168.200.100                L2

*Static routes are needed on all hosts on both sites to ensure that the 192.168.111.0/24 network can reach 192.168.211.0/24 and that the 192.168.112.0/24 network can reach 192.168.212.0/24. Static routes are needed in this network topology because there are three network pipes between Site A and Site B. Network traffic on Management uses the default gateway to traverse the network, while the Replica networks use static routes on the hosts to reach the secondary site. If your ToR switches do not have BGP configured, static routes are needed on them as well.
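On the hosts, these static routes can be added with New-NetRoute. The interface aliases and next-hop addresses below are hypothetical and must match your environment:

# On each Site A host: route the Replica subnets toward Site B
New-NetRoute -DestinationPrefix "192.168.211.0/24" -InterfaceAlias "Replica1" -NextHop "192.168.111.1"
New-NetRoute -DestinationPrefix "192.168.212.0/24" -InterfaceAlias "Replica2" -NextHop "192.168.112.1"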

The following figure shows the network topology of an advanced stretched cluster:

Figure 4. Network topology for a stretched cluster (advanced)

