Performance & Scalability of
SAS® Business Analytics on an
NEC Express5800/A1080a
(Intel® Xeon® 7500 series-based Platform)
using Red Hat Enterprise Linux 5
SAS® Business Analytics
Base SAS® for SAS 9.2
Red Hat Enterprise Linux 5.4
NEC Express5800/A1080a
8 Socket Intel® Xeon®
7500 series-based platform
(8 Cores / Socket = 64 Cores
2 Threads / Core = 128 Threads)
Version 1.0
April 2010
1801 Varsity Drive
Raleigh NC 27606-2072 USA
Phone: +1 919 754 3700
Phone: 888 733 4281
Fax: +1 919 754 3701
PO Box 13588
Research Triangle Park NC 27709 USA
"Red Hat," Red Hat Enterprise Linux, the Red Hat "Shadowman" logo, and the products listed are
trademarks or registered trademarks of Red Hat, Inc. in the United States and other countries.
Linux is a registered trademark of Linus Torvalds.
All other trademarks referenced herein are the property of their respective owners.
© 2010 by Red Hat, Inc. This material may be distributed only subject to the terms and conditions
set forth in the Open Publication License, V1.0 or later (the latest version is presently available at
http://www.opencontent.org/openpub/).
The information contained herein is subject to change without notice. Red Hat, Inc. shall not be
liable for technical or editorial errors or omissions contained herein.
Distribution of modified versions of this document is prohibited without the explicit permission of
Red Hat Inc.
Distribution of the work or derivative of the work in any standard (paper) book form for
commercial purposes is prohibited unless prior permission is obtained from Red Hat Inc.
The GPG fingerprint of the security@redhat.com key is:
CA 20 86 86 2B D6 9D FC 65 F6 EC C4 21 91 80 CD DB 42 A6 0E
www.redhat.com
Table of Contents
1 Executive Summary
2 Test Configuration
  2.1 Hardware
    NEC Express5800/A1080a Server
    NEC D3 SAN Storage Array
    Intel Xeon Processor 7500 Series
  2.2 SAS 9.2
  2.3 Red Hat Enterprise Linux 5.4
3 Test Methodology
  3.1 Test Execution
  3.2 Data
4 Performance Results
  4.1 Performance Effects of NUMA
  4.2 RHEL Observations & Tuning
  4.3 SAS Tuning Guidelines
5 Conclusions
1 Executive Summary
Today, companies are increasingly utilizing analytics to discover new revenue and cost-saving opportunities. Many business professionals turn to SAS, a leader in business
analytics software and services, to help them improve performance and make better
decisions faster. Analytics are also being employed in risk management, fraud detection,
life sciences, sports, and many more emerging markets. However, to maximize the value
to the business, analytics solutions need to be deployed quickly and cost-effectively,
while also providing the ability to readily scale without degrading performance. Of course,
in today’s demanding environments, where budgets are still shrinking and mandates
to reduce carbon footprints are growing, the solution must deliver excellent hardware
utilization, power efficiency, and ROI.
To help solve these challenges, Red Hat, SAS, NEC, and Intel collaborated to prove the
linear scalability of SAS 9.2 running on Red Hat® Enterprise Linux® on NEC’s newest Intel®
Xeon® processor 7500 series-based platform. The result is a pre-tested, scalable server
configuration that can help you deploy faster, reduce risk, lower cost, and plan for future
upgrades.
Game-changing performance and scalability
The ability to take advantage of performance enhancements in the latest processors and
the ability to tune I/O and virtual memory make Red Hat Enterprise Linux an ideal
platform for SAS Business Analytics. Industry benchmarks reflect the scalability and
performance of Red Hat Enterprise Linux in both scale-up (vertical) and scale-out (grid)
models. The SAS 9.2 test results documented here demonstrate excellent scalability up
to 64 cores (128 threads) on a single system.
2 Test Configuration
System Configuration:
• NEC Express5800/A1080a system
• 8 Intel Xeon processor 7500 series processors (8 cores per socket) – 64 total cores/128 threads
• Intel® Hyper-Threading Technology on
• Intel® Turbo Boost Technology on
• 256 GB RAM
• 4 x external NEC D-series storage arrays
• 30 x 1 TB SATA 7200 RPM disks per array
• 16 x 4 Gbit fiber connections to disk
Note: Half of the CPU cores, RAM, and storage were used for the 32-core test.
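The totals quoted above follow from simple multiplication. The short sketch below reproduces them for the full system and for the half-size 32-core test. The variable names are illustrative. The raw disk capacity is an assumption that ignores any RAID or spare overhead, and the Fibre Channel figure is the nominal aggregate line rate only, not a measured throughput.

```python
# Figures taken from the system configuration list above.
SOCKETS = 8
CORES_PER_SOCKET = 8
THREADS_PER_CORE = 2      # Intel Hyper-Threading Technology on
RAM_GB = 256
ARRAYS = 4
DISKS_PER_ARRAY = 30
DISK_TB = 1
FC_LINKS = 16
FC_LINK_GBIT = 4          # nominal line rate per link, in Gbit/s


def totals(fraction=1.0):
    """Return resource totals for a given fraction of the full system."""
    cores = int(SOCKETS * CORES_PER_SOCKET * fraction)
    return {
        "cores": cores,
        "threads": cores * THREADS_PER_CORE,
        "ram_gb": int(RAM_GB * fraction),
        # Raw capacity only; usable capacity after RAID is not stated.
        "raw_disk_tb": int(ARRAYS * DISKS_PER_ARRAY * DISK_TB * fraction),
    }


full = totals()       # 64-core test
half = totals(0.5)    # 32-core test: half the cores, RAM and storage
nominal_fc_gbit = FC_LINKS * FC_LINK_GBIT  # aggregate nominal link rate

print("64-core test:", full)
print("32-core test:", half)
print("Nominal fiber bandwidth:", nominal_fc_gbit, "Gbit/s")
```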
SAS Software:
• Foundation SAS 9.2, SAS/STAT®
Operating System:
• Red Hat Enterprise Linux 5.4z 64-bit
2.1 Hardware
NEC Express5800/A1080a Server
Figure 1
Designed specifically for the Intel Xeon processor 7500 series and scalable from 8 to 64
processor cores and up to 128 threads, the NEC Express5800/A1080a server is an ideal
platform for scaling up or consolidating SAS 9.2 instances. This server takes advantage
of the intelligent performance, energy efficiency, and virtualization capabilities of the Intel
Xeon processor and leverages the low latency and high-performance interface
technology of NEC’s supercomputers. The server supports modular designs, redundant
components, hot plug capabilities, and floating I/O. Additional benefits include:
• Up to 64 cores, 128 threads, and 2 TB memory
• Integrated Intel QuickPath Interconnect technology to increase performance
through efficient memory access
• Green Cooling Technology that helps to minimize power consumption and
automate power usage for more effective datacenter consolidation benefits
• Built-in service processor working in conjunction with NEC’s BIOS and Intel’s
Machine Check Architecture to provide reliability, availability, and serviceability for
mission-critical computing
• Server virtualization that offers high performance, energy efficiency, and higher
server bandwidth to handle the increased communications in a virtualized
environment
NEC D3 SAN Storage Array
The NEC D3 SAN Storage Array offers scalability to 288 TB and aggregate throughput of
over 1,100 MB/s. The system is fully redundant to protect against any single point of failure,
with a battery backup unit to protect its 4 GB of cache.
Features include replication, snapshots, performance monitoring, automatic tuning, multipathing, and failover.
Intel Xeon Processor 7500 Series
Built to handle your most processor-intensive, mission-critical applications, the Intel Xeon
processor 7500 series delivers a quantum leap in enterprise computing performance. The
Intel Xeon processor 7500 series
combines up to eight cores and 16 processing threads in a single device and offers four
advanced, high-bandwidth interconnect links that allow multiple processors to be directly
connected to each other. The result is unprecedented scalability.
Figure 2
The Intel Xeon processor 7500 series intelligently adjusts performance and energy
consumption to accommodate application needs. Built-in Intel Turbo Boost Technology
automatically speeds up the processor when your SAS workload requires extra
performance. Intel Hyper-Threading Technology allows each processor core to work
on two tasks at the same time to enhance performance for highly-threaded workloads.
Intel Intelligent Power Technology automatically places CPUs and memory into
an optimal power state for maximum performance, while reducing energy use.
Figure 3
2.2 SAS 9.2
SAS 9.2 provides the core components of the SAS Business Analytics Framework and
significant performance improvements over SAS 9.1 on Linux. SAS 9.2 helps users gain
insights that are often hidden in data, so they can reach evidence-based decisions with
confidence. SAS 9.2 supports the entire analysis process — from data access to the
point of decision — however varied or complex. A wide range of data integration
techniques empowers users to collect, classify, process, analyze, and interpret data to
reveal new insights. SAS 9.2 advances the capabilities of SAS analytical products,
including forecasting, data mining, optimization, and model management. SAS
Analytics provide rapid answers to key business questions, allowing decision makers to
react more quickly to fast-changing conditions.
SAS 9.2 is available as 64-bit enabled applications supporting 64-bit extended
architectures. This enables you to scale up or consolidate multiple SAS instances within
one affordable, powerful, commodity system.
2.3 Red Hat Enterprise Linux 5.4
Red Hat Enterprise Linux performance features include:
• SMP performance and scalability. Multi-process or threaded applications can be
optimally scheduled in large SMP systems. A vast virtual address space enables
SAS Business Analytics to effectively use more memory to work on larger data
sets. Enhancements enable applications to effectively use more processors.
• Intelligent performance. Efficient use of software threads, support for hyper-threading
technology, and the ability to change the clock speed on a running
processor increase performance.
• Automated energy efficiency. Power technologies from Intel and AMD and
optimizations in the operating system lower power consumption during off-peak
times. Consuming less power means lower cooling requirements, which
contributes to further savings and greener datacenters.
• Tuning for optimum I/O throughput. I/O performance can be optimized on a per-device
basis. Support for 10 gigabit Ethernet, iSCSI, and Fibre Channel over
Ethernet allows the latest storage technologies to be used. MPIO allows multiple
connections from servers to storage to increase availability and throughput.
3 Test Methodology
SAS created multiuser benchmarking scenarios to simulate the workload for a typical
Foundation SAS customer. The goal of these scenarios was to evaluate the multiuser
performance of SAS on various platforms. Various-sized mixed analytic workloads were
created to simulate many users utilizing CPU, RAM and I/O resources that SAS
programs heavily use during typical program execution.
Both a 32-core and a 64-core test scenario were executed with the help of Intel, Red Hat, and
NEC staff on a 64-core NEC Express5800/A1080a system running Red Hat Enterprise
Linux 5.4z.
3.1 Test Execution
The two scenarios used in the performance efforts included the following:
1) Mixed 32-core workload – mix of CPU- and I/O-intensive jobs:
a. 206 jobs launched during the scenario.
b. PROCS: GLM, LOGISTIC, RISK, REG, MEANS, SORT, FREQ, SUMMARY,
SQL.
c. Mixed shorter and longer running jobs.
d. Goal of scenario is to leverage a mix of I/O and CPU resources.
e. This version was designed to run on a 32-core system.
2) Mixed 64-core workload – mix of CPU- and I/O-intensive jobs:
a. 412 jobs launched during the scenario.
b. This version was designed to run on a 64-core system.
c. This version has twice as many jobs as the 32-core test.
Each test scenario consists of a set of SAS jobs run in a multiuser fashion to simulate a
typical SAS batch and SAS® Enterprise Guide® user environment. All SAS jobs are a
combination of computational- and I/O-intensive SAS procedures. Each scenario
launches jobs simultaneously at a set interval to help simulate a multiuser environment
where users come and go from the system. The test is designed to run in a period of 30
to 60 minutes.
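The staggered launch pattern described above can be pictured as a simple schedule of batches. The sketch below is illustrative only: the function name, the batch size, and the interval are hypothetical, since the paper does not state how many jobs start together or how far apart the launches are.

```python
def launch_schedule(total_jobs, batch_size, interval_s):
    """Return a list of (start_offset_seconds, [job_ids]) batches.

    Jobs in the same batch start together; successive batches start
    interval_s seconds apart, so load on the system rises and falls
    as simulated users come and go.
    """
    schedule = []
    for batch_index, first in enumerate(range(0, total_jobs, batch_size)):
        last = min(first + batch_size, total_jobs)
        schedule.append((batch_index * interval_s, list(range(first, last))))
    return schedule


# Hypothetical numbers: 10 jobs, 4 per batch, one batch per minute.
for offset, jobs in launch_schedule(10, 4, 60):
    print(f"t+{offset:>4}s: launch jobs {jobs}")
```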
3.2 Data
Characteristics of the data for this scenario:
• SAS data sets and text files.
• Row counts up to 90 million.
• Variable counts up to 297.
• 1.1 GB total input/output data.
• File sizes ranging from several kilobytes to 30 GB.
Data volumes were designed to be larger than the hardware cache in order to place
realistic stress on the hardware and operating system file cache.
Note: The SAS benchmarking scenarios are designed to replicate a typical Foundation
SAS customer’s resource use. However, particular customer applications can vary
greatly depending on tasks, PROCs used, data volumes and other customer
requirements.
4 Performance Results
CPU and I/O utilization patterns for the 32- and 64-core tests were similar. This was
expected, as the 64-core workload and CPU resources were doubled from the 32-core
workload. CPU utilization ranged from 5 to 95 percent due to the nature of the workload
and the way users come and go from the system.
Analysis of the data focused on the workload response and run times of both the entire
scenario and the sum of all job run times used in each scenario. Below is a table showing
the various statistics and how much scalability was achieved going from the 32-core to
the 64-core workload and hardware configuration.
                                                                     I/O rates (MBps)
Cores  Sum of all job run   Scenario time           Average job run   Peak    Sustained
       times (seconds)      (wall clock, seconds)   time (seconds)
64     62,769               2,100                   145.30            1,818   1,420
32     31,720               2,040                   146.85            1,158   885
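The scaling claim can be checked directly from the table. The sketch below derives the number of jobs implied by each row (sum of run times divided by average run time), then compares job throughput per second of wall-clock time. The implied counts come out at 432 and 216, the job counts quoted in Section 4.1. Treating "jobs completed per wall-clock second" as the scaling metric is an assumption made here for illustration, not a definition taken from the paper.

```python
# Values copied from the results table above.
results = {
    64: {"sum_job_s": 62769, "scenario_s": 2100, "avg_job_s": 145.30},
    32: {"sum_job_s": 31720, "scenario_s": 2040, "avg_job_s": 146.85},
}


def implied_jobs(row):
    """Number of jobs implied by total run time / average run time."""
    return round(row["sum_job_s"] / row["avg_job_s"])


def jobs_per_second(row):
    """Throughput: jobs completed per second of wall-clock scenario time."""
    return implied_jobs(row) / row["scenario_s"]


jobs_64 = implied_jobs(results[64])
jobs_32 = implied_jobs(results[32])
speedup = jobs_per_second(results[64]) / jobs_per_second(results[32])
efficiency = speedup / 2.0   # ideal speedup is 2x when cores double

print(f"implied jobs: {jobs_64} (64-core), {jobs_32} (32-core)")
print(f"throughput speedup: {speedup:.3f}x")
print(f"scaling efficiency: {efficiency:.1%}")
```

On these numbers, doubling the cores and the workload lengthened the scenario by only 60 seconds, which works out to roughly 1.94 times the job throughput.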
The following features contributed to the performance and scalability of the configuration:
• Intel Advanced Programmable Interrupt Controller (APIC) for optimized Non-Uniform
Memory Access (NUMA)
• Support for Red Hat Enterprise Linux on the new multi-socket NEC
Express5800/A1080a server
• NEC NUMA-aware BIOS: significant performance gains
• Red Hat Enterprise Linux processor scheduler: automatically optimizes SAS
application processes
• Intel Turbo Boost Technology and Intel Hyper-Threading Technology: improved
scalability
Figure 4: CPU Utilization (64-Core, 256 GB, XFS)
Figure 5: CPU Utilization (32-Core, 128 GB, XFS)
Figure 6: I/O Throughput (64-Core, 256 GB, XFS)
Figure 7: I/O Throughput (32-Core, 128 GB, XFS)
Figure 8: Comparison of 64-Core & 32-Core I/O
4.1 Performance Effects of NUMA
In general, servers with a Non-Uniform Memory Architecture (NUMA) configuration
perform best when the BIOS and operating system recognize the hardware layout. NUMA
memory allocation policies try to allocate memory on the node where the process is
scheduled.
When the environment (BIOS/OS) does not properly support the NUMA configuration or
has it disabled, memory is typically interleaved across all NUMA nodes. While
performance may be more predictable, most of the memory accesses are then to a remote
node, which increases latency. Memory accesses to remote nodes occur:
• 3/4 of the time for the 4-socket, 32-core configuration
• 7/8 of the time for the 8-socket, 64-core configuration
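The 3/4 and 7/8 figures follow from the interleaving itself: if memory is spread evenly across N nodes and a process's accesses are spread evenly over that memory, only 1/N of them land on the local node. A minimal sketch under that even-distribution assumption (the function name is illustrative):

```python
from fractions import Fraction


def remote_fraction(nodes):
    """Fraction of memory accesses that go to a remote node when memory
    is interleaved evenly across `nodes` NUMA nodes.

    Assumes accesses are spread uniformly over the interleaved memory,
    so exactly 1/nodes of them are local.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return Fraction(nodes - 1, nodes)


for sockets in (4, 8):
    print(f"{sockets} sockets: {remote_fraction(sockets)} of accesses are remote")
```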
Since not all nodes are guaranteed to be directly connected to each other, the latency
cost depends on how many nodes (hops) the memory request must travel through. The
4-socket, 32-core configuration provides single-hop connectivity to each node. In the
8-socket configuration, half of the nodes are a single hop away while the other half require
two hops.
• For the 4-node/32-core configuration, every remote access costs a single hop
(hardware cost X). Allowing the scheduler to use local memory for the 216 jobs
improved performance by 11%.
• For the 8-node/64-core configuration, the number of hops varies (1 hop at cost X,
2 hops at cost Y). Here the performance improvement rose to over 50%, because
the scheduler could localize a good part of the memory references while running
up to 432 jobs in the same wall-clock time that the 4-node/32-core configuration
needed for half as many jobs. Without NUMA support in the OS, scaling would
degrade significantly.
The importance of NUMA support in the 8-socket, 64-core configuration is shown below.
Without NUMA enabled, the higher ratio of remote accesses, coupled with the multiple
hops, reduces throughput significantly.
Figure 9: Performance Effects of NUMA
4.2 RHEL Observations & Tuning
• RHEL 5.4z was used for testing and is recommended for any Intel Xeon processor 7500
series-based system.
• RHEL tuning best practices were applied using Ktune*.
• SELinux and unneeded services were disabled via chkconfig and sysctl
parameters.
• NUMA (non-uniform memory access) is the RHEL 5 default on NEC and all systems
based on Intel processors 6000 and 7000 series.
• Check the BIOS to ensure that NUMA is enabled, or disable memory interleave.
• The I/O output from the 8-socket server exceeded the I/O capacity of the test storage
configuration. Server utilization would have been significantly greater with more
storage controllers.
o 1.78 GB/sec peak and 1.39 GB/sec sustained throughput for 64-core runs
o 1.13 GB/sec peak and 885 MB/sec sustained throughput for 32-core runs
• The Red Hat Enterprise Linux “tunable” I/O stack is essential to SAS performance.
o Read-ahead on Logical Volume Manager (LVM) devices was tuned to
8192 bytes
o The standard blockdev tool was used to adjust for large sequential access to
LVM/filesystem
• XFS is best suited for large sequential I/O in the SAS Business Analytics workload.
o 30% improvement in performance over ext3
o Significant reduction in system CPU resources, which frees more
compute cycles for more SAS jobs
o Support for larger file extents, which is essential for very large files and results in
significantly less file fragmentation and faster file deletion. SAS creates
and deletes many files.
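The GB/sec figures in the list above and the MBps figures in the results table in Section 4 describe the same measurements. They line up if 1 GB is taken as 1,024 MB. That divisor is an inference from the numbers, not something the paper states. The sketch below shows the conversion under both the binary and the decimal convention.

```python
MB_PER_GB_BINARY = 1024    # convention that reproduces the GB/sec bullets
MB_PER_GB_DECIMAL = 1000   # shown for comparison only

# I/O rates in MBps, copied from the results table in Section 4.
rates_mbps = {
    "64-core peak": 1818,
    "64-core sustained": 1420,
    "32-core peak": 1158,
    "32-core sustained": 885,
}


def to_gb(mb, mb_per_gb):
    """Convert a rate in MB/s to GB/s, rounded to two decimals."""
    return round(mb / mb_per_gb, 2)


for label, mb in rates_mbps.items():
    binary = to_gb(mb, MB_PER_GB_BINARY)
    decimal = to_gb(mb, MB_PER_GB_DECIMAL)
    print(f"{label:>18}: {mb:>5} MB/s = {binary} GB/s (1024) or {decimal} GB/s (1000)")
```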
4.3 SAS Tuning Guidelines
Follow best practices for configuring I/O:
• Split permanent SAS data files, SAS WORK files, and SAS UTILLOC files into
separate file systems
• Ensure you have enough I/O bandwidth to support the SAS application
requirements
• Set the I/O transfer unit size on the storage array to match the SAS BUFSIZE value
• Increase the SAS BUFSIZE value if you are doing large volumes of I/O
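The first guideline can be spot-checked from the operating system. The sketch below compares the device ID that `os.stat` reports for each directory; paths on different mounted file systems report different IDs. The directory names are hypothetical placeholders, and distinct file systems do not by themselves guarantee distinct physical disks or controllers.

```python
import os


def on_separate_filesystems(paths):
    """Return True if every path resides on a different file system.

    Compares st_dev, the ID of the device containing each path.
    """
    devices = [os.stat(path).st_dev for path in paths]
    return len(set(devices)) == len(devices)


if __name__ == "__main__":
    # Hypothetical mount points for permanent data, SAS WORK and UTILLOC.
    candidate_paths = ["/sasdata", "/saswork", "/sasutil"]
    if all(os.path.exists(path) for path in candidate_paths):
        print("separate file systems:", on_separate_filesystems(candidate_paths))
    else:
        print("edit candidate_paths to match your own layout")
```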
5 Conclusions
The results of the SAS 9.2 mixed analytic workload running on Red Hat Enterprise Linux
highlight:
• Test results for both computational and mixed analytic benchmarking scenarios.
• Linear scalability when workload and CPU resources doubled.
• 32- and 64-core scenarios executed.
• 206- and 412-job mixed computation- and I/O-intensive scenarios ran in 35 minutes
or less.
• Sustained I/O rates of over 1.4 gigabytes per second during 64-core workload.
• Peak I/O rates over 1.8 gigabytes per second during 64-core workload.
• Sustained I/O between 17 and 23 MBps per core during test scenario execution.