Project Goals
Establish a resource for researchers on campus with large computing needs.
Help researchers convert their programs to run on the cluster.
Research performance bottlenecks.
Develop tools to improve the usability of clusters.
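Converting a serial program to run on a cluster usually begins by splitting its work across processors. As a minimal illustrative sketch (the helper name `block_range` is hypothetical, not code from the project), a serial loop over n items can be partitioned into near-equal slices, one per processor:

```python
# Hypothetical sketch: the first step in converting a serial loop to run
# on a cluster is usually to divide its iteration range across processors.

def block_range(rank, nprocs, n):
    """Return the half-open slice [lo, hi) of n items owned by this rank.

    Items are divided as evenly as possible; the first (n % nprocs)
    ranks each get one extra item.
    """
    base, extra = divmod(n, nprocs)
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi

if __name__ == "__main__":
    # On a real cluster each rank would compute only its own slice;
    # here we just show the decomposition of 10 items over 4 processors.
    for rank in range(4):
        print(rank, block_range(rank, 4, 10))
        # 0 (0, 3)   1 (3, 6)   2 (6, 8)   3 (8, 10)
```

In an MPI or PVM program each process would call such a helper with its own rank and then loop only over its slice.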
The Beowulf Cluster Lab is funded by the National Science Foundation
Major Research Infrastructure Award No. 0321233.
Cluster Specifications
The main Beowulf Cluster (beowulf.boisestate.edu)
61 nodes
122 2.4 GHz Intel Xeon processors
2.4 TB disk space
private Gigabit network
Gigabit connection to the campus backbone
Other clusters:
6-processor development cluster (tux.boisestate.edu)
32-processor teaching cluster (onyx.boisestate.edu)
Cluster Hardware
Compute nodes (about $1400/node for 64 nodes = $90,000)
Tyan i7505 S2665ANF dual-533 MHz FSB
dual 2.4 GHz Intel Xeon CPUs with 512 KB cache, 533 MHz FSB
2 x 512 MB Micron Technology 184-pin PC2100 DDR 266 MHz DIMMs, unbuffered, non-parity
Samsung SP4002H disk drives, 80GB 7200RPM ATA100
HP Broadcom NetXtreme 5782 Gigabit card
Antec 1080 Plus AMG case with Antec True Power 550W Supply
Master node: same, except with 4GB RAM and SATA drives with RAID
Networking: (about $12,000)
3 x Cisco 3750G 24-port Stacking Cluster Gigabit Switches with redundant
power supply
Facilities: Liebert A/C, power setup to handle up to 300 Amps ($29,000)
Cluster Software
● Red Hat Linux 9.0 with a custom 2.4.24-bigmem SMP kernel (Fedora Core 1 Linux with a stock kernel on the teaching cluster)
● Portable Batch System (PBS) for job scheduling
● Parallel Programming Libraries and Tools
● Portland Group Cluster Development Toolkit
● HPF, Fortran 90, Fortran 77, C, C++
● Parallel graphical debugger
● Parallel graphical profiler
● GNU C, C++, and Fortran 77 compilers and related tools such as ddd (the Data Display Debugger)
● Full suite of other tools available under Linux.
Cluster Setup Experiences
YACI (Yet Another Cluster Installer) was used for automated
installation. The 61-node cluster went from bare disks to
fully operational in 12 minutes! YACI is available from
Lawrence Livermore National Laboratory.
Design choice: standard PC boxes instead of blades, since
boxes are easier to cool and floor space was a relatively
minor concern.
Evaluated AMD Athlon, AMD Opteron, and Intel Xeon processors on a
Performance/Power/Price (PPP) basis; the Intel Xeons came out ahead.
Chose a regular PC assembler rather than a “cluster” company
to keep costs down and retain more control over what goes
into each node.
Faculty: Amit Jain (Computer Science) and Paul Michaels (Geophysics)
Graduate Students: Kevin Nuss, Hongyi Hu and Mason Vail
Undergraduate Students: Joey Mazzarelli, Brady Catherman,
Luke Hindman, Charles Paulson, Jason Main and Oralee Nudson.
The project uses a model of teaming up computer scientists
with researchers from other fields to create synergistic
collaborations.
Some applications running on the cluster:
Air Quality Modeling. Paul Dawson (Mechanical Engineering), Kevin
Nuss and Charles Paulson.
Modeling of Ocean Currents. Jodi Mead (Mathematics) and Hongyi Hu.
Waveform Relaxation. Barbara Zubik-Kowal (Mathematics) and Hongyi Hu.
Hydraulic Tomography. Tom Clemo (Geophysics) and Kevin Nuss.
Bioinformatics: Bayesian Analysis of Phylogeny. James Smith
(Biology) and Amit Jain.
Basic Seismic Utilities package. Paul Michaels (Geophysics) and Amit Jain.
Biologically Inspired Computing. Crowley Davis Research (private company).
Design Patterns
Projects (contd.)
Clusmon. A comprehensive web-based cluster monitoring
software. (Joey Mazzarelli, Computer Science senior)
Remote Power Control. A cluster of smart power strips to
enable remote hard power on/off, cascaded power on/off
etc. (Brady Catherman, Computer Science junior)
Parallel Shell. A more capable parallel shell for system
administration. (Mason Vail, Computer Science graduate student)
Clusmon: Cluster Monitor
Remote Power Control
Cluster Statistics
1179 jobs since July, adding up to about 88000 CPU-hours.
Average CPU temperatures: 77F at low load and 100F at full load.
The A/C is set to 65F with tolerance of 4F.
Hardware failures: Extremely low...
One disk drive failed right after installation.
The memory for one node failed.
Only one unscheduled “downtime” in the last three months: the A/C
compressor was cycling more often than its factory-set limit, so it shut
itself off. Because air flow was maintained, CPU temperatures still
remained below 115F after several hours! The cluster was shut down as a
precaution; the fix was simply to raise the tolerance from 2 degrees
to 4 degrees.
The experiences gained in this project were used to help
Geophysics set up a 10-processor cluster and Mathematics a
20-processor cluster.
Further Work
Integrate Beowulf clusters with Condor grids.
Develop a complete catalogue of programs illustrating each
design pattern in PVM and MPI.
Continue to team with researchers to help get their code up and
running on clusters.
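The planned catalogue targets PVM and MPI, but the patterns themselves are language-neutral. As a minimal sketch of one such pattern (master-worker, where a master farms independent tasks out to workers and gathers results), here is an illustration using Python's standard multiprocessing module rather than PVM or MPI; the function names are hypothetical:

```python
# Illustrative sketch of the master-worker design pattern, using Python's
# stdlib multiprocessing in place of PVM/MPI (names are hypothetical).
from multiprocessing import Pool

def worker(task):
    # Stand-in for a real unit of work (e.g. one chunk of a simulation).
    return task * task

def master(tasks, nworkers=4):
    """Farm tasks out to a pool of worker processes, gather results in order."""
    with Pool(nworkers) as pool:
        return pool.map(worker, tasks)

if __name__ == "__main__":
    print(master(range(6)))   # squares computed across 4 worker processes
```

In the MPI version of this pattern, the master rank would distribute tasks with point-to-point sends and collect answers as workers finish; the control flow is the same.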