Cluster Documentation | Johanna-Laina Fischer (jfischer)
Log On: (ssh)
enter password
(will be in home directory)
Daily Commands:
condor_q | less
shows the jobs in the queue: whether each is held, running, or idle, its
size, when it was submitted, who submitted it, the job name, and the job ID
to exit press q
df -h
disk space per partition, in human-readable units
df --total
total disk space
man df for the full list of df options
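A minimal sketch of the two df calls above, assuming GNU df (the --total flag is a GNU coreutils option and may be missing elsewhere). Note that df reports disk space, not RAM. The variable name total_line is only for illustration.

```shell
#!/bin/sh
# Human-readable usage for the filesystem that holds the current directory.
df -h .

# Grand total across all mounted filesystems (GNU df).
# LC_ALL=C keeps the label of the summary row in English ("total").
total_line=$(LC_ALL=C df --total | tail -n 1)
echo "$total_line"
```

The last line printed is the summary row; the columns are size, used, available, and percent used.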
who is logged in, when, where
sam tests :
CE: Computing Element
SRMv2: Storage Element
Bockjoo Kim: CMS software, HyperNews at UF
CMS Software Problems
If Samir needs a specific version, email Bockjoo
Yujun Wu: Network Stuff, Systems Analyst? at UF
Network Problems, usually FIT's problem, but he can help
HyperNews: (Mailing List)
Grid Problem
OSG Storage
Mailing Lists
What Things Mean:
BestMan: Berkeley Storage Manager, allows communication with local
storage from the grid
PhEDEx: Physics Experiment Data Export, transfer of data files,
handles anything we import into or export out of the cluster
SAM tests: tell us what is working and what is not, whether CRAB
jobs will run, and whether PhEDEx is able to transfer something
into/out of the cluster
gratia: Shows us use of cluster
NAS: Network Attached Storage, machine that is attached to the
network that has storage on it, not with the grid, storage over
GUMS: Grid User Management System, a site tool for resource
authorization that maps grid certificates to local identities
NFS: Network File System, allows a user on a client computer to
access files over a network in a manner similar to how local
storage is accessed
RAID: Redundant Array of Inexpensive Disks, technology that allows
computer users to achieve high levels of storage reliability from
low-cost and less reliable PC-class disk-drive components, via the
technique of arranging the devices into arrays for redundancy
Most common RAIDs:
RAID 0: Striping. The data is split block by block
across two or more hard disks. No redundancy: if one
disk fails, the data is lost.
RAID 1: Mirroring. As with striping, this setup uses two
hard disk drives to produce a single logical drive, but
each drive holds a full copy of the data. Mirroring
gives added security for your data at the cost of
storage space.
RAID 5: (we use) Striping with distributed parity.
Provides fault tolerance (survives one failed drive) in
addition to performance advantages, while only
sacrificing the equivalent of one drive's space. RAID 5
requires at least three hard drives of the same size.
RAID 6: (we use) Data is striped across several physical
drives and dual parity is used to store and recover
data (survives two failed drives). Minimum of 4 disks.
Usable capacity is always that of two drives fewer than
the number of drives in the RAID set.
RAID 10: Combines RAID 0 striping and RAID 1 mirroring.
This level provides the improved performance of striping
while still providing the redundancy of mirroring.
Raid Efficiency (Table):
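The table itself is not in these notes. As a rough stand-in, the usable capacity for each RAID level listed above can be worked out as below. This is a sketch assuming all disks are the same size; the function name raid_usable and the eight 2 TB disks in the example are hypothetical, not our actual hardware.

```shell
#!/bin/sh
# raid_usable LEVEL NDISKS DISK_SIZE
# Prints usable capacity in the same unit as DISK_SIZE (equal-size disks assumed).
raid_usable() {
    level=$1; n=$2; size=$3
    case "$level" in
        0)  echo $(( n * size )) ;;        # striping only, no redundancy
        1)  echo "$size" ;;                # every disk holds the same data
        5)  echo $(( (n - 1) * size )) ;;  # one disk's worth of parity
        6)  echo $(( (n - 2) * size )) ;;  # two disks' worth of parity
        10) echo $(( n / 2 * size )) ;;    # striped set of mirrored pairs
        *)  echo "unsupported RAID level: $level" >&2; return 1 ;;
    esac
}

# Example: eight hypothetical 2 TB disks (16 TB raw).
for lvl in 0 5 6 10; do
    echo "RAID $lvl: $(raid_usable "$lvl" 8 2) TB usable of 16 TB raw"
done
```

Efficiency is then usable divided by raw, e.g. 14/16 for RAID 5 and 12/16 for RAID 6 in this example.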
dd: Program to test the speed of the cluster in practice, i.e. how
long things actually take; writes and reads data
iperf: more theoretically what we should be capable of; set up a
client and a server, measures the capability between them, can do it
over two different protocols
TCP: better for sending a document (makes sure all pieces get
there)
UDP: doesn't check if it's getting there okay; lets us know
jitter (bad, don't want jitter: variation in the arrival time of
packets) and packets lost
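A minimal sketch of a dd write/read timing run, assuming GNU dd (bs=1M and conv=fsync are GNU-style options). The scratch file from mktemp and the 16 MiB size are placeholders; for a real test the file would sit on the storage being measured and be much larger. The read-back is likely served from the page cache, so the read figure is optimistic.

```shell
#!/bin/sh
# Scratch file; for a real measurement, create it on the filesystem under test.
testfile=$(mktemp)

# Write 16 MiB of zeros. conv=fsync makes dd flush to disk before it
# reports elapsed time and throughput on stderr.
dd if=/dev/zero of="$testfile" bs=1M count=16 conv=fsync

# Read the file back and discard the data.
dd if="$testfile" of=/dev/null bs=1M

# Confirm how much was actually written, then clean up.
written=$(wc -c < "$testfile")
echo "wrote and read back $written bytes"
rm -f "$testfile"
```

dd prints its own timing summary for each run, which is the number of interest here.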
Registering Proxy every month
on twiki?
Dr. Hohlmann Twiki account
March 23, 2010
Change Tau’s operating system from Red Hat to Ubuntu:
Ctrl+Alt+F1 “switches between windows” (used for “out of range”
error message)
Backup Samir’s CMS files:
Backup: tar cvf /tmp/CMSBackup.tar \
/home \
/var/spool/mail \
/etc/mail \
/etc/passwd \
/etc/shadow
Restore: cd /
tar xvpf /tmp/CMSBackup.tar
ls / - top folder
boot, where things boot
lib, shared libraries
home, your home directory
find command: find [/home (directory you want to search)] -name
"*[name of thing you are looking for]*"
asterisk use if you do not know the name that well (almost like a
keyword search)
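A small sketch of the keyword-style search described above, run inside a throwaway directory so it is safe to try. The directory layout and file names are made up for the example.

```shell
#!/bin/sh
# Build a throwaway tree to search in (hypothetical names).
demo=$(mktemp -d)
mkdir -p "$demo/home/samir"
touch "$demo/home/samir/CMSBackup.tar.gz" "$demo/home/samir/notes.txt"

# Quote the pattern so the shell hands the asterisks to find unexpanded.
matches=$(find "$demo/home" -name "*CMS*")
echo "$matches"

rm -rf "$demo"
```

Only the file with CMS somewhere in its name is listed; notes.txt is not.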
man [command name]: manual for commands, how to work them, flags
help: list of commands
scp CMSBackup.tar
if error “temporary failure in name resolution”, means no internet
For Samir’s Backup:
tar cvzf /tmp/CMSBackup.tar.gz \
/home \
/var/spool/mail \
/etc/mail \
/etc/passwd \
/etc/shadow
scp CMSBackup.tar.gz
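A sketch of the same backup, check, and restore cycle in a scratch directory, so nothing under / is touched. The sample file and directory names are hypothetical; the real backup uses the paths listed above.

```shell
#!/bin/sh
# Scratch area with a stand-in for the real /home tree.
work=$(mktemp -d)
mkdir -p "$work/src/home/cmsuser" "$work/restore"
echo "sample data" > "$work/src/home/cmsuser/file.txt"

# Create a gzip-compressed archive. -C switches directory first so the
# archive stores relative paths (home/...).
tar czf "$work/CMSBackup.tar.gz" -C "$work/src" home

# List the contents to check the archive before relying on it.
tar tzf "$work/CMSBackup.tar.gz"

# Restore into a separate directory; p keeps the original permissions.
tar xzpf "$work/CMSBackup.tar.gz" -C "$work/restore"
restored=$(cat "$work/restore/home/cmsuser/file.txt")
echo "restored: $restored"

rm -rf "$work"
```

Listing the archive with tar tzf before wiping the machine is a cheap way to catch a bad backup early.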
To shutdown in Root:
shutdown -h now (or shutdown -h +[minutes] to delay)
March 24, 2010
/dev – devices
list of things, usb, partitions
plug in drive
find which got “renamed”
if /mnt/usb does not exist
mkdir /mnt/usb
in /mnt/usb
copy CMS file by: cp CMSBackup.tar.gz
on Tau’s monitor, if signal is not found,
Samir’s CMS files backed up onto external hard drive (Seagate)
Cluster Fiesta! – Chicken Enchiladas
March 25, 2010
Redhat to Ubuntu
CD in first, then boot up computer
Original CD had problems
Created new CD:
download from:
How to burn to a CD, do next week
*to reboot computer if already logged on, type reboot in the command line
April 1, 2010
*Installing Ubuntu*
Ubuntu CD into drive
To reboot, Ctrl+Alt+Del
F2 from startup continually - enter setup screen
Priority to boot from CD/BIOS
Enter password:
What every system admin should know:
April 8, 2010
*Installing Ubuntu take 2*
Computer does not recognize CD, recognized Windows CD
CD would boot on Xenia’s laptop
Problem – could be that Ubuntu is on a DVD, Tau too old to recognize DVD
Solution – we need to burn a CD! However, the downloader we already have
is for a CD, no need to do extra searching :D
?Wipe computer – delete partition (Patrick’s magical CD)?: did not need to
Burned a CD and checked disk integrity (any problems on disk: no errors)
Ubuntu on machine
Next Task is to put Samir’s CMS files onto computer again:
In Terminal
tar xzvf CMSBackup.tar.gz
April 13, 2010
Need to put Samir’s CMS stuff on the computer, errors occurred
Put CMSuser onto Tau:
name: cmsuser
Tracks onto Tau: John Stevens will come Wednesday at 1:00 pm
April 14, 2010
Reinstalled Ubuntu, problems with partitions
April 15, 2010
ReReinstalled Ubuntu
Cluster Fiesta Questions for Patrick:
Map of the Cluster
• How to calculate Wall hours:
• on Condor
• condor_userprio -allusers
• Only since installing new version of Condor
• On Gratia
• Rocks: Appliances to auto install nodes and post install scripts
• Where are they
• See Xenia’s Node Changes
• Would they ever need to be fixed or updated
• See Xenia’s Node Changes
• Basic Hardware Diagram: where? up to date?
• Patrick’s Presentation (April, Up to date)
• Diagram of workings - where, up to date
• Patrick sent (Wiki?)
• What is XFS
• In kernel, only if it breaks will it need fixing
• High-performance journaling file system
• Good for large files
• 8 billion Giga bytes
• Max volume size is 16^
• Needed for PhEDEx
• Distributed vs. Parallel computing: based off message passing
• We have distributed: (one CPU or user divides into multiple CPUs
but they do not rely on each other, no “talking”)
• Parallel: two cores doing two or more things while talking to
each other, pass messages (multiple CPUs: rely on each other,
talk to each other)
• What are universes? How many? Difference between them?
• Standard: No one really uses it
• Periodic saving of jobs
• Remote System Calls: If we don’t have a very good network
file system (“Condor internal network thing”) we have NFS
so we don’t have to worry.
• Vanilla: individual Grid job
• Any UNIX executable
• No Condor Checkpointing
• Grid: Don’t need to worry, users worry
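For the wall-hours question above: one common reading is elapsed (wall-clock) run time summed over jobs. A rough sketch under that assumption, taking one job per line as start and end times in epoch seconds. This input format and the function name wall_hours are hypothetical; it is not the output format of condor_userprio or Gratia, and a site may also weight by the number of cores.

```shell
#!/bin/sh
# wall_hours: read "start_epoch end_epoch" pairs on stdin (one job per line)
# and print the total elapsed time in hours, to two decimal places.
wall_hours() {
    awk '{ total += $2 - $1 } END { printf "%.2f\n", total / 3600 }'
}

# Two made-up jobs: one ran 7200 s (2 h), the other 1800 s (0.5 h).
printf '%s\n' "1271340000 1271347200" "1271340000 1271341800" | wall_hours
# prints 2.50
```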
Stats for Hardware: Companies/Good Numbers/Meshing with other systems
• RAID: We have documentation
• NAS: must be designed, find out what we need get from Silicon
Mechanics, they have configuring pages
• Video Cards: We have documentation
• Monitors: We have documentation
• Mice: We have documentation
• Mother Board: We have documentation
• Key Boards: We have documentation
• RAM: We have documentation
• CPU: We have documentation
• Nodes: We have documentation
• Other…
Software Packages, will we need more (updates)
• Updates on Condor, Rocks OS
April 20, 2010
Sudo – run as root with root permissions
Problem? – Tau’s root had no permissions.
Changed account permissions after sudo for root did not work; the attempt
to move Samir’s CMS backup files onto Tau failed.
Tried to reboot, got “Grub” – something wrong with Ubuntu? Grub file
Used Patrick’s magic “wiping disk”, going to need to reinstall Ubuntu
Cannot put Tracks onto Tau until Ubuntu is finally working on Tau
Patrick’s advice: chuck Tau, too old, too problematic
Need to get a CERN account: (
CMS account?
DOE account?
April 22, 2010 (last day of work for semester)
Magic wiping disk:
To do for next semester:
Ubuntu: Grub?
Change user allowances
CMSuser account
Samir's Data onto Tau
May 1, 2010 (Cluster Fiesta)
Cluster Fiesta Questions for Patrick:
Map of the Cluster
How to calculate Wall hours
Rocks: Appliances to auto install nodes and post install scripts
Where are they
Would they ever need to be fixed or updated
Basic Hardware Diagram - where, up to date
Diagram of workings - where, up to date
What is XFS?
Distributed vs. Parallel computing?
What are universes? How many? Difference between them?
Stats for Hardware: Companies/Good Numbers/Meshing with other systems
Video Cards
Mother Board
Key Boards
Software Packages, will we need more (updates)
From whom or where
What kind (i.e. size, capabilities…)