Cluster Documentation | Johanna-Laina Fischer (jfischer)
Log On: ssh logonname@uscms1.fltech-grid3.fit.edu
enter password
(you will be in your home directory)
Daily Commands:
condor_q | less
shows the jobs in the queue: whether each is held, running, or idle, its size,
when it was submitted, who submitted it, the job name, and the job ID
to exit, press q
top
df -h
disk space per partition (human-readable sizes)
df --total
total disk space
man df for the full list of options
who
who is logged in, when, where
SAM tests:
http://dashb-cmssam.cern.ch/dashboard/request.py/latestresultssmry?siteSelect3=T3&serviceTypeSelect3=vo&sites=T3_US_FIT&services=CE&services=SRMv2&tests=1301&tests=133&tests=111&tests=6&tests=1261&tests=76&tests=64&tests=20&tests=281&tests=882&exitStatus=all
CE: Computing Element
SRMv2: Storage Resource Manager v2 (Storage Element)
gratia:
http://myosg.grid.iu.edu
Contacts:
Bockjoo Kim: CMS software, HyperNews at UF
CMS software problems
If Samir needs a specific version, email Bockjoo
bockjoo@phys.ufl.edu
Yujun Wu: Network stuff, Systems Analyst? at UF
Network problems, usually FIT's problem, but he can help
yujun@phys.ufl.edu
HyperNews: (Mailing List)
Grid Problem
OSG Storage
Mailing Lists
What Things Mean:
BestMan: Berkeley Storage Manager, allows communication with local
storage from the grid
PhEDEx: Physics Experiment Data Export, transfers data files,
handles anything we import into or export out of the cluster
SAM tests: tell us what is working/not working, whether CRAB
jobs will run, and whether PhEDEx is able to transfer something
into/out of the site
Gratia: shows us the usage of the cluster
NAS: Network Attached Storage, a machine attached to the
network that provides storage over the network; it is not part of
the grid
GUMS: Grid User Management System, a site tool for resource
authorization that handles the mapping of grid
certificates to local identities
NFS: Network File System, allows a user on a client computer to
access files over a network in a manner similar to how local
storage is accessed
RAID: Redundant Array of Inexpensive Disks, technology that allows
computer users to achieve high levels of storage reliability from
low-cost and less reliable PC-class disk-drive components, via the
technique of arranging the devices into arrays for redundancy
Most common RAIDs:
RAID 0: Striping; the data is split block
by block between the two hard disks.
RAID 1: Mirroring; as with striping, this setup
uses two hard disk drives to produce a single logical
drive. Mirroring gives added security for your data at
the cost of storage space.
RAID 5: (we use) Uses striping with distributed parity;
provides fault tolerance in addition to
performance advantages, allowing data to be safeguarded
while only sacrificing the equivalent of one drive's
space. RAID 5 requires at least three hard drives of the
same size.
RAID 6: (we use) Data is striped across several physical
drives and dual parity is used to store and recover
data. Minimum of 4 disks. Usable capacity is always two
drives less than the number of available disk drives in the
RAID set.
RAID 10: Combines RAID 0 striping and RAID 1 mirroring.
This level provides the improved performance of striping
while still providing the redundancy of mirroring.
Raid Efficiency (Table):
http://www.pcguide.com/ref/hdd/perf/raid/levels/comp-c.html
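The capacity rules in the RAID notes above can be turned into a small calculator. This is only a sketch: the function name, the drive counts, and the 2 TB drive size are made up for illustration and are not the cluster's actual hardware. It also assumes RAID 1 mirrors every drive in the set and RAID 10 uses two-way mirrors with a minimum of four drives.

```python
# Minimum drive counts per RAID level. The values for RAID 0, 1, 5 and 6
# follow the notes; the RAID 10 minimum (two mirrored pairs) is an assumption.
MIN_DRIVES = {0: 2, 1: 2, 5: 3, 6: 4, 10: 4}


def raid_usable_capacity(level, n_drives, drive_size_tb):
    """Return usable capacity in TB for equally sized drives."""
    if level not in MIN_DRIVES:
        raise ValueError(f"unsupported RAID level: {level}")
    if n_drives < MIN_DRIVES[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DRIVES[level]} drives")

    if level == 0:
        data_drives = n_drives          # striping only, no redundancy
    elif level == 1:
        data_drives = 1                 # every drive holds the same copy
    elif level == 5:
        data_drives = n_drives - 1      # one drive's worth of parity
    elif level == 6:
        data_drives = n_drives - 2      # two drives' worth of parity
    else:                               # RAID 10, two-way mirrors assumed
        if n_drives % 2:
            raise ValueError("RAID 10 needs an even number of drives")
        data_drives = n_drives // 2
    return data_drives * drive_size_tb


if __name__ == "__main__":
    # Hypothetical set of eight 2 TB drives.
    for level in (0, 5, 6, 10):
        print(f"RAID {level}: {raid_usable_capacity(level, 8, 2.0)} TB usable")
```

For the RAID 6 case this reproduces the "two drives less" rule: eight drives give six drives' worth of space.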
dd: used to test the speed of the cluster in practice, i.e. how
long things actually take; it writes and reads files and we
time the operation
iperf: more theoretically what we should be capable of; set up a
client and a server and it measures the capability between them; it can run over
two different protocols
TCP: better for sending a document (makes sure all pieces get
there)
UDP: doesn't check whether the data is getting there okay; lets us know
jitter (bad, we don't want jitter: the variation in the arrival time of
packets) and packets lost
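To make "jitter" and "packets lost" concrete, here is a rough sketch of how the two numbers could be computed from a list of packet timestamps. It is a simplification: iperf itself reports a smoothed running estimate, while this version just averages the change in transit time between consecutive received packets. The function name and the timestamps are invented for illustration.

```python
from statistics import mean


def jitter_and_loss(send_times, recv_times):
    """Return (jitter in seconds, percent of packets lost).

    send_times: send timestamp of every packet, in seconds.
    recv_times: arrival timestamp of the same packets, or None if lost.
    """
    if not send_times or len(send_times) != len(recv_times):
        raise ValueError("need one receive entry per sent packet")

    # Transit time of each packet that actually arrived.
    transit = [r - s for s, r in zip(send_times, recv_times) if r is not None]

    lost = sum(1 for r in recv_times if r is None)
    loss_percent = 100.0 * lost / len(send_times)

    # Jitter: average change in transit time between consecutive arrivals.
    changes = [abs(b - a) for a, b in zip(transit, transit[1:])]
    jitter = mean(changes) if changes else 0.0
    return jitter, loss_percent


if __name__ == "__main__":
    sent = [0.0, 1.0, 2.0, 3.0]
    received = [0.010, 1.020, None, 3.010]   # third packet never arrived
    jitter, loss = jitter_and_loss(sent, received)
    print(f"jitter: {jitter * 1000:.1f} ms, loss: {loss:.0f}%")
```

A TCP test would not show lost packets this way, since TCP resends anything that goes missing.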
Registering Proxy every month
on twiki?
Dr. Hohlmann Twiki account
March 23, 2010
Change Tau’s operating system from Red Hat to Ubuntu:
Ctrl+Alt+F1 “switches between windows” (used for “out of range”
error message)
Backup Samir’s CMS files:
http://www.grape-info.com/doc/linux/root/backup.html
Backup: tar cvf /tmp/CMSBackup.tar \
/home \
/var/spool/mail \
/etc/mail \
/etc/passwd \
/etc/shadow \
/etc/group
Restore: cd /
tar xvpf /tmp/CMSBackup.tar
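The same backup and restore steps can be sketched with Python's standard tarfile module, which may be handy if the backup ever needs to be scripted. The function names are made up for this example. As with tar, leading slashes are dropped from the stored names, so restoring into / puts the files back where they came from. Backing up paths like /etc/shadow still requires root.

```python
import tarfile


def make_backup(archive_path, paths, compress=False):
    """Pack the given files and directories into one tar archive."""
    mode = "w:gz" if compress else "w"    # "w:gz" corresponds to tar's z flag
    with tarfile.open(archive_path, mode) as tar:
        for path in paths:
            tar.add(path)                 # directories are added recursively


def restore_backup(archive_path, dest="/"):
    """Unpack the archive under dest (use "/" to restore in place)."""
    with tarfile.open(archive_path, "r:*") as tar:   # "r:*" detects gzip
        tar.extractall(dest)


if __name__ == "__main__":
    # Same list of paths as in the tar command in the notes.
    make_backup(
        "/tmp/CMSBackup.tar",
        ["/home", "/var/spool/mail", "/etc/mail",
         "/etc/passwd", "/etc/shadow", "/etc/group"],
    )
```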
ls / - lists the top folder
boot: files needed to boot the system
lib: shared libraries
home: users' home directories
find command: find [directory you want to search, e.g. /home] -name
"*[name of thing you are looking for]*"
use the asterisks if you do not know the name that well (almost like a
keyword search)
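For comparison, the same kind of keyword search can be sketched in Python with os.walk and fnmatch. The function name is hypothetical, and this only covers the -name part of find, not its other options.

```python
import fnmatch
import os


def find_by_name(top, pattern):
    """Return every path under top whose name matches the wildcard pattern."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(top):
        for name in dirnames + filenames:
            # fnmatchcase is case-sensitive, like find -name
            if fnmatch.fnmatchcase(name, pattern):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


if __name__ == "__main__":
    # Equivalent in spirit to: find /home -name "*Backup*"
    for path in find_by_name("/home", "*Backup*"):
        print(path)
```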
man [command name]: manual for commands, how to work them, flags
help: list of commands
scp CMSBackup.tar jfischer@uscms.fltech-grid3.fit.edu:~
if you get the error “temporary failure in name resolution”, the host name
could not be looked up, which usually means no internet connection
For Samir’s Backup:
tar cvzf /tmp/CMSBackup.tar.gz \
/home \
/var/spool/mail \
/etc/mail \
/etc/passwd \
/etc/shadow \
/etc/group
scp CMSBackup.tar.gz jfischer@uscms.fltech-grid3.fit.edu:~
To shut down as root:
shutdown -h now
(-h halts the machine; replace now with +minutes to delay)
March 24, 2010
/dev – devices
list of things, usb, partitions
plug in drive
find which got “renamed”
mount /dev/devicename /mnt/usb
if /mnt/usb does not exist
mkdir /mnt/usb
in /mnt/usb
copy CMS file by: cp /tmp/CMSBackup.tar.gz .
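Since the copy on the external drive is the only safety net during a reinstall, it may be worth checking that it matches the original. The sketch below copies a file and compares SHA-256 checksums. The function names are made up, the /tmp and /mnt/usb paths are taken from the steps above, and the checksum step is an addition that was not part of the original procedure.

```python
import hashlib
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src, dest_dir):
    """Copy src into dest_dir and confirm the copy is identical."""
    dest = shutil.copy2(src, dest_dir)    # returns the path of the new file
    if sha256_of(src) != sha256_of(dest):
        raise IOError(f"checksum mismatch after copying {src}")
    return dest


if __name__ == "__main__":
    print(copy_and_verify("/tmp/CMSBackup.tar.gz", "/mnt/usb"))
```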
on Tau’s monitor, if signal is not found, press
Ctrl+Alt+F1
Samir’s CMS files backed up onto external hard drive (Seagate)
Cluster Fiesta! – Chicken Enchiladas
http://www.foodnetwork.com/recipes/tyler-florence/chickenenchiladas-recipe/index.html
March 25, 2010
Redhat to Ubuntu
CD in first, then boot up computer
Original CD had problems
Created new CD:
download from: http://www.ubuntu.com/GetUbuntu/download
How to burn to a CD, do next week
https://help.ubuntu.com/community/BurningIsoHowto
*to reboot computer if already logged on, type reboot in the command line
April 1, 2010
*Installing Ubuntu*
Ubuntu CD into drive
To reboot, Ctrl+Alt+Del
Press F2 repeatedly during startup to enter the setup screen
In the BIOS, set the boot priority to boot from CD
Enter password:
What every system admin should know:
http://www.cyberciti.biz/tips/top-linux-monitoring-tools.html
April 8, 2010
*Installing Ubuntu take 2*
Computer does not recognize CD, recognized Windows CD
CD would boot on Xenia’s laptop
Problem – could be that Ubuntu is on a DVD, Tau too old to recognize DVD
Solution – we need to burn a CD! However, the downloader we already have
is for a CD, no need to do extra searching :D
?Wipe computer – delete partition (Patrick’s magical CD)?: did not need to
do
Burned a CD and checked disk integrity (any problems on disk: no errors
found)
Ubuntu on machine
Next Task is to put Samir’s CMS files onto computer again:
In Terminal
cd /media/BACKUPDRIVE
tar xzvf CMSBackup.tar.gz
April 13, 2010
Need to put Samir’s CMS stuff on the computer, errors occurred
Put CMSuser onto Tau:
name: cmsuser
password:
Tracks onto Tau: John Stevens will come Wednesday at 1:00 pm
April 14, 2010
Reinstalled Ubuntu, problems with partitions
April 15, 2010
ReReinstalled Ubuntu
Cluster Fiesta Questions for Patrick:
Map of the Cluster
• How to calculate Wall hours:
• on Condor
• condor_userprio -allusers
• Only since installing the new version of Condor
• On Gratia
• Rocks: Appliances to auto install nodes and post install scripts
• Where are they
• See Xenia’s Node Changes
• Would they ever need to be fixed or updated
• See Xenia’s Node Changes
• Basic Hardware Diagram: where? up to date?
• Patrick’s Presentation (April, Up to date)
• Diagram of workings - where, up to date
• Patrick sent (Wiki?)
• What is XFS
• In kernel, only if it breaks will it need fixing
• High-performance journaling file system
• Good for large files
• 8 billion gigabytes (8 exabytes)
• Max volume size is 16^
• Needed for PhEDEx
• Distributed vs. Parallel computing: based off message passing
• We have distributed: (one CPU or user divides work among multiple CPUs
but they do not rely on each other, no “talking”)
• Parallel: two cores doing two or more things while talking to
each other, pass messages (multiple CPUs: rely on each other,
talk to each other)
• What are universes? How many? Difference between them?
• Standard: No one really uses it
• Periodic saving of jobs
• Remote system calls: these matter if there is no good network
file system (“Condor internal network thing”); we have NFS,
so we don’t have to worry.
• Vanilla: individual Grid job
• Any UNIX executable
• No Condor Checkpointing
• Grid: Don’t need to worry, users worry
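The distributed vs. parallel distinction in the answers above can be illustrated with a toy example. This is only a sketch: Python threads on one machine stand in for the CPUs, Condor is not involved, and all names are made up. In the first function each task runs on its own and never talks to the others; in the second, two workers depend on each other and exchange messages through a queue.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def square(n):
    return n * n


def run_independent(values):
    """'Distributed' style: tasks are handed out and never communicate."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(square, values))


def run_message_passing(values):
    """'Parallel' style: two workers cooperate by passing messages."""
    channel = queue.Queue()
    result = []

    def producer():
        for value in values:
            channel.put(square(value))   # send a partial result
        channel.put(None)                # sentinel: nothing more to send

    def consumer():
        total = 0
        while True:
            message = channel.get()      # wait for the other worker
            if message is None:
                break
            total += message
        result.append(total)

    workers = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return result[0]


if __name__ == "__main__":
    print(run_independent([1, 2, 3]))       # each square computed on its own
    print(run_message_passing([1, 2, 3]))   # sum built from passed messages
```

The consumer cannot finish without the producer, which is the "rely on each other" part; the independent tasks could each run on a different node with no changes.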
Stats for Hardware: Companies/Good Numbers/Meshing with other systems
• RAID: We have documentation
• NAS: must be designed; find out what we need and get it from Silicon
Mechanics, they have configuration pages
• Video Cards: We have documentation
• Monitors: We have documentation
• Mice: We have documentation
• Mother Board: We have documentation
• Key Boards: We have documentation
• RAM: We have documentation
• CPU: We have documentation
• Nodes: We have documentation
• Other…
Software Packages, will we need more (updates)
• Updates on Condor, Rocks OS
April 20, 2010
Sudo – run as root with root permissions
Problem? – Tau’s root had no permissions.
Changed account permissions after using sudo as root to move
Samir’s CMS backup files onto Tau failed.
Tried to reboot, got “Grub” – something wrong with Ubuntu? Grub file
corrupted?
Used Patrick’s magic “wiping disk”, going to need to reinstall Ubuntu
Cannot put Tracks onto Tau until Ubuntu is finally working on Tau
Patrick’s advice: chuck Tau, too old, too problematic
Need to get a CERN account: (http://it-dep.web.cern.ch/it-dep/compusage/#New_people_at_CERN_without_a_current_CERN_computing_account)?
CMS account?
DOE account?
April 22, 2010 (last day of work for semester)
Magic wiping disk:
To do for next semester:
Tau:
Ubuntu: Grub?
Change user allowances
CMSuser account
Tracks
Samir's Data onto Tau
May 1, 2010 (Cluster Fiesta)
Cluster Fiesta Questions for Patrick:
Map of the Cluster
How to calculate Wall hours
Rocks: Appliances to auto install nodes and post install
scripts
Where are they
Would they ever need to be fixed or updated
How?
Basic Hardware Diagram - where, up to date
Diagram of workings - where, up to date
What is XFS?
Distributed vs. Parallel computing?
What are universes? How many? Difference between them?
Stats for Hardware: Companies/Good Numbers/Meshing with other
systems
RAID
NAS
Video Cards
Monitors
Mice
Mother Board
Key Boards
RAM
CPU
Nodes
Other…
Software Packages, will we need more (updates)
From whom or where
What kind (i.e. size, capabilities…)