Saddleback Install Records & Specifics
Updated June 5, 2018
Specifications
Cluster is installed in ERC room 214 (ERC computer room).
- Head Node - saddleback
- System Board Intel Server System SR1690WB
- Processors: 2 Intel Six-Core Xeon E5645 2.4GHz
- 24GB memory ECC DDR3 1333
- Four 3TB SATA hard drives (one CentOS boot drive; the others are unpartitioned, non-RAID data drives)
- Video on-board (Matrox Graphics MGA G200e card - mga is the driver)
- Optical SATA DVD+/- RW internal drive
- Network card dual-port Gigabit ethernet for commodity communication
- Infiniband card 40Gb/s QDR single QSFP port
- Network card 10GbE for communication to the CSU/Atmos network
- Subordinate Nodes - sb1, sb2...
- Server System SR1680MV (split-node chassis: 2 nodes per chassis)
- 4 Six-Core Xeon X5675 3.06GHz processors per node
- Total of 192 cores
- 48GB memory ECC DDR3 1333 per node
- 1TB drive per node
- Gigabit ethernet per node
- Infiniband card 40Gb/s QDR single QSFP per node
- Additional Hardware
- 18-port Mellanox InfiniScale IV infiniband switch (QSFP)
- Eight 2M 40Gb/s QDR Infiniband copper cables with QSFP connectors
- 7M 10GBase-CR Twinax copper cable with SFP+ connectors for connecting to the outside 10GigE switch
- Warranty 5-year parts & labor server warranty / free lifetime tech
support
- Asante 10/100/1000 commodity switch (stolen from old XServe)
How to Run Jobs
Saddleback jobs must be run in batch mode using the Torque
job manager, which uses only the compute nodes sb1, sb2, .... Small interactive
debug runs of 12 cores or fewer may be run on the master, saddleback, including
jobs run under TotalView. If you have a job to run under TotalView that
will use more cores, contact Kelley.
Home Directories :
/Users/username
Other Directories :
/usr/local/*
/temp
/data
/Models
Disk Storage :
User disk (boot drive) : 2TB shared by the OS and others
/disk2 : 3TB
/disk3 : 3TB
/pond : 82.5 TB raid
/pool : 164 TB raid
Program Locations :
PGI fortran : /usr/local/pgi/linux86-64/14.7/bin/pgf95, pgf90, pgf77...
Totalview : /usr/local/toolworks/totalview/bin
netCDF : /usr/local/...
Shell Environment & Paths : default shell is csh
setenv PGI /usr/local/pgi
setenv PGRSH ssh
setenv LD_LIBRARY_PATH /usr/local/pgi/linux86-64/14.7/libso
include these in your path statement:
/usr/local/bin
/usr/local/pgi/linux86-64/14.7/bin
/usr/local/pgi/linux86-64/2014/mpi/mvapich/bin
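These settings can be collected in your ~/.cshrc so they apply at every login; a sketch, assuming the 14.7 PGI release listed above:

```csh
# Sketch of ~/.cshrc additions for saddleback (csh is the default shell).
setenv PGI /usr/local/pgi
setenv PGRSH ssh
setenv LD_LIBRARY_PATH /usr/local/pgi/linux86-64/14.7/libso
# Prepend the local and PGI tool directories to the search path.
set path = (/usr/local/bin \
            /usr/local/pgi/linux86-64/14.7/bin \
            /usr/local/pgi/linux86-64/2014/mpi/mvapich/bin \
            $path)
```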
Running Your Program :
Copy the commands below into a file (e.g., test.pbs)
#!/bin/sh
#
#PBS -N Test1
#PBS -q batch
#PBS -l nodes=4:ppn=24
#PBS -d /Users/username/working_directory
#PBS -M username@kiwi.atmos.colostate.edu
#PBS -l walltime=00:20:00
#PBS -m abe
mpirun -np 96 mpi_code
In order, these lines mean
- Define the shell.
- This indicates the name of the job; in this case, Test1.
- This specifies the queue we use. Currently, we have only one: "batch".
- This requests 4 nodes with 24 processors each for a total of 96.
- This specifies the working directory to use.
- This indicates the email to send notifications to - note you need to have 'kiwi' in there.
- This is the maximum time your code will run : here 20 minutes.
- This instructs Torque to send you email when: the job aborts (a), it starts (b), and when it ends (e).
- This is the command for the job itself. Be sure that np (e.g., 96) is the product of nodes * ppn.
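To keep -np consistent with the nodes/ppn request, the rank count can be computed rather than hard-coded; a minimal sketch using the values from the example script above:

```shell
#!/bin/sh
# Illustrative values matching the example request (nodes=4:ppn=24).
NODES=4
PPN=24
NP=$((NODES * PPN))   # total MPI ranks = nodes * ppn
echo "mpirun -np $NP mpi_code"
```

Inside a running Torque job, $PBS_NODEFILE lists one line per allocated processor, so `wc -l < $PBS_NODEFILE` gives the same count without hard-coding it.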
Finally, you run your program with the following command:
> qsub test.pbs
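After submitting, Torque's standard commands can be used to monitor or cancel the job; for example (username and the job number are placeholders):

```
> qstat -u username     show your queued and running jobs
> qdel 123              cancel job number 123
```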
See the qsub command documentation for details.
Debugging with TotalView : Default shell is csh. TotalView can only
be run on one node: the master. If you need more than 6 cores, please talk to
Kelley or Mostafa about non-batch reservation time.
TotalView variables:
setenv TVROOT /usr/local/toolworks/totalview
setenv TOTALVIEW /usr/local/toolworks/totalview/bin/totalview
setenv TVDSVRLAUNCHCMD ssh
setenv LM_LICENSE_FILE /usr/local/toolworks/license.dat:/usr/local/pgi/license.dat
Launch totalview with:
mpirun -tv -np 1 program (where 1 is the number of processors in this example)
Note that with the PGI compiler, compiling with both -g and -O0 allows you to
set breakpoints on almost any line. With -g alone, some lines are not
available for breakpoints.
If you want to specify nodes, use this command:
mpirun -tv -np 32 -hostfile myhostfile program
Where myhostfile contains the name of the nodes you want to use.
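For example, a myhostfile selecting the first four subordinate nodes might contain (one hostname per line; the node choice here is illustrative):

```
sb1
sb2
sb3
sb4
```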
Generating Your Authorized Keys for the Nodes :
If you get an error indicating you do not have permission to run on the nodes
(sb1, sb2, ...), you probably have not generated your keys yet. Do this:
> ssh-keygen (use default filename, do not enter a passphrase when asked)
> cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
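The same setup can be done idempotently, appending to authorized_keys instead of overwriting it and setting the permissions sshd requires; a sketch (the function name and directory argument are illustrative, not part of the cluster setup):

```shell
#!/bin/sh
# Sketch: idempotent passphrase-less key setup for the compute nodes.
setup_node_keys() {
    dir=$1                          # normally $HOME/.ssh
    mkdir -p "$dir"
    chmod 700 "$dir"                # sshd refuses group/world-writable dirs
    # Generate a key pair only if one does not already exist.
    if [ ! -f "$dir/id_rsa.pub" ]; then
        ssh-keygen -t rsa -N "" -f "$dir/id_rsa"
    fi
    # Append rather than overwrite, so any existing keys survive.
    cat "$dir/id_rsa.pub" >> "$dir/authorized_keys"
    chmod 600 "$dir/authorized_keys"
}
```

Run it as `setup_node_keys "$HOME/.ssh"`, then try `ssh sb1` to confirm the nodes accept the key.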
Templates for Some of Our Models :
Don has put together a PDF file of how to port some of our models to
saddleback, namely the GCRM, SAM, VVM, SPCAM and BUGS.
Kelley's Install Notes
See the equipment list for serial numbers, etc. The winning bid was Nor-Tech
of Burnsville, MN; contact Bob Dreis.
saddleback IP = 129.82.48.243
Warranty info :
Go to http://prd1warser.cps.intel.com and enter the serial number:
master : azgd1150033
sb1/sb2 : BZMY93400408
sb3/sb4 : BZMY93500179
sb5/sb6 : BZMY93600292
sb7/sb8 : BZMY93600343
Operating system: CentOS v7 (upgraded by Mostafa May 2018)
Additional Software
- Infiniband support from
OpenFabrics libraries.
- Batch system: Portable Batch System (PBS).
- Portland Group (PGI) Cluster Development Kit (CDK) for 2 users
- Totalview Debugger 64-token Totalview Team license
RAID Install
Chassis: SuperMicro SuperChassis 847E26-R1400LPB RAID chassis configured for
Infiniband.
The vendor is Quick-800 of La Mesa, CA; contact AJ Jackson.
Disks:
2 300GB Intel SSD 320 Series SATA2 drives for cache and logging from Quick-800
1 160GB 7.5K rpm SATA drive for OS from Quick-800
36 3TB Seagate Constellation enterprise drives.