The College of Engineering HPC Cluster currently operates with 1,554 processor cores and is rated at 15.9 TFLOPS. The Cluster is used by several of OSU's engineering schools for research and graduate projects in parallel computing.

About the HPC Cluster

The HPC Cluster is a compute server built from commodity hardware and freely available software, with jobs scheduled by Sun Grid Engine. With 1,554 CPU cores, the Cluster is ideally suited to Monte Carlo-style computations in which the same program is run many times with different random-number seeds. It is also possible to run parallel programs on the Cluster using C and the MPI (Message Passing Interface) library, though the relatively high latency and low bandwidth of the network mean that only fairly coarse-grained computations can achieve reasonable speedup on many processors.
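
As an example, below is a minimal sketch of that pattern in C with MPI: an embarrassingly parallel Monte Carlo estimate of pi in which every rank runs the same sampling loop with a different random seed and the results are combined in a single reduction. The seed formula and sample count are arbitrary choices for illustration, not part of the cluster's software.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Same program on every rank; only the seed differs. */
        unsigned int seed = 12345u + (unsigned int)rank;

        const long trials = 1000000;
        long hits = 0;
        for (long i = 0; i < trials; i++) {
            double x = (double)rand_r(&seed) / RAND_MAX;
            double y = (double)rand_r(&seed) / RAND_MAX;
            if (x * x + y * y <= 1.0)
                hits++;
        }

        /* One collective at the very end keeps the computation
           coarse-grained, so a slow network is not a bottleneck. */
        long total = 0;
        MPI_Reduce(&hits, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("pi ~= %f from %ld samples\n",
                   4.0 * (double)total / ((double)trials * size),
                   trials * size);

        MPI_Finalize();
        return 0;
    }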

The Hardware

As of September 2012, the College of Engineering HPC Cluster comprises 204 systems with 1,554 cores, providing 15.9 TFLOPS of performance. The operating system is Red Hat Enterprise Linux 6 (RHEL6). Each system is connected to the public network via gigabit Ethernet, and a second MPI communication network runs on dedicated gigabit hardware. Most of the newer Dell R610s also have a Mellanox InfiniBand connection.

The systems are a heterogeneous mix of dual-processor, rack-mounted servers, including Dell PowerEdge 1850 and R610, HP ProLiant DL145, and Sun SunFire X4100/X4200 machines.

Submit Hosts (submit-em64t-01 -- submit-em64t-03)

Processor: 2x 6-core 2.67 GHz Intel Xeon, 12288 KB cache
Motherboard: Dell PowerEdge R610
Memory: 48 GB SDRAM
Hard Drive: 160 GB SATA (PERC 6 controller)
NIC: 4x Broadcom NetXtreme II Gigabit Ethernet; high-speed Mellanox InfiniBand

Submit Host (submit-amd64-01)

Processor: 2x 2.6 GHz AMD Opteron, 1024 KB cache
Motherboard: HP ProLiant DL385
Memory: 2 GB SDRAM
Hard Drive: 72 GB Ultra3 SCSI
NIC: 2x Broadcom NetXtreme Gigabit Ethernet

Compute nodes (exec-em64t-01 -- exec-em64t-20)

Processor: 2x 3.4 GHz Intel Xeon, 2048 KB cache
Motherboard: Dell PowerEdge 1850
Memory: 4 GB SDRAM
Hard Drive: 10.2 GB Ultra-DMA
NIC: Intel EtherExpress Pro 100

Compute nodes (compute-0-1 -- compute-0-9)

Processor: 2x 6-core 2.67 GHz Intel Xeon, 12288 KB cache
Motherboard: Dell PowerEdge R610
Memory: 48 GB SDRAM
Hard Drive: 160 GB SATA (PERC 6 controller)
NIC: 4x Broadcom NetXtreme II Gigabit Ethernet; high-speed Mellanox InfiniBand

Compute nodes (compute-1-1 -- compute-1-13)

Processor: 2x 6-core 2.67 GHz Intel Xeon, 12288 KB cache
Motherboard: Dell PowerEdge R610
Memory: 48 GB SDRAM
Hard Drive: 160 GB SATA (PERC 6 controller)
NIC: 4x Broadcom NetXtreme II Gigabit Ethernet; high-speed Mellanox InfiniBand

Compute nodes (compute-2-1 -- compute-2-11)

Processor: 2x 6-core 2.67 GHz Intel Xeon, 12288 KB cache
Motherboard: Dell PowerEdge R610
Memory: 48 GB SDRAM
Hard Drive: 160 GB SATA (PERC 6 controller)
NIC: 4x Broadcom NetXtreme II Gigabit Ethernet; high-speed Mellanox InfiniBand

Compute nodes (compute-3-1 -- compute-3-12)

Processor: 2x 6-core 2.67 GHz Intel Xeon, 12288 KB cache
Motherboard: Dell PowerEdge R610
Memory: 48 GB SDRAM
Hard Drive: 160 GB SATA (PERC 6 controller)
NIC: 4x Broadcom NetXtreme II Gigabit Ethernet; high-speed Mellanox InfiniBand

Compute nodes (compute-4-1 -- compute-4-4)

Processor: 2x 6-core 2.67 GHz Intel Xeon, 12288 KB cache
Motherboard: Dell PowerEdge R610
Memory: 48 GB SDRAM
Hard Drive: 160 GB SATA (PERC 6 controller)
NIC: 4x Broadcom NetXtreme II Gigabit Ethernet; high-speed Mellanox InfiniBand

Compute nodes (compute-5-1 -- compute-5-12)

Processor: 2x 6-core 2.67 GHz Intel Xeon, 12288 KB cache
Motherboard: Dell PowerEdge R610
Memory: 48 GB SDRAM
Hard Drive: 160 GB SATA (PERC 6 controller)
NIC: 4x Broadcom NetXtreme II Gigabit Ethernet; high-speed Mellanox InfiniBand

Compute nodes (compute-6-1 -- compute-6-27)

Processor: 2x 4-core 2.8 GHz AMD Opteron, 512 KB cache
Motherboard: Custom
Memory: 8 GB SDRAM
Hard Drive: 80 GB SATA
NIC: 2x NVIDIA MCP55 Ethernet

Compute nodes (compute-7-1 -- compute-7-29)

Processor: 2x 4-core 2.6 GHz Intel Xeon, 6144 KB cache
Motherboard: Custom
Memory: 8 GB SDRAM
Hard Drive: 250 GB SATA
NIC: 2x Intel Gigabit Ethernet

Compute nodes (compute-8-1 -- compute-8-29)

Processor: 2x 4-core 2.6 GHz Intel Xeon, 6144 KB cache
Motherboard: Custom
Memory: 8 GB SDRAM
Hard Drive: 250 GB SATA
NIC: 2x Intel Gigabit Ethernet

Compute nodes (compute-9-1 -- compute-9-4)

Compute nodes (exec-amd64-01 -- exec-amd64-11)

Processor: 2x 2.2 GHz AMD Opteron, 1024 KB cache
Motherboard: HP ProLiant DL145
Memory: 10 GB SDRAM
Hard Drive: 40 GB
NIC: 2x Broadcom NetXtreme BCM5704 Gigabit Ethernet

Compute nodes (exec-amd64-12 -- exec-amd64-22)

Processor: 2x 2.8 GHz AMD Opteron, 1024 KB cache
Motherboard: Sun SunFire X4100
Memory: 10 GB SDRAM
Hard Drive: 67 GB
NIC: 4x Intel 82546EB Gigabit Ethernet

Compute nodes (exec-amd64-35 -- exec-amd64-47) 

Processor: 2x dual-core 1.2 GHz AMD Opteron, 1024 KB cache
Motherboard: Sun SunFire X4200
Memory: 8 GB SDRAM
Hard Drive: 67 GB
NIC: 2x Intel 82546EB Gigabit Ethernet; 2x NVIDIA CK804 Ethernet

Software

We are running Red Hat Enterprise Linux 6 WS with the Linux 2.6.32 kernel.

The most popular parallel tool here is MPI; we use Argonne National Laboratory's MPICH2 implementation.
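
With MPICH2, a program such as the Monte Carlo sketch above would typically be compiled with the mpicc wrapper and launched with mpiexec, for example "mpicc pi.c -o pi" followed by "mpiexec -n 16 ./pi" (the file name and process count are only placeholders); on the cluster itself, jobs are submitted through Sun Grid Engine rather than launched by hand.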

Projects

SSSL: Sequential Spatial and Structural Supervised Learning

This NSF-funded project seeks to develop algorithms for learning to
classify items in sequential, spatial, and relational data. Application
projects include sequence labeling problems in bioinformatics (protein
secondary structure prediction, gene-finding, etc.), sequence labeling
problems in language processing (part-of-speech tagging, shallow
parsing, etc.), and pixel labeling problems in remote sensing (e.g.,
classifying pixels into land cover classes). 

TaskTracer: Task-sensitive user interface for Windows

This NSF-managed project seeks to build a user interface that knows what
tasks you are currently working on and can help you carry out those
tasks. In particular, the system learns to predict your current task
and then provide easy access to relevant documents, email addresses, web
pages, and so on. We use the cluster to develop and test learning algorithms
for this project.

Knowledge-Intensive Learning

This DARPA- and NSF-funded project aims to bridge the gap
between knowledge representation and machine learning. Our goal is to
develop a system in which you can describe a learning problem in a
formal knowledge representation system and then the system automatically
formulates a learning system to solve that problem. This involves the
invention of features, selection among candidate features, and
extraction and learning with those features. Our application areas
include (a) modeling the spread of West Nile Virus, (b) predicting
grasshopper infestations in Eastern Oregon, and (c) learning for the
TaskTracer project.

INSECT-ID: Pattern Recognition of Insects for Environmental Modeling and Ecological Science

In this NSF-funded project, we are developing image processing and
learning algorithms for determining the genus and species of selected
classes of insects from image data. We are also constructing a
mechanical/optical device for manipulating and photographing insects.
We are using the cluster to perform the image processing and to develop and
test learning algorithms for this problem. Our two application tasks
are (a) measuring stream health by recognizing stone fly larvae in stream
substrate, and (b) measuring soil biodiversity by recognizing soil
mesofauna in forest soils. 

Pedestrian Evacuation Modeling

As the war against terrorism escalates, office buildings, transportation facilities, and sports arenas become tempting targets. We are developing models of pedestrian motion and the spaces they occupy. We have a microscopic crowd evacuation simulator that moves each individual pedestrian separately. Our goal is to develop a
system capable of updating the positions of 100,000 or more people in real time.
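
For illustration only, the sketch below shows the shape of one microscopic update step: each pedestrian is advanced independently toward a goal position at its own preferred speed. The data layout, goal model, and time step are invented for this example; a real evacuation model adds collision avoidance and interactions between neighbors.

    #include <math.h>
    #include <stddef.h>

    typedef struct { double x, y; } Vec2;

    typedef struct {
        Vec2 pos;     /* current position (meters) */
        Vec2 goal;    /* target exit location */
        double speed; /* preferred walking speed (m/s) */
    } Pedestrian;

    /* Advance every pedestrian independently by one time step dt:
       each agent moves straight toward its goal at its own speed. */
    void step(Pedestrian *p, size_t n, double dt)
    {
        for (size_t i = 0; i < n; i++) {
            double dx = p[i].goal.x - p[i].pos.x;
            double dy = p[i].goal.y - p[i].pos.y;
            double d  = sqrt(dx * dx + dy * dy);
            if (d > 1e-9) {
                double s = p[i].speed * dt / d;
                if (s > 1.0) s = 1.0;  /* do not overshoot the goal */
                p[i].pos.x += s * dx;
                p[i].pos.y += s * dy;
            }
        }
    }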