Using Linstat

Linstat is the SSCC's primary Linux computing cluster. Linstat combines familiar statistical software like Stata, SAS, R, and Matlab with the power of Linux, making it ideal for jobs that require more memory or computing time than Winstat can provide. Linstat also gives you access to the SSCC's HTCondor flock, where you can run multiple jobs at the same time.

Learning to run jobs on a Linux server is probably easier than you think. If you're new to Linux, be sure to read the section Getting Started on Linstat. Veteran Linux users can probably stop reading when they reach that section, but should read the sections before it, which describe some of the unique features of Linstat.

To log in to Linstat you'll use your SSCC username (typed in lower case) and password. If you've forgotten your password, you can reset it here.

If you are outside the United States please read Connecting to Linstat from Outside the United States.

Connecting to Linstat

How you'll connect to Linstat depends on what kind of computer you're connecting from:

Windows PCs or Winstat

If your computer runs Windows, we suggest you connect using a program called X-Win32 (though there are many fine alternatives). X-Win32 is already installed and configured on Winstat, so one option is to log in to Winstat and run X-Win32 there. Alternatively, you can download and install a pre-configured version of X-Win32 from the SSCC web site. Simply download the installation file and then double-click on it.

Download X-Win32 from the SSCC

You'll be asked to log in because X-Win32 is only licensed for UW faculty, staff, and students. Just give your usual SSCC username and password.

When you run X-Win32 it will place an icon in the lower right corner of your screen.

Click on the icon once and choose Linstat.

For more details, including how to set up a connection to a particular Linstat server, see Connecting to SSCC Linux Computers using X-Win32.

If you are not on the UW-Madison campus you must establish a VPN connection to campus before using X-Win32.

Macs or Computers running Linux

Macs and Linux computers have client programs for connecting to Linux servers installed by default. Simply start a Terminal program (on a Mac it will be found under Applications, Utilities) and then type:

ssh -Y username@linstat.ssc.wisc.edu

username should be replaced by your SSCC username. If your username on your computer is the same as your SSCC username, you can leave it out (ssh -Y linstat.ssc.wisc.edu). If you are plugged into the wired network in the Sewell Social Sciences Building you can leave out the domain (ssh -Y linstat).
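As a minimal sketch of the username rule above, the following snippet builds the ssh command and includes the username only when it differs from the one on your own computer. It assumes a POSIX shell such as bash; jdoe is a hypothetical SSCC username, and the snippet prints the command rather than running it.

```shell
# Hypothetical SSCC username; replace with your own.
sscc_user="jdoe"
# Username on the computer you are connecting from.
local_user="$(id -un)"

if [ "$sscc_user" = "$local_user" ]; then
    # Same name on both sides: ssh supplies it automatically.
    target="linstat.ssc.wisc.edu"
else
    target="${sscc_user}@linstat.ssc.wisc.edu"
fi

# Print the command instead of running it, so it can be checked first.
echo "ssh -Y $target"
```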

For more details, including how to connect to a particular Linstat server, see Connecting to Linstat from a Mac.

The Linstat Cluster

When you connect to Linstat, you'll be directed to the least busy of the five Linstat servers (linstat1, linstat2, linstat3, linstat4, and linstat5) automatically. This will spread users among the five servers and help avoid situations where one server is much busier than another.

If you are running a long job and need to connect to the same server again to monitor it, log in to Linstat and then type ssh server, where server should be replaced by the name of the server where you started the job. Be sure to note which server you're on when you start a long job. Most people have the server name in their prompt, but if you don't you can find out which server you're using by typing printenv HOST. It's also possible to connect to a specific server directly—the links in the previous section have instructions.
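One way to remember which server a job was started on is to write the server name to a small log file at the moment you start the job. The sketch below assumes a POSIX shell such as bash; the file name job_servers.txt and the job name mydofile are hypothetical. Because not every shell sets HOST, it falls back to uname -n, which prints the machine's node name.

```shell
# HOST is set by some shells (e.g. tcsh, zsh) but not all of them;
# fall back to uname -n, which prints the machine's network node name.
server="${HOST:-$(uname -n)}"

# Append a note to a log file in the current directory (hypothetical file name).
echo "$(date '+%Y-%m-%d %H:%M') started mydofile on $server" >> job_servers.txt

# Show the most recent entry.
tail -n 1 job_servers.txt
```

When you come back later, the last line of job_servers.txt tells you which name to give to ssh.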

Memory

The Linstat servers have 320GB of RAM (except for Linstat5 which has 384GB), but we've limited the amount any one person can use to 64GB. This will help prevent the servers from running out of RAM, which causes performance and stability problems. Exceptions can be made, so if you need more than 64GB of RAM contact the Help Desk.

/ramdisk

/ramdisk is a special "directory" that is actually stored in RAM, making it extremely fast. The maximum size of /ramdisk is 32GB, and any files that are not in use will be deleted after one hour. /ramdisk can be very helpful for programs that spend a lot of time reading and writing temporary files.
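The following is a sketch of how a job script might use /ramdisk for its temporary files. It assumes a POSIX shell such as bash; the myjob prefix and the file name intermediate.csv are made up for illustration, and the fallback to /tmp is only there so the sketch also runs on machines that have no /ramdisk.

```shell
# Prefer /ramdisk when it exists and is writable; otherwise fall back to /tmp
# so the sketch also runs on machines other than Linstat.
if [ -d /ramdisk ] && [ -w /ramdisk ]; then
    scratch_root=/ramdisk
else
    scratch_root=/tmp
fi

# Create a private scratch directory (the "myjob" prefix is arbitrary).
scratch="$(mktemp -d "$scratch_root/myjob.XXXXXX")"

# Write a temporary file there and read it back.
printf 'x,y\n1,2\n' > "$scratch/intermediate.csv"
rows=$(( $(wc -l < "$scratch/intermediate.csv") ))
echo "wrote $rows lines to $scratch/intermediate.csv"

# Remove the scratch directory when done instead of relying on automatic cleanup.
rm -rf "$scratch"
```

Cleaning up explicitly frees the space right away, which matters because /ramdisk is limited in size.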

Stata

We have a small number of Stata MP16 licenses, which are ideal for running computationally intensive do files. Do files run using the stata and condor_stata commands will be run using Stata MP16, though some HTCondor servers only have 8 cores and Stata MP16 will automatically adapt accordingly.

SAS

On Linstat, the default directory where SAS stores temporary data sets (the WORK library) is /ramdisk. This increases the speed of data-intensive programs significantly. It also prevents them from slowing down the entire server due to disk I/O bottlenecks.

If you need more than 32GB of temporary space, change the WORK directory to /tmp. You can do so by adding the -work option to your SAS command:

sas -work /tmp myprogram

You'll then be able to use up to 243GB of space (or as much of it as is available at the time). For more details see Running Large SAS Jobs on Linstat.
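Since the space in /tmp varies with what is available at the time, it can be worth checking before starting a large job. The sketch below assumes a POSIX shell such as bash; needed_gb is a hypothetical estimate you would supply yourself, and the snippet only prints the sas command it would choose rather than running it.

```shell
# Hypothetical estimate of the temporary space the job needs, in GB.
needed_gb=50

# Free space in /tmp in kilobytes (POSIX df output: 4th column of line 2).
free_kb="$(df -Pk /tmp | awk 'NR == 2 { print $4 }')"
free_gb=$(( free_kb / 1024 / 1024 ))

if [ "$needed_gb" -le 32 ]; then
    # Fits in the default WORK location (/ramdisk), so no option is needed.
    cmd="sas myprogram"
elif [ "$needed_gb" -le "$free_gb" ]; then
    cmd="sas -work /tmp myprogram"
else
    cmd=""
    echo "only ${free_gb}GB free in /tmp; ${needed_gb}GB needed" >&2
fi

# Print the chosen command for review rather than running it.
echo "${cmd:-no suitable WORK location}"
```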

HTCondor

The SSCC's HTCondor flock contains CPUs and is ideal for running very long statistical programs or for running multiple jobs at the same time. HTCondor can run Stata, SAS, Matlab, R, and Mplus jobs as well as user-written programs. We've written scripts that make submitting jobs to HTCondor very easy. See An Introduction to HTCondor for instructions. (You can also submit Stata jobs to the HTCondor flock via the web.)

Mplus

Due to licensing restrictions, Mplus is only installed on Linstat1, Linstat2, and Linstat3, and may only run one job at a time on each server. Because of the unusual way Mplus launches additional terminal sessions you'll need to stay logged in the entire time the program is running. Running Mplus on Linstat has more details.

Getting Started on Linstat

Linux can be intimidating because it just waits for you to type commands without giving you any menus or icons to suggest what you can do. But if all you want to do is run jobs, you can get by with just a couple of Linux commands. Here's how:

Get your job ready using your computer

If you're on Winstat or a Windows PC that logs into the SSCC's PRIMO domain, your Linux home directory is available as the Z: drive, and Linux project directories are the V: drive. They're also available from Macs—see Using SSCC Network Disk Space from Macs. This means you can write your program, manage your files, etc. using the tools you're familiar with and still put the programs and related files on the Linux file system so Linstat can run them.

Put all the files relating to a given project in a single folder (or directory in Linux terminology), then write your programs on the assumption that that folder will be your working directory (for example, a Stata program should say use datafile, not use z:\research\datafile). If you're only working on a single project then just declare Z: itself that project's "directory."

Command #1: cd

When you log into Linux, your "current working directory" (where you "are" in the file system) starts out as your home directory—what Windows calls Z:. If that's where your project's files are, you can skip directly to running your job. Otherwise you'll need to go to your project's directory using the cd ("change directory") command. If your project's directory is on your Z: drive, type:

cd myProject

Where myProject should be replaced by the name you gave your project's directory.

If your project's directory is inside an official Linux project directory on the V: drive, type:

cd /project/projectName/myProject

A few more points on the Linux file system:

  • Directories are separated using the forward slash (/) rather than the backslash (\).
  • There are no drives or drive letters in Linux. All directories are part of a single tree structure with the "root" of the tree denoted by a slash (/).
  • If a directory path starts with a slash (/), it starts from the root. Thus cd /project means "go to the root directory, then to project underneath that."
  • If a directory path does not start with a slash, it is assumed to start with the current directory and go from there. Thus cd myProject means "go to the myProject directory under the current directory."
  • Linux does not like spaces in file or directory names (you have to put the whole path in quotes if it includes a space).
  • Unlike Windows, Linux is case-sensitive. File and file are two different files.
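The points above can be tried out safely in a throwaway directory. This is a sketch assuming a POSIX shell such as bash and a case-sensitive file system like the one on Linstat; the directory and file names are arbitrary.

```shell
# Work in a throwaway directory so nothing real is touched.
demo="$(mktemp -d)"
cd "$demo"

# Relative path: resolved from the current directory.
mkdir myProject
cd myProject
pwd

# Absolute path: starts at the root (/), wherever you currently are.
cd "$demo"

# A name containing a space must be quoted.
mkdir "my data"
cd "my data"

# Case-sensitive: File and file are two different files.
touch File file
count=$(( $(ls | wc -l) ))
echo "$count files"

# Clean up.
cd /
rm -rf "$demo"
```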

Command #2: Run Your Program

The command to run your program will depend on the program you want to use. Here are some of the most popular:

Stata

You can start Stata's graphical user interface by typing xstata or xstata-mp. You can also run a do file called mydofile.do in batch mode by typing:

stata -b do mydofile

Alternatively you can submit it to HTCondor with:

condor_stata mydofile

If you run mydofile.do in batch mode or on HTCondor, Stata will automatically log its output in mydofile.log.

SAS

You can start SAS's graphical user interface by typing sas, though it's somewhat clunkier than the Windows version. You can also run a program called myprogram.sas in batch mode by typing:

sas myprogram

R

To run R, simply type R. It does not have a graphical user interface but the commands are the same as in Windows R.

To run an R program in batch mode, type:

R < myprogram.R > myprogram.log

To submit myprogram.R to HTCondor and save the output to myprogram.log, type:

condor_R myprogram.R myprogram.log

If your job uses multiple processors, type:

condormp_R program.R program.log &
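Because HTCondor can run multiple jobs at the same time, a short loop can submit every R program in a directory. The sketch below assumes a POSIX shell such as bash and uses the condor_R syntax shown above; the script names sim1.R, sim2.R, and sim3.R are hypothetical placeholders created in a temporary directory. It is a dry run that only prints the commands.

```shell
# Throwaway directory with three empty placeholder scripts (hypothetical names).
demo="$(mktemp -d)"
cd "$demo"
touch sim1.R sim2.R sim3.R

submitted=0
for script in *.R; do
    # Strip the .R extension to build the matching log file name.
    log="${script%.R}.log"
    # Dry run: print the command. Remove "echo" to actually submit on Linstat.
    echo condor_R "$script" "$log"
    submitted=$(( submitted + 1 ))
done
echo "$submitted jobs listed"

# Clean up.
cd /
rm -rf "$demo"
```

To use it for real, run the loop in your project directory and drop the echo in front of condor_R.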

Matlab

You can start Matlab's graphical user interface by typing matlab. To submit myprogram.m to HTCondor and save its output in myprogram.log, type:

condor_matlab myprogram.m myprogram.log

If your job uses multiple processors, type:

condormp_matlab program.m program.log &

Mplus

To run an Mplus job, log into Linstat1, Linstat2, or Linstat3, and type:

mplus myprogram.inp

where myprogram.inp should be replaced by the name of the Mplus program (the .inp file) you want to run.

Linstat has many other programs available (see our software database). See the documentation of the program you're interested in for details on how to run it.

Running Long Jobs

If your job will run for a long time, put it "in the background" by adding an ampersand (&) to the end of the command. For example:

stata -b do mydofile &

This will allow you to do other things on Linstat while the job is running, or log off without interrupting the job. Jobs submitted to HTCondor are essentially "in the background" already.
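The sketch below shows what the ampersand does, assuming a POSIX shell such as bash. The command sleep 2 is only a stand-in for a long-running job such as a Stata do file, so the example can be run anywhere.

```shell
# "sleep 2" stands in for a long-running command such as: stata -b do mydofile
sleep 2 &

# $! holds the process ID of the most recent background job.
pid=$!
echo "job running in the background with PID $pid"

# The shell is free for other work; kill -0 only tests whether the process exists.
if kill -0 "$pid" 2>/dev/null; then
    still_running=yes
else
    still_running=no
fi
echo "still running: $still_running"

# Optional: block until the job finishes and collect its exit status.
wait "$pid"
status=$?
echo "job finished with exit status $status"
```

In everyday use you would skip the wait and simply keep working or log off.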

For more information see Managing Jobs on Linstat.

Learning More

While this will get you started, there are several other SSCC Knowledge Base articles you can read to become a more flexible and efficient Linstat user. Managing Jobs on Linstat will teach you how to monitor and manage jobs while they run. An Introduction to HTCondor will teach you more about the SSCC's HTCondor flock and how to use it. Finally, if you really want to make yourself at home in Linux, read the SSCC's Getting Started in Linux. For a full list of articles, visit the Linux section of our Knowledge Base. SSCC staff will also be happy to answer any questions you have about using Linstat and help you solve any problems you run into—just contact the Help Desk.

Last Revised: 3/20/2017