Getting Started in Linux

Table of Contents

  1. Introduction
  2. SSCC Linux Computers
  3. The Linux Operating System
  4. Managing Disk Space
  5. Choosing the Proper Linux Computer
  6. Running Jobs
  7. Summary of Commands
  8. Other Sources of Information

1. Introduction

This handbook contains information on using SSCC's Linux computers. It covers introductory topics such as getting into the system, organizing your files, and getting out again, as well as more advanced topics such as setting file permissions and running jobs. This handbook is a companion to the SSCC Member's Handbook, which you should have at hand when learning Linux at SSCC. Many important matters discussed in the SSCC Member's Handbook are referenced by this handbook but not discussed in detail, including logging into SSCC systems, selecting and changing passwords, and printing. Other SSCC Knowledge Base articles teach users how to run editors, use SFTP, work with compressed data, etc. This handbook makes some references to these tasks, but for thorough tutorials on these subjects, please see those articles, which are available online in SSCC's Knowledge Base.

2. SSCC Linux Computers

Linstat is the SSCC's cluster of servers running Linux. When you connect to Linstat, you'll be directed to one of the three Linstat servers (linstat1, linstat2 and linstat3) automatically. This will spread users among the three servers and help avoid situations where one server is much busier than another.

Connecting to a Linux computer, logging in, logging out

Linux is designed for remote logins and can be used very successfully from anywhere in the world. To connect to a Linux server you will need a client program capable of using a secure protocol, ideally SSH. X-Win32 is the best choice for PCs. For details on downloading and using X-Win32, see Connecting to SSCC Linux Computers using X-Win32. For other options see the Connecting to Linstat section of Using Linstat.

When you are finished with your login session, be sure to log off. Forgetting to do so puts the entire SSCC computer network at risk. To log off, type exit at the Linux prompt.

Passwords

The SSCC Member's Handbook discusses how to select a good password and how to change passwords on Linux. Read the sections Creating a secure password and Changing your password on Linux carefully. It is very important to change the default password assigned to you the first time you log in. The following warning from the SSCC Member's Handbook is worth repeating:

IMPORTANT: SHARING PASSWORDS IS STRICTLY FORBIDDEN
Sharing passwords endangers the security of the entire SSCC network.
DON'T DO IT!

3. The Linux Operating System

Linux is a very powerful, flexible operating system. In a few minutes, it is possible to learn enough to get into the system, run statistical programs like Stata, and get out again. At the other extreme, those who have worked on Linux for years are still learning every day. This reflects both the power and the complexity of the operating system.

How to Formulate a Linux Command

When you log in to a Linux computer, a prompt will appear on the screen, waiting for you to enter a command. At this point you can enter any valid Linux command and the computer will run it.

The syntax of a Linux command is very simple: first, enter the command name, followed by any options and any other parameters. Spaces separate the command name from the options and the options from the parameters. Once the command has been completely formed, press Enter. When you press Enter, the command is executed.

For instance, if you want to know the current date and time, type the date command and press Enter. The current date and time will appear, followed by another prompt. Your login session will look like this:

linstat2.ssc.wisc.edu> date
Mon Feb 18 10:52:55 CST 2008
linstat2.ssc.wisc.edu>

When the prompt appears (the prompt here is linstat2.ssc.wisc.edu>), the computer is ready for you to enter another command.

Note that the prompt will vary depending on the machine on which you are working. You can also customize the prompt to be anything you like.

Unlike some other operating systems, Linux is case sensitive. The command date is not the same as the command DATE. You must always use the proper case when running Linux commands. Fortunately, this is simple, as virtually all Linux commands are lower case.

A Few Simple Useful Utilities

Below are some simple, useful commands that you can run right away. Try these:

> cal

displays the calendar for the current month. To see a calendar for the whole year, try:

> cal 1997

In this example, "1997" is a parameter to the command cal: it is telling cal to give information for all of 1997, instead of giving the default information for the current month. Be sure to use the "19" or cal will display the calendar for the year 97, not the year 1997.

> cal 12 1997

displays the calendar for the month of December, 1997. Here, cal is taking two parameters. The first parameter is the month and the second parameter is the year.

> who

displays a list of users currently logged into that computer, also giving the time that the user logged in.

> uptime

This extremely useful command tells the current time, how long the computer has been up, how many users are currently logged on, and how busy the computer has been for the last one, five, and 15 minutes. These last three figures are the "load average", the average number of jobs that were waiting to run in each time increment. To understand how to interpret the load average, see the System Load section later in this handbook.

> hostname

displays the name of the computer on which you are working.

> clear

clears your screen and puts a prompt on the top line of the screen.

> lookup Gary Sandefur

The lookup command looks into the UW-Madison student, faculty, and staff information database and displays information about the person you are looking up. In the command above, information about UW-Madison Dean of the College of Letters and Science, Gary Sandefur, will be displayed, as well as information about any other person on campus with these names.

Most of the above commands were simple commands to run. Only one of them required parameters (lookup) and none required options. Later, commands will be introduced that require options to provide important information. The critical point about these commands can be seen from these examples: the command comes first; spaces separate parameters from the command and parameters from each other.

How Linux Stores Files: The Linux File System

All computers store files in some type of file system. These file systems largely resemble each other: individual files are referenced through folders or directories, terms that can be used interchangeably. The term "directory" is preferred by Linux users.

Two features distinguish the Linux file system from Windows:

1. Linux uses a forward slash, instead of a backslash, to separate the directory names in a path. For example, Windows might refer to a file as:

F:\home\r\rdimond\saswork\data

but Linux would refer to a file as:

/home/r/rdimond/saswork/data

The items "home", "r", "rdimond", and "saswork" are all directories, but the names are separated by forward slashes in Linux, not backslashes, as in Windows.

2. Linux does not start a file name with the name of a disk. On a Windows machine, the start of any file name is a disk name, such as C: for the main hard disk or A: for the floppy. Linux attempts to hide disks from the user. For instance, a directory might be called:

/home/r/rdimond

This path name refers to a directory called rdimond. The rdimond directory is in the directory called r; the r directory is in the directory called home; and the home directory is in the root directory, which is displayed as the forward slash ("/") at the beginning of the name. The root directory is the starting directory on Linux, from which all other files and directories are descended. All files and directories on Linux exist at some place relative to the root directory. The full path name of a file always begins with a forward slash, which refers to the root.
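The nesting described above can be checked with the standard dirname and basename utilities. The following is only a small sketch: it uses the example path from this section, which does not need to exist on the machine where you try it, and the variable names (path, parent, name) are chosen here purely for illustration.

```shell
# Split the example path into its parent directory and its final component.
path=/home/r/rdimond

parent=$(dirname "$path")    # everything before the last slash
name=$(basename "$path")     # the final component

echo "$name is in $parent"

# Walk upward one directory at a time until the root directory is reached.
while [ "$path" != "/" ]; do
    path=$(dirname "$path")
    echo "$path"
done
```

The loop prints /home/r, then /home, then /, showing that every full path name eventually leads back to the root directory.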

File Names under Linux

File and directory names under Linux are quite freeform. (In this section, we will use the expression "file names" to mean "file or directory names".) All numbers and letters of the alphabet are allowed in file names, as are several special characters such as "." (dot) and "_" (underscore). Linux has no naming regulations, such as the requirement that a dot appear in the name. However, despite having few formal rules, the following guidelines will assist you in working with files.

  • The first character of a file name should be a letter of the alphabet or a number. Do not use a special character, such as a dot, a plus sign or a minus sign. Any of these could lead to difficulties when attempting to manipulate the file or directory.
  • Do not use spaces or tabs in file names.
  • File names with multiple periods such as filename.ext.ext are valid.
  • Keep in mind that Linux is case sensitive: the names outfile, Outfile and OutFile represent three different files. However, it is not wise to create files in which the only difference among names is the case, as this can confuse PCs if you ever map your Linux home directory as a network drive on a PC.
  • Although virtually all file names are legal, there are a few names that should be avoided: core and .rhosts. The system uses the name core for a dump of certain data when a command fails. (If you ever see one of these files in one of your directories, the file can be safely removed.) If you create a file called .rhosts you may unintentionally permit others to access your home directory. Of course, this is an uncommon name, and one that you are not likely to create accidentally.
  • File names starting with a period are special files called "hidden files" and will only be displayed in a directory listing if you use ls with the -f or -a option.
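Two of these guidelines are easy to try out safely. The sketch below works in a throwaway directory created with mktemp (an assumption; it is not otherwise used in this handbook), and the file names notes.txt, .hidden and -oops are made up for the example.

```shell
# Work in a throwaway directory so nothing in your home directory is touched.
scratch=$(mktemp -d)
cd "$scratch"

touch notes.txt .hidden      # one ordinary file and one "dot" file
plain=$(ls)                  # lists notes.txt only
all=$(ls -a)                 # also lists ., .. and .hidden
echo "ls:    $plain"
echo "ls -a:" $all

# A name that starts with a minus sign looks like an option to most commands.
# Prefixing it with ./ makes it unambiguous.
touch ./-oops
rm ./-oops

# Clean up.
cd /
rm -r "$scratch"
```

This is why a leading dot or minus sign is best avoided for ordinary files: the first hides the file from a plain ls, and the second requires extra care every time the file is named on the command line.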

File naming conventions are only conventions and are not used to distinguish file type. Some commonly-used conventions are:

 

.do (Stata command files)
.dta (data files stored in Stata format)
.gif (graphics file)
.gz (compressed file)
.htm (Web page)
.html (Web page)
.jpg (graphics file)
.jpeg (graphics file)
.log (SAS or Stata log file)
.lst (SAS listing)
.pdf (Adobe PDF file)
.ps (PostScript file)
.sas (SAS source file)
.sas7bdat (data files stored in SAS format)
.sps (SPSS source file)
.tar (archive file)
.tex (TeX file)
.zip (compressed file)
.Z (compressed file)

Home Directories and the Present Working Directory

All user accounts have a part of the file system that is their own. This is called their home directory. When you first log in, Linux makes your home directory your present working directory. Your present working directory is the directory where files and directories will be listed, created, changed, or removed by default, unless you instruct the computer to perform the action in another location (examples to follow, below).

Home directories are located in a subdirectory of the directory called /home. /home consists of a series of directories, one for each letter of the alphabet. Home directories are under the letter of the alphabet corresponding to the first letter of your login name. For instance, the home directory of the user account named swald is at /home/s/swald and the home directory of the user account named mcdermot is at /home/m/mcdermot.

Home directories are the place for you to put your files. You can control access permissions for files in your home directory, allowing others to see files, or to change files, or denying them these privileges.

Manipulating The File System

The Linux tools used most often by users are the commands that allow users to manipulate files and directories. These commands include:

ls display the contents of a directory
pwd display the full path name of the present working directory
cd change present working directory
mkdir create a new directory
rmdir remove a directory
cp copy a file
mv move or rename a file
rm remove a file

 

Changing Your Present Working Directory

To determine your present working directory, use the pwd command:

> pwd
/home/r/rdimond

To change your present working directory, use the cd command. For example, to change to the /tmp directory (the system directory for temporary files):

> cd /tmp

Remember that a space separates the command (cd) from the parameter (/tmp). If the command is successful, it will not display any information; it will simply return a command prompt. To confirm that you really did change to the /tmp directory, issue the pwd command. For instance:

> cd /tmp
> pwd
/tmp

To return to your home directory from any other directory, enter the cd command without a parameter. For instance:

> cd
> pwd
/home/r/rdimond

Listing directories

Once you change directories, one of the first things you will want to do is look at the contents of the directory. To do this, use the ls command. For instance:

> ls
bin README

There are two items in the present working directory, called bin and README. To determine if these items are files or directories, you must ask for a long listing. To do this, use the -l option (long listing) to the ls command. Options in Linux begin with minus signs and are usually one letter long. For instance:

> ls -l
total 52
drwxrwx--- 2 guest12 guest12 4096 Feb 18 11:09 bin
-rw-r----- 1 guest12 guest12 38331 Feb 18 11:13 README

The dash "-" in the first column of the README line indicates that this is a file. The "d" in the first column of the bin line indicates that this is a directory. The "total" line indicates how many blocks are taken up by items in this directory. It is not usually useful and can be safely ignored.

Let's look at the long listing of the README file more closely:

-rw-r-----  1 guest12 guest12 38331 Feb 18 11:13 README
1  2        3   4       5       6     7           8

The long listing provides a lot of information about the file in a single line. As stated, the first character is the file type (labeled 1 above). Generally, this will either be a dash or a d, indicating that it is an ordinary file or a directory. Following the file type are nine characters (labeled 2 above) indicating the file permissions (file permissions will be discussed in a later section). The number following this (labeled 3 above) is the number of links to the file; it is mainly of use to advanced Linux users and can be ignored. The next two fields (labeled 4 and 5 above) are the owner of the file and the group affiliation of the file. All files on the Linux file system are owned by someone and have some group affiliation. Next is the size of the file in bytes (labeled 6 above). A byte is the equivalent of a single character. Next comes the date and time that the file was last modified (labeled 7 above). Finally comes the file name (labeled 8 above).
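To see how these fields line up in practice, the following sketch creates a small file of known size and picks individual fields out of its long listing. It makes several assumptions not covered so far: the scratch directory comes from mktemp, chmod is used only to give the file the same permissions as the README example, and awk is used to select fields by position.

```shell
scratch=$(mktemp -d)
cd "$scratch"

printf 'hello\n' > README    # 6 bytes: five letters plus a newline
chmod 640 README             # owner read/write, group read, others nothing

ls -l README

# Pull out individual fields by position, in the order described above.
perms=$(ls -l README | awk '{print $1}')   # file type plus permissions
owner=$(ls -l README | awk '{print $3}')   # owner of the file
size=$(ls -l README | awk '{print $5}')    # size in bytes
echo "permissions=$perms owner=$owner size=$size bytes"

# Clean up.
cd /
rm -r "$scratch"
```

The size field reports 6 because each character, including the invisible newline at the end of the line, occupies one byte.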

You can also list the contents of a directory without changing to it. To do this, give the directory name that you want listed as a parameter to the ls command. For instance:

> ls -l /tmp
total 629
-rw-------   1 rdimond  system    147456 Aug  6 22:16 Ex25804
-rw-------   1 rdimond  system     81920 Aug  6 22:15 Rx25804
-rw-r--r--   1 root     system        59 Aug  6 13:34 lpq.00125519
-rw-------   1 flory    system    825012 Aug  5 11:54 ng5chi.dat
-rw-r--r--   1 tpan     system      3086 Aug  6 10:43 rrn.16443
-rw-r--r--   1 tpan     system    355337 Aug  6 10:43 rrnact.16443
drwxr-xr-x   2 pkovatch system       512 Aug  1 04:20 spss_125

Other useful options for the ls command are listed below:

ls -a (all) Include "dot" files, those beginning with a dot
ls -F (File types) Identify file types with codes; / for directories, * for executables, and @ for symbolic links
ls -R (Recursive) Recursively list all subdirectories
ls -r (reverse) Sort in reverse order
ls -s (size) Display the size in kilobytes
ls -t (time) Sort by time modified
ls -u (used) Show time of last access
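Single-letter options can be combined after one minus sign, so ls -tr means the same as ls -t -r. The sketch below illustrates this in a throwaway directory. The file names are invented, and touch -t (which sets a file's modification time) and mktemp are assumptions introduced only so the result is predictable.

```shell
scratch=$(mktemp -d)
cd "$scratch"

# Give two files known modification times (format: YYYYMMDDhhmm).
touch -t 200802181109 older.txt
touch -t 200802181113 newer.txt

newest_first=$(ls -t)        # newer.txt, then older.txt
oldest_first=$(ls -tr)       # -t and -r combined after a single minus sign
echo "$newest_first"
echo "$oldest_first"

ls -la                       # long listing that also includes dot files

# Clean up.
cd /
rm -r "$scratch"
```

A combination such as ls -ltr is handy in a busy directory, because the most recently modified files end up at the bottom of the listing, just above the prompt.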

Making and Removing Directories

Within your home directory, you have the ability to organize your files as you please. This means that you can create subdirectories within your home directory. To do this, use the mkdir command. For instance:

> ls -l
total 52
drwxrwx--- 2 guest12 guest12 4096 Feb 18 11:09 bin
-rw-r----- 1 guest12 guest12 38331 Feb 18 11:13 README
> mkdir homework
> ls -l
total 56
drwxrwx--- 2 guest12 guest12 4096 Feb 18 11:09 bin
drwxrwx--- 2 guest12 guest12 4096 Feb 18 11:24 homework
-rw-r----- 1 guest12 guest12 38331 Feb 18 11:13 README

In this example, a new directory was created called "homework". Use cd to change to the homework directory. For instance:

> pwd
/home/g/guest12
> cd homework
> pwd
/home/g/guest12/homework

If you decided that this directory was not needed after all, you could remove the directory using the rmdir command. For instance:

> ls -l
total 56
drwxrwx--- 2 guest12 guest12 4096 Feb 18 11:09 bin
drwxrwx--- 2 guest12 guest12 4096 Feb 18 11:24 homework
-rw-r----- 1 guest12 guest12 38331 Feb 18 11:13 README
> rmdir homework
> ls -l
total 52
drwxrwx--- 2 guest12 guest12 4096 Feb 18 11:09 bin
-rw-r----- 1 guest12 guest12 38331 Feb 18 11:13 README

The homework directory is now gone. This only works if the directory is empty, that is, it has no files or directories within it.
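The restriction to empty directories can be seen in the following sketch, which runs in a throwaway directory created with mktemp (an assumption) and uses a made-up file called notes.txt.

```shell
scratch=$(mktemp -d)
cd "$scratch"

mkdir homework
touch homework/notes.txt

# rmdir refuses because homework still contains a file.
if rmdir homework 2>/dev/null; then
    result="removed"
else
    result="refused"
fi
echo "first attempt: $result"

# Empty the directory, then try again.
rm homework/notes.txt
rmdir homework
[ -d homework ] || echo "homework is gone"

# Clean up.
cd /
rm -r "$scratch"
```

Without the 2>/dev/null, which is there only to keep the output short, rmdir would also print a message explaining that the directory is not empty.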

Copying, Moving, Renaming, and Removing Files

Files are created in a number of ways. You can use an editor, such as EMACS or PICO, to create a file; statistical programs, such as SAS or SPSS, create files; and you might create files using a PC application like TextPad, with your Linux home directory mapped as a network drive. In any case, once files are created, it is often necessary to copy, move, rename, or remove them.

To copy a file, use the cp command. For instance, if you have a file called README and you wish to copy it to readme.new, you would do this:

> cp README readme.new
> ls -l
total 92
drwxrwx--- 2 guest12 guest12 4096 Feb 18 11:09 bin
-rw-r----- 1 guest12 guest12 38331 Feb 18 11:13 README
-rw-r----- 1 guest12 guest12 38331 Feb 18 11:32 readme.new

The original file has not been changed in any way, but a new file has been created. This new file is a copy of the original, with a different name. Also, because Linux is case sensitive, the file names were specified with the appropriate cases. The new file name has a dot in the name, and a suffix. As stated earlier, suffixes to Linux are entirely unimportant (although they may be important to particular applications!). There may be as many letters before or after the dot as desired. Finally, note that the last modification date on the new file is different from the last modification date on the old file. The new file's modification date is the creation date.

Now, let's create a directory called Documentation and move the new file to that directory using the mv command:

> mkdir Documentation
> mv readme.new Documentation
> ls -l Documentation
total 40
-rw-r----- 1 guest12 guest12 38331 Feb 18 11:32 readme.new

The readme.new file is now in the Documentation directory (again, notice that the D in Documentation is capitalized).

The cp command can also be used to make a copy of a file, using the same file name as the original, but placing it in a different directory. For instance:

> cp README Documentation
> ls -l Documentation
total 80
-rw-r----- 1 guest12 guest12 38331 Feb 18 11:35 README
-rw-r----- 1 guest12 guest12 38331 Feb 18 11:32 readme.new

In this example, the file called README is copied to the directory called Documentation, the name not changing.

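The mv command can be used to rename a file. For instance, to rename the readme.new file inside the Documentation directory:

> mv Documentation/readme.new Documentation/oldreadme
> ls -l Documentation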
total 80
-rw-r----- 1 guest12 guest12 38331 Feb 18 11:32 oldreadme
-rw-r----- 1 guest12 guest12 38331 Feb 18 11:35 README

A note of caution about using the cp and mv commands: If you copy or move a file to a file name that already exists, the existing file will be overwritten without notice.
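One way to guard against accidental overwriting is the -i option, which cp and mv accept just as rm does. The sketch below is an illustration only: the file names are invented, the scratch directory comes from mktemp, and the answer "n" is fed to cp through a pipe so that the example can run unattended.

```shell
scratch=$(mktemp -d)
cd "$scratch"

echo "original" > report.txt
echo "draft" > draft.txt

# -i asks before overwriting an existing file. Here the answer "n" is supplied
# through a pipe; at the prompt you would type it yourself.
# (Some versions of cp report a failure status when the copy is skipped,
# hence the "|| true".)
echo n | cp -i draft.txt report.txt || true

contents=$(cat report.txt)
echo "report.txt still contains: $contents"

# Clean up.
cd /
rm -r "$scratch"
```

Because the answer was "n", report.txt keeps its original contents. Answering "y" would have replaced them with the contents of draft.txt.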

Now the Documentation directory has two copies of the same file with two different names. You can remove a file using the rm command. For instance:

> cd Documentation
> rm oldreadme
> ls -l
total 40
-rw-r----- 1 guest12 guest12 38331 Feb 18 11:35 README

You can also remove the Documentation directory and all of its contents, but you cannot use the rmdir command, which is only for removing empty directories. To remove a directory, including all of its contents, use the -r option to the rm command. For example:

> cd
> rm -r Documentation
> ls -l
total 52
drwxrwx--- 2 guest12 guest12 4096 Feb 18 11:09 bin
-rw-r----- 1 guest12 guest12 38331 Feb 18 11:13 README

This will remove the Documentation directory and all of its contents with no questions asked. This is somewhat dangerous. A safer way to use rm is to add the -i option, which forces you to confirm that you really want to remove each file or directory. For example:

> cd
> rm -r -i Documentation
rm: remove Documentation/README? y
rm: remove Documentation? y
> ls -l
total 52
drwxrwx--- 2 guest12 guest12 4096 Feb 18 11:09 bin
-rw-r----- 1 guest12 guest12 38331 Feb 18 11:13 README

The rm command now asks you to confirm that you really want to remove each item. You can answer y or Y (or any other answer that begins with a y or Y, such as yes, yep or yessireebob) and the item will be removed. Any other answer and the item will not be removed.

One warning about removing Linux files: once a file is removed, it may be gone forever. When a user accidentally removes a file, SSCC staff can sometimes restore the file from the nightly backups, but this is not always possible. Use the -i option when using the rm command to protect your data.

Viewing The Contents of Files

To view the contents of a file, you can use the more command:

> more filename

Replace "filename" with the name of the file you wish to view. The file will be displayed one screenful at a time. There are many subcommands within more, but the following are the most useful:

space scroll down a full screen
Enter scroll down a single line
b scroll up a full screen
q quit out of more and return to the command line

To use a subcommand, simply type in the command when the system pauses after displaying a screen of information.

Using Pipes to View The Output of Commands

Very often, the information scrolling across the screen is not the contents of a file, but other information, such as the long listing of a directory. You can still use the more command to view the output, but you use it through a special Linux feature called a pipe. To use a pipe, type the command as you usually would, but instead of pressing Enter, type the pipe symbol "|" (depicted on your keyboard as a solid or broken vertical line) and then type the more command. The pipe takes the output of the first command and passes it to the more command as input. For instance:

> ls -l /tmp | more

This can be used with any command that displays more than a screen full of information. For example:

> cal 1997 | more

This command would display the calendar for 1997, but it would be displayed within the more command, allowing you to scroll up or down, as desired.

Pipes are one of the most powerful features of Linux.
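Pipes are not limited to more. As a rough sketch of the idea, the example below sends the output of ls into wc -l (which counts lines) and then chains several commands together. The scratch directory and file names are invented for the example, and wc and mktemp are assumptions not otherwise covered in this handbook.

```shell
scratch=$(mktemp -d)
touch "$scratch/a.txt" "$scratch/b.txt" "$scratch/c.txt"

# ls writes one name per line into the pipe; wc -l counts the lines.
count=$(ls "$scratch" | wc -l)
echo "files: $count"

# Pipes can be chained: sort the names in reverse, then keep the first one.
last=$(ls "$scratch" | sort -r | head -n 1)
echo "last name alphabetically: $last"

# Clean up.
rm -r "$scratch"
```

Each command in the chain does one small job, and the pipe symbol hands the result to the next command without any intermediate file.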

Using Pipes to Print the Output of Commands

Printing files is discussed in the SSCC Member's Handbook. However, Linux pipes give the user the ability to print any data that can be displayed on the screen. For instance, if you wish to print out a listing of your home directory, do the following:

> ls -l | enscript

In this example, no listing is printed to the screen; the computer returns a prompt to you without showing you the listing. The output of the ls command is sent to the default printer.

Command Shortcuts

Once users begin to use Linux commands with some regularity, they rapidly start to desire certain shortcuts for some operations. Linux provides shortcuts and alternative methods for performing actions in abundance. This section introduces some relatively simple shortcuts that are not necessary for users to perform their work, but may be useful to beginning level students.

Wildcard Characters

Wildcard characters allow you to specify many files at once, or to specify a single file concisely. The wildcard characters are the asterisk (*), the question mark (?), and the square brackets ([]). You can use wildcard characters with commands like ls, cp, mv and rm to perform an action on several files. Below are examples of the use of wildcard characters with the ls command:

> ls R*
README
README.old

The asterisk means "zero or more of any character." In this example, the ls command listed two files beginning with an R.

> ls *.old
hmwork1.old
README.old


Wildcard characters can appear anywhere in a file name: in the beginning, middle, or end. In this example, the ls command listed two files ending with .old.

> ls *old*
hmwork1.old
oldnotes
README.old

Multiple wildcard characters can be used. In this example, the ls command listed three files that had old somewhere within the file name.

> ls hmwork?
hmwork1
hmwork2
hmwork3
hmwork4

A question mark stands for exactly one character, which can be any character. In this example, the ls command listed four files that started with hmwork and then had a single character following.

> ls hmwork[2-4]
hmwork2
hmwork3
hmwork4

Square brackets stand for one character within the list or range shown. In this example, the ls command listed three files that started with hmwork and then had a single character following in the range of 2 to 4. This range might have been a to z (including all lower case letters), or N to m (which, in the traditional ASCII ordering, includes the second half of the upper case letters and the first half of the lower case letters).

Any of these wildcard characters can be used multiple times, and in combination with each other.
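A safe way to experiment is to use echo instead of ls or rm. The shell replaces each wildcard pattern with the matching file names before the command runs, so echo simply prints what any other command would have received. The sketch below recreates the file names from the examples above in a throwaway directory made with mktemp (an assumption).

```shell
scratch=$(mktemp -d)
cd "$scratch"
touch README README.old hmwork1 hmwork2 hmwork3 hmwork4 hmwork1.old oldnotes

echo R*              # README README.old
echo hmwork?         # hmwork1 hmwork2 hmwork3 hmwork4
range=$(echo hmwork[2-4])
echo "$range"        # hmwork2 hmwork3 hmwork4
echo hmwork[13]      # a list instead of a range: hmwork1 hmwork3

# Wildcards combined: exactly one character after "hmwork", then ".o",
# then zero or more of any character.
combined=$(echo hmwork?.o*)
echo "$combined"     # hmwork1.old

# Clean up.
cd /
rm -r "$scratch"
```

Previewing a pattern with echo is especially worthwhile before using it with rm, since a pattern that matches more than intended will remove more than intended.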

Home Directory Abbreviation: The Tilde (~)

As configured for new SSCC users, Linux allows you to use the tilde (~) as an abbreviation for your home directory. In any command where you want to specify your home directory, you may use the tilde instead. For example:

> cd ~/data
> ls ~

The user changes to the data subdirectory of her home directory and then lists the contents of her home directory.

The tilde followed immediately by a user's login name is an abbreviation for that user's home directory. For example:

> ls ~smith
> cd ~jones/sas

This will list the directory called /home/s/smith and then change to the directory called /home/j/jones/sas, provided the proper permissions are set on the directories.

Path Abbreviations: The . and ..

Two other abbreviations, the .. and the . are shortcuts that can save you keystrokes. .., also called dot-dot, can be used to refer to the directory up one level from the current directory. For example:

> pwd
/home/g/guest12/homework
> cd ..
> pwd
/home/g/guest12
> cd ..
> pwd
/home/g
> cd
> pwd
/home/g/guest12 

Each cd .. command moved the present working directory up one level. The cd command without a parameter moved the present working directory back to the home directory, as we saw before.

., also called dot, is a shortcut used to refer to the current directory. For example:

> mv /project/sandefur/wave9/ameier/2003/readme.new .

moves the file readme.new from the location specified to the user's current working directory.
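Both abbreviations can be tried together in a throwaway directory. In the sketch below the directory names project and data are invented, mktemp and mkdir -p (which creates any missing parent directories) are assumptions, and cp is used rather than mv so that the original file stays in place.

```shell
scratch=$(mktemp -d)
mkdir -p "$scratch/project/data"
echo "notes" > "$scratch/project/readme.new"

cd "$scratch/project/data"
cp ../readme.new .           # copy from the parent directory into this one
ls                           # readme.new

cd ..                        # up one level, to project
here=$(basename "$(pwd)")
echo "now in: $here"

# Clean up.
cd /
rm -r "$scratch"
```

Here ../readme.new means "the file readme.new in the directory one level up", and the final dot means "put the copy right here".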

Rerunning Commands and Editing the Command Line

As configured for new SSCC users, Linux allows users to edit the command line. This can range from simply rerunning the previous command to modifying the command currently on the screen. Editing is performed using the arrow keys. Use the up arrow to display previous commands. Each press of the up arrow key steps backwards through the list of previous commands. When you find the command that you want to rerun, simply press Enter. If you go past the command, use the down arrow to step forward through commands.

If you find a command that you want to rerun, but it is slightly off, use the left and right arrows to move across the command line, use the backspace key to remove a character, and add any character you wish. When the command is properly displayed, press Enter to execute the command.

The exclamation point can also run a previous command. Type an exclamation point followed by the first letters of a command and the last command that began with those letters will be rerun. For example:

> !emacs

This will run the last emacs command. This might be quite useful if, for instance, the last emacs command was something like:

> emacs ~jones/progs/oldstuff/dissert.dat

Getting Help

On-line help is available on Linux through the command called man, which is short for manual pages. The man command displays reference pages on the screen. These pages can be terse and hard to follow. If you do not understand a reference page, contact SSCC's help desk for assistance.

If you don't know exactly what command you need to use, you can find a command using the -k option to the man command. The -k option searches for key words in the NAME section of the man page. For example:

> man -k compare

will list on the screen Linux commands that can be used to compare files.

In Case of Emergency: What to Try When Things Go Wrong

Sometimes the system just stops working properly for no reason apparent to the new user. When this happens, here are a few keystrokes that might help you.

The <Ctrl-C> keystroke is the interrupt command. It should cancel the current operation and return the prompt to the screen.

The <Ctrl-S> keystroke stops items from displaying on the screen temporarily. This is not useful to a beginning Linux user, but users may accidentally type this, perhaps when intending to type an upper case S. The <Ctrl-Q> keystroke will override the <Ctrl-S> keystroke, allowing the screen to begin displaying again.

Sometimes, the computer is taking input and waiting for the end of the input. A <Ctrl-D> is the end of file (or end of input) character. Type this keystroke if the system is awaiting input from you and you have given it all the input. This may happen when, for instance, you use the cat command, but forget to give the file name. The system will wait for you to type in what you want printed to the screen. It will take as many characters as you can type, including returns, and will not return the prompt to you until it gets the end of file character, the <Ctrl-D>.

4. Managing Disk Space

In this section you will learn about the disk space available to you at SSCC and how to manage it.

Categories of Disk Space

SSCC provides two categories of storage space for individual users: home directory space and short term disk space. Both are described in the SSCC Member's Handbook, including quotas and backup policies.

If you are working on a research project with a group of people, we can provide you with separate storage space on Windows or Linux that you can all share. If you'd like project space you may fill out the online form. If you need your account added to a research project space, ask the person who set up the project (usually a faculty member) to contact SSCC's Help Desk on your behalf.

Please help keep costs down by using disk space wisely:

  • Compress large files.
  • Remove unneeded files.
  • Move files to project disks, if appropriate.
  • Do not make copies of standard data files archived by CDE or other agencies or individuals.

Determining How Much Disk Space You are Using

To determine how much disk space you are using, use the quota command. For example:

> quota  
Disk quotas for user rdimond (uid 1931):
     Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
griffon:/home/t  936904  1024000 1024000            8119       0       0

The column labeled "blocks" shows the amount of disk space you are using, in kilobytes. The first quota column shows your current disk quota.

Often, this is not sufficient information. You want to know specifically which directories are using the disk space. To determine this, use the du command, which will tell you how many kilobytes are in each of your subdirectories. For example:

> du -k ~
29414 /home/s/somerset/data
8 /home/s/somerset/News
240 /home/s/somerset/Stuff
224 /home/s/somerset/Personal/gifs
77 /home/s/somerset/Personal/letters
2329 /home/s/somerset/Personal
164 /home/s/somerset/docs/reqs
703 /home/s/somerset/docs/faqs
13 /home/s/somerset/docs/tmp
42 /home/s/somerset/docs/soc361
1569 /home/s/somerset/docs/soc365
339 /home/s/somerset/docs/olddocs/homework
19878 /home/s/somerset/docs/olddocs
9049 /home/s/somerset/docs/travel
35343 /home/s/somerset/docs
202 /home/s/somerset/jobsearch/apps/old
221 /home/s/somerset/jobsearch/apps
238 /home/s/somerset/jobsearch
8336 /home/s/somerset/saslib
155 /home/s/somerset/practice
80024 /home/s/somerset

This user is using 80 MB of disk space. Most of the disk space usage is in the docs subdirectory, particularly in the olddocs subdirectory of the docs directory. Also, a lot of disk space is being used by the data directory.

You can also get a complete listing of the sizes of all files using the -a option to the du command. For example, below might be the output of the du -ak command, after the output has been sorted (numerically, and in descending order) and the first ten lines requested (the head command):

> du -ak ~ | sort -n -r | head
80024 /home/s/somerset
35343 /home/s/somerset/docs
29414 /home/s/somerset/data
19878 /home/s/somerset/docs/olddocs
11088 /home/s/somerset/docs/olddocs/thesis
9049 /home/s/somerset/docs/travel
8336 /home/s/somerset/saslib
7712 /home/s/somerset/data/brazil
6208 /home/s/somerset/saslib/course.ssd04
5264 /home/s/somerset/docs/olddocs/diagrams

This output includes both files and directories. A comparison with the output from the du -k, above, shows that the largest files are ~somerset/docs/olddocs/thesis, ~somerset/data/brazil, ~somerset/saslib/course.ssd04, and ~somerset/docs/olddocs/diagrams. In the interest of conserving disk space, user somerset may want to delete or compress some of these files.
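When the full du listing is long, a one-line total per directory can be easier to read. The sketch below is one way to get it, using the -s option of du to print a single total for each argument. It builds a small scratch directory so it can be run safely (the names docs, data, big.dat and small.dat are made up for illustration), and it is written for sh or bash rather than tcsh.

```shell
# Build a hypothetical scratch directory so no real files are touched.
demo=$(mktemp -d)
mkdir "$demo/docs" "$demo/data"
dd if=/dev/zero of="$demo/docs/big.dat" bs=1024 count=200 2>/dev/null
dd if=/dev/zero of="$demo/data/small.dat" bs=1024 count=20 2>/dev/null

# -s prints one total per argument, -k reports the totals in kilobytes.
# sort -n -r puts the largest directory first.
report=$(du -sk "$demo"/* | sort -n -r)
echo "$report"

# Name of the directory using the most space (second column of line one).
largest=$(echo "$report" | head -n 1 | awk '{print $2}')
echo "Largest: $largest"

rm -rf "$demo"
```

In your own account the equivalent command would be du -sk ~/* | sort -n -r | head.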

To determine the amount of disk space available on a project disk, use the df command. For example, if you own a directory called /project/irp/bozeman, you can determine the total amount of free space by running this df command:

> df -k /project/irp/bozeman
Filesystem 1024-blocks Used Available Capacity Mounted on
irp1#irp 8220960 1692974 6507568 21% /project/irp

In this example, about 6.5 GB of disk space is available. Again, the units are kilobytes, which was requested when the -k flag was used.

Compressing Large Files

A good way to save disk space is to compress files. A compression savings rate of 75% is typical and even 95% is achievable, particularly for ordinary data files.

Two compression programs are commonly used on Linux: compress and gzip. The syntax for both is basically the same: issue the command, followed by the name of the file you wish to compress. The -v option is useful, as the compression commands will tell you the percentage of file space you saved by compressing the file. For example:

> compress -v vt20.alpha.tar
vt20.alpha.tar:Compression:74.18% - replaced with vt20.alpha.tar.Z

or

> gzip -v vt20.alpha.tar
vt20.alpha.tar: 89.2% -- replaced with vt20.alpha.tar.gz

The compression commands change the names of the files: compress adds a ".Z" suffix and gzip adds a ".gz" suffix.

To uncompress files, use the commands uncompress or gunzip:

> uncompress vt20.alpha.tar.Z

or

> gunzip vt20.alpha.tar.gz

The compressed file will be replaced by an uncompressed file without the suffix.

Once compressed, files can be uncompressed and then used. However, it is inefficient, both with respect to SSCC computing resources and your time, to constantly uncompress and then recompress files, particularly large data files. There are two ways to use compressed files without uncompressing them. First, some data analysis programs allow you to read in compressed data. Second, some programs that cannot use compressed data can read data from a special type of file called a named pipe.
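A named pipe can be sketched with standard tools. In the example below, mkfifo creates the pipe, gunzip -c writes the uncompressed data into it in the background, and wc stands in for a program that can only read from a file name. The file names are hypothetical, and the syntax assumes sh or bash rather than tcsh.

```shell
demo=$(mktemp -d)
printf 'a\nb\nc\n' | gzip > "$demo/data.txt.gz"

# Create the named pipe. It looks like a file but stores no data on disk.
mkfifo "$demo/data.pipe"

# Writer: uncompress into the pipe in the background.
gunzip -c "$demo/data.txt.gz" > "$demo/data.pipe" &

# Reader: a program that opens the pipe by name sees the uncompressed data.
lines=$(wc -l "$demo/data.pipe" | awk '{print $1}')
wait
echo "$lines lines read"

rm -rf "$demo"
```

The uncompressed data passes through the pipe and is never written to disk, which is the point of the technique for large data files.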

Programs such as SAS, SPSS, and STATA allow data to be read from the output of commands. Using the zcat command or the gunzip -c command, the compressed file can be printed to standard output so that software programs can read the files. For instructions on how to use compressed data with commercial software programs, see SSCC Knowledge Base articles on the use of these programs available on SSCC's web site.
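As a minimal illustration of printing a compressed file to standard output, the sketch below compresses a small made-up file (survey.csv is a hypothetical name) and then reads it through a pipe with gunzip -c. The wc command stands in for the program that reads the data. The variable syntax assumes sh or bash rather than tcsh.

```shell
demo=$(mktemp -d)
printf 'id,income\n1,52000\n2,48000\n3,61000\n' > "$demo/survey.csv"
gzip "$demo/survey.csv"          # replaces survey.csv with survey.csv.gz

# gunzip -c sends the uncompressed data to standard output and leaves
# the .gz file untouched, so the next command can read it from a pipe.
rows=$(gunzip -c "$demo/survey.csv.gz" | wc -l | tr -d ' ')
echo "$rows"

# The compressed file is still the only copy on disk.
remaining=$(ls "$demo")
echo "$remaining"

rm -rf "$demo"
```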

5. Choosing the Proper Linux Computer

In addition to the three Linstat servers, SSCC also has a Condor Flock and High Performance Computing cluster for running large jobs. When selecting a Linux computer on which to run a job, you must consider which machines have the software that you want to use and which machines have the computing resources necessary for your project. Visit our Computing Resources at the SSCC web page for details.

Condor

SSCC has a cluster of Linux servers for running large STATA, SAS, R, MatLab, Fortran, and C/C++ programs. This cluster has a powerful batch pooling utility installed called Condor which was developed at UW-Madison's Computer Science Department. For more information on Condor, refer to the SSCC Knowledge Base article, An Introduction to Condor.

High Performance Computing Cluster

The SSCC has a High Performance Computing cluster called FLASH. See Using the SSCC's High Performance Computing Cluster for instructions on using these machines. If you have parallelized C/C++, Fortran, or R programs you'd like to run on this cluster, please contact Ryan Horrisberger.

Software

Almost all the software installed on Linstat is installed on all three Linstat servers. The two exceptions (due to licensing restrictions) are SPSS and Stat/Transfer, which are installed only on Linstat1. If you run SPSS or Stat/Transfer on another Linstat server they will automatically connect to Linstat1 and run your job there, but if you need to manage that job later you'll need to log in to Linstat1 to do so.

Software availability information for all of SSCC's computers can be found on SSCC's Software Availability web page.

CPU Power

The three Linstat servers have very similar processors. However, for large jobs that will take more than a few minutes to run, Condor is ideal. Please see An Introduction to Condor.

System Load

If you are going to use the computer intensively, for a STATA program, for example, then you should look for a machine that is not busy. There are several ways to determine if a machine is busy, and, if it is busy, what it is doing. Going to SSCC's Server Status web page is the easiest way to get a quick snapshot of how busy a system is.

The Linux operating system provides its own set of commands to get the same information. For example:

> uptime
13:33  up 1 day,  2:34,  4 users,  load average: 3.36, 3.31, 3.47

The uptime command tells the current time, the length of time the computer has been running (in this example, one day, two hours, 34 minutes), how many users are currently logged onto the system and the load average for the past one, five, and 15 minutes. The load average is the average number of jobs waiting to run over the particular time increment. The higher the number, the busier the system. A Linstat server is busy if its load average exceeds four and is very busy if its load average exceeds six.
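The thresholds above can be written down as a small helper. This is only a sketch: classify_load is a made-up function name, not an SSCC or Linux command, and it is written for sh or bash rather than tcsh. It takes a load average as its argument and uses awk for the decimal comparison.

```shell
# classify_load is a hypothetical helper, not an SSCC or Linux command.
# It applies the thresholds given above: over 4 is busy, over 6 is very busy.
classify_load() {
    awk -v load="$1" 'BEGIN {
        if (load + 0 > 6)      print "very busy"
        else if (load + 0 > 4) print "busy"
        else                   print "not busy"
    }'
}

classify_load 3.36    # the one-minute figure from the uptime example
classify_load 4.50
classify_load 6.20
```

On Linux the same three load averages can also be read from the file /proc/loadavg.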

To find out how busy Condor is, use the condor_status command.

Another excellent command for monitoring system activity is the top command. The top command lists jobs currently running, ordered by CPU usage, with the command using the greatest amount of CPU time on top of the list. The output of the top command looks like this:

 load averages:   0.16,  0.24,  0.23                       15:33:03
94 processes:  1 running, 1 waiting, 15 sleeping, 75 idle, 2 stopped
Cpu states: 10.0% user,  0.0% nice,  7.9% system, 82.0% idle
Memory:Real:471M/767M act/tot Virtual:16M/2243M use/tot Free: 181M

  PID USERNAME PRI NICE  SIZE   RES STATE   TIME    CPU COMMAND
 9124 esimpson  42    0 8192K 1327K WAIT    4:27 10.50% sas
10235 odrucker  42    0 7736K 4128K sleep   0:03  1.80% stata
 9387 mcdermot  44    0 2504K  393K run     0:00  0.40% top
  896 root      44    0 1704K  229K sleep   0:01  0.10% telnetd
   77 root      42    0 1600K   57K sleep  19:30  0.00% update
  488 root      44    0 1728K  122K sleep   0:38  0.00% snmpd
  365 root      44    0 2032K  335K sleep   0:22  0.00% rpc.lockd
  561 root      44    0 1992K  106K sleep   0:17  0.00% httpd
 8463 swald     42    0 4488K  180K sleep   0:12  0.00% xterm
  484 root      44    0 2432K  204K sleep   0:11  0.00% os_mibs
    1 root      44    0  440K   40K sleep   0:07  0.00% init
32490 root      44    0 1704K   40K sleep   0:05  0.00% telnetd
  150 root      44    0 1656K  122K sleep   0:02  0.00% syslogd
  452 root      32  -12 2072K  270K sleep   0:02  0.00% xntpd
 8459 mcdermot  44    0 4464K  729K sleep   0:01  0.00% xterm

The listing is updated every few seconds. The load averages are on the first line. The second line gives the total number of processes (94 in this example) and how many are in each state. The third line shows the percent of time the CPU is spending in various modes. The most important item in this line is the idle percentage. If the idle percentage is well above zero, as it is here, then the computer is not busy. The fourth line shows how much memory is in use.

The table at the bottom is the most interesting part of top output. It lists jobs that are currently running. In this snapshot, user esimpson is running SAS, using about 10% of CPU time. User odrucker is running Stata. He is taking about 2% of the CPU time. The top command is taking about half a percent and other commands are taking trivial amounts.

Because the Linstat servers have multiple CPUs, the percent of CPU used may total as much as 800%. Enter q to exit the top command.

6. Running Jobs

Any time you give Linux something to do, you've created a job. Of course many Linux commands execute almost instantly (cd, ls, etc.), but others may run for hours, days, or even longer. In these cases, how a job is run will impact both what you can do and how the system performs for all other users. The SSCC's Linux servers are a shared resource, and it is up to each member to share nicely.

Command Input and Output in Linux

Linux was designed to have many tools that do specialized tasks. In the Linux model, data flows from one command to another command, each command doing what it does best. To implement this model, every Linux command has three files associated with it. These files are called:

  • standard input
  • standard output
  • standard error

Standard input is the place from which commands get their data. By default, this is the keyboard. Standard output is the place that commands put their output. By default, this is the screen. Standard error is the place that commands put their error messages. By default, this is the screen, also. But it is important to note that standard output and standard error are not the same thing. It just happens that, by default, they send data to the same place. Collectively, these are called standard input and output, or standard I/O, abbreviated stdio.

Stdio Elements     Default     Abbreviation
standard input     keyboard    stdin
standard output    screen      stdout
standard error     screen      stderr

Standard I/O can be redirected so that it comes from, or goes to, any place. Standard input can come from the keyboard, or from a file, or from another command. Standard output can be sent to the screen, or to a file, or into another command (as standard input to that command). This is the power of the standard I/O system.

The symbols used to redirect standard I/O are:

>      redirect stdout from command to a file
>>     redirect stdout from command to a file, appending
>&     redirect stdout and stderr from command to a file
>>&    redirect stdout and stderr from command to a file, appending
<      redirect stdin from file to a command
|      pipe the stdout of one command into the stdin of another command
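To see that standard output and standard error really are separate streams, it helps to send them to different files. The symbols listed above are those of the tcsh shell, which has no simple way to redirect stderr alone. The sketch below therefore assumes sh or bash, where 2> redirects only stderr. The file names are hypothetical.

```shell
demo=$(mktemp -d)
touch "$demo/exists.txt"

# ls lists one real file and complains about one missing file.
# > captures the listing (stdout); 2> captures the complaint (stderr).
# "|| true" only keeps the nonzero exit status of ls from stopping a script.
ls "$demo/exists.txt" "$demo/missing.txt" > "$demo/out.txt" 2> "$demo/err.txt" || true

listing=$(cat "$demo/out.txt")
errors=$(cat "$demo/err.txt")
echo "stdout: $listing"
echo "stderr: $errors"

rm -rf "$demo"
```

The listing file holds only the name of the file that exists, and the error file holds only the message about the missing one.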

 

Redirection of Standard Output

One of the most common ways to manipulate standard I/O is to redirect standard output from a command into a file. For example, if you want to save a long listing of one of your directories, you can do this:

% ls -l Documentation > doc.list

The "greater than" sign (>) redirects data from the ls command to a file called doc.list. Without the redirection, the listing would appear on the screen, but with the redirection, the command only returns the prompt, with no listing. If the file doc.list already exists, then it will be overwritten by the data from the ls command. To append data to the file, instead of overwriting the current data, use two "greater than" signs:

% ls -l Documentation > doc.list
% ls -l Programs >> doc.list

In this example, the first command redirected the listing of the Documentation directory into the file called doc.list, creating a new file or overwriting an existing file. Then, the second command appended the listing of the Programs directory into the doc.list file. The doc.list file contains listings for both directories, now.
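The difference between overwriting and appending is easy to check for yourself. The sketch below uses echo in place of ls so the result is predictable, and writes to a scratch directory so that no real doc.list file is touched. The > and >> symbols behave the same way in tcsh, but the variable syntax here assumes sh or bash.

```shell
demo=$(mktemp -d)

echo "first"  >  "$demo/doc.list"   # creates the file
echo "second" >  "$demo/doc.list"   # overwrites: "first" is gone
echo "third"  >> "$demo/doc.list"   # appends after "second"

contents=$(cat "$demo/doc.list")
echo "$contents"

rm -rf "$demo"
```

The file ends up with two lines, "second" and "third".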

Redirection of Standard Input

Some commands can take information from sources other than the keyboard. They use standard input. For instance, if you wanted to mail the doc.list file to someone, you could use the Mail command to do so, instead of invoking pine or another mailer:

% Mail -s "Documentation Listing" odrucker < doc.list

In this example, the Mail command is used. Mail is sent to odrucker with the subject line "Documentation Listing" (the parameter to the -s option) and the contents of the mail message are taken from the doc.list file.
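Many other commands read standard input in the same way. As a small sketch, sort reads from standard input when it is given no file name, so < can feed it a file. The file name fruit.txt is made up, and the variable syntax assumes sh or bash rather than tcsh.

```shell
demo=$(mktemp -d)
printf 'pear\napple\nmango\n' > "$demo/fruit.txt"

# sort reads from standard input when no file name is given;
# < connects standard input to the file instead of the keyboard.
sorted=$(sort < "$demo/fruit.txt")
echo "$sorted"

rm -rf "$demo"
```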

Pipes

The most common use of redirection of standard I/O is with pipes, which take the output of one command and give it to the input of another command. Some common uses are exemplified below:

% ls -l Documentation | enscript

In this example, the listing of the Documentation directory is sent directly to the enscript command so that the file can be printed. The listing is never saved on disk or displayed on the screen.

% ls -lR | more

The -R option to ls instructs ls to recursively list all directories and subdirectories. This could lead to a very long list. In this example, the output of the ls command is piped through the more command, allowing you to read the listing one screen at a time.
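Pipes can also be chained, with each command doing one small task. The sketch below counts the files in a directory whose names end in .log. The file names are hypothetical, and the variable syntax assumes sh or bash rather than tcsh, though the pipeline itself works in either shell.

```shell
demo=$(mktemp -d)
touch "$demo/a.log" "$demo/b.dat" "$demo/c.log" "$demo/d.txt"

# ls writes one name per line when its output goes to a pipe,
# grep keeps only the names ending in .log, and wc -l counts them.
logcount=$(ls "$demo" | grep '\.log$' | wc -l | tr -d ' ')
echo "$logcount"

rm -rf "$demo"
```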

Running Jobs in the Foreground and Background

Normally when you type a command, it is processed and you see the results (if any) before the cursor returns and you can type a new command. These jobs are said to be running in the foreground, and that may be exactly what you want if your job will run very quickly or you cannot proceed until you have your results. But you can tell Linux not to wait. When you put a job in the background, the cursor returns immediately and you can keep giving commands and doing other work while your job is running. When it finishes, a message will appear on your screen.

To run a job in the background, simply add an ampersand (&) at the end of the command line. For example:

> stata -b do myprogram

Stata will start and run myprogram.do in the foreground. Thus the session will be unavailable until the job is done. On the other hand,

> stata -b do myprogram &

will start Stata in the background. The cursor returns immediately, and the user can edit other programs, organize files, etc. while waiting for the job to finish. When it is done you will see:

[1]    Done                          stata -b do myprogram
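The same idea can be used inside a script. In the sketch below, sleep 2 stands in for a long job such as the Stata run above. The special variable $! holds the process ID of the most recent background job, and wait pauses until that job is done. The example assumes sh or bash.

```shell
# sleep 2 stands in for a long job such as "stata -b do myprogram".
sleep 2 &
jobpid=$!                      # process ID of the job just started

echo "Job $jobpid is running; the shell is free for other commands."

# wait pauses until the job finishes and returns its exit status.
wait "$jobpid"
status=$?
echo "Job finished with exit status $status"
```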

Note that a job which creates a separate window (emacs, for example) will be completely functional in the background. What makes it a background process is that your shell (the main session window) is ready for more commands. On the other hand if a program without a window is running in the background and needs input from you (for example if SAS runs out of resources), it will halt until you put it in the foreground and give it the input it needs.

Note that a job running in the background will keep running even if you log out, so it is quite possible to start a long job before you leave in the evening, log out, and get the results the next morning. Remember that Linstat is actually a cluster of three servers and when you log in you're assigned to a server randomly (to try to balance the load between them). However, you can choose to connect to a specific server to monitor a job you started previously or if the server you're assigned to turns out to be particularly busy.

To switch to a different server, type:

ssh server

where server can be linstat1, linstat2 or linstat3. Alternatively you can set up your client program to log in to one of those three servers directly.

Switching Between Foreground and Background

If you have a job running in the foreground and you want to do something else, simply press CTRL-z (note that if the current job has opened a window of some sort, you must return to your shell window before pressing CTRL-z). The current job will be suspended and you will get your cursor back. If you want the job to run while you are doing other things, type bg to put it in the background. You can also type fg to move it back to the foreground, either from being suspended or from the background.

Managing Background Jobs

It can be very easy to lose track of jobs you have running in the background, but there are several commands that can tell you about them.

jobs will list all the jobs you started this session that are not yet complete. For example:

> jobs
[1] - Running emacs
[2] + Suspended emacs

The number in brackets is the job number, and you can use that number preceded by a percent sign (%) to refer to the job. Naming a job will move it to the foreground, so in this case %2 is similar to fg (except you don't have to keep track of which job is considered the "current" job). Adding an ampersand moves it to the background, so %2 & is similar to bg.

You can list jobs started in a previous session using the ps command (think processes). The syntax is ps -u username. For example:

> ps -u rdimond
PID TTY TIME CMD
29413 pts/30 00:00:00 tcsh
1601 pts/30 00:00:00 emacs
1602 pts/30 00:00:00 emacs
1605 pts/30 00:00:00 ps

Note how the bracketed numbers have been replaced by the PID (Process IDentification) and the list is more complete, including your shell (in this case the tcsh shell), and the ps command itself. Note that PID's cannot be used to move things from foreground to background. On the other hand this is the only way to check on jobs from previous sessions.

Killing a Job

Sometimes you will change your mind about a job, and occasionally things even go wrong. In these cases, the kill command can be invaluable. Simply type kill and then the job number or PID. For example:

> kill %2

or

> kill 1602

This doesn't actually stop the job. It merely requests that the job shut down, giving the program an opportunity to clean up temporary files and such. Unfortunately neither SAS nor SPSS cleans up after itself when killed, so if you kill one of these jobs, please go to the /tmp directory and manually delete all files and directories belonging to you. On the other hand, adding the -9 option to the kill command will kill a program immediately with or without its consent. Thus:

> kill -9 1602

will kill process 1602.
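The effect of kill can be observed in a short script. In this sketch, sleep 60 stands in for a job you have changed your mind about, and wait reports how the job ended. The example assumes sh or bash rather than tcsh.

```shell
# sleep 60 stands in for a long job you have changed your mind about.
sleep 60 &
pid=$!

kill "$pid"                    # polite request to stop (signal 15, TERM)

# wait reports how the job ended. A job stopped by a signal exits with
# 128 plus the signal number, so 143 here means "ended by signal 15".
status=0
wait "$pid" 2>/dev/null || status=$?
echo "exit status: $status"
```

By the same rule, a job ended with kill -9 reports a status of 137 (128 plus 9).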

Running Multiple Jobs

Linux will allow you to put as many jobs as you want in the background, and it will try to work on them all at once. This means it is quite possible for a single user to run so many jobs that everyone else is "crowded out." If necessary SSCC staff will intervene to stop this. On the other hand, Condor handles multiple jobs very efficiently and has plenty of available capacity. So if you are planning on doing any resource intensive computing, you really should check out Condor.

The general rule on the interactive (non-Condor) Linstat servers is that you should only have one major job running at a time on each server. Text editors, email, etc. are not a problem, but Stata, SAS, SPSS, and most user-written programs are resource intensive and will affect others. Keep in mind that Linux will split the available CPU time among all the running jobs. So if you run three jobs simultaneously, they will each take three times as long to run, saving you no time but making much less CPU time available for others (the one exception to this would be if the server has an idle CPU, but you shouldn't count on this).

If you have multiple jobs to run, please read SSCC's CPU Usage Policy.

Condor

Condor is designed to process large numbers of jobs. For full details please see An Introduction to Condor, but the essence of Condor is that we have a pool of Linux servers which only run jobs submitted to them through the Condor program. Unlike standard Linux jobs, Condor jobs never interfere with each other, since each job gets exclusive use of a CPU. Thus if you submit your jobs to Condor, they will not slow down the server for anyone else (or be slowed down by anyone else).

The price is that it takes about 30 seconds for Condor to process a job and assign it to a machine. Thus if you are running a 20 second job and will be waiting for the results, it would be counterproductive to use Condor. But if you have many jobs to run, or a single big job, Condor is a great tool. It's not quite a panacea since it can only be used for Stata, R, MatLab, and most user-written C/C++ and FORTRAN code, but that covers the bulk of the computing done at the SSCC.

We have written several scripts which make submitting Stata jobs to Condor almost identical to running them as usual. The standard command for running a Stata do file in batch mode is stata -b do dofile (where dofile would be replaced by the name of the do file you want to run). To submit the job to Condor instead, simply replace stata with condor_stata:

> condor_stata -b do dofile

condor_stata is the command you'll normally use. It will send your job to a multi-processor machine if one is available, but if not it will send your job to the first available machine.

If you want to run programs other than Stata using Condor, or want to submit many jobs at once, please see An Introduction to Condor.

Scripts

Consider the following two scripts. Both run three SAS jobs. The bad script will tie up the server it is run on. The good script will not, and it will finish in about the same amount of time:

Bad Script:

sas prog1 &
sas prog2 &
sas prog3 &

Good Script:

sas prog1
sas prog2
sas prog3

The bad script places all three jobs in the background, so they all run at the same time and compete for resources. The good script runs them in the foreground, so they will run one at a time. However you do not need to wait for them: simply run the script itself in the background and your shell will be available for other work.
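Running the good script itself in the background might look like the sketch below. The script name runall.sh and its log file are hypothetical, and the echo lines stand in for the three SAS commands so that the example can be run anywhere. It assumes sh or bash.

```shell
demo=$(mktemp -d)

# runall.sh is a hypothetical name. The echo lines stand in for
# "sas prog1", "sas prog2" and "sas prog3".
cat > "$demo/runall.sh" <<'EOF'
#!/bin/sh
echo "prog1 done"
echo "prog2 done"
echo "prog3 done"
EOF

# The jobs inside run one at a time; the script as a whole runs in
# the background, so the shell stays available.
sh "$demo/runall.sh" > "$demo/runall.log" &
wait

log=$(cat "$demo/runall.log")
echo "$log"

rm -rf "$demo"
```

The wait at the end is only there so the example can show the log. In everyday use you would simply carry on working in your shell.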

Of course if you could use Condor those three SAS programs would be run on three different CPUs and thus execute in one third the time.

Running a Job Later

The at command allows you to run a job at a time you specify. For example, you could run a big, resource intensive job at 1:00 AM when no one is likely to be on. There are several ways to use at.

If you want to just type in the job you want to run later, type

> at time

and you can then enter the command(s) at the prompt (at>). When you are done, press CTRL-D. The time parameter will understand just about any reasonable format, including at 1:00, at 1:00am, at 1am, at 13:00 (1:00pm), at noon, at midnight, or at teatime (4:00pm). Note that if you do not specify am or pm, it is assumed you are using 24-hour time.

You can also put the commands you want executed in a file. To do this type:

> at time -f file

To list the jobs currently waiting to run, type:

> atq

To remove a job, type:

> atrm job

where job is an ID obtained by listing your jobs.

Note that if you submit your jobs to Condor, they will not affect other users and will get plenty of resources no matter when you run them.

7. Summary of Commands

The table below is a quick reference for the most common Linux commands. Following the link will take you to a more in-depth explanation of the command.

Command Name            Command Description
at                      run a job at a specified time
cd                      change directory
clear                   clear the terminal screen
compress, uncompress    compress and expand files
condor_status           list the state of SSCC's Condor flock
cp                      copy files and directories
df                      report file system disk space usage
du                      estimate file space usage
enscript                print file(s)
exit                    log off
gzip, gunzip            compress and expand files
hostname                display name of computer logged into
jobs                    display status of jobs in the current session
kill                    terminate a job
lookup                  display information about UW employees and students
ls                      list directory contents
man                     display the on-line help pages
mkdir                   create a directory
more                    display a file one screenful at a time
mv                      move or rename files
ps                      display job status
pwd                     display present working directory
quota                   display disk usage and limits
rm                      remove (delete) files or directories
rmdir                   remove (delete) directories
soft                    list SSCC software availability
ssh                     remote login and remote execution of commands
top                     display top CPU processes
uptime                  tell how busy the system is

8. Other Sources of Information

Many resources are available to learn about the Linux operating system, both at SSCC and at your local book store. SSCC staff maintain numerous on-line Knowledge Base articles on Linux topics including the use of editors, such as EMACS and PICO, and use of statistical software like SAS, STATA, and SPSS. All of SSCC's Knowledge Base articles are available online at https://www.ssc.wisc.edu/sscc/pubs.

SSCC also teaches mini-courses, ranging from one-hour courses, to classes that meet for half a day, or for an hour a week for several weeks. See BROADCAST, SSCCNEWS, or SSCC's training web pages for registration and other information about these courses.

 

Last Revised: 10/26/2011