Over the past two weeks we've experienced intermittent delays in delivering mail. The mail servers were simply too busy to deliver messages as quickly as they arrived. The immediate cause was the additional steps we took to filter spam: now that SpamAssassin is looking for more clues that a given message is spam, it takes more processor time to examine each message.
However, this was a case of "the straw that broke the camel's back" as mail volume was already approaching the limit of what our mail cluster could process. Since December of last year, the total number of email messages we process on a typical day has doubled, and the total size of the email traffic has increased sixfold. Much of the increase in size is due to spammers disguising their advertisements in images so filters like SpamAssassin can't read them.
To fix these delays, we've moved the SpamAssassin program from the main mail server cluster to its own cluster of four servers. By spreading the load among four servers which do nothing but run SpamAssassin, we can process today's mail load in a timely manner so mail delivery will go back to being nearly instant. As mail volume continues to increase, we can add more servers to the SpamAssassin cluster or the main mail cluster as needed.
We are still fine-tuning the SpamAssassin cluster, so you may notice more spam in your inbox than usual and that the Not Legit folder is not getting cleaned out every night as it should. We expect to resolve these issues very soon.
SpamAssassin "learns" by keeping a database of words and how often they are used in both legitimate messages and spam messages. When you move a message that SpamAssassin failed to identify as spam into the Not Legit folder, SpamAssassin corrects its database. One message won't change it by much; in particular, it doesn't guarantee that SpamAssassin will catch the next message that looks similar to you, because the spammers are careful to mix up the words they use. But over time, having an accurate database will help. It's also the only way we can tell how well SpamAssassin is working.
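To make the idea of a word database concrete, here is a minimal sketch in Python of a word-frequency filter that "learns" from corrected messages. It is only an illustration under simplifying assumptions, not SpamAssassin's actual code: the class name WordDatabase, its methods, and the sample messages are all hypothetical, and the scoring is a plain naive Bayes estimate with add-one smoothing.

```python
import math
from collections import Counter


class WordDatabase:
    """Toy word-frequency store (hypothetical; not SpamAssassin's implementation)."""

    def __init__(self):
        self.spam_words = Counter()  # word -> number of spam messages containing it
        self.ham_words = Counter()   # word -> number of legitimate messages containing it
        self.spam_messages = 0
        self.ham_messages = 0

    @staticmethod
    def _words(text):
        # Count each word at most once per message.
        return set(text.lower().split())

    def learn(self, text, is_spam):
        """Record one message, e.g. one a user moved into a spam folder."""
        if is_spam:
            self.spam_words.update(self._words(text))
            self.spam_messages += 1
        else:
            self.ham_words.update(self._words(text))
            self.ham_messages += 1

    def spam_probability(self, text):
        """Naive Bayes estimate that a message is spam, with add-one smoothing."""
        total = self.spam_messages + self.ham_messages
        log_spam = math.log((self.spam_messages + 1) / (total + 2))
        log_ham = math.log((self.ham_messages + 1) / (total + 2))
        for word in self._words(text):
            log_spam += math.log((self.spam_words[word] + 1) / (self.spam_messages + 2))
            log_ham += math.log((self.ham_words[word] + 1) / (self.ham_messages + 2))
        # Convert the two log scores into a probability between 0 and 1.
        return 1.0 / (1.0 + math.exp(log_ham - log_spam))


db = WordDatabase()
db.learn("cheap pills online", is_spam=True)
db.learn("meeting agenda for monday", is_spam=False)

missed = "cheap watches online"
before = db.spam_probability(missed)
db.learn(missed, is_spam=True)  # the user reports the missed message as spam
after = db.spam_probability(missed)
print(f"before: {before:.2f}, after: {after:.2f}")
```

In this sketch a single correction nudges the score upward rather than settling it, which mirrors the point above: one report changes the database only a little, and the benefit comes from many accurate reports over time.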
The SSCC's Condor flock has been upgraded with all-new servers and will run jobs as quickly as any of the SSCC servers. In addition, four of the Condor servers have Stata/MP (the multi-processor edition) installed. Stata/MP will run most jobs substantially faster than Stata/SE, and Stata jobs submitted to Condor will automatically be run using Stata/MP if available. This makes Condor the fastest way to run a Stata job at the SSCC. (We also have a single license for 64-bit Stata/MP on FALCON, if you need both speed and large amounts of memory.)
To submit a Stata job to Condor, log in to KITE and replace the standard "stata -b do {file}" with "condor_stata -b do {file}" (the command itself has no trailing period). To submit other jobs, type "condor_do {job}". For more details, see "An Introduction to Condor."
If you haven't already, please remember to set your security questions. If you forget your password and have your security questions set, you can easily reset your password yourself. If you haven't set them, SSCC staff will have to reset it for you. Unfortunately, we're not here 24 hours a day, and we can't take requests to reset passwords by phone or email because we cannot verify your identity, so you'll have to stop by 4226 Sewell Social Sciences Building (and please bring some form of photo ID). Thus the further you are from the Sewell Building, the more important it is that you set your security questions!
SSCC's Fall training schedule is underway. Check out our remaining offerings on SSCC's training web pages. In October we are offering "Introduction to Parallel Computing," "An Introduction to SAS Data Steps," "A Hands-On Introduction to NVivo," "Social Science Article Databases Overview," and "RSS and Alert Services." Remember that all SSCC training sessions require preregistration.