Tutorial For Lpi Exam 201: Part 7

Topic 213: System Customization and Automation


David Mertz, Ph.D.
Professional Neophyte
August, 2005

Welcome to "System Customization and Automation", the seventh of eight tutorials designed to prepare you for LPI exam 201. In this tutorial you will learn some basic approaches to scripting and automating system events such as report and status generation, cleanup and general upkeep.

Before You Start

About this series

The Linux Professional Institute (LPI) certifies Linux system administrators at junior and intermediate levels. There are two exams at each certification level. This series of eight tutorials helps you prepare for the first of the two LPI intermediate level system administrator exams--LPI exam 201. A companion series of tutorials is available for the other intermediate level exam--LPI exam 202. Both exam 201 and exam 202 are required for intermediate level certification. Intermediate level certification is also known as certification level 2.

Each exam covers several or topics and each topic has a weight. The weight indicate the relative importance of each topic. Very roughly, expect more questions on the exam for topics with higher weight. The topics and their weights for LPI exam 201 are:

Topic 201: Linux Kernel (5) Topic 202: System Startup (5) Topic 203: Filesystems (10) Topic 204: Hardware (8) Topic 209: File Sharing Servers (8) Topic 211: System Maintenance (4) Topic 213: System Customization and Automation (3) Topic 214: Troubleshooting (6)

About this tutorial

Welcome to "System Customization and Automation", the seventh of eight tutorials designed to prepare you for LPI exam 201. In this tutorial you will learn some basic approaches to scripting and automating system events such as report and status generation, cleanup and general upkeep.

Prerequisites

To get the most from this tutorial, you should already have a basic knowledge of Linux and a working Linux system on which you can practice the commands covered in this tutorial.

About customization and automation

One of the task categories a system administrator needs to perform is the automation of various things that should occur either periodically, or with minimal fuss when a recurrent need arises. For automatic scheduling, your primary tools are cron and at. Tasks themselves, whether regularly scheduled or manually launched can be scripted with various languages, including bash, awk, perl or python. Tools in the GNU text utilities are very frequently useful as part of many processing tasks; these are most often used within bash scripts, since more sophisticated languages like awk, perl, and python build in most of the capabilities in the text utilities.

Scheduling Periodic Tasks

Configuring cron

The daemon cron is used to run commands periodically. You can use cron for all manner of system housekeeping and administration. Anything you want to happen repeatedly on a schedule should be controlled by cron. cron has a granularity of one minute--that is, it wakes up once a minute to check if it needs to do anything, but cannot peform tasks more than once per minute (if you want to do that, you probably want a daemon of some sort, not a "cron job"). cron logs its action to the syslog facility.

There are several places where cron searches for configuration files that indicate environment settings and commands to run. The first place is in /etc/crontab. These are "system" tasks. As well, the directory /etc/cron.d/ may contain multiple configuration files that are treated as supplements to /etc/crontab. Special packages may add files (matching package name) to /etc/cron.d/, but system administrators should use /etc/crontab.

User-level cron configurations are stored in /var/spool/cron/crontabs/$USER. However, these should always be configured using the tool crontab. Using crontab, users can schedule their own recurrent tasks.

Daily, weekly and monthly jobs

A special convention is used to jobs that should be run daily, weekly or monthly, rather than on other more complex schedules. These are probably the most common schedules in practice. Directories called /etc/cron.daily/, /etc/cron.weekly/ and /etc/cron.monthly/ are created with collections of scripts to run on those respective schedules. Adding or removing scripts from these directories is a simple way to schedule system tasks. For example, a system I maintain rotates logs daily simply by having a script file:

$ cat /etc/cron.daily/logrotate
#!/bin/sh
test -x /usr/sbin/logrotate || exit 0
/usr/sbin/logrotate /etc/logrotate.conf

cron and anacron

anacron can be used to execute commands periodically, with a frequency specified in days. Unlike cron, anacron checks whether each job has been executed in the last N days, where N is the period specified for that job (as opposed to whether the current time matches the scheduled execution). If not, anacron runs the job�s command, after waiting for the number of minutes specified as the delay parameter. Hence, on machines that are not running continuously periodic jobs will still be executed once the machine is actually running (obviously, the exact timing might vary, but the task will not be forgotten).

anacron reads a list of jobs from the configuration file /etc/anacrontab. Each job entry specifies a period in days, a delay in minutes, a unique job identifier, and a shell command. For example, on one Linux system I maintain, anacron is used to run daily, weekly and monthly jobs even if the machine is not running at the scheduled time of day:

$ cat /etc/anacrontab
# /etc/anacrontab: configuration file for anacron
SHELL=/bin/sh
PATH=/sbin:/bin:/usr/sbin:/usr/bin
# These replace cron's entries
1         5  cron.daily    nice run-parts --report /etc/cron.daily
7        10  cron.weekly   nice run-parts --report /etc/cron.weekly
@monthly 15  cron.monthly  nice run-parts --report /etc/cron.monthly

The contents of a crontab

The format of /etc/crontab (or the contents of /etc/cron.d/ files) is slightly different from user crontab files. Basically, this just amounts to an extra field in /etc/crontab that indicates the user a command runs as. This is not needed for user crontab files since they are already stored in a file matching username (/var/spool/cron/crontabs/$USER).

Each line of /etc/crontab either sets an environment variable or configures a recurring job. Comment and blank lines are ignored. For "cron jobs", the first five fields specify times to run, where each zero-based field may have a list and/or a range. The fields are minute, hour, day-of-month, month, day-of-week (space or tab separated). An asterisk in any position indicates "any". For example, to run a task at midnight on Tuesdays and Thursdays during August through October, you could use:

# line in /etc/crontab
0 0 * 7-9 2,5  root /usr/local/bin/the-task -opt1 -opt2

Special scheduling values

Some common scheduling patterns have shortcut names that may be used in place of the first five fields:

@reboot        Run once, at startup.
@yearly        Run once a year, "0 0 1 1 *".
@annually      (same as @yearly)
@monthly       Run once a month, "0 0 1 * *".
@weekly        Run once a week, "0 0 * * 0".
@daily         Run once a day, "0 0 * * *".
@midnight      (same as @daily)
@hourly        Run once an hour, "0 * * * *".

For example, you might have a configuration containing:

@hourly       root /usr/local/bin/hourly-task
0,29 * * * *  root /usr/local/bin/twice-hourly-task

Using crontab

To setup a user-level scheduled task, use the crontab command (as opposed to the /etc/crontab file. Specifically, crontab -e launches and editor to modify a file. You can list current jobs with crontab -l, and remove the file with crontab -r. Optionally, you can specify crontab -u user to schedule tasks for a given user, but the default is to do so for yourself (permission limits apply).

The file /etc/cron.allow, if present, must contain the names of all user allowed to schedule jobs. Alternately, if there is no /etc/cron.allow then a user must not be in /etc/cron.deny if they are allowed to schedule tasks (if neither file exists, everyone may use crontab).

Scheduling One-time Tasks

at and friends

If you need to run a task in the future rather than immediately, you can use the command at. The command at takes a command either from STDIN or from a file (using the -f option), and accepts time descriptions in a flexible collection of formats.

A family of commands are used in association with at itself'. atq lists pending tasks. atrm removes a task fro the pending queue. batch works much like at except it only runs jobs if the system load is low when the job is requested, otherwise deferring run until a low system load exists.

Permissions

Much like with /etc/cron.allow and /etc/cron.deny, at has /etc/at.allow and /etc/at.deny to configure permissions for running at. The file /etc/at.allow, if present, must contain the names of all user allowed to schedule jobs. Alternately, if there is no /etc/at.allow then a user must not be in /etc/at.deny if they are allowed to schedule tasks (if neither file exists, everyone may use crontab).

Time specifications

See the manpage on your at version for full details. You can specify a particular time as HH:MM, which will happen the next time that time occurs (if it is passed today, it means tomorrow). If you use 12 hour time, you may add AM or PM. You may give a date as MMDDYY or MM/DD/YY or DD.MM.YY or month-name day. You may increment from the current time with now + N units, which N is a number and units is minutes, hours, days or weeks. The words today and tomorrow have their obvious meaning; as do midnight and noon (teatime is 4pm). Some examples:

% at -f ./foo.sh 10am Jul 31
% echo 'bar -opt' | at 1:30 tomorrow

The exact definition of the time specification can be found in /usr/share/doc/at/timespec.

Scripting Tips

Outside resources

This relatively short tutorial cannot really touch on the ins-and-outs of even one major scripting language. A number of excellent books have been written on each of Awk, Perl, Bash, Python. The author, however, might particularly recommend his own Text Processing in Python as a good starting point for scripting in Python. Most scripts you will write for system administration are aimed at text manipulation: extracting values from logs and configuration files; generating reports and summaries; but also cleaning up system cruft and sending notifications of tasks performed.

The most common scripts in Linux system administration are written in bash. However, bash itself has relatively few built-in capabilities. Instead, bash makes it particularly easy to utilize external tools, both basic file utilities like ls, find, rm, cd, and the like; but especially text tools like those found in the GNU text utilities. A good introduction to these utilities can be found in "Using the GNU text utilities" (http://www-128.ibm.com/developerworks/edu/l-dw-linux-gnutex-i.html).

Bash tips (part one)

One particularly helpful setting to include in bash scripts that run on a schedule is the set -x switch that echos the commands run to STDERR. This is helpful in debugging scripts if they do not seem to have the effects anticipated. Another useful option during testing is set -n that causes a script to look for syntax problems, but not actually run. Obviously, you don't want a -n version scheduled in cron or at, but to get it working in the first place, it can help.

A cron job that runs a bash script might start something like:

#!/bin/bash
exec 2>/tmp/my_stderr
set -x
# functional commands here

This redirects STDERR to a file, and outputs the commands run to STDERR. Examining that file later can be useful.

The manpage for bash is quite good, though quite long. Of particular interest are all the options that the builtin set can accept.

Bash tips (part two)

One of the common things you do in a system administration script is process a collection of files, often with the files of interest identified using the find command. However, a problem can arise when filenames contain whitespace, and especially newline characters. Much of the looping and processing of filenames you are likely to do can be confused by these internal whitespace characters. For example, these two commands are different:

% rm foo bar baz bam
% rm 'foo bar' 'baz bam'

The first unlinks four files (assuming they exist to start with); the second removes just two files, each with an internal space in the name. Filenames with spaces are particularly common among multimedia content.

Fortunately, the GNU version of the find command has a a -print0 option to NULL terminate each result; and the xargs command has a corresponding -0 command to treat arguments as NULL separated. Putting these together, you might cleanup stray files (that might contain whitespace in their names) using:

#!/bin/bash
# Cleanup some old files
set -x
find /home/dqm \( -name '*.core' -o -name '#*' \) -print0 \
 | xargs -0 rm -f

Perl taint mode

Perl has a handy switch -T to enable so-called "taint mode." In this mode, Perl takes a variety of extra security precautions, but most especially it limits execution of commands arising from external input. If you use sudo execution, taint mode might be enabled automatically, but the safest thing is to start your administration scripts with:

#!/usr/local/bin/perl �T

Once you do this, all command line arguments, environment variables, locale information (see perllocale), results of certain system calls (readdir(), readlink(), the variable of shmread(), the messages returned by msgrcv(), the password, gcos and shell fields returned by the getpwxxx() calls), and all file input are marked as "tainted". Tainted data may not be used directly or indirectly in any command that invokes a sub-shell, nor in any command that modifies files, directories, or processes, with a few exceptions.

It's possible to "untaint" particular external values by (cautiously) checking them for expected patterns, e.g.:

if ($data =~ /^([-\@\w.]+)$/) {
   $data = $1;                     # $data now untainted
} else {
   die "Bad data in �$data�";      # log this somewhere
}

Perl CPAN packages

One of the handy things about Perl is that it comes with a very convenient mechanism for installing extra support packages, called "CPAN" (Comprehensive Perl Archive Network). RubyGems is similar in function. Python, unfortunately, does not yet have as automated of an installation mechanism (but comes with more in the default installation). Simpler languages like Bash and Awk do not really have many add-ons to install in an analogous sense.

The manpage on the cpan command is a good place to get started. In general, if you have a task to perform for which you think someone might have done most of the work already, you can look for candidate modules athttp://www.cpan.org/modules/.

The tool cpan has both an interactive shell and a command-line operation. Once configured (run the interactive shell once to be prompted on configuration options), cpan handles dependencies and download locations in an automated fashion. For example, suppose I discover that I have a system administration task that involves processing configuration files in YAML format. Installing support for YAML is as simple as:

% cpan -i YAML  # maybe with 'sudo' first

Once installed your scripts can contain use YAML; at top. Similarly for whatever capabilities you need that someone has created a package for.