David Mertz, Ph.D.
Professional Neophyte
August, 2005
Welcome to "System Customization and Automation", the seventh of eight tutorials designed to prepare you for LPI exam 201. In this tutorial you will learn some basic approaches to scripting and automating system events such as report and status generation, cleanup and general upkeep.
The Linux Professional Institute (LPI) certifies Linux system administrators at junior and intermediate levels. There are two exams at each certification level. This series of eight tutorials helps you prepare for the first of the two LPI intermediate level system administrator exams--LPI exam 201. A companion series of tutorials is available for the other intermediate level exam--LPI exam 202. Both exam 201 and exam 202 are required for intermediate level certification. Intermediate level certification is also known as certification level 2.
Each exam covers several topics, and each topic has a weight. The weight indicates the relative importance of each topic. Very roughly, expect more questions on the exam for topics with higher weight. The topics and their weights for LPI exam 201 are:
Topic 201: Linux Kernel (5)
Topic 202: System Startup (5)
Topic 203: Filesystems (10)
Topic 204: Hardware (8)
Topic 209: File Sharing Servers (8)
Topic 211: System Maintenance (4)
Topic 213: System Customization and Automation (3)
Topic 214: Troubleshooting (6)
To get the most from this tutorial, you should already have a basic knowledge of Linux and a working Linux system on which you can practice the commands covered in this tutorial.
One category of task a system administrator needs to perform is the automation of various things that should happen either periodically, or with minimal fuss when a recurrent need arises. For automatic scheduling, your primary tools are cron and at. The tasks themselves, whether regularly scheduled or manually launched, can be scripted in various languages, including bash, awk, perl, or python. Tools from the GNU text utilities are frequently useful as part of many processing tasks; these are most often used within bash scripts, since more sophisticated languages like awk, perl, and python build in most of the capabilities of the text utilities.
The daemon cron is used to run commands periodically. You can use cron for all manner of system housekeeping and administration. Anything you want to happen repeatedly on a schedule should be controlled by cron. cron has a granularity of one minute--that is, it wakes up once a minute to check whether it needs to do anything, but it cannot perform tasks more often than once per minute (if you want to do that, you probably want a daemon of some sort, not a "cron job"). cron logs its actions to the syslog facility.
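To confirm that your jobs are actually being launched, you can check the syslog output. The log file names below are assumptions that vary by distribution:

% grep CRON /var/log/syslog | tail -3     # Debian-style systems
% tail -3 /var/log/cron                   # Red Hat-style systems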
There are several places where cron searches for configuration files that indicate environment settings and commands to run. The first place is /etc/crontab; these are "system" tasks. As well, the directory /etc/cron.d/ may contain multiple configuration files that are treated as supplements to /etc/crontab. Special packages may add files (matching the package name) to /etc/cron.d/, but system administrators should use /etc/crontab.
User-level cron configurations are stored in /var/spool/cron/crontabs/$USER. However, these should always be configured using the tool crontab. Using crontab, users can schedule their own recurrent tasks.
A special convention is used for jobs that should be run daily, weekly, or monthly, rather than on other, more complex schedules. These are probably the most common schedules in practice. Directories called /etc/cron.daily/, /etc/cron.weekly/, and /etc/cron.monthly/ are created with collections of scripts to run on those respective schedules. Adding or removing scripts from these directories is a simple way to schedule system tasks. For example, a system I maintain rotates logs daily simply by having a script file:
$ cat /etc/cron.daily/logrotate
#!/bin/sh
test -x /usr/sbin/logrotate || exit 0
/usr/sbin/logrotate /etc/logrotate.conf
anacron can be used to execute commands periodically, with a frequency specified in days. Unlike cron, anacron checks whether each job has been executed in the last N days, where N is the period specified for that job (as opposed to whether the current time matches the scheduled execution). If not, anacron runs the job's command, after waiting for the number of minutes specified as the delay parameter. Hence, on machines that are not running continuously, periodic jobs will still be executed once the machine is actually running (obviously, the exact timing might vary, but the task will not be forgotten).
anacron reads a list of jobs from the configuration file /etc/anacrontab. Each job entry specifies a period in days, a delay in minutes, a unique job identifier, and a shell command. For example, on one Linux system I maintain, anacron is used to run daily, weekly, and monthly jobs even if the machine is not running at the scheduled time of day:
$ cat /etc/anacrontab
# /etc/anacrontab: configuration file for anacron
SHELL=/bin/sh
PATH=/sbin:/bin:/usr/sbin:/usr/bin
# These replace cron's entries
1         5   cron.daily    nice run-parts --report /etc/cron.daily
7         10  cron.weekly   nice run-parts --report /etc/cron.weekly
@monthly  15  cron.monthly  nice run-parts --report /etc/cron.monthly
The format of /etc/crontab (or the contents of /etc/cron.d/ files) is slightly different from user crontab files. Basically, this just amounts to an extra field in /etc/crontab that indicates the user a command runs as. This is not needed for user crontab files since they are already stored in a file matching the username (/var/spool/cron/crontabs/$USER).
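For instance, the same nightly job would look like this in the two kinds of file (the command path here is purely illustrative):

# in /etc/crontab or a file under /etc/cron.d/ -- note the user field
30 4 * * * root /usr/local/bin/nightly-report
# in a user crontab (edited with crontab -e) -- no user field
30 4 * * * /usr/local/bin/nightly-report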
Each line of /etc/crontab either sets an environment variable or configures a recurring job. Comment and blank lines are ignored. For "cron jobs", the first five fields specify the times to run, where each field may contain a list and/or a range. The fields are minute (0-59), hour (0-23), day-of-month (1-31), month (1-12), and day-of-week (0-7, with both 0 and 7 meaning Sunday), separated by spaces or tabs. An asterisk in any position means "any". For example, to run a task at midnight on Tuesdays and Thursdays during August through October, you could use:
# line in /etc/crontab
0 0 * 8-10 2,4 root /usr/local/bin/the-task -opt1 -opt2
Some common scheduling patterns have shortcut names that may be used in place of the first five fields:
@reboot      Run once, at startup.
@yearly      Run once a year, "0 0 1 1 *".
@annually    (same as @yearly)
@monthly     Run once a month, "0 0 1 * *".
@weekly      Run once a week, "0 0 * * 0".
@daily       Run once a day, "0 0 * * *".
@midnight    (same as @daily)
@hourly      Run once an hour, "0 * * * *".
For example, you might have a configuration containing:
@hourly       root  /usr/local/bin/hourly-task
0,29 * * * *  root  /usr/local/bin/twice-hourly-task
To set up a user-level scheduled task, use the crontab command (as opposed to the /etc/crontab file). Specifically, crontab -e launches an editor to modify a file. You can list current jobs with crontab -l, and remove the file with crontab -r. Optionally, you can specify crontab -u user to schedule tasks for a given user, but the default is to do so for yourself (permission limits apply).
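A short session might look like the following; the scheduled script is hypothetical:

% crontab -e      # opens your personal crontab in $EDITOR
% crontab -l      # list your current jobs
0 1 * * * /home/dqm/bin/backup-home.sh
% crontab -r      # remove your crontab entirely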
The file /etc/cron.allow, if present, must contain the names of all users allowed to schedule jobs. Alternately, if there is no /etc/cron.allow, then a user must not be listed in /etc/cron.deny in order to schedule tasks (if neither file exists, everyone may use crontab).
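For example, to restrict scheduling to two particular accounts (the usernames are placeholders), root might create:

% printf 'alice\nbob\n' > /etc/cron.allow    # only alice and bob may now use crontab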
If you need to run a task in the future rather than immediately, you can use the command at. The command at takes a command either from STDIN or from a file (using the -f option), and accepts time descriptions in a flexible collection of formats.
A family of commands is used in association with at itself. atq lists pending tasks. atrm removes a task from the pending queue. batch works much like at, except that it runs jobs only when the system load is low, deferring the run until the load drops if necessary.
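A brief session illustrates the family; the job number and queue output will differ on your system:

% echo '/usr/local/bin/the-task -opt1 -opt2' | at 11:00pm
% atq
4       2005-08-01 23:00 a dqm
% atrm 4                                      # cancel pending job 4
% echo 'nice tar czf /tmp/home.tgz /home' | batch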
Much like with /etc/cron.allow and /etc/cron.deny, at has /etc/at.allow and /etc/at.deny to configure permissions for running at. The file /etc/at.allow, if present, must contain the names of all users allowed to schedule jobs. Alternately, if there is no /etc/at.allow, then a user must not be listed in /etc/at.deny in order to schedule tasks (if neither file exists, on most systems only the superuser may use at).
See the manpage for your at version for full details. You can specify a particular time as HH:MM, which means the next time that time occurs (if it has already passed today, that means tomorrow). If you use 12-hour time, you may add AM or PM. You may give a date as MMDDYY or MM/DD/YY or DD.MM.YY or month-name day. You may increment from the current time with now + N units, where N is a number and units is minutes, hours, days, or weeks. The words today and tomorrow have their obvious meanings, as do midnight and noon (teatime is 4 PM). Some examples:
% at -f ./foo.sh 10am Jul 31
% echo 'bar -opt' | at 1:30 tomorrow
The exact definition of the time specification can be found in /usr/share/doc/at/timespec.
This relatively short tutorial cannot really touch on the ins and outs of even one major scripting language. A number of excellent books have been written on each of Awk, Perl, Bash, and Python. The author, however, might particularly recommend his own Text Processing in Python as a good starting point for scripting in Python. Most scripts you will write for system administration are aimed at text manipulation: extracting values from logs and configuration files, generating reports and summaries, but also cleaning up system cruft and sending notifications of tasks performed.
The most common scripts in Linux system administration are written in bash. However, bash itself has relatively few built-in capabilities. Instead, bash makes it particularly easy to utilize external tools--both basic file utilities like ls, find, rm, cd, and the like, but especially text tools like those found in the GNU text utilities. A good introduction to these utilities can be found in "Using the GNU text utilities" (http://www-128.ibm.com/developerworks/edu/l-dw-linux-gnutex-i.html).
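As a small illustration of this style, a report script might chain a few of the text utilities in a pipeline. The log path and field layout here are only assumptions about one common format:

#!/bin/bash
# Report the ten most frequent client addresses in a web access log
# (assumes the address is the first whitespace-separated field)
cut -d' ' -f1 /var/log/apache2/access.log | sort | uniq -c | sort -rn | head -10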
One particularly helpful setting to include in bash scripts that run on a schedule is the set -x option, which echoes each command run to STDERR. This is helpful in debugging scripts that do not seem to have the effects anticipated. Another useful option during testing is set -n, which makes a script check for syntax problems without actually running. Obviously, you don't want a -n version scheduled in cron or at, but it can help while getting the script working in the first place.
A cron job that runs a bash script might start something like:
#!/bin/bash
exec 2>/tmp/my_stderr
set -x
# functional commands here
This redirects STDERR to a file, and outputs the commands run to STDERR. Examining that file later can be useful.
The manpage for bash is quite good, though quite long. Of particular interest are all the options that the builtin set can accept.
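Beyond -x and -n, a few other set options are often worth enabling near the top of administration scripts, for example:

#!/bin/bash
set -e            # exit immediately if any command fails
set -u            # treat use of an unset variable as an error
set -o pipefail   # a pipeline fails if any command in it fails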
One of the common things you do in a system administration script is process a collection of files, often with the files of interest identified using the find command. However, a problem can arise when filenames contain whitespace, and especially newline characters. Much of the looping and processing of filenames you are likely to do can be confused by these internal whitespace characters. For example, these two commands are different:
% rm foo bar baz bam
% rm 'foo bar' 'baz bam'
The first unlinks four files (assuming they exist to start with); the second removes just two files, each with an internal space in the name. Filenames with spaces are particularly common among multimedia content.
Fortunately, the GNU version of the find command has a -print0 option to NULL-terminate each result, and the xargs command has a corresponding -0 option to treat its arguments as NULL-separated. Putting these together, you might clean up stray files (that might contain whitespace in their names) using:
#!/bin/bash
# Cleanup some old files
set -x
find /home/dqm \( -name '*.core' -o -name '#*' \) -print0 \
    | xargs -0 rm -f
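If each file needs more handling than a single command, the same NULL-separated stream can feed a while loop instead of xargs. This is just a sketch of an alternative to the script above; the log path is illustrative:

#!/bin/bash
# Loop-based variant: safe even for names containing spaces or newlines
find /home/dqm \( -name '*.core' -o -name '#*' \) -print0 |
while IFS= read -r -d '' file; do
    echo "removing $file" >> /tmp/cleanup.log
    rm -f "$file"
done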
Perl has a handy switch, -T, to enable so-called "taint mode." In this mode, Perl takes a variety of extra security precautions, but most especially it limits execution of commands arising from external input. Taint mode is turned on automatically when a script runs setuid or setgid, but the safest thing is to start your administration scripts with:
#!/usr/local/bin/perl -T
Once you do this, all command line arguments, environment variables, locale information (see perllocale), results of certain system calls (readdir(), readlink(), the variable of shmread(), the messages returned by msgrcv(), the password, gcos and shell fields returned by the getpwxxx() calls), and all file input are marked as "tainted". Tainted data may not be used directly or indirectly in any command that invokes a sub-shell, nor in any command that modifies files, directories, or processes, with a few exceptions.
It's possible to "untaint" particular external values by (cautiously) checking them for expected patterns, e.g.:
if ($data =~ /^([-\@\w.]+)$/) {
    $data = $1;                   # $data now untainted
} else {
    die "Bad data in '$data'";    # log this somewhere
}
One of the handy things about Perl is that it comes with a very convenient mechanism for installing extra support packages, called CPAN (the Comprehensive Perl Archive Network). RubyGems is similar in function. Python, unfortunately, does not yet have as automated an installation mechanism (but it comes with more in the default installation). Simpler languages like Bash and Awk do not really have many add-ons to install in an analogous sense.
The manpage on the cpan command is a good place to get started. In general, if you have a task to perform for which you think someone might have done most of the work already, you can look for candidate modules at http://www.cpan.org/modules/.
The tool cpan has both an interactive shell and a command-line operation. Once configured (run the interactive shell once to be prompted on configuration options), cpan handles dependencies and download locations in an automated fashion. For example, suppose I discover that I have a system administration task that involves processing configuration files in YAML format. Installing support for YAML is as simple as:
% cpan -i YAML # maybe with 'sudo' first
Once installed, your scripts can contain use YAML; at the top. The same goes for whatever capabilities you need that someone has created a package for.
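A quick way to verify that the module is now visible to Perl is a one-liner from the shell:

% perl -MYAML -e 'print YAML::Dump({status => "installed"})'
---
status: installed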