TUTORIAL FOR LPI EXAM 201: part 7 Topic 213: System Customization and Automation David Mertz, Ph.D. Professional Neophyte August, 2005 Welcome to "System Customization and Automation", the seventh of eight tutorials designed to prepare you for LPI exam 201. In this tutorial you will learn some basic approaches to scripting and automating system events such as report and status generation, cleanup and general upkeep. BEFORE YOU START ------------------------------------------------------------------------ About this series The Linux Professional Institute (LPI) certifies Linux system administrators at junior and intermediate levels. There are two exams at each certification level. This series of eight tutorials helps you prepare for the first of the two LPI intermediate level system administrator exams--LPI exam 201. A companion series of tutorials is available for the other intermediate level exam--LPI exam 202. Both exam 201 and exam 202 are required for intermediate level certification. Intermediate level certification is also known as certification level 2. Each exam covers several or topics and each topic has a weight. The weight indicate the relative importance of each topic. Very roughly, expect more questions on the exam for topics with higher weight. The topics and their weights for LPI exam 201 are: * Topic 201: Linux Kernel (5) * Topic 202: System Startup (5) * Topic 203: Filesystems (10) * Topic 204: Hardware (8) * Topic 209: File Sharing Servers (8) * Topic 211: System Maintenance (4) * Topic 213: System Customization and Automation (3) * Topic 214: Troubleshooting (6) About this tutorial Welcome to "System Customization and Automation", the seventh of eight tutorials designed to prepare you for LPI exam 201. In this tutorial you will learn some basic approaches to scripting and automating system events such as report and status generation, cleanup and general upkeep. Prerequisites To get the most from this tutorial, you should already have a basic knowledge of Linux and a working Linux system on which you can practice the commands covered in this tutorial. About customization and automation One of the task categories a system administrator needs to perform is the automation of various things that should occur either periodically, or with minimal fuss when a recurrent need arises. For automatic scheduling, your primary tools are 'cron' and 'at'. Tasks themselves, whether regularly scheduled or manually launched can be scripted with various languages, including 'bash', 'awk', 'perl' or 'python'. Tools in the GNU text utilities are very frequently useful as part of many processing tasks; these are most often used within 'bash' scripts, since more sophisticated languages like 'awk', 'perl', and 'python' build in most of the capabilities in the text utilities. SCHEDULING PERIODIC TASKS Configuring cron The daemon 'cron' is used to run commands periodically. You can use 'cron' for all manner of system housekeeping and administration. Anything you want to happen repeatedly on a schedule should be controlled by 'cron'. 'cron' has a granularity of one minute--that is, it wakes up once a minute to check if it needs to do anything, but cannot peform tasks more than once per minute (if you want to do that, you probably want a daemon of some sort, not a "cron job"). 'cron' logs its action to the syslog facility. There are several places where 'cron' searches for configuration files that indicate environment settings and commands to run. The first place is in '/etc/crontab'. These are "system" tasks. As well, the directory '/etc/cron.d/' may contain multiple configuration files that are treated as supplements to '/etc/crontab'. Special packages may add files (matching package name) to '/etc/cron.d/', but system administrators should use '/etc/crontab'. User-level 'cron' configurations are stored in '/var/spool/cron/crontabs/$USER'. However, these should always be configured using the tool 'crontab'. Using 'crontab', users can schedule their own recurrent tasks. Daily, weekly and monthly jobs A special convention is used to jobs that should be run daily, weekly or monthly, rather than on other more complex schedules. These are probably the most common schedules in practice. Directories called '/etc/cron.daily/', '/etc/cron.weekly/' and '/etc/cron.monthly/' are created with collections of scripts to run on those respective schedules. Adding or removing scripts from these directories is a simple way to schedule system tasks. For example, a system I maintain rotates logs daily simply by having a script file: $ cat /etc/cron.daily/logrotate #!/bin/sh test -x /usr/sbin/logrotate || exit 0 /usr/sbin/logrotate /etc/logrotate.conf cron and anacron 'anacron' can be used to execute commands periodically, with a frequency specified in days. Unlike 'cron', 'anacron' checks whether each job has been executed in the last N days, where N is the period specified for that job (as opposed to whether the current time matches the scheduled execution). If not, 'anacron' runs the jobÕs command, after waiting for the number of minutes specified as the delay parameter. Hence, on machines that are not running continuously periodic jobs will still be executed once the machine is actually running (obviously, the exact timing might vary, but the task will not be forgotten). 'anacron' reads a list of jobs from the configuration file '/etc/anacrontab'. Each job entry specifies a period in days, a delay in minutes, a unique job identifier, and a shell command. For example, on one Linux system I maintain, 'anacron' is used to run daily, weekly and monthly jobs even if the machine is not running at the scheduled time of day: $ cat /etc/anacrontab # /etc/anacrontab: configuration file for anacron SHELL=/bin/sh PATH=/sbin:/bin:/usr/sbin:/usr/bin # These replace cron's entries 1 5 cron.daily nice run-parts --report /etc/cron.daily 7 10 cron.weekly nice run-parts --report /etc/cron.weekly @monthly 15 cron.monthly nice run-parts --report /etc/cron.monthly The contents of a crontab The format of '/etc/crontab' (or the contents of '/etc/cron.d/' files) is slightly different from user 'crontab' files. Basically, this just amounts to an extra field in '/etc/crontab' that indicates the user a command runs as. This is not needed for user 'crontab' files since they are already stored in a file matching username ('/var/spool/cron/crontabs/$USER'). Each line of '/etc/crontab' either sets an environment variable or configures a recurring job. Comment and blank lines are ignored. For "cron jobs", the first five fields specify times to run, where each zero-based field may have a list and/or a range. The fields are minute, hour, day-of-month, month, day-of-week (space or tab separated). An asterisk in any position indicates "any". For example, to run a task at midnight on Tuesdays and Thursdays during August through October, you could use: # line in /etc/crontab 0 0 * 7-9 2,5 root /usr/local/bin/the-task -opt1 -opt2 Special scheduling values Some common scheduling patterns have shortcut names that may be used in place of the first five fields: @reboot Run once, at startup. @yearly Run once a year, "0 0 1 1 *". @annually (same as @yearly) @monthly Run once a month, "0 0 1 * *". @weekly Run once a week, "0 0 * * 0". @daily Run once a day, "0 0 * * *". @midnight (same as @daily) @hourly Run once an hour, "0 * * * *". For example, you might have a configuration containing: @hourly root /usr/local/bin/hourly-task 0,29 * * * * root /usr/local/bin/twice-hourly-task Using crontab To setup a user-level scheduled task, use the 'crontab' command (as opposed to the '/etc/crontab' file. Specifically, 'crontab -e' launches and editor to modify a file. You can list current jobs with 'crontab -l', and remove the file with 'crontab -r'. Optionally, you can specify 'crontab -u user' to schedule tasks for a given user, but the default is to do so for yourself (permission limits apply). The file '/etc/cron.allow', if present, must contain the names of all user allowed to schedule jobs. Alternately, if there is no '/etc/cron.allow' then a user must not be in '/etc/cron.deny' if they are allowed to schedule tasks (if neither file exists, everyone may use 'crontab'). SCHEDULING ONE-TIME TASKS at and friends If you need to run a task in the future rather than immediately, you can use the command 'at'. The command 'at' takes a command either from STDIN or from a file (using the '-f' option), and accepts time descriptions in a flexible collection of formats. A family of commands are used in association with 'at' itself'. 'atq' lists pending tasks. 'atrm' removes a task fro the pending queue. 'batch' works much like 'at' except it only runs jobs if the system load is low when the job is requested, otherwise deferring run until a low system load exists. Permissions Much like with '/etc/cron.allow' and '/etc/cron.deny', 'at' has '/etc/at.allow' and '/etc/at.deny' to configure permissions for running 'at'. The file '/etc/at.allow', if present, must contain the names of all user allowed to schedule jobs. Alternately, if there is no '/etc/at.allow' then a user must not be in '/etc/at.deny' if they are allowed to schedule tasks (if neither file exists, everyone may use 'crontab'). Time specifications See the manpage on your 'at' version for full details. You can specify a particular time as 'HH:MM', which will happen the next time that time occurs (if it is passed today, it means tomorrow). If you use 12 hour time, you may add 'AM' or 'PM'. You may give a date as 'MMDDYY' or 'MM/DD/YY' or 'DD.MM.YY' or 'month-name day'. You may increment from the current time with 'now + N units', which N is a number and units is 'minutes', 'hours', 'days' or 'weeks'. The words 'today' and 'tomorrow' have their obvious meaning; as do 'midnight' and 'noon' ('teatime' is 4pm). Some examples: % at -f ./foo.sh 10am Jul 31 % echo 'bar -opt' | at 1:30 tomorrow The exact definition of the time specification can be found in '/usr/share/doc/at/timespec'. SCRIPTING TIPS Outside resources This relatively short tutorial cannot really touch on the ins-and-outs of even one major scripting language. A number of excellent books have been written on each of Awk, Perl, Bash, Python. The author, however, might particularly recommend his own _Text Processing in Python_ as a good starting point for scripting in Python. Most scripts you will write for system administration are aimed at text manipulation: extracting values from logs and configuration files; generating reports and summaries; but also cleaning up system cruft and sending notifications of tasks performed. The most common scripts in Linux system administration are written in 'bash'. However, 'bash' itself has relatively few built-in capabilities. Instead, 'bash' makes it particularly easy to utilize external tools, both basic file utilities like 'ls', 'find', 'rm', 'cd', and the like; but especially text tools like those found in the GNU text utilities. A good introduction to these utilities can be found in "Using the GNU text utilities" (http://www-128.ibm.com/developerworks/edu/l-dw-linux-gnutex-i.html). Bash tips (part one) One particularly helpful setting to include in 'bash' scripts that run on a schedule is the 'set -x' switch that echos the commands run to STDERR. This is helpful in debugging scripts if they do not seem to have the effects anticipated. Another useful option during testing is 'set -n' that causes a script to look for syntax problems, but not actually run. Obviously, you don't want a '-n' version scheduled in 'cron' or 'at', but to get it working in the first place, it can help. A 'cron' job that runs a 'bash' script might start something like: #!/bin/bash exec 2>/tmp/my_stderr set -x # functional commands here This redirects STDERR to a file, and outputs the commands run to STDERR. Examining that file later can be useful. The manpage for 'bash' is quite good, though quite long. Of particular interest are all the options that the builtin 'set' can accept. Bash tips (part two) One of the common things you do in a system administration script is process a collection of files, often with the files of interest identified using the 'find' command. However, a problem can arise when filenames contain whitespace, and especially newline characters. Much of the looping and processing of filenames you are likely to do can be confused by these internal whitespace characters. For example, these two commands are different: % rm foo bar baz bam % rm 'foo bar' 'baz bam' The first unlinks four files (assuming they exist to start with); the second removes just two files, each with an internal space in the name. Filenames with spaces are particularly common among multimedia content. Fortunately, the GNU version of the 'find' command has a a '-print0' option to NULL terminate each result; and the 'xargs' command has a corresponding '-0' command to treat arguments as NULL separated. Putting these together, you might cleanup stray files (that might contain whitespace in their names) using: #!/bin/bash # Cleanup some old files set -x find /home/dqm \( -name '*.core' -o -name '#*' \) -print0 \ | xargs -0 rm -f Perl taint mode Perl has a handy switch '-T' to enable so-called "taint mode." In this mode, Perl takes a variety of extra security precautions, but most especially it limits execution of commands arising from external input. If you use 'sudo' execution, taint mode might be enabled automatically, but the safest thing is to start your administration scripts with: #!/usr/local/bin/perl ÐT Once you do this, all command line arguments, environment variables, locale information (see perllocale), results of certain system calls (readdir(), readlink(), the variable of shmread(), the messages returned by msgrcv(), the password, gcos and shell fields returned by the getpwxxx() calls), and all file input are marked as "tainted". Tainted data may not be used directly or indirectly in any command that invokes a sub-shell, nor in any command that modifies files, directories, or processes, with a few exceptions. It's possible to "untaint" particular external values by (cautiously) checking them for expected patterns, e.g.: if ($data =~ /^([-\@\w.]+)$/) { $data = $1; # $data now untainted } else { die "Bad data in Õ$dataÕ"; # log this somewhere } Perl CPAN packages One of the handy things about Perl is that it comes with a very convenient mechanism for installing extra support packages, called "CPAN" (Comprehensive Perl Archive Network). RubyGems is similar in function. Python, unfortunately, does not yet have as automated of an installation mechanism (but comes with more in the default installation). Simpler languages like Bash and Awk do not really have many add-ons to install in an analogous sense. The manpage on the 'cpan' command is a good place to get started. In general, if you have a task to perform for which you think someone might have done most of the work already, you can look for candidate modules at http://www.cpan.org/modules/. The tool 'cpan' has both an interactive shell and a command-line operation. Once configured (run the interactive shell once to be prompted on configuration options), 'cpan' handles dependencies and download locations in an automated fashion. For example, suppose I discover that I have a system administration task that involves processing configuration files in YAML format. Installing support for YAML is as simple as: % cpan -i YAML # maybe with 'sudo' first Once installed your scripts can contain 'use YAML;' at top. Similarly for whatever capabilities you need that someone has created a package for.