Advanced Bash Scripting


Taking your sh-fu to the next level

About the Instructor


  • Nathan Isburgh
    • instructor@edgecloud.com
  • Unix user 15+ years, teaching it 10+ years
  • Unix Administration and Software Development Consultant
  • RHCE on RHEL 5 & 6
  • All around über-geek
  • Goofy, forgetful ( remember that )

About the Course


  • 2 days, lecture/lab format
    Hours: 8:30 - 5:00
    Lunch: 11:45 – 1:00
  • Breaks about every hour
    Throw something soft at me if I get too long-winded
  • Telephone policy
    Take it outside, please
  • Restrooms
    Across from central stairs
  • Refreshments
    Downstairs in break room, mini-fridge in classroom, machines by stairs

About the Students


  • Name?
  • Time served, I mean employed, at Rackspace?
  • Department?
  • General Unix skill level? What about Linux?
  • And familiarity with Bash?
  • How do you use Linux in your position?
  • What are you hoping to take away from this class?

Expectations of Students


  • Strong foundation in basic Linux use and administration
    • Preferably through RHCE
  • Strong understanding of working in the shell
  • Comfortable with topics from Intro to Bash Scripting
  • Ask Questions!
  • Complete the labs
  • Email if you’re going to be late/miss class
  • Have fun
  • Learn something

Scripting Best Practices


Overview


  • So you’re getting serious about scripting? You want the advanced stuff? That’s what you’re here for, right?
  • Well, before we go too much further, we need to lay down some laws:
    • Style guidelines
    • Scripting best practices
  • I know, I know – you want to play with fire NOW!
  • But first, we need to learn some skills and practices that will make your scripts more readable, more maintainable and less buggy

Commenting


  • Remember from the introduction class:
  • Commenting falls under the larger topic of coding style
    • Note that style is an individual attribute, developed over time as a software developer
    • It is also often lightly or strictly specified by organization
  • To simplify this discussion, let us recall the Golden Rules of Commenting…

The Golden Rules of Commenting

  • Always comment code which is not obvious to a non-author reader
    • You should not comment “i=i+1”
    • You should comment “rsync -vazpc $WHAT $WHERE”
  • Always comment functions: their purpose, use, arguments, expectations and results
  • Always comment the overall program’s purpose and behavior at the top of the file.
    Include dates and authors ( maybe an abbreviated revision history? )
  • Always comment when not sure if you should - They don’t cost anything!

Proper Script Structure


    Scripts should generally be laid out as:
    #!Shebang!
    #
    # Script comment block ( purpose, arguments, rev history, etc )
    #
    
    # Config variables with comments
    CONFIG_VAR1="user can tweak this"
    
    # END OF CONFIGURATION - NO EDITS BELOW THIS LINE
    
    # Function definitions
    fail() { echo boohoo ; exit 1 ; }
    
    # Main code block
    if [ $# -lt 2 ] ; then fail ; fi
    ...
    

Always Initialize Variables


  • You should always initialize your variables
    • It looks cleaner, and for complex scripts, a short comment can be left indicating the purpose of the variable
    • Security! If variables aren’t initialized, a knowledgeable user can pre-set them in the environment before running your script and cause all sorts of problems, some possibly nefarious!
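A minimal sketch of the idea — the variable names and values here are invented for illustration:

```shell
# A hypothetical config section - every variable gets an explicit starting
# value, with a short comment on its purpose:
VERBOSE=0                      # 0 = quiet, 1 = chatty
LOGFILE="/tmp/myscript.log"    # where debug output lands
RETRIES=3                      # attempts before giving up

# Without the assignments above, a user could run
#   RETRIES=99999 ./myscript.sh
# and the script would inherit their value from the environment.
echo "retries=$RETRIES"
```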

Indentation


  • Ah yes, good old indentation

    Many a bloody nerd war has erupted over disagreements on indentation styles

  • To avoid this same fate, let us agree on one simple rule:

    Pick an indentation style, and stick to it 100% of the time

  • The possibilities are endless:
    • Tabs, two spaces, four spaces? Suggest: 2 spaces
    • Indent all the blocks, only the multiline blocks, or? Suggest: all
    • Reserved words: same line, different lines, indented? Suggest: different lines, indent the blocks only
    • Etc, etc, etc

Check Those Arguments


  • Users rarely do anything right – train yourself to expect that at all times, and you’ll write better code.
  • Case in point: Arguments
    • Check for the expected number of arguments
    • Check for the expected types of data: numbers, strings, flags
    • Check argument values if appropriate, eg: if it is supposed to be a pathname, check that it’s valid and exists
  • On very large or complex scripts with many arguments, it might be prudent to consider an argument parsing library like getopt ( external program, some inconsistencies ) or getopts ( shell builtin, consistent but no long arguments )
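A sketch of getopts-based checking, modeled loosely on the health-report.sh synopsis from the lab. The demo arguments are set with `set --` so the snippet runs standalone:

```shell
set -- -t someone@example.com     # pretend the user typed these arguments

usage() { echo "usage: $0 [-td] email" >&2 ; exit 1 ; }

TOP=0 ; DF=0
while getopts "td" opt ; do
  case $opt in
    t) TOP=1 ;;
    d) DF=1 ;;
    *) usage ;;                   # getopts prints its own error for bad flags
  esac
done
shift $(( OPTIND - 1 ))           # discard the parsed flags

[ "$#" -eq 1 ] || usage           # check the expected number of arguments
[[ $1 == *@* ]] || usage          # crude sanity check on the value
EMAIL=$1
```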

Check Commands and Versions


  • If a script uses tools that are even remotely uncommon, it should check for their existence early on and error out if anything is missing
  • Along the same lines, if there are any feature expectations, or important bug fixes tied to a version of a tool, library or even the shell itself, those version details should be verified early on
    • Note that this requires a judgment call – there is no need to check version information on every piece of software touched – just the ones that could be off. For example:
      • If a script relies on associative arrays, it should check that the bash interpreter is at least version 4 ( EL5 ships with v3! )
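A minimal sketch of both checks — awk and sed stand in for whatever tools your script actually requires:

```shell
# Error out early if a required tool is missing from the PATH
for tool in awk sed ; do
  if ! command -v "$tool" > /dev/null 2>&1 ; then
    echo "required tool missing: $tool" >&2
    exit 3
  fi
done

# Bash 4+ is needed for associative arrays; BASH_VERSINFO[0] is the major version
if (( BASH_VERSINFO[0] < 4 )) ; then
  echo "bash 4 or newer required" >&2
  exit 3
fi

ENV_OK=1
```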

Assign Exit Codes


  • Exit codes can be extremely useful to the users of your script
  • At the very least, always exit 0 for success and non-zero for failure
  • Best case scenario: assign exit codes to different conditions, eg
    1. invalid arguments
    2. insufficient permissions
    3. missing required software
    4. httpd not running
    5. unknown error
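Naming the exit codes once at the top keeps them readable; the numbers and conditions below are just an illustration ( valid codes are 0-255 ):

```shell
E_BADARGS=1      # invalid arguments
E_NOPERM=2       # insufficient permissions
E_MISSING=3      # missing required software
E_NOHTTPD=4      # httpd not running
E_UNKNOWN=5      # unknown error

check_args() {
  [ "$#" -ge 1 ] || return "$E_BADARGS"
  return 0
}

RC=0
check_args || RC=$?      # called with no arguments, so this fails with E_BADARGS
echo "check_args returned $RC"
```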

Write Common Functions


  • Write some common, useful functions, such as:
    • fail(code, msg) – Prints message to stderr and exits with given code
    • succeed() – Maybe print happy message, then exit 0
    • cleanup() – For complex scripts, cleanup things like logs, locks, etc. Usually called from fail() and succeed()
    • debug(msg) – Prints a debug message to stderr. Bonus: use a config variable and/or command line flag to control behavior
    • usage() – Print a detailed usage message to the user if there is a mistake in arguments, or -h/-? Passed
  • Perhaps a good case for a library
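A hedged sketch of these helpers — the names match the slide, the bodies are just one plausible implementation:

```shell
DEBUG=0                          # flip to 1 ( or via a flag ) to enable debug()

debug() {                        # debug(msg): message to stderr when enabled
  [ "$DEBUG" -eq 1 ] && echo "DEBUG: $*" >&2
  return 0
}

cleanup() {                      # cleanup(): remove temp files, locks, etc
  rm -f "/tmp/myscript.$$.lock"
}

fail() {                         # fail(code, msg): message to stderr, then exit
  local code=$1 ; shift
  echo "ERROR: $*" >&2
  cleanup
  exit "$code"
}

succeed() {                      # succeed(): happy message, then exit 0
  echo "All done."
  cleanup
  exit 0
}

DEBUG=1
debug "helpers loaded"
```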

Speaking of stderr


  • USE IT! Correctly!
  • Recall:
    • stdout – Normal command output/results
    • stderr – Warnings, errors, fails of any kind
  • Quick and easy ways to output to stderr:
    • printf blah > /dev/stderr
    • printf blah >&2
  • This is one of the benefits of writing those common functions!

Command Substitution


  • Recall the awesomely powerful backtick, `
    • It runs the command in backticks, takes its stdout and substitutes it, minus any trailing newlines, onto the calling command line
    • echo `whoami`
    • becomes
    • echo student
  • Very useful in many situations, and it is backwards compatible with some older shells
  • But…

Command Substitution


  • Try to avoid the backtick for command substitution
    • It is legacy syntax ( POSIX retains it only for backwards compatibility )
    • It does not nest properly
    • Quotes can be a serious pain
  • Instead, use the $() syntax:
    • echo $(whoami)
  • Same behavior, but:
    • POSIX compliant
    • Nests
    • Handles quotes much more simply
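The nesting difference is easy to demonstrate — with backticks, the inner command would need awkward escaping ( OUTER=`` `basename \`pwd\`` `` ), while $() just nests:

```shell
OUTER=$(basename "$(pwd)")       # name of the current working directory
WORDS=$(echo $(echo one two))    # inner result substituted into the outer command
echo "$WORDS"                    # -> one two
```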

Lab


  • Put together a properly styled skeleton for a shell script, called skel.sh
  • This should include:
    • All of the components discussed in lecture, and placeholders for the pieces which are not known yet ( like config variables )
    • The various common functions
    • Come up with at least five common script failures, and assign them default exit codes ( example: ‘invalid arguments’ assigned 2 – remember exit codes must be in the range 0-255 )
  • Copy skel.sh to health-report.sh, with synopsis:
    • ./health-report.sh [-td] email
    • -t will email one output iteration from top to the email address
    • -d will email the output of ‘df -h’ to the email address
    • email is the email address for the recipient of the report

Advanced Variables

Special Variables


  • Recall that the shell has many special variables with useful information and settings
    • Positional parameters ( arguments )
    • Exit status of previous command
    • Bash information
    • Feature control variables ( IFS, OPT*, DIRSTACK, etc )
  • During future labs, be sure to peruse the bash man page sections on:
    • Special Parameters - @, #, ?, $, -
    • Shell Variables - LINENO, SECONDS, PIPESTATUS
    • Parameter Substitution - ${#PATH}, ${INPUT:5:10}
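A runnable tour of a few of the items above:

```shell
INPUT="Hello advanced bash students"

echo "${#INPUT}"          # string length -> 28
echo "${INPUT:6:8}"       # 8 characters starting at offset 6 -> advanced
echo "$$"                 # PID of the current shell

false | true
echo "${PIPESTATUS[@]}"   # exit status of each command in the last pipeline -> 1 0
```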

Arrays


  • In addition to simple variables containing just strings and numbers, bash also supports array variables
  • An array is just a collection of values, all stored within one variable, logically:
    • TEST -> val1,val2,val3,val4,val5
  • Traditionally, the different values in the array are referenced using numbers, called indexes, starting at zero:
    • TEST[0] -> val1
    • TEST[1] -> val2
  • This is known as an Indexed Array

Indexed Array Example


# To create the array, just start assigning values:
MYDIRS[0]="/"
MYDIRS[1]="/home"
MYDIRS[2]="/usr"

echo $MYDIRS
# will just show "/" since that is the first member

echo ${MYDIRS[1]}
# will show "/home"
# Note that you must use the braced expansion syntax, due to
# overloading of the square bracket characters ( pathname wildcard )

echo ${#MYDIRS[*]}
# shows 3, since there are three values in the array

Associative Arrays


  • As of bash version 4, Associative Arrays are available
  • An associative array uses strings to get at values, as opposed to numbers
  • Associative arrays have to be created specially, using the declare builtin
    declare -A MYDICTIONARY
    MYDICTIONARY[apple]=fruit
    MYDICTIONARY[carrot]=vegetable
    MYDICTIONARY[linux]="Awesome operating system"
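Continuing the MYDICTIONARY example ( re-declared here so the snippet runs standalone ) — lookups, counting, and iterating over the keys:

```shell
declare -A MYDICTIONARY
MYDICTIONARY[apple]=fruit
MYDICTIONARY[carrot]=vegetable
MYDICTIONARY[linux]="Awesome operating system"

echo "${MYDICTIONARY[apple]}"       # -> fruit
echo "${#MYDICTIONARY[@]}"          # number of entries -> 3

for key in "${!MYDICTIONARY[@]}" ; do    # ${!array[@]} lists the keys
  echo "$key: ${MYDICTIONARY[$key]}"
done
```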

Lab


  • Copy skel.sh to proc-count.sh and implement as:
    • proc-count.sh [-f filter]... [-c] email
  • This script will count processes with command names that match one or more filters, emailing one of two possible reports, either a TSV ( which is default ) or a CSV ( selected with the -c flag )
    • processname count
    • Or
    • processname,count
  • If no filter is given, all processes should be reported
  • Use arrays to track filters and results

Advanced Expansions

Overview


  • An expansion occurs when the shell acts on metacharacters in a command, automatically replacing them with content according to fixed rules – generally so the user does not have to type as much ( wildcards ), can reference variables and more
  • There are seven different kinds of expansions in bash:
    • Brace expansion, tilde expansion, parameter/variable expansion, command substitution, arithmetic expansion, word splitting, and pathname expansion
  • On operating systems that support named pipes ( like Linux! ), there is one additional form, known as process substitution

Brace Expansion


  • Brace expansion allows for the automatic creation of arbitrary strings
  • Consider:
    • $ echo a{1..5}b
      a1b a2b a3b a4b a5b
    • $ echo a{f,h,g}b
      afb ahb agb
  • As seen in the examples, you can expand ranges of numbers or letters, as well as comma separated lists of values
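A practical use is creating a whole directory tree in one command — the tree layout below is invented for the example:

```shell
mkdir -p /tmp/braces-demo/{bin,etc,log/{old,new}}
ls /tmp/braces-demo/log       # -> new  old

# Adjacent expansions combine, pairing every element on the left with every
# element on the right:
RESULT=$(echo {a,b}{1..3})
echo "$RESULT"                # -> a1 a2 a3 b1 b2 b3
```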

Tilde Expansion


  • You should already be familiar with tilde expansion, which evaluates to user home directories:
    • $ echo ~
      /home/student
    • $ echo ~alice
      /home/alice
  • What you might not know is that tilde can be used to reference current directories ( ~+ ) and previous directories ( ~- ):
    • $ cd /home ; cd / ; echo ~+ ; echo ~-
      /
      /home
    • Started in /home, then moved to /. ~+ expanded to /, ~- expanded to /home

Parameter/Variable Expansion


  • This topic was covered in depth during the intro bash scripting class
  • Quick reminder:
    • $PATH
    • ${PATH}
  • The second form is more precise, and should generally be used anytime a variable reference is embedded within additional content, to protect from misinterpretation
    • Also note, the curly brace expansion syntax allows for extremely powerful capabilities, including arrays, searching, substrings, character counts, case manipulation and more
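A runnable sampler of those capabilities ( the case-manipulation form needs bash 4+ ):

```shell
FILE="/var/log/httpd/access.log"

echo "${FILE##*/}"       # strip longest */ match from the front -> access.log
echo "${FILE%/*}"        # strip shortest /* match from the back -> /var/log/httpd
echo "${FILE/log/LOG}"   # replace first occurrence -> /var/LOG/httpd/access.log
echo "${FILE^^}"         # uppercase everything
echo "${#FILE}"          # character count -> 25
```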

Command Substitution


  • Command substitution is incredibly useful, as it instructs the shell to run a given command in a new shell, and capture its output in some particular manner
  • Recall the backtick and $() from an earlier lecture:
    • echo `whoami`
    • echo $(whoami)
  • whoami will be run from a new shell, and its standard output, minus any trailing newlines, will be substituted into the backticked/parenthesized section of the command line, which is then executed from the main shell, as:
    • echo student

Arithmetic Expansion


  • Sometimes, it’s incredibly useful to have the shell perform some simple math, and it’s also incredibly easy to use:
    • echo $((6*8))
    • 48
  • Bash has a slew of operations available, including add/subtract/multiply/divide, exponentiation, bitwise operations including shifts, negations and logical operations, increments, decrements and more
  • See the manpage under Arithmetic Evaluation

Word Splitting


  • Word splitting is an interesting feature of the shell that allows it to identify words within the results of parameter expansion, command substitution and arithmetic expansion, and then split them apart
  • There is a shell variable known as IFS, which stands for Internal Field Separator
    • This variable defines the characters which can separate words, and the default IFS is ‘<space><tab><newline>’
    • Also note that the first character of IFS is used to join the words when expanding “$*”
  • Try the following:
    • echo $(w)
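Word splitting in action — change IFS, and an unquoted expansion splits on the new separator. Saving and restoring IFS, as below, is good practice:

```shell
OLDIFS=$IFS
CSV="one,two,three"

IFS=','
set -- $CSV        # unquoted expansion is split on commas
COUNT=$#           # 3 words
SECOND=$2          # "two"
IFS=$OLDIFS

echo "$COUNT words, second is $SECOND"
```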

Pathname Expansion


  • Pathname expansion is nerd-speak for how wildcards work in the shell
  • This shouldn’t require review, but recall the three wildcards:
    • *
    • ?
    • [set]

Process Substitution


  • Process substitution is a very neat shorthand for dynamically creating named pipes which are used for input or output
  • Consider the first form:
    cat /etc/passwd <(w) <(df -h) <(uname -a) > report
  • The <() syntax creates the process substitutions
  • What’s really going on here, is that the inner command is executed, with its stdout connected to a named pipe dynamically created under /dev/fd
  • That pathname is then substituted on the outer command line, which becomes an argument, and in this case, cat simply reads from the /dev/fd file like any other
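Another classic use of the first form is handing two command outputs to a tool that expects filenames — here paste reads both /dev/fd paths side by side:

```shell
paste <(seq 1 3) <(seq 4 6)
# prints two tab-separated columns: 1 4, 2 5, 3 6
OUT=$(paste <(seq 1 3) <(seq 4 6))
```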

Process Substitution


  • The second form of process substitution is similar, except the other direction
  • The /dev/fd file is created to accept input from the outer command, and the file is attached as stdin on the inner command
  • Consider:
    tar cf - . > >(gzip -9c > crazy.tgz)
  • It looks crazy, but just step through the operations
    • tar is outputting to stdout, which is redirected to the process substitution ( which in reality is a /dev/fd pathname )
    • gzip is reading from stdin, which is the /dev/fd path

Sample Code


  • Next, we will spend some time breaking down and understanding a number of commonly used scripts from Rackspace
  • These scripts are available at:
    • http://rackspace.edgecloud.com/adv-bash-scripting

Lab


  • Modify health-report.sh from the earlier lab:
    • Add a new flag, -m, to create a list of process names and memory percentages, sorted descending by memory usage.
    • Also, add a -c flag to indicate “collect only” mode. The user should not need to supply an email in this mode. In this mode, the script should produce the requested reports ( from the other flags ), but instead of emailing them immediately, it should collect them in a file under /tmp called health-report.YYYY-MM-DD
      • You can simply append each new report to the file, but include a header in front of each new report that has the date/time
    • Finally, add a -r flag which accepts a date in YYYY-MM-DD form, and emails the requested report to the supplied email address
    • Make sure to produce meaningful error messages for all failures

Additional Topics

Overview


  • There are a few other topics that should be covered, but did not fall under any of the previous topics
    • Here documents
    • Subshell executions
    • Command separators
    • Conditionals with the shell

Here documents


  • Here documents are a really convenient way to enter multiple lines of text at the command line, or from within a shell script
  • Usage is fairly simple:
    cat << samp
    Everything I type will go to 
    stdin of the command
    Until a line with just samp
    samp
  • Just be careful about spacing – everything is literal, and the delimiter ( samp in this example ) must not have anything else on the line
  • Also see <<- for indenting here documents
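Two useful variations, runnable as-is: an unquoted delimiter expands variables inside the document, while a quoted delimiter keeps everything literal:

```shell
NAME="student"

EXPANDED=$(cat << EOF
Hello $NAME
EOF
)

LITERAL=$(cat << 'EOF'
Hello $NAME
EOF
)

echo "$EXPANDED"    # -> Hello student
echo "$LITERAL"     # -> Hello $NAME
```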

Subshell Executions


  • Sometimes, it is convenient to execute a command within a subshell, which isolates it from the current shell
    • It can not impact the environment or working directory of the current shell
    • You can treat the subshell as an individual command, using redirection and pipes as needed
  • Simple example:
    ( cd /home ; ls a* ) | wc -l
  • This will list a count of the home directories starting with the letter a. The cd did not change the working directory of the main shell

Command Separators


  • There are several ways to separate commands:
    • Semicolon ( ; )
      • This separates commands and does not provide any relation between the commands. They are simply executed one after another, left to right.
    • Ampersand ( & )
      • This puts the left command in the background and starts executing the next command immediately
    • Double Ampersand ( && )
      • This will execute the right command if the left command exited with a zero/success
    • Double pipe ( || )
      • This will execute the right command if the left command exited with a non-zero/fail
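The conditional separators in action ( the paths are invented for the demo ):

```shell
mkdir -p /tmp/sep-demo && echo "created"             # right side runs: mkdir succeeded
ls /tmp/no-such-dir 2>/dev/null || echo "fallback"   # right side runs: ls failed
PICK=$(true && echo yes || echo no)                  # a common if/else one-liner
echo "$PICK"
```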

Shell Conditionals


  • Very similar to the test command, the shell can perform quite a few simple condition checks using the [[ expression ]] syntax ( the spaces around the expression are required )
  • Consider:
    [[ -d /tmp/mytool ]] && mv logfile /tmp/mytool
  • See Conditional Expressions in the manpage for a complete reference on all of the available tests

Useful Tools in Scripting

Overview


  • There are, of course, many, many tools to use while scripting, but some are more powerful, or more frequently used
  • We will overview three of these tools now:
    • awk
    • sed
    • xargs

awk


  • awk is an incredibly powerful tool, which contains its own programming language
  • One of the most commonly used features of awk, is to grab particular columns of information from stdin
  • Consider the columns from ps aux:
    USER     PID %CPU %MEM    VSZ   RSS TTY    STAT START   TIME COMMAND
  • Using awk, grabbing particular columns is easy!
    ps aux | awk '{print $2" is using "$3"% of the CPU"}'
  • Let’s play with awk for a few minutes

sed


  • sed stands for “stream editor” and that is exactly what it does
  • sed takes an expression describing an operation to perform, and then applies that operation to each line of input
  • It is commonly used to perform find/replace operations:
    sed -e 's/example.com/mycompany.com/g' /etc/httpd/conf/httpd.conf
  • This example replaces every occurrence of example.com with mycompany.com ( the trailing g makes the replacement global on each line; without it, only the first match per line is changed )
  • sed can do way, waaaay more. Consult Google or get the O’Reilly book: “sed & awk”

xargs


  • xargs is another very useful tool at the command line, and in scripting
  • It takes a second to wrap your head around what xargs does:
    • Accepts input from stdin
    • For each batch of input lines, runs a given command with those lines appended as arguments
  • For example:
    ps aux | fgrep bad_cron | awk '{print $2}' | xargs kill
  • Let’s discuss what’s happening with the example
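Here is a self-contained variation on the same pattern — build some files, then let xargs hand the matching names to rm as arguments. The -print0 / -0 pairing makes it safe for names containing spaces or newlines:

```shell
mkdir -p /tmp/xargs-demo
touch /tmp/xargs-demo/a.log /tmp/xargs-demo/b.log /tmp/xargs-demo/c.txt

# find prints matching names; xargs turns them into rm arguments
find /tmp/xargs-demo -name '*.log' -print0 | xargs -0 rm -f

ls /tmp/xargs-demo      # -> c.txt
```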

Getting Fancy

Overview


  • Finally, a few topics to get fancy!
    • Trapping signals
    • Terminal codes to get colors and special modes
    • Automagic logging with coproc

Trapping Signals


  • Sometimes, it’s useful to react to signals when they get delivered to your script
  • This is easily done with the trap command:
    trap “echo DING” ALRM
    kill -ALRM $$
  • This instructs the shell to run the echo command when an alarm signal is delivered
  • This technique is commonly used to trigger cleanup routines when the script is interrupted
    • See Sample4.sh for a good illustration of this technique
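A sketch of that cleanup pattern — the trap fires on EXIT ( and on common interrupt signals ), so the lock file disappears no matter how the script ends. The lock path is invented for the example:

```shell
LOCK=/tmp/trap-demo.$$.lock

cleanup() {
  rm -f "$LOCK"
}
trap cleanup EXIT INT TERM

touch "$LOCK"
# ... real work would happen here ...
```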

Terminal codes


  • Most terminals support various colors and modes to display information to the user
  • If you find the codes for the connected terminal, you can output text with different foreground and background colors, blinking, dim, underlined and more
  • A common technique for this is to use hard coded codes in your strings:
    echo -e "\033[31mRed\033[39m and \033[32mGreen\033[39m"
  • This gets hard to read and do correctly, so variables are commonly employed

Terminal Codes with Variables


  • Using Variables:
    RED="\033[31m"
    GREEN="\033[32m"
    NORMAL="\033[39m"
    echo -e "${RED}Red${NORMAL} and ${GREEN}Green${NORMAL}"
  • With variables, things are a little easier to read, and the codes can be changed with the terminal
  • Could functions help here too?
  • Check out nifty-colors.sh on the example server
  • Also:
    • http://wiki.bash-hackers.org/scripting/terminalcodes
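A more portable variant of the same idea: ask tput for the connected terminal’s codes instead of hard coding ANSI sequences. The || true fallbacks keep the script working when there is no terminal at all ( e.g. output piped to a file ):

```shell
RED=$(tput setaf 1 2>/dev/null || true)
GREEN=$(tput setaf 2 2>/dev/null || true)
NORMAL=$(tput sgr0 2>/dev/null || true)

MSG="${RED}Red${NORMAL} and ${GREEN}Green${NORMAL}"
echo -e "$MSG"
```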

Automagic Logging


  • This is a really neat trick to attach stdout of your script to both the terminal and a logfile at the same time
    #!/bin/bash
    # start tee in the background as a coprocess,
    # with its output connected to the script's original stdout
    { coproc tee { tee logfile ;} >&3 ;} 3>&1
    # redirect stdout and stderr of the script to the coprocess
    exec >&${tee[1]} 2>&1

Final Lab


  • Copy skel.sh to kill-thread.sh
  • Implement kill-thread.sh to kill mysql connections based on certain parameters:
    kill-thread.sh [-u user] [-h host] [-d db] [-c command]
  • Just use mysqladmin, and assume there is no root password, or it is supplied by ~/.my.cnf automatically
  • Just do simple searches by the various columns, and if the user supplies more than one flag, all must match to kill the connection
  • ./kill-thread.sh -h localhost -d test
    • Would kill anyone connected to the test database from localhost
  • Snaz up the output with colors!