Advanced Bash Scripting


Taking your sh-fu to the next level

About the Instructor


  • Nathan Isburgh
    • instructor@edgecloud.com
  • Unix user 15+ years, teaching it 10+ years
  • Unix Administration and Software Development Consultant
  • RHCE on RHEL 5 & 6
  • All around über-geek
  • Goofy, forgetful ( remember that )

About the Course


  • 2 days, lecture/lab format
    Hours: 8:30 - 5:00
    Lunch: 11:45 – 1:00
  • Breaks about every hour
    Throw something soft at me if I get too long-winded
  • Telephone policy
    Take it outside, please
  • Restrooms
    Across from central stairs
  • Refreshments
    Downstairs in break room, mini-fridge in classroom, machines by stairs

About the Students


  • Name?
  • Time served, I mean employed, at Rackspace?
  • Department?
  • General Unix skill level? What about Linux?
  • And familiarity with Bash?
  • How do you use Linux in your position?
  • What are you hoping to take away from this class?

Expectations of Students


  • Strong foundation in basic Linux use and administration
    • Preferably through RHCE
  • Strong understanding of working in the shell
  • Comfortable with topics from Intro to Bash Scripting
  • Ask Questions!
  • Complete the labs
  • Email if you’re going to be late/miss class
  • Have fun
  • Learn something

Scripting Best Practices


Overview


  • So you’re getting serious about scripting? You want the advanced stuff? That’s what you’re here for, right?
  • Well, before we go too much further, we need to lay down some laws:
    • Style guidelines
    • Scripting best practices
  • I know, I know – you want to play with fire NOW!
  • But first, we need to learn some skills and practices that will make your scripts more readable, more maintainable and less buggy

Commenting


  • Remember from the introduction class:
  • Commenting falls under the larger topic of coding style
    • Note that style is an individual attribute, developed over time as a software developer
    • It is also often lightly or strictly specified by organization
  • To simplify this discussion, let us recall the Golden Rules of Commenting…

The Golden Rules of Commenting

  • Always comment code which is not obvious to a non-author reader
    • You should not comment “i=i+1”
    • You should comment “rsync -vazpc $WHAT $WHERE”
  • Always comment functions: their purpose, use, arguments, expectations and results
  • Always comment the overall program’s purpose and behavior at the top of the file.
    Include dates and authors ( maybe an abbreviated revision history? )
  • Always comment when not sure if you should - They don’t cost anything!

Proper Script Structure


    Scripts should generally be laid out as:
    #!Shebang!
    #
    # Script comment block ( purpose, arguments, rev history, etc )
    #
    
    # Config variables with comments
    CONFIG_VAR1="user can tweak this"
    
    # END OF CONFIGURATION - NO EDITS BELOW THIS LINE
    
    # Function definitions
    fail() { echo boohoo ; exit 1 ; }
    
    # Main code block
    if [ $# -lt 2 ] ; then fail ; fi
    ...
    

Always Initialize Variables


  • You should always initialize your variables
    • It looks cleaner, and for complex scripts, a short comment can be left indicating the purpose of the variable
    • Security! If variables aren’t initialized, a knowledgeable user can pre-set them in the environment before running your script and cause all sorts of problems, some possibly nefarious!
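A minimal sketch of the idea — the variable names and values here are invented for illustration:

```shell
# A hypothetical config section - every variable gets an explicit starting
# value, with a short comment on its purpose:
VERBOSE=0                      # 0 = quiet, 1 = chatty
LOGFILE="/tmp/myscript.log"    # where debug output lands
RETRIES=3                      # attempts before giving up

# Without the assignments above, a user could run
#   RETRIES=99999 ./myscript.sh
# and the script would inherit their value from the environment.
echo "retries=$RETRIES"
```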

Indentation


  • Ah yes, good old indentation

    Many a bloody nerd war has erupted over disagreements on indentation styles

  • To avoid this same fate, let us agree on one simple rule:

    Pick an indentation style, and stick to it 100% of the time

  • The possibilities are endless:
    • Tabs, two spaces, four spaces? Suggest: 2 spaces
    • Indent all the blocks, only the multiline blocks, or? Suggest: all
    • Reserved words: same line, different lines, indented? Suggest: different lines, indent the blocks only
    • Etc, etc, etc

Check Those Arguments


  • Users rarely do anything right – train yourself to expect that at all times, and you’ll write better code.
  • Case in point: Arguments
    • Check for the expected number of arguments
    • Check for the expected types of data: numbers, strings, flags
    • Check argument values if appropriate, eg: if it is supposed to be a pathname, check that it’s valid and exists
  • On very large or complex scripts with many arguments, it might be prudent to consider an argument parsing library like getopt ( external program, some inconsistencies ) or getopts ( shell builtin, consistent but no long arguments )
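A sketch of getopts-based checking, modeled loosely on the health-report.sh synopsis from the lab. The demo arguments are set with `set --` so the snippet runs standalone:

```shell
set -- -t someone@example.com     # pretend the user typed these arguments

usage() { echo "usage: $0 [-td] email" >&2 ; exit 1 ; }

TOP=0 ; DF=0
while getopts "td" opt ; do
  case $opt in
    t) TOP=1 ;;
    d) DF=1 ;;
    *) usage ;;                   # getopts prints its own error for bad flags
  esac
done
shift $(( OPTIND - 1 ))           # discard the parsed flags

[ "$#" -eq 1 ] || usage           # check the expected number of arguments
[[ $1 == *@* ]] || usage          # crude sanity check on the value
EMAIL=$1
```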

Check Commands and Versions


  • If a script uses tools that are even remotely uncommon, it should check for their existence early on and error out if anything is missing
  • Along the same lines, if there are any feature expectations, or important bug fixes tied to a version of a tool, library or even the shell itself, those version details should be verified early on
    • Note that this requires a judgment call – there is no need to check version information on every piece of software touched – just the ones that could be off. For example:
      • If a script relies on associative arrays, it should check that the bash interpreter is at least version 4 ( EL5 ships with v3! )
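A minimal sketch of both checks — awk and sed stand in for whatever tools your script actually requires:

```shell
# Error out early if a required tool is missing from the PATH
for tool in awk sed ; do
  if ! command -v "$tool" > /dev/null 2>&1 ; then
    echo "required tool missing: $tool" >&2
    exit 3
  fi
done

# Bash 4+ is needed for associative arrays; BASH_VERSINFO[0] is the major version
if (( BASH_VERSINFO[0] < 4 )) ; then
  echo "bash 4 or newer required" >&2
  exit 3
fi

ENV_OK=1
```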

Assign Exit Codes


  • Exit codes can be extremely useful to the users of your script
  • At the very least, always exit 0 for success and non-zero for failure
  • Best case scenario: assign exit codes to different conditions, eg
    1. invalid arguments
    2. insufficient permissions
    3. missing required software
    4. httpd not running
    5. unknown error
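Naming the exit codes once at the top keeps them readable; the numbers and conditions below are just an illustration ( valid codes are 0-255 ):

```shell
E_BADARGS=1      # invalid arguments
E_NOPERM=2       # insufficient permissions
E_MISSING=3      # missing required software
E_NOHTTPD=4      # httpd not running
E_UNKNOWN=5      # unknown error

check_args() {
  [ "$#" -ge 1 ] || return "$E_BADARGS"
  return 0
}

RC=0
check_args || RC=$?      # called with no arguments, so this fails with E_BADARGS
echo "check_args returned $RC"
```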

Write Common Functions


  • Write some common, useful functions, such as:
    • fail(code, msg) – Prints message to stderr and exits with given code
    • succeed() – Maybe print happy message, then exit 0
    • cleanup() – For complex scripts, cleanup things like logs, locks, etc. Usually called from fail() and succeed()
    • debug(msg) – Prints a debug message to stderr. Bonus: use a config variable and/or command line flag to control behavior
    • usage() – Print a detailed usage message to the user if there is a mistake in arguments, or -h/-? Passed
  • Perhaps a good case for a library
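A hedged sketch of these helpers — the names match the slide, the bodies are just one plausible implementation:

```shell
DEBUG=0                          # flip to 1 ( or via a flag ) to enable debug()

debug() {                        # debug(msg): message to stderr when enabled
  [ "$DEBUG" -eq 1 ] && echo "DEBUG: $*" >&2
  return 0
}

cleanup() {                      # cleanup(): remove temp files, locks, etc
  rm -f "/tmp/myscript.$$.lock"
}

fail() {                         # fail(code, msg): message to stderr, then exit
  local code=$1 ; shift
  echo "ERROR: $*" >&2
  cleanup
  exit "$code"
}

succeed() {                      # succeed(): happy message, then exit 0
  echo "All done."
  cleanup
  exit 0
}

DEBUG=1
debug "helpers loaded"
```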

Speaking of stderr


  • USE IT! Correctly!
  • Recall:
    • stdout – Normal command output/results
    • stderr – Warnings, errors, fails of any kind
  • Quick and easy ways to output to stderr:
    • printf blah > /dev/stderr
    • printf blah >&2
  • This is one of the benefits of writing those common functions!

Command Substitution


  • Recall the awesomely powerful backtick, `
    • It runs the command in backticks, takes its stdout and substitutes it, minus any trailing newlines, onto the calling command line
    • echo `whoami`
    • becomes
    • echo student
  • Very useful in many situations, and it is backwards compatible with some older shells
  • But…

Command Substitution


  • Try to avoid the backtick for command substitution
    • It is legacy syntax ( POSIX retains it only for backwards compatibility )
    • It does not nest properly
    • Quotes can be a serious pain
  • Instead, use the $() syntax:
    • echo $(whoami)
  • Same behavior, but:
    • POSIX compliant
    • Nests
    • Handles quotes much more simply
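The nesting difference is easy to demonstrate — with backticks, the inner command would need awkward escaping ( OUTER=`` `basename \`pwd\`` `` ), while $() just nests:

```shell
OUTER=$(basename "$(pwd)")       # name of the current working directory
WORDS=$(echo $(echo one two))    # inner result substituted into the outer command
echo "$WORDS"                    # -> one two
```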

Lab


  • Put together a properly styled skeleton for a shell script, called skel.sh
  • This should include:
    • All of the components discussed in lecture, and placeholders for the pieces which are not known yet ( like config variables )
    • The various common functions
    • Come up with at least five common script failures, and assign them default exit codes ( example: ‘invalid arguments’ assigned 2 – remember exit codes must be in the range 0-255 )
  • Copy skel.sh to health-report.sh, with synopsis:
    • ./health-report.sh [-td] email
    • -t will email one output iteration from top to the email address
    • -d will email the output of ‘df -h’ to the email address
    • email is the email address for the recipient of the report

Advanced Variables

Special Variables


  • Recall that the shell has many special variables with useful information and settings
    • Positional parameters ( arguments )
    • Exit status of previous command
    • Bash information
    • Feature control variables ( IFS, OPT*, DIRSTACK, etc )
  • During future labs, be sure to peruse the bash man page sections on:
    • Special Parameters - @, #, ?, $, -
    • Shell Variables - LINENO, SECONDS, PIPESTATUS
    • Parameter Substitution - ${#PATH}, ${INPUT:5:10}
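A runnable tour of a few of the items above:

```shell
INPUT="Hello advanced bash students"

echo "${#INPUT}"          # string length -> 28
echo "${INPUT:6:8}"       # 8 characters starting at offset 6 -> advanced
echo "$$"                 # PID of the current shell

false | true
echo "${PIPESTATUS[@]}"   # exit status of each command in the last pipeline -> 1 0
```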

Arrays


  • In addition to simple variables containing just strings and numbers, bash also supports array variables
  • An array is just a collection of values, all stored within one variable, logically:
    • TEST -> val1,val2,val3,val4,val5
  • Traditionally, the different values in the array are referenced using numbers, called indexes, starting at zero:
    • TEST[0] -> val1
    • TEST[1] -> val2
  • This is known as an Indexed Array

Indexed Array Example


# To create the array, just start assigning values:
MYDIRS[0]="/"
MYDIRS[1]="/home"
MYDIRS[2]="/usr"

echo $MYDIRS
# will just show "/" since that is the first member

echo ${MYDIRS[1]}
# will show "/home"
# Note that you must use the braced expansion syntax, due to
# overloading of the square bracket characters ( pathname wildcard )

echo ${#MYDIRS[*]}
# shows 3, since there are three values in the array

Associative Arrays


  • As of bash version 4, Associative Arrays are available
  • An associative array uses strings to get at values, as opposed to numbers
  • Associative arrays have to be created specially, using the declare builtin
    declare -A MYDICTIONARY
    MYDICTIONARY[apple]=fruit
    MYDICTIONARY[carrot]=vegetable
    MYDICTIONARY[linux]="Awesome operating system"
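Continuing the MYDICTIONARY example ( re-declared here so the snippet runs standalone ) — lookups, counting, and iterating over the keys:

```shell
declare -A MYDICTIONARY
MYDICTIONARY[apple]=fruit
MYDICTIONARY[carrot]=vegetable
MYDICTIONARY[linux]="Awesome operating system"

echo "${MYDICTIONARY[apple]}"       # -> fruit
echo "${#MYDICTIONARY[@]}"          # number of entries -> 3

for key in "${!MYDICTIONARY[@]}" ; do    # ${!array[@]} lists the keys
  echo "$key: ${MYDICTIONARY[$key]}"
done
```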

Lab


  • Copy skel.sh to proc-count.sh and implement as:
    • proc-count.sh [-f filter]... [-c] email
  • This script will count processes with command names that match one or more filters, emailing one of two possible reports, either a TSV ( which is default ) or a CSV ( selected with the -c flag )
    • processname count
    • Or
    • processname,count
  • If no filter is given, all processes should be reported
  • Use arrays to track filters and results

Advanced Expansions

Overview


  • An expansion occurs when the shell acts on metacharacters in a command, automatically replacing them with content according to fixed rules – generally so the user does not have to type as much ( wildcards ), can reference variables and more
  • There are seven different kinds of expansions in bash:
    • Brace expansion, tilde expansion, parameter/variable expansion, command substitution, arithmetic expansion, word splitting, and pathname expansion
  • On operating systems that support named pipes ( like Linux! ), there is one additional form, known as process substitution

Brace Expansion


  • Brace expansion allows for the automatic creation of arbitrary strings
  • Consider:
    • $ echo a{1..5}b
      a1b a2b a3b a4b a5b
    • $ echo a{f,h,g}b
      afb ahb agb
  • As seen in the examples, you can expand ranges of numbers or letters, as well as comma separated lists of values
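A practical use is creating a whole directory tree in one command — the tree layout below is invented for the example:

```shell
mkdir -p /tmp/braces-demo/{bin,etc,log/{old,new}}
ls /tmp/braces-demo/log       # -> new  old

# Adjacent expansions combine, pairing every element on the left with every
# element on the right:
RESULT=$(echo {a,b}{1..3})
echo "$RESULT"                # -> a1 a2 a3 b1 b2 b3
```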

Tilde Expansion


  • You should already be familiar with tilde expansion, which evaluates to user home directories:
    • $ echo ~
      /home/student
    • $ echo ~alice
      /home/alice
  • What you might not know is that tilde can be used to reference current directories ( ~+ ) and previous directories ( ~- ):
    • $ cd /home ; cd / ; echo ~+ ; echo ~-
      /
      /home
    • Started in /home, then moved to /. ~+ expanded to /, ~- expanded to /home

Parameter/Variable Expansion


  • This topic was covered in depth during the intro bash scripting class
  • Quick reminder:
    • $PATH
    • ${PATH}
  • The second form is more precise, and should generally be used anytime a variable reference is embedded within additional content, to protect from misinterpretation
    • Also note, the curly brace expansion syntax allows for extremely powerful capabilities, including arrays, searching, substrings, character counts, case manipulation and more
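A runnable sampler of those capabilities ( the case-manipulation form needs bash 4+ ):

```shell
FILE="/var/log/httpd/access.log"

echo "${FILE##*/}"       # strip longest */ match from the front -> access.log
echo "${FILE%/*}"        # strip shortest /* match from the back -> /var/log/httpd
echo "${FILE/log/LOG}"   # replace first occurrence -> /var/LOG/httpd/access.log
echo "${FILE^^}"         # uppercase everything
echo "${#FILE}"          # character count -> 25
```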

Command Substitution


  • Command substitution is incredibly useful, as it instructs the shell to run a given command in a new shell, and capture its output in some particular manner
  • Recall the backtick and $() from an earlier lecture:
    • echo `whoami`
    • echo $(whoami)
  • whoami will be run from a new shell, and its standard output, minus any trailing newlines, will be substituted into the backticked/parenthesized section of the command line, which is then executed from the main shell, as:
    • echo student

Arithmetic Expansion


  • Sometimes, it’s incredibly useful to have the shell perform some simple math, and it’s also incredibly easy to use:
    • echo $((6*8))
    • 48
  • Bash has a slew of operations available, including add/subtract/multiply/divide, exponentiation, bitwise operations including shifts, negations and logical operations, increments, decrements and more
  • See the manpage under Arithmetic Evaluation

Word Splitting


  • Word splitting is an interesting feature of the shell that allows it to identify words within the results of parameter expansion, command substitution and arithmetic expansion, and then split them apart
  • There is a shell variable known as IFS, which stands for Internal Field Separator
    • This variable defines the characters which can separate words, and the default IFS is ‘<space><tab><newline>’
    • Also note that the first character of IFS is used to join the words when expanding “$*”
  • Try the following:
    • echo $(w)
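Word splitting in action — change IFS, and an unquoted expansion splits on the new separator. Saving and restoring IFS, as below, is good practice:

```shell
OLDIFS=$IFS
CSV="one,two,three"

IFS=','
set -- $CSV        # unquoted expansion is split on commas
COUNT=$#           # 3 words
SECOND=$2          # "two"
IFS=$OLDIFS

echo "$COUNT words, second is $SECOND"
```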

Pathname Expansion


  • Pathname expansion is nerd-speak for how wildcards work in the shell
  • This shouldn’t require review, but recall the three wildcards:
    • *
    • ?
    • [set]

Process Substitution


  • Process substitution is a very neat shorthand for dynamically creating named pipes which are used for input or output
  • Consider the first form:
    cat /etc/passwd <(w) <(df -h) <(uname -a) > report
  • The <() syntax creates the process substitutions
  • What’s really going on here, is that the inner command is executed, with its stdout connected to a named pipe dynamically created under /dev/fd
  • That pathname is then substituted on the outer command line, which becomes an argument, and in this case, cat simply reads from the /dev/fd file like any other
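Another classic use of the first form is handing two command outputs to a tool that expects filenames — here paste reads both /dev/fd paths side by side:

```shell
paste <(seq 1 3) <(seq 4 6)
# prints two tab-separated columns: 1 4, 2 5, 3 6
OUT=$(paste <(seq 1 3) <(seq 4 6))
```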

Process Substitution


  • The second form of process substitution is similar, except the other direction
  • The /dev/fd file is created to accept input from the outer command, and the file is attached as stdin on the inner command
  • Consider:
    tar cf - . > >(gzip -9c > crazy.tgz)
  • It looks crazy, but just step through the operations
    • tar is outputting to stdout, which is redirected to the process substitution ( which in reality is a /dev/fd pathname )
    • gzip is reading from stdin, which is the /dev/fd path

Sample Code


  • Next, we will spend some time breaking down and understanding a number of commonly used scripts from Rackspace
  • These scripts are available at:
    • http://rackspace.edgecloud.com/adv-bash-scripting

Lab


  • Modify health-report.sh from the earlier lab:
    • Add a new flag, -m, to create a list of process names and memory percentages, sorted descending by memory usage.
    • Also, add a -c flag to indicate “collect only” mode. The user should not need to supply an email in this mode. In this mode, the script should produce the requested reports ( from the other flags ), but instead of emailing them immediately, it should collect them in a file under /tmp called health-report.YYYY-MM-DD
      • You can simply append each new report to the file, but include a header in front of each new report that has the date/time
    • Finally, add a -r flag which accepts a date in YYYY-MM-DD form, and emails the requested report to the supplied email address
    • Make sure to produce meaningful error messages for all failures

Additional Topics

Overview


  • There are a few other topics that should be covered, but did not fall under any of the previous topics
    • Here documents
    • Subshell executions
    • Command separators
    • Conditionals with the shell

Here documents


  • Here documents are a really convenient way to enter multiple lines of text at the command line, or from within a shell script
  • Usage is fairly simple:
    cat << samp
    Everything I type will go to 
    stdin of the command
    Until a line with just samp
    samp
  • Just be careful about spacing – everything is literal, and the delimiter ( samp in this example ) must not have anything else on the line
  • Also see <<- for indenting here documents
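Two useful variations, runnable as-is: an unquoted delimiter expands variables inside the document, while a quoted delimiter keeps everything literal:

```shell
NAME="student"

EXPANDED=$(cat << EOF
Hello $NAME
EOF
)

LITERAL=$(cat << 'EOF'
Hello $NAME
EOF
)

echo "$EXPANDED"    # -> Hello student
echo "$LITERAL"     # -> Hello $NAME
```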

Subshell Executions


  • Sometimes, it is convenient to execute a command within a subshell, which isolates it from the current shell
    • It can not impact the environment or working directory of the current shell
    • You can treat the subshell as an individual command, using redirection and pipes as needed
  • Simple example:
    ( cd /home ; ls a* ) | wc -l
  • This will list a count of the home directories starting with the letter a. The cd did not change the working directory of the main shell

Command Separators


  • There are several ways to separate commands:
    • Semicolon ( ; )
      • This separates commands and does not provide any relation between the commands. They are simply executed one after another, left to right.
    • Ampersand ( & )
      • This puts the left command in the background and starts executing the next command immediately
    • Double Ampersand ( && )
      • This will execute the right command if the left command exited with a zero/success
    • Double pipe ( || )
      • This will execute the right command if the left command exited with a non-zero/fail
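The conditional separators in action ( the paths are invented for the demo ):

```shell
mkdir -p /tmp/sep-demo && echo "created"             # right side runs: mkdir succeeded
ls /tmp/no-such-dir 2>/dev/null || echo "fallback"   # right side runs: ls failed
PICK=$(true && echo yes || echo no)                  # a common if/else one-liner
echo "$PICK"
```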

Shell Conditionals


  • Very similar to the test command, the shell can perform quite a few simple condition checks using the [[ expression ]] syntax ( the spaces around the expression are required )
  • Consider:
    [[ -d /tmp/mytool ]] && mv logfile /tmp/mytool
  • See Conditional Expressions in the manpage for a complete reference on all of the available tests

Useful Tools in Scripting

Overview


  • There are, of course, many, many tools to use while scripting, but some are more powerful, or more frequently used
  • We will overview three of these tools now:
    • awk
    • sed
    • xargs

awk


  • awk is an incredibly powerful tool, which contains its own programming language
  • One of the most commonly used features of awk, is to grab particular columns of information from stdin
  • Consider the columns from ps aux:
    USER     PID %CPU %MEM    VSZ   RSS TTY    STAT START   TIME COMMAND
  • Using awk, grabbing particular columns is easy!
    ps aux | awk '{print $2" is using "$3"% of the CPU"}'
  • Let’s play with awk for a few minutes

sed


  • sed stands for “stream editor” and that is exactly what it does
  • sed takes an expression describing an operation to perform, and then applies that operation to each line of input
  • It is commonly used to perform find/replace operations:
    sed -e 's/example.com/mycompany.com/g' /etc/httpd/conf/httpd.conf
  • This example replaces every occurrence of example.com with mycompany.com ( the trailing g makes the replacement global on each line; without it, only the first match per line is changed )
  • sed can do way, waaaay more. Consult Google or get the O’Reilly book: “sed & awk”

xargs


  • xargs is another very useful tool at the command line, and in scripting
  • It takes a second to wrap your head around what xargs does:
    • Accepts input from stdin
    • For each batch of input lines, runs a given command with those lines appended as arguments
  • For example:
    ps aux | fgrep bad_cron | awk '{print $2}' | xargs kill
  • Let’s discuss what’s happening with the example
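Here is a self-contained variation on the same pattern — build some files, then let xargs hand the matching names to rm as arguments. The -print0 / -0 pairing makes it safe for names containing spaces or newlines:

```shell
mkdir -p /tmp/xargs-demo
touch /tmp/xargs-demo/a.log /tmp/xargs-demo/b.log /tmp/xargs-demo/c.txt

# find prints matching names; xargs turns them into rm arguments
find /tmp/xargs-demo -name '*.log' -print0 | xargs -0 rm -f

ls /tmp/xargs-demo      # -> c.txt
```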

Getting Fancy

Overview


  • Finally, a few topics to get fancy!
    • Trapping signals
    • Terminal codes to get colors and special modes
    • Automagic logging with coproc

Trapping Signals


  • Sometimes, it’s useful to react to signals when they get delivered to your script
  • This is easily done with the trap command:
    trap “echo DING” ALRM
    kill -ALRM $$
  • This instructs the shell to run the echo command when an alarm signal is delivered
  • This technique is commonly used to trigger cleanup routines when the script is interrupted
    • See Sample4.sh for a good illustration of this technique
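A sketch of that cleanup pattern — the trap fires on EXIT ( and on common interrupt signals ), so the lock file disappears no matter how the script ends. The lock path is invented for the example:

```shell
LOCK=/tmp/trap-demo.$$.lock

cleanup() {
  rm -f "$LOCK"
}
trap cleanup EXIT INT TERM

touch "$LOCK"
# ... real work would happen here ...
```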

Terminal codes


  • Most terminals support various colors and modes to display information to the user
  • If you find the codes for the connected terminal, you can output text with different foreground and background colors, blinking, dim, underlined and more
  • A common technique for this is to use hard coded codes in your strings:
    echo -e "\033[31mRed\033[39m and \033[32mGreen\033[39m"
  • This gets hard to read and do correctly, so variables are commonly employed

Terminal Codes with Variables


  • Using Variables:
    RED="\033[31m"
    GREEN="\033[32m"
    NORMAL="\033[39m"
    echo -e "${RED}Red${NORMAL} and ${GREEN}Green${NORMAL}"
  • With variables, things are a little easier to read, and the codes can be changed with the terminal
  • Could functions help here too?
  • Check out nifty-colors.sh on the example server
  • Also:
    • http://wiki.bash-hackers.org/scripting/terminalcodes
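A more portable variant of the same idea: ask tput for the connected terminal’s codes instead of hard coding ANSI sequences. The || true fallbacks keep the script working when there is no terminal at all ( e.g. output piped to a file ):

```shell
RED=$(tput setaf 1 2>/dev/null || true)
GREEN=$(tput setaf 2 2>/dev/null || true)
NORMAL=$(tput sgr0 2>/dev/null || true)

MSG="${RED}Red${NORMAL} and ${GREEN}Green${NORMAL}"
echo -e "$MSG"
```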

Automagic Logging


  • This is a really neat trick to attach stdout of your script to both the terminal and a logfile at the same time
    #!/bin/bash
    # start tee in the background as a coprocess,
    # with its output connected to the script's original stdout
    { coproc tee { tee logfile ;} >&3 ;} 3>&1
    # redirect stdout and stderr of the script to the coprocess
    exec >&${tee[1]} 2>&1

Final Lab


  • Copy skel.sh to kill-thread.sh
  • Implement kill-thread.sh to kill mysql connections based on certain parameters:
    kill-thread.sh [-u user] [-h host] [-d db] [-c command]
  • Just use mysqladmin, and assume there is no root password, or it is supplied by ~/.my.cnf automatically
  • Just do simple searches by the various columns, and if the user supplies more than one flag, all must match to kill the connection
  • ./kill-thread.sh -h localhost -d test
    • Would kill anyone connected to the test database from localhost
  • Snaz up the output with colors!