A repost, continuing along the lines of bashing the shell.
Having examined Learning the bash Shell (In a Nutshell (O’Reilly)) (see here, here, and here), it is now time to turn to a more advanced reference to help you geek out on your Linux computer. If you want to have only one book on bash, get Classic Shell Scripting by Robbins and Beebe. This book has an excellent mixture of history, philosophy, rigorously described details and creative solutions.
For instance, after giving a brief history of Unix (required in all such books), the authors lay out the basic principles of what is considered good Unix programming. It is so good I’d like to summarize parts of it for you.
Do one thing well. The idea here is to divide complex problems into small bits, and solve each bit in the best possible way that you can manage. Then solve the next bit. Then the next. Eventually, you’ll have all the bits solved. One advantage of doing this is that someone else may end up solving one of these bits before you get to it, or they may do it better than you did, and you can steal their solution and stick it in with yours. In most cases, for any problem of reasonable complexity, many of the bits are already solved because of work done by people sometimes decades ago. Unix and Linux were not built up under the business model of always having a solution that looks new and slick. Once something is solved in this environment, it tends to stay solved.
Also, solved bits and pieces can be reorganized and used in new and creative ways. A lot of Linux “commands” are exactly this sort of solution. The commands sort, grep, head, and all those other neat tools are bits. Solved.
Process text, not binary. This is a fundamental difference between Linux (and to some extent Mac) on one hand and Windows on the other. Look in virtually any Microsoft Office file, for instance, and you will see gobbledygook. Look inside virtually any file on your Linux computer and you will see text. Geeky incomprehensible text, yes, but text. If the files are text and not binary, life is easy.
Harness the power of regular expressions. OMG regular expressions are so cool. If everything is text, then Regular Expressions are the ultimate power tool. In this post I’ll give an example.
Default to standard I/O. Well written solution-bits should expect to eat from and send their output to the standard I/O streams. What are the standard I/O streams? Well, they are called input, output, and error. When you are busy making bits of software to solve problems, it helps if they are all able to take from and output to these streams that, essentially, the system knows about and handles for you. Programs become like those plastic children’s chew toys that string together like beads. Each unit of the toy is a different color or shape, maybe with a letter or a number or a picture of a duck on it, but they all string together with the same nibby-thingies on the end. Or like Legos. All the Legos, no matter what, lock together.
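To make the three streams concrete, here is a minimal sketch. The function names shout and warn are made up for illustration; they are not from the book. One reads standard input and writes standard output, the other writes only to standard error.

```shell
# shout: a hypothetical filter that upper-cases whatever arrives on standard input
# and writes the result to standard output.
shout() {
    tr '[:lower:]' '[:upper:]'
}

# warn: a hypothetical helper that writes a message to standard error,
# leaving standard output clean for the next program in line.
warn() {
    printf 'warning: %s\n' "$1" >&2
}

# Standard output of printf becomes standard input of shout.
printf 'duck\n' | shout

# The warning travels on standard error; 2>/dev/null throws that stream away.
warn 'something odd' 2>/dev/null
```

Because shout never names a file, it can sit anywhere in a chain of commands, which is the whole point of defaulting to the standard streams.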
One of the most common and useful examples of bits and streams can be explored with this example. The command ‘ls’ produces a list of files. From your home directory, type “ls -l” and the names of the files in your home directory, with details about size, permissions, and stuff, will go whizzing by so fast you can’t read it. If you did this, you saw standard output whizzing by.
There is a program called “more” that takes a stream from standard input and outputs it … through standard output … in chunks that are just long enough to fill the screen, then waits for you to hit the space bar, and then gives you another chunk. Wouldn’t it be nice to have the output of ls somehow feed into the more command, so that you could page through the file listing? If this was Windows, that would be easy. Just use OLE, a little DDE, wait 11 years for Microsoft to make that work, upgrade your system a few times, and so on. In Linux, it is hard. You need to know about this:
|
That’s a vertical line. It stands for “pipe.” If you stick that between two commands, the command on the left will send its standard output to the standard input of the command on the right. So, if you type
ls -l | more
then you get the desired effect.
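Pipes are not limited to two commands. As a small sketch with made-up sample data (the fruit names are only placeholders), three tools can be chained so that each one reads the previous one’s standard output:

```shell
# printf produces three lines, sort puts them in order,
# and head keeps only the first two of the sorted lines.
printf 'pear\napple\nfig\n' | sort | head -n 2
```

None of the three programs knows anything about the others; each just reads standard input and writes standard output.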
This does not work for all Linux commands, but many, if not most, can be strung together. This is not even close to the only way to string commands together! In fact, the stringing of commands via pipes and redirection constitutes about 30-40 pages of “Classic Shell Scripting.”
For instance, if you want to put the results of the file listing command into a file for later processing (or to print out and hang on your wall) you can do this:
ls -l > listoffiles.txt
This creates or clobbers a file called “listoffiles.txt” and fills it with the output of ‘ls -l’ … the “>” operator redirects standard output from ls to the file it is pointing to. If the file does not exist, it is created. If the file exists, it is clobbered (unless you have the “noclobber” option set) and the contents replaced with this stream of data. If you use “>>” instead of “>” then the output stream from ls is appended to the file. And so on.
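The difference between creating, clobbering, appending, and noclobber can be tried safely in a scratch directory. This is only a sketch: the variable names are mine, and echo stands in for ls so the result is predictable.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
listing="$workdir/listoffiles.txt"

echo 'first'  > "$listing"     # the file does not exist yet, so it is created
echo 'second' > "$listing"     # the file exists, so it is clobbered
echo 'third' >> "$listing"     # >> appends instead of replacing

# With noclobber set, > refuses to overwrite an existing file.
set -o noclobber
if ! { echo 'fourth' > "$listing"; } 2>/dev/null; then
    echo 'refused to overwrite'
fi
set +o noclobber

# Show what survived, then clean up the scratch directory.
final=$(cat "$listing")
rm -r "$workdir"
printf '%s\n' "$final"
```

The file ends up holding only the lines written by the second and third commands: the first was clobbered and the fourth was refused.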
One of the first examples of a useful bash program given in this book is a script that helps you quickly and easily solve crossword puzzles. Rather than give you the script, I’ll give you the basic idea of how it works by demonstrating “grep” and a primitive use of regular expressions. The script given in the book allows you to generalize this solution, using all of the philosophical points listed above … solving the problem, using standard input and output, and harnessing the power of regular expressions, and it uses text files that are almost certainly somewhere on your computer already.
If you are using Ubuntu, go to this sub directory:
/usr/share/dict
and, using ls, verify that there is a dictionary there. The name of the dictionary is probably the word “words.” If you are not using Ubuntu, search around for files with the word “words” as part of the file name. Those will be your dictionaries.
Now, just for fun, dump the contents of the dictionary onto the screen with the ‘cat’ command (cat filename causes the contents of a file to stream to standard output).
cat words
There are a lot of words in there, so they will scroll off the screen and you won’t be able to see most of it. To verify that the dictionary starts with “a” words, use the head command (which streams out the first ten lines of a file by default):
greg@greg-laptop:/usr/share/dict$ head words
A
A's
AOL
AOL's
Aachen
Aachen's
Aaliyah
Aaliyah's
Aaron
greg@greg-laptop:/usr/share/dict$
OK, enough playing around, let’s do something important. I’ve got this crossword puzzle that I can’t finish because I can’t think of a word that has five letters, where the third letter is an a and the fourth letter is a v. Just to make this clear, let me represent the word using dots (periods, full stops) for the spaces, and lower case letters for the letters.
..av.
Now, let me represent that as a word in “regular expression” format by using the symbols for the beginning and end of a line. Since the standard Linux dictionary has one word per line, this expression might help us to find the missing word:
^..av.$
There are different formats for regular expressions, but in grep, the dot matches any single character. The ^ anchors the regular expression to the start of a line, and the $ anchors the regular expression to the end of a line. The command ‘grep’ can read a regular expression and filter for lines where that expression is matched.
Before looking up this word, let’s demonstrate how this works in a simpler case. Try the following two commands on the “words” dictionary file:
grep "..." words
grep "^...$" words
The first one filters for any line that has three characters in a row anywhere in it. But this includes lines with four letters, five letters, on up (to some maximum number which, if you read Classic Shell Scripting, you will learn). The second one filters for a line that has a beginning (as all lines that exist do), three characters (no more, no less), and an end. So you get all three letter words in your dictionary.
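If the real dictionary scrolls by too fast to see the difference, the same two patterns can be tried on a tiny stand-in list. The four words here are made up for the demonstration. Single quotes are used as a cautious habit so the shell leaves the $ alone.

```shell
# A tiny stand-in for the dictionary, one word per line.
sample=$(printf 'ox\ncat\ndog\nhorse\n')

# Unanchored: any line containing at least three characters matches,
# so "horse" gets through along with "cat" and "dog".
printf '%s\n' "$sample" | grep '...'

# Anchored at both ends: only lines of exactly three characters match.
printf '%s\n' "$sample" | grep '^...$'
```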
So now we are ready to grep the dictionary for the word we are looking for. The clue, by the way, is “zigzag” … we are looking for a five letter word for “zigzag” (and no, it’s not “paper” or “joint” … you freakkin’ pothead). So, we enter:
greg@greg-laptop:/usr/share/dict$ grep "^..av.$" words
The command works as shown, but two options are worth knowing, as in grep -h -i "^..av.$" words. The -h option is not necessary in all cases, but it suppresses the output of filenames in some versions of grep. The -i option, here and often in other commands by the way, causes grep to ignore the case of the letters being matched, so you will find words with upper case or lower case in spots where you specified only lower (or upper) case. (For the most part, in this sort of activity, that only matters for the first letter).
The output is:
Slavs
Soave
agave
brave
bravo
crave
grave
gravy
guava
heave
heavy
knave
leave
shave
slave
stave
suave
weave
greg@greg-laptop:/usr/share/dict$
The most likely answer is weave. Hey, weave fits and is verified.
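To avoid retyping the anchors every time, the lookup can be wrapped in a small function. This is my own sketch, not the script from the book: the name crossword_match and its arguments are invented here, and it assumes the word list lives at /usr/share/dict/words unless you pass another file.

```shell
# crossword_match PATTERN [WORDLIST]
# PATTERN uses a dot for each unknown letter, for example '..av.'
# WORDLIST defaults to the dictionary path used earlier in this post.
crossword_match() {
    pattern=$1
    wordlist=${2:-/usr/share/dict/words}
    # Anchor the pattern to the whole line and ignore letter case.
    grep -i -- "^${pattern}\$" "$wordlist"
}

# Demonstration on a small temporary word list so the result is predictable.
demo=$(mktemp)
printf 'brave\nbravery\nguava\nzebra\n' > "$demo"
crossword_match '..av.' "$demo"
rm -f "$demo"
```

On the demonstration list only the five letter words with “av” in the third and fourth positions come back; with no second argument, crossword_match '..av.' searches the system dictionary instead.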
Onward!
Nice post. I’ve been a Linux user for about a year now and I love it. I’m currently running Arch on my desktop, Debian on my laptop, and have a server running Gentoo. Keep up the good work, I always enjoy your Linux posts.
Look in virtually any Microsoft Office file, for instance, and you will see gobbledygook.
For the record, that’s no longer the case with Office 2007.
Also, I only discovered last week that Windows’s FINDSTR supports regular expressions as well, and has for a long time.
Yes, the Windows command-line doesn’t hold a candle to bash (well, at least not without installing PowerShell, at which point it’s arguably more powerful), but it’s not THAT bad 🙂
*nix runs the various components of a compound command (i.e. with pipes or `` or ;) as concurrent processes. Windows used to run them sequentially, not starting one until its predecessor had finished producing output and terminated. Is this still the case, or does XP/Vista run them concurrently now? The difference can be quite significant in terms of resources used and time to first output.
I also recall Windows had a very restricted concept of redirection – e.g. no equivalent to 2>& – and an equally restricted concept of Regular Expressions. Yes, it could ‘pattern match’, but give it multiple * (e.g. ab*de*ff) and it would fall apart. Is this all fixed too? I mistrust it so much in the area of shell scripting that when I am forced to use Windows I script in bash under cygwin. Or write Perl or Tcl scripts.
“Some people, when confronted with a problem, think ‘I know, I’ll use regular expressions.’ Now they have two problems.” – Jamie Zawinski
That’s humorously overstated, of course, but regular expressions can get awfully hairy for more complex cases.