Tag Archives: Book review

How to do science with a computer: workflow tools and OpenSource philosophy

I have two excellent things on my desk, a Linux Journal article by Andy Wills, and a newly published book by Stefano Allesina and Madlen Wilmes.

They are:

Computing Skills for Biologists: A Toolbox by Stefano Allesina and Madlen Wilmes, Princeton University Press.

Open Science, Open Source, and R, by Andy Wills, Linux Journal

Why OpenSource?

OpenSource science means, among other things, using OpenSource software to do the science. For some aspects of software this is not important. It does not matter too much if a science lab uses Microsoft Word or if they use LibreOffice Write.

However, since it does matter if you use LibreOffice Calc as your spreadsheet, as long as you are eschewing proprietary spreadsheets, you might as well use the OpenSource office package LibreOffice or equivalent, and then use the OpenSource presentation software, word processor, and spreadsheet.

OpenSource programs like Calc, R (a stats package), and OpenSource friendly software development tools like Python and the GPL C Compilers, etc. do matter. Why? Because your science involves calculating things, and software is a magic calculating box. You might be doing actual calculations, or production of graphics, or management of data, or whatever. All of the software that does this stuff is on the surface a black box, and just using it does not give you access to what is happening under the hood.

But, if you use OpenSource software, you have both direct and indirect access to the actual technologies that are key to your science project. You can see exactly how the numbers are calculated or the graphic created, if you want to. It might not be easy, but at least you don’t have to worry about the first hurdle in looking under the hood that happens with commercial software: they won’t let you do it.

Direct access to the inner workings of the software you use comes in the form of actually getting involved in the software development and maintenance. For most people, this is not something you are going to do in your scientific endeavor, but you could get involved with some help from a friend or colleague. For example, if you are at a University, there is a good chance that somewhere in your university system there is a computer department that has an involvement in OpenSource software development. See what they are up to, find out what they know about the software you are using. Who knows, maybe you can get a special feature included in your favorite graphics package by helping your new found computer friends cop an internal University grant! You might be surprised as to what is out there, as well as what is in there.

In any event, it is explicitly easy to get involved in OpenSource software projects because they are designed that way. Or, usually are and always should be.

The indirect benefit comes from the simple fact that these projects are OpenSource. Let me give you an example from the non-scientific world. (It is a made up example, but it could reflect reality and is highly instructive.)

Say there is an operating system or major piece of software competing in a field of other similar products. Say there is a widely used benchmark standard that compares the applications and ranks them. Some of the different products load up faster than others, and use less RAM. That leaves both time (for you) and RAM (for other applications) that you might value a great deal. All else being equal, pick the software that loads faster in less space, right?

Now imagine a group of trollish deviants meeting in a smoky back room of the evile corporation that makes one of these products. They have discovered that if they leave a dozen key features that all the competitors use out of the loading process, so they load later, they can get a better benchmark. Without those standard components running, the software will load fast and be relatively small. It happens to be the case, however, that once all the features are loaded, this particular product is the slowest of them all, and takes up the most RAM. Also, the process of holding back functionality until it is needed is annoying to the user and sometimes causes memory conflicts, causing crashes.

In one version of this scenario, the concept of selling more of the product by using this performance tilting trick is considered a good idea, and someone might even get a promotion for thinking of it. That would be something that could potentially happen in the world of proprietary software.

In a different version of this scenario the idea gets about as far as the water cooler before it is taken down by a heavy tape dispenser to the head and kicked to death. That would be what would certainly happen in the OpenSource world.

So, go OpenSource! And, read the paper from Linux Journal, which by the way has been producing some great articles lately, on this topic.

The Scientists Workflow and Software

You collect and manage data. You write code to process or analyze data. You use statistical tools to turn data into analytically meaningful numbers. You make graphs and charts. You write stuff and integrate the writing with the pretty pictures, and produce a final product.

The first thing you need to understand if you are developing or enhancing the computer side of your scientific endeavor is that you need the basic GNU tools and command line access that come automatically if you use Linux. You can get the same stuff with a few extra steps if you use Windows. The Apple Mac system is in between, with the command line tools already built in, but not quite as in-your-face available.

You may need to have an understanding of Regular Expressions, and how to use them on the command line (using sed or awk, perhaps) and in programming, perhaps in Python.
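As a minimal sketch of the kind of regular expression work meant here, the following Python snippet pulls sample IDs and measurements out of messy instrument lines. The line format, the `SAMPLE_RE` pattern, and the `parse_lines` helper are all invented for illustration; none of them come from the book or the article.

```python
import re

# Hypothetical line format: "sample_<id>: <value> mg", with sloppy spacing and case.
SAMPLE_RE = re.compile(
    r"sample_(\d+)\s*:\s*([-+]?\d+(?:\.\d+)?)\s*mg",
    re.IGNORECASE,
)

def parse_lines(lines):
    """Return (sample_id, value) pairs for every line that matches the pattern."""
    results = []
    for line in lines:
        match = SAMPLE_RE.search(line)
        if match:
            results.append((int(match.group(1)), float(match.group(2))))
    return results

# Made-up input; the last line has no measurement and is skipped.
raw = [
    "sample_12:  3.50 mg",
    "Sample_7 : 10 MG   # rerun",
    "calibration blank",
]
parsed = parse_lines(raw)
print(parsed)
```

The same pattern idea carries over to sed or awk on the command line, though the exact syntax differs between tools.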

You will likely want to master the R environment because a) it is cool and powerful and b) a lot of your colleagues use R so you will want to have enough under your belt to share code and data now and then. You will likely want to master Python, which is becoming the default scientific programming language. It is probably true that anything you can do in R you can do in Python using the available tools, but it is also true that the most basic statistical stuff you might be doing is easier in R than Python since R is set up for it. The two systems are relatively easy to use and very powerful, so there is no reason to not have both in your toolbox. If you don’t choose the Python route, you may want to supplement R with GNU plotting tools.
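To make the "easier in R" point concrete, here is a rough sketch of a two-sample comparison using only the Python standard library. In R the same comparison is a single call to `t.test(control, treated)`. The `welch_t` helper and the sample numbers are made up for illustration, and the function only computes the test statistic, not a p-value.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic for two independent samples (no p-value computed)."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    # statistics.variance returns the sample variance (n - 1 denominator).
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    return (mean_a - mean_b) / math.sqrt(var_a / len(a) + var_b / len(b))

# Invented measurements for two groups.
control = [4.1, 3.9, 4.3, 4.0, 4.2]
treated = [4.8, 5.1, 4.9, 5.3, 5.0]

t_stat = welch_t(control, treated)
print(round(t_stat, 2))
```

In practice most Python users would reach for a third-party statistics package rather than writing this by hand, which is part of why having both R and Python around is handy.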

You will need some sort of relational database setup in your lab, some kind of OpenSource, SQL-based system.
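As one hedged illustration, Python ships with the `sqlite3` module, so a small lab database can be tried without installing a server. SQLite is only one possible choice and is not named in the sources; the `specimens` table and its columns are hypothetical.

```python
import sqlite3

# In-memory database for illustration; a real lab would use a file or a server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE specimens (id INTEGER PRIMARY KEY, site TEXT, mass_g REAL)"
)
# Parameterized inserts keep data and SQL separate.
conn.executemany(
    "INSERT INTO specimens (site, mass_g) VALUES (?, ?)",
    [("north", 12.5), ("north", 14.0), ("south", 9.5)],
)
conn.commit()

# Count and average mass per site.
rows = conn.execute(
    "SELECT site, COUNT(*), AVG(mass_g) FROM specimens GROUP BY site ORDER BY site"
).fetchall()
print(rows)
conn.close()
```

The same SQL would work largely unchanged against a server-based system if the lab outgrows a single file.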

You will have to decide on your own if you are into LaTeX. If you have no idea what I’m talking about, don’t worry, you don’t need to know. If you do know what I’m talking about, you probably have the need to typeset math inside your publications.

Finally, and of utmost importance, you should be willing to spend the upfront effort making your scientific work flow into scripts. Say you have a machine (or a place on the internet or an email stream if you are working collaboratively) where some raw data spits out. These data need some preliminary messing around with to discard what you don’t want, convert numbers to a proper form, etc. etc. Then, this fixed-up data goes through a series of analyses, possibly several parallel streams of analysis, to produce a set of statistical outputs, tables, graphics, or a new highly transformed data set you send on to someone else.

If this is something you do on a regular basis, and it likely is because your lab or field project is set up to get certain data certain ways, then do certain things to it, then ideally you would set up a script, likely in bash but calling gnu tools like sed or awk, or running Python programs or R programs, and making various intermediate files and final products and stuff. You will want to bother with making the first run of these operations take three times longer to set up, so that all the subsequent runs take one one hundredth of the time to carry out, or can be run unattended.
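Here is a minimal sketch of that idea written as a single Python script rather than bash: raw data comes in, gets cleaned, and is summarized, with each stage as its own function. The CSV layout, the column names `site` and `reading`, and the function names are all assumptions made for the example.

```python
import csv
import io
import statistics

def clean(rows):
    """Drop rows with missing or non-numeric readings; convert the rest to float."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append(
                {"site": row["site"].strip(), "reading": float(row["reading"])}
            )
        except (KeyError, ValueError, TypeError, AttributeError):
            continue  # discard anything that cannot be parsed
    return cleaned

def summarize(cleaned):
    """Return the mean reading per site, with sites in sorted order."""
    by_site = {}
    for row in cleaned:
        by_site.setdefault(row["site"], []).append(row["reading"])
    return {site: statistics.mean(vals) for site, vals in sorted(by_site.items())}

def run_pipeline(raw_text):
    """Parse raw CSV text, clean it, and summarize it in one repeatable step."""
    rows = list(csv.DictReader(io.StringIO(raw_text)))
    return summarize(clean(rows))

# Made-up raw output from a hypothetical instrument; "n/a" is a bad reading.
RAW = """site,reading
A,1.0
A,3.0
B,n/a
B,4.0
"""
summary = run_pipeline(RAW)
print(summary)
```

Because every step lives in the script, rerunning the whole analysis after a correction is one command, and the script itself records what was done to the data.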

Nothing, of course, is so simple as I just suggested … you will be changing the scripts and Python programs (and LaTeX specs) frequently, perhaps. Or you might have one big giant complex operation that you only need to run once, but you KNOW it is going to screw up somehow … a value that is entered incorrectly or whatever … so the entire thing you need to do once is actually something you have to do 18 times. So make the whole process a script.

Aside from convenience and efficiency, a script does something else that is vitally important. It documents the process, both for you and others. This alone is probably more important than the convenience part of scripting your science, in many cases.

Being small in a world of largeness

Here is a piece of advice you won’t get from anyone else. As you develop your computer working environment, the set of software tools and stuff that you use to run R or Python and all that, you will run into opportunities to install some pretty fancy and sophisticated development systems that have many cool bells and whistles, but that are really designed for team development of large software projects, and continual maintenance over time of versions of that software as it evolves as a distributed project.

Don’t do that unless you need to. Scientific computing is often not that complex or team oriented. Sure, you are working with a team, but probably not a team of a dozen people working on the same set of Python programs. Chances are, much of the code you write is going to be tweaked to be what you need it to be and then never change. There are no marketing gurus coming along and asking you to make a different menu system to attract millennials. You are not competing with other products in a market of any sort. You will change your software when your machine breaks and you get a new one, and the new one produces output in a more convenient style than the old one. Or whatever.

In other words, if you are running an enterprise level operation, look into systems like Anaconda. If you are a handful of scientists making and controlling your own workflow, stick with the simple scripts and avoid the snake. The setup and maintenance of an enterprise level system for using R and Python is probably more work before you get your first t-test or histogram than it is worth. This is especially true if you are more or less working on your own.

Culture

Another piece of advice. Some software decisions are based on deeply rooted cultural norms or fetishes that make no sense. I’m an emacs user. This is the most annoying, but also the most powerful, of all text editors. Here is an example of what is annoying about emacs. In the late 70s, computer keyboards had a “meta” key (it was actually called that) which is now the alt key. Emacs made use of the meta key. No person has seen or used a meta key since about 1979, but emacs refuses to change its documentation to use the word “alt” for this key. Rather, the documentation says something like “here, use the meta key, which on some keyboards is the alt key.” That is a cultural fetish.

Using LaTeX might be a fetish as well. Obviously. It is possible that for some people, using R is a fetish and they should rethink and switch to using Python for what they are doing. The most dangerous fetish, of course, is using proprietary scientific software because you think only if you pay hundreds of dollars a year to use SPSS or BMD for stats, as opposed to zero dollars a year for R, will your numbers be acceptable. In fact, the reverse is true. Only with an OpenSource stats package can you really be sure how the stats or other values are calculated.

And finally…

And my final piece of advice is to get and use this book: Computing Skills for Biologists: A Toolbox by Allesina and Wilmes.

This book focuses on Python and not R, and covers LaTeX which, frankly, will not be useful for many. This also means that the regular expression work in the book is not as useful for all applications as might be the case with a volume like Mastering Regular Expressions. But overall, this volume does a great job of mapping out the landscape of scripting-oriented scientific computing, using excellent examples from biology.

Computing Skills for Biologists can and should be used as a textbook for an advanced high school level course to prep young and upcoming investigators for when they go off and apprentice in labs at the start of their career. It can be used as a textbook in a short seminar in any advanced program to get everyone in a lab on the same page. I suppose it would be a treat if Princeton came out with a version for math and physical sciences, or geosciences, but really, this volume can be generalized beyond biology.

Stefano Allesina is a professor in the Department of Ecology and Evolution at the University of Chicago and a deputy editor of PLoS Computational Biology. Madlen Wilmes is a data scientist and web developer.

A Guide To Using Command Line Tools

There are a lot of books out there to help you learn command line tools, and of course, they mostly cover the same things because there is a fixed number of things you need to learn to get started down this interesting and powerful path.

Small, Sharp, Software Tools: Harness the Combinatoric Power of Command-Line Tools and Utilities by Brian P. Hogan is the latest iteration (not quite in press yet but any second now) of one such book.

I really like Hogan’s book. Here’s what you need to know about it.

First, and this will only matter to some but is important, the book does cover using CLI tools across platforms (Linux, Mac, Windows) in the sense that it helps get you set up to use the bash command line system on all three.

Second, this book does a much better job as a tutorial, rather than just as a reference manual, than most other books I’ve seen. You can work from start to finish, with zero knowledge at the start, follow the examples (using the provided files that you are guided to download using command line tools!) and become proficient very comfortably and reasonably quickly. The topics are organized in such a way that you can probably skip chapters that interest you less (but don’t skip the first few).

Third, the book does give interesting esoteric details here and there, but the author seems not compelled to obsessively fill your brain with entirely useless knowledge such as how many arguments the POSIX standard hypothetically allows on a command line (is it 512 or 640? No one seems to remember) as some other books do.

I found Small, Sharp, Software Tools a very comfortable, straightforward, well organized, accurate read from Pragmatic.

Serious Python Programming

Julien Danjou’s Serious Python: Black-Belt Advice on Deployment, Scalability, Testing, and More is serious.

This book takes Python programming well beyond casual programming, and beyond the use of Python as a glorified scripting language to access statistical or graphics tools, etc. This is level one or even level two material. If you are writing software to distribute to others, handling time zones, want to optimize code, or experiment with different programming paradigms (e.g., functional programming, generating code, etc.) then you will find Serious Python informative and interesting. Multi-threading, optimization, scaling, methods and decorators, and integration with relational databases are also covered. (A decorator is a function that “decorates,” or changes or expands, a function without modifying it.) The material is carefully and richly explored, and the writing is clear and concise.
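For readers who have not met decorators, here is a small sketch of the idea. It is my own illustration, not an example from Danjou’s book, and the names `timed`, `last_elapsed`, and `total` are made up.

```python
import functools
import time

def timed(func):
    """Decorator that records how long the wrapped function took to run."""
    @functools.wraps(func)  # keep the original function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        wrapper.last_elapsed = time.perf_counter() - start
        return result
    wrapper.last_elapsed = None  # no call has been timed yet
    return wrapper

@timed
def total(values):
    """Add up a list of numbers; the body knows nothing about timing."""
    return sum(values)

answer = total([1, 2, 3])
print(answer, total.__name__, total.last_elapsed is not None)
```

The body of `total` is untouched; the timing behavior is layered on from outside, which is the sense in which a decorator expands a function without modifying it.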

Minecraft Blockopedia

Minecraft is probably the most creative video game out there, not in the sense that its creators are creative, but rather, that it is all about creating things, and this is done by constructing novelty out of a relatively simple set of primitives. But to do so, the player needs to know about the building blocks of Minecraft, such as Lava, Fencing, Redstone, Levers, various chests and chest related things, and so on.

The Blockopedia in use.
Yes, you (or your child) can learn as you go playing the game, watch a few YouTube videos, etc. But if we want to fully enjoy and integrate the Minecraft experience, and help that child (or you?) get in some more reading time, there must be books. For example, the Minecraft: Blockopedia by Alex Wiltshire.

Time itself as a resource that drives evolution

Many of the key revolutions, or at least, overhauls, in biological thinking have come as a result of the broad realization that a thentofore identified variable is not simply background, but central and causative.

I’m sure everyone always thought, since first recognized, that if genes are important then good genes would be good. Great, even. But it took a while for Amotz Zahavi and some others to insert good genes into Darwin’s sexual selection as the cause of sometimes wild elaboration of traits, not a female aesthetic or mere runaway selection.

Making Raspberry Pi Robots

At the core of this post is a review of a new book, Learn Robotics with Raspberry Pi: Build and Code Your Own Moving, Sensing, Thinking Robots. I recommend it as a great above-basic level introduction to building a standard robot, learning a bit about the Linux operating system, learning to program in Python, and learning some basic electronics. However, I want to frame this review in a bit more context which I think will chase some readers away from this book while at the same time making others drool. But don’t drool on the electronics.

How to be a hacker

Wikipedia tells us that a “computer hacker is any skilled computer expert that uses their technical knowledge to overcome a problem.” The all knowing one goes on to note that the term has been linked in popular parlance with the made up Wikipedia word “security hacker.” Such an individual “uses bugs or exploits to break into computer systems.”


Violence in the United States Congress

There is probably a rule, in the chambers of the United States Congress, that you can’t punch a guy. Living rules are clues to the past. Where I live now, there is probably one Middle or High School age kid across 130 homes, but we have a rule: You can’t leave your hockey goals or giant plastic basketball nets out overnight. So all the old people who live on my street have to drag those things into the garage at the end of every day, after their long sessions of pickup ball. Or, more likely, years ago, there were kids everywhere and the “Get off my lawn” contingent took over the local board and made all these rules. So, today, in Congress, you can’t hit a guy.

But in the old days, that wasn’t so uncommon. You have heard about the caning of Charles Sumner. Southern slavery supporter Preston Brooks beat the piss out of Senator Charles Sumner, an anti-slavery guy from Massachusetts. They weren’t even in the same chamber. Brooks was in the House, Sumner was in the Senate. Sumner almost didn’t survive the ruthless and violent beating, which came after a long period of bullying and ridicule by a bunch of southern bullies. Witnesses describe a scene in which Brooks was clearly trying to murder Sumner, and seems to have failed only because the cane he was using broke into too many pieces, depriving the assailant of the necessary leverage. Parts of that cane, by the way, were used to make pendants worn by Brooks’s allies to celebrate this attempted murder of a Yankee anti-slavery member of Congress.

Here’s the thing. You’ve probably heard that story, or some version of it, because it was a major example of violence in the US Congress. But in truth, there were many other acts of verbal and physical violence carried out among our elected representatives, often in the chambers, during the decades leading up to the civil war. Even a cursory examination of this series of events reveals how fisticuffs, sometimes quite serious, can be a prelude to a bloody fight in which perhaps as many as a million people all told were killed. Indeed, the number of violent events, almost always southerner against northerner, may have been large enough to never allow the two sides, conservative, southern, right wing on one hand vs. progressive, liberal not as southern, on the other, to equalize in their total level of violence against each other. Perhaps there are good people on both sides, but the preponderance of thugs reside on one side only.

Which brings us to this. You have heard of the caning of Sumner, but you probably have not read The Field of Blood: Violence in Congress and the Road to Civil War by Yale historian Joanne B. Freeman.

Professor Freeman is one of the hosts of a podcast I consider to be in my top free favorite, Backstory, produced by Virginia Humanities. Joanne is one of the “American History Guys,” along with Ed Ayers (19th century), Brian Balogh (20th Century), Nathan Connolly (Immigration history, Urban history) and emeritus host Peter Onuf (18th century). Freeman writes in her newest book of the first half of the 19th century, but her primary area of interest heretofore is the 18th century, and her prior works have focused, among other things, on Alexander Hamilton: Affairs of Honor: National Politics in the New Republic about the nastiness among the founding fathers, and two major collections focused on A.H., The Essential Hamilton: Letters & Other Writings: A Library of America Special Publication and Alexander Hamilton: Writings .

I strongly urge you to have a look at Freeman’s book, in which she brings to light a vast amount of information about utter asshatitude among our elected representatives, based on previously unexplored documents. I also strongly urge you to listen to the podcast. The most recent edition as of this writing is on video games and American History. The previous issue covers the hosts’ book picks for the year.

We Don’t Need No Stinking Astronauts: The History of Unmanned Space Exploration

Not that astronauts necessarily stink. Well, actually, they probably do after a while, but I suppose one gets used to it.

Anyway, we are all faced, or at least those of us who live in countries that have rocket ships all face, the question of personed vs. un-personed space flight as a way of doing science abroad and related quests. I’m not sure myself what I think about it, but considering the huge cost and difficulty, and the physical limitations, of using humans to run instruments on other planets or in space, and the sheer impossibility of human space missions really far away, the best approach is probably to use a lot of robots.

Great new kids’ science book

Don’t Mess With Me: The Strange Lives of Venomous Sea Creatures by Paul Erickson is part of a series that is currently small but hopefully growing by Tilbury House. I previously reviewed One Iguana Two Iguanas (about iguanas).

Like the Iguana book, Erickson’s book for third through seventh graders (8-12 or so years of age) contains real, actual, science, evolutionary theory, and facts about nature, along with great pictures. The key message is that toxins exist because they provide an evolutionary advantage to those organisms that use them. Why are venomous animals so common in watery environments? Read the book to find out.

Species mentioned include blue-ringed octopi, stony corals, sea jellies, stonefish, lionfish, poison-fanged blennies, stingrays, cone snails, blind remipedes, and fire urchins.

Highly recommended as a STEM present this holiday season.

One Iguana Two Iguanas: Children’s evolutionary biology book, with lizards!

The land and marine iguanas of the Galapagos Islands are famous. Well, the marine iguanas are famous, and the land iguanas, representing the ancestral state for that clade of two species, deserve a lot of credit as well. The story of these iguanas is integral with, and parallel to, the story of the Galapagos Islands, and of course, that story is key in our understanding of and pedagogy of evolutionary biology, and Darwin’s history.

Millipedes as long as a car, scorpions as big as a dog. A large dog.

There are connections between the Carboniferous and our modern problem with Carbon. Some of the connections are conceptual, or object lessons, about the drastic nature of large scale climate change. Some are lessons about the carbon cycle at the largest possible scale: first you turn a double digit percentage of all life related matter into coal, then you wait a few hundred million years, then you burn all the coal and see what happens! There are also great mysteries that you all know about because every Western person and a lot of non Western people have, at one time or another, stood in front of a museum exhibit declaring, “The very spot you stand was the site of an ancient sea bla bla bla” and somewhere in that exhibit, or near it, is a life size diorama with scorpions and millipedes the size of a dog.

A Beginner’s Guide to Circuits

Some time ago I reviewed Electronics for Kids: Play with Simple Circuits and Experiment with Electricity! by Øyvind Nydal Dahl, which is a very good introduction to electricity and how to have fun with it. There is now a new book that is a somewhat simplified version by the same author, A Beginner’s Guide to Circuits: Nine Simple Projects with Lights, Sounds, and More!.

This new book is smaller, has fewer projects, requires the purchase of fewer components, is an accordingly less expensive book, and perhaps most important for some people, requires no solder!

Build Miniature Cities with LEGO

LEGO Micro Cities: Build Your Own Mini Metropolis! is a LEGO building idea book that provides a macro number of examples of building buildings or other structures using a very small number of bricks. It is like the N-gauge of LEGO. This is sort of the opposite of the LEGO idea book I recently reviewed, The LEGO Architecture Idea Book, because the latter is for large scale, and the former for very tiny scale.

Author Jeff Friesen is a famous LEGO builder, and a photographer, who tweets at @jeff_works.

You get an idea of how to build skyscrapers, bridges, public transit elements, and tightly packed downtown zones. There are suggestions for how to build the geology that underlies the buildings and other infrastructure. And the subways.