Tag Archives: Computers

How to do science with a computer: workflow tools and OpenSource philosophy

I have two excellent things on my desk, a Linux Journal article by Andy Wills, and a newly published book by Stefano Allesina and Madlen Wilmes.

They are:

Computing Skills for Biologists: A Toolbox by Stefano Allesina and Madlen Wilmes, Princeton University Press.

Open Science, Open Source, and R, by Andy Wills, Linux Journal

Why OpenSource?

OpenSource science means, among other things, using OpenSource software to do the science. For some aspects of software this is not important. It does not matter too much if a science lab uses Microsoft Word or if they use LibreOffice Write.

However, since it does matter if you use LibreOffice Calc as your spreadsheet, as long as you are eschewing proprietary spreadsheets, you might as well use the OpenSource office package LibreOffice or equivalent, and then use the OpenSource presentation software, word processor, and spreadsheet.

OpenSource programs like Calc, R (a stats package), and OpenSource friendly software development tools like Python and the GPL C Compilers, etc. do matter. Why? Because your science involves calculating things, and software is a magic calculating box. You might be doing actual calculations, or production of graphics, or management of data, or whatever. All of the software that does this stuff is on the surface a black box, and just using it does not give you access to what is happening under the hood.

But, if you use OpenSoucre software, you have both direct and indirect access to the actual technologies that are key to your science project. You can see exactly how the numbers are calculated or the graphic created, if you want to. It might not be easy, but at least you don’t have to worry about the first hurdle in looking under the hood that happens with commercial software: they won’t let you do it.

Direct access to the inner workings of the software you use comes in the form of actually getting involved in the software development and maintenance. For most people, this is not something you are going to do in your scientific endeavor, but you could get involved with some help from a friend or colleague. For example, if you are at a University, there is a good chance that somewhere in your university system there is a computer department that has an involvement in OpenSource software development. See what they are up to, find out what they know about the software you are using. Who knows, maybe you can get a special feature included in your favorite graphics package by helping your new found computer friends cop an internal University grant! You might be surprised as to what is out there, as well as what is in there.

In any event, it is explicitly easy to get involved in OpenSource software projects because they are designed that way. Or, usually are and always should be.

The indirect benefit comes from the simple fact that these projects are OpenSource. Let me give you an example form the non scientific world. (it is a made up example, but it could reflect reality and is highly instructive.)

Say there is an operating system or major piece of software competing in a field of other similar products. Say there is a widely used benchmark standard that compares the applications and ranks them. Some of the different products load up faster than others, and use less RAM. That leaves both time (for you) and RAM (for other applications) that you might value a great deal. All else being equal, pick the software that loads faster in less space, right?

Now imagine a group of trollish deviants meeting in a smoky back room of the evile corporation that makes one of these products. They have discovered that if they leave a dozen key features that all the competitors use out of the loading process, so they load later, they can get a better benchmark. Without those standard components running, the software will load fast and be relatively small. It happens to be the case, however, that once all the features are loaded, this particular product is the slowest of them all, and takes up the most RAM. Also, the process of holding back functionality until it is needed is annoying to the user and sometimes causes memory conflicts, causing crashes.

In one version of this scenario, the concept of selling more of the product by using this performance tilting trick is considered a good idea, and someone might even get a promotion for thinking of it. That would be something that could potentially happen in the world of proprietary software.

In a different version of this scenario the idea gets about as far as the water cooler before it is taken down by a heavy tape dispenser to the head and kicked to death. That would be what would certainly happen in the OpenSource world.

So, go OpenSource! And, read the paper from Linux Journal, which by the way has been producing some great articles lately, on this topic.

The Scientists Workflow and Software

You collect and manage data. You write code to process or analyze data. You use statistical tools to turn data into analytically meaningful numbers. You make graphs and charts. You write stuff and integrate the writing with the pretty pictures, and produce a final product.

The first thing you need to understand if you are developing or enhancing the computer side of your scientific endevour is that you need the basic GNU tools and command line access that comes automatically if you use Linux. You can get the same stuff with a few extra steps if you use Windows. The Apple Mac system is in between with the command line tools already built in, but not quite as in your face available.

You may need to have an understanding of Regular Expressions, and how to use them on the command line (using sed or awk, perhaps) and in programming, perhaps in python.

You will likely want to master the R environment because a) it is cool and powerful and b) a lot of your colleagues use R so you will want to have enough under your belt to share code and data now and then. You will likely want to master Python, which is becoming the default scientific programming language. It is probably true that anything you can do in R you can do in Python using the available tools, but it is also true that the most basic statistical stuff you might be doing is easier in R than Python since R is set up for it. The two systems are relatively easy to use and very powerful, so there is no reason to not have both in your toolbox. If you don’t chose the Python route, you may want to supplement R with gnu plotting tools.

You will need some sort of relational database setup in your lab, some kind of OpenSource SQL lanaguge based system.

You will have to decide on your own if you are into LaTex. If you have no idea what I’m talking about, don’t worry, you don’t need to know. If you do know what I’m talking about, you probably have the need to typeset math inside your publications.

Finally, and of utmost importance, you should be willing to spend the upfront effort making your scientific work flow into scripts. Say you have a machine (or a place on the internet or an email stream if you are working collaboratively) where some raw data spits out. These data need some preliminary messing around with to discard what you don’t want, convert numbers to a proper form, etc. etc. Then, this fixed-up data goes through a series of analyses, possibly several parallel streams of analysis, to produce a set of statistical outputs, tables, graphics, or a new highly transformed data set you send on to someone else.

If this is something you do on a regular basis, and it likely is because your lab or field project is set up to get certain data certain ways, then do certain things to it, then ideally you would set up a script, likely in bash but calling gnu tools like sed or awk, or running Python programs or R programs, and making various intermediate files and final products and stuff. You will want to bother with making the first run of these operations take three times longer to set up, so that all the subsequent runs take one one hundredth of the time to carry out, or can be run unattended.

Nothing, of course, is so simple as I just suggested … you will be changing the scripts and Python programs (and LaTeX specs) frequently, perhaps. Or you might have one big giant complex operation that you only need to run once, but you KNOW it is going to screw up somehow … a value that is entered incorrectly or whatever … so the entire thing you need to do once is actually something you have to do 18 times. So make the whole process a script.

Aside form convenience and efficiency, a script does something else that is vitally important. It documents the process, both for you and others. This alone is probably more important than the convenience part of scripting your science, in many cases.

Being small in a world of largeness

Here is a piece of advice you wont get from anyone else. As you develop your computer working environment, the set of software tools and stuff that you use to run R or Python and all that, you will run into opportunities to install some pretty fancy and sophisticated developments systems that have many cool bells and whistles, but that are really designed for team development of large software projects, and continual maintenance over time of versions of that software as it evolves as a distributed project.

Don’t do that unless you need to. Scientific computing often not that complex or team oriented. Sure, you are working with a team, but probably not a team of a dozen people working on the same set of Python programs. Chances are, much of the code you write is going to be tweaked to be what you need it to be then never change. There are no marketing gurus coming along and asking you to make a different menu system to attract millennials. You are not competing with other products in a market of any sort. You will change your software when your machine breaks and you get a new one, and the new one produces output in a more convenient style than the old one. Or whatever.

In other words, if you are running an enterprise level operation, look into systems like Anaconda. If you are a handful of scientists making and controlling your own workflow, stick with the simple scripts and avoid the snake. The setup and maintenance of an enterprise level system for using R and Python is probably more work before you get your first t-test or histogram than it is worth. This is especially true if you are more or less working on your own.

Culture

Another piece of advice. Some software decisions are based on deeply rooted cultural norms or fetishes that make no sense. I’m an emacs user. This is the most annoying, but also, most powerful, of all text editors. Here is an example of what is annoying about emac. In the late 70s, computer keyboards had a “meta” key (it was actually called that) which is now the alt key. Emacs made use of the metakey. No person has seen or used a metakey since about 1979, but emacs refuses to change its documentation to use the word “alt” for this key. Rather, the documentation says somethin like “here, use the meta key, which on some keyboards is the alt key.” That is a cultural fetish.

Using LaTeX might be a fetish as well. Obliviously. It is possible that for some people, using R is a fetish and they should rethink and switch to using Python for what they are doing. The most dangerous fetish, of course, is using proprietary scientific software because you think only if you pay hundreds of dollars a year to use SPSS or BMD for stats, as opposed to zero dollars a year for R, will your numbers be acceptable. In fact, the reverse is true. Only with an OpenSource stats package can you really be sure how the stats or other values are calculated.

And finally…

And my final piece of advice is to get and use this book: Computing Skills for Biologists: A Toolbox by Allesina and Wilmes.

This book focuses on Python and not R, and covers Latex which, frankly, will not be useful for many. This also means that the regular expression work in the book is not as useful for all applications, as might be the case with a volume like Mastering Regular Expressions. But overall, this volume does a great job of mapping out the landscape of scripting-oriented scientific computing, using excellent examples from biology.

Mastering Regular Expressions can and should be used as a textbook for an advanced high school level course to prep young and upcoming investigators for when they go off and apprentice in labs at the start of their career. It can be used as a textbook in a short seminar in any advanced program to get everyone in a lab on the same page. I suppose it would be treat if Princeton came out with a version for math and physical sciences, or geosciences, but really, this volume can be generalized beyond biology.

Stefano Allesina is a professor in the Department of Ecology and Evolution at the University of Chicago and a deputy editor of PLoS Computational Biology. Madlen Wilmes is a data scientist and web developer.

Manga Guide to Microprocessors: Excellent tech graphic novel

It has been a long time since I’ve written any machine or assembler code, and it is a rare day that I hand construct a logic circuit using transistors. But it is comforting to know that these skills and the knowledge associated with them still reside in some form or another in the world of microprocessors.

The Manga Guides published by No Starch Press and written by a wide range of authors manga-based graphic novels on diverse topics in science, math, statistics, and technology. I’ve reviewed several here (see this post for a partial list of some of the other guides). And the newest entry to this growing and rather large and excellent library is The Manga Guide to Microprocessors by Michio Shibuya, Takashi Tonagi, and Office Sawa.

This book is really thorough, packing in piles of details about computers, focusing on the microprocessor level technology but covering a lot of related things as well such as memory and data storage and programming, with a whole section on controllers.

But this information is embedded in a story, as is the case with all the Manga guides.

This is the story of Ayumi, a master chess player who is beaten by a computer. She engages with the computer’s programmer, Kano, in a quest to learn all she can about her nemesis.

The book has three modes. One is a standard manga graphics novel sequence of frames with the main story. That is most of the book. The other is a more detailed conversation between iconic versions of the protagonists, in which detail that would be difficult to easily convey in pure cartoon form is gone over. The third is a retrospective or detailed section at the end of each chapter which is lightly illustrated, text heavy, and serves to contextualize the previous material.

I strongly recommend this book.

Here is what the various modes look like:

Most of the book looks like this.

Some of the book looks like this.

Books On Computer Programming and Computers

Python

Learning Python
Python Crash Course: A Hands-On, Project-Based Introduction to Programming is a fast-paced, thorough introduction to programming with Python that will have you writing programs, solving problems, and making things that work in no time.

In the first half of the book, you’ll learn about basic programming concepts, such as lists, dictionaries, classes, and loops, and practice writing clean and readable code with exercises for each topic. You’ll also learn how to make your programs interactive and how to test your code safely before adding it to a project. In the second half of the book, you’ll put your new knowledge into practice with three substantial projects: a Space Invaders-inspired arcade game, data visualizations with Python’s super-handy libraries, and a simple web app you can deploy online.

My review: How to learn Python programming

MORE COMING SOON

Learn Scratch Programming (For Kids And Adults)

Scratch, the colorful drag-and-drop programming language, is used by millions of first-time learners, and in Scratch Programming Playground, you’ll learn to program by making cool games. Get ready to destroy asteroids, shoot hoops, and slice and dice fruit!

Each game includes easy-to-follow instructions, review questions, and creative coding challenges to make the game your own. Want to add more levels or a cheat code? No problem, just write some code.

Coding projects in Scratch and other items.

Learn Python Using Minecraft

Write Computer Games In Python

Invent Your Own Computer Games with Python will teach you how to make computer games using the popular Python programming language–even if you’ve never programmed before!

Begin by building classic games like Hangman, Guess the Number, and Tic-Tac-Toe, and then work your way up to more advanced games, like a text-based treasure hunting game and an animated collision-dodging game with sound effects. Along the way, you’ll learn key programming and math concepts that will help you take your game programming to the next level.

Scratch Programming For Kids, By The Cards

Want to introduce kids to coding in a fun and creative way?

With the Scratch Coding Cards, kids learn to code as they create interactive games, stories, music, and animations. The short-and-simple activities provide an inviting entry point into Scratch, the graphical programming language used by millions of kids around the world.

Kids can use this colorful 75-card deck to create a variety of interactive programming projects. They’ll create their own version of Pong, Write an Interactive Story, Create a Virtual Pet, Play Hide and Seek, and more!

Each card features step-by-step instructions for beginners to start coding with Scratch. The front of the card shows an activity kids can do with Scratch–like animating a character or keeping score in a game. The back shows how to put together code blocks to make the projects come to life! Along the way, kids learn key coding concepts, such as sequencing, conditionals, and variables.

This collection of coding activity cards is perfect for sharing among small groups in homes and schools.

Adding a third party keyboard in iOS 8

One of the great things about Apple is that they maintain tight control over the hardware/system/software triad that bad design can’t creep into your digital life and ruin your day. One of the bad things about Apple is that they maintain tight control over the hardware/system/software triad so you can’t always have what you want. If you’ve been wanting one of those great keyboard replacements that, maybe, your spouse has been using all along on her android, you can have it now on your iPad or iPhone after you upgrade to iOS 8

Go buy Swiftkey or Swype or whatever from the App store. Then go to settings, general, keyboard. There, you can select a new keyboard. You may have an option to “allow full access” … do that.

There you go.