I would like to respond to a post, Ask Science Woman: How do I organize journal articles?, by Science Woman. I think this is a very important topic for all aspiring scholars. Science Woman’s advice is excellent. I have just a few suggestions to add.

I think my problem … the problem of organizing “offprints” (copies) of papers … may be an order of magnitude greater than that of many other scholars because of some opportunities I had while in graduate school. My first advisor, Glynn Isaac, had an incredible collection of African (mainly) archaeological (mainly) materials, and I was at a school that did not see a photocopy machine as a major resource to be protected by uniformed armed guards. So I photocopied his collection. Actually, there were a few of us involved in this … we simply divided up the work and made multiple copies of each paper and redistributed them. We even took a few stacks to the copy center when we had extra cash that we could not spend on beer. This took about a year.

After Glynn died, my new advisor, Irv DeVore, was found to have about the same size collection with virtually no overlap. That one, I pilfered on my own because our photocopy club by then had dispersed and we were all in the field or elsewhere at different times. Simultaneously, I pilfered David Pilbeam’s photocopy collection, but more selectively (by that time, there was a substantial overlap between my own collection and David’s).

Having spent very little time in classwork for my undergraduate degree (four credits’ worth, to be exact), I took every course I could in graduate school … I found taking classes to be great fun … and every one of them required inspection and sometimes even careful reading of between fifty and a couple of hundred papers. The Borg known as my offprint collection absorbed them all. And I’ve continued this apace.
My collection of papers numbers about 10,000 different articles stored in about 16 file cabinet drawers.

While pilfering various collections, and working with various senior colleagues, I was able to see how different systems work, and clearly, organizing papers by author is the only way to go, as Science Woman suggests. The sheer size of my collection, however, called for a more efficient system than giving each author their own folder (with subfolders), as she does. I have a system that is much more efficient in terms of storage … both with respect to space and the work involved in putting articles away … but with a modest cost in efficiency of access.

I have my files divided alphabetically, with a series of folders labeled with the first few letters of each last name. So, in theory, “Brooks, Brookings, Broohaha, and Broomhilda” are all in the file labeled “BROO.” If I wanted to also include “Brown, Browning, and Brozeski” I would instead have a folder named “BRO,” but that would have more papers in it. By adjusting the number of letters, I can keep the number of papers filed per folder manageable. I can then find an article by searching through the small stack of papers with those last names. If I have a zillion papers by one author, I can give that author her own file, as in Brooks, Alison (lots of papers).

One might view this system as less than ideal, and perhaps it is, but with 10,000 papers it works for me.

Every single paper that is in my cabinets has two attributes: 1) it is in my database; and 2) it has either an “x” or a stamp (as in rubber stamp, of my name) on it. This way, when I am using an article and want to return it, I know that if it does not have the stamp or x, I have to enter it into the database before I file it.
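
A minimal sketch of the prefix-folder idea in Python, for illustration only. The function name `assign_folders` and the `max_per_folder` limit are assumptions made for this example, not part of the filing system as described; it takes one surname per paper as a stand-in for the stack in each folder, and it does not handle the special case of giving a prolific author a folder of her own.

```python
from collections import defaultdict


def assign_folders(surnames, max_per_folder=4, prefix_len=1):
    """Return {folder label: sorted surnames}, using the shortest prefix
    that keeps each folder at or under max_per_folder entries."""
    # Group names by their first prefix_len letters (label in upper case).
    groups = defaultdict(list)
    for name in surnames:
        groups[name[:prefix_len].upper()].append(name)

    folders = {}
    for label, names in groups.items():
        # A folder can only be split if some name is longer than the prefix.
        can_split = any(len(n) > prefix_len for n in names)
        if len(names) > max_per_folder and can_split:
            # Too full: add one more letter to the label and regroup.
            folders.update(assign_folders(names, max_per_folder, prefix_len + 1))
        else:
            folders[label] = sorted(names)
    return folders


names = ["Brooks", "Brookings", "Broohaha", "Broomhilda",
         "Brown", "Browning", "Brozeski"]
for label, members in sorted(assign_folders(names, max_per_folder=4).items()):
    print(label, members)
```

With a limit of four, the seven example names land in folders BROO, BROW, and BROZ; raise the limit to ten and they all fit in a single folder labeled B. The trade-off is the one described above: shorter labels mean fewer folders but a bigger stack to flip through in each.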
(More about the database below.)

A few years ago, PDF files became standard, and several thousand papers are now in my collection in electronic form only.

As with Science Woman, I name the PDF files in a systematic way, though I don’t keep sufficiently up on it. My method is to use the last name of each author for up to two or three authors, followed by EtAl if there are more, and a year. I don’t bother with journal name, etc. This is usually sufficient. So, a PDF file that comes to me as:

2323sdfoj.pdf

may become

SmithJones_2007.pdf

Notice that there are no spaces in the filename. Spaces in filenames are evil.

Some of these PDF files are in the database, some are not. Instead of spending time on that, I’ve opted to have a more general index of the PDF files. I’m still playing around with this and making adjustments, but my current method is to use Beagle.

Beagle uses pipes and translators. A translator converts the contents of a PDF file (or pretty much any kind of file that you might want to index) into a text stream, which is then run through an indexer, with the results added to an index. So you can find a PDF by Brooks and McBrearty by entering those two names into Beagle. This does not work for PDF files that are scanned photocopies of papers rather than true PDF text files, because a scan contains only page images and no text for the translator to extract.

On the database: My original database was in dBase (II or III? Maybe IV? Can’t remember). Later, I translated this into Endnote.

Science Woman uses Endnote, as I did until I made the transition to Linux. Actually, I used Endnote on Linux, running Windows Endnote on Crossover, and that worked flawlessly. But I now use Bibus. It was not difficult to import my Endnote file into Bibus. Bibus is an SQL database that imports from Medline, etc., and will put properly formatted references into document files (I use Open Office) and so on. It is just like Endnote, but you can look under the hood, it is faster, and it is free.
Since it is a MySQL database, you can also access the raw data directly and play with it that way if you want.

I plan on changing my database system to use my own SQL system, in the medium future.

The most important message from both Science Woman and me is this: Start now. Get a good system and use it early in your career or you will be swimming in chaos later.
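
As a footnote to the PDF naming convention described earlier (author surnames, EtAl when there are more, then the year, and no spaces), here is a small Python sketch of how such names could be generated. The function name `pdf_name` and the `max_authors` parameter are hypothetical, chosen for this example; the cutoff of three authors is one reading of “up to two or three.”

```python
import re


def pdf_name(surnames, year, max_authors=3):
    """Build a filename such as SmithJones_2007.pdf from surnames and a year."""
    # Strip spaces, apostrophes, and other punctuation from each surname,
    # so the result never contains a space ("De Vore" becomes "DeVore").
    cleaned = [re.sub(r"[\W_]", "", s) for s in surnames]
    stem = "".join(cleaned[:max_authors])
    # Mark any authors beyond the cutoff with EtAl.
    if len(cleaned) > max_authors:
        stem += "EtAl"
    return f"{stem}_{year}.pdf"


print(pdf_name(["Smith", "Jones"], 2007))
print(pdf_name(["Smith", "Jones", "Lee", "Kim"], 2007))
```

The first call gives SmithJones_2007.pdf, matching the example above; the second gives SmithJonesLeeEtAl_2007.pdf. Two papers by the same authors in the same year would collide under this scheme, so a real version would need a tiebreaker of some kind.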

11 thoughts on “How to organize your papers”

  1. OK, seriously, am I the only one that doesn’t keep a library? I have maybe a dozen or so papers that I’m actively reading/referencing at any given time. Anything else I print off, read, then recycle. My “library” is PubMed and Web of Science. If I say to myself, “that paper, by so-and-so, looked at X”, it’s a hell of a lot easier and far less space consuming to just look it up and print off another copy than to keep hardcopies lying around indefinitely.

     Of course, it helps that anything I’m likely to be interested in is online in PDF.

  2. Johnny: You remind me of my friend Tom who, back in graduate school days, said “if it was published before 1978, it is not important.”

     Of the 10K papers I’ve got in my file cabinets, maybe 25 percent could ever be accessed online.

     Also, the database allows me to keep track, compile, think, organize thoughts and not just objects, in a way that recycling would interfere with.

     Finally, it is a bit of an avocational thing with some of us. I also have a substantial library of books even though there are libraries that I could use instead. It’s instead of collecting stamps, maybe.

  3. I wish I had that stuff when I was a pup; I started by using index cards, and quickly learned that cross-indexing, while very useful, was very tedious. I finally bought Papyrus, which was way cheaper than the rest (and they made a point of stating that their license did not prohibit loading the software on as many computers as you wish). My current problem is paper storage. I employ the one-pile-on-the-floor-here, another-pile-on-the-floor-there, etc. method, as all cabinets are crammed full.

  4. “You remind me of my friend Tom who, back in graduate school days, said ‘if it was published before 1978, it is not important’”

     Well, there’s probably a lot of truth to that, if I’m being honest (although, of course, now the cutoff would probably be, say, 1997?).

     OK, I just looked at the paper I published last month, and the earliest cite was 1985, but that was for a side point in the discussion. All the rest are from 2000 on.

     But I suspect you’re right, it’s the collection aspect that is most important for a lot of people.

  5. For most scientists, only papers from the last 10 years or so are usually cited. The recent ones are usually available online, and many of the older journal issues have been scanned as well (at least in molecular biology & bacteriology). So it is possible to rely on the web to “get back” to papers, and to “print and recycle” as Johnny puts it.

     Of course, there are some fields in which older work is often cited (e.g., archeology, taxonomy, mathematics), and then keeping a collection of physical papers is more important. I’m not sure I see the advantage of keeping a PDF collection, though.

     Even though almost everything I need is online, I do maintain a citation database. It’s an important fall-back to have, even though PubMed and Google Scholar are usually more convenient unless I’m actually adding a citation to a paper. My “database” is just a BibTeX file; categorizing papers by topic and such seems like too much effort unless I’m writing a review (in which case I recommend organizing papers by the paper outline) or learning a new field (in which case I recommend writing a review!).

  6. Thanks Greg, I feel inspired to add “organize papers” to my list of resolutions for the year. I finally invested in some filing folders and Endnote in recent weeks, but the program hasn’t been touched yet and the papers are in disarray after several attempts at filing by subject (as it turns out, the worst idea ever).

  7. Johnny: My friend said that in the 80s, so he was going back less than ten years. By those standards, a 1985 paper should be in the rare book room!

     Morgan, I have very mixed feelings about keeping the PDFs. I find it very handy to have the PDFs I’m working with at hand (thus, it’s handy, I suppose), but then why store them?

     I recently sent a student off to the field with a CD of PDFs that she and I were using for a collaborative project. Sometimes the networks are down, and getting to the PDFs via many different sources is annoying, not always reliable, sometimes changes interfaces (more annoyance), etc., so having them on my hard drive is nice.

  8. To echo Anu’s words, thanks for the inspiration, Greg. I was just informed that I had to move out of my Graduate Lounge because more students are enrolled this semester (and I’ve completed my thesis-writing). Which means that I will be bringing home a box of papers (to add to those that have already been brought home earlier). So I definitely must file some 10 to 20 papers in Zotero before I leave to do my Ph.D. this year :D

