It’s the 21st Century. Do you know where your files are?

I would wager that you don’t know where many of your most important files are. If you are into music, and use iTunes, you can’t find a particular song file using your file manager. You would need to locate it using iTunes. iTunes would then give you limited access to that file. It does not let you do the same thing your file manager would let you do. Many of your most important pieces of information are in emails or attached to emails. Where exactly are those things? Can you access them with your file manager with little effort, print, copy, delete, duplicate, or otherwise work with these files? Probably not.

If you use certain operating systems, and access the hard drives on which those systems are installed from the “outside” (as a non-‘user’ but with total access to the contents of the drive) you may not be able to find any of your files very easily, though once you did find them, you’d find a whole bunch of them in one place. However, if you are reading this from your desktop computer at work, there’s a pretty good chance that you are using a thin client. The stuff you think you see as being on “your computer” via, for example, the “My Documents” link on a Windows computer, is not there in the room with you. You don’t really know where those files are. In fact, you may not even be allowed to know where those files are!

In the old days, this was somewhat different. My data were on punch cards backed up on magnetic tape. I didn’t know exactly where the tape was, but I could look through the big glass window and see the tape library, and if I requested a set of data files, someone would mount the appropriate tape reel onto a big tape drive. When writing ‘software’ (including SPSS instruction sets, etc.) I could throw data and software into “scratch” which was a space on a hard drive (not the same as today’s hard drives) or a drum (a drum-shaped hard drive about the size of a Smart Car). The “scratch” space would be deleted on a regular basis, so I would put stuff there specifically knowing that I would not have to clean it up later. It would just go away.

Instruction sets/software and data could be stored on cards, magnetic tape, unruly disks or drums, or even paper punch tape (the mid 20th century version of the USB stick). But no matter what, not only did I (and everyone else) pretty much know where the data were, we also made specific decisions as to where to put the data to accommodate the not very automated process of accessing it or backing it up.

When I got my first real desktop PC (not counting my TRS-80) I ordered it (from my brother, who had access to such things) with two 20 meg hard drives, in an explicit effort to control the “where” part of this process. I configured the computer like this:

Drive C:
Main DOS system and software files
Scratch space (temporary folders, files to delete at next opportunity)

Drive D:
Files to use and now and then back up

This way, I could back up an entire drive and thus back up my data without all those pesky program files. The system and software, in turn, were “backed up” by virtue of my having floppy disks to reinstall everything. In those days, re-installation was not a lot more time consuming than restoring from backup, and allowed for re-configuring software and getting rid of crap software that one regretted installing to begin with.

You may or may not think that this approach was wise, but that is not my point. My point is that I thought about how to manage my digital life with explicit reference to where my stuff was, in particular, my data files.

Most people do this to some extent, but as shown in the examples above, not really. A lot of people probably think that the CDs they copied into their computer are “in iTunes” as though iTunes were a thing that stuff could be “in.” Which it is not. Those music files are regular computer files, no different at the system level than your c.v. stored in a file called “resume_version22_1982.doc” or whatever. But they, the songs, are located collectively in a set of folders that are in turn organized in other folders, in an arrangement that is impossible for a normal human to figure out, using folders and files with non-human-meaningful names. Same with your email, most likely.

So no, thinking “My music is in iTunes and my email is in Thunderbird” does not allow you to find, copy, delete, read, edit, or otherwise access any of your songs or emails other than by using said software, and then, you can only do with those files what that software allows you to do. Which might or might not include exporting it for use in another format, with exporting capabilities determined by things other than what is technologically possible. Which, in turn, is like having a car that occasionally refuses to turn left for marketing reasons.

The degree to which this disconnect between you and your files is true depends on the operating system you use, and the software you use. Linux/Unix is all about directory systems, and is the most straightforward. In theory, every single file that is yours is in one directory called “home” or a subdirectory thereof. This is in contrast to Windows, in which your files are … somewhere else, and the system seems to be designed to make it as difficult as possible for you to find them.

This does not mean that you can easily find everything that you might consider a “file” or similar entity in Linux. There is a good chance that your email software uses some bizarro file that you can’t easily see inside of. (I use alpine, which puts the emails inside a text file, but hardly anybody does that.) There are “hidden” files in Linux just like in Windows (in Linux, everything that starts with a “dot” (“.”) is automatically “hidden” … meaning you can’t see it unless you “unhide” that which is hidden). There are other strangeosities as well.
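To make the dot-file convention concrete, here is a minimal Python sketch. The function name `split_hidden` is invented for this illustration. It only looks at the leading dot in each name, which is all that “hidden” means on Linux. Windows marks hidden files with a file attribute instead, and this sketch does not look at that.

```python
from pathlib import Path


def split_hidden(directory):
    """Return (visible, hidden) lists of entry names directly inside `directory`.

    On Linux a name is "hidden" purely by convention: it starts with a dot.
    Nothing else about the file or folder is different.
    """
    visible, hidden = [], []
    for entry in sorted(Path(directory).iterdir(), key=lambda p: p.name):
        if entry.name.startswith("."):
            hidden.append(entry.name)
        else:
            visible.append(entry.name)
    return visible, hidden
```

Calling `split_hidden(Path.home())` on a typical Linux account should show most application settings turning up in the hidden list.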

This where-to-put-stuff issue came up when I had a conversation with my brother the other day about backing up. We probably have different backup philosophies. My brother is an actual computer expert who runs other people’s computers and so on, so to him backing up means doing what you need to do so that you can restore the state of the system … software, settings, and data … accurately and quickly when something breaks. To me, backing up is securing the data files (including pictures, documents, whatever) and I do not really care about the software. Of course, there is an overlap. The programs and scripts I’ve written are data, not software, by this thinking.

Also, I have at least two kinds of data: stuff that does not change and stuff that could change. The stuff that does not change includes photographs downloaded off a camera. Yes, I can modify those, but I’d generally leave the original intact. Also, all the documents, such as presentations and handouts, for a given class I’ve taught are archived, for various reasons. The original is left untouched and a subsequent year’s material is modified from the originals (in theory).

So, there are two kinds of data: Archive and dynamic. In backing up, I can add to the archive without having to check if anything previously backed up has changed, because it hasn’t. But to do this, I have to have a space where I put archive material… a separate directory or device where such things go. That might seem dumb, like extra work, like I should have the computer just back everything up or otherwise automate the system. But consider this: Not counting .iso files (which are a whole ‘nuther issue) I have about 194 gigabytes of “archive” material and 10 gigabytes of “dynamic” material. Is it really smart for me to run a ca 200 gigabyte backup every day, or even every month? No, of course not. Better to have the dynamic stuff backed up all the time, and to maintain the archive in a much slower, more ponderous, but still effective manner. If I’m using an online backup, I don’t need to verify that 200 gigabytes of data is the same as it was yesterday, or last week, or whatever if I know it is. Also, any change in archive is not routine … it is something broken. Once something is in the archive, the copying is always one way, out of the archive from a given source. My archive is worot (write once, read only thereafter).
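The write-once rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of my actual setup: the function name `add_to_archive` and the folder layout are invented for the example. New files are copied in, files already in the archive are never overwritten, and any archived file whose contents differ from the source is reported as a problem.

```python
import filecmp
import shutil
from pathlib import Path


def add_to_archive(source, archive):
    """Copy files from `source` into `archive` without ever overwriting.

    Returns (copied, conflicts):
      copied    - relative paths that were new and have been added
      conflicts - relative paths already archived whose contents differ,
                  which under a write-once rule means something is broken
    """
    source, archive = Path(source), Path(archive)
    copied, conflicts = [], []
    for src in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src.relative_to(source)
        dest = archive / rel
        if not dest.exists():
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)  # copy2 also preserves timestamps
            copied.append(rel.as_posix())
        elif not filecmp.cmp(src, dest, shallow=False):
            # Byte-for-byte comparison failed: report it, never overwrite.
            conflicts.append(rel.as_posix())
    return copied, conflicts
```

Because nothing already archived is ever rewritten, a run after a day of shooting only touches the new photographs, and an empty conflict list is the normal result.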

So this all leads to two pieces of advice: 1) Make choices that allow you maximum direct access to your files (like using Linux instead of Windows, or appropriate choices of application software) and 2) Divide your files and stuff into categories that have to do with the nature of the backup and your access to it, and then by topic. For me, my desktop is my scratch space, Dropbox is my dynamic file space, and I’m not going to tell you where I keep my archive. This means that I have photographs in two places: The archives (downloaded off a camera, left alone) and the various cropped or otherwise revised, and in some cases simply selected and resampled or compressed photos. I have course material in two places: Archived away but accessible, and dynamically changing until I’m done with it (then it goes in the archive).

How do you store your files? How do you do backups?


ADDED: It has been pointed out (below, in comments) that iTunes now organizes your files in a sensible human readable way. Good for iTunes. I was working with older information.

The same commenter suggests that email must also be organized that way. Don’t believe it. Especially if you use Outlook!!

In any case, it is still true that your files are where the secondary software puts them, not where you want them, necessarily. Which is not an entirely bad thing.


31 thoughts on “It’s the 21st Century. Do you know where your files are?”

  1. RAID array on a server that backs itself up continuously, plus periodic (not too frequent) disk image(s) …

  2. I miss the days when every element of a program was included in its folder, to be easily removed or moved without having to hunt for stragglers or “trust the uninstaller.”

  3. I know where iTunes stores my music files in Windows Vista. It’s C:\Users\[username]\Music\iTunes\iTunes Media\Music. It makes for long file paths, but at least all of my user files (including application profiles) are stored in the same directory (C:\Users\[username]), which is a bit of an improvement over past versions of Windows. Unfortunately the default file manager (Windows Explorer) leaves a lot to be desired, but there are free programs that are much better.

    As for backups and archiving, I back up the personal files on my computer to two external hard drives. I should probably do this more often than I do, but so far I haven’t lost anything. I use Backupify to maintain a personal archive of my online activities, such as my blog and Twitter account. I should probably use a cloud service to back up my computer files, but so far I haven’t signed up for one.

  4. In the case of iTunes, I know exactly where my music is. If I’m looking for Particular Song, which I imported from a CD I own, the path will be $HOME/Music/iTunes/iTunes Music/$artist/$album/xx Particular Song.m4a where xx is the track sequence number. If I bought it from the iTunes store and it came with Apple’s DRM, the extension will be m4p (the p means “protected”). iTunes can also store mp3 files, though I’m not sure exactly where under the iTunes Music directory they end up if they’re imported sans metadata (I don’t have any examples on this computer). This is on a Mac running Snow Leopard (10.6). I’ve never used iTunes on a Windows box, so I don’t know where Windows hides those files.

    I can probably find my e-mail files in a similar way. I know to look under $HOME/Library, and it would be another minute or two to find the mailboxes under there.

    MacOS 10.5 and later come with Time Machine, which is a relatively painless way to keep backups of the state of your hard drive (including OS and applications). Previously I had to do backups by hand, as much of the backup software out there was abysmal. For example, one program which came free with an external drive would only back up files with certain extensions, and if you wanted it to back up anything else you had to manually add each of the additional extensions to the list. I found it easier to simply use the Finder’s drag and drop capability, and go away for several hours while it was doing the backup.

  5. i use a small python script that runs under cron for both my personal use and for backups at our congregation. it takes advantage of a truly wondrous option to the “rsync” command (the swiss army knife of file duplication). i forget which option it is, but you give it a directory to compare to, and it will either copy the file or create a link to the old one. this lets you easily build backup scripts that do two things: create a historical archive arranged by date, and avoid any unnecessary copies. drives from the machines to be backed up are automounted over smb.

    the result is a directory structure where, for example, if i wanted to see what my resume looked like last june 4, i would just look at /backups/20090604/home/garry/docs/resume.rtf.
    and if the resume had not changed since then, each of the other copies of that file would just be linked to that one, consuming no additional space.

    i also have things set up so i can access these from the localhost web server, to make it easy for folks to find their stuff if they need it.

    it’s a simple arrangement, but it does what i need. and best of all, i know exactly where my files are.

    🙂
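The option described in the comment above sounds like rsync’s `--link-dest`, which hard-links unchanged files to a reference directory. The commenter’s script is not shown, so what follows is only a rough sketch of the same hard-link idea in pure Python, not rsync itself. The function name `snapshot` and the date-style labels are assumptions made for this example. It also compares file contents, whereas rsync by default decides on size and modification time.

```python
import filecmp
import os
import shutil
from pathlib import Path


def snapshot(source, backup_root, label, previous=None):
    """Create backup_root/label as a complete-looking copy of `source`.

    Files identical to the same path in the `previous` snapshot are
    hard-linked rather than copied, so they use no additional space.
    Returns (linked, copied) counts.
    """
    source = Path(source)
    target = Path(backup_root) / label
    prev = Path(backup_root) / previous if previous else None
    linked = copied = 0
    for src in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src.relative_to(source)
        dest = target / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        old = prev / rel if prev else None
        if old is not None and old.is_file() and filecmp.cmp(src, old, shallow=False):
            os.link(old, dest)       # unchanged: a second name for the same data
            linked += 1
        else:
            shutil.copy2(src, dest)  # new or changed: make a real copy
            copied += 1
    return linked, copied
```

Run once a day with the previous day’s label as `previous`, this yields one directory per date in which every file appears to be present, while unchanged files share storage with the earlier snapshot. Hard links only work when all snapshots live on the same filesystem.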

  6. “John, are your two external drives sufficiently far apart that if a meteor hits one of them the other one will be safe?”

    Doesn’t that always depend on the size of the meteor(ite)?

  7. I wrote up my thoughts on backup strategies a while back, and they’ve been folded into the FreeBSD Handbook.

    I think this is more interesting than how I actually do my own backups. However, for the record, my method is filesystem (Unix dump) backups, with a full backup every few months and incremental backups a couple of times a week (stored locally, on DVDs, and on a remote system). Plus nightly filesystem snapshots for when I delete something by mistake.

  8. I have over 400,000 files on my hard drive.

    One tenth of them are in my home folder, ie, have anything to do with me, my data, my stuff, etc.

    One tenth of THOSE are dynamic files, in current use and being backed up constantly.

    I’m not sure why I need a daily backup of a half million files when I use 4,000 files. A system snapshot now and then makes sense, but not a daily backup.

  9. I’m going to call that ratio: 1:10:100 Greg’s Rule. For every file you use, you have ten you’ve touched but don’t really use, and 100 you don’t even know what they are because they are mainly system and application software related.

    Everybody OK with that?

  10. Yes I do know where my files are, including my music, my video and my email files – In the case of the email files they are indeed human readable.

    Also my ebook files are all accessible via the file manager, even my Accounting system files are human readable, and I know exactly where they are, and can access them via the file manager.

    So too are my configuration files, both for the applications I use (in dot or hidden directories in my home directory) and the System configuration files (in /etc).

    I know where they are and that they are human readable via the file manager, because

    1) I’ve checked, and
    2) I use Linux. (currently Ubuntu Linux 9.10)

  11. if you can’t find your iTunes music library via the file manager on either windows or OSX, you are an idiot.

  12. The depressing thing about Outlook and its evil .pst/.msg files, is that they are considered a de facto industry standard in the forensic and judicial email retrieval/analysis market.

    PST and MSG files are so utterly bad for archival purposes, that when programmers write to Microsoft asking for help with regard to some unexpected behavior, the official response is that “you should under no circumstances use pst and msg files for archival purposes”. And yet.

  13. Ugh, file management, the last thing I care about. More than happy with the direction things are moving in, applications being in control and files becoming increasingly invisible. For the vast majority of users I think this is a move in the direction of usability – the computer as file manager paradigm needs to die.

  14. Spiv @2, what, the way it still works on Mac OS X? No really, most apps are self-contained items in your “Applications” folder (or wherever else you want to put them) and don’t even have an uninstaller. Some do, and I humbly submit their developers just Don’t Get It. Of course, most apps will put config data or user files in your home directory, which if you feel so inclined you’d have to delete separately.

    E-Mail is really the only major thing where I can’t access things from outside the application, and it’s been that way since I first started using e-mail (Outlook Express on Windows 98, *cringe*), rather annoyingly.

    And sorry to continue on in the Steve-Jobs-fanboy vein, but I do quite like the Time Machine backup system on OS X, because it does only backup the files that have changed. Sure, it backs up the whole system by default which is not strictly necessary – but I can tell it not to. I also do have some “archive data” (mostly audio/video) on external drives. Much of which isn’t adequately backed up, but then it’s not irreplaceable.

  15. The iTunes on Windows must have been an adaptation for a while. On Macs, iTunes has always stored every file in the most findable possible place: $HOME:Music:iTunes:iTunes Library. Even on Classic Macs.

    I use Thunderbird, not Outlook, but I’ll defend mail clients keeping mail in complex structures. Every system has at one time or another, or what do you think an mbox file is? Mail is more like a “spool” than a set of neat files, anyway. It so happens on Macs that the Mail.app actually keeps all attachments in a “dot” (invisible) folder, or did, and I only cared because I was able to put a Folder Action on it 🙂

    By the way, of the 3 systems, I found Mac Folder Actions work out of the box, Windows is harder to set up and harder to script but it works fairly easily, and on Linuxes you have to install polling libraries and it’s a mess anyway, and you’re way better off running a hand-built daemon in a VHLL or a cron job.

  16. Marion, that may have been an early adaptation. The files were where you say they were, but within that they were not organized in a human-readable format, in that a human would like to see the names of the songs or artists or something.

    I have no idea what you are talking about with the “folder action” thing.

  17. Greg, I got curious so I looked, and found my email messages under User/library/mail/mailboxes/(particular mailbox)/messages. They’re listed by date and when you click on one, it opens up in Mail, so I don’t know if that is user readable or not. They are .emlx files. G5 iMac running Leopard.

    I don’t actually know anything about computers, files, etc but I can poke around and find things on the mac.

  18. Graycat, you did not see any of your mail messages. You saw a file that you were not able to read.

    But actually, that file type is text, a readable file, and that’s a good way to store email.

  19. I know exactly where all of my media files are, because they are all either backed up on a external drive or on DVD – some are on both. And I know where all of my document files are – both those written by me and papers I have used for source material or just want to read (though the latter is a huge fucking mess that makes me extremely grateful for search functions). Oi, and I know where all of my pictures are too.

    All of my files are backed up on an external drive and every once in a while I take new original documents and pictures and burn them to DVD. And all my pictures also get uploaded to my picasa account, while all my documents are backed up in google documents and Scribd.

    I do not – I repeat do not use bloody iTunes either. Since I got my Cowon media player, I use Jet media player on my computer, because it has fucking incredible sound (as does the player itself) and was free with my mp3 player (it does play video, but I am not one for watching shit on a 3″x4″ screen). But unlike iTunes, I can just pick the file(s) I want to play and click them – they automatically open in Jet media player. Or I take the files I want on my player and drop them onto my player’s memory.

    I manually handle all of my files. I don’t let anything download into a temp file – even software installers. I save all my software installers to an external drive with everything else. Not everything translates perfectly – such as updates, but if I need to transfer everything to another computer, I can do so with minimal fuss and bother. I can completely copy the setup I have on my computer now, in less than three hours. And I save very few passwords on my computer, so I don’t have to worry about that.

    Though it occurs to me that I could totally streamline that whole process if I were to make a DVD of all my software installers, with the settings I like in a notepad file.

  20. Oi – and as far as email goes, I use gmail in browser only. Honestly, I might have started using thunderbird or something, but have no idea how to couple it to gmail and really don’t care. Anything particularly important gets copied to google docs and if I really want to have offline access, I download and save it to an external drive. I would note that I very rarely actually do that. Attachments are always saved, OTH…

  21. For those who are truly attached to seeing their data as files, it is possible to export even webmail to text files, one file in ASCII RFC2822 format per email. For example, getmail running in Maildir mode can do so. Email is intrinsically a very human-readable format (at least RFC email).

    Regarding computer-as-filesystem-manager, it may be that the directories-and-files model is due for a tuneup, but I think the application-centered model is a dead end. The web is built of html and not applets for extremely good reasons, for example. Data is what the users are there for, not your code.
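As an illustration of the one-file-per-email idea in the comment above, here is a minimal sketch that uses Python’s standard `mailbox` module rather than getmail. It assumes the mail already sits in a single mbox-style text file, which is the kind of mail file several commenters mention. The function name `export_messages` and the numbered `.eml` file names are choices made for this example only.

```python
import mailbox
from pathlib import Path


def export_messages(mbox_path, out_dir):
    """Write each message in an mbox file to its own .eml text file.

    Returns the list of files written, in mailbox order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # create=False: fail loudly instead of silently creating an empty mailbox
    box = mailbox.mbox(str(mbox_path), create=False)
    try:
        for index, message in enumerate(box, start=1):
            target = out_dir / f"{index:05d}.eml"
            target.write_bytes(message.as_bytes())  # headers plus body, as text
            written.append(target)
    finally:
        box.close()
    return written
```

Each resulting file can be opened in any text editor, copied, or backed up like any other document, which is the kind of direct access the post argues for.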

  22. In my case, I periodically copy files from my laptop to my desktop, which has a RAID-1 array mounted as my /home directory. One of those drives is in a caddy, and once in a while I yank it out, drive halfway across the city, swap it with another one, drive back, and slam the “new” drive back in.

    It’s not quite meteor-proof, and the offsite backup can go weeks without a swap, but it does the job for me!

  23. Greg:

    On Macs, the easiest way to automate tasks based on emails sent to me turned out to be using the file system, but in other systems that wasn’t the case.

    Built-in polling and triggers are known as Folder Actions in the Mac OSes, as just a subset of WMI events in Windows Script/VBScript/PowerShell, and are optional software on Linuxes.
