Fun with Linux: The Existential File System


The Linux file system is very different from that found in other systems. You use Linux files all the time, because most of the time that your browser “gets” a web page, it is accessing a Linux file system. Here, I just want to point out a few cool things about the system. In some ways, the system is annoyingly complex, but for good reasons. Some of the key differences between, say, the Linux system and the Windows/DOS system are the very reasons that it is fairly easy to set a virus or other damaging bit of code loose in a Windows computer but difficult in a Linux computer.

In Unix/Linux, all filenames are pointers, or links, to files. A file consists of a bunch of data on a disk, but the file does not exist unless it is named. “I have a name therefore I am” is what the file says. If you delete its only name, the file ceases to exist, even though the data is still there on the disk.

That is not too different from other file systems, but there is an interesting twist. There is no theoretical limit on the number of filenames that may be linked to a given file. (In practice each file system sets a maximum, but it is usually very large; ext4, for example, allows 65,000 names per file.)

So let’s say I have a file on a Linux system named foo.txt. I can then give it another name, perhaps bar.txt. Now there is one file with two names. If I delete foo.txt, the file remains untouched; it has simply reverted to a file with one name (but not the original name).

Other systems have “links” or “shortcuts” but they are not real. The Linux links are all of identical status. There is no practical way to tell the difference between two different filenames that are pointing to the same file. This allows Linux systems to do all sorts of interesting things, as you can imagine.

For example: On typical Unix systems and many Linux systems, there are many different users. Each user has a separate “home” space where files are kept, and these spaces are strictly separated. One way to keep this separation true, but to co-own data, is to share a file by giving it two names, one in each of two users’ spaces. Under these conditions, each user independently controls whether the file exists or does not exist, from the perspective of that user’s home space. It’s all very postmodern.

However, this cool feature only works within a given file system. That makes sense because the existential reality of a file exists in a table kept on one file system that connects the name to the actual file. This is at the heart of the Linux system of security and management of data. It would be very stupid to have this key feature duplicated across a network. From a system administrator’s point of view, you might as well drop your pants and bend over in the Republican Cloakroom in DC.

The file system is restricted physically (as in physical disks) and in ways that I do not understand in relation to a given system. But people using computers collaborate across this boundary all the time. So, Unix gurus invented a thing called the “symlink” (symbolic link). The job of the symlink is to … simulate a link!

Now, if you make foo.txt, then add the name bar.txt as a symlink, then delete foo.txt, then bar.txt is an orphan. It points to nothing. That is like the Windows “shortcut.”

OK, that’s enough. I know you are drooling for more interesting information about Linux files but it will have to wait. Next time: How the way Linux file systems work determines how the world wide web works.
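If you want to watch the foo.txt and bar.txt story play out, here is a minimal Python sketch. It is only an illustration, and it makes a few assumptions that are not in the post: it runs on a Unix-like system, it works inside a throwaway temporary directory, and it uses a third, hypothetical name, sym.txt, for the symlink so that the real second name and the symlink can be compared side by side.

```python
import os
import tempfile


def link_demo():
    """Create foo.txt, give it a second name and a symlink, then delete foo.txt."""
    with tempfile.TemporaryDirectory() as d:
        foo = os.path.join(d, "foo.txt")
        bar = os.path.join(d, "bar.txt")  # second real name (a hard link)
        sym = os.path.join(d, "sym.txt")  # hypothetical name for the symlink

        with open(foo, "w") as f:
            f.write("I have a name therefore I am\n")

        os.link(foo, bar)     # one file, two names of equal status
        os.symlink(foo, sym)  # a small pointer that stores the path to foo.txt

        facts = {
            "same_file": os.path.samefile(foo, bar),
            "names_before": os.stat(foo).st_nlink,
        }

        os.remove(foo)  # delete one name; the file lives on under the other

        with open(bar) as f:
            facts["data_via_bar"] = f.read()
        facts["names_after"] = os.stat(bar).st_nlink
        # The symlink itself is still there, but its target path is gone.
        facts["symlink_still_listed"] = os.path.islink(sym)
        facts["symlink_target_exists"] = os.path.exists(sym)
        return facts


for key, value in link_demo().items():
    print(f"{key}: {value!r}")
```

The name count goes from two to one when foo.txt is deleted, and the data is still readable through bar.txt. The symlink, on the other hand, is left pointing at a path that no longer exists.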



14 thoughts on “Fun with Linux: The Existential File System”

  1. It’s “symlink”, for “symbolic link”. And if you check the link, it’s actually a short text file with the path to the actual file inside. The only thing that distinguishes it from a normal file is that there is one bit in the file permission data that tells the system it’s a link, not just some random text file, and makes it treat it accordingly.

  2. It’s also nifty the way everything is presented like a file, including raw access to devices. This makes it easy to do things like save an image of a disk drive or CDROM to a file, then “mount” the file as if it were an ordinary disk drive.

    That feature alone makes it possible to do all kinds of potentially perverse (in a fun and benign way) tricks…

  3. No kidding, SMC. We were playing around with scripts for capturing and transcoding video the other day. Now the TV-capture card appears under Linux as just a file – /dev/video0. So when we had the idea of transcoding the video on-the-fly without first capturing to a file it was trivial to implement – we just passed /dev/video0 directly to the encoder script in place of the captured filename.

  4. Janne, thanks for the correction on sym vs. sim. I’m making it in the post.

    I usually try to mention when I post on Linux tech stuff that no one should ever consider me an expert. If you follow my instructions very very carefully, you will ruin your afternoon.

  5. Greg -

    As a fairly expert Linux and Windows developer, I think this post is pretty misleading and confusing. There is nothing particularly special about Linux’s handling of hard links (multiply-named files, as you describe them) or soft links (aka symlinks, i.e. “symbolic” links, not “simulated” links). NTFS 5 has both, and other than the earlier Windows filesystems, which only have soft links, most every modern filesystem has both as well. And they are hardly used at all in practice… most users get along fine without *ever* explicitly creating a hard link.

    And none of this really has anything to do with the web, or with security, or with viruses. If anything, the only real difference for the web is that Windows treats file names case insensitive while Linux treats them case sensitive.

    For security, one could easily argue that traditional Linux-style permission bits (ugo/rwx) are pretty pathetic, cumbersome, and limiting, compared to Windows-style ACLs.

    -Kevin

  6. Kevin,

    Thanks for the comments. Nothing I discussed regarding Linux file systems has to do with the web. But the way Linux file systems work does have important historical and pragmatic implications. I think it’s OK for me to mention this without giving the details (if you read carefully you will see what I mean).

    I did not bother to mention that many modern file systems like NTFS have been playing catchup to the Unix file system, thanks for bringing that up.

    NTFS is probably the worst thing that ever happened to file systems ever, anywhere. I’ve never in my life lost data to a hard drive except one using NTFS.

    In any event, thanks for your expert comments. Always appreciated.

  7. For me the only confusing thing would be mentioning “Linux Filesystem”. There really isn’t a Linux filesystem, because you can use nearly any filesystem you want on Linux: ext2, ext3, ReiserFS, JFS, etc… Likewise, on Windows 2000 you could use NTFS or FAT32. Linux the OS knows how to handle symbolic links and hard links, but the filesystem chosen doesn’t really care as the OS tells the filesystem what to write.

    Maybe that was the confusing part to Kevin and sorta for myself? I think I understood your underlying message though, and I hate to sound like a Linux troll. Besides my gripe above, it was a well written post about Linux. I can’t wait to read the next one!

  8. Webs: You are absolutely correct!

    The confusion here … and it is confused in the literature, this time it is not my fault 🙂 (not this time, anyway), is that “file system” can be and often is applied to two different things. One is the way in which files are managed on the hard drive (like ext or NTFS, not what I was talking about) and the other is the way in which the system handles files. What I wrote in my post applies to ext*, NTFS, etc. … there is no difference. Thank you for reading my post carefully and pointing out this issue.

    I could have said something like “the way Linux handles files”.

    The problem with a topic like this is that any one small concise piece will always have many unconnected threads …

    And yes, Troll Kevin told us he was an expert but revealed the truth when he bitched at me about how NTFS does this too, etc. He is utterly confused. Not well trained, I suspect. But well meaning, I’m sure.

  9. The spiffiest use for hard links has to be the Time Machine feature in OS X Jaguar. (Jaguar, right? Lessee, Panther, Tiger,… yeah, that’s the right one. Getting hard to keep all these cats straight.)

    Each snapshot in Time Machine contains your entire disk (or, the part you have chosen to back up). But for files that haven’t changed since the previous backup, it just creates another hard link to the already-copied file. So you end up with a given file being shared between dozens (or hundreds, eventually) of different hard links in different directories.

    Gives the space savings of only copying changed files, while making each backup an actual complete copy of your disk at that point in time. Sweet.

    Yes, I’m sure this has been done before, but perhaps not with such style. Or anyway, marketing.

  10. Leopard is the newest one if that is what you are referring to, Johnny.

    Crap! Now I’m a MAC troll ;)

    But anyways, Time Machine is just another example of taking an idea in technology that already exists and making it easy to understand, easy to use, and practical. As with nearly all Apple features.

  11. Dammit! I told you I can’t keep these cats straight!

    Probably because the new box doesn’t have any cat on it, just a big X and a fakety-fake swirly galaxy. Yeah, Leopard is what I mean.

    Posting from 00:19:e3:d4:53:3c

    Crap! Now I’m a MAC troll.
