These data were scraped by security consultant Ron Bowles, using code that scans Facebook profiles and collects all data not hidden by privacy settings. As some but not all of you know, those settings are probably not set the way you think they are for your account.
You can have the data if you want. Click here to download it.
I’m a little unconcerned about these data being spread around. They were already available to anyone who simply went on Facebook (after joining).
Bowles did this in order to highlight the fact that stuff you put on Facebook is pretty much like stuff you leave around on the seat of your car or on a table near an uncurtained window. Not invisible.
Bowles’s data set is incomplete, as there are about a half billion Facebook users.
Now, I wonder. What language was the script written in? Please tell me it was in bash using wget and sed! That would be so cool!
The story is all over, but here is one source.
A thought occurs to me that I’d like to add. I don’t know what the format of this data set is, but if YOU keep certain things private, like the town you live in, but you list family members as your relatives and that is not private, then it would be pretty easy to guess (and yes, only a guess, but a good guess) where you live, for instance from the towns your relatives list publicly. In other words, your privacy settings may not be as much in your control as you think. All it takes is a well-placed awk script.
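To make that guess concrete, here is a minimal sketch in Python (not awk, and not anything from the actual scraping code). I don’t know the real format of the data set, so the layout here is entirely made up: `relatives` maps a user to the people they list as family, and `public_town` maps a user to a town only if they show it publicly. The function name `guess_town` and all the names and towns are hypothetical. The guess is just a majority vote over the relatives’ public towns.

```python
from collections import Counter


def guess_town(user, relatives, public_town):
    """Guess a user's hidden town by majority vote over relatives' public towns.

    Returns (town, share_of_votes), or None if no listed relative
    shows a town publicly.
    """
    # Collect the towns of listed relatives who expose that field.
    towns = [public_town[r] for r in relatives.get(user, []) if r in public_town]
    if not towns:
        return None
    # The most common town is the guess; the share is a rough confidence.
    town, votes = Counter(towns).most_common(1)[0]
    return town, votes / len(towns)


# Made-up example data (hypothetical layout, not the real data set).
relatives = {"alice": ["bob", "carol", "dave"]}
public_town = {
    "bob": "Springfield",
    "carol": "Springfield",
    "dave": "Shelbyville",
}

# alice hides her town, but two of her three relatives point to one place.
print(guess_town("alice", relatives, public_town))
```

It is only a guess, as I said, and the vote share is a crude measure of how good a guess. The point is that the hidden field gets filled in from data other people chose to make public.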
Privacy issues like this are why I’m very picky about what info about myself I put on social media sites, and why, when I do, I heavily restrict its visibility.
If we don’t take care of our data, how can we expect anyone else to?
I am on Facebook, and I constantly check my privacy settings. I discovered that my phone number had been publicized recently. FB is invaluable in keeping track of family members all over the country, but I resent the steps I have to continually take to keep some semblance of privacy, and it seems an uphill battle. I can’t imagine what a person trying to flee an abusive relationship would have to deal with. I do not publish my city of residence, a network or my birth date. I don’t post anything I would not tell the world, and I have had to contact others to tell them to remove posts on their ‘wall’ that can/would cause them to lose their jobs (HIPAA violations!), and perhaps their licenses.
Not quite sure what the purpose of the data trawl exercise was or why anyone would need to write a script. If it’s all publicly available on Facebook then Google, Bing or Yahoo will find whatever you want.
I like how the BBC News website describes the data as ‘leaked’. How’s that for spin?