The Facebook data dump: privacy lessons for users in Ireland
by Maciej Makowski
For anybody who has even the slightest interest in digital privacy, this Easter long weekend was probably spent reading about (or hands-on researching) the Facebook data dump situation, with over 530 million user records going public.
After the initial announcement of the data dump appearing online, made by an Israeli cyber security researcher @UnderTheBreach on the 3rd of April, the media all around the world began reporting on “the Facebook hack”.
In reality there is no evidence (at least for now) of any hacking incident at Facebook – while the company did not officially comment on the revelations, high level Facebook employees say that the leaked records are “old data”, related to a vulnerability that was identified and fixed in August 2019.
The vulnerability allowed users’ data to be scraped without their knowledge, often bypassing the privacy settings applied to their accounts. This means that although Facebook did not intend to make the profile information public, and although it was hidden from regular users, the data was still accessible to automated crawlers and scrapers.
The leaked records include: phone numbers, Facebook IDs, name and surname on the account, locations, email addresses (only in some cases), dates of birth (also only in some cases), account creation / update dates, employment details, relationship status, spouse name (in some cases) and whatever details people disclosed in their bio information.
Because the revealed content depends on whatever details the users themselves supplied to Facebook, not every breached profile presents a complete data set, and not all profiles are equally rich in information.
Also, not every single Facebook account has been scraped in this way, so it’s not the case of a complete leak with 100% of FB accounts – although it’s pretty bad, when you consider that Mark Zuckerberg’s own phone number was included in the dump.
The size of this data dump is enormous – 106 countries, over 533 million users. Certain countries (for example the African ones) are grouped together in merged data sets, while others (like the US) have multiple files due to the huge volume of records.
Here I want to focus on a very narrow slice of this material – Ireland and the Irish users.
Before I dive into the details, I want to preface the rest of this post with the following:
- The views and opinions voiced here are mine and mine alone. They do not represent the views and opinions of my employer or any organisation or institution that I might be affiliated with.
- The data in question is well out there in the public domain and it can be acquired with minimum effort from several file sharing sites. That said, I am not going to publish any links to those or share links with individual readers. I think the best approach here is to avoid making a bad situation worse for those whose data has been publicised.
- The leaked materials do not contain user passwords, private messages, photographs or any similar content.
- I will not jump on the bandwagon and bash Facebook. This is currently the dominant narrative in this discourse and I think enough has been said already. The facts are plain and obvious regarding what’s happened and I don’t think they necessitate additional commentary.
- The point of this post is to offer some solutions and privacy & security advice – all of this will be detailed in the last paragraphs.
- My own details are not in this data set – I just never had a Facebook account under my own real identity. But I know, hindsight is always 20/20…
So here is what we’ve got:
The uncompressed Ireland text file is 131 MB in size and contains approximately 1.45 million records. The data set contains false positives: a small fraction of those records do not belong to Irish users, but somehow they got included on the list. Also, a small number of phone numbers are generic or invalid.
The first thing one can notice with this data dump is that it’s sorted by user phone numbers, in increasing order. It looks as if whoever was running the crawler was enumerating phone numbers sequentially, possibly trying to get a match on every single phone number to a Facebook profile, with phone number values increasing by 1. Obviously, only those numbers that got a hit were included.
This technique is similar to the automated phone number selection used by phone scammers, who initiate VoIP calls to the enumerated numbers and check which of those numbers are valid.
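The sorted-order observation above is easy to verify on any text file of this kind. The following is a minimal sketch, not the method used for this post: it assumes, hypothetically, that each record is one line of colon-separated fields with the phone number first, and the sample values are made up.

```python
from typing import Iterable, List


def phone_numbers(lines: Iterable[str], sep: str = ":") -> List[int]:
    """Extract the leading phone-number field from each record.

    Assumption (hypothetical layout): the phone number is the first
    separator-delimited field and consists of digits only.
    """
    numbers = []
    for line in lines:
        field = line.split(sep, 1)[0].strip()
        if field.isdigit():  # skip generic or malformed entries
            numbers.append(int(field))
    return numbers


def is_sorted_ascending(numbers: List[int]) -> bool:
    """True if every number is greater than or equal to the one before it."""
    return all(a <= b for a, b in zip(numbers, numbers[1:]))


# Made-up sample records, not taken from the real data.
sample = [
    "353000000001:1001:Example:One",
    "353000000004:1002:Example:Two",
    "353000000009:1003:Example:Three",
]

nums = phone_numbers(sample)
# Gaps between neighbours: values above 1 are numbers that got no hit.
gaps = [b - a for a, b in zip(nums, nums[1:])]

print(is_sorted_ascending(nums))  # True
print(gaps)                       # [3, 5]
```

A file that is sorted with small, irregular gaps is consistent with sequential enumeration where only the matching numbers were kept, although it does not prove it: the file could simply have been sorted after collection.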
The Irish data set does not include only Irish nationals – it includes any person whose account can be tied to an Irish phone number. The user characteristics can be broken down further, starting with the gender specified on the profile:
- male – 655,667 profiles
- female – 647,462 profiles
NOTE: The rest of the profiles (roughly one in ten, given the approximate total of 1.45 million records) did not have a gender specified.
The 5 most popular counties from user profiles in Ireland are:
- Dublin – 213,080 mentions
- Cork – 50,708 mentions
- Galway – 31,486 mentions
- Limerick – 25,740 mentions
- Waterford – 15,634 mentions
These figures are very general and could be narrowed down further by separating the records by the “from” and “living” fields. “Living” appears to be more accurate than where people say they are from, so let’s have a look at the top 10 places where people declare they live, this time counting unique records:
- Dublin – 175,007 records
- Cork – 49,678 records
- Galway – 29,075 records
- Limerick – 24,679 records
- Waterford – 13,815 records
- Wexford – 11,998 records
- Kilkenny – 10,315 records
- Dundalk – 7,405 records
- Kildare – 7,367 records
- Drogheda – 6,825 records
There are a total of 9,155 email addresses present in the data dump, which in the context of 1.45 million records is a minuscule number. Some of these email addresses do not appear to be stored in the email field, creating a discrepancy between the number of times an email address is mentioned and the number of records with a valid email address.
Going by just mentions of a matching email domain, in order of popularity, the top 10 email account domains are:
- @gmail.com – 3,876 mentions
- @hotmail.com – 2,794 mentions
- @yahoo.com – 848 mentions
- @eircom.net – 455 mentions
- @yahoo.ie – 413 mentions
- @live.ie – 336 mentions
- @yahoo.co.uk – 326 mentions
- @hotmail.co.uk – 158 mentions
- @live.com – 94 mentions
- @wp.pl – 83 mentions
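The difference between counting mentions and counting records can be illustrated with a short sketch. It scans whole lines with a deliberately simple regular expression instead of relying on a dedicated email field; the pattern, the record layout and the sample lines are all assumptions made for illustration, not the exact approach used for the figures above.

```python
import re
from collections import Counter

# Simplified email pattern (an assumption, not a full RFC 5322 parser).
# The capture group is the domain part after the "@".
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]+@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)"
)


def count_domains(lines):
    """Count every email-domain mention anywhere in each line."""
    counts = Counter()
    for line in lines:
        for domain in EMAIL_RE.findall(line):
            counts[domain.lower()] += 1
    return counts


# Made-up sample records; the second one mentions an address twice.
sample = [
    "0001:Example:One:alice@example.com",
    "0002:Example:Two:bio says BOB@Example.COM and again bob@example.com",
    "0003:Example:Three:carol@example.org",
    "0004:Example:Four:",
]

counts = count_domains(sample)
mentions = sum(counts.values())
records_with_email = sum(1 for line in sample if EMAIL_RE.search(line))

print(counts.most_common(2))  # [('example.com', 3), ('example.org', 1)]
print(mentions)               # 4
print(records_with_email)     # 3
```

Here four mentions come from only three records, which is the same kind of gap as the one described above between mentions of an address and records that actually hold one.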
All of the records with a specified relationship status are broken down in the following way:
- Married – 144,287 records
- Single – 95,085 records
- In a relationship – 81,371 records
- Engaged – 17,315 records
- It’s complicated – 2,215 records
- Separated – 1,821 records
- Divorced – 1,643 records
- Widowed – 1,521 records
- In an open relationship – 813 records
- In a civil union – 659 records
- In a domestic partnership – 636 records
The top 10 specified occupations are:
- Self-Employed – 23,928 records (plus 2,474 for “self employed”, without a hyphen)
- Hollister Co. – 4,747 records
- Stay-at-home parent – 2,754 records
- Retired – 2,673 records
- McDonald’s – 2,490 records
- HSE – 2,479 records
- Dunnes Stores – 2,359 records
- Department of Education and Skills – 2,053 records
- Tesco – 1,938 records
- being a full time mad bastard – 1,890 records
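Free-text fields such as occupation produce near-duplicate labels, as the “Self-Employed” / “self employed” split shows (merged, those two entries would add up to 26,402 records). A minimal sketch of how such variants could be folded together before counting follows; the normalisation rule is an assumption chosen for illustration and the sample labels are made up.

```python
import re
from collections import Counter


def normalise(label: str) -> str:
    """Lower-case the label and treat hyphens and runs of spaces alike."""
    return re.sub(r"[\s-]+", " ", label.strip().lower())


# Made-up sample labels with typical spelling variants.
labels = ["Self-Employed", "self employed", "Self-employed ", "Retired"]

raw_counts = Counter(labels)
merged_counts = Counter(normalise(label) for label in labels)

print(len(raw_counts))                 # 4
print(merged_counts["self employed"])  # 3
print(merged_counts["retired"])        # 1
```

The trade-off is that aggressive normalisation can also merge labels that are genuinely different, so it is worth reviewing the merged groups by eye.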
Government employees - sensitive positions
I could not resist the temptation to focus on this.
A sensitive position can be defined in various ways, but for the purpose of this research let’s define it as a job where revealing your personal information can have particularly negative results for both your professional and private life.
To me it would be primarily law enforcement, the Army, tax services, customs, state prosecutors and other types of regulatory enforcement or oversight. But also medical professionals, given the current pandemic climate…
So here it goes:
- Óglaigh na hÉireann / Irish Defence Forces – 1,423 records
- Irish Defence Forces – 512 records (separate from the entry above)
- HSE Ireland – 492 records
- Nurse – 468 records
- Garda – 361 records
- Department of Justice – 174 records
- Revenue – 171 records
- Department of Defence – 112 records
- Department of Foreign Affairs – 74 records
- Department of Health – 41 records
- Customs – 20 records
- Medical doctor – 6 records
- Ombudsman – 6 records
- DPP (Director of Public Prosecutions) – 3 records
The Privacy Advice
No matter how hard you might try, you can’t turn a sausage back into a pig…
Many companies like Intelligence X, which make a living from collecting, organising and selling breached data, have already started indexing the data dump.
The one very obvious piece of advice in all of this, which many people right now probably wish they had listened to:
Whatever gets out on the Internet, stays on the Internet. Forever.
There is nothing you can do about the leaked data, but you can mitigate the adverse effects of this incident on your privacy. Here’s how:
- The Facebook account you had, even if set to private, is now public. It is forever linked to your real details and is now part of a potential attack surface for cyber stalkers, scammers, phishers, random criminals, hackers or hacktivists. Consider deleting it and also consider whether you want to make a new one or not.
- The phone number (and the email address, if also leaked) is reverse searchable and can be used to identify your other accounts, even if those were created under a pseudonym. Consider getting rid of this phone number (and email address), even if you have been using it for years.
- If an attacker knows your phone number, they can attempt to take control of it by SIM swap. This is particularly dangerous if you use your phone number for receiving two-factor authentication codes. Use application-based 2FA or a physical USB key.
- From now on, treat anything you put online, whether you do this in private / restricted mode or not, as information that at any given time might be publicly revealed without your consent. This includes unencrypted on-platform private messages, on any social media service – not just Facebook.
- And finally – if you are a government employee, especially a high-ranking one, do not use your state-provided phone number to create a personal Facebook account…
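To show why application-based 2FA is not exposed to SIM swapping in the way SMS codes are, here is a minimal sketch of the time-based one-time password scheme (TOTP, RFC 6238) that authenticator apps commonly implement. The code is derived on the device from a shared secret and the current time, so nothing is sent to the phone number. The function names are mine, the secret is the published RFC test value, and this is an illustration only, not a replacement for a maintained library.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation offset
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Time-based variant (RFC 6238): the counter is the current time step."""
    if at is None:
        at = time.time()
    return hotp(secret, int(at // step), digits)


# Test secret from the RFCs; never reuse it for a real account.
rfc_secret = b"12345678901234567890"

print(totp(rfc_secret, at=59, digits=8))  # 94287082 (RFC 6238 test vector)
print(totp(rfc_secret))                   # changes every 30 seconds
```

Because the secret stays on the device, an attacker who takes over the phone number gains nothing here, although the secret itself still has to be protected and backed up carefully.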
About the Author:
Maciej Makowski - information security specialist with a strong background in criminal investigations and online safety. Spent nearly 13 years working as a police officer and cyber crime detective in An Garda Síochána, Ireland’s National Police and Security Service. Graduate of University College Dublin, also received a professional qualification in data protection from the Law Society of Ireland. Experienced Axiom, EnCase and FTK digital investigator, certified Cellebrite forensic mobile examiner. Author of osintme.com, a blog on open source intelligence and digital privacy.
This article was originally published at: https://www.osintme.com/index.php/2021/04/05/the-facebook-data-dump-privacy-lessons-for-users-in-ireland/