LinkedIn Data Breach... Er Data Scrape

Ray Alner
Jul 14, 2021

The LinkedIn Data Saga

The saga continues! LinkedIn's original "data scrape" from earlier this year has grown, with even more data for sale on the dark web than previously thought. We are now up to 700 million records being shared online, with people other than LinkedIn profiting from our data. The questions that aren't being answered are: what data was released, who has this data, what is protecting us against these sorts of leaks, and what is LinkedIn doing (or what can it do) to protect us from similar data leaks in the future?

What data was released?

Well, according to Microsoft, the owner of LinkedIn, a wide range of data that it claims is "public" could have been leaked. The list of data available for "scraping" is here. What's interesting is that this list includes data that anywhere else would be considered Personally Identifiable Information (PII), and whose exposure by any other method would be considered a data breach. There is no easy way to determine if you were part of the leak, because most sites only report on data breaches, not data leaks. One site, cybernews.com, has a "leak checker" here, although I'm not sure how effective it is.

Who got the data?

So far, it's listed on the dark web, supposedly for $5,000, based on one article. It is hard to track down because of how fluid the dark web is, and frankly I'm not about to go looking just yet. The data could be sold to bad actors willing to pay top dollar for quick information on low-hanging targets for a phishing attack, who could then use the links showing who is connected to whom for a targeted name-dropping campaign or email fraud.

What is protecting us against this type of data leak?

Usually, most private data is protected by various laws depending on what state or country you are located in. CCPA and HIPAA in the United States and GDPR in the European Union are some of these laws, and they are backed up by various others. But guess what: there is a loophole in the Computer Fraud and Abuse Act (CFAA), written in 1986, when computers were a twinkle in the eye of most people. It has, of course, been updated since then, most recently in 2008, but it hasn't changed since. In that time, social media companies, data APIs, and data privacy in general have evolved far beyond what was originally envisioned.

The loophole is this: because the data is available on the public internet, and was provided by a user with the intent that it be searchable, companies who find the data can do anything they want with it, including scraping it and selling it to whoever they want. There has been case after case of companies asking the government to update the law and restrict the ability to scrape this data, with little recourse and little success, as there are successful business models built on this kind of data scraping.

What is LinkedIn doing about the data leak?

This question is a mixed bag at the moment, with critics thinking that LinkedIn could do more. LinkedIn's dilemma is that it wants to be a service that shows people's skills to the wider world and provides tools that help industries connect with people who want to be found.

LinkedIn has, in the past, taken companies to court over their data scraping practices, but eventually lost. At this point, with the data it holds and shares, the only thing it can do is block bad actors from using its API, as long as the bad actors don't create another company or account to continue their scraping efforts. Its response does seem rather curt, and it doesn't seem to be trying hard to change the way it handles users' data, to the point where it sounds like it is trying to make it clear this isn't a data breach just to protect its backside.

What could be done about data leaks like this?

I think companies could do quite a few things to help with data privacy.

Things like:

Making certain PII data (like birthdays) private by default
Making the APIs available only to vetted companies that can prove they have a good data management score, or only allowing de-identified data on the API
Locking down what data is available through an API
Giving users the option to choose whether they want to be part of an API, or to enable API management on their account
Giving users a better and more standardized view of what data is available/visible to other users
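
To make a couple of these ideas concrete, here is a minimal sketch of what "private by default" plus a per-user API opt-in could look like on the server side. Everything in it is an assumption made up for illustration: the field names, the Profile class, and the public_api_view function are hypothetical and are not LinkedIn's actual schema or API.

```python
from dataclasses import dataclass, field

# Hypothetical list of sensitive fields that stay hidden unless the user opts in.
PRIVATE_BY_DEFAULT = frozenset({"birthday", "email", "phone"})


@dataclass
class Profile:
    # Raw profile data as the user entered it.
    data: dict
    # Sensitive fields the user has explicitly chosen to expose.
    opted_in_fields: set = field(default_factory=set)
    # Whether the user agreed to appear in API results at all (off by default).
    api_enabled: bool = False


def public_api_view(profile):
    """Return only the fields an API consumer may see, or None if the user opted out."""
    if not profile.api_enabled:
        return None
    return {
        key: value
        for key, value in profile.data.items()
        if key not in PRIVATE_BY_DEFAULT or key in profile.opted_in_fields
    }


# Example: the user allows API access but never opted in to sharing a birthday.
example = Profile(
    data={"name": "Pat Example", "headline": "Engineer", "birthday": "1990-01-01"},
    api_enabled=True,
)
print(public_api_view(example))
```

The point of the design is that the safe behavior requires no action from the user: a sensitive field only leaves the platform when someone has deliberately turned it on.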

Finally...

Companies that all but require us to use their platforms to be noticed, heard, or hired for a job need to act like they care about the data we put up. I get it. We are voluntarily putting this data up there to be seen, but we put it up there to be seen by individuals. We get it: data breaches and data scraping are part of our world now. But the way you REACT, whether good, bad, or neutral, will make us decide whether to continue to use your platform or encourage other services to take your place.