The Mitigating Unauthorized Scraping Alliance (MUSA) presents an explainer on unauthorized web scraping and its impact on industry, individuals, and privacy.
The Problem Space
User data has become a valuable commodity that threat actors seek and platforms protect. Threat actors have turned to automated mass collection of user data to create and sell datasets, replicate legitimate webpages, or exploit information for purposes such as stalking or surveillance. To raise awareness of the importance of safeguarding data, it helps to understand the rise of unauthorized scraping and its impact.
Defining Unauthorized Scraping
‘Authorized scraping’ is the automated collection of data with express permission. ‘Unauthorized scraping’ is the automated collection of data in violation of a platform’s Terms of Service. It involves collecting data that a user shares with other users, or data that is accessible because a user has unwittingly shared access to their account. Because the data is reached through access that already exists, unauthorized scraping is not considered a breach of a platform’s security protections. It nevertheless creates the possibility of data misuse. Given this threat, it is important to highlight the implications of unauthorized scraping and to raise awareness of data safeguarding and user protection.
How Scraped Data is Used
Demand for data that informs marketing, business development, and personal targeting has increased significantly over the past decade and has fueled a growing market for user data. At the same time, companies have limited the supply of data by restricting access to it in order to protect against misuse. As a result, there has been an unprecedented rise in the number of unauthorized scraping incidents, with negative implications for both companies and users.
Threat actors engage in unauthorized scraping for personal and financial gain. Some scrape to build datasets and databases of aggregated user information that third-party actors can buy, sell, or post online for profit. Depending on the nature of the scraped data, it may be used to facilitate phishing or spamming attacks, plant spyware, or steal credentials to further exploit individuals. Threat actors can also use data scraped without authorization to create clone sites, which impersonate legitimate webpages.
In addition, they can aggregate scraped data into datasets for sale on data broker websites or for targeted advertising and marketing purposes. Legitimate businesses and researchers are often unaware that the services they rely on use data scraped without authorization. Threat actors also access user data for its political value, using targeted datasets for purposes such as reconnaissance or surveillance. Hostile nation-states can likewise take advantage of such data for their own gain. It is important to note that not all instances of unauthorized scraping lead to these impacts.
The Impact of Unauthorized Scraping
The impacts of unauthorized scraping are far-reaching. Both unauthorized scraping and the subsequent use of the data decrease public trust and threaten industry reputations. For companies, unauthorized scraping can also lead to system slowdowns, increased costs, and the loss of control over data. For users, it reduces control over their information and can lead to spamming, fraudulent communication, identity targeting, surveillance, and unexpected disclosure of content intended to be temporary.
Combating Unauthorized Scraping
Currently, there are no industry standards for combating unauthorized scraping. A recent study conducted by NewtonX found that nearly 90% of experts surveyed believe unauthorized scraping prevention is either important or very important, but only 42% of respondents have established strategies to address the practice. NewtonX concluded that closing this gap and effectively tackling unauthorized scraping requires a collaborative, multi-stakeholder effort. While there is no single approach to combating unauthorized scraping, companies engage in an array of practices to mitigate it. There is therefore a demonstrable need to foster public-private dialogue and to address the current lack of industry-wide collaboration.
About The Mitigating Unauthorized Scraping Alliance
The Mitigating Unauthorized Scraping Alliance (MUSA) brings together industry members to address these challenges and offer a unified front against unauthorized scraping and data misuse. MUSA is working with member companies and experts to publish industry-aligned practices for unauthorized scraping mitigation, with the goals of making unauthorized scraping more difficult across member platforms, reducing the attack surface available to unauthorized scraping threat actors, and serving as a resource for media and policymaker engagement.
MUSA provides insight, knowledge, and expertise to the public on unauthorized scraping by hosting public education events, such as an International Data Privacy Day Panel Event on January 31, 2023, and by publishing a monthly newsletter highlighting scraping-related news and events.
If you would like to learn more about MUSA and stay informed about unauthorized scraping, visit our website and connect on LinkedIn. If you are interested in joining a diverse group of industries and experts in combating unauthorized scraping and want to get involved with MUSA, contact us or fill out the Membership Inquiry Form.
This article was originally published at techUK.org.