
Webinar: Good vs. Bad Scraping: Protecting Critical Research Activities While Combating Unauthorized Scraping

Event Date/Time:

Tuesday, December 5, 2023 | 10:00 – 11:00 a.m. PT / 1:00 – 2:00 p.m. ET

Industry actors, policymakers, regulators, and academics alike share concerns about scraping: the large-scale collection of data available on websites and applications without the authorization of the platform or in violation of Terms of Service. As demand for platform data has grown, companies have taken steps to limit access to user data. In response, unauthorized scraping is on the rise.

For researchers seeking to better understand platform abuses, such as mis- and disinformation and online extremism, web scraping can be an important tool for studying threat actors’ tactics and understanding patterns and trends in platform activity. The Mitigating Unauthorized Scraping Alliance (MUSA), which brings together leading companies and experts in a unified front against unauthorized data scraping, and the Center for Information Technology Research in the Interest of Society and the Banatao Institute (CITRIS) Policy Lab invite you to a joint webinar exploring “good” vs. “bad” web scraping applications. We will examine the safeguards needed to prevent privacy risks; explore key definitional questions to build a shared understanding of “good” vs. “bad” scraping activities; and discuss how researchers, industry, and regulators can collaborate so that researchers can examine key questions for tech and society while websites and platforms maintain essential user protections.

Speaker and Moderator:

David Patariu, Associate, Venable LLP; Privacy Law Specialist (PLS); IAPP Fellow of Information Privacy, CIPP/US, CIPP/E, CIPM

Dr. Brandie Nonnecke, Director, CITRIS Policy Lab; Associate Research Professor at the Goldman School of Public Policy, UC Berkeley

Talking Past Each Other: The Legal and Technical Challenges of Harmful Web Scraping—White Paper Fast Facts

In a new research paper, Professor Timothy H. Edgar, Professor of the Practice of Computer Science, Brown University, and Lecturer on Law, Harvard Law School, examines the legal and technological challenges of harmful web scraping, highlights trends that could exacerbate the problem, and proposes possible legislative solutions, including amending the Computer Fraud and Abuse Act (CFAA) or addressing harmful web scraping through broader privacy legislation in the United States and elsewhere.

MUSA Hosts Webinar on Harmful Web Scraping Research

The Mitigating Unauthorized Scraping Alliance hosted a webinar on June 22, 2023, focused on a new research paper by Timothy Edgar, “Talking Past Each Other: The Legal and Technical Challenges of Harmful Web Scraping.” The webinar featured David Patariu, Attorney at Venable LLP, in conversation with Timothy Edgar, Professor of the Practice of Computer Science at Brown University, Senior Fellow at the Watson Institute for International and Public Affairs, and Lecturer on Law at Harvard Law School.

The discussion set the stage with an overview of web scraping and how it differs from web crawling. Timothy Edgar defined scraping as the automated collection of data from websites according to predefined patterns, a practice more invasive than web crawling, which identifies and indexes the content of web pages. The conversation highlighted the important role of robots.txt files, which tell web crawlers and scrapers which portions of a website they may visit, but emphasized that the norm of scrapers and crawlers honoring these instructions has recently broken down. David Patariu attributed this in part to the rise of generative AI: web scraping supplies large amounts of real-world data for training models, and commercial pressure to stay competitive has contributed to a growing disregard for instructions like robots.txt and Terms of Service.
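To show what a robots.txt instruction looks like from a well-behaved client’s side, here is a minimal sketch using Python’s standard-library urllib.robotparser to check whether a given user agent may fetch a URL. The robots.txt contents, the user-agent names (ExampleResearchBot, GenericScraper), and the example.com URLs are hypothetical. As the speakers noted, honoring these rules is a norm rather than a technical barrier: nothing in the file stops a client that simply ignores it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site; the rules and user-agent
# names are illustrative assumptions, not taken from any real website.
ROBOTS_TXT = """\
User-agent: ExampleResearchBot
Disallow: /private/

User-agent: *
Disallow: /profiles/
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text supplied as a string (no network request)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the parsed rules permit this user agent to fetch the URL.

    This only reports the site's stated wishes; compliance is voluntary.
    """
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    checks = [
        ("ExampleResearchBot", "https://example.com/profiles/alice"),
        ("ExampleResearchBot", "https://example.com/private/inbox"),
        ("GenericScraper", "https://example.com/profiles/alice"),
        ("GenericScraper", "https://example.com/about"),
    ]
    for agent, url in checks:
        print(agent, url, may_fetch(rp, agent, url))
```

In this hypothetical file, the named research bot may read profile pages but not /private/, while every other agent falls under the `*` group and is asked to stay out of both.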

Timothy Edgar also discussed what his paper calls “unwanted” or unauthorized scraping, defined as the automated collection of data in violation of a website’s terms of service. He indicated that some scraping, such as scraping for scientific research, can be considered more innocuous than other forms, though he acknowledged that all scraping poses certain risks. This led to an exploration of the potential harms of web scraping, particularly the automated collection of personal information for commercial or criminal exploitation. Speakers noted that misusing personal information shared for a specific context, such as a dating profile, poses significant privacy risks and can even harm users. Timothy Edgar cited real-world examples, such as Clearview AI’s scraping of billions of photographs for facial recognition, to illustrate the privacy violations that web scraping can enable. The conversation also noted that web scraping is often the “bread and butter” reconnaissance step for more malicious activity: collected data such as email addresses can serve as a gateway to phishing and to gaining access to a public site’s more private sections.

The webinar also highlighted the importance of authentication, authorization, and access control (AAA), concepts that are well understood in the field of cybersecurity and are explored in the paper. These technical concepts play a crucial role in achieving cybersecurity goals: verifying identity, granting access, and setting limits. The speakers discussed a common misunderstanding among lawyers dealing with these terms, clarifying that authentication and authorization, while related, are distinct processes that can occur in either order, and that both are often mistakenly conflated with a login process through which a computer limits access to a particular user with an account. Speakers discussed how the Computer Fraud and Abuse Act (CFAA), the main anti-hacking law, written in the 1980s, uses the terms “without authorization” and “exceeding authorized access” without mentioning authentication, which has created problems for courts that have struggled to interpret these terms accurately. Timothy Edgar proposed that one solution would be to update the CFAA with a technically sound definition of authorization and to clarify the role of authentication. He emphasized, however, that amending the CFAA would be only a partial remedy, because the CFAA addresses the rights of site owners but not the rights of users whose privacy is violated, and he argued that the US also needs a comprehensive privacy law.
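To make the distinction concrete, here is a minimal Python sketch, not drawn from the paper, that keeps the three steps separate. The token table, the read policy, and the function names authenticate, authorize, and access_control are hypothetical illustrations. Note that an anonymous request for a public page is authorized without any login, which is one way authorization differs from authentication. The status codes follow the common HTTP convention: 401 when credentials are missing or invalid, 403 when a known identity lacks permission.

```python
import hmac
from dataclasses import dataclass
from typing import Dict, Optional, Set

# Hypothetical credential store (token -> user). Illustrative only; real systems
# store hashed secrets and issue tokens through a proper login flow.
TOKENS: Dict[str, str] = {"tok-alice": "alice", "tok-bob": "bob"}

# Hypothetical authorization policy (resource -> principals allowed to read it).
# "anonymous" marks a resource readable without logging in.
READ_POLICY: Dict[str, Set[str]] = {
    "/about": {"anonymous", "alice", "bob"},
    "/profiles/alice": {"alice"},
}


@dataclass(frozen=True)
class Principal:
    name: str  # "anonymous" when no credential was presented


def authenticate(token: Optional[str]) -> Optional[Principal]:
    """Authentication: establish WHO is making the request.

    No token yields the anonymous principal; an unrecognized token yields None.
    """
    if token is None:
        return Principal("anonymous")
    for known, user in TOKENS.items():
        # Constant-time comparison to reduce timing side channels.
        if hmac.compare_digest(known.encode(), token.encode()):
            return Principal(user)
    return None


def authorize(principal: Principal, resource: str) -> bool:
    """Authorization: decide WHETHER this principal may read the resource."""
    return principal.name in READ_POLICY.get(resource, set())


def access_control(token: Optional[str], resource: str) -> int:
    """Access control: enforce both decisions, returning an HTTP-style status code."""
    principal = authenticate(token)
    if principal is None:
        return 401  # credential presented but not recognized
    if not authorize(principal, resource):
        # An anonymous caller might succeed after logging in; a known user will not.
        return 401 if principal.name == "anonymous" else 403
    return 200


if __name__ == "__main__":
    print(access_control(None, "/about"))                  # public page, no login needed
    print(access_control(None, "/profiles/alice"))         # login required
    print(access_control("tok-bob", "/profiles/alice"))    # authenticated, not authorized
    print(access_control("tok-alice", "/profiles/alice"))  # authenticated and authorized
```

Keeping authorize separate from authenticate is the design point: the policy decides what an identity may do, whether or not that identity came from a login.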

In conclusion, the webinar provided valuable insights into the complex world of web scraping, raising important questions about privacy, security, and ethics. Speakers emphasized that regulators and policymakers need to be aware of the real potential for harm that unauthorized web scraping poses and of the privacy problems that can result from misuse of publicly available user data. Balancing the benefits of web scraping against the protection of personal information remains a challenge. Finding effective solutions will require collaboration between legal and technical experts and public-private partnerships like those the Mitigating Unauthorized Scraping Alliance is driving.

New Research Paper Examines Legal and Technical Challenges of Harmful Web Scraping, Proposes Solutions

Mitigating Unauthorized Scraping Alliance to Host Webinar Discussion on June 22, 2023

WASHINGTON, D.C., June 16, 2023 – In a new research paper, Professor Timothy H. Edgar, Professor of the Practice of Computer Science, Brown University, and Lecturer on Law, Harvard Law School, examines the legal and technological challenges of harmful web scraping, highlights trends that could exacerbate the problem, and proposes possible legislative solutions, including amending the Computer Fraud and Abuse Act (CFAA) or addressing harmful web scraping through broader privacy legislation in the United States and elsewhere.

Professor Edgar seeks to define the problem of harmful web scraping and explains how lawyers and technologists define the technical concepts of authentication, authorization, and access control in different ways, and how these technical concepts have confused lawyers and the courts.

In the paper, Professor Edgar identifies three new trends that are aggravating the problem of malicious web scraping: increased demand for scraped data driven by the advent of generative AI; the breakdown of norms in the tech community against unwanted scraping; and the risk that federal court decisions narrowing the CFAA could be misread as a green light for unwanted scraping.

Professor Edgar asserts that website owners should be incentivized to use, or at least not deterred from using, technical measures to limit scraping that could affect the privacy and security of users and others whose data could otherwise be at the mercy of malicious bots, scrapers, and scammers. He argues that a useful first step in addressing harmful web scraping is for lawyers, policymakers, and computer security experts to come together and give terms like authentication, authorization, and access control their technical meaning, which would facilitate legal and policy solutions to these challenges.
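The argument as summarized here does not name specific technical measures. As one hedged illustration of the kind of measure a site might use, the following Python sketch implements per-client token-bucket rate limiting: each client may burst up to capacity requests and is then held to refill_rate requests per second. The class names, the simulate helper, the choice of client key (for example an IP address or API key), and the parameter values are hypothetical assumptions. Rate limiting alone does not stop scrapers that spread requests across many addresses.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple


class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilled at `refill_rate` per second."""

    def __init__(self, capacity: float, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock
        self.tokens = capacity  # a new client starts with a full bucket
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Add tokens in proportion to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1  # spend one token on this request
            return True
        return False  # over the limit: the caller might delay, challenge, or block


class PerClientLimiter:
    """Keeps one TokenBucket per client key (hypothetical key: IP address or API key)."""

    def __init__(self, capacity: float, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, client: str) -> bool:
        if client not in self.buckets:
            self.buckets[client] = TokenBucket(self.capacity, self.refill_rate, self.clock)
        return self.buckets[client].allow()


def simulate(capacity: float, refill_rate: float,
             events: Iterable[Tuple[float, str]]) -> List[bool]:
    """Replays (timestamp_seconds, client) events against a limiter using a fake clock."""
    now = [0.0]
    limiter = PerClientLimiter(capacity, refill_rate, clock=lambda: now[0])
    results = []
    for ts, client in events:
        now[0] = ts
        results.append(limiter.allow(client))
    return results


if __name__ == "__main__":
    # Client "a" bursts four requests at t=0; client "b" is unaffected;
    # one second later "a" has earned back exactly one request.
    print(simulate(3, 1.0, [(0, "a"), (0, "a"), (0, "a"), (0, "a"),
                            (0, "b"), (1, "a"), (1, "a")]))
```

A token bucket is used here because it tolerates the short bursts of an ordinary visitor while capping sustained automated request rates; sites often combine such limits with other signals.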

In the paper, he says that policymakers in the United States and elsewhere should also take steps to prevent harmful scraping, while ensuring appropriate exceptions for scraping for valid commercial purposes and for legitimate and ethical research. Professor Edgar believes that Congress could do so by amending the CFAA, by addressing the problem of harmful scraping in comprehensive privacy legislation, or by doing both.

Talking Past Each Other: Webinar on The Legal and Technical Challenges of Harmful Web Scraping

The Mitigating Unauthorized Scraping Alliance (MUSA) will host a complimentary webinar discussion with Professor Edgar on Thursday, June 22, 2023 at 2:30 p.m. EDT to discuss this new research.

Media is invited.

About MUSA

MUSA brings together leading companies committed to protecting data from unauthorized scraping and misuse. In collaboration with industry members, policymakers, and the public, MUSA is focused on protecting user data through education, advocacy, public-private partnerships, and the sharing of reasonable practices to mitigate unauthorized scraping.

Highlights of MUSA International Data Privacy Day 2023 Event: The State of Unauthorized Scraping

In observance of International Data Privacy Day, the Mitigating Unauthorized Scraping Alliance (MUSA) hosted an event on January 31, 2023, featuring industry, legal, and academic experts who examined the issue of unauthorized data scraping and its impacts. Check out the videos below for some of the highlights from the event.

Highlights from Panels #1 and #3: The State of Unauthorized Scraping Enforcement & Its Impact on Industry

Highlights from Panel #2: The Impact of Unauthorized Scraping on Users

Webinar: The Legal and Technical Challenges of Harmful Web Scraping

Event Date/Time:

Thursday, June 22, 2023 | 2:30 p.m. – 3:10 p.m. ET

MUSA hosted an engaging discussion with cybersecurity expert and privacy lawyer Professor Timothy H. Edgar about his newest research on the issue of harmful web scraping. This webinar addressed common misunderstandings that lawyers and technologists may have regarding this topic. We delved into the technical concepts of authentication, authorization, and access control; explored their relationship to web scraping; and examined the current legislative limitations and potential solutions.

Speaker and Moderator:

David Patariu, Associate, Venable LLP; Privacy Law Specialist (PLS); International Association of Privacy Professionals (IAPP) Fellow of Information Privacy, CIPP/US, CIPP/E, CIPM; (ISC)² CCSP, CISSP

Timothy Edgar, Professor of the Practice of Computer Science, Brown University; Senior Fellow, Watson Institute for International and Public Affairs; Lecturer on Law, Harvard Law School