Research

Defining Web Scraping to Improve Mitigation by Morgan Guesdon

In his white paper, Morgan Guesdon outlines the steps involved in a web scraping operation, also known as data extraction. By establishing a consensus on these specific steps, the various actors involved in mitigation of unauthorized web scraping can more efficiently coordinate and exchange relevant information. The primary goal of this framework is to streamline communication and enhance the effectiveness of countermeasures against such activities.

Assessing the Use of Scraped and Hand Collected Online Data to Understand Crime by Dr. Thomas J. Holt

In a research paper, “Assessing the Use of Scraped and Hand Collected Online Data to Understand Crime,” Dr. Thomas J. Holt, Professor of Criminal Justice at Michigan State University, outlines how scholars collect online data to study crime, including extremism and terror, online illicit markets, sexual offenses, and other crimes. The paper identifies two primary methods used by researchers: automated scraping tools that gather data from online platforms, and manual saving of content from browsers and other applications. Holt highlights issues for researchers to consider in gathering such data and offers suggestions to protect the privacy of users whose behaviors are retained in any online data set. He emphasizes the importance of researchers using data scraping tools in ways that comply with the policies of the originating platform and don’t harm the operations of the hosting platforms. The paper encourages collaboration between groups like the Mitigating Unauthorized Scraping Alliance (MUSA), criminologists, university Institutional Review Boards, and federal funding agencies to promote understanding of best practices for data collection, storage, and analysis.
 
Dr. Holt discussed the paper and its findings in a webinar hosted by MUSA on January 31, 2024.

Talking Past Each Other: The Legal and Technical Challenges of Harmful Web Scraping

In a new research paper, Professor Timothy H. Edgar, Professor of the Practice of Computer Science, Brown University, and Lecturer on Law, Harvard Law School, examines the legal and technological challenges of harmful web scraping, highlights trends that could exacerbate the problem, and proposes possible legislative solutions, including amending the Computer Fraud and Abuse Act (CFAA) or addressing harmful web scraping through broader privacy legislation in the United States and elsewhere.

Professor Edgar seeks to define the problem of harmful web scraping and explains how lawyers and technologists have defined the technical concepts of authentication, authorization, and access control in different ways, with lawyers and the courts confused by these technical concepts.

In the paper, Professor Edgar notes three new trends that are aggravating the problem of malicious web scraping: increased demand for scraped data driven by the advent of generative AI, the breakdown of norms in the tech community against unwanted scraping, and the risk that federal court decisions narrowing the CFAA could be misinterpreted as a green light for unwanted scraping.

Professor Edgar asserts that website owners should be incentivized to use, or at least not deterred from using, technical measures to limit scraping that could impact the privacy and security of users and others whose data could otherwise be at the mercy of malicious bots, scrapers, and scammers. He argues that a useful first step in addressing the problem of harmful web scraping is for lawyers, policymakers, and computer security experts to come together to give terms like authentication, authorization, and access control their technical meaning to facilitate policy and legal solutions to these challenges.

In the paper, he says that policymakers in the United States and elsewhere should also take steps to prevent harmful scraping, while ensuring appropriate exceptions for scraping for valid commercial purposes and for legitimate and ethical research. Professor Edgar believes that Congress could do so by amending the CFAA, by addressing the problem of harmful scraping in comprehensive privacy legislation, or by doing both.

Professor Edgar discussed the paper and its findings in a webinar hosted by MUSA on June 22, 2023.
