Algorithmic Copyright Management: Background Audio, False Positives and De facto Censorship
On July 1, 2020, a video of an interview with BLM protestors posted by the alternative media org Unicorn Riot was removed from various social media platforms due to the presence of copyrighted audio in the background of the video. In this case the trigger was apparently Marvin Gaye’s “Let’s Get it On” and 2Pac’s “Keep Ya Head Up,” along with songs by a few other artists.

In another example, the digital media company Tegna recently sent a DMCA notice to Twitter regarding a video quote-tweeted by Greg Doucette, a criminal defense and 1st Amendment attorney from North Carolina who has been maintaining a mega-thread of videos documenting police combativeness since the beginning of the Black Lives Matter protests. Other Twitter accounts were also targeted by the same DMCA notice. Tweet #701 was taken down on the premise that the video clip it eventually linked to infringed the copyright of a Seattle news station. Doucette, clearly relishing the possibilities of the situation, counter-noticed the DMCA and published a lengthy Twitter thread with an extensive legal analysis of the takedown. Powerfully highlighting some of the issues with the overall takedown system, a representative of Tegna claimed in the replies to Doucette that Tegna had tried to "chat with Twitter to clear this up - and then they moved forward with the DCMA claim anyway?" The video was eventually restored, and no lawsuit ensued.
The removal of these videos is by no means anomalous; they are simply the most recent examples in a long series of takedowns rooted in algorithmic copyright enforcement. Although no human being “chose” to take the Unicorn Riot video down, the practical outcome, especially in situations like this, where the music is both incidental and inadvertent, is one of de facto censorship of the primary content.
The list of famous ‘false positive’ online takedowns in the past two decades is quite long, including, most notably, the “Dancing Baby” case, which went all the way to the 9th Circuit Court of Appeals. In that case, the court notably held that DMCA notice senders must consider fair use before sending notices, but did not provide an objective standard for doing so. Another noteworthy instance of a clear false positive was the removal of a 2012 video of the uploader gardening in the wilderness, with no accompanying music or inserted audio; nevertheless, YouTube’s Content ID algorithm somehow detected a match to copyrighted audio in the video. Videos of the Hugo Awards, the Mars Curiosity Rover, the recent SpaceX launch, and musicians playing Mozart and other music centuries old, as well as their own, have all been taken down because of mistakes made by algorithmic copyright enforcement, even though it was working as designed, and indeed only as it can, by matching audio.
These false positives, and many more like them, are the result of the unavoidable shortcomings of the algorithmic tools deployed by online platforms to monitor for and address unlicensed copyrighted material in the flood of uploaded content. These tools, such as YouTube’s Content ID, are meant to detect when copyrighted content is present on the platform and respond appropriately. However, while an algorithmic system may or may not successfully recognize the presence of copyrighted content, it cannot evaluate the larger context or determine whether the use of that content was fair, a notoriously complex determination.
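To make concrete why such a system cannot weigh context, consider a toy version of acoustic fingerprint matching. Content ID’s actual algorithm is proprietary, so the sketch below is only an assumption-laden illustration (the function names, parameters, and thresholds are invented for this example): it reduces audio to coarse spectral-peak hashes and reports how much of a reference track appears in an upload. Nothing in it knows, or could know, who is speaking over the music or why.

```python
# Toy illustration of acoustic fingerprint matching. This is NOT Content ID's
# implementation (which is proprietary); it only shows the general idea that
# matching is purely acoustic and context-blind.

import numpy as np

def fingerprint(samples, frame_size=4096, hop=2048, peaks_per_frame=5):
    """Reduce an audio signal to a set of coarse (time, frequency) hashes."""
    hashes = set()
    n_frames = max((len(samples) - frame_size) // hop, 0)
    for i in range(n_frames):
        frame = samples[i * hop : i * hop + frame_size]
        spectrum = np.abs(np.fft.rfft(frame * np.hanning(frame_size)))
        # Keep only the strongest frequency bins in this frame.
        top_bins = np.argsort(spectrum)[-peaks_per_frame:]
        for b in top_bins:
            # Quantize coarsely so small differences still match.
            hashes.add((i // 4, int(b) // 8))
    return hashes

def match_score(upload_fp, reference_fp):
    """Fraction of the reference fingerprint found in the upload."""
    if not reference_fp:
        return 0.0
    return len(upload_fp & reference_fp) / len(reference_fp)

if __name__ == "__main__":
    rate = 22050
    t = np.arange(rate * 5) / rate                            # 5 seconds of audio
    song = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 660 * t)
    interview = np.random.default_rng(0).normal(0, 0.3, len(t))  # stand-in for speech

    upload = interview + 0.1 * song   # copyrighted music faint in the background
    score = match_score(fingerprint(upload), fingerprint(song))
    print(f"match score: {score:.2f}")  # a simple threshold rule would flag this upload
```

A platform applying a fixed threshold to a score like this would treat the incidental background music in an interview and a deliberate infringement identically; the fair use questions live entirely outside the matcher.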
Although these removals are all accidental, in the sense that they are false positives, there is also the possibility of deliberately leveraging these flaws in the system. For example, as education has moved online during the pandemic, professors concerned with their recorded lectures being copied or shared without permission have discussed the possibility of taking advantage of the way in which automatic systems flag background music. Theoretically, by including copyrighted music in one’s online lectures at volume levels too low to distract a human viewer, but present enough to trigger the copyright bots, a professor might reduce the frequency with which the lecture videos or recordings were successfully reposted on various online platforms. It is easy to imagine how the same technique could be used to suppress videos of the ongoing protests. Law enforcement, or indeed anyone of any ideological persuasion seeking to prevent videos of a particular event from being shared online, need only make sure that copyrighted audio is present with sufficiently recognizable clarity and volume in the background of a protest or other event. A chilling prospect indeed.
[UPDATE - 7/24/2020: The always prescient Cory Doctorow notes that almost exactly one year ago, a Twitter user proposed using exactly this technique to suppress videos of fascist rallies.]
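As a rough illustration of the mixing step the professors have in mind, the sketch below (a hypothetical helper, assuming float audio samples in the range [-1, 1]) blends a music track into a lecture recording at a level well below the speech. Whether any particular platform’s matcher would actually flag the result depends on detection thresholds that platforms do not disclose; this shows the acoustic idea, not a guaranteed outcome.

```python
# Hypothetical helper: mix a music track into a lecture recording at a level
# far below the speech. Illustrative only -- no claim that any particular
# platform's matcher would flag the result.

import numpy as np

def mix_background(lecture, music, relative_db=-30.0):
    """Return lecture audio with music added `relative_db` decibels below it."""
    n = min(len(lecture), len(music))
    lecture, music = lecture[:n], music[:n]

    def _rms(x):
        return float(np.sqrt(np.mean(x ** 2))) + 1e-12

    # Scale the music so its RMS sits `relative_db` dB under the lecture's RMS.
    gain = _rms(lecture) / _rms(music) * 10 ** (relative_db / 20)
    mixed = lecture + gain * music
    # Renormalize if the sum would clip float audio in [-1, 1].
    return mixed / max(1.0, float(np.max(np.abs(mixed))))

# Example usage (with audio already loaded as float arrays):
# stamped = mix_background(lecture_samples, song_samples, relative_db=-30.0)
```

Because fingerprint matchers like the toy sketched earlier key on spectral peaks rather than overall loudness, even a heavily attenuated track can leave detectable peaks in frequency bands where speech carries little energy, which is precisely what makes the technique plausible and, in the protest context, so troubling.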
Regardless of the sophistication of any automated content moderation, it is inevitable that a portion of the content removed will consist of false positives. Even when removals are, on the surface, amusing accidents, like the gardener or the classical musicians, there are still real people being affected, both reputationally and financially. But when the takedowns affect news or other socially relevant content, as with the videos of the ongoing protests, the consequences can be much more far-reaching. However, it is very difficult to know the true extent of the false positive phenomenon, since most takedowns happen in the background and don’t themselves make the news. Better transparency regarding online takedowns, combined with more research into them, can yield a better understanding of their scope, nature, and consequences, ultimately leading not only to better content moderation practices, but to better legal and regulatory policy regarding takedowns in general, copyright related or otherwise.