Copyright battle: The Sword of Damocles hangs over AI creations

​Author | JAMES VINCENT

Translator | Zhu Xianzhong

Generative artificial intelligence has become increasingly popular since last year. Companies like Microsoft, Adobe, and GitHub are integrating the technology into their products; startups are raising hundreds of millions of dollars to compete; and the software is even having a cultural impact, with text-to-image AI models spawning countless memes. But listen carefully to any industry discussion about generative AI, and you'll hear, in the background, a question murmured by advocates and critics alike in increasingly concerned tones: Is any of this actually legal?

The question arises because of the way generative AI systems are trained. Like most machine learning software, they work by identifying and replicating patterns in data. But because these programs are used to generate code, text, music, and art, that data is itself created by humans, scraped from the web, and much of it is copyrighted in one way or another.

For AI researchers living in the distant past (also known as the 2010s), this wasn't a big deal. At the time, state-of-the-art models could only generate blurry, fingernail-sized black-and-white images of faces, posing no obvious threat to anyone. But in 2022, when a lone amateur can use software like Stable Diffusion to copy an artist's style in a matter of hours, or when companies are selling AI-generated prints and social media filters that are explicit knockoffs of living designers, questions of legality and ethics have become much more pressing.

Take Disney illustrator Hollie Mengert, whose artistic style was cloned in an AI experiment by a mechanical engineering student in Canada. The student downloaded 32 of Mengert's works and spent several hours training a machine learning model that could reproduce her style. As Mengert told technologist Andy Baio, who reported on the case: "Personally, it feels like someone's taking work that I've done, things that I've learned (I've been a working artist since I graduated from art school in 2011), and using it to create art that I didn't consent to and didn't give permission for."

But is this fair? And is there anything Mengert can do about it?

To answer these questions and understand the legal landscape of generative AI, The Verge spoke to a range of experts, including lawyers, analysts, and employees at AI startups. Some say with confidence that these systems have a high risk of copyright infringement and may face serious legal challenges in the near future. Others argue the opposite: everything currently happening in the field of generative AI is legal, and any lawsuits are bound to fail.

"I see people on both sides of this extremely confident in their positions, but the reality is nobody knows," Baio, who has been closely following the generative AI scene, told The Verge. "And anyone who says they know confidently how this will play out in court is wrong." Andres Guadamuz, a scholar of intellectual property law, said that although there are many unknowns, there are a few key questions from which the topic's many uncertainties unfold. First, can you copyright the output of a generative AI model? If so, who owns it? Second, if you own the copyright to material used to train an AI, does that give you any legal claim over the model or the content it creates? Once these questions are answered, a larger one emerges: How do you deal with the fallout of this technology? What legal restrictions can, or should, be placed on data collection? And can there be peace between the people building these systems and those whose data is needed to create them?

Let’s answer these questions one by one.

3. Output question: Can you obtain copyright for something created by an artificial intelligence model?

In September of this year, the U.S. Copyright Office approved a first-of-its-kind registration for a comic book created with the help of the text-to-image AI Midjourney (an AI art generator similar to OpenAI's DALL·E). The comic is a complete work: 18 pages of narrative with characters, dialogue, and a traditional comic layout. And although it has since been reported that the U.S. Copyright Office is reviewing its decision, the comic's copyright registration has not actually been rescinded. It appears that one factor in the review will be the degree of human input involved in producing the comic. Kristina Kashtanova, the artist who created the work, told IPWatchdog that the office had asked her to "provide details of my process to show that there was substantial human involvement in the process of creation of this graphic novel." (The U.S. Copyright Office itself does not comment on specific cases.)

According to Guadamuz, this will be an ongoing issue when it comes to granting copyright for AI-generated work. "If you just type 'cat by van Gogh,' I don't think that's enough to get copyright in the US," he said. "But if you start experimenting with prompts and produce several images, start fine-tuning your images, start using seeds, and start engineering a bit more, I can totally see that being protected by copyright."

Given that criterion, the vast majority of the output of generative AI models most likely cannot be copyrighted: it is typically churned out en masse with a few keywords as a prompt. But more involved processes make for better cases. These might include controversial works such as the AI-generated print that won a state art fair competition. In that case, the creator said he spent weeks honing his prompts and manually editing the finished work, suggesting a considerable degree of intellectual input.

Computer scientist Giorgio Franceschelli, who has written about copyright issues in artificial intelligence, says measuring human input will be "especially true" for deciding cases in the EU. And in the UK (another major jurisdiction for Western AI startups) the law is different again. Unusually, the UK is one of only a few countries to offer copyright for works generated solely by a computer, but it defines the author as "the person by whom the arrangements necessary for the creation of the work are undertaken." Again, there is room for multiple interpretations (is that "person" the model's developer or its operator?), but it offers a precedent for granting some form of copyright protection.

Ultimately, though, Guadamuz warns, registering a copyright is only the first step. "The U.S. Copyright Office is not a court," he said. "If you want to sue someone for copyright infringement, you need to register, but ultimately it will be the courts that decide whether this is legally enforceable."


5. Input question: Can you use copyrighted data to train an artificial intelligence model?

For most experts, the biggest questions about AI and copyright concern the data used to train these models. Most systems are trained on huge amounts of content scraped from the web, whether text, code, or imagery. The training dataset for Stable Diffusion, for example, one of the biggest and most influential text-to-image models, contains billions of images pulled from hundreds of domains: everything from personal blogs hosted on WordPress and Blogspot to art platforms like DeviantArt and stock image sites like Shutterstock and Getty Images. Indeed, the training datasets for generative AI are so big that there's a good chance your own work is already in one of them.

The rationale offered by AI researchers, startups, and multi-billion-dollar tech companies is that the use of these images is (in the US, at least) covered by the fair use doctrine, which encourages the use of copyrighted work to promote freedom of expression.

There are a number of considerations in deciding whether something is fair use, explains Vanderbilt Law School professor Daniel Gervais, who specializes in intellectual property law and has written extensively on how it intersects with artificial intelligence. Two factors, he said, "stand out more": "What is the purpose or nature of the use, and what is the impact on the market." In other words: does the use case change the nature of the material in some way (usually described as a "transformative" use), and does it threaten the livelihood of the original creator by competing with their work?

6. Training generative AI on copyrighted data may be legal, but you may be using the model in illegal ways

Weighing these factors, Gervais said, there is a "high likelihood" that training systems on copyrighted data is covered by fair use. But the same is not necessarily true of the content those systems generate. In other words: you can train an AI model on other people's data, but what you do with the model may be infringing. Think of it like the difference between making counterfeit money for a movie and trying to buy a car with it.

Consider the same text-to-image AI model deployed in different scenarios. If the model is trained on millions of images and used to generate novel pictures, it is extremely unlikely that this constitutes infringement: the training data has been transformed in the process, and the output does not threaten the market for the original art. But if you fine-tune that model on 100 pictures by a particular artist and generate images in their style, an unhappy artist would have a much stronger case against you.

"If you give an AI ten Stephen King novels and say, 'Produce a Stephen King novel,' then you're competing directly with Stephen King," Gervais said. "Is that fair use? Probably not."

Crucially, however, between these poles of fair and unfair use lie countless scenarios in which input, purpose, and output are all balanced differently and might sway any legal ruling one way or another.

Ryan Khurana, chief of staff at generative AI company Wombo, said most companies selling these services are aware of these distinctions. "Intentionally using prompts that draw on copyrighted works to generate an output ... violates the terms of service of every major player," he told The Verge over email. But, he added, "enforcement is difficult," and companies are more interested "in coming up with ways to prevent models being used in copyright-infringing ways ... than limiting the training data." This is especially true of open-source text-to-image models like Stable Diffusion, which can be trained and used with zero oversight or filtering. The company may have covered itself, but it may also be facilitating copyright-infringing uses.

Another variable in weighing fair use is whether the training data and model were created by academic researchers and nonprofits. This generally strengthens a fair use defense, and startups know it. Stability AI, for example, did not directly collect the model's training data, nor did it train the models behind the software. Instead, it funded and coordinated that work by academics, and the Stable Diffusion model is licensed by a German university. This lets Stability AI turn the model into a commercial service (DreamStudio) while keeping legal distance from its creators.

Baio calls this approach "AI data laundering." He notes that the method has been used before to build facial recognition AI software, pointing to the example of MegaFace, a dataset compiled by University of Washington researchers by scraping photos from Flickr. "Academic researchers take the data, launder it, and it's used by commercial companies," Baio said. That data, including millions of personal photos, is now, he says, in the hands of "facial recognition company Clearview AI, law enforcement, and the Chinese government." This tried-and-tested laundering process may help shield the creators of generative AI models from liability.

There's one final wrinkle in all this, though: as Gervais points out, the current interpretation of fair use may change in the coming months because of a pending Supreme Court case involving Andy Warhol and the musician Prince. The case concerns Warhol's use of photographs of Prince to create artwork. Was that fair use, or copyright infringement?

"The Supreme Court doesn't take cases like this very often, so when they do, it's usually to do something significant. I think they'll do the same here," Gervais said. "And there's a risk in saying anything is settled law while we wait for the Supreme Court to possibly change it."


7. How can artists and artificial intelligence companies achieve peace?

Even if the training of generative AI models is found to be covered by fair use, that will hardly settle matters. It won't appease artists angry that their work has been used to train a commercial product, nor will it necessarily hold for other generative AI fields like code and music. With that in mind, the question is: what remedies, technical or otherwise, can be introduced so that generative AI can flourish while credit or compensation flows to the creators whose work makes the field possible?

The most obvious suggestion is to license the data and pay its creators. For some, though, that would kill the industry. Bryan Casey and Mark Lemley, authors of the paper "Fair Learning," argue that training datasets are so large that "there is no plausible option simply to license all of the underlying photographs, videos, audio files, or texts for the new use." Permitting any copyright claim, they argue, "is tantamount to saying, not that copyright owners will get paid, but that the use won't be permitted at all." Allowing "fair learning," as they call it, not only encourages innovation but permits the development of better AI systems.

Others point out, however, that we have already solved copyright issues of similar scale and complexity and can do so again. Several experts interviewed by The Verge cited a comparison to the era of music piracy, when file-sharing programs were built on massive copyright infringement and thrived before legal challenges emerged that led to new protocols that respected copyright.

"In the early 2000s you had Napster, and everybody loved it, but it was entirely illegal. And today we have things like Spotify and iTunes," Matthew Butterick, a lawyer currently suing companies for scraping data to train AI models, told The Verge earlier this month. "And how did those systems arise? By companies making licensing deals and bringing in content legitimately. All the stakeholders came to the table and made it work. To me, the idea that something similar can't happen for AI is catastrophic."

8. Companies and researchers are already experimenting with ways to compensate creators

Wombo's Ryan Khurana predicts a similar outcome. "Music has by far the most complex copyright regime, because of the different types of licensing, the variety of rights holders, and the various intermediaries involved," he told The Verge. "Given the nuances of the legal issues surrounding AI, I think the entire generative field will evolve into a licensing regime similar to the one that governs music."

Other alternatives are also being trialled. Shutterstock, for example, says it plans to set up a fund to compensate individuals whose work it has sold to AI companies to train their models, while DeviantArt has created a metadata tag for images shared on the web that warns AI researchers not to scrape their content. (At least one small social network, Cohost, has already adopted the tag across its site and says that if researchers are found to be scraping its images anyway, it "won't rule out legal action.") However, these approaches have drawn criticism from different corners of the artistic community. Will one-off license fees ever make up for lost livelihoods? And how does a no-scraping tag deployed now help artists whose work has already been used to train commercial AI systems?
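Mechanically, such an opt-out tag is simple: a directive embedded in a page's metadata that well-behaved scrapers are asked to honor. Here is a hedged sketch of how a cooperative crawler might check for it. The `"noai"`/`"noimageai"` directive names and their placement in a `robots` meta tag reflect my understanding of DeviantArt's announced approach, and the helper names are mine, not part of any standard:

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the comma-separated directives from <meta name="robots"> tags."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        if attr_map.get("name", "").lower() == "robots":
            content = attr_map.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]

def page_opts_out_of_ai(html: str) -> bool:
    """Return True if the page carries a (hypothesized) noai/noimageai directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noai" in parser.directives or "noimageai" in parser.directives

sample = '<html><head><meta name="robots" content="noai, noimageai"></head></html>'
print(page_opts_out_of_ai(sample))  # True: the page asks AI scrapers to stay away
```

The obvious limitation, as the critics above note, is that compliance is entirely voluntary: nothing in the tag itself stops a scraper that chooses to ignore it.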

For many creators, the damage, it seems, has already been done. But AI startups are at least proposing new approaches for the future. One obvious step forward is for AI researchers to build datasets where there is simply no possibility of copyright infringement, either because the material has been properly licensed or because it was created for the specific purpose of AI training. One such example is "The Stack," a dataset for training AI that was designed specifically to avoid accusations of copyright infringement. It includes only code with the most permissive open-source licenses and offers developers an easy way to remove their data on request. Its creators say their model could be adopted across the industry.
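To make that filtering idea concrete, here is a minimal sketch of the kind of selection The Stack's description implies. This is illustrative only: the license allowlist and the record layout are assumptions of mine, not The Stack's actual pipeline.

```python
# Assumed allowlist of permissive licenses; a real pipeline would use a
# vetted set of SPDX identifiers, and this trio is only an example.
PERMISSIVE_LICENSES = {"mit", "apache-2.0", "bsd-3-clause"}

def filter_corpus(files, opted_out_repos):
    """Keep only permissively licensed files whose repo has not opted out.

    files: iterable of dicts with "repo", "license", and "content" keys
    opted_out_repos: set of repo names whose developers requested removal
    """
    kept = []
    for f in files:
        if f["repo"] in opted_out_repos:
            continue  # honor the developer's removal request
        if f.get("license", "").lower() not in PERMISSIVE_LICENSES:
            continue  # restrictive or unknown license: exclude from training
        kept.append(f)
    return kept

corpus = [
    {"repo": "a/x", "license": "MIT", "content": "..."},
    {"repo": "b/y", "license": "GPL-3.0", "content": "..."},
    {"repo": "c/z", "license": "Apache-2.0", "content": "..."},
]
kept = filter_corpus(corpus, opted_out_repos={"c/z"})
print([f["repo"] for f in kept])  # ['a/x']: GPL code and opted-out repos are dropped
```

The design choice worth noting is that exclusion is the default: a file with a missing or unrecognized license is dropped rather than kept, which is what makes the "no possibility of infringement" claim plausible.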

"The Stack's approach can absolutely be adapted to other media," Yacine Jernite, who leads machine learning and society at Hugging Face, which created The Stack in partnership with ServiceNow, told The Verge. "It's an important first step in exploring the broad range of consent mechanisms that work best, given the rules of the platforms the AI training data is drawn from." Jernite said Hugging Face wants to help bring about a "fundamental shift" in how AI researchers treat creators. But so far, the company's approach remains the exception.

9. What will happen next?

Wherever we land on these legal questions, the various players in generative AI are already gearing up for something. Companies making millions from the technology are entrenching, repeatedly asserting that everything they do is legal (while presumably hoping no one actually tests the claim). On the other side of no man's land, copyright holders are staking out tentative positions without quite committing to action. Getty Images recently banned AI content over the potential legal risk it poses to customers. "I don't think it's responsible. I think it could be illegal," CEO Craig Peters told The Verge last month. And the music industry trade group the RIAA has declared that AI-powered music mixers and extractors infringe its members' copyright, though it has not mounted any actual legal challenges.

However, with the recent filing of a class action lawsuit against Microsoft, GitHub, and OpenAI, the first shot in the coming AI copyright war has been fired. The suit accuses the three companies of knowingly reproducing open-source code through the AI coding assistant Copilot without the proper licenses. In an interview with The Verge, the lawyers behind the suit said it could set a precedent for the entire field of generative AI.


Meanwhile, both Guadamuz and Baio said they were surprised there haven't been more legal challenges already. "Honestly, I'm stunned," said Guadamuz. "But I think that's partly because these industries are afraid of being the first to bring a case and lose. Once someone breaks cover, though, I think the lawsuits are going to start flying everywhere."

Part of the difficulty, Baio said, is that many of the people most affected by this technology, artists and the like, are simply not in a good position to mount a legal challenge. "They don't have the resources," he said. "This kind of litigation is very expensive and time-consuming, and you should only pursue it if you know you're going to win. That's why I've thought for a while that the first lawsuits around AI art would come from the stock image sites. They seem to stand to lose the most from this technology, they can clearly prove that a large corpus of their images was used to train these models, and they have the funds to take it to court."

Guadamuz agreed. "Everyone knows how expensive it's going to be," he said. "Whoever files suit will get a decision in the lower courts, then they'll appeal, then they'll appeal again, and eventually this could go all the way to the Supreme Court."

Original link: https://www.theverge.com/23444685/generative-ai-copyright-infringement-legal-fair-use-training-data

Translator Introduction

Zhu Xianzhong, 51CTO community editor, 51CTO expert blogger and lecturer, computer teacher at a university in Weifang, and a veteran of the freelance programming world.


Statement
This article is reproduced from 51CTO.COM.