No need to worry about copyright in AI training data? The Japanese government's stance sparks heated debate
With generative AI booming, the copyright status of the data used to train the underlying models has been a persistent focus of attention: What counts as legal training data? Could one inadvertently infringe someone else's copyright?
On this question, some foreign media reported that the Japanese government's Artificial Intelligence Strategy Committee submitted a draft on May 26 stating that Japan would not require data used for AI training to comply with copyright law. Keiko Nagaoka, Japan's Minister of Education, Culture, Sports, Science and Technology (a post roughly equivalent to China's Minister of Education), confirmed the news at a local meeting, saying that Japanese law does not protect the copyright of materials used for AI training.
Keiko Nagaoka. Image source: official website of Japan's Ministry of Education, Culture, Sports, Science and Technology
Specifically, on April 24, at the Second Subcommittee of the Committee on Audit and Oversight of Administration of Japan's House of Representatives, a lawmaker named Takashi Kii questioned Keiko Nagaoka directly.
According to the questions and answers that Takashi Kii compiled after the meeting, when Keiko Nagaoka spoke about how Japan's legal system (copyright law) treats the use of works for information analysis by AI, she said: "In Japan, works may be used for information analysis regardless of the method, whether for profit or non-profit purposes, whether for acts other than reproduction, and whether or not the content was obtained from illegal websites." Takashi Kii, for his part, believes that from the perspective of rights protection, use "against the will of the copyright owner" is problematic, and that new regulations are still needed to protect copyright owners.
The two also discussed guidelines for the use of AI chatbots (such as ChatGPT) in school education. Foreign media reported that the Japanese education system is expected to adopt such tools as early as March 2024. Keiko Nagaoka did not give a specific date, saying only that she would respond "as soon as possible".
The matter has sparked extensive discussion. Yann LeCun, chief AI scientist at Meta and one of the "three giants" of AI, tweeted:
Japan has become a paradise for machine learning.
However, some netizens objected, arguing that "being able to steal intellectual property without consequences" should not be called a "paradise". LeCun replied under the comment: what makes intellectual property "property" is, in essence, defined and enforced by the government. It is also subject to limits set by the government. The driving principle is to maximize the public good, not the rights of content owners.
Netizens have in fact been debating heatedly whether AI training materials should receive copyright protection. Some agree with the Japanese minister's position. They argue that a batch of images used for training is processed layer by layer and ultimately converted into an AI model, that is, into data, code, or another electronic format that a computer can interpret. In other words, the transformation the training data undergoes is highly lossy, so even in the worst case the result is merely a "derivative work", which in their view certainly qualifies as fair use. On this view, copyright infringement occurs only when a model reproduces copyrighted code, images, or books in its final output and distributes them.
This drew an immediate objection: lossy or not, training data provides value. No model could be trained without the many people who spent time producing what ultimately became its training data.
Other netizens argued that quarreling over copyright is pointless, since human beings have always learned and evolved by studying things that already exist and have been published. In their view, it is more important to discuss, at the regulatory level, how this "shared information" can be used and shared more conveniently and reasonably, or how an organization might manage it.
Written by: Nandu reporter Yang Bowen