OpenAI admits to using copyrighted material

Specious Coda-Bishop
11 months ago

You don’t have to look far into the news, be it here at Phandroid or at other outlets, to see the impact AI has been having. Recently, there has been considerable concern about the ethical implications of AI and its usage. Today brought a major development on that front: OpenAI – the group behind the popular AI chatbot ChatGPT – has confirmed that the chatbot has been trained on copyrighted IP.

The ongoing legal dispute with The New York Times amplifies the significance of transparency and responsible AI practices. In evidence it submitted, OpenAI gave the following statement about the process of developing the chatbot.

Because copyright today covers virtually every sort of human expression–including blog posts, photographs, forum posts, scraps of software code, and government documents–it would be impossible to train today’s leading AI models without using copyrighted materials. Limiting training data to public domain books and drawings created more than a century ago might yield an interesting experiment, but would not provide AI systems that meet the needs of today’s citizens.

The dialogue between innovation and copyright compliance is complex. Ethical considerations become pivotal as AI technologies shape the future landscape of digital innovation. There must be guidelines in place for these things to be safe for all parties.

Unethically Sourced

The ongoing legal dispute with The New York Times underscores the tension between innovation and the protection of intellectual property rights. The House of Lords submission, while defending OpenAI’s practices, also brings to light the growing need for greater transparency in the development and deployment of AI models. Questions abound about the consequences of using copyrighted material without explicit permission.

OpenAI’s approach, particularly its lack of transparency, could pose challenges in the future. The company’s reliance on copyrighted material and the legal concept of “fair use” is met with justifiable scrutiny. You only need to look at YouTube to see corporations clamping down on content that is legitimate fair use; it still gets taken down. There is already a precedent here of copyright being enforced heavy-handedly against people who use material within the legal parameters, let alone those who deliberately and knowingly use it outside the parameters of fair use.

What’s Next?

The ethical implications of accessing and utilizing copyrighted data are becoming more pronounced. OpenAI’s commitment to AI safety is well-intentioned, but it lacks real teeth. The lack of clarity regarding the specific copyrighted materials used in training raises questions about accountability and responsible AI practices. OpenAI either doesn’t know where all of the information is coming from or can’t tell us where it’s coming from. I’m not sure which of those options is worse.