Generative AI is rapidly changing the way we work. OpenAI's ChatGPT helps you draft documents. GitHub's Copilot can become a coding companion. Stability AI's Stable Diffusion can give you a newfound artistic flair. Given the capability of these tools, it comes as no surprise that businesses are now seeking to take advantage of them to make aspects of their operations more efficient.
Some businesses are building bespoke products and workflows using existing generative AI platforms; others are creating new generative AI models tailored specifically to their business's needs; and many are simply starting to permit staff to use generative AI tools to expedite particular tasks.
The excitement in this space is palpable. But before adopting these tools, it is important to understand emerging copyright issues. Here, we help you navigate the copyright challenges that may arise with generative AI, by explaining:
- The copyright basics of generative AI;
- The risks involved;
- Some practices to mitigate risks; and
- Current developments.
The copyright basics of generative AI
First, it's important to understand what copyright and moral rights actually are.
Copyright protects the expression of ideas that have been reduced to some tangible form (though not the ideas themselves). If copyright subsists in a work or other protected material, then the owner of that copyright has the exclusive right to do certain things with that work or material, such as the right to copy it, electronically transmit it, and make it available online.
'Moral rights' are rights that belong to the individual (human) author of the copyright material and cannot be assigned. Moral rights include:
- the right to not have one's copyright material used in a way that results in a material distortion of, or a material alteration to, the work that is prejudicial to one's reputation;
- the right to be attributed for one's material; and
- the right to not have one's material falsely attributed.
Copyright and moral rights are relevant in this context because generative AI platforms have largely been 'trained' on large amounts of material scraped from various sources online. Much of this material is protected by copyright, and is likely to have been used without permission from the relevant copyright owner.
It is also conceivable that a copyright owner's exclusive rights may be infringed, or an author's moral rights may be breached, when an end user prompts a generative AI model to produce an output and then goes on to use that generated material in some way.
With these scenarios in mind, we'll consider the risks you should be across before diving in with generative AI.
The risks involved in using generative AI
Legal action from copyright owners
There is a risk that a copyright owner may seek to bring legal proceedings where they consider that their material has been appropriated by a generative AI platform. Such proceedings might include the following:
- Infringement in relation to training data: A business may create a generative AI model by using various online sources as the model's training data, but fail to obtain the necessary permissions from the copyright owners of that online material. This issue is currently being litigated in a number of US class actions and other cases, but the common argument is that, in the process of copying and collecting data from numerous online sources without permission, generative AI companies have infringed the owners’ copyright in that material. While there are some defences in Australia, these are much more restrictive than the US 'fair use' defence, and are more difficult to establish before a court. We discuss some of these cases below.
- Infringement in relation to output: In this scenario, an end user prompts an AI model to generate an output, and this output contains a substantial part (that is, an important, recognisable or essential feature) of the owner's copyright material. The end user then posts the output to the internet. This may still constitute copyright infringement, even though the end user (or their employer) was not aware at the time that the output reproduced a substantial part of someone else's copyright work. Again, there may be defences available in Australia (such as parody or research), but these are, in general, narrowly construed.
- Moral rights infringement: In either of the scenarios above, a moral rights issue could arise where the individual author considers that their work was altered or used in a way that is prejudicial to their reputation, where they are not given credit for it, or where someone else is falsely given credit for it.
- Other actions: Even though, given the complexities of copyright and moral rights, an infringement action may not be viable, creators might attempt to commence proceedings on other bases, such as under the Australian Consumer Law for misrepresenting that the AI generated work is an original work of the organisation that published it, or that it was created with the approval of, or has some association with, the artist whose work it reproduces or evokes.
In each of these scenarios, the claimants may seek to be compensated in the form of damages or an account of profits, or may seek an injunction in the event that damages are deemed insufficient. In the case of a moral rights claim, the claimant may also seek some correction to the attribution. In any case, any legal action may also have significant public relations implications for an affected business.
Lack of protection for the output
An output produced by a generative AI model (whether an audio file, artistic work, or a paragraph of text) may not be afforded copyright protection. This is because in most jurisdictions, including Australia, copyright will only subsist in material if it was created by a human author. In the context of generative AI, this inquiry is not straightforward, and whether the output receives copyright protection will depend on the extent of human involvement in the production process.
The US Copyright Office (USCO) has recently provided some clarity on the extent of 'human involvement' required for copyright to subsist in such material. We cover this in more detail in part four below, but the USCO has set the 'human authorship' bar relatively high. If Australian courts and relevant regulators follow this approach, this may adversely impact the economic value that businesses derive from AI-generated content, particularly because the absence of clear copyright protection would make it difficult for businesses to seek legal recourse against a third party's appropriation of their content, including where it has been published or otherwise disseminated.
Practices to help mitigate the copyright risks
In light of the above, it is worth considering the following best-practice principles to help mitigate these risks.
Know your training data
As a starting point, you should understand the source(s) of your generative AI model's training data and the extent to which that data is being reproduced.
This means ensuring appropriate licences or permissions are in place with respect to the data copied and pooled to train your model. If you are engaging an existing AI company to create a new product, this might mean conducting due diligence on the company's sources, and ensuring that your contract with that company gives you appropriate licences and other rights and contains appropriate non-infringement warranties and indemnities. If you are creating your own model, this might entail obtaining copyright permissions from third parties who own or have rights in the copyright material being used. Those licences should cover all proposed uses of the material, including any substantial reproductions of the material in generative AI outputs.
You should keep a record of the origins of this data, so that its provenance can be verified if the need arises.
Contracts and ownership
Whether you are building your own model or collaborating with an existing AI company to create a new product, you should ensure:
- that there are contracts in place with any third parties you engage in the process (including contractor agreements), and that these contracts contain terms that make clear who owns the copyright in the model and its outputs; and
- that, while Australian copyright law generally provides that copyright in works created by employees in the course of their employment belongs to the employer, any employment contract clearly addresses the ownership of copyright in relation to the creation and use of AI models.
Be transparent and attribute
When a business publishes content or makes it available online, if the content is the output of an AI model, it is best to be upfront with consumers, stakeholders and others about this. This is important from a credibility perspective, and it avoids any misrepresentation to consumers, who may assume that content (particularly written, audio and visual content) emanates from humans unless told otherwise.
If you are building your own AI model using content that is licensed for that purpose, consider whether a feature could be built into the technology that enables outputs to appropriately attribute any copyright works that were used, and in what circumstances. For example, Getty Images’ CEO has indicated that his company has been looking into how its photographers could be appropriately attributed, such as at a “pixel level” or through some other mechanism.
The latest developments – copyright and generative AI
If you are planning to integrate generative AI into your business, it is important to keep abreast of the latest copyright developments. This is an area of law that will be tested in the next few years, as copyright owners react and litigate and governments seek to deal with these issues through law reform. Regulatory changes may affect what material can and cannot be used in relation to AI, or how it can be used, which in turn, could affect the utility of the software and its cost, as well as the value of copyright assets used in the business.
The following is an update on some of the recent copyright developments in relation to AI.
A number of high-profile companies have recently pushed for AI companies to obtain a licence to crawl their websites and use their data for model training purposes:
- Reddit has announced it plans to restrict third parties from accessing its user-generated data, unless they obtain a licence to do so.
- Disney, the New York Times, CNN and others have moved to block third parties from accessing and crawling their content for the development of generative AI models without prior written consent.
- Google and Universal Music Group are reportedly negotiating the terms of a licence to use established artists' melodies and voices as part of songs generated by its AI products.
Getty Images has recently announced that it will release its own 'commercially safe' generative AI tool, trained exclusively on Getty Images' creative library (to which it holds the necessary copyright permissions).
Adobe and Canva have commenced integrating their own AI-powered content-producing tools into their platforms, and have introduced payment and bonus schemes for those who opt-in to contribute their work to the data used to train the AI models.
In Australia:
- The Attorney-General's Department released an issues paper in December 2022 as part of its copyright enforcement review. The issues paper sought written submissions from those dealing with copyright infringement and enforcement issues in practice, to evaluate the effectiveness of existing copyright enforcement mechanisms. Those submissions have been received and the government is now focussing on developing options for legislative reform.
- The Department held a ’Ministerial Roundtable on Copyright’ in September 2023, to consider the implications of AI for copyright law. One outcome of this roundtable was that the Department should establish an ’AI and Copyright Expert Reference Group’ consisting of various industry representatives to provide an ongoing forum to inform the Government's consideration of copyright issues as they relate to AI.
At an international level:
- In the UK, as part of the government's 'Pro-innovation Regulation of Technologies Review', the Chief Scientific Advisor recommended that the government ’announce a clear policy position on the relationship between intellectual property law and generative AI to provide confidence to innovators and investors’, noting that ’[t]here is an urgent need to prioritise practical solutions to the barriers faced by AI firms in accessing copyright and database materials’. The UK Government is now working with stakeholders on a code of practice on copyright and AI, which aims to make licences for data mining more readily available.
- In the US, the Senate Judiciary Subcommittee on Intellectual Property has conducted a number of hearings on the subject of AI and copyright issues, with regulatory oversight in this space being considered by Congress. Separately, the US Copyright Office (for which Australia does not have an equivalent, given the differences between our legal systems) produced some guidance earlier this year on how it intends to deal with applications seeking copyright registration for works containing material generated by AI. The guidance indicated that there must be human authorship, and that a simple 'prompt' into an AI system to produce a complex result will not suffice to meet the elements of authorship. More on this below.
- The World Intellectual Property Organization (WIPO) held its eighth 'WIPO Conversation on IP and AI' in September 2023, to help provide a map for navigating the challenges to the copyright system raised by generative AI.
Copyright infringement in the Courts
A number of copyright proceedings have been launched overseas, including various class actions. The majority of these proceedings involve (among other things) claims of copyright infringement, false attribution or violations of the United States' Digital Millennium Copyright Act (DMCA) in the context of generative AI. The actions include:
- Stability AI: in February of this year, Getty Images sued Stability AI, the creator of the image-generation tool Stable Diffusion, in both the UK and the US, alleging copyright infringement. In the US, court documents show Getty Images is alleging that up to 12 million photos were copied and used to train the Stable Diffusion model.
- Alphabet: a class action was brought in July of this year against Google and its parent company, Alphabet Inc, in relation to the alleged infringement of copyright material existing online (including text, images, music and video), and associated violations of the DMCA, in the process of training Google’s Bard generative AI model.
- Meta and OpenAI: on 7 July 2023, Meta and OpenAI were sued by a number of book authors who allege that the defendants’ use of their works to train their large language models amounts to copyright infringement and a violation of the DMCA.
- OpenAI: OpenAI was sued on 28 June 2023 by a class of book authors who claim they did not consent to OpenAI’s use of their copyright books as training material for ChatGPT. The plaintiffs allege that those works were unlawfully copied, ingested and used to train ChatGPT’s language models, and that every time ChatGPT assembles a text output, it relies on the material in its training dataset.
- Stability AI and Midjourney: Stability AI and Midjourney were sued by a class of artists and stakeholders claiming (among other things) copyright infringement and DMCA violations.
- OpenAI, GitHub and Microsoft: GitHub, its parent company Microsoft, and OpenAI have been sued by (unidentified) software developers in relation to the creation of OpenAI’s ’Codex’ machine learning model and GitHub’s ’Copilot’ programming assistant. Essentially, the developers claim that the tech giants have infringed their copyright by reproducing their code without consent.
- OpenAI: in early September, another group of US authors sued OpenAI, claiming that the company infringed their copyright by copying their literary works in the process of training ChatGPT.
The US Copyright Office has declined the registration of a number of applications relating to works generated with the assistance of AI. For example:
- In June 2023, Dr Stephen Thaler lodged a complaint in the US District Court in Washington DC in respect of the USCO’s earlier decision to deny copyright registration of his visual piece 'A Recent Entrance to Paradise'. The USCO, with whose decision the Court agreed, found that the work in question was created autonomously and therefore lacked the human authorship necessary to support a copyright claim.
- In September 2023, the USCO declined an application made by Jason Allen for copyright protection of a science-fiction themed artwork, which he produced using MidJourney. Mr Allen entered over 600 text prompts into MidJourney before selecting one panel out of four potential images (after hundreds were already generated). He then used Adobe Photoshop to 'beautify' the image and upscaled it using Gigapixel AI. Notwithstanding this, the USCO determined that the work contained more than a 'de minimis' amount of AI-generated content, and declined the application.
There is much happening in the generative AI space, and it is important to keep abreast of developments in order to best position your business going forward.
Generative AI offers businesses the potential to enhance creativity and operational efficiency. If your organisation is planning to use or interact with generative AI in some way and you need more detailed advice with respect to copyright, contact us.