Hidden risks of AI and open-source software

11.09.2024 Prasanth Kapilan, Kylie Diwell

AI and open-source technologies present new legal, compliance, and operational risks that organisations cannot afford to ignore.

Key takeouts

Legal ramifications of AI-generated code, which integrates open source components, have come under increasing scrutiny.

AI coding tools often draw on snippets of open-source code, licensed under open-source software licences, when generating code that organisations then incorporate into proprietary software.

Certain open source licences (particularly 'copyleft' licences) impose an array of onerous obligations on organisations that have integrated open source components into their proprietary software.

Decoding risks within AI and open source software

In the evolving landscape of software development, artificial intelligence (AI) has become increasingly prevalent, offering individuals and organisations opportunities for innovation and efficiency. AI has substantially streamlined software development processes and democratised access to programming skills. This has driven rapid growth in the market for AI coding tools, which is projected to be worth approximately $12.6 billion by 2028.

This growth is partly attributable to large technology companies, which are actively engaged in the AI arms race and are investing significant resources in developing advanced AI coding tools. AI coding tools use machine learning algorithms to analyse patterns within public code repositories and can automatically generate or optimise code. Organisations use these tools internally to create proprietary software for commercialisation in the broader software market, and to advance their own technological capabilities and competitive edge. However, the adoption of AI in proprietary software development, whether through in-house development or vendor procurement, raises complex legal considerations, particularly in the context of open source software (OSS).

Organisations in Australia that are seeking to use proprietary software developed with AI tools, or to use such tools to develop their own proprietary software, should be aware of the potential legal risks.

What is open source software?

OSS represents a collaborative approach to software development, in which the source code is freely available for anyone to view, modify and distribute. Understanding OSS requires understanding what 'software' is. Software refers to programs or instructions stored in a computer's memory to perform particular tasks or functions. It exists as source code, which is readable by humans, or as object code (also known as 'executables'), which computers execute directly. Object code is produced by compiling or translating the source code, and instructs the computer how to carry out the tasks the source code describes. In a typical commercial arrangement, the licensor provides the licensee with its proprietary object code under a licence that restricts its use (e.g. the licensee may not redistribute or modify the software), and the source code itself is usually not provided.

OSS arrangements depart from this traditional commercial model. In the world of OSS, the source code is often accessible to licensees via public repositories such as GitHub, and licensees are encouraged to enhance and modify it.

Open source software licences

OSS licences regulate the ways in which the licensee may use, modify or distribute the licensed software. Licensees are often allowed to freely use, modify and distribute the software, provided they comply with the licence conditions. These may include, for example, a requirement to share the source code of software developed from it (called 'derivative works') on the same open-source terms. OSS licences are typically divided into two key categories: 'copyleft' and 'permissive'.

'Copyleft' licences

The concept of copyleft was formulated to uphold the belief that freely accessible software serves as a public asset. Whereas conventional proprietary models use copyright law (such as the Copyright Act 1968 (Cth)) to restrict how software is used, copyleft licences use the same copyright framework to permit unrestricted use, modification and distribution of software or other works, on the condition that the source code of any modified version is made available to other developers under identical terms.

Licences vary in the degree of copyleft they impose (e.g. 'weak' copyleft and 'strong' copyleft licences). For instance, the GNU General Public Licences (such as the GNU General Public Licence v3.0) are considered 'strong' copyleft licences which, among other things, require that the source code of software developed using the relevant OSS be made available when that software is distributed.

'Permissive' licences

Permissive licences typically impose minimal restrictions on the modification or redistribution of OSS. Such licences often allow commercial use of OSS without royalties to other contributors and are designed to be compatible with proprietary commercial software licensing. Unlike copyleft licences, permissive licences do not require source code to be shared in derivative works. They are widely regarded as low-risk and are among the most common forms of OSS licence. For example, in contrast to the more onerous 'copyleft' GNU General Public Licences, the Apache 2.0 Licence does not require the distribution of source code developed using the relevant OSS. Instead, developers must, among other requirements, provide copies of the licence terms, retain the original copyright notice, include a statement of any significant changes made to the original code and include the relevant notice file with attribution notes (if applicable).
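For teams that review licences in practice, the categories above can be made concrete. The Python sketch below maps a handful of common SPDX licence identifiers to strong copyleft, weak copyleft or permissive. The mapping, the function name `classify_licence` and the placeholder identifier `Proprietary-X` are illustrative assumptions rather than an authoritative classification; real licence reviews cover far more licences and turn on the actual licence terms.

```python
from enum import Enum


class LicenceCategory(Enum):
    STRONG_COPYLEFT = "strong copyleft"
    WEAK_COPYLEFT = "weak copyleft"
    PERMISSIVE = "permissive"
    UNKNOWN = "unknown (review manually)"


# Illustrative, non-exhaustive mapping of SPDX licence identifiers to categories.
# This table is an assumption for demonstration purposes only.
_CATEGORIES = {
    "GPL-2.0-only": LicenceCategory.STRONG_COPYLEFT,
    "GPL-2.0-or-later": LicenceCategory.STRONG_COPYLEFT,
    "GPL-3.0-only": LicenceCategory.STRONG_COPYLEFT,
    "GPL-3.0-or-later": LicenceCategory.STRONG_COPYLEFT,
    "AGPL-3.0-only": LicenceCategory.STRONG_COPYLEFT,
    "AGPL-3.0-or-later": LicenceCategory.STRONG_COPYLEFT,
    "LGPL-2.1-only": LicenceCategory.WEAK_COPYLEFT,
    "LGPL-3.0-only": LicenceCategory.WEAK_COPYLEFT,
    "MPL-2.0": LicenceCategory.WEAK_COPYLEFT,
    "EPL-2.0": LicenceCategory.WEAK_COPYLEFT,
    "Apache-2.0": LicenceCategory.PERMISSIVE,
    "MIT": LicenceCategory.PERMISSIVE,
    "BSD-2-Clause": LicenceCategory.PERMISSIVE,
    "BSD-3-Clause": LicenceCategory.PERMISSIVE,
}


def classify_licence(spdx_id: str) -> LicenceCategory:
    """Return the category for an exact SPDX identifier, or UNKNOWN if unlisted."""
    return _CATEGORIES.get(spdx_id.strip(), LicenceCategory.UNKNOWN)


if __name__ == "__main__":
    # "Proprietary-X" is a hypothetical identifier used to show the fallback.
    for spdx_id in ["GPL-3.0-only", "Apache-2.0", "MPL-2.0", "Proprietary-X"]:
        print(f"{spdx_id}: {classify_licence(spdx_id).value}")
```

Unlisted identifiers default to UNKNOWN rather than permissive, so anything the table does not recognise is flagged for manual review instead of being silently treated as low-risk.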

The intersection between OSS and AI

AI coding tools developed by technology companies, including GitHub Copilot, Amazon CodeWhisperer, Tabnine and OpenAI Codex, are increasingly used in the development of software products.

These tools are essentially based on large language models (LLMs): they use deep learning algorithms and large neural networks, trained on vast troves of public source code (available within open source repositories), to produce full or partial lines of code. Sophisticated tools, such as Amazon CodeWhisperer, are trained on billions of lines of publicly available code and generate code in collaboration with the developer.

The benefit of these tools lies in their ability to empower non-developer users (such as product managers or designers) to code, while also improving the efficiency of developers by automating repetitive tasks and enabling them to create 'cleaner code' (e.g. code that requires less 'debugging'). However, to achieve these benefits, these tools indiscriminately analyse source code, including source code licensed under the more restrictive 'copyleft' OSS licences.

Key implications

Within Australia, there are no reported cases of OSS licensors enforcing their rights. By comparison, the United States has been an active ground for OSS-related lawsuits, such as the ongoing lawsuit against GitHub, OpenAI and Microsoft alleging that GitHub's and OpenAI's tools reproduced open-source code in breach of applicable OSS licensing requirements and copyright laws.

In addition to the risk of breaching their obligations under an OSS licence, organisations should be aware of the following legal implications.

Distribution of derivative works

In the context of OSS, a derivative work refers to new software that is based on, derived from or built upon existing OSS. The concept captures modifications, enhancements or adaptations of the original OSS. If an organisation has developed source code with the assistance of AI coding tools that draw on publicly available code licensed under a copyleft licence, the organisation may be required to make its newly developed source code publicly available as a 'derivative work' ('tainting' the entire proprietary code). This is a particularly significant issue for organisations in the business of commercialising software, as they may be obliged to publicly distribute their confidential proprietary source code (and potentially lose a source of revenue).

Security risks

Open-source libraries (a key 'feeding ground' for AI coding tools) are prime targets for malicious actors because they are widely used and may contain vulnerabilities that go unnoticed. Malicious actors may exploit weaknesses in these public libraries to inject malicious code, compromise systems or gain unauthorised access to sensitive data, which underscores the importance of proactive security measures in open-source development. During the development of proprietary code, AI coding tools may introduce compromised OSS, potentially leaving an organisation exposed to the consequences of malicious activity, such as data breaches or other material disruptions to its business. These risks are amplified from a supply chain perspective, where organisations integrate or adopt third-party software products that incorporate open-source components.

Broad exclusions of liability

OSS licences typically include extensive exclusions of liability for losses suffered by a licensee arising from use of the OSS. These wide-ranging exclusions pose risks to an organisation, particularly when OSS is integrated into critical business systems. If the availability or integrity of an organisation's systems is compromised by a security incident, or allegations of intellectual property infringement arise, in connection with OSS introduced by an AI coding tool, the organisation would likely have no recourse against the owner of the flawed or infringing code.

Conclusion

The adoption and use of emerging technologies in software requires a careful approach that balances maximising efficiencies against navigating risks (particularly legal risks). With the prevalence and rapid adoption of AI coding tools, organisations must adapt to sustain their competitive edge and enhance their internal technical capabilities, while also exercising caution to mitigate the potential risks associated with the use of OSS. Organisations that have developed proprietary code should undertake adequate due diligence of their code for applicable OSS licensing requirements and potential security vulnerabilities. Organisations seeking to access third-party proprietary code (whether through licence or assignment) should seek assurances from the third-party vendor regarding OSS licensing terms, intellectual property rights and any other potential liabilities associated with security vulnerabilities.
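As one possible starting point for that due diligence, a development team might first inventory the licence declarations already present in a codebase. The Python sketch below assumes that files declare their licence using SPDX `SPDX-License-Identifier` tags (a convention many projects follow, but by no means all) and flags any file whose declared licence belongs to an assumed, non-exhaustive list of copyleft families. The function names `licence_expression`, `is_copyleft` and `scan_tree` are hypothetical. It cannot detect AI-generated snippets copied from OSS without a licence header, which is precisely the scenario described above; dedicated software composition analysis tools are better suited to that.

```python
import re
import sys
from pathlib import Path
from typing import Iterator, Optional, Tuple

# Matches an SPDX short-form tag, e.g. "// SPDX-License-Identifier: GPL-3.0-only".
SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*(.+)")

# Assumed, non-exhaustive prefixes for copyleft licence identifiers
# (GPL, LGPL and AGPL families, plus MPL and EPL as weak copyleft).
COPYLEFT_PREFIXES = ("GPL-", "LGPL-", "AGPL-", "MPL-", "EPL-")

# Assumed set of source file extensions to inspect.
SOURCE_SUFFIXES = (".py", ".js", ".ts", ".c", ".h", ".java", ".go", ".rs")


def licence_expression(text: str) -> Optional[str]:
    """Return the first SPDX licence expression found in text, if any."""
    match = SPDX_TAG.search(text)
    if match is None:
        return None
    # Drop trailing comment closers such as "*/" that follow the expression.
    return match.group(1).strip().rstrip("*/").strip()


def is_copyleft(expression: str) -> bool:
    """Conservatively flag an expression if any licence in it is copyleft.

    Operators (AND, OR, WITH) are ignored, so "MIT OR GPL-2.0-only" is
    flagged even though the licensee may be able to choose MIT.
    """
    tokens = expression.replace("(", " ").replace(")", " ").split()
    return any(token.startswith(COPYLEFT_PREFIXES) for token in tokens)


def scan_tree(root: Path) -> Iterator[Tuple[Path, str, bool]]:
    """Yield (path, expression, flagged) for each source file with an SPDX tag."""
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in SOURCE_SUFFIXES:
            text = path.read_text(encoding="utf-8", errors="ignore")
            expression = licence_expression(text)
            if expression is not None:
                yield path, expression, is_copyleft(expression)


if __name__ == "__main__":
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for path, expression, flagged in scan_tree(root):
        marker = "REVIEW" if flagged else "ok"
        print(f"[{marker}] {path}: {expression}")
```

Treating every copyleft identifier in an expression as a flag, even where an `OR` offers a permissive alternative, errs on the side of sending more files to a human reviewer.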

The team at MinterEllison can assist you in understanding the legal issues and risks associated with the use of AI in software development. If your organisation is planning to use or interact with AI and you need more detailed advice, contact us.

Contact

Kylie Diwell

Partner, Education Industry Leader, Melbourne
- +61 3 8608 2019
- +61 411 163 613
