1. Background
The Office of the Australian Information Commissioner (OAIC) published its long-awaited AI guidance on 21 October 2024, in two parts:
- Part 1: A guide for deployers of commercially available AI products, which we summarised in our earlier article; and
- Part 2: A guide for AI developers involved in the collection and use of personal information to train generative AI systems (Developer Guidance).
In this article we highlight the key takeaways from the second publication, the Developer Guidance.
The Developer Guidance adopts the definitions of 'AI system' and 'generative AI' given in the Voluntary AI Safety Standard (Standard), and refers to 'AI model' as defined in the Safe and responsible AI in Australia: Proposals Paper for introducing mandatory guardrails for AI in high-risk settings (Proposed Mandatory Guardrails). The OAIC has confirmed that adherence to the Standard will help APP Entities develop and deploy AI systems in compliance with their privacy obligations. For further information about the Standard and the Proposed Mandatory Guardrails, see our separate article, Responsible use of AI: New Australian guardrails released.
The Developer Guidance encourages AI developers to consider privacy risks and mitigation strategies at every stage of the AI lifecycle, from design and training to building and testing. Because of the large volumes of data usually required to train and fine-tune an AI model, the OAIC has classified this as a 'high privacy risk activity' (Developer Guidance, Introduction). The OAIC has indicated that it expects developers to properly assess the privacy risks involved in the development of a particular tool, after considering the purpose of the tool, the context in which it will operate, and the information on which it is trained. Developers will be expected to take commensurate measures to mitigate identified risks.
2. Who is a developer?
The Developer Guidance adopts the definition of 'developer' from the Proposed Mandatory Guardrails. A 'developer' is any organisation that 'designs, builds, trains, adapts or combines AI models and applications'. Notably, this definition includes APP Entities that use personal information to fine-tune or modify a commercially available AI system. For example, if a hospital purchased a commercially available AI-based diagnostic tool designed to enhance imaging analysis, then fine-tuned it using its own patient imaging data to improve the system's accuracy in detecting region-specific diseases, both the commercial developer and the hospital would qualify as 'developers' and be expected to follow the Developer Guidance.
This means that any organisation involved in fine-tuning AI should be aware of the Developer Guidance.
3. Developing and training generative AI: The key takeaways
- Developers must take reasonable steps to ensure generative AI models are accurate. The higher the risk posed in the particular AI context, the more extensive the steps required to ensure accuracy. Developers are obliged to use high-quality datasets and undertake appropriate testing, and should use disclaimers to (1) identify particular risks; and (2) signpost the limits of any AI output.
- Any use of publicly available data containing personal information must comply with privacy laws. Not all publicly accessible data can be used to train generative AI. Developers may need to take steps such as de-identifying personal information or minimising the amount of personal information collected, and consider whether they have the appropriate consents to collect and/or use publicly available data.
- Sensitive information must be handled with particular care. Developers generally require individual consent before collecting or using sensitive information. This includes images from which sensitive information can be inferred. Where developers are unable to obtain consent, and no exception under law applies, sensitive information must be deleted from the AI training data set.
- Developers who wish to use personal information they already hold to train AI should carefully consider whether they have the appropriate consent for this secondary use. If the developer does not have specific consent to use the personal information to train AI, the developer must be able to show that this secondary use was reasonably expected by the individual and that it is related (or, for sensitive information, directly related) to the primary purpose for which that personal information was collected, or that an exception applies. Where a developer can establish neither consent nor reasonable expectation, it must provide individuals with an informed opt-out procedure and sufficient time to exercise that option.
4. 'Privacy by design' and AI
The OAIC has urged developers to take a cautious approach to using personal information to develop or fine-tune AI, including by undertaking a comprehensive Privacy Impact Assessment (PIA) to identify the impact the AI system or model might have on individual privacy and to prepare recommendations for managing or minimising that impact. Developers are also encouraged to consider whether the proposed use of personal information by the AI system or model will be acceptable to the community. The Developer Guidance provides examples of AI-specific privacy risks that developers should consider when preparing PIAs, such as:
- misuse of the generative AI systems by malicious actors;
- re-identification from multiple data sets; and
- bias and discrimination.
The OAIC also acknowledges that the impacts of generative AI may be felt more severely by vulnerable groups, including children, so additional safeguards should be considered when handling the personal information of individuals from these groups.
5. Other points of note from the Developer Guidance
Collecting personal information to train AI
Ideally, personal information should be collected directly from the individual to whom it relates, unless it is unreasonable or impracticable to do so (APP 3.6). This general principle also applies to developers training AI tools.
APP 10 also requires developers to take reasonable steps to ensure the personal information they collect is accurate, up to date and complete. Exactly what steps are required will depend on the specific context, the intended purpose of the AI model, and how its outputs will be used. For instance, use of AI in a way that will impact individual rights is a high-risk use, so more stringent steps will be required. The OAIC provides the following as examples of reasonable steps a developer could take to ensure the accuracy and completeness of personal information:
- ensure training data is accurate, factual and up to date;
- understand and document the impact the accuracy of the training data has on AI outputs;
- tag content as AI generated; and
- include references to source material in AI output (a brief illustrative sketch of these last two steps follows this list).
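By way of illustration only, a minimal Python sketch of the last two steps (tagging content as AI generated and including source references) might look like the following. All names and structures here are hypothetical and are not drawn from the Developer Guidance.

```python
from dataclasses import dataclass, field

@dataclass
class GeneratedOutput:
    """Wraps model output with provenance metadata (illustrative only)."""
    text: str
    ai_generated: bool = True                          # explicit 'AI generated' tag
    sources: list[str] = field(default_factory=list)   # references to source material

def render(output: GeneratedOutput) -> str:
    """Render output with an AI-generated disclaimer and its source references."""
    lines = [output.text]
    if output.ai_generated:
        lines.append("[This content was generated by an AI system.]")
    if output.sources:
        lines.append("Sources: " + "; ".join(output.sources))
    return "\n".join(lines)

# Usage: tag a model response and cite the documents it drew on.
response = GeneratedOutput(
    text="Summary of the imaging protocol...",
    sources=["imaging-protocol-v3.pdf", "privacy-notice-2024.pdf"],
)
print(render(response))
```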
Developers should only collect personal information that is reasonably necessary for their functions or activities (APP 3.2 and 3.8). In addition, data should be collected by lawful and fair means (APP 3.5).
Developers should carefully consider these obligations, and the APPs more broadly, before using scraped data or purchasing personal information from a third party to train an AI model. The OAIC has explicitly noted that 'developers should not assume information posted publicly can be used to train models' (see APP 3.5). We recommend that advice be sought before scraped data or third-party data sets are used to train AI.
Training AI with information you already hold – convenient or complex?
Developers are increasingly seeking to leverage datasets they already hold to train new AI models and systems. However, the OAIC warns developers to carefully consider their privacy obligations before doing so.
- Developers should assess whether their intended training dataset contains personal information and, if so, identify the types of personal information involved. This includes considering all elements, such as metadata and annotations, and whether the dataset includes sensitive information. Developers should recognise that information which may not constitute personal information when viewed in isolation may become personal information when combined with other data. This risk is heightened in the AI era given the increased likelihood of data cross-matching.
- Developers may only use personal information to train AI if:
- training the AI tool is the primary purpose for which the personal information was collected; or
- the individual to whom the information relates has given consent; or
- the individual would reasonably expect their personal information to be used for this secondary purpose, and the secondary purpose is related (or, in the case of sensitive information, directly related) to the primary purpose of collection.
The OAIC has highlighted that, in many cases, it will be difficult to establish that training AI is a reasonably expected secondary purpose for personal information a developer previously collected for a non-AI purpose. Simply providing a notice, or updating a privacy policy to state that the organisation may use previously collected personal information to train AI, will generally not be sufficient to change an individual's 'reasonable expectations'. In relation to secondary uses of sensitive information, the OAIC defines 'directly related' as 'closely associated with the primary purpose' for which the information was collected.
The OAIC flags that this threshold may be particularly difficult to meet where the generative AI model being trained is intended to be commercialised outside of a service, rather than to enhance the service provided. In such cases, the OAIC recommends that developers seek new consent from individuals and provide an easily accessible, clear explanation of how, and what kinds of, personal information will be used to train AI. Vague descriptions such as 'use for research purposes' will not be sufficient.
Developers must also provide individuals with a meaningful opportunity to opt out of such use, which means allowing sufficient time for individuals to exercise this right. Where personal information cannot be de-identified or removed from training data sets, the OAIC recommends that fresh consent be sought from individuals.
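In practice, screening training data for identifiers is often the first step toward de-identification. The following Python sketch is a minimal, first-pass illustration under our own assumptions (the patterns and names are hypothetical, not the OAIC's); production de-identification would also require named-entity recognition, human review and re-identification risk testing.

```python
import re

# Illustrative first-pass patterns only; real datasets contain many more
# identifier types than these two.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "au_phone": re.compile(r"\b(?:\+61|0)[23478]\d{8}\b"),
}

def redact(record: str) -> tuple[str, bool]:
    """Replace matched identifiers with placeholders and report whether
    any were found (so the record can be flagged for review)."""
    found = False
    for label, pattern in PII_PATTERNS.items():
        record, n = pattern.subn(f"[{label.upper()} REDACTED]", record)
        found = found or n > 0
    return record, found

# Usage: screen candidate training records before they enter the dataset.
records = ["Contact Jane on 0412345678 or jane@example.com for follow-up."]
for r in records:
    cleaned, had_pii = redact(r)
    print(had_pii, cleaned)
```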
Sensitive information warrants special consideration
Sensitive information (which can include photographs or videos from which sensitive information can be inferred) is a subset of personal information and is afforded a higher level of privacy protection. Under the APPs, developers require valid express or implied consent to collect sensitive information unless an exception applies. Where sensitive information is inadvertently collected and the developer is unable to obtain or establish consent for its use, the developer will need to destroy or delete that information from the training dataset.
Ensuring the accuracy of AI output
Even incorrect AI outputs that identify individuals will qualify as personal information, and APP 10 requires organisations to take reasonable steps to ensure the quality of that information. Generative AI models carry inherent risks of inaccuracy due to their probabilistic nature and potential to generate hallucinations. A failure to correct inaccuracies in AI output may also mean errors compound over time, resulting in an overall deterioration in the quality of that output. Developers should also be cognisant of an AI model's limitations: the further removed the input problem is from the model's training data, the greater the risk of inaccurate output. For instance, a general-use generative AI model not intended for healthcare purposes may be more prone to incorrect output when deployed in therapeutic settings.
6. Actions for developers
The OAIC has included a checklist of privacy considerations when developing or training an AI model within the Developer Guidance. The checklist considers the main points outlined above as well as other key privacy obligations a developer may have.
The OAIC recommends that developers conduct their own due diligence and seek assurances from third parties from whom personal information is collected, including by:
- requiring contractual terms in commercial agreements confirming that the collection, use and disclosure of any personal information complies with privacy laws. This will give developers comfort that information disclosed to them can lawfully be used for AI-related purposes.
- requesting information about the source of personal information. This will assist developers to verify the quality of input data sets and evaluate the accuracy of AI output.
- requesting copies of information and notices provided to individuals regarding how their personal information would be handled. This enables developers to understand the scope of individuals' consent and reasonable expectations regarding the use of personal information.
- requiring third parties to comply with the APPs even if they are not usually covered by the Privacy Act. By complying with the APPs, developers can be sure the third parties they engage are following best practice when handling personal information.
7. MinterEllison comments
The practical challenges of the OAIC's Developer Guidance
The Developer Guidance raises practical compliance challenges for Australian organisations, particularly those developing generative AI applications on top of large language models (LLMs).
Most Australian organisations are not building generative AI from the ground up. Instead, they typically develop applications that integrate proprietary data with existing LLMs, using approaches such as Retrieval-Augmented Generation (RAG), fine-tuning open-source models (for instance, LLaMA or Mistral), or developing specialised solutions on these foundations (a simplified sketch of this layered pattern follows the questions below).
Developers can manage and control their implementation data and fine-tuning datasets, yet they have little visibility into, and no control over, the training data used in the base models. This raises several complex questions:
- How can Australian developers ensure privacy compliance when they cannot verify the provenance of the underlying training data?
- If an open-source model was initially trained on scraped data collected without consent, can an organisation that subsequently fine-tunes it using only compliant data satisfy regulatory requirements?
- While the guidelines recommend seeking 'assurances' regarding the origins of training data, such measures are often impractical for widely used open-source models.
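To make the layered pattern concrete, here is a minimal, hypothetical Python sketch of a RAG-style implementation (all class and function names are our own, not from any particular framework). It illustrates the asymmetry the questions above turn on: the developer can vet, de-identify and delete documents in the retrieval layer, but the base model's training data remains opaque.

```python
class OpaqueBaseModel:
    """Stand-in for a third-party LLM whose training data is not visible."""
    def complete(self, prompt: str) -> str:
        return f"<answer conditioned on: {prompt[:50]}...>"

class RetrievalStore:
    """Developer-controlled layer: documents here can be vetted,
    de-identified, and deleted on request, unlike model weights."""
    def __init__(self, documents: dict[str, str]):
        self.documents = documents

    def search(self, query: str) -> list[str]:
        # Naive keyword match stands in for vector search.
        words = query.lower().split()
        return [doc for doc in self.documents.values()
                if any(w in doc.lower() for w in words)]

    def delete(self, doc_id: str) -> None:
        # A deletion request only reaches this layer, not whatever
        # the base model memorised during its original training.
        self.documents.pop(doc_id, None)

def answer(model: OpaqueBaseModel, store: RetrievalStore, question: str) -> str:
    context = "\n".join(store.search(question))
    return model.complete(f"Context:\n{context}\n\nQuestion: {question}")

# Usage: proprietary data stays in the retrieval layer, not the model.
store = RetrievalStore({"policy-1": "Leave requests are lodged via the HR portal."})
print(answer(OpaqueBaseModel(), store, "How are leave requests lodged?"))
```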
The global nature of AI development adds further complexity. Australian organisations frequently rely on internationally developed models, datasets, and cloud infrastructure, all of which can operate across multiple jurisdictions. This creates practical hurdles for complying with Australian privacy obligations when working with technology shaped by different regulatory frameworks.
Additional challenges include:
- Implementing consent and opt-out mechanisms becomes increasingly complex when managing multiple data streams - training data, fine-tuning datasets, and retrieval systems often operate simultaneously and may contain overlapping personal information. Historical data poses a particular challenge, as retroactive consent may be impossible to obtain, while international datasets may operate under different consent frameworks.
- Building mechanisms to track and honour individual access, correction, and deletion rights across complex AI implementations is technically demanding, especially when personal data might exist in training sets, fine-tuning data, and retrieval systems at once (a minimal sketch of one such mechanism follows this list).
- AI models often require broad datasets to function effectively, which could cause tension with data minimisation requirements.
- Maintaining the audit trails needed to demonstrate compliance across the AI stack imposes a significant documentation burden, particularly where base models are black boxes.
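One way to begin addressing the first two challenges is a central registry recording which data streams hold an individual's personal information and what consent has been recorded. The following Python sketch is a minimal, hypothetical illustration (the names and structure are our own, not drawn from the guidance).

```python
from enum import Enum

class Stream(Enum):
    TRAINING = "training"
    FINE_TUNING = "fine_tuning"
    RETRIEVAL = "retrieval"

class ConsentRegistry:
    """Tracks, per individual, which data streams hold their personal
    information and whether consent (or an opt-out) has been recorded."""

    def __init__(self) -> None:
        self._records: dict[str, dict[Stream, bool]] = {}

    def record_presence(self, person_id: str, stream: Stream, consented: bool) -> None:
        self._records.setdefault(person_id, {})[stream] = consented

    def opt_out(self, person_id: str) -> list[Stream]:
        """Remove the individual's record and return every stream that
        must now delete (or de-identify) their data."""
        return list(self._records.pop(person_id, {}))

# Usage: one individual's data appears in two streams; an opt-out
# surfaces both so downstream deletion jobs can be triggered.
registry = ConsentRegistry()
registry.record_presence("cust-123", Stream.FINE_TUNING, consented=True)
registry.record_presence("cust-123", Stream.RETRIEVAL, consented=False)
print(registry.opt_out("cust-123"))  # [Stream.FINE_TUNING, Stream.RETRIEVAL]
```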
Ultimately, the OAIC has acknowledged that compliance measures should be proportionate to risk. Given the complexities of Australia's 'taker vs maker' AI ecosystem, perhaps the appropriate response is a risk-based one, focusing particularly on high-risk uses - such as AI systems making decisions with significant effects on individuals, or relying on scraped or third-party data sets - while implementing reasonable and practical privacy protections for lower-risk applications.
The emphasis on 'reasonable steps' suggests regulators understand the practical challenges of implementation, particularly for developers building on existing models. However, developers should document their privacy-protection efforts, risk assessments, and decision-making processes to demonstrate genuine commitment to privacy compliance, even where perfect compliance with all aspects of the guidance may be technically challenging.
Contact us to explore the most appropriate guardrails for your organisation in relation to the latest Developer Guidance.