AI and synthetic data: whose responsibility is it?
The use of artificial intelligence (AI) to generate synthetic data is becoming increasingly common in sectors such as healthcare, finance and cybersecurity. However, this technological progress raises critical questions about the legal liability associated with creating and using such data. In the event of a privacy violation, data bias or illegal use, who is responsible? Let's examine the issue from a legal point of view.

What is synthetic data and why is it important?
Synthetic data is information generated artificially by AI algorithms rather than collected directly from real sources. This method makes it possible to create realistic datasets that contain no personal information attributable to specific individuals, reducing the risk of privacy violations and making it easier to share and analyse data in compliance with regulations. Synthetic data is crucial for multiple applications. It allows software to be tested and developed without compromising user privacy, improving the security of computer systems. It is also used for predictive analysis in the healthcare and financial sectors, reducing the legal constraints tied to the use of real personal data. Another field of application is the training of machine learning models, where synthetic data offers diversified and scalable datasets, helping to improve the accuracy of the algorithms.
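To make the idea concrete, here is a minimal sketch of one simple way synthetic records can be generated: fit a statistical summary (mean and standard deviation) to each numeric column of a real dataset, then sample new rows from those distributions. The dataset, column names and Gaussian model are illustrative assumptions, not a description of any particular product; production generators use far more sophisticated models.

```python
import random
import statistics

# Hypothetical "real" dataset (illustrative values only).
real_data = [
    {"age": 34, "income": 42000},
    {"age": 51, "income": 58000},
    {"age": 29, "income": 39000},
    {"age": 45, "income": 61000},
    {"age": 38, "income": 47000},
]

def fit_column(records, key):
    """Estimate the mean and standard deviation of one numeric column."""
    values = [r[key] for r in records]
    return statistics.mean(values), statistics.stdev(values)

def generate_synthetic(records, n, seed=0):
    """Draw n synthetic rows from per-column Gaussians fitted to the real data.

    The output mimics the statistical shape of the input but contains no
    actual row copied from a real individual.
    """
    rng = random.Random(seed)
    params = {key: fit_column(records, key) for key in records[0]}
    synthetic = []
    for _ in range(n):
        row = {key: round(rng.gauss(mu, sigma)) for key, (mu, sigma) in params.items()}
        synthetic.append(row)
    return synthetic

synthetic = generate_synthetic(real_data, n=100)
print(len(synthetic), sorted(synthetic[0]))
```

Note that per-column sampling like this discards correlations between columns; the legal point stands regardless of the model chosen: the generated rows describe no real person, which is what eases privacy constraints.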
Synthetic data: whose responsibility is it?
The absence of explicit references to synthetic data in current legislation makes it difficult to assign responsibility in the event of legal problems. The fundamental questions that emerge are:
1. Is the AI generator responsible? If an AI system produces synthetic data that is distorted, discriminatory, or that leads to harmful decisions, does the maker of the algorithm bear the responsibility? The European Union's AI Act assigns specific obligations to providers of high-risk AI systems, which may include some synthetic data generators.
2. Is the user of the synthetic data responsible? If a company uses synthetic data to train predictive models and these produce discriminatory or incorrect results, could the company itself be held liable? The GDPR requires companies to demonstrate compliance in their use of data, including synthetic data.
3. Who guarantees compliance with the GDPR? Although synthetic data is not directly attributable to individuals, that does not mean it is automatically exempt from data protection legislation. Under the GDPR, if a synthetic record can be traced back to an individual, even indirectly, it must be handled with the same protections as personal data.
Best practices for responsible management of synthetic data
To reduce legal risks and ensure ethical use of synthetic data, AI companies and developers should follow some fundamental guidelines:
- Audits and risk assessments: verify that synthetic data carries no hidden bias and cannot be re-identified.
- Transparency and documentation: keep detailed records on how synthetic data is generated and used.
- Regulatory compliance: take measures to ensure that synthetic data complies with the guidelines of the GDPR and the AI Act.
- Contractual responsibility: enter into clear agreements between synthetic data providers and user companies to define the limits of liability.
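The first guideline above, auditing for re-identifiability and bias, can be sketched in code. The checks below are deliberately crude illustrations, not a compliance tool: one counts how many synthetic records exactly match a real record (a rough proxy for re-identification risk), the other compares the share of a group between the real and synthetic datasets (a rough proxy for bias). The sample records are invented for the example.

```python
def exact_match_rate(synthetic, real):
    """Fraction of synthetic records identical to some real record.

    A high value is a crude red flag for re-identification risk:
    the generator may be memorising real individuals.
    """
    real_set = {tuple(sorted(r.items())) for r in real}
    hits = sum(1 for s in synthetic if tuple(sorted(s.items())) in real_set)
    return hits / len(synthetic)

def group_share(records, key, value):
    """Share of records where records[key] == value, for simple bias checks."""
    return sum(1 for r in records if r[key] == value) / len(records)

# Invented example data: a generator that copied real rows and under-represents one group.
real = [{"gender": "F", "approved": 1}, {"gender": "M", "approved": 1},
        {"gender": "F", "approved": 0}, {"gender": "M", "approved": 1}]
synthetic = [{"gender": "M", "approved": 1}] * 9 + [{"gender": "F", "approved": 0}]

print(exact_match_rate(synthetic, real))       # near 1.0 here: privacy red flag
print(group_share(real, "gender", "F"),
      group_share(synthetic, "gender", "F"))   # diverging shares: bias red flag
```

Documenting the results of checks like these, as part of the transparency records mentioned above, is also what puts a company in a position to demonstrate GDPR compliance later.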
Conclusion
Synthetic data represents an extraordinary opportunity to innovate and protect privacy, but without clear regulation it can create significant legal risks.
Defining precise responsibilities between developers, user companies and regulators is essential to ensure a fair and compliant use of AI-based technologies.
CRCLEX supports companies and professionals in the legal management of synthetic data, helping them navigate a constantly evolving regulatory landscape. Do you need advice on the legal management of synthetic data?
Contact CRCLEX for a personalized evaluation.