[Purpose] At the earliest stages of the AI lifecycle, the training, verification and validation of machine learning and deep learning algorithms require vast datasets that usually contain personal data. Such data are typically not obtained directly from the data subject, and the controller is very often not in a position to identify the data subjects, or such identification would involve a disproportionate effort. This situation raises the question of how the controller can comply with its obligation to inform data subjects about the processing, especially when providing the information notice is impossible or requires a disproportionate effort. There is little to no guidance on the matter. The purpose of this paper is to address this gap by designing a clear risk-assessment methodology that controllers can follow when providing information to the data subjects is impossible or requires a disproportionate effort.
[Methodology] Following a doctrinal analysis, we examine the scope of the transparency principle, Article 14 of the GDPR and its proportionality exemption in the training and verification stage of machine learning and deep learning algorithms. We then assess whether existing tools and methodologies can be adapted to accommodate the GDPR requirement of carrying out a balancing test, in conjunction with or independently of a DPIA.
[Findings] Based on an interdisciplinary analysis that draws on theoretical and descriptive material from both a legal and a technological point of view, we propose a risk-assessment methodology as well as a series of risk-mitigating measures to ensure the protection of the data subjects' rights and legitimate interests while fostering the uptake of the technology.
[Practical Implications] The proposed balancing exercise and additional measures are designed to assist entities that train or develop AI, especially SMEs, within and outside the EEA, and that wish to ensure and demonstrate the data protection compliance of their AI-based solutions.