Researchers at the University of Technology Sydney (UTS) say they have developed a fully homomorphic encryption (FHE)-enabled deep reinforcement learning (DRL) system that can train and make decisions while data remains encrypted.
The work has been published in Nature Machine Intelligence, with UTS stating the framework is intended to allow AI systems to operate in complex environments without exposing underlying sensitive information.
The project was led by Associate Professor Hoang Dinh, Associate Professor Diep N. Nguyen and PhD student Hieu Nguyen through the Australia Vietnam Strategic Technologies Centre, and involved collaboration with Dr Kristin Lauter (Director of Research Science, North American Labs, Meta AI Research) and Associate Professor Miran Kim (Hanyang University, Korea).
Reinforcement learning is used in applications including robotics and autonomous systems, and is often described as an enabling approach for decision-making in areas that overlap with generative AI workflows. However, training typically requires access to real datasets, raising privacy and confidentiality concerns when data is processed on external platforms.
UTS said its main technical contribution was an “HE-compatible Adam optimiser”, aimed at addressing a common barrier for training models on encrypted data. The university noted that modern AI training relies on non-linear operations such as inverse square roots, which are difficult to compute under homomorphic encryption constraints.
According to the researchers, evaluating non-linear functions such as the inverse square root under homomorphic encryption normally requires high-degree polynomial approximations, which are computationally expensive on encrypted data. The team redesigned the learning algorithm so the DRL agent avoids these approximations, with the aim of improving training efficiency.
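To see where the constraint bites, here is a minimal sketch of a standard plaintext Adam update (this is the textbook optimiser, not the authors' HE-compatible variant, whose details are not given here). The final line contains the inverse square root that homomorphic encryption, which natively supports only additions and multiplications, cannot evaluate directly:

```python
import numpy as np

def adam_step(theta, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One standard (plaintext) Adam update for parameters theta and gradient g.

    The 1 / (sqrt(v_hat) + eps) term is the non-linear operation that is hard
    to compute under homomorphic encryption, which is why it would normally be
    replaced by a polynomial approximation on encrypted data.
    """
    m = b1 * m + (1 - b1) * g            # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * g * g        # second-moment (uncentred variance) estimate
    m_hat = m / (1 - b1 ** t)            # bias correction
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)  # inverse square root: HE-hostile
    return theta, m, v
```

In the published framework this step would have to run on ciphertexts; the sketch only marks which operation forces the approximation the UTS team's redesign avoids.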
UTS reported early results showing the encrypted DRL model achieved accuracy within 10% of standard, unencrypted techniques, while keeping data encrypted throughout the process.
Under the proposed approach, a user encrypts system information before sending it to an external AI agent. The agent produces decisions in encrypted form, which are then decrypted and applied locally by the user, with the process repeating until the system reaches a target performance level.
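The interaction pattern described above can be illustrated with a toy sketch. This is not the authors' protocol: the "encryption" below is an insecure scalar mask, chosen only because it commutes with the linear policy the toy agent applies (a real system would use a proper FHE scheme such as CKKS), and all function names are placeholders:

```python
import numpy as np

SECRET = 7.0  # toy masking key, held only by the user; stands in for a real FHE secret key

def encrypt(x):
    """Placeholder 'encryption': scalar masking, homomorphic for linear operations."""
    return SECRET * x

def decrypt(c):
    return c / SECRET

def agent_decide(ct_state):
    """Remote agent: sees only the 'ciphertext' and applies a linear policy.

    Linear maps commute with the scalar mask, so the result decrypts correctly.
    """
    W = 0.5 * np.eye(2)  # toy policy matrix
    return W @ ct_state

def run_loop(state, target=1e-3, max_rounds=50):
    """User-side loop: encrypt state, get an encrypted decision, decrypt and
    apply it locally, and repeat until the system reaches the target."""
    for _ in range(max_rounds):
        ct_action = agent_decide(encrypt(state))
        action = decrypt(ct_action)   # decryption happens only on the user's side
        state = state - action        # the decision is applied to the local system
        if np.linalg.norm(state) < target:
            break
    return state
```

The point of the sketch is the division of labour: the agent never sees plaintext state or decisions, and only the user, who holds the key, decrypts and acts.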
Associate Professor Diep Nguyen said the approach “could be a cornerstone” for future AI systems, including generative AI. Associate Professor Hoang Dinh also framed privacy-preserving learning as both a technical and ethical issue as AI becomes more embedded in everyday use.

