Toronto-based Private AI, a developer of data privacy software offerings, yesterday launched PrivateGPT, an artificial intelligence (AI)-powered tool it said can help organizations “safely leverage OpenAI’s chatbot without compromising customer or employee privacy.”
According to a release issued by the company, PrivateGPT “redacts 50+ types of Personally Identifiable Information (PII) from user prompts before sending it through to ChatGPT – and then re-populates the PII within the answer for a seamless and secure user experience.
“Entities can be toggled on or off to provide ChatGPT with the context it needs to successfully answer the query, or privacy mode can be turned off entirely if no sensitive information needs to be filtered out.”
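The redact, send and re-populate flow described in the release can be illustrated with a minimal sketch. This is not Private AI's implementation: the two regular-expression patterns, the placeholder format and the function names (`redact`, `reidentify`, `private_chat`) are all assumptions made for illustration, and the `send` argument stands in for whatever call would reach the chatbot. A production system would detect far more entity types and would not rely on regular expressions alone.

```python
import re
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical patterns for two entity types, for illustration only.
PATTERNS: Dict[str, re.Pattern] = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def redact(prompt: str, enabled: Iterable[str]) -> Tuple[str, Dict[str, str]]:
    """Swap each enabled entity for a numbered placeholder; return text and mapping."""
    mapping: Dict[str, str] = {}
    for label in enabled:
        def substitute(match: re.Match, label: str = label) -> str:
            placeholder = f"[{label}_{len(mapping) + 1}]"
            mapping[placeholder] = match.group(0)
            return placeholder
        prompt = PATTERNS[label].sub(substitute, prompt)
    return prompt, mapping


def reidentify(text: str, mapping: Dict[str, str]) -> str:
    """Put the original values back wherever a placeholder appears in the answer."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text


def private_chat(
    prompt: str,
    send: Callable[[str], str],
    enabled: Iterable[str] = ("EMAIL", "PHONE"),
) -> str:
    """Redact the prompt, pass it to `send` (a stand-in for the chatbot), restore PII."""
    redacted, mapping = redact(prompt, enabled)
    answer = send(redacted)  # only the redacted prompt leaves the organization
    return reidentify(answer, mapping)
```

Passing a shorter `enabled` list mimics toggling an entity type off so that it reaches the model unredacted, and passing an empty list corresponds to turning privacy mode off entirely.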
Patricia Thaine, co-founder and chief executive officer (CEO) of the company, which was launched in 2019 by privacy and machine learning experts from the University of Toronto, said that large language models (LLMs) are not exempt from data protection laws such as Canada’s proposed Consumer Privacy Protection Act (CPPA), Europe’s General Data Protection Regulation (GDPR) and others.
“The GDPR, for example, requires companies to get consent for all uses of their users’ personal data and also comply with requests to be forgotten,” said Thaine. “By sharing personal information with third-party organizations, they lose control over how that data is stored and used, putting themselves at serious risk of compliance violations.”
An example of what can go wrong took place on March 20 of this year, when a major privacy breach occurred at OpenAI and the company was forced to take ChatGPT temporarily offline due to what it described as a “bug” in an open source library.
In a statement issued four days later, the company stated that “upon deeper investigation, we also discovered that the same bug may have caused the unintentional visibility of payment-related information of 1.2 per cent of the ChatGPT Plus subscribers who were active during a specific nine-hour window.
“In the hours before we took ChatGPT offline on Monday, it was possible for some users to see another active user’s first and last name, email address, payment address, credit card type and the last four digits (only) of a credit card number, and credit card expiration date. Full credit card numbers were not exposed at any time.”
In an interview with IT World Canada, Thaine was asked whether, in light of that breach, many more compliance violations are likely in the near future.
Compliance with the GDPR, she said, “means that companies, such as the ones using OpenAI, must get positive consent from individuals whose personal data is being used, for the specific purposes of use, and also requires all locations of said data to be kept track of. The purpose of tracking the data is to be able to comply with key aspects of the regulation, such as the right to be forgotten.
“It also mandates data minimization, so companies should only use the personal data they need to use.”
Transferring personal data to a third party, added Thaine, “is risky business, so one should only do so when it’s absolutely necessary.
“In addition, LLMs trained on personal data can, in a sense, memorize that information and spew that personal data out in production. While the latter is solved by OpenAI promising not to train on your data if you tell them not to, even if that data is not being used to train the model, if it is being stored for a period of time, it is at risk of a data breach. The best practice is always to limit risk and exposure by minimizing the amount of personal data shared.”
The company, which is backed by M12, Microsoft’s venture fund, and BDC, started development of PrivateGPT in mid-March of this year, said Thaine, “after it became clear that the reaction of most organizations was just to block ChatGPT. We believe there is a better approach where companies can benefit from LLMs while preserving the privacy of their stakeholders.”
While a free demo is available, the company declined to disclose what a subscription to the service will cost or when it might be available. “We aren’t prepared to discuss pricing at the moment, but we are actively working with a number of large corporations in North America and Europe on commercial deals,” said Thaine.
Asked if the company intends to release additional versions compatible with other LLMs, she said that will happen. “We plan to extend the product to cover more than just ChatGPT, and to cover other OpenAI services as well as additional organizations’ products like Cohere, Anthropic, and open source LLMs like Dolly from Databricks as an example. We believe this will become a gold standard reference architecture for how to safely leverage third party AI services for internal corporate use.”