Preventing disclosure of confidential data through ChatGPT

In April 2023, reports emerged that engineers at Samsung Electronics had accidentally leaked internal source code by uploading it to ChatGPT, presumably as part of their input prompts to the large language model (LLM). In response, Samsung took swift action, banning employees from using popular generative artificial intelligence (GenAI) tools like ChatGPT. The company also urged employees who used ChatGPT and similar tools on personal devices to refrain from submitting any company-related information or personal data that could reveal its intellectual property.

Furthermore, a research report released in June 2023, titled “Revealing the True GenAI Data Exposure Risk” by LayerX Security, highlighted a troubling trend: 6% of employees had pasted sensitive data, including source code, internal business information and personally identifiable information, into GenAI tools. Such behaviour could result in organisations unknowingly sharing their plans, product details and customer data with competitors and potential attackers.

While this research focused primarily on private sector employees, I was concerned that civil servants and government contractors could inadvertently share sensitive official secrets with GenAI tools.

In light of these concerns, I raised several questions in Parliament regarding the government’s use of LLMs owned by private or foreign companies:

a) How does the government ensure that confidential data is not disclosed in the input prompts for LLMs?

b) Has the government signed any non-disclosure agreements (NDAs) with these companies?

c) Which companies has the government signed NDAs with?

d) How does the government monitor these companies’ compliance with such NDAs?

In response, Mrs Josephine Teo, the Minister for Communications and Information, explained the government’s approach. She assured Parliament that highly sensitive applications and data are not exposed to the Internet. For use cases involving sensitive data, open-source models may be customised but must be deployed strictly on government servers and computers.

For less sensitive use cases, AI models may be owned and managed by commercial and private companies. The government’s contracts with these companies include clauses on data handling and security, such as the non-retention of data and restrictions on the use of data to train other products or models. She did not reveal which companies the government has signed NDAs with, but said the government has implemented technical measures to screen sensitive data, visual cues to remind users of data security practices, and governance measures to enforce compliance. The Minister emphasised the government’s commitment to continuously reassess the adequacy of these measures as technology evolves.
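The Minister did not elaborate on what these technical measures to screen sensitive data involve. Purely as an illustration, and not a description of the government’s actual implementation, a minimal pre-submission screen might look like the following Python sketch; the patterns and the screen_prompt function are hypothetical examples:

```python
import re

# Illustrative only: hypothetical patterns for data that should never
# reach an external LLM. A real deployment would rely on data
# classification labels and far more robust detection than regexes.
SENSITIVE_PATTERNS = [
    re.compile(r"\b[STFGM]\d{7}[A-Z]\b"),  # Singapore NRIC/FIN-style IDs
    re.compile(r"\b(?:CONFIDENTIAL|RESTRICTED|SECRET)\b", re.IGNORECASE),
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),  # email addresses
]

def screen_prompt(prompt: str) -> str:
    """Block the prompt if it appears to contain sensitive data."""
    for pattern in SENSITIVE_PATTERNS:
        if pattern.search(prompt):
            raise ValueError(
                f"Prompt blocked: matched sensitive pattern {pattern.pattern!r}"
            )
    return prompt

# Example: this prompt would be blocked before reaching a commercial LLM.
# screen_prompt("Summarise this CONFIDENTIAL memo for S1234567D")
```

A filter like this would sit between the user and the commercial model, rejecting prompts before any data leaves government systems; visual cues and governance measures would then address what pattern matching alone cannot catch.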

Here are the original questions and answers, raised in Parliament on 9 January 2024:

REGULATIONS ON INPUT PROMPTS FOR LARGE LANGUAGE MODELS TO PREVENT DISCLOSURE OF CONFIDENTIAL DATA

Dr Tan Wu Meng asked the Minister for Communications and Information whether the Government has plans to develop in-house artificial intelligence capabilities to ensure that input prompts for large language models need not be processed by private firms not under the purview of the Government, or by cloud computing units located in foreign territories or under foreign jurisdiction or control.

Mr Gerald Giam Yean Song asked the Minister for Communications and Information (a) when using large language models owned by private or foreign companies, how does the Government ensure that confidential data is not disclosed in the input prompts; (b) whether the Government has signed any non-disclosure agreements (NDAs) with these companies; (c) what are the companies that the Government has signed NDAs with; and (d) how does the Government monitor compliance with such NDAs by these companies.

Mrs Josephine Teo: Large language models (LLMs), such as those powering ChatGPT, have the potential to enhance the delivery of public services and the productivity of public officers. We adopt a risk-managed approach for LLMs, consistent with the existing public sector framework for the handling of classified information when using technologies such as Internet-based applications and the commercial cloud.

Highly sensitive applications and data are not exposed to the Internet. Where use cases involve sensitive data, open-source models may be finetuned for use but must be deployed on Government servers and computers.

For use cases involving less sensitive data, the artificial intelligence (AI) models may be owned and managed by commercial and private companies. Our contracts with these companies are governed by service agreements which include clauses on data handling and security, such as the non-retention of data, and limitations on the use of data to train other products or models. Beyond contractual safeguards, the Government has also implemented technical measures to screen sensitive data, visual cues to remind users on data security practices, and governance measures to enforce compliance.

We continuously re-assess the adequacy of our measures as the technology evolves.

Source: Singapore Parliament Reports (Hansard)

#ChatGPT #AI #Parliament #WorkersParty #MakingYourVoteCount