Can we keep our data safe in LLM (AI) models such as ChatGPT?
19 December 2024 / AI
Large Language Models (LLMs), such as ChatGPT, are increasingly being used in many areas of life, from education to business to entertainment. While these systems offer powerful tools to generate text, solve problems or analyse data, it is important to understand how to protect your data when using such models.
What is an LLM?
An LLM, or Large Language Model, is an advanced type of artificial intelligence that uses deep learning methods and the processing of huge data sets to understand, create, summarise and predict content. An LLM does not only process text; it can also generate new content that sounds natural and logical. Although ‘language’ appears in the name, LLMs are not just algorithms that analyse text: they are models that ‘learn’ from data in order to produce increasingly sophisticated responses.
Is an LLM different from generative AI?
The term ‘generative AI’ refers to artificial intelligence models that generate new content, including text, images or music. An LLM is a specific type of generative AI geared towards processing and creating textual content. These models are often used for chatbots, translation, summarisation or even creative writing. The widespread use of these technologies is making their role in our daily lives more and more prominent.
Data protection
Data protection has become one of the most important issues in the digital age. As more and more personal, financial and sensitive information is processed, it has become necessary to implement a range of technical measures that ensure security and privacy. Data security in the context of LLMs is a multidimensional issue that requires both an informed approach on the part of users and responsibility on the part of technology providers.
LLMs, such as ChatGPT or Gemini, are trained on huge data sets, which often come from publicly available sources. However, when interacting with users, they may collect additional information that is not always properly secured.
Case study – how LLMs can use private information
Carelessly sharing private and confidential data with LLM tools can lead to that data becoming public and causing harm to a person or company. Because such programmes are designed not only to deliver the desired content but also to ‘learn’ from the information they acquire when interacting with users, what we share with artificial intelligence matters. Imagine that a user of an LLM tool asked it to create a brief professional and psychological profile of him, including his private life, based on the conversations they had had so far (yes, this is a real case). He received the following response to his query:
- Mr X is interested in architectural history and works from the Renaissance and Baroque eras. He often visits European cities and regions where monuments connected with the old masters of pen and brush can be found.
- He enjoys going to concerts of niche bands, supports their work and maintains a good relationship with one of the band members.
- Professionally, he is a digital consultant, developing streaming platforms and websites, working with a wide variety of technologies: from the API of a well-known social network to tools for creating advanced websites.
- In his private life, he supports his family’s education, travels frequently in Europe and is interested in humanistic literature. On occasion, he considers psychological support to take care of his wellbeing.
Neutral information or a real threat?
The profile created by the LLM tool would appear to be neutral, as it does not mention names, cities or specific dates. Nevertheless, it gives a fairly complete picture of the person, one that the LLM tool now holds and that could reach others. All of this stems from earlier, careless sharing of details about one’s private life, such as city names, children’s dates of birth, friends’ names or place of work, without first checking the privacy rules.
How to use AI tools like ChatGPT or Gemini safely?
And this is where the topic of data security comes in. LLM tools like ChatGPT or Gemini can collect and process data. For this reason, you should disable the use of chat history for training in the programmes’ settings. Otherwise, all the tidbits about your life will end up in a big machine that absorbs everything like a sponge.
In ChatGPT, you can go into the privacy settings and disable the saving of chat history. Gemini offers a similar option. If you are using a solution under the Google banner, it is also worth checking your Google Activity Dashboard to make sure you are not sharing more information than you intend.
If you’re going to chat with an LLM about your life, passions or family problems, it’s better to think about anonymising your data and disabling the relevant options first. Although such a model has no bad intentions, certain information can, in the hands of the wrong people, become the pieces of a jigsaw puzzle from which your identity can be fully reconstructed.
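As a rough illustration of what anonymising a message before pasting it into a chat might look like, here is a minimal Python sketch. The patterns, the `redact` function and its `names` parameter are all hypothetical and chosen only for this example; real personal data takes far more forms than a few regular expressions can catch, so treat this as a starting point rather than a guarantee.

```python
import re

# Illustrative patterns only (assumptions for this sketch).
# DATE is listed before PHONE so that ISO dates are not mistaken for phone numbers.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def redact(text, names=()):
    """Replace e-mails, dates, phone numbers and listed names with placeholders."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    # Names cannot be found reliably by pattern, so the caller lists them.
    for index, name in enumerate(names, start=1):
        text = re.sub(re.escape(name), f"[PERSON_{index}]", text, flags=re.IGNORECASE)
    return text


message = (
    "My friend Anna Nowak (anna.nowak@example.com, +48 601 234 567) "
    "has a daughter born on 2015-03-14."
)
print(redact(message, names=["Anna Nowak"]))
```

The placeholders keep the message understandable for the model while removing the details that would let someone piece a profile together.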
Risks associated with the use of AI models. 3 key concerns
The use of AI models carries certain risks that users should be aware of in order to effectively protect their data and privacy.
- Breach of privacy
If a user enters sensitive information into the model, such as personal, financial or professional data, there is a possibility that this data could be stored or analysed by the model provider. This could lead to the unauthorised disclosure of sensitive information, which in turn could result in a variety of consequences for both the individual and the organisation.
- Cloud-based models as a potential target for hacking attacks
If a user’s data is stored on the provider’s servers, it can be intercepted by third parties. Such unauthorised access can lead to information leakage, which compromises data security and can result in data misuse. Therefore, it is important to choose AI providers that apply advanced data protection measures and regularly update their security systems. If you use AI models in a business environment, you should use dedicated tools with security guarantees.
- Unclear privacy policies
Some platforms may use user data to further train AI models, which may lead to unforeseen uses of this information. A lack of transparency in how data is collected, stored and used can result in users unknowingly sharing their data in a way that violates their privacy or goes against their expectations. It is therefore important to carefully review the privacy policies of AI service providers and choose those that provide clear and transparent data protection rules.
Being aware of these risks and taking appropriate precautions is key to ensuring the security of personal data when using AI technologies.
LLM models. What data should not be shared with them?
Users should consciously manage the permissions they grant to applications and services that use AI. It is important to carefully control what resources individual programmes have access to, such as location, contacts or personal data, and only grant such permissions when they are truly necessary. Users should never enter personal data such as PESEL numbers (Polish national identification numbers), credit card numbers or passwords into LLM tools.
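One way to act on this advice is to scan a prompt for such identifiers before it is sent. The Python sketch below is only an illustration under stated assumptions: the function names are hypothetical, the PESEL check uses the publicly documented checksum weights (1, 3, 7, 9, 1, 3, 7, 9, 1, 3), and card numbers are screened with the Luhn checksum. It cannot detect passwords or other free-form secrets.

```python
import re

PESEL_WEIGHTS = (1, 3, 7, 9, 1, 3, 7, 9, 1, 3)


def is_valid_pesel(digits):
    """Return True if the 11-digit string has a correct PESEL check digit."""
    if not re.fullmatch(r"[0-9]{11}", digits):
        return False
    total = sum(w * int(d) for w, d in zip(PESEL_WEIGHTS, digits))
    return (10 - total % 10) % 10 == int(digits[10])


def passes_luhn(digits):
    """Return True if the digit string satisfies the Luhn checksum."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:  # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_sensitive(text):
    """List the kinds of sensitive identifiers that appear to be in the text."""
    findings = []
    for match in re.finditer(r"\b[0-9]{11}\b", text):
        if is_valid_pesel(match.group()):
            findings.append("possible PESEL")
    # 13 to 19 digits, optionally separated by spaces or hyphens.
    for match in re.finditer(r"\b(?:[0-9][ -]?){13,19}\b", text):
        if passes_luhn(re.sub(r"[^0-9]", "", match.group())):
            findings.append("possible card number")
    return findings


# Both values are well-known test numbers, not real personal data.
print(find_sensitive("PESEL 44051401359, card 4111 1111 1111 1111"))
```

A non-empty result is a signal to stop and remove the data before the prompt leaves your device.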
Effective data security requires precise access controls that define who can use the systems and what operations are allowed on them. Well-designed authentication and access control mechanisms significantly increase the level of security.
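To make the idea of access control more concrete, the following Python sketch shows a deny-by-default check for who may do what with an internal AI tool. The roles, the permissions and the `is_allowed` function are hypothetical examples invented for this illustration, not a description of any particular product.

```python
from enum import Enum, auto


class Permission(Enum):
    READ_CHAT = auto()
    SEND_PROMPT = auto()
    UPLOAD_FILE = auto()
    EXPORT_HISTORY = auto()


# Hypothetical role-to-permission mapping for an internal AI assistant.
ROLE_PERMISSIONS = {
    "viewer": {Permission.READ_CHAT},
    "employee": {Permission.READ_CHAT, Permission.SEND_PROMPT},
    "admin": set(Permission),
}


def is_allowed(role, permission):
    """Deny by default: unknown roles receive no permissions at all."""
    return permission in ROLE_PERMISSIONS.get(role, set())


print(is_allowed("employee", Permission.SEND_PROMPT))  # True
print(is_allowed("employee", Permission.UPLOAD_FILE))  # False
print(is_allowed("guest", Permission.READ_CHAT))       # False
```

The design choice worth noting is the default: anything not explicitly granted is refused, so a forgotten role cannot leak data by accident.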
Regular software updates
Keeping software up to date is another important step in ensuring security. Updates often include security patches to protect users from new threats and cyber-attacks.
Users should also make use of privacy tools such as VPNs, password managers or browser extensions that block online tracking. Some providers offer special settings that allow users to use the model without saving interactions. Such solutions help to reduce the traces left on the network and protect data from unauthorised access.
The role of providers and regulation
In an era of rapid development of artificial intelligence (AI), the transparency of providers is becoming one of the most important foundations for building trust between technology developers and users. While many providers give assurances that data is only used to fulfil a specific query, there is a risk of it being stored or used to further train models.
Providers should be transparent about what data they collect, how they process it and what security measures they use. Transparency enforces accountability on the part of providers, reducing the risk of inappropriate data use or security gaps. Proactive cooperation with regulators and compliance with current legislation are key to building user trust. Regulations such as the GDPR in Europe (known in Poland as RODO) or the CCPA in California require providers to clearly communicate how data is processed and the purpose for which it is collected. Adopting international information security standards, such as ISO/IEC 27001, can help ensure an adequate level of protection.
Users want to be assured that their data is being processed in an ethical, compliant manner and that it will not be abused.
Users play a key role in protecting their data and should take conscious steps to enhance its security.
The future of security in AI
AI technology is constantly evolving, as are methods of data protection. Innovations in the field of differential privacy or federated machine learning promise to increase data security without compromising the functionality of AI models. New regulations, such as the EU AI Act, are emerging to increase transparency and user protection. Additionally, technologies are being developed that allow data to be processed locally without being sent to the cloud, minimising the risk of breaches.
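To give a feel for how differential privacy works, here is a minimal Python sketch of the Laplace mechanism applied to a simple count. The function names and the example data are assumptions made for illustration; production systems rely on audited libraries and careful privacy budgeting rather than a few lines like these.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(values, predicate, epsilon, rng=None):
    """Count matching items, then add noise calibrated to sensitivity 1.

    Adding or removing one person changes a count by at most 1, so the
    noise scale is 1 / epsilon. A smaller epsilon means more noise and
    stronger privacy.
    """
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for value in values if predicate(value))
    return true_count + laplace_noise(1 / epsilon, rng)


# Hypothetical data: ages of users who opted in to a survey.
ages = [23, 35, 41, 29, 52, 38, 47, 31]
print(private_count(ages, lambda age: age >= 40, epsilon=0.5))
```

Each run returns a slightly different answer close to the true count of 3, which is the point: the aggregate stays useful, while no single person's presence in the data can be inferred with confidence.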
Summary
Can we keep our data secure when using LLMs? Yes, but it requires the involvement of all parties: technology providers, regulators and users. Through education, appropriate technical practices and regulatory compliance, we can reap the benefits of AI while minimising the risks to our data.
Your data is valuable! Let us help you keep it safe so you can make informed use of AI technologies.
Authors:
- Mateusz Borkiewicz
- Wojciech Kostka
- Liliana Mucha
- Grzegorz Leśniewski
- Grzegorz Zajączkowski
- Urszula Szewczyk