Machine Unlearning in Large Language Models (LLMs): Opportunities and Challenges
OKOYE, OBINNA KENNETH
2024/2025
Abstract
Background on Machine Unlearning (MU)
The astonishing success of AI-generated content has led to a resurgence in the popularity of machine learning (ML) technologies. However, the performance of ML models relies heavily on large volumes of data collected from massive numbers of distributed clients or customers. At the same time, given the potential for misuse, legislators worldwide have introduced laws and regulations that mandate user data deletion upon request: the European General Data Protection Regulation [1], the California Consumer Privacy Act [2], and Canada's proposed Consumer Privacy Protection Act [3] stipulate that ML service providers are obligated to guarantee clients the "right to be forgotten" [4], allowing them, for instance at the end of a contract, to remove the effects of their data from well-trained models. Machine unlearning is the process of making an ML model "forget" information it has previously learned; in a sense, it is the opposite of machine learning. Instead of teaching the model to find patterns and make predictions, machine unlearning removes patterns or predictions that are no longer needed or correct. Naturally, laws and regulations alone will not stop irresponsible misuse of machine learning or machine unlearning algorithms. The goal of unlearning is to obtain a model identical to the one that would result from retraining on the dataset after removing the points that need to be forgotten.
In this paper we discuss the different types of large language models and their applications across various fields; the approaches and methods that guarantee removal of users' information from large language models upon request; the motivations for implementing digital forgetting and its types; the needs, opportunities, and solutions adopted by multiple researchers to correctly implement unlearning methods in large language models; the variety of solutions adopted; and, finally, a survey of machine unlearning in large language models.
The text of this website © Università degli Studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.
https://hdl.handle.net/20.500.12608/87138