Interactive LoRA Steering: A User-Guided Framework for Efficient and Interpretable Machine Unlearning in Neural Networks

AUTHOR(S):

Vaishnavee Sanam, Heramb Patil, Dr. Minakshi Atr

ABSTRACT

The need to selectively remove information from trained machine learning models, a process termed Machine Unlearning, is critical for maintaining data privacy and model relevance. While exact unlearning via retraining is often prohibitively expensive, especially for large models, existing approximate methods can suffer from instability, catastrophic forgetting, or a lack of fine-grained control. This paper introduces Interactive LoRA Steering, a framework combining the parameter efficiency of Low-Rank Adaptation (LoRA) with stable unlearning objectives such as Inverted Hinge Loss (IHL). Crucially, it incorporates a human-in-the-loop mechanism that lets users guide the unlearning direction and intensity via conceptual "Steering Vectors" acting on LoRA adapters. We demonstrate through experiments on MNIST and CIFAR-10 that the core LoRA+IHL mechanism effectively and efficiently removes targeted information, significantly outperforming unstable baselines such as Gradient Ascent while preserving utility on retained data and achieving favorable membership inference attack (MIA) privacy metrics. We further show the conceptual feasibility of the interactive component, representing a step towards more controllable and interpretable "knowledge surgery" in AI systems. Our results show that LoRA+IHL achieves competitive utility on retained data (e.g., 92.6% Retain Acc on CIFAR-10), effectively reduces forget-class accuracy (e.g., to 5.1% on CIFAR-10), and yields strong privacy metrics (e.g., 0.66 MIA Efficacy on CIFAR-10), performing comparably to retraining but completing the unlearning process significantly faster (e.g., ~10 seconds vs. ~155 seconds on CIFAR-10).
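The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The loss assumes the form of Inverted Hinge Loss commonly used in the unlearning literature, 1 + p(target) - max over other classes of p, adapted here to classification. The LoRA update assumes the standard W + (alpha / r) * B A parameterization. The `steer` argument is a hypothetical scalar stand-in for the user-set steering intensity; the abstract does not specify how Steering Vectors are actually defined.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def inverted_hinge_loss(logits, target):
    """Assumed IHL form: 1 + p[target] - max_{v != target} p[v].

    The value lies between 0 and 2. Minimizing it lowers the probability
    of the forget class while raising the strongest competing class.
    """
    probs = softmax(logits)
    best_other = max(p for i, p in enumerate(probs) if i != target)
    return 1.0 + probs[target] - best_other

def lora_weight(W, B, A, alpha, steer=1.0):
    """Effective weight W + steer * (alpha / r) * B @ A.

    W is d_out x d_in, B is d_out x r, A is r x d_in (nested lists).
    `steer` is a hypothetical intensity knob, not taken from the paper.
    """
    r = len(A)
    scale = steer * alpha / r
    return [
        [W[i][j] + scale * sum(B[i][k] * A[k][j] for k in range(r))
         for j in range(len(W[0]))]
        for i in range(len(W))
    ]

# A model still confident on the forget class (index 0) gets a loss near 2;
# a model that now prefers another class gets a loss near 0.
loss_remembered = inverted_hinge_loss([5.0, 0.0, 0.0], target=0)
loss_forgotten = inverted_hinge_loss([0.0, 5.0, 0.0], target=0)
```

Because the loss is bounded, its gradients stay bounded as well, which is the usual argument for its stability compared with Gradient Ascent on an unbounded cross-entropy. Setting `steer=0.0` in this sketch recovers the original weights, so the frozen base model is untouched and the unlearning effect lives entirely in the low-rank factors.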

KEYWORDS

machine unlearning, LoRA, Inverted Hinge Loss, knowledge surgery

Cite this paper

Vaishnavee Sanam, Heramb Patil, Dr. Minakshi Atr. (2025) Interactive LoRA Steering: A User-Guided Framework for Efficient and Interpretable Machine Unlearning in Neural Networks. International Journal of Computers, 10, 244-250