Evaluating Machine Unlearning via Epistemic Uncertainty | Trustworthy AI Lab

Introduction
Preliminary
Main Content
- Evaluation Metric Using Epistemic Uncertainty
- Efficacy Score
Experiments and Results Analysis
Conclusion

Paper: Evaluating Machine Unlearning via Epistemic Uncertainty

Authors: Alexander Becker, Thomas Liebig

URL: https://arxiv.org/pdf/2208.10836

Introduction

Machine Unlearning refers to techniques that remove the influence of specific training data from an already trained model.

This paper proposes an efficient, quantitative metric for evaluating machine unlearning methods, based on the concept of Epistemic Uncertainty. To validate the metric, the paper states two hypotheses and tests them on three machine unlearning methods.

Preliminary

Retraining

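Retraining the model from scratch without the data to be removed is the reference method. Certified Removal builds on the concept of differential privacy to guarantee that an unlearned model is nearly indistinguishable from such a retrained model. For a learning algorithm A, an unlearning mechanism U, a dataset D, a removed sample x, and any set of models T, the ε-certified removal condition is defined by the following formula.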

\[e^{-\epsilon} \leq \frac{P(U(A(D), D, x) \in T)}{P(A(D \setminus \{x\}) \in T)} \leq e^{\epsilon}\]
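The condition above can be checked numerically in a toy setting. The following is a minimal sketch, assuming both training procedures produce one of finitely many models, so that each output distribution is a plain dictionary of probabilities. The function name `smallest_epsilon` and the dictionary representation are illustrative assumptions, not part of the paper. For discrete distributions it is enough to check single outcomes, because the ratio for any set T is bounded by the most extreme single-outcome ratio inside it.

```python
import math

def smallest_epsilon(p_unlearned, p_retrained):
    """Smallest epsilon with e^-eps <= p/q <= e^eps for every outcome.

    Both arguments map a (hypothetical) discrete model outcome to its
    probability under the unlearned and the retrained procedure.
    """
    eps = 0.0
    for outcome in set(p_unlearned) | set(p_retrained):
        p = p_unlearned.get(outcome, 0.0)
        q = p_retrained.get(outcome, 0.0)
        if p == 0.0 and q == 0.0:
            continue  # outcome impossible under both procedures
        if p == 0.0 or q == 0.0:
            return math.inf  # ratio is 0 or unbounded: no finite epsilon works
        eps = max(eps, abs(math.log(p / q)))
    return eps
```

A result of 0 means the two procedures are indistinguishable, and larger values mean the unlearned model deviates more from the retrained one.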

Amnesiac Unlearning

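The parameter update of every batch that contained data to be removed is recorded during training and later subtracted from the parameters of the trained model. In the formula below, the sum runs over all epochs e and batches b, and the indicator selects the batches D_{e,b} that overlap with the forget set D_f.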

\[U_{AU}(\theta, D_f) = \theta - \sum_{e=1}^{E} \sum_{b=1}^{B} 1\left[D_f \cap D_{e,b} \neq \emptyset\right] \Delta_{\theta_{e,b}}\]
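The formula above can be illustrated on a toy model. This is a minimal sketch, assuming a one-parameter least-squares model trained with plain SGD; the functions `train_with_log` and `amnesiac_unlearn` and the `(sample_id, x, y)` batch layout are hypothetical names chosen for illustration only.

```python
def train_with_log(theta, batches, lr=0.1, epochs=2):
    """Plain SGD on a 1-D least-squares model y ~ theta * x, logging every update.

    `batches` is a list of batches; each batch is a list of (sample_id, x, y).
    Returns the final parameter and a log of (sample_ids_in_batch, delta).
    """
    log = []
    for _ in range(epochs):
        for batch in batches:
            grad = sum(2 * (theta * x - y) * x for _, x, y in batch) / len(batch)
            delta = -lr * grad
            theta += delta
            log.append(({sid for sid, _, _ in batch}, delta))
    return theta, log

def amnesiac_unlearn(theta, log, forget_ids):
    """Subtract every logged update whose batch contained a forget sample."""
    forget_ids = set(forget_ids)
    for ids, delta in log:
        if ids & forget_ids:  # indicator: the batch overlaps the forget set
            theta -= delta
    return theta
```

Note that the whole update of an affected batch is undone, including the contribution of samples that were not meant to be forgotten.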

Fisher Forgetting

Noise n drawn from a normal distribution is added to the parameters to mask the remaining difference from the Retraining model. The unlearned parameters are the original parameters plus this noise, scaled by the hyperparameter α and by the Fisher Information Matrix F.

\[U_{FF}(\theta, D_f) = \theta + \alpha^{\frac{1}{4}} F^{-\frac{1}{4}} n\]
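A minimal sketch of this update, under a simplifying assumption that is not stated in the source: F is replaced by its diagonal, so the matrix power F^{-1/4} becomes an elementwise power. The function name `fisher_forget`, the `eps` safeguard, and the list-based parameter layout are illustrative choices.

```python
import random

def fisher_forget(theta, fisher_diag, alpha, rng=None, eps=1e-8):
    """Return theta + alpha^(1/4) * F^(-1/4) * n with n ~ N(0, I).

    Assumption: F is approximated by its diagonal `fisher_diag`, so the
    matrix power reduces to an elementwise power. `eps` guards against
    division by zero for parameters with a vanishing Fisher entry.
    """
    if rng is None:
        rng = random.Random()
    scale = alpha ** 0.25
    return [
        t + scale * (f + eps) ** -0.25 * rng.gauss(0.0, 1.0)
        for t, f in zip(theta, fisher_diag)
    ]
```

Parameters with a small Fisher entry receive more noise, while parameters with a large entry are perturbed less.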

Adversarial Attack

Among adversarial attacks, the Membership Inference Attack (MIA) is used: its success rate on the removed data indicates whether that data has actually been removed from the model.
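To make the idea concrete, here is a minimal sketch that assumes a simple loss-threshold attack; the attack used in the paper may differ. The function names and the `threshold` parameter are hypothetical. The attacker guesses that a sample was part of the training data when the model's loss on it is low.

```python
import math

def cross_entropy(prob_true_class):
    """Loss of one sample, given the probability the model assigns to its true class."""
    return -math.log(max(prob_true_class, 1e-12))

def loss_threshold_mia(prob_true_class, threshold):
    """Guess 'member' when the model's loss on the sample is below the threshold."""
    return cross_entropy(prob_true_class) < threshold

def attack_success_rate(probs_on_forget_set, threshold):
    """Fraction of forget samples that the attack still flags as training members."""
    flagged = sum(loss_threshold_mia(p, threshold) for p in probs_on_forget_set)
    return flagged / len(probs_on_forget_set)
```

Under this sketch, a lower rate on the forget set after unlearning suggests that the model no longer treats those samples like training data.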

Main Content

This paper measures and evaluates the efficacy of each machine unlearning technique using evaluation metrics based on information theory and epistemic uncertainty.

Evaluation Metric Using Epistemic Uncertainty

“Epistemic Uncertainty” is the uncertainty a model has because of limited knowledge about the data. Here it is used to quantify how much influence of the removed data remains in the model.

By calculating the information content of the removed data using the Fisher Information Matrix (FIM), the epistemic uncertainty of the model regarding that data can be measured. The paper reduces the computational complexity of the information content formula using the Levenberg-Marquardt approximation and presents it as follows.

\[\iota(\theta; D) = \operatorname{tr} \left( \mathcal{I}(\theta; D) \right) = \frac{1}{|D|} \sum_{i=1}^{|\theta|} \sum_{x, y \in D} \left( \frac{\partial \log p_{\theta}(y | x)}{\partial \theta_i} \right)^2\]

This represents the total information that the model has about the dataset: it measures how sensitive the model's log-likelihood on the dataset is to each parameter. In other words, the paper uses the total information at specific parameters as its measure of the model's epistemic uncertainty about the dataset.
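The information content formula amounts to averaging the squared norm of the per-sample gradient of the log-likelihood. The following is a minimal sketch for a logistic regression model, which is chosen here only because its gradient has a closed form; the model choice and the function names are assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def per_sample_grad(theta, x, y):
    """Gradient of log p_theta(y | x) for logistic regression, with y in {0, 1}."""
    p = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
    return [(y - p) * xi for xi in x]

def information(theta, data):
    """iota(theta; D): mean over D of the squared per-sample gradient norm."""
    total = 0.0
    for x, y in data:
        total += sum(g * g for g in per_sample_grad(theta, x, y))
    return total / len(data)
```

For a neural network the same quantity would require one backward pass per sample, which is the cost the approximation is meant to keep manageable.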

Efficacy Score

To address side effects such as the Streisand effect that occur during data removal, this paper proposes the efficacy score. A lower efficacy score indicates that epistemic uncertainty about the data to be removed has increased.

\[\text{efficacy}(\theta; D) = \begin{cases} \frac{1}{\iota(\theta; D)}, & \text{if } \iota(\theta; D) > 0 \\ \infty, & \text{otherwise} \end{cases}\]
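Written as code, the piecewise definition is a guarded reciprocal. This minimal sketch assumes the information content has already been computed and is passed in as the number `iota`; the function name is illustrative.

```python
import math

def efficacy(iota):
    """Efficacy score: reciprocal of the information content, infinite when it is 0."""
    return 1.0 / iota if iota > 0 else math.inf
```

For example, `efficacy(4.0)` gives 0.25, and `efficacy(0.0)` gives infinity instead of raising a division error.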

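As the dataset or the model grows, computing the efficacy score becomes expensive because it requires a gradient for every data point. To address this issue, the paper presents the following theoretical upper bound, which needs only the gradient of the loss over the whole dataset.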

\[\text{efficacy}(\theta; D) \leq \frac{1}{\left\| \nabla \mathcal{L}(\theta, D) \right\|_2^2}\]
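One way to see why the bound holds, assuming that the loss is the mean negative log-likelihood over D, is Jensen's inequality: the squared norm of an average is at most the average of the squared norms.

\[\left\| \nabla \mathcal{L}(\theta, D) \right\|_2^2 = \left\| \frac{1}{|D|} \sum_{x, y \in D} \nabla_{\theta} \log p_{\theta}(y | x) \right\|_2^2 \leq \frac{1}{|D|} \sum_{x, y \in D} \left\| \nabla_{\theta} \log p_{\theta}(y | x) \right\|_2^2 = \iota(\theta; D)\]

Taking reciprocals reverses the inequality and gives the bound. The sketch below checks this numerically on per-sample gradient vectors supplied as plain lists; the function name and input layout are hypothetical.

```python
import math

def efficacy_and_bound(per_sample_grads):
    """Return (efficacy, upper_bound) from a list of per-sample gradient vectors."""
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    # iota: mean of squared per-sample gradient norms (needs every sample's gradient).
    iota = sum(sum(g * g for g in grad) for grad in per_sample_grads) / n
    # Gradient of the mean loss: a single averaged vector is enough for the bound.
    mean_grad = [sum(grad[i] for grad in per_sample_grads) / n for i in range(dim)]
    sq_norm = sum(g * g for g in mean_grad)
    eff = 1.0 / iota if iota > 0 else math.inf
    bound = 1.0 / sq_norm if sq_norm > 0 else math.inf
    return eff, bound
```

The two values coincide when all per-sample gradients are identical, and the bound becomes loose when the per-sample gradients point in different directions and cancel in the average.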

Experiments and Results Analysis

This paper verifies the following two hypotheses.

Hypothesis (1): When a pre-trained model is trained with a machine unlearning technique, the efficacy score for the removed data will always decrease.
Hypothesis (2): A lower efficacy score always reduces the success rate of adversarial attacks.

Efficacy Scores of the pre-trained model and three machine unlearning models

(b) Retraining and (d) Fisher Forgetting results show that the efficacy score decreases as the data removal ratio increases, satisfying Hypothesis (1).

However, the (c) Amnesiac Unlearning results show that the efficacy score increases as the data removal ratio increases, which rejects Hypothesis (1). The difference stems from how each method adjusts the parameters. In (b) and (d), the parameters end up at, or are pushed toward, the result of retraining without the removed data, whereas in (c) the parameters are rolled back toward the model as it was before training.

MIA success rates for the pre-trained model and three machine unlearning models

Results from (b), (c), and (d) show that the MIA success rate always decreases as the data removal ratio increases. These results are then used to examine the relationship between the efficacy score and the MIA success rate.

Correlation between efficacy score and MIA success rate

In (b) and (d), the MIA success rate decreases as the efficacy score decreases, but (c) shows the opposite result, so Hypothesis (2) is rejected as well. This can be explained by the same reason for which Hypothesis (1) was rejected.

The accuracy of each machine unlearning technique was also measured using the efficacy score; the results are given in the table below. Accuracy is low overall, and it is not high even for Retraining, which guarantees complete data removal.

Accuracy comparison between the pre-trained model and each machine unlearning model

Conclusion

This paper proposes the efficacy score, a metric based on information content and the concept of epistemic uncertainty, and examines its utility by applying it to three machine unlearning methods.

However, both hypotheses proposed in the paper were rejected, and the accuracy results for each machine unlearning technique reported above are low, which raises questions about the utility of this evaluation metric. Nevertheless, the approximation that reduces the computational complexity of the information content, and the interpretation of information content from the perspective of epistemic uncertainty, should be useful for comprehensive evaluations of unlearning techniques in the future.
