์„œ๋ก 


์˜ค๋Š˜๋‚  ์ธ๊ณต์ง€๋Šฅ(AI)์˜ ์„ธ์ƒ์—์„œ ์‚ฌ์šฉ์ž ํ”„๋ผ์ด๋ฒ„์‹œ ๋ณดํ˜ธ๋Š” ์ตœ์šฐ์„  ๊ณผ์ œ๊ฐ€ ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ํŠนํžˆ ๋ฐ์ดํ„ฐ ๊ด€๋ จ ๋Œ€ํ‘œ ๋ฒ•์ธ GDPR์—์„œ๋Š” โ€œ์žŠํ˜€์งˆ ๊ถŒ๋ฆฌโ€๋ฅผ ๋ช…์‹œํ•˜๊ณ  ์žˆ์œผ๋ฉฐ, ๊ฐœ์ธ์ด ์ž์‹ ์˜ ๊ฐœ์ธ ๋ฐ์ดํ„ฐ ์‚ญ์ œ๋ฅผ ์š”์ฒญํ•  ์ˆ˜ ์žˆ๋Š” ๊ถŒ๋ฆฌ๋ฅผ ๋ถ€์—ฌํ•ฉ๋‹ˆ๋‹ค.

๋จธ์‹  ์–ธ๋Ÿฌ๋‹(Machine Unlearning)์€ โ€œ์ฒ˜์Œ๋ถ€ํ„ฐ ๋‹ค์‹œ ํ›ˆ๋ จํ•˜์ง€ ์•Š๊ณ ๋„ AI ๋ชจ๋ธ์ด ํŠน์ • ํ›ˆ๋ จ ๋ฐ์ดํ„ฐ๋ฅผ ์žŠ์„ ์ˆ˜ ์žˆ์„๊นŒ?โ€๋ผ๋Š” ์งˆ๋ฌธ์— ๋‹ตํ•ฉ๋‹ˆ๋‹ค. ์ด๋Š” ํ”„๋ผ์ด๋ฒ„์‹œ ๊ทœ์ • ์ค€์ˆ˜๋ฟ๋งŒ ์•„๋‹ˆ๋ผ ์ค‘๋…๋œ ๋ฐ์ดํ„ฐ ์ œ๊ฑฐ, ํ›ˆ๋ จ ์‹ค์ˆ˜ ์ˆ˜์ •, ์‹ ๋ขฐํ•  ์ˆ˜ ์žˆ๋Š” AI ์‹œ์Šคํ…œ ๊ตฌ์ถ•์— ํ•„์ˆ˜์ ์ž…๋‹ˆ๋‹ค.

โ€œUnlearning-Aware Minimization (UAM)โ€ [Paper]์€ ๋จธ์‹  ์–ธ๋Ÿฌ๋‹์„ ์œ„ํ•œ ์ƒˆ๋กœ์šด min-max ์ตœ์ ํ™” ํ”„๋ ˆ์ž„์›Œํฌ์ž…๋‹ˆ๋‹ค. ์ด ๊ธ€์—์„œ๋Š” ๋จธ์‹  ์–ธ๋Ÿฌ๋‹์˜ ๊ฐœ๋…๊ณผ ์ œ์•ˆ๋œ UAM ๋ฐฉ๋ฒ•์„ ํ•จ๊ป˜ ์‚ดํŽด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

์‚ฌ์ „ ์ง€์‹


๋จธ์‹  ์–ธ๋Ÿฌ๋‹ ๋ฌธ์ œ

๋จธ์‹  ์–ธ๋Ÿฌ๋‹์„ ์ดํ•ดํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” ๋จผ์ € ๋ชจ๋ธ์ด โ€œ์žŠ๊ธฐโ€ ํ›„์— ๋‹ฌ์„ฑํ•ด์•ผ ํ•  ๋ชฉํ‘œ๋ฅผ ์ดํ•ดํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ํ›ˆ๋ จ ๋ฐ์ดํ„ฐ์…‹ \(\mathcal{D}\)๊ฐ€ ์ฃผ์–ด์กŒ์„ ๋•Œ, ์ด๋ฅผ ๋‘ ๊ฐœ์˜ ์„œ๋กœ์†Œ ์ง‘ํ•ฉ์œผ๋กœ ๋ถ„ํ• ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:

  • ์žŠ์„ ๋ฐ์ดํ„ฐ(Forget data) \(\mathcal{D}_f\): ๋ชจ๋ธ์ด ์žŠ์–ด์•ผ ํ•  ๋ฐ์ดํ„ฐ
  • ์œ ์ง€ํ•  ๋ฐ์ดํ„ฐ(Retain data) \(\mathcal{D}_r\): ๋ชจ๋ธ์ด ๊ธฐ์–ตํ•ด์•ผ ํ•  ๋ฐ์ดํ„ฐ

์ •ํ™•ํ•œ ์–ธ๋Ÿฌ๋‹(Exact unlearning)์ด๋ผ๊ณ  ๋ถˆ๋ฆฌ๋Š” ์ด์ƒ์ ์ธ ํ•ด๊ฒฐ์ฑ…์€ ์œ ์ง€ํ•  ๋ฐ์ดํ„ฐ๋งŒ์„ ์‚ฌ์šฉํ•˜์—ฌ ๋ชจ๋ธ์„ ์ฒ˜์Œ๋ถ€ํ„ฐ ๋‹ค์‹œ ํ›ˆ๋ จํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค: \(\begin{equation} w^* = \text{argmin}_{w} \mathcal{L}(w, \mathcal{D}_r), \label{eq:retrain} \end{equation}\)

์—ฌ๊ธฐ์„œ \(\mathcal{L}(w, \mathcal{D})\)๋Š” ์†์‹คํ•จ์ˆ˜๋ฅผ ๋‚˜ํƒ€๋‚ด๊ณ  \(w\)๋Š” ๋ชจ๋ธ ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค.

ํ•˜์ง€๋งŒ ๋Œ€๊ทœ๋ชจ ๋ชจ๋ธ์˜ ๊ฒฝ์šฐ ์ฒ˜์Œ๋ถ€ํ„ฐ ๋‹ค์‹œ ํ›ˆ๋ จํ•˜๋Š” ๊ฒƒ์€ ๊ณ„์‚ฐ์ ์œผ๋กœ ๋ถ€๋‹ด์ด ํฝ๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, CIFAR-10์—์„œ ResNet ๋ชจ๋ธ์„ ๋‹ค์‹œ ํ›ˆ๋ จํ•˜๋Š” ๋ฐ 30๋ถ„ ์ด์ƒ์ด ๊ฑธ๋ฆฌ๋ฉฐ, ๋Œ€๊ทœ๋ชจ ์–ธ์–ด ๋ชจ๋ธ์˜ ๊ฒฝ์šฐ ๋ฉฐ์น  ๋˜๋Š” ๋ช‡ ์ฃผ๊ฐ€ ์†Œ์š”๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋กœ ์ธํ•ด ์‚ฌ์ „ ํ›ˆ๋ จ๋œ ๋ชจ๋ธ ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ํšจ์œจ์ ์œผ๋กœ ์—…๋ฐ์ดํŠธํ•˜๋Š” ๊ทผ์‚ฌ ์–ธ๋Ÿฌ๋‹(Approximate unlearning) ๋ฐฉ๋ฒ•๋“ค์ด ๊ฐœ๋ฐœ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

๊ธฐ์กด ๊ทผ์‚ฌ ์–ธ๋Ÿฌ๋‹ ๋ฐฉ๋ฒ•๋“ค

๊ธฐ์กด ์—ฐ๊ตฌ์—์„œ๋Š” ๋‘ ๊ฐ€์ง€ ์ฃผ์š” ์ ‘๊ทผ๋ฒ•์„ ์ œ์•ˆํ–ˆ์Šต๋‹ˆ๋‹ค:

1. Fine-Tuning (FT): ์œ ์ง€ํ•  ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•ด์„œ๋งŒ ํ›ˆ๋ จ์„ ๊ณ„์†ํ•ฉ๋‹ˆ๋‹ค \(\begin{equation} \min_w \mathcal{L}(w, \mathcal{D}_r) \end{equation}\)

2. Negative Gradient (NG): ์žŠ์„ ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•œ ์†์‹ค์„ ์ตœ๋Œ€ํ™”ํ•˜์—ฌ ์–ธ๋Ÿฌ๋‹ํ•ฉ๋‹ˆ๋‹ค \(\begin{equation} \max_w \mathcal{L}(w, \mathcal{D}_f) \end{equation}\)

FT๋Š” ์œ ์ง€ํ•  ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•œ ์ข‹์€ ์„ฑ๋Šฅ์„ ์œ ์ง€ํ•˜์ง€๋งŒ, ์ข…์ข… ์žŠ์„ ๋ฐ์ดํ„ฐ์˜ ์˜ํ–ฅ์„ ์ถฉ๋ถ„ํžˆ ์ œ๊ฑฐํ•˜์ง€ ๋ชปํ•ฉ๋‹ˆ๋‹ค. ๋ฐ˜๋ฉด NG๋Š” ์žŠ์„ ๋ฐ์ดํ„ฐ๋ฅผ ์„ฑ๊ณต์ ์œผ๋กœ ์ œ๊ฑฐํ•˜์ง€๋งŒ ์œ ์ง€ํ•  ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•œ ์„ฑ๋Šฅ์„ ์‹ฌ๊ฐํ•˜๊ฒŒ ์ €ํ•˜์‹œํ‚ต๋‹ˆ๋‹ค. ๋‘ ๋ฐฉ๋ฒ• ๋ชจ๋‘ ์ฐจ์„ ์ฑ…์— ์ˆ˜๋ ดํ•ฉ๋‹ˆ๋‹ค.

๋‹ค์–‘ํ•œ ๋ฐฉ๋ฒ•๋“ค์˜ ์ตœ์ ํ™” ๊ถค์ . FT๋Š” ์žŠ๊ธฐ๋ฅผ ์‹คํŒจํ•˜๊ณ , NG๋Š” ์ •ํ™•๋„๋ฅผ ์žƒ์ง€๋งŒ, UAM์€ ๋†’์€ ์žŠ๊ธฐ ์†์‹ค๊ณผ ๋‚ฎ์€ ์œ ์ง€ ์†์‹ค์„ ๊ฐ€์ง„ ์ตœ์ ์ ์— ์„ฑ๊ณต์ ์œผ๋กœ ๋„๋‹ฌํ•ฉ๋‹ˆ๋‹ค.

๋ณธ๋ก 


๋ณธ ๋…ผ๋ฌธ์€ โ€œ์œ ์ง€ํ•  ๋ฐ์ดํ„ฐ์˜ ์„ฑ๋Šฅ์„ ์œ ์ง€ํ•˜๋ฉด์„œ ํŠน์ • ๋ฐ์ดํ„ฐ๋ฅผ ํšจ๊ณผ์ ์œผ๋กœ ์žŠ์„ ์ˆ˜ ์žˆ๋Š”๊ฐ€?โ€๋ผ๋Š” ๊ทผ๋ณธ์ ์ธ ์งˆ๋ฌธ์„ ๋‹ค๋ฃน๋‹ˆ๋‹ค. ์šฐ๋ฆฌ๋Š” ์ƒˆ๋กœ์šด min-max ์ตœ์ ํ™” ํ”„๋ ˆ์ž„์›Œํฌ์ธ Unlearning-Aware Minimization (UAM)์„ ์ œ์•ˆํ•ฉ๋‹ˆ๋‹ค.

UAM์˜ ๋™๊ธฐ๋ฅผ ์ดํ•ดํ•˜๊ธฐ ์œ„ํ•ด, ๋จผ์ € ๊ธฐ์กด ๊ทผ์‚ฌ ์–ธ๋Ÿฌ๋‹ ๋ฐฉ๋ฒ•๋“ค์„ ํŠน์„ฑํ™”ํ•˜๋Š” ํ†ตํ•ฉ ๋ชฉ์ ํ•จ์ˆ˜๋ฅผ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค. ์ตœ์  ํ•ด \(w^*\)๊ฐ€ ์‚ฌ์ „ ํ›ˆ๋ จ๋œ ๋ชจ๋ธ์˜ ์œ ๊ณ„ ๊ทผ๋ฐฉ์— ์žˆ๋‹ค๊ณ  ๊ฐ€์ •ํ•  ๋•Œ, ์–ธ๋Ÿฌ๋‹ ๋ฌธ์ œ๋ฅผ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๊ณต์‹ํ™”ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:

\[\begin{equation} \min_w \mathcal{L}(w, \mathcal{D}_r) + \beta \big[ \mathcal{L}(w^*, \mathcal{D}) - \mathcal{L}(w, \mathcal{D}) \big] \label{eq:unified} \end{equation}\]

์œ„ ๋ชฉ์ ํ•จ์ˆ˜๋Š” ๋‘ ๊ฐ€์ง€ ๊ตฌ์„ฑ์š”์†Œ๋ฅผ ๊ฐ€์ง‘๋‹ˆ๋‹ค:

  1. ์„ฑ๋Šฅ ํ•ญ: \(\mathcal{L}(w, \mathcal{D}_r)\)์€ ๋ชจ๋ธ์ด ์œ ์ง€ํ•  ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•œ ์„ฑ๋Šฅ์„ ์œ ์ง€ํ•˜๋„๋ก ์žฅ๋ คํ•ฉ๋‹ˆ๋‹ค
  2. ์ผ๊ด€์„ฑ ํ•ญ: \(\mathcal{L}(w^*, \mathcal{D}) - \mathcal{L}(w, \mathcal{D})\)์€ ์ตœ์ ํ™”๋œ ๊ฐ€์ค‘์น˜์™€ ์ตœ์  ํ•ด ๊ฐ„์˜ ์ •๋ ฌ์„ ์žฅ๋ คํ•ฉ๋‹ˆ๋‹ค

์•„๋ž˜ ํ”„๋ ˆ์ž„์›Œํฌ๋Š” ๊ทผ์‚ฌ ์–ธ๋Ÿฌ๋‹์˜ ๋‘ ๊ฐ€์ง€ ์ฃผ์š” ์ ‘๊ทผ๋ฒ•์„ ์„ค๋ช…ํ•ฉ๋‹ˆ๋‹ค:

  • FT: \(\beta = 0\)์œผ๋กœ ์„ค์ •ํ•˜์—ฌ ์ผ๊ด€์„ฑ ํ•ญ์„ ๋ฌด์‹œํ•˜๊ณ  ๊ฒฐ๊ณผ์ ์œผ๋กœ ์žŠ๊ธฐ ์„ฑ๋Šฅ์ด ์ €ํ•˜๋ฉ๋‹ˆ๋‹ค
  • NG: \(w^*\)์— ๋Œ€ํ•œ ์ง€์‹์ด ์—†๋‹ค๊ณ  ๊ฐ€์ •ํ•˜๊ณ , ์žŠ๊ธฐ ์†์‹ค ์ตœ๋Œ€ํ™”์—๋งŒ ์ง‘์ค‘ํ•˜์—ฌ ์œ ์ง€ ์„ฑ๋Šฅ์„ ์œ ์ง€ํ•˜๋Š” ๋ฐ ์–ด๋ ค์›€์„ ๊ฒช์Šต๋‹ˆ๋‹ค
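To make the special cases concrete, the unified objective can be written directly; with \(\beta = 0\) it collapses to FT's retain-only loss. A toy sketch with a hypothetical 1-D squared loss:

```python
def loss(w, data):
    # Mean squared loss (1/|D|) * sum_x (w - x)^2.
    return sum((w - x) ** 2 for x in data) / len(data)

D_r, D_f = [1.0, 2.0, 3.0], [10.0]   # hypothetical retain/forget data
D = D_r + D_f
w_star = 2.0                          # exact-unlearning solution: mean(D_r)

def unified(w, beta):
    # Performance term + beta * consistency term.
    return loss(w, D_r) + beta * (loss(w_star, D) - loss(w, D))
```

Evaluating `unified(w, 0.0)` returns exactly `loss(w, D_r)`, i.e. the FT objective, while `beta > 0` rewards weights whose loss on the full dataset approaches that of \(w^*\).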

UAM์˜ ํ•ต์‹ฌ ํ†ต์ฐฐ์€ \(w^*\)์˜ ํŠน์ง•์ธ ๋†’์€ ์žŠ๊ธฐ ์†์‹ค์„ ํ™œ์šฉํ•˜์—ฌ ์–ธ๋Ÿฌ๋‹์„ ๋‹ฌ์„ฑํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ์ด๋ฅผ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๊ณต์‹ํ™”ํ•ฉ๋‹ˆ๋‹ค:

\[\begin{equation} \min_w \mathcal{L}(\text{argmax}_{\|\delta\|_2 \leq \rho} \mathcal{L}(w + \delta, \mathcal{D}_f), \mathcal{D}_r) \end{equation}\]

์ด min-max ์ตœ์ ํ™”๋Š” ๋‘ ๋‹จ๊ณ„๋กœ ๊ตฌ์„ฑ๋ฉ๋‹ˆ๋‹ค:

  1. ๋‚ด๋ถ€ ์ตœ๋Œ€ํ™”: ์žŠ๊ธฐ ์†์‹ค์„ ์ตœ๋Œ€ํ™”ํ•˜๋Š” ๊ต๋ž€๋œ ๋งค๊ฐœ๋ณ€์ˆ˜ \(\hat{w}=\text{argmax}_{\|\delta\|_2 \leq \rho} \mathcal{L}(w + \delta, \mathcal{D}_f)\)๋ฅผ ์ฐพ์Šต๋‹ˆ๋‹ค
  2. ์™ธ๋ถ€ ์ตœ์†Œํ™”: \(\hat{w}\)์—์„œ ์œ ์ง€ ์†์‹ค์„ ์ตœ์†Œํ™”ํ•˜๋Š” ๊ธฐ์šธ๊ธฐ๋กœ ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ์—…๋ฐ์ดํŠธํ•ฉ๋‹ˆ๋‹ค

๋†’์€ ์žŠ๊ธฐ ์†์‹ค์„ ๊ฐ€์ง„ ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ์ฐธ์กฐ์ ์œผ๋กœ ์‚ฌ์šฉํ•จ์œผ๋กœ์จ, UAM์€ ๋‚ฎ์€ ์œ ์ง€ ์†์‹ค์„ ์œ ์ง€ํ•˜๋ฉด์„œ ๋†’์€ ์žŠ๊ธฐ ์†์‹ค ํŠน์„ฑ์„ ๋‚˜ํƒ€๋‚ด๋Š” ์—…๋ฐ์ดํŠธ๋œ ๋ชจ๋ธ์„ ๋ณด์žฅํ•ฉ๋‹ˆ๋‹ค.

1์ฐจ ๊ทผ์‚ฌ๋ฅผ ํ†ตํ•œ ํšจ์œจ์ ์ธ ์•Œ๊ณ ๋ฆฌ์ฆ˜

๋‚ด๋ถ€ ์ตœ๋Œ€ํ™” ๋ฌธ์ œ์˜ ์ •ํ™•ํ•œ ํ•ด๋ฅผ ๊ณ„์‚ฐํ•˜๋Š” ๊ฒƒ์€ ๊ณ„์‚ฐ์ ์œผ๋กœ ๋น„์šฉ์ด ๋งŽ์ด ๋“ญ๋‹ˆ๋‹ค. ํšจ์œจ์ ์ธ ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ๋„์ถœํ•˜๊ธฐ ์œ„ํ•ด 1์ฐจ ํ…Œ์ผ๋Ÿฌ ๊ทผ์‚ฌ๋ฅผ ์ ์šฉํ•ฉ๋‹ˆ๋‹ค:

\[\begin{equation} \min_w \mathcal{L}\left(w + \rho \frac{\nabla_w \mathcal{L}(w, \mathcal{D}_f)}{\|\nabla_w \mathcal{L}(w, \mathcal{D}_f)\|_2^2}, \mathcal{D}_r\right) \end{equation}\]

์ด ๊ณต์‹ํ™”๋Š” PyTorch์™€ ๊ฐ™์€ ์ž๋™ ๋ฏธ๋ถ„ ํ”„๋ ˆ์ž„์›Œํฌ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํšจ์œจ์ ์œผ๋กœ ๊ตฌํ˜„ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. UAM์˜ ์ฃผ์š” ์žฅ์ ์€ ํ”„๋ ˆ์ž„์›Œํฌ ๋…๋ฆฝ์  ํŠน์„ฑ์ž…๋‹ˆ๋‹ค. UAM์€ ๋‹ค๋ฅธ ์–ธ๋Ÿฌ๋‹ ๋ฐฉ๋ฒ•๋“ค๊ณผ ์‰ฝ๊ฒŒ ํ†ตํ•ฉ๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

์ด๋ก ์  ๋ถ„์„

์šฐ๋ฆฌ์˜ ์ด๋ก ์  ๋ถ„์„์€ UAM ๋ชฉ์ ํ•จ์ˆ˜์˜ ๊ธฐ์šธ๊ธฐ๊ฐ€ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ํ‘œํ˜„๋  ์ˆ˜ ์žˆ์Œ์„ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค:

\[\begin{equation} \nabla_{w} \mathcal{L}(w + \delta(w), \mathcal{D}_r) = \left[\mathbf{I} + \frac{\rho}{\|\nabla_w \mathcal{L}(w, \mathcal{D}_f)\|_2^2}(\mathbf{I} - 2\mathbf{P}_f) \mathbf{H}_f\right]\nabla_{w} \mathcal{L}(w, \mathcal{D}_r)|_{w+\delta(w)} \end{equation}\]

์—ฌ๊ธฐ์„œ \(\mathbf{P}_f\)๋Š” ์žŠ๊ธฐ ๊ธฐ์šธ๊ธฐ ๋ฐฉํ–ฅ์œผ๋กœ์˜ ์ง๊ต ํˆฌ์˜ ํ–‰๋ ฌ์ด๊ณ , \(\mathbf{H}_f\)๋Š” ์žŠ๊ธฐ ์†์‹ค์˜ ํ—ค์‹œ์•ˆ์ž…๋‹ˆ๋‹ค.

ํ•ต์‹ฌ ํ†ต์ฐฐ์€ \((\mathbf{I} - 2\mathbf{P}_f)\) ํ•ญ์œผ๋กœ, ์ด๋Š” ์žŠ๊ธฐ ๊ธฐ์šธ๊ธฐ์™€ ์ •๋ ฌ๋œ ์œ ์ง€ ๊ธฐ์šธ๊ธฐ์˜ ์„ฑ๋ถ„์„ ๋‘ ๋ฒˆ ๋นผ๋Š” ์ˆ˜ํ•™์  ์—ฐ์‚ฐ์ž…๋‹ˆ๋‹ค. ์ด๋Š” ์—…๋ฐ์ดํŠธ ๋ฐฉํ–ฅ์ด ์žŠ๊ธฐ ์†์‹ค์„ ๊ฐ์†Œ์‹œํ‚ฌ ๋ฐฉํ–ฅ์—์„œ ๋ฉ€์–ด์ง€๋„๋ก ๋ณด์žฅํ•ฉ๋‹ˆ๋‹ค.

์ •ํ™•ํ•œ ํ—ค์‹œ์•ˆ ํ–‰๋ ฌ \(\mathbf{H}_f\)๋ฅผ ๊ณ„์‚ฐํ•˜๋Š” ๊ฒƒ์€ ๊ณ„์‚ฐ์ ์œผ๋กœ ๋น„์šฉ์ด ๋งŽ์ด ๋“ญ๋‹ˆ๋‹ค. ํ—ค์‹œ์•ˆ์„ ๋‹จ์œ„ ํ–‰๋ ฌ๋กœ ๊ทผ์‚ฌํ•˜๋Š” ๊ฒƒ์ด ๊ฐ„๋‹จํ•˜๋ฉด์„œ๋„ ํšจ๊ณผ์ ์ธ ํ•ด๊ฒฐ์ฑ…์ž„์„ ๋ฐœ๊ฒฌํ–ˆ์Šต๋‹ˆ๋‹ค:

\[\begin{equation} \left[\mathbf{I}- \gamma\mathbf{P}_f \right]\nabla_{w} \mathcal{L}(w, \mathcal{D}_r)|_{w+\delta(w)} \end{equation}\]

์—ฌ๊ธฐ์„œ \(\gamma\)๋Š” ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ์ž…๋‹ˆ๋‹ค. ์‹ค์ œ๋กœ \(\gamma=2\)๊ฐ€ ์ตœ์†Œํ•œ์˜ ์ถ”๊ฐ€ ๊ณ„์‚ฐ ๋น„์šฉ์œผ๋กœ ๋‹ค์–‘ํ•œ ์ž‘์—…์—์„œ ์ผ๊ด€๋˜๊ฒŒ ๊ฐ•ํ•œ ์„ฑ๋Šฅ์„ ๋‹ฌ์„ฑํ•จ์„ ๋ฐœ๊ฒฌํ–ˆ์Šต๋‹ˆ๋‹ค.

๋” ๊นŠ์€ ๋ถ„์„์€ UAM์˜ ๊ธฐํ•˜ํ•™์  ์ง๊ด€์„ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค. ์šฐ๋ฆฌ์˜ ๋ชฉ์ ํ•จ์ˆ˜์— 1์ฐจ ๊ทผ์‚ฌ๋ฅผ ์ ์šฉํ•˜๋ฉด:

\[\begin{equation} \min_w \mathcal{L}(w, \mathcal{D}_r) + \nabla \mathcal{L}(w, \mathcal{D}_r)^\top \rho \frac{\nabla\mathcal{L}(w, \mathcal{D}_f)}{\|\nabla\mathcal{L}(w, \mathcal{D}_f)\|_2^2} \end{equation}\]

์ด๋Š” UAM์ด ์œ ์ง€ ๊ธฐ์šธ๊ธฐ \(\nabla \mathcal{L}(w, \mathcal{D}_r)\)์™€ ์žŠ๊ธฐ ๊ธฐ์šธ๊ธฐ \(\nabla \mathcal{L}(w, \mathcal{D}_f)\) ๊ฐ„์˜ ๋‚ด์ (์ฝ”์‚ฌ์ธ ์œ ์‚ฌ๋„)์„ ๋ช…์‹œ์ ์œผ๋กœ ์ตœ์†Œํ™”ํ•จ์„ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค.

UAM์˜ ๊ธฐํ•˜ํ•™์  ํ•ด์„. ์œ ์ง€ ๊ธฐ์šธ๊ธฐ์™€ ์žŠ๊ธฐ ๊ธฐ์šธ๊ธฐ ๊ฐ„์˜ ์ฝ”์‚ฌ์ธ ์œ ์‚ฌ๋„๊ฐ€ ์Œ์ˆ˜์ผ ๋•Œ, ์œ ์ง€ ์†์‹ค์„ ์ตœ์†Œํ™”ํ•˜๋Š” ๊ฒƒ์ด ์ž์—ฐ์Šค๋Ÿฝ๊ฒŒ ์žŠ๊ธฐ ์†์‹ค์„ ์ฆ๊ฐ€์‹œํ‚ต๋‹ˆ๋‹ค.

์ด๋Ÿฌํ•œ ๊ธฐ์šธ๊ธฐ๋“ค์ด ์Œ์˜ ์ •๋ ฌ(์ฝ”์‚ฌ์ธ ์œ ์‚ฌ๋„ โ‰ค 0)์„ ๊ฐ€์งˆ ๋•Œ, ์œ ์ง€ ์†์‹ค์„ ์ตœ์†Œํ™”ํ•˜๋Š” ๊ฒƒ์ด ๋ณธ์งˆ์ ์œผ๋กœ ์žŠ๊ธฐ ์†์‹ค์„ ์ตœ๋Œ€ํ™”๋กœ ์ด์–ด์ง‘๋‹ˆ๋‹ค. ์ด๋Š” UAM์ด ๊ธฐ์กด ๋ฐฉ๋ฒ•๋“ค์„ ๋Šฅ๊ฐ€ํ•˜๋Š” ์ด์œ ์— ๋Œ€ํ•œ ๊ฐ•๋ ฅํ•œ ๊ธฐํ•˜ํ•™์  ์„ค๋ช…์„ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค.

์ด๋ฏธ์ง€ ๋ถ„๋ฅ˜ ์–ธ๋Ÿฌ๋‹์—์„œ์˜ ์„ฑ๋Šฅ

UAM์„ ์„ธ ๊ฐ€์ง€ ๋ฒค์น˜๋งˆํฌ ๋ฐ์ดํ„ฐ์…‹์—์„œ ํ‰๊ฐ€ํ–ˆ์Šต๋‹ˆ๋‹ค: CIFAR-10, CIFAR-100, TinyImageNet. ๋‘ ๊ฐ€์ง€ ์‹œ๋‚˜๋ฆฌ์˜ค์—์„œ ์‹คํ—˜ํ–ˆ์Šต๋‹ˆ๋‹ค:

  • Class-wise forgetting: ํŠน์ • ํด๋ž˜์Šค์˜ ๋ชจ๋“  ์ƒ˜ํ”Œ ์ œ๊ฑฐ
  • Random data forgetting: ๋ฌด์ž‘์œ„๋กœ ์ƒ˜ํ”Œ๋ง๋œ ํ›ˆ๋ จ ์˜ˆ์ œ ์ œ๊ฑฐ
CIFAR-10 ๊ฒฐ๊ณผ. UAM์€ ๊ฐ€์žฅ ๋‚ฎ์€ ฮ”Acc.๋ฅผ ๋‹ฌ์„ฑํ•˜์—ฌ ์ •ํ™•ํ•œ ์žฌํ›ˆ๋ จ์— ๊ฐ€์žฅ ๊ฐ€๊นŒ์šด ์„ฑ๋Šฅ์„ ๋ณด์ž…๋‹ˆ๋‹ค.

์ฃผ์š” ๋ฐœ๊ฒฌ์‚ฌํ•ญ:

  • UAM์€ ํด๋ž˜์Šค๋ณ„ ์žŠ๊ธฐ์—์„œ zero-forget๋ฅผ ๋‹ฌ์„ฑํ•˜์—ฌ ์ •ํ™•ํ•œ ์žฌํ›ˆ๋ จ๊ณผ ์ผ์น˜ํ•ฉ๋‹ˆ๋‹ค
  • NG๋Š” ๋ฐœ์‚ฐํ•˜์ง€๋งŒ, UAM์€ ์•ˆ์ •์ ์œผ๋กœ ์œ ์ง€๋ฉ๋‹ˆ๋‹ค
  • UAM์€ ๊ฐ€์žฅ ๋‚ฎ์€ \(\Delta\)Acc.์„ ๋‹ฌ์„ฑํ•˜์—ฌ ์ข‹์€ ์„ฑ๋Šฅ์„ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค.

LLM ์–ธ๋Ÿฌ๋‹์—์„œ์˜ ์„ฑ๋Šฅ

๋น„์ „ ์ž‘์—…์„ ๋„˜์–ด์„œ, UAM์€ ๋Œ€๊ทœ๋ชจ ์–ธ์–ด ๋ชจ๋ธ(LLM)์ด ์œ„ํ—˜ํ•œ ์ง€์‹์„ ์žŠ๋Š” ๋ฐ ์žˆ์–ด์„œ ๋†€๋ผ์šด ํšจ๊ณผ๋ฅผ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค. Zephyr-7B-ฮฒ์— ๋Œ€ํ•ด UAM์€ WMDP-Bio์™€ WMDP-Cyber์—์„œ ๊ฐ€์žฅ ๋‚ฎ์€ ์œ„ํ—˜ ์ง€์‹ ์ ์ˆ˜๋ฅผ ๋‹ฌ์„ฑํ–ˆ์Šต๋‹ˆ๋‹ค.

UAM ์–ธ๋Ÿฌ๋‹ ์ „ํ›„. ๋ชจ๋ธ์ด ์–ธ๋Ÿฌ๋‹ ํ›„ ์œ„ํ—˜ํ•œ ์ •๋ณด ์ œ๊ณต์„ ๊ฑฐ๋ถ€ํ•˜์—ฌ ํšจ๊ณผ์ ์ธ ์•ˆ์ „ ์ •๋ ฌ์„ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค.

์ธํ”Œ๋ฃจ์—”์ž A๋ฅผ ๋” ์น˜๋ช…์ ์œผ๋กœ ๋งŒ๋“ค๊ธฐ ์œ„ํ•œ ์œ„ํ—˜ํ•œ ์งˆ๋ฌธ์— ๋Œ€ํ•ด ํ”„๋กฌํ”„ํŠธ๋ฅผ ๋ฐ›์•˜์„ ๋•Œ, ๊ธฐ๋ณธ ๋ชจ๋ธ์€ ์ƒ์„ธํ•œ ์œ„ํ—˜ํ•œ ์ •๋ณด๋ฅผ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค. UAM์œผ๋กœ ์–ธ๋Ÿฌ๋‹ํ•œ ํ›„, ๋ชจ๋ธ์€ ๊ทธ๋Ÿฌํ•œ ์œ„ํ—˜ํ•œ ์ฝ˜ํ…์ธ  ์ œ๊ณต์„ ์ ์ ˆํžˆ ๊ฑฐ๋ถ€ํ•˜์—ฌ ๋” ์•ˆ์ „ํ•œ ํ–‰๋™์„ ๋ณด์žฅํ•ฉ๋‹ˆ๋‹ค.

๊ฒฐ๋ก 


๋จธ์‹  ์–ธ๋Ÿฌ๋‹์€ ์‚ฌ์šฉ์ž ํ”„๋ผ์ด๋ฒ„์‹œ๋ฅผ ์กด์ค‘ํ•˜๊ณ  ๋ฐ์ดํ„ฐ ๋ณดํ˜ธ ๊ทœ์ •์„ ์ค€์ˆ˜ํ•˜๋Š” ์‹ ๋ขฐํ•  ์ˆ˜ ์žˆ๋Š” AI ์‹œ์Šคํ…œ ๊ตฌ์ถ•์— ํ•„์ˆ˜์ ์ž…๋‹ˆ๋‹ค. ๋„์ „์€ ๋ชจ๋ธ ์„ฑ๋Šฅ์„ ์œ ์ง€ํ•˜๋ฉด์„œ ํŠน์ • ๋ฐ์ดํ„ฐ์˜ ์˜ํ–ฅ์„ ํšจ๊ณผ์ ์œผ๋กœ ์ œ๊ฑฐํ•˜๋Š” ๊ฒƒ์œผ๋กœ, ๊ธฐ์กด ๋ฐฉ๋ฒ•๋“ค์ด ์–ด๋ ค์›Œํ•˜๋Š” ์ž‘์—…์ž…๋‹ˆ๋‹ค.

๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” Unlearning-Aware Minimization (UAM)์€ ํšจ๊ณผ์ ์œผ๋กœ ์žŠ์„ ๋ฐ์ดํ„ฐ ์ œ๊ฑฐํ•˜๋ฉฐ, ์ด๋ฏธ์ง€ ๋ถ„๋ฅ˜๋ถ€ํ„ฐ ๋Œ€๊ทœ๋ชจ ์–ธ์–ด ๋ชจ๋ธ๊นŒ์ง€ ํ†ตํ•ฉ์ ์œผ๋กœ ์ ์šฉ์ด ๋  ์ˆ˜ ์žˆ์Œ์„ ๋ณด์˜€์Šต๋‹ˆ๋‹ค. ๋‚˜์•„๊ฐ€, ๊ธฐ์šธ๊ธฐ ์ •๋ ฌ ๋ถ„์„ ๋“ฑ์„ ํ†ตํ•ด ํ•ด๋‹น ๊ธฐ๋ฒ•์˜ ์ˆ˜ํ•™์  ์›๋ฆฌ๋ฅผ ๋ถ„์„ํ•˜์—ฌ ์ถ”ํ›„ ์—ฐ๊ตฌ๋กœ ์ด์–ด์ง€๋Š” ์ด๋ก ์  ํ†ต์ฐฐ์„ ์ œ๊ณตํ•˜์˜€์Šต๋‹ˆ๋‹ค.

๋จธ์‹  ์–ธ๋Ÿฌ๋‹ ๋ถ„์•ผ๋Š” ์—ฌ์ „ํžˆ ์ด๋ก ์  ๋ณด์žฅ๋ถ€ํ„ฐ ๋” ํšจ์œจ์ ์ธ ์•Œ๊ณ ๋ฆฌ์ฆ˜๊นŒ์ง€ ๋งŽ์€ ์—ด๋ฆฐ ๊ณผ์ œ๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. UAM์ด ๋†’์€ ์„ฑ๋Šฅ์„ ์œ ์ง€ํ•˜๋ฉด์„œ ์ง„ํ™”ํ•˜๋Š” ๋ฐ์ดํ„ฐ ์š”๊ตฌ์‚ฌํ•ญ์— ์ ์‘ํ•  ์ˆ˜ ์žˆ๋Š” ํ”„๋ผ์ด๋ฒ„์‹œ ๋ณดํ˜ธ AI ์‹œ์Šคํ…œ ๊ฐœ๋ฐœ์— ๊ธฐ์—ฌํ•˜๊ธฐ๋ฅผ ๋ฐ”๋ž๋‹ˆ๋‹ค.

๊ตฌํ˜„


๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๋จธ์‹ ์–ธ๋Ÿฌ๋‹(Machine Unlearning)์˜ ๋ฐœ์ „์— ์ด๋ฐ”์ง€ํ•˜๊ธฐ ์œ„ํ•ด ๋‹ค์Œ์˜ ์˜คํ”ˆ์†Œ์Šค ํŒจํ‚ค์ง€๋ฅผ ๊ณต๊ฐœํ•˜์˜€์Šต๋‹ˆ๋‹ค: machine-unlearning-pytorch [GitHub]

import torchunlearn
from torchunlearn.unlearn.trainers.uam import UAM

# ์‚ฌ์ „ ํ›ˆ๋ จ๋œ ๋ชจ๋ธ ๋กœ๋“œ
model = torchunlearn.utils.load_model(model_name="ResNet18", n_classes=10)
rmodel = torchunlearn.RobModel(model, n_classes=10)

# UAM ํŠธ๋ ˆ์ด๋„ˆ ์„ค์ •
trainer = UAM(rmodel, rho=0.01, gamma=2.0)

# ์ตœ์ ํ™” ๊ตฌ์„ฑ
trainer.setup(
    optimizer="SGD(lr=0.01, momentum=0.9, weight_decay=5e-4)",
    scheduler=None,
    n_epochs=5
)

# ์–ธ๋Ÿฌ๋‹์œผ๋กœ ํ›ˆ๋ จ
trainer.fit(
    train_loaders=merged_loader,  # includes both retain and forget data
    n_epochs=5,
    save_path="./models/unlearned",
    save_best={"Clean(R)": "HB", "Clean(F)": "LBO"}
)

์—ฐ๊ตฌ์‹ค ๊ด€๋ จ ๋…ผ๋ฌธ๋“ค

  • Fantastic Robustness Measures: The Secrets of Robust Generalization [NeurIPS 2023] | [Paper] | [Article]
  • Stability Analysis of Sharpness-Aware Minimization [arXiv 2023] | [Paper] | [Article]
  • Differentially Private Sharpness-Aware Training [ICML 2023] | [Paper] | [Code]