์„œ๋ก 


์ธ๊ณต์ง€๋Šฅ(Artificial Intelligence) ๋ชจ๋ธ์ด ํ˜„์‹ค์— ๋„์ž…๋  ๋•Œ์—๋Š” โ€˜๋Œ€์ฒด๋กœโ€™ ์ž˜ ์ž‘๋™ํ•˜๋Š” ๊ฒƒ์€ ์˜คํžˆ๋ ค ์ตœ์†Œ ์กฐ๊ฑด์ด ๋ฉ๋‹ˆ๋‹ค. ์˜คํžˆ๋ ค, ์šฐ๋ฆฌ๊ฐ€ ํ•ด๋‹น ๋ชจ๋ธ์„ ์‹ ๋ขฐํ•  ์ˆ˜ ์žˆ๋Š”์ง€๊ฐ€ ์ค‘์š”ํ•ด์ง‘๋‹ˆ๋‹ค.

์ ๋Œ€์  ๊ฐ•๊ฑด์„ฑ(Adversarial Robustness)์ด๋ž€ โ€œ์•…์˜์ ์ธ ๊ณต๊ฒฉ์ž๊ฐ€ ์ž…๋ ฅ์— ํŠน์ • ๋…ธ์ด์ฆˆ(Noise)๋ฅผ ์ฃผ์—ˆ์„ ๋•Œ์—๋„ ๋ชจ๋ธ์ด ์ž˜ ์ž‘๋™ํ•  ์ˆ˜ ์žˆ๋Š”๊ฐ€?โ€๋ฅผ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค. ํ˜„์žฌ ๋”ฅ๋Ÿฌ๋‹(Deep Learning)์„ ํฌํ•จํ•œ ์ธ๊ณต์ง€๋Šฅ ๊ธฐ์ˆ ์ด ์ง๋ฉดํ•˜๊ณ  ์žˆ๋Š” ๋งŽ์€ ๊ฒฐ์ ๋“ค ์ค‘ ๊ฐ€์žฅ ํฐ ๊ฒฐ์ ์ด๊ธฐ๋„ ํ•ฉ๋‹ˆ๋‹ค.

โ€œFantastic Robustness Measures: The Secrets of Robust Generalizationโ€ [Paper, Repo] ์€ ์ธ๊ณต์ง€๋Šฅ ์ตœ์šฐ์ˆ˜ ํ•™ํšŒ์ธ NeurIPS 2023์—์„œ ๋ฐœํ‘œ๋œ ๋ณธ ์—ฐ๊ตฌ์‹ค์˜ ๋…ผ๋ฌธ์ด๋ฉฐ, ๋ณธ ๊ธ€์—์„œ๋Š” ๊ธฐ๋ณธ์ ์ธ ์ˆ˜ํ•™๊ณผ ๊ฐ„๋‹จํ•œ ์ฝ”๋“œ๋ฅผ ํ†ตํ•ด ์ ๋Œ€์  ๊ฐ•๊ฑด์„ฑ์˜ ๊ฐœ๋…์„ ์•Œ์•„๋ณด๊ณ ์ž ํ•ฉ๋‹ˆ๋‹ค.

์‚ฌ์ „ ์ง€์‹


์ ๋Œ€์  ์˜ˆ์ œ์™€ ์ ๋Œ€์  ๊ณต๊ฒฉ

Source: https://adversarial-ml-tutorial.org/introduction/ [NeurIPS 2018 tutorial, โ€œAdversarial Robustness: Theory and Practiceโ€]

์ ๋Œ€์  ๊ฐ•๊ฑด์„ฑ์„ ๊ฐ€์žฅ ์ž˜ ์ดํ•ดํ•˜๋Š” ๋ฐฉ๋ฒ•์€ โ€œ์ ๋Œ€์  ์˜ˆ์ œ๋ฅผ ๋งŒ๋“ค์–ด๋ณด๋Š” ๊ฒƒโ€์ž…๋‹ˆ๋‹ค. ์ ๋Œ€์  ์˜ˆ์ œ(Adversarial Example)์ด๋ž€ ๊ณต๊ฒฉ์ž๊ฐ€ ์ •์ƒ์ ์ธ ์ž…๋ ฅ(Benign Input)์— ์•…์˜์ ์ธ ๋…ธ์ด์ฆˆ(Adversarial Noise)๋ฅผ ์‚ฝ์ž…ํ•œ ๊ฒƒ์ž…๋‹ˆ๋‹ค. ์ด ๋•Œ ์‚ฌ์šฉ๋˜๋Š” ๋…ธ์ด์ฆˆ๋Š” ์„ญ๋™(Perturbation)์ด๋ผ๊ณ ๋„ ๋ถˆ๋ ค์ง‘๋‹ˆ๋‹ค.

์šฐ์„ , PyTorch ๋‚ด์˜ ์‚ฌ์ „ ํ›ˆ๋ จ๋œ ResNet50 ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•˜์—ฌ ์ •์ƒ์ ์ธ(Benign) ๋ผ์ง€ ์‚ฌ์ง„์„ ๋ถ„๋ฅ˜ํ•ด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

'Show Pig' licensed under CC BY 2.0

(1) ์šฐ์„  ์‚ฌ์ง„์„ ์ฝ์–ด๋“ค์ด๊ณ  224x224๋กœ ์‚ฌ์ด์ฆˆ๋ฅผ ๋ณ€ํ™˜ํ•ฉ๋‹ˆ๋‹ค.

import matplotlib.pyplot as plt
from PIL import Image
from torchvision import transforms

# read the image, resize to 224 and convert to a PyTorch Tensor
pig_img = Image.open("pig.jpg")
preprocess = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
])
pig_tensor = preprocess(pig_img)[None,:,:,:]

# plot the image (numpy uses HWC whereas PyTorch uses CHW, so we need to convert)
plt.imshow(pig_tensor[0].numpy().transpose(1,2,0))

(2) ํฌ๊ธฐ๊ฐ€ ์กฐ์ ˆ๋œ ์ด๋ฏธ์ง€์— ์ •๊ทœํ™”(Normalization)์„ ๊ฑฐ์นœ ํ›„, ํ•™์Šต๋œ ResNet50 ๋ชจ๋ธ์„ ๋ถˆ๋Ÿฌ์™€์„œ ์‚ฌ์ง„์„ ๋ถ„๋ฅ˜ํ•ด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

import torch
import torch.nn as nn
from torchvision.models import resnet50

# simple Module to normalize an image
class Normalize(nn.Module):
    def __init__(self, mean, std):
        super(Normalize, self).__init__()
        self.mean = torch.Tensor(mean)
        self.std = torch.Tensor(std)
    def forward(self, x):
        return (x - self.mean.type_as(x)[None,:,None,None]) / self.std.type_as(x)[None,:,None,None]

# values are standard normalization for ImageNet images, 
# from https://github.com/pytorch/examples/blob/master/imagenet/main.py
norm = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

# load pre-trained ResNet50, and put into evaluation mode (necessary to e.g. turn off batchnorm)
model = resnet50(pretrained=True)
model.eval()

# form the prediction
pred = model(norm(pig_tensor))

# interpret the prediction using the ImageNet class labels
import json
with open("imagenet_class_index.json") as f:
    imagenet_classes = {int(i):x[1] for i,x in json.load(f).items()}
print(imagenet_classes[pred.max(dim=1)[1].item()])
hog

์œ„ ๊ฒฐ๊ณผ๋ฅผ ํ†ตํ•ด, ๋ชจ๋ธ์ด ํ•ด๋‹น ์‚ฌ์ง„์€ ๋ผ์ง€(โ€œhogโ€)์ž„์„ ์ •ํ™•ํžˆ ๋งž์ถ˜ ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋ชจ๋ธ์ด ์ •๋‹ต์„ ์ž˜ ์ถœ๋ ฅํ•  ์ˆ˜ ์žˆ๋Š” ์ด์œ ๋Š”, ๋ชจ๋ธ์˜ ์•„๋ž˜์™€ ๊ฐ™์€ ํ•™์Šต ๋ชฉํ‘œ๋ฅผ ์ž˜ ๋‹ฌ์„ฑํ–ˆ๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค.

\begin{equation} \label{eq:min} \min_\theta \ell(h_\theta(x), y) \end{equation}

์ด ๋•Œ, \(h\)๋Š” ๋ชจ๋ธ์„ ์˜๋ฏธํ•˜๋ฉฐ, \(\theta\)๋Š” ํ•™์Šต ๋Œ€์ƒ์ด ๋˜๋Š” ๋ชจ๋ธ์˜ ๋งค๊ฐœ๋ณ€์ˆ˜(parameter) ์˜๋ฏธ ํ•ฉ๋‹ˆ๋‹ค. \(h_\theta(x)\)์™€ \(y\) ์‚ฌ์ด์˜ ์ฐจ์ด๋ฅผ ์ •์˜ํ•˜๋Š” ์†์‹คํ•จ์ˆ˜(loss function) \(\ell\)๋ฅผ ์ตœ์†Œํ™”ํ•˜์—ฌ, ์šฐ๋ฆฌ๋Š” ๋ชจ๋ธ์ด ํŠน์ • ์ด๋ฏธ์ง€ \(x\)์— ๋Œ€ํ•œ ๊ฒฐ๊ณผ๊ฐ’์ธ \(h_\theta(x)\)์™€ ์ •๋‹ต์ธ \(y\)๊ณผ ์œ ์‚ฌํ•œ ์˜ˆ์ธก์„ ํ•  ์ˆ˜ ์žˆ๋„๋ก ์œ ๋„ํ•ฉ๋‹ˆ๋‹ค.

์ ๋Œ€์  ์˜ˆ์ œ๋Š” โ€œ๋ชจ๋ธ์„ ์†์ด๊ธฐโ€ ์œ„ํ•ด ๊ณ ์•ˆ๋œ ๊ฐœ๋…์ž…๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ, ์ ๋Œ€์  ์˜ˆ์ œ๋Š” ์œ„์˜ ํ•™์Šต ๋ชฉํ‘œ๋ฅผ ์ €ํ•ดํ•˜๊ธฐ ์œ„ํ•ด ์ตœ์†Œํ™”ํ–ˆ๋˜ ์†์‹คํ•จ์ˆ˜๋ฅผ ์—ญ์œผ๋กœ ์ตœ๋Œ€ํ™”ํ•˜๋Š” ๋ฐ์— ์ค‘์ ์„ ๋‘ก๋‹ˆ๋‹ค.

\begin{equation} \label{eq:max} \max_{\hat{x}} \ell(h_\theta(\hat{x}), y) \end{equation}

์œ„ ์‹์€ \(\ell(h_\theta(\hat{x}), y)\)์„ ์ตœ๋Œ€ํ™”ํ•˜๋Š” ์ƒˆ๋กœ์šด ์ด๋ฏธ์ง€์ธ \(\hat{x}\)๋ฅผ ์ฐพ๋Š” ๊ฒƒ์„ ๋ชฉํ‘œ๋กœ ํ•ฉ๋‹ˆ๋‹ค. ๋ณธ ๊ณผ์ •์„ ๊ฑฐ์ณ ์ƒ์„ฑ๋œ ์ด๋ฏธ์ง€ ํ˜น์€ ์˜ˆ์ œ๋ฅผ ์ ๋Œ€์  ์˜ˆ์ œ(adversarial example)๋ผ๊ณ  ๋ถ€๋ฅด๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

๋‚˜์•„๊ฐ€, ์•…์˜์ ์ธ ์‚ฌ์šฉ์ž๋Š” ์‚ฌ๋žŒ์ด ๋ณด๊ธฐ์—๋Š” ๋ผ์ง€์ด์ง€๋งŒ, ๋”ฅ๋Ÿฌ๋‹ ๋ชจ๋ธ์€ ๋ผ์ง€๊ฐ€ ์•„๋‹ˆ๋ผ๊ณ  ํ•˜๋Š” ์˜ˆ์ œ๋ฅผ ๋งŒ๋“œ๋Š” ๊ฒƒ์ด ๋ชฉํ‘œ์ž…๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ, ์ ๋Œ€์  ์˜ˆ์ œ \(\hat{x}=x+\delta\)๋ฅผ ๋งŒ๋“ค ๋•Œ ๋”ํ•ด์ง€๋Š” ๋…ธ์ด์ฆˆ \(\delta\)๋Š” ์‚ฌ๋žŒ์ด ๋ˆˆ์น˜์ฑ„์ง€ ๋ชปํ•˜๋Š” ํฌ๊ธฐ๋ฅผ ๊ฐ€์ง€๋„๋ก ์ œํ•œ๋ฉ๋‹ˆ๋‹ค.

\begin{equation} \label{eq:max2} \max_{\delta\in\Delta} \ell(h_\theta(x+\delta), y) \end{equation}

๋ณธ ์กฐ๊ฑด ํ•˜์— ๊ตฌํ•ด์ง„ ๋…ธ์ด์ฆˆ \(\delta\)๋ฅผ ์ ๋Œ€์  ์„ญ๋™(adversarial perturbation) ํ˜น์€ ์ ๋Œ€์  ๋…ธ์ด์ฆˆ(adversarial noise)๋ผ๊ณ  ๋ถ€๋ฅด๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

์œ„๋ฅผ Pytorch๋กœ ๊ตฌํ˜„ํ•œ๋‹ค๋ฉด ์•„๋ž˜์™€ ๊ฐ™์Šต๋‹ˆ๋‹ค.

import torch.optim as optim
epsilon = 2./255

delta = torch.zeros_like(pig_tensor, requires_grad=True)
opt = optim.SGD([delta], lr=1e-1)

for t in range(30):
    pred = model(norm(pig_tensor + delta))   # predict after adding the perturbation
    loss = -nn.CrossEntropyLoss()(pred, torch.LongTensor([341]))  # 341 = "hog"; note the minus sign
    if t % 5 == 0:
        print(t, loss.item())
    
    opt.zero_grad()
    loss.backward()  # descending the negated loss maximizes the true-class loss
    opt.step()
    delta.data.clamp_(-epsilon, epsilon)  # keep the perturbation within the epsilon budget
    
print("True class probability:", nn.Softmax(dim=1)(pred)[0,341].item())
0 -0.0038814544677734375
5 -0.00693511962890625
10 -0.015821456909179688
15 -0.08086681365966797
20 -12.229072570800781
25 -14.300384521484375
True class probability: 1.4027455108589493e-06
The resulting adversarial example ('Show Pig' licensed under CC BY 2.0)

์œ„ ๋ผ์ง€ ์ด๋ฏธ์ง€๋Š” ์‚ฌ๋žŒ์˜ ๋ˆˆ์—๋Š” ํ‹€๋ฆผ์—†์ด ๋ผ์ง€์ด์ง€๋งŒ, ๋ชจ๋ธ์˜ ๋ˆˆ์—๋Š” ์•„๋ž˜์™€ ๊ฐ™์ด 99%์˜ ํ™•๋ฅ ๋กœ ์›œ๋ฑƒ(wombat)์ด๋ผ๋Š” ๋‹ค๋ฅธ ๋™๋ฌผ๋กœ ๋ถ„๋ฅ˜๋ฉ๋‹ˆ๋‹ค.

Predicted class:  wombat
Predicted probability: 0.9997960925102234

์ด๋ฅผ ์‘์šฉํ•˜๋ฉด, ๋ชจ๋ธ์ด ์šฐ๋ฆฌ๊ฐ€ ์›ํ•˜๋Š” ๋‹ต์„ ๋‚ด๋†“๋„๋กํ•˜๋Š” ์ ๋Œ€์  ์˜ˆ์ œ๋ฅผ ๊ตฌํ•  ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค.

delta = torch.zeros_like(pig_tensor, requires_grad=True)
opt = optim.SGD([delta], lr=5e-3)

for t in range(100):
    pred = model(norm(pig_tensor + delta))
    # push the prediction away from "hog" (341) and toward "airliner" (404)
    loss = (-nn.CrossEntropyLoss()(pred, torch.LongTensor([341])) + 
            nn.CrossEntropyLoss()(pred, torch.LongTensor([404])))
    if t % 10 == 0:
        print(t, loss.item())
    
    opt.zero_grad()
    loss.backward()
    opt.step()
    delta.data.clamp_(-epsilon, epsilon)
0 24.00604820251465
10 -0.1628284454345703
20 -8.026773452758789
30 -15.677117347717285
40 -20.60370635986328
50 -24.99606704711914
60 -31.009849548339844
70 -34.80946350097656
80 -37.928680419921875
90 -40.32395553588867
max_class = pred.max(dim=1)[1].item()
print("Predicted class: ", imagenet_classes[max_class])
print("Predicted probability:", nn.Softmax(dim=1)(pred)[0,max_class].item())
Predicted class:  airliner
Predicted probability: 0.9679961204528809
์—ฌ๊ฐ๊ธฐ๋กœ ๋ถ„๋ฅ˜๋˜๋„๋ก ๋งŒ๋“ค์–ด์ง„ ์ ๋Œ€์  ์˜ˆ์ œ์™€ ์ ๋Œ€์  ์„ญ๋™

์ด ์™ธ์—๋„ ๋‹ค์–‘ํ•œ ์ ๋Œ€์  ์˜ˆ์ œ๋ฅผ ์ƒ์„ฑํ•ด๋‚ด๋Š” ๋‹ค์ˆ˜์˜ ์ ๋Œ€์  ๊ณต๊ฒฉ ๋ฐฉ๋ฒ•์ด ์กด์žฌํ•ฉ๋‹ˆ๋‹ค. ์ž์„ธํ•œ ์‚ฌํ•ญ์€ torchattacks๋ฅผ ์ฐธ๊ณ ๋ฐ”๋ž๋‹ˆ๋‹ค.

์ ๋Œ€์  ๊ฐ•๊ฑด์„ฑ๊ณผ ์ ๋Œ€์  ๋ฐฉ์–ด

2003๋…„์— ์ ๋Œ€์  ์˜ˆ์ œ์˜ ์กด์žฌ๊ฐ€ ๋ฐœ๊ฒฌ๋œ ์ดํ›„, ์„ ํ–‰ ๋…ผ๋ฌธ๋“ค์€ ๋ชจ๋ธ์ด ์ ๋Œ€์  ์˜ˆ์ œ์— ๋Œ€ํ•ด์„œ๋„ ์ •ํ™•ํ•œ ๊ฒฐ๊ณผ๋ฅผ ๋‚ผ ์ˆ˜ ์žˆ๋Š” ๋ฐฉ๋ฒ•์„ ๊ณ ์•ˆํ•ด์™”์Šต๋‹ˆ๋‹ค. ์ ๋Œ€์  ๊ฐ•๊ฑด์„ฑ(Adversarial Robustness)์€ โ€˜๋ชจ๋ธ์ด ์ ๋Œ€์  ๊ณต๊ฒฉ์—๋„ ์–ผ๋งˆ๋‚˜ ์ž˜ ๋ฒ„ํ‹ธ ์ˆ˜ ์žˆ๋Š”์ง€โ€™๋ฅผ ์ˆ˜์น˜ํ™”ํ•œ ์ง€ํ‘œ๋ผ๊ณ  ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋‚˜์•„๊ฐ€, ์ ๋Œ€์  ๊ณต๊ฒฉ์— ๋Œ€ํ•œ ๊ฐ•๊ฑด์„ฑ์„ ๋†’์ด๊ธฐ ์œ„ํ•œ ๋ฐฉ๋ฒ•์„ ์ ๋Œ€์  ๋ฐฉ์–ด(Adversarial Defense)๋ผ๊ณ  ๋ถ€๋ฅด๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

๋‹ค์–‘ํ•œ ์ ๋Œ€์  ๋ฐฉ์–ด ๊ธฐ๋ฒ•๋“ค์ด ์ œ์•ˆ๋˜์—ˆ์ง€๋งŒ, ๊ทธ ์ค‘์—์„œ๋„ ํ™œ๋ฐœํ•˜๊ฒŒ ์—ฐ๊ตฌ๋˜๊ณ  ์žˆ๋Š” ๋ฐฉ์–ด ๊ธฐ๋ฒ•์€ ์ ๋Œ€์  ํ•™์Šต(Adversarial Training)์ž…๋‹ˆ๋‹ค. ์ ๋Œ€์  ํ•™์Šต์ด๋ž€ ๋ชจ๋ธ ํ•™์Šต ์ค‘์— ์ ๋Œ€์  ์˜ˆ์ œ๋ฅผ ๋งž์ถ”๋„๋ก ํ•™์Šตํ•˜์—ฌ ๊ฐ•๊ฑด์„ฑ์„ ๋†’์ด๋Š” ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค. ์ฆ‰, ๋ฐฑ์‹ ์„ ๋งž๋Š” ๊ฒƒ๊ณผ ์œ ์‚ฌํ•œ ์›๋ฆฌ๊ฐ€ ๋˜๊ฒ ์Šต๋‹ˆ๋‹ค.

์ ๋Œ€์  ํ•™์Šต ๋ฐฉ๋ฒ•์€ ํ•™์Šต ์‹œ ์ ๋Œ€์  ์˜ˆ์ œ๋ฅผ ํ•™์Šต์‹œ์ผœ ๋ฐฑ์‹ ์„ ๋งž๋Š” ๊ฒƒ๊ณผ ๊ฐ™์€ ํšจ๊ณผ๋ฅผ ์ค€๋‹ค.

์ด๋Š” ์ˆ˜ํ•™์ ์œผ๋กœ ๋‹ค์Œ๊ณผ ๊ฐ™์€ min-max ๋ฌธ์ œ๊ฐ€ ๋ฉ๋‹ˆ๋‹ค.

\begin{equation} \min_{\theta} \max_{\delta\in\Delta} \ell(h_\theta(x+\delta), y) \end{equation}

์œ„์˜ min-max ๋ฌธ์ œ๋ฅผ ํ’€๊ธฐ ์œ„ํ•ด, ๋‹ค์–‘ํ•œ ์ ๋Œ€์  ๋ฐฉ์–ด ๊ธฐ๋ฒ•(AT, TRADES, MART ๋“ฑ)์ด ์ œ์•ˆ๋˜์—ˆ์œผ๋ฉฐ, ๋น„์•ฝ์ ์ธ ๊ฐ•๊ฑด์„ฑ ํ–ฅ์ƒ์„ ์ด๋ฃจ์–ด๋ƒˆ์Šต๋‹ˆ๋‹ค. ๋ณด๋‹ค ์ตœ์‹  ์ˆ˜์น˜๋Š” https://robustbench.github.io/๋ฅผ ์ฐธ๊ณ  ๋ฐ”๋ž๋‹ˆ๋‹ค.

๋ณธ๋ก 


ํ˜„์žฌ๊นŒ์ง€์˜ ๋‹ค์–‘ํ•œ ์ ๋Œ€์  ๋ฐฉ์–ด ๊ธฐ๋ฒ•๋“ค์€ ๊ฐ์ž ๋‹ค๋ฅธ ๋ฐฉ์‹์œผ๋กœ ๋†’์€ ๊ฐ•๊ฑด์„ฑ์„ ๋‹ฌ์„ฑํ•ด์™”์Šต๋‹ˆ๋‹ค. ๊ทธ ๊ณผ์ •์—์„œ, ์„ ํ–‰ ๋…ผ๋ฌธ๋“ค์€ โ€œ๋ชจ๋ธ์˜ ํŠน์ • ํŠน์„ฑ(measure)์ด ์ข‹์œผ๋ฉด, ๊ฐ•๊ฑด์„ฑ๋„ ์ข‹๋‹คโ€๋ผ๋Š” ์ „๊ฐœ ๋ฐฉ์‹์„ ์ฑ„ํƒํ•ด์˜ค๊ธฐ๋„ ํ–ˆ์Šต๋‹ˆ๋‹ค. ํŠน์ • ํŠน์„ฑ์œผ๋กœ๋Š” ๋งˆ์ง„(margin), ๊ฒฝ๊ณ„๋ฉด ๋‘๊ป˜(boundary thickness), ๋ฆฝ์‹œ์ธ  ๊ณ„์ˆ˜(Lipschitz Value) ๋“ฑ์ด ๊ฑฐ๋ก ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

๋ณธ ๋…ผ๋ฌธ์€ โ€œ๊ณผ์—ฐ ๊ทธ๋Ÿฌํ•œ ์—ฐ๊ตฌ ๊ฐ€์„ค๋“ค์ด ์‹คํ—˜์ ์œผ๋กœ๋„ ๊ฒ€์ฆ๋  ์ˆ˜ ์žˆ๋Š”๊ฐ€?โ€์— ๋Œ€ํ•œ ๊ณ ์ฐฐ์„ ๋‹ด๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ์„ ํ–‰ ๋…ผ๋ฌธ์—์„œ ์ž์ฃผ ์‚ฌ์šฉ๋˜๋Š” 8๊ฐœ์˜ ํ•™์Šต ํ™˜๊ฒฝ(๋ชจ๋ธ ๊ตฌ์กฐ, ์ ๋Œ€์  ๋ฐฉ์–ด ๊ธฐ๋ฒ•, ๋ฐฐ์น˜ ์‚ฌ์ด์ฆˆ ๋“ฑ)์„ ๊ณ ๋ คํ•˜์—ฌ ์ด 1,300๊ฐœ๊ฐ€ ๋„˜๋Š” ๋ชจ๋ธ์„ CIFAR-10 ์ด๋ฏธ์ง€ ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•ด ํ•™์Šต์‹œ์ผฐ์Šต๋‹ˆ๋‹ค. ๊ทธ๋ฆฌ๊ณ , ๊ฐ ๋ชจ๋ธ์˜ ํŠน์„ฑ๋“ค์„ ์ธก์ •ํ•œ ๋’ค, ํ•ด๋‹น ํŠน์„ฑ(measure)์ด ์‹ค์ œ๋กœ ๊ฐ•๊ฑด์„ฑ(robustness)์™€ ์œ ์˜๋ฏธํ•œ ๊ด€๊ณ„๋ฅผ ๊ฐ–๋Š”์ง€ ํŒŒ์•…ํ•˜์˜€์Šต๋‹ˆ๋‹ค.

๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๊ฐ•๊ฑด์„ฑ ๊ฐ’ ์ž์ฒด๋ณด๋‹ค๋„ ํ•ด๋‹น ๋ชจ๋ธ์ด ํ•™์Šต ๋ฐ์ดํ„ฐ(training set)์— ๋Œ€ํ•œ ๊ฐ•๊ฑด์„ฑ๊ณผ ํ‰๊ฐ€ ๋ฐ์ดํ„ฐ(test set)์— ๋Œ€ํ•œ ๊ฐ•๊ฑด์„ฑ์˜ ์ฐจ์ด๊ฐ€ ์–ผ๋งˆ๋‚˜ ์ž‘์€์ง€๋ฅผ ํ™•์ธํ•˜๊ธฐ ์œ„ํ•ด โ€œ๊ฐ•๊ฑด์„ฑ ์ผ๋ฐ˜ํ™” ์ฐจ์ด(Robust generalization gap)โ€๋ฅผ ์ธก์ •ํ•˜์˜€์Šต๋‹ˆ๋‹ค. (โ€ป ๋ถ€๋ก์„ ํ†ตํ•ด ๊ฐ•๊ฑด์„ฑ ๊ฐ’ ์ž์ฒด๋ฅผ ์ธก์ •ํ•˜๋ฉด ํฌ๊ฒŒ ์œ ์˜๋ฏธํ•œ ๊ฒฐ๊ณผ๊ฐ€ ์—†์Œ์„ ํŒŒ์•…ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค)

๋‹ค์–‘ํ•œ ํ•™์Šต ํ™˜๊ฒฝ์—์„œ ํ•™์Šต๋œ 1,300๊ฐœ ๋ชจ๋ธ์˜ ํ•™์Šต/ํ‰๊ฐ€ ๊ฐ•๊ฑด์„ฑ (์™ผ์ชฝ)๊ณผ ๊ทธ ์ฐจ์ด๋ฅผ ์˜๋ฏธํ•˜๋Š” ๊ฐ•๊ฑด์„ฑ ์ผ๋ฐ˜ํ™” ์ฐจ์ด (์˜ค๋ฅธ์ชฝ).

๋ณธ ๋…ผ๋ฌธ์—์„œ ์ œ์•ˆ๋œ ํ‰๊ฐ€ ๋ฐฉ๋ฒ•์— ์˜ํ•œ ํ™•์ธ ๊ฒฐ๊ณผ, ์„ ํ–‰ ์—ฐ๊ตฌ์—์„œ ์ œ์•ˆ๋œ ํŠน์„ฑ๋“ค ์ค‘ ์™„๋ฒฝํžˆ ๊ฐ•๊ฑด์„ฑ๊ณผ ๋น„๋ก€ํ•˜๋Š” ํŠน์„ฑ์€ ์กด์žฌํ•˜์ง€ ์•Š์•˜์Šต๋‹ˆ๋‹ค. ํŠนํžˆ, ์ ๋Œ€์  ๋ฐฉ์–ด ๋ฐฉ๋ฒ•์— ๋”ฐ๋ผ์„œ๋„ ํŽธ์ฐจ๊ฐ€ ํฐ ํŠน์„ฑ๋“ค์ด ๋งŽ์•˜์œผ๋ฉฐ, ๋Œ€ํ‘œ์ ์œผ๋กœ ๊ฒฝ๊ณ„๋ฉด ๋‘๊ป˜(boundary thickness)๊ฐ€ ๊ทธ๋Ÿฌํ•˜์˜€์Šต๋‹ˆ๋‹ค.

์ ๋Œ€์  ๋ฐฉ์–ด ๋ฐฉ๋ฒ•์— ๋”ฐ๋ฅธ ํŠน์„ฑ๊ณผ ๊ฐ•๊ฑด์„ฑ ์‚ฌ์ด์˜ ์ƒ๊ด€๊ด€๊ณ„

์˜คํžˆ๋ ค, ๊ธฐ์กด์— ๊ฐ•๊ฑด์„ฑ๊ณผ ๊ธด๋ฐ€ํ•œ ์ƒ๊ด€ ๊ด€๊ณ„๊ฐ€ ์žˆ๋‹ค๊ณ  ์•Œ๋ ค์ง„ ๋งˆ์ง„(margin)์ด๋‚˜ ์†์‹คํ•จ์ˆ˜์˜ ํ‰ํ‰ํ•จ(Flatness)๋Š” ๊ธฐ์กด ํ•ด์„๋“ค๊ณผ ์ •๋ฐ˜๋Œ€๋˜๋Š” ๋ชจ์Šต์„ ๋ณด์ด๊ธฐ๋„ ํ–ˆ์Šต๋‹ˆ๋‹ค.

ํŠน์ • ์กฐ๊ฑด ํ•˜์—์„œ๋Š” ๊ธฐ์กด์— ์ฃผ๋ชฉ์„ ์ž˜ ๋ฐ›์ง€ ๋ชปํ–ˆ๋˜ ์ž…๋ ฅ ๊ธฐ์šธ๊ธฐ์˜ ํฌ๊ธฐ(Input gradient norm)์ด ๊ฐ•๊ฑด์„ฑ ์ผ๋ฐ˜ํ™” ์ฐจ์ด์™€ ๊ฐ€์žฅ ๋†’์€ ์ƒ๊ด€๊ด€๊ณ„๊ฐ€ ์žˆ์Œ์„ ๋ณด์ด๊ธฐ๋„ ํ–ˆ์Šต๋‹ˆ๋‹ค.

๋ณธ ๋…ผ๋ฌธ์€ ์ž์ฒด์ ์œผ๋กœ ํ•™์Šตํ•œ 1,300๊ฐœ ๋ชจ๋ธ ์ด์™ธ์—๋„, https://robustbench.github.io/์— ์—…๋กœ๋“œ๋œ ๋ฒค์น˜๋งˆํฌ(Benchmark) ๋ชจ๋ธ์— ๋Œ€ํ•ด์„œ๋„ ํ‰๊ฐ€๋ฅผ ์ง„ํ–‰ํ•˜์˜€์œผ๋ฉฐ, ์œ ์‚ฌํ•œ ๊ฒฐ๊ณผ๋ฅผ ๋„์ถœํ•ด๋‚ด์—ˆ์Šต๋‹ˆ๋‹ค.

๊ฒฐ๋ก 


๊ฐ•๊ฑด์„ฑ์€ ์ธ๊ณต์ง€๋Šฅ์˜ ์‹ ๋ขฐ์„ฑ ๋ถ€๋ฌธ์—์„œ ํ•ต์‹ฌ ๊ฐœ๋… ์ค‘ ํ•˜๋‚˜์ž…๋‹ˆ๋‹ค. ๋ชจ๋ธ์˜ ๊ฐ•๊ฑด์„ฑ์„ ๋†’์ด๋Š” ๊ฒƒ์€ ๋ฏธ๋ž˜ ์•ˆ์ „ํ•œ ์ธ๊ณต์ง€๋Šฅ ์‚ฌ์šฉ์„ ์œ„ํ•ด ํ™œ๋ฐœํžˆ ์—ฐ๊ตฌ๋˜์–ด์•ผ ํ•  ๋ถ„์•ผ์ž…๋‹ˆ๋‹ค. ๋ณธ ์—ฐ๊ตฌ๋Š” โ€˜A ๋ชจ๋ธ์ด B ๋ชจ๋ธ๋ณด๋‹ค ์ด ํŠน์„ฑ์ด ์ข‹์•„์„œ ๋” ์šฐ์ˆ˜ํ•œ ๋“ฏํ•˜๋‹คโ€™๋ผ๋Š” ๋ช…์ œ๋Š” ์ถฉ๋ถ„ํ•œ ํ…Œ์ŠคํŠธ ๋ฒ ๋“œ๋ฅผ ํ†ตํ•ด ๊ฒ€์ฆ๋˜์–ด์•ผ ํ•จ์„ ์ƒ๊ธฐ์‹œํ‚ค๋ฉฐ, ๋ณด๋‹ค ์›ํ™œํ•œ ๊ฒ€์ฆ์„ ์œ„ํ•ด PyTorch ๊ธฐ๋ฐ˜์˜ ์ ๋Œ€์  ๋ฐฉ์–ด ํ”„๋ ˆ์ž„์›Œํฌ [MAIR]๋ฅผ ์ œ์•ˆํ•˜์˜€์Šต๋‹ˆ๋‹ค. ๋‹ค์–‘ํ•œ ํŠน์„ฑ์— ๋Œ€ํ•œ ๋ณธ ๋…ผ๋ฌธ์˜ ๋ฐœ๊ฒฌ์ด ์ ๋Œ€์  ๊ณต๊ฒฉ์— ๋Œ€ํ•œ ๊ฐ•๊ฑด์„ฑ ๋ถ„์•ผ์˜ ๋ฐœ์ „์— ๊ธฐ์—ฌํ•˜๊ธธ ๋ฐ”๋ž๋‹ˆ๋‹ค.

๊ด€๋ จ ์—ฐ๊ตฌ์‹ค ๋…ผ๋ฌธ

  • Understanding Catastrophic Overfitting in Single-step Adversarial Training [AAAI 2021] | [Paper] | [Code]
  • GradDiv: Adversarial Robustness of Randomized Neural Networks via Gradient Diversity Regularization [IEEE TPAMI] | [Paper] | [Code]
  • Generating Transferable Adversarial Examples for Speech Classification [Pattern Recognition] | [Paper]