Time series forecasting is crucial for applications across multiple domains and various scenarios. Although Transformer models have dramatically shifted the landscape of forecasting, their effectiveness remains debated. Recent findings have indicated that simpler linear models might outperform complex Transformer-based approaches, highlighting the potential for more streamlined architectures. In this paper, we shift focus from the overall architecture of the Transformer to the effectiveness of self-attention for time series forecasting. To this end, we introduce a new architecture, Cross-Attention-only Time Series transformer (CATS), that rethinks the traditional Transformer framework by eliminating self-attention and leveraging cross-attention mechanisms instead. By establishing future horizon-dependent parameters as queries and enhanced parameter sharing, our model not only improves long-term forecasting accuracy but also reduces the number of parameters and memory usage. Extensive experiments across various datasets demonstrate that our model achieves superior performance with the lowest mean squared error while using fewer parameters than existing models.
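To make the idea concrete, below is a minimal sketch, in PyTorch, of a cross-attention-only forecaster in the spirit of CATS: a set of learnable, horizon-dependent query vectors attends to the embedded past observations, so no self-attention over the input sequence is used. This is an illustration under my own simplifications (per-step value embedding, a single attention layer), not the authors' implementation.

# Minimal sketch of a cross-attention-only forecaster (illustrative; not the official CATS code).
# Learnable queries, one per forecast step, attend to the embedded past observations.
import torch
import torch.nn as nn

class CrossAttentionForecaster(nn.Module):
    def __init__(self, input_len, horizon, d_model=64, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(1, d_model)                                # embed each past value
        self.queries = nn.Parameter(torch.randn(horizon, d_model))        # horizon-dependent queries
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(d_model, 1)

    def forward(self, x):                                                 # x: (batch, input_len)
        kv = self.embed(x.unsqueeze(-1))                                  # (batch, input_len, d_model)
        q = self.queries.unsqueeze(0).expand(x.size(0), -1, -1)           # (batch, horizon, d_model)
        out, _ = self.cross_attn(q, kv, kv)                               # queries attend to the past only
        return self.head(out).squeeze(-1)                                 # (batch, horizon)

model = CrossAttentionForecaster(input_len=96, horizon=24)
y_hat = model(torch.randn(8, 96))                                         # -> shape (8, 24)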
AAAI
Fair Sampling in Diffusion Models through Switching Mechanism
Yujin Choi, Jinseong Park, Hoki Kim, Jaewook Lee, and Saerom Park
In Proceedings of the AAAI Conference on Artificial Intelligence, 2024
Diffusion models have shown their effectiveness in generation tasks by well-approximating the underlying probability distribution. However, diffusion models are known to suffer from an amplified inherent bias from the training data in terms of fairness. While the sampling process of diffusion models can be controlled by conditional guidance, previous works have attempted to find empirical guidance to achieve quantitative fairness. To address this limitation, we propose a fairness-aware sampling method called attribute switching mechanism for diffusion models. Without additional training, the proposed sampling can obfuscate sensitive attributes in generated data without relying on classifiers. We mathematically prove and experimentally demonstrate the effectiveness of the proposed method on two key aspects: (i) the generation of fair data and (ii) the preservation of the utility of the generated data.
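As a rough illustration of the sampling-time mechanism, the sketch below switches the conditioning attribute once the reverse-diffusion step drops below a chosen switching point tau; the denoiser and the noise schedules are placeholders assumed here, and the update follows a standard DDPM step rather than the paper's exact sampler.

# Illustrative sketch of attribute switching during diffusion sampling (not the authors' code).
# `denoiser` is a hypothetical conditional noise predictor; `alphas` and `alphas_bar` are the
# usual DDPM schedules (1-D tensors of length T). The attribute is switched at timestep tau.
import torch

@torch.no_grad()
def sample_with_switching(denoiser, shape, attr_a, attr_b, tau, T, alphas, alphas_bar):
    x = torch.randn(shape)
    for t in reversed(range(T)):
        attr = attr_a if t >= tau else attr_b                      # switch the conditioning attribute
        eps = denoiser(x, t, attr)
        coef = (1 - alphas[t]) / (1 - alphas_bar[t]).sqrt()
        x = (x - coef * eps) / alphas[t].sqrt()                    # DDPM posterior mean
        if t > 0:
            x = x + (1 - alphas[t]).sqrt() * torch.randn_like(x)   # add noise except at the last step
    return x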
EAAI
Evaluating practical adversarial robustness of fault diagnosis systems via spectrogram-aware ensemble method
Hoki Kim, Sangho Lee, Jaewook Lee, Woojin Lee, and Youngdoo Son
Engineering Applications of Artificial Intelligence, 2024
While machine learning models have shown superior performance in fault diagnosis systems, researchers have revealed their vulnerability to subtle noises generated by adversarial attacks. Given that this vulnerability can lead to misdiagnosis or unnecessary maintenance, assessing the practical robustness of fault diagnosis models is crucial for their deployment and use in real-world scenarios. However, research on the practical adversarial robustness of fault diagnosis models remains limited. In this work, we present a comprehensive analysis of rotating machinery diagnostics and discover that existing attacks often overestimate the robustness of these models in practical settings. To precisely estimate the practical robustness of models, we propose a novel method that unveils the hidden risks of fault diagnosis models by manipulating the spectrum of signal frequencies, an area that has been rarely explored in the domain of adversarial attacks. Our proposed attack, the Spectrogram-Aware Ensemble Method (SAEM), exposes the hidden vulnerability of fault diagnosis systems by achieving higher attack performance in practical black-box settings. Through experiments, we reveal the potential dangers of employing non-robust fault diagnosis models in real-world applications and suggest directions for future research in industrial applications.
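The sketch below conveys the general flavor of attacking in the spectrogram domain: the STFT magnitude of a signal is perturbed by gradient ascent on a surrogate classifier's loss and mapped back to the time domain with the original phase. It is a simplified, hypothetical example and not the exact SAEM ensemble procedure.

# Sketch of a spectrogram-domain perturbation (illustrative only; not the exact SAEM algorithm).
import torch
import torch.nn.functional as F

def spectrogram_attack(model, signal, label, eps=0.05, steps=10, n_fft=256):
    window = torch.hann_window(n_fft)
    spec = torch.stft(signal, n_fft, window=window, return_complex=True)
    mag, phase = spec.abs(), spec.angle()
    delta = torch.zeros_like(mag, requires_grad=True)
    for _ in range(steps):
        adv_spec = torch.polar((mag + delta).clamp_min(0.0), phase)        # keep the original phase
        adv_signal = torch.istft(adv_spec, n_fft, window=window, length=signal.shape[-1])
        loss = F.cross_entropy(model(adv_signal), label)                   # surrogate model's loss
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + eps / steps * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    adv_spec = torch.polar((mag + delta).clamp_min(0.0), phase)
    return torch.istft(adv_spec, n_fft, window=window, length=signal.shape[-1]).detach()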
NeurIPS
Fantastic Robustness Measures: The Secrets of Robust Generalization
Hoki Kim, Jinseong Park, Yujin Choi, and Jaewook Lee
In Thirty-seventh Conference on Neural Information Processing Systems, 2023
Adversarial training has become the de-facto standard method for improving the robustness of models against adversarial examples. However, robust overfitting remains a significant challenge, leading to a large gap between the robustness on the training and test datasets. To understand and improve robust generalization, various measures have been developed, including margin-, smoothness-, and flatness-based measures. In this study, we present a large-scale analysis of robust generalization to empirically verify whether the relationship between these measures and robust generalization remains valid in diverse settings. We demonstrate when and how these measures effectively capture the robust generalization gap by comparing over 1,300 models trained on CIFAR-10 under the ℓ∞ norm, and further validate our findings through an evaluation of more than 100 models from RobustBench across CIFAR-10, CIFAR-100, and ImageNet. We hope this work can help the community better understand adversarial robustness and motivate the development of more robust defense methods against adversarial attacks.
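As a hypothetical example of the kind of quantity involved, one simple margin-style measure is the average gap between the true-class logit and the largest competing logit; the paper's actual evaluation covers many measures under a more careful protocol.

# Hypothetical example of one margin-style measure (not the paper's full evaluation protocol).
import torch

def output_margin(logits, labels):
    true = logits.gather(1, labels.unsqueeze(1)).squeeze(1)       # logit of the true class
    other = logits.clone()
    other.scatter_(1, labels.unsqueeze(1), float('-inf'))         # mask out the true class
    return (true - other.max(dim=1).values).mean()                # mean margin over the batch

margin = output_margin(torch.randn(32, 10), torch.randint(0, 10, (32,)))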
NeuNet
Bridged Adversarial Training
Hoki Kim, Woojin Lee, Sungyoon Lee, and Jaewook Lee
Adversarial robustness is considered a required property of deep neural networks. In this study, we discover that adversarially trained models might have significantly different characteristics in terms of margin and smoothness, even though they show similar robustness. Inspired by the observation, we investigate the effect of different regularizers and discover the negative effect of the smoothness regularizer on maximizing the margin. Based on the analyses, we propose a new method called bridged adversarial training that mitigates the negative effect by bridging the gap between clean and adversarial examples. We provide theoretical and empirical evidence that the proposed method provides stable and better robustness, especially for large perturbations.
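A rough sketch of the bridging idea, as read from the abstract: intermediate examples are taken along the line between a clean input and its adversarial counterpart, and consecutive predictions are tied together with KL terms. The loss below is an illustration under that reading, not the authors' implementation.

# Rough sketch of a bridged loss between clean and adversarial examples (illustrative).
import torch
import torch.nn.functional as F

def bridged_loss(model, x_clean, x_adv, labels, m=3):
    points = [x_clean + k / m * (x_adv - x_clean) for k in range(m + 1)]   # points along the bridge
    log_probs = [F.log_softmax(model(p), dim=1) for p in points]
    loss = F.nll_loss(log_probs[0], labels)                                # clean cross-entropy
    for k in range(m):                                                     # tie consecutive predictions
        loss = loss + F.kl_div(log_probs[k + 1], log_probs[k],
                               log_target=True, reduction='batchmean')
    return loss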
PR
Generating Transferable Adversarial Examples for Speech Classification
Despite the success of deep neural networks, the existence of adversarial attacks has revealed the vulnerability of neural networks in terms of security. Adversarial attacks add subtle noise to the original example, resulting in a false prediction. Although adversarial attacks have been mainly studied in the image domain, a recent line of research has discovered that speech classification systems are also exposed to adversarial attacks. By adding inaudible noise, an adversary can deceive speech classification systems and cause fatal issues in various applications, such as speaker identification and command recognition tasks. However, research on the transferability of audio adversarial examples is still limited. Thus, in this study, we first investigate the transferability of audio adversarial examples with different structures and conditions. Through extensive experiments, we discover that the transferability of audio adversarial examples is related to their noise sensitivity. Based on the analyses, we present a new adversarial attack called noise injected attack that generates highly transferable audio adversarial examples by injecting additive noise during the gradient ascent process. Our experimental results demonstrate that the proposed method outperforms other adversarial attacks in terms of transferability.
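The core mechanism described above can be sketched as an iterative gradient-ascent attack in which fresh additive noise is injected into the input before each gradient evaluation. Step sizes, the noise scale, and the projection below are assumptions for illustration, not the paper's exact settings.

# Sketch of injecting additive noise during the gradient-ascent steps of an iterative attack.
import torch
import torch.nn.functional as F

def noise_injected_attack(model, x, y, eps=0.002, alpha=0.0005, steps=20, sigma=0.001):
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_noisy = (x_adv + sigma * torch.randn_like(x_adv)).requires_grad_(True)  # inject noise
        loss = F.cross_entropy(model(x_noisy), y)
        grad = torch.autograd.grad(loss, x_noisy)[0]
        x_adv = torch.min(torch.max(x_adv + alpha * grad.sign(), x - eps), x + eps).detach()
    return x_adv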
ASOC
Fast Sharpness-Aware Training for Periodic Time Series Classification and Forecasting
Jinseong Park, Hoki Kim, Yujin Choi, Woojin Lee, and Jaewook Lee
Various deep learning architectures have been developed to capture long-term dependencies in time series data, but challenges such as overfitting and computational time still exist. The recently proposed Sharpness-Aware Minimization (SAM) strategy prevents overfitting by minimizing a perturbed loss within the nearby parameter space. However, SAM requires doubled training time to calculate two gradients per iteration, hindering its practical application in time series modeling tasks such as real-time assessment. In this study, we demonstrate that sharpness-aware training improves generalization performance by capturing the trend and seasonal components of time series data. To avoid the computational burden of SAM, we leverage the periodic characteristics of time series data and propose a new fast sharpness-aware training method called Periodic Sharpness-Aware Time series Training (PSATT) that reuses gradient information from past iterations. Empirically, the proposed method achieves both generalization and time efficiency in time series classification and forecasting without requiring additional computations compared to vanilla optimizers.
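The sketch below illustrates one way to cut SAM's cost by caching the weight-ascent direction and refreshing it only every few iterations; the actual PSATT schedule is tied to the periodicity of the time series and differs in detail, so treat this purely as a simplified stand-in.

# Simplified sketch: SAM-style training that refreshes the ascent direction only every
# `refresh_every` steps and reuses the cached direction in between (illustrative).
import torch

class CachedSAM:
    def __init__(self, model, optimizer, rho=0.05, refresh_every=4):
        self.model, self.optimizer = model, optimizer
        self.rho, self.refresh_every = rho, refresh_every
        self.eps, self.t = {}, 0

    def step(self, loss_fn, x, y):
        if self.t % self.refresh_every == 0:                      # recompute the ascent direction
            loss_fn(self.model(x), y).backward()
            grads = [p.grad for p in self.model.parameters() if p.grad is not None]
            norm = torch.norm(torch.stack([g.norm() for g in grads])) + 1e-12
            self.eps = {p: self.rho * p.grad / norm
                        for p in self.model.parameters() if p.grad is not None}
            self.optimizer.zero_grad()
        with torch.no_grad():
            for p, e in self.eps.items():                         # climb to the perturbed weights
                p.add_(e)
        loss_fn(self.model(x), y).backward()                      # gradient at the perturbed point
        with torch.no_grad():
            for p, e in self.eps.items():                         # restore the original weights
                p.sub_(e)
        self.optimizer.step()
        self.optimizer.zero_grad()
        self.t += 1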
IEEE
Exploring Diverse Feature Extractions for Adversarial Audio Detection
Yujin Choi, Jinseong Park, Jaewook Lee, and Hoki Kim
Although deep learning models have exhibited excellent performance in various domains, recent studies have discovered that they are highly vulnerable to adversarial attacks. In the audio domain, malicious audio examples generated by adversarial attacks can cause significant performance degradation and system malfunctions, resulting in security and safety concerns. However, compared to recent developments in the audio domain, the properties of the adversarial audio examples and defenses against them still remain largely unexplored. In this study, to provide a deeper understanding of the adversarial robustness in the audio domain, we first investigate traditional and recent feature extractions in terms of adversarial attacks. We show that adversarial audio examples generated from different feature extractions exhibit different noise patterns, and thus can be distinguished by a simple classifier. Based on the observation, we extend existing adversarial detection methods by proposing a new detection method that detects adversarial audio examples using an ensemble of diverse feature extractions. By combining the frequency and self-supervised feature representations, the proposed method provides a high detection rate against both white-box and black-box adversarial attacks. Our empirical results demonstrate the effectiveness of the proposed method in speech command classification and speaker recognition.
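Conceptually, the detector combines several feature views of the same waveform, as in the sketch below; the extractor functions (for example, a mel-spectrogram front end and a self-supervised speech encoder) and the logistic-regression detector are placeholders assumed for illustration.

# Conceptual sketch of detection with an ensemble of diverse feature extractions (illustrative).
import numpy as np
from sklearn.linear_model import LogisticRegression

def detect_adversarial(extractors, detector, waveform):
    feats = np.concatenate([f(waveform) for f in extractors])     # combine diverse feature views
    return detector.predict_proba(feats.reshape(1, -1))[0, 1]     # probability of being adversarial

# Training: fit the detector on features of clean (label 0) and adversarial (label 1) audio, e.g.
# X = np.stack([np.concatenate([f(w) for f in extractors]) for w in waveforms])
# detector = LogisticRegression(max_iter=1000).fit(X, y)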
arXiv
Stability Analysis of Sharpness-Aware Minimization
Hoki Kim, Jinseong Park, Yujin Choi, and Jaewook Lee
Training deep learning models with differential privacy (DP) results in a degradation of performance. The training dynamics of models with DP show a significant difference from standard training, whereas understanding the geometric properties of private learning remains largely unexplored. In this paper, we investigate sharpness, a key factor in achieving better generalization, in private learning. We show that flat minima can help reduce the negative effects of per-example gradient clipping and the addition of Gaussian noise. We then verify the effectiveness of Sharpness-Aware Minimization (SAM) for seeking flat minima in private learning. However, we also discover that SAM is detrimental to the privacy budget and computational time due to its two-step optimization. Thus, we propose a new sharpness-aware training method that mitigates the privacy-optimization trade-off. Our experimental results demonstrate that the proposed method improves the performance of deep learning models with DP from both scratch and fine-tuning.
TPAMI
Graddiv: Adversarial Robustness of Randomized Neural Networks via Gradient Diversity Regularization
Sungyoon Lee, Hoki Kim, and Jaewook Lee
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022
Deep learning is vulnerable to adversarial examples. Many defenses based on randomized neural networks have been proposed to solve this problem, but they fail to achieve robustness against attacks using proxy gradients, such as the Expectation over Transformation (EOT) attack. We investigate the effect of adversarial attacks using proxy gradients on randomized neural networks and demonstrate that their effectiveness highly relies on the directional distribution of the loss gradients of the randomized neural network. In particular, we show that proxy gradients are less effective when the gradients are more scattered. Based on this, we propose Gradient Diversity (GradDiv) regularizations that minimize the concentration of the gradients to build a robust randomized neural network. Our experiments on MNIST, CIFAR10, and STL10 show that the proposed GradDiv regularizations improve the adversarial robustness of randomized neural networks against a variety of state-of-the-art attack methods. Moreover, our method efficiently reduces the transferability among sample models of randomized neural networks.
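The sketch below captures the spirit of a gradient-diversity regularizer: sample the randomized network several times, compute the input gradients of the loss, and penalize their directional concentration (here via the norm of the mean unit gradient). GradDiv itself comes in several variants, so this is only an assumed, simplified instance.

# Sketch of a gradient-diversity style regularizer for a randomized network (illustrative).
import torch
import torch.nn.functional as F

def gradient_concentration(model, x, y, n_samples=4):
    units = []
    for _ in range(n_samples):                                    # each forward pass draws new randomness
        x_ = x.clone().requires_grad_(True)
        loss = F.cross_entropy(model(x_), y)
        g = torch.autograd.grad(loss, x_, create_graph=True)[0].flatten(1)
        units.append(g / (g.norm(dim=1, keepdim=True) + 1e-12))
    return torch.stack(units).mean(0).norm(dim=1).mean()          # small value => scattered gradients

# total_loss = task_loss + lam * gradient_concentration(model, x, y)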
PR
Variational Cycle-consistent Imputation Adversarial Networks for General Missing Patterns
Woojin Lee, Sungyoon Lee, Junyoung Byun, Hoki Kim, and Jaewook Lee
Imputation of missing data is an important but challenging issue because we do not know the underlying distribution of the missing data. Previous imputation models have addressed this problem by assuming specific kinds of missing distributions. However, in practice, the mechanism behind the missing data is unknown, so the most general case of missing patterns needs to be considered for successful imputation. In this paper, we present cycle-consistent imputation adversarial networks that closely capture the underlying distribution of missing patterns under some relaxations. Using adversarial training, our model successfully learns the most general case of missing patterns, and our method can therefore be applied to a wide variety of imputation problems. We empirically evaluated the proposed method on numerical and image data. The results show that our method yields state-of-the-art performance both quantitatively and qualitatively on standard datasets.
PR
Compact Class-conditional Domain Invariant Learning for Multi-class Domain Adaptation
Neural network-based models have recently shown excellent performance in various kinds of tasks. However, a large amount of labeled data is required to train deep networks, and the cost of gathering labeled training data for every kind of domain is prohibitively expensive. Domain adaptation tries to solve this problem by transferring knowledge from labeled source domain data to unlabeled target domain data. Previous research has tried to learn domain-invariant features of the source and target domains, and this approach has been used as a key concept in various methods. However, domain-invariant features alone do not guarantee that a classifier trained on source data can be directly applied to target data, because they do not ensure that the data distributions of the same classes are aligned across the two domains. In this paper, we present novel generalization upper bounds for domain adaptation that motivate the need for class-conditional domain invariant learning. Based on this theoretical framework, we then propose a class-conditional domain invariant learning method that learns a feature space in which features of the same class are expected to be mapped nearby. Our experiments show that the model achieves state-of-the-art performance on standard datasets, and we demonstrate its effectiveness through visualization of the latent space.
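A minimal sketch of a class-conditional alignment term is shown below: features of the same class from the source and the (pseudo-labeled) target domain are pulled toward each other. This is an assumed illustration of the general idea, not the paper's exact objective.

# Minimal sketch of a class-conditional feature alignment term (illustrative).
import torch

def class_conditional_alignment(feat_s, y_s, feat_t, y_t_pseudo, num_classes):
    loss, count = feat_s.new_zeros(()), 0
    for c in range(num_classes):
        s_mask, t_mask = y_s == c, y_t_pseudo == c
        if s_mask.any() and t_mask.any():                         # align per-class feature means
            loss = loss + (feat_s[s_mask].mean(0) - feat_t[t_mask].mean(0)).pow(2).sum()
            count += 1
    return loss / max(count, 1)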
AAAI
Understanding Catastrophic Overfitting in Single-step Adversarial Training
Hoki Kim, Woojin Lee, and Jaewook Lee
In Proceedings of the AAAI Conference on Artificial Intelligence, 2021
Although fast adversarial training has demonstrated both robustness and efficiency, the problem of "catastrophic overfitting" has been observed. This is a phenomenon in which, during single-step adversarial training, the robust accuracy against projected gradient descent (PGD) suddenly decreases to 0% after a few epochs, whereas the robust accuracy against the fast gradient sign method (FGSM) increases to 100%. In this paper, we demonstrate that catastrophic overfitting is closely related to a characteristic of single-step adversarial training: it uses only adversarial examples with the maximum perturbation, rather than all adversarial examples in the adversarial direction, which leads to decision boundary distortion and a highly curved loss surface. Based on this observation, we propose a simple method that not only prevents catastrophic overfitting, but also overrides the belief that it is difficult to prevent multi-step adversarial attacks with single-step adversarial training.
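The sketch below is a rough reading of that idea for image classifiers: rather than always training on the maximum-perturbation FGSM example, a few points along the adversarial direction are checked and the weakest one that is already misclassified is used. The checkpoint grid and the selection rule are simplifications for illustration, not the paper's exact procedure.

# Rough sketch: pick the smallest perturbation along the FGSM direction that already fools the
# model, falling back to the full perturbation otherwise (illustrative; assumes 4-D image inputs).
import torch
import torch.nn.functional as F

def single_step_adv_example(model, x, y, eps=8/255, checkpoints=(0.25, 0.5, 0.75, 1.0)):
    x_req = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x_req), y)
    direction = torch.autograd.grad(loss, x_req)[0].sign()        # FGSM direction
    scale = torch.full((x.size(0),), 1.0, device=x.device)        # default: maximum perturbation
    found = torch.zeros(x.size(0), dtype=torch.bool, device=x.device)
    with torch.no_grad():
        for c in checkpoints:                                     # smallest fooling checkpoint wins
            wrong = model((x + c * eps * direction).clamp(0, 1)).argmax(1) != y
            scale = torch.where(wrong & ~found, torch.full_like(scale, c), scale)
            found = found | wrong
    return (x + scale.view(-1, 1, 1, 1) * eps * direction).clamp(0, 1).detach()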
arXiv
Torchattacks: A PyTorch Repository for Adversarial Attacks
Torchattacks is a PyTorch library that contains adversarial attacks to generate adversarial examples and to verify the robustness of deep learning models.
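Typical usage of the package (installable from PyPI as torchattacks) looks like the following, where model, images, and labels are supplied by the user:

import torchattacks

atk = torchattacks.PGD(model, eps=8/255, alpha=2/255, steps=10)   # L-infinity PGD attack
adv_images = atk(images, labels)                                  # adversarially perturbed images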