Learning Memory-guided Normality for Anomaly Detection



* equal contribution


We address the problem of anomaly detection, that is, detecting anomalous events in a video sequence. Anomaly detection methods based on convolutional neural networks (CNNs) typically leverage proxy tasks, such as reconstructing input video frames, to learn models describing normality without seeing anomalous samples at training time, and quantify the extent of abnormalities using the reconstruction error at test time. The main drawbacks of these approaches are that they do not consider the diversity of normal patterns explicitly, and that the powerful representation capacity of CNNs allows them to reconstruct abnormal video frames as well. To address these problems, we present an unsupervised learning approach to anomaly detection that considers the diversity of normal patterns explicitly, while lessening the representation capacity of CNNs. To this end, we propose to use a memory module with a new update scheme, where items in the memory record prototypical patterns of normal data. We also present novel feature compactness and separateness losses to train the memory, boosting the discriminative power of both memory items and deeply learned features from normal data. Experimental results on standard benchmarks demonstrate the effectiveness and efficiency of our approach, which outperforms the state of the art.


Overview of our framework for reconstructing a video frame. Our model consists of three main parts: an encoder, a memory module, and a decoder. The encoder extracts a query map $\mathbf{q}_t$ of size $H\times W \times C$ from an input video frame ${\bf{I}}_t$ at time $t$. The memory module reads and updates items $\mathbf{p}_m$ of size $1\times 1\times C$ using queries $\mathbf{q}_t^k$ of size $1\times 1\times C$, where the numbers of items and queries are $M$ and $K$, respectively, and $K=H\times W$. The query map $\mathbf{q}_t$ is concatenated with the aggregated items $\hat {\bf{p}}_t$, and the decoder takes the result as input to reconstruct the video frame $\hat {\bf{I}}_t$. For the prediction task, we input four successive video frames to predict the fifth one.
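The shapes in the caption above can be traced with a small sketch. It is only an illustration under stated assumptions: the caption does not say how items are aggregated, so the read step here is assumed to be a softmax over dot-product similarities between each query and the memory items. The name `read_memory` and the toy values are hypothetical, and the memory update step is omitted.

```python
import math


def read_memory(queries, items):
    """Read from memory: one aggregated item per query (assumed softmax read).

    queries: K vectors of length C (the H x W query map, flattened).
    items:   M vectors of length C (the memory items p_m).
    Returns K vectors of length 2C: each query concatenated with its
    aggregated item, i.e. what the decoder would receive.
    """
    outputs = []
    for q in queries:
        # Similarity between this query and every item (assumption: dot product).
        scores = [sum(qc * pc for qc, pc in zip(q, p)) for p in items]
        # Numerically stable softmax over the M items.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Aggregated item: weighted average of all memory items.
        aggregated = [
            sum(w * p[c] for w, p in zip(weights, items)) for c in range(len(q))
        ]
        # Concatenate the query with its aggregated item along the channel axis.
        outputs.append(list(q) + aggregated)
    return outputs


# Toy sizes (hypothetical): a 2 x 3 query map with 4 channels and 5 memory items.
H, W, C, M = 2, 3, 4, 5
K = H * W
queries = [[0.1 * (k + c) for c in range(C)] for k in range(K)]
items = [[0.2 * (m - c) for c in range(C)] for m in range(M)]
decoder_input = read_memory(queries, items)
```

In this sketch `decoder_input` holds $K = H\times W$ vectors of length $2C$, matching the concatenation of $\mathbf{q}_t$ and $\hat {\bf{p}}_t$ described in the caption.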


H. Park, J. Noh, B. Ham
Learning Memory-guided Normality for Anomaly Detection


  	title={Learning Memory-guided Normality for Anomaly Detection},
  	author={Park, Hyunjong and Noh, Jongyoun and Ham, Bumsub},
  	booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},


This research was partly supported by Samsung Electronics Company, Ltd., Device Solutions, under the grant Deep Learning based Anomaly Detection, 2018–2020, and by the R&D program for Advanced Integrated-intelligence for Identification (AIID) through the National Research Foundation of Korea (NRF), funded by the Ministry of Science and ICT (NRF-2018M3E3A1057289).