J. Woody, T.L. Wu, Z. Zhao
Mississippi State University,
United States
Keywords: fraud detection, cluster detection, anomaly detection, scan statistics, model misspecification, theta-filtering
Summary:
The scan statistic has been applied in many areas for anomaly and cluster detection (see attached references), and accurate bounds and approximations have been developed for its implementation. However, to the best of our knowledge, no prior work has studied scan statistics under model misspecification. Model misspecification is a crucial issue in real-world applications, because results cannot be trusted when the model assumptions do not hold. Our method is designed to produce reliable results even when the parametric model assumptions are not valid. The key idea is that we can deliberately over-smooth density estimates in a nonparametric setting while accounting for the model misspecification this creates, striking a balance between the two. We developed the theta-filtering tool specifically for this task, so it is new to the scan statistics setting. It removes the effect of discrepancies between the assumed model and the observed data on the scan statistic, while still allowing us to identify local clusters and quantify their intensity. The tool also filters out weak signals that are likely to be false positives. Finally, we can assign reasonable p-values to individual observations, which helps investigators focused on anomalous or criminal behavior prioritize their resources.
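For readers unfamiliar with scan statistics, the sketch below shows a conventional one-dimensional scan statistic: the largest number of events falling in any window of fixed length, with a Monte Carlo p-value computed under an assumed null model (here, event times i.i.d. Uniform(0, horizon), i.e. a homogeneous Poisson process conditioned on its count). This is only an illustration of the baseline setting, not the theta-filtering method, which this summary does not specify. The function names `scan_statistic` and `monte_carlo_p_value`, the uniform null, and the example data are all hypothetical. Note that the p-value is only as trustworthy as the assumed null: if the true background rate is not uniform, the reference distribution is wrong, which is exactly the misspecification problem described above.

```python
import random


def scan_statistic(times, window):
    """Maximum number of events in any closed interval [t, t + window].

    It suffices to check windows whose left edge sits on an observed event,
    because any window can be slid right to its first event without losing events.
    """
    ts = sorted(times)
    best = 0
    j = 0  # index of the first event beyond the current window
    for i, start in enumerate(ts):
        while j < len(ts) and ts[j] <= start + window:
            j += 1
        best = max(best, j - i)  # events ts[i] .. ts[j-1] lie in the window
    return best


def monte_carlo_p_value(observed, n_events, window, horizon, n_sim=999, seed=0):
    """Monte Carlo p-value under an ASSUMED null: n_events i.i.d. Uniform(0, horizon)."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_sim):
        sim = [rng.uniform(0, horizon) for _ in range(n_events)]
        if scan_statistic(sim, window) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_sim + 1)


# Hypothetical event times on [0, 10] with a tight burst near t = 4.
events = [0.5, 1.7, 3.2, 4.0, 4.1, 4.15, 4.3, 4.35, 6.8, 9.1]
s = scan_statistic(events, window=0.5)
p = monte_carlo_p_value(s, len(events), window=0.5, horizon=10.0)
print(s, p)
```

Simulating the reference distribution, rather than using a closed-form approximation, keeps the sketch short; the bounds and approximations mentioned above serve the same purpose more efficiently in practice.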