Identifying high-risk online gamblers: A comparison of data mining procedures

Abstract

Using play data from a sample of gamblers who bet on virtual live-action sports, this study evaluates a set of classification and regression algorithms to determine which techniques most effectively identify probable disordered gamblers. The results show a clear need to validate models on players who do not appear in the original sample: even methods that use in-sample cross-validation can perform substantially differently from one data set to another. Many methods identify player types accurately in the training data but perform poorly on new samples. Artificial neural networks appear to be the most reliable classification method overall, but they still fail to identify a large group of likely problem gamblers. Bet intensity, variability, frequency and trajectory, together with age and gender, are found to be insufficient to classify probable disordered gamblers with reasonable accuracy.
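
The gap between training-sample accuracy and accuracy on new players can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the two features merely stand in for behavioural measures such as bet intensity and variability, and the one-nearest-neighbour classifier is a deliberately simple stand-in rather than one of the methods evaluated in the study.

```python
import random


def make_sample(n, rng):
    """Generate n synthetic players as (features, is_high_risk) pairs.

    Hypothetical data: two overlapping Gaussian clusters, so no
    classifier can separate the groups perfectly.
    """
    players = []
    for _ in range(n):
        is_high_risk = rng.random() < 0.5
        centre = 1.0 if is_high_risk else 0.0
        features = (rng.gauss(centre, 1.0), rng.gauss(centre, 1.0))
        players.append((features, is_high_risk))
    return players


def predict_1nn(train, features):
    """Return the label of the training player closest to `features`."""
    nearest = min(
        train,
        key=lambda p: (p[0][0] - features[0]) ** 2 + (p[0][1] - features[1]) ** 2,
    )
    return nearest[1]


def accuracy(train, sample):
    """Share of players in `sample` whose label is predicted correctly."""
    hits = sum(predict_1nn(train, f) == label for f, label in sample)
    return hits / len(sample)


rng = random.Random(0)
training = make_sample(300, rng)      # players used to fit the model
new_players = make_sample(300, rng)   # players never seen during fitting

train_acc = accuracy(training, training)
holdout_acc = accuracy(training, new_players)

print(f"accuracy on training players: {train_acc:.2f}")
print(f"accuracy on new players:      {holdout_acc:.2f}")
```

Because each training player is its own nearest neighbour, the classifier scores perfectly on the training sample, while its accuracy on the second sample is lower. Only the second figure indicates how the model would behave on players outside the original sample.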
