Title: Graphical tools for assessing information quality: loan application decisions
Authors: Dominique Haughton, Mary Ann Robbert, Linda P. Senne
Addresses: Bentley College, 175 Forest Street, Waltham, MA 02452, USA (all three authors).
Abstract: Using a loan application data set, this paper demonstrates the use of several graphical tools to assess information quality: histograms to study individual variables, scatter plots to compare original and cleaned variables as well as to examine the effects that cleaning a particular predictor has on models of a decision, decision trees to identify important predictors of a decision, and ROC curves to evaluate the predictive value of each attribute. Proposed techniques for cleaning a data set include eliminating erroneous records, excluding attributes with too many incorrect values from the model and applying domain knowledge. We suggest that our approach can be applied to a small sample of a data set to help prioritise which variables should be cleaned.
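As a rough illustration of the last tool named in the abstract, the sketch below scores each attribute by the area under the ROC curve (AUC) it achieves when used alone to predict a binary decision. The function name, the attribute names and the toy records are hypothetical and are not taken from the paper's loan data set. The AUC is computed with the rank-based (Mann-Whitney) formula instead of by plotting the curve, which is an assumption of this sketch and not necessarily how the authors proceeded.

```python
def auc_single_attribute(values, labels):
    """AUC of one numeric attribute used alone as a score for a binary label.

    Equals the probability that a randomly chosen positive record has a higher
    attribute value than a randomly chosen negative one (ties count as 0.5).
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative record")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, made-up records (NOT the paper's data): 1 = approved, 0 = denied.
approved = [1, 1, 1, 0, 0, 0]
attributes = {
    "income": [80, 65, 50, 55, 30, 20],   # hypothetical attribute
    "zip_digit": [3, 7, 1, 3, 7, 1],      # hypothetical attribute
}

# Rank attributes by how far their AUC lies from 0.5 (no discrimination).
ranking = sorted(
    ((name, auc_single_attribute(vals, approved)) for name, vals in attributes.items()),
    key=lambda item: abs(item[1] - 0.5),
    reverse=True,
)
for name, auc in ranking:
    print(f"{name}: AUC = {auc:.2f}")
```

Under this reading, an attribute whose AUC sits near 0.5 adds little to predicting the decision, so it might be given lower priority when deciding which variables to clean first.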
Keywords: information quality; data cleaning; logistic regression models; decision trees; ROC curves; decision making; loan applications; decision variables; mortgage loans.
DOI: 10.1504/IJTPM.2005.008634
International Journal of Technology, Policy and Management, 2005 Vol.5 No.4, pp.330 - 347
Published online: 12 Jan 2006