Title: Missing data imputation by the aid of features similarities
Authors: Samih M. Mostafa
Addresses: Mathematics Department, Faculty of Science, South Valley University, Qena, Egypt
Abstract: Missing data are common in statistical analyses, and the quality of the resulting data depends on the imputation method used. This paper proposes a method that imputes missing data in a variable of interest (the recipient) using observed values from other variables (the donors). Some existing methods rely only on the recipient (e.g., unconditional mean imputation); others rely on the recipient and one donor (e.g., interpolation). The proposed method uses similarities among the values in the donor to impute the missing data in the recipient. If the similarities are not sufficient to impute all missing values, another method is combined with the proposed method to impute the remaining missing data. The proposed approach is straightforward and can be combined with existing methods. An empirical study supports the superiority of the proposed approach and shows that it can significantly improve data quality. The improvement is more pronounced when the ratio of missing values is higher.
Keywords: imputation; unconditional mean; missingness mechanisms; missing values.
DOI: 10.1504/IJBDM.2020.106883
International Journal of Big Data Management, 2020 Vol.1 No.1, pp.81 - 103
Received: 07 Mar 2019
Accepted: 21 Aug 2019
Published online: 24 Apr 2020
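
Illustrative sketch: the abstract describes imputing a recipient variable from rows whose donor values are similar, with a second method filling whatever remains. The Python below is one possible reading of that idea, not the paper's algorithm. The function name impute_by_donor_similarity, the tolerance parameter tol, the use of the mean of matching rows, and the choice of the unconditional mean as the fallback are all assumptions made for illustration.

```python
from statistics import mean


def impute_by_donor_similarity(recipient, donor, tol=0.0):
    """Fill None entries in `recipient` using rows with a similar `donor` value.

    Hypothetical reading of the abstract: two rows are "similar" when their
    donor values differ by at most `tol`. A missing recipient value is
    replaced by the mean of the observed recipient values in similar rows.
    If no similar row exists, the unconditional mean of the observed
    recipient values is used as the fallback method.
    """
    # Observed recipient values, used for the fallback (unconditional mean).
    observed = [r for r in recipient if r is not None]
    fallback = mean(observed) if observed else None

    # Rows where both donor and recipient are observed can act as references.
    pairs = [(d, r) for d, r in zip(donor, recipient)
             if d is not None and r is not None]

    result = []
    for d, r in zip(donor, recipient):
        if r is not None:
            result.append(r)  # value already observed, keep it
            continue
        # Collect recipient values from rows whose donor value is similar.
        matches = ([rv for dv, rv in pairs if abs(dv - d) <= tol]
                   if d is not None else [])
        result.append(mean(matches) if matches else fallback)
    return result


# Example: rows 0 and 4 share donor value 1 with the missing row 1;
# row 3 has no similar donor row, so it falls back to the overall mean.
recipient = [10.0, None, 30.0, None, 12.0]
donor = [1, 1, 2, 3, 1]
imputed = impute_by_donor_similarity(recipient, donor)
```

In this sketch a larger tol would let more missing values be filled from similar rows and fewer from the fallback; how the paper actually defines similarity is not stated in the abstract.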