Publication

Reliable Inference from Human-Centred Datasets with Accumulated Local Effects

2024

In: INFORMS Annual Meeting 2024, Seattle, USA

Abstract

Despite the increasing promise of big data analysis, operations research and management science (ORMS) problems often involve important human-centred topics with relatively small datasets. This study aims to improve the interpretability and performance of the statistical inference techniques used in such cases. We focus on general linear models (GLMs) and generalized additive models (GAMs), which are often more reliable than machine learning techniques on smaller datasets. GLMs are highly interpretable, and GAMs often provide superior performance because they model nonlinear relationships without prior specification. However, both are prone to issues (collinearity in GLMs and concurvity in GAMs) that confound the interpretation of correlated predictors.
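For intuition, the following minimal sketch (not from the paper; the simulated data and variable names are illustrative assumptions) shows how collinearity confounds the interpretation of individual coefficients in a linear model, even when the predictors' joint effect remains stable:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200  # a deliberately small, "human-centred"-scale sample

# x2 is nearly a copy of x1, so the two predictors are highly correlated.
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)
y = 2.0 * x1 + rng.normal(scale=0.5, size=n)  # only x1 truly drives y
X = np.column_stack([x1, x2])

# Bootstrap the regression by resampling rows and refitting.
coefs = []
for _ in range(500):
    idx = rng.integers(0, n, size=n)
    coefs.append(LinearRegression().fit(X[idx], y[idx]).coef_)
coefs = np.array(coefs)

# Individual coefficients swing wildly across resamples under
# collinearity, while their sum (the joint effect) stays stable.
print("per-coefficient std across resamples:", coefs.std(axis=0))
print("std of the coefficient sum:", coefs.sum(axis=1).std())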

To address these concerns, we extend accumulated local effects (ALE), a popular technique for visualizing relationships in machine learning models. Our approach combines bootstrapped ALE estimates with new ALE-based effect sizes to produce confidence regions for the predictor variables whose effects can be reliably inferred beyond the sample. ALE provides enhanced interpretability to GLMs and GAMs and is immune to the confounding effects of correlated predictors. Thus, the best-performing model specifications can be readily interpreted without resorting to the artificial exclusion of correlated yet relevant variables. Our results demonstrate that with ALE-based inference, researchers can inductively search for relationships in data without a priori hypotheses while effectively ruling out the spurious patterns that plague classic hypothesis testing. Whereas such exploratory research is common with machine learning on large datasets, our extensions to ALE bring these benefits to smaller datasets for many important ORMS challenges.
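As a concrete illustration of the approach described above, the sketch below bootstraps a first-order ALE curve and reads percentile bands as pointwise confidence regions. It is a minimal reconstruction under stated assumptions (simulated data, an ordinary least-squares model, quantile bins, and a simple row-resampling bootstrap); the authors' actual procedure and their effect-size definitions may differ.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def ale_1d(predict, X, j, bins=10):
    """First-order ALE of feature j: average local differences within
    quantile bins, accumulate them, then centre on the data."""
    z = np.unique(np.quantile(X[:, j], np.linspace(0, 1, bins + 1)))
    # Assign each row to the bin its value of feature j falls in.
    idx = np.clip(np.searchsorted(z, X[:, j], side="right") - 1, 0, len(z) - 2)
    deltas, counts = np.zeros(len(z) - 1), np.zeros(len(z) - 1)
    for k in range(len(z) - 1):
        rows = X[idx == k]
        if len(rows) == 0:
            continue
        hi, lo = rows.copy(), rows.copy()
        hi[:, j], lo[:, j] = z[k + 1], z[k]
        deltas[k] = (predict(hi) - predict(lo)).mean()
        counts[k] = len(rows)
    ale = np.concatenate([[0.0], np.cumsum(deltas)])
    # Centre so the count-weighted mean effect is approximately zero.
    ale -= np.average(0.5 * (ale[1:] + ale[:-1]), weights=counts)
    return z, ale

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
X[:, 1] = X[:, 0] + rng.normal(scale=0.1, size=n)   # correlated predictors
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=n)

# Bootstrap: refit on resampled rows, recompute ALE on the fixed grid of
# the full data (a simplification of resampling the full pipeline).
curves = []
for _ in range(200):
    rows = rng.integers(0, n, size=n)
    m = LinearRegression().fit(X[rows], y[rows])
    z, a = ale_1d(m.predict, X, j=0)
    curves.append(a)
curves = np.array(curves)

lo_band, med, hi_band = np.percentile(curves, [2.5, 50, 97.5], axis=0)
# An effect is distinguishable from "no effect" where the band excludes 0.
print(np.column_stack([z, lo_band, med, hi_band]).round(2))
```

In this sketch, the percentile bands over the bootstrapped curves play the role of the confidence regions described above: effects whose bands exclude zero are the ones that can plausibly be inferred beyond the sample.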