Prediction Calibration for Reliable Modeling with Deep Networks

The use of deep learning models in critical applications such as healthcare, autonomous driving, and scientific discovery has made it imperative to characterize model reliability. Artificial intelligence (AI) techniques have the potential to support critical decision-making, for example in healthcare, from diagnosing diseases to prescribing treatments. However, to prioritize patient safety, one must ensure that such methods are accurate and reliable. A model that has overfit to the characteristics of its training data can behave unreliably when deployed in the real world.

Uncertainty Quantification refers to the scientific process of predicting outcomes based on finite amounts of data, in order to provide measures of confidence that can be used to inform decisions.

Uncertainty Quantification Meets Machine Learning

Since a variety of factors, including data sampling, measurement errors, and model approximation, contribute to the stochasticity of data-driven methods, we expect uncertainty quantification (UQ) to play a significant role in studying model behavior.

Broadly, a rigorous statistical characterization of learning systems can enable us to:

  • build reliable models, i.e., ensure consistency between predictions and our understanding of the world;
  • incorporate real-world priors;
  • prevent machines from being overly confident even when making mistakes;
  • identify regimes of strength and weakness;
  • design human-in-the-loop systems that adjust model predictions.

How Do We Know the Estimated Uncertainties are Meaningful?

Due to the lack of ground-truth estimates for the uncertainties themselves, it is common to evaluate them via calibration, which measures the agreement between predictions and known priors. For example, in classification problems, one can expect the predicted class probabilities not to be concentrated on any single class when the prediction is wrong. Similarly, in regression problems, one can expect the true target to fall within a predicted interval around the mean at the stated rate; e.g., a 90% prediction interval should contain the target roughly 90% of the time.
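The regression notion of calibration above can be checked empirically by measuring coverage: the fraction of true targets that land inside their predicted intervals, compared against the nominal level. A minimal sketch (function names are ours, for illustration):

```python
import numpy as np

def interval_coverage(y_true, lower, upper):
    """Fraction of true targets falling inside the predicted intervals."""
    y_true, lower, upper = map(np.asarray, (y_true, lower, upper))
    return float(np.mean((y_true >= lower) & (y_true <= upper)))

# Toy check: Gaussian targets with intervals mu +/- z * sigma at the 90% level.
rng = np.random.default_rng(0)
mu = np.zeros(10000)
sigma = np.ones(10000)
y = rng.normal(mu, sigma)
z = 1.645  # two-sided 90% quantile of the standard Gaussian
cov = interval_coverage(y, mu - z * sigma, mu + z * sigma)
# For a well-calibrated 90% interval, cov should be close to 0.90.
```

In practice one evaluates coverage at several nominal levels and summarizes the gap between observed and expected coverage as a calibration error.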

Building Calibrated Deep Regression Models

A natural strategy for producing calibrated predictors is to directly optimize prediction intervals that satisfy the calibration objective. For example, in heteroscedastic regression, variance estimates are obtained by optimizing a Gaussian likelihood under a heteroscedastic noise assumption. However, because the resulting intervals are not explicitly constructed from epistemic (model variability) or aleatoric (inherent stochasticity) uncertainties, the variances are not straightforward to interpret, even when they are well calibrated. On the other hand, approaches designed to capture specific sources of uncertainty, e.g., Monte Carlo dropout for epistemic uncertainty or conditional quantiles for aleatoric uncertainty, are found to be poorly calibrated in practice.
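For concreteness, the heteroscedastic Gaussian likelihood objective mentioned above amounts to a negative log-likelihood where the network predicts both a mean and a (log-)variance per input. A minimal numpy sketch, up to an additive constant:

```python
import numpy as np

def gaussian_nll(y, mu, log_var):
    """Heteroscedastic Gaussian negative log-likelihood (up to a constant).

    A network would predict both mu(x) and log_var(x); predicting the
    log-variance keeps the variance positive without constraints.
    Minimizing this loss trades data fit against the predicted noise level.
    """
    y, mu, log_var = map(np.asarray, (y, mu, log_var))
    return float(np.mean(0.5 * (log_var + (y - mu) ** 2 / np.exp(log_var))))
```

In a deep learning framework the same objective is typically applied to two output heads of one network (e.g., `torch.nn.GaussianNLLLoss` in PyTorch).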

In our recent paper (AAAI 2020) [1], we conjecture that one can reliably build calibrated deep models by posing calibration as an auxiliary task and utilizing a novel uncertainty matching strategy. With applications in regression, time-series forecasting, and object localization, we show that our approach achieves significant improvements over existing uncertainty quantification methods in deep learning.
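To illustrate the flavor of treating calibration as an auxiliary objective, here is a deliberately simplified stand-in loss (our own construction, not the paper's exact uncertainty-matching formulation): it penalizes targets that escape a predicted interval around the mean, while a width term discourages trivially wide intervals.

```python
import numpy as np

def calibration_surrogate_loss(y, mu, delta, alpha=0.5):
    """Simplified auxiliary calibration-style loss (illustrative only).

    mu is the mean prediction, delta > 0 the predicted interval half-width,
    so the interval is [mu - delta, mu + delta]. The first term penalizes
    how far a target lands outside its interval; the alpha-weighted width
    term prevents the intervals from growing without bound.
    """
    y, mu, delta = map(np.asarray, (y, mu, delta))
    escape = np.maximum(np.abs(y - mu) - delta, 0.0)  # distance outside interval
    return float(np.mean(escape + alpha * delta))
```

Added to a primary regression loss, such a term pushes the interval predictor toward coverage without sacrificing sharpness; the actual method in [1] couples the intervals to an explicit uncertainty estimator via matching.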

Can We Calibrate Any Uncertainty Estimator In Deep Models?

Though a large class of methods exists for measuring deep uncertainties, in practice the resulting estimates are found to be poorly calibrated, making it challenging to translate them into actionable insights. A common workaround is a separate recalibration step, which adjusts the estimates to compensate for the miscalibration. Instead, in our recent work, we proposed to repurpose the heteroscedastic regression objective as a surrogate for calibration, enabling any existing uncertainty estimator to produce inherently calibrated intervals. Because calibration is performed automatically during training, based on an explicit uncertainty estimator, this approach does not suffer the limitations of recalibration methods, and the resulting intervals can be associated with specific error sources. Surprisingly, our approach achieves significantly improved calibration with both epistemic and aleatoric uncertainty estimators, even though both are known to produce miscalibrated intervals in practice. More importantly, this implicit calibration objective regularizes the training process and produces highly accurate mean estimators.


[1] Jayaraman J. Thiagarajan, Bindya Venkatesh, Prasanna Sattigeri and Timo Bremer. Building Calibrated Deep Models via Uncertainty Matching with Auxiliary Interval Predictors. AAAI Conference on Artificial Intelligence, Feb. 2020.

[2] Jayaraman J. Thiagarajan, Bindya Venkatesh and Deepta Rajan. Learn-by-Calibrating: Using Calibration as a Training Objective. IEEE ICASSP, May 2020.

[3] Bindya Venkatesh and Jayaraman J. Thiagarajan. Heteroscedastic Calibration of Uncertainty Estimators in Deep Learning. 2020.
