Evaluating Log-Likelihood for Confidence Estimation in LLM-Based Multiple-Choice Question Answering
DOI:
https://doi.org/10.70844/ijas.2025.2.29
Keywords:
Artificial intelligence, Machine learning, Large Language Models (LLMs), Confidence estimation, Log-likelihood, Calibration, Multiple-Choice Question Answering (MCQA), Softmax, Uncertainty quantification, Model reliability, Answer scoring methods, NLP evaluation
Abstract
Reliable deployment of Large Language Models (LLMs) in question-answering tasks requires well-calibrated confidence estimates. This work investigates whether token-level log-likelihoods—sums of log-probabilities over answer tokens—can serve as effective confidence signals in Multiple-Choice Question Answering (MCQA). We compare three methods: (1) raw log-likelihood, (2) length-normalized log-likelihood, and (3) conventional softmax-based choice probability. Across four diverse MCQA benchmarks, we find that no single scoring method is universally best. Length normalization can significantly improve calibration but may reduce accuracy, while softmax and raw log-likelihood yield identical predictions. These results highlight important trade-offs between calibration and accuracy and offer insights into selecting or adapting confidence measures for different tasks. Our findings inform the design of more trustworthy LLM-based QA systems and lay groundwork for broader uncertainty quantification efforts.
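As a rough illustration of the three scoring methods named in the abstract, the sketch below computes them from per-token log-probabilities of each answer option. It is a minimal example, not the paper's implementation: the function name, input format, and toy log-probability values are hypothetical, and only the definitions (summed log-probabilities, per-token average, and a softmax over the summed scores) follow the abstract.

```python
import math

def score_choices(token_logprobs_per_choice):
    """Compute three confidence scores for MCQA answer options.

    token_logprobs_per_choice: one list per answer option, containing the
    model's log-probabilities for that option's answer tokens
    (hypothetical input format; values would come from an LLM's
    per-token output scores).
    """
    # (1) Raw log-likelihood: sum of token log-probabilities per option.
    raw = [sum(lps) for lps in token_logprobs_per_choice]

    # (2) Length-normalized log-likelihood: average log-probability per
    # token, removing the bias against longer answer strings.
    normalized = [sum(lps) / len(lps) for lps in token_logprobs_per_choice]

    # (3) Softmax over the raw log-likelihoods, giving a probability
    # distribution across the answer choices.
    max_raw = max(raw)  # subtract the max for numerical stability
    exps = [math.exp(r - max_raw) for r in raw]
    total = sum(exps)
    softmax = [e / total for e in exps]

    return raw, normalized, softmax


# Toy example: three answer options with made-up token log-probabilities.
choices = [
    [-0.2, -0.5],        # option A, two tokens
    [-0.1, -0.3, -0.9],  # option B, three tokens
    [-1.2],              # option C, one token
]
raw, norm, soft = score_choices(choices)
print("raw:", raw)
print("length-normalized:", norm)
print("softmax confidence:", soft)
```

Because the softmax is a monotonic transformation of the raw log-likelihoods, the argmax (and hence the predicted answer) is the same for methods (1) and (3), consistent with the abstract's observation that they yield identical predictions; length normalization can change the ranking when options differ in token count.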
