Stabilizing LLM-Assisted Résumé–Job Description Matching Through Controlled Evaluation
Large language models (LLMs) are increasingly used in résumé–job description (JD) matching systems to extract requirements, identify evidence, and score candidate alignment. However, without explicit architectural controls, these systems can exhibit non-deterministic behavior that undermines trust, auditability, and fairness. This paper presents an empirical case study of an LLM-assisted résumé–JD gap analyzer that initially produced unstable and inflated scores under identical inputs. We identify two primary failure modes—criteria drift and evidence misattribution—and demonstrate how freezing evaluation artifacts and enforcing evidence provenance restore deterministic behavior. Results show that architectural constraints, rather than model size, are the primary determinant of evaluation reliability.
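To make the two controls concrete, here is a minimal sketch of what "freezing evaluation artifacts" and "enforcing evidence provenance" could look like in practice. All names (Criterion, freeze_criteria, check_provenance) are hypothetical illustrations, not the paper's actual implementation: the first control pins the extracted rubric with a hash so criteria cannot drift between runs, and the second rejects any claimed evidence that does not appear verbatim at the cited span of the résumé.

```python
# Hypothetical sketch of the two architectural controls described above.
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class Criterion:
    """One JD requirement, extracted once and then pinned."""
    cid: str
    text: str

def freeze_criteria(criteria: list[Criterion]) -> str:
    """Hash the extracted criteria so later runs can verify the
    rubric has not drifted between evaluations (criteria drift)."""
    payload = "\n".join(f"{c.cid}:{c.text}" for c in criteria)
    return hashlib.sha256(payload.encode()).hexdigest()

@dataclass(frozen=True)
class Evidence:
    """A claimed match, tied to a literal span of the résumé text."""
    criterion_id: str
    start: int  # character offsets into the résumé
    end: int
    quoted: str  # the text the model claims appears at that span

def check_provenance(resume_text: str, ev: Evidence) -> bool:
    """Accept evidence only if the quoted span appears verbatim at the
    cited offsets — the guard against evidence misattribution."""
    return resume_text[ev.start:ev.end] == ev.quoted
```

Under this scheme, a scoring run first recomputes the criteria hash and aborts if it differs from the frozen one, then drops every piece of evidence that fails the provenance check before any alignment score is computed.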