Conclusion
This study investigated potential biases in algorithmic resume screening, focusing on gender, ethnicity, and ideological expression as the main dimensions of discrimination. The results show how an experimental protocol can be leveraged to identify ethnic, gender, sectoral, and ideological bias in such algorithms. The Resume-Matcher algorithm proved to be especially biased on gender and ethnic variables, measured through the proxy of names. Certain sectoral differences exist: for instance, for Nurse and Teacher roles, the algorithm ranks women higher than men. Importantly, these results demonstrate that bias in automated hiring is not uniformly distributed but varies significantly by sector, reflecting both broader labor market dynamics and the specificities of algorithmic training data.
Beyond its societal implications for algorithmic decision-making in hiring, this study contributes to the academic discussion by putting forward a standardized research design. Our protocol, which leverages a small LLM to generate quality synthetic data at scale, makes it possible to explore biases in automated resume screening with better control over variables and greater statistical power. Nonetheless, data quality remains difficult to ensure at very large scale. The protocol can be applied to any resume screening algorithm that produces a score, making it valuable for future research. We believe the Resume-Matcher used in this study is a suitable proof of concept; testing this research design on an algorithm used by actual companies (Workday, LinkedIn, etc.) could yield insightful audits both for the companies producing the algorithms and for their clients, the companies that hire with them.
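To make this transferability concrete, the following is a minimal sketch of the audit loop, not our exact implementation: `score(resume_text, job_text)` stands in for any screening algorithm that returns a numeric score, the name groups are illustrative placeholders for the name proxies, and the choice of Mann-Whitney U as the comparison test is an assumption for the example.

```python
# Minimal sketch of the audit protocol: score otherwise-identical resumes
# that differ only in the name, then compare score distributions by group.
from statistics import mean

from scipy.stats import mannwhitneyu  # illustrative choice of test

# Placeholder name groups; any name-based proxy can be substituted here.
NAME_GROUPS = {
    "group_a": ["Emily Walsh", "Greg Baker"],
    "group_b": ["Lakisha Washington", "Jamal Jones"],
}

def audit(resume_template: str, job_offer: str, score) -> dict:
    """Score the same resume under each name; collect per-group score lists.

    `resume_template` is expected to contain a `{name}` placeholder, and
    `score` is any callable returning a numeric match score.
    """
    return {
        group: [score(resume_template.format(name=name), job_offer)
                for name in names]
        for group, names in NAME_GROUPS.items()
    }

# Usage (illustrative):
# scores = audit(template, job_offer, score)
# stat, p = mannwhitneyu(scores["group_a"], scores["group_b"])
# print(mean(scores["group_a"]), mean(scores["group_b"]), p)
```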
We plan to release our 180k+ resume database publicly on Kaggle and to provide our code as a ready-to-use Python package for anyone willing to build on this research. This will be done in the coming months.
Limitations
We are aware of a number of limitations of this protocol.
- As discussed previously, the algorithm we used was designed to help applicants tailor their CV to a specific job offer. While a company could use it to screen resumes automatically, this is not its primary purpose. Companies that do use resume screening algorithms unfortunately do not publish their code, leaving room for further research.
- Although largely automated, our data generation pipeline requires a human to check for anomalies or hallucinations in LLM-generated content. This check is crucial to ensure data quality: the LLM-generated content is the cornerstone of the resume generation protocol.
- Even with human review, the generated data is synthetic and can only approximate reality.
- Our data processing pipeline relies on the LLM producing consistent outputs, especially regarding their format, since the job and volunteering experiences are generated by a small LLM. The stochastic nature of LLMs makes human review all the more important; a format-validation sketch is given after this list.
- We address gender bias through a binary lens, which does not account for other gender identities. This was done because of time constraints, but future research could investigate further in this direction.
- The annotation of volunteering experiences as “ideologically controversial” or of a job experience as “not adapted” to a job offer can be subject to debate: we acknowledge that, despite all our precautions, our analysis may itself be biased by these annotations.
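As a concrete illustration of the format-consistency limitation above, here is a minimal validation sketch. It assumes the LLM is prompted to return each experience as a JSON object; the field names, the retry budget, and the `generate` callable are hypothetical stand-ins, not our actual pipeline code.

```python
import json

# Fields the LLM is asked to produce for each experience (illustrative).
REQUIRED_FIELDS = {"title", "organization", "start_year", "description"}

def validate_experience(raw: str) -> dict | None:
    """Return the parsed experience if it matches the expected format, else None.

    This only catches structural anomalies (invalid JSON, missing fields);
    hallucinated *content* still requires human review.
    """
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(parsed, dict) or not REQUIRED_FIELDS <= parsed.keys():
        return None
    return parsed

def generate_with_retries(generate, max_retries: int = 3) -> dict:
    """Re-query the (stochastic) LLM until its output parses, up to a budget."""
    for _ in range(max_retries):
        parsed = validate_experience(generate())
        if parsed is not None:
            return parsed
    raise ValueError("LLM output never parsed cleanly; flag for human review")
```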