Reviewerly Publications

Here you’ll find the research behind Reviewerly: our papers and talks on building trustworthy peer-review ecosystems. We care about transparency, reproducibility, and real-world impact, so where possible we include links to PDFs, code, datasets, and citation details. Browse the list below to explore our latest findings and the ideas shaping Reviewerly’s products and community; short illustrative code sketches after the list give a flavor of the core techniques.

  1. RottenReviews: Benchmarking Review Quality with Human and LLM-Based Judgments
    Sajad Ebrahimi, Soroush Sadeghian, Ali Ghorbanpour, Negar Arabzadeh, Sara Salamat, Muhan Li, Hai Son Le, Mahdi Bashari and Ebrahim Bagheri
    Applied Research Track @ The 34th ACM International Conference on Information and Knowledge Management (CIKM 2025)
    • Abstract: The quality of peer review plays a critical role in scientific publishing, yet remains poorly understood and challenging to evaluate at scale. In this work, we introduce RottenReviews, a benchmark designed to facilitate systematic assessment of review quality. RottenReviews comprises over 15,000 submissions from four distinct academic venues enriched with over 9,000 reviewer scholarly profiles and paper metadata. We define and compute a diverse set of quantifiable review-dependent and reviewer-dependent metrics, and compare them against structured assessments from large language models (LLMs) and expert human annotations. Our human-annotated subset includes over 700 paper–review pairs labeled across 13 explainable and conceptual dimensions of review quality. Our empirical findings reveal that LLMs, both zero-shot and fine-tuned, exhibit limited alignment with human expert evaluations of peer review quality. Surprisingly, simple interpretable models trained on quantifiable features outperform fine-tuned LLMs in predicting overall review quality.
  2. Building Trustworthy Peer Review Quality Assessment Systems
    Negar Arabzadeh, Sajad Ebrahimi, Ali Ghorbanpour, Soroush Sadeghian, Sara Salamat, Muhan Li, Hai Son Le, Mahdi Bashari and Ebrahim Bagheri
    Industry Day Talks @ The 34th ACM International Conference on Information and Knowledge Management (CIKM 2025)
    • Abstract: Peer review is foundational to academic publishing, yet the quality of reviews remains difficult to assess at scale due to subjectivity, inconsistency, and the lack of standardized evaluation mechanisms. This talk presents our experience developing and deploying a scalable framework for assessing review quality in operational settings. We combine two complementary approaches: interpretable machine learning models built on quantifiable review- and reviewer-level features, and the application of large language models (LLMs), including Qwen, Phi, and GPT-4o, in zero- and few-shot configurations for textual quality evaluation. We also explore the fine-tuning of LLMs on expert-annotated datasets to examine their upper-bound capabilities. To benchmark these methods, we constructed a dataset of over 700 paper–review pairs labeled by domain experts across multiple quality dimensions. Our findings demonstrate that transparent, feature-based models consistently outperform LLMs in reliability and generalization, particularly when evaluating conceptual depth and argumentative structure. The talk will highlight key engineering choices, deployment challenges, and broader implications for integrating automated review evaluation into scholarly workflows.
  3. exHarmony: Authorship and Citations for Benchmarking the Reviewer Assignment Problem
    Negar Arabzadeh, Sajad Ebrahimi, Sara Salamat, Mahdi Bashari and Ebrahim Bagheri
    Full Paper Track @ The 47th European Conference on Information Retrieval (ECIR 2025)
    • Abstract: The peer review process is crucial for ensuring the quality and reliability of scholarly work, yet assigning suitable reviewers remains a significant challenge. Traditional manual methods are labor-intensive and often ineffective, leading to nonconstructive or biased reviews. This paper introduces the exHarmony (eHarmony but for connecting experts to manuscripts) benchmark, designed to address these challenges by re-imagining the Reviewer Assignment Problem (RAP) as a retrieval task. Utilizing the extensive data from OpenAlex, we propose a novel approach that considers a host of signals from the authors, most similar experts, and the citation relations as potential indicators for a suitable reviewer for a manuscript. This approach allows us to develop a standard benchmark dataset for evaluating the reviewer assignment problem without needing explicit labels. We benchmark various methods, including traditional lexical matching, static neural embeddings, and contextualized neural embeddings, and introduce evaluation metrics that assess both relevance and diversity in the context of RAP. Our results indicate that while traditional methods perform reasonably well, contextualized embeddings trained on scholarly literature show the best performance. The findings underscore the importance of further research to enhance the diversity and effectiveness of reviewer assignments.
  4. Reviewerly: Modeling the Reviewer Assignment Task as an Information Retrieval Problem
    Negar Arabzadeh, Sajad Ebrahimi, Sara Salamat, Mahdi Bashari and Ebrahim Bagheri
    Industry Day Talks @ The 33rd ACM International Conference on Information and Knowledge Management (CIKM 2024)
    • Abstract: The peer review process is a fundamental aspect of academic publishing, ensuring the quality and credibility of scholarly work. In this talk, we will explore the critical challenges associated specifically with the assignment of reviewers to submitted papers. We will introduce Reviewerly, our innovative solution designed to enhance the efficiency and effectiveness of reviewer assignments by leveraging data from diverse sources, including OpenAlex, PubMed, and DBLP. By modeling the reviewer assignment problem as an information retrieval task, we focus on retrieving a pool of relevant and diverse reviewers for each paper. We will highlight the challenges we faced and showcase the benefits of this approach in addressing the reviewer assignment problem.
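To ground the ideas above, here are three small, self-contained sketches of the techniques these papers describe. First, the feature-based approach from RottenReviews: a minimal, interpretable review-quality model. The specific features, the toy training data, and the choice of ridge regression are illustrative assumptions on our part, not the paper’s actual feature set or model.

```python
# Illustrative sketch only: an interpretable review-quality model in the
# spirit of RottenReviews. Features and data below are assumptions, not
# the paper's actual feature set.
import re

from sklearn.linear_model import Ridge

def review_features(text: str) -> list[float]:
    """Compute simple, quantifiable review-dependent features."""
    words = [w.lower().strip(".,;:") for w in text.split()]
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [
        len(words),           # review length
        len(sentences),       # sentence count
        text.count("?"),      # questions raised by the reviewer
        sum(w in {"however", "but", "although"} for w in words),              # critical engagement
        sum(w in {"section", "equation", "table", "figure"} for w in words),  # specificity
    ]

# Toy labeled data: (review text, expert quality score in [0, 1]). In the
# paper, labels come from 700+ expert-annotated paper-review pairs.
train = [
    ("Nice paper, accept.", 0.1),
    ("The claim in Section 3 is unsupported; how does Equation 2 handle "
     "ties? Table 1 omits strong baselines, although the idea is sound.", 0.9),
]
X = [review_features(text) for text, _ in train]
y = [score for _, score in train]

model = Ridge(alpha=1.0).fit(X, y)
print(model.predict([review_features("Figure 2 is mislabeled; which dataset was used?")]))
```

Because the model is linear, its coefficients show directly how much each feature contributes to the predicted score, which is the transparency property the paper contrasts with fine-tuned LLMs.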
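Second, the LLM-as-judge configuration discussed in the trustworthy-assessment talk. This sketch assumes the OpenAI Python client and an invented rubric prompt; the talk’s actual prompts, models (Qwen, Phi, GPT-4o), and few-shot setups are not reproduced here.

```python
# Illustrative sketch only: zero-shot review-quality scoring with an LLM.
# The prompt and the 1-5 rubric are assumptions; see the talk for the real setup.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def score_review(paper_abstract: str, review_text: str) -> str:
    """Ask the model for a 1-5 quality score with a one-line justification."""
    prompt = (
        "You are assessing the quality of a peer review.\n\n"
        f"Paper abstract:\n{paper_abstract}\n\n"
        f"Review:\n{review_text}\n\n"
        "Rate the review from 1 (poor) to 5 (excellent) and justify your "
        "rating in one sentence."
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

A finding from the papers worth keeping in mind: scores produced this way aligned only weakly with expert judgments, which is why the transparent, feature-based models proved the more reliable component.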
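Third, the retrieval framing of reviewer assignment shared by exHarmony and Reviewerly: embed the submission, embed each candidate reviewer’s publication record, and rank by similarity. The encoder choice and the toy reviewer pool below are assumptions; the papers benchmark lexical matching, static embeddings, and contextualized embeddings over OpenAlex-scale data.

```python
# Illustrative sketch only: the Reviewer Assignment Problem as dense retrieval.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder; any text encoder fits

# Each candidate reviewer is represented by text drawn from their publications.
reviewers = {
    "reviewer_a": "Dense retrieval and neural ranking models for web search.",
    "reviewer_b": "Protein structure prediction with graph neural networks.",
}
submission = "We propose a contextualized embedding model for document ranking."

sub_vec = model.encode(submission, convert_to_tensor=True)
rev_vecs = model.encode(list(reviewers.values()), convert_to_tensor=True)

# Rank candidates by cosine similarity; the top of the ranking is the
# retrieved reviewer pool for this submission.
scores = util.cos_sim(sub_vec, rev_vecs)[0]
ranked = sorted(zip(reviewers, scores.tolist()), key=lambda pair: -pair[1])
print(ranked)  # reviewer_a should rank first for this IR submission
```

exHarmony’s result that contextualized embeddings trained on scholarly literature perform best suggests swapping in an encoder tuned on scientific text; the ranking step itself stays the same.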
