PMC-Patients

A Large-scale Dataset of Patient Summaries and Relations for Benchmarking Retrieval-based Clinical Decision Support Systems

About

PMC-Patients provides two tasks for benchmarking retrieval-based clinical decision support (ReCDS) systems: Patient-to-Article Retrieval (PAR) and Patient-to-Patient Retrieval (PPR). Given a query patient summary, PAR aims to retrieve relevant articles from PubMed, and PPR aims to retrieve similar patients from PMC-Patients.
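Both tasks are ranking problems: a query patient summary goes in, and a ranked list of article IDs (PAR) or patient IDs (PPR) comes out. As a minimal sketch of that interface, the Python snippet below ranks a toy corpus of hypothetical patient summaries with textbook Okapi BM25, the lexical baseline that also appears in the leaderboards. The function name `bm25_rank`, the lowercase whitespace tokenizer, the Lucene-style IDF, and the defaults k1 = 1.2 and b = 0.75 are illustrative assumptions, not the configuration behind the leaderboard's BM25 entry.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def bm25_rank(
    query: str, corpus: Dict[str, str], k1: float = 1.2, b: float = 0.75
) -> List[Tuple[str, float]]:
    """Score every document in `corpus` against `query` with Okapi BM25, best first."""
    # Assumed tokenizer: lowercase and split on whitespace.
    docs = {doc_id: text.lower().split() for doc_id, text in corpus.items()}
    n_docs = len(docs)
    avgdl = sum(len(tokens) for tokens in docs.values()) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))

    scores = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        length_norm = k1 * (1 - b + b * len(tokens) / avgdl)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            # Lucene-style IDF, which stays positive even for very common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical patient summaries standing in for a PPR corpus.
    toy_corpus = {
        "P1": "65 year old man with chest pain and elevated troponin",
        "P2": "child with fever and rash",
        "P3": "woman with chest pain after exercise",
    }
    # Ranks P1 first, then P3; P2 shares no query terms and scores 0.
    for doc_id, score in bm25_rank("chest pain troponin", toy_corpus):
        print(doc_id, round(score, 3))
```

A real PAR or PPR system would index the full PubMed or PMC-Patients corpus with a proper search library rather than rescoring every document per query; the loop here only makes the scoring formula explicit.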

For more details about PMC-Patients, please refer to our paper:

Dataset & Submission

PMC-Patients contains 167k patient summaries collected from PubMed Central, annotated with 3.1M patient-article relevance relations and 293k patient-patient similarity relations, both defined by PubMed citation relationships.
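To illustrate how citation relationships can be turned into relevance labels, the sketch below derives both kinds of annotation from a toy citation graph. The labeling rules are simplifying assumptions made for illustration and are not necessarily the exact PMC-Patients procedure: an article counts as relevant to a patient if it cites, or is cited by, the article the patient was extracted from; another patient gets similarity grade 2 if it comes from the same article and grade 1 if its source article is linked to the query patient's source article by a citation. The function `derive_labels` and its input format are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def derive_labels(
    patient_source: Dict[str, str],
    citations: List[Tuple[str, str]],
) -> Tuple[Dict[str, Set[str]], Dict[str, Dict[str, int]]]:
    """Derive hypothetical patient-article and patient-patient labels from citations.

    patient_source maps patient ID -> ID of the article it was extracted from.
    citations holds (citing_article, cited_article) pairs.
    """
    # Treat citation links as undirected: citing and cited articles both count.
    linked: Dict[str, Set[str]] = defaultdict(set)
    for citing, cited in citations:
        linked[citing].add(cited)
        linked[cited].add(citing)

    # Assumed rule: relevant articles are those linked to the source article.
    relevant_articles = {p: set(linked[src]) for p, src in patient_source.items()}

    # Assumed rule: grade 2 for the same source article, grade 1 for citation-linked ones.
    similar_patients: Dict[str, Dict[str, int]] = {}
    for p, src in patient_source.items():
        grades: Dict[str, int] = {}
        for q, other_src in patient_source.items():
            if q == p:
                continue
            if other_src == src:
                grades[q] = 2
            elif other_src in linked[src]:
                grades[q] = 1
        similar_patients[p] = grades
    return relevant_articles, similar_patients


if __name__ == "__main__":
    # Toy data: P1 and P2 come from article A; A cites B, and D cites A.
    sources = {"P1": "A", "P2": "A", "P3": "B", "P4": "C"}
    relevant, similar = derive_labels(sources, [("A", "B"), ("D", "A")])
    print(relevant["P1"], similar["P1"])
```

The nested loop over patients is quadratic and only suits toy data; at the scale of 167k patients one would group patients by source article before comparing them.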

Please visit our GitHub repository to download the dataset and submit your model:

Patient-to-Article Retrieval (PAR) Leaderboard
Entries are ranked by nDCG@10.

| Rank | Date | Model | Team | Reference | MRR (%) | P@10 (%) | nDCG@10 (%) | R@1k (%) |
|---|---|---|---|---|---|---|---|---|
| 1 | Jun 25, 2023 | DPR (SciMult-MHAExpert) | UIUC/Microsoft | Zhang et al. 2023 | 29.89 | 9.35 | 13.79 | 53.71 |
| 2 | Apr 5, 2023 | RRF | Tsinghua University | Zhao et al. 2023 | 29.86 | 8.86 | 13.36 | 49.45 |
| 3 | Apr 5, 2023 | DPR (PubMedBERT) | Tsinghua University | Zhao et al. 2023 | 19.83 | 6.51 | 8.87 | 46.23 |
| 4 | Apr 5, 2023 | DPR (BioLinkBERT) | Tsinghua University | Zhao et al. 2023 | 19.06 | 6.11 | 8.26 | 45.79 |
| 5 | Apr 5, 2023 | DPR (SPECTER) | Tsinghua University | Zhao et al. 2023 | 17.92 | 5.49 | 7.66 | 42.46 |
| 6 | Apr 5, 2023 | BM25 | Tsinghua University | Zhao et al. 2023 | 18.71 | 3.84 | 7.38 | 21.89 |
| 7 | Sep 14, 2023 | bge-base-en-v1.5 | BAAI | Xiao et al. 2023 | 15.88 | 4.27 | 6.44 | 30.43 |
| 8 | Oct 4, 2023 | MedCPT-d | NCBI | Jin et al. 2023 | 13.06 | 2.67 | 4.95 | 19.94 |
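The leaderboards report four standard ranking metrics. The sketch below computes them for a single query from a ranked list of IDs and a dictionary of graded relevance labels (`qrels`), then averages over queries and reports percentages as the tables do. The function names are hypothetical, and several details are assumptions that may differ from the official evaluation script: MRR uses no rank cutoff, nDCG uses linear gains with a log2 discount, and any grade above 0 counts as relevant for MRR, P@10, and R@1k.

```python
import math
from typing import Callable, Dict, List


def reciprocal_rank(ranking: List[str], qrels: Dict[str, int]) -> float:
    """1 / rank of the first relevant document; 0 if none is retrieved (no cutoff assumed)."""
    for rank, doc_id in enumerate(ranking, start=1):
        if qrels.get(doc_id, 0) > 0:
            return 1.0 / rank
    return 0.0


def precision_at_k(ranking: List[str], qrels: Dict[str, int], k: int = 10) -> float:
    """Fraction of the top-k slots filled by relevant documents."""
    return sum(1 for d in ranking[:k] if qrels.get(d, 0) > 0) / k


def recall_at_k(ranking: List[str], qrels: Dict[str, int], k: int = 1000) -> float:
    """Fraction of all relevant documents that appear in the top k."""
    relevant = {d for d, grade in qrels.items() if grade > 0}
    if not relevant:
        return 0.0
    return len(relevant & set(ranking[:k])) / len(relevant)


def ndcg_at_k(ranking: List[str], qrels: Dict[str, int], k: int = 10) -> float:
    """DCG of the top k (linear gain, log2 discount) divided by the ideal DCG."""
    def dcg(gains: List[int]) -> float:
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

    actual = dcg([qrels.get(d, 0) for d in ranking[:k]])
    ideal = dcg(sorted((g for g in qrels.values() if g > 0), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0


def evaluate(
    run: Dict[str, List[str]], qrels: Dict[str, Dict[str, int]]
) -> Dict[str, float]:
    """Average per-query metrics over all queries in qrels, reported as percentages."""
    metrics: Dict[str, Callable[[List[str], Dict[str, int]], float]] = {
        "MRR": reciprocal_rank,
        "P@10": lambda r, q: precision_at_k(r, q, 10),
        "nDCG@10": lambda r, q: ndcg_at_k(r, q, 10),
        "R@1k": lambda r, q: recall_at_k(r, q, 1000),
    }
    n = len(qrels)
    # Queries missing from the run get an empty ranking and score 0.
    return {
        name: 100.0 * sum(fn(run.get(qid, []), q) for qid, q in qrels.items()) / n
        for name, fn in metrics.items()
    }
```

Graded labels matter only for nDCG@10; the other three metrics treat relevance as binary, which is one reason a system's rank can differ from metric to metric.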
Patient-to-Patient Retrieval (PPR) Leaderboard
Entries are ranked by nDCG@10.

| Rank | Date | Model | Team | Reference | MRR (%) | P@10 (%) | nDCG@10 (%) | R@1k (%) |
|---|---|---|---|---|---|---|---|---|
| 1 | Apr 5, 2023 | RRF | Tsinghua University | Zhao et al. 2023 | 27.76 | 6.96 | 24.12 | 85.14 |
| 2 | Jun 25, 2023 | DPR (SciMult-MHAExpert) | UIUC/Microsoft | Zhang et al. 2023 | 25.34 | 6.66 | 22.40 | 83.87 |
| 3 | Apr 5, 2023 | BM25 | Tsinghua University | Zhao et al. 2023 | 22.86 | 4.67 | 18.29 | 69.66 |
| 4 | Apr 5, 2023 | DPR (BioLinkBERT) | Tsinghua University | Zhao et al. 2023 | 21.20 | 5.59 | 18.06 | 80.49 |
| 5 | Apr 5, 2023 | DPR (PubMedBERT) | Tsinghua University | Zhao et al. 2023 | 19.37 | 5.05 | 16.30 | 79.35 |
| 6 | Sep 14, 2023 | bge-base-en-v1.5 | BAAI | Xiao et al. 2023 | 16.20 | 3.78 | 13.02 | 68.85 |
| 7 | Apr 5, 2023 | DPR (SPECTER) | Tsinghua University | Zhao et al. 2023 | 15.08 | 3.79 | 12.27 | 73.01 |
| 8 | Oct 4, 2023 | MedCPT-d | NCBI | Jin et al. 2023 | 13.68 | 3.18 | 11.01 | 60.17 |
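The RRF entries refer to reciprocal rank fusion, which merges several ranked lists by giving each document a score of 1/(k + rank) per list and summing across lists. A minimal sketch follows. Which retrievers the leaderboard entry fuses and which k it uses are not stated here, so the two input rankings below are hypothetical stand-ins for a lexical and a dense retriever, and k = 60 is simply a common default.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists: score(d) = sum over lists of 1 / (k + rank of d in that list)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    lexical = ["P1", "P3", "P2"]  # hypothetical BM25 ranking
    dense = ["P3", "P4", "P1"]    # hypothetical dense-retriever ranking
    print(reciprocal_rank_fusion([lexical, dense]))
```

Because RRF uses only ranks, it needs no calibration between retrievers whose raw scores live on different scales, such as BM25 scores and dense dot products; a document ranked moderately well by both lists can overtake one ranked first by only a single list.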

Citation

If you use PMC-Patients in your research, please cite our paper:

@article{Zhao2023ALD,
  title={A large-scale dataset of patient summaries for retrieval-based clinical decision support systems},
  author={Zhengyun Zhao and Qiao Jin and Fangyuan Chen and Tuorui Peng and Sheng Yu},
  journal={Scientific Data},
  year={2023},
  volume={10},
  number={1},
  pages={909},
  url={https://api.semanticscholar.org/CorpusID:266360591}
}