A review on generative AI models for synthetic medical text, time series, and longitudinal data
Rajpurkar, P., Chen, E., Banerjee, O. & Topol, E. J. AI in health and medicine. Nat. Med. 28, 31–38 (2022).
McGenity, C. et al. Artificial intelligence in digital pathology: a systematic review and meta-analysis of diagnostic test accuracy. NPJ Digital Med. 7, 114 (2024).
Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence (2024). Accessed on 13 June, 2024.
Ghosheh, G. O., Li, J. & Zhu, T. A survey of generative adversarial networks for synthesizing structured electronic health records. ACM Comput. Surv. 56, 1–34 (2024).
Khoury, B., Kogan, C. & Daouk, S. International classification of diseases 11th edition (ICD-11). In Encyclopedia of Personality and Individual Differences, 2350–2355 (Springer, 2020).
Li, J., Cairns, B. J., Li, J. & Zhu, T. Generating synthetic mixed-type longitudinal electronic health records for artificial intelligent applications. NPJ Digital Med. 6, 98 (2023).
Mosquera, L. et al. A method for generating synthetic longitudinal health data. BMC Med. Res. Methodol. 23, 1–21 (2023).
Kaur, D. et al. Application of Bayesian networks to generate synthetic health data. J. Am. Med. Inform. Assoc. 28, 801–811 (2021).
Nikolentzos, G., Vazirgiannis, M., Xypolopoulos, C., Lingman, M. & Brandt, E. G. Synthetic electronic health records generated with variational graph autoencoders. NPJ Digital Med. 6, 83 (2023).
Murtaza, H. et al. Synthetic data generation: State of the art in health care domain. Comput. Sci. Rev. 48, 100546 (2023).
Alcaraz, J. M. L. & Strodthoff, N. Diffusion-based conditional ECG generation with structured state space models. Computers in Biology and Medicine 107115 (2023).
Boukhennoufa, I. et al. A novel model to generate heterogeneous and realistic time-series data for post-stroke rehabilitation assessment. IEEE Transactions on Neural Systems and Rehabilitation Engineering (2023).
Brophy, E., De Vos, M., Boylan, G. & Ward, T. Multivariate generative adversarial networks and their loss functions for synthesis of multichannel ECGs. IEEE Access 9, 158936–158945 (2021).
Brophy, E. Synthesis of dependent multichannel ECG using generative adversarial networks. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 3229–3232 (2020).
Lee, M., Tae, D., Choi, J. H., Jung, H.-Y. & Seok, J. Improved recurrent generative adversarial networks with regularization techniques and a controllable framework. Inf. Sci. 538, 428–443 (2020).
Nikolaidis, K. et al. Augmenting physiological time series data: A case study for sleep apnea detection. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 376–399 (Springer, 2019).
Asadi, M. et al. Accurate detection of paroxysmal atrial fibrillation with certified-GAN and neural architecture search. Sci. Rep. 13, 11378 (2023).
Koo, H. & Kim, T. E. A comprehensive survey on generative diffusion models for structured data. arXiv preprint arXiv:2306.04139v2 (2023).
Khademi, S. et al. Data augmentation to improve syndromic detection from emergency department notes. In Proceedings of the Australasian Computer Science Week, 198–205 (ACM Digital Library, 2023).
Hiebel, N., Ferret, O., Fort, K. & Névéol, A. Can synthetic text help clinical named entity recognition? A study of electronic health records in French. In 17th Conference of the European Chapter of the Association for Computational Linguistics (2023).
Peng, C. et al. A study of generative large language model for medical research and healthcare. NPJ Digital Med. 6 (2023).
Huang, J. & Chang, K. C.-C. Towards reasoning in large language models: A survey. In 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023, 1049–1065 (Association for Computational Linguistics (ACL), 2023).
Valmeekam, K., Olmo, A., Sreedharan, S. & Kambhampati, S. Large language models still can’t plan (a benchmark for LLMs on planning and reasoning about change). arXiv preprint arXiv:2206.10498 (2022).
Wei, J. et al. Chain-of-thought prompting elicits reasoning in large language models. Adv. Neural Inf. Process. Syst. 35, 24824–24837 (2022).
Tornede, A. et al. AutoML in the age of large language models: Current challenges, future opportunities and risks. Transactions on Machine Learning Research (2024).
Bandi, A., Adapa, P. V. S. R. & Kuchi, Y. E. V. P. K. The power of generative AI: A review of requirements, models, input–output formats, evaluation metrics, and challenges. Future Internet 15, 260 (2023).
Abbasian, M. et al. Foundation metrics for evaluating effectiveness of healthcare conversations powered by generative AI. NPJ Digital Med. 7, 82 (2024).
Alaa, A., Van Breugel, B., Saveliev, E. S. & van der Schaar, M. How faithful is your synthetic data? sample-level metrics for evaluating and auditing generative models. In International Conference on Machine Learning, 290–306 (PMLR, 2022).
El Emam, K., Mosquera, L., Fang, X. & El-Hussuna, A. Utility metrics for evaluating synthetic health data generation methods: validation study. JMIR Med. Inform. 10, e35734 (2022).
Kaabachi, B., Despraz, J., Meurers, T., Prasser, F. & Raisaro, J. L. Generation and evaluation of synthetic data in a university hospital setting. In Challenges of Trustable AI and Added-Value on Health, 141–142 (IOS Press, 2022).
Hernandez, M., Epelde, G., Alberdi, A., Cilla, R. & Rankin, D. Synthetic data generation for tabular health records: a systematic review. Neurocomputing 493, 28–45 (2022).
Jordon, J. et al. Hide-and-seek privacy challenge: Synthetic data generation vs. patient re-identification. In NeurIPS 2020 Competition and Demonstration Track, 206–215 (PMLR, 2021).
van Breugel, B. & van der Schaar, M. Beyond privacy: Navigating the opportunities and challenges of synthetic data. arXiv preprint arXiv:2304.03722 (2023).
Wickramaratne, S. D. & Parekh, A. Sleepsim: Conditional GAN-based non-REM sleep EEG signal generator. In 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 1–4 (IEEE, 2023).
Wang, X. et al. Using an optimized generative model to infer the progression of complications in type 2 diabetes patients. BMC Med. Inform. Decis. Mak. 22, 1–9 (2022).
Lee, D. et al. Generating sequential electronic health records using dual adversarial autoencoder. J. Am. Med. Inform. Assoc. 27, 1411–1419 (2020).
Bahrpeyma, F., Roantree, M., Cappellari, P., Scriney, M. & McCarren, A. A methodology for validating diversity in synthetic time series generation. MethodsX 8, 101459 (2021).
Bhanot, K., Qi, M., Erickson, J. S., Guyon, I. & Bennett, K. P. The problem of fairness in synthetic healthcare data. Entropy 23, 1165 (2021).
Gharehbaghi, A. & Lindén, M. A deep machine learning method for classifying cyclic time series of biological signals using time-growing neural network. IEEE Trans. Neural Netw. Learn. Syst. 29, 4102–4115 (2017).
Gharehbaghi, A., Lindén, M. & Babic, A. An artificial intelligent-based model for detecting systolic pathological patterns of phonocardiogram based on time-growing neural network. Appl. Soft Comput. 83, 105615 (2019).
Ghosh, S., Boucher, C., Bian, J. & Prosperi, M. Propensity score synthetic augmentation matching using generative adversarial networks (PSSAM-GAN). Computer Methods Prog. Biomedicine Update 1, 100020 (2021).
Ive, J. Leveraging the potential of synthetic text for AI in mental healthcare. Front. Digital Health 4, 1010202 (2022).
Eberly, L. A. et al. Identification of racial inequities in access to specialized inpatient heart failure care at an academic medical center. Circulation: Heart Fail. 12, e006214 (2019).
Guevara, M. et al. Large language models to identify social determinants of health in electronic health records. NPJ Digital Med. 7, 6 (2024).
Tricco, A. C. et al. PRISMA extension for scoping reviews (PRISMA-ScR): Checklist and explanation. Ann. Intern. Med. 169, 467–473 (2018).
Yang, Z., Li, Y. & Zhou, G. TS-GAN: Time-series GAN for sensor-based health data augmentation. ACM Trans. Comput. Healthc. 4, 1–21 (2023).
Haleem, M. S. et al. Deep-learning-driven techniques for real-time multimodal health and physical data synthesis. Electronics 12, 1989 (2023).
Festag, S. & Spreckelsen, C. Medical multivariate time series imputation and forecasting based on a recurrent conditional Wasserstein GAN and attention. J. Biomed. Inform. 139, 104320 (2023).
Li, X., Metsis, V., Wang, H. & Ngu, A. H. H. TTS-GAN: A transformer-based time-series generative adversarial network. In International Conference on Artificial Intelligence in Medicine, 133–143 (Springer, 2022).
Wang, J., Chen, Y. & Gu, Y. A wearable-HAR oriented sensory data generation method based on spatio-temporal reinforced conditional GANs. Neurocomputing 493, 548–567 (2022).
Foomani, F. H. et al. Synthesizing time-series wound prognosis factors from electronic medical records using generative adversarial networks. J. Biomed. Inform. 125, 103972 (2022).
Bing, S., Dittadi, A., Bauer, S. & Schwab, P. Conditional generation of medical time series for extrapolation to underrepresented populations. PLOS Digital Health 1, e0000074 (2022).
Habiba, M., Brophy, E., Pearlmutter, B. A. & Ward, T. ECG synthesis with neural ODE and GAN models. In International Conference on Electrical, Computer and Energy Technologies, 1–6 (IEEE, 2021).
Lee, W., Lee, J. & Kim, Y. Contextual imputation with missing sequence of EEG signals using generative adversarial networks. IEEE Access 9, 151753–151765 (2021).
Maweu, B. M., Shamsuddin, R., Dakshit, S. & Prabhakaran, B. Generating healthcare time series data for improving diagnostic accuracy of deep neural networks. IEEE Trans. Instrum. Meas. 70, 1–15 (2021).
Li, X., Luo, J. & Younes, R. ActivityGAN: Generative adversarial networks for data augmentation in sensor-based human activity recognition. In ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2020 ACM International Symposium on Wearable Computers, 249–254 (2020).
Kiyasseh, D. et al. PlethAugment: GAN-based PPG augmentation for medical diagnosis in low-resource settings. IEEE J. Biomed. Health Inform. 24, 3226–3235 (2020).
Wang, S., Rudolph, C., Nepal, S., Grobler, M. & Chen, S. PART-GAN: Privacy-preserving time-series sharing. In Artificial Neural Networks and Machine Learning, 578–593 (Springer, 2020).
Dahmen, J. & Cook, D. SynSys: A synthetic data generation system for healthcare applications. Sensors 19, 1181 (2019).
Theodorou, B., Xiao, C. & Sun, J. Synthesize high-dimensional longitudinal electronic health records via hierarchical autoregressive language model. Nat. Commun. 14, 5305 (2023).
Li, R. et al. Improving an electronic health record–based clinical prediction model under label deficiency: Network-based generative adversarial semisupervised approach. JMIR Med. Inform. 11, e47862 (2023).
Shi, J., Wang, D., Tesei, G. & Norgeot, B. Generating high-fidelity privacy-conscious synthetic patient data for causal effect estimation with multiple treatments. Front. Artif. Intell. 5, 918813 (2022).
Wendland, P. et al. Generation of realistic synthetic data using multimodal neural ordinary differential equations. NPJ Digital Med. 5, 122 (2022).
Wang, Z. & Sun, J. PromptEHR: Conditional electronic healthcare records generation with prompt learning. In Conference on Empirical Methods in Natural Language Processing, 2873–2885 (2022).
Zhou, N., Wang, L., Marino, S., Zhao, Y. & Dinov, I. D. DataSifter II: Partially synthetic data sharing of sensitive information containing time-varying correlated observations. J. Algorithms Comput. Technol. 16, 17483026211065379 (2022).
Abell-Hart, K., Hajagos, J., Zhu, W., Saltz, M. & Saltz, J. Generating longitudinal synthetic EHR data with recurrent autoencoders and generative adversarial networks. Data Management, Polystores, and Analytics for Healthcare 153 (2021).
Zhang, Z., Yan, C., Lasko, T. A., Sun, J. & Malin, B. A. SynTEG: A framework for temporal structured electronic health data simulation. J. Am. Med. Inform. Assoc. 28, 596–604 (2021).
Yoon, J., Drumright, L. N. & Van Der Schaar, M. Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Inform. 24, 2378–2388 (2020).
Christensen, J. H. et al. Fully synthetic longitudinal real-world data from hearing aid wearers for public health policy modeling. Front. Neurosci. 13, 850 (2019).
Baowaly, M. K., Lin, C.-C., Liu, C.-L. & Chen, K.-T. Synthesizing electronic health records using improved generative adversarial networks. J. Am. Med. Inform. Assoc. 26, 228–241 (2019).
Wang, Y., Meng, X. & Liu, X. Differentially private recurrent variational autoencoder for text privacy preservation. Mobile Networks and Applications 1–16 (2023).
Zhou, N., Wu, Q., Wu, Z., Marino, S. & Dinov, I. D. DataSifterText: Partially synthetic text generation for sensitive clinical notes. J. Med. Syst. 46, 96 (2022).
Shim, H., Lowet, D., Luca, S. & Vanrumste, B. Synthetic data generation and multi-task learning for extracting temporal information from health-related narrative text. In Proceedings of the Seventh Workshop on Noisy User-generated Text, 260–273 (2021).
Kasthurirathne, S. N., Dexter, G. & Grannis, S. J. Generative adversarial networks for creating synthetic free-text medical data: a proposal for collaborative research and re-use of machine learning models. AMIA Summits Transl. Sci. Proc. 2021, 335 (2021).
Al Aziz, M. M. et al. Differentially private medical texts generation using generative neural networks. ACM Trans. Comput. Healthc. 3, 1–27 (2021).
Libbi, C. A., Trienes, J., Trieschnigg, D. & Seifert, C. Generating synthetic training data for supervised de-identification of electronic health records. Future Internet 13, 136 (2021).
Guan, J., Li, R., Yu, S. & Zhang, X. A method for generating synthetic electronic medical record text. IEEE/ACM Trans. Comput. Biol. Bioinform. 18, 173–182 (2019).
Kasthurirathne, S. N., Dexter, G. & Grannis, S. J. An adversorial approach to enable re-use of machine learning models and collaborative research efforts using synthetic unstructured free-text medical data. MEDINFO 2019: Health and Wellbeing e-Networks for All 1510–1511 (2019).
Syed, M., Marshall, J., Nigam, A. & Chawla, N. V. Gender prediction through synthetic resampling of user profiles using SeqGANs. In 8th International Conference on Computational Data and Social Networks, 363–370 (Springer, 2019).
Lee, S. H. Natural language generation for electronic health records. NPJ Digital Med. 1, 63 (2018).
Li, I. et al. Neural natural language processing for unstructured data in electronic health records: a review. Comput. Sci. Rev. 46, 100511 (2022).
Hahn, W. et al. Contribution of synthetic data generation towards an improved patient stratification in palliative care. J. Personalized Med. 12, 1278 (2022).