№1, 2026

ANONYMIZATION OF PERSONAL MEDICAL DATA BASED ON ARTIFICIAL INTELLIGENCE
Ramiz Shikhaliyev

The widespread digitalization of healthcare has led to the accumulation of substantial volumes of personal medical data (PMD), creating new opportunities for enhancing the quality of medical care, supporting informed clinical decision-making, and advancing scientific research. At the same time, the large-scale accumulation of PMD poses serious risks to cybersecurity and patient privacy. One key mechanism for mitigating these risks is the anonymization of PMD. However, traditional anonymization methods demonstrate significant limitations when processing complex, multidimensional, and unstructured PMD. This article examines approaches to using artificial intelligence (AI) methods for PMD anonymization. It substantiates the need to move from classical statistical, syntactic, and cryptographic models to intelligent and adaptive systems capable of automatically identifying sensitive information, performing context-sensitive transformations, and generating synthetic data. Furthermore, key technical, ethical, and regulatory issues, as well as the risks associated with the use of AI for PMD anonymization, are analyzed (pp. 35–43).

Keywords: Personal medical data, Patient confidentiality, Data anonymization, Artificial intelligence, De-identification, Differential privacy