This tutorial offers a comprehensive exploration of the Data-Centric AI paradigm, emphasizing the fundamental shift towards prioritizing data within the AI lifecycle. Through a hands-on approach, participants will experiment with several open-source tools to address challenges such as data complexity, missing data, class imbalance, fairness, privacy, and explainability, with applications in healthcare and finance, among other domains. The tutorial is tailored for a diverse audience, ranging from early-year PhD students to experienced researchers across various AI subfields, and requires only basic proficiency in data science/machine learning and Python programming.
A Multi-view panorama of Data-Centric AI: Techniques, Tools, and Applications
Dr. Miriam Seoane Santos (ORCID ID 0000-0002-5912-963X) completed her Ph.D. in Informatics Engineering in 2022 and is currently an Assistant Professor at the Department of Computer Science of the University of Porto, Portugal. She is a full member of the Cognitive and Media Systems of CISUC, and her research spans several topics within data complexity, data difficulty factors, and data intrinsic characteristics, with particular implications for AI Ethics and Responsible AI. She has published her research in several high-impact journals and currently holds 26 publications, 1053 citations, and an h-index of 12. In 2023, she received the Award for Best Ph.D. Thesis in Artificial Intelligence 2022, promoted by the Portuguese Association in Artificial Intelligence.
Dr. Pedro Henriques Abreu (ORCID ID 0000-0002-9278-8194) is an Associate Professor at the Department of Informatics of the University of Coimbra, Portugal, a full member of the Cognitive and Media Systems of CISUC, one of the area editors of the Information Fusion journal, and one of the editors of the Data Science and Analytics journal. His research interests include the development of Data-Centric AI approaches, especially those related to missing data, imbalanced data, and data fairness. He is the author of more than 100 publications, several of which have been nominated for or received best paper awards. He currently holds 2490 citations, an h-index of 25, and an i10-index of 47. Over the years, he has been a member of the Programme Committee and Editorial Board of some of the leading conferences and journals in these fields and has participated as a researcher in 8 national and international projects.
Dr. Alberto Fernández Hilario (ORCID ID 0000-0002-6480-8434) is a Full Professor at the Department of Computer Science and Artificial Intelligence of the University of Granada, Spain. His research spans Data Science, Computational Intelligence, Ethics, and Trustworthy Artificial Intelligence, with a particular emphasis on interdisciplinary, applied, and problem-oriented approaches. With over 60 publications, he has received over 10,000 citations and has an h-index of 37. He has edited a monograph titled “Learning from Imbalanced Data Sets” (Springer, 2018) and received awards such as the “Lotfi A. Zadeh” prize from IFSA for the Best Article in 2009-2010, two awards from the University of Granada for Best Publication in Engineering (2013 and 2014), and the “Excellence in Knowledge” award in 2017. Acknowledged as a Highly Cited Researcher in Computer Science in 2017, Dr. Fernández ranks within the top 2% of the most influential researchers globally according to the Stanford ranking.