Iglesias, Guillermo (Corresponding Author)

Binary Classification Optimisation with AI-Generated Data

Published in: Lecture Notes in Computer Science, vol. 15383, pp. 210-216, 2025-01-01. DOI: 10.1007/978-3-031-80889-0_15

Authors: Mazon, Manuel Jesus Cerezo; Garcia, Ricardo Moya; Garcia, Ekaitz Arriola; del Castillo, Miguel Herencia Garcia; Iglesias, Guillermo

Affiliations

Ainovis, Colquide 6, Madrid 28231, Spain - Author
Univ Politecn Madrid, Madrid, Spain - Author

Abstract

In the field of machine learning, obtaining sufficient and high-quality data is a persistent challenge. This report explores the innovative solution of using synthetic data generated from existing datasets to overcome this limitation. By employing synthetic data, we not only increase the quantity of available information but also maintain the integrity and essential characteristics of natural data. This methodology allows the application of conventional data augmentation techniques, ensuring a more robust and efficient learning process. The study is based on a dataset provided by the International Skin Imaging Collaboration (ISIC), consisting of 3,323 cases divided equally between melanomas and Basal Cell Carcinoma (BCC). Using Generative Adversarial Networks (GANs), specifically StyleGAN2 with transfer learning from the Flickr-Faces-HQ (FFHQ) model, synthetic images were generated, expanding the dataset fourfold to a total of 26,584 synthetic records. The quality of the synthetic images was ensured using the Frechet Inception Distance (FID) metric [5], with BCC obtaining 22.2534 and melanomas obtaining 20.4577 according to this metric. Models trained with a hybrid approach using both real and synthetic data showed improved performance metrics (F1 0.71 to 0.79), highlighting the effectiveness of this method in enhancing binary classification tasks in medical imaging. The source code for all the research, along with the generated dataset is publicly available.
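The F1 values quoted in the abstract (0.71 rising to 0.79) can be put in context with a minimal sketch of how an F1 score is computed for a binary classifier. The record does not say which class was treated as positive or whether the reported score was averaged over both classes, so the label encoding (1 = melanoma, 0 = BCC), the function name `f1_score`, and the toy labels below are illustrative assumptions, not the authors' evaluation code.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 of the `positive` class: 2*TP / (2*TP + FP + FN)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    # With no positive labels and no positive predictions, F1 is undefined;
    # returning 0.0 here is a convention chosen for this sketch.
    return 0.0 if denom == 0 else 2 * tp / denom


# Hypothetical toy labels (assumed encoding: 1 = melanoma, 0 = BCC).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
score = f1_score(y_true, y_pred)
print(f"F1 = {score:.2f}")
```

In this toy case there are 3 true positives, 1 false positive, and 1 false negative, so F1 = 6 / 8 = 0.75. Because F1 is the harmonic mean of precision and recall, it drops when either one falls, which is one reason it is commonly reported for classification tasks like this one.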

Keywords

Data augmentation; FID; GAN; ISIC; Machine learning; Medical imaging; Skin lesion classification; Synthetic data

Quality index

Bibliometric impact. Analysis of the contribution and dissemination channel

The work was published in Lecture Notes in Computer Science. According to the agency WoS (JCR), the progression and impact of this publication in recent years have made it a reference in its field. In 2025, the year the work was published, it ranked 13/61 in the category Computer Science, Theory & Methods, placing it in Q1 (first quartile). Notably, the journal is positioned above the 90th percentile.

Impact and social visibility

From the perspective of influence or social adoption, and based on metrics associated with mentions and interactions provided by agencies specializing in calculating the so-called "Alternative or Social Metrics," we can highlight as of 2025-05-27:

  • Bookmarks, code forks, additions to favorite lists for recurrent reading, and general views suggest that readers are using the publication as a basis for their current work. This may be an early indicator of later, more formal academic citations. The "Capture" indicator supports this, with a total of 1 (PlumX).

Leadership analysis of institutional authors

There is a significant leadership presence as some of the institution’s authors appear as the first or last signer, detailed as follows: Last Author (IGLESIAS HERNANDEZ, GUILLERMO).

The corresponding author is IGLESIAS HERNANDEZ, GUILLERMO.