SPRITZ-PS: Validation of synthetic face images using a large dataset of printed documents

Shoari, Seyedsadra
2021/2022

Abstract

Faces generated by generative adversarial networks (GANs) are difficult to distinguish from real human faces. Since these images are already being used as profile pictures for fake identities across social media and the web, synthetic images can have serious social consequences, such as emotional harm, deception, and the manipulation of public opinion and behaviour [1]. One way to distinguish a pristine face image from a synthetic one is to examine the similarity between the left and right irises of the same person: in a healthy person, the two irises naturally share the same shape and colour. Because GAN models impose no such physiological constraints, anomalies in iris patterns can reveal GAN-generated face images, and these anomalies are widely observable even in faces produced by high-quality GANs. The main advantage of focusing on the iris rather than the entire face is that an iris image (64 × 64 pixels) is considerably smaller than a complete face image (over 2000 × 2000 pixels), so in real-time scenarios analysing the iris is faster than analysing the whole face.

Two serious issues must be addressed in order to distinguish fake images: (i) printing and scanning documents with the same printers and scanners adds the same device noise to all images, so real and GAN-generated face images carry identical noise, which makes the synthetic face image much harder to differentiate; (ii) irises extracted from a full face image may be incompletely shaped because of eye occlusion or a gaze directed away from the camera, and incomplete iris images can cause false detections, such as flagging a real image as fake.

To address these issues, we start by extracting the irises from printed-and-scanned face images of both real and fake faces. To segment the iris region from the full face image we use Dlib, which provides 68 facial landmarks, and we extract both the left and right irises with the help of EyeCool. The extracted iris images, however, are incompletely shaped because of eyelid occlusion. Since incomplete irises do not occur in the real world and are unsuitable inputs for training deep neural networks, the missing pixels of each extracted iris must be filled in. To this end we employ a hypergraph convolution-based image inpainting technique, in which the hypergraph convolution captures the intricate relationships within the iris images. The inpainted outputs form a novel dataset, called SPRITZ-PS, containing printed-and-scanned iris images of both pristine and GAN-generated faces.

To validate the dataset, we use a Siamese neural network, trained on anchor, positive, and negative images, which learns to produce embeddings for measuring similarity on top of pre-trained convolutional networks: ResNet50, VGG16, MobileNet-v2, and Xception. A triplet loss over a distance metric is used to determine the distance between the learned embeddings in n-dimensional space. Finally, we compare the pre-trained models in terms of training parameters, training and testing losses, training and testing accuracies, computing time, and average similarity score.
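As an illustration of the landmark-based eye extraction described above, the following is a minimal sketch, assuming the standard dlib 68-landmark predictor file; the crop_eyes helper is hypothetical, and EyeCool's actual iris segmentation is not reproduced here.

    # Minimal sketch: crop the two eye regions with dlib's 68 landmarks.
    # Assumes shape_predictor_68_face_landmarks.dat is available locally;
    # crop_eyes is an illustrative helper, not the thesis's EyeCool pipeline.
    import cv2
    import dlib

    detector = dlib.get_frontal_face_detector()
    predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

    def crop_eyes(image_path, pad=10):
        """Return the two eye crops; in the 68-point scheme, landmarks
        36-41 outline one eye and 42-47 the other."""
        img = cv2.imread(image_path)
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        faces = detector(gray, 1)
        if not faces:
            return None
        shape = predictor(gray, faces[0])

        def crop(idx):
            xs = [shape.part(i).x for i in idx]
            ys = [shape.part(i).y for i in idx]
            return img[min(ys) - pad:max(ys) + pad,
                       min(xs) - pad:max(xs) + pad]

        return crop(range(36, 42)), crop(range(42, 48))

Each crop would then be passed to the iris segmentation step and, where the iris is occluded, to the hypergraph convolution-based inpainting step.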
For ProGAN, we achieve the highest average similarity score, 95.04%, on RAW (pristine) images with the MobileNet-v2 network, and the lowest, 56.52%, on GAN (GAN-generated fake) images with the Xception network. For StyleGAN, the highest average similarity score is 93.84% for RAW inputs on MobileNet-v2, and the lowest is 56.76% for GAN images on Xception.
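To make the verification setup concrete, below is a minimal sketch of a triplet-based Siamese embedding with a MobileNet-v2 backbone, assuming PyTorch/torchvision; the embed and similarity helpers are illustrative and are not the thesis's exact training code.

    # Minimal sketch: Siamese embedding with triplet loss (PyTorch assumed).
    import torch
    import torch.nn as nn
    import torchvision.models as models

    backbone = models.mobilenet_v2(weights="IMAGENET1K_V1")
    backbone.classifier = nn.Identity()       # keep the 1280-d feature vector

    # Triplet loss: L = max(d(a, p) - d(a, n) + margin, 0)
    triplet_loss = nn.TripletMarginLoss(margin=1.0)

    def embed(x):
        # L2-normalise so Euclidean distance and cosine similarity agree
        return nn.functional.normalize(backbone(x), dim=1)

    def similarity(left_iris, right_iris):
        # cosine similarity of unit embeddings, rescaled to [0, 1]
        a, b = embed(left_iris), embed(right_iris)
        return (1.0 + (a * b).sum(dim=1)) / 2.0

    # Training step on an (anchor, positive, negative) batch:
    # loss = triplet_loss(embed(anchor), embed(positive), embed(negative))

Under a setup of this kind, the matching left and right irises of a pristine face should yield a high similarity score, while the weakly correlated irises of a GAN-generated face should score lower, consistent with the gap between the RAW and GAN scores reported above.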
Keywords: GAN, Deepfake, Iris detection, Image forensics

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/35250