I’m working on a side project, machine learning with neural networks.
My goal with this is getting 3D information out of 2D video – but that’s a super difficult problem so I’ve been trying out different techniques on simpler problems such as generating faces. Yes, generating photos of people that never existed is a – relatively – simple problem.
Above are a selection of random faces slowly being changed into a different set of faces, as generated by my code and neural network.
Each face is generated from just 100 numbers. Altering one number will make a subtle or not so subtle change to the resulting image. Altering twenty will completely change the face (that’s how many are being altered above).
There are more impressive results out there, such as the BEGAN network (image above) that’s generating faces indistinguishable from real photos at a larger size. There’s the neural network that’s using a photo of a dress to generate a photo of a model wearing the dress. There’s a lot going on.
For anyone interested, here are some of the details about the coding, data and training I’ve used for the above.
I modified a version of this script to download enough images from Google image search, then went through and removed as many bad images as I could find. As there were around fifty thousand this took a while and was imperfect, but the results aren’t bad considering. Here, “bad images” means ones that weren’t faces, or faces that were blurry or obscured in some way.
I trained the NN on a laptop with an Nvidia 970m graphics card for several days to get to the point I could make the image at the top.
I used a WGAN, a GAN (Generative Adversarial Network) with wasserstein loss, which has been much more stable than my previous attempts with simpler techniques.
I wrote the program in python using keras with a tensorflow backend.
I’m currently working on a program to sort images into groups or categories by ‘similarity’ – here meaning that they have certain features in common. I decide which features by dragging each image into a category, the program then works out what makes them ‘similar’ and can filter the entire dataset based on that.
Once I’ve got that working I should be able to improve the quality of my training data and therefore the results I get at the end.
There’s room for huge improvement just by throwing better hardware at the problem. The latest Nvidia cards would allow larger images of higher quality to be processed at a similar rate to what I’m getting now, due to having approximately four times the memory and three times the number of processing units.