Deep Learning-Based Text-to-Image Generation
Keywords:
PSNR, GAN, Caltech birds dataset, NLP, CNN, RNN
Abstract
Text-to-image generation is the task of synthesizing images that match a given textual description. It has a significant influence on many research areas as well as a diverse set of applications (e.g., photo searching, photo editing, art generation, computer-aided design, image reconstruction, captioning, and portrait drawing). The most challenging part of the task is consistently producing realistic images that satisfy the given conditions. Existing text-to-image algorithms often create images that do not properly match the text. We address this issue in our study and build a deep-learning-based architecture for semantically consistent image generation: the recurrent convolutional generative adversarial network (RC-GAN). RC-GAN bridges advances in text and image modelling, translating visual concepts from words to pixels. The proposed model was trained on the Oxford-102 flowers dataset, and its performance was evaluated using the inception score and peak signal-to-noise ratio (PSNR). The experimental results demonstrate that our model can generate more realistic images of flowers from given captions, achieving an inception score of 4.15 and a PSNR of 30.12 dB.

Generating images from natural language is one of the primary applications of conditional generative models, and recent progress in this area has been driven largely by GANs. This project uses generative adversarial networks (GANs) to generate an image from a text description. GANs are deep neural networks that act as generative models of data: given a set of training data, a GAN can learn to estimate the underlying probability distribution of that data. In this project, the model is trained on the Caltech birds dataset.
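To make the RC-GAN idea above concrete, the sketch below shows one plausible layout in PyTorch: a recurrent (GRU) text encoder turns a tokenized caption into a fixed-size embedding, and a convolutional generator upsamples the concatenated noise vector and embedding into a 64x64 image. All module names, layer sizes, and the choice of GRU are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of a recurrent-convolutional text-to-image generator.
# Dimensions and layer choices are assumptions for illustration only.
import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    """Encodes a tokenized caption into a fixed-size embedding with a GRU."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)

    def forward(self, tokens):                  # tokens: (B, T) int64
        _, h = self.rnn(self.embed(tokens))     # h: (1, B, hidden_dim)
        return h.squeeze(0)                     # (B, hidden_dim)

class Generator(nn.Module):
    """Upsamples [noise ; text embedding] into a 64x64 RGB image."""
    def __init__(self, z_dim=100, text_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            # 1x1 -> 4x4 -> 8x8 -> 16x16 -> 32x32 -> 64x64
            nn.ConvTranspose2d(z_dim + text_dim, 512, 4, 1, 0),
            nn.BatchNorm2d(512), nn.ReLU(True),
            nn.ConvTranspose2d(512, 256, 4, 2, 1),
            nn.BatchNorm2d(256), nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, 2, 1),
            nn.BatchNorm2d(128), nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1),
            nn.BatchNorm2d(64), nn.ReLU(True),
            nn.ConvTranspose2d(64, 3, 4, 2, 1),
            nn.Tanh(),
        )

    def forward(self, z, text_emb):
        x = torch.cat([z, text_emb], dim=1)     # (B, z_dim + text_dim)
        return self.net(x[..., None, None])     # (B, 3, 64, 64)
```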
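The adversarial training the abstract refers to alternates two updates: the discriminator learns to score real (image, caption) pairs as real and generated pairs as fake, while the generator learns to fool it. Below is a minimal sketch of one such step, assuming a discriminator `D(images, text_emb)` that outputs probabilities of shape (B, 1) and optimizers defined elsewhere; the text encoder is treated as pretrained and frozen for simplicity.

```python
# One conditional-GAN training step (sketch; D, optimizers assumed external).
import torch
import torch.nn.functional as F

def train_step(G, D, text_encoder, real_imgs, captions, opt_g, opt_d, z_dim=100):
    batch = real_imgs.size(0)
    with torch.no_grad():                        # encoder frozen in this sketch
        emb = text_encoder(captions)
    real_lbl = torch.ones(batch, 1)
    fake_lbl = torch.zeros(batch, 1)

    # Discriminator step: real pairs -> 1, generated pairs -> 0.
    fake_imgs = G(torch.randn(batch, z_dim), emb).detach()
    d_loss = (F.binary_cross_entropy(D(real_imgs, emb), real_lbl)
              + F.binary_cross_entropy(D(fake_imgs, emb), fake_lbl))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: make D score freshly generated images as real.
    g_loss = F.binary_cross_entropy(
        D(G(torch.randn(batch, z_dim), emb), emb), real_lbl)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```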
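The two metrics reported above can be computed as follows. This is a minimal sketch of the standard definitions: PSNR compares a generated image against a reference (pixel values assumed in [0, 1]), and the inception score is derived from a pretrained classifier's class probabilities p(y|x), which are assumed to be obtained elsewhere.

```python
# Standard PSNR and inception-score definitions (sketch).
import torch

def psnr(generated, reference, max_val=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    mse = torch.mean((generated - reference) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)

def inception_score(probs, eps=1e-12):
    """IS = exp( E_x KL( p(y|x) || p(y) ) ), probs: (N, num_classes)."""
    p_y = probs.mean(dim=0, keepdim=True)                    # marginal p(y)
    kl = (probs * ((probs + eps).log() - (p_y + eps).log())).sum(dim=1)
    return kl.mean().exp()
```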
License
Copyright (c) IJSRSET

This work is licensed under a Creative Commons Attribution 4.0 International License.