Vision Transformer for Scene Text Recognition

This project explores the application of the Vision Transformer (ViT) architecture to Scene Text Recognition problems. The model is implemented with TensorFlow 2 and is similar to the original ViT, with some changes to fit the new type of problem.

For more details on the model architecture and training process, please check out the training notebook. The model was tested on the IIIT 5K-word dataset and achieved 80.97% accuracy. More details are available in the testing notebook.
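As a rough illustration of what an accuracy figure like the one above usually means in scene text recognition, the sketch below computes word-level accuracy: a prediction counts as correct only if the whole word matches its label. The function name `word_accuracy` and the `case_sensitive` flag are hypothetical and not taken from this repository; the testing notebook may normalize text differently.

```python
def word_accuracy(predictions, labels, case_sensitive=False):
    """Return the fraction of predicted words that exactly match their labels.

    Hypothetical helper for illustration only; the evaluation protocol
    used in the testing notebook may differ (for example in how case or
    punctuation is handled).
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")

    if not case_sensitive:
        # Assumption: compare words case-insensitively.
        predictions = [p.lower() for p in predictions]
        labels = [t.lower() for t in labels]

    # A word is correct only if every character matches.
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)


if __name__ == "__main__":
    preds = ["hello", "w0rld", "text"]
    truth = ["HELLO", "world", "text"]
    print(f"word accuracy: {word_accuracy(preds, truth):.2%}")
```

Under this definition a single wrong character makes the whole word incorrect, so word accuracy is a stricter measure than character-level accuracy.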


OlegPonomaryov/vit-str
