Experiments on using Vision Transformer for Scene Text Recognition

OlegPonomaryov/vit-str

Vision Transformer for Scene Text Recognition

This project explores applying the Vision Transformer (ViT) architecture to Scene Text Recognition problems. The model is implemented in TensorFlow 2 and is similar to the original ViT, with some changes to fit the new type of problem.

For more details on the model architecture and training process, please check out the training notebook. The model was tested on the IIIT 5K-word dataset and achieved 80.97% accuracy. More details are available in the testing notebook.
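The README does not say how the accuracy figure is computed. A common convention in scene text recognition is word-level exact match, where a prediction counts as correct only if the whole predicted string equals the label. The sketch below illustrates that convention only; the function name `word_accuracy`, the `case_sensitive` flag, and the case-insensitive default are assumptions for illustration and are not taken from the testing notebook.

```python
def word_accuracy(predictions, labels, case_sensitive=False):
    """Return the percentage of predictions that exactly match their labels.

    Hypothetical helper: the matching rule (whole-word exact match,
    case-insensitive by default) is an assumption, not the project's
    confirmed evaluation protocol.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0

    def normalize(text):
        # Fold case unless the caller asks for a case-sensitive comparison.
        return text if case_sensitive else text.lower()

    correct = sum(
        normalize(pred) == normalize(label)
        for pred, label in zip(predictions, labels)
    )
    return 100.0 * correct / len(labels)


if __name__ == "__main__":
    preds = ["hello", "w0rld", "Text"]
    truth = ["hello", "world", "text"]
    print(f"{word_accuracy(preds, truth):.2f}%")
```

Under this convention a single wrong character makes the whole word incorrect, so the metric is stricter than character-level accuracy.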
