Releases: open-mmlab/mmocr

MMOCR Release v1.0.0rc0

01 Sep 06:30
c44b611
Pre-release

We are excited to announce the release of MMOCR 1.0.0rc0!
MMOCR 1.0.0rc0 is the first version of MMOCR 1.x, a part of the OpenMMLab 2.0 projects.
Built upon the new training engine,
MMOCR 1.x unifies the interfaces of dataset, models, evaluation, and visualization with faster training and testing speed.

Highlights

  1. New engines. MMOCR 1.x is based on MMEngine, which provides a general and powerful runner that allows more flexible customizations and significantly simplifies the entrypoints of high-level interfaces.

  2. Unified interfaces. As a part of the OpenMMLab 2.0 projects, MMOCR 1.x unifies and refactors the interfaces and internal logic of training, testing, datasets, models, evaluation, and visualization. All the OpenMMLab 2.0 projects share the same design in those interfaces and logic to allow the emergence of multi-task/modality algorithms.

  3. Cross project calling. Benefiting from the unified design, you can use the models implemented in other OpenMMLab projects, such as MMDet. We provide an example of how to use MMDetection's Mask R-CNN through MMDetWrapper. Check our documents for more details. More wrappers will be released in the future.

  4. Stronger visualization. We provide a series of useful tools which are mostly based on brand-new visualizers. As a result, it is more convenient for the users to explore the models and datasets now.

  5. More documentation and tutorials. We add a bunch of documentation and tutorials to help users get started more smoothly. Read it here.

Breaking Changes

We briefly list the major breaking changes here.
We also have the migration guide that provides complete details and migration instructions.

Dependencies

  • MMOCR 1.x relies on MMEngine to run. MMEngine is a new foundational library for training deep learning models in OpenMMLab 2.0 projects. The dependencies of file IO and training are migrated from MMCV 1.x to MMEngine.
  • MMOCR 1.x relies on MMCV>=2.0.0rc0. Although MMCV no longer maintains the training functionalities since 2.0.0rc0, MMOCR 1.x relies on the data transforms, CUDA operators, and image processing interfaces in MMCV. Note that since MMCV 2.0.0rc0, the package mmcv is the version that provides pre-built CUDA operators, mmcv-lite is the version without them, and mmcv-full has been deprecated.

Training and testing

  • MMOCR 1.x uses Runner in MMEngine rather than that in MMCV. The new Runner implements and unifies the building logic of dataset, model, evaluation, and visualizer. Therefore, MMOCR 1.x no longer maintains the building logic of those modules in mmocr.train.apis and tools/train.py. That code has been migrated into MMEngine. Please refer to the migration guide of Runner in MMEngine for more details.
  • The Runner in MMEngine also supports testing and validation. The testing scripts are simplified as well and build the runner with logic similar to that of the training scripts.
  • The execution points of hooks in the new Runner have been enriched to allow more flexible customization. Please refer to the migration guide of Hook in MMEngine for more details.
  • Learning rate and momentum schedules have been migrated from Hook to Parameter Scheduler in MMEngine. Please refer to the migration guide of Parameter Scheduler in MMEngine for more details.
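
As a rough illustration of the last point, a schedule that a 0.x config would express through a learning-rate hook becomes a list of parameter schedulers in the config. The fragment below is a sketch that assumes MMEngine's LinearLR and MultiStepLR schedulers. All numeric values (warm-up length, milestones, factors) are placeholders chosen for illustration and are not taken from any MMOCR config.

```python
# Hypothetical config fragment: a linear warm-up followed by step decay.
# Every numeric value here is an illustrative placeholder.
param_scheduler = [
    # Warm up the learning rate over the first 500 iterations.
    dict(type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500),
    # Then multiply it by 0.1 at epochs 8 and 11 of a 12-epoch schedule.
    dict(type='MultiStepLR', by_epoch=True, begin=0, end=12,
         milestones=[8, 11], gamma=0.1),
]
```

Each entry is an ordinary dict, so schedules can be stacked or swapped in the config without touching the hook list.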

Configs

Dataset

The Dataset classes implemented in MMOCR 1.x all inherit from the BaseDetDataset, which inherits from the BaseDataset in MMEngine. There are several changes to Dataset in MMOCR 1.x.

  • All the datasets support serializing the data list to reduce the memory when multiple workers are built to accelerate data loading.
  • The interfaces are changed accordingly.

Data Transforms

Data transforms in MMOCR 1.x all inherit from those in MMCV>=2.0.0rc0, which follow a new convention in OpenMMLab 2.0 projects.
The changes are listed below:

  • The interfaces are also changed. Please refer to the API Reference.
  • The functionalities of some data transforms (e.g., Resize) are decomposed into several transforms.
  • The same data transforms in different OpenMMLab 2.0 libraries share the same augmentation implementation and the same argument semantics, e.g., Resize in MMDet 3.x and MMOCR 1.x will resize the image in exactly the same manner given the same arguments.

Model

The models in MMOCR 1.x all inherit from BaseModel in MMEngine, which defines a new convention of models in OpenMMLab 2.0 projects. Users can refer to the tutorial of model in MMEngine for more details. Accordingly, there are several changes as follows:

  • The model interfaces, including the input and output formats, are significantly simplified and unified following the new convention in MMOCR 1.x. Specifically, all the input data in training and testing are packed into inputs and data_samples, where inputs contains model inputs like a list of image tensors, and data_samples contains other information of the current data sample such as ground truths and model predictions. In this way, different tasks in MMOCR 1.x can share the same input arguments, which makes the models more general and suitable for multi-task learning.
  • The model has a data preprocessor module, which is used to pre-process the input data of model. In MMOCR 1.x, the data preprocessor usually does the necessary steps to form the input images into a batch, such as padding. It can also serve as a place for some special data augmentations or more efficient data transformations like normalization.
  • The internal logic of model has been changed. In MMOCR 0.x, model used forward_train and simple_test to deal with different model forward logics. In MMOCR 1.x and OpenMMLab 2.0, the forward function has three modes: loss, predict, and tensor for training, inference, and tracing or other purposes, respectively. The forward function calls self.loss(), self.predict(), and self._forward() given the modes loss, predict, and tensor, respectively.
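
The mode dispatch described in the last point can be sketched in plain Python. This is only a toy stand-in: ToyRecognizer is a hypothetical class that does not inherit from MMEngine's BaseModel, and lists of floats stand in for image tensors. Only the method names loss, predict, and _forward and the three mode strings come from the text above.

```python
class ToyRecognizer:
    """Toy stand-in for a model; not a real MMEngine BaseModel subclass."""

    def forward(self, inputs, data_samples=None, mode='tensor'):
        # Dispatch on ``mode`` as described above.
        if mode == 'loss':
            return self.loss(inputs, data_samples)
        if mode == 'predict':
            return self.predict(inputs, data_samples)
        if mode == 'tensor':
            return self._forward(inputs, data_samples)
        raise RuntimeError(f'Invalid mode "{mode}"')

    def _forward(self, inputs, data_samples=None):
        # Raw network output; here just the mean of each "image".
        return [sum(x) / len(x) for x in inputs]

    def loss(self, inputs, data_samples):
        # Training: compare raw outputs with ground truths, return a loss dict.
        feats = self._forward(inputs)
        errors = [(f - s['gt']) ** 2 for f, s in zip(feats, data_samples)]
        return {'loss': sum(errors) / len(errors)}

    def predict(self, inputs, data_samples):
        # Inference: attach predictions to copies of the data samples.
        feats = self._forward(inputs)
        return [dict(s, pred=f) for f, s in zip(feats, data_samples)]


model = ToyRecognizer()
inputs = [[1.0, 3.0], [2.0, 4.0]]
data_samples = [{'gt': 2.0}, {'gt': 1.0}]
losses = model.forward(inputs, data_samples, mode='loss')
preds = model.forward(inputs, data_samples, mode='predict')
feats = model.forward(inputs, mode='tensor')
```

Because every task receives the same (inputs, data_samples) pair, the training and testing loops never need to know which task the model solves.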

Evaluation

MMOCR 1.x mainly implements corresponding metrics for each task, which are managed by Evaluator to complete the evaluation.
In addition, users can build an evaluator in MMOCR 1.x to conduct offline evaluation, i.e., evaluate predictions that may not be produced by MMOCR, as long as the predictions follow our dataset conventions. More details can be found in the Evaluation Tutorial in MMEngine.
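
To make the metric side of this concrete, here is a minimal sketch of a metric that accumulates per-image results and then computes precision, recall, and their harmonic mean (hmean). ToyHmeanMetric is a hypothetical class written for illustration; its method signatures are simplified and do not match the real metric interfaces in MMEngine or MMOCR. The counts fed to it could come from any source, which is the idea behind offline evaluation.

```python
class ToyHmeanMetric:
    """Toy metric with a process/compute pattern; not MMOCR's implementation."""

    def __init__(self):
        self.results = []

    def process(self, num_matched, num_pred, num_gt):
        # Store per-image counts; called once per image.
        self.results.append((num_matched, num_pred, num_gt))

    def compute_metrics(self):
        # Aggregate counts over the whole dataset, then compute the scores.
        matched = sum(r[0] for r in self.results)
        pred = sum(r[1] for r in self.results)
        gt = sum(r[2] for r in self.results)
        precision = matched / pred if pred else 0.0
        recall = matched / gt if gt else 0.0
        denom = precision + recall
        hmean = 2 * precision * recall / denom if denom else 0.0
        return dict(precision=precision, recall=recall, hmean=hmean)


metric = ToyHmeanMetric()
metric.process(num_matched=8, num_pred=10, num_gt=16)
metric.process(num_matched=4, num_pred=6, num_gt=8)
scores = metric.compute_metrics()
```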

Visualization

The functions of visualization in MMOCR 1.x are removed. Instead, in OpenMMLab 2.0 projects, we use Visualizer to visualize data. MMOCR 1.x implements TextDetLocalVisualizer, TextRecogLocalVisualizer, and KIELocalVisualizer to allow visualization of ground truths, model predictions, and feature maps, etc., at any place, for the three tasks supported in MMOCR. It also supports dumping the visualization data to any external visualization backends such as Tensorboard and Wandb. Check our Visualization Document for more details.

Improvements

  • Most models enjoy a performance improvement from the new framework and refactor of data transforms. For example, in MMOCR 1.x, DBNet-R50 achieves 0.854 hmean score on ICDAR 2015, while the counterpart can only get 0.840 hmean score in MMOCR 0.x.
  • Support mixed precision training for most of the models. The remaining models are not supported yet because the operators they use might not be representable in fp16. We will update the documentation and list the results of mixed precision training.

Ongoing changes

  1. Test-time augmentation: it was supported in MMOCR 0.x but is not implemented yet in this version due to limited time. We will support it in the following releases with a new and simplified design.
  2. Inference interfaces: unified inference interfaces will be supported in the future to ease the use of released models.
  3. Interfaces of useful tools that can be used in notebook: more useful tools that are implemented in the tools/ directory will have their python interfaces so that they can be used through notebook and in downstream libraries.
  4. Documentation: we will add more design docs, tutorials, and migration gui...

MMOCR Release v0.6.1

04 Aug 06:03
e5f071a

Highlights

  1. ArT dataset is available for text detection and recognition!
  2. Fix several bugs that affect the correctness of the models.
  3. Thanks to MIM, our installation is much simpler now! The docs have been renewed as well.

New Features & Enhancements

Bug Fixes

Docs

New Contributors

Full Changelog: v0.6.0...v0.6.1

MMOCR Release v0.6.0

05 May 14:20
1962c24

Highlights

  1. A new recognition algorithm MASTER has been added into MMOCR, which was the championship solution for the "ICDAR 2021 Competition on Scientific Table Image Recognition to Latex"! The model pre-trained on SynthText and MJSynth is available for testing! Credit to @JiaquanYe
  2. DBNet++ has been released now! A new Adaptive Scale Fusion module has been equipped for feature enhancement. Benefiting from this, the new model achieved 2% better h-mean score than its predecessor on the ICDAR2015 dataset.
  3. Three more dataset converters are added: LSVT, RCTW and HierText. Check the dataset zoo (Det & Recog) to explore further information.
  4. To enhance the data storage efficiency, MMOCR now supports loading both images and labels from .lmdb format annotations for the text recognition task. To enable such a feature, the new lmdb_converter.py is ready for use to pack your cropped images and labels into an lmdb file. For a detailed tutorial, please refer to the following sections and the doc.
  5. Testing models on multiple datasets is a widely used evaluation strategy. MMOCR now supports automatically reporting mean scores when there is more than one dataset to evaluate, which enables a more convenient comparison between checkpoints. Doc
  6. Evaluation is more flexible and customizable now. For text detection tasks, you can set the score threshold range where the best results might come out. (Doc) If too many results are flooding your text recognition train log, you can trim it by specifying a subset of metrics in evaluation config. Check out the Evaluation section for details.
  7. MMOCR provides a script to convert the .json labels obtained by the popular annotation toolkit Labelme to MMOCR-supported data format. @Y-M-Y contributed a log analysis tool that helps users gain a better understanding of the entire training process. Read tutorial docs to get started.
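
The mean-score reporting mentioned in highlight 5 can be pictured as follows. This is a sketch under assumptions: it takes an unweighted average per metric over the datasets, and the dataset names, metric names, and numbers are invented for illustration. MMOCR's actual aggregation may differ in detail.

```python
def mean_scores(per_dataset_scores):
    """Average each metric over the datasets that report it."""
    totals, counts = {}, {}
    for scores in per_dataset_scores.values():
        for name, value in scores.items():
            totals[name] = totals.get(name, 0.0) + value
            counts[name] = counts.get(name, 0) + 1
    return {name: totals[name] / counts[name] for name in totals}


# Hypothetical per-dataset results; the numbers are made up for illustration.
results = {
    'dataset_a': {'word_acc': 0.90, 'char_recall': 0.96},
    'dataset_b': {'word_acc': 0.80, 'char_recall': 0.92},
}
averaged = mean_scores(results)
```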

Lmdb Dataset

Reading images or labels from files can be slow when data are excessive, e.g. on a scale of millions. Besides, in academia, most of the scene text recognition datasets are stored in lmdb format, including images and labels. To get closer to the mainstream practice and enhance the data storage efficiency, MMOCR now officially supports loading images and labels from lmdb datasets via a new pipeline LoadImageFromLMDB.
This section is intended to serve as a quick walkthrough for you to master this update and apply it to facilitate your research.

Specifications

To better align with the academic community, MMOCR now requires the following specifications for lmdb datasets:

  • The parameter describing the data volume of the dataset is num-samples instead of total_number (deprecated).
  • Images and labels are stored with keys in the form of image-000000001 and label-000000001, respectively.
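
The key layout can be sketched without an lmdb installation by building the key/value pairs in a plain dict. The helper name build_lmdb_entries is hypothetical, and three details are assumptions rather than statements from the spec: indices start at 1, labels are UTF-8 encoded, and num-samples is stored as a decimal string.

```python
def build_lmdb_entries(samples):
    """Map (image_bytes, label) pairs to key/value pairs in the layout above."""
    entries = {}
    for index, (image_bytes, label) in enumerate(samples, start=1):
        # Keys are zero-padded to nine digits (assumed to be 1-based).
        entries[f'image-{index:09d}'.encode()] = image_bytes
        entries[f'label-{index:09d}'.encode()] = label.encode('utf-8')
    # The dataset size is stored under ``num-samples`` (assumed: decimal string).
    entries[b'num-samples'] = str(len(samples)).encode()
    return entries


entries = build_lmdb_entries([
    (b'<jpeg bytes 1>', 'HELLO'),
    (b'<jpeg bytes 2>', 'WORLD'),
])
```

Writing these pairs into a real database is then a matter of putting each key/value pair inside a write transaction. The converter described in the Usage section does this for you.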

Usage

  1. Use existing academic lmdb datasets if they meet the specifications, or use the tool provided by MMOCR to pack images & annotations into an lmdb dataset.
  • Previously, MMOCR had a function txt2lmdb (deprecated) that only supported converting labels to lmdb format. However, it is quite different from academic lmdb datasets, which usually contain both images and labels. Now MMOCR provides a new utility lmdb_converter to convert recognition datasets with both images and labels to lmdb format.

  • Say that your recognition data in MMOCR's format are organized as follows. (See an example in ocr_toy_dataset).

    # Directory structure
    
    ├──img_path
    |      |—— img1.jpg
    |      |—— img2.jpg
    |      |—— ...
    |——label.txt (or label.jsonl)
    
    # Annotation format
    
    label.txt:  img1.jpg HELLO
                img2.jpg WORLD
                ...
    
    label.jsonl:    {"filename": "img1.jpg", "text": "HELLO"}
                    {"filename": "img2.jpg", "text": "WORLD"}
                    ...
    
  • Then pack these files up:

    python tools/data/utils/lmdb_converter.py  {PATH_TO_LABEL} {OUTPUT_PATH} --i {PATH_TO_IMAGES}
  • Check out tools.md for more details.

  2. The second step is to modify the configuration files. For example, to train CRNN on MJ and ST datasets:
  • Set parser as LineJsonParser and file_format as 'lmdb' in dataset config

    # configs/_base_/recog_datasets/ST_MJ_train.py
    train1 = dict(
        type='OCRDataset',
        img_prefix=train_img_prefix1,
        ann_file=train_ann_file1,
        loader=dict(
            type='AnnFileLoader',
            repeat=1,
            file_format='lmdb',
            parser=dict(
                type='LineJsonParser',
                keys=['filename', 'text'],
            )),
        pipeline=None,
        test_mode=False)
  • Use LoadImageFromLMDB in pipeline:

    # configs/_base_/recog_pipelines/crnn_pipeline.py
    train_pipeline = [
        dict(type='LoadImageFromLMDB', color_type='grayscale'),
        ...
  3. You are good to go! Start training and MMOCR will load data from your lmdb dataset.

New Features & Enhancements

Bug Fixes


MMOCR Release v0.5.0

31 Mar 09:50
0546134

Highlights

  1. MMOCR now supports SPACE recognition! (What a prominent feature!) Users only need to convert the recognition annotations that contain spaces from a plain .txt file to JSON line format .jsonl, and then revise a few configurations to enable the LineJsonParser. For more information, please read our step-by-step tutorial.
  2. Tesseract is now available in MMOCR! While MMOCR is more flexible to support various downstream tasks, users might sometimes not be satisfied with DL models and would like to turn to effective legacy solutions. Therefore, we offer this option in mmocr.utils.ocr by wrapping Tesseract as a detector and/or recognizer. Users can easily create an MMOCR object by MMOCR(det='Tesseract', recog='Tesseract'). Credit to @garvan2021
  3. We release data converters for 16 widely used OCR datasets, including multiple scenarios such as document, handwritten, and scene text. Now it is more convenient to generate annotation files for these datasets. Check the dataset zoo ( Det & Recog ) to explore further information.
  4. Special thanks to @EighteenSprings @BeyondYourself @yangrisheng, who had actively participated in documentation translation!
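
For the annotation conversion mentioned in the first highlight, a minimal sketch might look like the following. The helper name txt_line_to_jsonl is hypothetical. It assumes each line of the plain .txt file holds a filename without spaces, followed by a single space and the text, and it writes the filename and text keys that LineJsonParser is configured with elsewhere in these notes.

```python
import json


def txt_line_to_jsonl(line):
    """Convert 'img1.jpg HELLO WORLD' into a JSON line that keeps inner spaces."""
    # Split only on the first space: everything after the filename is the text.
    filename, text = line.rstrip('\n').split(' ', 1)
    return json.dumps({'filename': filename, 'text': text}, ensure_ascii=False)


converted = [
    txt_line_to_jsonl(line)
    for line in ['img1.jpg HELLO WORLD', 'img2.jpg GOOD  DAY']
]
```

Splitting on the first space only is what preserves spaces inside the transcription, which a whitespace-separated .txt parser cannot represent.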

Migration Guide - ResNet

Some refactoring processes are still going on. For text recognition models, we unified the ResNet-like architectures which are used as backbones. By introducing stage-wise and block-wise plugins, the refactored ResNet is highly flexible to support existing models, like ResNet31 and ResNet45, and other future designs of ResNet variants.

Plugin

  • Plugin is a module category inherited from MMCV's implementation of PLUGIN_LAYERS, which can be inserted between each stage of ResNet or into a basicblock. You can find a simple implementation of plugin at mmocr/models/textrecog/plugins/common.py, or in the example below.

    Plugin Example
    import torch.nn as nn
    from mmcv.cnn import PLUGIN_LAYERS


    @PLUGIN_LAYERS.register_module()
    class Maxpool2d(nn.Module):
        """A wrapper around nn.MaxPool2d().
    
        Args:
            kernel_size (int or tuple(int)): Kernel size for max pooling layer
            stride (int or tuple(int)): Stride for max pooling layer
            padding (int or tuple(int)): Padding for pooling layer
        """
    
        def __init__(self, kernel_size, stride, padding=0, **kwargs):
            super(Maxpool2d, self).__init__()
            self.model = nn.MaxPool2d(kernel_size, stride, padding)
    
        def forward(self, x):
            """
            Args:
                x (Tensor): Input feature map
    
            Returns:
                Tensor: The tensor after Maxpooling layer.
            """
            return self.model(x)

Stage-wise Plugins

  • ResNet is composed of stages, and each stage is composed of blocks. E.g., ResNet18 is composed of 4 stages, and each stage is composed of basicblocks. For each stage, we provide two ports to insert stage-wise plugins by giving plugins parameters in ResNet.

    [port1: before stage] ---> [stage] ---> [port2: after stage]
    
  • E.g., take a ResNet with four stages. Suppose we want to insert an additional convolution layer before each stage, and another convolution layer after stages 1, 2, and 4. Then you can define the special ResNet18 like this:

    resnet18_special = ResNet(
        # for simplicity, some required
        # parameters are omitted
        plugins=[
            # extra convolution before every stage
            dict(
                cfg=dict(
                    type='ConvModule',
                    kernel_size=3,
                    stride=1,
                    padding=1,
                    norm_cfg=dict(type='BN'),
                    act_cfg=dict(type='ReLU')),
                stages=(True, True, True, True),
                position='before_stage'),
            # extra convolution after stages 1, 2 and 4
            dict(
                cfg=dict(
                    type='ConvModule',
                    kernel_size=3,
                    stride=1,
                    padding=1,
                    norm_cfg=dict(type='BN'),
                    act_cfg=dict(type='ReLU')),
                stages=(True, True, False, True),
                position='after_stage')
        ])
  • You can also insert more than one plugin in each port and those plugins will be executed in order. Let's take ResNet in MASTER as an example:

    Multiple Plugins Example
    • ResNet in MASTER is based on ResNet31. After each stage, a module named GCAModule is applied; it is inserted before the stage-wise convolution layer in ResNet31. In conclusion, there will be two plugins at the after_stage port at the same time.

      resnet_master = ResNet(
                      # for simplicity, some required
                      # parameters are omitted
                      plugins=[
                          dict(
                              cfg=dict(type='Maxpool2d', kernel_size=2, stride=(2, 2)),
                              stages=(True, True, False, False),
                              position='before_stage'),
                          dict(
                              cfg=dict(type='Maxpool2d', kernel_size=(2, 1), stride=(2, 1)),
                              stages=(False, False, True, False),
                              position='before_stage'),
                          dict(
                              cfg=dict(type='GCAModule', kernel_size=3, stride=1, padding=1),
                              stages=[True, True, True, True],
                              position='after_stage'),
                          dict(
                              cfg=dict(
                                  type='ConvModule',
                                  kernel_size=3,
                                  stride=1,
                                  padding=1,
                                  norm_cfg=dict(type='BN'),
                                  act_cfg=dict(type='ReLU')),
                              stages=(True, True, True, True),
                              position='after_stage')
                      ])
  • In each plugin, we will pass two parameters (in_channels, out_channels) to support operations that need the information of current channels.
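
How the stages flags and the position field select plugins for a given stage can be sketched in plain Python. This is not the ResNet implementation in MMOCR: plugins_for_stage is a hypothetical helper, and the plugin list is abbreviated from the MASTER example above, with cfg reduced to the type name.

```python
def plugins_for_stage(plugins, stage_idx, position):
    """Return plugin cfgs enabled for a stage at a port, in declaration order."""
    return [
        p['cfg'] for p in plugins
        if p['position'] == position and p['stages'][stage_idx]
    ]


# Abbreviated plugin list; cfgs carry only the type name for brevity.
plugins = [
    dict(cfg=dict(type='Maxpool2d'), stages=(True, True, False, False),
         position='before_stage'),
    dict(cfg=dict(type='GCAModule'), stages=(True, True, True, True),
         position='after_stage'),
    dict(cfg=dict(type='ConvModule'), stages=(True, True, True, True),
         position='after_stage'),
]

for stage_idx in range(4):
    before = [c['type'] for c in plugins_for_stage(plugins, stage_idx, 'before_stage')]
    after = [c['type'] for c in plugins_for_stage(plugins, stage_idx, 'after_stage')]
    print(f'stage {stage_idx}: {before} -> [stage] -> {after}')
```

Because the list order is preserved, GCAModule runs before ConvModule at the after_stage port, matching the description above.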

Block-wise Plugin (Experimental)

  • We also refactored the BasicBlock used in ResNet. Now it can be customized with block-wise plugins. Check here for more details.

  • BasicBlock is composed of two convolution layers in the main branch and a shortcut branch. We provide four ports to insert plugins.

        [port1: before_conv1] ---> [conv1] --->
        [port2: after_conv1] ---> [conv2] --->
        [port3: after_conv2] ---> +(shortcut) ---> [port4: after_shortcut]
    
  • In each plugin, we will pass a parameter in_channels to support operations that need the information of current channels.

  • E.g. Build a ResNet with customized BasicBlock with an additional convolution layer before conv1:

    Block-wise Plugin Example
    resnet_31 = ResNet(
            in_channels=3,
            stem_channels=[64, 128],
            block_cfgs=dict(type='BasicBlock'),
            arch_layers=[1, 2, 5, 3],
            arch_channels=[256, 256, 512, 512],
            strides=[1, 1, 1, 1],
            plugins=[
                dict(
                    cfg=dict(type='Maxpool2d',
                    kernel_size=2,
                    stride=(2, 2)),
                    stages=(True, True, False, False),
                    position='before_stage'),
                dict(
                    cfg=dict(type='Maxpool2d',
                    kernel_size=(2, 1),
                    stride=(2, 1)),
                    stages=(False, False, True, False),
                    position='before_stage'),
                dict(
                    cfg=dict(
                    type='ConvModule',
                    kernel_size=3,
                    stride=1,
                    padding=1,
                    norm_cfg=dict(type='BN'),
                    act_cfg=dict(type='ReLU')),
                    stages=(True, True, True, True),
                    position='after_stage')
            ])

Full Examples

ResNet without plugins
  • ResNet45 is used in ASTER and ABINet without any plugins.

    resnet45_aster = ResNet(
        in_channels=3,
        stem_channels=[64, 128],
        block_cfgs=dict(type='BasicBlock', use_conv1x1=True),
        arch_layers=[3, 4, 6, 6, 3],
        arch_channels=[32, 64, 128, 256, 512],
        strides=[(2, 2), (2, 2), (2, 1), (2, 1), (2, 1)])
    
    resnet45_abi = ResNet(
        in_channels=3,
        stem_channels=32,
        block_cfgs=dict(type='BasicBlock', use_conv1x1=True),
        arch_layers=[3, 4, 6, 6, 3],
        arch_channels=[32, 64, 128, 256, 512],
        strides=[2, 1, 2, 1, 1])

...


MMOCR Release v0.4.1

27 Jan 06:41
a75fc6b

Highlights

  1. Visualizing edge weights in OpenSet KIE is now supported! #677
  2. Some configurations have been optimized to significantly speed up the training and testing processes! Don't worry - you can still tune these parameters in case these modifications do not work. #757
  3. Now you can use CPU to train/debug your model! #752
  4. We have fixed a severe bug that prevented users from calling mmocr.apis.test with our pre-built wheels. #667

New Features & Enhancements

Bug Fixes

Docs

New Contributors

Full Changelog: v0.4.0...v0.4.1

MMOCR Release v0.4.0

15 Dec 03:40
af9a625

Highlights

  1. We release a new text recognition model - ABINet (CVPR 2021, Oral). With dedicated model design and useful data augmentation transforms, ABINet achieves the best performance on irregular text recognition tasks. Check it out!
  2. We are also working hard to fulfill the requests from our community. OpenSet KIE is one of the achievements, which extends the application of SDMGR from text node classification to node-pair relation extraction. We also provide a demo script to convert WildReceipt to open set domain, though it may not take full advantage of the OpenSet format. For more information, read our tutorial.
  3. APIs of models can be exposed through TorchServe. Docs

Breaking Changes & Migration Guide

Postprocessor

Some refactoring processes are still going on. For all text detection models, we unified their decode implementations into a new module category, POSTPROCESSOR, which is responsible for decoding different raw outputs into boundary instances. In all text detection configs, the text_repr_type argument in bbox_head is deprecated and will be removed in the future release.

Migration Guide: Find a line like this in the detection model's config:

    text_repr_type=xxx,

And replace it with

    postprocessor=dict(type='{MODEL_NAME}Postprocessor', text_repr_type=xxx),

Take a snippet of PANet's config as an example. Before the change, its config for bbox_head looks like:

    bbox_head=dict(
        type='PANHead',
        text_repr_type='poly',
        in_channels=[128, 128, 128, 128],
        out_channels=6,
        loss=dict(type='PANLoss')),

Afterwards:

    bbox_head=dict(
        type='PANHead',
        in_channels=[128, 128, 128, 128],
        out_channels=6,
        loss=dict(type='PANLoss'),
        postprocessor=dict(type='PANPostprocessor', text_repr_type='poly')),

There are other postprocessors and each takes different arguments. Interested users can find their interfaces or implementations in mmocr/models/textdet/postprocess or through our api docs.
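
If you have many configs to update, the rewrite above can be scripted. The helper below is a hypothetical sketch, not a tool shipped with MMOCR. It operates on a plain dict and expects the postprocessor type to be passed explicitly, since deriving it from the head name would be a guess.

```python
def migrate_bbox_head(bbox_head, postprocessor_type):
    """Move ``text_repr_type`` from ``bbox_head`` into a ``postprocessor`` dict."""
    migrated = dict(bbox_head)  # shallow copy; the input is left untouched
    if 'text_repr_type' in migrated:
        migrated['postprocessor'] = dict(
            type=postprocessor_type,
            text_repr_type=migrated.pop('text_repr_type'))
    return migrated


old_head = dict(
    type='PANHead',
    text_repr_type='poly',
    in_channels=[128, 128, 128, 128],
    out_channels=6,
    loss=dict(type='PANLoss'))
new_head = migrate_bbox_head(old_head, 'PANPostprocessor')
```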

New Config Structure

We reorganized the configs/ directory by extracting reusable sections into configs/_base_. Now the directory tree of configs/_base_ is organized as follows:

_base_
├── det_datasets
├── det_models
├── det_pipelines
├── recog_datasets
├── recog_models
├── recog_pipelines
└── schedules

Most model configs now make full use of base configs, which makes the overall structure clearer and facilitates fair comparison across models. Despite the seemingly significant hierarchical difference, these changes do not break backward compatibility, as the names of model configs remain the same.

New Features

Refactoring

Docs

Enhancements

Bug Fixes


MMOCR Release v0.3.0

25 Aug 08:52
f9d158f

Highlights

  1. We add a new text recognition model -- SATRN! Its pretrained checkpoint achieves the best performance over other provided text recognition models. A lighter version of SATRN is also released which can obtain ~98% of the performance of the original model with only 45 MB in size. (@2793145003) #405
  2. Improve the demo script, ocr.py, which supports applying end-to-end text detection, text recognition and key information extraction models on images with easy-to-use commands. Users can find its full documentation in the demo section. (@samayala22, @manjrekarom) #371, #386, #400, #374, #428
  3. Our documentation is reorganized into a clearer structure. More useful contents are on the way! #409, #454
  4. The requirement of Polygon3 is removed since this project is no longer maintained or distributed. We unified all its references to equivalent substitutions in shapely instead. #448

Breaking Changes & Migration Guide

  1. Upgrade version requirement of MMDetection to 2.14.0 to avoid bugs #382
  2. MMOCR now has its own model and layer registries inherited from MMDetection's or MMCV's counterparts. (#436) The modified hierarchical structure of the model registries is now organized as follows.
mmcv.MODELS -> mmdet.BACKBONES -> BACKBONES
mmcv.MODELS -> mmdet.NECKS -> NECKS
mmcv.MODELS -> mmdet.ROI_EXTRACTORS -> ROI_EXTRACTORS
mmcv.MODELS -> mmdet.HEADS -> HEADS
mmcv.MODELS -> mmdet.LOSSES -> LOSSES
mmcv.MODELS -> mmdet.DETECTORS -> DETECTORS
mmcv.ACTIVATION_LAYERS -> ACTIVATION_LAYERS
mmcv.UPSAMPLE_LAYERS -> UPSAMPLE_LAYERS

To migrate your old implementation to our new backend, you need to change the import path of any registries and their corresponding builder functions (including build_detectors) from mmdet.models.builder to mmocr.models.builder. If you have referred to any model or layer of MMDetection or MMCV in your model config, you need to add mmdet. or mmcv. prefix to its name to inform the model builder of the right namespace to work on.

Interested users may check out MMCV's tutorial on Registry for in-depth explanations on its mechanism.
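
The role of the mmdet. and mmcv. prefixes can be illustrated with a toy registry hierarchy. This is a simplified sketch and not MMCV's Registry implementation. ToyRegistry and the registered placeholder strings are hypothetical, and the real resolution logic is more involved.

```python
class ToyRegistry:
    """Toy registry with a parent link; not MMCV's Registry implementation."""

    def __init__(self, scope, parent=None):
        self.scope = scope
        self.parent = parent
        self.modules = {}

    def register(self, name, obj):
        self.modules[name] = obj

    def get(self, key):
        # 'mmdet.ResNet' -> scope 'mmdet', name 'ResNet'; 'ResNet' -> no scope.
        scope, _, name = key.rpartition('.')
        registry = self
        if scope:
            # Walk up the hierarchy until the requested scope is found.
            while registry is not None and registry.scope != scope:
                registry = registry.parent
            if registry is None:
                raise KeyError(f'scope "{scope}" not found for "{key}"')
        if name not in registry.modules:
            raise KeyError(f'"{name}" is not registered in scope "{registry.scope}"')
        return registry.modules[name]


mmdet_backbones = ToyRegistry('mmdet')
mmocr_backbones = ToyRegistry('mmocr', parent=mmdet_backbones)
mmdet_backbones.register('ResNet', 'mmdet ResNet class')
mmocr_backbones.register('ResNet31OCR', 'mmocr ResNet31OCR class')
```

In this sketch, mmocr_backbones.get('mmdet.ResNet') resolves through the parent registry, while an unprefixed 'ResNet' is looked up only in the MMOCR scope and fails. That is why a config referring to an MMDetection or MMCV module needs the prefix.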

New Features

  • Automatically replace SyncBN with BN for inference #420, #453
  • Support batch inference for CRNN and SegOCR #407
  • Support exporting documentation in pdf or epub format #406
  • Support persistent_workers option in data loader #459

Bug Fixes

  • Remove deprecated key in kie_test_imgs.py #381
  • Fix dimension mismatch in batch testing/inference of DBNet #383
  • Fix the problem of dice loss which stays at 1 with an empty target given #408
  • Fix a wrong link in ocr.py (@naarkhoo) #417
  • Fix undesired assignment to "pretrained" in test.py #418
  • Fix a problem in polygon generation of DBNet #421, #443
  • Skip invalid annotations in totaltext_converter #438
  • Add zero division handler in poly utils, remove Polygon3 #448

Improvements

  • Replace lanms-proper with lanms-neo to support installation on Windows (with special thanks to @gen-ko who has re-distributed this package!)
  • Support MIM #394
  • Add tests for PyTorch 1.9 in CI #401
  • Enable fullscreen layout in readthedocs #413
  • General documentation enhancement #395
  • Update version checker #427
  • Add copyright info #439
  • Update citation information #440

Contributors

We thank @2793145003, @samayala22, @manjrekarom, @naarkhoo, @gen-ko, @duanjiaqi, @gaotongxiao, @cuhk-hbsun, @innerlee, @wdsd641417025 for their contribution to this release!

MMOCR Release v0.2.1

20 Jul 15:30
c1ae3a4

Highlights

  1. Upgrade to use MMCV-full >= 1.3.8 and MMDetection >= 2.13.0 for latest features
  2. Add ONNX and TensorRT export tool, supporting the deployment of DBNet, PSENet, PANet and CRNN (experimental) #278, #291, #300, #328
  3. Unified parameter initialization method which uses init_cfg in config files #365

New Features

  • Support TextOCR dataset #293
  • Support Total-Text dataset #266, #273, #357
  • Support grouping text detection box into lines #290, #304
  • Add benchmark_processing script that benchmarks data loading process #261
  • Add SynthText preprocessor for text recognition models #351, #361
  • Support batch inference during testing #310
  • Add user-friendly OCR inference script #366

Bug Fixes

  • Fix improper class ignorance in SDMGR Loss #221
  • Fix potential numerical zero division error in DRRG #224
  • Fix installing requirements with pip and mim #242
  • Fix dynamic input error of DBNet #269
  • Fix space parsing error in LineStrParser #285
  • Fix textsnake decode error #264
  • Correct isort setup #288
  • Fix a bug in SDMGR config #316
  • Fix kie_test_img for KIE nonvisual #319
  • Fix metafiles #342
  • Fix different device problem in FCENet #334
  • Ignore improper trailing empty characters in annotation files #358
  • Docs fixes #247, #255, #265, #267, #268, #270, #276, #287, #330, #355, #367
  • Fix NRTR config #356, #370

Improvements

  • Add backend for resizeocr #244
  • Skip image processing pipelines in SDMGR novisual #260
  • Speedup DBNet #263
  • Update mmcv installation method in workflow #323
  • Add part of Chinese documentations #353, #362
  • Add support for ConcatDataset with two workflows #348
  • Add list_from_file and list_to_file utils #226
  • Speed up sort_vertex #239
  • Support distributed evaluation of KIE #234
  • Add pretrained FCENet on IC15 #258
  • Support CPU for OCR demo #227
  • Avoid extra image pre-processing steps #375

MMOCR Release v0.2.0

18 May 15:24
af3cb8d

Highlights

  1. Add the NER approach Bert-softmax (NAACL'2019)
  2. Add the text detection method DRRG (CVPR'2020)
  3. Add the text detection method FCENet (CVPR'2021)
  4. Increase the ease of use via adding text detection and recognition end-to-end demo, and colab online demo.
  5. Simplify the installation.

New Features

Bug Fixes

  • Fix the duplicated point bug due to transform for textsnake #130
  • Fix CTC loss NaN #159
  • Fix error raised if result is empty in demo #144
  • Fix results missing if one image has a large number of boxes #98
  • Fix package missing in dockerfile #109

Improvements

  • Simplify installation procedure via removing compiling #188
  • Speed up panet post processing so that it can detect dense texts #188
  • Add zh-CN README #70 #95
  • Support windows #89
  • Add Colab #147 #199
  • Add 1-step installation using conda environment #193 #194 #195

MMOCR Release v0.1.0

13 Apr 14:05
5244984

Main Features

  • Support text detection, text recognition and the corresponding downstream tasks such as key information extraction.
  • For text detection, support both single-step (PSENet, PANet, DBNet, TextSnake) and two-step (MaskRCNN) methods.
  • For text recognition, support CTC-loss based method CRNN; Encoder-decoder (with attention) based methods SAR, RobustScanner; Segmentation based method SegOCR; Transformer based method NRTR.
  • For key information extraction, support GCN based method SDMG-R.
  • Provide checkpoints and log files for all of the methods above.