# Dataset Zoo

We provide several relevant datasets for training and evaluating the Detectron2DeepsortPlus models. Annotations are provided in a unified format. If you want to use these datasets, please follow their licenses, and if you use any of these datasets in your research, please cite the original work.

## Data Format

  1. MICARehab

    The MICARehab dataset has the following structure:

    MICARehab 
    ├── GH010383_8_3221_3956_2/
    │   ├── 0000.png
    │   ├── ...
    │   ├── 000N.png
    │   ├── GH010383_8_3221_3956_2.avi
    │   ├── gt
    │   │   └── gt.txt
    │   ├── seqinfo.ini
    │   ├── via_export_json.json
    ├── ...
    ├── GH0XXXX_X_XXXX_XXXX_X/
    

    This dataset contains 32 videos labelled with the VGG Image Annotator (VIA).

    Every video has a corresponding annotation file 'via_export_json.json' in the same folder.

    Frames extracted from each video are also available in the same folder. The 'gt.txt' file contains the ground truth in the MOT16 format.
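    As a minimal sketch (not part of the repository), the layout above can be traversed roughly as follows; the root path is a placeholder:

    import configparser
    import os

    # Sketch: walk the MICARehab layout shown above.
    root = "/path/to/MICARehab"

    for seq_name in sorted(os.listdir(root)):
        seq_dir = os.path.join(root, seq_name)
        if not os.path.isdir(seq_dir):
            continue

        # Per-sequence metadata stored in seqinfo.ini.
        seqinfo = configparser.ConfigParser()
        seqinfo.read(os.path.join(seq_dir, "seqinfo.ini"))

        # Extracted frames (0000.png ... 000N.png) sit next to the source .avi.
        frames = sorted(f for f in os.listdir(seq_dir) if f.endswith(".png"))

        # Annotation files: VIA polygons and MOT16-style ground truth.
        via_json = os.path.join(seq_dir, "via_export_json.json")
        gt_txt = os.path.join(seq_dir, "gt", "gt.txt")

        print(seq_name, len(frames), os.path.exists(via_json), os.path.exists(gt_txt))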

    We provide 2 ways to access labels:

    • VIA format: 'via_export_json.json'

      The annotation of each image is as follows:

      {
      "0000.png1585642": {
        "filename": "0000.png",
        "size": 1585642,
        "regions": [
          {
            "shape_attributes": {
              "name": "polygon",
              "all_points_x": [ 1248, 1915, 1915, 1248 ],
              "all_points_y": [ 1089, 1089, 1430, 1430 ]
            },
            "region_attributes": { "category_id": "1" }
          }
        ],
        "file_attributes": {}
      },
    
    

    For detection labels, a bounding box is represented by two points: top-left and bottom-right. The top-left coordinate is (x_min, y_min) and the bottom-right is (x_max, y_max), taken from "all_points_x" and "all_points_y".

    For tracking labels, the identity of a hand is given by "category_id".
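      As a rough illustration (independent of visualize_gt.py), bounding boxes and track IDs can be recovered from 'via_export_json.json' as sketched below; the file path is a placeholder:

      import json

      # Sketch: convert VIA polygon annotations into (frame, track_id, bbox) tuples.
      with open("/path/to/GH010383_8_3221_3956_2/via_export_json.json") as f:
          via = json.load(f)

      for image_ann in via.values():
          filename = image_ann["filename"]  # e.g. "0000.png"
          for region in image_ann["regions"]:
              xs = region["shape_attributes"]["all_points_x"]
              ys = region["shape_attributes"]["all_points_y"]
              # Bounding box from the polygon extremes: (x_min, y_min) / (x_max, y_max).
              x_min, y_min = min(xs), min(ys)
              x_max, y_max = max(xs), max(ys)
              # Hand identity used as the tracking label.
              track_id = int(region["region_attributes"]["category_id"])
              print(filename, track_id, (x_min, y_min, x_max, y_max))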

    Code for visualizing the ground truth in this way is provided in visualize_gt.py.

    For example,

    python visualize_gt.py --input /path/to/GH010373_6_3150_4744/ --display True --out_vid out_vid.avi
    

    will show the ground truth for the input video and also save it to out_vid.avi.

    The white polygon can be a rectangle or sometimes a true polygon representing the hand's mask.

    All 32 original-quality ground-truth videos are uploaded to this YouTube link.

    • MOT16 format: 'gt.txt'.

    This is saved in simple comma-separated value (CSV) files. Each line represents one object instance and contains 9 values.

    | Position | Name | Description |
    | --- | --- | --- |
    | 1 | Frame number | Frame in which the object is present |
    | 2 | Identity number | Each hand trajectory is identified by a unique ID |
    | 3 | Bounding box left | x coordinate of the top-left corner of the hand bounding box |
    | 4 | Bounding box top | y coordinate of the top-left corner of the hand bounding box |
    | 5 | Bounding box width | Width in pixels of the hand bounding box |
    | 6 | Bounding box height | Height in pixels of the hand bounding box |
    | 7 | Confidence score | Flag indicating whether the entry is to be considered (1) or ignored (0) |
    | 8 | Class | Type of object annotated; in this dataset, always (1) |
    | 9 | Visibility | Visibility ratio, a number between 0 and 1 indicating how much of the hand is visible |

    An example of such an annotation file is:

    1, 1, 1672, 763, 248, 245, 1, 1, 1
    1, 2, 1253, 426, 156, 200, 1, 1, 1
    2, 1, 1668, 774, 252, 234, 1, 1, 1
    

    In this case, there are two hands in the first frame of the sequence, with identity tags 1 and 2, and one hand in the second frame, with identity tag 1.

    This format is useful for evaluating MOT16 metrics with py-motmetrics.
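    For example, a tracker output saved in the same MOT16 format could be compared against gt.txt roughly as sketched below; the file paths and the sequence name are placeholders (see the py-motmetrics documentation for details):

    import motmetrics as mm

    # Sketch: compare a tracker output against gt.txt with py-motmetrics.
    # Both files are assumed to use the MOT16 comma-separated format above.
    gt = mm.io.loadtxt("gt/gt.txt", fmt="mot16")
    ts = mm.io.loadtxt("tracker_output.txt", fmt="mot16")

    # Build an accumulator by matching boxes with an IoU threshold of 0.5.
    acc = mm.utils.compare_to_groundtruth(gt, ts, "iou", distth=0.5)

    # Compute the standard MOT challenge metrics (MOTA, MOTP, ID switches, ...).
    mh = mm.metrics.create()
    summary = mh.compute(acc, metrics=mm.metrics.motchallenge_metrics,
                         name="GH010383_8_3221_3956_2")
    print(mm.io.render_summary(summary, formatters=mh.formatters,
                               namemap=mm.io.motchallenge_metric_names))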

    More details can be found in the paper:

    @INPROCEEDINGS{9642078,
    author={Pham, Van-Tien and Tran, Thanh-Hai and Vu, Hai},
    booktitle={2021 RIVF International Conference on Computing and Communication Technologies (RIVF)},
    title={Detection and tracking hand from FPV: benchmarks and challenges on rehabilitation exercises dataset},
    year={2021},
    volume={},
    number={},
    pages={1-6},
    doi={10.1109/RIVF51545.2021.9642078}}
    
  2. EgoHands

    @InProceedings{Bambach_2015_ICCV,
    author = {Bambach, Sven and Lee, Stefan and Crandall, David J. and Yu, Chen},
    title = {Lending A Hand: Detecting Hands and Recognizing Activities in Complex Egocentric Interactions},
    booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
    month = {December},
    year = {2015}
    }
  3. Georgia Tech Egocentric Activity Datasets

   @misc{li2020eye,
         title={In the Eye of the Beholder: Gaze and Actions in First Person Video}, 
         author={Yin Li and Miao Liu and James M. Rehg},
         year={2020},
         eprint={2006.00626},
         archivePrefix={arXiv},
         primaryClass={cs.CV}
   }