CNN-based Spatio-Temporal Modeling

Paper / Video

PyTorch implementation for the paper "Comparative Analysis of CNN-based Spatiotemporal Reasoning in Videos". In this work, different 'Spatiotemporal Modeling Blocks' are analyzed for the architecture illustrated below.

Maintainers: Okan Köpüklü and Fabian Herzog

The structure was inspired by the project TRN-pytorch

Results and Pretrained Models

The pretrained models can be found in our Google Drive.

Setup

Clone the repo with the following command:

git clone git@github.com:fubel/stmodeling.git

Setup in virtual environment

The project requirements can be found in the file requirements.txt. To run the code, create a Python >= 3.6 virtual environment and install the requirements with

pip install -r requirements.txt

NOTE: This project assumes that you have a GPU with CUDA support.
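Before starting a training run, a quick check like the following (illustrative snippet, not part of the repository) confirms that PyTorch can actually see the GPU:

import torch

# Confirm that a CUDA-capable GPU is visible to PyTorch.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))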

Dataset Preparation

Download the Jester dataset or the Something-Something-V2 dataset. Decompress them into the same folder and use process_dataset.py to generate the index files for the train, val, and test splits. Properly set up the train, validation, and category meta files in datasets_video.py. To convert the Something-Something-V2 dataset, you can use extract_frames.py from TRN-pytorch.

Assume the structure of data directories is the following:

~/stmodeling/
   datasets/
      jester/
         rgb/
            .../ (directories of video samples for Jester)
               .../ (jpg color frames)
      something/
         rgb/
            .../ (directories of video samples for Something-Something)
   model/
      .../ (saved models for the last checkpoint and best model)
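A short script along the following lines can serve as a sanity check that the frame directories match the layout above (illustrative sketch; it is not a file shipped with the repository):

import os

root = os.path.expanduser("~/stmodeling/datasets")
for name in ("jester", "something"):
    rgb_dir = os.path.join(root, name, "rgb")
    if not os.path.isdir(rgb_dir):
        print("missing:", rgb_dir)
        continue
    # Each subdirectory of rgb/ should be one video sample containing jpg frames.
    samples = [d for d in os.listdir(rgb_dir)
               if os.path.isdir(os.path.join(rgb_dir, d))]
    print(name, ":", len(samples), "video sample directories")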

Running the Code

Currently the following ST Modeling blocks are implemented:

  • MLP
  • TRNmultiscale
  • RNN_TANH
  • RNN_RELU
  • LSTM
  • GRU
  • BLSTM
  • FCN

Furthermore, the following backbone feature extractors are implemented:

  • squeezenet1_1
  • BNInception
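
Each backbone extracts one feature vector per sampled segment, and the ST modeling block fuses these vectors into a single clip-level prediction. The following is a minimal sketch of an MLP-style block in that spirit (illustrative only; class names, layer sizes, and dimensions are assumptions, not the repository's implementation):

import torch
import torch.nn as nn

class MLPConsensus(nn.Module):
    """Fuse per-segment features [B, T, D] into class scores [B, C]."""

    def __init__(self, feature_dim, num_segments, num_classes, hidden_dim=512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(num_segments * feature_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, segment_features):
        # segment_features: [batch, num_segments, feature_dim] from the 2D-CNN backbone
        flat = segment_features.flatten(start_dim=1)  # concatenate the segment features
        return self.mlp(flat)

# Example: 8 segments of 1024-dimensional features, 27 Jester classes
scores = MLPConsensus(1024, 8, 27)(torch.randn(16, 8, 1024))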

The following are some examples for training under different scenarios:

  • Train an 8-segment network for Jester with MLP and squeezenet backbone
python main.py jester RGB --arch squeezenet1_1 --num_segments 8 \
--consensus_type MLP --batch-size 16
  • Train a 16-segment network for Something-Something with TRN-multiscale and BNInception backbone
python main.py something RGB --arch BNInception --num_segments 16 \
--consensus_type TRNmultiscale --batch-size 16
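
The --num_segments flag controls how many frames represent each video: as in TSN/TRN, the clip is divided into that many equal chunks and one frame is taken from each. A rough sketch of this sampling idea (illustrative, not the repository's loader code):

import random

def sample_segment_indices(num_frames, num_segments, training=True):
    """Pick one frame index from each of num_segments equal chunks."""
    seg_len = num_frames / num_segments
    indices = []
    for i in range(num_segments):
        start = int(i * seg_len)
        end = max(start, int((i + 1) * seg_len) - 1)
        # Random offset within the chunk during training, chunk center at test time.
        indices.append(random.randint(start, end) if training else (start + end) // 2)
    return indices

print(sample_segment_indices(num_frames=40, num_segments=8))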

Reference

@inproceedings{kopuklu2021comparative,
  title={Comparative Analysis of CNN-based Spatiotemporal Reasoning in Videos},
  author={K{\"o}p{\"u}kl{\"u}, Okan and Herzog, Fabian and Rigoll, Gerhard},
  booktitle={International Conference on Pattern Recognition},
  pages={186--202},
  year={2021},
  organization={Springer}
}

Acknowledgement

This project was built on top of TRN-pytorch, which itself was built on top of TSN-Pytorch. We thank the authors for sharing their code publicly.
