
In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation




This repo is the official implementation of the ECCV 2024 paper In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation

Conda installation command

conda env create -f environment.yml --prefix $YOURPREFIX

$YOURPREFIX is typically /home/$USER/anaconda3

Dependencies

This repo is built on CLIP, SCLIP, and MMSegmentation.

mim install mmcv==2.0.1 mmengine==0.8.4 mmsegmentation==1.1.1
pip install ftfy regex yapf==0.40.1
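
A quick way to confirm the environment before moving on is to import the core dependencies and print their versions. This is only a convenience sanity check, not part of the official setup:

import torch
import mmcv
import mmengine
import mmseg  # the mmsegmentation package

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("mmcv:", mmcv.__version__)          # expected: 2.0.1
print("mmengine:", mmengine.__version__)  # expected: 0.8.4
print("mmseg:", mmseg.__version__)        # expected: 1.1.1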

Dataset preparation

Please prepare Pascal VOC 2012, Pascal Context, COCO-Stuff 164K, COCO-Object, ADEChallengeData2016, and Cityscapes following the MMSeg data preparation guide. The COCO-Object dataset can be converted from COCO-Stuff 164K by executing the following command:

python datasets/cvt_coco_object.py PATH_TO_COCO_STUFF164K -o PATH_TO_COCO164K

Place them under the $yourdatasetroot/ directory such that:

    $yourdatasetroot/
    ├── ADEChallengeData2016/
    │   ├── annotations/
    │   ├── images/
    │   ├── ...
    ├── VOC2012/
    │   ├── Annotations/
    │   ├── JPEGImages/
    │   ├── ...
    ├── coco_stuff164k/
    │   ├── annotations/
    │   ├── images/
    │   ├── ...
    ├── Cityscapes/
    │   ├── gtFine/
    │   ├── leftImg8bit/
    │   ├── ...
    ├── ...
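
A small script can verify that the dataset root matches the layout above before running anything. This check is not provided by the repo; dataset_root below is a placeholder for your own $yourdatasetroot:

# Convenience check that the dataset root matches the expected layout.
from pathlib import Path

dataset_root = Path("/path/to/yourdatasetroot")  # replace with your $yourdatasetroot
expected = {
    "ADEChallengeData2016": ["annotations", "images"],
    "VOC2012": ["Annotations", "JPEGImages"],
    "coco_stuff164k": ["annotations", "images"],
    "Cityscapes": ["gtFine", "leftImg8bit"],
}

for dataset, subdirs in expected.items():
    for sub in subdirs:
        path = dataset_root / dataset / sub
        print("ok     " if path.is_dir() else "MISSING", path)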

1) Panoptic Cut for unsupervised object mask discovery

cd panoptic_cut
python predict.py \
    --logs panoptic_cut \
    --dataset {coco_object, coco_stuff, ade20k, voc21, voc20, context60, context59, cityscapes} \
    --datasetroot $yourdatasetroot

The precomputed mask predictions from stage 1) can be downloaded from the Google Drive links below:

| Mask prediction root after stage 1) | Benchmark id | Google Drive link |
| --- | --- | --- |
| coco_stuff164k | coco_object, coco_stuff164k | link to download (84.5 MB) |
| VOC2012 | context59, context60, voc20, voc21 | link to download (66.7 MB) |
| ADEChallengeData2016 | ade20k | link to download (29.4 MB) |
| Cityscapes | cityscapes | link to download (23.1 MB) |

Place them under the lavg/panoptic_cut/pred/ directory such that:

    lavg/panoptic_cut/pred/panoptic_cut/
    ├── ADEChallengeData2016/
    │   ├── ADE_val_00000001.pth
    │   ├── ADE_val_00000002.pth
    │   ├── ...
    ├── VOC2012/
    │   ├── 2007_000033.pth
    │   ├── 2007_000042.pth
    │   ├── ...
    ├── coco_stuff164k/
    │   ├── 000000000139.pth
    │   ├── 000000000285.pth
    │   ├── ...
    ├── Cityscapes/
    │   ├── frankfurt_000000_000294_leftImg8bit.pth
    │   ├── ...

2) Visual grounding & segmentation evaluation

Update $yourdatasetroot in configs/cfg_*.py

cd lavg
python eval.py --config ./configs/{cfg_context59/cfg_context60/cfg_voc20/cfg_voc21}.py --maskpred_root VOC2012/panoptic_cut
python eval.py --config ./configs/cfg_ade20k.py --maskpred_root ADEChallengeData2016/panoptic_cut
python eval.py --config ./configs/{cfg_coco_object/cfg_coco_stuff164k}.py --maskpred_root coco_stuff164k/panoptic_cut
python eval.py --config ./configs/cfg_city_scapes.py --maskpred_root Cityscapes/panoptic_cut

The evaluation is single-GPU compatible.
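
To sweep all benchmarks in one go, the commands above can be driven from a small launcher such as the sketch below. The config-to-mask-root pairing mirrors the commands listed above; the wrapper itself is just a convenience and is not shipped with the repo:

# Convenience launcher for the evaluation commands above (run from lavg/).
import subprocess

runs = {
    "./configs/cfg_voc20.py": "VOC2012/panoptic_cut",
    "./configs/cfg_voc21.py": "VOC2012/panoptic_cut",
    "./configs/cfg_context59.py": "VOC2012/panoptic_cut",
    "./configs/cfg_context60.py": "VOC2012/panoptic_cut",
    "./configs/cfg_ade20k.py": "ADEChallengeData2016/panoptic_cut",
    "./configs/cfg_coco_object.py": "coco_stuff164k/panoptic_cut",
    "./configs/cfg_coco_stuff164k.py": "coco_stuff164k/panoptic_cut",
    "./configs/cfg_city_scapes.py": "Cityscapes/panoptic_cut",
}

for config, maskpred_root in runs.items():
    subprocess.run(
        ["python", "eval.py", "--config", config, "--maskpred_root", maskpred_root],
        check=True,
    )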

Quantitative performance (mIoU, %) on open-vocabulary semantic segmentation benchmarks

| Method | VOC21 | Context60 | COCO-obj | VOC20 | Context59 | ADE | COCO-stuff | Cityscapes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LaVG | 62.1 | 31.6 | 34.2 | 82.5 | 34.7 | 15.8 | 23.2 | 26.2 |

VOC21, Context60, and COCO-obj are evaluated with a background category; the remaining benchmarks are not.

Related repos

Our project refers to and heavily borrows code from the following repositories: CLIP, SCLIP, and MMSegmentation.

Acknowledgements

This work was supported by Samsung Electronics (IO201208-07822-01), the NRF grant (NRF-2021R1A2C3012728 (45%)), and the IITP grants (RS-2022-II220959: Few-Shot Learning of Causal Inference in Vision and Language for Decision Making (50%), RS-2019-II191906: AI Graduate School Program at POSTECH (5%)) funded by the Ministry of Science and ICT, Korea. We also thank Sua Choi for her helpful discussion.

BibTeX source

If you find our code or paper useful, please consider citing our paper:

@inproceedings{kang2024lazy,
  title={In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation},
  author={Kang, Dahyun and Cho, Minsu},
  booktitle={European Conference on Computer Vision},
  year={2024}
}
