Robust Multi-tab Website Fingerprinting Attacks in the Wild

This repository contains the source code and datasets for our paper "Robust Multi-tab Website Fingerprinting Attacks in the Wild" (Published in IEEE S&P 2023).

To cite this repository, please cite our paper:

@INPROCEEDINGS {multitab-wf-datasets,
author = {X. Deng and Q. Yin and Z. Liu and X. Zhao and Q. Li and M. Xu and K. Xu and J. Wu},
booktitle = {2023 IEEE Symposium on Security and Privacy (SP)},
title = {Robust Multi-tab Website Fingerprinting Attacks in the Wild},
year = {2023},
}

[News] We have released a website fingerprinting attack library (link) that includes implementations of 11 advanced DL-based WF attacks.

Prerequisites

We prototype the attacks using PyTorch 2.0.1 and Python 3.8. For convenience, we recommend creating a conda environment from requirements.txt with the following command.

conda create --name <env> --file requirements.txt

Datasets

We collected our Tor browsing datasets in real multi-tab scenarios. You can download the datasets via the link.

Each dataset is stored as an .npz file that you can load with numpy.

import numpy as np

inpath = "example.npz"
data = np.load(inpath)
dir_array = data["direction"]  # sequence of packet directions
time_array = data["time"]  # sequence of packet timestamps
label = data["label"]  # labels
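Before training, it can help to check the shape and dtype of each array in a downloaded file. The sketch below is only an illustration: it writes a tiny synthetic .npz with the same three keys and summarizes it. The helper name summarize_npz, the array shapes, and the multi-hot label layout are assumptions made for the demo, not a description of the released datasets.

```python
import os
import tempfile

import numpy as np


def summarize_npz(path):
    """Return {key: (shape, dtype)} for every array stored in an .npz file."""
    with np.load(path) as data:
        return {key: (data[key].shape, str(data[key].dtype)) for key in data.files}


# Build a tiny synthetic file with the same three keys.
# All shapes and the multi-hot label layout are hypothetical.
rng = np.random.default_rng(0)
n_traces, seq_len, n_sites = 4, 10, 5
direction = rng.choice([-1, 1], size=(n_traces, seq_len)).astype(np.int8)
timestamps = np.sort(rng.random((n_traces, seq_len)), axis=1).astype(np.float32)
label = np.zeros((n_traces, n_sites), dtype=np.int8)
label[np.arange(n_traces), rng.integers(0, n_sites, n_traces)] = 1

demo_path = os.path.join(tempfile.mkdtemp(), "demo.npz")
np.savez(demo_path, direction=direction, time=timestamps, label=label)

summary = summarize_npz(demo_path)
for key, (shape, dtype) in summary.items():
    print(f"{key}: shape={shape}, dtype={dtype}")
```

Pointing summarize_npz at a real dataset file instead of demo_path shows its actual layout.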

Note that we have recently improved the quality of our dataset. Specifically, during traffic collection, we used xvfbwrapper to retain screenshots of the websites after loading. We built a new image classification model based on ResNet, which effectively filters out failed website traffic using screenshots. We will report the latest experimental results in the extended journal version (under review).

Usage

Prepare Data

Download the datasets and place them in the folder ./datasets.
Divide each dataset into training, validation, and test sets. For example, for the 2-tab dataset collected in the closed-world setting, you can execute the following command.

python scripts/dataset_split.py -i datasets/closed_2tab.npz -o datasets/processed/closed_2tab
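For readers who want to see what such a split involves, here is a minimal sketch of a shuffled train/validation/test split applied to all arrays at once. It is not the logic of scripts/dataset_split.py: the function names, the 8:1:1 ratio, and the seed are assumptions chosen for illustration.

```python
import numpy as np


def split_indices(n_samples, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle sample indices and cut them into train/valid/test parts."""
    assert abs(sum(ratios) - 1.0) < 1e-9, "ratios must sum to 1"
    order = np.random.default_rng(seed).permutation(n_samples)
    n_train = int(ratios[0] * n_samples)
    n_valid = int(ratios[1] * n_samples)
    return order[:n_train], order[n_train:n_train + n_valid], order[n_train + n_valid:]


def split_arrays(arrays, ratios=(0.8, 0.1, 0.1), seed=0):
    """Apply one shared index split to every array in a dict."""
    n_samples = len(next(iter(arrays.values())))
    parts = split_indices(n_samples, ratios, seed)
    names = ("train", "valid", "test")
    return {name: {k: v[idx] for k, v in arrays.items()} for name, idx in zip(names, parts)}


# Synthetic stand-in for the loaded arrays (shapes are hypothetical).
arrays = {
    "direction": np.ones((100, 8), dtype=np.int8),
    "time": np.zeros((100, 8), dtype=np.float32),
    "label": np.zeros((100, 5), dtype=np.int8),
}
splits = split_arrays(arrays)
print({name: part["label"].shape[0] for name, part in splits.items()})
```

Using one shared permutation keeps the direction, time, and label rows of each trace aligned across the three subsets.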

Training

We take the training of ARES on a 2-tab dataset in the closed-world as an example.

python train.py -d closed_2tab -g 0 -l ARES

Training a separate model for each website, as the original ARES does, is costly. We use torch.nn.MultiLabelSoftMarginLoss to achieve a similar effect with a single model. This loss function creates a criterion that optimizes a multi-label one-versus-all loss based on max-entropy. In our study, this approximation results in a 1-2% performance loss compared to the original ARES.
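To make the one-versus-all idea concrete, the sketch below re-implements the formula documented for torch.nn.MultiLabelSoftMarginLoss in plain numpy: every class gets an independent sigmoid cross-entropy term, averaged over classes and then over the batch. It is an illustration only, not code from this repository; the function name, the example logits, and the 2-tab targets are assumptions. In PyTorch the equivalent call would be torch.nn.MultiLabelSoftMarginLoss()(logits, targets) with float targets.

```python
import math

import numpy as np


def multilabel_soft_margin_loss(logits, targets):
    """Mean over the batch of the per-class sigmoid cross-entropy, averaged over classes."""
    logits = np.asarray(logits, dtype=np.float64)
    targets = np.asarray(targets, dtype=np.float64)
    log_p = -np.logaddexp(0.0, -logits)     # log(sigmoid(x)), computed stably
    log_not_p = -np.logaddexp(0.0, logits)  # log(1 - sigmoid(x)), computed stably
    per_class = -(targets * log_p + (1.0 - targets) * log_not_p)
    return float(per_class.mean(axis=1).mean())


# Two hypothetical 2-tab traces over four monitored websites (multi-hot targets).
targets = np.array([[1, 0, 1, 0],
                    [0, 1, 1, 0]])
good_logits = np.where(targets == 1, 4.0, -4.0)  # confident and correct
bad_logits = -good_logits                        # confident and wrong

print("good:", multilabel_soft_margin_loss(good_logits, targets))
print("bad: ", multilabel_soft_margin_loss(bad_logits, targets))
```

Because each website contributes its own binary term, one network with one output per website plays the role of many one-versus-all classifiers.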

You can use TensorBoard to visualize the training process.

tensorboard --logdir=runs

Note that, thanks to its Transformer architecture, ARES keeps improving as the number of training epochs increases, with slight gains even beyond 500 epochs.

Evaluation

We take the evaluation of ARES on a 2-tab dataset in the closed-world as an example.

python eval.py -d closed_2tab -g 0 -m ARES

You can directly download the trained ARES parameter file (trained with the random seed set to 1018) via the link.

Contact

If you have any questions or suggestions, feel free to contact:

Xinhao Deng (dengxh23@mails.tsinghua.edu.cn)
