sustainable-computing/PrivDiffuser

PrivDiffuser

PoPETS arXiv License

This repository contains the implementation of the paper "PrivDiffuser: Privacy-Guided Diffusion Model for Data Obfuscation in Sensor Networks".

Datasets

PrivDiffuser is evaluated on three Human Activity Recognition (HAR) datasets: MotionSense, MobiAct, and WiFi-HAR.

The datasets and the preprocessing script (required for MobiAct) are available at:

Due to the file size limit, we compressed the datasets, pre-trained models, and evaluation models into a zip file (DatasetsAndModels.zip) and uploaded it to Google Drive: https://drive.google.com/file/d/1168ZSbA4CjzZ8YLkGr-wE-u-gBVfV9jN/view?usp=sharing

After downloading the zip file, unzip it to get three folders: eval_models, datasets, and models. Move them to the root directory of this repo. The notebook should then load the corresponding models and datasets correctly by default.
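The extraction step can be sketched in Python as follows. This is a minimal sketch, not part of the repository: the function name extract_and_check is hypothetical, and it assumes the archive places eval_models, datasets, and models at its top level. If the archive wraps them in an extra folder, the function will report them as missing and they need to be moved manually.

```python
import zipfile
from pathlib import Path

# Folder names taken from the README; their position inside the zip is an assumption.
EXPECTED_FOLDERS = ("eval_models", "datasets", "models")


def extract_and_check(zip_path, repo_root):
    """Extract the archive into repo_root and return the expected folders still missing."""
    repo_root = Path(repo_root)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(repo_root)
    # A folder counts as present only if it sits directly under the repo root.
    return [name for name in EXPECTED_FOLDERS if not (repo_root / name).is_dir()]


# Example usage (run from the repository root after downloading the zip):
# missing = extract_and_check("DatasetsAndModels.zip", ".")
# print("Missing folders:", missing or "none")
```

An empty result means all three folders are where the notebook expects them by default.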

Setup

We provide requirements.txt for dependencies installed via pip, and environment.yml for conda environments.

Note: environment.yml was generated for building our Docker image, so it does NOT install GPU-related packages. Install GPU-enabled PyTorch and TensorFlow separately if you want to use GPU acceleration.
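To see whether the installed PyTorch and TensorFlow builds can actually use a GPU, a quick check along these lines may help. It is a sketch only: the helper name gpu_status is hypothetical, and it relies on the standard torch.cuda.is_available() and tf.config.list_physical_devices("GPU") calls.

```python
import importlib.util


def gpu_status():
    """Report, per framework, whether it is installed and whether it can see a GPU."""
    status = {}

    if importlib.util.find_spec("torch") is None:
        status["torch"] = "not installed"
    else:
        import torch
        status["torch"] = "GPU available" if torch.cuda.is_available() else "CPU only"

    if importlib.util.find_spec("tensorflow") is None:
        status["tensorflow"] = "not installed"
    else:
        import tensorflow as tf
        gpus = tf.config.list_physical_devices("GPU")
        status["tensorflow"] = "GPU available" if gpus else "CPU only"

    return status


# Example usage:
# print(gpu_status())
```

A "CPU only" result for either framework suggests that a CPU build is installed, as is the case with the packages from environment.yml, or that no compatible GPU driver is visible.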

Docker

We provide a pre-built Docker image that contains our code, datasets, pre-trained models, and all dependencies required to run the code (without GPU acceleration). The Docker image is pre-configured with a base conda environment and automatically launches Jupyter Lab on port 8889.

Pull our Docker image from Docker Hub: docker pull neilyxin/privdiffuser

Run the Docker image: docker run -it --rm -p 8889:8889 neilyxin/privdiffuser

The terminal prints a link to Jupyter Lab that includes an authentication token, for example http://127.0.0.1:8889/lab?token=replace_with_your_token. Paste this link into your browser to open Jupyter Lab. The code base, datasets, and models are located under the default work directory. Open PrivDiffuser.ipynb to run the code.

Note: This Docker image is built for and tested on Ubuntu 20.04. Using it on other operating systems or architectures, such as macOS on Apple silicon, may require additional setup.

How to Use

The Jupyter notebook contains the code for obfuscating the gender attribute using the MobiAct dataset.

To use a different dataset, set args.dataset to mobi, motion, or wifi to select the MobiAct, MotionSense, or WiFi-HAR dataset, respectively. args.private specifies the private attribute. Its default value is gender; change it to weight for weight obfuscation, which is supported on MobiAct and WiFi-HAR.

Below we list the private attribute(s) supported in each dataset:

| Dataset | Supported Private Attribute |
|---|---|
| MobiAct | gender, weight |
| MotionSense | gender |
| WiFi-HAR | weight |
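The supported combinations listed above can be checked before running the notebook with a small validation helper. The sketch below is illustrative: the Args class here is a hypothetical stand-in that only mirrors the dataset and private fields mentioned in this README (the notebook's own Args class may contain more), and validate is not part of the code base.

```python
# Dataset keys and attributes as listed in this README.
SUPPORTED_PRIVATE = {
    "mobi": ("gender", "weight"),  # MobiAct
    "motion": ("gender",),         # MotionSense
    "wifi": ("weight",),           # WiFi-HAR
}


class Args:
    """Hypothetical stand-in for the notebook's Args class (only two fields shown)."""

    def __init__(self, dataset="mobi", private="gender"):
        self.dataset = dataset
        self.private = private


def validate(args):
    """Raise ValueError if the dataset/private-attribute pair is not supported."""
    if args.dataset not in SUPPORTED_PRIVATE:
        raise ValueError(f"Unknown dataset {args.dataset!r}; expected one of {sorted(SUPPORTED_PRIVATE)}")
    allowed = SUPPORTED_PRIVATE[args.dataset]
    if args.private not in allowed:
        raise ValueError(f"Dataset {args.dataset!r} supports {allowed}, not {args.private!r}")
    return args


# Example usage:
# validate(Args(dataset="wifi", private="weight"))  # passes
# validate(Args(dataset="wifi", private="gender"))  # raises ValueError
```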

PrivDiffuser.ipynb: Jupyter notebook for running the PrivDiffuser code base.

eval_models: contains pre-trained evaluation models.

models: stores trained models; pre-trained model checkpoints are included.

datasets: contains pre-processed datasets for MobiAct, MotionSense, and WiFi-HAR.

dataset_loader.py: contains the code to load the pre-processed datasets. You may need to change the path to your local dataset here.

Reproduce Results on MobiAct

The default configuration in PrivDiffuser.ipynb will load the pre-trained models under the models folder to perform gender obfuscation on the MobiAct dataset. It will generate obfuscated data and evaluate data utility and privacy. Running the default notebook will generate results for PrivDiffuser in Table 1 and Figure 4 (a).

Change self.private = 'gender' to self.private = 'weight' in the Args class, then re-run the notebook to obtain weight obfuscation results on MobiAct, as presented in Table 1 and Figure 4 (b).

Reproduce Results on MotionSense

To reproduce the results on the MotionSense dataset using pre-trained models, as presented in Table 2 and Figure 5, set self.dataset='motion' and self.private='gender' in the Args class, then re-run the notebook.

Reproduce Results on WiFi-HAR

To reproduce the results on the WiFi-HAR dataset using pre-trained models, as presented in Table 3 and Figure 6, set self.dataset='wifi' and self.private='weight' in the Args class, then re-run the notebook.

Note: the sampling process can be interrupted once at least one batch of obfuscated data has been generated. Running the remaining code after the interruption will report the data obfuscation performance on the portion of the test set generated so far.
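The idea of evaluating on a partially generated test set can be illustrated with the following sketch. It does not reproduce the notebook's sampling loop: sample_until_interrupted, batches, and obfuscate are hypothetical names, and the sketch assumes the interruption arrives as a KeyboardInterrupt, as it does when a notebook cell is stopped.

```python
def sample_until_interrupted(batches, obfuscate):
    """Obfuscate batches one by one; on interruption, keep what was already generated."""
    generated = []
    try:
        for batch in batches:
            generated.append(obfuscate(batch))
    except KeyboardInterrupt:
        if not generated:
            # Nothing was generated yet, so there is nothing to evaluate.
            raise
    return generated


# Example usage with placeholder data (real batches would be sensor windows):
# outputs = sample_until_interrupted([[1, 2], [3, 4]], lambda batch: [x * 0 for x in batch])
# Evaluation would then use only the first len(outputs) batches of the test labels.
```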

Dependencies

| Package | Version |
|---|---|
| python3 | 3.8.18 |
| datasets | 3.0.1 |
| matplotlib | 3.3.4 |
| numpy | 1.22.0 |
| pandas | 1.1.4 |
| pytorch_lightning | 2.3.3 |
| scikit_learn | 1.5.2 |
| tensorflow | 2.11.0 |
| torch | 2.2.0 |
| torchvision | 0.17.0 |
| tqdm | 4.66.1 |
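To compare an existing environment against the versions listed above, a check based on the standard library's importlib.metadata could look like the sketch below. The helper find_mismatches is hypothetical. The package names are copied from the list as written; depending on the Python version, names such as scikit_learn and pytorch_lightning may need to be spelled with hyphens to be found, and GPU builds of torch may report a suffixed version string that shows up as a mismatch.

```python
from importlib import metadata

# Versions as listed in this README (the Python interpreter itself is not checked here).
PINNED = {
    "datasets": "3.0.1",
    "matplotlib": "3.3.4",
    "numpy": "1.22.0",
    "pandas": "1.1.4",
    "pytorch_lightning": "2.3.3",
    "scikit_learn": "1.5.2",
    "tensorflow": "2.11.0",
    "torch": "2.2.0",
    "torchvision": "0.17.0",
    "tqdm": "4.66.1",
}


def find_mismatches(pinned, get_version=metadata.version):
    """Return {package: installed_version_or_None} for every package that differs from its pin."""
    mismatches = {}
    for package, wanted in pinned.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed
        if installed != wanted:
            mismatches[package] = installed
    return mismatches


# Example usage:
# for name, installed in find_mismatches(PINNED).items():
#     print(f"{name}: expected {PINNED[name]}, found {installed}")
```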

Acknowledgement

  • guided-diffusion: OpenAI's implementation for guided diffusion models.
  • mine-pytorch: a PyTorch implementation for MINE (Mutual Information Neural Estimation).
