CountFi

Wi-Fi CSI Datasets for Passenger Counting on the Upper Deck of a Double-Decker Bus in Hong Kong

Overview

CountFi: RSSI-Assisted CSI-Based Passenger Counting with Multiple Wi-Fi Receivers

Authors: Jingtao Guo, Wenhao Zhuang, Yuyi Mao, Ivan Wang-Hei Ho

WCNC 2025

A privacy-preserving approach to passenger counting using Wi-Fi Channel State Information in public transportation

CountFi Wi-Fi CSI dataset example with passengers

Abstract

The CountFi dataset supports research on CSI-based passenger counting with single and multiple Wi-Fi receivers in public transportation systems. It provides annotated Wi-Fi CSI data for developing and evaluating sensing algorithms for accurate, privacy-preserving passenger counting. Unlike traditional camera-based approaches, the Wi-Fi CSI method offers stronger privacy protection while maintaining counting accuracy by learning the signal propagation characteristics affected by passenger presence and fidgeting.

Methodology

The CountFi dataset was collected using Wi-Fi devices installed on the upper deck of double-decker buses operating under various scenarios in Hong Kong. Channel State Information (CSI) data was captured across different times of day, passenger densities, and bus operating conditions to ensure diversity.

CSI Processing

The raw Wi-Fi CSI data was processed to extract features that correlate with passenger presence and fidgeting. The signal propagation characteristics change with the number of passengers and their distribution in the bus, which makes it possible to estimate the passenger count.
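As a rough illustration of this kind of feature extraction, the sketch below computes per-subcarrier amplitude statistics over a short window of CSI packets. The choice of features (mean and standard deviation of amplitude), the function names, and the toy values are assumptions made for illustration; this page does not specify which features were extracted.

```python
import statistics

def csi_amplitude(packet):
    """Return the per-subcarrier amplitude |H| for one packet of complex CSI values."""
    return [abs(h) for h in packet]

def window_features(packets):
    """Per-subcarrier mean and standard deviation of amplitude over a window of packets.

    Assumed feature set: the mean reflects the static channel, while the
    standard deviation reflects temporal variation such as fidgeting.
    """
    amplitudes = [csi_amplitude(p) for p in packets]
    per_subcarrier = list(zip(*amplitudes))  # transpose: one tuple per subcarrier
    means = [statistics.fmean(s) for s in per_subcarrier]
    stds = [statistics.pstdev(s) for s in per_subcarrier]
    return means, stds

# Toy window: 3 packets x 2 subcarriers (made-up values, not from the dataset).
window = [
    [3 + 4j, 1 + 0j],
    [3 + 4j, 0 + 2j],
    [3 + 4j, -3 + 0j],
]
means, stds = window_features(window)
```

Here the first subcarrier is constant over the window, so its standard deviation is zero, while the second one varies and would contribute a non-zero motion feature.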

Annotation Process

Ground truth passenger counts were manually recorded during data collection. These annotations were synchronized with the CSI data timestamps to create labeled data suitable for machine learning model training and evaluation.
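One plausible way to perform this synchronization is sketched below: each CSI sample receives the most recent manually recorded count at or before its timestamp. The record layout (sorted `(timestamp, count)` pairs), the function name, and the toy timestamps are assumptions; the actual annotation format of the dataset may differ.

```python
from bisect import bisect_right

def label_samples(csi_timestamps, annotations):
    """Assign each CSI timestamp the latest annotated count at or before it.

    annotations: list of (timestamp, passenger_count), sorted by timestamp.
    Samples earlier than the first annotation are labeled None.
    """
    times = [t for t, _ in annotations]
    labels = []
    for ts in csi_timestamps:
        i = bisect_right(times, ts) - 1  # index of last annotation with time <= ts
        labels.append(annotations[i][1] if i >= 0 else None)
    return labels

# Hypothetical annotations: the count changes at t = 10 s, 20 s, and 30 s.
annotations = [(10.0, 3), (20.0, 5), (30.0, 4)]
csi_timestamps = [5.0, 10.0, 12.5, 20.0, 29.9, 31.0]
labels = label_samples(csi_timestamps, annotations)
# labels == [None, 3, 3, 5, 5, 4]
```

Binary search keeps the lookup at O(log n) per sample, which matters when millions of CSI samples are matched against a short list of count changes.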

Privacy Advantages

Using Wi-Fi CSI data for passenger counting offers significant privacy benefits over traditional camera-based approaches. The CSI data captures only signal propagation characteristics and cannot be used to identify individuals, eliminating privacy concerns associated with visual surveillance.

System Architecture

The CountFi system uses strategically placed Wi-Fi devices to capture Channel State Information (CSI) data in the upper deck of double-decker buses. The system consists of:

  • Wi-Fi Transmitters/Receivers: Capture CSI data as signals propagate through the bus environment
  • Data Collection Units: Process and store raw CSI measurements
  • Feature Fusion Module: Leverages RSSI features to enhance the accuracy of passenger counting
  • Passenger Count Estimation: Uses deep learning models to estimate the number of passengers and visualize the results in a web interface

The entire system operates in real time, providing privacy-preserving passenger counts without capturing any personally identifiable information.
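To make the role of the feature fusion module more concrete, the following sketch weights each receiver's CSI feature vector by a softmax over the receivers' RSSI values and concatenates the results. This is only one possible reading of "RSSI-assisted" fusion: the softmax weighting, the `temperature` parameter, and all names are assumptions, and the method in the paper may differ.

```python
import math

def rssi_weights(rssi_dbm, temperature=10.0):
    """Softmax over RSSI values in dBm, so stronger links get larger weights.

    temperature is a hypothetical smoothing parameter (in dB).
    """
    scaled = [r / temperature for r in rssi_dbm]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(features_per_receiver, rssi_dbm):
    """Scale each receiver's feature vector by its RSSI weight, then concatenate."""
    weights = rssi_weights(rssi_dbm)
    fused = []
    for w, feats in zip(weights, features_per_receiver):
        fused.extend(w * f for f in feats)
    return fused

# Two receivers with equal RSSI contribute equally (made-up feature values).
fused = fuse([[1.0, 2.0], [3.0, 4.0]], [-50.0, -50.0])
# A stronger link (-40 dBm) receives a larger weight than a weaker one (-60 dBm).
weights = rssi_weights([-40.0, -60.0])
```

In a learned model, the fused vector would be passed to the count estimator; the weights could also be produced by a small network rather than a fixed softmax.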

CountFi system architecture showing interconnected components

Dataset

  • CSI samples: 3,230,000
  • Max. number of people: 20
  • Max. number of bus stops: 5

Dataset Features

  • Real-world conditions: Captured in actual double-decker buses in Hong Kong
  • Diverse scenarios: Various bus conditions, passenger densities, and seating arrangements
  • Multiple collection periods: Data collected across different dates in 2023 and 2025
  • Privacy-preserving: Using CSI data derived from Wi-Fi signals to protect passenger identity

Dataset Structure

The CountFi dataset comprises three distinct collection periods, each designed to capture different aspects of passenger counting scenarios in double-decker buses:

February 6-7, 2025

Stationary Only
  • Scenario type: Stationary bus (stopped) scenarios only
  • Collection locations: Front, middle, and end sections of the upper deck
  • File naming convention: Files are denoted with suffix _stop
  • Purpose: To establish baseline CSI patterns in controlled static environments

April 8, 2025

Mixed Scenarios
  • Scenario types: Both moving and stationary bus scenarios
  • Collection locations: Front, middle, and end sections of the upper deck
  • File naming convention: Files are denoted with suffix _xm, where x identifies the section of the upper deck (for example, _fm denotes data collected from the front section)
  • Purpose: To capture CSI variations under dynamic conditions with passenger movement
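The suffix conventions above can be checked programmatically when loading files. The sketch below is a minimal example under stated assumptions: the file names and the `.csv` extension are hypothetical, and only the `f` (front) section code is documented on this page, so other codes are reported as unknown rather than guessed.

```python
import re
from pathlib import Path

# Only _fm (front section) is documented; other section codes are left unmapped.
SECTION_CODES = {"f": "front"}

def parse_scenario(filename):
    """Classify a file by its suffix: _stop (stationary set) or _xm (mixed set)."""
    stem = Path(filename).stem  # drop the extension, whatever it is
    if stem.endswith("_stop"):
        return {"scenario": "stationary", "section": None}
    match = re.search(r"_([a-z])m$", stem)
    if match:
        code = match.group(1)
        section = SECTION_CODES.get(code, f"unknown ({code})")
        return {"scenario": "mixed", "section": section}
    return {"scenario": "unknown", "section": None}

# Hypothetical file names used only to exercise the parser.
front_mixed = parse_scenario("recording_fm.csv")
stationary = parse_scenario("recording_stop.csv")
other = parse_scenario("recording.csv")
```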

June 13, 2023

Engine Off
  • Scenario type: Highly controlled stationary scenario
  • Special condition: Bus engine turned off to eliminate vibration interference
  • Purpose: To isolate passenger presence effects from vehicle-induced signal variations
  • Applications: Ideal for algorithm development requiring clean baseline signals

All three datasets include CSI data with corresponding ground truth passenger counts, enabling comprehensive analysis across different operational conditions.

Examples

Our dataset includes various passenger occupancy scenarios to enable robust model training and testing. Below are two representative examples showing different passenger densities:

Medium occupancy scenario with 9 passengers

Medium Occupancy

This scenario features around 9 passengers in the upper deck of a double-decker bus, representing a medium occupancy case. The passengers are distributed throughout the seating area, providing varied signal paths for the Wi-Fi CSI data. This distribution allows algorithms to learn patterns associated with medium passenger density.

High occupancy scenario with 20 passengers

High Occupancy

This scenario features around 20 passengers in the upper deck of a double-decker bus, representing a high occupancy case that challenges counting algorithms. Wi-Fi CSI patterns change significantly as more passengers enter the bus, providing rich data for model training and evaluation.

Benchmarks

We provide baseline results for several state-of-the-art methods for multi-receiver Wi-Fi CSI-based passenger counting on our June 13, 2023 dataset:

Method                                            Accuracy (%)  F1-Score (%)  GFLOPs
Direct Prob Avg                                   90.99         90.98         3.44
Re-weighted CSI Prob Avg                          91.45         91.39         3.44
CSI Feature Concatenation Training                92.59         92.49         3.40
Adaptive RSSI-weighted CSI Feature Concatenation  94.86         94.83         3.41

GFLOPs: giga floating-point operations.

For more details on benchmark methodology and evaluation metrics, please refer to our paper.
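The names of the first two baselines suggest decision-level fusion, in which each receiver's model outputs class probabilities that are then averaged. The sketch below illustrates plain and weighted probability averaging under that assumption. The function names, the example probabilities, and the weights are hypothetical, and treating the class index as the passenger count is also an assumption; the paper defines the actual procedures.

```python
def average_probabilities(prob_per_receiver, weights=None):
    """Fuse per-receiver class probabilities by a (weighted) average.

    weights=None gives a plain average; otherwise weights are normalized
    so that the fused vector still sums to 1.
    """
    n = len(prob_per_receiver)
    if weights is None:
        weights = [1.0] * n
    total = sum(weights)
    num_classes = len(prob_per_receiver[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, prob_per_receiver)) / total
        for c in range(num_classes)
    ]

def predict_count(fused):
    """Return the index of the most probable class."""
    return max(range(len(fused)), key=fused.__getitem__)

# Two receivers, three classes (made-up probabilities).
probs = [[0.7, 0.2, 0.1], [0.2, 0.5, 0.3]]
plain = average_probabilities(probs)             # both receivers count equally
weighted = average_probabilities(probs, [1, 3])  # second receiver trusted more
```

With equal weights the first receiver's confident vote wins, while trusting the second receiver more shifts the prediction, which shows why the choice of weights can change the final count.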

Download

The CountFi dataset is available for research purposes. Each collection period offers unique characteristics, as described in the Dataset Structure section. We provide the raw dataset so that researchers can process the data as they wish.

Usage Policy

By downloading and using this dataset, you agree to:

  1. Use the dataset for non-commercial research purposes only
  2. Properly cite our paper in any publications that use this dataset
  3. Share or redistribute the dataset only under the same license terms (CC BY-NC 4.0) with proper attribution

Citation

@inproceedings{guo2025rssi,
  title={RSSI-Assisted CSI-Based Passenger Counting with Multiple Wi-Fi Receivers},
  author={Guo, Jingtao and Zhuang, Wenhao and Mao, Yuyi and Ho, Ivan Wang-Hei},
  booktitle={2025 IEEE Wireless Communications and Networking Conference (WCNC)},
  pages={1--6},
  year={2025},
  organization={IEEE}
}

Acknowledgement

This work was supported in part by the Smart Traffic Fund (Project No. PSRI/31/2202/PR) established under the Transport Department of the Hong Kong Special Administrative Region (HKSAR), China. We thank all volunteers for their participation.

License

Attribution-NonCommercial 4.0 International

This dataset is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License. You are free to share and adapt the material for non-commercial purposes, provided you give appropriate credit.