
Datasets for RF Fingerprinting of Bit-similar USRP X310 Radios


Download our datasets:
Please use the links below to download the datasets:
Dataset#1: Raw IQ samples of over-the-air transmissions from 16 X310 USRP radios
Dataset#2: Demodulated IQ symbols of over-the-cable transmissions with 16 configurations of IQ imbalances

These datasets were used for the paper "ORACLE: Optimized Radio clAssification through Convolutional neuraL nEtworks" published in INFOCOM 2019. Please use this link to download the paper. Any use of this dataset which results in an academic publication or other publication which includes a bibliography should include a citation to our paper. Here is the reference for the work:

Conference version: PDF
K. Sankhe, M. Belgiovine, F. Zhou, S. Riyaz, S. Ioannidis, and K. R. Chowdhury, "ORACLE: Optimized Radio clAssification through Convolutional neuraL nEtworks," IEEE INFOCOM 2019, Paris, France, May 2019.


Extended version: PDF
K. Sankhe, M. Belgiovine, F. Zhou, L. Angioloni, F. Restuccia, S. D'Oro, T. Melodia, S. Ioannidis, and K. R. Chowdhury, "No Radio Left Behind: Radio Fingerprinting Through Deep Learning of Physical-Layer Hardware Impairments," IEEE Transactions on Cognitive Communications and Networking, Special Issue on Evolution of Cognitive Radio to AI-enabled Radio and Networks, 2019.


Description:
Our proposed RF fingerprinting approach, ORACLE, identifies a unique radio in a large pool of bit-similar devices (same hardware, protocol, physical address, MAC ID) using only IQ samples at the physical layer. ORACLE follows two approaches: 1) it trains a convolutional neural network (CNN) to detect unique hardware-centric signatures (e.g., IQ imbalance, DC offsets) embedded in the transmitter radio chain; and 2) it uses receiver feedback to inject modifications into the transmitter chain to perform channel-independent RF fingerprinting. ORACLE achieves 99% classification accuracy for a 16-node USRP X310 SDR testbed and an external database of >100 COTS WiFi devices. To evaluate the performance of ORACLE's deep-learning model, we have created two standard datasets. Fellow researchers can use these datasets to reproduce the original work or to explore other machine learning problems in the domain of wireless communication.


Fig. 1: Deep learning, here a convolutional neural network (CNN), is used to detect unique transmitter signatures.


Fig. 2: Use of receiver feedback to inject impairments (e.g., IQ imbalance, DC offsets) to increase differentiability among radios.


Experimental Setup
ORACLE trains the CNN using IQ samples collected from an experimental setup of USRP SDRs, as shown in Fig. 3, with a fixed USRP B210 as the receiver. All transmitters are bit-similar USRP X310 radios that emit IEEE 802.11a standards-compliant frames generated via the MATLAB WLAN System Toolbox. The generated data frames contain random payloads but have the same address fields, and are streamed to the selected SDR for over-the-air wireless transmission. The receiver SDR samples the incoming signals at a 5 MS/s sampling rate at a center frequency of 2.45 GHz for WiFi. Overall, we collect over 20 million samples for each radio. We conduct the experiments in an open area with few reflections, as shown in Fig. 4. The transmitter-receiver separation distance is increased from 2 ft to 62 ft in steps of 6 ft.

Fig. 3: Experimental setup for data collection using SDR


Fig. 4: Experimental environment: an open area with few reflections
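
The distance sweep and sample counts described above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the released tooling: it assumes the folder names use the unpadded form seen in the file names (e.g. "2ft"), and it reads the "over 20 million samples" figure as a lower bound for one contiguous capture, which the text does not state explicitly.

```python
# Sketch only: constants are taken from the experimental setup text;
# the variable names are illustrative, not from the dataset tooling.
SAMPLE_RATE_HZ = 5_000_000        # receiver sampling rate (5 MS/s)
MIN_SAMPLES_PER_RADIO = 20_000_000  # "over 20 million samples" (lower bound)

# Transmitter-receiver distances: 2 ft to 62 ft inclusive, in 6 ft steps.
distances_ft = list(range(2, 62 + 1, 6))

# Assumed folder naming ("xxft"), unpadded as in the example file names.
folder_names = [f"{d}ft" for d in distances_ft]

# Signal time represented by 20 million samples at 5 MS/s.
min_duration_s = MIN_SAMPLES_PER_RADIO / SAMPLE_RATE_HZ

print(distances_ft)
print(folder_names)
print(min_duration_s)
```

Under these assumptions the sweep covers 11 distances, and 20 million samples correspond to 4 seconds of signal at the stated sampling rate.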

Dataset Description:
We are releasing two datasets: a) Dataset #1: recordings of raw IQ samples collected from over-the-air transmissions of 16 USRP X310 transmitter radios; b) Dataset #2: recordings of demodulated IQ symbols collected after equalizing over-the-cable transmissions of 16 IQ imbalance configurations. In both datasets, each recording consists of two files: a metadata file and a dataset file. The dataset file is a binary file of digital samples, and the metadata file contains information that describes the dataset. Our metadata and data format is an extension of, and compatible with, the SigMF specifications.
  • Dataset #1: It consists of recordings of raw IQ samples collected from 16 high-end X310 USRP SDRs, with the same B210 radio as the receiver. The recordings are organized into folders named "xxft", where xx is the transmitter-receiver separation distance in feet. Each recording has a dataset file with the extension '.sigmf-data' and a metadata file with the extension '.sigmf-meta'. The file names follow a fixed format so that each recording is self-describing.
    For example, the dataset file "WiFi_air_X310_3123D7B_2ft_run1" represents
    • WiFi --> IEEE 802.11a standard-compliant WLAN frame
    • air --> medium of transmission
    • X310 --> the type of USRP radio
    • 3123D7B --> device serial ID
    • 2ft --> the transmitter-receiver separation distance in feet
    • run1 --> the recording number
    • sigmf-data/sigmf-meta --> the extension of the dataset file/metadata file
  • Dataset #2: It consists of recordings of demodulated IQ symbols obtained after equalizing over-the-cable transmissions, with an X310 USRP SDR as the transmitter and a B210 radio as the receiver. To obtain each recording, we use the set_iq_balance function in GRC to apply a complex correction factor to the transmit chain of the RF daughterboard, which intentionally introduces the required level of impairment in the radio. Because of this intentional IQ imbalance, the demodulated symbols acquire unique characteristics that are device- and channel-invariant, as shown in Fig. 5. This makes the CNN robust to channel changes, i.e., the effect of the transmitter hardware dominates channel-induced variations.

    Fig. 5: Patterns generated by 3 impairments on 2 devices under 2 channel conditions. The first and second rows show the channel invariance and device invariance of the patterns, respectively.


    As in Dataset #1, each recording has a dataset file with the extension '.sigmf-data' and a metadata file with the extension '.sigmf-meta'. The file names follow a fixed format so that each recording is self-describing.
    For example, the dataset file "Demod_WiFi_cable_X310_3123D76_IQ#1_run1" represents
    • Demod_WiFi --> demodulated IQ symbols obtained after equalizing raw IQ samples of an IEEE 802.11a standard-compliant WLAN frame
    • cable --> medium of transmission
    • X310 --> the type of USRP radio
    • 3123D76 --> device serial ID
    • IQ#1 --> IQ imbalance configuration number that introduces a specific level of IQ imbalance in the radio
    • run1 --> the recording number
    • sigmf-data/sigmf-meta --> the extension of the dataset file/metadata file
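
The two naming conventions above can be split into labeled fields with a regular expression. This is a minimal sketch based only on the two example names given; the function name `parse_recording_name` and the returned keys are hypothetical, and the patterns assume that every file follows the examples exactly (underscore-separated fields, decimal distance, decimal configuration and run numbers).

```python
import re
from typing import Dict, Optional

# Dataset #1 names, e.g. "WiFi_air_X310_3123D7B_2ft_run1"
AIR_PATTERN = re.compile(
    r"^WiFi_air_(?P<radio>[^_]+)_(?P<serial>[^_]+)"
    r"_(?P<distance_ft>\d+)ft_run(?P<run>\d+)$"
)

# Dataset #2 names, e.g. "Demod_WiFi_cable_X310_3123D76_IQ#1_run1"
CABLE_PATTERN = re.compile(
    r"^Demod_WiFi_cable_(?P<radio>[^_]+)_(?P<serial>[^_]+)"
    r"_IQ#(?P<iq_config>\d+)_run(?P<run>\d+)$"
)


def parse_recording_name(name: str) -> Optional[Dict[str, object]]:
    """Split a recording file name into fields; return None if it matches neither pattern."""
    # Drop the SigMF extension if present.
    stem = re.sub(r"\.sigmf-(data|meta)$", "", name)

    match = AIR_PATTERN.match(stem)
    if match:
        return {
            "medium": "air",
            "radio": match["radio"],
            "serial": match["serial"],
            "distance_ft": int(match["distance_ft"]),
            "run": int(match["run"]),
        }

    match = CABLE_PATTERN.match(stem)
    if match:
        return {
            "medium": "cable",
            "radio": match["radio"],
            "serial": match["serial"],
            "iq_config": int(match["iq_config"]),
            "run": int(match["run"]),
        }

    return None
```

For a classifier, the `serial` field would be the natural label for Dataset #1 and the `iq_config` field for Dataset #2.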

SigMF Description:

Global Object

The global object consists of name/value pairs that provide information applicable to the entire dataset. It contains the information that is minimally necessary to open and parse the dataset file, as well as general information about the recording itself. The following names are specified in the core namespace:

| name | required | type | description |
|---|---|---|---|
| datatype | true | string | The format of the stored samples in the dataset file. |
| sample_rate | true | double | The sample rate of the signal in samples per second. |
| version | true | string | The version of the SigMF specification used to create the metadata file. |
| sha512 | false | string | The SHA512 hash of the dataset file associated with the SigMF file. |
| description | false | string | A text description of the SigMF recording. |
| hw | false | string | A text description of the hardware used to make the recording. |
| recorder | false | string | The name of the software used to make this SigMF recording. |
| author | false | string | The author's name. |
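
The optional sha512 field lets a downloaded dataset file be checked against its metadata. The following is a minimal sketch, assuming the metadata file is JSON with a top-level "global" object and a "core:sha512" key holding a hexadecimal digest; the function names are illustrative and not part of any SigMF library.

```python
import hashlib
import json


def sha512_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large recordings need not fit in memory."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_recording(meta_path: str, data_path: str) -> bool:
    """Return True if the dataset file matches the hash stored in the metadata."""
    with open(meta_path, "r", encoding="utf-8") as f:
        meta = json.load(f)
    expected = meta["global"].get("core:sha512")
    if expected is None:
        # sha512 is optional, so its absence is reported rather than treated as a match.
        raise ValueError("metadata has no core:sha512 field")
    return sha512_of_file(data_path) == expected.lower()
```

A mismatch usually points to an incomplete download or a metadata file paired with the wrong dataset file.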

Snapshot

{
  "global": {
    "core:sha512": "b3ff6b996da344e35762f962893e69a9172367bb1e020bfadf2b245adaad9c2146853ce9657f2c7d619b61d63191fbc1741f481f1ed5d67ee7ddeea0029e9d51",
    "core:version": "0.0.1",
    "core:author": "Kunal Sankhe",
    "core:sample_rate": 5000000.0,
    "core:description": "SigMF IQ samples recording of demodulated data derived from over-the-cable WiFi transmissions collected by a fixed USRP B210 as a receiver. The transmitter emitted IEEE 802.11a standards compliant frames generated via a MATLAB WLAN System toolbox. Using UHD software, a controlled level of IQ imbalance is introduced at the runtime such that the demodulated symbols acquire unique characteristics.",
    "core:datatype": "cf32"
  }
}
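
With the datatype "cf32" shown in the snapshot, each sample in the dataset file is a pair of 32-bit floats (I followed by Q). The sketch below reads such a file using only the Python standard library. It assumes little-endian byte order, which the "cf32" string in the snapshot does not state, and the function name `read_cf32` is hypothetical.

```python
import struct
from typing import List, Optional

BYTES_PER_SAMPLE = 8  # two 32-bit floats: I, then Q


def read_cf32(data_path: str, count: Optional[int] = None, offset: int = 0) -> List[complex]:
    """Read `count` complex samples starting at sample index `offset`.

    Assumes interleaved little-endian float32 I/Q pairs (an assumption,
    not confirmed by the metadata snapshot).
    """
    with open(data_path, "rb") as f:
        f.seek(offset * BYTES_PER_SAMPLE)
        raw = f.read() if count is None else f.read(count * BYTES_PER_SAMPLE)
    # Ignore a trailing partial sample, if any.
    usable = len(raw) - (len(raw) % BYTES_PER_SAMPLE)
    return [complex(i, q) for i, q in struct.iter_unpack("<ff", raw[:usable])]
```

For full recordings with millions of samples, reading a window at a time via `count` and `offset` keeps memory use bounded. On little-endian hosts, NumPy's `numpy.fromfile(path, dtype=numpy.complex64)` is a common, faster alternative.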

Captures

As per the SigMF specifications, the captures value is an array of capture segment objects that describe the parameters of the signal capture. It MUST be sorted by the value of each capture segment's core:sample_start key, ascending. The following names are specified in the core namespace:

| name | required | type | description |
|---|---|---|---|
| sample_start | true | uint | The sample index in the dataset file at which this segment takes effect. |
| frequency | false | double | The center frequency of the signal in Hz. |
| datetime | false | string | An ISO-8601 string indicating the timestamp of the sample index specified by sample_start. |
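
The ordering requirement on the captures array can be checked directly, and it also makes it straightforward to find the segment in effect at a given sample. This is a minimal sketch operating on an already-parsed captures list; the function names are illustrative, and the example values in the usage below are made up.

```python
from typing import Any, Dict, List, Optional


def captures_are_sorted(captures: List[Dict[str, Any]]) -> bool:
    """Check that segments are in ascending order of core:sample_start."""
    starts = [segment["core:sample_start"] for segment in captures]
    return all(a <= b for a, b in zip(starts, starts[1:]))


def capture_for_sample(captures: List[Dict[str, Any]], index: int) -> Optional[Dict[str, Any]]:
    """Return the last segment whose core:sample_start is <= index.

    Relies on the list being sorted; returns None if the index precedes
    the first segment.
    """
    current = None
    for segment in captures:
        if segment["core:sample_start"] <= index:
            current = segment
        else:
            break
    return current


# Hypothetical captures array for illustration only.
example_captures = [
    {"core:sample_start": 0, "core:datetime": "2019-01-01T00:00:00Z"},
    {"core:sample_start": 1000, "core:datetime": "2019-01-01T00:00:01Z"},
]
print(captures_are_sorted(example_captures))
print(capture_for_sample(example_captures, 1500)["core:sample_start"])
```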

Annotations

According to the SigMF specifications, the annotations value is an array of annotation segment objects that describe anything regarding the signal data that is not part of the global and captures objects. Each SigMF annotation segment object must contain a core:sample_start name/value pair, which indicates the first index at which the rest of the segment's name/value pairs apply. We have extended the annotations with the genesys namespace:

| name | required | type | unit | description |
|---|---|---|---|---|
| environment | true | string | N/A | A description of the environment where the antenna is mounted. E.g., "indoor" or "outdoor". |
| transmitter_identification | false | object | N/A | Transmitter identification parameters. See Transmitter Object definition. |
| receiver_identification | false | object | N/A | Receiver identification parameters. See Receiver Object definition. |
| distance | false | string | feet | Distance between transmitter and receiver. |
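
Since SigMF keys take the form "namespace:name" (as in "core:sha512"), the genesys fields of an annotation segment can be separated from the core fields by their prefix. The sketch below assumes the extension keys appear as "genesys:<name>" in the metadata file; that key spelling and the example values are assumptions for illustration, not copied from a released metadata file.

```python
from typing import Any, Dict


def fields_in_namespace(segment: Dict[str, Any], namespace: str) -> Dict[str, Any]:
    """Return the segment's fields that belong to `namespace`, with the prefix removed."""
    prefix = namespace + ":"
    return {
        key[len(prefix):]: value
        for key, value in segment.items()
        if key.startswith(prefix)
    }


# Hypothetical annotation segment; keys and values are illustrative only.
example_annotation = {
    "core:sample_start": 0,
    "genesys:environment": "indoor",
    "genesys:distance": "2ft",
}

print(fields_in_namespace(example_annotation, "genesys"))
print(fields_in_namespace(example_annotation, "core"))
```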

Transmitter Object

The Transmitter object contains the following name/value pairs:

| name | required | type | unit | description |
|---|---|---|---|---|
| model | true | string | N/A | Make and model of the transmitter. E.g., "Ettus N210", "Ettus B200", "Keysight N6841A", "Tektronix B206B". |
| serial_number | false | string | N/A | Globally unique identifier. |
| low_frequency | false | float | Hz | Low frequency of operational range of the transmitter. |
| high_frequency | false | float | Hz | High frequency of operational range of the transmitter. |
| noise_figure | false | float | dB | Noise figure of the transmitter. |
| max_power | false | float | dBm | Maximum power of the transmitter. |
| antenna | true | object | N/A | See Antenna Object definition. |

Receiver Object

The Receiver object contains the following name/value pairs:

| name | required | type | unit | description |
|---|---|---|---|---|
| model | true | string | N/A | Make and model of the receiver. E.g., "Ettus N210", "Ettus B200", "Keysight N6841A", "Tektronix B206B". |
| serial_number | false | string | N/A | Globally unique identifier. |
| low_frequency | false | float | Hz | Low frequency of operational range of the receiver. |
| high_frequency | false | float | Hz | High frequency of operational range of the receiver. |
| noise_figure | false | float | dB | Noise figure of the receiver. |
| max_power | false | float | dBm | Maximum input power of the receiver. |
| antenna | true | object | N/A | See Antenna Object definition. |

Antenna Object

The Antenna object contains the following name/value pairs:

| name | required | type | unit | description |
|---|---|---|---|---|
| model | true | string | N/A | Antenna make and model number. E.g., "ARA CSB-16", "L-com HG3512UP-NF". |
| type | false | string | N/A | Antenna type. E.g., "dipole", "biconical", "monopole", "conical monopole". |
| low_frequency | false | float | Hz | Low frequency of operational range. |
| high_frequency | false | float | Hz | High frequency of operational range. |