Rounding errors in to_dataframe #444

Open
@bemoody

The wfdb.Record.to_dataframe function generates a DataFrame from a Record object. The index of the resulting DataFrame is the elapsed or absolute time of each sample.

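This code, however, accumulates significant rounding error over a long record, because the sampling interval 1 / self.fs is rounded when it is converted to a Timedelta and that rounded step is then applied once per sample: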

        if self.base_datetime is not None:
            index = pd.date_range(
                start=self.base_datetime,
                periods=self.sig_len,
                freq=pd.Timedelta(seconds=1 / self.fs),
            )
        else:
            index = pd.timedelta_range(
                start=pd.Timedelta(0),
                periods=self.sig_len,
                freq=pd.Timedelta(seconds=1 / self.fs),
            )

For example:

$ python3
>>> import wfdb
>>> r = wfdb.rdrecord('81739927', pn_dir='mimic4wdb/0.1.0/waves/p100/p10014354/81739927')
>>> str(r.base_datetime)
'2148-08-16 09:00:17.566000'
>>> r.fs
62.4725
>>> r.sig_len
6661120
>>> r.to_dataframe()
                             I     II    III      V  aVR     Pleth      Resp
2148-08-16 09:00:17.566000 NaN    NaN    NaN    NaN  NaN       NaN -0.751374
2148-08-16 09:00:17.582007 NaN    NaN    NaN    NaN  NaN       NaN -0.751374
2148-08-16 09:00:17.598014 NaN    NaN    NaN    NaN  NaN       NaN -0.751374
2148-08-16 09:00:17.614021 NaN    NaN    NaN    NaN  NaN       NaN -0.751374
2148-08-16 09:00:17.630028 NaN    NaN    NaN    NaN  NaN       NaN -0.751374
...                         ..    ...    ...    ...  ...       ...       ...
2148-08-17 14:37:22.033805 NaN -0.220 -0.285 -0.025  NaN  0.404297  0.487477
2148-08-17 14:37:22.049812 NaN -0.030  0.005  0.025  NaN  0.396484  0.530238
2148-08-17 14:37:22.065819 NaN -0.065 -0.030 -0.015  NaN  0.386475  0.574832
2148-08-17 14:37:22.081826 NaN -0.265 -0.255 -0.125  NaN  0.375977  0.621258
2148-08-17 14:37:22.097833 NaN -0.550 -0.610 -0.355  NaN  0.366211  0.664020

[6661120 rows x 7 columns]
>>> str(r.get_absolute_time(6661119))
'2148-08-17 14:37:22.384920'

$ wfdbtime -r mimic4wdb/0.1.0/waves/p100/p10014354/81739927/ s6661119
       s6661119    29:37:04.819 [14:37:22.385 17/08/2148]

Here, get_absolute_time is correct to the nearest microsecond and the wfdbtime command is correct to the nearest millisecond. to_dataframe, however, is off by 0.287 seconds.
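
As a rough check on where the 0.287 seconds comes from: the index printed above advances by 16.007 ms per row, which suggests the interval is being rounded to whole microseconds. That rounding step is an assumption inferred from the output, not something confirmed in the pandas source. Under that assumption, the accumulated drift can be estimated with plain arithmetic:

```python
# Values taken from the record shown above.
fs = 62.4725
last_sample = 6661119

# Exact sampling interval versus the interval rounded to whole
# microseconds (assumed from the 16.007 ms spacing in the printed index).
exact_interval = 1 / fs
rounded_interval = round(exact_interval, 6)

# The small per-sample error is applied once for every sample.
per_sample_error = exact_interval - rounded_interval
accumulated_error = last_sample * per_sample_error

print(f"error per sample: {per_sample_error * 1e9:.1f} ns")
print(f"error at sample {last_sample}: {accumulated_error:.3f} s")
```

An error of a few tens of nanoseconds per sample is invisible at the start of the record but adds up to a sizeable fraction of a second after several million samples.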

I think this would be avoided by using start and end arguments to date_range or timedelta_range, rather than using start and freq.
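
A minimal sketch of that idea, not a tested patch: `sample_time_index` is a hypothetical helper name, and the arguments stand in for `self.sig_len`, `self.fs` and `self.base_datetime`. It relies on `pd.date_range` and `pd.timedelta_range` producing linearly spaced values when given `start`, `end` and `periods`, so the time of the last sample is computed once from its sample number instead of by repeated addition.

```python
import datetime

import pandas as pd


def sample_time_index(sig_len, fs, base_datetime=None):
    """Hypothetical helper: build the index from start and end, not freq.

    Assumes sig_len >= 1; an empty record would need separate handling.
    """
    # Elapsed time of the last sample, derived directly from its
    # sample number so that no per-sample rounding error accumulates.
    last = pd.Timedelta(seconds=(sig_len - 1) / fs)

    if base_datetime is not None:
        start = pd.Timestamp(base_datetime)
        # With start, end and periods, the values are linearly spaced.
        return pd.date_range(start=start, end=start + last, periods=sig_len)

    return pd.timedelta_range(start=pd.Timedelta(0), end=last, periods=sig_len)


if __name__ == "__main__":
    index = sample_time_index(
        sig_len=6661120,
        fs=62.4725,
        base_datetime=datetime.datetime(2148, 8, 16, 9, 0, 17, 566000),
    )
    print(index[0], index[-1])
```

With the values from the record above, the last entry should agree with `get_absolute_time(6661119)` to within a few microseconds. Intermediate entries are still rounded to the index resolution, but each one is rounded independently, so the error no longer grows with the sample number.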
