[BUG] RandomUnderSampler throws errors if pandas DataFrame has timestamps #970

tomateit · 2023-02-19T08:30:12Z

Describe the bug

RandomUnderSampler validates the X argument, but these checks are unnecessary because they do not affect which indices are resampled.
This becomes a problem when I pass a pandas DataFrame that contains a timestamp column.
The exception is not raised if I pass a NumPy array containing timestamps.
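
To illustrate why the content of X should not matter, here is a minimal sketch of random under-sampling that picks row positions from the labels alone. It is not imbalanced-learn's implementation; `undersample_indices` is a hypothetical helper written only for this example, and the fixed date is placeholder data.

```python
import numpy as np
import pandas as pd


def undersample_indices(y, random_state=None):
    """Hypothetical helper: return row positions that balance the classes.

    Only the labels are inspected, so the dtypes of X are irrelevant.
    """
    rng = np.random.default_rng(random_state)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()  # size of the smallest class
    picked = [
        rng.choice(np.flatnonzero(y == cls), size=n_min, replace=False)
        for cls in classes
    ]
    return np.sort(np.concatenate(picked))


df = pd.DataFrame(
    {
        "label": [0, 0, 0, 1],
        "td": pd.to_datetime(["2023-02-19"] * 4),  # placeholder timestamps
    }
)

idx = undersample_indices(df["label"], random_state=0)
downsampled_df = df.iloc[idx]  # positional selection keeps every column's dtype
```

Because the selection is purely positional, the timestamp column passes through untouched.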

Steps/Code to Reproduce

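from datetime import datetime
import pandas as pd
from imblearn.under_sampling import RandomUnderSampler

# One integer label column and one timestamp column
df = pd.DataFrame({"label": [0, 0, 0, 1], "td": [datetime.now()] * 4})
rus = RandomUnderSampler(random_state=2342374)
rus.fit_resample(df, df.label)  # raises TypeError on Imbalanced-Learn 0.10.0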

Expected Results

No error is thrown.

Actual Results

TypeError: The DType <class 'numpy.dtype[int64]'> could not be promoted by <class 'numpy.dtype[datetime64]'>. This means that no common DType exists for the given inputs. For example they cannot be stored in a single array unless the dtype is `object`. The full list of DTypes is: (<class 'numpy.dtype[int64]'>, <class 'numpy.dtype[datetime64]'>)

Versions

Linux-5.15.0-60-generic-x86_64-with-glibc2.35
Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0]
NumPy 1.24.1
SciPy 1.9.3
Scikit-Learn 1.2.1
Imbalanced-Learn 0.10.0

My current workaround

from datetime import datetime
import pandas as pd
from imblearn.under_sampling import RandomUnderSampler

df = pd.DataFrame({"label": [0, 0, 0, 1], "td": [datetime.now()] * 4})

rus = RandomUnderSampler(random_state=2342374)

# Pass a NumPy array instead of the DataFrame, then rebuild the DataFrame
downsampled_df, _ = rus.fit_resample(df.to_numpy(), df.label)
downsampled_df = pd.DataFrame(downsampled_df, columns=df.columns)

P.S. Huge thanks for this useful library.


glemaitre · 2023-03-10T11:24:05Z

The proposal is reasonable; we just have to check whether it can remain compatible with check_estimator from scikit-learn. A PR would be welcome.

glemaitre mentioned this issue Jul 7, 2023

ENH allow any dtype in input from RandomSampler #1004

Merged

glemaitre closed this as completed in #1004 Jul 8, 2023
