Commit 54a7b5b

MAINT add support for feature_names_in_ (#959)
1 parent ad71707 commit 54a7b5b

21 files changed: +436 -3 lines changed

doc/whats_new/v0.10.rst

Lines changed: 4 additions & 0 deletions
@@ -22,6 +22,10 @@ Compatibility
 - Add support for automatic parameters validation as in scikit-learn >= 1.2.
   :pr:`955` by :user:`Guillaume Lemaitre <glemaitre>`.
 
+- Add support for `feature_names_in_` as well as `get_feature_names_out` for
+  all samplers.
+  :pr:`959` by :user:`Guillaume Lemaitre <glemaitre>`.
+
 Deprecation
 ...........

imblearn/base.py

Lines changed: 13 additions & 1 deletion
@@ -8,6 +8,12 @@
 
 import numpy as np
 from sklearn.base import BaseEstimator
+
+try:
+    # scikit-learn >= 1.2
+    from sklearn.base import OneToOneFeatureMixin
+except ImportError:
+    from sklearn.base import _OneToOneFeatureMixin as OneToOneFeatureMixin
 from sklearn.preprocessing import label_binarize
 from sklearn.utils.multiclass import check_classification_targets
 
@@ -133,7 +139,7 @@ class attribute, which is a dictionary `param_name: list of constraints`. See
         )
 
 
-class BaseSampler(SamplerMixin, _ParamsValidationMixin):
+class BaseSampler(SamplerMixin, OneToOneFeatureMixin, _ParamsValidationMixin):
     """Base class for sampling algorithms.
 
     Warning: This class should not be used directly. Use the derive classes
@@ -260,6 +266,12 @@ class FunctionSampler(BaseSampler):
 
         .. versionadded:: 0.9
 
+    feature_names_in_ : ndarray of shape (`n_features_in_`,)
+        Names of features seen during `fit`. Defined only when `X` has feature
+        names that are all strings.
+
+        .. versionadded:: 0.10
+
     See Also
     --------
     sklearn.preprocessing.FunctionTransfomer : Stateless transformer.
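
To make the effect of mixing `OneToOneFeatureMixin` into `BaseSampler` more concrete, here is a minimal pure-Python sketch of what a one-to-one feature-name contract looks like. The class `OneToOneNamesSketch` and its `columns` / `n_features` arguments are hypothetical stand-ins invented for illustration. This is not scikit-learn's implementation, which works on arrays and DataFrames and performs stricter validation. The sketch only mirrors the documented behaviour: `feature_names_in_` is defined only when all feature names are strings, and `get_feature_names_out` returns the input names unchanged.

```python
class OneToOneNamesSketch:
    """Hypothetical stand-in for a one-to-one feature-name mixin (illustration only)."""

    def fit(self, columns=None, n_features=None):
        # `columns` plays the role of DataFrame column labels;
        # None means X was a plain array with `n_features` columns.
        self.n_features_in_ = len(columns) if columns is not None else n_features
        if columns is not None and all(isinstance(c, str) for c in columns):
            # Names are stored only when every label is a string.
            self.feature_names_in_ = list(columns)
        elif hasattr(self, "feature_names_in_"):
            # Refitting on unnamed data drops previously stored names.
            del self.feature_names_in_
        return self

    def get_feature_names_out(self, input_features=None):
        # One-to-one: output names are the input names, in the same order.
        if input_features is not None:
            names = list(input_features)
            if len(names) != self.n_features_in_:
                raise ValueError("input_features has the wrong length")
            return names
        if hasattr(self, "feature_names_in_"):
            return list(self.feature_names_in_)
        # Assumed fallback naming scheme when no names were seen during fit.
        return [f"x{i}" for i in range(self.n_features_in_)]


named = OneToOneNamesSketch().fit(columns=["age", "income"])
unnamed = OneToOneNamesSketch().fit(n_features=3)
mixed = OneToOneNamesSketch().fit(columns=["age", 1])
```

Because samplers add or remove rows but never columns, a one-to-one mapping between input and output feature names is a natural fit, which is presumably why the commit reuses the existing mixin rather than defining a new one.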

imblearn/combine/_smote_enn.py

Lines changed: 6 additions & 0 deletions
@@ -67,6 +67,12 @@ class SMOTEENN(BaseSampler):
 
         .. versionadded:: 0.9
 
+    feature_names_in_ : ndarray of shape (`n_features_in_`,)
+        Names of features seen during `fit`. Defined only when `X` has feature
+        names that are all strings.
+
+        .. versionadded:: 0.10
+
     See Also
     --------
     SMOTETomek : Over-sample using SMOTE followed by under-sampling removing

imblearn/combine/_smote_tomek.py

Lines changed: 6 additions & 0 deletions
@@ -66,6 +66,12 @@ class SMOTETomek(BaseSampler):
 
         .. versionadded:: 0.9
 
+    feature_names_in_ : ndarray of shape (`n_features_in_`,)
+        Names of features seen during `fit`. Defined only when `X` has feature
+        names that are all strings.
+
+        .. versionadded:: 0.10
+
     See Also
     --------
     SMOTEENN : Over-sample using SMOTE followed by under-sampling using Edited

imblearn/metrics/pairwise.py

Lines changed: 11 additions & 0 deletions
@@ -71,6 +71,17 @@ class ValueDifferenceMetric(BaseEstimator, _ParamsValidationMixin):
         List of length `n_features` containing the conditional probabilities
         for each category given a class.
 
+    n_features_in_ : int
+        Number of features in the input dataset.
+
+        .. versionadded:: 0.10
+
+    feature_names_in_ : ndarray of shape (`n_features_in_`,)
+        Names of features seen during `fit`. Defined only when `X` has feature
+        names that are all strings.
+
+        .. versionadded:: 0.10
+
     See Also
     --------
     sklearn.neighbors.DistanceMetric : Interface for fast metric computation.

imblearn/over_sampling/_adasyn.py

Lines changed: 6 additions & 0 deletions
@@ -73,6 +73,12 @@ class ADASYN(BaseOverSampler):
 
         .. versionadded:: 0.9
 
+    feature_names_in_ : ndarray of shape (`n_features_in_`,)
+        Names of features seen during `fit`. Defined only when `X` has feature
+        names that are all strings.
+
+        .. versionadded:: 0.10
+
     See Also
     --------
     SMOTE : Over-sample using SMOTE.

imblearn/over_sampling/_random_over_sampler.py

Lines changed: 6 additions & 0 deletions
@@ -76,6 +76,12 @@ class RandomOverSampler(BaseOverSampler):
 
         .. versionadded:: 0.9
 
+    feature_names_in_ : ndarray of shape (`n_features_in_`,)
+        Names of features seen during `fit`. Defined only when `X` has feature
+        names that are all strings.
+
+        .. versionadded:: 0.10
+
     See Also
     --------
     BorderlineSMOTE : Over-sample using the borderline-SMOTE variant.

imblearn/over_sampling/_smote/base.py

Lines changed: 18 additions & 0 deletions
@@ -264,6 +264,12 @@ class SMOTE(BaseSMOTE):
 
         .. versionadded:: 0.9
 
+    feature_names_in_ : ndarray of shape (`n_features_in_`,)
+        Names of features seen during `fit`. Defined only when `X` has feature
+        names that are all strings.
+
+        .. versionadded:: 0.10
+
     See Also
     --------
     SMOTENC : Over-sample using SMOTE for continuous and categorical features.
@@ -442,6 +448,12 @@ class SMOTENC(SMOTE):
 
         .. versionadded:: 0.9
 
+    feature_names_in_ : ndarray of shape (`n_features_in_`,)
+        Names of features seen during `fit`. Defined only when `X` has feature
+        names that are all strings.
+
+        .. versionadded:: 0.10
+
     See Also
     --------
     SMOTE : Over-sample using SMOTE.
@@ -759,6 +771,12 @@ class SMOTEN(SMOTE):
 
         .. versionadded:: 0.9
 
+    feature_names_in_ : ndarray of shape (`n_features_in_`,)
+        Names of features seen during `fit`. Defined only when `X` has feature
+        names that are all strings.
+
+        .. versionadded:: 0.10
+
     See Also
     --------
     SMOTE : Over-sample using SMOTE.

imblearn/over_sampling/_smote/cluster.py

Lines changed: 6 additions & 0 deletions
@@ -93,6 +93,12 @@ class KMeansSMOTE(BaseSMOTE):
 
         .. versionadded:: 0.9
 
+    feature_names_in_ : ndarray of shape (`n_features_in_`,)
+        Names of features seen during `fit`. Defined only when `X` has feature
+        names that are all strings.
+
+        .. versionadded:: 0.10
+
     See Also
     --------
     SMOTE : Over-sample using SMOTE.

imblearn/over_sampling/_smote/filter.py

Lines changed: 12 additions & 0 deletions
@@ -100,6 +100,12 @@ class BorderlineSMOTE(BaseSMOTE):
 
         .. versionadded:: 0.9
 
+    feature_names_in_ : ndarray of shape (`n_features_in_`,)
+        Names of features seen during `fit`. Defined only when `X` has feature
+        names that are all strings.
+
+        .. versionadded:: 0.10
+
     See Also
     --------
     SMOTE : Over-sample using SMOTE.
@@ -352,6 +358,12 @@ class SVMSMOTE(BaseSMOTE):
 
         .. versionadded:: 0.9
 
+    feature_names_in_ : ndarray of shape (`n_features_in_`,)
+        Names of features seen during `fit`. Defined only when `X` has feature
+        names that are all strings.
+
+        .. versionadded:: 0.10
+
     See Also
     --------
     SMOTE : Over-sample using SMOTE.

imblearn/tests/test_common.py

Lines changed: 16 additions & 0 deletions
@@ -3,6 +3,7 @@
 # Christos Aridas
 # License: MIT
 
+import warnings
 from collections import OrderedDict
 
 import numpy as np
@@ -19,6 +20,7 @@
 from imblearn.under_sampling import NearMiss, RandomUnderSampler
 from imblearn.utils.estimator_checks import (
     _set_checking_parameters,
+    check_dataframe_column_names_consistency,
     check_param_validation,
     parametrize_with_checks,
 )
@@ -92,3 +94,17 @@ def test_strategy_as_ordered_dict(Sampler):
     X_res, y_res = sampler.fit_resample(X, y)
     assert X_res.shape[0] == sum(strategy.values())
     assert y_res.shape[0] == sum(strategy.values())
+
+
+@pytest.mark.parametrize(
+    "estimator", _tested_estimators(), ids=_get_check_estimator_ids
+)
+def test_pandas_column_name_consistency(estimator):
+    _set_checking_parameters(estimator)
+    with ignore_warnings(category=(FutureWarning)):
+        with warnings.catch_warnings(record=True) as record:
+            check_dataframe_column_names_consistency(
+                estimator.__class__.__name__, estimator
+            )
+        for warning in record:
+            assert "was fitted without feature names" not in str(warning.message)
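
The new test relies on recording warnings and then asserting that none of them carries a particular message. The sketch below isolates that pattern using only the standard library. The functions `noisy_check`, `recorded_messages`, and `assert_no_feature_name_warning` are hypothetical names made up for this illustration, and `noisy_check` merely simulates a check that may emit the "was fitted without feature names" warning; none of this is the imbalanced-learn test code itself.

```python
import warnings


def noisy_check(emit_feature_name_warning):
    """Hypothetical check that always emits a FutureWarning and optionally a UserWarning."""
    warnings.warn("some parameter will change in a future release", FutureWarning)
    if emit_feature_name_warning:
        warnings.warn(
            "X has feature names, but Sampler was fitted without feature names",
            UserWarning,
        )


def recorded_messages(func, *args):
    """Run `func` and return the text of every warning it raised."""
    with warnings.catch_warnings(record=True) as record:
        # Make sure no warning is swallowed by an earlier filter.
        warnings.simplefilter("always")
        func(*args)
    return [str(w.message) for w in record]


def assert_no_feature_name_warning(func, *args):
    """Fail if any recorded warning mentions missing feature names."""
    for message in recorded_messages(func, *args):
        assert "was fitted without feature names" not in message


# Passes: only the unrelated FutureWarning is emitted.
assert_no_feature_name_warning(noisy_check, False)
```

Matching on a substring of the message, rather than on the warning category, lets unrelated warnings pass through while still catching the one case the test cares about.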

imblearn/under_sampling/_prototype_generation/_cluster_centroids.py

Lines changed: 6 additions & 0 deletions
@@ -78,6 +78,12 @@ class ClusterCentroids(BaseUnderSampler):
 
         .. versionadded:: 0.9
 
+    feature_names_in_ : ndarray of shape (`n_features_in_`,)
+        Names of features seen during `fit`. Defined only when `X` has feature
+        names that are all strings.
+
+        .. versionadded:: 0.10
+
     See Also
     --------
     EditedNearestNeighbours : Under-sampling by editing samples.

imblearn/under_sampling/_prototype_selection/_condensed_nearest_neighbour.py

Lines changed: 6 additions & 0 deletions
@@ -69,6 +69,12 @@ class CondensedNearestNeighbour(BaseCleaningSampler):
 
         .. versionadded:: 0.9
 
+    feature_names_in_ : ndarray of shape (`n_features_in_`,)
+        Names of features seen during `fit`. Defined only when `X` has feature
+        names that are all strings.
+
+        .. versionadded:: 0.10
+
     See Also
     --------
     EditedNearestNeighbours : Undersample by editing samples.

imblearn/under_sampling/_prototype_selection/_edited_nearest_neighbours.py

Lines changed: 18 additions & 0 deletions
@@ -76,6 +76,12 @@ class EditedNearestNeighbours(BaseCleaningSampler):
 
         .. versionadded:: 0.9
 
+    feature_names_in_ : ndarray of shape (`n_features_in_`,)
+        Names of features seen during `fit`. Defined only when `X` has feature
+        names that are all strings.
+
+        .. versionadded:: 0.10
+
     See Also
     --------
     CondensedNearestNeighbour : Undersample by condensing samples.
@@ -251,6 +257,12 @@ class RepeatedEditedNearestNeighbours(BaseCleaningSampler):
 
         .. versionadded:: 0.9
 
+    feature_names_in_ : ndarray of shape (`n_features_in_`,)
+        Names of features seen during `fit`. Defined only when `X` has feature
+        names that are all strings.
+
+        .. versionadded:: 0.10
+
     See Also
     --------
     CondensedNearestNeighbour : Undersample by condensing samples.
@@ -454,6 +466,12 @@ class without early stopping.
 
         .. versionadded:: 0.9
 
+    feature_names_in_ : ndarray of shape (`n_features_in_`,)
+        Names of features seen during `fit`. Defined only when `X` has feature
+        names that are all strings.
+
+        .. versionadded:: 0.10
+
     See Also
     --------
     CondensedNearestNeighbour: Under-sampling by condensing samples.

imblearn/under_sampling/_prototype_selection/_instance_hardness_threshold.py

Lines changed: 6 additions & 0 deletions
@@ -67,6 +67,12 @@ class InstanceHardnessThreshold(BaseUnderSampler):
 
         .. versionadded:: 0.9
 
+    feature_names_in_ : ndarray of shape (`n_features_in_`,)
+        Names of features seen during `fit`. Defined only when `X` has feature
+        names that are all strings.
+
+        .. versionadded:: 0.10
+
     See Also
     --------
     NearMiss : Undersample based on near-miss search.

imblearn/under_sampling/_prototype_selection/_nearmiss.py

Lines changed: 6 additions & 0 deletions
@@ -72,6 +72,12 @@ class NearMiss(BaseUnderSampler):
 
         .. versionadded:: 0.9
 
+    feature_names_in_ : ndarray of shape (`n_features_in_`,)
+        Names of features seen during `fit`. Defined only when `X` has feature
+        names that are all strings.
+
+        .. versionadded:: 0.10
+
     See Also
     --------
     RandomUnderSampler : Random undersample the dataset.

imblearn/under_sampling/_prototype_selection/_neighbourhood_cleaning_rule.py

Lines changed: 6 additions & 0 deletions
@@ -83,6 +83,12 @@ class NeighbourhoodCleaningRule(BaseCleaningSampler):
 
         .. versionadded:: 0.9
 
+    feature_names_in_ : ndarray of shape (`n_features_in_`,)
+        Names of features seen during `fit`. Defined only when `X` has feature
+        names that are all strings.
+
+        .. versionadded:: 0.10
+
     See Also
     --------
     EditedNearestNeighbours : Undersample by editing noisy samples.

imblearn/under_sampling/_prototype_selection/_one_sided_selection.py

Lines changed: 6 additions & 0 deletions
@@ -68,6 +68,12 @@ class OneSidedSelection(BaseCleaningSampler):
 
         .. versionadded:: 0.9
 
+    feature_names_in_ : ndarray of shape (`n_features_in_`,)
+        Names of features seen during `fit`. Defined only when `X` has feature
+        names that are all strings.
+
+        .. versionadded:: 0.10
+
     See Also
     --------
     EditedNearestNeighbours : Undersample by editing noisy samples.

imblearn/under_sampling/_prototype_selection/_random_under_sampler.py

Lines changed: 6 additions & 0 deletions
@@ -50,6 +50,12 @@ class RandomUnderSampler(BaseUnderSampler):
 
         .. versionadded:: 0.9
 
+    feature_names_in_ : ndarray of shape (`n_features_in_`,)
+        Names of features seen during `fit`. Defined only when `X` has feature
+        names that are all strings.
+
+        .. versionadded:: 0.10
+
     See Also
     --------
     NearMiss : Undersample using near-miss samples.

imblearn/under_sampling/_prototype_selection/_tomek_links.py

Lines changed: 6 additions & 0 deletions
@@ -48,6 +48,12 @@ class TomekLinks(BaseCleaningSampler):
 
         .. versionadded:: 0.9
 
+    feature_names_in_ : ndarray of shape (`n_features_in_`,)
+        Names of features seen during `fit`. Defined only when `X` has feature
+        names that are all strings.
+
+        .. versionadded:: 0.10
+
     See Also
     --------
     EditedNearestNeighbours : Undersample by samples edition.
