DEPR: Enforce deprecation of numeric_only=None in DataFrame aggregations #49551
Changes from 7 commits
23ed323
7a98ef0
9faf0a9
0da37c5
9b29793
48bf1eb
2535f7a
aadbc17
c62de6f
de65e8a
af1f3cb
811bea5
aea86fc
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -441,7 +441,7 @@ Removal of prior version deprecations/changes | |
- Changed behavior of comparison of a :class:`Timestamp` with a ``datetime.date`` object; these now compare as un-equal and raise on inequality comparisons, matching the ``datetime.datetime`` behavior (:issue:`36131`) | ||
- Enforced deprecation of silently dropping columns that raised a ``TypeError`` in :class:`Series.transform` and :class:`DataFrame.transform` when used with a list or dictionary (:issue:`43740`) | ||
- Change behavior of :meth:`DataFrame.apply` with list-like so that any partial failure will raise an error (:issue:`43740`) | ||
- | ||
- Enforced deprecation of silently dropping columns that raised in DataFrame reductions (:issue:`41480`) | ||
Review comment: specific to
||
|
||
.. --------------------------------------------------------------------------- | ||
.. _whatsnew_200.performance: | ||
|
@@ -509,6 +509,7 @@ Timezones | |
Numeric | ||
^^^^^^^ | ||
- Bug in :meth:`DataFrame.add` cannot apply ufunc when inputs contain mixed DataFrame type and Series type (:issue:`39853`) | ||
- Bug in DataFrame reduction methods (e.g. :meth:`DataFrame.sum`) with object dtype, ``axis=1`` and ``numeric_only=False`` would not be coerced to float (:issue:`49551`) | ||
- | ||
|
||
Conversion | ||
|
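The enforced deprecation described in the whatsnew entries above can be illustrated with a small sketch. It assumes pandas 2.0 or later; the frame `df` and its column names are made up for illustration and do not come from the PR.

```python
import pandas as pd

# Hypothetical frame: one numeric column and one string column.
df = pd.DataFrame({"a": [1.0, 2.0], "b": ["x", "y"]})

# numeric_only=True restricts the reduction to float, int, and boolean columns.
numeric_mean = df.mean(numeric_only=True)

# With the default numeric_only=False, the string column is no longer
# dropped silently; on pandas >= 2.0 the reduction is expected to raise.
try:
    df.mean()
    raised = False
except TypeError:
    raised = True

print(numeric_mean)
print("TypeError raised:", raised)
```

Note that `mean` is used here rather than `sum`, because summing an object column of strings concatenates them and does not raise.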
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -266,9 +266,8 @@ | |
you to specify a location to update with some value.""", | ||
} | ||
|
||
_numeric_only_doc = """numeric_only : bool or None, default None | ||
Include only float, int, boolean data. If None, will attempt to use | ||
everything, then use only numeric data | ||
_numeric_only_doc = """numeric_only : bool, default False | ||
Include only float, int, boolean data. | ||
""" | ||
|
||
_merge_doc = """ | ||
|
@@ -10506,7 +10505,7 @@ def _reduce( | |
*, | ||
axis: Axis = 0, | ||
skipna: bool = True, | ||
numeric_only: bool | None = None, | ||
numeric_only: bool = False, | ||
filter_type=None, | ||
**kwds, | ||
): | ||
|
@@ -10515,7 +10514,6 @@ def _reduce( | |
|
||
# TODO: Make other agg func handle axis=None properly GH#21597 | ||
axis = self._get_axis_number(axis) | ||
labels = self._get_agg_axis(axis) | ||
assert axis in [0, 1] | ||
|
||
def func(values: np.ndarray): | ||
|
@@ -10541,25 +10539,22 @@ def _get_data() -> DataFrame: | |
data = self._get_bool_data() | ||
return data | ||
|
||
numeric_only_bool = com.resolve_numeric_only(numeric_only) | ||
if numeric_only is not None or axis == 0: | ||
if numeric_only or axis == 0: | ||
Review comment: why can't we go through this path unconditionally?
Review comment: Ah, I don't think what I wrote was all too clear, but I tried to explain this in #49551 (comment). I think we should take this path unconditionally, but there would be a behavior change for
Review comment: Here is an example to make this more explicit (on main):
Taking this path unconditionally would mean always getting object dtype. I think that's the right thing to do, but it would be a change in the default (numeric_only=None) behavior, and I plan to handle it in a follow-up.
Review comment: makes sense, thanks
Review comment: I opened #49603
||
# For numeric_only non-None and axis non-None, we know | ||
# which blocks to use and no try/except is needed. | ||
# For numeric_only=None only the case with axis==0 and no object | ||
# dtypes are unambiguous can be handled with BlockManager.reduce | ||
# Case with EAs see GH#35881 | ||
df = self | ||
if numeric_only_bool: | ||
if numeric_only: | ||
df = _get_data() | ||
if axis == 1: | ||
df = df.T | ||
axis = 0 | ||
|
||
ignore_failures = numeric_only is None | ||
|
||
# After possibly _get_data and transposing, we are now in the | ||
# simple case where we can use BlockManager.reduce | ||
res, _ = df._mgr.reduce(blk_func, ignore_failures=ignore_failures) | ||
res, _ = df._mgr.reduce(blk_func, ignore_failures=False) | ||
out = df._constructor(res).iloc[0] | ||
if out_dtype is not None: | ||
out = out.astype(out_dtype) | ||
|
@@ -10576,36 +10571,11 @@ def _get_data() -> DataFrame: | |
|
||
return out | ||
|
||
assert numeric_only is None | ||
assert not numeric_only and axis == 1 | ||
|
||
data = self | ||
values = data.values | ||
|
||
try: | ||
result = func(values) | ||
|
||
except TypeError: | ||
# e.g. in nanops trying to convert strs to float | ||
|
||
data = _get_data() | ||
labels = data._get_agg_axis(axis) | ||
|
||
values = data.values | ||
with np.errstate(all="ignore"): | ||
result = func(values) | ||
|
||
# columns have been dropped GH#41480 | ||
arg_name = "numeric_only" | ||
if name in ["all", "any"]: | ||
arg_name = "bool_only" | ||
warnings.warn( | ||
"Dropping of nuisance columns in DataFrame reductions " | ||
f"(with '{arg_name}=None') is deprecated; in a future " | ||
"version this will raise TypeError. Select only valid " | ||
"columns before calling the reduction.", | ||
FutureWarning, | ||
stacklevel=find_stack_level(), | ||
) | ||
result = func(values) | ||
|
||
if hasattr(result, "dtype"): | ||
if filter_type == "bool" and notna(result).all(): | ||
|
@@ -10617,6 +10587,7 @@ def _get_data() -> DataFrame: | |
# try to coerce to the original dtypes item by item if we can | ||
pass | ||
|
||
labels = self._get_agg_axis(axis) | ||
result = self._constructor_sliced(result, index=labels) | ||
return result | ||
|
||
|
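The warning removed in this diff advised callers to select only valid columns before calling the reduction. A minimal sketch of that migration follows, assuming a hypothetical frame whose column names (`a`, `flag`, `label`) are invented for illustration; `select_dtypes` is one way to do the selection, not necessarily the one the PR authors had in mind.

```python
import pandas as pd

# Hypothetical frame mixing int, bool, and string columns.
df = pd.DataFrame({"a": [1, 2], "flag": [True, False], "label": ["x", "y"]})

# Keep only the numeric and boolean columns, then reduce explicitly.
valid = df.select_dtypes(include=["number", "bool"])
result = valid.mean()

print(result)
```

Selecting columns up front makes it explicit which data the reduction covers, instead of relying on columns being dropped during the call.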