BUG: cumsum() shows inconsistent results in nullable dtype #39479

Closed
@koizumihiroo

Description

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandas.
  • (optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

>>> import pandas as pd
>>> import numpy as np

# Based on the example at https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.cumsum.html
# Ex1
>>> pd.Series([2, np.nan, 5, -1, 0], dtype="float").cumsum()
0    2.0
1    NaN
2    7.0
3    6.0
4    6.0
dtype: float64

# Ex2
>>> pd.Series([2, np.nan, 5, -1, 0], dtype="Float64").cumsum()
0    2.0
1    2.0
2    7.0
3    6.0
4    6.0
dtype: object

# Ex3
>>> pd.Series([2, np.nan, 5, -1, 0], dtype="Int64").cumsum()
0       2
1    <NA>
2    <NA>
3    <NA>
4    <NA>
dtype: object

Problem description

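  • Ex1 shows the expected behavior: the missing value stays missing at position 1 and the running sum continues after it. Ex2 and Ex3 yield different values: Ex2 fills position 1 with the previous sum (2.0), and Ex3 returns <NA> for every row after the first missing value.
  • Ex2 returns dtype: object where Float64 is expected; likewise, Ex3 returns dtype: object where Int64 is expected.
  • The other cum-family methods (cumprod, cummax, cummin) have the same problem.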
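
The comparison described in the list above can be checked across all four cumulative methods with a small script. This is only a sketch: `check_cum_methods` and `CUM_METHODS` are hypothetical names introduced here, not part of pandas, and the float64 result is assumed to be the reference behavior, as in Ex1. The output depends on the installed pandas version, so none is shown.

```python
import numpy as np
import pandas as pd

# Hypothetical helper names; only the pandas/numpy calls are real API.
CUM_METHODS = ("cumsum", "cumprod", "cummax", "cummin")


def check_cum_methods(values, nullable_dtype):
    """Compare each cumulative method on a nullable dtype with float64."""
    reference = pd.Series(values, dtype="float64")
    nullable = pd.Series(values, dtype=nullable_dtype)
    report = {}
    for name in CUM_METHODS:
        expected = getattr(reference, name)()
        actual = getattr(nullable, name)()
        # Map <NA> to NaN so both results can be compared as plain floats.
        actual_as_float = actual.to_numpy(dtype="float64", na_value=np.nan)
        report[name] = {
            # True when the result keeps the nullable dtype (not object).
            "dtype_kept": str(actual.dtype) == nullable_dtype,
            # True when values and missing positions match the float64 run.
            "values_match_float64": bool(
                np.array_equal(expected.to_numpy(), actual_as_float, equal_nan=True)
            ),
        }
    return report


if __name__ == "__main__":
    for dtype in ("Float64", "Int64"):
        print(dtype, check_cum_methods([2, np.nan, 5, -1, 0], dtype))
```

On a pandas version affected by this issue, both flags would be expected to come out False for every method.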

Expected Output

# Ex2
>>> pd.Series([2, np.nan, 5, -1, 0], dtype="Float64").cumsum()
0    2.0
1    <NA>
2    7.0
3    6.0
4    6.0
dtype: Float64

# Ex3
>>> pd.Series([2, np.nan, 5, -1, 0], dtype="Int64").cumsum()
0       2
1    <NA>
2       7
3       6
4       6
dtype: Int64
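
Until the built-in methods behave this way, the expected output above can be reproduced by hand. The following is a minimal sketch of one possible workaround, not the fix adopted in pandas: `cumsum_keep_na` is a hypothetical helper that sums over float64 with missing values treated as 0, restores the missing positions, and casts back to the original nullable dtype. Because it goes through float64, very large Int64 values may lose precision.

```python
import numpy as np
import pandas as pd


def cumsum_keep_na(s: pd.Series) -> pd.Series:
    """Hypothetical workaround: cumulative sum that skips missing values
    and keeps the input dtype."""
    mask = s.isna().to_numpy()
    # Convert to plain floats, mapping <NA> to NaN.
    values = s.to_numpy(dtype="float64", na_value=np.nan)
    # Treat missing entries as 0 so they do not affect the running sum.
    running = np.cumsum(np.where(mask, 0.0, values))
    out = pd.Series(running, index=s.index)
    # Put the missing values back where the input had them.
    out = out.mask(mask)
    return out.astype(s.dtype)


if __name__ == "__main__":
    for dtype in ("Float64", "Int64"):
        s = pd.Series([2, np.nan, 5, -1, 0], dtype=dtype)
        print(cumsum_keep_na(s))
```

For the series in Ex2 and Ex3 this keeps the missing value at position 1 and returns Float64 and Int64 respectively.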

Environment

FROM python:3.9.1-slim-buster
WORKDIR /home
RUN pip install pandas==1.2.1
CMD ["/usr/local/bin/python"]

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 9d598a5
python : 3.9.1.final.0
python-bits : 64
OS : Linux
OS-release : 4.19.121-linuxkit
Version : #1 SMP Tue Dec 1 17:50:32 UTC 2020
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.2.1
numpy : 1.19.5
pytz : 2020.5
dateutil : 2.8.1
pip : 21.0
setuptools : 52.0.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None

Labels

Bug; NA - MaskedArrays (Related to pd.NA and nullable extension arrays)