adding pandas.api.typing.aliases and docs #61735

Open
wants to merge 6 commits into
base: main
3 changes: 2 additions & 1 deletion doc/source/development/contributing_codebase.rst
@@ -214,7 +214,8 @@ With custom types and inference this is not always possible so exceptions are ma
pandas-specific types
~~~~~~~~~~~~~~~~~~~~~

-Commonly used types specific to pandas will appear in `pandas._typing <https://github.com/pandas-dev/pandas/blob/main/pandas/_typing.py>`_ and you should use these where applicable. This module is private for now but ultimately this should be exposed to third party libraries who want to implement type checking against pandas.
+Commonly used types specific to pandas will appear in `pandas._typing <https://github.com/pandas-dev/pandas/blob/main/pandas/_typing.py>`__ and you should use these where applicable. This module is private and is meant for pandas development.
+Types that are meant for user consumption should be exposed in `pandas.api.typing.aliases <https://github.com/pandas-dev/pandas/blob/main/pandas/api/typing/aliases.py>`__ and ideally added to the `pandas-stubs <https://github.com/pandas-dev/pandas-stubs>`__ project.

For example, quite a few functions in pandas accept a ``dtype`` argument. This can be expressed as a string like ``"object"``, a ``numpy.dtype`` like ``np.int64`` or even a pandas ``ExtensionDtype`` like ``pd.CategoricalDtype``. Rather than burden the user with having to constantly annotate all of those options, this can simply be imported and reused from the pandas._typing module
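
As a minimal sketch of the reuse described above: the function name `describe_dtype` is hypothetical, and the import is guarded by `TYPE_CHECKING` so that the snippet also runs where pandas is not installed. It is an illustration of the annotation pattern, not code taken from pandas.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Private module: meant for pandas development, not for user code.
    from pandas._typing import Dtype


def describe_dtype(dtype: "Dtype") -> str:
    """Hypothetical helper: return a readable name for a dtype specification."""
    # `Dtype` is intended to cover strings such as "object", NumPy dtypes and
    # pandas ExtensionDtype instances. The quoted annotation is only read by
    # type checkers and is never evaluated at runtime.
    return str(dtype)


print(describe_dtype("object"))
```

A single annotation keeps the signature short while the type checker still sees every accepted form of `dtype`.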

91 changes: 91 additions & 0 deletions doc/source/reference/aliases.rst
@@ -0,0 +1,91 @@
{{ header }}

.. _api.typing.aliases:

======================================
pandas typing aliases
======================================

**************
Typing aliases
**************

.. currentmodule:: pandas.api.typing.aliases

The typing declarations in ``pandas/_typing.py`` are private and are used by
pandas developers to type check the pandas code base. Users are strongly
encouraged to use the ``pandas-stubs`` package, which provides the officially
supported type declarations for pandas.
The definitions and use cases of these aliases are subject to change.
They are documented here for users who wish to use these declarations in their
own Python code that calls pandas or expects certain results.

Each alias listed in the table below can be imported from :py:mod:`pandas.api.typing.aliases`.

==================================== ================================================================
Alias                                Meaning
==================================== ================================================================
:py:type:`AggFuncType`               Type of functions that can be passed to :meth:`agg` methods
:py:type:`AlignJoin`                 Argument type for ``join`` in :meth:`DataFrame.join`
:py:type:`AnyAll`                    Argument type for ``how`` in :meth:`dropna`
:py:type:`AnyArrayLike`              Used to represent :class:`ExtensionArray`, ``numpy`` arrays, :class:`Index` and :class:`Series`
:py:type:`ArrayLike`                 Used to represent :class:`ExtensionArray`, ``numpy`` arrays
:py:type:`AstypeArg`                 Argument type in :meth:`astype`
:py:type:`Axes`                      :py:type:`AnyArrayLike` plus sequences (not strings) and ``range``
:py:type:`Axis`                      Argument type for ``axis`` in many methods
:py:type:`CSVEngine`                 Argument type for ``engine`` in :meth:`pandas.read_csv`
:py:type:`ColspaceArgType`           Argument type for ``col_space`` in :meth:`DataFrame.to_html`
:py:type:`CompressionOptions`        Argument type for ``compression`` in many I/O output methods
:py:type:`CorrelationMethod`         Argument type for ``method`` in :meth:`corr`
:py:type:`DropKeep`                  Argument type for ``keep`` in :meth:`drop_duplicates`
:py:type:`Dtype`                     Types as objects that can be used to specify dtypes
:py:type:`DtypeArg`                  Argument type for ``dtype`` in various methods
:py:type:`DtypeBackend`              Argument type for ``dtype_backend`` in various methods
:py:type:`DtypeObj`                  Numpy dtypes and Extension dtypes
:py:type:`ExcelWriterIfSheetExists`  Argument type for ``if_sheet_exists`` in :class:`ExcelWriter`
:py:type:`ExcelWriterMergeCells`     Argument type for ``merge_cells`` in :meth:`to_excel`
:py:type:`FilePath`                  Type of paths for files for I/O methods
:py:type:`FillnaOptions`             Argument type for ``method`` in various methods where NA values are filled
:py:type:`FloatFormatType`           Argument type for ``float_format`` in :meth:`to_string`
:py:type:`FormattersType`            Argument type for ``formatters`` in :meth:`to_string`
:py:type:`FromDictOrient`            Argument type for ``orient`` in :meth:`DataFrame.from_dict`
:py:type:`HTMLFlavors`               Argument type for ``flavor`` in :meth:`pandas.read_html`
:py:type:`IgnoreRaise`               Argument type for ``errors`` in multiple methods
:py:type:`IndexLabel`                Argument type for ``level`` in multiple methods
:py:type:`InterpolateOptions`        Argument type for ``method`` in :meth:`interpolate`
:py:type:`JSONEngine`                Argument type for ``engine`` in :meth:`read_json`
:py:type:`JSONSerializable`          Return type of the callable passed as ``default_handler`` in :meth:`to_json`
:py:type:`JoinHow`                   Argument type for ``how`` in :meth:`pandas.merge_ordered` and for ``join`` in :meth:`Series.align`
:py:type:`JoinValidate`              Argument type for ``validate`` in :meth:`DataFrame.join`
:py:type:`MergeHow`                  Argument type for ``how`` in :meth:`merge`
:py:type:`MergeValidate`             Argument type for ``validate`` in :meth:`merge`
:py:type:`NaPosition`                Argument type for ``na_position`` in :meth:`sort_index` and :meth:`sort_values`
:py:type:`NsmallestNlargestKeep`     Argument type for ``keep`` in :meth:`nlargest` and :meth:`nsmallest`
:py:type:`OpenFileErrors`            Argument type for ``errors`` in :meth:`to_hdf` and :meth:`to_csv`
:py:type:`Ordered`                   Return type for :py:attr:`ordered` in :class:`CategoricalDtype` and :class:`Categorical`
:py:type:`QuantileInterpolation`     Argument type for ``interpolation`` in :meth:`quantile`
:py:type:`ReadBuffer`                Additional argument type corresponding to buffers for various file reading methods
:py:type:`ReadCsvBuffer`             Additional argument type corresponding to buffers for :meth:`pandas.read_csv`
:py:type:`ReadPickleBuffer`          Additional argument type corresponding to buffers for :meth:`pandas.read_pickle`
:py:type:`ReindexMethod`             Argument type for ``method`` in :meth:`reindex`
:py:type:`Scalar`                    Basic type that can be stored in :class:`Series`
:py:type:`SequenceNotStr`            Used for arguments that require sequences, but not plain strings
:py:type:`SortKind`                  Argument type for ``kind`` in :meth:`sort_index` and :meth:`sort_values`
:py:type:`StorageOptions`            Argument type for ``storage_options`` in various file output methods
:py:type:`Suffixes`                  Argument type for ``suffixes`` in :meth:`merge`, :meth:`compare` and :meth:`merge_ordered`
:py:type:`TakeIndexer`               Argument type for ``indexer`` and ``indices`` in :meth:`take`
:py:type:`TimeAmbiguous`             Argument type for ``ambiguous`` in time operations
:py:type:`TimeGrouperOrigin`         Argument type for ``origin`` in :meth:`resample` and :class:`TimeGrouper`
:py:type:`TimeNonexistent`           Argument type for ``nonexistent`` in time operations
:py:type:`TimeUnit`                  Type of the :py:attr:`unit` attribute and of the ``unit`` and ``date_unit`` arguments
:py:type:`TimedeltaConvertibleTypes` Argument type for ``offset`` in :meth:`resample`, ``halflife`` in :meth:`ewm` and ``start`` and ``end`` in :meth:`pandas.timedelta_range`
:py:type:`TimestampConvertibleTypes` Argument type for ``origin`` in :meth:`resample` and :meth:`pandas.to_datetime`
:py:type:`ToStataByteorder`          Argument type for ``byteorder`` in :meth:`DataFrame.to_stata`
:py:type:`ToTimestampHow`            Argument type for ``how`` in :meth:`to_timestamp` and ``convention`` in :meth:`resample`
:py:type:`UpdateJoin`                Argument type for ``join`` in :meth:`DataFrame.update`
:py:type:`UsecolsArgType`            Argument type for ``usecols`` in :meth:`pandas.read_clipboard`, :meth:`pandas.read_csv` and :meth:`pandas.read_excel`
:py:type:`WindowingRankType`         Argument type for ``method`` in :meth:`rank` in rolling and expanding window operations
:py:type:`WriteBuffer`               Additional argument type corresponding to buffers for various file output methods
:py:type:`WriteExcelBuffer`          Additional argument type corresponding to buffers for :meth:`to_excel`
:py:type:`XMLParsers`                Argument type for ``parser`` in :meth:`DataFrame.to_xml` and :meth:`pandas.read_xml`
==================================== ================================================================
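
A minimal sketch of how user code might consume one of the aliases listed in the table, assuming the `pandas.api.typing.aliases` submodule proposed in this pull request is available to the type checker. The helper `sort_with_na` is hypothetical; the imports sit behind `TYPE_CHECKING`, so nothing is imported from pandas at runtime.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only evaluated by type checkers; assumes the proposed submodule exists.
    import pandas as pd
    from pandas.api.typing.aliases import NaPosition


def sort_with_na(
    obj: "pd.DataFrame",
    by: str,
    na_position: "NaPosition" = "last",
) -> "pd.DataFrame":
    """Hypothetical helper: sort by one column and forward ``na_position``."""
    # Reusing the alias keeps this signature in step with what
    # DataFrame.sort_values accepts for ``na_position``.
    return obj.sort_values(by=by, na_position=na_position)
```

If the alias is a `Literal` of the accepted strings, as its table entry suggests, a type checker would be expected to reject a call such as `sort_with_na(df, "a", na_position="middle")`.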
1 change: 1 addition & 0 deletions doc/source/reference/index.rst
@@ -55,6 +55,7 @@ to be stable.
extensions
testing
missing_value
aliases

.. This is to prevent warnings in the doc build. We don't want to encourage
.. these methods.
1 change: 1 addition & 0 deletions doc/source/whatsnew/v3.0.0.rst
@@ -89,6 +89,7 @@ Other enhancements
- Errors occurring during SQL I/O will now throw a generic :class:`.DatabaseError` instead of the raw Exception type from the underlying driver manager library (:issue:`60748`)
- Implemented :meth:`Series.str.isascii` and :meth:`Series.str.isascii` (:issue:`59091`)
- Improved deprecation message for offset aliases (:issue:`60820`)
- Many type aliases are now exposed in the new submodule :py:mod:`pandas.api.typing.aliases` (:issue:`55231`)
- Multiplying two :class:`DateOffset` objects will now raise a ``TypeError`` instead of a ``RecursionError`` (:issue:`59442`)
- Restore support for reading Stata 104-format and enable reading 103-format dta files (:issue:`58554`)
- Support passing a :class:`Iterable[Hashable]` input to :meth:`DataFrame.drop_duplicates` (:issue:`59237`)
131 changes: 131 additions & 0 deletions pandas/api/typing/aliases.py
@@ -0,0 +1,131 @@
from pandas._typing import (
AggFuncType,
AlignJoin,
AnyAll,
AnyArrayLike,
ArrayLike,
AstypeArg,
Axes,
Axis,
ColspaceArgType,
CompressionOptions,
Member

There are many type aliases here where it is not clear what method(s) they are appropriate for. E.g. it would be wrong to use this for DataFrame.to_parquet.

Contributor Author

I tried to cover that in the docs, without getting too specific. I can make the docs more specific, although there are cases where the aliases are used in lots of methods, so the list can get quite long. E.g., for CompressionOptions, I said "Argument type for compression in many I/O output methods".

Open to suggestions as to how to better document this.

CorrelationMethod,
CSVEngine,
DropKeep,
Dtype,
DtypeArg,
DtypeBackend,
DtypeObj,
ExcelWriterIfSheetExists,
ExcelWriterMergeCells,
FilePath,
FillnaOptions,
FloatFormatType,
FormattersType,
FromDictOrient,
HTMLFlavors,
IgnoreRaise,
IndexLabel,
InterpolateOptions,
JoinHow,
JoinValidate,
JSONEngine,
JSONSerializable,
MergeHow,
MergeValidate,
NaPosition,
NsmallestNlargestKeep,
OpenFileErrors,
Ordered,
QuantileInterpolation,
ReadBuffer,
ReadCsvBuffer,
ReadPickleBuffer,
ReindexMethod,
Scalar,
SequenceNotStr,
SortKind,
StorageOptions,
Suffixes,
TakeIndexer,
TimeAmbiguous,
TimedeltaConvertibleTypes,
TimeGrouperOrigin,
TimeNonexistent,
TimestampConvertibleTypes,
TimeUnit,
ToStataByteorder,
ToTimestampHow,
UpdateJoin,
UsecolsArgType,
WindowingRankType,
WriteBuffer,
WriteExcelBuffer,
XMLParsers,
)

__all__ = [
"AggFuncType",
"AlignJoin",
"AnyAll",
"AnyArrayLike",
"ArrayLike",
"AstypeArg",
"Axes",
"Axis",
"CSVEngine",
"ColspaceArgType",
"CompressionOptions",
"CorrelationMethod",
"DropKeep",
"Dtype",
"DtypeArg",
"DtypeBackend",
"DtypeObj",
"ExcelWriterIfSheetExists",
"ExcelWriterMergeCells",
"FilePath",
"FillnaOptions",
"FloatFormatType",
"FormattersType",
"FromDictOrient",
"HTMLFlavors",
"IgnoreRaise",
"IndexLabel",
"InterpolateOptions",
"JSONEngine",
"JSONSerializable",
"JoinHow",
"JoinValidate",
"MergeHow",
"MergeValidate",
"NaPosition",
"NsmallestNlargestKeep",
"OpenFileErrors",
"Ordered",
"QuantileInterpolation",
"ReadBuffer",
"ReadCsvBuffer",
"ReadPickleBuffer",
"ReindexMethod",
"Scalar",
"SequenceNotStr",
"SortKind",
"StorageOptions",
"Suffixes",
"TakeIndexer",
"TimeAmbiguous",
"TimeGrouperOrigin",
"TimeNonexistent",
"TimeUnit",
"TimedeltaConvertibleTypes",
"TimestampConvertibleTypes",
"ToStataByteorder",
"ToTimestampHow",
"UpdateJoin",
"UsecolsArgType",
"WindowingRankType",
"WriteBuffer",
"WriteExcelBuffer",
"XMLParsers",
]
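
The import list and `__all__` above hold the same names in different orders. The orders look consistent with a case-insensitive sort for the imports and a plain code-point sort for `__all__`; this is an observation about the two lists, not something the pull request states. The sketch below reproduces both orderings on a handful of names copied from the lists.

```python
# A few names copied from the lists above, deliberately shuffled.
names = [
    "JoinHow",
    "CSVEngine",
    "JSONEngine",
    "CorrelationMethod",
    "ColspaceArgType",
    "CompressionOptions",
]

# Plain sort compares code points, so uppercase letters sort before lowercase
# ones ("CSVEngine" lands before "ColspaceArgType"), as in __all__.
all_order = sorted(names)

# Case-insensitive sort, matching the order seen in the import list.
import_order = sorted(names, key=str.lower)

print(all_order)
print(import_order)
```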
70 changes: 70 additions & 0 deletions pandas/tests/api/test_api.py
@@ -13,6 +13,7 @@
types as api_types,
typing as api_typing,
)
from pandas.api.typing import aliases as api_aliases


class Base:
@@ -275,6 +276,7 @@ class TestApi(Base):
"TimedeltaIndexResamplerGroupby",
"TimeGrouper",
"Window",
"aliases",
]
allowed_api_types = [
"is_any_real_numeric_dtype",
@@ -342,6 +344,71 @@ class TestApi(Base):
"ExtensionScalarOpsMixin",
]
allowed_api_executors = ["BaseExecutionEngine"]
allowed_api_aliases = [
"AggFuncType",
"AlignJoin",
"AnyAll",
"AnyArrayLike",
"ArrayLike",
"AstypeArg",
"Axes",
"Axis",
"CSVEngine",
"ColspaceArgType",
"CompressionOptions",
"CorrelationMethod",
"DropKeep",
"Dtype",
"DtypeArg",
"DtypeBackend",
"DtypeObj",
"ExcelWriterIfSheetExists",
"ExcelWriterMergeCells",
"FilePath",
"FillnaOptions",
"FloatFormatType",
"FormattersType",
"FromDictOrient",
"HTMLFlavors",
"IgnoreRaise",
"IndexLabel",
"InterpolateOptions",
"JSONEngine",
"JSONSerializable",
"JoinHow",
"JoinValidate",
"MergeHow",
"MergeValidate",
"NaPosition",
"NsmallestNlargestKeep",
"OpenFileErrors",
"Ordered",
"QuantileInterpolation",
"ReadBuffer",
"ReadCsvBuffer",
"ReadPickleBuffer",
"ReindexMethod",
"Scalar",
"SequenceNotStr",
"SortKind",
"StorageOptions",
"Suffixes",
"TakeIndexer",
"TimeAmbiguous",
"TimeGrouperOrigin",
"TimeNonexistent",
"TimeUnit",
"TimedeltaConvertibleTypes",
"TimestampConvertibleTypes",
"ToStataByteorder",
"ToTimestampHow",
"UpdateJoin",
"UsecolsArgType",
"WindowingRankType",
"WriteBuffer",
"WriteExcelBuffer",
"XMLParsers",
]

def test_api(self):
self.check(api, self.allowed_api_dirs)
@@ -364,6 +431,9 @@ def test_api_extensions(self):
def test_api_executors(self):
self.check(api_executors, self.allowed_api_executors)

def test_api_typing_aliases(self):
self.check(api_aliases, self.allowed_api_aliases)
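
The new test relies on `Base.check`, whose body is not shown in this diff. The sketch below is a simplified, hypothetical stand-in that illustrates the idea: compare the public names of a namespace against an allow-list, so that any alias added or removed without updating the list fails the test. The `fake_aliases` module and both helper functions are invented for illustration.

```python
import types


def public_names(namespace) -> list:
    """Return the sorted non-dunder names exposed by a module-like object."""
    return sorted(name for name in dir(namespace) if not name.startswith("__"))


def check(namespace, expected) -> None:
    """Raise AssertionError unless the namespace exposes exactly `expected`."""
    result = public_names(namespace)
    if result != sorted(expected):
        raise AssertionError(f"unexpected public API: {result}")


# Hypothetical stand-in for pandas.api.typing.aliases with two aliases.
fake_aliases = types.ModuleType("fake_aliases")
fake_aliases.Axis = object()
fake_aliases.Scalar = object()

# Passes: the allow-list matches the exposed names, regardless of order.
check(fake_aliases, ["Scalar", "Axis"])
```

Because the comparison is exact, exposing an extra name is treated as an API change that has to be acknowledged in the allow-list.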


class TestErrors(Base):
def test_errors(self):