PDEP-1: Purpose and guidelines for pandas enhancement proposals #47444
@@ -15,10 +15,35 @@ fundamental changes to the project that are likely to take months or
 years of developer time. Smaller-scoped items will continue to be
 tracked on our [issue tracker](https://github.com/pandas-dev/pandas/issues).
 
-See [Roadmap evolution](#roadmap-evolution) for proposing
-changes to this document.
+The roadmap is defined as a set of major enhancement proposals named PDEPs.
+For more information about PDEPs, and how to submit one, please refer to
+[PDEP-1](/pdeps/accepted/0001-puropose-and-guidelines.html).
 
-## Extensibility
+## PDEPs
 
+{% for pdep_type in ["Under discussion", "Accepted", "Implemented", "Rejected"] %}
+
+<h3 id="pdeps-{{pdep_type}}">{{ pdep_type.replace("_", " ").capitalize() }}</h3>
+
+<ul>
+{% for pdep in pdeps[pdep_type] %}
+<li><a href="{{ pdep.url }}">{{ pdep.title }}</a></li>
+{% else %}
+<li>There are currently no PDEPs with this status</li>
+{% endfor %}
+</ul>
+
+{% endfor %}
+
+## Roadmap points pending a PDEP
+
+<div class="alert alert-warning" role="alert">
+pandas is in the process of moving roadmap points to PDEPs (implemented in
+June 2022). During the transition, some roadmap points will exist as PDEPs,
+while others will exist as sections below.
+</div>
+
+### Extensibility
+
 Pandas `extending.extension-types` allow
 for extending NumPy types with custom data types and array storage.
@@ -33,7 +58,7 @@ library, making their behavior more consistent with the handling of
 NumPy arrays. We'll do this by cleaning up pandas' internals and
 adding new methods to the extension array interface.
 
-## String data type
+### String data type
 
 Currently, pandas stores text data in an `object` -dtype NumPy array.
 The current implementation has two primary drawbacks: First, `object`
@@ -54,7 +79,7 @@ work, we may need to implement certain operations expected by pandas
 users (for example the algorithm used in, `Series.str.upper`). That work
 may be done outside of pandas.
 
-## Apache Arrow interoperability
+### Apache Arrow interoperability
 
 [Apache Arrow](https://arrow.apache.org) is a cross-language development
 platform for in-memory data. The Arrow logical types are closely aligned
@@ -65,7 +90,7 @@ data types within pandas. This will let us take advantage of its I/O
 capabilities and provide for better interoperability with other
 languages and libraries using Arrow.
 
-## Block manager rewrite
+### Block manager rewrite
 
 We'd like to replace pandas current internal data structures (a
 collection of 1 or 2-D arrays) with a simpler collection of 1-D arrays.
@@ -92,7 +117,7 @@ See [these design
 documents](https://dev.pandas.io/pandas2/internal-architecture.html#removal-of-blockmanager-new-dataframe-internals)
 for more.
 
-## Decoupling of indexing and internals
+### Decoupling of indexing and internals
 
 The code for getting and setting values in pandas' data structures
 needs refactoring. In particular, we must clearly separate code that
@@ -150,7 +175,7 @@ which are actually expected (typically `KeyError`).
 and when small differences in behavior are expected (e.g. getting with `.loc` raises for
 missing labels, setting still doesn't), they can be managed with a specific parameter.
 
-## Numba-accelerated operations
+### Numba-accelerated operations
 
 [Numba](https://numba.pydata.org) is a JIT compiler for Python code.
 We'd like to provide ways for users to apply their own Numba-jitted
@@ -162,7 +187,7 @@ window contexts). This will improve the performance of
 user-defined-functions in these operations by staying within compiled
 code.
 
-## Documentation improvements
+### Documentation improvements
 
 We'd like to improve the content, structure, and presentation of the
 pandas documentation. Some specific goals include
@@ -177,7 +202,7 @@ pandas documentation. Some specific goals include
 subsections of the documentation to make navigation and finding
 content easier.
 
-## Performance monitoring
+### Performance monitoring
 
 Pandas uses [airspeed velocity](https://asv.readthedocs.io/en/stable/)
 to monitor for performance regressions. ASV itself is a fabulous tool,
@@ -197,29 +222,3 @@ We'd like to fund improvements and maintenance of these tools to
 <https://pyperf.readthedocs.io/en/latest/system.html>
 - Build a GitHub bot to request ASV runs *before* a PR is merged.
 Currently, the benchmarks are only run nightly.
-
-## Roadmap Evolution
-
-Pandas continues to evolve. The direction is primarily determined by
-community interest. Everyone is welcome to review existing items on the
-roadmap and to propose a new item.
-
-Each item on the roadmap should be a short summary of a larger design
-proposal. The proposal should include
-
-1. Short summary of the changes, which would be appropriate for
-inclusion in the roadmap if accepted.
-2. Motivation for the changes.
-3. An explanation of why the change is in scope for pandas.
-4. Detailed design: Preferably with example-usage (even if not
-implemented yet) and API documentation
-5. API Change: Any API changes that may result from the proposal.
-
-That proposal may then be submitted as a GitHub issue, where the pandas
-maintainers can review and comment on the design. The [pandas mailing
-list](https://mail.python.org/mailman/listinfo/pandas-dev) should be
-notified of the proposal.
-
-When there's agreement that an implementation would be welcome, the
-roadmap should be updated to include the summary and a link to the
-discussion issue.
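
The roadmap template in this diff loops over the four statuses and, for each one, over `pdeps[pdep_type]`, reading only `pdep.url` and `pdep.title`; Jinja's `{% for %}...{% else %}` renders the `else` branch when that list is empty. Below is a minimal Python sketch of building such a context, assuming each PDEP is already available as a record with a title, URL and status. The `Pdep` class, the `group_by_status` function and the placeholder URL are hypothetical; the pandas website generator may assemble this data differently.

```python
from dataclasses import dataclass

# The four statuses, in the order the roadmap template renders them.
STATUSES = ["Under discussion", "Accepted", "Implemented", "Rejected"]


@dataclass
class Pdep:
    """Hypothetical PDEP record; the template only reads `title` and `url`."""

    title: str
    url: str
    status: str


def group_by_status(pdeps):
    """Return {status: [Pdep, ...]} with an entry for every known status.

    Statuses without PDEPs map to an empty list, so the template's
    `{% else %}` branch ("no PDEPs with this status") is rendered for them.
    """
    grouped = {status: [] for status in STATUSES}
    for pdep in pdeps:
        if pdep.status not in grouped:
            raise ValueError(f"unknown PDEP status: {pdep.status!r}")
        grouped[pdep.status].append(pdep)
    return grouped


# Placeholder URL, for illustration only.
pdeps = group_by_status(
    [Pdep("PDEP-1: Purpose and guidelines", "/pdeps/example.html", "Under discussion")]
)
print({status: len(items) for status, items in pdeps.items()})
# {'Under discussion': 1, 'Accepted': 0, 'Implemented': 0, 'Rejected': 0}
```

Keeping an entry for every status, rather than only for statuses that have PDEPs, keeps the four headings stable regardless of how many proposals exist.
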
@@ -0,0 +1,127 @@

# PDEP-1: Purpose and guidelines

- Created: 30 July 2022
- Status: Under discussion
- Discussion: [#47444](https://github.com/pandas-dev/pandas/pull/47444)
- Author: [Marc Garcia](https://github.com/datapythonista)
- Revision: 1

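The header above is a short list of `- Key: value` fields. Below is a minimal sketch of reading it back, assuming every field sits on a single line in exactly that form and that the header ends at the first other non-blank line after it. `parse_header` is a hypothetical helper, not part of pandas' tooling.

```python
import re

# Matches "- Key: value" lines such as "- Status: Under discussion".
FIELD_RE = re.compile(r"^- (?P<key>[A-Za-z]+): (?P<value>.+)$")


def parse_header(text):
    """Collect the "- Key: value" metadata lines that follow the PDEP title."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line.strip())
        if match:
            fields[match["key"]] = match["value"].strip()
        elif fields and line.strip():
            # The first non-blank, non-field line after the fields ends the header.
            break
    return fields


header = """# PDEP-1: Purpose and guidelines

- Created: 30 July 2022
- Status: Under discussion
- Discussion: [#47444](https://github.com/pandas-dev/pandas/pull/47444)
- Author: [Marc Garcia](https://github.com/datapythonista)
- Revision: 1

## PDEP definition, purpose and scope
"""
meta = parse_header(header)
print(meta["Status"], int(meta["Revision"]))  # Under discussion 1
```
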
## PDEP definition, purpose and scope

A PDEP (pandas enhancement proposal) is a proposal for a **major** change in
pandas, similar to a Python [PEP](https://peps.python.org/pep-0001/)
or a NumPy [NEP](https://numpy.org/neps/nep-0000.html).

Bug fixes and conceptually minor changes (e.g. adding a parameter to a function)
are out of the scope of PDEPs. A PDEP should be used for changes that are not
immediate and not obvious, that are expected to require a significant amount of
discussion, and that require detailed documentation before being implemented.

PDEPs are appropriate for user-facing changes, internal changes and organizational
discussions. Examples of topics worth a PDEP include moving a module from
pandas to a separate repository, refactoring the pandas block manager, or
proposing a new code of conduct.

## PDEP guidelines

### Target audience

A PDEP is a public document available to anyone, but the main stakeholders to
consider when writing a PDEP are:

- The core development team, who will have the final decision on whether a PDEP
is approved or not
- Contributors to pandas and other related projects, and experienced users. Their
feedback is highly encouraged and appreciated, to make sure all points of view
are taken into consideration
- The wider pandas community, in particular users, who may or may not have feedback
on the proposal, but should know and be able to understand the future direction of
the project

### PDEP authors

Anyone can propose a PDEP, but in most cases developers of pandas itself and related
projects are expected to author PDEPs. If you are unsure whether you should open
an issue or create a PDEP, it's probably safe to start by
[opening an issue](https://github.com/pandas-dev/pandas/issues/new/choose), which can
eventually be moved to a PDEP.

### Workflow

The possible states of a PDEP are:

- Under discussion
- Accepted
- Implemented
- Rejected

The workflow that PDEPs can follow is described next.

#### Submitting a PDEP

A PDEP is proposed by creating a PR that adds a new file to `web/pdeps/`.
The file is a Markdown file; you can use `web/pdeps/0001.md` as a reference
for the expected format.

The initial status of a PDEP will be `Status: Under discussion`. This will be changed
to `Status: Accepted` when the PDEP is ready and has the approval of the core team.

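Below is a sketch of the header a new proposal might start with, mirroring the fields of PDEP-1's header and the initial `Status: Under discussion`. The `new_pdep_header` function, starting every proposal at `Revision: 1`, and the placeholder name, URL and PR number in the usage line are assumptions for illustration.

```python
from datetime import date


def new_pdep_header(number, title, author, author_url, pr_number, created):
    """Render the metadata header of a newly proposed PDEP.

    Hypothetical helper: the fields mirror PDEP-1's header, and a new
    proposal starts as "Under discussion" at revision 1 (an assumption).
    """
    lines = [
        f"# PDEP-{number}: {title}",
        "",
        f"- Created: {created.day} {created:%B %Y}",
        "- Status: Under discussion",
        f"- Discussion: [#{pr_number}](https://github.com/pandas-dev/pandas/pull/{pr_number})",
        f"- Author: [{author}]({author_url})",
        "- Revision: 1",
    ]
    return "\n".join(lines) + "\n"


# Placeholder values, for illustration only.
skeleton = new_pdep_header(
    2, "Example proposal", "Your Name", "https://github.com/your-username",
    12345, date(2022, 8, 1),
)
print(skeleton)
```

The PR number is only known once the PR is opened, so the `Discussion` field would typically be filled in afterwards.
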
#### Accepted PDEP

A PDEP can only be accepted by the core development team, and only if the proposal is considered
worth implementing. Decisions will be made based on the process detailed in the
[pandas governance document](https://github.com/pandas-dev/pandas-governance/blob/master/governance.md).
In general, more than one approval will be needed before the PR is merged, and
there should not be any `Request changes` review at the time of merging.

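The merge rule above (more than one approval, no outstanding `Request changes` review) can be sketched as a small check. This sketch assumes that reviews are given as chronological `(reviewer, state)` pairs with states named like GitHub's `APPROVED` and `CHANGES_REQUESTED`, and that only each reviewer's latest such review counts. `can_merge` is hypothetical, and it does not replace the decision process in the governance document.

```python
def can_merge(reviews):
    """Check the PR-level merge rule for a PDEP.

    `reviews` is a chronological list of (reviewer, state) pairs. Assumption:
    only each reviewer's latest approving or change-requesting review counts;
    other states (e.g. plain comments) are ignored.
    """
    latest = {}
    for reviewer, state in reviews:
        if state in ("APPROVED", "CHANGES_REQUESTED"):
            latest[reviewer] = state
    approvals = sum(1 for state in latest.values() if state == "APPROVED")
    has_change_request = "CHANGES_REQUESTED" in latest.values()
    return approvals > 1 and not has_change_request


print(can_merge([("a", "APPROVED"), ("b", "CHANGES_REQUESTED"), ("b", "APPROVED")]))  # True
print(can_merge([("a", "APPROVED"), ("c", "CHANGES_REQUESTED")]))  # False
```
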
Once a PDEP is accepted, contributions can be made toward its implementation,
with an open-ended completion timeline. The development of pandas is difficult to understand and
forecast, since contributors are a mix of volunteers and developers paid from different sources,
with different priorities. Companies, institutions or individuals interested in seeing a
PDEP implemented, or in seeing progress on the pandas roadmap in general, can check how
they can help on the [contributing page](/contribute.html).

#### Implemented PDEP

Once a PDEP is implemented and available in the main branch of pandas, its
status will be changed to `Status: Implemented`, so it is visible that the PDEP
is no longer part of the roadmap and future plans, but a change that has already
happened. The first pandas version in which the PDEP implementation is
available will also be included in the PDEP.

#### Rejected PDEP

A PDEP can be rejected when the final decision is that its implementation is
not in the best interests of the project. Rejected PDEPs are as useful as accepted
PDEPs, since they record discussions that are worth having and decisions made about
changes to pandas. They will be merged with `Status: Rejected`, so
there is visibility on what was discussed and what the outcome of the
discussion was. A PDEP can be rejected for different reasons; for example, a good idea
may not be backward-compatible, and its breaking changes may not be considered worth
making.

#### Invalid PDEP

For submitted PDEPs that do not contain proper documentation, are out of scope, or
are not useful to the community for any other reason, the PR will be closed after
discussion with the author, instead of being merged as rejected. This avoids
adding noise to the list of rejected PDEPs, which should contain documentation as
good as that of an accepted PDEP, but where the final decision was not to implement the changes.

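The status changes described in this workflow can be summarized as a small transition table. This sketch encodes only the transitions the text describes explicitly and rejects everything else; whether, for instance, an accepted PDEP may later be rejected is not specified, so disallowing it is an assumption. `TRANSITIONS` and `change_status` are hypothetical names.

```python
# Status changes described in the workflow above. Invalid proposals are closed
# without being merged, so they never receive a status.
TRANSITIONS = {
    "Under discussion": {"Accepted", "Rejected"},
    "Accepted": {"Implemented"},
    "Implemented": set(),
    "Rejected": set(),
}


def change_status(current, new):
    """Return `new` if moving a PDEP from `current` to `new` is a described transition."""
    if new not in TRANSITIONS.get(current, set()):
        raise ValueError(f"cannot move a PDEP from {current!r} to {new!r}")
    return new


status = "Under discussion"
status = change_status(status, "Accepted")
status = change_status(status, "Implemented")
print(status)  # Implemented
```
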
## Evolution of PDEPs

Most PDEPs aren't expected to change after they are accepted. Once there is agreement on the changes
and they are implemented, the PDEP will only be useful to understand why the development happened
and the details of the discussion.

In some cases, however, a PDEP can be updated: for example, a PDEP defining a procedure or
a policy, like this one (PDEP-1), or a PDEP whose implementation attempt yields
new knowledge that makes the original proposal obsolete and requires changes. When there are
specific changes to be made to the original PDEP, it will be edited, its `Revision: X` label
will be increased by one, and a note will be added to the `PDEP-N History` section. This lets
readers understand that the PDEP has changed and avoids confusion.

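The revision procedure above can be sketched as a text transformation, assuming that the header uses the `- Revision: N` form of PDEP-1 and that the history list is the last part of the file, as it is in PDEP-1. `bump_revision` is a hypothetical helper, and the note text in the usage line is a placeholder.

```python
import re

# Matches the header field, e.g. "- Revision: 1", on its own line.
REVISION_RE = re.compile(r"^(- Revision: )(\d+)$", flags=re.MULTILINE)


def bump_revision(text, note):
    """Increase `- Revision: N` by one and append `note` as a new history entry.

    Assumes, as in PDEP-1, that the history list is the last thing in the file.
    """
    new_text, count = REVISION_RE.subn(
        lambda match: f"{match.group(1)}{int(match.group(2)) + 1}", text, count=1
    )
    if count != 1:
        raise ValueError("no '- Revision: N' field found")
    return new_text.rstrip("\n") + f"\n- {note}\n"


doc = """# PDEP-1: Purpose and guidelines

- Revision: 1

### PDEP-1 History

- 30 July 2022: Initial version
"""
print(bump_revision(doc, "<date>: <summary of the change>"))
```
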
### PDEP-1 History

- 30 July 2022: Initial version