Skip to content

Add support for data download during backtest and refactor of catalog #2652

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

faysou
Copy link
Collaborator

@faysou faysou commented May 19, 2025

Pull Request

NautilusTrader prioritizes correctness and reliability, please follow existing patterns for validation and testing.

Summary

Add download function to backtest node in order to be able to download data and save it to a catalog easily.
This PR builds on previous PRs (see their comments for more details on each of them).

To reduce maintance, I will only rebase this PR while waiting it gets merged.

Related Issues/PRs

#2574
#2594

Type of change

  • New feature (non-breaking)

Release notes

  • I added a concise entry to RELEASES.md that follows the existing conventions (when applicable)

Testing

Tested on included example

@faysou faysou force-pushed the download_data branch 5 times, most recently from 94bee45 to d3ae70b Compare May 23, 2025 18:39
@CLAassistant
Copy link

CLAassistant commented May 23, 2025

CLA assistant check
All committers have signed the CLA.

@faysou faysou force-pushed the download_data branch 11 times, most recently from 2ac3ff8 to 8d5b794 Compare May 29, 2025 09:48
@faysou faysou force-pushed the download_data branch 8 times, most recently from 161fd38 to 1e28b9b Compare June 1, 2025 13:20
@faysou faysou force-pushed the download_data branch 4 times, most recently from d5b6a85 to 6f8617e Compare June 4, 2025 11:18
@faysou faysou changed the title Add download_data function to BacktestNode Add support for data download during backtest and refactor of catalog Jun 4, 2025
@faysou faysou force-pushed the download_data branch 6 times, most recently from 8083561 to be93e2b Compare June 8, 2025 21:59
Copy link
Member

@cjdsellers cjdsellers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still need to wrap my head around the changes to the node, looking great so far though!

if success_msg:
self._log.info(success_msg, success_color)
self._log.info(f"{success_msg} (Color: {success_color})")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we mean to change passing the LogColor enum here?

kwargs : Any
Additional keyword arguments to be passed to the `write_chunk` method.
start : int, optional
The start timestamp for the data chunk.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it Unix nanos? (we should document that, param names are fine)

Also, should it be inferred from that data [0] and [-1] to prevent mismatches? (there might be a reason for this I haven't read yet)

[edit] Is it because we're carrying the original start and end metadata for the query which produced the data? we could probably provide a little more detail about the behavior in the docstring, which we'll carry over to Rust - because when I saw start and end I immediately thought of the potential for mismatches.

self.fs.mkdirs(directory, exist_ok=True)

if isinstance(data[0], Instrument):
# When writing an instrument for a given instrument_id, we don't want duplicates
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll eventually want to change this to handling a series of instrument versions, to reflect changes to the definition over time. (Understood that this preserves existing functionality)

) -> None:
"""
Consolidate several parquet files into a single file with data sorted in
ascending chronological order.
Reset the filenames of parquet files for a specific data class and instrument
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great docs here 👌

@@ -34,14 +35,11 @@ def class_to_filename(cls: type) -> str:
return name


def urisafe_instrument_id(instrument_id: InstrumentId | str) -> str:
def urisafe_instrument_id(instrument_id: InstrumentId | BarType | str) -> str:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we choose a different param name if it could also be a bar type?

@@ -107,6 +107,8 @@ cdef class SubscribeData(DataCommand):
----------
data_type : type
The data type for the subscription.
instrument_id : InstrumentId
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little on the fence about making instrument_id optional and adding it to the more general subscribe custom data message. Does it make downstream logic in the catalog easier?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants