Change of default behavior from "load all data at once" to streaming in high-level API #2565

Open
stastnypremysl opened this issue Apr 27, 2025 · 2 comments
Labels
RFC A request for comment

Comments

@stastnypremysl
Collaborator

stastnypremysl commented Apr 27, 2025

RFC: Change of default behavior from "load all data at once" to streaming in high-level API

Context

I propose changing the default behavior from "load all data at once" to streaming for high-level API backtests. A chunk_size of 65536 seems to work well.

Considerations

This is almost entirely positive for users who process large amounts of data at once, and there is almost no performance hit for users who process smaller amounts of data or who have enough RAM available for the backtest.
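
To illustrate the trade-off in general terms, here is a minimal sketch in plain Python. It is not NautilusTrader code: the function names (`iter_chunks`, `process_all_at_once`, `process_streaming`) are hypothetical, and integers stand in for market data records. It only shows why chunked processing bounds the number of records held in memory at once while producing the same result.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

CHUNK_SIZE = 65_536  # the value suggested in this RFC


def iter_chunks(records: Iterable[int], chunk_size: int) -> Iterator[List[int]]:
    """Yield successive lists of at most `chunk_size` records."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    it = iter(records)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def process_all_at_once(records: Iterable[int]) -> Tuple[int, int]:
    """Materialize everything first; returns (result, records held in memory)."""
    data = list(records)  # the whole dataset lives in memory here
    return sum(data), len(data)


def process_streaming(
    records: Iterable[int], chunk_size: int = CHUNK_SIZE
) -> Tuple[int, int]:
    """Process chunk by chunk; returns (result, largest chunk held in memory)."""
    total = 0
    peak = 0
    for chunk in iter_chunks(records, chunk_size):
        total += sum(chunk)  # stand-in for feeding a backtest engine
        peak = max(peak, len(chunk))
    return total, peak
```

Both functions compute the same total, but the streaming variant never holds more than `chunk_size` records at a time, which is the memory benefit described above.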

I also propose making this change while NT is still in beta, as it is a breaking API change for NT.

@stastnypremysl stastnypremysl added the RFC A request for comment label Apr 27, 2025
@cjdsellers
Member

Hi @stastnypremysl

From a performance perspective, what you outline here makes sense.

The reason we don't set a chunk_size by default for streaming is that this would then constrain the data types which can be used to the built-in ones available through Rust and the DataFusion backend:

  • OrderBookDelta
  • OrderBookDepth10
  • QuoteTick
  • TradeTick
  • Bar

There are users who need other custom data types to be available, and streaming is a memory optimization which can be discovered as a user gains experience with the platform and reads the documentation.

I think that changing the default has only marginal benefits and we're better off leaving things as they are until the Rust port is complete, at which point all data types including custom data will be available - and it would definitely make sense to stream by default for the reasons you mention.

@stastnypremysl
Collaborator Author

Hi, @cjdsellers.

Thanks for your answer.

After reading your argument, I fully agree.
