Skip to content

Refine Rust catalog file operations #2454

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 19, 2025
Merged

Refine Rust catalog file operations #2454

merged 1 commit into from
Mar 19, 2025

Conversation

faysou
Copy link
Collaborator

@faysou faysou commented Mar 16, 2025

Pull Request

Refine rust catalog file operations
Made it work in python

Type of change

  • New feature (non-breaking change which adds functionality)

How has this change been tested?

Added python tests

@faysou faysou force-pushed the rustcata branch 8 times, most recently from dbda861 to e7ef66d Compare March 18, 2025 18:48
Copy link
Member

@cjdsellers cjdsellers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks for all this work 👌

ONE_MIN_BID = BarSpecification(1, BarAggregation.MINUTE, PriceType.BID)
AUDUSD_1_MIN_BID = BarType(AUDUSD_SIM, ONE_MIN_BID)

bar1 = Bar(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need some py03 types instead of standard ones here, I could add a similar stubs file for pyo3 though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, in that case the way you've done it is the easiest path for now then.

@@ -3480,6 +3486,48 @@ class PostgresCacheDatabase:
def update_order(self, order: object) -> None: ...
def update_account(self, account: Account) -> None: ...


class PyParquetDataCatalogV2:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to prefix with Py here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I'll do that, I used an LLM for this and didn't notice

}
}
let mut combined = Vec::with_capacity(existing_batches.len() + batches.len());
let batches: Vec<RecordBatch> = batches.to_vec();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice simplification.

self.inner
.write_to_parquet(data, None, None, None, write_mode)
.map(|path| path.to_string_lossy().to_string())
.map_err(|e| PyIOError::new_err(format!("Failed to write bars: {}", e)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can standardize on format strings here, rather than interpolation of e outside the string.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I'll do that

@@ -294,42 +288,47 @@ impl ParquetDataCatalog {
start: Option<UnixNanos>,
end: Option<UnixNanos>,
where_clause: Option<&str>,
) -> datafusion::error::Result<QueryResult>
) -> Result<QueryResult>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should leave all non built-in Result types qualified.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was to make it compatible with the calling function that uses anyhow

Copy link
Collaborator Author

@faysou faysou Mar 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try again to see if it works when keeping the data fusion result in the signature

Copy link
Collaborator Author

@faysou faysou Mar 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to keep the anyhow result as standard, the type of the error is not lost in the anyhow result, it's a box dyn error that can be converted back to whatever is needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the actual result type changed then thats fine, which is where using anyhow::Result is clearer.

@cjdsellers cjdsellers changed the title Refine rust catalog file operations Refine Rust catalog file operations Mar 18, 2025
@faysou
Copy link
Collaborator Author

faysou commented Mar 18, 2025

I've updated things above. It should be ready after the CI finishes.

@cjdsellers cjdsellers merged commit d521cce into develop Mar 19, 2025
12 checks passed
@cjdsellers cjdsellers deleted the rustcata branch March 19, 2025 00:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants