This repository was archived by the owner on Apr 23, 2025. It is now read-only.

Multiple threads to perform BPE encoding #503

Merged
merged 3 commits into from
May 13, 2020

xihui-wu
Contributor

No description provided.

@@ -74,7 +74,7 @@ final class TextUnsupervisedTests: XCTestCase {
 XCTAssertEqual(example.second.shape[0], 1024)
 totalCount += 1
 }
-XCTAssertEqual(totalCount, 12)
+XCTAssertEqual(totalCount, 64)
Member

Will this increase the length of time it takes to run these tests during every presubmit? If so, let's keep the lower document count by being explicit during init.

Contributor Author

I increased it to show that, with this concurrency, a larger number of documents won't increase test time. I'm fine with reverting it to 4.

Member

What happened to these changes? I believe they would resolve the failure in #526.

Contributor Author

What's the failure? In the end, I didn't touch the test.

Member

In presubmits, I'm seeing:

/swift-models/Tests/DatasetsTests/TextUnsupervised/TextUnsupervisedTests.swift:77: error: TextUnsupervisedTests.testCreateWikiText2WithBpe : XCTAssertEqual failed: ("64") is not equal to ("12") -

The test passes for me locally with 12, so I don't quite understand this. I wonder if kokoro is picking up a cached image somehow... 🤔 I'll try running again.

Member

texasmichelle left a comment

I like this approach since it cleans up in-line. Thank you!!

-encodedDocs = documents.map { embedding(for: $0, bpe: bpe) }
-} else {
+encodedDocs = documents.concurrentMap { embedding(for: $0, bpe: bpe) }
+} else {
Member

Replace the tab with spaces.
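
For readers following the `documents.map` to `documents.concurrentMap` change: the thread does not show the body of `concurrentMap`, so the following is only a minimal sketch of how such a helper could be written. It assumes `DispatchQueue.concurrentPerform` for the parallel loop and a serial queue to guard writes to the result array. The name `concurrentMapSketch` and the queue label are hypothetical and are not taken from this pull request.

```swift
import Foundation

extension Array {
    /// Hypothetical helper (not the PR's implementation): applies `transform`
    /// to every element on multiple threads and returns the results in the
    /// original element order.
    func concurrentMapSketch<T>(_ transform: (Element) -> T) -> [T] {
        // One slot per input element; each slot is filled exactly once.
        var results = [T?](repeating: nil, count: count)
        // Serial queue (assumed design choice) so writes to `results` never overlap.
        let writeQueue = DispatchQueue(label: "concurrentMapSketch.results")

        // Runs the closure once per index, spread across available cores,
        // and returns only after every iteration has finished.
        DispatchQueue.concurrentPerform(iterations: count) { index in
            let value = transform(self[index])
            writeQueue.sync { results[index] = value }
        }
        return results.map { $0! }
    }
}

// Usage: output order matches input order even though work runs in parallel.
let lengths = ["a", "bb", "ccc"].concurrentMapSketch { $0.count }
print(lengths)  // prints [1, 2, 3]
```

The expensive part (here `transform`, in the PR the per-document BPE encoding) runs outside the serial queue, so only the cheap store into `results` is serialized. This only pays off when `transform` is safe to call from several threads at once.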

xihui-wu merged commit 331bcbf into master on May 13, 2020
xihui-wu deleted the bpe branch on May 13, 2020 04:16