Label handling commit breaks the imdb finetuning script #3

Open
@prrao87

Thomas, thanks for sharing this code! Commit 8d9c237 seems to have broken the default behavior of the classification finetuning scripts. The previous version appears to have had a key called 'labels' in the imdb and trec dictionaries, and a line in finetuning_train.py still references that now-deleted key.

I updated the line to just use DATASETS_LABELS_URL['imdb']['test'] as intended, but then it seems that the S3 bucket doesn't have the IMDB test file.

See below:

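# Assumption: cached_path is the helper exported by pytorch_pretrained_bert
from pytorch_pretrained_bert import cached_path

file_path = "https://s3.amazonaws.com/datasets.huggingface.co/imdb/test.labels.txt"
label_file = cached_path(file_path)  # downloads the URL and returns the local cache path
with open(label_file, "r", encoding="utf-8") as f:
    all_lines = f.readlines()
    print(all_lines[:5])  # first few lines of whatever was downloaded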

Gives:

['<?xml version="1.0" encoding="UTF-8"?>\n', '<Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Key>imdb/test.labels.txt</Key><RequestId>3D9E7C511167A0FB</RequestId><HostId>RiidOcrHfFaqxW9tmUXRppE/G3lsYoCZcq+uaYDi2yPPoe8mv/Og6PMuUncwk+B53tGsvcCZMWk=</HostId></Error>']

Does the test file for IMDB still exist with this name? This doesn't seem to be an issue with TREC.
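
The output above suggests the S3 error response was saved and read as if it were the label file, so the failure only shows up once the contents are inspected. A minimal sketch of a guard that would catch this, using only the standard library. The function name s3_error_code is hypothetical and not part of the repo, and the check assumes that a genuine label file is plain text rather than XML:

```python
import xml.etree.ElementTree as ET


def s3_error_code(text):
    """Return the S3 error code if `text` is an S3 XML error body, else None."""
    stripped = text.lstrip()
    if not stripped.startswith("<"):
        return None  # assumed: real label files are plain text, not XML
    try:
        # Parse from bytes so the XML encoding declaration is handled uniformly
        root = ET.fromstring(stripped.encode("utf-8"))
    except ET.ParseError:
        return None  # starts with "<" but is not well-formed XML
    if root.tag != "Error":
        return None  # some other XML document, not an S3 error body
    return root.findtext("Code")


# Shortened version of the response shown above (RequestId and HostId omitted)
sample = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<Error><Code>NoSuchKey</Code>"
    "<Message>The specified key does not exist.</Message>"
    "<Key>imdb/test.labels.txt</Key></Error>"
)
print(s3_error_code(sample))  # NoSuchKey
```

In the snippet above, this could be called on "".join(all_lines) right after reading the cached file, raising an error when the result is not None instead of passing the XML on as labels.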
