
[QEff Finetune] : Made fixes to training script #439


Draft · quic-mamta wants to merge 4 commits into base: main

Conversation

quic-mamta (Contributor) commented on Jun 10, 2025:

Made fixes to training script.



def get_preprocessed_samsum(dataset_config, tokenizer, split, context_length=None):
-    dataset = datasets.load_dataset("Samsung/samsum", split=split, trust_remote_code=True)
+    dataset = datasets.load_dataset("knkarthick/samsum", split=split, trust_remote_code=True)
quic-mamta (Contributor, Author) commented:

Please check if this dataset can be used.
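A quick way to act on this: load the replacement dataset once and confirm it exposes the columns the preprocessing code reads. This is a minimal sanity-check sketch, not part of the PR; the column names shown are assumptions to verify, and `knkarthick/samsum` may not require `trust_remote_code` at all.

```python
# Standalone sanity check for the dataset swap (sketch, not PR code).
import datasets

ds = datasets.load_dataset("knkarthick/samsum", split="train")
print(ds.column_names)          # expecting something like ['id', 'dialogue', 'summary']
print(ds[0]["dialogue"][:80])   # spot-check that samples look like chat transcripts
```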

def get_dataloader_kwargs(train_config, dataset, dataset_processer, mode):
    kwargs = {}
    batch_size = train_config.batch_size_training if mode == "train" else train_config.val_batch_size
    if train_config.enable_ddp:
        print("Length of dataset before: ", len(dataset))
        dataset = pad_dataset(dataset, batch_size, 2)
quic-mamta (Contributor, Author) commented:

Instead of the hard-coded 2, use the DDP world_size here.
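A minimal sketch of the suggested change, assuming `torch.distributed` has already been initialized by the time this runs (otherwise `dist.get_world_size()` raises), and reusing the function's local names from the snippet above:

```python
import torch.distributed as dist

if train_config.enable_ddp:
    # Pad to a multiple of batch_size * world_size instead of a hard-coded 2,
    # so the padding stays correct for any number of DDP replicas.
    num_replicas = dist.get_world_size()
    dataset = pad_dataset(dataset, batch_size, num_replicas)
```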

@@ -115,10 +115,26 @@ def generate_dataset_config(dataset_name: str) -> Any:
    return dataset_config


def pad_dataset(dataset, batch_size, num_replicas):
    reminder = len(dataset) % (batch_size * num_replicas)
quic-mamta (Contributor, Author) commented:

Please use `remainder` as the variable name here (it is currently spelled `reminder`).
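For reference, a sketch of what the renamed helper could look like. Only the signature and the modulo line come from the diff; the padding strategy (repeating the last sample) and the use of `datasets.concatenate_datasets` are assumptions, not the PR's actual implementation.

```python
import datasets


def pad_dataset(dataset, batch_size, num_replicas):
    # Pad so every replica sees the same number of full batches.
    remainder = len(dataset) % (batch_size * num_replicas)
    if remainder == 0:
        return dataset
    pad_count = batch_size * num_replicas - remainder
    # Assumption: repeat the final sample as padding; the real PR may pad differently.
    padding = dataset.select([len(dataset) - 1] * pad_count)
    return datasets.concatenate_datasets([dataset, padding])
```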

Signed-off-by: Mamta Singh <mamtsing@qti.qualcomm.com>
quic-mamta changed the title from "Made fixes to training script based on recent findings." to "[QEff Finetune] : Made fixes to training script" on Jun 12, 2025
@@ -235,11 +241,23 @@ def train(
            train_step_metric.append(step_metric_val)

            if train_config.grad_scaler:
-                scaler.scale(loss).backward()  # backward pass
+                if train_config.enable_ddp:
+                    with model.no_sync():
A reviewer (Contributor) commented:

This will result in no syncing of gradients at any step.

            if train_config.enable_ddp:
                # FIXME: We can not stop transfer of gradient across devices every time.
                # In grad accumulation last step should transfer gradients across devices.
                with model.no_sync():
A reviewer (Contributor) commented:

This will result in no syncing of gradients at any step here as well.
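The usual way to keep gradient accumulation and DDP compatible is to enter `no_sync()` only on the accumulation steps that are not the last one, so the all-reduce still happens once per effective batch. A sketch of that pattern, reusing the training loop's local names (`step`, `loss`, `model`, `scaler`, `train_config`) and assuming a `gradient_accumulation_steps` field on `train_config` (the real field name may differ):

```python
from contextlib import nullcontext

# Assumption: train_config exposes a gradient accumulation count under this name.
accum_steps = train_config.gradient_accumulation_steps
is_last_accum_step = (step + 1) % accum_steps == 0

# Suppress the gradient all-reduce only on non-final accumulation steps, so
# DDP still syncs gradients once per effective batch.
sync_ctx = model.no_sync() if (train_config.enable_ddp and not is_last_accum_step) else nullcontext()

with sync_ctx:
    if train_config.grad_scaler:
        scaler.scale(loss).backward()  # backward pass
    else:
        loss.backward()  # backward pass
```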
