
Demonstration of a real parallel macro


Description

The script used in this demonstration was kindly provided by Bruno Vellutini. It is representative of a typical pipeline in the molecular biology scripts he writes and uses, and of the embarrassingly parallel type of workload that Parallel Macro can parallelize.

The script performs the computationally intensive conversion of raw microscope data in CZI format into TIFF images and AVI movies that can be opened and visualized on any computer. At the same time, it saves the raw and processed data as TIFFs for use in downstream analysis. Specifically, it performs the following functions:

  • Split data-sets. The biologist images several samples under the microscope simultaneously, but the particular microscope he uses bundles the samples in the raw data as views. The raw data must therefore be split into individual data-sets; each view is one data-set.
  • Convert to TIFF. For each view, basic steps are performed, such as selecting the channel colors, adjusting the intensity levels, and rotating the data-set the way the biologist needs. Once this is done, the view is saved as a TIFF file. The TIFF is just the raw data of a single view with the base adjustments applied.
  • Image processing. The next steps are image processing methods the biologist applies to visualize the data: noise is removed, the background is subtracted, and a maximum intensity projection is computed, which gives an easier way to visualize the original 3D data. Finally, because the data-sets are time-lapses, a movie of the projected data is created. (A minimal sketch of the view splitting follows this list.)
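
The view splitting relies on the Bio-Formats Macro Extensions, which let a macro open one series (view) of a CZI file at a time. The following is a minimal sketch only, with a hypothetical placeholder path; the real paths and processing steps appear in the full scripts below.

run("Bio-Formats Macro Extensions");
path = "/path/to/sample.czi";    // hypothetical placeholder path
Ext.setId(path);                 // point Bio-Formats at the file
Ext.getSeriesCount(seriesCount); // number of views (series) in the file
for (i = 0; i < seriesCount; i++) {
  Ext.setSeries(i);              // select view i
  Ext.openImagePlus(path);       // open the selected view as a stack
  saveAs("Tiff", "/path/to/output/view_" + i + ".tif");
  close();
}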

Original serial script

  • Start Fiji.
  • Start the HPC Workflow Manager. (Plugins > HPC-ParallelTools > HPC Workflow Manager).
  • Create a new directory on your local machine called "bruno_serial".
  • Create an empty macro script called "bruno_serial.ijm" inside the directory you just created.
  • Create a new job with one (1) compute node, selecting the macro script. (Right-click inside the table of the "HPC Workflow Manager" window and click the "Create a new job" context menu item. Select the macro script. Press the "Create" button.)
  • Copy the code below into the macro script.
  • Replace dd-20-36-01 with your username in the macro script.
  • Save the script and close the editor.
  • Upload the data. (Right-click the row of the job you created in the "HPC Workflow Manager" window and click the "Upload data" context menu item.)
  • Start the job.
startTime = getTime();

username = "dd-20-36-01";

experimentDirectory="bruno";
// Input folder
inputFolder = "/scratch/work/project/dd-20-36/"+experimentDirectory+"/";
// Output folder
outputFolder = "/home/training/"+username+"/"+experimentDirectory+"/OUTPUT/";
// Basename
basename = "slp-HisGap_1_t35s";
// CZI file
rawData = basename+"/slp-HisGap_1_t35s.czi";
viewPath =  inputFolder + rawData;
print("View path: "+viewPath);

// Number of views
views = 9;

// batch mode on
setBatchMode(true);

// Save the output in a separate subdirectory for the serial run:
outputFolder += "serial/";
exec("mkdir -p "+outputFolder);
exec("chmod -R 777 "+outputFolder);

for (i=0; i < views; i++) {
  startViewTime = getTime();
  
  // Open individual view
  run("Bio-Formats Macro Extensions");
  Ext.setSeries(i);
  Ext.openImagePlus(viewPath);
  // Rename
  rename(basename +"_E"+ i);
  // Get stack title
  dataset = getTitle();

  // Reset levels
  run("Green");
  setMinAndMax(200, 3000);
  run("Next Slice [>]");
  run("Magenta");
  setMinAndMax(200, 1500);
  Stack.setDisplayMode("composite");
  // Orient stack properly
  run("Rotate 90 Degrees Left");
  run("Flip Horizontally", "stack");
  // Save rotated data
  saveAs("Tiff", outputFolder + dataset +".tif");

  // Remove some noise
  run("Despeckle", "stack");
  // Subtract background to improve MAX projection
  run("Subtract Background...", "rolling=20 stack");
  // Reset new levels
  setMinAndMax(5, 800);
  run("Previous Slice [<]");
  setMinAndMax(5, 2000);
  // Save processed data as new stack
  saveAs("Tiff", outputFolder + dataset +"_sub20.tif");

  // Maximum projection
  run("Z Project...", "projection=[Max Intensity] all");
  // Get title of maximum projection
  max = getTitle();
  // Save maximum projection
  saveAs("Tiff", outputFolder + max);
  // Create and save as AVI movie
  run("RGB Color", "frames");
  run("AVI... ", "compression=None frame=15 save="+ outputFolder + max +".avi");
  // MAX
  close();
  // Sub
  close();
  
  print("View "+(i+1)+" time: "+(getTime() - startViewTime)/1000+" seconds");
}
print("Total execution time: "+(getTime() - startTime)/1000+" seconds for "+size+" nodes.");

// exit batch mode
setBatchMode(false);

Grouping into tasks for monitoring purposes

Serial

(Figure: task grouping in the serial version)

Let's group the fourteen (14) steps into the following four (4) tasks (a sketch of the corresponding monitoring calls follows the list):

  • Pre-process: Open individual view, rename, get stack title.
  • Adjust: Reset levels, orient stack properly, save rotated data.
  • Enhance image: Remove some noise, subtract background to improve the max projection, reset new levels, save processed data as new stack.
  • Maximum projection: Maximum projection calculation, get title of maximum projection, save maximum projection, create and save AVI movie.
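
In Parallel Macro, each of these groups becomes a task registered with parAddTask, and progress within a group is reported with parReportProgress; both calls appear in the full parallel script below. A minimal sketch:

// Register the tasks up front so the Workflow Manager can display them.
preprocessTask = parAddTask("Pre-process");
adjustTask = parAddTask("Adjust");
parReportTasks(); // send the registered task list to the monitor

// ... pre-processing work ...
parReportProgress(preprocessTask, 50);  // pre-processing is half done
// ... more pre-processing work ...
parReportProgress(preprocessTask, 100); // pre-processing is complete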

Parallelization

Parallel

(Figure: task grouping in the parallel version)

Let's parallelize the script by distributing the nine iterations of the loop across the compute nodes: every node runs the same macro but uses its rank to pick its own subset of views, as sketched below.
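
A minimal sketch of the idea, using only the parGetRank and parGetSize calls from the full script. For illustration it uses a simple round-robin stride, whereas the full script below assigns contiguous chunks of views to each node:

myRank = parGetRank(); // this node's id: 0, 1, ..., size-1
size = parGetSize();   // total number of nodes in the job
views = 9;
// Round-robin: node r handles views r, r+size, r+2*size, ...
for (i = myRank; i < views; i += size) {
  print("Node " + myRank + " would process view " + i);
}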

  • Start Fiji.
  • Start the HPC Workflow Manager. (Plugins > HPC-ParallelTools > HPC Workflow Manager).
  • Create a new directory on your local machine called "bruno_parallel".
  • Create an empty macro script called "bruno_parallel.ijm" inside the directory you just created.
  • Create a new job with four (4) compute nodes, selecting the macro script. (Right-click inside the table of the "HPC Workflow Manager" window and click the "Create a new job" context menu item. Select the macro script. Set the compute nodes to four (4). Press the "Create" button.)
    • We use only four instead of nine compute nodes because the workshop queue is artificially limited to conserve resources.
  • Copy the code below into the macro script.
  • Replace dd-20-36-01 with your username in the macro script.
  • Save the script and close the editor.
  • Upload the data. (Right-click the row of the job you created in the "HPC Workflow Manager" window and click the "Upload data" context menu item.)
  • Start the job.
// Calculate how many views each node gets (the chunk sizes), given
// the number of nodes (size) and the number of views (views_number).
function calculateChunkSize(size, views_number) {
  index = 0;
  counter = 0;
  chunk = newArray(size);
  while(counter < views_number){
    if(index == size){
      index = 0;
    }
    
    chunk[index] += 1;
    index++;
    counter++;
  }
  return chunk;
}

// Calculate the iteration interval [start, end) for each node:
function calculateInterval(size, chunk, start, end){
  start[0] = 0;
  for(i = 0; i < size; i++){
    if(i > 0){
      start[i] = end[i-1];
    }
    end[i] = start[i] + chunk[i];
  }
}

startTime = getTime();

preprocessingTask = parAddTask("Preprocessing");
rotateTask = parAddTask("Rotate");
enhanceTask = parAddTask("Enhance image");
maximumProjectionTask = parAddTask("Maximum projection");
parReportTasks();

// Username, change this before running:
username = "dd-20-36-01";

// Get the rank of the node and number of nodes available:
myRank = parGetRank();
size = parGetSize();

experimentDirectory="bruno";
// Input folder
inputFolder = "/scratch/work/project/dd-20-36/"+experimentDirectory+"/";
// Output folder
outputFolder = "/home/training/"+username+"/"+experimentDirectory+"/OUTPUT/";
// Basename
basename = "slp-HisGap_1_t35s";
// First CZI file
rawData = basename+"/slp-HisGap_1_t35s.czi";
viewPath =  inputFolder + rawData;
print("View path: "+viewPath);

// Number of views
views = 9;

// batch mode on
setBatchMode(true);

// Save the output in a separate subdirectory for the parallel run:
outputFolder += "parallel/";
exec("mkdir -p "+outputFolder);
exec("chmod -R 777 "+outputFolder);

print("My rank: "+myRank+", number of nodes: "+size);

// Split the workload (the number of views) into evenly 
// distributed parts for each compute node if possible.

// Deal the workload iterations evenly:
start = newArray(size);
end = newArray(size);

chunk = calculateChunkSize(size, views);

// Create the iteration start and end for each node:
calculateInterval(size, chunk, start, end);
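
// Worked example: with views = 9 and the workshop's size = 4 nodes,
// calculateChunkSize deals the views round-robin into chunk = {3, 2, 2, 2};
// calculateInterval then yields start = {0, 3, 5, 7} and end = {3, 5, 7, 9},
// so rank 0 processes views 0-2, rank 1 views 3-4, rank 2 views 5-6,
// and rank 3 views 7-8.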

print("Rank: "+myRank+" size: "+size+" start: "+start[myRank]+" end: "+end[myRank]+" part size: "+chunk[myRank]);

max_work = chunk[myRank];
work = 0;
for (i=start[myRank]; i < end[myRank]; i++) {
    startViewTime = getTime();

    work += 1;
    work_done = work/max_work;

    // Open individual view
    viewPath =  inputFolder + rawData;
    print("View path: "+viewPath);
    run("Bio-Formats Macro Extensions");
    Ext.setSeries(i);
    Ext.openImagePlus(viewPath);
    parReportProgress(preprocessingTask,33*work_done);
    // Rename
    rename(basename +"_E"+ i);
    parReportProgress(preprocessingTask,66*work_done);
    // Get stack title
    dataset = getTitle();
    parReportProgress(preprocessingTask,100*work_done);

    // Reset levels
    run("Green");
    setMinAndMax(200, 3000);
    run("Next Slice [>]");
    run("Magenta");
    setMinAndMax(200, 1500);
    Stack.setDisplayMode("composite");
    parReportProgress(rotateTask,33*work_done);
    // Orient stack properly
    run("Rotate 90 Degrees Left");
    run("Flip Horizontally", "stack");
    parReportProgress(rotateTask,66*work_done);
    // Save rotated data
    saveAs("Tiff", outputFolder + dataset +".tif");
    parReportProgress(rotateTask,100*work_done);

    // Remove some noise
    run("Despeckle", "stack");
    parReportProgress(enhanceTask,25*work_done);
    // Subtract background to improve MAX projection
    run("Subtract Background...", "rolling=20 stack");
    parReportProgress(enhanceTask,50*work_done);
    // Reset new levels
    setMinAndMax(5, 800);
    run("Previous Slice [<]");
    setMinAndMax(5, 2000);
    parReportProgress(enhanceTask,75*work_done);
    // Save processed data as new stack
    saveAs("Tiff", outputFolder + dataset +"_sub20.tif");
    parReportProgress(enhanceTask,100*work_done);

    // Maximum projection
    run("Z Project...", "projection=[Max Intensity] all");
    parReportProgress(maximumProjectionTask,20*work_done);
    // Get title of maximum projection
    max = getTitle();
    parReportProgress(maximumProjectionTask,40*work_done);
    // Save maximum projection
    saveAs("Tiff", outputFolder + max);
    parReportProgress(maximumProjectionTask,60*work_done);
    // Create and save as AVI movie
    run("RGB Color", "frames");
    run("AVI... ", "compression=None frame=15 save="+ outputFolder + max +".avi");
    parReportProgress(maximumProjectionTask,80*work_done);
    // MAX
    close();
    // Sub
    close();
    parReportProgress(maximumProjectionTask,100*work_done);

    print("View "+(i+1)+" time: "+(getTime() - startViewTime)/1000+" seconds");
}
print("Rank: "+myRank+" total time before the barrier: "+(getTime() - startTime)/1000+" seconds.");
parBarrier();
if(myRank == 0){
    print("Total execution time: "+(getTime() - startTime)/1000+" seconds for "+size+" nodes.");
}
print("Rank "+myRank+" finished.");
// exit batch mode
setBatchMode(false);

Video version 📺

