diff --git a/README.md b/README.md
index 181ba4b..fe3f729 100644
--- a/README.md
+++ b/README.md
@@ -24,6 +24,7 @@ Example | Description
 [axi_target](./axi_target)|Example of an AXI4-Target top-level interface.
 [Canny_RISCV](./Canny_RISCV)|Integrating a SmartHLS module created using the IP Flow into the RISC-V subsystem.
 [ECC_demo](./ECC_demo)|Example of Error Correction Code feature.
+[auto_instrument](./auto_instrument/)|Example of Automatic On-Chip Instrumentation feature.
 
 ## Simple Examples
 Example | Description
diff --git a/Training1/readme.md b/Training1/readme.md
index fecdc66..5ffc933 100644
--- a/Training1/readme.md
+++ b/Training1/readme.md
@@ -1015,7 +1015,7 @@ run in the pipeline at each stage. The leftmost column indicates the
 loop iteration for the instructions in the row starting (from Iteration
 0). For function pipelines, Iteration 0 corresponds to the first input.
 
-If you hold you mouse over an instruction you will see more details
+If you hold your mouse over an instruction you will see more details
 about the operation type.
 
 <p align="center"><img src=".//media/steady_state_alpha_blend_pipeline_viewer.png" /></br>Figure 15: SmartHLS Schedule Viewer: Pipeline Viewer</p></br>
@@ -1577,7 +1577,7 @@ SmartHLS.
 Pipelining is a common HLS optimization used to increase hardware
 throughput and to better utilize FPGA hardware resources. We also
 covered the concept of loop pipelining in the SmartHLS Sobel Filter
-Tutorial. In Figure 18a) shows a loop to be scheduled with 3
+Tutorial. Figure 18a) shows a loop to be scheduled with 3
 single-cycle operations: Load, Comp, Store. We show a comparison of the
 cycle-by-cycle operations when hardware operations in a loop are
 implemented b) sequentially (default) or c) pipelined (with SmartHLS
@@ -1606,7 +1606,7 @@ pragma or the function pipeline pragma:
 ```
 Loop pipelining only applies to a specific loop in a C++ function.
 Meanwhile, function pipelining is applied to an entire C++ function and
-SmartHLS will automatically unrolls all loops in that function.
+SmartHLS will automatically unroll all loops in that function.
 
 ## SmartHLS Pipelining Hazards: Why Initiation Interval Cannot Always Be 1
 
@@ -2710,14 +2710,12 @@ the user specifies an incorrect value in a SmartHLS pragma. For example,
 specifying an incorrect depth on a memory interface such as the
 following on line 29:
 ```c
-#pragma HLS interface argument(input_buffer) type(memory)
-num_elements(SIZE)
+#pragma HLS interface argument(input_buffer) type(memory) num_elements(SIZE)
 ```
 For example, we can try changing the correct SIZE array depth to a wrong
 value like 10:
 ```c
-#pragma HLS interface argument(input_buffer) type(memory)
-num_elements(10)
+#pragma HLS interface argument(input_buffer) type(memory) num_elements(10)
 ```
 
 Now we rerun SmartHLS to generate the hardware
diff --git a/Training2/readme.md b/Training2/readme.md
index 9a4b003..c07e53c 100644
--- a/Training2/readme.md
+++ b/Training2/readme.md
@@ -783,7 +783,7 @@ int main() {
 
 ## Verification: Co-simulation of Multi-threaded SmartHLS Code
 
-As mentioned before the `producer_consumer` project cannot be simulated
+As mentioned before, the `producer_consumer` project cannot be simulated
 with co-simulation. This is because the `producer_consumer` project has
 threads that run forever and do not finish before the top-level function
 returns. SmartHLS co-simulation supports a single call to the top-level
@@ -1127,11 +1127,11 @@ is defined by the number of rows, columns, and the depth.
 Convolution layers are good for extracting geometric features from an
 input tensor. Convolution layers work in the same way as an image
 processing filter (such as the Sobel filter) where a square filter
-(called a **kernel**) is slid across an input image. The **size** of
-filter is equal to the side length of the square filter, and the size of
-the step when sliding the filter is called the **stride**. The values of
-the input tensor under the kernel (called the **window**) and the values
-of the kernel are multiplied and summed at each step, which is also
+(called a **kernel**) is slid across an input image. The **size** of a
+filter is equal to its side length, and the size of the step when sliding
+the filter is called the **stride**. The values of the input tensor
+under the kernel (called the **window**) and the values of the
+kernel are multiplied and summed at each step, which is also
 called a convolution. Figure 13 shows an example of a convolution layer
 processing an input tensor with a depth of 1.
 
@@ -1580,16 +1580,16 @@ we show the input tensor values and convolution filters involved in the
 computation of the set of colored output tensor values (see Loop 3
 arrow).
 
-Loop 1 and Loop 2 the code traverses along the row and column dimensions
+For Loop 1 and Loop 2, the code traverses along the row and column dimensions
 of the output tensor. Loop 3 traverses along the depth dimension of the
-output tensor, each iteration computes a `PARALLEL_KERNELS` number of
+output tensor, and each iteration computes a total of `PARALLEL_KERNELS`
 outputs. The `accumulated_value` array will hold the partial
 dot-products. Loop 4 traverses along the row and column dimensions of
-the input tensor and convolution filter kernels. Then Loop 5 walks
-through each of the `PARALLEL_KERNELS` number of selected convolution
+the input tensor and convolution filter kernels. Then, Loop 5 walks
+through each of the `PARALLEL_KERNELS` selected convolution
 filters and Loop 6 traverses along the depth dimension of the input
 tensor. Loop 7 and Loop 8 add up the partial sums together with biases
-to produce `PARALLEL_KERNEL` number of outputs.
+to produce `PARALLEL_KERNELS` outputs.
 
 ```C
 const static unsigned PARALLEL_KERNELS = NUM_MACC / INPUT_DEPTH;
@@ -2203,7 +2203,7 @@ instructions that always run together with a single entry point at the
 beginning and a single exit point at the end. A basic block in LLVM IR
 always has a label at the beginning and a branching instruction at the
 end (br, ret, etc.). An example of LLVM IR is shown below, where the
-`body.0` basic block performs an addition (add) and subtraction (sub) and
+`body.0` basic block performs an addition (add) and subtraction (sub), and
 then branches unconditionally (br) to another basic block labeled
 `body.1`. Control flow occurs between basic blocks.
 
@@ -2231,7 +2231,7 @@ button (![](.//media/image28.png)) to build the design and generate the
 schedule.
 
 We can ignore the `printWarningMessageForGlobalArrayReset` warning message
-for global variable a in this example as described in the producer
+for global variable `a` in this example as described in the producer
 consumer example in the [section 'Producer Consumer Example'](#producer-consumer-example).
 
 The first example we will look at is the `no_dependency` example on line
@@ -2240,7 +2240,7 @@ The first example we will look at is the `no_dependency` example on line
 
 <p align="center"><img src=".//media/image19.png" /></p>
 
-```
+```c++
  8  void no_dependency() {
  9  #pragma HLS function noinline
 10    e = b + c;
@@ -2252,10 +2252,10 @@ The first example we will look at is the `no_dependency` example on line
 <p align="center">Figure 28: Source code and data dependency graph for no_dependency
 function.</p>
 
-In this example, values are loaded from b, c, and d and additions happen
-before storing to *e*, *f*, and *g*. None of the adds use results from
+In this example, values are loaded from `b`, `c`, and `d`, and additions happen
+before storing to `e`, `f`, and `g`. None of the adds use results from
 the previous adds and thus all three adds can happen in parallel. The
-*noinline* pragma is used to prevent SmartHLS from automatically
+`noinline` pragma is used to prevent SmartHLS from automatically
 inlining this small function and making it harder for us to understand
 the schedule. Inlining is when the instructions in the called function
 get copied into the caller, to remove the overhead of the function call
@@ -2290,17 +2290,17 @@ the store instruction highlighted in yellow depends on the result of the
 add instruction as we expect.
 
 We have declared all the variables used in this function as
-**volatile**. The volatile C/C++ keyword specifies that the variable can
+**volatile**. The `volatile` C/C++ keyword specifies that the variable can
 be updated by something other than the program itself, making sure that
 any operation with these variables do not get optimized away by the
 compiler as every operation matters. An example of where the compiler
 handles this incorrectly is seen in the [section 'Producer Consumer Example'](#producer-consumer-example), where we had to
 declare a synchronization signal between two threaded functions as
-volatile. Using volatile is required for toy examples to make sure each
+`volatile`. Using `volatile` is required for toy examples to make sure each
 operation we perform with these variables will be generated in hardware
 and viewable in the Schedule Viewer.
 
-```
+```c++
 4  volatile int a[5] = {0};
 5  volatile int b = 0, c = 0, d = 0;
 6  volatile int e, f, g;
@@ -2315,7 +2315,7 @@ code and SmartHLS cannot schedule all instructions in the first cycle.
 
 <p align="center"><img src=".//media/image68.png" /></p>
 
-```
+```c++
 15  void data_dependency() {
 16  #pragma HLS function noinline
 17    e = b + c;
@@ -2337,8 +2337,8 @@ second add is also used in the third add. These are examples of data
 dependencies as later adds use the data result of previous adds. Because
 we must wait for the result `e` to be produced before we can compute `f`,
 and then the result `f` must be produced before we can compute `g`, not all
-instructions can be scheduled immediately. They must wait for their
-dependent instructions to finish executing before they can start, or
+instructions can be scheduled immediately. They must wait for the instructions
+they depend on to finish executing before they can start, or
 they would produce the wrong result.
 
 <p align="center"><img src=".//media/image70.png" /></br>
@@ -2375,7 +2375,7 @@ memories.
 
 <p align="center"><img src=".//media/image72.png" /></p>
 
-```
+```c++
 22  void memory_dependency() {
 23  #pragma HLS function noinline
 24    volatile int i = 0;
@@ -2419,7 +2419,7 @@ resource cannot be scheduled in parallel due to a lack of resources.
 `resource_contention` function on line 30 of
 `instruction_level_parallelism.cpp`.
 
-```
+```c++
 30  void resource_contention() {
 31  #pragma HLS function noinline
 32    e = a[0];
@@ -2452,7 +2452,7 @@ when generating the schedule for a design.
 Next, we will see an example of how loops prevent operations from being
 scheduled in parallel.
 
-```
+```c++
 37  void no_loop_unroll() {
 38  #pragma HLS function noinline
 39    int h = 0;
@@ -2481,10 +2481,9 @@ has no unrolling on the loop and `loop_unroll` unrolls the loop
 completely. This affects the resulting hardware by removing the control
 signals needed to facilitate the loop and combining multiple loop bodies
 into the same basic block, allowing more instructions to be scheduled in
-parallel. The trade-off here is an unrolled loop does not reuse hardware
-resources and can potentially use a lot of resources. However, the
-unrolled loop would finish earlier depending on how inherently parallel
-the loop body is.
+parallel. The trade-off here is that an unrolled loop does not reuse hardware
+resources and can potentially use a lot of resources. However, it will
+finish earlier, depending on how inherently parallel the loop body is.
 
 ![](.//media/image3.png)To see the effects of this, open the Schedule
 Viewer and first click on the `no_loop_unroll` function shown in Figure
diff --git a/Training3/readme.md b/Training3/readme.md
index 0030e54..eb97732 100644
--- a/Training3/readme.md
+++ b/Training3/readme.md
@@ -2002,7 +2002,7 @@ format to write and read from the addresses that line up with addresses
 in the AXI target interface in the SmartHLS core. For burst mode, the
 processor will also write to and read from addresses corresponding to
 the DDR memory. Note, the pointers are cast as volatile to prevent the
-SoftConsole compiler from optimization away these reads and writes. The
+SoftConsole compiler from optimizing away these reads and writes. The
 Mi-V then asserts the run signal and waits until the accelerator
 de-asserts it, signally the computing is done. The Mi-V then reads from
 the memory to issue read requests and get the results from the
diff --git a/Training4/readme.md b/Training4/readme.md
index 46dade7..769834f 100644
--- a/Training4/readme.md
+++ b/Training4/readme.md
@@ -385,7 +385,7 @@ array.
 
 ```c
 24  // The core logic of this example                                          
-25  void vector_add_sw(int a, int b, int result) { 
+25  void vector_add_sw(int* a, int* b, int* result) { 
 26    for (int i = 0; i < SIZE; i++) {                                  
 27      result[i] = a[i] + b[i];                                             
 28    }                                                                          
@@ -398,7 +398,7 @@ Now we look on line 70 at the `vector_add_axi_target_memcpy` top-level
 C++ function as shown in Figure 6‑12.
 
 ```c
-70  void vector_add_axi_target_memcpy(int a, int b, int result) { 
+70  void vector_add_axi_target_memcpy(int* a, int* b, int* result) { 
 71  #pragma HLS function top                                                               
 72  #pragma HLS interface control type(axi_target)                                        
 73  #pragma HLS interface argument(a) type(axi_target) num_elements(SIZE)                
@@ -500,7 +500,7 @@ corresponding to the C++ top-level function as shown below in Figure
 clock and a single AXI4 Target port. Due to the large number of AXI4
 ports in the RTL, SmartHLS uses a wildcard “`axi4target_*`” to
 simplify the table. The “Control AXI4 Target” indicates that
-start/finish control as done using the AXI target interface. Each of the
+start/finish control is done using the AXI target interface. Each of the
 function’s three arguments also use the AXI target interface. The
 address map of the AXI target port is given later in the report.
 
diff --git a/auto_instrument/Makefile b/auto_instrument/Makefile
new file mode 100644
index 0000000..d9b3e47
--- /dev/null
+++ b/auto_instrument/Makefile
@@ -0,0 +1,3 @@
+SRCS=main.cpp
+LOCAL_CONFIG = -legup-config=config.tcl
+HLS_INSTRUMENT_ENABLE=1
diff --git a/auto_instrument/README.md b/auto_instrument/README.md
new file mode 100644
index 0000000..0f39d50
--- /dev/null
+++ b/auto_instrument/README.md
@@ -0,0 +1,573 @@
+# Automatic On-Chip Instrumentation
+
+- [Automatic On-Chip Instrumentation](#automatic-on-chip-instrumentation)
+  - [Introduction](#introduction)
+  - [Requirements](#requirements)
+  - [Explanation of the Example Design](#explanation-of-the-example-design)
+  - [About the Automatic On-Chip Instrumentation Flow](#about-the-automatic-on-chip-instrumentation-flow)
+  - [Instrument and Compile](#instrument-and-compile)
+    - [Instrumenting the design](#instrumenting-the-design)
+    - [Compile \& Program Hardware](#compile--program-hardware)
+    - [Compiling the Software](#compiling-the-software)
+  - [Part 1: Debugging Mode](#part-1-debugging-mode)
+    - [Connecting to the JTAG Cable](#connecting-to-the-jtag-cable)
+    - [Triggering and Capturing Data](#triggering-and-capturing-data)
+    - [Running the Software](#running-the-software)
+      - [Exercise 1 - Examine the Submodule Delays](#exercise-1---examine-the-submodule-delays)
+      - [Exercise 2 - Verify the Presence of the `0x0FF` Flag](#exercise-2---verify-the-presence-of-the-0x0ff-flag)
+      - [Exercise 3: FIFO Occupancy Waveform](#exercise-3-fifo-occupancy-waveform)
+  - [Part 2: Monitoring Mode with ModelSim](#part-2-monitoring-mode-with-modelsim)
+  - [Part 3: Monitoring Mode with FIFO Dashboard](#part-3-monitoring-mode-with-fifo-dashboard)
+  - [Appendix A: Using the Identify GUI](#appendix-a-using-the-identify-gui)
+
+## Introduction
+
+In a SmartHLS project there are now three levels of debug and verification:
+
+1. **Software-only** - Compile the C++ code and run on the host machine (e.g. x86).
+2. **Co-simulation** - Use RTL simulation to confirm the results match the software-only run.
+3. **On-Chip** - Add probes to the design to see data once the FPGA is programmed (**this example**).
+
+The SmartHLS Automatic On-Chip Instrumentation feature streamlines on-chip debugging and verification by automatically adding probes to ports and FIFOs in the generated Verilog code, eliminating the need for manual instrumentation. This feature enables developers to:
+
+1. Monitor input and output data of HLS modules through port instrumentation
+2. Track data flow between subfunctions via FIFO instrumentation
+3. Detect critical issues like FIFO overflows, FIFO underflows (not enough data) and pipeline bubbles
+4. Optimize FIFO depths to avoid overprovisioning of LSRAM resources
+
+To minimize area overhead, users can control instrumentation scope through configurable log levels that specify which class of signals to monitor.
+
+## Requirements
+
+**IMPORTANT:** This is an advanced example. It is assumed that you have completed the previous training modules available in the [examples repository](https://github.com/MicrochipTech/fpga-hls-examples), and have experience with on-chip debugging using ModelSim, Synopsys Identify, and command-line operations.
+
+Before beginning this tutorial, you should install the following software:
+
+- Libero® SoC 2024.2 or later ([Download Page](https://www.microchip.com/en-us/products/fpgas-and-plds/fpga-and-soc-design-tools/fpga/libero-software-later-versions)). SmartHLS™ is packaged with Libero
+
+- The following hardware is required:
+  - PolarFire® SoC FPGA Icicle Kit. Please follow [this link](https://onlinedocs.microchip.com/oxy/GUID-AFCB5DCC-964F-4BE7-AA46-C756FA87ED7B-en-US-13/GUID-1F9BA312-87A9-43F0-A66E-B83D805E3F02.html) to set up your Icicle Kit, and make sure that Linux boots up and that the board has an IP network address assigned to it.
+  - This example can be used in a local or remote configuration, depending on which machine the Icicle Kit board is connected to via JTAG. The *JTAG host* is the machine that is connected to the board and the *build host* is the machine where the project is compiled. See the following diagram:
+
+  ![alt text](assets/local_remote.drawio.png)
+
+- Add the following environment variables, and adjust as necessary for your setup. You will need to set these in every terminal launched during the tutorial below.
+
+  - On Linux:
+
+    ```bash
+    export SHLS_ROOT_DIR="<SMARTHLS_INSTALLATION_DIRECTORY>/SmartHLS"
+    export PATH="$SHLS_ROOT_DIR/dependencies/python:$PATH"
+    export BOARD_IP="<YOUR ICICLE KIT BOARD IP HERE>"
+    export JTAG_HOST="<YOUR JTAG HOST IP HERE>" # For local JTAG debug, use 127.0.0.1
+    export PROGRAMMER_ID="<YOUR PROGRAMMER ID HERE>" # Available from FPExpress
+    ```
+
+  - On Windows PowerShell, please use either forward slashes (/) or double backslashes (\\\\):
+
+    ```powershell
+    $env:SHLS_ROOT_DIR = "<SMARTHLS_INSTALLATION_DIRECTORY>/SmartHLS"
+    $env:PATH = "$env:SHLS_ROOT_DIR/dependencies/python;$env:PATH"
+    $env:BOARD_IP="<YOUR ICICLE KIT BOARD IP HERE>"
+    $env:JTAG_HOST="<YOUR JTAG HOST IP HERE>" # For local JTAG debug, use 127.0.0.1
+    $env:PROGRAMMER_ID="<YOUR PROGRAMMER ID HERE>" # Available from FPExpress
+    ```
+
+    - **KNOWN ISSUE**: On Windows, SmartHLS includes Python 3 with the binary named `python.exe`. However, a TCL script in the SmartHLS 2024.2 installation explicitly calls `python3`, which does not exist. To run the instrumentation example on Windows, copy the file as follows:
+
+    ```powershell
+    cp "$env:SHLS_ROOT_DIR/dependencies/python/python.exe" "$env:SHLS_ROOT_DIR/dependencies/python/python3.exe"
+    ```
+
+**NOTE**: The `JTAG_HOST` variable can be set to `127.0.0.1` if the machine that the board is connected to is the same as the machine where the project is being compiled and debugged.
+
+## Explanation of the Example Design
+
+We have created a simple, yet general example that describes a streaming design pattern to showcase how the SmartHLS Automatic On-Chip Instrumentation feature works. The following is a block diagram of the example architecture where the red dots represent the instrumentation probes that are automatically inserted:
+
+![alt text](assets/example_design.drawio.png)
+
+A typical pipeline starts with a data `producer()`, and the data then goes through a series of processing functions. In this case, the `fifoToFifo()` function just forwards the data, but in a real design it could be any stream processing function. Finally, the data is read by the `consumer()`. The `producer()` starts generating a continuous data sequence when the RISC-V CPU sends the `go = 1` signal and stops when `go = 0`.
+Each instance of the `fifoToFifo()` function, as well as the `consumer()` function, has a `delay` argument that artificially creates backpressure in the dataflow and causes the FIFOs to fill up to a level proportional to the delays. The delays for the pipeline are passed as command-line arguments to the RISC-V binary (`.elf` file). Once the RISC-V binary is running, the user can press `Ctrl + C` to send the `go = 0` signal to the HLS module and stop the execution of the program.
+
+With this example, users can see on-chip when the RISC-V CPU writes the `go` and `delay` arguments to the HLS module by looking at the `AXI target` ports, and can see how data flows through the FIFOs and how they fill up.
+
+## About the Automatic On-Chip Instrumentation Flow
+
+From the user perspective, the flow is simple compared to manual instrumentation: just enable the instrumentation flow and adjust the instrumentation parameters as needed.  The rest is handled by SmartHLS.
+
+Here is a high-level diagram of the Automatic On-Chip Instrumentation flow:
+
+![alt text](assets/auto_instrumentation_flow.drawio.png)
+
+The process follows these steps:
+
+1. *Initial Phase*:
+
+   - User writes C++ code.
+   - SmartHLS converts the C++ code to Verilog and generates an initial instrumentation specification (`instrument_conf.json`).
+   - Users can optionally customize the instrumentation configuration.
+
+2. *High-Level Instrumentation Phase*:
+
+   - SmartHLS integrates the generated Verilog module into the Icicle Kit Libero reference design and runs RTL synthesis. The resulting netlist is not yet instrumented.
+   - SmartHLS High-Level Instrumentor then extracts information from the uninstrumented netlist and from `instrument_conf.json` to generate TCL scripts for Synplify to actually instrument the design.
+
+3. *Implementation Phase*:
+
+   - Synplify & Libero apply the generated instrumentation constraints (`identify.idc` file) to generate an instrumented netlist.
+   - Libero completes Place-and-Route and generates the bitstream.
+
+4. *Debugging and Monitoring Phases*:
+
+   Two operational modes are available to capture data from the instrumented design:
+
+   - **Debugging Mode**: Users can set triggers for specific conditions and use Identify to capture data. More interactive.
+   - **Monitoring Mode**: Automatic periodic data capture that always triggers, without conditions.
+
+5. *Visualization Phase*:
+
+   - The Update Wave Viewer script processes the captured data (`.vcd` files) and converts it to viewer-specific formats (e.g., `.wlf` for ModelSim).
+   - SmartHLS High-Level Instrumentor generates TCL scripts for optimal signal display in ModelSim:
+     - Groups signals by top-level
+     - Configures hex notation for address/data buses
+     - Displays FIFO occupancy levels as analog waveforms
+
+## Instrument and Compile
+
+### Instrumenting the design
+
+First, the Automatic On-Chip Instrumentation feature is enabled in the project by the following line in the `Makefile`:
+
+```Makefile
+HLS_INSTRUMENT_ENABLE=1
+```
+
+In a new terminal, remove stale files by running:
+
+```bash
+shls clean
+```
+
+Then run the following command to generate a file called `instrument_conf.json`.
+
+```bash
+shls -a instrument_init
+```
+
+This command will automatically run `shls hw` first to convert the C++ code to Verilog,
+since this is a prerequisite for generating the `instrument_conf.json` file. This is what `instrument_conf.json` will look like:
+
+```json
+{
+    "modules": {
+        "hlsModule": {
+            "log_level": "2",
+            "fifo_log_level": "0"
+        }
+    },
+    "dashboard": {
+        "max_iterations": -1,
+        "show_markers": 1,
+        "monitoring_mode": 0,
+        "waveform_period": "10"
+    },
+    "iice_options": {
+        "sample_buffer_depth": 1024,
+        "iice_name": ""
+    }
+}
+```
+
+`hlsModule` is the name of the top-level function defined in the `main.cpp` file:
+
+```cpp
+void hlsModule(volatile unsigned char& go,
+              unsigned long long int delay1,
+              unsigned long long int delay2,
+              unsigned long long int delay3,
+              unsigned long long int delay4) {
+    #pragma HLS function dataflow top
+   ...
+}
+```
+
+A full explanation of the parameters of `instrument_conf.json` is located in the [User Guide](https://onlinedocs.microchip.com/oxy/GUID-AFCB5DCC-964F-4BE7-AA46-C756FA87ED7B-en-US-13/GUID-0BA4F982-F732-459D-8CAB-C02B0E92879F.html#GUID-0BA4F982-F732-459D-8CAB-C02B0E92879F__GUID-F622374A-37E3-440B-922A-7980536D3130).
+
+**NOTE:** Make sure to clean your project and re-run `shls -a instrument_init` if you modify the top-level modules of your design, for example, if you want to add a new top-level function.
+
+Now, let's change the log levels related to `hlsModule()`. A lower log level means fewer signals will be instrumented, which in turn saves resources. The same property applies to the FIFO log level. Let's change both `log_level` and `fifo_log_level` to 3.
+
+```json
+"hlsModule": {
+            "log_level": "3",
+            "fifo_log_level": "3"
+        }
+```
+
+In this example, we only demonstrate a `log_level` of 3 and a `fifo_log_level` of 3. A description of each log level is located in the [User Guide](https://onlinedocs.microchip.com/oxy/GUID-AFCB5DCC-964F-4BE7-AA46-C756FA87ED7B-en-US-13/GUID-0BA4F982-F732-459D-8CAB-C02B0E92879F.html#GUID-0BA4F982-F732-459D-8CAB-C02B0E92879F__GUID-4D55BBE5-2C4C-4533-BA42-68622188A166).
+
+### Compile & Program Hardware
+
+Next, run Synthesis and Place-and-Route. This can be done with the following command:
+
+```bash
+shls -a soc_accel_proj_pnr
+```
+
+Now program the FPGA with the instrumented bitstream file (`hls_output/soc/designer/MPFS_ICICLE_KIT_BASE_DESIGN/Icicle_SoC.job`). You can use the command line to do this (please make sure you have set the `PROGRAMMER_ID` environment variable):
+
+```bash
+shls soc_accel_proj_program
+```
+
+Alternatively, you can use FlashPro Express. If you do, please make sure you close FPExpress after flashing the bitstream, as it may interfere with the debugging process.
+
+At this point the FPGA has been programmed with the instrumented design. Now let's compile the software.
+
+### Compiling the Software
+
+You can now cross-compile the `main()` program for the RISC-V CPU by typing:
+
+```bash
+shls -a soc_sw_compile_accel
+```
+
+Then copy the binary (`.elf` file) to the board:
+
+On Linux:
+
+```bash
+scp hls_output/auto_instrument.accel.elf root@$BOARD_IP:./
+```
+
+On Windows PowerShell:
+
+```powershell
+scp hls_output/auto_instrument.accel.elf root@$env:BOARD_IP:./
+```
+
+Do *NOT* run the `auto_instrument.accel.elf` program on-board yet. Let's first arm the trigger in Identify in the next section.
+
+## Part 1: Debugging Mode
+
+### Connecting to the JTAG Cable
+
+To start debugging, we first need to connect to the Icicle Kit board via JTAG. On the `JTAG_HOST` machine (the one your board is connected to), launch a new terminal and start the Actel JTAG server. You can choose any unoccupied port; we'll use `57123`, but please remember which port you use.
+
+```bash
+acteljtag -p 57123
+```
+
+**NOTE:** On Windows, if you see an error upon running this command, you may have to use the fully qualified path of `acteljtag`. You can find this in Windows PowerShell using:
+
+```powershell
+Get-Command acteljtag
+```
+
+**NOTE:** Keep the `acteljtag` server terminal open, as the server may occasionally get disconnected and need to be started again.
+
+Now open an interactive shell for Identify Debugger:
+
+On Linux, run:
+
+```bash
+identify_debugger_shell -licensetype identdebugger_actel -shell  hls_output/soc/synthesis/MPFS_ICICLE_KIT_BASE_DESIGN_syn.prj
+```
+
+On Windows, run:
+
+```powershell
+identify_debugger_console -licensetype identdebugger_actel  hls_output/soc/synthesis/MPFS_ICICLE_KIT_BASE_DESIGN_syn.prj
+```
+
+Then connect to the JTAG server using the following commands (make sure to use the same port number as before):
+
+```tcl
+server set -addr $::env(JTAG_HOST) -port 57123 -cabletype Microsemi_BuiltinJTAG
+server start
+com cableoption Microsemi_BuiltinJTAG_port $::env(PROGRAMMER_ID)
+com check
+```
+
+### Triggering and Capturing Data
+
+Next, we'll pick a signal to trigger on. As a result of the instrumentation process, you should see that the `Identify Design Constraints` file (`hls_output/soc/synthesis/identify.idc`) has been automatically generated. It contains all the signals that are being instrumented. For example, you may notice the line:
+
+```tcl
+{/FIC_0_PERIPHERALS_1/hlsModule_top_0/hlsModule_inst/hlsModule_BB_0_fifo1_inst/genblk1/fwft_fifo_bram_inst/empty}
+```
+
+This signal indicates whether or not `fifo1` is empty. We can set a trigger for when this signal transitions from high to low, which indicates that `fifo1` is no longer empty. When this happens, the trigger tells Identify to start recording sample data every cycle until the sample buffer is full (see `sample_buffer_depth` in `instrument_conf.json`). The captured data can then be visualized in ModelSim. To do this, run the following command in the Identify shell:
+
+```tcl
+watch enable -language verilog  {/FIC_0_PERIPHERALS_1/hlsModule_top_0/hlsModule_inst/hlsModule_BB_0_fifo1_inst/genblk1/fwft_fifo_bram_inst/empty} {1'b1} {1'b0}
+```
+
+Finally, arm the trigger:
+
+```tcl
+run -iice {IICE_auto_instrument}
+```
+
+Now wait until you see the following in the shell:
+
+```tcl
+DI179 IICE 'IICE_auto_instrument' configured. Waiting for trigger.
+% Running.
+Running...
+```
+
+This will wait until the `empty` signal of `fifo1` becomes low. For that to happen, we need to run the `auto_instrument.accel.elf` binary, compiled earlier, on the board.
+
+### Running the Software
+
+Now, to run the design on the board, open an `ssh` session to the Icicle Kit board:
+
+```bash
+ssh root@$BOARD_IP
+```
+
+Now run the software binary that was copied to the board earlier. The four arguments are the delays for the pipeline (`delay1` to `delay4`). This will cause `fifo1` to become non-empty, which will, in turn, fire the trigger in Identify.
+
+```bash
+./auto_instrument.accel.elf 0 0 0 0
+```
+
+You should see in the Identify shell that the trigger has been activated. Let's now write the captured data to a `.vcd` file. First press ENTER, and then execute:
+
+```console
+write vcd "IICE_auto_instrument.vcd" -iice IICE_auto_instrument
+```
+
+Now, in a new terminal window, launch ModelSim by running:
+
+```console
+vsim -do hls_output/scripts/instrument/vsim_keyboard_shortcut
+```
+
+Now, open the ModelSim window and press Ctrl + R to refresh.
+
+You should see the signals for the FIFOs arranged and grouped in an intuitive manner. You can expand the `User_Defined_FIFOs` group to see the signals for the FIFOs in the design. For example, here are the grouped signals for `fifo1` (after toggling on leaf names):
+
+![alt text](assets/wave_template_grouping.png)
+
+#### Exercise 1 - Examine the Submodule Delays
+
+Let's take a look at the `empty` and `write_data` signals for `fifo1`, and compare them to their counterparts for `fifo2`, `fifo3`, and `fifo4`. Since we ran the HLS module with all delays set to `0`, we expect the time between two consecutive FIFOs becoming non-empty to be very small. Place a cursor at the falling edge of the `empty` signal of each FIFO. You may have to zoom in a little to get it right. For clarity, we'll remove all the other signals for now, so that we can see the four `empty` signals on top of each other.
+
+![alt text](assets/empty_signals_delay_0.png)
+
+Notice that the delay between the falling edges is 60ns. Since a clock cycle is 10ns, 60ns is 6 clock cycles. This offset comes from some of the control logic in the generated Verilog code. In general, expect:
+
+- A 6-cycle delay when the delay is 0
+- A (9 + N)-cycle delay when the delay is N, for some positive integer N.
+
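The rule of thumb above can be written down as a small helper, for instance to work out where to expect the next falling edge before placing cursors. This is only a sketch of the numbers observed in this tutorial: the function names are hypothetical, and the 6-cycle and (9 + N)-cycle offsets and the 10ns clock period are taken from the text above, not from any SmartHLS API.

```cpp
#include <cstdint>

// Clock period used in this example (CLOCK_PERIOD 10 in config.tcl), in ns.
constexpr std::uint64_t kClockPeriodNs = 10;

// Expected number of cycles between two consecutive FIFOs becoming non-empty,
// following the rule of thumb above (hypothetical helper, not a SmartHLS API):
//   delay == 0 -> 6 cycles
//   delay == N -> 9 + N cycles, for N > 0
constexpr std::uint64_t expectedOffsetCycles(std::uint64_t delay) {
    return delay == 0 ? 6 : 9 + delay;
}

// The same offset expressed in nanoseconds, as read off the ModelSim cursors.
constexpr std::uint64_t expectedOffsetNs(std::uint64_t delay) {
    return expectedOffsetCycles(delay) * kClockPeriodNs;
}

// Delay 0 corresponds to the 60ns measured between the falling edges.
static_assert(expectedOffsetNs(0) == 60, "6 cycles at 10ns per cycle");
```
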
+To confirm this, kill the executable running on the board by pressing `Ctrl + C`, arm the trigger again, and then rerun the program with all delays set to `1`.
+
+In the Identify shell:
+
+```console
+run -iice {IICE_auto_instrument}
+```
+
+Then, in the board SSH terminal:
+
+```console
+./auto_instrument.accel.elf 1 1 1 1
+```
+
+Then, refresh the ModelSim waveform (press `Ctrl + R`), and place the cursors again as before, on the falling edge of each `empty` signal. You should see that the difference is 9 + 1 = 10 clock cycles.
+
+![alt text](assets/empty_signals_delay_1.png)
+
+*Exercise*: Try with different delays to make sure the design works.
+
+#### Exercise 2 - Verify the Presence of the `0x0FF` Flag
+
+When the SIGINT signal (`Ctrl + C`) is sent to the executable running on the board, the `producer()` function should write `0x0FF` (wordplay for "OFF") to `fifo1` and terminate. When `0x0FF` is seen by the other functions in the pipeline, they too will terminate, effectively ceasing operation of the pipeline completely. Let's confirm this is actually the case.
+
+In the Identify shell, remove the existing trigger.
+
+```console
+watch disable {/FIC_0_PERIPHERALS_1/hlsModule_top_0/hlsModule_inst/hlsModule_BB_0_fifo1_inst/genblk1/fwft_fifo_bram_inst/empty}
+```
+
+Then, trigger on `0x0FF` being written to `fifo1`:
+
+```console
+watch enable -language verilog  {/FIC_0_PERIPHERALS_1/hlsModule_top_0/hlsModule_inst/hlsModule_BB_0_fifo1_inst/genblk1/fwft_fifo_bram_inst/write_data} {32'h0FF}
+```
+
+Now run the debugger
+
+```console
+run -iice {IICE_auto_instrument}
+```
+
+And then, kill the running executable on the board by pressing `Ctrl + C`.
+
+Then, in the Identify shell, the trigger should have been activated. Write to the `.vcd` file by pressing ENTER then running
+
+```console
+write vcd "IICE_auto_instrument.vcd" -iice IICE_auto_instrument
+```
+
+Then, refresh ModelSim (`Ctrl + R`).
+
+You should see the FIFOs becoming empty at the tail-end of the waves.
+
+Let's check what was written to `fifo1` just before it became empty. Look at the last falling edge of `write_en`, and place a cursor just before it (this is the location of the last write). You will have to zoom in.
+
+![alt text](assets/write_data.png)
+
+As expected, we see that the last written word is `0x0FF` (you can also see that the word written before is `0xFEED<SOME NUMBER>`, as we expect from the `producer()` function!).
+
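To make the expected behaviour concrete, here is a small software-only model of one `fifoToFifo()` stage, using `std::queue` in place of `hls::FIFO`. It is a sketch for reasoning about the waveform, not the generated hardware: the helper name `forwardUntilFlag` is hypothetical, the initial delay is left out, and the data words only mimic the `0xFEED0000 | counter` pattern written by `producer()`.

```cpp
#include <cstddef>
#include <cstdint>
#include <queue>

// End-of-program flag written by producer() and forwarded by every stage.
constexpr std::uint32_t kEndFlag = 0x0FF;

// Software-only model of the forwarding loop in fifoToFifo() (hypothetical
// helper): move words from `in` to `out` until the end flag is read, then
// forward the flag itself so that the next stage can terminate too.
// Returns the number of data words forwarded (the flag is not counted).
// If `in` runs dry before the flag is seen, the model simply stops, whereas
// the hardware would block on the read.
inline std::size_t forwardUntilFlag(std::queue<std::uint32_t>& in,
                                    std::queue<std::uint32_t>& out) {
    std::size_t forwarded = 0;
    while (!in.empty()) {
        const std::uint32_t word = in.front();
        in.pop();
        if (word == kEndFlag) {
            out.push(kEndFlag);  // propagate the flag downstream
            break;
        }
        out.push(word);
        ++forwarded;
    }
    return forwarded;
}
```

Chaining three such calls (`fifo1` to `fifo2`, `fifo2` to `fifo3`, `fifo3` to `fifo4`) leaves `0x0FF` as the last word in the final queue, which mirrors what the `write_data` waveform shows for `fifo1`.
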
+#### Exercise 3 - FIFO Occupancy Waveform
+
+A very important signal we have not yet mentioned is the `usedw` signal. This signal counts the number of elements in the FIFO at any given point in time. In this design, we expect the occupancy of each FIFO to grow during the initial delay stage and then remain constant until the executable running on the board is killed, at which point the FIFO will drain and become empty. We will examine the `usedw` signal while the FIFO occupancy is growing (when the HLS module is first started), and leave the examination of the `usedw` signal while the occupancy is shrinking as an exercise to the reader.
+
+First, remove the trigger set in the previous section.
+
+```console
+watch disable  {/FIC_0_PERIPHERALS_1/hlsModule_top_0/hlsModule_inst/hlsModule_BB_0_fifo1_inst/genblk1/fwft_fifo_bram_inst/write_data}
+```
+
+Then, add a trigger for `fifo1` becoming non-empty
+
+```console
+watch enable -language verilog  {/FIC_0_PERIPHERALS_1/hlsModule_top_0/hlsModule_inst/hlsModule_BB_0_fifo1_inst/genblk1/fwft_fifo_bram_inst/empty} {1'b1} {1'b0}
+```
+
+Then, run the debugger
+
+```console
+run -iice {IICE_auto_instrument}
+```
+
+Finally, run the executable on the board with all delays set to `12`. Then, in the Identify shell, write to the `.vcd` file by pressing ENTER and then running
+
+```console
+write vcd "IICE_auto_instrument.vcd" -iice IICE_auto_instrument
+```
+
+and then refresh ModelSim (`Ctrl + R`).
+
+First, let's format the `usedw` signal for `fifo1` in ModelSim. Right-click it and hover over `Format`, then select `Analog (automatic)`. Then, right-click it again and hover over `Radix`, then select `Decimal`.
+
+Now you can place a cursor anywhere in the waveform and see the occupancy level of `fifo1`. You will see that the occupancy at the pipeline's steady state is 20. This is because of the initial delay of 12 cycles plus the 9-cycle control logic delay of the submodules, for a total of 21 clock cycles. In these 21 clock cycles, 20 elements were written before the `fifoToFifo()` function started forwarding data from `fifo1` to `fifo2`.
+
+![alt text](assets/usedw_delay_12.png)
+
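The arithmetic behind that number can be sketched as follows. This is an extrapolation from the single data point above, so treat it as an assumption to check against your own captures: it supposes that the (9 + N)-cycle offset from Exercise 1 holds for any delay N > 0, that one element fewer than the number of offset cycles is written before forwarding starts, and that the result saturates at the FIFO depth (256 in `main.cpp`). The function name is hypothetical.

```cpp
#include <cstdint>

// Estimated steady-state occupancy of a FIFO whose reader waits `delay`
// cycles before forwarding (hypothetical helper, meant for delay > 0 only).
// Assumptions: (9 + delay) cycles pass before the reader starts, and the
// writer manages one element fewer than that number of cycles, as observed
// for delay = 12 (21 cycles, 20 elements). The FIFO cannot hold more than
// `depth` elements, so the estimate is capped there.
constexpr std::uint64_t estimatedOccupancy(std::uint64_t delay,
                                           std::uint64_t depth = 256) {
    return (8 + delay < depth) ? 8 + delay : depth;
}

// The one measured data point from this exercise.
static_assert(estimatedOccupancy(12) == 20, "21 cycles, 20 elements");
```
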
+*Exercise*: Play around with the delays and check that the occupancies of the other FIFOs make sense intuitively.
+
+Now close ModelSim and the Identify Debugger console.
+
+## Part 2: Monitoring Mode with ModelSim
+
+So far, anytime we wanted to test a new delay, we had to manually arm the Identify trigger, write to the `.vcd` file, and refresh ModelSim. This is an interactive approach: each new capture overwrites the previous waveform, and developers can inspect the result and experiment with different trigger conditions.
+
+In contrast, monitoring mode is a more long-term setting for seeing how signals change over prolonged periods of time, for example to see whether FIFOs slowly start growing and eventually overflow. In monitoring mode, the waveforms in ModelSim update automatically with every new capture of data from the instrumented design, and captures happen periodically, with no trigger condition.
+
+Monitoring mode consists of two parts:
+
+- a monitoring process
+- a visualizing process
+  - with a waveform (using ModelSim)
+  - with a bar plot (using Python)
+
+To enable Monitoring Mode, change
+
+```text
+set monitoring_mode 0
+```
+
+to
+
+```text
+set monitoring_mode 1
+```
+
+in `hls_output/scripts/instrument/update_vcd.tcl`. This tells the waveform-updating scripts that, when new data arrives from the debugger, it should be concatenated to the end of the existing waveform instead of replacing it.
+
+Then, open a new terminal and start a monitoring process that periodically captures the data:
+
+```console
+identify_debugger_shell -licensetype identdebugger_actel ./hls_output/scripts/instrument/monitor.tcl $PROGRAMMER_ID
+```
+
+Finally, open ModelSim in a new terminal for visualization:
+
+```console
+vsim -do hls_output/scripts/instrument/update_vcd.tcl
+```
+
+This will launch ModelSim again, but this time the waveform updates automatically (no need to press `Ctrl+R`) each time Identify provides newly captured data.
+
+Now close ModelSim and the Identify Debugger console.
+
+## Part 3: Monitoring Mode with FIFO Dashboard
+
+When writing C++ to design a hardware module, it may not be clear at first how deep your FIFOs need to be. On the one hand, if they are too shallow, data will back up and stall upstream stages (backpressure). On the other hand, if they are too deep, you may be overprovisioning and wasting area.
+
+The FIFO Monitoring Dashboard aims to show developers, in nearly real time, how full their FIFOs are as their program executes. It does so in an intuitive manner using a bar graph, where each bar represents a FIFO.
+
+Start the monitoring loop that will generate the periodic captures:
+
+```console
+identify_debugger_shell -licensetype identdebugger_actel hls_output/scripts/instrument/monitor.tcl $PROGRAMMER_ID
+```
+
+Finally, open a new terminal and launch the FIFO Monitoring Dashboard:
+
+```console
+shls -s instrument_monitor_fifos
+```
+
+Now, when you run the `auto_instrument.accel.elf` executable on the board, you should see the bar graph changing according to how full the FIFOs are. Try playing around with different delays and see how this affects the bar graph. The occupancies should match the values you see for the `usedw` signal in ModelSim, because the FIFO dashboard is simply a Python-based visualization of this signal!
+
+The bar graph should periodically change as it receives data from the monitoring process. The timestamp at the top of the plot indicates the time the plotted data was created by the monitoring process.
+
+You might notice a few shallow FIFOs on the left of the screen with very long names. These are infrastructure FIFOs, and are part of SmartHLS's AXI hardware design IP. The rightmost four FIFOs are the user-defined FIFOs, and are the ones described in the C++ code and the ones you'll want to pay attention to.
+
+**NOTE**: The FIFO Dashboard feature is currently experimental. Please use it with caution and expect minor issues.
+
+Here are some examples of the bar plot. You should confirm these make sense intuitively.
+
+- When the executable is not running on-board, all FIFOs are empty:
+
+  ![alt text](assets/fifo_monitoring_executable_not_running.png)
+
+- When the delays are 20, 40, 80, and 160, respectively:
+  
+  ```console
+  ./auto_instrument.accel.elf 20 40 80 160
+  ```
+
+  ![alt text](assets/fifo_monitoring_delays_20_40_80_160.png)
+
+- When the delays are 220, 20, 150, and 80, respectively:
+
+  ```console
+  ./auto_instrument.accel.elf 220 20 150 80
+  ```
+
+  ![alt text](assets/fifo_monitoring_delays_220_20_150_80.png)
+
+## Appendix A: Using the Identify GUI
+
+In this tutorial, we only demonstrated how to set triggers and configure the client-server connection using the Identify shell. However, all of this can also be done with the GUI. You can launch the GUI by opening a new terminal and running
+
+```console
+identify_debugger -licensetype identdebugger_actel hls_output/soc/synthesis/MPFS_ICICLE_KIT_BASE_DESIGN_syn.prj
+```
+
+Then, you can configure the client-server connection by opening the *`Debugger > Setup debugger...`* dialog and visiting the *`Communications`* tab to connect to the JTAG server.
+
+![alt text](assets/identify_gui_setup.png)
+![alt text](assets/identify_gui_connect.png)
+
+To trigger and run the debugger, find the signal you wish to trigger on, right-click it, hover over *`Triggering`*, and customize your trigger appropriately. Then, hit the big *`Run`* button at the upper-left side of the window. For example, here's how you can trigger on `fifo1` becoming non-empty and then run the debugger:
+
+![alt text](assets/identify_gui.png)
diff --git a/auto_instrument/assets/auto_instrumentation_flow.drawio.png b/auto_instrument/assets/auto_instrumentation_flow.drawio.png
new file mode 100644
index 0000000..590a708
Binary files /dev/null and b/auto_instrument/assets/auto_instrumentation_flow.drawio.png differ
diff --git a/auto_instrument/assets/empty_signals_delay_0.png b/auto_instrument/assets/empty_signals_delay_0.png
new file mode 100644
index 0000000..9d52cad
Binary files /dev/null and b/auto_instrument/assets/empty_signals_delay_0.png differ
diff --git a/auto_instrument/assets/empty_signals_delay_1.png b/auto_instrument/assets/empty_signals_delay_1.png
new file mode 100644
index 0000000..0e8b82a
Binary files /dev/null and b/auto_instrument/assets/empty_signals_delay_1.png differ
diff --git a/auto_instrument/assets/example_design.drawio.png b/auto_instrument/assets/example_design.drawio.png
new file mode 100644
index 0000000..807f598
Binary files /dev/null and b/auto_instrument/assets/example_design.drawio.png differ
diff --git a/auto_instrument/assets/fifo_monitoring_delays_20_40_80_160.png b/auto_instrument/assets/fifo_monitoring_delays_20_40_80_160.png
new file mode 100644
index 0000000..7157de1
Binary files /dev/null and b/auto_instrument/assets/fifo_monitoring_delays_20_40_80_160.png differ
diff --git a/auto_instrument/assets/fifo_monitoring_delays_220_20_150_80.png b/auto_instrument/assets/fifo_monitoring_delays_220_20_150_80.png
new file mode 100644
index 0000000..2b15abd
Binary files /dev/null and b/auto_instrument/assets/fifo_monitoring_delays_220_20_150_80.png differ
diff --git a/auto_instrument/assets/fifo_monitoring_executable_not_running.png b/auto_instrument/assets/fifo_monitoring_executable_not_running.png
new file mode 100644
index 0000000..936b02f
Binary files /dev/null and b/auto_instrument/assets/fifo_monitoring_executable_not_running.png differ
diff --git a/auto_instrument/assets/identify_gui.png b/auto_instrument/assets/identify_gui.png
new file mode 100644
index 0000000..b35eec8
Binary files /dev/null and b/auto_instrument/assets/identify_gui.png differ
diff --git a/auto_instrument/assets/identify_gui_connect.png b/auto_instrument/assets/identify_gui_connect.png
new file mode 100644
index 0000000..808dbbe
Binary files /dev/null and b/auto_instrument/assets/identify_gui_connect.png differ
diff --git a/auto_instrument/assets/identify_gui_setup.png b/auto_instrument/assets/identify_gui_setup.png
new file mode 100644
index 0000000..81e2425
Binary files /dev/null and b/auto_instrument/assets/identify_gui_setup.png differ
diff --git a/auto_instrument/assets/local_remote.drawio.png b/auto_instrument/assets/local_remote.drawio.png
new file mode 100644
index 0000000..04ff4c3
Binary files /dev/null and b/auto_instrument/assets/local_remote.drawio.png differ
diff --git a/auto_instrument/assets/usedw_delay_12.png b/auto_instrument/assets/usedw_delay_12.png
new file mode 100644
index 0000000..34d1a50
Binary files /dev/null and b/auto_instrument/assets/usedw_delay_12.png differ
diff --git a/auto_instrument/assets/wave_template_grouping.png b/auto_instrument/assets/wave_template_grouping.png
new file mode 100644
index 0000000..660d675
Binary files /dev/null and b/auto_instrument/assets/wave_template_grouping.png differ
diff --git a/auto_instrument/assets/write_data.png b/auto_instrument/assets/write_data.png
new file mode 100644
index 0000000..b8fb5b0
Binary files /dev/null and b/auto_instrument/assets/write_data.png differ
diff --git a/auto_instrument/config.tcl b/auto_instrument/config.tcl
new file mode 100644
index 0000000..c10ada1
--- /dev/null
+++ b/auto_instrument/config.tcl
@@ -0,0 +1,6 @@
+source $env(SHLS_ROOT_DIR)/examples/legup.tcl
+set_project PolarFireSoC MPFS250T Icicle_SoC
+
+# Set other parameters and constraints here
+# Refer to the user guide for more information: https://microchiptech.github.io/fpga-hls-docs/constraintsmanual.html
+set_parameter CLOCK_PERIOD 10
\ No newline at end of file
diff --git a/auto_instrument/main.cpp b/auto_instrument/main.cpp
new file mode 100644
index 0000000..84e7eec
--- /dev/null
+++ b/auto_instrument/main.cpp
@@ -0,0 +1,142 @@
+#include <iostream>
+#include <csignal>
+#include <hls/streaming.hpp>
+#include <hls/hls_alloc.h>
+
+// FIFO depths:
+#define FIFO1_DEPTH 256
+#define FIFO2_DEPTH 256
+#define FIFO3_DEPTH 256
+#define FIFO4_DEPTH 256
+
+//------------------------------------------------------------------------------
+// Write data to `fifo` at a rate of 1 element per clock-cycle.
+// Arguments: 
+//  go: controls the loop execution (1 = run, 0 = stop)
+//  fifo: a reference to an hls::FIFO of int type where elements are written.
+void producer(volatile unsigned char& go, hls::FIFO<int>& fifo) {
+    short int counter = 0b0;
+    #pragma HLS loop pipeline
+    while (go) {
+        // We write twice to the FIFO to overlap the loading of the "go" variable.
+        // Even though the II=2 for this function we effectively write one word
+        // every cycle, as if we had II=1
+        fifo.write(0xFEED0000 | counter++);
+        fifo.write(0xFEED0000 | counter++);
+    }
+    // Special flag that indicates the end of the program. This flag will propagate
+    // through the pipeline.
+    fifo.write(0x0FF); 
+}
+
+//------------------------------------------------------------------------------
+// Wait for the first element to appear in `inputFifo`, then wait for `delay` 
+// clock-cycles before starting to forward elements from `inputFifo` to `outputFifo`
+// at a rate of 1 element per clock-cycle.
+void fifoToFifo(hls::FIFO<int>& inputFifo, hls::FIFO<int>& outputFifo, unsigned long long int delay) {
+    #pragma HLS function replicate
+
+    // Wait until the first element appears in `inputFifo`.
+    #pragma HLS loop pipeline
+    while(inputFifo.empty());
+
+    // Induce a delay of `delay` clock-cycles. 
+    #pragma HLS loop pipeline
+    for (unsigned long long int i = 0; i < delay; i++) {
+        // `printf` is a nice way to avoid the loop being optimized away, however,
+        // it only executes in software, in hardware it is ignored.
+        printf("Stall...\n");  
+    }
+
+    // Forward data from `inputFifo` to `outputFifo` until the end-of-program flag is reached.
+    int inputElement;
+    #pragma HLS loop pipeline
+    while ((inputElement = inputFifo.read()) != 0x0FF) {
+        outputFifo.write(inputElement);
+    }
+    outputFifo.write(0x0FF);
+}
+
+//------------------------------------------------------------------------------
+// Wait for the first element to appear in `fifo`, then wait for `delay` clock-cycles
+// before starting to read the FIFO and dropping its contents.
+void consumer(hls::FIFO<int>& fifo, unsigned long long int delay) {
+    // Wait until the first element appears in `fifo`.
+    #pragma HLS loop pipeline
+    while(fifo.empty());
+
+    // Induce a delay of `delay` clock-cycles.
+    #pragma HLS loop pipeline
+    for (unsigned long long int i = 0; i < delay; i++) {
+        // `printf` is a nice way to avoid the loop being optimized away, however,
+        // it only executes in software, in hardware it is ignored.
+        printf("Stall...\n");
+    }
+
+    #pragma HLS loop pipeline
+    while(fifo.read() != 0x0FF); // Until the end-of-program flag is reached
+}
+ 
+
+//------------------------------------------------------------------------------
+// Design pipeline - top-level HLS module
+void hlsModule(volatile unsigned char& go,
+              unsigned long long int delay1,
+              unsigned long long int delay2,
+              unsigned long long int delay3,
+              unsigned long long int delay4) {
+    #pragma HLS function dataflow top
+    #pragma HLS interface default type(axi_target)
+
+    hls::FIFO<int> fifo1(FIFO1_DEPTH);
+    hls::FIFO<int> fifo2(FIFO2_DEPTH);
+    hls::FIFO<int> fifo3(FIFO3_DEPTH);
+    hls::FIFO<int> fifo4(FIFO4_DEPTH);
+ 
+    producer(go, fifo1);
+    fifoToFifo(fifo1, fifo2, delay1);
+    fifoToFifo(fifo2, fifo3, delay2);
+    fifoToFifo(fifo3, fifo4, delay3);
+    consumer(fifo4, delay4);
+}
+
+
+//------------------------------------------------------------------------------
+// When compiling for RISC-V CPU (i.e. not generating hardware)
+#ifndef __SYNTHESIS__
+#include "hls_output/accelerator_drivers/auto_instrument_accelerator_driver.h"
+
+// The virtual base address for the HLS module in the RISC-V memory.
+// This is initialized by the hlsModule_setup() function
+void* virtualAddress;
+
+// Signal handler for SIGINT. Writes 0 to the `go` argument, which effectively 
+// "stops" the HLS module.
+void reset(int signal) {
+    printf("\nCaught SIGINT (Ctrl + C). Stopping the HLS module.\n");
+    unsigned char go = 0;
+    hlsModule_memcpy_write_go(&go, 1, virtualAddress);
+}
+ 
+int main(int argc, char** argv) {
+    if (argc != 5) {
+        printf("usage: %s delay1 delay2 delay3 delay4\n", argv[0]);
+        exit(-1);
+    }
+
+    // The following code uses driver functions to perform the following
+    // * Set up the virtual address for the on-chip memory
+    // * Write 1 to `go` and write all delay values (this will launch the accelerator)
+    virtualAddress = hlsModule_setup();
+    if (virtualAddress == NULL) {
+        printf("%s: Error: Could not set up virtual address.\n", argv[0]);
+        exit(-1);
+    }
+    unsigned char go = 1;
+    signal(SIGINT, reset);
+    printf("Starting the pipeline. Send SIGINT (Ctrl + C) anytime to stop the hardware accelerator.\n");
+    hlsModule_write_input_and_start(&go, atoi(argv[1]), atoi(argv[2]), atoi(argv[3]), atoi(argv[4]), virtualAddress);
+    hlsModule_join_and_read_output(virtualAddress);
+    hlsModule_teardown();
+}
+#endif // __SYNTHESIS__
\ No newline at end of file
diff --git a/risc-v-demo/precompiled/setup.sh b/risc-v-demo/precompiled/setup.sh
index 47ca936..ecb9b8d 100755
--- a/risc-v-demo/precompiled/setup.sh
+++ b/risc-v-demo/precompiled/setup.sh
@@ -19,8 +19,8 @@ eval $cmd
 
 # Now copy the necessary files to the board
 cmd="scp -r $SSH_OPT "
-cmd+=" shls_sw_dependencies/ffmpeg4.4-riscv_64-linux"
-cmd+=" shls_sw_dependencies/opencv4.5.4-riscv_64-linux"
+cmd+=" shls_sw_dependencies/ffmpeg4.4-riscv_64"
+cmd+=" shls_sw_dependencies/opencv4.5.4-riscv_64"
 cmd+=" root@$BOARD_IP:/usr/local/shls_sw_dependencies"
 echo $cmd
 eval $cmd
diff --git a/risc-v-demo/precompiled/shls_sw_dependencies/download_precompiled_libraries.sh b/risc-v-demo/precompiled/shls_sw_dependencies/download_precompiled_libraries.sh
index 38e4336..4d19f38 100755
--- a/risc-v-demo/precompiled/shls_sw_dependencies/download_precompiled_libraries.sh
+++ b/risc-v-demo/precompiled/shls_sw_dependencies/download_precompiled_libraries.sh
@@ -1,3 +1,4 @@
+SHLS_ROOT_DIR=$(dirname "$(dirname "$(which shls)")")
 echo $SHLS_ROOT_DIR
 
 rsync -av --exclude='*.tar.gz' $SHLS_ROOT_DIR/smarthls-library/external/vision/precompiled_sw_libraries/* ./
diff --git a/risc-v-demo/sev-kit-reference-design/MPFS_SEV_KIT_REFERENCE_DESIGN.tcl b/risc-v-demo/sev-kit-reference-design/MPFS_SEV_KIT_REFERENCE_DESIGN.tcl
index b5e5f91..a91b1ec 100644
--- a/risc-v-demo/sev-kit-reference-design/MPFS_SEV_KIT_REFERENCE_DESIGN.tcl
+++ b/risc-v-demo/sev-kit-reference-design/MPFS_SEV_KIT_REFERENCE_DESIGN.tcl
@@ -109,7 +109,7 @@ if { [file exists $project_dir/$project_name.prjx] } {
     set mipicsi2rxdecoderPF_version     4.4.0
     set PF_CCC_version                  2.2.220
     set PF_CLK_DIV_version              1.0.103
-    set PF_IOD_GENERIC_RX_version       2.1.110
+    set PF_IOD_GENERIC_RX_version       2.1.113
     set PF_OSC_version                  1.0.102
     set RGBtoYCbCr_version              4.4.0
     set PF_XCVR_REF_CLK_version         1.0.103
diff --git a/risc-v-demo/sev-kit-reference-design/script_support/additional_configurations/smarthls/hls_pipeline/Makefile b/risc-v-demo/sev-kit-reference-design/script_support/additional_configurations/smarthls/hls_pipeline/Makefile
index 7b6b5cb..d7d52c4 100644
--- a/risc-v-demo/sev-kit-reference-design/script_support/additional_configurations/smarthls/hls_pipeline/Makefile
+++ b/risc-v-demo/sev-kit-reference-design/script_support/additional_configurations/smarthls/hls_pipeline/Makefile
@@ -9,6 +9,6 @@ USER_CXX_FLAG+=-DHLS_DBG_PRINTF
 USER_CXX_FLAG+=-DHLS_PROFILER_SAMPLES=100
 USER_CXX_FLAG+=-DHLS_PROFILER_ENABLE
 
-BOARD_IP=10.245.245.184
+BOARD_IP?=192.168.1.2
 
 include $(LEVEL)/Makefile.common
\ No newline at end of file
diff --git a/risc-v-demo/sev-kit-reference-design/script_support/additional_configurations/smarthls/hls_pipeline/compile_and_copy.sh b/risc-v-demo/sev-kit-reference-design/script_support/additional_configurations/smarthls/hls_pipeline/compile_and_copy.sh
index 0588e9f..2d14dbc 100755
--- a/risc-v-demo/sev-kit-reference-design/script_support/additional_configurations/smarthls/hls_pipeline/compile_and_copy.sh
+++ b/risc-v-demo/sev-kit-reference-design/script_support/additional_configurations/smarthls/hls_pipeline/compile_and_copy.sh
@@ -24,8 +24,8 @@ rm -f $ELF
 HLS_DRIVER_PATH="./hls_output/accelerator_drivers"
 
 # Extra defines to include shared opencv/ffmpeg libraries below:
-OPENCV_PATH=$EXAMPLE_ROOT_FOLDER/precompiled/shls_sw_dependencies/opencv4.5.4-$arch-linux
-FFMPEG_PATH=$EXAMPLE_ROOT_FOLDER/precompiled/shls_sw_dependencies/ffmpeg4.4-$arch-linux
+OPENCV_PATH=$EXAMPLE_ROOT_FOLDER/precompiled/shls_sw_dependencies/opencv4.5.4-$arch
+FFMPEG_PATH=$EXAMPLE_ROOT_FOLDER/precompiled/shls_sw_dependencies/ffmpeg4.4-$arch
 
 LD_LIBRARY_PATH=$OPENCV_PATH/lib:$FFMPEG_PATH/lib:$LD_LIBRARY_PATH
 PATH=$FFMPEG_PATH/bin:$PATH