…use the Dask Dashboard to monitor computations?


With the power to access large amounts of data and run complex computations on our HPC cluster, it’s important to wield that power responsibly. To help you do so — and ensure your tasks run efficiently — the Dask Dashboard lets you monitor your computations in real time. You can access the dashboard by following these steps:

First, import dask_client at the beginning of your notebook, as in the following cell:

from sdc import dask_client

dask_client

[INFO] Trying to allocate requested resources on the cluster (timeout after 5 minutes)...
No recent SLURM jobs found for dask-workers
[INFO] Cluster is ready for computation! :) Dask dashboard available via 'localhost:8806'

Client

Client-b43bc688-b33c-11f0-9325-a4bf0187f99a

Connection method: Cluster object	Cluster type: dask_jobqueue.SLURMCluster
Dashboard: http://172.18.10.2:8806/status

Cluster Info

SLURMCluster

bc63c432

Dashboard: http://172.18.10.2:8806/status	Workers: 8
Total threads: 80	Total memory: 80.00 GiB

Scheduler Info

Scheduler

Scheduler-5c37f0a0-11ea-4d90-897d-3c5e82a16f45

Comm: tcp://172.18.10.2:39483	Workers: 0
Dashboard: http://172.18.10.2:8806/status	Total threads: 0
Started: Just now	Total memory: 0 B

Workers

Worker: SLURMCluster-0-0

Comm: tcp://172.18.1.16:42099	Total threads: 10
Dashboard: http://172.18.1.16:40613/status	Memory: 10.00 GiB
Nanny: tcp://172.18.1.16:36023
Local directory: /scratch/du23yow/dask-scratch-space/worker-jenz25o4

Worker: SLURMCluster-0-1

Comm: tcp://172.18.1.16:36359	Total threads: 10
Dashboard: http://172.18.1.16:44129/status	Memory: 10.00 GiB
Nanny: tcp://172.18.1.16:40751
Local directory: /scratch/du23yow/dask-scratch-space/worker-h5fbr702

Worker: SLURMCluster-0-2

Comm: tcp://172.18.1.16:39469	Total threads: 10
Dashboard: http://172.18.1.16:34035/status	Memory: 10.00 GiB
Nanny: tcp://172.18.1.16:42437
Local directory: /scratch/du23yow/dask-scratch-space/worker-5gfdaa57

Worker: SLURMCluster-0-3

Comm: tcp://172.18.1.16:37579	Total threads: 10
Dashboard: http://172.18.1.16:35969/status	Memory: 10.00 GiB
Nanny: tcp://172.18.1.16:38417
Local directory: /scratch/du23yow/dask-scratch-space/worker-ly2rwki5

Worker: SLURMCluster-0-4

Comm: tcp://172.18.1.16:43089	Total threads: 10
Dashboard: http://172.18.1.16:32977/status	Memory: 10.00 GiB
Nanny: tcp://172.18.1.16:34907
Local directory: /scratch/du23yow/dask-scratch-space/worker-vtpxm010

Worker: SLURMCluster-0-5

Comm: tcp://172.18.1.16:44769	Total threads: 10
Dashboard: http://172.18.1.16:38467/status	Memory: 10.00 GiB
Nanny: tcp://172.18.1.16:37893
Local directory: /scratch/du23yow/dask-scratch-space/worker-uhqoj0_i

Worker: SLURMCluster-0-6

Comm: tcp://172.18.1.16:37931	Total threads: 10
Dashboard: http://172.18.1.16:38941/status	Memory: 10.00 GiB
Nanny: tcp://172.18.1.16:40059
Local directory: /scratch/du23yow/dask-scratch-space/worker-pifjuhh0

Worker: SLURMCluster-0-7

Comm: tcp://172.18.1.16:36985	Total threads: 10
Dashboard: http://172.18.1.16:40659/status	Memory: 10.00 GiB
Nanny: tcp://172.18.1.16:44819
Local directory: /scratch/du23yow/dask-scratch-space/worker-zz6rfwoh

As you can see, the dask_client provides a link to the dashboard and a button to “Launch dashboard in JupyterLab”. Neither option works yet, because we first need to set up port forwarding from our local machine to the remote HPC cluster. The key here is the port number shown in the link: 8806 in this example (it will be different for you!). You need this port number to set up the port forwarding, which creates a secure tunnel from your local machine to the remote server and lets you open the dashboard in your local web browser.
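If you prefer not to read the port off the link by eye, it can also be extracted programmatically. The following is a minimal sketch, not part of the sdc package: the helper name dashboard_port is hypothetical, and the commented-out last line assumes that dask_client behaves like a dask.distributed Client, which exposes the link as its dashboard_link attribute.

```python
from urllib.parse import urlparse


def dashboard_port(dashboard_link: str) -> int:
    """Return the port number contained in a dashboard URL."""
    port = urlparse(dashboard_link).port
    if port is None:
        raise ValueError(f"No port found in {dashboard_link!r}")
    return port


# Link copied from the example output above; yours will differ.
print(dashboard_port("http://172.18.10.2:8806/status"))  # prints 8806

# Assumption: dask_client is a dask.distributed Client with a
# `dashboard_link` attribute. In that case you could run:
# print(dashboard_port(dask_client.dashboard_link))
```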

Setting up port forwarding in VS Code is quite straightforward. You can follow these steps:

1. Open the Command Palette in VS Code by pressing Ctrl+Shift+P (or Cmd+Shift+P on Mac).
2. Type “Port” and select the “Forward a Port” option, then click the “Forward a Port” button.
3. Enter the port number you obtained from the dask_client (e.g., 8806) and press Enter. VS Code should automatically fill in the “Forwarded Address” field with localhost:<port_number>.

That’s it! After setting up the port forwarding, you can access the Dask Dashboard by opening your web browser and navigating to http://localhost:8806 (replace 8806 with the port number you were assigned). You should now see the Dask Dashboard, which provides visualizations and metrics about your Dask computations, such as task progress, memory usage, and worker status.
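If the page does not load, it can help to check whether the forwarded port accepts connections at all. The sketch below uses only the Python standard library and is meant to be run on your local machine, not in the notebook on the cluster. The helper name port_is_open is hypothetical, and 8806 is just the example port from this page.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Replace 8806 with the port number you forwarded.
print(port_is_open("localhost", 8806))
```

A result of False suggests that the port forwarding is not active (or that a different port was forwarded). A result of True only tells you that something is listening on that port, not that it is the dashboard.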

For more information on how to interpret the dashboard and use it effectively, you can refer to the Dask documentation on the dashboard.

The following image shows what the Dask Dashboard looks like while a computation is running. A few key features and observations:

On the left side, you have an overview of the “workers” that are executing your tasks and how much memory and CPU they are using. In this example, there are 2 workers and neither of them is fully utilized, which is a good sign. The x-axis of the “Bytes stored per worker” panel shows that each worker is using only a fraction of its available 8 GB of memory.
- IMPORTANT: If you don’t see any bars in this panel, you don’t have any active workers! Either the workers were never started properly, or all of them have died without new workers taking their place. In such cases, restart your notebook kernel (“Restart” button in the top panel of your notebook) and re-run the cells that set up the Dask client and the computations you want to execute. Otherwise, you might end up running your computations on the login nodes of the HPC cluster, which is not allowed!
In the center, you have a “task stream” panel that shows the progress of your tasks over time. Each worker can use multiple threads (CPU cores) to execute tasks concurrently. In this example, each worker has 8 threads, so we see 16 horizontal lanes in total, one for each thread of the two workers. The colored bars represent the tasks being executed. Tasks are processed in parallel across all available threads with minimal idle time (the white gaps between the bars), which is again a good sign of efficient resource usage!
Below the task stream, you have a “progress” panel that shows the overall progress of your computation, split into multiple subtasks. While the tasks are being executed, the progress bars fill up, indicating how much of each subtask has been completed. In this example, the progress bars fill up smoothly, which matches the efficient execution shown in the task stream above. If tasks take a long time to complete or there are long delays between tasks, your computation may not be well optimized and could benefit from further tuning (e.g., changing chunk sizes or optimizing algorithms).

Dask dashboard
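The check for active workers described in the IMPORTANT note above can also be done from the notebook, without the dashboard. The following is a minimal sketch: the helper summarize_workers and the worker addresses in the example are hypothetical, and the commented-out part assumes that dask_client is a dask.distributed Client, whose nthreads() method returns a mapping from worker address to thread count.

```python
def summarize_workers(nthreads: dict) -> tuple:
    """Return (worker count, total threads) from an address -> threads mapping."""
    return len(nthreads), sum(nthreads.values())


# Stand-in data shaped like the two-worker example described above
# (hypothetical addresses, 8 threads each).
example = {"tcp://10.0.0.1:40001": 8, "tcp://10.0.0.2:40002": 8}
n_workers, n_threads = summarize_workers(example)
print(n_workers, n_threads)  # prints: 2 16

# Assumption: dask_client is a dask.distributed Client. In that case:
# n_workers, n_threads = summarize_workers(dask_client.nthreads())
# if n_workers == 0:
#     print("No active workers: restart the kernel and set up the client again.")
```

The total of 16 threads corresponds to the 16 lanes visible in the task stream of the example image.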