-
Hey Delaunay, I'm working with an HPC team to test our systems. We're looking to run some models on our GPUs and figure out how scaling works for our multi-node setup. I didn't want to flood the issues board with questions, but I wanted to check that my understanding of milabench is correct. I'm using the non-dockerized version on my Slurm system, installed via the pip install guide.

Q1. When I'm running a multi-node benchmark like llm-lora-ddp-nodes in an sbatch/salloc session, what happens if the main/master process from my system.yaml file is interrupted? Is there some sort of cleanup mechanism if a node drops out mid-benchmark?

Q2. When using milabench system_slurm to generate my system.yaml config, what happens if the default SSH port 22 isn't used? In my current setup the nodes are not using port 22, and I find the generated config file ends up looking a bit off, so I have to tweak it manually. I get output like "couldn't resolve hostname" as well as "connection closed by port 22". Is it possible to adjust the code to work with non-standard SSH ports?

Q3. When running a multi-node benchmark on a 2-node x 8-GPU (A100) configuration, my understanding of the launch process is that milabench run launches a main benchmark process on each node, which then launches child processes on the GPUs, one process per GPU. My understanding comes from the image in the git repo. As a follow-up, do these benchmarks run independently, or do they coordinate with each other during a multi-node benchmark? From what I understand of llm-lora-ddp-nodes, PyTorch uses FSDP to distribute the model across the GPUs, and I'm assuming the processes coordinate their gradients.

Q4. Lastly, in the report, is the value of "n" in the columns the number of main processes launched in a multi-node benchmark? And what is the difference between the "perf" and "score" metrics? I looked through the code and noticed that only for the single-GPU benchmarks does it look like a weighted average, while on multi-node or multi-GPU benchmarks perf and score are the same for the most part.

Thanks for your time, and hopefully these are easy questions to answer.
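To illustrate Q2, the hand-edited config I end up with looks roughly like the sketch below. The field name I use for the SSH port is my own guess rather than something I found in the docs, which is part of why I'm asking:

```yaml
# Rough shape of my manually tweaked system.yaml (2 nodes, non-standard SSH port).
# "sshport" is a guessed field name, not confirmed against milabench's schema.
system:
  arch: cuda
  nodes:
    - name: node0
      ip: 10.0.0.10
      main: true
      user: benchuser
      sshport: 2222
    - name: node1
      ip: 10.0.0.11
      main: false
      user: benchuser
      sshport: 2222
```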
-
Milabench does nothing specific, it relies on
You can use the
Yes, milabench does something similar to the example below to launch multi-node experiments, assuming each node has 8 GPUs.
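A simplified sketch of the idea, not milabench's exact launcher code (`train.py` is a placeholder for the benchmark's entry point, and `$MAIN_ADDR` stands for the node marked `main: true`):

```bash
# Run on every node; one worker process per GPU (8 per node, 2 nodes = 16 ranks).
torchrun \
  --nnodes=2 \
  --nproc-per-node=8 \
  --rdzv-backend=c10d \
  --rdzv-endpoint="$MAIN_ADDR:29400" \
  train.py
```

The workers on all nodes join the same rendezvous, so they form one process group and synchronize gradients together rather than running independently.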
For clarity, we added
The score is a normalized "per-node" performance measure.
There is an ongoing discussion on how to handle the perf normalization.
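As a rough illustration of the idea only (not the exact formula used in the report code), the per-node normalization amounts to something like:

```python
# Illustrative only: turn an aggregate throughput into a "per-node" figure
# so runs of different sizes are comparable. The real report code weighs
# and aggregates benchmarks differently.
def per_node_score(total_items_per_second: float, n_nodes: int) -> float:
    return total_items_per_second / n_nodes

print(per_node_score(12800.0, 2))  # 6400.0
```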
-
Here's what I'm trying to achieve: I've created my own copy of a slurm.yaml file with some extra configurations, I want to test it with schedule.py, and ultimately have it call milabench/scripts/milabench_run.bash. How would I specify it using the:
Is there a mechanism to use custom profiles? I found the section of code responsible for grabbing the parameters from the slurm.yaml file. What would be the best practice for substituting the parameters required there?
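What I'm imagining is roughly the sketch below; the profile layout and helper names here are entirely my own, not milabench's actual code:

```python
# Hypothetical sketch: read a custom slurm profile and turn it into sbatch arguments.
# The profile layout below is an assumption, not milabench's real slurm.yaml schema.
import subprocess
import yaml

def sbatch_args_from_profile(path: str, profile_name: str) -> list[str]:
    with open(path) as f:
        profiles = yaml.safe_load(f)
    # e.g. profiles["my-2x8-a100"] -> ["--nodes=2", "--gpus-per-node=8", ...]
    return [f"--{key}={value}" for key, value in profiles[profile_name].items()]

args = sbatch_args_from_profile("slurm.yaml", "my-2x8-a100")
subprocess.run(["sbatch", *args, "milabench/scripts/milabench_run.bash"], check=True)
```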
-
Yet another question: my goal is just to check whether the auto-sizing is working properly. After reading the documentation, is it sufficient to set the capacity in my system.yaml and run the benchmark with the environment variable MILABENCH_SIZER_AUTO=1? Also, should I include multirun in my existing system.yaml file, and what is its purpose?

I've done some digging, and this is what I was able to piece together. I understand that the purpose of scaling.yaml is to adjust the batch size for a given benchmark, but I also noticed that in your example system.yaml you have three different things set up under multirun. Could you explain the design purpose and how to use them? Based on my reading of the code base, execution goes from run.py -> system.py -> the multirun() function -> apply_system -> then the actual benchmark launch. From what I understand, matrix lets you dynamically set up multiple configurations for the same benchmark you're about to run, auto enables dynamic batch resizing to fit your available VRAM (in multiples of 8), and batch size just forces a particular size.

One more unrelated question that's not tied to auto-sizing: when I run this job script on Slurm, what happens if I don't know ahead of time which nodes I'm being allocated, so I can't set main: true in our system.yaml file? Suppose I have a 6-node x 8-GPU config but I ask sbatch to give me 2 nodes at random to test. Is there a way to handle that case, since I can't mark a node as main before the allocation is known? I'm picturing something like the sketch after this question.
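For the last point, the workaround I have in mind is to generate the node list inside the job script once Slurm has allocated the nodes. The field names in the generated system.yaml are copied from my current config and may not match milabench's schema exactly, and I'm assuming `milabench run --system` is the right way to point at it:

```bash
#!/bin/bash
set -euo pipefail

# Nodes are only known once Slurm allocates them.
mapfile -t NODES < <(scontrol show hostnames "$SLURM_JOB_NODELIST")

{
  echo "system:"
  echo "  nodes:"
  for i in "${!NODES[@]}"; do
    echo "    - name: node${i}"
    echo "      ip: ${NODES[$i]}"
    # Arbitrarily promote the first allocated node to main.
    if [ "$i" -eq 0 ]; then echo "      main: true"; else echo "      main: false"; fi
    echo "      user: $USER"
  done
} > system.yaml

# Auto batch sizing via the documented env var.
MILABENCH_SIZER_AUTO=1 milabench run --system system.yaml
```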