Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to run Genie_Finetune.py #70

Open
xiang-xiang-zhu opened this issue Nov 6, 2023 · 1 comment
Open

Unable to run Genie_Finetune.py #70

xiang-xiang-zhu opened this issue Nov 6, 2023 · 1 comment

Comments

@xiang-xiang-zhu
Copy link

i run the following bash

OUT_DIR = "/Your/output/path"
DATA_PATH = "/Your/data/path"
DATA_NAME = "xsum_data"
PRETRAIN_CKPT_PATH = "/Your/pretrain_ckpt/path"


python -u -m torch.distributed.launch --nproc_per_node=4 --master_port=9421 \
./GENIE_main/Genie_Finetune.py \
--checkpoint_path=$OUT_DIR \
--model_channels 128 --in_channel 128 --out_channel 128 --vocab_size 30522 \
--config_name="bert-base-uncased" --token_emb_type="random" --model_arch="s2s_CAT" \
--diffusion_steps 2000 --predict_xstart --noise_schedule="sqrt" --training_mode="s2s" \
--schedule_sampler="loss-second-moment" --tgt_max_len 64 --src_max_len 512 --data_name=$DATA_NAME \
--data_path=$DATA_PATH \
--lr_anneal_steps 120000 --batch_size 64 --lr 5e-05 --warmup_steps 7200 --train_type="S2S_Diffusion" \
--eval_interval 200 --log_interval 200 --save_interva 20000 \
--pretrain_model_path=$PRETRAIN_CKPT_PATH

Then the system prompted me that I don't have mpi4py, and then I installed mpi4py

Collecting mpi4py
  Using cached http://mirrors.aliyun.com/pypi/packages/2e/1a/1393e69df9cf7b04143a51776727dd048586781bca82543594ab439e2eb4/mpi4py-3.1.5.tar.gz (2.5 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: mpi4py
  Building wheel for mpi4py (pyproject.toml) ... done
  Created wheel for mpi4py: filename=mpi4py-3.1.5-cp38-cp38-linux_x86_64.whl size=6024408 sha256=64ef1c54d03ecb2c862c4e57da02d6dd8d9e33673ad3948afafca08d60edfd64
  Stored in directory: /root/.cache/pip/wheels/9d/2a/7e/c6575a1d595c7d8cce796177f1b9827975c5b48b31e28f25b9
Successfully built mpi4py
Installing collected packages: mpi4py
Successfully installed mpi4py-3.1.5

When I try to run the program again, the program has no output and seems to be stuck somewhere in the program.

/root/miniconda3/lib/python3.8/site-packages/torch/distributed/launch.py:181: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use-env is set by default in torchrun.
If your script expects `--local-rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See 
https://pytorch.org/docs/stable/distributed.html#launch-utility for 
further instructions

  warnings.warn(

Can anyone help solve this problem? I don't know if mpi causes this problem. My operating system is ubuntu

@poonehmousavi
Copy link

I have the same problem.. any updates on this issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants