Created August 18, 2024 21:34
[root@tyler-a100-newimage-val instructlab]# nohup /root/bin/ilab.sh train --strategy lab-multiphase --phased-phase1-data /var/mnt/inststg1/instructlab/generated/knowledge_train_msgs_2024-08-18T15_57_14.jsonl --phased-phase2-data /var/mnt/inststg1/instructlab/generated/skills_train_msgs_2024-08-18T15_57_14.jsonl --phased-base-dir /var/mnt/inststg1/instructlab/phasedbasedir --phased-phase1-num-epochs 2 --phased-phase2-num-epochs 2 --phased-mt-bench-judge /var/mnt/inststg1/instructlab/models/prometheus-eval/prometheus-8x7b-v2.0/ --max-batch-len 10000 --max-seq-len 4096 --phased-phase1-effective-batch-size 128 --phased-phase2-effective-batch-size 3840 --enable-serving-output --gpus 8 --skip-user-confirm --model-path /var/mnt/inststg1/instructlab/models/granite-7b-starter1.1/ &
[root@tyler-a100-newimage-val instructlab]# cat nohup.out
time="2024-08-18T20:04:24Z" level=warning msg="The input device is not a TTY. The --tty and --interactive flags might not work properly"
You are using an aliased command, this will be deprecated in a future release. Please consider using `ilab model train` instead
Training Phase 1/2...
TrainingArgs for current phase: TrainingArgs(model_path='/var/mnt/inststg1/instructlab/models/granite-7b-starter1.1/', chat_tmpl_path='/opt/app-root/lib64/python3.11/site-packages/instructlab/training/chat_templates/ibm_generic_tmpl.py', data_path='/var/mnt/inststg1/instructlab/generated/knowledge_train_msgs_2024-08-18T15_57_14.jsonl', ckpt_output_dir='/var/mnt/inststg1/instructlab/phasedbasedir/phase1/checkpoints', data_output_dir='/var/mnt/inststg1/instructlab/.local/share/instructlab/internal', max_seq_len=4096, max_batch_len=10000, num_epochs=2, effective_batch_size=128, save_samples=0, learning_rate=2e-05, warmup_steps=25, is_padding_free=False, random_seed=42, checkpoint_at_epoch=True, mock_data=False, mock_data_len=0, deepspeed_options=DeepSpeedOptions(cpu_offload_optimizer=False, cpu_offload_optimizer_ratio=1.0, cpu_offload_optimizer_pin_memory=False, save_samples=None), disable_flash_attn=False, lora=None)
[2024-08-18 20:04:33,199] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
df: /var/mnt/inststg1/instructlab/.triton/autotune: No such file or directory
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
 [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
 [WARNING]  using untested triton version (2.3.1), only 1.0.0 is known to be compatible
INFO 2024-08-18 20:04:40,050 numexpr.utils:145: Note: detected 80 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
INFO 2024-08-18 20:04:40,051 numexpr.utils:148: Note: NumExpr detected 80 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 16.
INFO 2024-08-18 20:04:40,051 numexpr.utils:161: NumExpr defaulting to 16 threads.
INFO 2024-08-18 20:04:40,206 datasets:58: PyTorch version 2.3.1 available.
INFO 2024-08-18 20:04:40,465 root:611: eos: 32001, pad: 32002, system: 32003, user: 32004, assistant: 32005
Generating train split: 267 examples [00:00, 11032.42 examples/s]
tokenizing the dataset with /var/mnt/inststg1/instructlab/models/granite-7b-starter1.1/ tokenizer...
Map (num_proc=16): 100% 267/267 [00:00<00:00, 398.01 examples/s]
ten largest length percentiles:
Map (num_proc=16): 100% 267/267 [00:00<00:00, 1626.59 examples/s]
quantile 90th: 1116.4
quantile 91th: 1138.42
quantile 92th: 1165.36
quantile 93th: 1181.0400000000002
quantile 94th: 1210.08
quantile 95th: 1226.2999999999997
quantile 96th: 1278.36
quantile 97th: 1652.6199999999994
quantile 98th: 1689.72
quantile 99th: 1712.7599999999998
quantile 100th: 1734.0
at 4096 max sequence length, the number of samples to be dropped is 0
(0.00% of total)
quantile 0th: 255.0
quantile 1th: 284.66
quantile 2th: 288.32
quantile 3th: 295.88
quantile 4th: 301.0
quantile 5th: 303.0
quantile 6th: 318.84
quantile 7th: 320.62
quantile 8th: 322.28
quantile 9th: 324.94
quantile 10th: 327.6
at 20 min sequence length, the number of samples to be dropped is 0
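The length report above prints the upper (90th to 100th) and lower (0th to 10th) percentiles of tokenized sample length, then counts samples that fall outside the allowed range. A minimal sketch of an equivalent check, assuming linear-interpolation quantiles (NumPy's default rule; fractional values such as 1116.4 are consistent with interpolation, but the training library's own code may differ) and strict `>`/`<` comparisons. `quantile` and `length_report` are hypothetical helper names.

```python
import math


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an ascending list, q in [0, 1].

    Assumption: same rule as NumPy's default ("linear") method.
    """
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def length_report(lengths, max_len=4096, min_len=20):
    """Hypothetical helper mirroring the log's percentile and drop report."""
    vals = sorted(lengths)
    top = {p: quantile(vals, p / 100) for p in range(90, 101)}     # 90th..100th
    bottom = {p: quantile(vals, p / 100) for p in range(0, 11)}    # 0th..10th
    # Assumption: samples strictly longer/shorter than the limits are dropped.
    too_long = sum(1 for n in vals if n > max_len)
    too_short = sum(1 for n in vals if n < min_len)
    return top, bottom, too_long, too_short
```

In this run the 0th and 100th percentiles are 255 and 1734 tokens, so no sample exceeds 4096 or falls below 20, which matches the two "dropped is 0" lines.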
checking the validity of the samples... | |
Filter (num_proc=16): 100% 267/267 [00:00<00:00, 435.71 examples/s] | |
INFO 2024-08-18 20:04:48,018 root:611: number of dropped samples: 0 -- out of 267 | |
Categorizing training data type... | |
Data type sorting: 100% 267/267 [00:00<00:00, 468764.83it/s] | |
unmasking the appropriate message content... | |
Map (num_proc=16): 100% 267/267 [00:00<00:00, 1418.85 examples/s] | |
The following are some examples of the processed data, with masked tokens (not to be learned) represented with <mask>. The unmasked tokens are the ones the model will learn to predict. Please review these samples to ensure the model is learning to predict expected tokens. | |
Pretraining ex sample 186: <mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask> | |
The Social Security number (SSN) is a nine-digit identifier in the format "AAA-GG-SSSS," consisting of an area number, group number, and serial number. Prior to June 25, 2011, area numbers were assigned based on geographical region, with numbers issued from the northeast to the southwest. However, the SSN assignment process was randomized in 2011, eliminating the geographical significance of the first three digits and the significance of the highest group number assigned for each area number. Unassigned area numbers, excluding 000, 666, and 900-999, were introduced for assignment. The middle two digits, the group number, range from 01 to 99 and were not assigned consecutively in an area. The last four digits are the serial number. Individual Taxpayer Identification Numbers (ITINs) are not affected by this SSA change as they are issued by the IRS. | |
What are the three parts of a Social Security number? | |
<mask> | |
A Social Security number consists of an area number, group number, and serial number. | |
<|endoftext|> | |
Original Input: <|system|> | |
I am, Red Hat® Instruct Model based on Granite 7B, an AI language model developed by Red Hat and IBM Research, based on the Granite-7b-base language model. My primary function is to be a chat assistant. | |
<|user|> | |
The Social Security number (SSN) is a nine-digit identifier in the format "AAA-GG-SSSS," consisting of an area number, group number, and serial number. Prior to June 25, 2011, area numbers were assigned based on geographical region, with numbers issued from the northeast to the southwest. However, the SSN assignment process was randomized in 2011, eliminating the geographical significance of the first three digits and the significance of the highest group number assigned for each area number. Unassigned area numbers, excluding 000, 666, and 900-999, were introduced for assignment. The middle two digits, the group number, range from 01 to 99 and were not assigned consecutively in an area. The last four digits are the serial number. Individual Taxpayer Identification Numbers (ITINs) are not affected by this SSA change as they are issued by the IRS. | |
What are the three parts of a Social Security number? | |
<|assistant|> | |
A Social Security number consists of an area number, group number, and serial number. | |
<|endoftext|> | |
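The "Original Input" dump shows how a message list is serialized before tokenization: a role marker on its own line, the message text, and a closing `<|endoftext|>` after the assistant turn. A minimal sketch that reproduces that visible layout, assuming the input JSONL records carry a `messages` list of `role`/`content` dicts. `render_chat` is a hypothetical name, and the real `ibm_generic_tmpl.py` template may handle whitespace or multi-turn conversations differently.

```python
import json

# Role markers as they appear in the "Original Input" dump.
ROLE_MARKERS = {"system": "<|system|>", "user": "<|user|>", "assistant": "<|assistant|>"}
EOS = "<|endoftext|>"


def render_chat(messages):
    """Hypothetical sketch: serialize messages in the layout seen in the log."""
    parts = []
    for msg in messages:
        parts.append(f"{ROLE_MARKERS[msg['role']]}\n{msg['content']}")
        if msg["role"] == "assistant":
            # An end-of-text marker closes each assistant turn.
            parts.append(EOS)
    return "\n".join(parts)


# Assumed record shape for one line of a *_train_msgs_*.jsonl file.
record = json.loads(
    '{"messages": [{"role": "system", "content": "S"}, '
    '{"role": "user", "content": "Q?"}, '
    '{"role": "assistant", "content": "A."}]}'
)
print(render_chat(record["messages"]))
```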
Pretraining ex sample 75: <mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask> | |
Personally Identifiable Information (PII) refers to any information that can be used to identify a specific individual, such as their social security number, full name, email address, or phone number. With the increasing reliance on information technology, the amount of PII shared with organizations has grown, making it a target for cybercriminals. Hackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware, leading to significant costs for individuals and organizations. | |
PII can be categorized into direct and indirect identifiers. Direct identifiers, such as passport or driver's license numbers, are unique to a person and sufficient to determine their identity. Indirect identifiers, like race and place of birth, are not unique but can identify a person when combined, such as gender, ZIP code, and date of birth. | |
PII can also be classified as sensitive or non-sensitive. Sensitive PII, such as social security numbers, unique identification numbers, biometric data, financial information, and medical records, directly identifies an individual and could cause significant harm if leaked or stolen. Non-sensitive PII, like a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion, may or may not be unique to a person but would not cause significant harm if leaked or stolen in isolation. However, when combined, they can still pose risks. | |
Data privacy laws typically require organizations to safeguard sensitive PII with encryption, access control, or other cybersecurity measures, while non-sensitive PII may or may not be protected depending on the regulations and the organization's policies. The classification of PII as sensitive or non-sensitive depends on the context, such as the specific use case or potential harm resulting from a breach. | |
What is the difference between sensitive and non-sensitive PII? | |
<mask> | |
Sensitive PII, such as social security numbers, unique identification numbers, biometric data, financial information, and medical records, directly identifies an individual and could cause significant harm if leaked or stolen. Non-sensitive PII, like a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion, may or may not be unique to a person but would not cause significant harm if leaked or stolen in isolation. However, when combined, they can still pose risks. | |
<|endoftext|> | |
Original Input: <|system|> | |
I am, Red Hat® Instruct Model based on Granite 7B, an AI language model developed by Red Hat and IBM Research, based on the Granite-7b-base language model. My primary function is to be a chat assistant. | |
<|user|> | |
Personally Identifiable Information (PII) refers to any information that can be used to identify a specific individual, such as their social security number, full name, email address, or phone number. With the increasing reliance on information technology, the amount of PII shared with organizations has grown, making it a target for cybercriminals. Hackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware, leading to significant costs for individuals and organizations. | |
PII can be categorized into direct and indirect identifiers. Direct identifiers, such as passport or driver's license numbers, are unique to a person and sufficient to determine their identity. Indirect identifiers, like race and place of birth, are not unique but can identify a person when combined, such as gender, ZIP code, and date of birth. | |
PII can also be classified as sensitive or non-sensitive. Sensitive PII, such as social security numbers, unique identification numbers, biometric data, financial information, and medical records, directly identifies an individual and could cause significant harm if leaked or stolen. Non-sensitive PII, like a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion, may or may not be unique to a person but would not cause significant harm if leaked or stolen in isolation. However, when combined, they can still pose risks. | |
Data privacy laws typically require organizations to safeguard sensitive PII with encryption, access control, or other cybersecurity measures, while non-sensitive PII may or may not be protected depending on the regulations and the organization's policies. The classification of PII as sensitive or non-sensitive depends on the context, such as the specific use case or potential harm resulting from a breach. | |
What is the difference between sensitive and non-sensitive PII? | |
<|assistant|> | |
Sensitive PII, such as social security numbers, unique identification numbers, biometric data, financial information, and medical records, directly identifies an individual and could cause significant harm if leaked or stolen. Non-sensitive PII, like a person's full name, mother's maiden name, telephone number, IP address, place of birth, date of birth, geographical details, employment information, email address or mailing address, race or ethnicity, and religion, may or may not be unique to a person but would not cause significant harm if leaked or stolen in isolation. However, when combined, they can still pose risks. | |
<|endoftext|> | |
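The two samples above show which tokens contribute to the loss in these "pretraining" examples: the system prompt and the role-marker tokens are replaced by `<mask>`, while the user text, the assistant answer and the final `<|endoftext|>` stay visible. A minimal sketch of building per-token labels under that reading, using the special-token IDs printed earlier in the log (system 32003, user 32004, assistant 32005, eos 32001) and the PyTorch convention that a label of -100 is ignored by `CrossEntropyLoss`. `build_labels` is a hypothetical helper, not the instructlab implementation, and the real code may treat the newline tokens around the markers differently.

```python
# Special-token IDs as printed in the log line "eos: 32001, ... assistant: 32005".
SYSTEM, USER, ASSISTANT, EOS = 32003, 32004, 32005, 32001
IGNORE = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_labels(input_ids, learn_roles=("user", "assistant")):
    """Hypothetical sketch: label learned tokens, set masked ones to IGNORE."""
    markers = {SYSTEM: "system", USER: "user", ASSISTANT: "assistant"}
    labels, role = [], None
    for tok in input_ids:
        if tok in markers:
            role = markers[tok]
            labels.append(IGNORE)  # role markers are never learned
        elif tok == EOS:
            labels.append(tok)  # the end-of-text token is learned
        elif role in learn_roles:
            labels.append(tok)  # content of a learned role
        else:
            labels.append(IGNORE)  # e.g. system-prompt tokens
    return labels
```

Passing `learn_roles=("assistant",)` instead gives chat-style masking in which only the answer tokens (and the end-of-text token) are learned.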
Creating json from Arrow format: 100% 1/1 [00:00<00:00, 23.07ba/s]
Running command: torchrun --nnodes=1 --node_rank=0 --nproc_per_node=8 --rdzv_id=123 --rdzv_endpoint=127.0.0.1:12222 /opt/app-root/lib64/python3.11/site-packages/instructlab/training/main_ds.py --model_name_or_path=/var/mnt/inststg1/instructlab/models/granite-7b-starter1.1/ --data_path=/var/mnt/inststg1/instructlab/.local/share/instructlab/internal/data.jsonl --output_dir=/var/mnt/inststg1/instructlab/phasedbasedir/phase1/checkpoints --num_epochs=2 --effective_batch_size=128 --learning_rate=2e-05 --num_warmup_steps=25 --save_samples=0 --log_level=INFO --max_batch_len=10000 --seed=42 --chat-tmpl-path=/opt/app-root/lib64/python3.11/site-packages/instructlab/training/chat_templates/ibm_generic_tmpl.py --checkpoint_at_epoch
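The launch line asks for an effective batch size of 128 samples across `--nproc_per_node=8` processes, with each process limited to `--max_batch_len=10000` tokens per micro-batch. A back-of-the-envelope sketch of how those numbers could translate into gradient-accumulation steps, assuming a uniform average sample length (the trainer packs variable-length samples, so its real micro-batch sizes will differ). `accumulation_plan`, `steps_per_epoch` and `avg_len` are hypothetical names.

```python
import math


def accumulation_plan(effective_batch_size, num_gpus, max_batch_len, avg_len):
    """Hypothetical estimate assuming every sample is avg_len tokens long."""
    per_gpu = effective_batch_size / num_gpus        # samples per GPU per optimizer step
    per_micro = max(1, max_batch_len // avg_len)     # samples fitting one token budget
    accum = math.ceil(per_gpu / per_micro)           # micro-batches per optimizer step
    return per_gpu, per_micro, accum


def steps_per_epoch(num_samples, effective_batch_size):
    """Optimizer steps per epoch, assuming a final partial batch is still stepped."""
    return math.ceil(num_samples / effective_batch_size)
```

With these numbers each GPU contributes 16 samples per optimizer step; if about 10 samples fit in one 10000-token micro-batch, that is 2 micro-batches per step. For the 267 phase-1 samples, `steps_per_epoch(267, 128)` gives 3 under the stated assumption.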
W0818 20:04:49.993000 140562764190144 torch/distributed/run.py:757]
W0818 20:04:49.993000 140562764190144 torch/distributed/run.py:757] *****************************************
W0818 20:04:49.993000 140562764190144 torch/distributed/run.py:757] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0818 20:04:49.993000 140562764190144 torch/distributed/run.py:757] *****************************************
[2024-08-18 20:04:52,891] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-08-18 20:04:53,058] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-08-18 20:04:53,222] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-08-18 20:04:53,242] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-08-18 20:04:53,264] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-08-18 20:04:53,304] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-08-18 20:04:53,305] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-08-18 20:04:53,335] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
 [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
 [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
 [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
 [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
 [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
 [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
 [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
 [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
 [WARNING]  using untested triton version (2.3.1), only 1.0.0 is known to be compatible
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
 [WARNING]  using untested triton version (2.3.1), only 1.0.0 is known to be compatible
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
 [WARNING]  using untested triton version (2.3.1), only 1.0.0 is known to be compatible
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
 [WARNING]  using untested triton version (2.3.1), only 1.0.0 is known to be compatible
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
 [WARNING]  using untested triton version (2.3.1), only 1.0.0 is known to be compatible
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
 [WARNING]  using untested triton version (2.3.1), only 1.0.0 is known to be compatible
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
 [WARNING]  using untested triton version (2.3.1), only 1.0.0 is known to be compatible
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
 [WARNING]  using untested triton version (2.3.1), only 1.0.0 is known to be compatible
model_name_or_path: /var/mnt/inststg1/instructlab/models/granite-7b-starter1.1/
data_path: /var/mnt/inststg1/instructlab/.local/share/instructlab/internal/data.jsonl
output_dir: /var/mnt/inststg1/instructlab/phasedbasedir/phase1/checkpoints
num_epochs: 2
last_step: 0
effective_batch_size: 128
learning_rate: 2.0e-05
lr_scheduler: cosine
num_warmup_steps: 25
save_samples: 0
save_samples_ds: null
save_last: false
checkpoint_at_epoch: true
log_level: INFO
seed: 42
mock_data: false
mock_len: 2600
sharding_strategy: FULL_SHARD
is_granite: false
lora_r: 0
lora_alpha: 32
lora_dropout: 0.1
lora_quant_bits: null
lora_target_modules: null
max_batch_len: 10000
cpu_offload_optimizer: false
cpu_offload_optimizer_pin_memory: false
cpu_offload_optimizer_ratio: 1.0
NEFTune_alpha: null
chat_tmpl_path: /opt/app-root/lib64/python3.11/site-packages/instructlab/training/chat_templates/ibm_generic_tmpl.py
disable_flash_attn: false
{
    "script_params": {
        "model_name_or_path": "/var/mnt/inststg1/instructlab/models/granite-7b-starter1.1/",
        "data_path": "/var/mnt/inststg1/instructlab/.local/share/instructlab/internal/data.jsonl",
        "output_dir": "/var/mnt/inststg1/instructlab/phasedbasedir/phase1/checkpoints",
        "num_epochs": 2,
        "last_step": 0,
        "effective_batch_size": 128,
        "learning_rate": 2e-05,
        "lr_scheduler": "cosine",
        "num_warmup_steps": 25,
        "save_samples": 0,
        "save_samples_ds": null,
        "save_last": false,
        "checkpoint_at_epoch": true,
        "log_level": "INFO",
        "seed": 42,
        "mock_data": false,
        "mock_len": 2600,
        "sharding_strategy": "FULL_SHARD",
        "is_granite": false,
        "lora_r": 0,
        "lora_alpha": 32,
        "lora_dropout": 0.1,
        "lora_quant_bits": null,
        "lora_target_modules": null,
        "max_batch_len": 10000,
        "cpu_offload_optimizer": false,
        "cpu_offload_optimizer_pin_memory": false,
        "cpu_offload_optimizer_ratio": 1.0,
        "NEFTune_alpha": null,
        "chat_tmpl_path": "/opt/app-root/lib64/python3.11/site-packages/instructlab/training/chat_templates/ibm_generic_tmpl.py",
        "disable_flash_attn": false
    },
    "timestamp": "2024-08-18T20:04:56.779187"
}
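The parameters above request `lr_scheduler: cosine` with `num_warmup_steps: 25` and a peak `learning_rate` of 2e-05. A minimal sketch of a linear-warmup-then-cosine-decay schedule, assuming the same shape as Hugging Face Transformers' `get_cosine_schedule_with_warmup` with its default half cycle. `cosine_lr` is a hypothetical name, and `total_steps` is an assumed input because the total step count is not printed here.

```python
import math


def cosine_lr(step, total_steps, base_lr=2e-5, warmup_steps=25):
    """Hypothetical sketch: linear warmup, then half-cosine decay to zero."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))
```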
[2024-08-18 20:04:56,857] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-08-18 20:04:56,857] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2024-08-18 20:04:57,155] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-08-18 20:04:57,392] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-08-18 20:04:57,719] [INFO] [comm.py:637:init_distributed] cdb=None
tyler-a100-newimage-val:570:570 [0] NCCL INFO Bootstrap : Using enp8s0:192.168.48.4<0>
tyler-a100-newimage-val:570:570 [0] NCCL INFO cudaDriverVersion 12040
tyler-a100-newimage-val:570:570 [0] NCCL INFO NCCL version 2.22.3+cuda12.5
tyler-a100-newimage-val:572:572 [2] NCCL INFO cudaDriverVersion 12040
tyler-a100-newimage-val:572:572 [2] NCCL INFO Bootstrap : Using enp8s0:192.168.48.4<0>
tyler-a100-newimage-val:572:572 [2] NCCL INFO NCCL version 2.22.3+cuda12.5
tyler-a100-newimage-val:573:573 [3] NCCL INFO cudaDriverVersion 12040
tyler-a100-newimage-val:573:573 [3] NCCL INFO Bootstrap : Using enp8s0:192.168.48.4<0>
tyler-a100-newimage-val:573:573 [3] NCCL INFO NCCL version 2.22.3+cuda12.5
tyler-a100-newimage-val:574:574 [4] NCCL INFO cudaDriverVersion 12040
tyler-a100-newimage-val:574:574 [4] NCCL INFO Bootstrap : Using enp8s0:192.168.48.4<0>
tyler-a100-newimage-val:574:574 [4] NCCL INFO NCCL version 2.22.3+cuda12.5
[2024-08-18 20:04:57,854] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-08-18 20:04:57,858] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-08-18 20:04:57,865] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-08-18 20:04:57,877] [INFO] [comm.py:637:init_distributed] cdb=None
tyler-a100-newimage-val:576:576 [6] NCCL INFO cudaDriverVersion 12040
tyler-a100-newimage-val:576:576 [6] NCCL INFO Bootstrap : Using enp8s0:192.168.48.4<0>
tyler-a100-newimage-val:576:576 [6] NCCL INFO NCCL version 2.22.3+cuda12.5
tyler-a100-newimage-val:571:571 [1] NCCL INFO cudaDriverVersion 12040
tyler-a100-newimage-val:571:571 [1] NCCL INFO Bootstrap : Using enp8s0:192.168.48.4<0>
tyler-a100-newimage-val:571:571 [1] NCCL INFO NCCL version 2.22.3+cuda12.5
tyler-a100-newimage-val:577:577 [7] NCCL INFO cudaDriverVersion 12040
tyler-a100-newimage-val:577:577 [7] NCCL INFO Bootstrap : Using enp8s0:192.168.48.4<0>
tyler-a100-newimage-val:577:577 [7] NCCL INFO NCCL version 2.22.3+cuda12.5
tyler-a100-newimage-val:575:575 [5] NCCL INFO cudaDriverVersion 12040
tyler-a100-newimage-val:575:575 [5] NCCL INFO Bootstrap : Using enp8s0:192.168.48.4<0>
tyler-a100-newimage-val:575:575 [5] NCCL INFO NCCL version 2.22.3+cuda12.5
tyler-a100-newimage-val:570:1300 [0] NCCL INFO NET/Plugin: Could not find: libnccl-net.so. Using internal network plugin.
tyler-a100-newimage-val:570:1300 [0] NCCL INFO NET/IB : No device found.
tyler-a100-newimage-val:570:1300 [0] NCCL INFO NET/Socket : Using [0]enp8s0:192.168.48.4<0>
tyler-a100-newimage-val:570:1300 [0] NCCL INFO Using network Socket
tyler-a100-newimage-val:572:1301 [2] NCCL INFO NET/Plugin: Could not find: libnccl-net.so. Using internal network plugin.
tyler-a100-newimage-val:572:1301 [2] NCCL INFO NET/IB : No device found.
tyler-a100-newimage-val:572:1301 [2] NCCL INFO NET/Socket : Using [0]enp8s0:192.168.48.4<0>
tyler-a100-newimage-val:572:1301 [2] NCCL INFO Using network Socket
tyler-a100-newimage-val:574:1303 [4] NCCL INFO NET/Plugin: Could not find: libnccl-net.so. Using internal network plugin.
tyler-a100-newimage-val:573:1302 [3] NCCL INFO NET/Plugin: Could not find: libnccl-net.so. Using internal network plugin.
tyler-a100-newimage-val:573:1302 [3] NCCL INFO NET/IB : No device found.
tyler-a100-newimage-val:574:1303 [4] NCCL INFO NET/IB : No device found.
tyler-a100-newimage-val:574:1303 [4] NCCL INFO NET/Socket : Using [0]enp8s0:192.168.48.4<0>
tyler-a100-newimage-val:573:1302 [3] NCCL INFO NET/Socket : Using [0]enp8s0:192.168.48.4<0>
tyler-a100-newimage-val:573:1302 [3] NCCL INFO Using network Socket
tyler-a100-newimage-val:574:1303 [4] NCCL INFO Using network Socket
tyler-a100-newimage-val:576:1312 [6] NCCL INFO NET/Plugin: Could not find: libnccl-net.so. Using internal network plugin.
tyler-a100-newimage-val:576:1312 [6] NCCL INFO NET/IB : No device found.
tyler-a100-newimage-val:576:1312 [6] NCCL INFO NET/Socket : Using [0]enp8s0:192.168.48.4<0>
tyler-a100-newimage-val:576:1312 [6] NCCL INFO Using network Socket
tyler-a100-newimage-val:577:1314 [7] NCCL INFO NET/Plugin: Could not find: libnccl-net.so. Using internal network plugin.
tyler-a100-newimage-val:577:1314 [7] NCCL INFO NET/IB : No device found.
tyler-a100-newimage-val:577:1314 [7] NCCL INFO NET/Socket : Using [0]enp8s0:192.168.48.4<0>
tyler-a100-newimage-val:577:1314 [7] NCCL INFO Using network Socket
tyler-a100-newimage-val:575:1315 [5] NCCL INFO NET/Plugin: Could not find: libnccl-net.so. Using internal network plugin.
tyler-a100-newimage-val:575:1315 [5] NCCL INFO NET/IB : No device found.
tyler-a100-newimage-val:575:1315 [5] NCCL INFO NET/Socket : Using [0]enp8s0:192.168.48.4<0>
tyler-a100-newimage-val:575:1315 [5] NCCL INFO Using network Socket
tyler-a100-newimage-val:571:1313 [1] NCCL INFO NET/Plugin: Could not find: libnccl-net.so. Using internal network plugin.
tyler-a100-newimage-val:571:1313 [1] NCCL INFO NET/IB : No device found.
tyler-a100-newimage-val:571:1313 [1] NCCL INFO NET/Socket : Using [0]enp8s0:192.168.48.4<0>
tyler-a100-newimage-val:571:1313 [1] NCCL INFO Using network Socket
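Every rank reports the same transport decision: no external `libnccl-net.so` plugin and no InfiniBand device, so NCCL selects its internal socket transport on `enp8s0`. When checking a longer log, a small parser can confirm that every rank made the same choice. A sketch, assuming the `host:pid:tid [device] NCCL INFO message` prefix seen in these lines; the regex and the `network_by_device` name are illustrative only.

```python
import re

# Assumed prefix layout: host:pid:tid [device] NCCL INFO <message>
NCCL_LINE = re.compile(
    r"^(?P<host>[^:]+):(?P<pid>\d+):(?P<tid>\d+) \[(?P<dev>\d+)\] NCCL INFO (?P<msg>.*)$"
)
PREFIX = "Using network "


def network_by_device(lines):
    """Hypothetical helper: map device index -> network named in 'Using network ...'."""
    result = {}
    for line in lines:
        m = NCCL_LINE.match(line.strip())
        if m and m.group("msg").startswith(PREFIX):
            result[int(m.group("dev"))] = m.group("msg")[len(PREFIX):]
    return result
```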
tyler-a100-newimage-val:575:1315 [5] NCCL INFO ncclCommInitRank comm 0x5636163750f0 rank 5 nranks 8 cudaDev 5 nvmlDev 5 busId c060 commId 0xa1bb5af6fed5ca65 - Init START
tyler-a100-newimage-val:572:1301 [2] NCCL INFO ncclCommInitRank comm 0x55e6aed5c790 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId a030 commId 0xa1bb5af6fed5ca65 - Init START
tyler-a100-newimage-val:571:1313 [1] NCCL INFO ncclCommInitRank comm 0x55f7ae069780 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 8020 commId 0xa1bb5af6fed5ca65 - Init START
tyler-a100-newimage-val:577:1314 [7] NCCL INFO ncclCommInitRank comm 0x558d760ca530 rank 7 nranks 8 cudaDev 7 nvmlDev 7 busId e080 commId 0xa1bb5af6fed5ca65 - Init START
tyler-a100-newimage-val:576:1312 [6] NCCL INFO ncclCommInitRank comm 0x5640cf2b7db0 rank 6 nranks 8 cudaDev 6 nvmlDev 6 busId e070 commId 0xa1bb5af6fed5ca65 - Init START
tyler-a100-newimage-val:573:1302 [3] NCCL INFO ncclCommInitRank comm 0x55e7b0065170 rank 3 nranks 8 cudaDev 3 nvmlDev 3 busId a040 commId 0xa1bb5af6fed5ca65 - Init START
tyler-a100-newimage-val:570:1300 [0] NCCL INFO ncclCommInitRank comm 0x55d34ff865c0 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId 8010 commId 0xa1bb5af6fed5ca65 - Init START
tyler-a100-newimage-val:574:1303 [4] NCCL INFO ncclCommInitRank comm 0x560ada6ca910 rank 4 nranks 8 cudaDev 4 nvmlDev 4 busId c050 commId 0xa1bb5af6fed5ca65 - Init START
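The `ncclCommInitRank` lines tie each global rank to a CUDA device and a PCI bus ID, all under the same `commId`. A sketch that extracts that mapping, assuming the `key value` pairs appear in the order printed here (the field layout may change between NCCL versions); `rank_table` is a hypothetical name.

```python
import re

# Assumed field order, copied from the ncclCommInitRank lines in this log.
INIT_RE = re.compile(
    r"ncclCommInitRank comm (?P<comm>0x[0-9a-f]+) rank (?P<rank>\d+) nranks (?P<nranks>\d+) "
    r"cudaDev (?P<cuda>\d+) nvmlDev (?P<nvml>\d+) busId (?P<bus>[0-9a-f]+) commId (?P<commid>0x[0-9a-f]+)"
)


def rank_table(lines):
    """Hypothetical helper: {rank: (cudaDev, busId, commId)}, sorted by rank."""
    table = {}
    for line in lines:
        m = INIT_RE.search(line)
        if m:
            table[int(m.group("rank"))] = (int(m.group("cuda")), m.group("bus"), m.group("commid"))
    return dict(sorted(table.items()))
```

On this host, ranks 0 to 7 map one-to-one onto CUDA devices 0 to 7, and the shared `commId` shows they joined a single communicator.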
tyler-a100-newimage-val:574:1303 [4] NCCL INFO Setting affinity for GPU 4 to ffff,ffffff00,00000000
tyler-a100-newimage-val:574:1303 [4] NCCL INFO NVLS multicast support is not available on dev 4
tyler-a100-newimage-val:573:1302 [3] NCCL INFO Setting affinity for GPU 3 to ff,ffffffff
tyler-a100-newimage-val:573:1302 [3] NCCL INFO NVLS multicast support is not available on dev 3
tyler-a100-newimage-val:570:1300 [0] NCCL INFO Setting affinity for GPU 0 to ff,ffffffff
tyler-a100-newimage-val:570:1300 [0] NCCL INFO NVLS multicast support is not available on dev 0
tyler-a100-newimage-val:575:1315 [5] NCCL INFO Setting affinity for GPU 5 to ffff,ffffff00,00000000
tyler-a100-newimage-val:575:1315 [5] NCCL INFO NVLS multicast support is not available on dev 5
tyler-a100-newimage-val:577:1314 [7] NCCL INFO Setting affinity for GPU 7 to ffff,ffffff00,00000000
tyler-a100-newimage-val:577:1314 [7] NCCL INFO NVLS multicast support is not available on dev 7
tyler-a100-newimage-val:576:1312 [6] NCCL INFO Setting affinity for GPU 6 to ffff,ffffff00,00000000
tyler-a100-newimage-val:576:1312 [6] NCCL INFO NVLS multicast support is not available on dev 6
tyler-a100-newimage-val:572:1301 [2] NCCL INFO Setting affinity for GPU 2 to ff,ffffffff
tyler-a100-newimage-val:572:1301 [2] NCCL INFO NVLS multicast support is not available on dev 2
tyler-a100-newimage-val:571:1313 [1] NCCL INFO Setting affinity for GPU 1 to ff,ffffffff
tyler-a100-newimage-val:571:1313 [1] NCCL INFO NVLS multicast support is not available on dev 1
tyler-a100-newimage-val:577:1314 [7] NCCL INFO comm 0x558d760ca530 rank 7 nRanks 8 nNodes 1 localRanks 8 localRank 7 MNNVL 0
tyler-a100-newimage-val:576:1312 [6] NCCL INFO comm 0x5640cf2b7db0 rank 6 nRanks 8 nNodes 1 localRanks 8 localRank 6 MNNVL 0
tyler-a100-newimage-val:574:1303 [4] NCCL INFO comm 0x560ada6ca910 rank 4 nRanks 8 nNodes 1 localRanks 8 localRank 4 MNNVL 0
tyler-a100-newimage-val:574:1303 [4] NCCL INFO Trees [0] 5/-1/-1->4->3 [1] 5/-1/-1->4->3 [2] 5/-1/-1->4->3 [3] 5/-1/-1->4->3 [4] 5/-1/-1->4->3 [5] 5/-1/-1->4->3 [6] 5/-1/-1->4->3 [7] 5/-1/-1->4->3 [8] 5/-1/-1->4->3 [9] 5/-1/-1->4->3 [10] 5/-1/-1->4->3 [11] 5/-1/-1->4->3 [12] 5/-1/-1->4->3 [13] 5/-1/-1->4->3 [14] 5/-1/-1->4->3 [15] 5/-1/-1->4->3 [16] 5/-1/-1->4->3 [17] 5/-1/-1->4->3 [18] 5/-1/-1->4->3 [19] 5/-1/-1->4->3 [20] 5/-1/-1->4->3 [21] 5/-1/-1->4->3 [22] 5/-1/-1->4->3 [23] 5/-1/-1->4->3
tyler-a100-newimage-val:577:1314 [7] NCCL INFO Trees [0] -1/-1/-1->7->6 [1] -1/-1/-1->7->6 [2] -1/-1/-1->7->6 [3] -1/-1/-1->7->6 [4] -1/-1/-1->7->6 [5] -1/-1/-1->7->6 [6] -1/-1/-1->7->6 [7] -1/-1/-1->7->6 [8] -1/-1/-1->7->6 [9] -1/-1/-1->7->6 [10] -1/-1/-1->7->6 [11] -1/-1/-1->7->6 [12] -1/-1/-1->7->6 [13] -1/-1/-1->7->6 [14] -1/-1/-1->7->6 [15] -1/-1/-1->7->6 [16] -1/-1/-1->7->6 [17] -1/-1/-1->7->6 [18] -1/-1/-1->7->6 [19] -1/-1/-1->7->6 [20] -1/-1/-1->7->6 [21] -1/-1/-1->7->6 [22] -1/-1/-1->7->6 [23] -1/-1/-1->7->6
tyler-a100-newimage-val:576:1312 [6] NCCL INFO Trees [0] 7/-1/-1->6->5 [1] 7/-1/-1->6->5 [2] 7/-1/-1->6->5 [3] 7/-1/-1->6->5 [4] 7/-1/-1->6->5 [5] 7/-1/-1->6->5 [6] 7/-1/-1->6->5 [7] 7/-1/-1->6->5 [8] 7/-1/-1->6->5 [9] 7/-1/-1->6->5 [10] 7/-1/-1->6->5 [11] 7/-1/-1->6->5 [12] 7/-1/-1->6->5 [13] 7/-1/-1->6->5 [14] 7/-1/-1->6->5 [15] 7/-1/-1->6->5 [16] 7/-1/-1->6->5 [17] 7/-1/-1->6->5 [18] 7/-1/-1->6->5 [19] 7/-1/-1->6->5 [20] 7/-1/-1->6->5 [21] 7/-1/-1->6->5 [22] 7/-1/-1->6->5 [23] 7/-1/-1->6->5
tyler-a100-newimage-val:574:1303 [4] NCCL INFO P2P Chunksize set to 524288
tyler-a100-newimage-val:577:1314 [7] NCCL INFO P2P Chunksize set to 524288
tyler-a100-newimage-val:576:1312 [6] NCCL INFO P2P Chunksize set to 524288
tyler-a100-newimage-val:570:1300 [0] NCCL INFO comm 0x55d34ff865c0 rank 0 nRanks 8 nNodes 1 localRanks 8 localRank 0 MNNVL 0
tyler-a100-newimage-val:573:1302 [3] NCCL INFO comm 0x55e7b0065170 rank 3 nRanks 8 nNodes 1 localRanks 8 localRank 3 MNNVL 0
tyler-a100-newimage-val:570:1300 [0] NCCL INFO Channel 00/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:570:1300 [0] NCCL INFO Channel 01/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:570:1300 [0] NCCL INFO Channel 02/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:570:1300 [0] NCCL INFO Channel 03/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:570:1300 [0] NCCL INFO Channel 04/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:570:1300 [0] NCCL INFO Channel 05/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:570:1300 [0] NCCL INFO Channel 06/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:570:1300 [0] NCCL INFO Channel 07/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:570:1300 [0] NCCL INFO Channel 08/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:570:1300 [0] NCCL INFO Channel 09/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:570:1300 [0] NCCL INFO Channel 10/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:575:1315 [5] NCCL INFO comm 0x5636163750f0 rank 5 nRanks 8 nNodes 1 localRanks 8 localRank 5 MNNVL 0
tyler-a100-newimage-val:570:1300 [0] NCCL INFO Channel 11/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:570:1300 [0] NCCL INFO Channel 12/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:570:1300 [0] NCCL INFO Channel 13/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:570:1300 [0] NCCL INFO Channel 14/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:570:1300 [0] NCCL INFO Channel 15/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:570:1300 [0] NCCL INFO Channel 16/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:570:1300 [0] NCCL INFO Channel 17/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:570:1300 [0] NCCL INFO Channel 18/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:570:1300 [0] NCCL INFO Channel 19/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:570:1300 [0] NCCL INFO Channel 20/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:570:1300 [0] NCCL INFO Channel 21/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:570:1300 [0] NCCL INFO Channel 22/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:570:1300 [0] NCCL INFO Channel 23/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:570:1300 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 [4] 1/-1/-1->0->-1 [5] 1/-1/-1->0->-1 [6] 1/-1/-1->0->-1 [7] 1/-1/-1->0->-1 [8] 1/-1/-1->0->-1 [9] 1/-1/-1->0->-1 [10] 1/-1/-1->0->-1 [11] 1/-1/-1->0->-1 [12] 1/-1/-1->0->-1 [13] 1/-1/-1->0->-1 [14] 1/-1/-1->0->-1 [15] 1/-1/-1->0->-1 [16] 1/-1/-1->0->-1 [17] 1/-1/-1->0->-1 [18] 1/-1/-1->0->-1 [19] 1/-1/-1->0->-1 [20] 1/-1/-1->0->-1 [21] 1/-1/-1->0->-1 [22] 1/-1/-1->0->-1 [23] 1/-1/-1->0->-1
tyler-a100-newimage-val:573:1302 [3] NCCL INFO Trees [0] 4/-1/-1->3->2 [1] 4/-1/-1->3->2 [2] 4/-1/-1->3->2 [3] 4/-1/-1->3->2 [4] 4/-1/-1->3->2 [5] 4/-1/-1->3->2 [6] 4/-1/-1->3->2 [7] 4/-1/-1->3->2 [8] 4/-1/-1->3->2 [9] 4/-1/-1->3->2 [10] 4/-1/-1->3->2 [11] 4/-1/-1->3->2 [12] 4/-1/-1->3->2 [13] 4/-1/-1->3->2 [14] 4/-1/-1->3->2 [15] 4/-1/-1->3->2 [16] 4/-1/-1->3->2 [17] 4/-1/-1->3->2 [18] 4/-1/-1->3->2 [19] 4/-1/-1->3->2 [20] 4/-1/-1->3->2 [21] 4/-1/-1->3->2 [22] 4/-1/-1->3->2 [23] 4/-1/-1->3->2
tyler-a100-newimage-val:570:1300 [0] NCCL INFO P2P Chunksize set to 524288
tyler-a100-newimage-val:573:1302 [3] NCCL INFO P2P Chunksize set to 524288
tyler-a100-newimage-val:575:1315 [5] NCCL INFO Trees [0] 6/-1/-1->5->4 [1] 6/-1/-1->5->4 [2] 6/-1/-1->5->4 [3] 6/-1/-1->5->4 [4] 6/-1/-1->5->4 [5] 6/-1/-1->5->4 [6] 6/-1/-1->5->4 [7] 6/-1/-1->5->4 [8] 6/-1/-1->5->4 [9] 6/-1/-1->5->4 [10] 6/-1/-1->5->4 [11] 6/-1/-1->5->4 [12] 6/-1/-1->5->4 [13] 6/-1/-1->5->4 [14] 6/-1/-1->5->4 [15] 6/-1/-1->5->4 [16] 6/-1/-1->5->4 [17] 6/-1/-1->5->4 [18] 6/-1/-1->5->4 [19] 6/-1/-1->5->4 [20] 6/-1/-1->5->4 [21] 6/-1/-1->5->4 [22] 6/-1/-1->5->4 [23] 6/-1/-1->5->4
tyler-a100-newimage-val:575:1315 [5] NCCL INFO P2P Chunksize set to 524288
tyler-a100-newimage-val:572:1301 [2] NCCL INFO comm 0x55e6aed5c790 rank 2 nRanks 8 nNodes 1 localRanks 8 localRank 2 MNNVL 0
tyler-a100-newimage-val:572:1301 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1 [2] 3/-1/-1->2->1 [3] 3/-1/-1->2->1 [4] 3/-1/-1->2->1 [5] 3/-1/-1->2->1 [6] 3/-1/-1->2->1 [7] 3/-1/-1->2->1 [8] 3/-1/-1->2->1 [9] 3/-1/-1->2->1 [10] 3/-1/-1->2->1 [11] 3/-1/-1->2->1 [12] 3/-1/-1->2->1 [13] 3/-1/-1->2->1 [14] 3/-1/-1->2->1 [15] 3/-1/-1->2->1 [16] 3/-1/-1->2->1 [17] 3/-1/-1->2->1 [18] 3/-1/-1->2->1 [19] 3/-1/-1->2->1 [20] 3/-1/-1->2->1 [21] 3/-1/-1->2->1 [22] 3/-1/-1->2->1 [23] 3/-1/-1->2->1
tyler-a100-newimage-val:572:1301 [2] NCCL INFO P2P Chunksize set to 524288
tyler-a100-newimage-val:571:1313 [1] NCCL INFO comm 0x55f7ae069780 rank 1 nRanks 8 nNodes 1 localRanks 8 localRank 1 MNNVL 0
tyler-a100-newimage-val:571:1313 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0 [2] 2/-1/-1->1->0 [3] 2/-1/-1->1->0 [4] 2/-1/-1->1->0 [5] 2/-1/-1->1->0 [6] 2/-1/-1->1->0 [7] 2/-1/-1->1->0 [8] 2/-1/-1->1->0 [9] 2/-1/-1->1->0 [10] 2/-1/-1->1->0 [11] 2/-1/-1->1->0 [12] 2/-1/-1->1->0 [13] 2/-1/-1->1->0 [14] 2/-1/-1->1->0 [15] 2/-1/-1->1->0 [16] 2/-1/-1->1->0 [17] 2/-1/-1->1->0 [18] 2/-1/-1->1->0 [19] 2/-1/-1->1->0 [20] 2/-1/-1->1->0 [21] 2/-1/-1->1->0 [22] 2/-1/-1->1->0 [23] 2/-1/-1->1->0
tyler-a100-newimage-val:571:1313 [1] NCCL INFO P2P Chunksize set to 524288
tyler-a100-newimage-val:574:1303 [4] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
tyler-a100-newimage-val:574:1303 [4] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
tyler-a100-newimage-val:575:1315 [5] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
tyler-a100-newimage-val:575:1315 [5] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
tyler-a100-newimage-val:571:1313 [1] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
tyler-a100-newimage-val:571:1313 [1] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
tyler-a100-newimage-val:573:1302 [3] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
tyler-a100-newimage-val:573:1302 [3] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
tyler-a100-newimage-val:577:1314 [7] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
tyler-a100-newimage-val:577:1314 [7] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
tyler-a100-newimage-val:576:1312 [6] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
tyler-a100-newimage-val:576:1312 [6] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
tyler-a100-newimage-val:570:1300 [0] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
tyler-a100-newimage-val:570:1300 [0] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
tyler-a100-newimage-val:572:1301 [2] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
tyler-a100-newimage-val:572:1301 [2] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
tyler-a100-newimage-val:570:1300 [0] NCCL INFO CC Off, Multi-GPU CC Off, workFifoBytes 1048576
tyler-a100-newimage-val:572:1301 [2] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so libnccl-net.so. Using internal tuner plugin.
tyler-a100-newimage-val:572:1301 [2] NCCL INFO ncclCommInitRank comm 0x55e6aed5c790 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId a030 commId 0xa1bb5af6fed5ca65 - Init COMPLETE
tyler-a100-newimage-val:572:1301 [2] NCCL INFO Init timings: rank 2 nranks 8 total 0.82 (kernels 0.13, bootstrap 0.36, allgathers 0.00, topo 0.26, graphs 0.00, connections 0.05, rest 0.02)
tyler-a100-newimage-val:577:1314 [7] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so libnccl-net.so. Using internal tuner plugin.
tyler-a100-newimage-val:570:1300 [0] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so libnccl-net.so. Using internal tuner plugin.
tyler-a100-newimage-val:577:1314 [7] NCCL INFO ncclCommInitRank comm 0x558d760ca530 rank 7 nranks 8 cudaDev 7 nvmlDev 7 busId e080 commId 0xa1bb5af6fed5ca65 - Init COMPLETE
tyler-a100-newimage-val:570:1300 [0] NCCL INFO ncclCommInitRank comm 0x55d34ff865c0 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId 8010 commId 0xa1bb5af6fed5ca65 - Init COMPLETE
tyler-a100-newimage-val:577:1314 [7] NCCL INFO Init timings: rank 7 nranks 8 total 0.75 (kernels 0.25, bootstrap 0.17, allgathers 0.00, topo 0.26, graphs 0.00, connections 0.05, rest 0.02)
tyler-a100-newimage-val:574:1303 [4] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so libnccl-net.so. Using internal tuner plugin.
tyler-a100-newimage-val:575:1315 [5] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so libnccl-net.so. Using internal tuner plugin.
tyler-a100-newimage-val:576:1312 [6] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so libnccl-net.so. Using internal tuner plugin.
tyler-a100-newimage-val:570:1300 [0] NCCL INFO Init timings: rank 0 nranks 8 total 0.84 (kernels 0.15, bootstrap 0.36, allgathers 0.00, topo 0.26, graphs 0.00, connections 0.05, rest 0.02)
tyler-a100-newimage-val:574:1303 [4] NCCL INFO ncclCommInitRank comm 0x560ada6ca910 rank 4 nranks 8 cudaDev 4 nvmlDev 4 busId c050 commId 0xa1bb5af6fed5ca65 - Init COMPLETE
tyler-a100-newimage-val:575:1315 [5] NCCL INFO ncclCommInitRank comm 0x5636163750f0 rank 5 nranks 8 cudaDev 5 nvmlDev 5 busId c060 commId 0xa1bb5af6fed5ca65 - Init COMPLETE
tyler-a100-newimage-val:576:1312 [6] NCCL INFO ncclCommInitRank comm 0x5640cf2b7db0 rank 6 nranks 8 cudaDev 6 nvmlDev 6 busId e070 commId 0xa1bb5af6fed5ca65 - Init COMPLETE
tyler-a100-newimage-val:573:1302 [3] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so libnccl-net.so. Using internal tuner plugin.
tyler-a100-newimage-val:574:1303 [4] NCCL INFO Init timings: rank 4 nranks 8 total 0.82 (kernels 0.15, bootstrap 0.34, allgathers 0.01, topo 0.26, graphs 0.00, connections 0.05, rest 0.02)
tyler-a100-newimage-val:575:1315 [5] NCCL INFO Init timings: rank 5 nranks 8 total 0.75 (kernels 0.25, bootstrap 0.17, allgathers 0.00, topo 0.26, graphs 0.00, connections 0.05, rest 0.02)
tyler-a100-newimage-val:576:1312 [6] NCCL INFO Init timings: rank 6 nranks 8 total 0.75 (kernels 0.24, bootstrap 0.18, allgathers 0.00, topo 0.26, graphs 0.00, connections 0.05, rest 0.02)
tyler-a100-newimage-val:573:1302 [3] NCCL INFO ncclCommInitRank comm 0x55e7b0065170 rank 3 nranks 8 cudaDev 3 nvmlDev 3 busId a040 commId 0xa1bb5af6fed5ca65 - Init COMPLETE
tyler-a100-newimage-val:573:1302 [3] NCCL INFO Init timings: rank 3 nranks 8 total 0.82 (kernels 0.15, bootstrap 0.34, allgathers 0.01, topo 0.26, graphs 0.00, connections 0.05, rest 0.02)
tyler-a100-newimage-val:571:1313 [1] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so libnccl-net.so. Using internal tuner plugin.
tyler-a100-newimage-val:571:1313 [1] NCCL INFO ncclCommInitRank comm 0x55f7ae069780 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 8020 commId 0xa1bb5af6fed5ca65 - Init COMPLETE
tyler-a100-newimage-val:571:1313 [1] NCCL INFO Init timings: rank 1 nranks 8 total 0.76 (kernels 0.37, bootstrap 0.05, allgathers 0.00, topo 0.26, graphs 0.00, connections 0.05, rest 0.02)
tyler-a100-newimage-val:576:1332 [6] NCCL INFO Channel 00/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-a100-newimage-val:577:1334 [7] NCCL INFO Channel 00/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-a100-newimage-val:570:1336 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-a100-newimage-val:573:1335 [3] NCCL INFO Channel 00/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-a100-newimage-val:575:1337 [5] NCCL INFO Channel 00/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-a100-newimage-val:576:1332 [6] NCCL INFO Channel 01/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-a100-newimage-val:572:1338 [2] NCCL INFO Channel 00/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-a100-newimage-val:577:1334 [7] NCCL INFO Channel 01/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-a100-newimage-val:570:1336 [0] NCCL INFO Channel 01/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-a100-newimage-val:573:1335 [3] NCCL INFO Channel 01/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-a100-newimage-val:571:1339 [1] NCCL INFO Channel 00/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-a100-newimage-val:575:1337 [5] NCCL INFO Channel 01/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-a100-newimage-val:576:1332 [6] NCCL INFO Channel 02/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-a100-newimage-val:572:1338 [2] NCCL INFO Channel 01/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-a100-newimage-val:577:1334 [7] NCCL INFO Channel 02/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-a100-newimage-val:570:1336 [0] NCCL INFO Channel 02/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-a100-newimage-val:573:1335 [3] NCCL INFO Channel 02/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-a100-newimage-val:571:1339 [1] NCCL INFO Channel 01/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-a100-newimage-val:575:1337 [5] NCCL INFO Channel 02/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-a100-newimage-val:576:1332 [6] NCCL INFO Channel 03/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-a100-newimage-val:572:1338 [2] NCCL INFO Channel 02/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-a100-newimage-val:577:1334 [7] NCCL INFO Channel 03/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-a100-newimage-val:570:1336 [0] NCCL INFO Channel 03/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-a100-newimage-val:576:1332 [6] NCCL INFO Channel 04/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-a100-newimage-val:571:1339 [1] NCCL INFO Channel 02/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-a100-newimage-val:570:1336 [0] NCCL INFO Channel 04/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-a100-newimage-val:576:1332 [6] NCCL INFO Channel 05/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-a100-newimage-val:573:1335 [3] NCCL INFO Channel 03/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-a100-newimage-val:571:1339 [1] NCCL INFO Channel 03/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-a100-newimage-val:570:1336 [0] NCCL INFO Channel 05/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-a100-newimage-val:576:1332 [6] NCCL INFO Channel 06/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-a100-newimage-val:573:1335 [3] NCCL INFO Channel 04/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-a100-newimage-val:571:1339 [1] NCCL INFO Channel 04/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-a100-newimage-val:570:1336 [0] NCCL INFO Channel 06/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-a100-newimage-val:576:1332 [6] NCCL INFO Channel 07/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-a100-newimage-val:573:1335 [3] NCCL INFO Channel 05/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-a100-newimage-val:570:1336 [0] NCCL INFO Channel 07/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-a100-newimage-val:571:1339 [1] NCCL INFO Channel 05/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-a100-newimage-val:576:1332 [6] NCCL INFO Channel 08/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-a100-newimage-val:574:1333 [4] NCCL INFO Channel 00/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-a100-newimage-val:573:1335 [3] NCCL INFO Channel 06/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-a100-newimage-val:570:1336 [0] NCCL INFO Channel 08/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-a100-newimage-val:571:1339 [1] NCCL INFO Channel 06/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-a100-newimage-val:576:1332 [6] NCCL INFO Channel 09/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-a100-newimage-val:574:1333 [4] NCCL INFO Channel 01/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-a100-newimage-val:573:1335 [3] NCCL INFO Channel 07/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-a100-newimage-val:570:1336 [0] NCCL INFO Channel 09/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-a100-newimage-val:571:1339 [1] NCCL INFO Channel 07/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-a100-newimage-val:576:1332 [6] NCCL INFO Channel 10/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-a100-newimage-val:574:1333 [4] NCCL INFO Channel 02/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-a100-newimage-val:573:1335 [3] NCCL INFO Channel 08/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-a100-newimage-val:570:1336 [0] NCCL INFO Channel 10/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-a100-newimage-val:571:1339 [1] NCCL INFO Channel 08/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-a100-newimage-val:576:1332 [6] NCCL INFO Channel 11/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-a100-newimage-val:574:1333 [4] NCCL INFO Channel 03/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-a100-newimage-val:573:1335 [3] NCCL INFO Channel 09/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-a100-newimage-val:570:1336 [0] NCCL INFO Channel 11/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-a100-newimage-val:571:1339 [1] NCCL INFO Channel 09/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-a100-newimage-val:576:1332 [6] NCCL INFO Channel 12/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-a100-newimage-val:574:1333 [4] NCCL INFO Channel 04/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-a100-newimage-val:573:1335 [3] NCCL INFO Channel 10/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-a100-newimage-val:570:1336 [0] NCCL INFO Channel 12/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-a100-newimage-val:571:1339 [1] NCCL INFO Channel 10/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-a100-newimage-val:576:1332 [6] NCCL INFO Channel 13/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-a100-newimage-val:574:1333 [4] NCCL INFO Channel 05/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-a100-newimage-val:573:1335 [3] NCCL INFO Channel 11/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-a100-newimage-val:570:1336 [0] NCCL INFO Channel 13/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-a100-newimage-val:571:1339 [1] NCCL INFO Channel 11/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-a100-newimage-val:576:1332 [6] NCCL INFO Channel 14/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-a100-newimage-val:574:1333 [4] NCCL INFO Channel 06/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-a100-newimage-val:573:1335 [3] NCCL INFO Channel 12/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-a100-newimage-val:570:1336 [0] NCCL INFO Channel 14/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-a100-newimage-val:571:1339 [1] NCCL INFO Channel 12/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-a100-newimage-val:575:1337 [5] NCCL INFO Channel 03/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-a100-newimage-val:576:1332 [6] NCCL INFO Channel 15/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-a100-newimage-val:572:1338 [2] NCCL INFO Channel 03/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-a100-newimage-val:577:1334 [7] NCCL INFO Channel 04/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-a100-newimage-val:574:1333 [4] NCCL INFO Channel 07/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-a100-newimage-val:573:1335 [3] NCCL INFO Channel 13/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-a100-newimage-val:570:1336 [0] NCCL INFO Channel 15/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-a100-newimage-val:571:1339 [1] NCCL INFO Channel 13/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-a100-newimage-val:575:1337 [5] NCCL INFO Channel 04/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-a100-newimage-val:576:1332 [6] NCCL INFO Channel 16/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-a100-newimage-val:572:1338 [2] NCCL INFO Channel 04/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-a100-newimage-val:577:1334 [7] NCCL INFO Channel 05/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-a100-newimage-val:574:1333 [4] NCCL INFO Channel 08/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-a100-newimage-val:573:1335 [3] NCCL INFO Channel 14/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-a100-newimage-val:570:1336 [0] NCCL INFO Channel 16/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-a100-newimage-val:571:1339 [1] NCCL INFO Channel 14/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-a100-newimage-val:575:1337 [5] NCCL INFO Channel 05/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-a100-newimage-val:576:1332 [6] NCCL INFO Channel 17/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-a100-newimage-val:577:1334 [7] NCCL INFO Channel 06/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-a100-newimage-val:572:1338 [2] NCCL INFO Channel 05/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-a100-newimage-val:574:1333 [4] NCCL INFO Channel 09/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-a100-newimage-val:573:1335 [3] NCCL INFO Channel 15/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-a100-newimage-val:570:1336 [0] NCCL INFO Channel 17/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-a100-newimage-val:571:1339 [1] NCCL INFO Channel 15/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-a100-newimage-val:575:1337 [5] NCCL INFO Channel 06/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-a100-newimage-val:576:1332 [6] NCCL INFO Channel 18/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-a100-newimage-val:577:1334 [7] NCCL INFO Channel 07/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-a100-newimage-val:572:1338 [2] NCCL INFO Channel 06/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-a100-newimage-val:574:1333 [4] NCCL INFO Channel 10/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-a100-newimage-val:573:1335 [3] NCCL INFO Channel 16/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-a100-newimage-val:570:1336 [0] NCCL INFO Channel 18/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-a100-newimage-val:571:1339 [1] NCCL INFO Channel 16/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-a100-newimage-val:575:1337 [5] NCCL INFO Channel 07/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-a100-newimage-val:576:1332 [6] NCCL INFO Channel 19/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-a100-newimage-val:577:1334 [7] NCCL INFO Channel 08/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-a100-newimage-val:572:1338 [2] NCCL INFO Channel 07/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-a100-newimage-val:574:1333 [4] NCCL INFO Channel 11/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-a100-newimage-val:573:1335 [3] NCCL INFO Channel 17/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-a100-newimage-val:570:1336 [0] NCCL INFO Channel 19/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-a100-newimage-val:571:1339 [1] NCCL INFO Channel 17/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-a100-newimage-val:575:1337 [5] NCCL INFO Channel 08/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-a100-newimage-val:576:1332 [6] NCCL INFO Channel 20/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-a100-newimage-val:577:1334 [7] NCCL INFO Channel 09/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-a100-newimage-val:572:1338 [2] NCCL INFO Channel 08/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-a100-newimage-val:574:1333 [4] NCCL INFO Channel 12/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-a100-newimage-val:573:1335 [3] NCCL INFO Channel 18/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-a100-newimage-val:570:1336 [0] NCCL INFO Channel 20/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-a100-newimage-val:571:1339 [1] NCCL INFO Channel 18/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-a100-newimage-val:575:1337 [5] NCCL INFO Channel 09/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-a100-newimage-val:576:1332 [6] NCCL INFO Channel 21/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-a100-newimage-val:577:1334 [7] NCCL INFO Channel 10/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-a100-newimage-val:572:1338 [2] NCCL INFO Channel 09/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-a100-newimage-val:574:1333 [4] NCCL INFO Channel 13/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-a100-newimage-val:573:1335 [3] NCCL INFO Channel 19/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-a100-newimage-val:570:1336 [0] NCCL INFO Channel 21/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-a100-newimage-val:571:1339 [1] NCCL INFO Channel 19/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-a100-newimage-val:575:1337 [5] NCCL INFO Channel 10/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-a100-newimage-val:576:1332 [6] NCCL INFO Channel 22/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-a100-newimage-val:577:1334 [7] NCCL INFO Channel 11/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-a100-newimage-val:572:1338 [2] NCCL INFO Channel 10/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-a100-newimage-val:574:1333 [4] NCCL INFO Channel 14/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-a100-newimage-val:573:1335 [3] NCCL INFO Channel 20/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-a100-newimage-val:570:1336 [0] NCCL INFO Channel 22/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-a100-newimage-val:571:1339 [1] NCCL INFO Channel 20/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-a100-newimage-val:575:1337 [5] NCCL INFO Channel 11/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-a100-newimage-val:576:1332 [6] NCCL INFO Channel 23/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-a100-newimage-val:577:1334 [7] NCCL INFO Channel 12/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-a100-newimage-val:572:1338 [2] NCCL INFO Channel 11/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-a100-newimage-val:574:1333 [4] NCCL INFO Channel 15/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-a100-newimage-val:573:1335 [3] NCCL INFO Channel 21/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-a100-newimage-val:570:1336 [0] NCCL INFO Channel 23/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-a100-newimage-val:571:1339 [1] NCCL INFO Channel 21/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-a100-newimage-val:575:1337 [5] NCCL INFO Channel 12/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-a100-newimage-val:577:1334 [7] NCCL INFO Channel 13/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-a100-newimage-val:572:1338 [2] NCCL INFO Channel 12/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-a100-newimage-val:574:1333 [4] NCCL INFO Channel 16/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-a100-newimage-val:573:1335 [3] NCCL INFO Channel 22/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-a100-newimage-val:571:1339 [1] NCCL INFO Channel 22/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-a100-newimage-val:575:1337 [5] NCCL INFO Channel 13/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-a100-newimage-val:577:1334 [7] NCCL INFO Channel 14/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-a100-newimage-val:572:1338 [2] NCCL INFO Channel 13/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-a100-newimage-val:574:1333 [4] NCCL INFO Channel 17/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-a100-newimage-val:573:1335 [3] NCCL INFO Channel 23/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-a100-newimage-val:571:1339 [1] NCCL INFO Channel 23/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-a100-newimage-val:575:1337 [5] NCCL INFO Channel 14/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-a100-newimage-val:577:1334 [7] NCCL INFO Channel 15/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-a100-newimage-val:572:1338 [2] NCCL INFO Channel 14/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-a100-newimage-val:574:1333 [4] NCCL INFO Channel 18/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-a100-newimage-val:575:1337 [5] NCCL INFO Channel 15/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-a100-newimage-val:577:1334 [7] NCCL INFO Channel 16/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-a100-newimage-val:572:1338 [2] NCCL INFO Channel 15/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-a100-newimage-val:574:1333 [4] NCCL INFO Channel 19/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-a100-newimage-val:575:1337 [5] NCCL INFO Channel 16/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-a100-newimage-val:577:1334 [7] NCCL INFO Channel 17/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-a100-newimage-val:572:1338 [2] NCCL INFO Channel 16/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-a100-newimage-val:574:1333 [4] NCCL INFO Channel 20/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-a100-newimage-val:575:1337 [5] NCCL INFO Channel 17/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-a100-newimage-val:577:1334 [7] NCCL INFO Channel 18/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-a100-newimage-val:572:1338 [2] NCCL INFO Channel 17/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-a100-newimage-val:574:1333 [4] NCCL INFO Channel 21/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-a100-newimage-val:575:1337 [5] NCCL INFO Channel 18/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-a100-newimage-val:577:1334 [7] NCCL INFO Channel 19/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-a100-newimage-val:572:1338 [2] NCCL INFO Channel 18/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-a100-newimage-val:574:1333 [4] NCCL INFO Channel 22/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-a100-newimage-val:575:1337 [5] NCCL INFO Channel 19/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-a100-newimage-val:577:1334 [7] NCCL INFO Channel 20/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-a100-newimage-val:572:1338 [2] NCCL INFO Channel 19/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-a100-newimage-val:574:1333 [4] NCCL INFO Channel 23/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-a100-newimage-val:575:1337 [5] NCCL INFO Channel 20/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-a100-newimage-val:577:1334 [7] NCCL INFO Channel 21/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-a100-newimage-val:572:1338 [2] NCCL INFO Channel 20/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-a100-newimage-val:575:1337 [5] NCCL INFO Channel 21/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-a100-newimage-val:577:1334 [7] NCCL INFO Channel 22/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-a100-newimage-val:572:1338 [2] NCCL INFO Channel 21/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-a100-newimage-val:575:1337 [5] NCCL INFO Channel 22/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-a100-newimage-val:577:1334 [7] NCCL INFO Channel 23/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-a100-newimage-val:572:1338 [2] NCCL INFO Channel 22/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-a100-newimage-val:575:1337 [5] NCCL INFO Channel 23/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-a100-newimage-val:572:1338 [2] NCCL INFO Channel 23/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-a100-newimage-val:573:1335 [3] NCCL INFO Connected all rings
tyler-a100-newimage-val:572:1338 [2] NCCL INFO Connected all rings
tyler-a100-newimage-val:571:1339 [1] NCCL INFO Connected all rings
tyler-a100-newimage-val:570:1336 [0] NCCL INFO Connected all rings
tyler-a100-newimage-val:574:1333 [4] NCCL INFO Connected all rings
tyler-a100-newimage-val:577:1334 [7] NCCL INFO Connected all rings
tyler-a100-newimage-val:575:1337 [5] NCCL INFO Connected all rings
tyler-a100-newimage-val:576:1332 [6] NCCL INFO Connected all rings
Generating train split: 267 examples [00:00, 6232.18 examples/s]
Data length calculation: 100%|██████████| 267/267 [00:00<00:00, 1973.19it/s]
Data length calculation: 100%|██████████| 267/267 [00:00<00:00, 1948.77it/s]
Data length calculation: 100%|██████████| 267/267 [00:00<00:00, 1801.53it/s]
Data length calculation: 100%|██████████| 267/267 [00:00<00:00, 1713.33it/s]
Data length calculation: 100%|██████████| 267/267 [00:00<00:00, 1638.71it/s]
Data length calculation: 100%|██████████| 267/267 [00:00<00:00, 1757.64it/s]
Data length calculation: 100%|██████████| 267/267 [00:00<00:00, 1982.74it/s]
Data length calculation: 100%|██████████| 267/267 [00:00<00:00, 1824.15it/s]
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
Loading checkpoint shards: 100%|██████████| 3/3 [00:00<00:00,  3.69it/s]
Using /var/mnt/inststg1/instructlab/.cache/torch_extensions/py311_cu124 as PyTorch extensions root...
Creating extension directory /var/mnt/inststg1/instructlab/.cache/torch_extensions/py311_cu124/fused_adam...
Detected CUDA files, patching ldflags
Emitting ninja build file /var/mnt/inststg1/instructlab/.cache/torch_extensions/py311_cu124/fused_adam/build.ninja...
/opt/app-root/lib64/python3.11/site-packages/torch/utils/cpp_extension.py:1967: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
  warnings.warn(
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
{
"num_gpus": 8,
"avg_sample_len": 622.9588014981273,
"effective_batch_size": 128,
"max_batch_len_per_gpu": 10000,
"packing_max_batch_len": 9079,
"grad_accum": 2,
"num_batches": 2,
"avg_samples_per_batch": 133.5,
"samples_per_gpu": 8,
"timestamp": "2024-08-18T20:05:10.241425"
}
Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
Loading checkpoint shards:  33%|███▎      | 1/3 [00:00<00:00,  2.84it/s]You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
Loading checkpoint shards:  67%|██████▋   | 2/3 [00:00<00:00,  3.30it/s]You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
Loading checkpoint shards: 100%|██████████| 3/3 [00:00<00:00,  3.49it/s]
Loading checkpoint shards: 100%|██████████| 3/3 [00:00<00:00,  3.48it/s]
Using /var/mnt/inststg1/instructlab/.cache/torch_extensions/py311_cu124 as PyTorch extensions root...
Using /var/mnt/inststg1/instructlab/.cache/torch_extensions/py311_cu124 as PyTorch extensions root...
Loading checkpoint shards: 100%|██████████| 3/3 [00:00<00:00,  3.39it/s]
Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]Using /var/mnt/inststg1/instructlab/.cache/torch_extensions/py311_cu124 as PyTorch extensions root...
Loading checkpoint shards: 100%|██████████| 3/3 [00:00<00:00,  3.40it/s]
Using /var/mnt/inststg1/instructlab/.cache/torch_extensions/py311_cu124 as PyTorch extensions root...
Loading checkpoint shards: 100%|██████████| 3/3 [00:00<00:00,  3.49it/s]
Loading checkpoint shards:  33%|███▎      | 1/3 [00:00<00:00,  2.70it/s]Using /var/mnt/inststg1/instructlab/.cache/torch_extensions/py311_cu124 as PyTorch extensions root...
Loading checkpoint shards: 100%|██████████| 3/3 [00:00<00:00,  3.31it/s]
Using /var/mnt/inststg1/instructlab/.cache/torch_extensions/py311_cu124 as PyTorch extensions root...
Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.79it/s]
Using /var/mnt/inststg1/instructlab/.cache/torch_extensions/py311_cu124 as PyTorch extensions root...
[1/3] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output multi_tensor_adam.cuda.o.d -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1016\" -I/opt/app-root/lib64/python3.11/site-packages/deepspeed/ops/csrc/includes -I/opt/app-root/lib64/python3.11/site-packages/deepspeed/ops/csrc/adam -isystem /opt/app-root/lib64/python3.11/site-packages/torch/include -isystem /opt/app-root/lib64/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /opt/app-root/lib64/python3.11/site-packages/torch/include/TH -isystem /opt/app-root/lib64/python3.11/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.11 -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -lineinfo --use_fast_math -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -DBF16_AVAILABLE -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -std=c++17 -c /opt/app-root/lib64/python3.11/site-packages/deepspeed/ops/csrc/adam/multi_tensor_adam.cu -o multi_tensor_adam.cuda.o
[2/3] c++ -MMD -MF fused_adam_frontend.o.d -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1016\" -I/opt/app-root/lib64/python3.11/site-packages/deepspeed/ops/csrc/includes -I/opt/app-root/lib64/python3.11/site-packages/deepspeed/ops/csrc/adam -isystem /opt/app-root/lib64/python3.11/site-packages/torch/include -isystem /opt/app-root/lib64/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /opt/app-root/lib64/python3.11/site-packages/torch/include/TH -isystem /opt/app-root/lib64/python3.11/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.11 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -DBF16_AVAILABLE -c /opt/app-root/lib64/python3.11/site-packages/deepspeed/ops/csrc/adam/fused_adam_frontend.cpp -o fused_adam_frontend.o
[3/3] c++ fused_adam_frontend.o multi_tensor_adam.cuda.o -shared -L/opt/app-root/lib64/python3.11/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -ltorch_python -L/usr/local/cuda/lib64 -lcudart -o fused_adam.so
Loading extension module fused_adam...
Time to load fused_adam op: 34.718958377838135 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 30.327874183654785 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 30.326067447662354 seconds
Loading extension module fused_adam...
Loading extension module fused_adam...
Time to load fused_adam op: 30.226067066192627 seconds
Time to load fused_adam op: 29.229661464691162 seconds
[2024-08-18 20:05:41,506] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.14.4, git-hash=unknown, git-branch=unknown
[2024-08-18 20:05:41,506] [INFO] [comm.py:662:init_distributed] Distributed backend already initialized
Loading extension module fused_adam...
Time to load fused_adam op: 30.326636791229248 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 30.327282190322876 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 29.730340242385864 seconds
tyler-a100-newimage-val:576:1419 [6] NCCL INFO Using network Socket
tyler-a100-newimage-val:574:1418 [4] NCCL INFO Using network Socket
tyler-a100-newimage-val:572:1420 [2] NCCL INFO Using network Socket
tyler-a100-newimage-val:570:1421 [0] NCCL INFO Using network Socket
tyler-a100-newimage-val:575:1423 [5] NCCL INFO Using network Socket
tyler-a100-newimage-val:577:1422 [7] NCCL INFO Using network Socket
tyler-a100-newimage-val:571:1426 [1] NCCL INFO Using network Socket
tyler-a100-newimage-val:573:1417 [3] NCCL INFO Using network Socket
tyler-a100-newimage-val:571:1426 [1] NCCL INFO bootstrapSplit: comm 0x55f7afbd0a60 parent 0x55f7ae069780 rank 1 nranks 8 color -934961569 key 1 prev 0 next 2 - DONE
tyler-a100-newimage-val:573:1417 [3] NCCL INFO bootstrapSplit: comm 0x55e7b1bda750 parent 0x55e7b0065170 rank 3 nranks 8 color -934961569 key 3 prev 2 next 4 - DONE
tyler-a100-newimage-val:575:1423 [5] NCCL INFO bootstrapSplit: comm 0x563617fd12e0 parent 0x5636163750f0 rank 5 nranks 8 color -934961569 key 5 prev 4 next 6 - DONE
tyler-a100-newimage-val:577:1422 [7] NCCL INFO bootstrapSplit: comm 0x558d77c33a30 parent 0x558d760ca530 rank 7 nranks 8 color -934961569 key 7 prev 6 next 0 - DONE
tyler-a100-newimage-val:572:1420 [2] NCCL INFO bootstrapSplit: comm 0x55e6b08cfc90 parent 0x55e6aed5c790 rank 2 nranks 8 color -934961569 key 2 prev 1 next 3 - DONE
tyler-a100-newimage-val:574:1418 [4] NCCL INFO bootstrapSplit: comm 0x560adc23dc90 parent 0x560ada6ca910 rank 4 nranks 8 color -934961569 key 4 prev 3 next 5 - DONE
tyler-a100-newimage-val:573:1417 [3] NCCL INFO ncclCommSplit comm 0x55e7b1bda750 rank 3 nranks 8 cudaDev 3 nvmlDev 3 busId a040 parent 0x55e7b0065170 color -934961569 key 3 commId 0xed88d95c67fb6a92 - Init START
tyler-a100-newimage-val:570:1421 [0] NCCL INFO bootstrapSplit: comm 0x55d351b0d620 parent 0x55d34ff865c0 rank 0 nranks 8 color -934961569 key 0 prev 7 next 1 - DONE
tyler-a100-newimage-val:577:1422 [7] NCCL INFO ncclCommSplit comm 0x558d77c33a30 rank 7 nranks 8 cudaDev 7 nvmlDev 7 busId e080 parent 0x558d760ca530 color -934961569 key 7 commId 0xed88d95c67fb6a92 - Init START
tyler-a100-newimage-val:572:1420 [2] NCCL INFO ncclCommSplit comm 0x55e6b08cfc90 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId a030 parent 0x55e6aed5c790 color -934961569 key 2 commId 0xed88d95c67fb6a92 - Init START
tyler-a100-newimage-val:576:1419 [6] NCCL INFO bootstrapSplit: comm 0x5640d0e2b2b0 parent 0x5640cf2b7db0 rank 6 nranks 8 color -934961569 key 6 prev 5 next 7 - DONE
tyler-a100-newimage-val:575:1423 [5] NCCL INFO ncclCommSplit comm 0x563617fd12e0 rank 5 nranks 8 cudaDev 5 nvmlDev 5 busId c060 parent 0x5636163750f0 color -934961569 key 5 commId 0xed88d95c67fb6a92 - Init START
tyler-a100-newimage-val:574:1418 [4] NCCL INFO ncclCommSplit comm 0x560adc23dc90 rank 4 nranks 8 cudaDev 4 nvmlDev 4 busId c050 parent 0x560ada6ca910 color -934961569 key 4 commId 0xed88d95c67fb6a92 - Init START
tyler-a100-newimage-val:571:1426 [1] NCCL INFO ncclCommSplit comm 0x55f7afbd0a60 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 8020 parent 0x55f7ae069780 color -934961569 key 1 commId 0xed88d95c67fb6a92 - Init START
tyler-a100-newimage-val:570:1421 [0] NCCL INFO ncclCommSplit comm 0x55d351b0d620 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId 8010 parent 0x55d34ff865c0 color -934961569 key 0 commId 0xed88d95c67fb6a92 - Init START
tyler-a100-newimage-val:576:1419 [6] NCCL INFO ncclCommSplit comm 0x5640d0e2b2b0 rank 6 nranks 8 cudaDev 6 nvmlDev 6 busId e070 parent 0x5640cf2b7db0 color -934961569 key 6 commId 0xed88d95c67fb6a92 - Init START
tyler-a100-newimage-val:571:1426 [1] NCCL INFO Setting affinity for GPU 1 to ff,ffffffff
tyler-a100-newimage-val:571:1426 [1] NCCL INFO NVLS multicast support is not available on dev 1
tyler-a100-newimage-val:575:1423 [5] NCCL INFO Setting affinity for GPU 5 to ffff,ffffff00,00000000
tyler-a100-newimage-val:575:1423 [5] NCCL INFO NVLS multicast support is not available on dev 5
tyler-a100-newimage-val:572:1420 [2] NCCL INFO Setting affinity for GPU 2 to ff,ffffffff
tyler-a100-newimage-val:572:1420 [2] NCCL INFO NVLS multicast support is not available on dev 2
tyler-a100-newimage-val:574:1418 [4] NCCL INFO Setting affinity for GPU 4 to ffff,ffffff00,00000000
tyler-a100-newimage-val:574:1418 [4] NCCL INFO NVLS multicast support is not available on dev 4
tyler-a100-newimage-val:570:1421 [0] NCCL INFO Setting affinity for GPU 0 to ff,ffffffff
tyler-a100-newimage-val:570:1421 [0] NCCL INFO NVLS multicast support is not available on dev 0
tyler-a100-newimage-val:573:1417 [3] NCCL INFO Setting affinity for GPU 3 to ff,ffffffff
tyler-a100-newimage-val:576:1419 [6] NCCL INFO Setting affinity for GPU 6 to ffff,ffffff00,00000000
tyler-a100-newimage-val:576:1419 [6] NCCL INFO NVLS multicast support is not available on dev 6
tyler-a100-newimage-val:573:1417 [3] NCCL INFO NVLS multicast support is not available on dev 3
tyler-a100-newimage-val:577:1422 [7] NCCL INFO Setting affinity for GPU 7 to ffff,ffffff00,00000000
tyler-a100-newimage-val:577:1422 [7] NCCL INFO NVLS multicast support is not available on dev 7
tyler-a100-newimage-val:577:1422 [7] NCCL INFO comm 0x558d77c33a30 rank 7 nRanks 8 nNodes 1 localRanks 8 localRank 7 MNNVL 0
tyler-a100-newimage-val:576:1419 [6] NCCL INFO comm 0x5640d0e2b2b0 rank 6 nRanks 8 nNodes 1 localRanks 8 localRank 6 MNNVL 0
tyler-a100-newimage-val:572:1420 [2] NCCL INFO comm 0x55e6b08cfc90 rank 2 nRanks 8 nNodes 1 localRanks 8 localRank 2 MNNVL 0
tyler-a100-newimage-val:575:1423 [5] NCCL INFO comm 0x563617fd12e0 rank 5 nRanks 8 nNodes 1 localRanks 8 localRank 5 MNNVL 0
tyler-a100-newimage-val:570:1421 [0] NCCL INFO comm 0x55d351b0d620 rank 0 nRanks 8 nNodes 1 localRanks 8 localRank 0 MNNVL 0
tyler-a100-newimage-val:571:1426 [1] NCCL INFO comm 0x55f7afbd0a60 rank 1 nRanks 8 nNodes 1 localRanks 8 localRank 1 MNNVL 0
tyler-a100-newimage-val:574:1418 [4] NCCL INFO comm 0x560adc23dc90 rank 4 nRanks 8 nNodes 1 localRanks 8 localRank 4 MNNVL 0
tyler-a100-newimage-val:573:1417 [3] NCCL INFO comm 0x55e7b1bda750 rank 3 nRanks 8 nNodes 1 localRanks 8 localRank 3 MNNVL 0
tyler-a100-newimage-val:570:1421 [0] NCCL INFO Channel 00/24 :    0   1   2   3   4   5   6   7
tyler-a100-newimage-val:577:1422 [7] NCCL INFO Trees [0] -1/-1/-1->7->6 [1] -1/-1/-1->7->6 [2] -1/-1/-1->7->6 [3] -1/-1/-1->7->6 [4] -1/-1/-1->7->6 [5] -1/-1/-1->7->6 [6] -1/-1/-1->7->6 [7] -1/-1/-1->7->6 [8] -1/-1/-1->7->6 [9] -1/-1/-1->7->6 [10] -1/-1/-1->7->6 [11] -1/-1/-1->7->6 [12] -1/-1/-1->7->6 [13] -1/-1/-1->7->6 [14] -1/-1/-1->7->6 [15] -1/-1/-1->7->6 [16] -1/-1/-1->7->6 [17] -1/-1/-1->7->6 [18] -1/-1/-1->7->6 [19] -1/-1/-1->7->6 [20] -1/-1/-1->7->6 [21] -1/-1/-1->7->6 [22] -1/-1/-1->7->6 [23] -1/-1/-1->7->6
tyler-a100-newimage-val:577:1422 [7] NCCL INFO P2P Chunksize set to 524288
tyler-a100-newimage-val:570:1421 [0] NCCL INFO Channel 01/24 :    0   1   2   3   4   5   6   7
tyler-a100-newimage-val:576:1419 [6] NCCL INFO Trees [0] 7/-1/-1->6->5 [1] 7/-1/-1->6->5 [2] 7/-1/-1->6->5 [3] 7/-1/-1->6->5 [4] 7/-1/-1->6->5 [5] 7/-1/-1->6->5 [6] 7/-1/-1->6->5 [7] 7/-1/-1->6->5 [8] 7/-1/-1->6->5 [9] 7/-1/-1->6->5 [10] 7/-1/-1->6->5 [11] 7/-1/-1->6->5 [12] 7/-1/-1->6->5 [13] 7/-1/-1->6->5 [14] 7/-1/-1->6->5 [15] 7/-1/-1->6->5 [16] 7/-1/-1->6->5 [17] 7/-1/-1->6->5 [18] 7/-1/-1->6->5 [19] 7/-1/-1->6->5 [20] 7/-1/-1->6->5 [21] 7/-1/-1->6->5 [22] 7/-1/-1->6->5 [23] 7/-1/-1->6->5
tyler-a100-newimage-val:576:1419 [6] NCCL INFO P2P Chunksize set to 524288
tyler-a100-newimage-val:575:1423 [5] NCCL INFO Trees [0] 6/-1/-1->5->4 [1] 6/-1/-1->5->4 [2] 6/-1/-1->5->4 [3] 6/-1/-1->5->4 [4] 6/-1/-1->5->4 [5] 6/-1/-1->5->4 [6] 6/-1/-1->5->4 [7] 6/-1/-1->5->4 [8] 6/-1/-1->5->4 [9] 6/-1/-1->5->4 [10] 6/-1/-1->5->4 [11] 6/-1/-1->5->4 [12] 6/-1/-1->5->4 [13] 6/-1/-1->5->4 [14] 6/-1/-1->5->4 [15] 6/-1/-1->5->4 [16] 6/-1/-1->5->4 [17] 6/-1/-1->5->4 [18] 6/-1/-1->5->4 [19] 6/-1/-1->5->4 [20] 6/-1/-1->5->4 [21] 6/-1/-1->5->4 [22] 6/-1/-1->5->4 [23] 6/-1/-1->5->4
tyler-a100-newimage-val:570:1421 [0] NCCL INFO Channel 02/24 :    0   1   2   3   4   5   6   7
tyler-a100-newimage-val:575:1423 [5] NCCL INFO P2P Chunksize set to 524288
tyler-a100-newimage-val:574:1418 [4] NCCL INFO Trees [0] 5/-1/-1->4->3 [1] 5/-1/-1->4->3 [2] 5/-1/-1->4->3 [3] 5/-1/-1->4->3 [4] 5/-1/-1->4->3 [5] 5/-1/-1->4->3 [6] 5/-1/-1->4->3 [7] 5/-1/-1->4->3 [8] 5/-1/-1->4->3 [9] 5/-1/-1->4->3 [10] 5/-1/-1->4->3 [11] 5/-1/-1->4->3 [12] 5/-1/-1->4->3 [13] 5/-1/-1->4->3 [14] 5/-1/-1->4->3 [15] 5/-1/-1->4->3 [16] 5/-1/-1->4->3 [17] 5/-1/-1->4->3 [18] 5/-1/-1->4->3 [19] 5/-1/-1->4->3 [20] 5/-1/-1->4->3 [21] 5/-1/-1->4->3 [22] 5/-1/-1->4->3 [23] 5/-1/-1->4->3
tyler-a100-newimage-val:570:1421 [0] NCCL INFO Channel 03/24 :    0   1   2   3   4   5   6   7
tyler-a100-newimage-val:574:1418 [4] NCCL INFO P2P Chunksize set to 524288
tyler-a100-newimage-val:570:1421 [0] NCCL INFO Channel 04/24 :    0   1   2   3   4   5   6   7
tyler-a100-newimage-val:572:1420 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1 [2] 3/-1/-1->2->1 [3] 3/-1/-1->2->1 [4] 3/-1/-1->2->1 [5] 3/-1/-1->2->1 [6] 3/-1/-1->2->1 [7] 3/-1/-1->2->1 [8] 3/-1/-1->2->1 [9] 3/-1/-1->2->1 [10] 3/-1/-1->2->1 [11] 3/-1/-1->2->1 [12] 3/-1/-1->2->1 [13] 3/-1/-1->2->1 [14] 3/-1/-1->2->1 [15] 3/-1/-1->2->1 [16] 3/-1/-1->2->1 [17] 3/-1/-1->2->1 [18] 3/-1/-1->2->1 [19] 3/-1/-1->2->1 [20] 3/-1/-1->2->1 [21] 3/-1/-1->2->1 [22] 3/-1/-1->2->1 [23] 3/-1/-1->2->1
tyler-a100-newimage-val:570:1421 [0] NCCL INFO Channel 05/24 :    0   1   2   3   4   5   6   7
tyler-a100-newimage-val:572:1420 [2] NCCL INFO P2P Chunksize set to 524288
tyler-a100-newimage-val:570:1421 [0] NCCL INFO Channel 06/24 :    0   1   2   3   4   5   6   7
tyler-a100-newimage-val:571:1426 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0 [2] 2/-1/-1->1->0 [3] 2/-1/-1->1->0 [4] 2/-1/-1->1->0 [5] 2/-1/-1->1->0 [6] 2/-1/-1->1->0 [7] 2/-1/-1->1->0 [8] 2/-1/-1->1->0 [9] 2/-1/-1->1->0 [10] 2/-1/-1->1->0 [11] 2/-1/-1->1->0 [12] 2/-1/-1->1->0 [13] 2/-1/-1->1->0 [14] 2/-1/-1->1->0 [15] 2/-1/-1->1->0 [16] 2/-1/-1->1->0 [17] 2/-1/-1->1->0 [18] 2/-1/-1->1->0 [19] 2/-1/-1->1->0 [20] 2/-1/-1->1->0 [21] 2/-1/-1->1->0 [22] 2/-1/-1->1->0 [23] 2/-1/-1->1->0
tyler-a100-newimage-val:570:1421 [0] NCCL INFO Channel 07/24 :    0   1   2   3   4   5   6   7
tyler-a100-newimage-val:573:1417 [3] NCCL INFO Trees [0] 4/-1/-1->3->2 [1] 4/-1/-1->3->2 [2] 4/-1/-1->3->2 [3] 4/-1/-1->3->2 [4] 4/-1/-1->3->2 [5] 4/-1/-1->3->2 [6] 4/-1/-1->3->2 [7] 4/-1/-1->3->2 [8] 4/-1/-1->3->2 [9] 4/-1/-1->3->2 [10] 4/-1/-1->3->2 [11] 4/-1/-1->3->2 [12] 4/-1/-1->3->2 [13] 4/-1/-1->3->2 [14] 4/-1/-1->3->2 [15] 4/-1/-1->3->2 [16] 4/-1/-1->3->2 [17] 4/-1/-1->3->2 [18] 4/-1/-1->3->2 [19] 4/-1/-1->3->2 [20] 4/-1/-1->3->2 [21] 4/-1/-1->3->2 [22] 4/-1/-1->3->2 [23] 4/-1/-1->3->2
tyler-a100-newimage-val:570:1421 [0] NCCL INFO Channel 08/24 :    0   1   2   3   4   5   6   7
tyler-a100-newimage-val:573:1417 [3] NCCL INFO P2P Chunksize set to 524288
tyler-a100-newimage-val:571:1426 [1] NCCL INFO P2P Chunksize set to 524288
tyler-a100-newimage-val:570:1421 [0] NCCL INFO Channel 09/24 :    0   1   2   3   4   5   6   7
tyler-a100-newimage-val:570:1421 [0] NCCL INFO Channel 10/24 :    0   1   2   3   4   5   6   7
tyler-a100-newimage-val:570:1421 [0] NCCL INFO Channel 11/24 :    0   1   2   3   4   5   6   7
tyler-a100-newimage-val:570:1421 [0] NCCL INFO Channel 12/24 :    0   1   2   3   4   5   6   7
tyler-a100-newimage-val:570:1421 [0] NCCL INFO Channel 13/24 :    0   1   2   3   4   5   6   7
tyler-a100-newimage-val:570:1421 [0] NCCL INFO Channel 14/24 :    0   1   2   3   4   5   6   7
tyler-a100-newimage-val:570:1421 [0] NCCL INFO Channel 15/24 :    0   1   2   3   4   5   6   7
tyler-a100-newimage-val:570:1421 [0] NCCL INFO Channel 16/24 :    0   1   2   3   4   5   6   7
tyler-a100-newimage-val:570:1421 [0] NCCL INFO Channel 17/24 :    0   1   2   3   4   5   6   7
tyler-a100-newimage-val:570:1421 [0] NCCL INFO Channel 18/24 :    0   1   2   3   4   5   6   7
tyler-a100-newimage-val:570:1421 [0] NCCL INFO Channel 19/24 :    0   1   2   3   4   5   6   7
tyler-a100-newimage-val:570:1421 [0] NCCL INFO Channel 20/24 :    0   1   2   3   4   5   6   7
tyler-a100-newimage-val:570:1421 [0] NCCL INFO Channel 21/24 :    0   1   2   3   4   5   6   7
tyler-a100-newimage-val:570:1421 [0] NCCL INFO Channel 22/24 :    0   1   2   3   4   5   6   7
tyler-a100-newimage-val:570:1421 [0] NCCL INFO Channel 23/24 :    0   1   2   3   4   5   6   7
tyler-a100-newimage-val:570:1421 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 [4] 1/-1/-1->0->-1 [5] 1/-1/-1->0->-1 [6] 1/-1/-1->0->-1 [7] 1/-1/-1->0->-1 [8] 1/-1/-1->0->-1 [9] 1/-1/-1->0->-1 [10] 1/-1/-1->0->-1 [11] 1/-1/-1->0->-1 [12] 1/-1/-1->0->-1 [13] 1/-1/-1->0->-1 [14] 1/-1/-1->0->-1 [15] 1/-1/-1->0->-1 [16] 1/-1/-1->0->-1 [17] 1/-1/-1->0->-1 [18] 1/-1/-1->0->-1 [19] 1/-1/-1->0->-1 [20] 1/-1/-1->0->-1 [21] 1/-1/-1->0->-1 [22] 1/-1/-1->0->-1 [23] 1/-1/-1->0->-1
tyler-a100-newimage-val:570:1421 [0] NCCL INFO P2P Chunksize set to 524288
tyler-a100-newimage-val:570:1421 [0] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
tyler-a100-newimage-val:570:1421 [0] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
tyler-a100-newimage-val:576:1419 [6] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
tyler-a100-newimage-val:576:1419 [6] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
tyler-a100-newimage-val:570:1421 [0] NCCL INFO CC Off, Multi-GPU CC Off, workFifoBytes 1048576
tyler-a100-newimage-val:571:1426 [1] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
tyler-a100-newimage-val:571:1426 [1] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
tyler-a100-newimage-val:575:1423 [5] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
tyler-a100-newimage-val:575:1423 [5] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
tyler-a100-newimage-val:573:1417 [3] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
tyler-a100-newimage-val:573:1417 [3] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
tyler-a100-newimage-val:574:1418 [4] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
tyler-a100-newimage-val:574:1418 [4] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
tyler-a100-newimage-val:577:1422 [7] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
tyler-a100-newimage-val:577:1422 [7] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
tyler-a100-newimage-val:572:1420 [2] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
tyler-a100-newimage-val:572:1420 [2] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
tyler-a100-newimage-val:576:1419 [6] NCCL INFO ncclCommSplit comm 0x5640d0e2b2b0 rank 6 nranks 8 cudaDev 6 nvmlDev 6 busId e070 parent 0x5640cf2b7db0 color -934961569 key 6 commId 0xed88d95c67fb6a92 - Init COMPLETE
tyler-a100-newimage-val:570:1421 [0] NCCL INFO ncclCommSplit comm 0x55d351b0d620 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId 8010 parent 0x55d34ff865c0 color -934961569 key 0 commId 0xed88d95c67fb6a92 - Init COMPLETE
tyler-a100-newimage-val:576:1419 [6] NCCL INFO Init timings: rank 6 nranks 8 total 0.36 (kernels 0.00, bootstrap 0.03, allgathers 0.00, topo 0.25, graphs 0.00, connections 0.05, rest 0.02)
tyler-a100-newimage-val:572:1420 [2] NCCL INFO ncclCommSplit comm 0x55e6b08cfc90 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId a030 parent 0x55e6aed5c790 color -934961569 key 2 commId 0xed88d95c67fb6a92 - Init COMPLETE
tyler-a100-newimage-val:575:1423 [5] NCCL INFO ncclCommSplit comm 0x563617fd12e0 rank 5 nranks 8 cudaDev 5 nvmlDev 5 busId c060 parent 0x5636163750f0 color -934961569 key 5 commId 0xed88d95c67fb6a92 - Init COMPLETE
tyler-a100-newimage-val:570:1421 [0] NCCL INFO Init timings: rank 0 nranks 8 total 0.36 (kernels 0.00, bootstrap 0.03, allgathers 0.00, topo 0.25, graphs 0.00, connections 0.05, rest 0.02)
tyler-a100-newimage-val:572:1420 [2] NCCL INFO Init timings: rank 2 nranks 8 total 0.36 (kernels 0.00, bootstrap 0.03, allgathers 0.00, topo 0.25, graphs 0.00, connections 0.06, rest 0.02)
tyler-a100-newimage-val:571:1426 [1] NCCL INFO ncclCommSplit comm 0x55f7afbd0a60 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 8020 parent 0x55f7ae069780 color -934961569 key 1 commId 0xed88d95c67fb6a92 - Init COMPLETE
tyler-a100-newimage-val:575:1423 [5] NCCL INFO Init timings: rank 5 nranks 8 total 0.36 (kernels 0.00, bootstrap 0.03, allgathers 0.00, topo 0.25, graphs 0.00, connections 0.05, rest 0.02)
tyler-a100-newimage-val:571:1426 [1] NCCL INFO Init timings: rank 1 nranks 8 total 0.33 (kernels 0.00, bootstrap 0.00, allgathers 0.00, topo 0.25, graphs 0.00, connections 0.06, rest 0.02)
tyler-a100-newimage-val:574:1418 [4] NCCL INFO ncclCommSplit comm 0x560adc23dc90 rank 4 nranks 8 cudaDev 4 nvmlDev 4 busId c050 parent 0x560ada6ca910 color -934961569 key 4 commId 0xed88d95c67fb6a92 - Init COMPLETE
tyler-a100-newimage-val:577:1422 [7] NCCL INFO ncclCommSplit comm 0x558d77c33a30 rank 7 nranks 8 cudaDev 7 nvmlDev 7 busId e080 parent 0x558d760ca530 color -934961569 key 7 commId 0xed88d95c67fb6a92 - Init COMPLETE
tyler-a100-newimage-val:574:1418 [4] NCCL INFO Init timings: rank 4 nranks 8 total 0.36 (kernels 0.00, bootstrap 0.03, allgathers 0.00, topo 0.25, graphs 0.00, connections 0.06, rest 0.02)
tyler-a100-newimage-val:577:1422 [7] NCCL INFO Init timings: rank 7 nranks 8 total 0.36 (kernels 0.00, bootstrap 0.03, allgathers 0.00, topo 0.25, graphs 0.00, connections 0.05, rest 0.02)
tyler-a100-newimage-val:573:1417 [3] NCCL INFO ncclCommSplit comm 0x55e7b1bda750 rank 3 nranks 8 cudaDev 3 nvmlDev 3 busId a040 parent 0x55e7b0065170 color -934961569 key 3 commId 0xed88d95c67fb6a92 - Init COMPLETE
tyler-a100-newimage-val:573:1417 [3] NCCL INFO Init timings: rank 3 nranks 8 total 0.36 (kernels 0.00, bootstrap 0.03, allgathers 0.00, topo 0.25, graphs 0.00, connections 0.06, rest 0.02)
tyler-a100-newimage-val:571:1449 [1] NCCL INFO Channel 00/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-a100-newimage-val:573:1446 [3] NCCL INFO Channel 00/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-a100-newimage-val:577:1448 [7] NCCL INFO Channel 00/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-a100-newimage-val:571:1449 [1] NCCL INFO Channel 01/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-a100-newimage-val:576:1443 [6] NCCL INFO Channel 00/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-a100-newimage-val:573:1446 [3] NCCL INFO Channel 01/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-a100-newimage-val:577:1448 [7] NCCL INFO Channel 01/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-a100-newimage-val:571:1449 [1] NCCL INFO Channel 02/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-a100-newimage-val:576:1443 [6] NCCL INFO Channel 01/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-a100-newimage-val:573:1446 [3] NCCL INFO Channel 02/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-a100-newimage-val:577:1448 [7] NCCL INFO Channel 02/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-a100-newimage-val:571:1449 [1] NCCL INFO Channel 03/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-a100-newimage-val:576:1443 [6] NCCL INFO Channel 02/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-a100-newimage-val:573:1446 [3] NCCL INFO Channel 03/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-a100-newimage-val:577:1448 [7] NCCL INFO Channel 03/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-a100-newimage-val:571:1449 [1] NCCL INFO Channel 04/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-a100-newimage-val:576:1443 [6] NCCL INFO Channel 03/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-a100-newimage-val:573:1446 [3] NCCL INFO Channel 04/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-a100-newimage-val:577:1448 [7] NCCL INFO Channel 04/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-a100-newimage-val:571:1449 [1] NCCL INFO Channel 05/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-a100-newimage-val:576:1443 [6] NCCL INFO Channel 04/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-a100-newimage-val:573:1446 [3] NCCL INFO Channel 05/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-a100-newimage-val:577:1448 [7] NCCL INFO Channel 05/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-a100-newimage-val:571:1449 [1] NCCL INFO Channel 06/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-a100-newimage-val:576:1443 [6] NCCL INFO Channel 05/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-a100-newimage-val:573:1446 [3] NCCL INFO Channel 06/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-a100-newimage-val:577:1448 [7] NCCL INFO Channel 06/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-a100-newimage-val:571:1449 [1] NCCL INFO Channel 07/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-a100-newimage-val:576:1443 [6] NCCL INFO Channel 06/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-a100-newimage-val:573:1446 [3] NCCL INFO Channel 07/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-a100-newimage-val:571:1449 [1] NCCL INFO Channel 08/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-a100-newimage-val:577:1448 [7] NCCL INFO Channel 07/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-a100-newimage-val:576:1443 [6] NCCL INFO Channel 07/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-a100-newimage-val:573:1446 [3] NCCL INFO Channel 08/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-a100-newimage-val:571:1449 [1] NCCL INFO Channel 09/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-a100-newimage-val:577:1448 [7] NCCL INFO Channel 08/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-a100-newimage-val:576:1443 [6] NCCL INFO Channel 08/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-a100-newimage-val:573:1446 [3] NCCL INFO Channel 09/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-a100-newimage-val:571:1449 [1] NCCL INFO Channel 10/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-a100-newimage-val:577:1448 [7] NCCL INFO Channel 09/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-a100-newimage-val:576:1443 [6] NCCL INFO Channel 09/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-a100-newimage-val:573:1446 [3] NCCL INFO Channel 10/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-a100-newimage-val:571:1449 [1] NCCL INFO Channel 11/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-a100-newimage-val:577:1448 [7] NCCL INFO Channel 10/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-a100-newimage-val:576:1443 [6] NCCL INFO Channel 10/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-a100-newimage-val:570:1445 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-a100-newimage-val:573:1446 [3] NCCL INFO Channel 11/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-a100-newimage-val:571:1449 [1] NCCL INFO Channel 12/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-a100-newimage-val:577:1448 [7] NCCL INFO Channel 11/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-a100-newimage-val:572:1444 [2] NCCL INFO Channel 00/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-a100-newimage-val:574:1450 [4] NCCL INFO Channel 00/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-a100-newimage-val:576:1443 [6] NCCL INFO Channel 11/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-a100-newimage-val:575:1447 [5] NCCL INFO Channel 00/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-a100-newimage-val:570:1445 [0] NCCL INFO Channel 01/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-a100-newimage-val:573:1446 [3] NCCL INFO Channel 12/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:571:1449 [1] NCCL INFO Channel 13/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:577:1448 [7] NCCL INFO Channel 12/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:572:1444 [2] NCCL INFO Channel 01/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:574:1450 [4] NCCL INFO Channel 01/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:576:1443 [6] NCCL INFO Channel 12/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:575:1447 [5] NCCL INFO Channel 01/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:570:1445 [0] NCCL INFO Channel 02/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:573:1446 [3] NCCL INFO Channel 13/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:571:1449 [1] NCCL INFO Channel 14/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:577:1448 [7] NCCL INFO Channel 13/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:574:1450 [4] NCCL INFO Channel 02/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:572:1444 [2] NCCL INFO Channel 02/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:576:1443 [6] NCCL INFO Channel 13/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:575:1447 [5] NCCL INFO Channel 02/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:570:1445 [0] NCCL INFO Channel 03/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:573:1446 [3] NCCL INFO Channel 14/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:571:1449 [1] NCCL INFO Channel 15/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:577:1448 [7] NCCL INFO Channel 14/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:574:1450 [4] NCCL INFO Channel 03/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:572:1444 [2] NCCL INFO Channel 03/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:576:1443 [6] NCCL INFO Channel 14/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:570:1445 [0] NCCL INFO Channel 04/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:575:1447 [5] NCCL INFO Channel 03/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:573:1446 [3] NCCL INFO Channel 15/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:571:1449 [1] NCCL INFO Channel 16/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:577:1448 [7] NCCL INFO Channel 15/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:574:1450 [4] NCCL INFO Channel 04/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:572:1444 [2] NCCL INFO Channel 04/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:576:1443 [6] NCCL INFO Channel 15/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:570:1445 [0] NCCL INFO Channel 05/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:575:1447 [5] NCCL INFO Channel 04/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:573:1446 [3] NCCL INFO Channel 16/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:571:1449 [1] NCCL INFO Channel 17/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:577:1448 [7] NCCL INFO Channel 16/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:574:1450 [4] NCCL INFO Channel 05/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:572:1444 [2] NCCL INFO Channel 05/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:576:1443 [6] NCCL INFO Channel 16/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:570:1445 [0] NCCL INFO Channel 06/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:575:1447 [5] NCCL INFO Channel 05/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:573:1446 [3] NCCL INFO Channel 17/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:571:1449 [1] NCCL INFO Channel 18/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:577:1448 [7] NCCL INFO Channel 17/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:574:1450 [4] NCCL INFO Channel 06/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:576:1443 [6] NCCL INFO Channel 17/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:572:1444 [2] NCCL INFO Channel 06/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:570:1445 [0] NCCL INFO Channel 07/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:573:1446 [3] NCCL INFO Channel 18/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:575:1447 [5] NCCL INFO Channel 06/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:571:1449 [1] NCCL INFO Channel 19/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:577:1448 [7] NCCL INFO Channel 18/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:574:1450 [4] NCCL INFO Channel 07/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:576:1443 [6] NCCL INFO Channel 18/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:572:1444 [2] NCCL INFO Channel 07/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:570:1445 [0] NCCL INFO Channel 08/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:573:1446 [3] NCCL INFO Channel 19/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:575:1447 [5] NCCL INFO Channel 07/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:571:1449 [1] NCCL INFO Channel 20/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:577:1448 [7] NCCL INFO Channel 19/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:574:1450 [4] NCCL INFO Channel 08/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:576:1443 [6] NCCL INFO Channel 19/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:572:1444 [2] NCCL INFO Channel 08/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:570:1445 [0] NCCL INFO Channel 09/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:575:1447 [5] NCCL INFO Channel 08/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:573:1446 [3] NCCL INFO Channel 20/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:571:1449 [1] NCCL INFO Channel 21/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:577:1448 [7] NCCL INFO Channel 20/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:574:1450 [4] NCCL INFO Channel 09/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:576:1443 [6] NCCL INFO Channel 20/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:570:1445 [0] NCCL INFO Channel 10/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:572:1444 [2] NCCL INFO Channel 09/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:575:1447 [5] NCCL INFO Channel 09/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:577:1448 [7] NCCL INFO Channel 21/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:573:1446 [3] NCCL INFO Channel 21/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:571:1449 [1] NCCL INFO Channel 22/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:574:1450 [4] NCCL INFO Channel 10/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:576:1443 [6] NCCL INFO Channel 21/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:570:1445 [0] NCCL INFO Channel 11/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:572:1444 [2] NCCL INFO Channel 10/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:575:1447 [5] NCCL INFO Channel 10/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:577:1448 [7] NCCL INFO Channel 22/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:573:1446 [3] NCCL INFO Channel 22/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:571:1449 [1] NCCL INFO Channel 23/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:576:1443 [6] NCCL INFO Channel 22/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:574:1450 [4] NCCL INFO Channel 11/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:570:1445 [0] NCCL INFO Channel 12/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:572:1444 [2] NCCL INFO Channel 11/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:575:1447 [5] NCCL INFO Channel 11/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:577:1448 [7] NCCL INFO Channel 23/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:573:1446 [3] NCCL INFO Channel 23/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:576:1443 [6] NCCL INFO Channel 23/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:574:1450 [4] NCCL INFO Channel 12/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:570:1445 [0] NCCL INFO Channel 13/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:572:1444 [2] NCCL INFO Channel 12/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:575:1447 [5] NCCL INFO Channel 12/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:574:1450 [4] NCCL INFO Channel 13/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:570:1445 [0] NCCL INFO Channel 14/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:572:1444 [2] NCCL INFO Channel 13/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:575:1447 [5] NCCL INFO Channel 13/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:570:1445 [0] NCCL INFO Channel 15/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:574:1450 [4] NCCL INFO Channel 14/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:572:1444 [2] NCCL INFO Channel 14/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:575:1447 [5] NCCL INFO Channel 14/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:570:1445 [0] NCCL INFO Channel 16/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:574:1450 [4] NCCL INFO Channel 15/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:572:1444 [2] NCCL INFO Channel 15/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:575:1447 [5] NCCL INFO Channel 15/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:570:1445 [0] NCCL INFO Channel 17/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:574:1450 [4] NCCL INFO Channel 16/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:572:1444 [2] NCCL INFO Channel 16/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:575:1447 [5] NCCL INFO Channel 16/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:570:1445 [0] NCCL INFO Channel 18/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:574:1450 [4] NCCL INFO Channel 17/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:572:1444 [2] NCCL INFO Channel 17/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:575:1447 [5] NCCL INFO Channel 17/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:570:1445 [0] NCCL INFO Channel 19/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:574:1450 [4] NCCL INFO Channel 18/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:572:1444 [2] NCCL INFO Channel 18/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:575:1447 [5] NCCL INFO Channel 18/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:570:1445 [0] NCCL INFO Channel 20/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:574:1450 [4] NCCL INFO Channel 19/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:572:1444 [2] NCCL INFO Channel 19/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:575:1447 [5] NCCL INFO Channel 19/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:570:1445 [0] NCCL INFO Channel 21/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:572:1444 [2] NCCL INFO Channel 20/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:574:1450 [4] NCCL INFO Channel 20/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:575:1447 [5] NCCL INFO Channel 20/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:570:1445 [0] NCCL INFO Channel 22/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:572:1444 [2] NCCL INFO Channel 21/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:574:1450 [4] NCCL INFO Channel 21/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:570:1445 [0] NCCL INFO Channel 23/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:575:1447 [5] NCCL INFO Channel 21/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:572:1444 [2] NCCL INFO Channel 22/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:574:1450 [4] NCCL INFO Channel 22/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:575:1447 [5] NCCL INFO Channel 22/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:572:1444 [2] NCCL INFO Channel 23/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:574:1450 [4] NCCL INFO Channel 23/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:575:1447 [5] NCCL INFO Channel 23/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:572:1444 [2] NCCL INFO Connected all rings | |
tyler-a100-newimage-val:573:1446 [3] NCCL INFO Connected all rings | |
tyler-a100-newimage-val:574:1450 [4] NCCL INFO Connected all rings | |
tyler-a100-newimage-val:571:1449 [1] NCCL INFO Connected all rings | |
tyler-a100-newimage-val:570:1445 [0] NCCL INFO Connected all rings | |
tyler-a100-newimage-val:577:1448 [7] NCCL INFO Connected all rings | |
tyler-a100-newimage-val:575:1447 [5] NCCL INFO Connected all rings | |
tyler-a100-newimage-val:576:1443 [6] NCCL INFO Connected all rings | |
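The channel lines above show each rank r opening channels 00 through 23 to rank (r + 1) mod 8 over P2P/CUMEM/read, with rank 7 wrapping back to rank 0, before all eight ranks report "Connected all rings". A minimal Python sketch, assuming the log has already been read into a list of lines, can confirm that ring shape from the text alone. The regex, the `ring_summary` helper, and the sample lines are illustrative assumptions written against the format seen here, not part of NCCL or InstructLab.

```python
import re
from collections import defaultdict

# Matches channel setup lines in the format seen in this log, e.g.
# "... [6] NCCL INFO Channel 02/0 : 6[6] -> 7[7] via P2P/CUMEM/read"
CHANNEL_RE = re.compile(
    r"NCCL INFO Channel (\d+)/\d+ : (\d+)\[\d+\] -> (\d+)\[\d+\] via (\S+)"
)


def ring_summary(lines, world_size):
    """Group channel lines by sending rank and check the ring topology.

    Returns {rank: (peer, sorted channel ids, set of transports)} and raises
    ValueError if any rank sends to something other than (rank + 1) % world_size.
    """
    peers = {}
    channels = defaultdict(set)
    transports = defaultdict(set)
    for line in lines:
        m = CHANNEL_RE.search(line)
        if m is None:
            continue  # not a channel setup line
        chan, src, dst = int(m.group(1)), int(m.group(2)), int(m.group(3))
        expected = (src + 1) % world_size
        if dst != expected:
            raise ValueError(f"rank {src} -> {dst}, expected {expected}")
        peers[src] = dst
        channels[src].add(chan)
        transports[src].add(m.group(4))
    return {r: (peers[r], sorted(channels[r]), transports[r]) for r in peers}


sample = [
    "tyler-a100-newimage-val:577:1448 [7] NCCL INFO Channel 03/0 : 7[7] -> 0[0] via P2P/CUMEM/read",
    "tyler-a100-newimage-val:576:1443 [6] NCCL INFO Channel 02/0 : 6[6] -> 7[7] via P2P/CUMEM/read",
    "tyler-a100-newimage-val:576:1443 [6] NCCL INFO Channel 03/0 : 6[6] -> 7[7] via P2P/CUMEM/read",
]
summary = ring_summary(sample, world_size=8)
print(summary[7][0])  # rank 7's ring peer is rank 0
```

Run over the full log, every rank should report 24 channel ids and a single transport; a mismatch would point at a GPU pair that fell back to a different path.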
[2024-08-18 20:05:47,116] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False | |
[2024-08-18 20:05:47,117] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer | |
[2024-08-18 20:05:47,117] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the basic Optimizer | |
[2024-08-18 20:05:47,130] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam | |
[2024-08-18 20:05:47,130] [INFO] [utils.py:56:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=<class 'deepspeed.ops.adam.fused_adam.FusedAdam'> | |
[2024-08-18 20:05:47,130] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.bfloat16 ZeRO stage 2 optimizer | |
[2024-08-18 20:05:47,130] [INFO] [stage_1_and_2.py:148:__init__] Reduce bucket size 500,000,000 | |
[2024-08-18 20:05:47,130] [INFO] [stage_1_and_2.py:149:__init__] Allgather bucket size 500,000,000 | |
[2024-08-18 20:05:47,130] [INFO] [stage_1_and_2.py:150:__init__] CPU Offload: False | |
[2024-08-18 20:05:47,130] [INFO] [stage_1_and_2.py:151:__init__] Round robin gradient partitioning: False | |
[2024-08-18 20:05:59,693] [WARNING] [engine.py:2749:load_checkpoint] Unable to find latest file at /var/mnt/inststg1/instructlab/phasedbasedir/phase1/checkpoints/ds_native/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. | |
[2024-08-18 20:06:00,706] [WARNING] [engine.py:2749:load_checkpoint] Unable to find latest file at /var/mnt/inststg1/instructlab/phasedbasedir/phase1/checkpoints/ds_native/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. | |
[2024-08-18 20:06:00,871] [WARNING] [engine.py:2749:load_checkpoint] Unable to find latest file at /var/mnt/inststg1/instructlab/phasedbasedir/phase1/checkpoints/ds_native/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. | |
[2024-08-18 20:06:01,012] [WARNING] [engine.py:2749:load_checkpoint] Unable to find latest file at /var/mnt/inststg1/instructlab/phasedbasedir/phase1/checkpoints/ds_native/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. | |
[2024-08-18 20:06:01,166] [WARNING] [engine.py:2749:load_checkpoint] Unable to find latest file at /var/mnt/inststg1/instructlab/phasedbasedir/phase1/checkpoints/ds_native/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. | |
[2024-08-18 20:06:01,497] [WARNING] [engine.py:2749:load_checkpoint] Unable to find latest file at /var/mnt/inststg1/instructlab/phasedbasedir/phase1/checkpoints/ds_native/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. | |
[2024-08-18 20:06:01,620] [WARNING] [engine.py:2749:load_checkpoint] Unable to find latest file at /var/mnt/inststg1/instructlab/phasedbasedir/phase1/checkpoints/ds_native/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. | |
[2024-08-18 20:06:01,858] [INFO] [utils.py:781:see_memory_usage] Before initializing optimizer states | |
[2024-08-18 20:06:01,859] [INFO] [utils.py:782:see_memory_usage] MA 15.69 GB Max_MA 17.26 GB CA 17.26 GB Max_CA 17 GB | |
[2024-08-18 20:06:01,860] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 30.63 GB, percent = 2.4% | |
[2024-08-18 20:06:02,079] [INFO] [utils.py:781:see_memory_usage] After initializing optimizer states | |
[2024-08-18 20:06:02,080] [INFO] [utils.py:782:see_memory_usage] MA 15.69 GB Max_MA 18.83 GB CA 20.4 GB Max_CA 20 GB | |
[2024-08-18 20:06:02,080] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 30.64 GB, percent = 2.4% | |
[2024-08-18 20:06:02,080] [INFO] [stage_1_and_2.py:543:__init__] optimizer state initialized | |
[2024-08-18 20:06:02,301] [INFO] [utils.py:781:see_memory_usage] After initializing ZeRO optimizer | |
[2024-08-18 20:06:02,302] [INFO] [utils.py:782:see_memory_usage] MA 15.69 GB Max_MA 15.69 GB CA 20.4 GB Max_CA 20 GB | |
[2024-08-18 20:06:02,302] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 30.64 GB, percent = 2.4% | |
[2024-08-18 20:06:02,304] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = DeepSpeedZeroOptimizer | |
[2024-08-18 20:06:02,304] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler | |
[2024-08-18 20:06:02,304] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = <torch.optim.lr_scheduler.LambdaLR object at 0x7eff204ab310> | |
[2024-08-18 20:06:02,304] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0], mom=[(0.9, 0.95)] | |
[2024-08-18 20:06:02,305] [INFO] [config.py:997:print] DeepSpeedEngine configuration: | |
[2024-08-18 20:06:02,305] [INFO] [config.py:1001:print] activation_checkpointing_config { | |
"partition_activations": false, | |
"contiguous_memory_optimization": false, | |
"cpu_checkpointing": false, | |
"number_checkpoints": null, | |
"synchronize_checkpoint_boundary": false, | |
"profile": false | |
} | |
[2024-08-18 20:06:02,305] [INFO] [config.py:1001:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} | |
[2024-08-18 20:06:02,305] [INFO] [config.py:1001:print] amp_enabled .................. False | |
[2024-08-18 20:06:02,305] [INFO] [config.py:1001:print] amp_params ................... False | |
[2024-08-18 20:06:02,306] [INFO] [config.py:1001:print] autotuning_config ............ { | |
"enabled": false, | |
"start_step": null, | |
"end_step": null, | |
"metric_path": null, | |
"arg_mappings": null, | |
"metric": "throughput", | |
"model_info": null, | |
"results_dir": "autotuning_results", | |
"exps_dir": "autotuning_exps", | |
"overwrite": true, | |
"fast": true, | |
"start_profile_step": 3, | |
"end_profile_step": 5, | |
"tuner_type": "gridsearch", | |
"tuner_early_stopping": 5, | |
"tuner_num_trials": 50, | |
"model_info_path": null, | |
"mp_size": 1, | |
"max_train_batch_size": null, | |
"min_train_batch_size": 1, | |
"max_train_micro_batch_size_per_gpu": 1.024000e+03, | |
"min_train_micro_batch_size_per_gpu": 1, | |
"num_tuning_micro_batch_sizes": 3 | |
} | |
[2024-08-18 20:06:02,306] [INFO] [config.py:1001:print] bfloat16_enabled ............. True | |
[2024-08-18 20:06:02,306] [INFO] [config.py:1001:print] bfloat16_immediate_grad_update False | |
[2024-08-18 20:06:02,306] [INFO] [config.py:1001:print] checkpoint_parallel_write_pipeline False | |
[2024-08-18 20:06:02,306] [INFO] [config.py:1001:print] checkpoint_tag_validation_enabled True | |
[2024-08-18 20:06:02,306] [INFO] [config.py:1001:print] checkpoint_tag_validation_fail False | |
[2024-08-18 20:06:02,306] [INFO] [config.py:1001:print] comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7f0270163fd0> | |
[2024-08-18 20:06:02,306] [INFO] [config.py:1001:print] communication_data_type ...... None | |
[2024-08-18 20:06:02,306] [INFO] [config.py:1001:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} | |
[2024-08-18 20:06:02,306] [INFO] [config.py:1001:print] curriculum_enabled_legacy .... False | |
[2024-08-18 20:06:02,306] [INFO] [config.py:1001:print] curriculum_params_legacy ..... False | |
[2024-08-18 20:06:02,306] [INFO] [config.py:1001:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}} | |
[2024-08-18 20:06:02,306] [INFO] [config.py:1001:print] data_efficiency_enabled ...... False | |
[2024-08-18 20:06:02,306] [INFO] [config.py:1001:print] dataloader_drop_last ......... False | |
[2024-08-18 20:06:02,306] [INFO] [config.py:1001:print] disable_allgather ............ False | |
[2024-08-18 20:06:02,306] [INFO] [config.py:1001:print] dump_state ................... False | |
[2024-08-18 20:06:02,306] [INFO] [config.py:1001:print] dynamic_loss_scale_args ...... None | |
[2024-08-18 20:06:02,306] [INFO] [config.py:1001:print] eigenvalue_enabled ........... False | |
[2024-08-18 20:06:02,306] [INFO] [config.py:1001:print] eigenvalue_gas_boundary_resolution 1 | |
[2024-08-18 20:06:02,306] [INFO] [config.py:1001:print] eigenvalue_layer_name ........ bert.encoder.layer | |
[2024-08-18 20:06:02,306] [INFO] [config.py:1001:print] eigenvalue_layer_num ......... 0 | |
[2024-08-18 20:06:02,306] [INFO] [config.py:1001:print] eigenvalue_max_iter .......... 100 | |
[2024-08-18 20:06:02,306] [INFO] [config.py:1001:print] eigenvalue_stability ......... 1e-06 | |
[2024-08-18 20:06:02,306] [INFO] [config.py:1001:print] eigenvalue_tol ............... 0.01 | |
[2024-08-18 20:06:02,306] [INFO] [config.py:1001:print] eigenvalue_verbose ........... False | |
[2024-08-18 20:06:02,306] [INFO] [config.py:1001:print] elasticity_enabled ........... False | |
[2024-08-18 20:06:02,306] [INFO] [config.py:1001:print] flops_profiler_config ........ { | |
"enabled": false, | |
"recompute_fwd_factor": 0.0, | |
"profile_step": 1, | |
"module_depth": -1, | |
"top_modules": 1, | |
"detailed": true, | |
"output_file": null | |
} | |
[2024-08-18 20:06:02,306] [INFO] [config.py:1001:print] fp16_auto_cast ............... None | |
[2024-08-18 20:06:02,306] [INFO] [config.py:1001:print] fp16_enabled ................. False | |
[2024-08-18 20:06:02,306] [INFO] [config.py:1001:print] fp16_master_weights_and_gradients False | |
[2024-08-18 20:06:02,306] [INFO] [config.py:1001:print] global_rank .................. 0 | |
[2024-08-18 20:06:02,306] [INFO] [config.py:1001:print] grad_accum_dtype ............. None | |
[2024-08-18 20:06:02,306] [INFO] [config.py:1001:print] gradient_accumulation_steps .. 2 | |
[2024-08-18 20:06:02,306] [INFO] [config.py:1001:print] gradient_clipping ............ 1.0 | |
[2024-08-18 20:06:02,306] [INFO] [config.py:1001:print] gradient_predivide_factor .... 1.0 | |
[2024-08-18 20:06:02,306] [INFO] [config.py:1001:print] graph_harvesting ............. False | |
[2024-08-18 20:06:02,306] [INFO] [config.py:1001:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8 | |
[2024-08-18 20:06:02,307] [INFO] [config.py:1001:print] initial_dynamic_scale ........ 1 | |
[2024-08-18 20:06:02,307] [INFO] [config.py:1001:print] load_universal_checkpoint .... False | |
[2024-08-18 20:06:02,307] [INFO] [config.py:1001:print] loss_scale ................... 1.0 | |
[2024-08-18 20:06:02,307] [INFO] [config.py:1001:print] memory_breakdown ............. False | |
[2024-08-18 20:06:02,307] [INFO] [config.py:1001:print] mics_hierarchial_params_gather False | |
[2024-08-18 20:06:02,307] [INFO] [config.py:1001:print] mics_shard_size .............. -1 | |
[2024-08-18 20:06:02,307] [INFO] [config.py:1001:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') comet=CometConfig(enabled=False, samples_log_interval=100, project=None, workspace=None, api_key=None, experiment_name=None, experiment_key=None, online=None, mode=None) wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False | |
[2024-08-18 20:06:02,307] [INFO] [config.py:1001:print] nebula_config ................ { | |
"enabled": false, | |
"persistent_storage_path": null, | |
"persistent_time_interval": 100, | |
"num_of_version_in_retention": 2, | |
"enable_nebula_load": true, | |
"load_path": null | |
} | |
[2024-08-18 20:06:02,307] [INFO] [config.py:1001:print] optimizer_legacy_fusion ...... False | |
[2024-08-18 20:06:02,307] [INFO] [config.py:1001:print] optimizer_name ............... None | |
[2024-08-18 20:06:02,307] [INFO] [config.py:1001:print] optimizer_params ............. None | |
[2024-08-18 20:06:02,307] [INFO] [config.py:1001:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0, 'pipe_partitioned': True, 'grad_partitioned': True} | |
[2024-08-18 20:06:02,307] [INFO] [config.py:1001:print] pld_enabled .................. False | |
[2024-08-18 20:06:02,307] [INFO] [config.py:1001:print] pld_params ................... False | |
[2024-08-18 20:06:02,307] [INFO] [config.py:1001:print] prescale_gradients ........... False | |
[2024-08-18 20:06:02,307] [INFO] [config.py:1001:print] scheduler_name ............... None | |
[2024-08-18 20:06:02,307] [INFO] [config.py:1001:print] scheduler_params ............. None | |
[2024-08-18 20:06:02,307] [INFO] [config.py:1001:print] seq_parallel_communication_data_type torch.float32 | |
[2024-08-18 20:06:02,307] [INFO] [config.py:1001:print] sparse_attention ............. None | |
[2024-08-18 20:06:02,307] [INFO] [config.py:1001:print] sparse_gradients_enabled ..... False | |
[2024-08-18 20:06:02,307] [INFO] [config.py:1001:print] steps_per_print .............. 1 | |
[2024-08-18 20:06:02,307] [INFO] [config.py:1001:print] timers_config ................ enabled=True synchronized=True | |
[2024-08-18 20:06:02,307] [INFO] [config.py:1001:print] train_batch_size ............. 128 | |
[2024-08-18 20:06:02,307] [INFO] [config.py:1001:print] train_micro_batch_size_per_gpu 8 | |
[2024-08-18 20:06:02,307] [INFO] [config.py:1001:print] use_data_before_expert_parallel_ False
[2024-08-18 20:06:02,307] [INFO] [config.py:1001:print] use_node_local_storage ....... False
[2024-08-18 20:06:02,307] [INFO] [config.py:1001:print] wall_clock_breakdown ......... False
[2024-08-18 20:06:02,307] [INFO] [config.py:1001:print] weight_quantization_config ... None
[2024-08-18 20:06:02,307] [INFO] [config.py:1001:print] world_size ................... 8
[2024-08-18 20:06:02,307] [INFO] [config.py:1001:print] zero_allow_untested_optimizer False
[2024-08-18 20:06:02,307] [INFO] [config.py:1001:print] zero_config .................. stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 use_multi_rank_bucket_allreduce=True allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='none', nvme_path=None, buffer_count=4, pin_memory=False, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False, ratio=1.0) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False use_all_reduce_for_fetch_params=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True pipeline_loading_checkpoint=False override_module_apply=True
[2024-08-18 20:06:02,307] [INFO] [config.py:1001:print] zero_enabled ................. True
[2024-08-18 20:06:02,307] [INFO] [config.py:1001:print] zero_force_ds_cpu_optimizer .. True
[2024-08-18 20:06:02,307] [INFO] [config.py:1001:print] zero_optimization_stage ...... 2
[2024-08-18 20:06:02,307] [INFO] [config.py:987:print_user_config] json = {
    "train_batch_size": 128,
    "gradient_accumulation_steps": 2,
    "train_micro_batch_size_per_gpu": 8,
    "steps_per_print": 1,
    "zero_optimization": {
        "stage": 2,
        "offload_param": {
            "device": "none"
        },
        "offload_optimizer": {
            "device": "none"
        }
    },
    "bf16": {
        "enabled": true
    },
    "gradient_clipping": 1.0,
    "prescale_gradients": false,
    "wall_clock_breakdown": false
}
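DeepSpeed requires train_batch_size to equal train_micro_batch_size_per_gpu × gradient_accumulation_steps × world size, and the config above satisfies it: 8 × 2 × 8 = 128. A minimal Python sketch of that consistency check, using the values printed above; check_batch_size is a hypothetical helper, not a DeepSpeed API:

```python
# Values copied from the DeepSpeed user config and world_size printed above.
ds_config = {
    "train_batch_size": 128,
    "gradient_accumulation_steps": 2,
    "train_micro_batch_size_per_gpu": 8,
}
world_size = 8


def check_batch_size(cfg: dict, world_size: int) -> bool:
    """Return True if train_batch_size == micro_batch * grad_accum * world_size."""
    expected = (
        cfg["train_micro_batch_size_per_gpu"]
        * cfg["gradient_accumulation_steps"]
        * world_size
    )
    return cfg["train_batch_size"] == expected


print(check_batch_size(ds_config, world_size))  # True: 8 * 2 * 8 == 128
```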
[2024-08-18 20:06:02,308] [WARNING] [engine.py:2749:load_checkpoint] Unable to find latest file at /var/mnt/inststg1/instructlab/phasedbasedir/phase1/checkpoints/ds_native/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint.
Epoch 0: 0%| | 0/2 [00:00<?, ?it/s]
total tokens: 8715 num samples: 15 num padding tokens: 392 - rank: 5 max len: 581 min len: 525 avg len: 554.8666666666667 num_loss_counted_tokens: 7438
total tokens: 8565 num samples: 15 num padding tokens: 401 - rank: 5 max len: 571 min len: 519 avg len: 544.2666666666667 num_loss_counted_tokens: 7279
total tokens: 8477 num samples: 7 num padding tokens: 255 - rank: 1 max len: 1211 min len: 1136 avg len: 1174.5714285714287 num_loss_counted_tokens: 7809
total tokens: 8666 num samples: 7 num padding tokens: 540 - rank: 1 max len: 1238 min len: 1097 avg len: 1160.857142857143 num_loss_counted_tokens: 7713
total tokens: 8896 num samples: 8 num padding tokens: 933 - rank: 2 max len: 1112 min len: 883 avg len: 995.375 num_loss_counted_tokens: 7491
total tokens: 8200 num samples: 8 num padding tokens: 824 - rank: 2 max len: 1025 min len: 886 avg len: 922.0 num_loss_counted_tokens: 6904
total tokens: 8492 num samples: 22 num padding tokens: 1012 - rank: 7 max len: 386 min len: 288 avg len: 340.0 num_loss_counted_tokens: 6182
total tokens: 8579 num samples: 23 num padding tokens: 1024 - rank: 7 max len: 373 min len: 282 avg len: 328.4782608695652 num_loss_counted_tokens: 6198
total tokens: 8567 num samples: 13 num padding tokens: 536 - rank: 4 max len: 659 min len: 577 avg len: 617.7692307692307 num_loss_counted_tokens: 7264
total tokens: 8723 num samples: 13 num padding tokens: 546 - rank: 4 max len: 671 min len: 581 avg len: 629.0 num_loss_counted_tokens: 7410
total tokens: 8600 num samples: 5 num padding tokens: 654 - rank: 0 max len: 1720 min len: 1230 avg len: 1589.2 num_loss_counted_tokens: 7651
total tokens: 8908 num samples: 17 num padding tokens: 1767 - rank: 6 max len: 524 min len: 384 avg len: 420.05882352941177 num_loss_counted_tokens: 6138
total tokens: 8704 num samples: 17 num padding tokens: 1162 - rank: 6 max len: 512 min len: 388 avg len: 443.6470588235294 num_loss_counted_tokens: 6539
total tokens: 8660 num samples: 5 num padding tokens: 130 - rank: 0 max len: 1732 min len: 1681 avg len: 1706.0 num_loss_counted_tokens: 8235
total tokens: 8688 num samples: 12 num padding tokens: 317 - rank: 3 max len: 724 min len: 662 avg len: 697.5833333333334 num_loss_counted_tokens: 7663
total tokens: 8712 num samples: 12 num padding tokens: 241 - rank: 3 max len: 726 min len: 673 avg len: 705.9166666666666 num_loss_counted_tokens: 7763
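The per-rank micro-batch lines above are consistent with each micro-batch being padded to its longest sample: total tokens = max len × num samples, and num padding tokens = total tokens − avg len × num samples (for rank 0's first line, 1720 × 5 = 8600 and 8600 − 7946 = 654). This relation is inferred from the logged numbers, not taken from the instructlab source. A small Python sketch that re-derives both values from one line; the regex and the check_padding helper are illustrative assumptions:

```python
import re

# One per-rank micro-batch line copied from the log above.
LINE = ("total tokens: 8600 num samples: 5 num padding tokens: 654 - rank: 0 "
        "max len: 1720 min len: 1230 avg len: 1589.2 num_loss_counted_tokens: 7651")

PATTERN = re.compile(
    r"total tokens: (?P<total>\d+) num samples: (?P<n>\d+) "
    r"num padding tokens: (?P<pad>\d+) - rank: (?P<rank>\d+) "
    r"max len: (?P<max>\d+) min len: (?P<min>\d+) avg len: (?P<avg>[\d.]+)"
)


def check_padding(line: str) -> tuple[int, int]:
    """Recompute (padded_total, padding_tokens) from max/avg length and sample count."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a micro-batch stats line")
    n = int(m["n"])
    padded_total = int(m["max"]) * n           # every sample padded to the max length
    real_tokens = round(float(m["avg"]) * n)   # sum of the unpadded sample lengths
    return padded_total, padded_total - real_tokens


print(check_padding(LINE))  # (8600, 654), matching "total tokens" and "num padding tokens"
```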
Per-token loss scaled by world size: 0.00018896172696258873
Per-token loss scaled by world size: 0.00022490561241284013
Per-token loss scaled by world size: 0.00020114783546887338
Per-token loss scaled by world size: 0.00021589698735624552
Per-token loss scaled by world size: 0.0002045775472652167
Per-token loss scaled by world size: 0.0001989303418667987
Per-token loss scaled by world size: 0.00020551522902678698
Epoch: 0, Step: 1, Rank: 3, loss = 1.6271358728408813
Epoch: 0, Step: 1, Rank: 6, loss = 1.3670908212661743
Epoch: 0, Step: 1, Rank: 5, loss = 1.4800673723220825
Epoch: 0, Step: 1, Rank: 2, loss = 1.455254316329956
Epoch: 0, Step: 1, Rank: 1, loss = 1.5619606971740723
Epoch: 0, Step: 1, Rank: 7, loss = 1.4392112493515015
Epoch: 0, Step: 1, Rank: 4, loss = 1.4868513345718384
Per-token loss scaled by world size: 0.0001951669983100146
Epoch: 0, Step: 1, Rank: 0, loss = 1.4119844436645508
Epoch 0: 50%|█████ | 1/2 [00:03<00:03, 3.83s/it]
{
    "epoch": 0,
    "step": 1,
    "rank": 0,
    "loss": 1.4119844436645508,
    "overall_throughput": 18.825078649542128,
    "lr": 0.0,
    "cuda_mem_allocated": 18.31652021408081,
    "cuda_malloc_retries": 0,
    "num_loss_counted_tokens": 57878,
    "batch_size": 99,
    "total_loss": 1.4786945581436157,
    "gradnorm": null,
    "weight_norm": 433.0432434082031,
    "timestamp": "2024-08-18T20:06:06.148365"
}
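Judging only by the numbers logged for step 1, two relations hold to within float32 rounding: total_loss in the metrics record equals the mean of the eight per-rank losses, and rank 0's loss equals its "Per-token loss scaled by world size" value × num_loss_counted_tokens / world_size (0.000195167 × 57878 / 8 ≈ 1.41198). The Python sketch below checks both; it is an arithmetic check on this log, not a description of how instructlab actually computes the loss, and the pairing of rank 0's per-token line with its loss line is itself an inference:

```python
import math

# Step-1 values copied from the log above (Epoch 0, Step 1), keyed by rank.
rank_losses = {
    3: 1.6271358728408813, 6: 1.3670908212661743, 5: 1.4800673723220825,
    2: 1.455254316329956, 1: 1.5619606971740723, 7: 1.4392112493515015,
    4: 1.4868513345718384, 0: 1.4119844436645508,
}
reported_total_loss = 1.4786945581436157   # "total_loss" in the step-1 record
num_loss_counted_tokens = 57878            # "num_loss_counted_tokens", summed over ranks
rank0_per_token_scaled = 0.0001951669983100146
world_size = 8

# total_loss appears to be the mean of the per-rank losses.
mean_loss = sum(rank_losses.values()) / world_size
mean_ok = math.isclose(mean_loss, reported_total_loss, rel_tol=1e-6)

# Rank 0's loss appears to equal per-token loss * counted tokens / world_size.
reconstructed = rank0_per_token_scaled * num_loss_counted_tokens / world_size
per_token_ok = math.isclose(reconstructed, rank_losses[0], rel_tol=1e-6)

print(mean_ok, per_token_ok)  # True True
```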
Per-token loss scaled by world size: 0.00022342400916386396
Per-token loss scaled by world size: 0.00018432749493513256
Per-token loss scaled by world size: 0.00020338631293270737
Per-token loss scaled by world size: 0.00018564658239483833
Per-token loss scaled by world size: 0.00020801158098038286
Per-token loss scaled by world size: 0.00018980413733515888
Per-token loss scaled by world size: 0.00019465763762127608
Epoch: 0, Step: 2, Rank: 1, loss = 1.3317431211471558
Epoch: 0, Step: 2, Rank: 5, loss = 1.4694406986236572
Epoch: 0, Step: 2, Rank: 3, loss = 1.6142104864120483
Epoch: 0, Step: 2, Rank: 6, loss = 1.341273307800293
Epoch: 0, Step: 2, Rank: 2, loss = 1.5028576850891113
Epoch: 0, Step: 2, Rank: 4, loss = 1.3713111877441406
Epoch: 0, Step: 2, Rank: 7, loss = 1.4063770771026611
Per-token loss scaled by world size: 0.00021443456353154033
Epoch: 0, Step: 2, Rank: 0, loss = 1.5492628812789917
[2024-08-18 20:06:08,978] [INFO] [logging.py:96:log_dist] [Rank 0] step=1, skipped=0, lr=[8.000000000000001e-07], mom=[(0.9, 0.95)]
Epoch 0: 100%|██████████| 2/2 [00:06<00:00, 3.29s/it]
{
    "epoch": 0,
    "step": 2,
    "rank": 0,
    "loss": 1.5492628812789917,
    "overall_throughput": 22.634999728904948,
    "lr": 8.000000000000001e-07,
    "cuda_mem_allocated": 23.017526626586914,
    "cuda_malloc_retries": 0,
    "num_loss_counted_tokens": 57799,
    "batch_size": 100,
    "total_loss": 1.4483095407485962,
    "gradnorm": 3.2187001705169678,
    "weight_norm": 433.0432434082031,
    "timestamp": "2024-08-18T20:06:09.050231"
}
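The learning rates DeepSpeed reports after optimizer steps 1 and 2 (8e-07 and 1.6e-06) match a linear warmup built from the TrainingArgs at the top of this log (learning_rate=2e-05, warmup_steps=25): 2e-05 × 1/25 = 8e-07 and 2e-05 × 2/25 = 1.6e-06. One optimizer step is logged per two training steps, consistent with gradient_accumulation_steps=2. A Python sketch of that assumed schedule; linear_warmup_lr is a hypothetical helper, and only the warmup portion is modeled:

```python
import math

BASE_LR = 2e-5     # learning_rate in this run's TrainingArgs
WARMUP_STEPS = 25  # warmup_steps in this run's TrainingArgs


def linear_warmup_lr(step: int) -> float:
    """LR after `step` optimizer steps, assuming a plain linear warmup (hypothetical)."""
    return BASE_LR * min(step, WARMUP_STEPS) / WARMUP_STEPS


# lr values reported by DeepSpeed's log_dist lines in this log, keyed by optimizer step.
observed = {1: 8.000000000000001e-07, 2: 1.6000000000000001e-06}
for step, lr in observed.items():
    print(step, math.isclose(linear_warmup_lr(step), lr))  # prints "1 True", then "2 True"
```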
Saving model in huggingface format at samples_seen: 192
Model saved in /var/mnt/inststg1/instructlab/phasedbasedir/phase1/checkpoints/hf_format/samples_192
[20:06:27] INFO saving took 18.58629083633423 seconds utils.py:611
Epoch 0: 100%|██████████| 2/2 [00:25<00:00, 12.69s/it]
total tokens: 8908 num samples: 17 num padding tokens: 1042 - rank: 6 max len: 524 min len: 389 avg len: 462.70588235294116 num_loss_counted_tokens: 6863
total tokens: 8904 num samples: 21 num padding tokens: 847 - rank: 6 max len: 424 min len: 347 avg len: 383.6666666666667 num_loss_counted_tokens: 6818
total tokens: 8996 num samples: 13 num padding tokens: 686 - rank: 4 max len: 692 min len: 600 avg len: 639.2307692307693 num_loss_counted_tokens: 7543
total tokens: 8610 num samples: 14 num padding tokens: 511 - rank: 4 max len: 615 min len: 546 avg len: 578.5 num_loss_counted_tokens: 7273
total tokens: 8495 num samples: 5 num padding tokens: 1475 - rank: 0 max len: 1699 min len: 1186 avg len: 1404.0 num_loss_counted_tokens: 6725
total tokens: 8660 num samples: 5 num padding tokens: 88 - rank: 0 max len: 1732 min len: 1685 avg len: 1714.4 num_loss_counted_tokens: 8277
total tokens: 8536 num samples: 22 num padding tokens: 1108 - rank: 7 max len: 388 min len: 283 avg len: 337.6363636363636 num_loss_counted_tokens: 6130
total tokens: 8835 num samples: 15 num padding tokens: 400 - rank: 5 max len: 589 min len: 524 avg len: 562.3333333333334 num_loss_counted_tokens: 7550
total tokens: 8688 num samples: 16 num padding tokens: 651 - rank: 5 max len: 543 min len: 428 avg len: 502.3125 num_loss_counted_tokens: 7093
total tokens: 8970 num samples: 26 num padding tokens: 730 - rank: 7 max len: 345 min len: 253 avg len: 316.9230769230769 num_loss_counted_tokens: 6706
total tokens: 8397 num samples: 9 num padding tokens: 1203 - rank: 2 max len: 933 min len: 709 avg len: 799.3333333333334 num_loss_counted_tokens: 6663
total tokens: 8288 num samples: 7 num padding tokens: 408 - rank: 1 max len: 1184 min len: 1017 avg len: 1125.7142857142858 num_loss_counted_tokens: 7467
total tokens: 8470 num samples: 7 num padding tokens: 755 - rank: 2 max len: 1210 min len: 931 avg len: 1102.142857142857 num_loss_counted_tokens: 7302
total tokens: 8950 num samples: 10 num padding tokens: 1495 - rank: 3 max len: 895 min len: 694 avg len: 745.5 num_loss_counted_tokens: 6865
total tokens: 8405 num samples: 5 num padding tokens: 928 - rank: 1 max len: 1681 min len: 1230 avg len: 1495.4 num_loss_counted_tokens: 7182
total tokens: 8508 num samples: 12 num padding tokens: 397 - rank: 3 max len: 709 min len: 628 avg len: 675.9166666666666 num_loss_counted_tokens: 7403
Per-token loss scaled by world size: 0.00018219766207039356
Per-token loss scaled by world size: 0.00019190594321116805
Per-token loss scaled by world size: 0.00019805562624242157
Per-token loss scaled by world size: 0.0001988127187360078
Per-token loss scaled by world size: 0.00019518414046615362
Per-token loss scaled by world size: 0.00021522259339690208
Per-token loss scaled by world size: 0.00020449883595574647
Epoch: 1, Step: 3, Rank: 6, loss = 1.3844094276428223
Epoch: 1, Step: 3, Rank: 1, loss = 1.3143739700317383
Epoch: 1, Step: 3, Rank: 4, loss = 1.428773283958435
Epoch: 1, Step: 3, Rank: 7, loss = 1.4080584049224854
Epoch: 1, Step: 3, Rank: 3, loss = 1.4342349767684937
Epoch: 1, Step: 3, Rank: 5, loss = 1.552615761756897
Epoch: 1, Step: 3, Rank: 2, loss = 1.4752546548843384
Per-token loss scaled by world size: 0.00021497253328561783
Epoch: 1, Step: 3, Rank: 0, loss = 1.5508118867874146
Epoch 1: 50%|█████ | 1/2 [00:03<00:03, 3.17s/it]
{
    "epoch": 1,
    "step": 3,
    "rank": 0,
    "loss": 1.5508118867874146,
    "overall_throughput": 23.69786494018801,
    "lr": 8.000000000000001e-07,
    "cuda_mem_allocated": 24.60161828994751,
    "cuda_malloc_retries": 0,
    "num_loss_counted_tokens": 57712,
    "batch_size": 94,
    "total_loss": 1.4435664415359497,
    "gradnorm": 3.2187001705169678,
    "weight_norm": 433.0432434082031,
    "timestamp": "2024-08-18T20:06:30.868391"
}
Per-token loss scaled by world size: 0.00020728030358441174
Per-token loss scaled by world size: 0.00018060434376820922
Per-token loss scaled by world size: 0.0002226187934866175
Per-token loss scaled by world size: 0.00021563439804594964
Per-token loss scaled by world size: 0.00020198585116304457
Per-token loss scaled by world size: 0.0002103917795466259
Per-token loss scaled by world size: 0.0002255949075333774
Epoch: 1, Step: 4, Rank: 3, loss = 1.5624500513076782
Epoch: 1, Step: 4, Rank: 5, loss = 1.5134299993515015
Epoch: 1, Step: 4, Rank: 0, loss = 1.2675715684890747
Epoch: 1, Step: 4, Rank: 4, loss = 1.4547967910766602
Epoch: 1, Step: 4, Rank: 1, loss = 1.4176377058029175
Epoch: 1, Step: 4, Rank: 2, loss = 1.4766347408294678
Epoch: 1, Step: 4, Rank: 7, loss = 1.5833379030227661
Per-token loss scaled by world size: 0.00022538744087796658
Epoch: 1, Step: 4, Rank: 6, loss = 1.5818817615509033
[2024-08-18 20:06:33,631] [INFO] [logging.py:96:log_dist] [Rank 0] step=2, skipped=0, lr=[1.6000000000000001e-06], mom=[(0.9, 0.95)]
Epoch 1: 100%|██████████| 2/2 [00:06<00:00, 2.98s/it]
{
    "epoch": 1,
    "step": 4,
    "rank": 0,
    "loss": 1.2675715684890747,
    "overall_throughput": 23.434918025287512,
    "lr": 1.6000000000000001e-06,
    "cuda_mem_allocated": 22.997975826263428,
    "cuda_malloc_retries": 0,
    "num_loss_counted_tokens": 56148,
    "batch_size": 110,
    "total_loss": 1.4822176694869995,
    "gradnorm": 3.2527425289154053,
    "weight_norm": 433.0432434082031,
    "timestamp": "2024-08-18T20:06:33.767906"
}
Saving model in huggingface format at samples_seen: 320
Model saved in /var/mnt/inststg1/instructlab/phasedbasedir/phase1/checkpoints/hf_format/samples_320
[20:06:52] INFO saving took 18.69700264930725 seconds utils.py:611
Epoch 1: 100%|██████████| 2/2 [00:24<00:00, 12.41s/it]
tyler-a100-newimage-val:570:9772 [0] NCCL INFO misc/socket.cc:47 -> 3
tyler-a100-newimage-val:571:9773 [1] NCCL INFO misc/socket.cc:47 -> 3
tyler-a100-newimage-val:573:9776 [3] NCCL INFO misc/socket.cc:47 -> 3
tyler-a100-newimage-val:577:9774 [7] NCCL INFO misc/socket.cc:47 -> 3
tyler-a100-newimage-val:574:9775 [4] NCCL INFO misc/socket.cc:47 -> 3
tyler-a100-newimage-val:575:9771 [5] NCCL INFO misc/socket.cc:47 -> 3
tyler-a100-newimage-val:577:9774 [7] NCCL INFO misc/socket.cc:550 -> 3
tyler-a100-newimage-val:576:9777 [6] NCCL INFO misc/socket.cc:47 -> 3
tyler-a100-newimage-val:573:9776 [3] NCCL INFO misc/socket.cc:550 -> 3
tyler-a100-newimage-val:571:9773 [1] NCCL INFO misc/socket.cc:550 -> 3
tyler-a100-newimage-val:574:9775 [4] NCCL INFO misc/socket.cc:550 -> 3
tyler-a100-newimage-val:575:9771 [5] NCCL INFO misc/socket.cc:550 -> 3
tyler-a100-newimage-val:577:9774 [7] NCCL INFO misc/socket.cc:573 -> 3
tyler-a100-newimage-val:572:9778 [2] NCCL INFO misc/socket.cc:47 -> 3
tyler-a100-newimage-val:577:9774 [7] NCCL INFO misc/socket.cc:621 -> 3
tyler-a100-newimage-val:570:9772 [0] NCCL INFO misc/socket.cc:550 -> 3
tyler-a100-newimage-val:576:9777 [6] NCCL INFO misc/socket.cc:550 -> 3
tyler-a100-newimage-val:573:9776 [3] NCCL INFO misc/socket.cc:573 -> 3
tyler-a100-newimage-val:571:9773 [1] NCCL INFO misc/socket.cc:573 -> 3
tyler-a100-newimage-val:575:9771 [5] NCCL INFO misc/socket.cc:573 -> 3
tyler-a100-newimage-val:574:9775 [4] NCCL INFO misc/socket.cc:573 -> 3
tyler-a100-newimage-val:572:9778 [2] NCCL INFO misc/socket.cc:550 -> 3
tyler-a100-newimage-val:574:9775 [4] NCCL INFO misc/socket.cc:621 -> 3
tyler-a100-newimage-val:570:9772 [0] NCCL INFO misc/socket.cc:573 -> 3
tyler-a100-newimage-val:576:9777 [6] NCCL INFO misc/socket.cc:573 -> 3
tyler-a100-newimage-val:573:9776 [3] NCCL INFO misc/socket.cc:621 -> 3
tyler-a100-newimage-val:571:9773 [1] NCCL INFO misc/socket.cc:621 -> 3
tyler-a100-newimage-val:575:9771 [5] NCCL INFO misc/socket.cc:621 -> 3
tyler-a100-newimage-val:572:9778 [2] NCCL INFO misc/socket.cc:573 -> 3
tyler-a100-newimage-val:570:9772 [0] NCCL INFO misc/socket.cc:621 -> 3
tyler-a100-newimage-val:572:9778 [2] NCCL INFO misc/socket.cc:621 -> 3
tyler-a100-newimage-val:576:9777 [6] NCCL INFO misc/socket.cc:621 -> 3
tyler-a100-newimage-val:574:1319 [4] NCCL INFO misc/socket.cc:47 -> 3
tyler-a100-newimage-val:576:1316 [6] NCCL INFO misc/socket.cc:47 -> 3
tyler-a100-newimage-val:574:1319 [4] NCCL INFO misc/socket.cc:752 -> 3
tyler-a100-newimage-val:574:1319 [4] NCCL INFO misc/socket.cc:428 -> 3
tyler-a100-newimage-val:576:1316 [6] NCCL INFO misc/socket.cc:752 -> 3
tyler-a100-newimage-val:574:1319 [4] NCCL INFO misc/socket.cc:564 -> 3
tyler-a100-newimage-val:576:1316 [6] NCCL INFO misc/socket.cc:428 -> 3
tyler-a100-newimage-val:574:1319 [4] NCCL INFO misc/socket.cc:668 -> 3
tyler-a100-newimage-val:576:1316 [6] NCCL INFO misc/socket.cc:564 -> 3
tyler-a100-newimage-val:574:9775 [4] NCCL INFO misc/socket.cc:47 -> 3
tyler-a100-newimage-val:577:9774 [7] NCCL INFO misc/socket.cc:47 -> 3
tyler-a100-newimage-val:574:9775 [4] NCCL INFO misc/socket.cc:58 -> 3
tyler-a100-newimage-val:576:1316 [6] NCCL INFO misc/socket.cc:668 -> 3
tyler-a100-newimage-val:577:9774 [7] NCCL INFO misc/socket.cc:58 -> 3
tyler-a100-newimage-val:575:9771 [5] NCCL INFO misc/socket.cc:47 -> 3
tyler-a100-newimage-val:574:9775 [4] NCCL INFO misc/socket.cc:775 -> 3
tyler-a100-newimage-val:577:9774 [7] NCCL INFO misc/socket.cc:775 -> 3
tyler-a100-newimage-val:571:1330 [1] NCCL INFO misc/socket.cc:47 -> 3
tyler-a100-newimage-val:575:9771 [5] NCCL INFO misc/socket.cc:58 -> 3
tyler-a100-newimage-val:576:9777 [6] NCCL INFO misc/socket.cc:47 -> 3
tyler-a100-newimage-val:573:9776 [3] NCCL INFO misc/socket.cc:47 -> 3
tyler-a100-newimage-val:575:9771 [5] NCCL INFO misc/socket.cc:775 -> 3
tyler-a100-newimage-val:574:1319 [4] proxy.cc:1458 NCCL WARN [Service thread] Accept failed Resource temporarily unavailable
tyler-a100-newimage-val:570:1321 [0] NCCL INFO misc/socket.cc:47 -> 3
tyler-a100-newimage-val:573:9776 [3] NCCL INFO misc/socket.cc:58 -> 3
tyler-a100-newimage-val:572:1327 [2] NCCL INFO misc/socket.cc:47 -> 3
tyler-a100-newimage-val:570:9772 [0] NCCL INFO misc/socket.cc:47 -> 3
tyler-a100-newimage-val:577:1318 [7] NCCL INFO misc/socket.cc:47 -> 3
tyler-a100-newimage-val:576:9777 [6] NCCL INFO misc/socket.cc:58 -> 3
tyler-a100-newimage-val:570:1321 [0] NCCL INFO misc/socket.cc:752 -> 3
tyler-a100-newimage-val:573:1324 [3] NCCL INFO misc/socket.cc:47 -> 3
tyler-a100-newimage-val:574:1319 [4] NCCL INFO misc/socket.cc:826 -> 3
tyler-a100-newimage-val:571:1330 [1] NCCL INFO misc/socket.cc:752 -> 3
tyler-a100-newimage-val:577:1318 [7] NCCL INFO misc/socket.cc:752 -> 3
tyler-a100-newimage-val:576:9777 [6] NCCL INFO misc/socket.cc:775 -> 3
tyler-a100-newimage-val:571:1330 [1] NCCL INFO misc/socket.cc:428 -> 3
tyler-a100-newimage-val:577:1318 [7] NCCL INFO misc/socket.cc:428 -> 3
tyler-a100-newimage-val:571:9773 [1] NCCL INFO misc/socket.cc:47 -> 3
tyler-a100-newimage-val:571:1330 [1] NCCL INFO misc/socket.cc:564 -> 3
tyler-a100-newimage-val:570:1321 [0] NCCL INFO misc/socket.cc:428 -> 3
tyler-a100-newimage-val:571:9773 [1] NCCL INFO misc/socket.cc:58 -> 3
tyler-a100-newimage-val:574:1319 [4] proxy.cc:1497 NCCL WARN [Service thread] Could not receive type from localRank 4, res=3, closed=0
tyler-a100-newimage-val:570:1321 [0] NCCL INFO misc/socket.cc:564 -> 3
tyler-a100-newimage-val:571:1330 [1] NCCL INFO misc/socket.cc:668 -> 3
tyler-a100-newimage-val:577:1318 [7] NCCL INFO misc/socket.cc:564 -> 3
tyler-a100-newimage-val:572:1327 [2] NCCL INFO misc/socket.cc:752 -> 3
tyler-a100-newimage-val:573:1324 [3] NCCL INFO misc/socket.cc:752 -> 3
tyler-a100-newimage-val:571:9773 [1] NCCL INFO misc/socket.cc:775 -> 3
tyler-a100-newimage-val:576:1316 [6] proxy.cc:1458 NCCL WARN [Service thread] Accept failed Resource temporarily unavailable
tyler-a100-newimage-val:570:1321 [0] NCCL INFO misc/socket.cc:668 -> 3
tyler-a100-newimage-val:572:1327 [2] NCCL INFO misc/socket.cc:428 -> 3
tyler-a100-newimage-val:570:9772 [0] NCCL INFO misc/socket.cc:58 -> 3
tyler-a100-newimage-val:574:1319 [4] proxy.cc:1521 NCCL WARN [Proxy Service 4] Failed to execute operation Close from rank 4, retcode 3
tyler-a100-newimage-val:573:9776 [3] NCCL INFO misc/socket.cc:775 -> 3
tyler-a100-newimage-val:570:9772 [0] NCCL INFO misc/socket.cc:775 -> 3
tyler-a100-newimage-val:572:1327 [2] NCCL INFO misc/socket.cc:564 -> 3
tyler-a100-newimage-val:577:1318 [7] NCCL INFO misc/socket.cc:668 -> 3
tyler-a100-newimage-val:575:1325 [5] NCCL INFO misc/socket.cc:47 -> 3
tyler-a100-newimage-val:576:1316 [6] NCCL INFO misc/socket.cc:826 -> 3
tyler-a100-newimage-val:571:1330 [1] proxy.cc:1458 NCCL WARN [Service thread] Accept failed Resource temporarily unavailable
tyler-a100-newimage-val:572:1327 [2] NCCL INFO misc/socket.cc:668 -> 3
tyler-a100-newimage-val:570:1321 [0] proxy.cc:1458 NCCL WARN [Service thread] Accept failed Resource temporarily unavailable
tyler-a100-newimage-val:572:9778 [2] NCCL INFO misc/socket.cc:47 -> 3
tyler-a100-newimage-val:576:1316 [6] proxy.cc:1497 NCCL WARN [Service thread] Could not receive type from localRank 6, res=3, closed=0
tyler-a100-newimage-val:575:1325 [5] NCCL INFO misc/socket.cc:752 -> 3
tyler-a100-newimage-val:572:9778 [2] NCCL INFO misc/socket.cc:58 -> 3
tyler-a100-newimage-val:572:9778 [2] NCCL INFO misc/socket.cc:775 -> 3
tyler-a100-newimage-val:577:1318 [7] proxy.cc:1458 NCCL WARN [Service thread] Accept failed Resource temporarily unavailable
tyler-a100-newimage-val:575:1325 [5] NCCL INFO misc/socket.cc:428 -> 3
tyler-a100-newimage-val:576:1316 [6] proxy.cc:1521 NCCL WARN [Proxy Service 6] Failed to execute operation Close from rank 6, retcode 3
tyler-a100-newimage-val:572:1327 [2] proxy.cc:1458 NCCL WARN [Service thread] Accept failed Resource temporarily unavailable
tyler-a100-newimage-val:575:1325 [5] NCCL INFO misc/socket.cc:564 -> 3
tyler-a100-newimage-val:570:1321 [0] NCCL INFO misc/socket.cc:826 -> 3
tyler-a100-newimage-val:573:1324 [3] NCCL INFO misc/socket.cc:428 -> 3
tyler-a100-newimage-val:571:1330 [1] NCCL INFO misc/socket.cc:826 -> 3
tyler-a100-newimage-val:575:1325 [5] NCCL INFO misc/socket.cc:668 -> 3
tyler-a100-newimage-val:570:1321 [0] proxy.cc:1497 NCCL WARN [Service thread] Could not receive type from localRank 0, res=3, closed=0
tyler-a100-newimage-val:573:1324 [3] NCCL INFO misc/socket.cc:564 -> 3
tyler-a100-newimage-val:570:1321 [0] proxy.cc:1521 NCCL WARN [Proxy Service 0] Failed to execute operation Close from rank 0, retcode 3
tyler-a100-newimage-val:571:1330 [1] proxy.cc:1497 NCCL WARN [Service thread] Could not receive type from localRank 1, res=3, closed=0
tyler-a100-newimage-val:577:1318 [7] NCCL INFO misc/socket.cc:826 -> 3
tyler-a100-newimage-val:573:1324 [3] NCCL INFO misc/socket.cc:668 -> 3
tyler-a100-newimage-val:577:1318 [7] proxy.cc:1497 NCCL WARN [Service thread] Could not receive type from localRank 7, res=3, closed=0
tyler-a100-newimage-val:572:1327 [2] NCCL INFO misc/socket.cc:826 -> 3
tyler-a100-newimage-val:577:1318 [7] proxy.cc:1521 NCCL WARN [Proxy Service 7] Failed to execute operation Close from rank 7, retcode 3
tyler-a100-newimage-val:575:1325 [5] proxy.cc:1458 NCCL WARN [Service thread] Accept failed Resource temporarily unavailable
tyler-a100-newimage-val:572:1327 [2] proxy.cc:1497 NCCL WARN [Service thread] Could not receive type from localRank 2, res=3, closed=0
tyler-a100-newimage-val:572:1327 [2] proxy.cc:1521 NCCL WARN [Proxy Service 2] Failed to execute operation Close from rank 2, retcode 3
tyler-a100-newimage-val:571:1330 [1] proxy.cc:1521 NCCL WARN [Proxy Service 1] Failed to execute operation Close from rank 1, retcode 3
tyler-a100-newimage-val:573:1324 [3] proxy.cc:1458 NCCL WARN [Service thread] Accept failed Resource temporarily unavailable
tyler-a100-newimage-val:575:1325 [5] NCCL INFO misc/socket.cc:826 -> 3
tyler-a100-newimage-val:575:1325 [5] proxy.cc:1497 NCCL WARN [Service thread] Could not receive type from localRank 5, res=3, closed=0
tyler-a100-newimage-val:573:1324 [3] NCCL INFO misc/socket.cc:826 -> 3
tyler-a100-newimage-val:573:1324 [3] proxy.cc:1497 NCCL WARN [Service thread] Could not receive type from localRank 3, res=3, closed=0
tyler-a100-newimage-val:575:1325 [5] proxy.cc:1521 NCCL WARN [Proxy Service 5] Failed to execute operation Close from rank 5, retcode 3
tyler-a100-newimage-val:573:1324 [3] proxy.cc:1521 NCCL WARN [Proxy Service 3] Failed to execute operation Close from rank 3, retcode 3
tyler-a100-newimage-val:570:9772 [0] NCCL INFO comm 0x55d34ff865c0 rank 0 nranks 8 cudaDev 0 busId 8010 - Abort COMPLETE
tyler-a100-newimage-val:576:9777 [6] NCCL INFO comm 0x5640cf2b7db0 rank 6 nranks 8 cudaDev 6 busId e070 - Abort COMPLETE
tyler-a100-newimage-val:574:9775 [4] NCCL INFO comm 0x560ada6ca910 rank 4 nranks 8 cudaDev 4 busId c050 - Abort COMPLETE
tyler-a100-newimage-val:575:9771 [5] NCCL INFO comm 0x5636163750f0 rank 5 nranks 8 cudaDev 5 busId c060 - Abort COMPLETE
tyler-a100-newimage-val:572:9778 [2] NCCL INFO comm 0x55e6aed5c790 rank 2 nranks 8 cudaDev 2 busId a030 - Abort COMPLETE
tyler-a100-newimage-val:577:9774 [7] NCCL INFO comm 0x558d760ca530 rank 7 nranks 8 cudaDev 7 busId e080 - Abort COMPLETE
tyler-a100-newimage-val:573:9776 [3] NCCL INFO comm 0x55e7b0065170 rank 3 nranks 8 cudaDev 3 busId a040 - Abort COMPLETE
tyler-a100-newimage-val:571:9773 [1] NCCL INFO comm 0x55f7ae069780 rank 1 nranks 8 cudaDev 1 busId 8020 - Abort COMPLETE
Operation completed successfully! 🎉
MMLU evaluation for Phase 1...
INFO 2024-08-18 20:07:09,101 lm-eval:152: Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234
INFO 2024-08-18 20:07:09,102 lm-eval:189: Initializing hf model, with arguments: {'pretrained': '/var/mnt/inststg1/instructlab/phasedbasedir/phase1/checkpoints/hf_format/samples_192', 'dtype': 'bfloat16'}
INFO 2024-08-18 20:07:09,231 lm-eval:170: Using device 'cuda'
Downloading builder script: 100% 5.86k/5.86k [00:00<00:00, 23.9MB/s]
Downloading readme: 100% 1.11k/1.11k [00:00<00:00, 10.3MB/s]
Downloading data: 100% 166M/166M [00:01<00:00, 102MB/s]
Generating test split: 100 examples [00:00, 1153.22 examples/s]
Generating validation split: 11 examples [00:00, 3373.60 examples/s]
Generating dev split: 5 examples [00:00, 59.65 examples/s]
Generating test split: 135 examples [00:00, 1635.55 examples/s]
Generating validation split: 14 examples [00:00, 5139.63 examples/s]
Generating dev split: 5 examples [00:00, 61.55 examples/s]
Generating test split: 152 examples [00:00, 1850.10 examples/s]
Generating validation split: 16 examples [00:00, 5953.06 examples/s]
Generating dev split: 5 examples [00:00, 60.12 examples/s]
Generating test split: 100 examples [00:00, 1177.68 examples/s]
Generating validation split: 11 examples [00:00, 4201.56 examples/s]
Generating dev split: 5 examples [00:00, 60.37 examples/s]
Generating test split: 265 examples [00:00, 2954.66 examples/s]
Generating validation split: 29 examples [00:00, 7941.16 examples/s]
Generating dev split: 5 examples [00:00, 61.76 examples/s]
Generating test split: 144 examples [00:00, 1774.03 examples/s]
Generating validation split: 16 examples [00:00, 3479.13 examples/s]
Generating dev split: 5 examples [00:00, 61.13 examples/s]
Generating test split: 100 examples [00:00, 1222.35 examples/s]
Generating validation split: 8 examples [00:00, 1853.63 examples/s]
Generating dev split: 5 examples [00:00, 60.05 examples/s]
Generating test split: 100 examples [00:00, 1183.27 examples/s]
Generating validation split: 11 examples [00:00, 2736.66 examples/s]
Generating dev split: 5 examples [00:00, 60.60 examples/s]
Generating test split: 100 examples [00:00, 1185.71 examples/s]
Generating validation split: 11 examples [00:00, 3130.71 examples/s]
Generating dev split: 5 examples [00:00, 61.68 examples/s]
Generating test split: 173 examples [00:00, 2042.05 examples/s]
Generating validation split: 22 examples [00:00, 6416.88 examples/s]
Generating dev split: 5 examples [00:00, 62.23 examples/s]
Generating test split: 102 examples [00:00, 1194.64 examples/s]
Generating validation split: 11 examples [00:00, 4022.44 examples/s]
Generating dev split: 5 examples [00:00, 61.46 examples/s]
Generating test split: 100 examples [00:00, 1207.67 examples/s]
Generating validation split: 11 examples [00:00, 2994.57 examples/s]
Generating dev split: 5 examples [00:00, 60.33 examples/s]
Generating test split: 235 examples [00:00, 2704.80 examples/s]
Generating validation split: 26 examples [00:00, 8991.75 examples/s]
Generating dev split: 5 examples [00:00, 60.17 examples/s]
Generating test split: 114 examples [00:00, 1390.51 examples/s]
Generating validation split: 12 examples [00:00, 3749.38 examples/s]
Generating dev split: 5 examples [00:00, 60.87 examples/s]
Generating test split: 145 examples [00:00, 1763.73 examples/s]
Generating validation split: 16 examples [00:00, 5013.74 examples/s]
Generating dev split: 5 examples [00:00, 60.14 examples/s]
Generating test split: 378 examples [00:00, 4030.96 examples/s]
Generating validation split: 41 examples [00:00, 7887.28 examples/s]
Generating dev split: 5 examples [00:00, 60.97 examples/s]
Generating test split: 126 examples [00:00, 1470.10 examples/s]
Generating validation split: 14 examples [00:00, 4072.99 examples/s]
Generating dev split: 5 examples [00:00, 59.27 examples/s]
Generating test split: 100 examples [00:00, 1255.89 examples/s]
Generating validation split: 10 examples [00:00, 3940.16 examples/s]
Generating dev split: 5 examples [00:00, 61.03 examples/s]
Generating test split: 310 examples [00:00, 3516.83 examples/s]
Generating validation split: 32 examples [00:00, 8804.05 examples/s]
Generating dev split: 5 examples [00:00, 61.07 examples/s]
Generating test split: 203 examples [00:00, 2410.97 examples/s]
Generating validation split: 22 examples [00:00, 4926.05 examples/s]
Generating dev split: 5 examples [00:00, 62.35 examples/s]
Generating test split: 100 examples [00:00, 1268.03 examples/s]
Generating validation split: 9 examples [00:00, 3895.64 examples/s]
Generating dev split: 5 examples [00:00, 62.00 examples/s]
Generating test split: 165 examples [00:00, 1938.49 examples/s]
Generating validation split: 18 examples [00:00, 3426.10 examples/s]
Generating dev split: 5 examples [00:00, 62.50 examples/s]
Generating test split: 198 examples [00:00, 2282.64 examples/s]
Generating validation split: 22 examples [00:00, 7912.42 examples/s]
Generating dev split: 5 examples [00:00, 61.81 examples/s]
Generating test split: 193 examples [00:00, 2366.57 examples/s]
Generating validation split: 21 examples [00:00, 7132.01 examples/s]
Generating dev split: 5 examples [00:00, 61.24 examples/s]
Generating test split: 390 examples [00:00, 4338.53 examples/s]
Generating validation split: 43 examples [00:00, 9807.77 examples/s]
Generating dev split: 5 examples [00:00, 62.74 examples/s]
Generating test split: 270 examples [00:00, 3156.55 examples/s]
Generating validation split: 29 examples [00:00, 8374.17 examples/s]
Generating dev split: 5 examples [00:00, 61.80 examples/s]
Generating test split: 238 examples [00:00, 2714.10 examples/s]
Generating validation split: 26 examples [00:00, 5558.48 examples/s]
Generating dev split: 5 examples [00:00, 60.55 examples/s]
Generating test split: 151 examples [00:00, 1801.49 examples/s]
Generating validation split: 17 examples [00:00, 4671.64 examples/s]
Generating dev split: 5 examples [00:00, 61.15 examples/s]
Generating test split: 545 examples [00:00, 5738.37 examples/s]
Generating validation split: 60 examples [00:00, 9898.84 examples/s]
Generating dev split: 5 examples [00:00, 61.26 examples/s]
Generating test split: 216 examples [00:00, 2474.35 examples/s]
Generating validation split: 23 examples [00:00, 6018.78 examples/s]
Generating dev split: 5 examples [00:00, 61.26 examples/s]
Generating test split: 204 examples [00:00, 2282.53 examples/s]
Generating validation split: 22 examples [00:00, 4064.43 examples/s]
Generating dev split: 5 examples [00:00, 61.72 examples/s]
Generating test split: 237 examples [00:00, 2575.45 examples/s]
Generating validation split: 26 examples [00:00, 4640.90 examples/s]
Generating dev split: 5 examples [00:00, 61.22 examples/s]
Generating test split: 223 examples [00:00, 2635.67 examples/s]
Generating validation split: 23 examples [00:00, 5733.33 examples/s]
Generating dev split: 5 examples [00:00, 62.89 examples/s]
Generating test split: 131 examples [00:00, 1591.09 examples/s]
Generating validation split: 12 examples [00:00, 3250.77 examples/s]
Generating dev split: 5 examples [00:00, 61.29 examples/s]
Generating test split: 121 examples [00:00, 1415.60 examples/s]
Generating validation split: 13 examples [00:00, 3564.02 examples/s]
Generating dev split: 5 examples [00:00, 61.46 examples/s]
Generating test split: 108 examples [00:00, 1342.97 examples/s]
Generating validation split: 11 examples [00:00, 2489.47 examples/s]
Generating dev split: 5 examples [00:00, 58.93 examples/s]
Generating test split: 163 examples [00:00, 2010.64 examples/s]
Generating validation split: 18 examples [00:00, 4509.73 examples/s]
Generating dev split: 5 examples [00:00, 61.46 examples/s]
Generating test split: 112 examples [00:00, 1324.02 examples/s]
Generating validation split: 11 examples [00:00, 3809.85 examples/s]
Generating dev split: 5 examples [00:00, 61.62 examples/s]
Generating test split: 103 examples [00:00, 1277.65 examples/s]
Generating validation split: 11 examples [00:00, 3080.55 examples/s]
Generating dev split: 5 examples [00:00, 59.10 examples/s]
Generating test split: 234 examples [00:00, 2697.17 examples/s]
Generating validation split: 25 examples [00:00, 5543.62 examples/s]
Generating dev split: 5 examples [00:00, 60.37 examples/s]
Generating test split: 100 examples [00:00, 1202.59 examples/s]
Generating validation split: 11 examples [00:00, 4425.22 examples/s]
Generating dev split: 5 examples [00:00, 62.03 examples/s]
Generating test split: 783 examples [00:00, 7376.63 examples/s]
Generating validation split: 86 examples [00:00, 16538.75 examples/s]
Generating dev split: 5 examples [00:00, 59.82 examples/s]
Generating test split: 346 examples [00:00, 3763.46 examples/s]
Generating validation split: 38 examples [00:00, 6601.65 examples/s]
Generating dev split: 5 examples [00:00, 61.74 examples/s] | |
Generating test split: 895 examples [00:00, 7717.62 examples/s] | |
Generating validation split: 100 examples [00:00, 14271.19 examples/s] | |
Generating dev split: 5 examples [00:00, 61.34 examples/s] | |
Generating test split: 306 examples [00:00, 3463.47 examples/s] | |
Generating validation split: 33 examples [00:00, 7868.34 examples/s] | |
Generating dev split: 5 examples [00:00, 62.63 examples/s] | |
Generating test split: 311 examples [00:00, 3373.99 examples/s] | |
Generating validation split: 34 examples [00:00, 9395.59 examples/s] | |
Generating dev split: 5 examples [00:00, 61.80 examples/s] | |
Generating test split: 324 examples [00:00, 3525.14 examples/s] | |
Generating validation split: 35 examples [00:00, 7428.05 examples/s] | |
Generating dev split: 5 examples [00:00, 61.91 examples/s] | |
Generating test split: 282 examples [00:00, 3107.75 examples/s] | |
Generating validation split: 31 examples [00:00, 5669.96 examples/s] | |
Generating dev split: 5 examples [00:00, 62.70 examples/s] | |
Generating test split: 1534 examples [00:00, 10061.95 examples/s] | |
Generating validation split: 170 examples [00:00, 14781.84 examples/s] | |
Generating dev split: 5 examples [00:00, 59.81 examples/s] | |
Generating test split: 272 examples [00:00, 2957.00 examples/s] | |
Generating validation split: 31 examples [00:00, 6405.41 examples/s] | |
Generating dev split: 5 examples [00:00, 61.40 examples/s] | |
Generating test split: 612 examples [00:00, 6144.16 examples/s] | |
Generating validation split: 69 examples [00:00, 10691.85 examples/s] | |
Generating dev split: 5 examples [00:00, 59.34 examples/s] | |
Generating test split: 110 examples [00:00, 1355.61 examples/s] | |
Generating validation split: 12 examples [00:00, 3074.06 examples/s] | |
Generating dev split: 5 examples [00:00, 61.81 examples/s] | |
Generating test split: 245 examples [00:00, 2837.26 examples/s] | |
Generating validation split: 27 examples [00:00, 4828.44 examples/s] | |
Generating dev split: 5 examples [00:00, 61.01 examples/s] | |
Generating test split: 201 examples [00:00, 2289.38 examples/s] | |
Generating validation split: 22 examples [00:00, 4368.45 examples/s] | |
Generating dev split: 5 examples [00:00, 61.55 examples/s] | |
Generating test split: 100 examples [00:00, 1184.69 examples/s] | |
Generating validation split: 11 examples [00:00, 3479.70 examples/s] | |
Generating dev split: 5 examples [00:00, 61.74 examples/s] | |
Generating test split: 166 examples [00:00, 2008.24 examples/s] | |
Generating validation split: 18 examples [00:00, 4386.07 examples/s] | |
Generating dev split: 5 examples [00:00, 59.23 examples/s] | |
Generating test split: 171 examples [00:00, 2049.86 examples/s] | |
Generating validation split: 19 examples [00:00, 5742.72 examples/s] | |
Generating dev split: 5 examples [00:00, 61.61 examples/s] | |
WARNING 2024-08-18 20:08:01,836 lm-eval:251: Overwriting default num_fewshot of mmlu_world_religions from None to 5
INFO 2024-08-18 20:08:01,836 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,836 lm-eval:251: Overwriting default num_fewshot of mmlu_virology from None to 5
INFO 2024-08-18 20:08:01,836 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,836 lm-eval:251: Overwriting default num_fewshot of mmlu_us_foreign_policy from None to 5
INFO 2024-08-18 20:08:01,836 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,836 lm-eval:251: Overwriting default num_fewshot of mmlu_sociology from None to 5
INFO 2024-08-18 20:08:01,836 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,836 lm-eval:251: Overwriting default num_fewshot of mmlu_security_studies from None to 5
INFO 2024-08-18 20:08:01,836 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,836 lm-eval:251: Overwriting default num_fewshot of mmlu_public_relations from None to 5
INFO 2024-08-18 20:08:01,836 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,837 lm-eval:251: Overwriting default num_fewshot of mmlu_professional_psychology from None to 5
INFO 2024-08-18 20:08:01,837 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,837 lm-eval:251: Overwriting default num_fewshot of mmlu_professional_medicine from None to 5
INFO 2024-08-18 20:08:01,837 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,837 lm-eval:251: Overwriting default num_fewshot of mmlu_professional_law from None to 5
INFO 2024-08-18 20:08:01,837 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,837 lm-eval:251: Overwriting default num_fewshot of mmlu_professional_accounting from None to 5
INFO 2024-08-18 20:08:01,837 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,837 lm-eval:251: Overwriting default num_fewshot of mmlu_prehistory from None to 5
INFO 2024-08-18 20:08:01,837 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,837 lm-eval:251: Overwriting default num_fewshot of mmlu_philosophy from None to 5
INFO 2024-08-18 20:08:01,837 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,837 lm-eval:251: Overwriting default num_fewshot of mmlu_nutrition from None to 5
INFO 2024-08-18 20:08:01,837 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,837 lm-eval:251: Overwriting default num_fewshot of mmlu_moral_scenarios from None to 5
INFO 2024-08-18 20:08:01,837 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,837 lm-eval:251: Overwriting default num_fewshot of mmlu_moral_disputes from None to 5
INFO 2024-08-18 20:08:01,837 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,837 lm-eval:251: Overwriting default num_fewshot of mmlu_miscellaneous from None to 5
INFO 2024-08-18 20:08:01,837 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,837 lm-eval:251: Overwriting default num_fewshot of mmlu_medical_genetics from None to 5
INFO 2024-08-18 20:08:01,837 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,837 lm-eval:251: Overwriting default num_fewshot of mmlu_marketing from None to 5
INFO 2024-08-18 20:08:01,837 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,837 lm-eval:251: Overwriting default num_fewshot of mmlu_management from None to 5
INFO 2024-08-18 20:08:01,837 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,837 lm-eval:251: Overwriting default num_fewshot of mmlu_machine_learning from None to 5
INFO 2024-08-18 20:08:01,837 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,837 lm-eval:251: Overwriting default num_fewshot of mmlu_logical_fallacies from None to 5
INFO 2024-08-18 20:08:01,837 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,837 lm-eval:251: Overwriting default num_fewshot of mmlu_jurisprudence from None to 5
INFO 2024-08-18 20:08:01,837 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,837 lm-eval:251: Overwriting default num_fewshot of mmlu_international_law from None to 5
INFO 2024-08-18 20:08:01,838 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,838 lm-eval:251: Overwriting default num_fewshot of mmlu_human_sexuality from None to 5
INFO 2024-08-18 20:08:01,838 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,838 lm-eval:251: Overwriting default num_fewshot of mmlu_human_aging from None to 5
INFO 2024-08-18 20:08:01,838 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,838 lm-eval:251: Overwriting default num_fewshot of mmlu_high_school_world_history from None to 5
INFO 2024-08-18 20:08:01,838 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,838 lm-eval:251: Overwriting default num_fewshot of mmlu_high_school_us_history from None to 5
INFO 2024-08-18 20:08:01,838 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,838 lm-eval:251: Overwriting default num_fewshot of mmlu_high_school_statistics from None to 5
INFO 2024-08-18 20:08:01,838 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,838 lm-eval:251: Overwriting default num_fewshot of mmlu_high_school_psychology from None to 5
INFO 2024-08-18 20:08:01,838 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,838 lm-eval:251: Overwriting default num_fewshot of mmlu_high_school_physics from None to 5
INFO 2024-08-18 20:08:01,838 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,838 lm-eval:251: Overwriting default num_fewshot of mmlu_high_school_microeconomics from None to 5
INFO 2024-08-18 20:08:01,838 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,838 lm-eval:251: Overwriting default num_fewshot of mmlu_high_school_mathematics from None to 5
INFO 2024-08-18 20:08:01,838 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,838 lm-eval:251: Overwriting default num_fewshot of mmlu_high_school_macroeconomics from None to 5
INFO 2024-08-18 20:08:01,838 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,838 lm-eval:251: Overwriting default num_fewshot of mmlu_high_school_government_and_politics from None to 5
INFO 2024-08-18 20:08:01,838 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,838 lm-eval:251: Overwriting default num_fewshot of mmlu_high_school_geography from None to 5
INFO 2024-08-18 20:08:01,838 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,838 lm-eval:251: Overwriting default num_fewshot of mmlu_high_school_european_history from None to 5
INFO 2024-08-18 20:08:01,838 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,838 lm-eval:251: Overwriting default num_fewshot of mmlu_high_school_computer_science from None to 5
INFO 2024-08-18 20:08:01,838 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,838 lm-eval:251: Overwriting default num_fewshot of mmlu_high_school_chemistry from None to 5
INFO 2024-08-18 20:08:01,838 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,838 lm-eval:251: Overwriting default num_fewshot of mmlu_high_school_biology from None to 5
INFO 2024-08-18 20:08:01,839 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,839 lm-eval:251: Overwriting default num_fewshot of mmlu_global_facts from None to 5
INFO 2024-08-18 20:08:01,839 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,839 lm-eval:251: Overwriting default num_fewshot of mmlu_formal_logic from None to 5
INFO 2024-08-18 20:08:01,839 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,839 lm-eval:251: Overwriting default num_fewshot of mmlu_elementary_mathematics from None to 5
INFO 2024-08-18 20:08:01,839 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,839 lm-eval:251: Overwriting default num_fewshot of mmlu_electrical_engineering from None to 5
INFO 2024-08-18 20:08:01,839 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,839 lm-eval:251: Overwriting default num_fewshot of mmlu_econometrics from None to 5
INFO 2024-08-18 20:08:01,839 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,839 lm-eval:251: Overwriting default num_fewshot of mmlu_conceptual_physics from None to 5
INFO 2024-08-18 20:08:01,839 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,839 lm-eval:251: Overwriting default num_fewshot of mmlu_computer_security from None to 5
INFO 2024-08-18 20:08:01,839 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,839 lm-eval:251: Overwriting default num_fewshot of mmlu_college_physics from None to 5
INFO 2024-08-18 20:08:01,839 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,839 lm-eval:251: Overwriting default num_fewshot of mmlu_college_medicine from None to 5
INFO 2024-08-18 20:08:01,839 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,839 lm-eval:251: Overwriting default num_fewshot of mmlu_college_mathematics from None to 5
INFO 2024-08-18 20:08:01,839 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,839 lm-eval:251: Overwriting default num_fewshot of mmlu_college_computer_science from None to 5
INFO 2024-08-18 20:08:01,839 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,839 lm-eval:251: Overwriting default num_fewshot of mmlu_college_chemistry from None to 5
INFO 2024-08-18 20:08:01,839 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,839 lm-eval:251: Overwriting default num_fewshot of mmlu_college_biology from None to 5
INFO 2024-08-18 20:08:01,839 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,839 lm-eval:251: Overwriting default num_fewshot of mmlu_clinical_knowledge from None to 5
INFO 2024-08-18 20:08:01,839 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,839 lm-eval:251: Overwriting default num_fewshot of mmlu_business_ethics from None to 5
INFO 2024-08-18 20:08:01,839 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,839 lm-eval:251: Overwriting default num_fewshot of mmlu_astronomy from None to 5
INFO 2024-08-18 20:08:01,840 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,840 lm-eval:251: Overwriting default num_fewshot of mmlu_anatomy from None to 5
INFO 2024-08-18 20:08:01,840 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:08:01,840 lm-eval:251: Overwriting default num_fewshot of mmlu_abstract_algebra from None to 5
INFO 2024-08-18 20:08:01,840 lm-eval:261: Setting fewshot random generator seed to 1234
INFO 2024-08-18 20:08:01,845 lm-eval:411: Building contexts for mmlu_world_religions on rank 0...
100% 171/171 [00:01<00:00, 136.70it/s]
INFO 2024-08-18 20:08:03,104 lm-eval:411: Building contexts for mmlu_virology on rank 0...
100% 166/166 [00:01<00:00, 137.65it/s]
INFO 2024-08-18 20:08:04,318 lm-eval:411: Building contexts for mmlu_us_foreign_policy on rank 0...
100% 100/100 [00:00<00:00, 137.17it/s]
INFO 2024-08-18 20:08:05,053 lm-eval:411: Building contexts for mmlu_sociology on rank 0...
100% 201/201 [00:01<00:00, 137.20it/s]
INFO 2024-08-18 20:08:06,528 lm-eval:411: Building contexts for mmlu_security_studies on rank 0...
100% 245/245 [00:01<00:00, 137.06it/s]
INFO 2024-08-18 20:08:08,328 lm-eval:411: Building contexts for mmlu_public_relations on rank 0...
100% 110/110 [00:00<00:00, 137.91it/s]
INFO 2024-08-18 20:08:09,132 lm-eval:411: Building contexts for mmlu_professional_psychology on rank 0...
100% 612/612 [00:04<00:00, 137.30it/s]
INFO 2024-08-18 20:08:13,618 lm-eval:411: Building contexts for mmlu_professional_medicine on rank 0...
100% 272/272 [00:01<00:00, 137.57it/s]
INFO 2024-08-18 20:08:15,610 lm-eval:411: Building contexts for mmlu_professional_law on rank 0...
100% 1534/1534 [00:11<00:00, 137.50it/s]
INFO 2024-08-18 20:08:26,840 lm-eval:411: Building contexts for mmlu_professional_accounting on rank 0...
100% 282/282 [00:02<00:00, 137.72it/s]
INFO 2024-08-18 20:08:28,902 lm-eval:411: Building contexts for mmlu_prehistory on rank 0...
100% 324/324 [00:02<00:00, 137.90it/s]
INFO 2024-08-18 20:08:31,267 lm-eval:411: Building contexts for mmlu_philosophy on rank 0...
100% 311/311 [00:02<00:00, 137.34it/s]
INFO 2024-08-18 20:08:33,547 lm-eval:411: Building contexts for mmlu_nutrition on rank 0...
100% 306/306 [00:02<00:00, 137.31it/s]
INFO 2024-08-18 20:08:35,790 lm-eval:411: Building contexts for mmlu_moral_scenarios on rank 0...
100% 895/895 [00:06<00:00, 137.60it/s]
INFO 2024-08-18 20:08:42,336 lm-eval:411: Building contexts for mmlu_moral_disputes on rank 0...
100% 346/346 [00:02<00:00, 137.95it/s]
INFO 2024-08-18 20:08:44,861 lm-eval:411: Building contexts for mmlu_miscellaneous on rank 0...
100% 783/783 [00:05<00:00, 138.18it/s]
INFO 2024-08-18 20:08:50,564 lm-eval:411: Building contexts for mmlu_medical_genetics on rank 0...
100% 100/100 [00:00<00:00, 137.18it/s]
INFO 2024-08-18 20:08:51,298 lm-eval:411: Building contexts for mmlu_marketing on rank 0...
100% 234/234 [00:01<00:00, 137.53it/s]
INFO 2024-08-18 20:08:53,011 lm-eval:411: Building contexts for mmlu_management on rank 0...
100% 103/103 [00:00<00:00, 137.86it/s]
INFO 2024-08-18 20:08:53,764 lm-eval:411: Building contexts for mmlu_machine_learning on rank 0...
100% 112/112 [00:00<00:00, 137.95it/s]
INFO 2024-08-18 20:08:54,582 lm-eval:411: Building contexts for mmlu_logical_fallacies on rank 0...
100% 163/163 [00:01<00:00, 137.78it/s]
INFO 2024-08-18 20:08:55,773 lm-eval:411: Building contexts for mmlu_jurisprudence on rank 0...
100% 108/108 [00:00<00:00, 138.26it/s]
INFO 2024-08-18 20:08:56,559 lm-eval:411: Building contexts for mmlu_international_law on rank 0...
100% 121/121 [00:00<00:00, 137.86it/s]
INFO 2024-08-18 20:08:57,444 lm-eval:411: Building contexts for mmlu_human_sexuality on rank 0...
100% 131/131 [00:00<00:00, 137.92it/s]
INFO 2024-08-18 20:08:58,400 lm-eval:411: Building contexts for mmlu_human_aging on rank 0...
100% 223/223 [00:01<00:00, 138.55it/s]
INFO 2024-08-18 20:09:00,021 lm-eval:411: Building contexts for mmlu_high_school_world_history on rank 0...
100% 237/237 [00:01<00:00, 137.41it/s]
INFO 2024-08-18 20:09:01,757 lm-eval:411: Building contexts for mmlu_high_school_us_history on rank 0...
100% 204/204 [00:01<00:00, 137.87it/s]
INFO 2024-08-18 20:09:03,248 lm-eval:411: Building contexts for mmlu_high_school_statistics on rank 0...
100% 216/216 [00:01<00:00, 138.70it/s]
INFO 2024-08-18 20:09:04,816 lm-eval:411: Building contexts for mmlu_high_school_psychology on rank 0...
100% 545/545 [00:03<00:00, 138.10it/s]
INFO 2024-08-18 20:09:08,787 lm-eval:411: Building contexts for mmlu_high_school_physics on rank 0...
100% 151/151 [00:01<00:00, 137.64it/s]
INFO 2024-08-18 20:09:09,892 lm-eval:411: Building contexts for mmlu_high_school_microeconomics on rank 0...
100% 238/238 [00:01<00:00, 138.03it/s]
INFO 2024-08-18 20:09:11,628 lm-eval:411: Building contexts for mmlu_high_school_mathematics on rank 0...
100% 270/270 [00:01<00:00, 137.88it/s]
INFO 2024-08-18 20:09:13,599 lm-eval:411: Building contexts for mmlu_high_school_macroeconomics on rank 0...
100% 390/390 [00:02<00:00, 138.12it/s]
INFO 2024-08-18 20:09:16,441 lm-eval:411: Building contexts for mmlu_high_school_government_and_politics on rank 0...
100% 193/193 [00:01<00:00, 137.78it/s]
INFO 2024-08-18 20:09:17,851 lm-eval:411: Building contexts for mmlu_high_school_geography on rank 0...
100% 198/198 [00:01<00:00, 138.13it/s]
INFO 2024-08-18 20:09:19,294 lm-eval:411: Building contexts for mmlu_high_school_european_history on rank 0...
100% 165/165 [00:01<00:00, 136.67it/s]
INFO 2024-08-18 20:09:20,511 lm-eval:411: Building contexts for mmlu_high_school_computer_science on rank 0...
100% 100/100 [00:00<00:00, 137.40it/s]
INFO 2024-08-18 20:09:21,244 lm-eval:411: Building contexts for mmlu_high_school_chemistry on rank 0...
100% 203/203 [00:01<00:00, 107.90it/s]
INFO 2024-08-18 20:09:23,135 lm-eval:411: Building contexts for mmlu_high_school_biology on rank 0...
100% 310/310 [00:02<00:00, 137.74it/s]
INFO 2024-08-18 20:09:25,401 lm-eval:411: Building contexts for mmlu_global_facts on rank 0...
100% 100/100 [00:00<00:00, 138.25it/s]
INFO 2024-08-18 20:09:26,129 lm-eval:411: Building contexts for mmlu_formal_logic on rank 0...
100% 126/126 [00:00<00:00, 137.97it/s]
INFO 2024-08-18 20:09:27,049 lm-eval:411: Building contexts for mmlu_elementary_mathematics on rank 0...
100% 378/378 [00:02<00:00, 138.66it/s]
INFO 2024-08-18 20:09:29,792 lm-eval:411: Building contexts for mmlu_electrical_engineering on rank 0...
100% 145/145 [00:01<00:00, 138.30it/s]
INFO 2024-08-18 20:09:30,848 lm-eval:411: Building contexts for mmlu_econometrics on rank 0...
100% 114/114 [00:00<00:00, 138.62it/s]
INFO 2024-08-18 20:09:31,676 lm-eval:411: Building contexts for mmlu_conceptual_physics on rank 0...
100% 235/235 [00:01<00:00, 138.75it/s]
INFO 2024-08-18 20:09:33,382 lm-eval:411: Building contexts for mmlu_computer_security on rank 0...
100% 100/100 [00:00<00:00, 138.84it/s]
INFO 2024-08-18 20:09:34,107 lm-eval:411: Building contexts for mmlu_college_physics on rank 0...
100% 102/102 [00:00<00:00, 138.92it/s]
INFO 2024-08-18 20:09:34,846 lm-eval:411: Building contexts for mmlu_college_medicine on rank 0...
100% 173/173 [00:01<00:00, 138.30it/s]
INFO 2024-08-18 20:09:36,106 lm-eval:411: Building contexts for mmlu_college_mathematics on rank 0...
100% 100/100 [00:00<00:00, 138.92it/s]
INFO 2024-08-18 20:09:36,831 lm-eval:411: Building contexts for mmlu_college_computer_science on rank 0...
100% 100/100 [00:00<00:00, 138.70it/s]
INFO 2024-08-18 20:09:37,557 lm-eval:411: Building contexts for mmlu_college_chemistry on rank 0...
100% 100/100 [00:00<00:00, 138.24it/s]
INFO 2024-08-18 20:09:38,286 lm-eval:411: Building contexts for mmlu_college_biology on rank 0...
100% 144/144 [00:01<00:00, 139.08it/s]
INFO 2024-08-18 20:09:39,329 lm-eval:411: Building contexts for mmlu_clinical_knowledge on rank 0...
100% 265/265 [00:01<00:00, 138.97it/s]
INFO 2024-08-18 20:09:41,248 lm-eval:411: Building contexts for mmlu_business_ethics on rank 0...
100% 100/100 [00:00<00:00, 138.40it/s]
INFO 2024-08-18 20:09:41,976 lm-eval:411: Building contexts for mmlu_astronomy on rank 0...
100% 152/152 [00:01<00:00, 139.01it/s]
INFO 2024-08-18 20:09:43,077 lm-eval:411: Building contexts for mmlu_anatomy on rank 0...
100% 135/135 [00:00<00:00, 138.23it/s]
INFO 2024-08-18 20:09:44,061 lm-eval:411: Building contexts for mmlu_abstract_algebra on rank 0...
100% 100/100 [00:00<00:00, 139.49it/s]
INFO 2024-08-18 20:09:44,783 lm-eval:438: Running loglikelihood requests
Running loglikelihood requests: 0% 0/56168 [00:00<?, ?it/s]
Passed argument batch_size = auto:1. Detecting largest batch size
We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)
Determined largest batch size: 16
Running loglikelihood requests: 100% 56168/56168 [13:58<00:00, 66.96it/s]
WARNING 2024-08-18 20:26:55,272 lm-eval:1315: Failed to get model SHA for /var/mnt/inststg1/instructlab/phasedbasedir/phase1/checkpoints/hf_format/samples_192 at revision main. Error: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/var/mnt/inststg1/instructlab/phasedbasedir/phase1/checkpoints/hf_format/samples_192'. Use `repo_type` argument if needed.
fatal: not a git repository (or any of the parent directories): .git
CHECKPOINT EVALUATION: /var/mnt/inststg1/instructlab/phasedbasedir/phase1/checkpoints/hf_format/samples_192 SCORED 0.5271100893470168
INFO 2024-08-18 20:27:00,200 lm-eval:152: Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234
INFO 2024-08-18 20:27:00,200 lm-eval:189: Initializing hf model, with arguments: {'pretrained': '/var/mnt/inststg1/instructlab/phasedbasedir/phase1/checkpoints/hf_format/samples_320', 'dtype': 'bfloat16'}
INFO 2024-08-18 20:27:00,202 lm-eval:170: Using device 'cuda'
WARNING 2024-08-18 20:27:41,442 lm-eval:251: Overwriting default num_fewshot of mmlu_world_religions from None to 5
INFO 2024-08-18 20:27:41,442 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:27:41,442 lm-eval:251: Overwriting default num_fewshot of mmlu_virology from None to 5
INFO 2024-08-18 20:27:41,442 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:27:41,442 lm-eval:251: Overwriting default num_fewshot of mmlu_us_foreign_policy from None to 5
INFO 2024-08-18 20:27:41,442 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:27:41,443 lm-eval:251: Overwriting default num_fewshot of mmlu_sociology from None to 5
INFO 2024-08-18 20:27:41,443 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:27:41,443 lm-eval:251: Overwriting default num_fewshot of mmlu_security_studies from None to 5
INFO 2024-08-18 20:27:41,443 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:27:41,443 lm-eval:251: Overwriting default num_fewshot of mmlu_public_relations from None to 5
INFO 2024-08-18 20:27:41,443 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:27:41,443 lm-eval:251: Overwriting default num_fewshot of mmlu_professional_psychology from None to 5
INFO 2024-08-18 20:27:41,443 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:27:41,443 lm-eval:251: Overwriting default num_fewshot of mmlu_professional_medicine from None to 5
INFO 2024-08-18 20:27:41,443 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:27:41,443 lm-eval:251: Overwriting default num_fewshot of mmlu_professional_law from None to 5
INFO 2024-08-18 20:27:41,443 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:27:41,443 lm-eval:251: Overwriting default num_fewshot of mmlu_professional_accounting from None to 5
INFO 2024-08-18 20:27:41,443 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:27:41,443 lm-eval:251: Overwriting default num_fewshot of mmlu_prehistory from None to 5
INFO 2024-08-18 20:27:41,443 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:27:41,443 lm-eval:251: Overwriting default num_fewshot of mmlu_philosophy from None to 5
INFO 2024-08-18 20:27:41,443 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:27:41,443 lm-eval:251: Overwriting default num_fewshot of mmlu_nutrition from None to 5
INFO 2024-08-18 20:27:41,443 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:27:41,443 lm-eval:251: Overwriting default num_fewshot of mmlu_moral_scenarios from None to 5
INFO 2024-08-18 20:27:41,443 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:27:41,443 lm-eval:251: Overwriting default num_fewshot of mmlu_moral_disputes from None to 5
INFO 2024-08-18 20:27:41,443 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:27:41,443 lm-eval:251: Overwriting default num_fewshot of mmlu_miscellaneous from None to 5
INFO 2024-08-18 20:27:41,443 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:27:41,443 lm-eval:251: Overwriting default num_fewshot of mmlu_medical_genetics from None to 5
INFO 2024-08-18 20:27:41,443 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:27:41,443 lm-eval:251: Overwriting default num_fewshot of mmlu_marketing from None to 5
INFO 2024-08-18 20:27:41,443 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:27:41,443 lm-eval:251: Overwriting default num_fewshot of mmlu_management from None to 5
INFO 2024-08-18 20:27:41,444 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:27:41,444 lm-eval:251: Overwriting default num_fewshot of mmlu_machine_learning from None to 5
INFO 2024-08-18 20:27:41,444 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:27:41,444 lm-eval:251: Overwriting default num_fewshot of mmlu_logical_fallacies from None to 5
INFO 2024-08-18 20:27:41,444 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:27:41,444 lm-eval:251: Overwriting default num_fewshot of mmlu_jurisprudence from None to 5
INFO 2024-08-18 20:27:41,444 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:27:41,444 lm-eval:251: Overwriting default num_fewshot of mmlu_international_law from None to 5
INFO 2024-08-18 20:27:41,444 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:27:41,444 lm-eval:251: Overwriting default num_fewshot of mmlu_human_sexuality from None to 5
INFO 2024-08-18 20:27:41,444 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:27:41,444 lm-eval:251: Overwriting default num_fewshot of mmlu_human_aging from None to 5
INFO 2024-08-18 20:27:41,444 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:27:41,444 lm-eval:251: Overwriting default num_fewshot of mmlu_high_school_world_history from None to 5
INFO 2024-08-18 20:27:41,444 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:27:41,444 lm-eval:251: Overwriting default num_fewshot of mmlu_high_school_us_history from None to 5
INFO 2024-08-18 20:27:41,444 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:27:41,444 lm-eval:251: Overwriting default num_fewshot of mmlu_high_school_statistics from None to 5
INFO 2024-08-18 20:27:41,444 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:27:41,444 lm-eval:251: Overwriting default num_fewshot of mmlu_high_school_psychology from None to 5
INFO 2024-08-18 20:27:41,444 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:27:41,444 lm-eval:251: Overwriting default num_fewshot of mmlu_high_school_physics from None to 5
INFO 2024-08-18 20:27:41,444 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:27:41,444 lm-eval:251: Overwriting default num_fewshot of mmlu_high_school_microeconomics from None to 5
INFO 2024-08-18 20:27:41,444 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:27:41,444 lm-eval:251: Overwriting default num_fewshot of mmlu_high_school_mathematics from None to 5
INFO 2024-08-18 20:27:41,444 lm-eval:261: Setting fewshot random generator seed to 1234
WARNING 2024-08-18 20:27:41,444 lm-eval:251: Overwriting default num_fewshot of mmlu_high_school_macroeconomics from None to 5
INFO 2024-08-18 20:27:41,444 lm-eval:261: Setting fewshot random generator seed to 1234 | |
WARNING 2024-08-18 20:27:41,444 lm-eval:251: Overwriting default num_fewshot of mmlu_high_school_government_and_politics from None to 5 | |
INFO 2024-08-18 20:27:41,444 lm-eval:261: Setting fewshot random generator seed to 1234 | |
WARNING 2024-08-18 20:27:41,444 lm-eval:251: Overwriting default num_fewshot of mmlu_high_school_geography from None to 5 | |
INFO 2024-08-18 20:27:41,445 lm-eval:261: Setting fewshot random generator seed to 1234 | |
WARNING 2024-08-18 20:27:41,445 lm-eval:251: Overwriting default num_fewshot of mmlu_high_school_european_history from None to 5 | |
INFO 2024-08-18 20:27:41,445 lm-eval:261: Setting fewshot random generator seed to 1234 | |
WARNING 2024-08-18 20:27:41,445 lm-eval:251: Overwriting default num_fewshot of mmlu_high_school_computer_science from None to 5 | |
INFO 2024-08-18 20:27:41,445 lm-eval:261: Setting fewshot random generator seed to 1234 | |
WARNING 2024-08-18 20:27:41,445 lm-eval:251: Overwriting default num_fewshot of mmlu_high_school_chemistry from None to 5 | |
INFO 2024-08-18 20:27:41,445 lm-eval:261: Setting fewshot random generator seed to 1234 | |
WARNING 2024-08-18 20:27:41,445 lm-eval:251: Overwriting default num_fewshot of mmlu_high_school_biology from None to 5 | |
INFO 2024-08-18 20:27:41,445 lm-eval:261: Setting fewshot random generator seed to 1234 | |
WARNING 2024-08-18 20:27:41,445 lm-eval:251: Overwriting default num_fewshot of mmlu_global_facts from None to 5 | |
INFO 2024-08-18 20:27:41,445 lm-eval:261: Setting fewshot random generator seed to 1234 | |
WARNING 2024-08-18 20:27:41,445 lm-eval:251: Overwriting default num_fewshot of mmlu_formal_logic from None to 5 | |
INFO 2024-08-18 20:27:41,445 lm-eval:261: Setting fewshot random generator seed to 1234 | |
WARNING 2024-08-18 20:27:41,445 lm-eval:251: Overwriting default num_fewshot of mmlu_elementary_mathematics from None to 5 | |
INFO 2024-08-18 20:27:41,445 lm-eval:261: Setting fewshot random generator seed to 1234 | |
WARNING 2024-08-18 20:27:41,445 lm-eval:251: Overwriting default num_fewshot of mmlu_electrical_engineering from None to 5 | |
INFO 2024-08-18 20:27:41,445 lm-eval:261: Setting fewshot random generator seed to 1234 | |
WARNING 2024-08-18 20:27:41,445 lm-eval:251: Overwriting default num_fewshot of mmlu_econometrics from None to 5 | |
INFO 2024-08-18 20:27:41,445 lm-eval:261: Setting fewshot random generator seed to 1234 | |
WARNING 2024-08-18 20:27:41,445 lm-eval:251: Overwriting default num_fewshot of mmlu_conceptual_physics from None to 5 | |
INFO 2024-08-18 20:27:41,445 lm-eval:261: Setting fewshot random generator seed to 1234 | |
WARNING 2024-08-18 20:27:41,445 lm-eval:251: Overwriting default num_fewshot of mmlu_computer_security from None to 5 | |
INFO 2024-08-18 20:27:41,445 lm-eval:261: Setting fewshot random generator seed to 1234 | |
WARNING 2024-08-18 20:27:41,445 lm-eval:251: Overwriting default num_fewshot of mmlu_college_physics from None to 5 | |
INFO 2024-08-18 20:27:41,445 lm-eval:261: Setting fewshot random generator seed to 1234 | |
WARNING 2024-08-18 20:27:41,445 lm-eval:251: Overwriting default num_fewshot of mmlu_college_medicine from None to 5 | |
INFO 2024-08-18 20:27:41,445 lm-eval:261: Setting fewshot random generator seed to 1234 | |
WARNING 2024-08-18 20:27:41,445 lm-eval:251: Overwriting default num_fewshot of mmlu_college_mathematics from None to 5 | |
INFO 2024-08-18 20:27:41,445 lm-eval:261: Setting fewshot random generator seed to 1234 | |
WARNING 2024-08-18 20:27:41,445 lm-eval:251: Overwriting default num_fewshot of mmlu_college_computer_science from None to 5 | |
INFO 2024-08-18 20:27:41,445 lm-eval:261: Setting fewshot random generator seed to 1234 | |
WARNING 2024-08-18 20:27:41,446 lm-eval:251: Overwriting default num_fewshot of mmlu_college_chemistry from None to 5 | |
INFO 2024-08-18 20:27:41,446 lm-eval:261: Setting fewshot random generator seed to 1234 | |
WARNING 2024-08-18 20:27:41,446 lm-eval:251: Overwriting default num_fewshot of mmlu_college_biology from None to 5 | |
INFO 2024-08-18 20:27:41,446 lm-eval:261: Setting fewshot random generator seed to 1234 | |
WARNING 2024-08-18 20:27:41,446 lm-eval:251: Overwriting default num_fewshot of mmlu_clinical_knowledge from None to 5 | |
INFO 2024-08-18 20:27:41,446 lm-eval:261: Setting fewshot random generator seed to 1234 | |
WARNING 2024-08-18 20:27:41,446 lm-eval:251: Overwriting default num_fewshot of mmlu_business_ethics from None to 5 | |
INFO 2024-08-18 20:27:41,446 lm-eval:261: Setting fewshot random generator seed to 1234 | |
WARNING 2024-08-18 20:27:41,446 lm-eval:251: Overwriting default num_fewshot of mmlu_astronomy from None to 5 | |
INFO 2024-08-18 20:27:41,446 lm-eval:261: Setting fewshot random generator seed to 1234 | |
WARNING 2024-08-18 20:27:41,446 lm-eval:251: Overwriting default num_fewshot of mmlu_anatomy from None to 5 | |
INFO 2024-08-18 20:27:41,446 lm-eval:261: Setting fewshot random generator seed to 1234 | |
WARNING 2024-08-18 20:27:41,446 lm-eval:251: Overwriting default num_fewshot of mmlu_abstract_algebra from None to 5 | |
INFO 2024-08-18 20:27:41,446 lm-eval:261: Setting fewshot random generator seed to 1234 | |
INFO 2024-08-18 20:27:41,451 lm-eval:411: Building contexts for mmlu_world_religions on rank 0...
100% 171/171 [00:01<00:00, 136.86it/s]
INFO 2024-08-18 20:27:42,709 lm-eval:411: Building contexts for mmlu_virology on rank 0...
100% 166/166 [00:01<00:00, 137.15it/s]
INFO 2024-08-18 20:27:43,928 lm-eval:411: Building contexts for mmlu_us_foreign_policy on rank 0...
100% 100/100 [00:00<00:00, 136.60it/s]
INFO 2024-08-18 20:27:44,666 lm-eval:411: Building contexts for mmlu_sociology on rank 0...
100% 201/201 [00:01<00:00, 136.57it/s]
INFO 2024-08-18 20:27:46,148 lm-eval:411: Building contexts for mmlu_security_studies on rank 0...
100% 245/245 [00:01<00:00, 136.39it/s]
INFO 2024-08-18 20:27:47,957 lm-eval:411: Building contexts for mmlu_public_relations on rank 0...
100% 110/110 [00:00<00:00, 137.57it/s]
INFO 2024-08-18 20:27:48,763 lm-eval:411: Building contexts for mmlu_professional_psychology on rank 0...
100% 612/612 [00:04<00:00, 136.85it/s]
INFO 2024-08-18 20:27:53,266 lm-eval:411: Building contexts for mmlu_professional_medicine on rank 0...
100% 272/272 [00:01<00:00, 136.23it/s]
INFO 2024-08-18 20:27:55,277 lm-eval:411: Building contexts for mmlu_professional_law on rank 0...
100% 1534/1534 [00:11<00:00, 137.38it/s]
INFO 2024-08-18 20:28:06,522 lm-eval:411: Building contexts for mmlu_professional_accounting on rank 0...
100% 282/282 [00:02<00:00, 137.60it/s]
INFO 2024-08-18 20:28:08,586 lm-eval:411: Building contexts for mmlu_prehistory on rank 0...
100% 324/324 [00:02<00:00, 137.90it/s]
INFO 2024-08-18 20:28:10,951 lm-eval:411: Building contexts for mmlu_philosophy on rank 0...
100% 311/311 [00:02<00:00, 138.05it/s]
INFO 2024-08-18 20:28:13,219 lm-eval:411: Building contexts for mmlu_nutrition on rank 0...
100% 306/306 [00:02<00:00, 138.17it/s]
INFO 2024-08-18 20:28:15,448 lm-eval:411: Building contexts for mmlu_moral_scenarios on rank 0...
100% 895/895 [00:06<00:00, 138.76it/s]
INFO 2024-08-18 20:28:21,941 lm-eval:411: Building contexts for mmlu_moral_disputes on rank 0...
100% 346/346 [00:02<00:00, 138.65it/s]
INFO 2024-08-18 20:28:24,454 lm-eval:411: Building contexts for mmlu_miscellaneous on rank 0...
100% 783/783 [00:05<00:00, 138.51it/s]
INFO 2024-08-18 20:28:30,144 lm-eval:411: Building contexts for mmlu_medical_genetics on rank 0...
100% 100/100 [00:00<00:00, 138.01it/s]
INFO 2024-08-18 20:28:30,874 lm-eval:411: Building contexts for mmlu_marketing on rank 0...
100% 234/234 [00:01<00:00, 138.10it/s]
INFO 2024-08-18 20:28:32,580 lm-eval:411: Building contexts for mmlu_management on rank 0...
100% 103/103 [00:00<00:00, 138.77it/s]
INFO 2024-08-18 20:28:33,327 lm-eval:411: Building contexts for mmlu_machine_learning on rank 0...
100% 112/112 [00:00<00:00, 138.61it/s]
INFO 2024-08-18 20:28:34,141 lm-eval:411: Building contexts for mmlu_logical_fallacies on rank 0...
100% 163/163 [00:01<00:00, 139.13it/s]
INFO 2024-08-18 20:28:35,321 lm-eval:411: Building contexts for mmlu_jurisprudence on rank 0...
100% 108/108 [00:00<00:00, 138.78it/s]
INFO 2024-08-18 20:28:36,105 lm-eval:411: Building contexts for mmlu_international_law on rank 0...
100% 121/121 [00:00<00:00, 138.84it/s]
INFO 2024-08-18 20:28:36,982 lm-eval:411: Building contexts for mmlu_human_sexuality on rank 0...
100% 131/131 [00:00<00:00, 138.95it/s]
INFO 2024-08-18 20:28:37,932 lm-eval:411: Building contexts for mmlu_human_aging on rank 0...
100% 223/223 [00:01<00:00, 139.25it/s]
INFO 2024-08-18 20:28:39,544 lm-eval:411: Building contexts for mmlu_high_school_world_history on rank 0...
100% 237/237 [00:01<00:00, 138.81it/s]
INFO 2024-08-18 20:28:41,264 lm-eval:411: Building contexts for mmlu_high_school_us_history on rank 0...
100% 204/204 [00:01<00:00, 136.33it/s]
INFO 2024-08-18 20:28:42,771 lm-eval:411: Building contexts for mmlu_high_school_statistics on rank 0...
100% 216/216 [00:01<00:00, 138.37it/s]
INFO 2024-08-18 20:28:44,343 lm-eval:411: Building contexts for mmlu_high_school_psychology on rank 0...
100% 545/545 [00:03<00:00, 137.23it/s]
INFO 2024-08-18 20:28:48,339 lm-eval:411: Building contexts for mmlu_high_school_physics on rank 0...
100% 151/151 [00:01<00:00, 138.28it/s]
INFO 2024-08-18 20:28:49,439 lm-eval:411: Building contexts for mmlu_high_school_microeconomics on rank 0...
100% 238/238 [00:01<00:00, 138.44it/s]
INFO 2024-08-18 20:28:51,170 lm-eval:411: Building contexts for mmlu_high_school_mathematics on rank 0...
100% 270/270 [00:01<00:00, 137.69it/s]
INFO 2024-08-18 20:28:53,144 lm-eval:411: Building contexts for mmlu_high_school_macroeconomics on rank 0...
100% 390/390 [00:02<00:00, 138.61it/s]
INFO 2024-08-18 20:28:55,975 lm-eval:411: Building contexts for mmlu_high_school_government_and_politics on rank 0...
100% 193/193 [00:01<00:00, 138.85it/s]
INFO 2024-08-18 20:28:57,375 lm-eval:411: Building contexts for mmlu_high_school_geography on rank 0...
100% 198/198 [00:01<00:00, 138.88it/s]
INFO 2024-08-18 20:28:58,810 lm-eval:411: Building contexts for mmlu_high_school_european_history on rank 0...
100% 165/165 [00:01<00:00, 135.99it/s]
INFO 2024-08-18 20:29:00,033 lm-eval:411: Building contexts for mmlu_high_school_computer_science on rank 0...
100% 100/100 [00:00<00:00, 138.28it/s]
INFO 2024-08-18 20:29:00,761 lm-eval:411: Building contexts for mmlu_high_school_chemistry on rank 0...
100% 203/203 [00:01<00:00, 135.51it/s]
INFO 2024-08-18 20:29:02,269 lm-eval:411: Building contexts for mmlu_high_school_biology on rank 0...
100% 310/310 [00:02<00:00, 115.72it/s]
INFO 2024-08-18 20:29:04,963 lm-eval:411: Building contexts for mmlu_global_facts on rank 0...
100% 100/100 [00:00<00:00, 135.44it/s]
INFO 2024-08-18 20:29:05,706 lm-eval:411: Building contexts for mmlu_formal_logic on rank 0...
100% 126/126 [00:00<00:00, 135.56it/s]
INFO 2024-08-18 20:29:06,642 lm-eval:411: Building contexts for mmlu_elementary_mathematics on rank 0...
100% 378/378 [00:02<00:00, 135.60it/s]
INFO 2024-08-18 20:29:09,447 lm-eval:411: Building contexts for mmlu_electrical_engineering on rank 0...
100% 145/145 [00:01<00:00, 136.37it/s]
INFO 2024-08-18 20:29:10,518 lm-eval:411: Building contexts for mmlu_econometrics on rank 0...
100% 114/114 [00:00<00:00, 135.71it/s]
INFO 2024-08-18 20:29:11,363 lm-eval:411: Building contexts for mmlu_conceptual_physics on rank 0...
100% 235/235 [00:01<00:00, 136.95it/s]
INFO 2024-08-18 20:29:13,091 lm-eval:411: Building contexts for mmlu_computer_security on rank 0...
100% 100/100 [00:00<00:00, 136.87it/s]
INFO 2024-08-18 20:29:13,827 lm-eval:411: Building contexts for mmlu_college_physics on rank 0...
100% 102/102 [00:00<00:00, 136.91it/s]
INFO 2024-08-18 20:29:14,577 lm-eval:411: Building contexts for mmlu_college_medicine on rank 0...
100% 173/173 [00:01<00:00, 136.72it/s]
INFO 2024-08-18 20:29:15,851 lm-eval:411: Building contexts for mmlu_college_mathematics on rank 0...
100% 100/100 [00:00<00:00, 136.59it/s]
INFO 2024-08-18 20:29:16,589 lm-eval:411: Building contexts for mmlu_college_computer_science on rank 0...
100% 100/100 [00:00<00:00, 136.43it/s]
INFO 2024-08-18 20:29:17,327 lm-eval:411: Building contexts for mmlu_college_chemistry on rank 0...
100% 100/100 [00:00<00:00, 136.50it/s]
INFO 2024-08-18 20:29:18,065 lm-eval:411: Building contexts for mmlu_college_biology on rank 0...
100% 144/144 [00:01<00:00, 136.85it/s]
INFO 2024-08-18 20:29:19,125 lm-eval:411: Building contexts for mmlu_clinical_knowledge on rank 0...
100% 265/265 [00:01<00:00, 137.03it/s]
INFO 2024-08-18 20:29:21,071 lm-eval:411: Building contexts for mmlu_business_ethics on rank 0...
100% 100/100 [00:00<00:00, 136.58it/s]
INFO 2024-08-18 20:29:21,808 lm-eval:411: Building contexts for mmlu_astronomy on rank 0...
100% 152/152 [00:01<00:00, 136.77it/s]
INFO 2024-08-18 20:29:22,927 lm-eval:411: Building contexts for mmlu_anatomy on rank 0...
100% 135/135 [00:00<00:00, 137.49it/s]
INFO 2024-08-18 20:29:23,916 lm-eval:411: Building contexts for mmlu_abstract_algebra on rank 0...
100% 100/100 [00:00<00:00, 138.03it/s]
INFO 2024-08-18 20:29:24,646 lm-eval:438: Running loglikelihood requests
Running loglikelihood requests: 0% 0/56168 [00:00<?, ?it/s]
Passed argument batch_size = auto:1. Detecting largest batch size
Determined largest batch size: 16
Running loglikelihood requests: 100% 56168/56168 [13:58<00:00, 66.98it/s]
WARNING 2024-08-18 20:46:42,688 lm-eval:1315: Failed to get model SHA for /var/mnt/inststg1/instructlab/phasedbasedir/phase1/checkpoints/hf_format/samples_320 at revision main. Error: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/var/mnt/inststg1/instructlab/phasedbasedir/phase1/checkpoints/hf_format/samples_320'. Use `repo_type` argument if needed.
fatal: not a git repository (or any of the parent directories): .git
CHECKPOINT EVALUATION: /var/mnt/inststg1/instructlab/phasedbasedir/phase1/checkpoints/hf_format/samples_320 SCORED 0.5283330937684491
Training Phase 2/2...
TrainingArgs for current phase: TrainingArgs(model_path='/var/mnt/inststg1/instructlab/phasedbasedir/phase1/checkpoints/hf_format/samples_320', chat_tmpl_path='/opt/app-root/lib64/python3.11/site-packages/instructlab/training/chat_templates/ibm_generic_tmpl.py', data_path='/var/mnt/inststg1/instructlab/generated/skills_train_msgs_2024-08-18T15_57_14.jsonl', ckpt_output_dir='/var/mnt/inststg1/instructlab/phasedbasedir/phase2/checkpoints', data_output_dir='/var/mnt/inststg1/instructlab/.local/share/instructlab/internal', max_seq_len=4096, max_batch_len=10000, num_epochs=2, effective_batch_size=3840, save_samples=0, learning_rate=2e-05, warmup_steps=25, is_padding_free=False, random_seed=42, checkpoint_at_epoch=True, mock_data=False, mock_data_len=0, deepspeed_options=DeepSpeedOptions(cpu_offload_optimizer=False, cpu_offload_optimizer_ratio=1.0, cpu_offload_optimizer_pin_memory=False, save_samples=None), disable_flash_attn=False, lora=None)
INFO 2024-08-18 20:46:47,145 root:611: eos: 32001, pad: 32002, system: 32003, user: 32004, assistant: 32005
Generating train split: 10000 examples [00:00, 100155.07 examples/s]
tokenizing the dataset with /var/mnt/inststg1/instructlab/phasedbasedir/phase1/checkpoints/hf_format/samples_320 tokenizer...
Setting TOKENIZERS_PARALLELISM=false for forked processes.
WARNING 2024-08-18 20:46:47,316 datasets.arrow_dataset:3211: Setting TOKENIZERS_PARALLELISM=false for forked processes.
Map (num_proc=16): 100% 10000/10000 [00:02<00:00, 3765.08 examples/s]
ten largest length percentiles:
Setting TOKENIZERS_PARALLELISM=false for forked processes.
WARNING 2024-08-18 20:46:50,962 datasets.arrow_dataset:3211: Setting TOKENIZERS_PARALLELISM=false for forked processes.
Map (num_proc=16): 100% 10000/10000 [00:00<00:00, 16336.33 examples/s]
quantile 90th: 1283.0
quantile 91th: 1367.0
quantile 92th: 1453.0
quantile 93th: 1579.0
quantile 94th: 1704.0599999999995
quantile 95th: 1843.1499999999978
quantile 96th: 2046.1199999999972
quantile 97th: 2356.179999999993
quantile 98th: 2724.100000000002
quantile 99th: 3213.0200000000004
quantile 100th: 5765.0
at 4096 max sequence length, the number of samples to be dropped is 19
(0.19% of total)
quantile 0th: 70.0
quantile 1th: 81.0
quantile 2th: 85.0
quantile 3th: 87.0
quantile 4th: 91.0
quantile 5th: 94.0
quantile 6th: 97.93999999999994
quantile 7th: 102.0
quantile 8th: 108.0
quantile 9th: 113.0
quantile 10th: 118.0
at 20 min sequence length, the number of samples to be dropped is 0
checking the validity of the samples...
Setting TOKENIZERS_PARALLELISM=false for forked processes.
WARNING 2024-08-18 20:46:52,663 datasets.arrow_dataset:3211: Setting TOKENIZERS_PARALLELISM=false for forked processes.
Filter (num_proc=16): 100% 10000/10000 [00:01<00:00, 8244.08 examples/s]
INFO 2024-08-18 20:46:54,896 root:611: number of dropped samples: 19 -- out of 10000
Categorizing training data type...
Data type sorting: 100% 9981/9981 [00:00<00:00, 112525.30it/s]
unmasking the appropriate message content...
Setting TOKENIZERS_PARALLELISM=false for forked processes.
WARNING 2024-08-18 20:46:57,636 datasets.arrow_dataset:3211: Setting TOKENIZERS_PARALLELISM=false for forked processes.
Map (num_proc=16): 100% 9981/9981 [00:01<00:00, 9644.98 examples/s]
The following are some examples of the processed data, with masked tokens (not to be learned) represented with <mask>. The unmasked tokens are the ones the model will learn to predict. Please review these samples to ensure the model is learning to predict expected tokens.
Instruction ex sample 6312: <mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask> | |
The TVs in "2001: A Space Odyssey" might initially seem random, but they are actually thoughtfully integrated into the film's narrative and symbolism. Here are some instances where TVs play a significant role:
1. **Planetary Alignment:** In the opening scene, a TV screen displays the alignment of the planets, which is crucial to the plot and the film's exploration of extraterrestrial intelligence.
2. **News Broadcasts:** Later in the movie, TVs are used to show news broadcasts, providing context and information about the ongoing space exploration and human presence beyond Earth.
3. **Interviews:** TVs are also used to present interviews with characters, offering insights into their thoughts and motivations.
4. **Monolith's Influence:** The iconic scene where the monolith appears to the apes is accompanied by a flashing TV screen. This juxtaposition highlights the monolith's influence over technology and human progress, as if the TV screen is a modern-day monolith guiding human development.
In essence, the TVs in "2001: A Space Odyssey" are carefully crafted elements that contribute to the film's visual language and themes, such as technology, human evolution, and the search for extraterrestrial intelligence.<|endoftext|>
Original Input: <|system|>
I am, Red Hat® Instruct Model based on Granite 7B, an AI language model developed by Red Hat and IBM Research, based on the Granite-7b-base language model. My primary function is to be a chat assistant.
<|user|>
What are the random TVs in 2001?
<|assistant|>
The TVs in "2001: A Space Odyssey" might initially seem random, but they are actually thoughtfully integrated into the film's narrative and symbolism. Here are some instances where TVs play a significant role:
1. **Planetary Alignment:** In the opening scene, a TV screen displays the alignment of the planets, which is crucial to the plot and the film's exploration of extraterrestrial intelligence.
2. **News Broadcasts:** Later in the movie, TVs are used to show news broadcasts, providing context and information about the ongoing space exploration and human presence beyond Earth.
3. **Interviews:** TVs are also used to present interviews with characters, offering insights into their thoughts and motivations.
4. **Monolith's Influence:** The iconic scene where the monolith appears to the apes is accompanied by a flashing TV screen. This juxtaposition highlights the monolith's influence over technology and human progress, as if the TV screen is a modern-day monolith guiding human development.
In essence, the TVs in "2001: A Space Odyssey" are carefully crafted elements that contribute to the film's visual language and themes, such as technology, human evolution, and the search for extraterrestrial intelligence.<|endoftext|>
Instruction ex sample 6891: <mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask> | |
Mars, the fourth planet from the Sun, has a fascinating and complex weather system. Here are some intriguing facts about the weather on Mars:
- **Thin Atmosphere:** Mars has a very thin atmosphere, which is only about 1% as dense as Earth's. It is primarily composed of carbon dioxide (CO2), with minor amounts of nitrogen and argon.
- **Extreme Temperature Swings:** Due to the thin atmosphere and Mars' distance from the Sun, the planet experiences extreme temperature fluctuations. Daytime temperatures can reach up to 70°F (20°C), while nighttime temperatures can plummet to -100°F (-73°C).
- **Dust Storms:** Mars is known for its massive dust storms that can cover the entire planet and last for months. These storms can reach speeds of up to 70 mph (113 km/h) and are so intense that they can alter the planet's albedo (reflectivity) and even be detected by telescopes on Earth.
- **Winds:** Mars has a surprisingly active wind system, with average wind speeds of about 22 mph (35 km/h). These winds are primarily caused by the planet's rotation and temperature differences between the equator and the poles.
- **Frozen Carbon Dioxide:** During the Martian winter, temperatures at the poles drop low enough for carbon dioxide to freeze, forming a layer of dry ice. This process contributes to the formation of the polar ice caps, which are primarily composed of water ice and dust.
- **Seasonal Changes:** Mars experiences seasonal changes due to its elliptical orbit around the Sun. The planet is closer to the Sun during its summer, leading to more extreme temperature variations and more frequent dust storms.
- **Absence of Liquid Water:** Due to Mars' low atmospheric pressure and temperature, liquid water cannot exist on the surface for an extended period. However, evidence suggests that water may have flowed on the planet's surface in the past, and there is the possibility of subsurface water reservoirs.
These fascinating weather characteristics make Mars an intriguing subject for scientists and researchers studying planetary atmospheres and climates.<|endoftext|>
Original Input: <|system|>
I am, Red Hat® Instruct Model based on Granite 7B, an AI language model developed by Red Hat and IBM Research, based on the Granite-7b-base language model. My primary function is to be a chat assistant.
<|user|>
What are some interesting facts about the weather on Mars? Please use bullet points.
<|assistant|>
Mars, the fourth planet from the Sun, has a fascinating and complex weather system. Here are some intriguing facts about the weather on Mars:
- **Thin Atmosphere:** Mars has a very thin atmosphere, which is only about 1% as dense as Earth's. It is primarily composed of carbon dioxide (CO2), with minor amounts of nitrogen and argon.
- **Extreme Temperature Swings:** Due to the thin atmosphere and Mars' distance from the Sun, the planet experiences extreme temperature fluctuations. Daytime temperatures can reach up to 70°F (20°C), while nighttime temperatures can plummet to -100°F (-73°C).
- **Dust Storms:** Mars is known for its massive dust storms that can cover the entire planet and last for months. These storms can reach speeds of up to 70 mph (113 km/h) and are so intense that they can alter the planet's albedo (reflectivity) and even be detected by telescopes on Earth.
- **Winds:** Mars has a surprisingly active wind system, with average wind speeds of about 22 mph (35 km/h). These winds are primarily caused by the planet's rotation and temperature differences between the equator and the poles.
- **Frozen Carbon Dioxide:** During the Martian winter, temperatures at the poles drop low enough for carbon dioxide to freeze, forming a layer of dry ice. This process contributes to the formation of the polar ice caps, which are primarily composed of water ice and dust.
- **Seasonal Changes:** Mars experiences seasonal changes due to its elliptical orbit around the Sun. The planet is closer to the Sun during its summer, leading to more extreme temperature variations and more frequent dust storms.
- **Absence of Liquid Water:** Due to Mars' low atmospheric pressure and temperature, liquid water cannot exist on the surface for an extended period. However, evidence suggests that water may have flowed on the planet's surface in the past, and there is the possibility of subsurface water reservoirs.
These fascinating weather characteristics make Mars an intriguing subject for scientists and researchers studying planetary atmospheres and climates.<|endoftext|>
Creating json from Arrow format: 100% 10/10 [00:01<00:00, 7.02ba/s]
Running command: torchrun --nnodes=1 --node_rank=0 --nproc_per_node=8 --rdzv_id=123 --rdzv_endpoint=127.0.0.1:12222 /opt/app-root/lib64/python3.11/site-packages/instructlab/training/main_ds.py --model_name_or_path=/var/mnt/inststg1/instructlab/phasedbasedir/phase1/checkpoints/hf_format/samples_320 --data_path=/var/mnt/inststg1/instructlab/.local/share/instructlab/internal/data.jsonl --output_dir=/var/mnt/inststg1/instructlab/phasedbasedir/phase2/checkpoints --num_epochs=2 --effective_batch_size=3840 --learning_rate=2e-05 --num_warmup_steps=25 --save_samples=0 --log_level=INFO --max_batch_len=10000 --seed=42 --chat-tmpl-path=/opt/app-root/lib64/python3.11/site-packages/instructlab/training/chat_templates/ibm_generic_tmpl.py --checkpoint_at_epoch
W0818 20:47:08.033000 139686984577472 torch/distributed/run.py:757]
W0818 20:47:08.033000 139686984577472 torch/distributed/run.py:757] *****************************************
W0818 20:47:08.033000 139686984577472 torch/distributed/run.py:757] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0818 20:47:08.033000 139686984577472 torch/distributed/run.py:757] *****************************************
[2024-08-18 20:47:10,896] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-08-18 20:47:11,042] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-08-18 20:47:11,152] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-08-18 20:47:11,205] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-08-18 20:47:11,218] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-08-18 20:47:11,232] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-08-18 20:47:11,260] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-08-18 20:47:11,287] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-devel package with yum
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-devel package with yum
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-devel package with yum
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] async_io: please install the libaio-devel package with yum
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-devel package with yum
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-devel package with yum
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] async_io: please install the libaio-devel package with yum
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH | |
[WARNING] async_io: please install the libaio-devel package with yum | |
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. | |
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH | |
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3 | |
[WARNING] using untested triton version (2.3.1), only 1.0.0 is known to be compatible | |
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3 | |
[WARNING] using untested triton version (2.3.1), only 1.0.0 is known to be compatible | |
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3 | |
[WARNING] using untested triton version (2.3.1), only 1.0.0 is known to be compatible | |
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3 | |
[WARNING] using untested triton version (2.3.1), only 1.0.0 is known to be compatible | |
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3 | |
[WARNING] using untested triton version (2.3.1), only 1.0.0 is known to be compatible | |
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3 | |
[WARNING] using untested triton version (2.3.1), only 1.0.0 is known to be compatible | |
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3 | |
[WARNING] using untested triton version (2.3.1), only 1.0.0 is known to be compatible | |
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3 | |
[WARNING] using untested triton version (2.3.1), only 1.0.0 is known to be compatible | |
model_name_or_path: /var/mnt/inststg1/instructlab/phasedbasedir/phase1/checkpoints/hf_format/samples_320
data_path: /var/mnt/inststg1/instructlab/.local/share/instructlab/internal/data.jsonl
output_dir: /var/mnt/inststg1/instructlab/phasedbasedir/phase2/checkpoints
num_epochs: 2
last_step: 0
effective_batch_size: 3840
learning_rate: 2.0e-05
lr_scheduler: cosine
num_warmup_steps: 25
save_samples: 0
save_samples_ds: null
save_last: false
checkpoint_at_epoch: true
log_level: INFO
seed: 42
mock_data: false
mock_len: 2600
sharding_strategy: FULL_SHARD
is_granite: false
lora_r: 0
lora_alpha: 32
lora_dropout: 0.1
lora_quant_bits: null
lora_target_modules: null
max_batch_len: 10000
cpu_offload_optimizer: false
cpu_offload_optimizer_pin_memory: false
cpu_offload_optimizer_ratio: 1.0
NEFTune_alpha: null
chat_tmpl_path: /opt/app-root/lib64/python3.11/site-packages/instructlab/training/chat_templates/ibm_generic_tmpl.py
disable_flash_attn: false
{
    "script_params": {
        "model_name_or_path": "/var/mnt/inststg1/instructlab/phasedbasedir/phase1/checkpoints/hf_format/samples_320",
        "data_path": "/var/mnt/inststg1/instructlab/.local/share/instructlab/internal/data.jsonl",
        "output_dir": "/var/mnt/inststg1/instructlab/phasedbasedir/phase2/checkpoints",
        "num_epochs": 2,
        "last_step": 0,
        "effective_batch_size": 3840,
        "learning_rate": 2e-05,
        "lr_scheduler": "cosine",
        "num_warmup_steps": 25,
        "save_samples": 0,
        "save_samples_ds": null,
        "save_last": false,
        "checkpoint_at_epoch": true,
        "log_level": "INFO",
        "seed": 42,
        "mock_data": false,
        "mock_len": 2600,
        "sharding_strategy": "FULL_SHARD",
        "is_granite": false,
        "lora_r": 0,
        "lora_alpha": 32,
        "lora_dropout": 0.1,
        "lora_quant_bits": null,
        "lora_target_modules": null,
        "max_batch_len": 10000,
        "cpu_offload_optimizer": false,
        "cpu_offload_optimizer_pin_memory": false,
        "cpu_offload_optimizer_ratio": 1.0,
        "NEFTune_alpha": null,
        "chat_tmpl_path": "/opt/app-root/lib64/python3.11/site-packages/instructlab/training/chat_templates/ibm_generic_tmpl.py",
        "disable_flash_attn": false
    },
    "timestamp": "2024-08-18T20:47:14.720513"
}
[2024-08-18 20:47:14,794] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-08-18 20:47:14,794] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
tyler-a100-newimage-val:10546:10546 [0] NCCL INFO Bootstrap : Using enp8s0:192.168.48.4<0>
tyler-a100-newimage-val:10546:10546 [0] NCCL INFO cudaDriverVersion 12040
tyler-a100-newimage-val:10546:10546 [0] NCCL INFO NCCL version 2.22.3+cuda12.5
[2024-08-18 20:47:15,959] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-08-18 20:47:15,969] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-08-18 20:47:15,974] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-08-18 20:47:15,976] [INFO] [comm.py:637:init_distributed] cdb=None
tyler-a100-newimage-val:10548:10548 [2] NCCL INFO cudaDriverVersion 12040
tyler-a100-newimage-val:10548:10548 [2] NCCL INFO Bootstrap : Using enp8s0:192.168.48.4<0>
tyler-a100-newimage-val:10548:10548 [2] NCCL INFO NCCL version 2.22.3+cuda12.5
[2024-08-18 20:47:15,987] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-08-18 20:47:15,995] [INFO] [comm.py:637:init_distributed] cdb=None
tyler-a100-newimage-val:10550:10550 [4] NCCL INFO cudaDriverVersion 12040
tyler-a100-newimage-val:10550:10550 [4] NCCL INFO Bootstrap : Using enp8s0:192.168.48.4<0>
tyler-a100-newimage-val:10550:10550 [4] NCCL INFO NCCL version 2.22.3+cuda12.5
tyler-a100-newimage-val:10552:10552 [6] NCCL INFO cudaDriverVersion 12040
tyler-a100-newimage-val:10552:10552 [6] NCCL INFO Bootstrap : Using enp8s0:192.168.48.4<0>
tyler-a100-newimage-val:10552:10552 [6] NCCL INFO NCCL version 2.22.3+cuda12.5
[2024-08-18 20:47:16,031] [INFO] [comm.py:637:init_distributed] cdb=None
tyler-a100-newimage-val:10551:10551 [5] NCCL INFO cudaDriverVersion 12040
tyler-a100-newimage-val:10551:10551 [5] NCCL INFO Bootstrap : Using enp8s0:192.168.48.4<0>
tyler-a100-newimage-val:10551:10551 [5] NCCL INFO NCCL version 2.22.3+cuda12.5
tyler-a100-newimage-val:10549:10549 [3] NCCL INFO cudaDriverVersion 12040
tyler-a100-newimage-val:10549:10549 [3] NCCL INFO Bootstrap : Using enp8s0:192.168.48.4<0>
tyler-a100-newimage-val:10549:10549 [3] NCCL INFO NCCL version 2.22.3+cuda12.5
tyler-a100-newimage-val:10547:10547 [1] NCCL INFO cudaDriverVersion 12040
tyler-a100-newimage-val:10547:10547 [1] NCCL INFO Bootstrap : Using enp8s0:192.168.48.4<0>
tyler-a100-newimage-val:10547:10547 [1] NCCL INFO NCCL version 2.22.3+cuda12.5
tyler-a100-newimage-val:10553:10553 [7] NCCL INFO cudaDriverVersion 12040
tyler-a100-newimage-val:10553:10553 [7] NCCL INFO Bootstrap : Using enp8s0:192.168.48.4<0>
tyler-a100-newimage-val:10553:10553 [7] NCCL INFO NCCL version 2.22.3+cuda12.5
tyler-a100-newimage-val:10546:11270 [0] NCCL INFO NET/Plugin: Could not find: libnccl-net.so. Using internal network plugin.
tyler-a100-newimage-val:10546:11270 [0] NCCL INFO NET/IB : No device found.
tyler-a100-newimage-val:10546:11270 [0] NCCL INFO NET/Socket : Using [0]enp8s0:192.168.48.4<0>
tyler-a100-newimage-val:10546:11270 [0] NCCL INFO Using network Socket
tyler-a100-newimage-val:10548:11283 [2] NCCL INFO NET/Plugin: Could not find: libnccl-net.so. Using internal network plugin.
tyler-a100-newimage-val:10548:11283 [2] NCCL INFO NET/IB : No device found.
tyler-a100-newimage-val:10548:11283 [2] NCCL INFO NET/Socket : Using [0]enp8s0:192.168.48.4<0>
tyler-a100-newimage-val:10548:11283 [2] NCCL INFO Using network Socket
tyler-a100-newimage-val:10550:11286 [4] NCCL INFO NET/Plugin: Could not find: libnccl-net.so. Using internal network plugin.
tyler-a100-newimage-val:10552:11287 [6] NCCL INFO NET/Plugin: Could not find: libnccl-net.so. Using internal network plugin.
tyler-a100-newimage-val:10551:11288 [5] NCCL INFO NET/Plugin: Could not find: libnccl-net.so. Using internal network plugin.
tyler-a100-newimage-val:10550:11286 [4] NCCL INFO NET/IB : No device found.
tyler-a100-newimage-val:10550:11286 [4] NCCL INFO NET/Socket : Using [0]enp8s0:192.168.48.4<0>
tyler-a100-newimage-val:10552:11287 [6] NCCL INFO NET/IB : No device found.
tyler-a100-newimage-val:10550:11286 [4] NCCL INFO Using network Socket
tyler-a100-newimage-val:10552:11287 [6] NCCL INFO NET/Socket : Using [0]enp8s0:192.168.48.4<0>
tyler-a100-newimage-val:10551:11288 [5] NCCL INFO NET/IB : No device found.
tyler-a100-newimage-val:10552:11287 [6] NCCL INFO Using network Socket
tyler-a100-newimage-val:10551:11288 [5] NCCL INFO NET/Socket : Using [0]enp8s0:192.168.48.4<0>
tyler-a100-newimage-val:10549:11289 [3] NCCL INFO NET/Plugin: Could not find: libnccl-net.so. Using internal network plugin.
tyler-a100-newimage-val:10551:11288 [5] NCCL INFO Using network Socket
tyler-a100-newimage-val:10549:11289 [3] NCCL INFO NET/IB : No device found.
tyler-a100-newimage-val:10549:11289 [3] NCCL INFO NET/Socket : Using [0]enp8s0:192.168.48.4<0>
tyler-a100-newimage-val:10549:11289 [3] NCCL INFO Using network Socket
tyler-a100-newimage-val:10547:11290 [1] NCCL INFO NET/Plugin: Could not find: libnccl-net.so. Using internal network plugin.
tyler-a100-newimage-val:10547:11290 [1] NCCL INFO NET/IB : No device found.
tyler-a100-newimage-val:10547:11290 [1] NCCL INFO NET/Socket : Using [0]enp8s0:192.168.48.4<0>
tyler-a100-newimage-val:10547:11290 [1] NCCL INFO Using network Socket
tyler-a100-newimage-val:10553:11291 [7] NCCL INFO NET/Plugin: Could not find: libnccl-net.so. Using internal network plugin.
tyler-a100-newimage-val:10553:11291 [7] NCCL INFO NET/IB : No device found.
tyler-a100-newimage-val:10553:11291 [7] NCCL INFO NET/Socket : Using [0]enp8s0:192.168.48.4<0>
tyler-a100-newimage-val:10553:11291 [7] NCCL INFO Using network Socket
tyler-a100-newimage-val:10549:11289 [3] NCCL INFO ncclCommInitRank comm 0x56556127bf10 rank 3 nranks 8 cudaDev 3 nvmlDev 3 busId a040 commId 0x8ecf20a94c156f4c - Init START
tyler-a100-newimage-val:10547:11290 [1] NCCL INFO ncclCommInitRank comm 0x562f40810030 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 8020 commId 0x8ecf20a94c156f4c - Init START
tyler-a100-newimage-val:10551:11288 [5] NCCL INFO ncclCommInitRank comm 0x55812d458df0 rank 5 nranks 8 cudaDev 5 nvmlDev 5 busId c060 commId 0x8ecf20a94c156f4c - Init START
tyler-a100-newimage-val:10550:11286 [4] NCCL INFO ncclCommInitRank comm 0x55ed47dad9a0 rank 4 nranks 8 cudaDev 4 nvmlDev 4 busId c050 commId 0x8ecf20a94c156f4c - Init START
tyler-a100-newimage-val:10552:11287 [6] NCCL INFO ncclCommInitRank comm 0x55e00ec7dc40 rank 6 nranks 8 cudaDev 6 nvmlDev 6 busId e070 commId 0x8ecf20a94c156f4c - Init START
tyler-a100-newimage-val:10553:11291 [7] NCCL INFO ncclCommInitRank comm 0x558cdebf44c0 rank 7 nranks 8 cudaDev 7 nvmlDev 7 busId e080 commId 0x8ecf20a94c156f4c - Init START
tyler-a100-newimage-val:10548:11283 [2] NCCL INFO ncclCommInitRank comm 0x55e4c1493930 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId a030 commId 0x8ecf20a94c156f4c - Init START
tyler-a100-newimage-val:10546:11270 [0] NCCL INFO ncclCommInitRank comm 0x55919a812fd0 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId 8010 commId 0x8ecf20a94c156f4c - Init START
tyler-a100-newimage-val:10549:11289 [3] NCCL INFO Setting affinity for GPU 3 to ff,ffffffff
tyler-a100-newimage-val:10549:11289 [3] NCCL INFO NVLS multicast support is not available on dev 3
tyler-a100-newimage-val:10548:11283 [2] NCCL INFO Setting affinity for GPU 2 to ff,ffffffff
tyler-a100-newimage-val:10548:11283 [2] NCCL INFO NVLS multicast support is not available on dev 2
tyler-a100-newimage-val:10547:11290 [1] NCCL INFO Setting affinity for GPU 1 to ff,ffffffff
tyler-a100-newimage-val:10547:11290 [1] NCCL INFO NVLS multicast support is not available on dev 1
tyler-a100-newimage-val:10550:11286 [4] NCCL INFO Setting affinity for GPU 4 to ffff,ffffff00,00000000
tyler-a100-newimage-val:10550:11286 [4] NCCL INFO NVLS multicast support is not available on dev 4
tyler-a100-newimage-val:10551:11288 [5] NCCL INFO Setting affinity for GPU 5 to ffff,ffffff00,00000000
tyler-a100-newimage-val:10551:11288 [5] NCCL INFO NVLS multicast support is not available on dev 5
tyler-a100-newimage-val:10546:11270 [0] NCCL INFO Setting affinity for GPU 0 to ff,ffffffff
tyler-a100-newimage-val:10546:11270 [0] NCCL INFO NVLS multicast support is not available on dev 0
tyler-a100-newimage-val:10552:11287 [6] NCCL INFO Setting affinity for GPU 6 to ffff,ffffff00,00000000
tyler-a100-newimage-val:10552:11287 [6] NCCL INFO NVLS multicast support is not available on dev 6
tyler-a100-newimage-val:10553:11291 [7] NCCL INFO Setting affinity for GPU 7 to ffff,ffffff00,00000000
tyler-a100-newimage-val:10553:11291 [7] NCCL INFO NVLS multicast support is not available on dev 7
tyler-a100-newimage-val:10552:11287 [6] NCCL INFO comm 0x55e00ec7dc40 rank 6 nRanks 8 nNodes 1 localRanks 8 localRank 6 MNNVL 0
tyler-a100-newimage-val:10549:11289 [3] NCCL INFO comm 0x56556127bf10 rank 3 nRanks 8 nNodes 1 localRanks 8 localRank 3 MNNVL 0
tyler-a100-newimage-val:10551:11288 [5] NCCL INFO comm 0x55812d458df0 rank 5 nRanks 8 nNodes 1 localRanks 8 localRank 5 MNNVL 0
tyler-a100-newimage-val:10550:11286 [4] NCCL INFO comm 0x55ed47dad9a0 rank 4 nRanks 8 nNodes 1 localRanks 8 localRank 4 MNNVL 0
tyler-a100-newimage-val:10553:11291 [7] NCCL INFO comm 0x558cdebf44c0 rank 7 nRanks 8 nNodes 1 localRanks 8 localRank 7 MNNVL 0
tyler-a100-newimage-val:10546:11270 [0] NCCL INFO comm 0x55919a812fd0 rank 0 nRanks 8 nNodes 1 localRanks 8 localRank 0 MNNVL 0
tyler-a100-newimage-val:10546:11270 [0] NCCL INFO Channel 00/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:10549:11289 [3] NCCL INFO Trees [0] 4/-1/-1->3->2 [1] 4/-1/-1->3->2 [2] 4/-1/-1->3->2 [3] 4/-1/-1->3->2 [4] 4/-1/-1->3->2 [5] 4/-1/-1->3->2 [6] 4/-1/-1->3->2 [7] 4/-1/-1->3->2 [8] 4/-1/-1->3->2 [9] 4/-1/-1->3->2 [10] 4/-1/-1->3->2 [11] 4/-1/-1->3->2 [12] 4/-1/-1->3->2 [13] 4/-1/-1->3->2 [14] 4/-1/-1->3->2 [15] 4/-1/-1->3->2 [16] 4/-1/-1->3->2 [17] 4/-1/-1->3->2 [18] 4/-1/-1->3->2 [19] 4/-1/-1->3->2 [20] 4/-1/-1->3->2 [21] 4/-1/-1->3->2 [22] 4/-1/-1->3->2 [23] 4/-1/-1->3->2
tyler-a100-newimage-val:10551:11288 [5] NCCL INFO Trees [0] 6/-1/-1->5->4 [1] 6/-1/-1->5->4 [2] 6/-1/-1->5->4 [3] 6/-1/-1->5->4 [4] 6/-1/-1->5->4 [5] 6/-1/-1->5->4 [6] 6/-1/-1->5->4 [7] 6/-1/-1->5->4 [8] 6/-1/-1->5->4 [9] 6/-1/-1->5->4 [10] 6/-1/-1->5->4 [11] 6/-1/-1->5->4 [12] 6/-1/-1->5->4 [13] 6/-1/-1->5->4 [14] 6/-1/-1->5->4 [15] 6/-1/-1->5->4 [16] 6/-1/-1->5->4 [17] 6/-1/-1->5->4 [18] 6/-1/-1->5->4 [19] 6/-1/-1->5->4 [20] 6/-1/-1->5->4 [21] 6/-1/-1->5->4 [22] 6/-1/-1->5->4 [23] 6/-1/-1->5->4
tyler-a100-newimage-val:10552:11287 [6] NCCL INFO Trees [0] 7/-1/-1->6->5 [1] 7/-1/-1->6->5 [2] 7/-1/-1->6->5 [3] 7/-1/-1->6->5 [4] 7/-1/-1->6->5 [5] 7/-1/-1->6->5 [6] 7/-1/-1->6->5 [7] 7/-1/-1->6->5 [8] 7/-1/-1->6->5 [9] 7/-1/-1->6->5 [10] 7/-1/-1->6->5 [11] 7/-1/-1->6->5 [12] 7/-1/-1->6->5 [13] 7/-1/-1->6->5 [14] 7/-1/-1->6->5 [15] 7/-1/-1->6->5 [16] 7/-1/-1->6->5 [17] 7/-1/-1->6->5 [18] 7/-1/-1->6->5 [19] 7/-1/-1->6->5 [20] 7/-1/-1->6->5 [21] 7/-1/-1->6->5 [22] 7/-1/-1->6->5 [23] 7/-1/-1->6->5
tyler-a100-newimage-val:10546:11270 [0] NCCL INFO Channel 01/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:10549:11289 [3] NCCL INFO P2P Chunksize set to 524288
tyler-a100-newimage-val:10551:11288 [5] NCCL INFO P2P Chunksize set to 524288
tyler-a100-newimage-val:10552:11287 [6] NCCL INFO P2P Chunksize set to 524288
tyler-a100-newimage-val:10546:11270 [0] NCCL INFO Channel 02/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:10550:11286 [4] NCCL INFO Trees [0] 5/-1/-1->4->3 [1] 5/-1/-1->4->3 [2] 5/-1/-1->4->3 [3] 5/-1/-1->4->3 [4] 5/-1/-1->4->3 [5] 5/-1/-1->4->3 [6] 5/-1/-1->4->3 [7] 5/-1/-1->4->3 [8] 5/-1/-1->4->3 [9] 5/-1/-1->4->3 [10] 5/-1/-1->4->3 [11] 5/-1/-1->4->3 [12] 5/-1/-1->4->3 [13] 5/-1/-1->4->3 [14] 5/-1/-1->4->3 [15] 5/-1/-1->4->3 [16] 5/-1/-1->4->3 [17] 5/-1/-1->4->3 [18] 5/-1/-1->4->3 [19] 5/-1/-1->4->3 [20] 5/-1/-1->4->3 [21] 5/-1/-1->4->3 [22] 5/-1/-1->4->3 [23] 5/-1/-1->4->3
tyler-a100-newimage-val:10546:11270 [0] NCCL INFO Channel 03/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:10553:11291 [7] NCCL INFO Trees [0] -1/-1/-1->7->6 [1] -1/-1/-1->7->6 [2] -1/-1/-1->7->6 [3] -1/-1/-1->7->6 [4] -1/-1/-1->7->6 [5] -1/-1/-1->7->6 [6] -1/-1/-1->7->6 [7] -1/-1/-1->7->6 [8] -1/-1/-1->7->6 [9] -1/-1/-1->7->6 [10] -1/-1/-1->7->6 [11] -1/-1/-1->7->6 [12] -1/-1/-1->7->6 [13] -1/-1/-1->7->6 [14] -1/-1/-1->7->6 [15] -1/-1/-1->7->6 [16] -1/-1/-1->7->6 [17] -1/-1/-1->7->6 [18] -1/-1/-1->7->6 [19] -1/-1/-1->7->6 [20] -1/-1/-1->7->6 [21] -1/-1/-1->7->6 [22] -1/-1/-1->7->6 [23] -1/-1/-1->7->6
tyler-a100-newimage-val:10546:11270 [0] NCCL INFO Channel 04/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:10550:11286 [4] NCCL INFO P2P Chunksize set to 524288
tyler-a100-newimage-val:10553:11291 [7] NCCL INFO P2P Chunksize set to 524288
tyler-a100-newimage-val:10546:11270 [0] NCCL INFO Channel 05/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:10546:11270 [0] NCCL INFO Channel 06/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:10546:11270 [0] NCCL INFO Channel 07/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:10546:11270 [0] NCCL INFO Channel 08/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:10546:11270 [0] NCCL INFO Channel 09/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:10546:11270 [0] NCCL INFO Channel 10/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:10546:11270 [0] NCCL INFO Channel 11/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:10546:11270 [0] NCCL INFO Channel 12/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:10546:11270 [0] NCCL INFO Channel 13/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:10546:11270 [0] NCCL INFO Channel 14/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:10546:11270 [0] NCCL INFO Channel 15/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:10546:11270 [0] NCCL INFO Channel 16/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:10546:11270 [0] NCCL INFO Channel 17/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:10546:11270 [0] NCCL INFO Channel 18/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:10546:11270 [0] NCCL INFO Channel 19/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:10546:11270 [0] NCCL INFO Channel 20/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:10546:11270 [0] NCCL INFO Channel 21/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:10546:11270 [0] NCCL INFO Channel 22/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:10546:11270 [0] NCCL INFO Channel 23/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:10546:11270 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 [4] 1/-1/-1->0->-1 [5] 1/-1/-1->0->-1 [6] 1/-1/-1->0->-1 [7] 1/-1/-1->0->-1 [8] 1/-1/-1->0->-1 [9] 1/-1/-1->0->-1 [10] 1/-1/-1->0->-1 [11] 1/-1/-1->0->-1 [12] 1/-1/-1->0->-1 [13] 1/-1/-1->0->-1 [14] 1/-1/-1->0->-1 [15] 1/-1/-1->0->-1 [16] 1/-1/-1->0->-1 [17] 1/-1/-1->0->-1 [18] 1/-1/-1->0->-1 [19] 1/-1/-1->0->-1 [20] 1/-1/-1->0->-1 [21] 1/-1/-1->0->-1 [22] 1/-1/-1->0->-1 [23] 1/-1/-1->0->-1
tyler-a100-newimage-val:10546:11270 [0] NCCL INFO P2P Chunksize set to 524288
tyler-a100-newimage-val:10548:11283 [2] NCCL INFO comm 0x55e4c1493930 rank 2 nRanks 8 nNodes 1 localRanks 8 localRank 2 MNNVL 0
tyler-a100-newimage-val:10547:11290 [1] NCCL INFO comm 0x562f40810030 rank 1 nRanks 8 nNodes 1 localRanks 8 localRank 1 MNNVL 0
tyler-a100-newimage-val:10548:11283 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1 [2] 3/-1/-1->2->1 [3] 3/-1/-1->2->1 [4] 3/-1/-1->2->1 [5] 3/-1/-1->2->1 [6] 3/-1/-1->2->1 [7] 3/-1/-1->2->1 [8] 3/-1/-1->2->1 [9] 3/-1/-1->2->1 [10] 3/-1/-1->2->1 [11] 3/-1/-1->2->1 [12] 3/-1/-1->2->1 [13] 3/-1/-1->2->1 [14] 3/-1/-1->2->1 [15] 3/-1/-1->2->1 [16] 3/-1/-1->2->1 [17] 3/-1/-1->2->1 [18] 3/-1/-1->2->1 [19] 3/-1/-1->2->1 [20] 3/-1/-1->2->1 [21] 3/-1/-1->2->1 [22] 3/-1/-1->2->1 [23] 3/-1/-1->2->1
tyler-a100-newimage-val:10548:11283 [2] NCCL INFO P2P Chunksize set to 524288
tyler-a100-newimage-val:10547:11290 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0 [2] 2/-1/-1->1->0 [3] 2/-1/-1->1->0 [4] 2/-1/-1->1->0 [5] 2/-1/-1->1->0 [6] 2/-1/-1->1->0 [7] 2/-1/-1->1->0 [8] 2/-1/-1->1->0 [9] 2/-1/-1->1->0 [10] 2/-1/-1->1->0 [11] 2/-1/-1->1->0 [12] 2/-1/-1->1->0 [13] 2/-1/-1->1->0 [14] 2/-1/-1->1->0 [15] 2/-1/-1->1->0 [16] 2/-1/-1->1->0 [17] 2/-1/-1->1->0 [18] 2/-1/-1->1->0 [19] 2/-1/-1->1->0 [20] 2/-1/-1->1->0 [21] 2/-1/-1->1->0 [22] 2/-1/-1->1->0 [23] 2/-1/-1->1->0
tyler-a100-newimage-val:10547:11290 [1] NCCL INFO P2P Chunksize set to 524288
tyler-a100-newimage-val:10551:11288 [5] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
tyler-a100-newimage-val:10551:11288 [5] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
tyler-a100-newimage-val:10547:11290 [1] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
tyler-a100-newimage-val:10547:11290 [1] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
tyler-a100-newimage-val:10548:11283 [2] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
tyler-a100-newimage-val:10548:11283 [2] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
tyler-a100-newimage-val:10552:11287 [6] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
tyler-a100-newimage-val:10552:11287 [6] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
tyler-a100-newimage-val:10553:11291 [7] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
tyler-a100-newimage-val:10553:11291 [7] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
tyler-a100-newimage-val:10550:11286 [4] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
tyler-a100-newimage-val:10550:11286 [4] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
tyler-a100-newimage-val:10546:11270 [0] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
tyler-a100-newimage-val:10546:11270 [0] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
tyler-a100-newimage-val:10546:11270 [0] NCCL INFO CC Off, Multi-GPU CC Off, workFifoBytes 1048576
tyler-a100-newimage-val:10549:11289 [3] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
tyler-a100-newimage-val:10549:11289 [3] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
tyler-a100-newimage-val:10547:11290 [1] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so libnccl-net.so. Using internal tuner plugin.
tyler-a100-newimage-val:10547:11290 [1] NCCL INFO ncclCommInitRank comm 0x562f40810030 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 8020 commId 0x8ecf20a94c156f4c - Init COMPLETE
tyler-a100-newimage-val:10548:11283 [2] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so libnccl-net.so. Using internal tuner plugin.
tyler-a100-newimage-val:10547:11290 [1] NCCL INFO Init timings: rank 1 nranks 8 total 0.76 (kernels 0.14, bootstrap 0.27, allgathers 0.00, topo 0.26, graphs 0.00, connections 0.05, rest 0.02)
tyler-a100-newimage-val:10548:11283 [2] NCCL INFO ncclCommInitRank comm 0x55e4c1493930 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId a030 commId 0x8ecf20a94c156f4c - Init COMPLETE
tyler-a100-newimage-val:10548:11283 [2] NCCL INFO Init timings: rank 2 nranks 8 total 0.79 (kernels 0.16, bootstrap 0.29, allgathers 0.00, topo 0.26, graphs 0.00, connections 0.05, rest 0.02)
tyler-a100-newimage-val:10551:11288 [5] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so libnccl-net.so. Using internal tuner plugin.
tyler-a100-newimage-val:10552:11287 [6] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so libnccl-net.so. Using internal tuner plugin.
tyler-a100-newimage-val:10551:11288 [5] NCCL INFO ncclCommInitRank comm 0x55812d458df0 rank 5 nranks 8 cudaDev 5 nvmlDev 5 busId c060 commId 0x8ecf20a94c156f4c - Init COMPLETE
tyler-a100-newimage-val:10552:11287 [6] NCCL INFO ncclCommInitRank comm 0x55e00ec7dc40 rank 6 nranks 8 cudaDev 6 nvmlDev 6 busId e070 commId 0x8ecf20a94c156f4c - Init COMPLETE
tyler-a100-newimage-val:10549:11289 [3] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so libnccl-net.so. Using internal tuner plugin.
tyler-a100-newimage-val:10551:11288 [5] NCCL INFO Init timings: rank 5 nranks 8 total 0.77 (kernels 0.14, bootstrap 0.28, allgathers 0.00, topo 0.26, graphs 0.00, connections 0.05, rest 0.02)
tyler-a100-newimage-val:10552:11287 [6] NCCL INFO Init timings: rank 6 nranks 8 total 0.77 (kernels 0.14, bootstrap 0.28, allgathers 0.00, topo 0.26, graphs 0.00, connections 0.05, rest 0.02)
tyler-a100-newimage-val:10549:11289 [3] NCCL INFO ncclCommInitRank comm 0x56556127bf10 rank 3 nranks 8 cudaDev 3 nvmlDev 3 busId a040 commId 0x8ecf20a94c156f4c - Init COMPLETE
tyler-a100-newimage-val:10549:11289 [3] NCCL INFO Init timings: rank 3 nranks 8 total 0.76 (kernels 0.14, bootstrap 0.28, allgathers 0.00, topo 0.26, graphs 0.00, connections 0.05, rest 0.02)
tyler-a100-newimage-val:10550:11286 [4] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so libnccl-net.so. Using internal tuner plugin.
tyler-a100-newimage-val:10550:11286 [4] NCCL INFO ncclCommInitRank comm 0x55ed47dad9a0 rank 4 nranks 8 cudaDev 4 nvmlDev 4 busId c050 commId 0x8ecf20a94c156f4c - Init COMPLETE
tyler-a100-newimage-val:10550:11286 [4] NCCL INFO Init timings: rank 4 nranks 8 total 0.77 (kernels 0.15, bootstrap 0.28, allgathers 0.00, topo 0.26, graphs 0.00, connections 0.05, rest 0.02)
tyler-a100-newimage-val:10546:11270 [0] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so libnccl-net.so. Using internal tuner plugin.
tyler-a100-newimage-val:10546:11270 [0] NCCL INFO ncclCommInitRank comm 0x55919a812fd0 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId 8010 commId 0x8ecf20a94c156f4c - Init COMPLETE
tyler-a100-newimage-val:10553:11291 [7] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so libnccl-net.so. Using internal tuner plugin.
tyler-a100-newimage-val:10546:11270 [0] NCCL INFO Init timings: rank 0 nranks 8 total 0.91 (kernels 0.18, bootstrap 0.39, allgathers 0.00, topo 0.26, graphs 0.00, connections 0.05, rest 0.02)
tyler-a100-newimage-val:10553:11291 [7] NCCL INFO ncclCommInitRank comm 0x558cdebf44c0 rank 7 nranks 8 cudaDev 7 nvmlDev 7 busId e080 commId 0x8ecf20a94c156f4c - Init COMPLETE
tyler-a100-newimage-val:10553:11291 [7] NCCL INFO Init timings: rank 7 nranks 8 total 0.75 (kernels 0.16, bootstrap 0.25, allgathers 0.00, topo 0.26, graphs 0.00, connections 0.05, rest 0.02)
tyler-a100-newimage-val:10552:11310 [6] NCCL INFO Channel 00/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-a100-newimage-val:10548:11311 [2] NCCL INFO Channel 00/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-a100-newimage-val:10552:11310 [6] NCCL INFO Channel 01/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-a100-newimage-val:10550:11309 [4] NCCL INFO Channel 00/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-a100-newimage-val:10546:11315 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-a100-newimage-val:10548:11311 [2] NCCL INFO Channel 01/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-a100-newimage-val:10553:11314 [7] NCCL INFO Channel 00/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-a100-newimage-val:10552:11310 [6] NCCL INFO Channel 02/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-a100-newimage-val:10550:11309 [4] NCCL INFO Channel 01/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-a100-newimage-val:10546:11315 [0] NCCL INFO Channel 01/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-a100-newimage-val:10548:11311 [2] NCCL INFO Channel 02/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-a100-newimage-val:10553:11314 [7] NCCL INFO Channel 01/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-a100-newimage-val:10552:11310 [6] NCCL INFO Channel 03/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-a100-newimage-val:10550:11309 [4] NCCL INFO Channel 02/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-a100-newimage-val:10546:11315 [0] NCCL INFO Channel 02/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-a100-newimage-val:10548:11311 [2] NCCL INFO Channel 03/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-a100-newimage-val:10553:11314 [7] NCCL INFO Channel 02/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-a100-newimage-val:10552:11310 [6] NCCL INFO Channel 04/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-a100-newimage-val:10549:11313 [3] NCCL INFO Channel 00/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-a100-newimage-val:10550:11309 [4] NCCL INFO Channel 03/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-a100-newimage-val:10546:11315 [0] NCCL INFO Channel 03/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-a100-newimage-val:10548:11311 [2] NCCL INFO Channel 04/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-a100-newimage-val:10553:11314 [7] NCCL INFO Channel 03/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-a100-newimage-val:10552:11310 [6] NCCL INFO Channel 05/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-a100-newimage-val:10549:11313 [3] NCCL INFO Channel 01/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-a100-newimage-val:10550:11309 [4] NCCL INFO Channel 04/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-a100-newimage-val:10546:11315 [0] NCCL INFO Channel 04/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-a100-newimage-val:10548:11311 [2] NCCL INFO Channel 05/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-a100-newimage-val:10553:11314 [7] NCCL INFO Channel 04/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-a100-newimage-val:10552:11310 [6] NCCL INFO Channel 06/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-a100-newimage-val:10549:11313 [3] NCCL INFO Channel 02/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-a100-newimage-val:10548:11311 [2] NCCL INFO Channel 06/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-a100-newimage-val:10550:11309 [4] NCCL INFO Channel 05/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-a100-newimage-val:10546:11315 [0] NCCL INFO Channel 05/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-a100-newimage-val:10553:11314 [7] NCCL INFO Channel 05/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-a100-newimage-val:10552:11310 [6] NCCL INFO Channel 07/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-a100-newimage-val:10549:11313 [3] NCCL INFO Channel 03/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-a100-newimage-val:10548:11311 [2] NCCL INFO Channel 07/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-a100-newimage-val:10550:11309 [4] NCCL INFO Channel 06/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-a100-newimage-val:10546:11315 [0] NCCL INFO Channel 06/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-a100-newimage-val:10553:11314 [7] NCCL INFO Channel 06/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-a100-newimage-val:10552:11310 [6] NCCL INFO Channel 08/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-a100-newimage-val:10549:11313 [3] NCCL INFO Channel 04/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-a100-newimage-val:10548:11311 [2] NCCL INFO Channel 08/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-a100-newimage-val:10550:11309 [4] NCCL INFO Channel 07/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-a100-newimage-val:10546:11315 [0] NCCL INFO Channel 07/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-a100-newimage-val:10553:11314 [7] NCCL INFO Channel 07/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-a100-newimage-val:10551:11308 [5] NCCL INFO Channel 00/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10552:11310 [6] NCCL INFO Channel 09/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10549:11313 [3] NCCL INFO Channel 05/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11312 [1] NCCL INFO Channel 00/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10548:11311 [2] NCCL INFO Channel 09/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10550:11309 [4] NCCL INFO Channel 08/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10546:11315 [0] NCCL INFO Channel 08/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10553:11314 [7] NCCL INFO Channel 08/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11308 [5] NCCL INFO Channel 01/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10552:11310 [6] NCCL INFO Channel 10/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10549:11313 [3] NCCL INFO Channel 06/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11312 [1] NCCL INFO Channel 01/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10548:11311 [2] NCCL INFO Channel 10/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10550:11309 [4] NCCL INFO Channel 09/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10546:11315 [0] NCCL INFO Channel 09/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10553:11314 [7] NCCL INFO Channel 09/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11308 [5] NCCL INFO Channel 02/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10552:11310 [6] NCCL INFO Channel 11/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10549:11313 [3] NCCL INFO Channel 07/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11312 [1] NCCL INFO Channel 02/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10548:11311 [2] NCCL INFO Channel 11/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10550:11309 [4] NCCL INFO Channel 10/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10546:11315 [0] NCCL INFO Channel 10/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10553:11314 [7] NCCL INFO Channel 10/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10552:11310 [6] NCCL INFO Channel 12/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11308 [5] NCCL INFO Channel 03/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10549:11313 [3] NCCL INFO Channel 08/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11312 [1] NCCL INFO Channel 03/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10548:11311 [2] NCCL INFO Channel 12/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10550:11309 [4] NCCL INFO Channel 11/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10546:11315 [0] NCCL INFO Channel 11/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10553:11314 [7] NCCL INFO Channel 11/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10552:11310 [6] NCCL INFO Channel 13/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11308 [5] NCCL INFO Channel 04/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10549:11313 [3] NCCL INFO Channel 09/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11312 [1] NCCL INFO Channel 04/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10548:11311 [2] NCCL INFO Channel 13/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10553:11314 [7] NCCL INFO Channel 12/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10550:11309 [4] NCCL INFO Channel 12/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10546:11315 [0] NCCL INFO Channel 12/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10552:11310 [6] NCCL INFO Channel 14/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10549:11313 [3] NCCL INFO Channel 10/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11308 [5] NCCL INFO Channel 05/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11312 [1] NCCL INFO Channel 05/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10548:11311 [2] NCCL INFO Channel 14/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10553:11314 [7] NCCL INFO Channel 13/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10550:11309 [4] NCCL INFO Channel 13/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10546:11315 [0] NCCL INFO Channel 13/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10552:11310 [6] NCCL INFO Channel 15/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10549:11313 [3] NCCL INFO Channel 11/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11308 [5] NCCL INFO Channel 06/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11312 [1] NCCL INFO Channel 06/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10548:11311 [2] NCCL INFO Channel 15/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10553:11314 [7] NCCL INFO Channel 14/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10550:11309 [4] NCCL INFO Channel 14/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10546:11315 [0] NCCL INFO Channel 14/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10552:11310 [6] NCCL INFO Channel 16/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10549:11313 [3] NCCL INFO Channel 12/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11308 [5] NCCL INFO Channel 07/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11312 [1] NCCL INFO Channel 07/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10548:11311 [2] NCCL INFO Channel 16/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10553:11314 [7] NCCL INFO Channel 15/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10550:11309 [4] NCCL INFO Channel 15/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10546:11315 [0] NCCL INFO Channel 15/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10549:11313 [3] NCCL INFO Channel 13/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10552:11310 [6] NCCL INFO Channel 17/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11308 [5] NCCL INFO Channel 08/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11312 [1] NCCL INFO Channel 08/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10548:11311 [2] NCCL INFO Channel 17/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10553:11314 [7] NCCL INFO Channel 16/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10550:11309 [4] NCCL INFO Channel 16/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10546:11315 [0] NCCL INFO Channel 16/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10549:11313 [3] NCCL INFO Channel 14/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10552:11310 [6] NCCL INFO Channel 18/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11308 [5] NCCL INFO Channel 09/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11312 [1] NCCL INFO Channel 09/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10548:11311 [2] NCCL INFO Channel 18/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10553:11314 [7] NCCL INFO Channel 17/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10552:11310 [6] NCCL INFO Channel 19/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10546:11315 [0] NCCL INFO Channel 17/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10550:11309 [4] NCCL INFO Channel 17/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10549:11313 [3] NCCL INFO Channel 15/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11312 [1] NCCL INFO Channel 10/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11308 [5] NCCL INFO Channel 10/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10548:11311 [2] NCCL INFO Channel 19/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10553:11314 [7] NCCL INFO Channel 18/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10552:11310 [6] NCCL INFO Channel 20/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10546:11315 [0] NCCL INFO Channel 18/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10550:11309 [4] NCCL INFO Channel 18/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10549:11313 [3] NCCL INFO Channel 16/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11312 [1] NCCL INFO Channel 11/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10548:11311 [2] NCCL INFO Channel 20/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11308 [5] NCCL INFO Channel 11/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10553:11314 [7] NCCL INFO Channel 19/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10552:11310 [6] NCCL INFO Channel 21/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10546:11315 [0] NCCL INFO Channel 19/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10549:11313 [3] NCCL INFO Channel 17/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10550:11309 [4] NCCL INFO Channel 19/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11312 [1] NCCL INFO Channel 12/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10548:11311 [2] NCCL INFO Channel 21/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11308 [5] NCCL INFO Channel 12/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10553:11314 [7] NCCL INFO Channel 20/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10552:11310 [6] NCCL INFO Channel 22/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10546:11315 [0] NCCL INFO Channel 20/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10549:11313 [3] NCCL INFO Channel 18/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11312 [1] NCCL INFO Channel 13/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10550:11309 [4] NCCL INFO Channel 20/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10548:11311 [2] NCCL INFO Channel 22/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11308 [5] NCCL INFO Channel 13/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10553:11314 [7] NCCL INFO Channel 21/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10552:11310 [6] NCCL INFO Channel 23/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10549:11313 [3] NCCL INFO Channel 19/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11312 [1] NCCL INFO Channel 14/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10546:11315 [0] NCCL INFO Channel 21/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10550:11309 [4] NCCL INFO Channel 21/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10548:11311 [2] NCCL INFO Channel 23/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11308 [5] NCCL INFO Channel 14/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10553:11314 [7] NCCL INFO Channel 22/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10549:11313 [3] NCCL INFO Channel 20/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11312 [1] NCCL INFO Channel 15/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10546:11315 [0] NCCL INFO Channel 22/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10550:11309 [4] NCCL INFO Channel 22/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11308 [5] NCCL INFO Channel 15/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10553:11314 [7] NCCL INFO Channel 23/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10549:11313 [3] NCCL INFO Channel 21/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11312 [1] NCCL INFO Channel 16/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10546:11315 [0] NCCL INFO Channel 23/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11308 [5] NCCL INFO Channel 16/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10550:11309 [4] NCCL INFO Channel 23/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10549:11313 [3] NCCL INFO Channel 22/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11312 [1] NCCL INFO Channel 17/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11308 [5] NCCL INFO Channel 17/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10549:11313 [3] NCCL INFO Channel 23/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11312 [1] NCCL INFO Channel 18/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11312 [1] NCCL INFO Channel 19/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11308 [5] NCCL INFO Channel 18/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11312 [1] NCCL INFO Channel 20/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11308 [5] NCCL INFO Channel 19/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11312 [1] NCCL INFO Channel 21/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11312 [1] NCCL INFO Channel 22/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11308 [5] NCCL INFO Channel 20/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11312 [1] NCCL INFO Channel 23/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11308 [5] NCCL INFO Channel 21/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11308 [5] NCCL INFO Channel 22/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11308 [5] NCCL INFO Channel 23/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10549:11313 [3] NCCL INFO Connected all rings
tyler-a100-newimage-val:10548:11311 [2] NCCL INFO Connected all rings
tyler-a100-newimage-val:10547:11312 [1] NCCL INFO Connected all rings
tyler-a100-newimage-val:10546:11315 [0] NCCL INFO Connected all rings
tyler-a100-newimage-val:10553:11314 [7] NCCL INFO Connected all rings
tyler-a100-newimage-val:10552:11310 [6] NCCL INFO Connected all rings
tyler-a100-newimage-val:10551:11308 [5] NCCL INFO Connected all rings
tyler-a100-newimage-val:10550:11309 [4] NCCL INFO Connected all rings
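The `Channel NN/0 : a[a] -> b[b] via P2P/CUMEM/read` lines record NCCL connecting each rank to its ring successor over the P2P transport, one line per channel, with rank 7 wrapping back to rank 0; `Connected all rings` follows once every rank is done. Below is a minimal sketch for auditing that pattern in a saved `nohup.out`, assuming the line format shown in this log, ring order 0 through 7, and 24 channels (00 to 23, the highest channel number seen here). The regex and the helpers `ring_channels` and `check_ring` are hypothetical, not part of NCCL.

```python
import re
from collections import defaultdict

# Assumed line shape, copied from this log:
#   host:pid:tid [gpu] NCCL INFO Channel 05/0 : 3[3] -> 4[4] via P2P/CUMEM/read
CHANNEL_RE = re.compile(
    r"NCCL INFO Channel (\d+)/\d+ : (\d+)\[\d+\] -> (\d+)\[\d+\] via (\S+)"
)


def ring_channels(lines):
    """Map (src_rank, dst_rank) -> set of channel ids seen in the log."""
    links = defaultdict(set)
    for line in lines:
        m = CHANNEL_RE.search(line)
        if m:
            channel, src, dst, _transport = m.groups()
            links[(int(src), int(dst))].add(int(channel))
    return links


def check_ring(links, nranks, nchannels):
    """List ring hops missing channels; an empty list means every hop is complete."""
    problems = []
    for src in range(nranks):
        dst = (src + 1) % nranks  # ring successor; the last rank wraps to rank 0
        missing = set(range(nchannels)) - links.get((src, dst), set())
        if missing:
            problems.append(f"{src} -> {dst} missing channels {sorted(missing)}")
    return problems
```

Passing every line of the log, for example `check_ring(ring_channels(open("nohup.out")), 8, 24)`, should return an empty list when no hop is missing a channel. Lines such as `Channel 00/24 : 0 1 2 3 4 5 6 7` do not match the regex, so the per-communicator topology summaries are ignored; channel lines from several communicators in one log are pooled together.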
Generating train split: 9981 examples [00:01, 7916.84 examples/s]
Data length calculation: 100%|██████████| 9981/9981 [00:05<00:00, 1972.79it/s]
Data length calculation: 100%|██████████| 9981/9981 [00:05<00:00, 1957.24it/s]
Data length calculation: 100%|██████████| 9981/9981 [00:05<00:00, 1893.76it/s]
Data length calculation: 100%|██████████| 9981/9981 [00:05<00:00, 1968.18it/s]
Data length calculation: 100%|██████████| 9981/9981 [00:05<00:00, 1937.07it/s]
Data length calculation: 100%|██████████| 9981/9981 [00:05<00:00, 1968.84it/s]
Data length calculation: 100%|██████████| 9981/9981 [00:05<00:00, 1891.13it/s]
Data length calculation: 100%|██████████| 9981/9981 [00:05<00:00, 1843.73it/s]
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
Using /var/mnt/inststg1/instructlab/.cache/torch_extensions/py311_cu124 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /var/mnt/inststg1/instructlab/.cache/torch_extensions/py311_cu124/fused_adam/build.ninja...
/opt/app-root/lib64/python3.11/site-packages/torch/utils/cpp_extension.py:1967: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
warnings.warn(
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
ninja: no work to do.
Loading extension module fused_adam...
Time to load fused_adam op: 0.1128382682800293 seconds
Using /var/mnt/inststg1/instructlab/.cache/torch_extensions/py311_cu124 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /var/mnt/inststg1/instructlab/.cache/torch_extensions/py311_cu124/fused_adam/build.ninja...
/opt/app-root/lib64/python3.11/site-packages/torch/utils/cpp_extension.py:1967: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
warnings.warn(
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_adam...
Time to load fused_adam op: 0.11071896553039551 seconds
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
Using /var/mnt/inststg1/instructlab/.cache/torch_extensions/py311_cu124 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /var/mnt/inststg1/instructlab/.cache/torch_extensions/py311_cu124/fused_adam/build.ninja...
/opt/app-root/lib64/python3.11/site-packages/torch/utils/cpp_extension.py:1967: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
warnings.warn(
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_adam...
Time to load fused_adam op: 0.11228275299072266 seconds
{
"num_gpus": 8,
"avg_sample_len": 608.8641418695521,
"effective_batch_size": 3840,
"max_batch_len_per_gpu": 10000,
"packing_max_batch_len": 8118,
"grad_accum": 36,
"num_batches": 121,
"avg_samples_per_batch": 82.48760330578513,
"samples_per_gpu": 13,
"timestamp": "2024-08-18T20:47:39.867974"
}
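The packing summary above belongs to the skills phase: its `effective_batch_size` of 3840 matches `--phased-phase2-effective-batch-size 3840` in the launch command, and its `max_batch_len_per_gpu` of 10000 matches `--max-batch-len 10000`. Several of its fields fit simple relations that can be checked by hand. The sketch below recomputes them; the relations are inferred from the printed numbers, not taken from the `instructlab.training` source, `derived` is a hypothetical helper, and the example count 9981 comes from the `Generating train split` line.

```python
import math

# Values copied from the packing summary printed in this log (skills phase).
SUMMARY = {
    "num_gpus": 8,
    "avg_sample_len": 608.8641418695521,
    "effective_batch_size": 3840,
    "max_batch_len_per_gpu": 10000,
    "packing_max_batch_len": 8118,
    "grad_accum": 36,
    "num_batches": 121,
    "avg_samples_per_batch": 82.48760330578513,
    "samples_per_gpu": 13,
}
NUM_EXAMPLES = 9981  # "Generating train split: 9981 examples"


def derived(s: dict, num_examples: int) -> dict:
    """Recompute three summary fields from the others (inferred relations, not library code)."""
    # Samples per GPU per micro-step: 3840 / 8 / 36 = 13.33...
    per_gpu = s["effective_batch_size"] / s["num_gpus"] / s["grad_accum"]
    return {
        "samples_per_gpu": int(per_gpu),
        # Token budget for one packed micro-batch: average sample length x samples per GPU.
        "packing_max_batch_len": int(s["avg_sample_len"] * per_gpu),
        "avg_samples_per_batch": num_examples / s["num_batches"],
    }


for key, value in derived(SUMMARY, NUM_EXAMPLES).items():
    print(f"{key}: derived {value}, logged {SUMMARY[key]}")
```

Each derived value agrees with the logged one (13, 8118, and 9981 / 121 ≈ 82.49), and the 8118-token packing budget stays under the 10000-token per-GPU cap.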
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
Using /var/mnt/inststg1/instructlab/.cache/torch_extensions/py311_cu124 as PyTorch extensions root...
Using /var/mnt/inststg1/instructlab/.cache/torch_extensions/py311_cu124 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /var/mnt/inststg1/instructlab/.cache/torch_extensions/py311_cu124/fused_adam/build.ninja...
/opt/app-root/lib64/python3.11/site-packages/torch/utils/cpp_extension.py:1967: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
warnings.warn(
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_adam...
Time to load fused_adam op: 0.1144556999206543 seconds
[2024-08-18 20:47:40,239] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.14.4, git-hash=unknown, git-branch=unknown
[2024-08-18 20:47:40,239] [INFO] [comm.py:662:init_distributed] Distributed backend already initialized
Loading extension module fused_adam...
Time to load fused_adam op: 0.10187482833862305 seconds
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
Using /var/mnt/inststg1/instructlab/.cache/torch_extensions/py311_cu124 as PyTorch extensions root...
Using /var/mnt/inststg1/instructlab/.cache/torch_extensions/py311_cu124 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /var/mnt/inststg1/instructlab/.cache/torch_extensions/py311_cu124/fused_adam/build.ninja...
/opt/app-root/lib64/python3.11/site-packages/torch/utils/cpp_extension.py:1967: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
warnings.warn(
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
Using /var/mnt/inststg1/instructlab/.cache/torch_extensions/py311_cu124 as PyTorch extensions root...
ninja: no work to do.
Loading extension module fused_adam...
Time to load fused_adam op: 0.11427879333496094 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 0.10196876525878906 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 0.10352158546447754 seconds
tyler-a100-newimage-val:10551:11410 [5] NCCL INFO Using network Socket
tyler-a100-newimage-val:10550:11411 [4] NCCL INFO Using network Socket
tyler-a100-newimage-val:10546:11406 [0] NCCL INFO Using network Socket
tyler-a100-newimage-val:10553:11409 [7] NCCL INFO Using network Socket
tyler-a100-newimage-val:10549:11415 [3] NCCL INFO Using network Socket
tyler-a100-newimage-val:10547:11408 [1] NCCL INFO Using network Socket
tyler-a100-newimage-val:10552:11412 [6] NCCL INFO Using network Socket
tyler-a100-newimage-val:10548:11407 [2] NCCL INFO Using network Socket
tyler-a100-newimage-val:10549:11415 [3] NCCL INFO bootstrapSplit: comm 0x565562e66530 parent 0x56556127bf10 rank 3 nranks 8 color -934961569 key 3 prev 2 next 4 - DONE
tyler-a100-newimage-val:10548:11407 [2] NCCL INFO bootstrapSplit: comm 0x55e4c30806d0 parent 0x55e4c1493930 rank 2 nranks 8 color -934961569 key 2 prev 1 next 3 - DONE
tyler-a100-newimage-val:10550:11411 [4] NCCL INFO bootstrapSplit: comm 0x55ed499a84f0 parent 0x55ed47dad9a0 rank 4 nranks 8 color -934961569 key 4 prev 3 next 5 - DONE
tyler-a100-newimage-val:10547:11408 [1] NCCL INFO bootstrapSplit: comm 0x562f423f9400 parent 0x562f40810030 rank 1 nranks 8 color -934961569 key 1 prev 0 next 2 - DONE
tyler-a100-newimage-val:10553:11409 [7] NCCL INFO bootstrapSplit: comm 0x558ce0813360 parent 0x558cdebf44c0 rank 7 nranks 8 color -934961569 key 7 prev 6 next 0 - DONE
tyler-a100-newimage-val:10546:11406 [0] NCCL INFO bootstrapSplit: comm 0x55919c3fd580 parent 0x55919a812fd0 rank 0 nranks 8 color -934961569 key 0 prev 7 next 1 - DONE
tyler-a100-newimage-val:10551:11410 [5] NCCL INFO bootstrapSplit: comm 0x55812f05a9a0 parent 0x55812d458df0 rank 5 nranks 8 color -934961569 key 5 prev 4 next 6 - DONE
tyler-a100-newimage-val:10548:11407 [2] NCCL INFO ncclCommSplit comm 0x55e4c30806d0 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId a030 parent 0x55e4c1493930 color -934961569 key 2 commId 0xc6ecd14a22a5889f - Init START
tyler-a100-newimage-val:10550:11411 [4] NCCL INFO ncclCommSplit comm 0x55ed499a84f0 rank 4 nranks 8 cudaDev 4 nvmlDev 4 busId c050 parent 0x55ed47dad9a0 color -934961569 key 4 commId 0xc6ecd14a22a5889f - Init START
tyler-a100-newimage-val:10552:11412 [6] NCCL INFO bootstrapSplit: comm 0x55e0108836a0 parent 0x55e00ec7dc40 rank 6 nranks 8 color -934961569 key 6 prev 5 next 7 - DONE
tyler-a100-newimage-val:10553:11409 [7] NCCL INFO ncclCommSplit comm 0x558ce0813360 rank 7 nranks 8 cudaDev 7 nvmlDev 7 busId e080 parent 0x558cdebf44c0 color -934961569 key 7 commId 0xc6ecd14a22a5889f - Init START
tyler-a100-newimage-val:10546:11406 [0] NCCL INFO ncclCommSplit comm 0x55919c3fd580 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId 8010 parent 0x55919a812fd0 color -934961569 key 0 commId 0xc6ecd14a22a5889f - Init START
tyler-a100-newimage-val:10547:11408 [1] NCCL INFO ncclCommSplit comm 0x562f423f9400 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 8020 parent 0x562f40810030 color -934961569 key 1 commId 0xc6ecd14a22a5889f - Init START
tyler-a100-newimage-val:10551:11410 [5] NCCL INFO ncclCommSplit comm 0x55812f05a9a0 rank 5 nranks 8 cudaDev 5 nvmlDev 5 busId c060 parent 0x55812d458df0 color -934961569 key 5 commId 0xc6ecd14a22a5889f - Init START
tyler-a100-newimage-val:10549:11415 [3] NCCL INFO ncclCommSplit comm 0x565562e66530 rank 3 nranks 8 cudaDev 3 nvmlDev 3 busId a040 parent 0x56556127bf10 color -934961569 key 3 commId 0xc6ecd14a22a5889f - Init START
tyler-a100-newimage-val:10552:11412 [6] NCCL INFO ncclCommSplit comm 0x55e0108836a0 rank 6 nranks 8 cudaDev 6 nvmlDev 6 busId e070 parent 0x55e00ec7dc40 color -934961569 key 6 commId 0xc6ecd14a22a5889f - Init START
tyler-a100-newimage-val:10549:11415 [3] NCCL INFO Setting affinity for GPU 3 to ff,ffffffff
tyler-a100-newimage-val:10549:11415 [3] NCCL INFO NVLS multicast support is not available on dev 3
tyler-a100-newimage-val:10548:11407 [2] NCCL INFO Setting affinity for GPU 2 to ff,ffffffff
tyler-a100-newimage-val:10548:11407 [2] NCCL INFO NVLS multicast support is not available on dev 2
tyler-a100-newimage-val:10547:11408 [1] NCCL INFO Setting affinity for GPU 1 to ff,ffffffff
tyler-a100-newimage-val:10547:11408 [1] NCCL INFO NVLS multicast support is not available on dev 1
tyler-a100-newimage-val:10546:11406 [0] NCCL INFO Setting affinity for GPU 0 to ff,ffffffff
tyler-a100-newimage-val:10546:11406 [0] NCCL INFO NVLS multicast support is not available on dev 0
tyler-a100-newimage-val:10550:11411 [4] NCCL INFO Setting affinity for GPU 4 to ffff,ffffff00,00000000
tyler-a100-newimage-val:10550:11411 [4] NCCL INFO NVLS multicast support is not available on dev 4
tyler-a100-newimage-val:10552:11412 [6] NCCL INFO Setting affinity for GPU 6 to ffff,ffffff00,00000000
tyler-a100-newimage-val:10552:11412 [6] NCCL INFO NVLS multicast support is not available on dev 6
tyler-a100-newimage-val:10553:11409 [7] NCCL INFO Setting affinity for GPU 7 to ffff,ffffff00,00000000
tyler-a100-newimage-val:10553:11409 [7] NCCL INFO NVLS multicast support is not available on dev 7
tyler-a100-newimage-val:10551:11410 [5] NCCL INFO Setting affinity for GPU 5 to ffff,ffffff00,00000000
tyler-a100-newimage-val:10551:11410 [5] NCCL INFO NVLS multicast support is not available on dev 5
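The `Setting affinity for GPU N to ...` lines give each rank's CPU affinity as a hex bitmask that appears to follow the Linux cpumask convention: comma-separated 32-bit hex words, most significant word first. A minimal sketch for decoding these masks, assuming that convention; `cpumask_to_cpus` and `as_ranges` are hypothetical helpers, not part of NCCL.

```python
def cpumask_to_cpus(mask: str) -> list[int]:
    """Decode a comma-separated cpumask (32-bit hex words, most significant first)."""
    value = 0
    for word in mask.split(","):
        value = (value << 32) | int(word, 16)  # append the next, less significant word
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


def as_ranges(cpus: list[int]) -> str:
    """Compress a sorted CPU list into 'a-b' ranges, e.g. [0, 1, 2, 5] -> '0-2,5'."""
    parts, start = [], None
    for i, cpu in enumerate(cpus):
        if start is None:
            start = cpu
        if i + 1 == len(cpus) or cpus[i + 1] != cpu + 1:
            parts.append(f"{start}-{cpu}" if cpu != start else str(start))
            start = None
    return ",".join(parts)


# The two distinct masks printed above.
for gpu, mask in [(0, "ff,ffffffff"), (4, "ffff,ffffff00,00000000")]:
    print(f"GPU {gpu}: CPUs {as_ranges(cpumask_to_cpus(mask))}")
```

Under that reading, GPUs 0-3 are pinned to CPUs 0-39 and GPUs 4-7 to CPUs 40-79, which is consistent with two CPU sockets or NUMA nodes, though the log alone does not confirm the host topology.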
tyler-a100-newimage-val:10548:11407 [2] NCCL INFO comm 0x55e4c30806d0 rank 2 nRanks 8 nNodes 1 localRanks 8 localRank 2 MNNVL 0
tyler-a100-newimage-val:10547:11408 [1] NCCL INFO comm 0x562f423f9400 rank 1 nRanks 8 nNodes 1 localRanks 8 localRank 1 MNNVL 0
tyler-a100-newimage-val:10553:11409 [7] NCCL INFO comm 0x558ce0813360 rank 7 nRanks 8 nNodes 1 localRanks 8 localRank 7 MNNVL 0
tyler-a100-newimage-val:10546:11406 [0] NCCL INFO comm 0x55919c3fd580 rank 0 nRanks 8 nNodes 1 localRanks 8 localRank 0 MNNVL 0
tyler-a100-newimage-val:10552:11412 [6] NCCL INFO comm 0x55e0108836a0 rank 6 nRanks 8 nNodes 1 localRanks 8 localRank 6 MNNVL 0
tyler-a100-newimage-val:10551:11410 [5] NCCL INFO comm 0x55812f05a9a0 rank 5 nRanks 8 nNodes 1 localRanks 8 localRank 5 MNNVL 0
tyler-a100-newimage-val:10546:11406 [0] NCCL INFO Channel 00/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:10546:11406 [0] NCCL INFO Channel 01/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:10550:11411 [4] NCCL INFO comm 0x55ed499a84f0 rank 4 nRanks 8 nNodes 1 localRanks 8 localRank 4 MNNVL 0
tyler-a100-newimage-val:10549:11415 [3] NCCL INFO comm 0x565562e66530 rank 3 nRanks 8 nNodes 1 localRanks 8 localRank 3 MNNVL 0
tyler-a100-newimage-val:10546:11406 [0] NCCL INFO Channel 02/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:10548:11407 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1 [2] 3/-1/-1->2->1 [3] 3/-1/-1->2->1 [4] 3/-1/-1->2->1 [5] 3/-1/-1->2->1 [6] 3/-1/-1->2->1 [7] 3/-1/-1->2->1 [8] 3/-1/-1->2->1 [9] 3/-1/-1->2->1 [10] 3/-1/-1->2->1 [11] 3/-1/-1->2->1 [12] 3/-1/-1->2->1 [13] 3/-1/-1->2->1 [14] 3/-1/-1->2->1 [15] 3/-1/-1->2->1 [16] 3/-1/-1->2->1 [17] 3/-1/-1->2->1 [18] 3/-1/-1->2->1 [19] 3/-1/-1->2->1 [20] 3/-1/-1->2->1 [21] 3/-1/-1->2->1 [22] 3/-1/-1->2->1 [23] 3/-1/-1->2->1
tyler-a100-newimage-val:10546:11406 [0] NCCL INFO Channel 03/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:10548:11407 [2] NCCL INFO P2P Chunksize set to 524288
tyler-a100-newimage-val:10546:11406 [0] NCCL INFO Channel 04/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:10547:11408 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0 [2] 2/-1/-1->1->0 [3] 2/-1/-1->1->0 [4] 2/-1/-1->1->0 [5] 2/-1/-1->1->0 [6] 2/-1/-1->1->0 [7] 2/-1/-1->1->0 [8] 2/-1/-1->1->0 [9] 2/-1/-1->1->0 [10] 2/-1/-1->1->0 [11] 2/-1/-1->1->0 [12] 2/-1/-1->1->0 [13] 2/-1/-1->1->0 [14] 2/-1/-1->1->0 [15] 2/-1/-1->1->0 [16] 2/-1/-1->1->0 [17] 2/-1/-1->1->0 [18] 2/-1/-1->1->0 [19] 2/-1/-1->1->0 [20] 2/-1/-1->1->0 [21] 2/-1/-1->1->0 [22] 2/-1/-1->1->0 [23] 2/-1/-1->1->0
tyler-a100-newimage-val:10546:11406 [0] NCCL INFO Channel 05/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:10551:11410 [5] NCCL INFO Trees [0] 6/-1/-1->5->4 [1] 6/-1/-1->5->4 [2] 6/-1/-1->5->4 [3] 6/-1/-1->5->4 [4] 6/-1/-1->5->4 [5] 6/-1/-1->5->4 [6] 6/-1/-1->5->4 [7] 6/-1/-1->5->4 [8] 6/-1/-1->5->4 [9] 6/-1/-1->5->4 [10] 6/-1/-1->5->4 [11] 6/-1/-1->5->4 [12] 6/-1/-1->5->4 [13] 6/-1/-1->5->4 [14] 6/-1/-1->5->4 [15] 6/-1/-1->5->4 [16] 6/-1/-1->5->4 [17] 6/-1/-1->5->4 [18] 6/-1/-1->5->4 [19] 6/-1/-1->5->4 [20] 6/-1/-1->5->4 [21] 6/-1/-1->5->4 [22] 6/-1/-1->5->4 [23] 6/-1/-1->5->4
tyler-a100-newimage-val:10547:11408 [1] NCCL INFO P2P Chunksize set to 524288
tyler-a100-newimage-val:10546:11406 [0] NCCL INFO Channel 06/24 : 0 1 2 3 4 5 6 7
tyler-a100-newimage-val:10551:11410 [5] NCCL INFO P2P Chunksize set to 524288
tyler-a100-newimage-val:10553:11409 [7] NCCL INFO Trees [0] -1/-1/-1->7->6 [1] -1/-1/-1->7->6 [2] -1/-1/-1->7->6 [3] -1/-1/-1->7->6 [4] -1/-1/-1->7->6 [5] -1/-1/-1->7->6 [6] -1/-1/-1->7->6 [7] -1/-1/-1->7->6 [8] -1/-1/-1->7->6 [9] -1/-1/-1->7->6 [10] -1/-1/-1->7->6 [11] -1/-1/-1->7->6 [12] -1/-1/-1->7->6 [13] -1/-1/-1->7->6 [14] -1/-1/-1->7->6 [15] -1/-1/-1->7->6 [16] -1/-1/-1->7->6 [17] -1/-1/-1->7->6 [18] -1/-1/-1->7->6 [19] -1/-1/-1->7->6 [20] -1/-1/-1->7->6 [21] -1/-1/-1->7->6 [22] -1/-1/-1->7->6 [23] -1/-1/-1->7->6
tyler-a100-newimage-val:10546:11406 [0] NCCL INFO Channel 07/24 : 0 1 2 3 4 5 6 7 | |
tyler-a100-newimage-val:10552:11412 [6] NCCL INFO Trees [0] 7/-1/-1->6->5 [1] 7/-1/-1->6->5 [2] 7/-1/-1->6->5 [3] 7/-1/-1->6->5 [4] 7/-1/-1->6->5 [5] 7/-1/-1->6->5 [6] 7/-1/-1->6->5 [7] 7/-1/-1->6->5 [8] 7/-1/-1->6->5 [9] 7/-1/-1->6->5 [10] 7/-1/-1->6->5 [11] 7/-1/-1->6->5 [12] 7/-1/-1->6->5 [13] 7/-1/-1->6->5 [14] 7/-1/-1->6->5 [15] 7/-1/-1->6->5 [16] 7/-1/-1->6->5 [17] 7/-1/-1->6->5 [18] 7/-1/-1->6->5 [19] 7/-1/-1->6->5 [20] 7/-1/-1->6->5 [21] 7/-1/-1->6->5 [22] 7/-1/-1->6->5 [23] 7/-1/-1->6->5 | |
tyler-a100-newimage-val:10553:11409 [7] NCCL INFO P2P Chunksize set to 524288 | |
tyler-a100-newimage-val:10552:11412 [6] NCCL INFO P2P Chunksize set to 524288 | |
tyler-a100-newimage-val:10546:11406 [0] NCCL INFO Channel 08/24 : 0 1 2 3 4 5 6 7 | |
tyler-a100-newimage-val:10546:11406 [0] NCCL INFO Channel 09/24 : 0 1 2 3 4 5 6 7 | |
tyler-a100-newimage-val:10546:11406 [0] NCCL INFO Channel 10/24 : 0 1 2 3 4 5 6 7 | |
tyler-a100-newimage-val:10546:11406 [0] NCCL INFO Channel 11/24 : 0 1 2 3 4 5 6 7 | |
tyler-a100-newimage-val:10546:11406 [0] NCCL INFO Channel 12/24 : 0 1 2 3 4 5 6 7 | |
tyler-a100-newimage-val:10546:11406 [0] NCCL INFO Channel 13/24 : 0 1 2 3 4 5 6 7 | |
tyler-a100-newimage-val:10546:11406 [0] NCCL INFO Channel 14/24 : 0 1 2 3 4 5 6 7 | |
tyler-a100-newimage-val:10546:11406 [0] NCCL INFO Channel 15/24 : 0 1 2 3 4 5 6 7 | |
tyler-a100-newimage-val:10550:11411 [4] NCCL INFO Trees [0] 5/-1/-1->4->3 [1] 5/-1/-1->4->3 [2] 5/-1/-1->4->3 [3] 5/-1/-1->4->3 [4] 5/-1/-1->4->3 [5] 5/-1/-1->4->3 [6] 5/-1/-1->4->3 [7] 5/-1/-1->4->3 [8] 5/-1/-1->4->3 [9] 5/-1/-1->4->3 [10] 5/-1/-1->4->3 [11] 5/-1/-1->4->3 [12] 5/-1/-1->4->3 [13] 5/-1/-1->4->3 [14] 5/-1/-1->4->3 [15] 5/-1/-1->4->3 [16] 5/-1/-1->4->3 [17] 5/-1/-1->4->3 [18] 5/-1/-1->4->3 [19] 5/-1/-1->4->3 [20] 5/-1/-1->4->3 [21] 5/-1/-1->4->3 [22] 5/-1/-1->4->3 [23] 5/-1/-1->4->3 | |
tyler-a100-newimage-val:10546:11406 [0] NCCL INFO Channel 16/24 : 0 1 2 3 4 5 6 7 | |
tyler-a100-newimage-val:10549:11415 [3] NCCL INFO Trees [0] 4/-1/-1->3->2 [1] 4/-1/-1->3->2 [2] 4/-1/-1->3->2 [3] 4/-1/-1->3->2 [4] 4/-1/-1->3->2 [5] 4/-1/-1->3->2 [6] 4/-1/-1->3->2 [7] 4/-1/-1->3->2 [8] 4/-1/-1->3->2 [9] 4/-1/-1->3->2 [10] 4/-1/-1->3->2 [11] 4/-1/-1->3->2 [12] 4/-1/-1->3->2 [13] 4/-1/-1->3->2 [14] 4/-1/-1->3->2 [15] 4/-1/-1->3->2 [16] 4/-1/-1->3->2 [17] 4/-1/-1->3->2 [18] 4/-1/-1->3->2 [19] 4/-1/-1->3->2 [20] 4/-1/-1->3->2 [21] 4/-1/-1->3->2 [22] 4/-1/-1->3->2 [23] 4/-1/-1->3->2 | |
tyler-a100-newimage-val:10546:11406 [0] NCCL INFO Channel 17/24 : 0 1 2 3 4 5 6 7 | |
tyler-a100-newimage-val:10550:11411 [4] NCCL INFO P2P Chunksize set to 524288 | |
tyler-a100-newimage-val:10546:11406 [0] NCCL INFO Channel 18/24 : 0 1 2 3 4 5 6 7 | |
tyler-a100-newimage-val:10549:11415 [3] NCCL INFO P2P Chunksize set to 524288 | |
tyler-a100-newimage-val:10546:11406 [0] NCCL INFO Channel 19/24 : 0 1 2 3 4 5 6 7 | |
tyler-a100-newimage-val:10546:11406 [0] NCCL INFO Channel 20/24 : 0 1 2 3 4 5 6 7 | |
tyler-a100-newimage-val:10546:11406 [0] NCCL INFO Channel 21/24 : 0 1 2 3 4 5 6 7 | |
tyler-a100-newimage-val:10546:11406 [0] NCCL INFO Channel 22/24 : 0 1 2 3 4 5 6 7 | |
tyler-a100-newimage-val:10546:11406 [0] NCCL INFO Channel 23/24 : 0 1 2 3 4 5 6 7 | |
tyler-a100-newimage-val:10546:11406 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 [4] 1/-1/-1->0->-1 [5] 1/-1/-1->0->-1 [6] 1/-1/-1->0->-1 [7] 1/-1/-1->0->-1 [8] 1/-1/-1->0->-1 [9] 1/-1/-1->0->-1 [10] 1/-1/-1->0->-1 [11] 1/-1/-1->0->-1 [12] 1/-1/-1->0->-1 [13] 1/-1/-1->0->-1 [14] 1/-1/-1->0->-1 [15] 1/-1/-1->0->-1 [16] 1/-1/-1->0->-1 [17] 1/-1/-1->0->-1 [18] 1/-1/-1->0->-1 [19] 1/-1/-1->0->-1 [20] 1/-1/-1->0->-1 [21] 1/-1/-1->0->-1 [22] 1/-1/-1->0->-1 [23] 1/-1/-1->0->-1 | |
tyler-a100-newimage-val:10546:11406 [0] NCCL INFO P2P Chunksize set to 524288 | |
tyler-a100-newimage-val:10551:11410 [5] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 | |
tyler-a100-newimage-val:10551:11410 [5] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer | |
tyler-a100-newimage-val:10548:11407 [2] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 | |
tyler-a100-newimage-val:10548:11407 [2] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer | |
tyler-a100-newimage-val:10546:11406 [0] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 | |
tyler-a100-newimage-val:10546:11406 [0] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer | |
tyler-a100-newimage-val:10546:11406 [0] NCCL INFO CC Off, Multi-GPU CC Off, workFifoBytes 1048576 | |
tyler-a100-newimage-val:10549:11415 [3] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 | |
tyler-a100-newimage-val:10549:11415 [3] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer | |
tyler-a100-newimage-val:10552:11412 [6] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 | |
tyler-a100-newimage-val:10552:11412 [6] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer | |
tyler-a100-newimage-val:10547:11408 [1] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 | |
tyler-a100-newimage-val:10547:11408 [1] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer | |
tyler-a100-newimage-val:10553:11409 [7] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 | |
tyler-a100-newimage-val:10553:11409 [7] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer | |
tyler-a100-newimage-val:10550:11411 [4] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 | |
tyler-a100-newimage-val:10550:11411 [4] NCCL INFO 24 coll channels, 24 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer | |
tyler-a100-newimage-val:10553:11409 [7] NCCL INFO ncclCommSplit comm 0x558ce0813360 rank 7 nranks 8 cudaDev 7 nvmlDev 7 busId e080 parent 0x558cdebf44c0 color -934961569 key 7 commId 0xc6ecd14a22a5889f - Init COMPLETE | |
tyler-a100-newimage-val:10549:11415 [3] NCCL INFO ncclCommSplit comm 0x565562e66530 rank 3 nranks 8 cudaDev 3 nvmlDev 3 busId a040 parent 0x56556127bf10 color -934961569 key 3 commId 0xc6ecd14a22a5889f - Init COMPLETE | |
tyler-a100-newimage-val:10551:11410 [5] NCCL INFO ncclCommSplit comm 0x55812f05a9a0 rank 5 nranks 8 cudaDev 5 nvmlDev 5 busId c060 parent 0x55812d458df0 color -934961569 key 5 commId 0xc6ecd14a22a5889f - Init COMPLETE | |
tyler-a100-newimage-val:10547:11408 [1] NCCL INFO ncclCommSplit comm 0x562f423f9400 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 8020 parent 0x562f40810030 color -934961569 key 1 commId 0xc6ecd14a22a5889f - Init COMPLETE | |
tyler-a100-newimage-val:10553:11409 [7] NCCL INFO Init timings: rank 7 nranks 8 total 0.39 (kernels 0.00, bootstrap 0.05, allgathers 0.00, topo 0.26, graphs 0.00, connections 0.05, rest 0.03) | |
tyler-a100-newimage-val:10551:11410 [5] NCCL INFO Init timings: rank 5 nranks 8 total 0.39 (kernels 0.00, bootstrap 0.05, allgathers 0.00, topo 0.26, graphs 0.00, connections 0.05, rest 0.02) | |
tyler-a100-newimage-val:10547:11408 [1] NCCL INFO Init timings: rank 1 nranks 8 total 0.39 (kernels 0.00, bootstrap 0.05, allgathers 0.00, topo 0.26, graphs 0.00, connections 0.05, rest 0.03) | |
tyler-a100-newimage-val:10549:11415 [3] NCCL INFO Init timings: rank 3 nranks 8 total 0.34 (kernels 0.00, bootstrap 0.00, allgathers 0.00, topo 0.26, graphs 0.00, connections 0.05, rest 0.02) | |
tyler-a100-newimage-val:10546:11406 [0] NCCL INFO ncclCommSplit comm 0x55919c3fd580 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId 8010 parent 0x55919a812fd0 color -934961569 key 0 commId 0xc6ecd14a22a5889f - Init COMPLETE | |
tyler-a100-newimage-val:10546:11406 [0] NCCL INFO Init timings: rank 0 nranks 8 total 0.39 (kernels 0.00, bootstrap 0.05, allgathers 0.00, topo 0.26, graphs 0.00, connections 0.05, rest 0.02) | |
tyler-a100-newimage-val:10552:11412 [6] NCCL INFO ncclCommSplit comm 0x55e0108836a0 rank 6 nranks 8 cudaDev 6 nvmlDev 6 busId e070 parent 0x55e00ec7dc40 color -934961569 key 6 commId 0xc6ecd14a22a5889f - Init COMPLETE | |
tyler-a100-newimage-val:10548:11407 [2] NCCL INFO ncclCommSplit comm 0x55e4c30806d0 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId a030 parent 0x55e4c1493930 color -934961569 key 2 commId 0xc6ecd14a22a5889f - Init COMPLETE | |
tyler-a100-newimage-val:10552:11412 [6] NCCL INFO Init timings: rank 6 nranks 8 total 0.39 (kernels 0.00, bootstrap 0.05, allgathers 0.00, topo 0.26, graphs 0.00, connections 0.05, rest 0.02) | |
tyler-a100-newimage-val:10548:11407 [2] NCCL INFO Init timings: rank 2 nranks 8 total 0.39 (kernels 0.00, bootstrap 0.05, allgathers 0.00, topo 0.26, graphs 0.00, connections 0.05, rest 0.02) | |
tyler-a100-newimage-val:10550:11411 [4] NCCL INFO ncclCommSplit comm 0x55ed499a84f0 rank 4 nranks 8 cudaDev 4 nvmlDev 4 busId c050 parent 0x55ed47dad9a0 color -934961569 key 4 commId 0xc6ecd14a22a5889f - Init COMPLETE | |
tyler-a100-newimage-val:10550:11411 [4] NCCL INFO Init timings: rank 4 nranks 8 total 0.39 (kernels 0.00, bootstrap 0.05, allgathers 0.00, topo 0.26, graphs 0.00, connections 0.05, rest 0.03) | |
tyler-a100-newimage-val:10551:11432 [5] NCCL INFO Channel 00/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11434 [1] NCCL INFO Channel 00/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10549:11438 [3] NCCL INFO Channel 00/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10553:11435 [7] NCCL INFO Channel 00/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10552:11436 [6] NCCL INFO Channel 00/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10548:11433 [2] NCCL INFO Channel 00/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10550:11439 [4] NCCL INFO Channel 00/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11432 [5] NCCL INFO Channel 01/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11434 [1] NCCL INFO Channel 01/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10549:11438 [3] NCCL INFO Channel 01/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10546:11437 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10553:11435 [7] NCCL INFO Channel 01/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10552:11436 [6] NCCL INFO Channel 01/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10548:11433 [2] NCCL INFO Channel 01/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11432 [5] NCCL INFO Channel 02/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10550:11439 [4] NCCL INFO Channel 01/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11434 [1] NCCL INFO Channel 02/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10549:11438 [3] NCCL INFO Channel 02/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10553:11435 [7] NCCL INFO Channel 02/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11432 [5] NCCL INFO Channel 03/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10546:11437 [0] NCCL INFO Channel 01/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10552:11436 [6] NCCL INFO Channel 02/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10548:11433 [2] NCCL INFO Channel 02/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10550:11439 [4] NCCL INFO Channel 02/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11434 [1] NCCL INFO Channel 03/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10549:11438 [3] NCCL INFO Channel 03/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10546:11437 [0] NCCL INFO Channel 02/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10553:11435 [7] NCCL INFO Channel 03/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11432 [5] NCCL INFO Channel 04/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10552:11436 [6] NCCL INFO Channel 03/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10548:11433 [2] NCCL INFO Channel 03/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10550:11439 [4] NCCL INFO Channel 03/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11434 [1] NCCL INFO Channel 04/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10549:11438 [3] NCCL INFO Channel 04/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10546:11437 [0] NCCL INFO Channel 03/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10553:11435 [7] NCCL INFO Channel 04/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11432 [5] NCCL INFO Channel 05/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10552:11436 [6] NCCL INFO Channel 04/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10548:11433 [2] NCCL INFO Channel 04/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11434 [1] NCCL INFO Channel 05/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10550:11439 [4] NCCL INFO Channel 04/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10549:11438 [3] NCCL INFO Channel 05/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10553:11435 [7] NCCL INFO Channel 05/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10546:11437 [0] NCCL INFO Channel 04/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11432 [5] NCCL INFO Channel 06/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10552:11436 [6] NCCL INFO Channel 05/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10548:11433 [2] NCCL INFO Channel 05/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11434 [1] NCCL INFO Channel 06/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10550:11439 [4] NCCL INFO Channel 05/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10549:11438 [3] NCCL INFO Channel 06/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10553:11435 [7] NCCL INFO Channel 06/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10546:11437 [0] NCCL INFO Channel 05/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11432 [5] NCCL INFO Channel 07/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10552:11436 [6] NCCL INFO Channel 06/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10548:11433 [2] NCCL INFO Channel 06/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11434 [1] NCCL INFO Channel 07/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10550:11439 [4] NCCL INFO Channel 06/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10549:11438 [3] NCCL INFO Channel 07/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11432 [5] NCCL INFO Channel 08/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10553:11435 [7] NCCL INFO Channel 07/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10546:11437 [0] NCCL INFO Channel 06/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10552:11436 [6] NCCL INFO Channel 07/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10548:11433 [2] NCCL INFO Channel 07/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11434 [1] NCCL INFO Channel 08/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10550:11439 [4] NCCL INFO Channel 07/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10549:11438 [3] NCCL INFO Channel 08/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11432 [5] NCCL INFO Channel 09/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10553:11435 [7] NCCL INFO Channel 08/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10552:11436 [6] NCCL INFO Channel 08/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10546:11437 [0] NCCL INFO Channel 07/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10548:11433 [2] NCCL INFO Channel 08/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11434 [1] NCCL INFO Channel 09/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10550:11439 [4] NCCL INFO Channel 08/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10549:11438 [3] NCCL INFO Channel 09/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11432 [5] NCCL INFO Channel 10/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10552:11436 [6] NCCL INFO Channel 09/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10553:11435 [7] NCCL INFO Channel 09/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10548:11433 [2] NCCL INFO Channel 09/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10546:11437 [0] NCCL INFO Channel 08/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11434 [1] NCCL INFO Channel 10/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10550:11439 [4] NCCL INFO Channel 09/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10549:11438 [3] NCCL INFO Channel 10/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11432 [5] NCCL INFO Channel 11/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10552:11436 [6] NCCL INFO Channel 10/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10553:11435 [7] NCCL INFO Channel 10/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10548:11433 [2] NCCL INFO Channel 10/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11434 [1] NCCL INFO Channel 11/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10546:11437 [0] NCCL INFO Channel 09/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10550:11439 [4] NCCL INFO Channel 10/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10549:11438 [3] NCCL INFO Channel 11/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11432 [5] NCCL INFO Channel 12/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10552:11436 [6] NCCL INFO Channel 11/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10553:11435 [7] NCCL INFO Channel 11/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10548:11433 [2] NCCL INFO Channel 11/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11434 [1] NCCL INFO Channel 12/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10546:11437 [0] NCCL INFO Channel 10/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10550:11439 [4] NCCL INFO Channel 11/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10549:11438 [3] NCCL INFO Channel 12/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11432 [5] NCCL INFO Channel 13/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10552:11436 [6] NCCL INFO Channel 12/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11434 [1] NCCL INFO Channel 13/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10553:11435 [7] NCCL INFO Channel 12/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10548:11433 [2] NCCL INFO Channel 12/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10546:11437 [0] NCCL INFO Channel 11/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10550:11439 [4] NCCL INFO Channel 12/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10549:11438 [3] NCCL INFO Channel 13/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11432 [5] NCCL INFO Channel 14/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10552:11436 [6] NCCL INFO Channel 13/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11434 [1] NCCL INFO Channel 14/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10553:11435 [7] NCCL INFO Channel 13/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10548:11433 [2] NCCL INFO Channel 13/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10546:11437 [0] NCCL INFO Channel 12/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10549:11438 [3] NCCL INFO Channel 14/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10550:11439 [4] NCCL INFO Channel 13/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11432 [5] NCCL INFO Channel 15/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10552:11436 [6] NCCL INFO Channel 14/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11434 [1] NCCL INFO Channel 15/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10553:11435 [7] NCCL INFO Channel 14/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10548:11433 [2] NCCL INFO Channel 14/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10546:11437 [0] NCCL INFO Channel 13/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10549:11438 [3] NCCL INFO Channel 15/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10550:11439 [4] NCCL INFO Channel 14/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11432 [5] NCCL INFO Channel 16/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11434 [1] NCCL INFO Channel 16/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10552:11436 [6] NCCL INFO Channel 15/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10553:11435 [7] NCCL INFO Channel 15/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10548:11433 [2] NCCL INFO Channel 15/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10546:11437 [0] NCCL INFO Channel 14/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10549:11438 [3] NCCL INFO Channel 16/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10550:11439 [4] NCCL INFO Channel 15/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11432 [5] NCCL INFO Channel 17/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11434 [1] NCCL INFO Channel 17/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10552:11436 [6] NCCL INFO Channel 16/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10553:11435 [7] NCCL INFO Channel 16/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10548:11433 [2] NCCL INFO Channel 16/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10546:11437 [0] NCCL INFO Channel 15/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10549:11438 [3] NCCL INFO Channel 17/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10550:11439 [4] NCCL INFO Channel 16/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11432 [5] NCCL INFO Channel 18/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11434 [1] NCCL INFO Channel 18/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10553:11435 [7] NCCL INFO Channel 17/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10552:11436 [6] NCCL INFO Channel 17/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10548:11433 [2] NCCL INFO Channel 17/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10546:11437 [0] NCCL INFO Channel 16/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10549:11438 [3] NCCL INFO Channel 18/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10550:11439 [4] NCCL INFO Channel 17/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11432 [5] NCCL INFO Channel 19/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11434 [1] NCCL INFO Channel 19/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10553:11435 [7] NCCL INFO Channel 18/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10552:11436 [6] NCCL INFO Channel 18/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10548:11433 [2] NCCL INFO Channel 18/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10546:11437 [0] NCCL INFO Channel 17/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10549:11438 [3] NCCL INFO Channel 19/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10550:11439 [4] NCCL INFO Channel 18/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11432 [5] NCCL INFO Channel 20/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11434 [1] NCCL INFO Channel 20/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10553:11435 [7] NCCL INFO Channel 19/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10552:11436 [6] NCCL INFO Channel 19/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10548:11433 [2] NCCL INFO Channel 19/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10546:11437 [0] NCCL INFO Channel 18/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10549:11438 [3] NCCL INFO Channel 20/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10550:11439 [4] NCCL INFO Channel 19/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11432 [5] NCCL INFO Channel 21/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11434 [1] NCCL INFO Channel 21/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10553:11435 [7] NCCL INFO Channel 20/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10548:11433 [2] NCCL INFO Channel 20/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10552:11436 [6] NCCL INFO Channel 20/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10546:11437 [0] NCCL INFO Channel 19/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10549:11438 [3] NCCL INFO Channel 21/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11432 [5] NCCL INFO Channel 22/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10550:11439 [4] NCCL INFO Channel 20/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11434 [1] NCCL INFO Channel 22/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10553:11435 [7] NCCL INFO Channel 21/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10548:11433 [2] NCCL INFO Channel 21/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10552:11436 [6] NCCL INFO Channel 21/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10546:11437 [0] NCCL INFO Channel 20/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10549:11438 [3] NCCL INFO Channel 22/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10551:11432 [5] NCCL INFO Channel 23/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10550:11439 [4] NCCL INFO Channel 21/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10547:11434 [1] NCCL INFO Channel 23/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10553:11435 [7] NCCL INFO Channel 22/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10548:11433 [2] NCCL INFO Channel 22/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10552:11436 [6] NCCL INFO Channel 22/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10546:11437 [0] NCCL INFO Channel 21/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10549:11438 [3] NCCL INFO Channel 23/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10550:11439 [4] NCCL INFO Channel 22/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10553:11435 [7] NCCL INFO Channel 23/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10548:11433 [2] NCCL INFO Channel 23/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10552:11436 [6] NCCL INFO Channel 23/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10546:11437 [0] NCCL INFO Channel 22/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10550:11439 [4] NCCL INFO Channel 23/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10546:11437 [0] NCCL INFO Channel 23/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-a100-newimage-val:10548:11433 [2] NCCL INFO Connected all rings | |
tyler-a100-newimage-val:10549:11438 [3] NCCL INFO Connected all rings | |
tyler-a100-newimage-val:10550:11439 [4] NCCL INFO Connected all rings | |
tyler-a100-newimage-val:10551:11432 [5] NCCL INFO Connected all rings | |
tyler-a100-newimage-val:10552:11436 [6] NCCL INFO Connected all rings | |
tyler-a100-newimage-val:10553:11435 [7] NCCL INFO Connected all rings | |
tyler-a100-newimage-val:10546:11437 [0] NCCL INFO Connected all rings | |
tyler-a100-newimage-val:10547:11434 [1] NCCL INFO Connected all rings | |
[2024-08-18 20:47:46,090] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2024-08-18 20:47:46,091] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer
[2024-08-18 20:47:46,091] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the basic Optimizer
[2024-08-18 20:47:46,104] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam
[2024-08-18 20:47:46,104] [INFO] [utils.py:56:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=<class 'deepspeed.ops.adam.fused_adam.FusedAdam'>
[2024-08-18 20:47:46,104] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.bfloat16 ZeRO stage 2 optimizer
[2024-08-18 20:47:46,104] [INFO] [stage_1_and_2.py:148:__init__] Reduce bucket size 500,000,000
[2024-08-18 20:47:46,104] [INFO] [stage_1_and_2.py:149:__init__] Allgather bucket size 500,000,000
[2024-08-18 20:47:46,104] [INFO] [stage_1_and_2.py:150:__init__] CPU Offload: False
[2024-08-18 20:47:46,104] [INFO] [stage_1_and_2.py:151:__init__] Round robin gradient partitioning: False
[2024-08-18 20:47:59,000] [WARNING] [engine.py:2749:load_checkpoint] Unable to find latest file at /var/mnt/inststg1/instructlab/phasedbasedir/phase2/checkpoints/ds_native/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint.
[2024-08-18 20:47:59,024] [WARNING] [engine.py:2749:load_checkpoint] Unable to find latest file at /var/mnt/inststg1/instructlab/phasedbasedir/phase2/checkpoints/ds_native/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint.
[2024-08-18 20:48:00,036] [WARNING] [engine.py:2749:load_checkpoint] Unable to find latest file at /var/mnt/inststg1/instructlab/phasedbasedir/phase2/checkpoints/ds_native/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint.
[2024-08-18 20:48:00,385] [WARNING] [engine.py:2749:load_checkpoint] Unable to find latest file at /var/mnt/inststg1/instructlab/phasedbasedir/phase2/checkpoints/ds_native/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint.
[2024-08-18 20:48:00,831] [WARNING] [engine.py:2749:load_checkpoint] Unable to find latest file at /var/mnt/inststg1/instructlab/phasedbasedir/phase2/checkpoints/ds_native/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint.
[2024-08-18 20:48:00,924] [WARNING] [engine.py:2749:load_checkpoint] Unable to find latest file at /var/mnt/inststg1/instructlab/phasedbasedir/phase2/checkpoints/ds_native/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint.
[2024-08-18 20:48:01,063] [WARNING] [engine.py:2749:load_checkpoint] Unable to find latest file at /var/mnt/inststg1/instructlab/phasedbasedir/phase2/checkpoints/ds_native/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint.
[2024-08-18 20:48:01,367] [INFO] [utils.py:781:see_memory_usage] Before initializing optimizer states
[2024-08-18 20:48:01,367] [INFO] [utils.py:782:see_memory_usage] MA 15.69 GB Max_MA 17.26 GB CA 17.26 GB Max_CA 17 GB
[2024-08-18 20:48:01,368] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 33.57 GB, percent = 2.7%
[2024-08-18 20:48:01,588] [INFO] [utils.py:781:see_memory_usage] After initializing optimizer states
[2024-08-18 20:48:01,589] [INFO] [utils.py:782:see_memory_usage] MA 15.69 GB Max_MA 18.83 GB CA 20.4 GB Max_CA 20 GB
[2024-08-18 20:48:01,589] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 33.58 GB, percent = 2.7%
[2024-08-18 20:48:01,590] [INFO] [stage_1_and_2.py:543:__init__] optimizer state initialized
[2024-08-18 20:48:01,807] [INFO] [utils.py:781:see_memory_usage] After initializing ZeRO optimizer
[2024-08-18 20:48:01,808] [INFO] [utils.py:782:see_memory_usage] MA 15.69 GB Max_MA 15.69 GB CA 20.4 GB Max_CA 20 GB
[2024-08-18 20:48:01,808] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 33.58 GB, percent = 2.7%
[2024-08-18 20:48:01,810] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = DeepSpeedZeroOptimizer
[2024-08-18 20:48:01,810] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler
[2024-08-18 20:48:01,810] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = <torch.optim.lr_scheduler.LambdaLR object at 0x7f171cc77e10>
[2024-08-18 20:48:01,810] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0], mom=[(0.9, 0.95)]
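The engine reports `lr=[0.0]` at step 0 under a client `LambdaLR`. A warmup schedule produces exactly that before the first optimizer step. The sketch below shows one such multiplier. It assumes a cosine-with-linear-warmup shape like the one Hugging Face `transformers` builds in `get_cosine_schedule_with_warmup`, but the log confirms only that a `LambdaLR` is in use. The function name `warmup_cosine_lambda` and the step counts below are hypothetical.

```python
import math


def warmup_cosine_lambda(step: int, warmup_steps: int, total_steps: int) -> float:
    """Hypothetical LambdaLR multiplier: linear warmup, then cosine decay to 0.

    Assumed schedule shape; the log only shows a LambdaLR with lr=0.0 at step 0.
    """
    if step < warmup_steps:
        # Linear ramp from 0 to 1. Step 0 gives exactly 0.0, matching lr=[0.0].
        return step / max(1, warmup_steps)
    # Fraction of the post-warmup steps completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half-period cosine: 1.0 right after warmup, 0.0 at total_steps.
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))
```

For instance, `warmup_cosine_lambda(12, 25, 250)` returns 0.48. If phase 2 kept the 2e-05 base rate shown in the phase 1 TrainingArgs (an assumption), the effective rate at that step would be 9.6e-06.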
[2024-08-18 20:48:01,811] [INFO] [config.py:997:print] DeepSpeedEngine configuration:
[2024-08-18 20:48:01,812] [INFO] [config.py:1001:print] activation_checkpointing_config {
"partition_activations": false,
"contiguous_memory_optimization": false,
"cpu_checkpointing": false,
"number_checkpoints": null,
"synchronize_checkpoint_boundary": false,
"profile": false
}
[2024-08-18 20:48:01,812] [INFO] [config.py:1001:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2024-08-18 20:48:01,812] [INFO] [config.py:1001:print] amp_enabled .................. False
[2024-08-18 20:48:01,812] [INFO] [config.py:1001:print] amp_params ................... False
[2024-08-18 20:48:01,812] [INFO] [config.py:1001:print] autotuning_config ............ {
"enabled": false,
"start_step": null,
"end_step": null,
"metric_path": null,
"arg_mappings": null,
"metric": "throughput",
"model_info": null,
"results_dir": "autotuning_results",
"exps_dir": "autotuning_exps",
"overwrite": true,
"fast": true,
"start_profile_step": 3,
"end_profile_step": 5,
"tuner_type": "gridsearch",
"tuner_early_stopping": 5,
"tuner_num_trials": 50,
"model_info_path": null,
"mp_size": 1,
"max_train_batch_size": null,
"min_train_batch_size": 1,
"max_train_micro_batch_size_per_gpu": 1.024000e+03,
"min_train_micro_batch_size_per_gpu": 1,
"num_tuning_micro_batch_sizes": 3
}
[2024-08-18 20:48:01,812] [INFO] [config.py:1001:print] bfloat16_enabled ............. True
[2024-08-18 20:48:01,812] [INFO] [config.py:1001:print] bfloat16_immediate_grad_update False
[2024-08-18 20:48:01,812] [INFO] [config.py:1001:print] checkpoint_parallel_write_pipeline False
[2024-08-18 20:48:01,812] [INFO] [config.py:1001:print] checkpoint_tag_validation_enabled True
[2024-08-18 20:48:01,812] [INFO] [config.py:1001:print] checkpoint_tag_validation_fail False
[2024-08-18 20:48:01,812] [INFO] [config.py:1001:print] comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7f171cc59a90>
[2024-08-18 20:48:01,812] [INFO] [config.py:1001:print] communication_data_type ...... None
[2024-08-18 20:48:01,812] [INFO] [config.py:1001:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2024-08-18 20:48:01,812] [INFO] [config.py:1001:print] curriculum_enabled_legacy .... False
[2024-08-18 20:48:01,812] [INFO] [config.py:1001:print] curriculum_params_legacy ..... False
[2024-08-18 20:48:01,812] [INFO] [config.py:1001:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2024-08-18 20:48:01,812] [INFO] [config.py:1001:print] data_efficiency_enabled ...... False
[2024-08-18 20:48:01,812] [INFO] [config.py:1001:print] dataloader_drop_last ......... False
[2024-08-18 20:48:01,812] [INFO] [config.py:1001:print] disable_allgather ............ False
[2024-08-18 20:48:01,812] [INFO] [config.py:1001:print] dump_state ................... False
[2024-08-18 20:48:01,812] [INFO] [config.py:1001:print] dynamic_loss_scale_args ...... None
[2024-08-18 20:48:01,812] [INFO] [config.py:1001:print] eigenvalue_enabled ........... False
[2024-08-18 20:48:01,812] [INFO] [config.py:1001:print] eigenvalue_gas_boundary_resolution 1
[2024-08-18 20:48:01,812] [INFO] [config.py:1001:print] eigenvalue_layer_name ........ bert.encoder.layer
[2024-08-18 20:48:01,812] [INFO] [config.py:1001:print] eigenvalue_layer_num ......... 0
[2024-08-18 20:48:01,812] [INFO] [config.py:1001:print] eigenvalue_max_iter .......... 100
[2024-08-18 20:48:01,812] [INFO] [config.py:1001:print] eigenvalue_stability ......... 1e-06
[2024-08-18 20:48:01,812] [INFO] [config.py:1001:print] eigenvalue_tol ............... 0.01
[2024-08-18 20:48:01,812] [INFO] [config.py:1001:print] eigenvalue_verbose ........... False
[2024-08-18 20:48:01,812] [INFO] [config.py:1001:print] elasticity_enabled ........... False
[2024-08-18 20:48:01,812] [INFO] [config.py:1001:print] flops_profiler_config ........ {
"enabled": false,
"recompute_fwd_factor": 0.0,
"profile_step": 1,
"module_depth": -1,
"top_modules": 1,
"detailed": true,
"output_file": null
}
[2024-08-18 20:48:01,812] [INFO] [config.py:1001:print] fp16_auto_cast ............... None
[2024-08-18 20:48:01,812] [INFO] [config.py:1001:print] fp16_enabled ................. False
[2024-08-18 20:48:01,812] [INFO] [config.py:1001:print] fp16_master_weights_and_gradients False
[2024-08-18 20:48:01,812] [INFO] [config.py:1001:print] global_rank .................. 0
[2024-08-18 20:48:01,812] [INFO] [config.py:1001:print] grad_accum_dtype ............. None
[2024-08-18 20:48:01,813] [INFO] [config.py:1001:print] gradient_accumulation_steps .. 36
[2024-08-18 20:48:01,813] [INFO] [config.py:1001:print] gradient_clipping ............ 1.0
[2024-08-18 20:48:01,813] [INFO] [config.py:1001:print] gradient_predivide_factor .... 1.0
[2024-08-18 20:48:01,813] [INFO] [config.py:1001:print] graph_harvesting ............. False
[2024-08-18 20:48:01,813] [INFO] [config.py:1001:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2024-08-18 20:48:01,813] [INFO] [config.py:1001:print] initial_dynamic_scale ........ 1
[2024-08-18 20:48:01,813] [INFO] [config.py:1001:print] load_universal_checkpoint .... False
[2024-08-18 20:48:01,813] [INFO] [config.py:1001:print] loss_scale ................... 1.0
[2024-08-18 20:48:01,813] [INFO] [config.py:1001:print] memory_breakdown ............. False
[2024-08-18 20:48:01,813] [INFO] [config.py:1001:print] mics_hierarchial_params_gather False
[2024-08-18 20:48:01,813] [INFO] [config.py:1001:print] mics_shard_size .............. -1
[2024-08-18 20:48:01,813] [INFO] [config.py:1001:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') comet=CometConfig(enabled=False, samples_log_interval=100, project=None, workspace=None, api_key=None, experiment_name=None, experiment_key=None, online=None, mode=None) wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2024-08-18 20:48:01,813] [INFO] [config.py:1001:print] nebula_config ................ {
"enabled": false,
"persistent_storage_path": null,
"persistent_time_interval": 100,
"num_of_version_in_retention": 2,
"enable_nebula_load": true,
"load_path": null
}
[2024-08-18 20:48:01,813] [INFO] [config.py:1001:print] optimizer_legacy_fusion ...... False
[2024-08-18 20:48:01,813] [INFO] [config.py:1001:print] optimizer_name ............... None
[2024-08-18 20:48:01,813] [INFO] [config.py:1001:print] optimizer_params ............. None
[2024-08-18 20:48:01,813] [INFO] [config.py:1001:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0, 'pipe_partitioned': True, 'grad_partitioned': True}
[2024-08-18 20:48:01,813] [INFO] [config.py:1001:print] pld_enabled .................. False
[2024-08-18 20:48:01,813] [INFO] [config.py:1001:print] pld_params ................... False
[2024-08-18 20:48:01,813] [INFO] [config.py:1001:print] prescale_gradients ........... False
[2024-08-18 20:48:01,813] [INFO] [config.py:1001:print] scheduler_name ............... None
[2024-08-18 20:48:01,813] [INFO] [config.py:1001:print] scheduler_params ............. None
[2024-08-18 20:48:01,813] [INFO] [config.py:1001:print] seq_parallel_communication_data_type torch.float32
[2024-08-18 20:48:01,813] [INFO] [config.py:1001:print] sparse_attention ............. None
[2024-08-18 20:48:01,813] [INFO] [config.py:1001:print] sparse_gradients_enabled ..... False
[2024-08-18 20:48:01,813] [INFO] [config.py:1001:print] steps_per_print .............. 1
[2024-08-18 20:48:01,813] [INFO] [config.py:1001:print] timers_config ................ enabled=True synchronized=True
[2024-08-18 20:48:01,813] [INFO] [config.py:1001:print] train_batch_size ............. 3744
[2024-08-18 20:48:01,813] [INFO] [config.py:1001:print] train_micro_batch_size_per_gpu 13
[2024-08-18 20:48:01,813] [INFO] [config.py:1001:print] use_data_before_expert_parallel_ False
[2024-08-18 20:48:01,813] [INFO] [config.py:1001:print] use_node_local_storage ....... False
[2024-08-18 20:48:01,813] [INFO] [config.py:1001:print] wall_clock_breakdown ......... False
[2024-08-18 20:48:01,813] [INFO] [config.py:1001:print] weight_quantization_config ... None
[2024-08-18 20:48:01,813] [INFO] [config.py:1001:print] world_size ................... 8
[2024-08-18 20:48:01,813] [INFO] [config.py:1001:print] zero_allow_untested_optimizer False
[2024-08-18 20:48:01,813] [INFO] [config.py:1001:print] zero_config .................. stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 use_multi_rank_bucket_allreduce=True allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='none', nvme_path=None, buffer_count=4, pin_memory=False, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False, ratio=1.0) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False use_all_reduce_for_fetch_params=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True pipeline_loading_checkpoint=False override_module_apply=True
[2024-08-18 20:48:01,813] [INFO] [config.py:1001:print] zero_enabled ................. True
[2024-08-18 20:48:01,813] [INFO] [config.py:1001:print] zero_force_ds_cpu_optimizer .. True
[2024-08-18 20:48:01,813] [INFO] [config.py:1001:print] zero_optimization_stage ...... 2
[2024-08-18 20:48:01,814] [INFO] [config.py:987:print_user_config] json = {
"train_batch_size": 3.744000e+03,
"gradient_accumulation_steps": 36,
"train_micro_batch_size_per_gpu": 13,
"steps_per_print": 1,
"zero_optimization": {
"stage": 2,
"offload_param": {
"device": "none"
},
"offload_optimizer": {
"device": "none"
}
},
"bf16": {
"enabled": true
},
"gradient_clipping": 1.0,
"prescale_gradients": false,
"wall_clock_breakdown": false
}
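DeepSpeed requires `train_batch_size` to equal `train_micro_batch_size_per_gpu` × `gradient_accumulation_steps` × `world_size`. Here that is 13 × 36 × 8 = 3744, which is 96 samples short of the `--phased-phase2-effective-batch-size 3840` passed on the command line. The sketch below reproduces that arithmetic. The helper `largest_grad_accum` is hypothetical. It assumes the accumulation count was rounded down so the global batch does not exceed the request. That assumption reproduces the logged 36, but the log itself does not confirm how the value was chosen.

```python
def global_batch_size(micro_batch_per_gpu: int, grad_accum_steps: int, world_size: int) -> int:
    """The identity DeepSpeed enforces between its three batch-size settings."""
    return micro_batch_per_gpu * grad_accum_steps * world_size


def largest_grad_accum(target_batch: int, micro_batch_per_gpu: int, world_size: int) -> int:
    """Hypothetical: the largest accumulation count whose global batch stays <= target_batch."""
    samples_per_accum_step = micro_batch_per_gpu * world_size  # 13 * 8 = 104 in this log
    return max(1, target_batch // samples_per_accum_step)


# Values from this run: micro batch 13 per GPU, 8 GPUs, requested effective batch 3840.
gas = largest_grad_accum(3840, 13, 8)      # 3840 // 104 == 36
actual = global_batch_size(13, gas, 8)     # 13 * 36 * 8 == 3744
shortfall = 3840 - actual                  # 96 samples below the request
```

One more accumulation step (37) would give 3848, which overshoots 3840. That is why rounding down lands on 3744.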
[2024-08-18 20:48:01,814] [WARNING] [engine.py:2749:load_checkpoint] Unable to find latest file at /var/mnt/inststg1/instructlab/phasedbasedir/phase2/checkpoints/ds_native/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint.
Epoch 0: 0%| | 0/121 [00:00<?, ?it/s]
total tokens: 7992 num samples: 18 num padding tokens: 2022 - rank: 6 max len: 444 min len: 240 avg len: 331.6666666666667 num_loss_counted_tokens: 3649
total tokens: 7752 num samples: 17 num padding tokens: 1950 - rank: 6 max len: 456 min len: 267 avg len: 341.29411764705884 num_loss_counted_tokens: 3614
total tokens: 7784 num samples: 14 num padding tokens: 1814 - rank: 6 max len: 556 min len: 342 avg len: 426.42857142857144 num_loss_counted_tokens: 3061
total tokens: 8094 num samples: 19 num padding tokens: 2216 - rank: 6 max len: 426 min len: 215 avg len: 309.36842105263156 num_loss_counted_tokens: 3253
total tokens: 7524 num samples: 12 num padding tokens: 2256 - rank: 6 max len: 627 min len: 286 avg len: 439.0 num_loss_counted_tokens: 3465
total tokens: 8056 num samples: 19 num padding tokens: 1799 - rank: 6 max len: 424 min len: 237 avg len: 329.3157894736842 num_loss_counted_tokens: 3512
total tokens: 7803 num samples: 17 num padding tokens: 1146 - rank: 6 max len: 459 min len: 281 avg len: 391.5882352941176 num_loss_counted_tokens: 4201
total tokens: 6561 num samples: 3 num padding tokens: 1099 - rank: 1 max len: 2187 min len: 1316 avg len: 1820.6666666666667 num_loss_counted_tokens: 687
total tokens: 8118 num samples: 3 num padding tokens: 403 - rank: 1 max len: 2706 min len: 2307 avg len: 2571.6666666666665 num_loss_counted_tokens: 336
total tokens: 7851 num samples: 3 num padding tokens: 789 - rank: 1 max len: 2617 min len: 1940 avg len: 2354.0 num_loss_counted_tokens: 768
total tokens: 8096 num samples: 16 num padding tokens: 1551 - rank: 6 max len: 506 min len: 312 avg len: 409.0625 num_loss_counted_tokens: 3732
total tokens: 7089 num samples: 3 num padding tokens: 249 - rank: 1 max len: 2363 min len: 2182 avg len: 2280.0 num_loss_counted_tokens: 237
total tokens: 7629 num samples: 3 num padding tokens: 370 - rank: 1 max len: 2543 min len: 2326 avg len: 2419.6666666666665 num_loss_counted_tokens: 668
total tokens: 7752 num samples: 24 num padding tokens: 3285 - rank: 7 max len: 323 min len: 83 avg len: 186.125 num_loss_counted_tokens: 2064
total tokens: 7776 num samples: 16 num padding tokens: 2005 - rank: 6 max len: 486 min len: 266 avg len: 360.6875 num_loss_counted_tokens: 3202
total tokens: 7794 num samples: 9 num padding tokens: 1201 - rank: 4 max len: 866 min len: 628 avg len: 732.5555555555555 num_loss_counted_tokens: 3797
total tokens: 8075 num samples: 19 num padding tokens: 2226 - rank: 6 max len: 425 min len: 232 avg len: 307.8421052631579 num_loss_counted_tokens: 3319
total tokens: 7872 num samples: 16 num padding tokens: 1566 - rank: 6 max len: 492 min len: 271 avg len: 394.125 num_loss_counted_tokens: 3233
total tokens: 5934 num samples: 23 num padding tokens: 2656 - rank: 7 max len: 258 min len: 72 avg len: 142.52173913043478 num_loss_counted_tokens: 1182
total tokens: 8024 num samples: 8 num padding tokens: 985 - rank: 4 max len: 1003 min len: 741 avg len: 879.875 num_loss_counted_tokens: 5064
total tokens: 7840 num samples: 8 num padding tokens: 691 - rank: 4 max len: 980 min len: 778 avg len: 893.625 num_loss_counted_tokens: 4008
total tokens: 8016 num samples: 3 num padding tokens: 730 - rank: 1 max len: 2672 min len: 2280 avg len: 2428.6666666666665 num_loss_counted_tokens: 336
total tokens: 8032 num samples: 8 num padding tokens: 791 - rank: 4 max len: 1004 min len: 846 avg len: 905.125 num_loss_counted_tokens: 6538
total tokens: 8094 num samples: 19 num padding tokens: 1536 - rank: 6 max len: 426 min len: 259 avg len: 345.1578947368421 num_loss_counted_tokens: 3814
total tokens: 7476 num samples: 4 num padding tokens: 1909 - rank: 1 max len: 1869 min len: 1081 avg len: 1391.75 num_loss_counted_tokens: 2210
total tokens: 6380 num samples: 29 num padding tokens: 2359 - rank: 7 max len: 220 min len: 77 avg len: 138.6551724137931 num_loss_counted_tokens: 1514
total tokens: 7328 num samples: 32 num padding tokens: 2417 - rank: 7 max len: 229 min len: 83 avg len: 153.46875 num_loss_counted_tokens: 2050
total tokens: 6348 num samples: 23 num padding tokens: 1924 - rank: 7 max len: 276 min len: 79 avg len: 192.34782608695653 num_loss_counted_tokens: 1895
total tokens: 7460 num samples: 5 num padding tokens: 807 - rank: 1 max len: 1492 min len: 1178 avg len: 1330.6 num_loss_counted_tokens: 2708
total tokens: 7890 num samples: 30 num padding tokens: 2915 - rank: 7 max len: 263 min len: 77 avg len: 165.83333333333334 num_loss_counted_tokens: 1979
total tokens: 7704 num samples: 9 num padding tokens: 736 - rank: 4 max len: 856 min len: 719 avg len: 774.2222222222222 num_loss_counted_tokens: 2859
total tokens: 7920 num samples: 10 num padding tokens: 665 - rank: 4 max len: 792 min len: 622 avg len: 725.5 num_loss_counted_tokens: 4725
total tokens: 7812 num samples: 9 num padding tokens: 769 - rank: 4 max len: 868 min len: 744 avg len: 782.5555555555555 num_loss_counted_tokens: 5045
total tokens: 6171 num samples: 3 num padding tokens: 752 - rank: 1 max len: 2057 min len: 1416 avg len: 1806.3333333333333 num_loss_counted_tokens: 750
total tokens: 7432 num samples: 4 num padding tokens: 296 - rank: 1 max len: 1858 min len: 1710 avg len: 1784.0 num_loss_counted_tokens: 829
total tokens: 5684 num samples: 2 num padding tokens: 688 - rank: 1 max len: 2842 min len: 2154 avg len: 2498.0 num_loss_counted_tokens: 182
total tokens: 7992 num samples: 18 num padding tokens: 1629 - rank: 6 max len: 444 min len: 270 avg len: 353.5 num_loss_counted_tokens: 3206
total tokens: 7625 num samples: 25 num padding tokens: 3089 - rank: 7 max len: 305 min len: 83 avg len: 181.44 num_loss_counted_tokens: 2298
total tokens: 7812 num samples: 31 num padding tokens: 2673 - rank: 7 max len: 252 min len: 81 avg len: 165.7741935483871 num_loss_counted_tokens: 2047
total tokens: 7871 num samples: 17 num padding tokens: 2098 - rank: 6 max len: 463 min len: 248 avg len: 339.5882352941176 num_loss_counted_tokens: 3582
total tokens: 8060 num samples: 20 num padding tokens: 1971 - rank: 6 max len: 403 min len: 249 avg len: 304.45 num_loss_counted_tokens: 3294
total tokens: 7288 num samples: 4 num padding tokens: 745 - rank: 1 max len: 1822 min len: 1504 avg len: 1635.75 num_loss_counted_tokens: 956
total tokens: 6479 num samples: 31 num padding tokens: 2352 - rank: 7 max len: 209 min len: 79 avg len: 133.1290322580645 num_loss_counted_tokens: 1439
total tokens: 6892 num samples: 4 num padding tokens: 402 - rank: 1 max len: 1723 min len: 1368 avg len: 1622.5 num_loss_counted_tokens: 752
total tokens: 6423 num samples: 3 num padding tokens: 305 - rank: 1 max len: 2141 min len: 1976 avg len: 2039.3333333333333 num_loss_counted_tokens: 448
total tokens: 7634 num samples: 11 num padding tokens: 386 - rank: 4 max len: 694 min len: 621 avg len: 658.9090909090909 num_loss_counted_tokens: 5313
total tokens: 8070 num samples: 10 num padding tokens: 726 - rank: 4 max len: 807 min len: 627 avg len: 734.4 num_loss_counted_tokens: 5736
total tokens: 8096 num samples: 11 num padding tokens: 777 - rank: 4 max len: 736 min len: 594 avg len: 665.3636363636364 num_loss_counted_tokens: 4205
total tokens: 7560 num samples: 10 num padding tokens: 629 - rank: 4 max len: 756 min len: 626 avg len: 693.1 num_loss_counted_tokens: 4492
total tokens: 7998 num samples: 31 num padding tokens: 3034 - rank: 7 max len: 258 min len: 74 avg len: 160.1290322580645 num_loss_counted_tokens: 2091
total tokens: 8090 num samples: 10 num padding tokens: 409 - rank: 4 max len: 809 min len: 692 avg len: 768.1 num_loss_counted_tokens: 5031
total tokens: 7740 num samples: 18 num padding tokens: 1725 - rank: 6 max len: 430 min len: 265 avg len: 334.1666666666667 num_loss_counted_tokens: 3515
total tokens: 8080 num samples: 5 num padding tokens: 765 - rank: 1 max len: 1616 min len: 1290 avg len: 1463.0 num_loss_counted_tokens: 2912
total tokens: 7904 num samples: 32 num padding tokens: 2758 - rank: 7 max len: 247 min len: 84 avg len: 160.8125 num_loss_counted_tokens: 2044
total tokens: 7461 num samples: 9 num padding tokens: 766 - rank: 4 max len: 829 min len: 675 avg len: 743.8888888888889 num_loss_counted_tokens: 4179
total tokens: 6168 num samples: 2 num padding tokens: 447 - rank: 1 max len: 3084 min len: 2637 avg len: 2860.5 num_loss_counted_tokens: 177
total tokens: 7395 num samples: 29 num padding tokens: 2459 - rank: 7 max len: 255 min len: 81 avg len: 170.20689655172413 num_loss_counted_tokens: 1981
total tokens: 6830 num samples: 5 num padding tokens: 395 - rank: 1 max len: 1366 min len: 1223 avg len: 1287.0 num_loss_counted_tokens: 2516
total tokens: 7786 num samples: 17 num padding tokens: 1200 - rank: 6 max len: 458 min len: 290 avg len: 387.4117647058824 num_loss_counted_tokens: 3600
total tokens: 6888 num samples: 24 num padding tokens: 2293 - rank: 7 max len: 287 min len: 81 avg len: 191.45833333333334 num_loss_counted_tokens: 2153
total tokens: 7627 num samples: 29 num padding tokens: 2769 - rank: 7 max len: 263 min len: 78 avg len: 167.51724137931035 num_loss_counted_tokens: 2159
total tokens: 5475 num samples: 25 num padding tokens: 1896 - rank: 7 max len: 219 min len: 81 avg len: 143.16 num_loss_counted_tokens: 1372
total tokens: 6916 num samples: 28 num padding tokens: 2558 - rank: 7 max len: 247 min len: 77 avg len: 155.64285714285714 num_loss_counted_tokens: 1696
total tokens: 7304 num samples: 8 num padding tokens: 673 - rank: 4 max len: 913 min len: 718 avg len: 828.875 num_loss_counted_tokens: 4366
total tokens: 7950 num samples: 10 num padding tokens: 340 - rank: 4 max len: 795 min len: 724 avg len: 761.0 num_loss_counted_tokens: 5963
total tokens: 7964 num samples: 11 num padding tokens: 504 - rank: 4 max len: 724 min len: 630 avg len: 678.1818181818181 num_loss_counted_tokens: 4558
total tokens: 7410 num samples: 30 num padding tokens: 2942 - rank: 7 max len: 247 min len: 75 avg len: 148.93333333333334 num_loss_counted_tokens: 1669
total tokens: 7630 num samples: 7 num padding tokens: 606 - rank: 4 max len: 1090 min len: 831 avg len: 1003.4285714285714 num_loss_counted_tokens: 3943
total tokens: 7368 num samples: 4 num padding tokens: 761 - rank: 2 max len: 1842 min len: 1539 avg len: 1651.75 num_loss_counted_tokens: 2748
total tokens: 7596 num samples: 6 num padding tokens: 1064 - rank: 2 max len: 1266 min len: 985 avg len: 1088.6666666666667 num_loss_counted_tokens: 3149
total tokens: 7623 num samples: 11 num padding tokens: 1032 - rank: 5 max len: 693 min len: 517 avg len: 599.1818181818181 num_loss_counted_tokens: 4483
total tokens: 7410 num samples: 10 num padding tokens: 803 - rank: 5 max len: 741 min len: 563 avg len: 660.7 num_loss_counted_tokens: 5017
total tokens: 7969 num samples: 13 num padding tokens: 1180 - rank: 5 max len: 613 min len: 448 avg len: 522.2307692307693 num_loss_counted_tokens: 4157
total tokens: 7596 num samples: 9 num padding tokens: 893 - rank: 5 max len: 844 min len: 632 avg len: 744.7777777777778 num_loss_counted_tokens: 4339
total tokens: 7044 num samples: 4 num padding tokens: 776 - rank: 2 max len: 1761 min len: 1377 avg len: 1567.0 num_loss_counted_tokens: 1196
total tokens: 7678 num samples: 11 num padding tokens: 1417 - rank: 5 max len: 698 min len: 476 avg len: 569.1818181818181 num_loss_counted_tokens: 4381
total tokens: 7576 num samples: 4 num padding tokens: 2237 - rank: 2 max len: 1894 min len: 1097 avg len: 1334.75 num_loss_counted_tokens: 2139
total tokens: 7656 num samples: 11 num padding tokens: 1011 - rank: 5 max len: 696 min len: 496 avg len: 604.0909090909091 num_loss_counted_tokens: 4175
total tokens: 7204 num samples: 4 num padding tokens: 443 - rank: 2 max len: 1801 min len: 1576 avg len: 1690.25 num_loss_counted_tokens: 2660
total tokens: 8073 num samples: 13 num padding tokens: 925 - rank: 5 max len: 621 min len: 461 avg len: 549.8461538461538 num_loss_counted_tokens: 5127
total tokens: 8076 num samples: 4 num padding tokens: 1351 - rank: 2 max len: 2019 min len: 1214 avg len: 1681.25 num_loss_counted_tokens: 936
total tokens: 7032 num samples: 6 num padding tokens: 478 - rank: 2 max len: 1172 min len: 1016 avg len: 1092.3333333333333 num_loss_counted_tokens: 3008
total tokens: 7917 num samples: 13 num padding tokens: 1158 - rank: 5 max len: 609 min len: 429 avg len: 519.9230769230769 num_loss_counted_tokens: 4603
total tokens: 7329 num samples: 7 num padding tokens: 546 - rank: 2 max len: 1047 min len: 911 avg len: 969.0 num_loss_counted_tokens: 3492
total tokens: 7020 num samples: 5 num padding tokens: 506 - rank: 2 max len: 1404 min len: 1134 avg len: 1302.8 num_loss_counted_tokens: 3834
total tokens: 7692 num samples: 6 num padding tokens: 974 - rank: 2 max len: 1282 min len: 958 avg len: 1119.6666666666667 num_loss_counted_tokens: 3913
total tokens: 7667 num samples: 11 num padding tokens: 908 - rank: 5 max len: 697 min len: 509 avg len: 614.4545454545455 num_loss_counted_tokens: 3445
total tokens: 8099 num samples: 13 num padding tokens: 1144 - rank: 5 max len: 623 min len: 461 avg len: 535.0 num_loss_counted_tokens: 4390
total tokens: 5862 num samples: 2 num padding tokens: 136 - rank: 0 max len: 2931 min len: 2795 avg len: 2863.0 num_loss_counted_tokens: 203
total tokens: 5944 num samples: 2 num padding tokens: 101 - rank: 0 max len: 2972 min len: 2871 avg len: 2921.5 num_loss_counted_tokens: 163
total tokens: 6814 num samples: 2 num padding tokens: 689 - rank: 0 max len: 3407 min len: 2718 avg len: 3062.5 num_loss_counted_tokens: 1104
total tokens: 7872 num samples: 12 num padding tokens: 1453 - rank: 5 max len: 656 min len: 432 avg len: 534.9166666666666 num_loss_counted_tokens: 5009
total tokens: 5966 num samples: 2 num padding tokens: 300 - rank: 0 max len: 2983 min len: 2683 avg len: 2833.0 num_loss_counted_tokens: 223
total tokens: 7100 num samples: 5 num padding tokens: 1259 - rank: 3 max len: 1420 min len: 1008 avg len: 1168.2 num_loss_counted_tokens: 3506
total tokens: 7858 num samples: 2 num padding tokens: 1075 - rank: 0 max len: 3929 min len: 2854 avg len: 3391.5 num_loss_counted_tokens: 419
total tokens: 6586 num samples: 2 num padding tokens: 534 - rank: 0 max len: 3293 min len: 2759 avg len: 3026.0 num_loss_counted_tokens: 208
total tokens: 6802 num samples: 2 num padding tokens: 582 - rank: 0 max len: 3401 min len: 2819 avg len: 3110.0 num_loss_counted_tokens: 197
total tokens: 8076 num samples: 12 num padding tokens: 941 - rank: 5 max len: 673 min len: 519 avg len: 594.5833333333334 num_loss_counted_tokens: 4341
total tokens: 8021 num samples: 13 num padding tokens: 958 - rank: 5 max len: 617 min len: 492 avg len: 543.3076923076923 num_loss_counted_tokens: 5546
total tokens: 7404 num samples: 6 num padding tokens: 691 - rank: 2 max len: 1234 min len: 1037 avg len: 1118.8333333333333 num_loss_counted_tokens: 4452
total tokens: 6990 num samples: 6 num padding tokens: 455 - rank: 3 max len: 1165 min len: 1010 avg len: 1089.1666666666667 num_loss_counted_tokens: 2326
total tokens: 6835 num samples: 5 num padding tokens: 883 - rank: 2 max len: 1367 min len: 1028 avg len: 1190.4 num_loss_counted_tokens: 1769
total tokens: 6480 num samples: 3 num padding tokens: 1330 - rank: 2 max len: 2160 min len: 1455 avg len: 1716.6666666666667 num_loss_counted_tokens: 3402
total tokens: 7852 num samples: 13 num padding tokens: 1008 - rank: 5 max len: 604 min len: 462 avg len: 526.4615384615385 num_loss_counted_tokens: 4343
total tokens: 6516 num samples: 4 num padding tokens: 643 - rank: 2 max len: 1629 min len: 1320 avg len: 1468.25 num_loss_counted_tokens: 3036
total tokens: 7765 num samples: 5 num padding tokens: 937 - rank: 2 max len: 1553 min len: 1107 avg len: 1365.6 num_loss_counted_tokens: 2840
total tokens: 7014 num samples: 6 num padding tokens: 643 - rank: 2 max len: 1169 min len: 980 avg len: 1061.8333333333333 num_loss_counted_tokens: 3641
total tokens: 8106 num samples: 7 num padding tokens: 448 - rank: 2 max len: 1158 min len: 1032 avg len: 1094.0 num_loss_counted_tokens: 3634
total tokens: 7014 num samples: 3 num padding tokens: 944 - rank: 0 max len: 2338 min len: 1749 avg len: 2023.3333333333333 num_loss_counted_tokens: 1782
total tokens: 7696 num samples: 8 num padding tokens: 623 - rank: 3 max len: 962 min len: 803 avg len: 884.125 num_loss_counted_tokens: 4778
total tokens: 7618 num samples: 13 num padding tokens: 886 - rank: 5 max len: 586 min len: 437 avg len: 517.8461538461538 num_loss_counted_tokens: 3929
total tokens: 7155 num samples: 5 num padding tokens: 513 - rank: 3 max len: 1431 min len: 1168 avg len: 1328.4 num_loss_counted_tokens: 3100 | |
total tokens: 7350 num samples: 7 num padding tokens: 751 - rank: 3 max len: 1050 min len: 873 avg len: 942.7142857142857 num_loss_counted_tokens: 4426 | |
total tokens: 7744 num samples: 8 num padding tokens: 283 - rank: 3 max len: 968 min len: 869 avg len: 932.625 num_loss_counted_tokens: 4872 | |
total tokens: 5448 num samples: 2 num padding tokens: 825 - rank: 0 max len: 2724 min len: 1899 avg len: 2311.5 num_loss_counted_tokens: 314 | |
total tokens: 7836 num samples: 6 num padding tokens: 1317 - rank: 3 max len: 1306 min len: 968 avg len: 1086.5 num_loss_counted_tokens: 4937 | |
total tokens: 7854 num samples: 11 num padding tokens: 871 - rank: 5 max len: 714 min len: 532 avg len: 634.8181818181819 num_loss_counted_tokens: 4452 | |
total tokens: 7788 num samples: 3 num padding tokens: 965 - rank: 0 max len: 2596 min len: 1888 avg len: 2274.3333333333335 num_loss_counted_tokens: 304 | |
total tokens: 5614 num samples: 2 num padding tokens: 364 - rank: 0 max len: 2807 min len: 2443 avg len: 2625.0 num_loss_counted_tokens: 241 | |
total tokens: 7086 num samples: 3 num padding tokens: 1107 - rank: 0 max len: 2362 min len: 1776 avg len: 1993.0 num_loss_counted_tokens: 301 | |
total tokens: 8037 num samples: 9 num padding tokens: 856 - rank: 3 max len: 893 min len: 698 avg len: 797.8888888888889 num_loss_counted_tokens: 6093 | |
total tokens: 5792 num samples: 2 num padding tokens: 18 - rank: 0 max len: 2896 min len: 2878 avg len: 2887.0 num_loss_counted_tokens: 176 | |
total tokens: 6306 num samples: 2 num padding tokens: 290 - rank: 0 max len: 3153 min len: 2863 avg len: 3008.0 num_loss_counted_tokens: 181 | |
total tokens: 7942 num samples: 11 num padding tokens: 1739 - rank: 5 max len: 722 min len: 430 avg len: 563.9090909090909 num_loss_counted_tokens: 2898 | |
total tokens: 7796 num samples: 4 num padding tokens: 715 - rank: 0 max len: 1949 min len: 1412 avg len: 1770.25 num_loss_counted_tokens: 1543 | |
total tokens: 7984 num samples: 8 num padding tokens: 514 - rank: 3 max len: 998 min len: 886 avg len: 933.75 num_loss_counted_tokens: 3889 | |
total tokens: 7248 num samples: 8 num padding tokens: 747 - rank: 3 max len: 906 min len: 734 avg len: 812.625 num_loss_counted_tokens: 5216 | |
total tokens: 6489 num samples: 3 num padding tokens: 267 - rank: 0 max len: 2163 min len: 1989 avg len: 2074.0 num_loss_counted_tokens: 774 | |
total tokens: 8000 num samples: 8 num padding tokens: 939 - rank: 3 max len: 1000 min len: 809 avg len: 882.625 num_loss_counted_tokens: 4397 | |
total tokens: 7146 num samples: 6 num padding tokens: 436 - rank: 3 max len: 1191 min len: 1062 avg len: 1118.3333333333333 num_loss_counted_tokens: 3632 | |
total tokens: 7693 num samples: 7 num padding tokens: 1196 - rank: 3 max len: 1099 min len: 833 avg len: 928.1428571428571 num_loss_counted_tokens: 4428 | |
total tokens: 7592 num samples: 8 num padding tokens: 548 - rank: 3 max len: 949 min len: 799 avg len: 880.5 num_loss_counted_tokens: 5791 | |
total tokens: 8064 num samples: 8 num padding tokens: 1193 - rank: 3 max len: 1008 min len: 741 avg len: 858.875 num_loss_counted_tokens: 5624 | |
total tokens: 7960 num samples: 8 num padding tokens: 519 - rank: 3 max len: 995 min len: 797 avg len: 930.125 num_loss_counted_tokens: 4667 | |
total tokens: 7140 num samples: 5 num padding tokens: 805 - rank: 3 max len: 1428 min len: 1157 avg len: 1267.0 num_loss_counted_tokens: 2481 | |
total tokens: 7376 num samples: 2 num padding tokens: 122 - rank: 0 max len: 3688 min len: 3566 avg len: 3627.0 num_loss_counted_tokens: 334 | |
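The per-rank batch-statistics lines can be sanity-checked arithmetically. Judging from the logged numbers themselves (not from InstructLab documentation), `total tokens` equals `num samples × max len`, i.e. every sample in a micro-batch appears to be padded to the longest one. `num padding tokens` equals `total tokens − num samples × avg len`. For the line `total tokens: 7329 num samples: 7 ...`, that gives 7 × 1047 = 7329 and 7329 − 7 × 969.0 = 546. Below is a minimal Python sketch that parses one such line and checks both identities. The regex and the `check_batch_line` helper are hypothetical, written only for this log format.

```python
import re

# Hypothetical pattern for the per-rank batch-statistics lines in this log.
BATCH_RE = re.compile(
    r"total tokens: (?P<total>\d+) num samples: (?P<n>\d+) "
    r"num padding tokens: (?P<pad>\d+) - rank: (?P<rank>\d+) "
    r"max len: (?P<max_len>\d+) min len: (?P<min_len>\d+) "
    r"avg len: (?P<avg_len>[\d.]+) num_loss_counted_tokens: (?P<loss_tokens>\d+)"
)


def check_batch_line(line: str) -> dict:
    """Parse one batch-statistics line and check the padding arithmetic."""
    m = BATCH_RE.search(line)
    if m is None:
        raise ValueError(f"not a batch-statistics line: {line!r}")
    total, n, pad = int(m["total"]), int(m["n"]), int(m["pad"])
    max_len, avg_len = int(m["max_len"]), float(m["avg_len"])
    return {
        "rank": int(m["rank"]),
        # Every sample appears to be padded up to the longest one in the batch.
        "total_is_n_times_max": total == n * max_len,
        # Real tokens = n * avg_len (rounded to absorb float error); the rest is padding.
        "padding_matches": pad == total - round(n * avg_len),
        "padding_fraction": pad / total,
    }


line = ("total tokens: 7329 num samples: 7 num padding tokens: 546 - rank: 2 "
        "max len: 1047 min len: 911 avg len: 969.0 num_loss_counted_tokens: 3492")
print(check_batch_line(line))
```

The same check can be mapped over every `total tokens:` line to see how much of each micro-batch is padding on each rank.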
Per-token loss scaled by world size: 2.2650606297247577e-06
Per-token loss scaled by world size: 0.0005290773115120828
Per-token loss scaled by world size: 0.00031778833363205194
Per-token loss scaled by world size: 0.0002596491831354797
Per-token loss scaled by world size: 0.00032042598468251526
Per-token loss scaled by world size: 0.00037021367461420596
Per-token loss scaled by world size: 3.6662072488979902e-06
Epoch: 0, Step: 1, Rank: 3, loss = 0.8319301605224609
Epoch: 0, Step: 1, Rank: 2, loss = 0.6797291040420532
Epoch: 0, Step: 1, Rank: 5, loss = 1.3850582838058472
Epoch: 0, Step: 1, Rank: 7, loss = 0.8388351798057556
Epoch: 0, Step: 1, Rank: 1, loss = 0.005929645616561174
Epoch: 0, Step: 1, Rank: 4, loss = 0.9691731333732605
Epoch: 0, Step: 1, Rank: 0, loss = 0.009597672149538994
Per-token loss scaled by world size: 0.0004498241178225726
Epoch: 0, Step: 1, Rank: 6, loss = 1.1775833368301392
Epoch 0: 1%| | 1/121 [00:03<06:45, 3.38s/it]
{
  "epoch": 0,
  "step": 1,
  "rank": 0,
  "loss": 0.009597672149538994,
  "overall_throughput": 35.709783908823596,
  "lr": 0.0,
  "cuda_mem_allocated": 17.990560054779053,
  "cuda_malloc_retries": 0,
  "num_loss_counted_tokens": 20943,
  "batch_size": 70,
  "total_loss": 0.737229585647583,
  "gradnorm": null,
  "weight_norm": 433.0432434082031,
  "timestamp": "2024-08-18T20:48:05.199538"
}
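In the step-1 record, `total_loss` (0.737229585647583) appears to be the arithmetic mean of the eight per-rank `loss` values printed for step 1. Those values sum to about 5.8978, and 5.8978 / 8 ≈ 0.73723, which matches up to float32 rounding. The record's own `loss` field is just rank 0's value. This is inferred from the numbers, not from the training library's documentation. The sketch below extracts `Epoch/Step/Rank/loss` records and averages them per step. It also handles records that were written onto one line without a newline. The regex and the `mean_loss_per_step` helper are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical record pattern; finditer also separates records that were fused
# onto one line by concurrent writers (e.g. "...loss = 0.67Epoch: 0, ...").
STEP_RE = re.compile(
    r"Epoch: (\d+), Step: (\d+), Rank: (\d+), loss = (\d+\.\d+(?:[eE][-+]?\d+)?)"
)


def mean_loss_per_step(lines):
    """Return {(epoch, step): mean of the per-rank losses} for the given lines."""
    per_step = defaultdict(dict)  # (epoch, step) -> {rank: loss}
    for line in lines:
        for m in STEP_RE.finditer(line):
            epoch, step, rank = int(m[1]), int(m[2]), int(m[3])
            per_step[(epoch, step)][rank] = float(m[4])
    return {key: sum(r.values()) / len(r) for key, r in per_step.items()}


# Step-1 lines as originally logged; adjacent string literals are concatenated
# on purpose to reproduce the two fused lines.
step1_lines = [
    "Epoch: 0, Step: 1, Rank: 3, loss = 0.8319301605224609",
    "Epoch: 0, Step: 1, Rank: 2, loss = 0.6797291040420532"
    "Epoch: 0, Step: 1, Rank: 5, loss = 1.3850582838058472",
    "Epoch: 0, Step: 1, Rank: 7, loss = 0.8388351798057556"
    "Epoch: 0, Step: 1, Rank: 1, loss = 0.005929645616561174",
    "Epoch: 0, Step: 1, Rank: 4, loss = 0.9691731333732605",
    "Epoch: 0, Step: 1, Rank: 0, loss = 0.009597672149538994",
    "Epoch: 0, Step: 1, Rank: 6, loss = 1.1775833368301392",
]
print(mean_loss_per_step(step1_lines))  # compare with "total_loss" in the JSON record
```

Keying by rank inside each step means a record that is accidentally seen twice does not skew the average.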
total tokens: 6234 num samples: 3 num padding tokens: 200 - rank: 2 max len: 2078 min len: 1890 avg len: 2011.3333333333333 num_loss_counted_tokens: 1341
total tokens: 5714 num samples: 2 num padding tokens: 5 - rank: 0 max len: 2857 min len: 2852 avg len: 2854.5 num_loss_counted_tokens: 145
total tokens: 7392 num samples: 8 num padding tokens: 1326 - rank: 5 max len: 924 min len: 561 avg len: 758.25 num_loss_counted_tokens: 3044
total tokens: 7340 num samples: 4 num padding tokens: 699 - rank: 3 max len: 1835 min len: 1455 avg len: 1660.25 num_loss_counted_tokens: 2623
total tokens: 7627 num samples: 29 num padding tokens: 3037 - rank: 7 max len: 263 min len: 77 avg len: 158.27586206896552 num_loss_counted_tokens: 1957
total tokens: 7242 num samples: 3 num padding tokens: 171 - rank: 1 max len: 2414 min len: 2311 avg len: 2357.0 num_loss_counted_tokens: 254
total tokens: 7125 num samples: 5 num padding tokens: 1031 - rank: 4 max len: 1425 min len: 945 avg len: 1218.8 num_loss_counted_tokens: 3947
total tokens: 8025 num samples: 15 num padding tokens: 2508 - rank: 6 max len: 535 min len: 266 avg len: 367.8 num_loss_counted_tokens: 3113
Per-token loss scaled by world size: 0.00031914791907183826
Per-token loss scaled by world size: 0.0003141801571473479
Per-token loss scaled by world size: 0.0003882426244672388
Per-token loss scaled by world size: 0.00020227984350640327
Per-token loss scaled by world size: 5.1077040552627295e-05
Per-token loss scaled by world size: 5.200964369578287e-05
Per-token loss scaled by world size: 0.0002763153170235455
Epoch: 0, Step: 2, Rank: 4, loss = 0.963906466960907
Epoch: 0, Step: 2, Rank: 3, loss = 0.9489026069641113
Epoch: 0, Step: 2, Rank: 2, loss = 0.6109356880187988
Epoch: 0, Step: 2, Rank: 5, loss = 1.1725897789001465
Epoch: 0, Step: 2, Rank: 0, loss = 0.15426543354988098
Epoch: 0, Step: 2, Rank: 7, loss = 0.8345413208007812
Per-token loss scaled by world size: 0.0004248657787684351
Epoch: 0, Step: 2, Rank: 1, loss = 0.15708212554454803
Epoch: 0, Step: 2, Rank: 6, loss = 1.2832008600234985
Epoch 0: 2%|▏ | 2/121 [00:05<05:38, 2.85s/it]
total tokens: 7986 num samples: 11 num padding tokens: 734 - rank: 4 max len: 726 min len: 605 avg len: 659.2727272727273 num_loss_counted_tokens: 4132
{
  "epoch": 0,
  "step": 2,
  "rank": 0,
  "loss": 0.15426543354988098,
  "overall_throughput": 43.2007637745835,
  "lr": 0.0,
  "cuda_mem_allocated": 18.104323863983154,
  "cuda_malloc_retries": 0,
  "num_loss_counted_tokens": 24162,
  "batch_size": 93,
  "total_loss": 0.7656780481338501,
  "gradnorm": null,
  "weight_norm": 433.0432434082031,
  "timestamp": "2024-08-18T20:48:07.698724"
}
total tokens: 7735 num samples: 17 num padding tokens: 1736 - rank: 6 max len: 455 min len: 245 avg len: 352.88235294117646 num_loss_counted_tokens: 3396
total tokens: 5966 num samples: 2 num padding tokens: 555 - rank: 0 max len: 2983 min len: 2428 avg len: 2705.5 num_loss_counted_tokens: 181
total tokens: 7852 num samples: 13 num padding tokens: 1205 - rank: 5 max len: 604 min len: 464 avg len: 511.3076923076923 num_loss_counted_tokens: 4550
total tokens: 7928 num samples: 4 num padding tokens: 535 - rank: 1 max len: 1982 min len: 1721 avg len: 1848.25 num_loss_counted_tokens: 2525
total tokens: 7836 num samples: 6 num padding tokens: 1664 - rank: 2 max len: 1306 min len: 926 avg len: 1028.6666666666667 num_loss_counted_tokens: 3620
total tokens: 7821 num samples: 9 num padding tokens: 747 - rank: 3 max len: 869 min len: 729 avg len: 786.0 num_loss_counted_tokens: 5513
total tokens: 4598 num samples: 19 num padding tokens: 1719 - rank: 7 max len: 242 min len: 75 avg len: 151.52631578947367 num_loss_counted_tokens: 1068
Per-token loss scaled by world size: 0.00018360439571551979
Per-token loss scaled by world size: 0.0003279669035691768
Per-token loss scaled by world size: 2.2890385480422992e-06
Per-token loss scaled by world size: 6.500220479210839e-05
Per-token loss scaled by world size: 0.00032116335933096707
Per-token loss scaled by world size: 0.00036416525836102664
Per-token loss scaled by world size: 0.0005080102127976716
Epoch: 0, Step: 3, Rank: 5, loss = 0.84148108959198
Epoch: 0, Step: 3, Rank: 3, loss = 0.47108298540115356
Epoch: 0, Step: 3, Rank: 1, loss = 0.16677941381931305
Epoch: 0, Step: 3, Rank: 4, loss = 0.8240249156951904
Epoch: 0, Step: 3, Rank: 0, loss = 0.005873100366443396
Epoch: 0, Step: 3, Rank: 6, loss = 1.3034272193908691
Per-token loss scaled by world size: 7.88167308201082e-05
Epoch: 0, Step: 3, Rank: 7, loss = 0.9343570470809937
Epoch: 0, Step: 3, Rank: 2, loss = 0.2022240310907364
Epoch 0: 2%|▏ | 3/121 [00:08<05:19, 2.70s/it]
{
  "epoch": 0,
  "step": 3,
  "rank": 0,
  "loss": 0.005873100366443396,
  "overall_throughput": 42.42993987932287,
  "lr": 0.0,
  "cuda_mem_allocated": 18.00035810470581,
  "cuda_malloc_retries": 0,
  "num_loss_counted_tokens": 20526,
  "batch_size": 75,
  "total_loss": 0.5936562418937683,
  "gradnorm": null,
  "weight_norm": 433.0432434082031,
  "timestamp": "2024-08-18T20:48:10.210211"
}
total tokens: 6324 num samples: 2 num padding tokens: 135 - rank: 1 max len: 3162 min len: 3027 avg len: 3094.5 num_loss_counted_tokens: 175
total tokens: 7644 num samples: 7 num padding tokens: 892 - rank: 4 max len: 1092 min len: 825 avg len: 964.5714285714286 num_loss_counted_tokens: 4646
total tokens: 7192 num samples: 2 num padding tokens: 318 - rank: 0 max len: 3596 min len: 3278 avg len: 3437.0 num_loss_counted_tokens: 213
total tokens: 7620 num samples: 10 num padding tokens: 1177 - rank: 5 max len: 762 min len: 501 avg len: 644.3 num_loss_counted_tokens: 4818
total tokens: 7776 num samples: 16 num padding tokens: 1778 - rank: 6 max len: 486 min len: 269 avg len: 374.875 num_loss_counted_tokens: 3462
total tokens: 8060 num samples: 31 num padding tokens: 2946 - rank: 7 max len: 260 min len: 79 avg len: 164.96774193548387 num_loss_counted_tokens: 2135
total tokens: 8095 num samples: 5 num padding tokens: 1374 - rank: 3 max len: 1619 min len: 1120 avg len: 1344.2 num_loss_counted_tokens: 2765
total tokens: 7320 num samples: 3 num padding tokens: 432 - rank: 2 max len: 2440 min len: 2027 avg len: 2296.0 num_loss_counted_tokens: 857
Per-token loss scaled by world size: 0.0004511360311880708
Per-token loss scaled by world size: 0.0004869260301347822
Per-token loss scaled by world size: 4.640718543669209e-05
Per-token loss scaled by world size: 8.355799946002662e-05
Per-token loss scaled by world size: 6.561249392689206e-06
Per-token loss scaled by world size: 0.00017396389739587903
Epoch: 0, Step: 4, Rank: 1, loss = 0.12441766262054443
Epoch: 0, Step: 4, Rank: 6, loss = 1.2094956636428833
Epoch: 0, Step: 4, Rank: 5, loss = 1.3054486513137817
Epoch: 0, Step: 4, Rank: 2, loss = 0.22401900589466095
Epoch: 0, Step: 4, Rank: 0, loss = 0.017590709030628204
Per-token loss scaled by world size: 0.00043431558879092336
Epoch: 0, Step: 4, Rank: 7, loss = 0.4663971960544586
Per-token loss scaled by world size: 0.00029926959541626275
Epoch: 0, Step: 4, Rank: 4, loss = 1.1644001007080078
Epoch: 0, Step: 4, Rank: 3, loss = 0.8023418188095093
Epoch 0: 3%|▎ | 4/121 [00:10<05:08, 2.63s/it]
total tokens: 7940 num samples: 10 num padding tokens: 987 - rank: 4 max len: 794 min len: 627 avg len: 695.3 num_loss_counted_tokens: 4306
{
  "epoch": 0,
  "step": 4,
  "rank": 0,
  "loss": 0.017590709030628204,
  "overall_throughput": 42.48427743949919,
  "lr": 0.0,
  "cuda_mem_allocated": 18.00298833847046,
  "cuda_malloc_retries": 0,
  "num_loss_counted_tokens": 21448,
  "batch_size": 75,
  "total_loss": 0.664263904094696,
  "gradnorm": null,
  "weight_norm": 433.0432434082031,
  "timestamp": "2024-08-18T20:48:12.736014"
}
total tokens: 7260 num samples: 4 num padding tokens: 876 - rank: 1 max len: 1815 min len: 1442 avg len: 1596.0 num_loss_counted_tokens: 1808
total tokens: 8097 num samples: 3 num padding tokens: 887 - rank: 0 max len: 2699 min len: 2231 avg len: 2403.3333333333335 num_loss_counted_tokens: 260
total tokens: 7776 num samples: 32 num padding tokens: 2768 - rank: 7 max len: 243 min len: 77 avg len: 156.5 num_loss_counted_tokens: 1979
total tokens: 6895 num samples: 5 num padding tokens: 531 - rank: 2 max len: 1379 min len: 1160 avg len: 1272.8 num_loss_counted_tokens: 3055
total tokens: 7923 num samples: 19 num padding tokens: 1608 - rank: 6 max len: 417 min len: 252 avg len: 332.36842105263156 num_loss_counted_tokens: 3042
total tokens: 7512 num samples: 8 num padding tokens: 560 - rank: 3 max len: 939 min len: 795 avg len: 869.0 num_loss_counted_tokens: 4478
total tokens: 8047 num samples: 13 num padding tokens: 1167 - rank: 5 max len: 619 min len: 439 avg len: 529.2307692307693 num_loss_counted_tokens: 4239
Per-token loss scaled by world size: 0.00024933897657319903
Per-token loss scaled by world size: 0.000386894796974957
Per-token loss scaled by world size: 0.00021959797595627606
Per-token loss scaled by world size: 3.401555431992165e-06
Per-token loss scaled by world size: 5.781253548775567e-06
Per-token loss scaled by world size: 0.00047684554010629654
Per-token loss scaled by world size: 0.0002837673237081617
Epoch: 0, Step: 5, Rank: 4, loss = 1.0060231685638428
Epoch: 0, Step: 5, Rank: 2, loss = 0.6483436822891235
Epoch: 0, Step: 5, Rank: 0, loss = 0.008844894357025623
Epoch: 0, Step: 5, Rank: 3, loss = 0.571009635925293
Epoch: 0, Step: 5, Rank: 1, loss = 0.015032704919576645
Epoch: 0, Step: 5, Rank: 5, loss = 1.2399176359176636
Epoch: 0, Step: 5, Rank: 7, loss = 0.7378659844398499
Per-token loss scaled by world size: 0.00046695370110683143
Epoch: 0, Step: 5, Rank: 6, loss = 1.2141963243484497
Epoch 0: 4%|▍ | 5/121 [00:13<04:59, 2.58s/it]
total tokens: 7651 num samples: 7 num padding tokens: 746 - rank: 4 max len: 1093 min len: 866 avg len: 986.4285714285714 num_loss_counted_tokens: 5678
{
  "epoch": 0,
  "step": 5,
  "rank": 0,
  "loss": 0.008844894357025623,
  "overall_throughput": 43.14041038651036,
  "lr": 0.0,
  "cuda_mem_allocated": 18.102890491485596,
  "cuda_malloc_retries": 0,
  "num_loss_counted_tokens": 20802,
  "batch_size": 80,
  "total_loss": 0.6801542043685913,
  "gradnorm": null,
  "weight_norm": 433.0432434082031,
  "timestamp": "2024-08-18T20:48:15.218258"
}
total tokens: 7038 num samples: 2 num padding tokens: 1091 - rank: 1 max len: 3519 min len: 2428 avg len: 2973.5 num_loss_counted_tokens: 159
total tokens: 8028 num samples: 2 num padding tokens: 123 - rank: 0 max len: 4014 min len: 3891 avg len: 3952.5 num_loss_counted_tokens: 168
total tokens: 7596 num samples: 12 num padding tokens: 1879 - rank: 6 max len: 633 min len: 336 avg len: 476.4166666666667 num_loss_counted_tokens: 3821
total tokens: 8064 num samples: 24 num padding tokens: 3562 - rank: 7 max len: 336 min len: 89 avg len: 187.58333333333334 num_loss_counted_tokens: 1860
total tokens: 7970 num samples: 5 num padding tokens: 726 - rank: 3 max len: 1594 min len: 1187 avg len: 1448.8 num_loss_counted_tokens: 2543
total tokens: 7890 num samples: 10 num padding tokens: 666 - rank: 5 max len: 789 min len: 634 avg len: 722.4 num_loss_counted_tokens: 3801
total tokens: 7000 num samples: 4 num padding tokens: 270 - rank: 2 max len: 1750 min len: 1627 avg len: 1682.5 num_loss_counted_tokens: 759
Per-token loss scaled by world size: 0.0001642795541556552
Per-token loss scaled by world size: 0.00021280848886817694
Per-token loss scaled by world size: 0.00032824286608956754
Per-token loss scaled by world size: 8.065341717156116e-06
Per-token loss scaled by world size: 3.2945732527878135e-05
Per-token loss scaled by world size: 0.00023678457364439964
Per-token loss scaled by world size: 0.00048681392217986286
Epoch: 0, Step: 6, Rank: 2, loss = 0.5886548757553101
Epoch: 0, Step: 6, Rank: 4, loss = 0.907960832118988
Epoch: 0, Step: 6, Rank: 3, loss = 0.45441779494285583
Epoch: 0, Step: 6, Rank: 0, loss = 0.022309742867946625
Epoch: 0, Step: 6, Rank: 1, loss = 0.0911320149898529
Epoch: 0, Step: 6, Rank: 6, loss = 1.346588134765625
Epoch: 0, Step: 6, Rank: 7, loss = 0.6549757122993469
Per-token loss scaled by world size: 0.0005791043513454497
Epoch: 0, Step: 6, Rank: 5, loss = 1.6018750667572021
Epoch 0: 5%|▍ | 6/121 [00:15<04:56, 2.58s/it]
total tokens: 7911 num samples: 9 num padding tokens: 476 - rank: 4 max len: 879 min len: 707 avg len: 826.1111111111111 num_loss_counted_tokens: 6136
total tokens: 6741 num samples: 3 num padding tokens: 433 - rank: 1 max len: 2247 min len: 1894 avg len: 2102.6666666666665 num_loss_counted_tokens: 1931
total tokens: 7502 num samples: 11 num padding tokens: 1303 - rank: 5 max len: 682 min len: 424 avg len: 563.5454545454545 num_loss_counted_tokens: 3560
{
  "epoch": 0,
  "step": 6,
  "rank": 0,
  "loss": 0.022309742867946625,
  "overall_throughput": 41.65718757774177,
  "lr": 0.0,
  "cuda_mem_allocated": 18.077077388763428,
  "cuda_malloc_retries": 0,
  "num_loss_counted_tokens": 22129,
  "batch_size": 78,
  "total_loss": 0.7084892988204956,
  "gradnorm": null,
  "weight_norm": 433.0432434082031,
  "timestamp": "2024-08-18T20:48:17.804745"
}
total tokens: 8020 num samples: 20 num padding tokens: 1280 - rank: 6 max len: 401 min len: 273 avg len: 337.0 num_loss_counted_tokens: 3403
total tokens: 7714 num samples: 29 num padding tokens: 2377 - rank: 7 max len: 266 min len: 79 avg len: 184.0344827586207 num_loss_counted_tokens: 2540
total tokens: 6716 num samples: 4 num padding tokens: 1107 - rank: 2 max len: 1679 min len: 1232 avg len: 1402.25 num_loss_counted_tokens: 2407
total tokens: 6960 num samples: 6 num padding tokens: 657 - rank: 3 max len: 1160 min len: 960 avg len: 1050.5 num_loss_counted_tokens: 5018
total tokens: 6996 num samples: 2 num padding tokens: 327 - rank: 0 max len: 3498 min len: 3171 avg len: 3334.5 num_loss_counted_tokens: 153
Per-token loss scaled by world size: 0.0002843443362507969
Per-token loss scaled by world size: 0.00017875904450193048
Per-token loss scaled by world size: 0.00013562251115217805
Per-token loss scaled by world size: 0.0001140675667556934
Per-token loss scaled by world size: 0.00023188847990240902
Per-token loss scaled by world size: 0.00043197604827582836
Per-token loss scaled by world size: 0.00019801303278654814
Epoch: 0, Step: 7, Rank: 6, loss = 0.8738256692886353
Epoch: 0, Step: 7, Rank: 4, loss = 0.5493488907814026
Epoch: 0, Step: 7, Rank: 0, loss = 0.4167849123477936
Epoch: 0, Step: 7, Rank: 7, loss = 0.7126222848892212
Epoch: 0, Step: 7, Rank: 1, loss = 0.35054388642311096
Epoch: 0, Step: 7, Rank: 2, loss = 0.6085187792778015
Epoch: 0, Step: 7, Rank: 5, loss = 1.3275164365768433
Per-token loss scaled by world size: 0.00021533554536290467
Epoch: 0, Step: 7, Rank: 3, loss = 0.6617530584335327
Epoch 0: 6%|▌ | 7/121 [00:18<04:50, 2.55s/it]
total tokens: 7806 num samples: 3 num padding tokens: 966 - rank: 1 max len: 2602 min len: 1917 avg len: 2280.0 num_loss_counted_tokens: 944
total tokens: 7488 num samples: 9 num padding tokens: 542 - rank: 4 max len: 832 min len: 711 avg len: 771.7777777777778 num_loss_counted_tokens: 3897
{
  "epoch": 0,
  "step": 7,
  "rank": 0,
  "loss": 0.4167849123477936,
  "overall_throughput": 43.4992160904002,
  "lr": 0.0,
  "cuda_mem_allocated": 18.127665996551514,
  "cuda_malloc_retries": 0,
  "num_loss_counted_tokens": 24585,
  "batch_size": 88,
  "total_loss": 0.6876142621040344,
  "gradnorm": null,
  "weight_norm": 433.0432434082031,
  "timestamp": "2024-08-18T20:48:20.296605"
}
total tokens: 7400 num samples: 4 num padding tokens: 570 - rank: 2 max len: 1850 min len: 1579 avg len: 1707.5 num_loss_counted_tokens: 1604
total tokens: 7920 num samples: 16 num padding tokens: 2028 - rank: 6 max len: 495 min len: 286 avg len: 368.25 num_loss_counted_tokens: 3190
total tokens: 7140 num samples: 6 num padding tokens: 1155 - rank: 3 max len: 1190 min len: 845 avg len: 997.5 num_loss_counted_tokens: 3914
total tokens: 7424 num samples: 2 num padding tokens: 302 - rank: 0 max len: 3712 min len: 3410 avg len: 3561.0 num_loss_counted_tokens: 261
total tokens: 7744 num samples: 11 num padding tokens: 907 - rank: 5 max len: 704 min len: 540 avg len: 621.5454545454545 num_loss_counted_tokens: 4091
total tokens: 7672 num samples: 28 num padding tokens: 2305 - rank: 7 max len: 274 min len: 83 avg len: 191.67857142857142 num_loss_counted_tokens: 2631
Per-token loss scaled by world size: 0.00014226992789190263
Per-token loss scaled by world size: 0.00031214559567160904
Per-token loss scaled by world size: 0.00031845251214690506
Per-token loss scaled by world size: 0.0002571563527453691
Per-token loss scaled by world size: 0.00039170257514342666
Per-token loss scaled by world size: 0.00012162854545749724
Per-token loss scaled by world size: 0.00013610723544843495
Epoch: 0, Step: 8, Rank: 6, loss = 1.0465071201324463
Epoch: 0, Step: 8, Rank: 4, loss = 1.0676518678665161
Epoch: 0, Step: 8, Rank: 2, loss = 0.47697773575782776
Epoch: 0, Step: 8, Rank: 3, loss = 0.8621488213539124
Epoch: 0, Step: 8, Rank: 1, loss = 0.4077748954296112
Epoch: 0, Step: 8, Rank: 5, loss = 1.3132318258285522
Epoch: 0, Step: 8, Rank: 7, loss = 0.4563165009021759
Per-token loss scaled by world size: 2.5762397854123265e-05
Epoch: 0, Step: 8, Rank: 0, loss = 0.08637166023254395
Epoch 0: 7%|▋ | 8/121 [00:21<04:49, 2.56s/it]
total tokens: 6510 num samples: 3 num padding tokens: 1131 - rank: 1 max len: 2170 min len: 1481 avg len: 1793.0 num_loss_counted_tokens: 969
total tokens: 8001 num samples: 9 num padding tokens: 764 - rank: 4 max len: 889 min len: 750 avg len: 804.1111111111111 num_loss_counted_tokens: 5953
total tokens: 6042 num samples: 19 num padding tokens: 2404 - rank: 7 max len: 318 min len: 88 avg len: 191.47368421052633 num_loss_counted_tokens: 1475
{
  "epoch": 0,
  "step": 8,
  "rank": 0,
  "loss": 0.08637166023254395,
  "overall_throughput": 41.99042053701942,
  "lr": 0.0,
  "cuda_mem_allocated": 18.229600429534912,
  "cuda_malloc_retries": 0,
  "num_loss_counted_tokens": 26821,
  "batch_size": 90,
  "total_loss": 0.7146224975585938,
  "gradnorm": null,
  "weight_norm": 433.0432434082031,
  "timestamp": "2024-08-18T20:48:22.870728"
}
total tokens: 7525 num samples: 7 num padding tokens: 620 - rank: 3 max len: 1075 min len: 938 avg len: 986.4285714285714 num_loss_counted_tokens: 5075
total tokens: 7710 num samples: 6 num padding tokens: 714 - rank: 2 max len: 1285 min len: 1097 avg len: 1166.0 num_loss_counted_tokens: 4799
total tokens: 7742 num samples: 14 num padding tokens: 1362 - rank: 6 max len: 553 min len: 343 avg len: 455.7142857142857 num_loss_counted_tokens: 3929
total tokens: 6622 num samples: 2 num padding tokens: 836 - rank: 0 max len: 3311 min len: 2475 avg len: 2893.0 num_loss_counted_tokens: 693
total tokens: 8096 num samples: 11 num padding tokens: 746 - rank: 5 max len: 736 min len: 587 avg len: 668.1818181818181 num_loss_counted_tokens: 5289
Per-token loss scaled by world size: 0.00032484609982930124
Per-token loss scaled by world size: 0.000325270724715665
Per-token loss scaled by world size: 0.000487986282678321
Per-token loss scaled by world size: 0.00039277857285924256
Per-token loss scaled by world size: 0.00037055223947390914
Per-token loss scaled by world size: 1.2821310519939288e-06
Per-token loss scaled by world size: 2.2653903215541504e-05
Epoch: 0, Step: 9, Rank: 4, loss = 0.9635331630706787
Epoch: 0, Step: 9, Rank: 6, loss = 1.1635082960128784
Epoch: 0, Step: 9, Rank: 5, loss = 1.4455373287200928
Epoch: 0, Step: 9, Rank: 0, loss = 0.0037979925982654095
Epoch: 0, Step: 9, Rank: 7, loss = 1.0976684093475342
Epoch: 0, Step: 9, Rank: 2, loss = 0.9622753262519836
Epoch: 0, Step: 9, Rank: 1, loss = 0.06710652261972427
Per-token loss scaled by world size: 0.0003276054630987346
Epoch: 0, Step: 9, Rank: 3, loss = 0.9704492688179016
Epoch 0: 7%|▋ | 9/121 [00:23<04:45, 2.55s/it]
total tokens: 7932 num samples: 3 num padding tokens: 1331 - rank: 1 max len: 2644 min len: 1965 avg len: 2200.3333333333335 num_loss_counted_tokens: 863
total tokens: 7389 num samples: 9 num padding tokens: 630 - rank: 4 max len: 821 min len: 683 avg len: 751.0 num_loss_counted_tokens: 4429
{
  "epoch": 0,
  "step": 9,
  "rank": 0,
  "loss": 0.0037979925982654095,
  "overall_throughput": 42.802430491911245,
  "lr": 0.0,
  "cuda_mem_allocated": 17.982967853546143,
  "cuda_malloc_retries": 0,
  "num_loss_counted_tokens": 23698,
  "batch_size": 89,
  "total_loss": 0.8342345356941223,
  "gradnorm": null,
  "weight_norm": 433.0432434082031,
  "timestamp": "2024-08-18T20:48:25.401772"
}
total tokens: 8088 num samples: 12 num padding tokens: 982 - rank: 5 max len: 674 min len: 500 avg len: 592.1666666666666 num_loss_counted_tokens: 4684
total tokens: 6798 num samples: 2 num padding tokens: 340 - rank: 0 max len: 3399 min len: 3059 avg len: 3229.0 num_loss_counted_tokens: 156
total tokens: 6642 num samples: 27 num padding tokens: 2019 - rank: 7 max len: 246 min len: 76 avg len: 171.22222222222223 num_loss_counted_tokens: 1990
total tokens: 6668 num samples: 4 num padding tokens: 1622 - rank: 2 max len: 1667 min len: 1049 avg len: 1261.5 num_loss_counted_tokens: 1065
total tokens: 7939 num samples: 17 num padding tokens: 2169 - rank: 6 max len: 467 min len: 248 avg len: 339.4117647058824 num_loss_counted_tokens: 3544
total tokens: 7528 num samples: 8 num padding tokens: 554 - rank: 3 max len: 941 min len: 832 avg len: 871.75 num_loss_counted_tokens: 4720
Per-token loss scaled by world size: 0.00029833969892933965
Per-token loss scaled by world size: 0.0003632043662946671
Per-token loss scaled by world size: 0.00018327771977055818
Per-token loss scaled by world size: 0.00017725562793202698
Per-token loss scaled by world size: 1.4638754691986833e-05
Per-token loss scaled by world size: 0.0002214470150647685
Per-token loss scaled by world size: 5.87268550589215e-06
Epoch: 0, Step: 10, Rank: 2, loss = 0.5432663559913635
Epoch: 0, Step: 10, Rank: 6, loss = 0.9143738746643066
Epoch: 0, Step: 10, Rank: 4, loss = 1.1131759881973267
Epoch: 0, Step: 10, Rank: 3, loss = 0.5617232918739319
Epoch: 0, Step: 10, Rank: 0, loss = 0.01799904741346836
Epoch: 0, Step: 10, Rank: 1, loss = 0.04486595466732979
Epoch: 0, Step: 10, Rank: 7, loss = 0.6787074208259583
Per-token loss scaled by world size: 0.0003915868583135307
Epoch: 0, Step: 10, Rank: 5, loss = 1.200164794921875
Epoch 0: 8%|▊ | 10/121 [00:26<04:41, 2.54s/it]
total tokens: 7056 num samples: 6 num padding tokens: 1539 - rank: 4 max len: 1176 min len: 760 avg len: 919.5 num_loss_counted_tokens: 3752
total tokens: 6318 num samples: 2 num padding tokens: 744 - rank: 1 max len: 3159 min len: 2415 avg len: 2787.0 num_loss_counted_tokens: 641
{
  "epoch": 0,
  "step": 10,
  "rank": 0,
  "loss": 0.01799904741346836,
  "overall_throughput": 43.20698130216357,
  "lr": 0.0,
  "cuda_mem_allocated": 17.961437225341797,
  "cuda_malloc_retries": 0,
  "num_loss_counted_tokens": 24519,
  "batch_size": 84,
  "total_loss": 0.6342846155166626,
  "gradnorm": null,
  "weight_norm": 433.0432434082031,
  "timestamp": "2024-08-18T20:48:27.904244"
}
total tokens: 7876 num samples: 11 num padding tokens: 1081 - rank: 5 max len: 716 min len: 528 avg len: 617.7272727272727 num_loss_counted_tokens: 3741
total tokens: 6586 num samples: 2 num padding tokens: 33 - rank: 0 max len: 3293 min len: 3260 avg len: 3276.5 num_loss_counted_tokens: 401
total tokens: 7084 num samples: 4 num padding tokens: 1038 - rank: 3 max len: 1771 min len: 1326 avg len: 1511.5 num_loss_counted_tokens: 2816
total tokens: 7956 num samples: 17 num padding tokens: 1245 - rank: 6 max len: 468 min len: 316 avg len: 394.7647058823529 num_loss_counted_tokens: 3928
total tokens: 6378 num samples: 3 num padding tokens: 279 - rank: 2 max len: 2126 min len: 1873 avg len: 2033.0 num_loss_counted_tokens: 1061
total tokens: 7852 num samples: 26 num padding tokens: 3174 - rank: 7 max len: 302 min len: 84 avg len: 179.92307692307693 num_loss_counted_tokens: 2103
Per-token loss scaled by world size: 0.00014277359878178686
Per-token loss scaled by world size: 0.00017172133084386587
Per-token loss scaled by world size: 0.00032692356035113335
Per-token loss scaled by world size: 0.00021291511075105518
Per-token loss scaled by world size: 0.0002838084474205971
Per-token loss scaled by world size: 8.463160156679805e-06
Per-token loss scaled by world size: 0.0002184695185860619
Epoch: 0, Step: 11, Rank: 5, loss = 1.096379041671753
Epoch: 0, Step: 11, Rank: 0, loss = 0.02838226407766342
Epoch: 0, Step: 11, Rank: 2, loss = 0.5758889317512512
Epoch: 0, Step: 11, Rank: 1, loss = 0.47880908846855164
Epoch: 0, Step: 11, Rank: 6, loss = 0.7140374183654785
Epoch: 0, Step: 11, Rank: 4, loss = 0.9517871141433716
Epoch: 0, Step: 11, Rank: 7, loss = 0.7326648235321045
Per-token loss scaled by world size: 0.00020073003543075174
Epoch: 0, Step: 11, Rank: 3, loss = 0.6731732487678528
Epoch 0: 9%|▉ | 11/121 [00:28<04:39, 2.54s/it]
total tokens: 7000 num samples: 4 num padding tokens: 1372 - rank: 1 max len: 1750 min len: 1261 avg len: 1407.0 num_loss_counted_tokens: 902
total tokens: 7730 num samples: 10 num padding tokens: 533 - rank: 4 max len: 773 min len: 666 avg len: 719.7 num_loss_counted_tokens: 4342
total tokens: 7980 num samples: 12 num padding tokens: 984 - rank: 5 max len: 665 min len: 484 avg len: 583.0 num_loss_counted_tokens: 4069
{
"epoch": 0,
"step": 11,
"rank": 0,
"loss": 0.02838226407766342,
"overall_throughput": 42.203161621451606,
"lr": 0.0,
"cuda_mem_allocated": 18.220097064971924,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 26829,
"batch_size": 89,
"total_loss": 0.6563901901245117,
"gradnorm": null,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:48:30.459939"
}
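Each `total tokens: ...` line describes one rank's micro-batch. The numbers are consistent with every sample being padded to the longest sample in that micro-batch. Under that reading, `total tokens` equals `num samples × max len`, and `num padding tokens` equals `total tokens − num samples × avg len`. The minimal check below assumes pad-to-longest batching (inferred from the numbers). It uses the rank 1 line right after the 10/121 progress bar and the rank 7 line just before step 11's losses. `padding_stats` is a hypothetical helper, not part of instructlab:

```python
def padding_stats(num_samples: int, max_len: int, avg_len: float) -> tuple[int, int]:
    """Return (total_tokens, padding_tokens), assuming every sample is padded to max_len."""
    total_tokens = num_samples * max_len
    real_tokens = round(num_samples * avg_len)  # avg_len is printed as a float
    return total_tokens, total_tokens - real_tokens


print(padding_stats(2, 3159, 2787.0))               # (6318, 744)  rank 1
print(padding_stats(26, 302, 179.92307692307693))   # (7852, 3174) rank 7
```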
total tokens: 7854 num samples: 17 num padding tokens: 1998 - rank: 6 max len: 462 min len: 265 avg len: 344.47058823529414 num_loss_counted_tokens: 3040
total tokens: 7953 num samples: 3 num padding tokens: 883 - rank: 0 max len: 2651 min len: 1841 avg len: 2356.6666666666665 num_loss_counted_tokens: 501
total tokens: 7992 num samples: 8 num padding tokens: 643 - rank: 3 max len: 999 min len: 846 avg len: 918.625 num_loss_counted_tokens: 5458
total tokens: 8060 num samples: 31 num padding tokens: 2438 - rank: 7 max len: 260 min len: 78 avg len: 181.3548387096774 num_loss_counted_tokens: 2341
total tokens: 7314 num samples: 6 num padding tokens: 570 - rank: 2 max len: 1219 min len: 1012 avg len: 1124.0 num_loss_counted_tokens: 2551
Per-token loss scaled by world size: 0.0002459367678966373
Per-token loss scaled by world size: 0.00032957716030068696
Per-token loss scaled by world size: 2.192043893955997e-06
Per-token loss scaled by world size: 3.339715112815611e-05
Per-token loss scaled by world size: 0.0003242892271373421
Per-token loss scaled by world size: 0.00027953533572144806
Per-token loss scaled by world size: 0.0003221108636353165
Epoch: 0, Step: 12, Rank: 0, loss = 0.005932766944169998
Epoch: 0, Step: 12, Rank: 2, loss = 0.8920005559921265
Epoch: 0, Step: 12, Rank: 1, loss = 0.09038939327001572
Epoch: 0, Step: 12, Rank: 3, loss = 0.6656278967857361
Epoch: 0, Step: 12, Rank: 5, loss = 0.8776887655258179
Epoch: 0, Step: 12, Rank: 4, loss = 0.7565624117851257
Epoch: 0, Step: 12, Rank: 7, loss = 0.8717930316925049
Per-token loss scaled by world size: 0.00047857032041065395
Epoch: 0, Step: 12, Rank: 6, loss = 1.2952505350112915
Epoch 0: 10%|▉ | 12/121 [00:31<04:35, 2.53s/it]
total tokens: 7648 num samples: 8 num padding tokens: 826 - rank: 4 max len: 956 min len: 722 avg len: 852.75 num_loss_counted_tokens: 4969
total tokens: 7680 num samples: 3 num padding tokens: 714 - rank: 1 max len: 2560 min len: 2201 avg len: 2322.0 num_loss_counted_tokens: 995
{
"epoch": 0,
"step": 12,
"rank": 0,
"loss": 0.005932766944169998,
"overall_throughput": 43.402175224914764,
"lr": 0.0,
"cuda_mem_allocated": 17.94186305999756,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 21652,
"batch_size": 76,
"total_loss": 0.6819056272506714,
"gradnorm": null,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:48:32.991146"
}
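In this excerpt, the JSON `total_loss` field agrees with the arithmetic mean of the eight per-rank `loss` values for the same step. This is inferred from the numbers, not from the trainer source. A quick check for step 12, using the values printed above:

```python
from statistics import fmean

# Per-rank "loss = ..." values for Epoch 0, Step 12, copied from the log.
step12_losses = {
    0: 0.005932766944169998,
    1: 0.09038939327001572,
    2: 0.8920005559921265,
    3: 0.6656278967857361,
    4: 0.7565624117851257,
    5: 0.8776887655258179,
    6: 1.2952505350112915,
    7: 0.8717930316925049,
}

total_loss = fmean(step12_losses.values())
print(f"{total_loss:.6f}")  # 0.681906, matching "total_loss": 0.6819056272506714
```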
total tokens: 7664 num samples: 16 num padding tokens: 1534 - rank: 6 max len: 479 min len: 276 avg len: 383.125 num_loss_counted_tokens: 3860
total tokens: 8106 num samples: 7 num padding tokens: 750 - rank: 3 max len: 1158 min len: 961 avg len: 1050.857142857143 num_loss_counted_tokens: 5265
total tokens: 7888 num samples: 29 num padding tokens: 2471 - rank: 7 max len: 272 min len: 82 avg len: 186.79310344827587 num_loss_counted_tokens: 2429
total tokens: 7306 num samples: 2 num padding tokens: 864 - rank: 0 max len: 3653 min len: 2789 avg len: 3221.0 num_loss_counted_tokens: 191
total tokens: 6890 num samples: 5 num padding tokens: 518 - rank: 2 max len: 1378 min len: 1170 avg len: 1274.4 num_loss_counted_tokens: 2274
total tokens: 7689 num samples: 11 num padding tokens: 1127 - rank: 5 max len: 699 min len: 483 avg len: 596.5454545454545 num_loss_counted_tokens: 4497
Per-token loss scaled by world size: 0.00013505632523447275
Per-token loss scaled by world size: 0.00034081621561199427
Per-token loss scaled by world size: 0.0004060634528286755
Per-token loss scaled by world size: 0.0003271996683906764
Per-token loss scaled by world size: 2.6962425181409344e-06
Per-token loss scaled by world size: 0.00019702856661751866
Per-token loss scaled by world size: 2.0444547317310935e-06
Epoch: 0, Step: 13, Rank: 2, loss = 0.39850056171417236
Epoch: 0, Step: 13, Rank: 6, loss = 1.0056208372116089
Epoch: 0, Step: 13, Rank: 4, loss = 1.1981409788131714
Epoch: 0, Step: 13, Rank: 3, loss = 0.9654435515403748
Epoch: 0, Step: 13, Rank: 0, loss = 0.007955600507557392
Epoch: 0, Step: 13, Rank: 7, loss = 0.5813574194908142
Epoch: 0, Step: 13, Rank: 1, loss = 0.006032418925315142
Per-token loss scaled by world size: 0.000633390387520194
Epoch: 0, Step: 13, Rank: 5, loss = 1.868897557258606
Epoch 0: 11%|█ | 13/121 [00:33<04:32, 2.53s/it]
total tokens: 8085 num samples: 11 num padding tokens: 649 - rank: 4 max len: 735 min len: 614 avg len: 676.0 num_loss_counted_tokens: 3645
total tokens: 6609 num samples: 3 num padding tokens: 940 - rank: 1 max len: 2203 min len: 1630 avg len: 1889.6666666666667 num_loss_counted_tokens: 239
total tokens: 7854 num samples: 17 num padding tokens: 1445 - rank: 6 max len: 462 min len: 277 avg len: 377.0 num_loss_counted_tokens: 3417
{
"epoch": 0,
"step": 13,
"rank": 0,
"loss": 0.007955600507557392,
"overall_throughput": 42.85566372115263,
"lr": 0.0,
"cuda_mem_allocated": 18.043649196624756,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 23605,
"batch_size": 89,
"total_loss": 0.7539936304092407,
"gradnorm": null,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:48:35.510670"
}
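All eight ranks write to the same stream, so several `Epoch: ..., Rank: ..., loss = ...` entries can end up on one physical line. Because of that, a parser should scan for entries rather than split on newlines. The sketch below does this with `re.finditer`. The pattern and the function name `rank_losses` are hypothetical, written against the line format visible here:

```python
import re
from collections import defaultdict

# One per-rank loss entry; the number may be in plain or scientific notation.
LOSS_RE = re.compile(
    r"Epoch: (\d+), Step: (\d+), Rank: (\d+), loss = (\d+\.\d+(?:e[-+]?\d+)?)"
)


def rank_losses(log_text: str) -> dict[int, dict[int, float]]:
    """Return {step: {rank: loss}} for every per-rank loss entry, even fused ones."""
    steps: dict[int, dict[int, float]] = defaultdict(dict)
    for match in LOSS_RE.finditer(log_text):
        _epoch, step, rank, loss = match.groups()
        steps[int(step)][int(rank)] = float(loss)
    return dict(steps)


# Two entries that were printed back to back with no newline between them.
fused = (
    "Epoch: 0, Step: 13, Rank: 6, loss = 1.0056208372116089"
    "Epoch: 0, Step: 13, Rank: 4, loss = 1.1981409788131714"
)
print(rank_losses(fused))  # {13: {6: 1.0056208372116089, 4: 1.1981409788131714}}
```

The tqdm lines (`Epoch 0: 11%|...`) do not match, because they lack the `Epoch: ` colon-space form.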
total tokens: 7722 num samples: 13 num padding tokens: 1086 - rank: 5 max len: 594 min len: 465 avg len: 510.46153846153845 num_loss_counted_tokens: 3752
total tokens: 4062 num samples: 1 num padding tokens: 0 - rank: 0 max len: 4062 min len: 4062 avg len: 4062.0 num_loss_counted_tokens: 85
total tokens: 7917 num samples: 29 num padding tokens: 2683 - rank: 7 max len: 273 min len: 76 avg len: 180.48275862068965 num_loss_counted_tokens: 2192
total tokens: 8091 num samples: 9 num padding tokens: 685 - rank: 3 max len: 899 min len: 760 avg len: 822.8888888888889 num_loss_counted_tokens: 6014
total tokens: 7815 num samples: 5 num padding tokens: 2132 - rank: 2 max len: 1563 min len: 901 avg len: 1136.6 num_loss_counted_tokens: 3256
Per-token loss scaled by world size: 0.00030266563408076763
Per-token loss scaled by world size: 0.0001793744886526838
Per-token loss scaled by world size: 0.00013343075988814235
Per-token loss scaled by world size: 8.247572259278968e-05
Per-token loss scaled by world size: 0.00023257164866663516
Per-token loss scaled by world size: 0.00025159039068967104
Epoch: 0, Step: 14, Rank: 3, loss = 0.6006578803062439
Epoch: 0, Step: 14, Rank: 5, loss = 1.0135136842727661
Epoch: 0, Step: 14, Rank: 0, loss = 0.2761802673339844
Epoch: 0, Step: 14, Rank: 1, loss = 0.4468095600605011
Per-token loss scaled by world size: 0.0003185720997862518
Epoch: 0, Step: 14, Rank: 4, loss = 0.8424819111824036
Epoch: 0, Step: 14, Rank: 7, loss = 0.7787952423095703
Per-token loss scaled by world size: 9.590814443072304e-05
Epoch: 0, Step: 14, Rank: 6, loss = 1.066778540611267
Epoch: 0, Step: 14, Rank: 2, loss = 0.3211604058742523
Epoch 0: 12%|█▏ | 14/121 [00:36<04:31, 2.54s/it]
total tokens: 6432 num samples: 3 num padding tokens: 621 - rank: 1 max len: 2144 min len: 1779 avg len: 1937.0 num_loss_counted_tokens: 328
total tokens: 7650 num samples: 9 num padding tokens: 968 - rank: 4 max len: 850 min len: 696 avg len: 742.4444444444445 num_loss_counted_tokens: 4090
total tokens: 7776 num samples: 16 num padding tokens: 1817 - rank: 6 max len: 486 min len: 267 avg len: 372.4375 num_loss_counted_tokens: 2945
{
"epoch": 0,
"step": 14,
"rank": 0,
"loss": 0.2761802673339844,
"overall_throughput": 41.914163559432374,
"lr": 0.0,
"cuda_mem_allocated": 18.220842361450195,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 26789,
"batch_size": 100,
"total_loss": 0.6682971715927124,
"gradnorm": null,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:48:38.048693"
}
total tokens: 7546 num samples: 11 num padding tokens: 994 - rank: 5 max len: 686 min len: 507 avg len: 595.6363636363636 num_loss_counted_tokens: 4306
total tokens: 7953 num samples: 33 num padding tokens: 2532 - rank: 7 max len: 241 min len: 83 avg len: 164.27272727272728 num_loss_counted_tokens: 2377
total tokens: 7696 num samples: 8 num padding tokens: 273 - rank: 3 max len: 962 min len: 903 avg len: 927.875 num_loss_counted_tokens: 3069
total tokens: 7708 num samples: 2 num padding tokens: 1229 - rank: 0 max len: 3854 min len: 2625 avg len: 3239.5 num_loss_counted_tokens: 176
total tokens: 6624 num samples: 4 num padding tokens: 1021 - rank: 2 max len: 1656 min len: 1051 avg len: 1400.75 num_loss_counted_tokens: 1421
Per-token loss scaled by world size: 0.0004944643005728722
Per-token loss scaled by world size: 0.0002606770722195506
Per-token loss scaled by world size: 0.0004838722525164485
Per-token loss scaled by world size: 5.0423706852598116e-05
Per-token loss scaled by world size: 0.0004016592283733189
Per-token loss scaled by world size: 6.617276085307822e-05
Epoch: 0, Step: 15, Rank: 6, loss = 0.7229878306388855
Epoch: 0, Step: 15, Rank: 5, loss = 1.3713966608047485
Epoch: 0, Step: 15, Rank: 4, loss = 1.3420196771621704
Epoch: 0, Step: 15, Rank: 0, loss = 0.13985015451908112
Epoch: 0, Step: 15, Rank: 1, loss = 0.18353015184402466
Epoch: 0, Step: 15, Rank: 7, loss = 1.1140018701553345
Per-token loss scaled by world size: 8.276257722172886e-05
Per-token loss scaled by world size: 0.000270542484940961
Epoch: 0, Step: 15, Rank: 2, loss = 0.22954201698303223
Epoch: 0, Step: 15, Rank: 3, loss = 0.7503495812416077
Epoch 0: 12%|█▏ | 15/121 [00:38<04:29, 2.54s/it]
total tokens: 7595 num samples: 7 num padding tokens: 570 - rank: 4 max len: 1085 min len: 923 avg len: 1003.5714285714286 num_loss_counted_tokens: 4160
total tokens: 6930 num samples: 2 num padding tokens: 569 - rank: 1 max len: 3465 min len: 2896 avg len: 3180.5 num_loss_counted_tokens: 174
total tokens: 7542 num samples: 9 num padding tokens: 1278 - rank: 5 max len: 838 min len: 611 avg len: 696.0 num_loss_counted_tokens: 3843
total tokens: 8061 num samples: 3 num padding tokens: 917 - rank: 2 max len: 2687 min len: 1952 avg len: 2381.3333333333335 num_loss_counted_tokens: 302
{
"epoch": 0,
"step": 15,
"rank": 0,
"loss": 0.13985015451908112,
"overall_throughput": 42.649491925532274,
"lr": 0.0,
"cuda_mem_allocated": 18.06497097015381,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 22188,
"batch_size": 90,
"total_loss": 0.7317097187042236,
"gradnorm": null,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:48:40.589466"
}
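The tqdm bracket `[00:38<04:29, 2.54s/it]` reads as elapsed time, then estimated remaining time, then the current rate. At 15/121, the estimate is roughly (121 − 15) × 2.54 s ≈ 269 s, or 4:29. tqdm works from its own smoothed, unrounded rate, so recomputing from the two-decimal s/it shown can be off by a second at some steps. The helper `eta_mmss` below is a hypothetical approximation, not tqdm's code:

```python
def eta_mmss(total_steps: int, done_steps: int, sec_per_it: float) -> str:
    """Approximate remaining time as MM:SS from the displayed seconds-per-iteration."""
    remaining = int((total_steps - done_steps) * sec_per_it)  # truncate to whole seconds
    minutes, seconds = divmod(remaining, 60)
    return f"{minutes:02d}:{seconds:02d}"


print(eta_mmss(121, 15, 2.54))  # 04:29, as in the 15/121 bar
print(eta_mmss(121, 23, 2.57))  # 04:11, as in the 23/121 bar
```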
total tokens: 4070 num samples: 1 num padding tokens: 0 - rank: 0 max len: 4070 min len: 4070 avg len: 4070.0 num_loss_counted_tokens: 1038
total tokens: 7565 num samples: 5 num padding tokens: 974 - rank: 3 max len: 1513 min len: 1101 avg len: 1318.2 num_loss_counted_tokens: 2867
total tokens: 7852 num samples: 13 num padding tokens: 2212 - rank: 6 max len: 604 min len: 287 avg len: 433.84615384615387 num_loss_counted_tokens: 3284
total tokens: 5720 num samples: 20 num padding tokens: 1667 - rank: 7 max len: 286 min len: 83 avg len: 202.65 num_loss_counted_tokens: 1807
Per-token loss scaled by world size: 0.00036834663478657603
Per-token loss scaled by world size: 0.00037199081270955503
Per-token loss scaled by world size: 0.00025284758885391057
Per-token loss scaled by world size: 0.00017388923151884228
Per-token loss scaled by world size: 4.525140013811324e-07
Per-token loss scaled by world size: 0.00022274823277257383
Per-token loss scaled by world size: 9.103088814299554e-05
Epoch: 0, Step: 16, Rank: 4, loss = 1.2040791511535645
Epoch: 0, Step: 16, Rank: 5, loss = 1.215991497039795
Epoch: 0, Step: 16, Rank: 3, loss = 0.8265271782875061
Epoch: 0, Step: 16, Rank: 7, loss = 0.5684221386909485
Epoch: 0, Step: 16, Rank: 0, loss = 0.0014792117290198803
Epoch: 0, Step: 16, Rank: 2, loss = 0.7281361222267151
Epoch: 0, Step: 16, Rank: 1, loss = 0.29756858944892883
Per-token loss scaled by world size: 0.00036798955989070237
Epoch: 0, Step: 16, Rank: 6, loss = 1.2029118537902832
Epoch 0: 13%|█▎ | 16/121 [00:41<04:25, 2.52s/it]
total tokens: 7950 num samples: 10 num padding tokens: 556 - rank: 4 max len: 795 min len: 706 avg len: 739.4 num_loss_counted_tokens: 4706
total tokens: 7167 num samples: 3 num padding tokens: 1839 - rank: 1 max len: 2389 min len: 1434 avg len: 1776.0 num_loss_counted_tokens: 2175
{
"epoch": 0,
"step": 16,
"rank": 0,
"loss": 0.0014792117290198803,
"overall_throughput": 43.42571067791906,
"lr": 0.0,
"cuda_mem_allocated": 18.137038707733154,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 26151,
"batch_size": 89,
"total_loss": 0.7556394934654236,
"gradnorm": null,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:48:43.079559"
}
total tokens: 7904 num samples: 16 num padding tokens: 2012 - rank: 6 max len: 494 min len: 282 avg len: 368.25 num_loss_counted_tokens: 3677
total tokens: 7689 num samples: 11 num padding tokens: 1030 - rank: 5 max len: 699 min len: 501 avg len: 605.3636363636364 num_loss_counted_tokens: 4515
total tokens: 6075 num samples: 25 num padding tokens: 2510 - rank: 7 max len: 243 min len: 84 avg len: 142.6 num_loss_counted_tokens: 1404
total tokens: 7752 num samples: 8 num padding tokens: 648 - rank: 3 max len: 969 min len: 812 avg len: 888.0 num_loss_counted_tokens: 4511
total tokens: 7658 num samples: 7 num padding tokens: 214 - rank: 2 max len: 1094 min len: 982 avg len: 1063.4285714285713 num_loss_counted_tokens: 4703
total tokens: 6666 num samples: 2 num padding tokens: 735 - rank: 0 max len: 3333 min len: 2598 avg len: 2965.5 num_loss_counted_tokens: 156
Per-token loss scaled by world size: 0.0003832130169030279
Per-token loss scaled by world size: 0.00025972415460273623
Per-token loss scaled by world size: 0.0007209046743810177
Per-token loss scaled by world size: 3.690614175866358e-05
Per-token loss scaled by world size: 0.0004276617255527526
Per-token loss scaled by world size: 7.648386599612422e-06
Per-token loss scaled by world size: 0.00031741950078867376
Epoch: 0, Step: 17, Rank: 0, loss = 0.0850963369011879
Epoch: 0, Step: 17, Rank: 4, loss = 0.8835934400558472
Epoch: 0, Step: 17, Rank: 6, loss = 1.6622259616851807
Epoch: 0, Step: 17, Rank: 3, loss = 0.5988589525222778
Epoch: 0, Step: 17, Rank: 2, loss = 0.9860810041427612
Epoch: 0, Step: 17, Rank: 1, loss = 0.017635267227888107
Epoch: 0, Step: 17, Rank: 7, loss = 0.7318900227546692
Per-token loss scaled by world size: 0.00037659640656784177
Epoch: 0, Step: 17, Rank: 5, loss = 0.8683371543884277
Epoch 0: 14%|█▍ | 17/121 [00:43<04:21, 2.51s/it]
total tokens: 8082 num samples: 9 num padding tokens: 734 - rank: 4 max len: 898 min len: 742 avg len: 816.4444444444445 num_loss_counted_tokens: 4032
total tokens: 8112 num samples: 4 num padding tokens: 808 - rank: 1 max len: 2028 min len: 1729 avg len: 1826.0 num_loss_counted_tokens: 1009
total tokens: 7680 num samples: 12 num padding tokens: 950 - rank: 5 max len: 640 min len: 496 avg len: 560.8333333333334 num_loss_counted_tokens: 4500
{
"epoch": 0,
"step": 17,
"rank": 0,
"loss": 0.0850963369011879,
"overall_throughput": 43.64044780600529,
"lr": 0.0,
"cuda_mem_allocated": 18.1714825630188,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 18446,
"batch_size": 76,
"total_loss": 0.7292147874832153,
"gradnorm": null,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:48:45.559713"
}
total tokens: 7664 num samples: 16 num padding tokens: 2265 - rank: 6 max len: 479 min len: 253 avg len: 337.4375 num_loss_counted_tokens: 2968
total tokens: 7756 num samples: 7 num padding tokens: 792 - rank: 3 max len: 1108 min len: 923 avg len: 994.8571428571429 num_loss_counted_tokens: 4875
total tokens: 7194 num samples: 2 num padding tokens: 1367 - rank: 0 max len: 3597 min len: 2230 avg len: 2913.5 num_loss_counted_tokens: 192
total tokens: 6448 num samples: 26 num padding tokens: 1927 - rank: 7 max len: 248 min len: 98 avg len: 173.8846153846154 num_loss_counted_tokens: 2050
total tokens: 7075 num samples: 5 num padding tokens: 857 - rank: 2 max len: 1415 min len: 1116 avg len: 1243.6 num_loss_counted_tokens: 3663
Per-token loss scaled by world size: 0.00015676565817557275
Per-token loss scaled by world size: 0.0004898877814412117
Per-token loss scaled by world size: 0.00045096693793311715
Per-token loss scaled by world size: 5.226111625233898e-06
Per-token loss scaled by world size: 8.227287253248505e-06
Per-token loss scaled by world size: 0.0005342444637790322
Per-token loss scaled by world size: 0.0005171209922991693
Epoch: 0, Step: 18, Rank: 3, loss = 1.0057395696640015
Epoch: 0, Step: 18, Rank: 2, loss = 0.3218398988246918
Epoch: 0, Step: 18, Rank: 5, loss = 0.925835132598877
Epoch: 0, Step: 18, Rank: 1, loss = 0.01072920672595501
Epoch: 0, Step: 18, Rank: 0, loss = 0.016890620812773705
Epoch: 0, Step: 18, Rank: 7, loss = 1.096803903579712
Epoch: 0, Step: 18, Rank: 4, loss = 1.0616494417190552
Per-token loss scaled by world size: 0.0007897767936810851
Epoch: 0, Step: 18, Rank: 6, loss = 1.6214118003845215
Epoch 0: 15%|█▍ | 18/121 [00:46<04:17, 2.50s/it]
total tokens: 7335 num samples: 3 num padding tokens: 432 - rank: 1 max len: 2445 min len: 2093 avg len: 2301.0 num_loss_counted_tokens: 1145
total tokens: 7455 num samples: 7 num padding tokens: 746 - rank: 4 max len: 1065 min len: 855 avg len: 958.4285714285714 num_loss_counted_tokens: 3764
{
"epoch": 0,
"step": 18,
"rank": 0,
"loss": 0.016890620812773705,
"overall_throughput": 43.99619876252009,
"lr": 0.0,
"cuda_mem_allocated": 17.973182678222656,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 16424,
"batch_size": 69,
"total_loss": 0.7576124668121338,
"gradnorm": null,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:48:48.023775"
}
total tokens: 7188 num samples: 4 num padding tokens: 887 - rank: 2 max len: 1797 min len: 1322 avg len: 1575.25 num_loss_counted_tokens: 1620
total tokens: 8016 num samples: 16 num padding tokens: 1990 - rank: 6 max len: 501 min len: 289 avg len: 376.625 num_loss_counted_tokens: 3443
total tokens: 7200 num samples: 25 num padding tokens: 2743 - rank: 7 max len: 288 min len: 77 avg len: 178.28 num_loss_counted_tokens: 2061
total tokens: 7980 num samples: 10 num padding tokens: 1229 - rank: 5 max len: 798 min len: 551 avg len: 675.1 num_loss_counted_tokens: 3751
total tokens: 6852 num samples: 2 num padding tokens: 182 - rank: 0 max len: 3426 min len: 3244 avg len: 3335.0 num_loss_counted_tokens: 203
total tokens: 7290 num samples: 6 num padding tokens: 282 - rank: 3 max len: 1215 min len: 1115 avg len: 1168.0 num_loss_counted_tokens: 4808
Per-token loss scaled by world size: 0.0002617795253172517
Per-token loss scaled by world size: 0.0003632722655311227
Per-token loss scaled by world size: 0.00038097533979453146
Per-token loss scaled by world size: 0.00018842382996808738
Per-token loss scaled by world size: 0.00016033223073463887
Per-token loss scaled by world size: 2.0858458356087795e-06
Per-token loss scaled by world size: 0.00016425059584435076
Epoch: 0, Step: 19, Rank: 2, loss = 0.5884711742401123
Epoch: 0, Step: 19, Rank: 4, loss = 1.1345447301864624
Epoch: 0, Step: 19, Rank: 1, loss = 0.5007376074790955
Epoch: 0, Step: 19, Rank: 6, loss = 0.8175702095031738
Epoch: 0, Step: 19, Rank: 5, loss = 1.189833641052246
Epoch: 0, Step: 19, Rank: 0, loss = 0.006514357402920723
Epoch: 0, Step: 19, Rank: 7, loss = 0.5129751563072205
Per-token loss scaled by world size: 0.00031291748746298254
Epoch: 0, Step: 19, Rank: 3, loss = 0.9772804379463196
Epoch 0: 16%|█▌ | 19/121 [00:48<04:16, 2.51s/it]
total tokens: 7540 num samples: 10 num padding tokens: 800 - rank: 4 max len: 754 min len: 591 avg len: 674.0 num_loss_counted_tokens: 3840
total tokens: 6924 num samples: 4 num padding tokens: 344 - rank: 1 max len: 1731 min len: 1592 avg len: 1645.0 num_loss_counted_tokens: 954
total tokens: 7018 num samples: 29 num padding tokens: 2429 - rank: 7 max len: 242 min len: 75 avg len: 158.24137931034483 num_loss_counted_tokens: 1886
{
"epoch": 0,
"step": 19,
"rank": 0,
"loss": 0.006514357402920723,
"overall_throughput": 42.48141431618144,
"lr": 0.0,
"cuda_mem_allocated": 18.00298833847046,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 24985,
"batch_size": 81,
"total_loss": 0.7159909009933472,
"gradnorm": null,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:48:50.578884"
}
total tokens: 7980 num samples: 14 num padding tokens: 1456 - rank: 5 max len: 570 min len: 372 avg len: 466.0 num_loss_counted_tokens: 4358
total tokens: 7658 num samples: 7 num padding tokens: 1001 - rank: 3 max len: 1094 min len: 779 avg len: 951.0 num_loss_counted_tokens: 3845
total tokens: 6948 num samples: 3 num padding tokens: 823 - rank: 0 max len: 2316 min len: 1860 avg len: 2041.6666666666667 num_loss_counted_tokens: 2568
total tokens: 8030 num samples: 22 num padding tokens: 1466 - rank: 6 max len: 365 min len: 248 avg len: 298.3636363636364 num_loss_counted_tokens: 3426
total tokens: 7350 num samples: 5 num padding tokens: 547 - rank: 2 max len: 1470 min len: 1198 avg len: 1360.6 num_loss_counted_tokens: 936
Per-token loss scaled by world size: 4.6467721404042095e-06
Per-token loss scaled by world size: 1.0636807928676717e-05
Per-token loss scaled by world size: 7.807435031281784e-05
Per-token loss scaled by world size: 0.000532266276422888
Per-token loss scaled by world size: 0.0005103853181935847
Per-token loss scaled by world size: 0.0004919093335047364
Per-token loss scaled by world size: 0.0005770818097516894
Epoch: 0, Step: 20, Rank: 0, loss = 0.025356819853186607
Epoch: 0, Step: 20, Rank: 2, loss = 0.18611949682235718
Epoch: 0, Step: 20, Rank: 1, loss = 0.01107732392847538
Epoch: 0, Step: 20, Rank: 6, loss = 1.2688562870025635
Epoch: 0, Step: 20, Rank: 4, loss = 1.1726503372192383
Epoch: 0, Step: 20, Rank: 7, loss = 1.2166948318481445
Epoch: 0, Step: 20, Rank: 5, loss = 1.3756909370422363
Per-token loss scaled by world size: 0.00012562941992655396
Epoch: 0, Step: 20, Rank: 3, loss = 0.29948481917381287
Epoch 0: 17%|█▋ | 20/121 [00:51<04:14, 2.52s/it]
total tokens: 5636 num samples: 2 num padding tokens: 753 - rank: 1 max len: 2818 min len: 2065 avg len: 2441.5 num_loss_counted_tokens: 151
total tokens: 8030 num samples: 11 num padding tokens: 694 - rank: 4 max len: 730 min len: 602 avg len: 666.9090909090909 num_loss_counted_tokens: 4998
{
"epoch": 0,
"step": 20,
"rank": 0,
"loss": 0.025356819853186607,
"overall_throughput": 42.41401913459215,
"lr": 0.0,
"cuda_mem_allocated": 18.149494647979736,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 19071,
"batch_size": 76,
"total_loss": 0.6944913268089294,
"gradnorm": null,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:48:53.125315"
}
total tokens: 7472 num samples: 8 num padding tokens: 683 - rank: 3 max len: 934 min len: 765 avg len: 848.625 num_loss_counted_tokens: 4018
total tokens: 7984 num samples: 4 num padding tokens: 1049 - rank: 2 max len: 1996 min len: 1356 avg len: 1733.75 num_loss_counted_tokens: 1408
total tokens: 7440 num samples: 24 num padding tokens: 2407 - rank: 7 max len: 310 min len: 86 avg len: 209.70833333333334 num_loss_counted_tokens: 2199
total tokens: 7635 num samples: 15 num padding tokens: 1392 - rank: 6 max len: 509 min len: 313 avg len: 416.2 num_loss_counted_tokens: 3986
total tokens: 5858 num samples: 2 num padding tokens: 28 - rank: 0 max len: 2929 min len: 2901 avg len: 2915.0 num_loss_counted_tokens: 167
total tokens: 7813 num samples: 13 num padding tokens: 588 - rank: 5 max len: 601 min len: 517 avg len: 555.7692307692307 num_loss_counted_tokens: 4116
Per-token loss scaled by world size: 0.0003165990929119289
Per-token loss scaled by world size: 0.00013289590424392372
Per-token loss scaled by world size: 0.00024944794131442904
Per-token loss scaled by world size: 0.00046029582154005766
Per-token loss scaled by world size: 0.00031609757570549846
Per-token loss scaled by world size: 0.000298293714877218
Per-token loss scaled by world size: 0.0003350040642544627
Epoch: 0, Step: 21, Rank: 1, loss = 0.3848499357700348
Epoch: 0, Step: 21, Rank: 4, loss = 1.3329591751098633
Epoch: 0, Step: 21, Rank: 2, loss = 0.7223700284957886
Epoch: 0, Step: 21, Rank: 3, loss = 0.9168314337730408
Epoch: 0, Step: 21, Rank: 6, loss = 0.9153790473937988
Epoch: 0, Step: 21, Rank: 7, loss = 0.8638213276863098
Epoch: 0, Step: 21, Rank: 5, loss = 0.9701299071311951
Per-token loss scaled by world size: 2.031196345342323e-06
Epoch: 0, Step: 21, Rank: 0, loss = 0.005882090888917446
Epoch 0: 17%|█▋ | 21/121 [00:53<04:14, 2.55s/it]
total tokens: 7450 num samples: 10 num padding tokens: 522 - rank: 4 max len: 745 min len: 656 avg len: 692.8 num_loss_counted_tokens: 3966
total tokens: 6090 num samples: 3 num padding tokens: 775 - rank: 1 max len: 2030 min len: 1546 avg len: 1771.6666666666667 num_loss_counted_tokens: 2594
{
"epoch": 0,
"step": 21,
"rank": 0,
"loss": 0.005882090888917446,
"overall_throughput": 41.659196665794404,
"lr": 0.0,
"cuda_mem_allocated": 18.256999015808105,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 23167,
"batch_size": 94,
"total_loss": 0.7640278339385986,
"gradnorm": null,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:48:55.723471"
}
total tokens: 7656 num samples: 12 num padding tokens: 886 - rank: 5 max len: 638 min len: 429 avg len: 564.1666666666666 num_loss_counted_tokens: 4895
total tokens: 7392 num samples: 8 num padding tokens: 529 - rank: 3 max len: 924 min len: 753 avg len: 857.875 num_loss_counted_tokens: 5456
total tokens: 7110 num samples: 3 num padding tokens: 189 - rank: 0 max len: 2370 min len: 2243 avg len: 2307.0 num_loss_counted_tokens: 433
total tokens: 7885 num samples: 19 num padding tokens: 1785 - rank: 6 max len: 415 min len: 258 avg len: 321.05263157894734 num_loss_counted_tokens: 3103
total tokens: 7453 num samples: 29 num padding tokens: 2759 - rank: 7 max len: 257 min len: 79 avg len: 161.86206896551724 num_loss_counted_tokens: 1834
total tokens: 8082 num samples: 6 num padding tokens: 1602 - rank: 2 max len: 1347 min len: 925 avg len: 1080.0 num_loss_counted_tokens: 3092
Per-token loss scaled by world size: 0.00046551000559702516
Per-token loss scaled by world size: 0.00031762762228026986
Per-token loss scaled by world size: 0.0006262522656470537
Per-token loss scaled by world size: 0.00032212832593359053
Per-token loss scaled by world size: 0.0005319734336808324
Per-token loss scaled by world size: 7.452299178112298e-05
Per-token loss scaled by world size: 8.590232027927414e-06
Epoch: 0, Step: 22, Rank: 4, loss = 1.093308448791504
Epoch: 0, Step: 22, Rank: 7, loss = 0.7459881901741028
Epoch: 0, Step: 22, Rank: 6, loss = 1.4708317518234253
Epoch: 0, Step: 22, Rank: 3, loss = 0.7565586566925049
Epoch: 0, Step: 22, Rank: 5, loss = 1.249406099319458
Epoch: 0, Step: 22, Rank: 2, loss = 0.1750265657901764
Epoch: 0, Step: 22, Rank: 1, loss = 0.020175233483314514
Per-token loss scaled by world size: 8.447452273685485e-06
Epoch: 0, Step: 22, Rank: 0, loss = 0.019839897751808167
Epoch 0: 18%|█▊ | 22/121 [00:56<04:16, 2.59s/it]
total tokens: 7168 num samples: 7 num padding tokens: 1126 - rank: 4 max len: 1024 min len: 712 avg len: 863.1428571428571 num_loss_counted_tokens: 4103
total tokens: 6628 num samples: 2 num padding tokens: 99 - rank: 1 max len: 3314 min len: 3215 avg len: 3264.5 num_loss_counted_tokens: 217
{
"epoch": 0,
"step": 22,
"rank": 0,
"loss": 0.019839897751808167,
"overall_throughput": 40.13505236925628,
"lr": 0.0,
"cuda_mem_allocated": 18.249396324157715,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 18789,
"batch_size": 66,
"total_loss": 0.6913918256759644,
"gradnorm": null,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:48:58.416941"
}
total tokens: 7821 num samples: 11 num padding tokens: 1199 - rank: 5 max len: 711 min len: 507 avg len: 602.0 num_loss_counted_tokens: 4500 | |
total tokens: 7330 num samples: 5 num padding tokens: 1004 - rank: 3 max len: 1466 min len: 1087 avg len: 1265.2 num_loss_counted_tokens: 1284 | |
total tokens: 7936 num samples: 16 num padding tokens: 1776 - rank: 6 max len: 496 min len: 286 avg len: 385.0 num_loss_counted_tokens: 4076 | |
total tokens: 8070 num samples: 30 num padding tokens: 2921 - rank: 7 max len: 269 min len: 82 avg len: 171.63333333333333 num_loss_counted_tokens: 2107 total tokens: 7552 num samples: 2 num padding tokens: 409 - rank: 0 max len: 3776 min len: 3367 avg len: 3571.5 num_loss_counted_tokens: 182 | |
total tokens: 8049 num samples: 3 num padding tokens: 2360 - rank: 2 max len: 2683 min len: 1483 avg len: 1896.3333333333333 num_loss_counted_tokens: 515 | |
Per-token loss scaled by world size: 0.0002995161630678922Per-token loss scaled by world size: 0.0001995390048250556Per-token loss scaled by world size: 4.880544565821765e-06Per-token loss scaled by world size: 0.00033291871659457684 | |
Per-token loss scaled by world size: 0.00010466719686519355Per-token loss scaled by world size: 0.00026131211780011654 | |
Per-token loss scaled by world size: 0.00020597832917701453 | |
Epoch: 0, Step: 23, Rank: 0, loss = 0.015341991558670998 | |
Epoch: 0, Step: 23, Rank: 2, loss = 0.6272508502006531 | |
Epoch: 0, Step: 23, Rank: 3, loss = 0.9415290355682373 | |
Epoch: 0, Step: 23, Rank: 5, loss = 1.04653000831604 | |
Epoch: 0, Step: 23, Rank: 7, loss = 0.8214346170425415
Epoch: 0, Step: 23, Rank: 1, loss = 0.3290213346481323
Epoch: 0, Step: 23, Rank: 4, loss = 0.6474928855895996
Per-token loss scaled by world size: 0.00029968167655169964 | |
Epoch: 0, Step: 23, Rank: 6, loss = 0.9420493841171265 | |
Epoch 0: 19%|█▉ | 23/121 [00:59<04:11, 2.57s/it]
total tokens: 7217 num samples: 7 num padding tokens: 812 - rank: 4 max len: 1031 min len: 810 avg len: 915.0 num_loss_counted_tokens: 5570
total tokens: 7335 num samples: 3 num padding tokens: 893 - rank: 1 max len: 2445 min len: 1991 avg len: 2147.3333333333335 num_loss_counted_tokens: 303 | |
{ | |
"epoch": 0, | |
"step": 23, | |
"rank": 0, | |
"loss": 0.015341991558670998, | |
"overall_throughput": 43.085386891030744, | |
"lr": 0.0, | |
"cuda_mem_allocated": 18.126072883605957, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 25148, | |
"batch_size": 84, | |
"total_loss": 0.6713312864303589, | |
"gradnorm": null, | |
"weight_norm": 433.0432434082031, | |
"timestamp": "2024-08-18T20:49:00.926547" | |
} | |
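The "Per-token loss scaled by world size" lines carry no rank, but they can be matched to the per-rank "loss =" lines through the step's JSON record: in this log, each rank's printed loss equals its per-token value multiplied by num_loss_counted_tokens / 8, where 8 is the GPU count from `--gpus 8`. For step 23, 4.880544565821765e-06 × 25148 / 8 ≈ 0.015342, which is rank 0's loss. This relation is inferred from the printed numbers only; the sketch below reproduces it and is not taken from the training code.

```python
import math

WORLD_SIZE = 8  # from `--gpus 8` in the command that produced this log


def rank_loss(per_token_scaled: float, step_loss_tokens: int, world_size: int = WORLD_SIZE) -> float:
    """Rebuild a printed 'loss =' value from a 'Per-token loss scaled by world size' value.

    Assumed relation, inferred from this log rather than from the training source:
    loss = per_token_scaled * step_loss_tokens / world_size, where step_loss_tokens
    is num_loss_counted_tokens in that step's JSON record.
    """
    return per_token_scaled * step_loss_tokens / world_size


# Step 23: the JSON record reports num_loss_counted_tokens = 25148.
checks = [
    (4.880544565821765e-06, 0.015341991558670998),  # rank 0
    (0.00029968167655169964, 0.9420493841171265),   # rank 6
]
for per_token, printed in checks:
    rebuilt = rank_loss(per_token, 25148)
    print(f"{rebuilt:.6f} vs {printed:.6f}", math.isclose(rebuilt, printed, rel_tol=1e-5))
```

The tolerance allows for the printed values having passed through lower-precision tensors before being logged.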
total tokens: 8025 num samples: 15 num padding tokens: 1566 - rank: 6 max len: 535 min len: 337 avg len: 430.6 num_loss_counted_tokens: 4270 | |
total tokens: 7872 num samples: 24 num padding tokens: 3099 - rank: 7 max len: 328 min len: 81 avg len: 198.875 num_loss_counted_tokens: 2380 | |
total tokens: 7830 num samples: 10 num padding tokens: 881 - rank: 5 max len: 783 min len: 590 avg len: 694.9 num_loss_counted_tokens: 5375 | |
total tokens: 7664 num samples: 4 num padding tokens: 1037 - rank: 2 max len: 1916 min len: 1514 avg len: 1656.75 num_loss_counted_tokens: 1726 | |
total tokens: 7794 num samples: 6 num padding tokens: 783 - rank: 3 max len: 1299 min len: 1032 avg len: 1168.5 num_loss_counted_tokens: 3507 | |
total tokens: 7062 num samples: 2 num padding tokens: 743 - rank: 0 max len: 3531 min len: 2788 avg len: 3159.5 num_loss_counted_tokens: 192 | |
Per-token loss scaled by world size: 0.00014580012066289783
Per-token loss scaled by world size: 0.0005001741228625178
Per-token loss scaled by world size: 0.0002923521969933063
Per-token loss scaled by world size: 0.00035334189306013286
Per-token loss scaled by world size: 0.00044065553811378777
Per-token loss scaled by world size: 6.498985749203712e-05 | |
Epoch: 0, Step: 24, Rank: 2, loss = 0.37419599294662476 | |
Epoch: 0, Step: 24, Rank: 4, loss = 1.2836968898773193
Epoch: 0, Step: 24, Rank: 6, loss = 0.9068519473075867
Epoch: 0, Step: 24, Rank: 1, loss = 0.1667964607477188 | |
Epoch: 0, Step: 24, Rank: 3, loss = 0.7503219246864319 | |
Epoch: 0, Step: 24, Rank: 7, loss = 1.130942463874817 | |
Per-token loss scaled by world size: 0.0006109050591476262 | |
Epoch: 0, Step: 24, Rank: 5, loss = 1.567887783050537 | |
Per-token loss scaled by world size: 1.688349584583193e-05 | |
Epoch: 0, Step: 24, Rank: 0, loss = 0.04333149269223213 | |
Epoch 0: 20%|█▉ | 24/121 [01:01<04:07, 2.55s/it]
total tokens: 7492 num samples: 4 num padding tokens: 997 - rank: 1 max len: 1873 min len: 1355 avg len: 1623.75 num_loss_counted_tokens: 1310
total tokens: 8000 num samples: 10 num padding tokens: 696 - rank: 4 max len: 800 min len: 666 avg len: 730.4 num_loss_counted_tokens: 5339 | |
{ | |
"epoch": 0, | |
"step": 24, | |
"rank": 0, | |
"loss": 0.04333149269223213, | |
"overall_throughput": 42.861407474478405, | |
"lr": 0.0, | |
"cuda_mem_allocated": 18.177217483520508, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 20532, | |
"batch_size": 79, | |
"total_loss": 0.7780030965805054, | |
"gradnorm": null, | |
"weight_norm": 433.0432434082031, | |
"timestamp": "2024-08-18T20:49:03.453834" | |
} | |
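The JSON record carries "rank": 0, and its "loss" field repeats rank 0's printed loss for that step, while "total_loss" matches the mean of the eight per-rank losses. A minimal check for step 24, using the values printed above:

```python
import math
from statistics import fmean

# Per-rank losses printed for Epoch 0, Step 24.
step24_rank_losses = {
    0: 0.04333149269223213,
    1: 0.1667964607477188,
    2: 0.37419599294662476,
    3: 0.7503219246864319,
    4: 1.2836968898773193,
    5: 1.567887783050537,
    6: 0.9068519473075867,
    7: 1.130942463874817,
}

# Fields copied from the step-24 JSON record.
reported_loss = 0.04333149269223213
reported_total_loss = 0.7780030965805054

mean_loss = fmean(step24_rank_losses.values())
print(f"mean of rank losses: {mean_loss:.6f}, total_loss: {reported_total_loss:.6f}")  # both 0.778003
print("JSON loss is rank 0's loss:", reported_loss == step24_rank_losses[0])            # True
print("match:", math.isclose(mean_loss, reported_total_loss, rel_tol=1e-6))           # True
```

So when reading this log, "loss" in the JSON is a single rank's value, and "total_loss" is the figure to track across steps.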
total tokens: 7904 num samples: 19 num padding tokens: 1180 - rank: 6 max len: 416 min len: 291 avg len: 353.89473684210526 num_loss_counted_tokens: 3756 | |
total tokens: 7920 num samples: 12 num padding tokens: 1406 - rank: 5 max len: 660 min len: 441 avg len: 542.8333333333334 num_loss_counted_tokens: 4093 | |
total tokens: 7830 num samples: 27 num padding tokens: 2551 - rank: 7 max len: 290 min len: 78 avg len: 195.5185185185185 num_loss_counted_tokens: 2643 | |
total tokens: 5734 num samples: 2 num padding tokens: 736 - rank: 0 max len: 2867 min len: 2131 avg len: 2499.0 num_loss_counted_tokens: 169 | |
total tokens: 8088 num samples: 6 num padding tokens: 1050 - rank: 2 max len: 1348 min len: 1049 avg len: 1173.0 num_loss_counted_tokens: 3708 | |
total tokens: 8040 num samples: 8 num padding tokens: 661 - rank: 3 max len: 1005 min len: 825 avg len: 922.375 num_loss_counted_tokens: 5105 | |
Per-token loss scaled by world size: 0.00023669454094488174
Per-token loss scaled by world size: 0.00023404715466313064
Per-token loss scaled by world size: 0.00022105168318375945
Per-token loss scaled by world size: 0.00032421553623862565 | |
Per-token loss scaled by world size: 1.7478114386904053e-05 | |
Per-token loss scaled by world size: 0.00014570211351383477
Per-token loss scaled by world size: 5.8525503845885396e-05
Epoch: 0, Step: 25, Rank: 2, loss = 0.8244895935058594 | |
Epoch: 0, Step: 25, Rank: 4, loss = 0.7787098288536072
Epoch: 0, Step: 25, Rank: 6, loss = 1.1421302556991577
Epoch: 0, Step: 25, Rank: 3, loss = 0.8338156938552856
Epoch: 0, Step: 25, Rank: 0, loss = 0.06157102435827255 | |
Epoch: 0, Step: 25, Rank: 1, loss = 0.2061707228422165 | |
Epoch: 0, Step: 25, Rank: 7, loss = 0.5132721066474915 | |
Per-token loss scaled by world size: 0.00032130838371813297 | |
Epoch: 0, Step: 25, Rank: 5, loss = 1.1318891048431396 | |
Epoch 0: 21%|██ | 25/121 [01:04<04:04, 2.55s/it]
total tokens: 7385 num samples: 7 num padding tokens: 1010 - rank: 4 max len: 1055 min len: 819 avg len: 910.7142857142857 num_loss_counted_tokens: 2829
total tokens: 6350 num samples: 2 num padding tokens: 809 - rank: 1 max len: 3175 min len: 2366 avg len: 2770.5 num_loss_counted_tokens: 1136 | |
total tokens: 7100 num samples: 25 num padding tokens: 2255 - rank: 7 max len: 284 min len: 85 avg len: 193.8 num_loss_counted_tokens: 2161 | |
{ | |
"epoch": 0, | |
"step": 25, | |
"rank": 0, | |
"loss": 0.06157102435827255, | |
"overall_throughput": 42.60322198576968, | |
"lr": 0.0, | |
"cuda_mem_allocated": 18.081379890441895, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 28182, | |
"batch_size": 71, | |
"total_loss": 0.6865060329437256, | |
"gradnorm": null, | |
"weight_norm": 433.0432434082031, | |
"timestamp": "2024-08-18T20:49:05.995018" | |
} | |
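In the progress bar, `25/121 [01:04<04:04, 2.55s/it]` reads as 25 of 121 steps done in epoch 0, 1 min 4 s elapsed, about 4 min 4 s remaining, at 2.55 s per step, so the remaining time is roughly (121 - 25) × 2.55 s = 244.8 s. The sketch below does that arithmetic, assuming the bar is tqdm's default format, which shows durations under an hour as MM:SS with fractional seconds dropped. The bar itself works from an unrounded, smoothed rate, so recomputing from the two-decimal rate can be off by a second: for step 35, 86 × 2.55 s gives 03:39 while the bar shows 03:38.

```python
def format_mm_ss(seconds: float) -> str:
    """Format a duration under one hour as MM:SS, dropping fractional seconds."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def remaining(done: int, total: int, sec_per_it: float) -> str:
    """Approximate remaining time from the displayed per-step rate."""
    return format_mm_ss((total - done) * sec_per_it)


# (steps done, displayed s/it, remaining time shown in this log's bar)
for done, rate, shown in [(22, 2.59, "04:16"), (23, 2.57, "04:11"), (25, 2.55, "04:04")]:
    print(done, remaining(done, 121, rate), shown)
```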
total tokens: 7800 num samples: 15 num padding tokens: 2044 - rank: 6 max len: 520 min len: 291 avg len: 383.73333333333335 num_loss_counted_tokens: 4084 | |
total tokens: 7464 num samples: 4 num padding tokens: 487 - rank: 2 max len: 1866 min len: 1672 avg len: 1744.25 num_loss_counted_tokens: 862 | |
total tokens: 7970 num samples: 10 num padding tokens: 1552 - rank: 5 max len: 797 min len: 528 avg len: 641.8 num_loss_counted_tokens: 3059 | |
total tokens: 7472 num samples: 2 num padding tokens: 315 - rank: 0 max len: 3736 min len: 3421 avg len: 3578.5 num_loss_counted_tokens: 178 | |
total tokens: 7635 num samples: 5 num padding tokens: 1225 - rank: 3 max len: 1527 min len: 1071 avg len: 1282.0 num_loss_counted_tokens: 2498 | |
Per-token loss scaled by world size: 0.0005163732566870749
Per-token loss scaled by world size: 0.00043581612408161163
Per-token loss scaled by world size: 0.0003451558295637369
Per-token loss scaled by world size: 0.00011102599819423631
Per-token loss scaled by world size: 7.73636857047677e-05
Per-token loss scaled by world size: 1.7906730818140204e-06
Per-token loss scaled by world size: 0.00031262030825018883
Epoch: 0, Step: 26, Rank: 4, loss = 1.1685864925384521
Epoch: 0, Step: 26, Rank: 2, loss = 0.2977023422718048
Epoch: 0, Step: 26, Rank: 3, loss = 0.9254922270774841
Epoch: 0, Step: 26, Rank: 0, loss = 0.004801466129720211
Epoch: 0, Step: 26, Rank: 6, loss = 1.3845903873443604
Epoch: 0, Step: 26, Rank: 1, loss = 0.207441046833992 | |
Epoch: 0, Step: 26, Rank: 7, loss = 0.8382523059844971 | |
Per-token loss scaled by world size: 0.0004404305072966963 | |
Epoch: 0, Step: 26, Rank: 5, loss = 1.1809593439102173 | |
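Because all eight ranks write to the same stream, their "Epoch: N, Step: N, Rank: R, loss = X" prints can be concatenated onto one line with no separator. A regular expression anchored on the record prefix recovers them. The float pattern requires at least one digit after any exponent marker, so the "E" of a directly following "Epoch" is not swallowed into the number. A sketch; the function name is illustrative:

```python
import re
from typing import List, Tuple

# Optional exponent must contain a digit, so "...4521Epoch" stops at "4521".
LOSS_RE = re.compile(
    r"Epoch: (\d+), Step: (\d+), Rank: (\d+), loss = (\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)"
)


def split_loss_records(text: str) -> List[Tuple[int, int, int, float]]:
    """Return (epoch, step, rank, loss) for every record in text, fused or not."""
    return [(int(e), int(s), int(r), float(v)) for e, s, r, v in LOSS_RE.findall(text)]


fused = (
    "Epoch: 0, Step: 26, Rank: 4, loss = 1.1685864925384521"
    "Epoch: 0, Step: 26, Rank: 2, loss = 0.2977023422718048"
    "Epoch: 0, Step: 26, Rank: 3, loss = 0.9254922270774841"
)
for epoch, step, rank, loss in split_loss_records(fused):
    print(f"step {step} rank {rank}: {loss}")
```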
Epoch 0: 21%|██▏ | 26/121 [01:06<04:02, 2.55s/it]
total tokens: 8012 num samples: 4 num padding tokens: 1086 - rank: 1 max len: 2003 min len: 1474 avg len: 1731.5 num_loss_counted_tokens: 2801
total tokens: 7784 num samples: 8 num padding tokens: 708 - rank: 4 max len: 973 min len: 760 avg len: 884.5 num_loss_counted_tokens: 4202 | |
{ | |
"epoch": 0, | |
"step": 26, | |
"rank": 0, | |
"loss": 0.004801466129720211, | |
"overall_throughput": 42.41860146698506, | |
"lr": 0.0, | |
"cuda_mem_allocated": 18.102412223815918, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 21451, | |
"batch_size": 82, | |
"total_loss": 0.7509781718254089, | |
"gradnorm": null, | |
"weight_norm": 433.0432434082031, | |
"timestamp": "2024-08-18T20:49:08.547049" | |
} | |
total tokens: 6995 num samples: 5 num padding tokens: 461 - rank: 2 max len: 1399 min len: 1155 avg len: 1306.8 num_loss_counted_tokens: 2776 | |
total tokens: 7920 num samples: 11 num padding tokens: 1235 - rank: 5 max len: 720 min len: 497 avg len: 607.7272727272727 num_loss_counted_tokens: 4165 | |
total tokens: 7553 num samples: 7 num padding tokens: 401 - rank: 3 max len: 1079 min len: 985 avg len: 1021.7142857142857 num_loss_counted_tokens: 5788 | |
total tokens: 7540 num samples: 26 num padding tokens: 2466 - rank: 7 max len: 290 min len: 86 avg len: 195.15384615384616 num_loss_counted_tokens: 2386 | |
total tokens: 7776 num samples: 16 num padding tokens: 1715 - rank: 6 max len: 486 min len: 293 avg len: 378.8125 num_loss_counted_tokens: 3442 | |
total tokens: 8067 num samples: 3 num padding tokens: 761 - rank: 0 max len: 2689 min len: 2058 avg len: 2435.3333333333335 num_loss_counted_tokens: 864 | |
Per-token loss scaled by world size: 0.0004792925319634378
Per-token loss scaled by world size: 0.0002461661642882973
Per-token loss scaled by world size: 0.0004854958679061383
Per-token loss scaled by world size: 3.6859477404505014e-05
Per-token loss scaled by world size: 0.00043515616562217474
Per-token loss scaled by world size: 3.782783096539788e-05
Per-token loss scaled by world size: 1.9859728126903065e-05
Epoch: 0, Step: 27, Rank: 3, loss = 0.5675053000450134 | |
Epoch: 0, Step: 27, Rank: 5, loss = 1.1192500591278076
Epoch: 0, Step: 27, Rank: 0, loss = 0.08497491478919983
Epoch: 0, Step: 27, Rank: 7, loss = 1.1049489974975586 | |
Epoch: 0, Step: 27, Rank: 4, loss = 1.0031981468200684 | |
Epoch: 0, Step: 27, Rank: 2, loss = 0.0457841195166111 | |
Epoch: 0, Step: 27, Rank: 1, loss = 0.08720733970403671 | |
Per-token loss scaled by world size: 0.0007065049139782786 | |
Epoch: 0, Step: 27, Rank: 6, loss = 1.6287587881088257 | |
Epoch 0: 22%|██▏ | 27/121 [01:09<03:58, 2.54s/it]
total tokens: 7665 num samples: 3 num padding tokens: 261 - rank: 1 max len: 2555 min len: 2313 avg len: 2468.0 num_loss_counted_tokens: 2129
total tokens: 7147 num samples: 7 num padding tokens: 782 - rank: 4 max len: 1021 min len: 860 avg len: 909.2857142857143 num_loss_counted_tokens: 4214 | |
total tokens: 7680 num samples: 10 num padding tokens: 867 - rank: 5 max len: 768 min len: 585 avg len: 681.3 num_loss_counted_tokens: 4557 | |
{ | |
"epoch": 0, | |
"step": 27, | |
"rank": 0, | |
"loss": 0.08497491478919983, | |
"overall_throughput": 43.23576441221499, | |
"lr": 0.0, | |
"cuda_mem_allocated": 18.077077388763428, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 18443, | |
"batch_size": 71, | |
"total_loss": 0.7052034735679626, | |
"gradnorm": null, | |
"weight_norm": 433.0432434082031, | |
"timestamp": "2024-08-18T20:49:11.052914" | |
} | |
total tokens: 5712 num samples: 2 num padding tokens: 241 - rank: 0 max len: 2856 min len: 2615 avg len: 2735.5 num_loss_counted_tokens: 234 | |
total tokens: 6724 num samples: 4 num padding tokens: 898 - rank: 3 max len: 1681 min len: 1052 avg len: 1456.5 num_loss_counted_tokens: 2574 | |
total tokens: 8092 num samples: 14 num padding tokens: 1753 - rank: 6 max len: 578 min len: 324 avg len: 452.7857142857143 num_loss_counted_tokens: 2859 | |
total tokens: 8060 num samples: 4 num padding tokens: 681 - rank: 2 max len: 2015 min len: 1739 avg len: 1844.75 num_loss_counted_tokens: 813 | |
total tokens: 8092 num samples: 28 num padding tokens: 3097 - rank: 7 max len: 289 min len: 81 avg len: 178.39285714285714 num_loss_counted_tokens: 2310 | |
Per-token loss scaled by world size: 0.0003594549198169261
Per-token loss scaled by world size: 0.0002452041080687195
Per-token loss scaled by world size: 0.0001836109149735421
Per-token loss scaled by world size: 0.00030733394669368863
Per-token loss scaled by world size: 0.0003232009767089039
Per-token loss scaled by world size: 0.000357407407136634 | |
Per-token loss scaled by world size: 3.657307388493791e-05 | |
Epoch: 0, Step: 28, Rank: 3, loss = 0.7112144827842712
Epoch: 0, Step: 28, Rank: 2, loss = 0.5325634479522705
Epoch: 0, Step: 28, Rank: 6, loss = 1.0425989627838135
Epoch: 0, Step: 28, Rank: 4, loss = 0.8914220929145813
Epoch: 0, Step: 28, Rank: 5, loss = 1.0366601943969727
Epoch: 0, Step: 28, Rank: 7, loss = 0.9374444484710693 | |
Epoch: 0, Step: 28, Rank: 1, loss = 0.10608020424842834 | |
Per-token loss scaled by world size: 4.0188886487158015e-05 | |
Epoch: 0, Step: 28, Rank: 0, loss = 0.11656786501407623 | |
Epoch 0: 23%|██▎ | 28/121 [01:11<03:56, 2.55s/it]
total tokens: 7986 num samples: 11 num padding tokens: 839 - rank: 4 max len: 726 min len: 536 avg len: 649.7272727272727 num_loss_counted_tokens: 4414
total tokens: 7616 num samples: 4 num padding tokens: 1782 - rank: 1 max len: 1904 min len: 1228 avg len: 1458.5 num_loss_counted_tokens: 1665 | |
{ | |
"epoch": 0, | |
"step": 28, | |
"rank": 0, | |
"loss": 0.11656786501407623, | |
"overall_throughput": 42.00576591775989, | |
"lr": 0.0, | |
"cuda_mem_allocated": 18.240712642669678, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 23204, | |
"batch_size": 91, | |
"total_loss": 0.6718189716339111, | |
"gradnorm": null, | |
"weight_norm": 433.0432434082031, | |
"timestamp": "2024-08-18T20:49:13.626212" | |
} | |
total tokens: 8040 num samples: 20 num padding tokens: 1891 - rank: 6 max len: 402 min len: 208 avg len: 307.45 num_loss_counted_tokens: 3493 | |
total tokens: 4944 num samples: 24 num padding tokens: 1763 - rank: 7 max len: 206 min len: 77 avg len: 132.54166666666666 num_loss_counted_tokens: 1195 | |
total tokens: 6352 num samples: 2 num padding tokens: 1168 - rank: 0 max len: 3176 min len: 2008 avg len: 2592.0 num_loss_counted_tokens: 1140 | |
total tokens: 7960 num samples: 10 num padding tokens: 318 - rank: 3 max len: 796 min len: 730 avg len: 764.2 num_loss_counted_tokens: 6637 | |
total tokens: 7294 num samples: 7 num padding tokens: 669 - rank: 2 max len: 1042 min len: 832 avg len: 946.4285714285714 num_loss_counted_tokens: 3958 | |
total tokens: 7905 num samples: 15 num padding tokens: 563 - rank: 5 max len: 527 min len: 412 avg len: 489.46666666666664 num_loss_counted_tokens: 4169 | |
Per-token loss scaled by world size: 0.0004329077200964093
Per-token loss scaled by world size: 5.7912915508495644e-05
Per-token loss scaled by world size: 0.0001508180284872651
Per-token loss scaled by world size: 3.933108018827625e-06
Per-token loss scaled by world size: 0.00037304253783077
Per-token loss scaled by world size: 0.0003069050144404173
Per-token loss scaled by world size: 0.00022216846991796046
Epoch: 0, Step: 29, Rank: 0, loss = 0.0120353102684021 | |
Epoch: 0, Step: 29, Rank: 5, loss = 1.3246976137161255 | |
Epoch: 0, Step: 29, Rank: 1, loss = 0.17721351981163025
Epoch: 0, Step: 29, Rank: 2, loss = 0.46150317788124084
Epoch: 0, Step: 29, Rank: 4, loss = 0.6798354983329773
Epoch: 0, Step: 29, Rank: 7, loss = 0.9391293525695801
Epoch: 0, Step: 29, Rank: 6, loss = 1.1415101289749146
Per-token loss scaled by world size: 0.00031356202089227736 | |
Epoch: 0, Step: 29, Rank: 3, loss = 0.9594997763633728 | |
Epoch 0: 24%|██▍ | 29/121 [01:14<03:55, 2.55s/it]
total tokens: 8085 num samples: 11 num padding tokens: 512 - rank: 4 max len: 735 min len: 626 avg len: 688.4545454545455 num_loss_counted_tokens: 5373
total tokens: 6820 num samples: 4 num padding tokens: 921 - rank: 1 max len: 1705 min len: 1306 avg len: 1474.75 num_loss_counted_tokens: 1453 | |
{ | |
"epoch": 0, | |
"step": 29, | |
"rank": 0, | |
"loss": 0.0120353102684021, | |
"overall_throughput": 42.0762720880897, | |
"lr": 0.0, | |
"cuda_mem_allocated": 18.163118362426758, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 24480, | |
"batch_size": 81, | |
"total_loss": 0.711928129196167, | |
"gradnorm": null, | |
"weight_norm": 433.0432434082031, | |
"timestamp": "2024-08-18T20:49:16.196587" | |
} | |
total tokens: 8056 num samples: 8 num padding tokens: 879 - rank: 3 max len: 1007 min len: 755 avg len: 897.125 num_loss_counted_tokens: 5796 | |
total tokens: 8021 num samples: 13 num padding tokens: 880 - rank: 5 max len: 617 min len: 473 avg len: 549.3076923076923 num_loss_counted_tokens: 4711 | |
total tokens: 7992 num samples: 2 num padding tokens: 2114 - rank: 0 max len: 3996 min len: 1882 avg len: 2939.0 num_loss_counted_tokens: 854 | |
total tokens: 7803 num samples: 17 num padding tokens: 1971 - rank: 6 max len: 459 min len: 257 avg len: 343.05882352941177 num_loss_counted_tokens: 3525 | |
total tokens: 8064 num samples: 32 num padding tokens: 3179 - rank: 7 max len: 252 min len: 75 avg len: 152.65625 num_loss_counted_tokens: 2045 | |
total tokens: 7854 num samples: 7 num padding tokens: 337 - rank: 2 max len: 1122 min len: 1010 avg len: 1073.857142857143 num_loss_counted_tokens: 4505 | |
Per-token loss scaled by world size: 0.00014981771528255194
Per-token loss scaled by world size: 0.00036861959961242974
Per-token loss scaled by world size: 0.00042424429557286203
Per-token loss scaled by world size: 6.0521342675201595e-06
Per-token loss scaled by world size: 0.00027959863655269146 | |
Per-token loss scaled by world size: 0.00027874435181729496 | |
Per-token loss scaled by world size: 2.435126134514576e-06 | |
Epoch: 0, Step: 30, Rank: 6, loss = 1.0413503646850586
Epoch: 0, Step: 30, Rank: 5, loss = 1.1984901428222656
Epoch: 0, Step: 30, Rank: 2, loss = 0.4232350289821625
Epoch: 0, Step: 30, Rank: 0, loss = 0.01709727942943573
Epoch: 0, Step: 30, Rank: 4, loss = 0.7898661494255066 | |
Epoch: 0, Step: 30, Rank: 7, loss = 0.7874528169631958 | |
Epoch: 0, Step: 30, Rank: 1, loss = 0.0068792314268648624 | |
Per-token loss scaled by world size: 0.0004282180452719331 | |
Epoch: 0, Step: 30, Rank: 3, loss = 1.2097159624099731 | |
Epoch 0: 25%|██▍ | 30/121 [01:16<03:52, 2.56s/it]
total tokens: 7893 num samples: 9 num padding tokens: 1033 - rank: 4 max len: 877 min len: 680 avg len: 762.2222222222222 num_loss_counted_tokens: 5017
{ | |
"epoch": 0, | |
"step": 30, | |
"rank": 0, | |
"loss": 0.01709727942943573, | |
"overall_throughput": 42.42562330633586, | |
"lr": 0.0, | |
"cuda_mem_allocated": 17.776453971862793, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 22600, | |
"batch_size": 88, | |
"total_loss": 0.684260904788971, | |
"gradnorm": null, | |
"weight_norm": 433.0432434082031, | |
"timestamp": "2024-08-18T20:49:18.756982" | |
} | |
total tokens: 6591 num samples: 3 num padding tokens: 424 - rank: 1 max len: 2197 min len: 1974 avg len: 2055.6666666666665 num_loss_counted_tokens: 2362 | |
total tokens: 7974 num samples: 2 num padding tokens: 1251 - rank: 0 max len: 3987 min len: 2736 avg len: 3361.5 num_loss_counted_tokens: 364 | |
total tokens: 7440 num samples: 6 num padding tokens: 982 - rank: 3 max len: 1240 min len: 953 avg len: 1076.3333333333333 num_loss_counted_tokens: 2782 | |
total tokens: 7192 num samples: 31 num padding tokens: 2694 - rank: 7 max len: 232 min len: 81 avg len: 145.09677419354838 num_loss_counted_tokens: 1761 | |
total tokens: 8056 num samples: 19 num padding tokens: 2186 - rank: 6 max len: 424 min len: 234 avg len: 308.94736842105266 num_loss_counted_tokens: 3218 | |
total tokens: 7920 num samples: 12 num padding tokens: 1246 - rank: 5 max len: 660 min len: 465 avg len: 556.1666666666666 num_loss_counted_tokens: 4600 | |
total tokens: 7096 num samples: 4 num padding tokens: 1017 - rank: 2 max len: 1774 min len: 1287 avg len: 1519.75 num_loss_counted_tokens: 2154 | |
Per-token loss scaled by world size: 0.00010154087067348883
Per-token loss scaled by world size: 8.184825674106833e-06
Per-token loss scaled by world size: 1.3264020708447788e-06
Per-token loss scaled by world size: 0.0004786914505530149
Per-token loss scaled by world size: 0.0006691211019642651
Per-token loss scaled by world size: 0.0006968624657019973
Per-token loss scaled by world size: 0.0005032268818467855
Epoch: 0, Step: 31, Rank: 0, loss = 0.01914430782198906 | |
Epoch: 0, Step: 31, Rank: 6, loss = 1.1196593046188354 | |
Epoch: 0, Step: 31, Rank: 1, loss = 0.0031024543568491936 | |
Epoch: 0, Step: 31, Rank: 2, loss = 0.23750409483909607 | |
Epoch: 0, Step: 31, Rank: 4, loss = 1.6299612522125244
Epoch: 0, Step: 31, Rank: 5, loss = 1.5650743246078491
Epoch: 0, Step: 31, Rank: 7, loss = 1.1770477294921875
Per-token loss scaled by world size: 0.00016472380957566202 | |
Epoch: 0, Step: 31, Rank: 3, loss = 0.3852889835834503 | |
Epoch 0: 26%|██▌ | 31/121 [01:19<03:49, 2.55s/it]
total tokens: 5876 num samples: 2 num padding tokens: 1112 - rank: 1 max len: 2938 min len: 1826 avg len: 2382.0 num_loss_counted_tokens: 148
total tokens: 7476 num samples: 7 num padding tokens: 1132 - rank: 4 max len: 1068 min len: 776 avg len: 906.2857142857143 num_loss_counted_tokens: 5713 | |
{ | |
"epoch": 0, | |
"step": 31, | |
"rank": 0, | |
"loss": 0.01914430782198906, | |
"overall_throughput": 42.41999987357855, | |
"lr": 0.0, | |
"cuda_mem_allocated": 18.21198320388794, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 18712, | |
"batch_size": 86, | |
"total_loss": 0.7670978307723999, | |
"gradnorm": null, | |
"weight_norm": 433.0432434082031, | |
"timestamp": "2024-08-18T20:49:21.283912" | |
} | |
total tokens: 7170 num samples: 6 num padding tokens: 395 - rank: 3 max len: 1195 min len: 1074 avg len: 1129.1666666666667 num_loss_counted_tokens: 4609 | |
total tokens: 7808 num samples: 16 num padding tokens: 1337 - rank: 6 max len: 488 min len: 315 avg len: 404.4375 num_loss_counted_tokens: 4322 | |
total tokens: 6692 num samples: 4 num padding tokens: 769 - rank: 2 max len: 1673 min len: 1240 avg len: 1480.75 num_loss_counted_tokens: 983 | |
total tokens: 8036 num samples: 28 num padding tokens: 2902 - rank: 7 max len: 287 min len: 83 avg len: 183.35714285714286 num_loss_counted_tokens: 2351 | |
total tokens: 6152 num samples: 2 num padding tokens: 42 - rank: 0 max len: 3076 min len: 3034 avg len: 3055.0 num_loss_counted_tokens: 181 | |
total tokens: 8030 num samples: 11 num padding tokens: 1077 - rank: 5 max len: 730 min len: 530 avg len: 632.0909090909091 num_loss_counted_tokens: 4192 | |
Per-token loss scaled by world size: 0.0005925196455791593
Per-token loss scaled by world size: 0.0004467185935936868
Per-token loss scaled by world size: 0.0003141453198622912
Per-token loss scaled by world size: 0.0005347821279428899
Per-token loss scaled by world size: 7.6175206231710035e-06
Per-token loss scaled by world size: 0.0005325234378688037
Per-token loss scaled by world size: 7.270013156812638e-05 | |
Epoch: 0, Step: 32, Rank: 3, loss = 0.6862111687660217 | |
Epoch: 0, Step: 32, Rank: 6, loss = 1.2942850589752197 | |
Epoch: 0, Step: 32, Rank: 4, loss = 0.9758009314537048 | |
Epoch: 0, Step: 32, Rank: 1, loss = 0.016639521345496178
Epoch: 0, Step: 32, Rank: 5, loss = 1.1681647300720215
Epoch: 0, Step: 32, Rank: 0, loss = 0.15880435705184937 | |
Epoch: 0, Step: 32, Rank: 7, loss = 1.1632308959960938 | |
Per-token loss scaled by world size: 2.2659537535218988e-06 | |
Epoch: 0, Step: 32, Rank: 2, loss = 0.004949692636728287 | |
Epoch 0: 26%|██▋ | 32/121 [01:22<03:48, 2.57s/it]
{
"epoch": 0, | |
"step": 32, | |
"rank": 0, | |
"loss": 0.15880435705184937, | |
"overall_throughput": 41.6872406777795, | |
"lr": 0.0, | |
"cuda_mem_allocated": 17.77738618850708, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 17475, | |
"batch_size": 60, | |
"total_loss": 0.6835108399391174, | |
"gradnorm": null, | |
"weight_norm": 433.0432434082031, | |
"timestamp": "2024-08-18T20:49:23.890605" | |
} | |
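The JSON metrics records span several lines and can begin on the same line as a progress bar. Scanning for "{" and handing the rest of the text to `json.JSONDecoder.raw_decode`, which returns the parsed object plus the index where it ended, pulls the records out regardless of what surrounds them. Across steps 22 to 35 in this section, every record shows "lr": 0.0, "gradnorm": null and the same "weight_norm" (433.0432434082031); a parser like this makes that easy to confirm over a whole log. The function name is illustrative, and the sample is an abridged copy of the step-32 record.

```python
import json


def extract_json_records(log_text: str) -> list:
    """Return every JSON object embedded in free-form log text, in order."""
    decoder = json.JSONDecoder()
    records, pos = [], 0
    while True:
        start = log_text.find("{", pos)
        if start == -1:
            return records
        try:
            # raw_decode returns (object, end index relative to the slice).
            obj, length = decoder.raw_decode(log_text[start:])
        except json.JSONDecodeError:
            pos = start + 1  # stray brace that does not begin valid JSON; skip it
            continue
        records.append(obj)
        pos = start + length


# Abridged step-32 record, glued to the end of its progress-bar line.
sample = """Epoch 0: 26% 32/121 [01:22<03:48, 2.57s/it]{
  "epoch": 0,
  "step": 32,
  "rank": 0,
  "loss": 0.15880435705184937,
  "lr": 0.0,
  "gradnorm": null,
  "weight_norm": 433.0432434082031
}
total tokens: 8016 num samples: 4 num padding tokens: 1255 - rank: 1"""

for rec in extract_json_records(sample):
    print(rec["step"], rec["lr"], rec["gradnorm"], rec["weight_norm"])  # 32 0.0 None 433.0432434082031
```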
total tokens: 8016 num samples: 4 num padding tokens: 1255 - rank: 1 max len: 2004 min len: 1389 avg len: 1690.25 num_loss_counted_tokens: 3118 | |
total tokens: 7668 num samples: 9 num padding tokens: 475 - rank: 4 max len: 852 min len: 747 avg len: 799.2222222222222 num_loss_counted_tokens: 4806 | |
total tokens: 6662 num samples: 2 num padding tokens: 558 - rank: 0 max len: 3331 min len: 2773 avg len: 3052.0 num_loss_counted_tokens: 194 | |
total tokens: 6006 num samples: 22 num padding tokens: 1923 - rank: 7 max len: 273 min len: 87 avg len: 185.5909090909091 num_loss_counted_tokens: 1883 | |
total tokens: 8000 num samples: 16 num padding tokens: 1855 - rank: 6 max len: 500 min len: 279 avg len: 384.0625 num_loss_counted_tokens: 3217 | |
total tokens: 8043 num samples: 7 num padding tokens: 1088 - rank: 3 max len: 1149 min len: 859 avg len: 993.5714285714286 num_loss_counted_tokens: 2802 | |
total tokens: 7986 num samples: 6 num padding tokens: 566 - rank: 2 max len: 1331 min len: 1187 avg len: 1236.6666666666667 num_loss_counted_tokens: 1663 | |
total tokens: 7920 num samples: 11 num padding tokens: 899 - rank: 5 max len: 720 min len: 534 avg len: 638.2727272727273 num_loss_counted_tokens: 5060 | |
Per-token loss scaled by world size: 0.00021643155196215957
Per-token loss scaled by world size: 0.00016316254914272577
Per-token loss scaled by world size: 0.0003613443404901773
Per-token loss scaled by world size: 0.0002246944495709613
Per-token loss scaled by world size: 0.00030351741588674486
Per-token loss scaled by world size: 1.8829136934073176e-06 | |
Per-token loss scaled by world size: 0.0001424902438884601 | |
Epoch: 0, Step: 33, Rank: 6, loss = 0.7259596586227417 | |
Epoch: 0, Step: 33, Rank: 2, loss = 0.6992632746696472 | |
Epoch: 0, Step: 33, Rank: 1, loss = 0.5271577835083008 | |
Epoch: 0, Step: 33, Rank: 5, loss = 1.167458415031433 | |
Epoch: 0, Step: 33, Rank: 0, loss = 0.006083458662033081
Epoch: 0, Step: 33, Rank: 4, loss = 0.9806268215179443
Epoch: 0, Step: 33, Rank: 7, loss = 0.46036818623542786 | |
Per-token loss scaled by world size: 0.0003278390795458108 | |
Epoch: 0, Step: 33, Rank: 3, loss = 1.0592070817947388 | |
Epoch 0: 27%|██▋ | 33/121 [01:24<03:44, 2.55s/it]
total tokens: 7733 num samples: 11 num padding tokens: 698 - rank: 4 max len: 703 min len: 598 avg len: 639.5454545454545 num_loss_counted_tokens: 4778
total tokens: 6987 num samples: 3 num padding tokens: 912 - rank: 1 max len: 2329 min len: 1634 avg len: 2025.0 num_loss_counted_tokens: 1273 | |
{ | |
"epoch": 0, | |
"step": 33, | |
"rank": 0, | |
"loss": 0.006083458662033081, | |
"overall_throughput": 42.6693002092833, | |
"lr": 0.0, | |
"cuda_mem_allocated": 18.08671236038208, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 25847, | |
"batch_size": 82, | |
"total_loss": 0.7032655477523804, | |
"gradnorm": null, | |
"weight_norm": 433.0432434082031, | |
"timestamp": "2024-08-18T20:49:26.403211" | |
} | |
total tokens: 7062 num samples: 6 num padding tokens: 1127 - rank: 2 max len: 1177 min len: 929 avg len: 989.1666666666666 num_loss_counted_tokens: 3984 | |
total tokens: 7810 num samples: 22 num padding tokens: 1750 - rank: 6 max len: 355 min len: 236 avg len: 275.45454545454544 num_loss_counted_tokens: 2894 | |
total tokens: 7312 num samples: 8 num padding tokens: 597 - rank: 3 max len: 914 min len: 774 avg len: 839.375 num_loss_counted_tokens: 4040 | |
total tokens: 7990 num samples: 34 num padding tokens: 2478 - rank: 7 max len: 235 min len: 87 avg len: 162.11764705882354 num_loss_counted_tokens: 1905 | |
total tokens: 7683 num samples: 13 num padding tokens: 1541 - rank: 5 max len: 591 min len: 394 avg len: 472.46153846153845 num_loss_counted_tokens: 3832 | |
total tokens: 6662 num samples: 2 num padding tokens: 451 - rank: 0 max len: 3331 min len: 2880 avg len: 3105.5 num_loss_counted_tokens: 160 | |
Per-token loss scaled by world size: 0.0002979582059197128
Per-token loss scaled by world size: 5.8393885410623625e-05
Per-token loss scaled by world size: 8.860254183673533e-07
Per-token loss scaled by world size: 0.00030089422944001853
Per-token loss scaled by world size: 0.0003473999386187643
Per-token loss scaled by world size: 0.00043760568951256573
Per-token loss scaled by world size: 0.00025744541198946536 | |
Epoch: 0, Step: 34, Rank: 0, loss = 0.002579330699518323
Epoch: 0, Step: 34, Rank: 1, loss = 0.1699918955564499
Epoch: 0, Step: 34, Rank: 2, loss = 0.8673935532569885
Epoch: 0, Step: 34, Rank: 4, loss = 0.87594074010849
Epoch: 0, Step: 34, Rank: 5, loss = 1.2739248275756836
Epoch: 0, Step: 34, Rank: 6, loss = 1.0113246440887451
Epoch: 0, Step: 34, Rank: 7, loss = 0.7494558095932007 | |
Per-token loss scaled by world size: 0.00023933911870699376 | |
Epoch: 0, Step: 34, Rank: 3, loss = 0.6967461109161377 | |
Epoch 0: 28%|██▊ | 34/121 [01:27<03:42, 2.56s/it]
total tokens: 7905 num samples: 3 num padding tokens: 745 - rank: 1 max len: 2635 min len: 2036 avg len: 2386.6666666666665 num_loss_counted_tokens: 966
total tokens: 7504 num samples: 7 num padding tokens: 972 - rank: 4 max len: 1072 min len: 783 avg len: 933.1428571428571 num_loss_counted_tokens: 3983 | |
{ | |
"epoch": 0, | |
"step": 34, | |
"rank": 0, | |
"loss": 0.002579330699518323, | |
"overall_throughput": 41.96938385940714, | |
"lr": 0.0, | |
"cuda_mem_allocated": 18.149734020233154, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 23289, | |
"batch_size": 81, | |
"total_loss": 0.7059195637702942, | |
"gradnorm": null, | |
"weight_norm": 433.0432434082031, | |
"timestamp": "2024-08-18T20:49:28.982671" | |
} | |
total tokens: 7917 num samples: 13 num padding tokens: 2041 - rank: 6 max len: 609 min len: 308 avg len: 452.0 num_loss_counted_tokens: 4345 | |
total tokens: 7310 num samples: 5 num padding tokens: 1026 - rank: 3 max len: 1462 min len: 1079 avg len: 1256.8 num_loss_counted_tokens: 3091 | |
total tokens: 6994 num samples: 26 num padding tokens: 2708 - rank: 7 max len: 269 min len: 78 avg len: 164.84615384615384 num_loss_counted_tokens: 1792 | |
total tokens: 7628 num samples: 4 num padding tokens: 1180 - rank: 2 max len: 1907 min len: 1481 avg len: 1612.0 num_loss_counted_tokens: 3200 | |
total tokens: 6582 num samples: 2 num padding tokens: 537 - rank: 0 max len: 3291 min len: 2754 avg len: 3022.5 num_loss_counted_tokens: 226 | |
total tokens: 7740 num samples: 10 num padding tokens: 701 - rank: 5 max len: 774 min len: 622 avg len: 703.9 num_loss_counted_tokens: 5163 | |
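The batch summaries printed between the step-34 and step-35 loss lines show a clear pattern: rank 0 holds the longest samples (2754 to 3291 tokens), rank 7 the shortest (78 to 269), and each rank's min len is larger than the next rank's max len. That is consistent with each step's samples being sorted by length and split into contiguous slices across ranks, although the log alone does not confirm how the sampler works. A small check over the (max len, min len) pairs copied from those lines:

```python
# rank -> (max len, min len), copied from the batch summaries between steps 34 and 35.
ranges = {
    0: (3291, 2754),
    1: (2635, 2036),
    2: (1907, 1481),
    3: (1462, 1079),
    4: (1072, 783),
    5: (774, 622),
    6: (609, 308),
    7: (269, 78),
}


def ranks_sorted_by_length(ranges: dict) -> bool:
    """True if every rank's shortest sample is longer than the next rank's longest."""
    ordered = [ranges[r] for r in sorted(ranges)]
    return all(cur[1] > nxt[0] for cur, nxt in zip(ordered, ordered[1:]))


print(ranks_sorted_by_length(ranges))  # True
```

One practical consequence visible in the log: low ranks get a few long samples and high ranks get many short ones (rank 7 often has 25 to 34 samples), so padding per rank varies widely even when total tokens per rank stays near 8000.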
Per-token loss scaled by world size: 0.00038499291986227036
Per-token loss scaled by world size: 0.00033336589694954455
Per-token loss scaled by world size: 6.31858165434096e-06
Per-token loss scaled by world size: 0.0004092359740752727
Per-token loss scaled by world size: 0.000349278881913051
Per-token loss scaled by world size: 9.329826571047306e-05 | |
Per-token loss scaled by world size: 0.000329840142512694 | |
Epoch: 0, Step: 35, Rank: 1, loss = 0.8665429949760437
Epoch: 0, Step: 35, Rank: 6, loss = 0.9079067707061768
Epoch: 0, Step: 35, Rank: 3, loss = 1.0007410049438477
Epoch: 0, Step: 35, Rank: 0, loss = 0.01642436347901821
Epoch: 0, Step: 35, Rank: 4, loss = 1.0637577772140503
Epoch: 0, Step: 35, Rank: 2, loss = 0.24251717329025269
Epoch: 0, Step: 35, Rank: 7, loss = 0.8573781847953796 | |
Per-token loss scaled by world size: 0.0002692708803806454 | |
Epoch: 0, Step: 35, Rank: 5, loss = 0.6999359726905823 | |
Epoch 0: 29%|██▉ | 35/121 [01:29<03:38, 2.55s/it] total tokens: 7188 num samples: 4 num padding tokens: 317 - rank: 1 max len: 1797 min len: 1593 avg len: 1717.75 num_loss_counted_tokens: 2604 | |
total tokens: 7240 num samples: 8 num padding tokens: 858 - rank: 4 max len: 905 min len: 728 avg len: 797.75 num_loss_counted_tokens: 5344 | |
{ | |
"epoch": 0, | |
"step": 35, | |
"rank": 0, | |
"loss": 0.01642436347901821, | |
"overall_throughput": 43.07030586294916, | |
"lr": 0.0, | |
"cuda_mem_allocated": 18.10886526107788, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 20795, | |
"batch_size": 73, | |
"total_loss": 0.7069005370140076, | |
"gradnorm": null, | |
"weight_norm": 433.0432434082031, | |
"timestamp": "2024-08-18T20:49:31.496950" | |
} | |
total tokens: 7574 num samples: 7 num padding tokens: 648 - rank: 3 max len: 1082 min len: 913 avg len: 989.4285714285714 num_loss_counted_tokens: 4812
total tokens: 7353 num samples: 3 num padding tokens: 1218 - rank: 0 max len: 2451 min len: 1808 avg len: 2045.0 num_loss_counted_tokens: 890
total tokens: 7225 num samples: 5 num padding tokens: 790 - rank: 2 max len: 1445 min len: 1127 avg len: 1287.0 num_loss_counted_tokens: 2573
total tokens: 7777 num samples: 11 num padding tokens: 857 - rank: 5 max len: 707 min len: 516 avg len: 629.0909090909091 num_loss_counted_tokens: 3335
total tokens: 7672 num samples: 28 num padding tokens: 2867 - rank: 7 max len: 274 min len: 75 avg len: 171.60714285714286 num_loss_counted_tokens: 2189
total tokens: 7650 num samples: 15 num padding tokens: 1519 - rank: 6 max len: 510 min len: 298 avg len: 408.73333333333335 num_loss_counted_tokens: 3925
Per-token loss scaled by world size: 0.00017819351342041045
Per-token loss scaled by world size: 0.00010071766882902011
Per-token loss scaled by world size: 0.00047568074660375714
Per-token loss scaled by world size: 0.00022655159409623593
Per-token loss scaled by world size: 0.00010502615623408929
Per-token loss scaled by world size: 0.0003213490708731115
Per-token loss scaled by world size: 0.00031152847805060446
Epoch: 0, Step: 36, Rank: 3, loss = 0.6177212595939636
Epoch: 0, Step: 36, Rank: 2, loss = 0.28636693954467773
Epoch: 0, Step: 36, Rank: 5, loss = 1.2970030307769775
Epoch: 0, Step: 36, Rank: 4, loss = 0.876198410987854
Epoch: 0, Step: 36, Rank: 0, loss = 0.485866904258728
Epoch: 0, Step: 36, Rank: 1, loss = 0.27461931109428406
Epoch: 0, Step: 36, Rank: 7, loss = 0.8494213223457336
Per-token loss scaled by world size: 0.0004315480182413012
Epoch: 0, Step: 36, Rank: 6, loss = 1.1766695976257324
[2024-08-18 20:49:34,071] [INFO] [logging.py:96:log_dist] [Rank 0] step=1, skipped=0, lr=[8.000000000000001e-07], mom=[(0.9, 0.95)]
Epoch 0: 30%|██▉ | 36/121 [01:32<03:38, 2.57s/it]
total tokens: 8019 num samples: 11 num padding tokens: 830 - rank: 4 max len: 729 min len: 589 avg len: 653.5454545454545 num_loss_counted_tokens: 4361
total tokens: 7575 num samples: 5 num padding tokens: 525 - rank: 1 max len: 1515 min len: 1255 avg len: 1410.0 num_loss_counted_tokens: 2654
{
"epoch": 0,
"step": 36,
"rank": 0,
"loss": 0.485866904258728,
"overall_throughput": 41.05076993788474,
"lr": 8.000000000000001e-07,
"cuda_mem_allocated": 22.813036918640137,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 21813,
"batch_size": 94,
"total_loss": 0.7329833507537842,
"gradnorm": 0.9589425325393677,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:49:34.202497"
}
total tokens: 8078 num samples: 14 num padding tokens: 1294 - rank: 5 max len: 577 min len: 385 avg len: 484.57142857142856 num_loss_counted_tokens: 4682
total tokens: 8085 num samples: 21 num padding tokens: 1544 - rank: 6 max len: 385 min len: 243 avg len: 311.4761904761905 num_loss_counted_tokens: 3795
total tokens: 7680 num samples: 32 num padding tokens: 2281 - rank: 7 max len: 240 min len: 78 avg len: 168.71875 num_loss_counted_tokens: 2460
total tokens: 7004 num samples: 2 num padding tokens: 1514 - rank: 0 max len: 3502 min len: 1988 avg len: 2745.0 num_loss_counted_tokens: 188
total tokens: 7840 num samples: 7 num padding tokens: 725 - rank: 2 max len: 1120 min len: 917 avg len: 1016.4285714285714 num_loss_counted_tokens: 4215
total tokens: 7911 num samples: 9 num padding tokens: 701 - rank: 3 max len: 879 min len: 758 avg len: 801.1111111111111 num_loss_counted_tokens: 5865
Per-token loss scaled by world size: 0.0003182947402819991
Per-token loss scaled by world size: 0.00048430776223540306
Per-token loss scaled by world size: 0.00047009342233650386
Per-token loss scaled by world size: 0.00042154916445724666
Per-token loss scaled by world size: 5.139104814588791e-06
Per-token loss scaled by world size: 0.0004850963596254587
Epoch: 0, Step: 37, Rank: 4, loss = 1.2365219593048096
Epoch: 0, Step: 37, Rank: 5, loss = 1.2739109992980957
Epoch: 0, Step: 37, Rank: 3, loss = 0.8372344970703125
Epoch: 0, Step: 37, Rank: 7, loss = 1.1088323593139648
Epoch: 0, Step: 37, Rank: 0, loss = 0.013517772778868675
Per-token loss scaled by world size: 2.1868495423404966e-06
Epoch: 0, Step: 37, Rank: 6, loss = 1.2759853601455688
Per-token loss scaled by world size: 0.00011173654638696462
Epoch: 0, Step: 37, Rank: 1, loss = 0.005752234254032373
Epoch: 0, Step: 37, Rank: 2, loss = 0.2939090132713318
Epoch 0: 31%|███ | 37/121 [01:34<03:36, 2.58s/it]
total tokens: 8060 num samples: 10 num padding tokens: 654 - rank: 4 max len: 806 min len: 663 avg len: 740.6 num_loss_counted_tokens: 4953
total tokens: 7920 num samples: 4 num padding tokens: 1463 - rank: 1 max len: 1980 min len: 1324 avg len: 1614.25 num_loss_counted_tokens: 2571
{
"epoch": 0,
"step": 37,
"rank": 0,
"loss": 0.013517772778868675,
"overall_throughput": 42.263056677916275,
"lr": 8.000000000000001e-07,
"cuda_mem_allocated": 24.268142223358154,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 21043,
"batch_size": 79,
"total_loss": 0.7557079792022705,
"gradnorm": 0.9589425325393677,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:49:36.725200"
}
total tokens: 7872 num samples: 12 num padding tokens: 1903 - rank: 5 max len: 656 min len: 391 avg len: 497.4166666666667 num_loss_counted_tokens: 3985
total tokens: 7497 num samples: 7 num padding tokens: 1399 - rank: 3 max len: 1071 min len: 808 avg len: 871.1428571428571 num_loss_counted_tokens: 4584
total tokens: 7800 num samples: 20 num padding tokens: 2069 - rank: 6 max len: 390 min len: 221 avg len: 286.55 num_loss_counted_tokens: 3264
total tokens: 7914 num samples: 6 num padding tokens: 519 - rank: 2 max len: 1319 min len: 1143 avg len: 1232.5 num_loss_counted_tokens: 2296
total tokens: 6765 num samples: 3 num padding tokens: 359 - rank: 0 max len: 2255 min len: 2010 avg len: 2135.3333333333335 num_loss_counted_tokens: 396
total tokens: 4796 num samples: 22 num padding tokens: 1785 - rank: 7 max len: 218 min len: 77 avg len: 136.86363636363637 num_loss_counted_tokens: 1137
Per-token loss scaled by world size: 0.00023003398382570595
Per-token loss scaled by world size: 0.00023557165695820004
Per-token loss scaled by world size: 0.0003318150993436575
Per-token loss scaled by world size: 3.583161378628574e-05
Per-token loss scaled by world size: 0.00029603790608234704
Per-token loss scaled by world size: 0.00038251461228355765
Per-token loss scaled by world size: 0.00028330745408311486
Epoch: 0, Step: 38, Rank: 6, loss = 0.7295815348625183
Epoch: 0, Step: 38, Rank: 0, loss = 0.11364444345235825
Epoch: 0, Step: 38, Rank: 7, loss = 0.7471449375152588
Epoch: 0, Step: 38, Rank: 1, loss = 1.0523930788040161
Epoch: 0, Step: 38, Rank: 4, loss = 0.8985450267791748
Epoch: 0, Step: 38, Rank: 3, loss = 0.9389212131500244
Epoch: 0, Step: 38, Rank: 5, loss = 1.2131929397583008
Per-token loss scaled by world size: 0.00017416744958609343
Epoch: 0, Step: 38, Rank: 2, loss = 0.5523938536643982
Epoch 0: 31%|███▏ | 38/121 [01:37<03:32, 2.56s/it]
total tokens: 8055 num samples: 9 num padding tokens: 294 - rank: 4 max len: 895 min len: 809 avg len: 862.3333333333334 num_loss_counted_tokens: 5164
total tokens: 8007 num samples: 3 num padding tokens: 615 - rank: 1 max len: 2669 min len: 2320 avg len: 2464.0 num_loss_counted_tokens: 299
total tokens: 7556 num samples: 4 num padding tokens: 692 - rank: 2 max len: 1889 min len: 1486 avg len: 1716.0 num_loss_counted_tokens: 3448
total tokens: 7990 num samples: 10 num padding tokens: 1519 - rank: 5 max len: 799 min len: 505 avg len: 647.1 num_loss_counted_tokens: 3664
{
"epoch": 0,
"step": 38,
"rank": 0,
"loss": 0.11364444345235825,
"overall_throughput": 42.686060661097876,
"lr": 8.000000000000001e-07,
"cuda_mem_allocated": 24.416475296020508,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 25373,
"batch_size": 90,
"total_loss": 0.7807271480560303,
"gradnorm": 0.9589425325393677,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:49:39.251905"
}
total tokens: 8064 num samples: 16 num padding tokens: 1612 - rank: 6 max len: 504 min len: 282 avg len: 403.25 num_loss_counted_tokens: 3518
total tokens: 7266 num samples: 6 num padding tokens: 1143 - rank: 3 max len: 1211 min len: 903 avg len: 1020.5 num_loss_counted_tokens: 5513
total tokens: 6204 num samples: 22 num padding tokens: 2569 - rank: 7 max len: 282 min len: 76 avg len: 165.22727272727272 num_loss_counted_tokens: 1531
total tokens: 7148 num samples: 2 num padding tokens: 587 - rank: 0 max len: 3574 min len: 2987 avg len: 3280.5 num_loss_counted_tokens: 224
Per-token loss scaled by world size: 0.000557654129806906
Per-token loss scaled by world size: 0.0007022880017757416
Per-token loss scaled by world size: 0.0008294832659885287
Per-token loss scaled by world size: 0.00019126593542750925
Per-token loss scaled by world size: 0.00040766337770037353
Per-token loss scaled by world size: 6.8775539148191456e-06
Per-token loss scaled by world size: 7.391309281956637e-06
Epoch: 0, Step: 39, Rank: 6, loss = 1.4909573793411255
Epoch: 0, Step: 39, Rank: 0, loss = 0.014601047150790691
Epoch: 0, Step: 39, Rank: 5, loss = 1.7609930038452148
Epoch: 0, Step: 39, Rank: 7, loss = 1.1838997602462769
Epoch: 0, Step: 39, Rank: 3, loss = 0.40605756640434265
Epoch: 0, Step: 39, Rank: 4, loss = 0.8654693365097046
Epoch: 0, Step: 39, Rank: 1, loss = 0.01569174975156784
Per-token loss scaled by world size: 8.316225284943357e-05
Epoch: 0, Step: 39, Rank: 2, loss = 0.17655345797538757
Epoch 0: 32%|███▏ | 39/121 [01:39<03:29, 2.55s/it]
total tokens: 7136 num samples: 4 num padding tokens: 383 - rank: 1 max len: 1784 min len: 1652 avg len: 1688.25 num_loss_counted_tokens: 2347
total tokens: 7360 num samples: 8 num padding tokens: 773 - rank: 4 max len: 920 min len: 750 avg len: 823.375 num_loss_counted_tokens: 5270
{
"epoch": 0,
"step": 39,
"rank": 0,
"loss": 0.014601047150790691,
"overall_throughput": 42.970882858906755,
"lr": 8.000000000000001e-07,
"cuda_mem_allocated": 24.469857692718506,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 16984,
"batch_size": 76,
"total_loss": 0.7392778992652893,
"gradnorm": 0.9589425325393677,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:49:41.778226"
}
total tokens: 7925 num samples: 5 num padding tokens: 458 - rank: 2 max len: 1585 min len: 1370 avg len: 1493.4 num_loss_counted_tokens: 2405
total tokens: 8022 num samples: 14 num padding tokens: 2297 - rank: 6 max len: 573 min len: 288 avg len: 408.92857142857144 num_loss_counted_tokens: 2953
total tokens: 7733 num samples: 11 num padding tokens: 641 - rank: 5 max len: 703 min len: 579 avg len: 644.7272727272727 num_loss_counted_tokens: 5059
total tokens: 7326 num samples: 6 num padding tokens: 857 - rank: 3 max len: 1221 min len: 972 avg len: 1078.1666666666667 num_loss_counted_tokens: 4306
total tokens: 5446 num samples: 2 num padding tokens: 263 - rank: 0 max len: 2723 min len: 2460 avg len: 2591.5 num_loss_counted_tokens: 241
total tokens: 8100 num samples: 30 num padding tokens: 2620 - rank: 7 max len: 270 min len: 87 avg len: 182.66666666666666 num_loss_counted_tokens: 2660
Per-token loss scaled by world size: 0.0004144566773902625
Per-token loss scaled by world size: 0.0004883252549916506
Per-token loss scaled by world size: 0.00025119862402789295
Per-token loss scaled by world size: 0.0002532459329813719
Per-token loss scaled by world size: 0.0001632209459785372
Per-token loss scaled by world size: 6.004169335938059e-06
Per-token loss scaled by world size: 2.448785608066828e-06
Epoch: 0, Step: 40, Rank: 0, loss = 0.017504405230283737
Epoch: 0, Step: 40, Rank: 5, loss = 1.4236512184143066
Epoch: 0, Step: 40, Rank: 7, loss = 0.7323381900787354
Epoch: 0, Step: 40, Rank: 2, loss = 0.47585028409957886
Epoch: 0, Step: 40, Rank: 4, loss = 0.7383068799972534
Epoch: 0, Step: 40, Rank: 6, loss = 1.2082966566085815
Epoch: 0, Step: 40, Rank: 1, loss = 0.007139128167182207
Per-token loss scaled by world size: 0.00020900421077385545
Epoch: 0, Step: 40, Rank: 3, loss = 0.609325647354126
Epoch 0: 33%|███▎ | 40/121 [01:42<03:25, 2.54s/it]
total tokens: 5720 num samples: 2 num padding tokens: 100 - rank: 1 max len: 2860 min len: 2760 avg len: 2810.0 num_loss_counted_tokens: 172
total tokens: 7550 num samples: 10 num padding tokens: 676 - rank: 4 max len: 755 min len: 637 avg len: 687.4 num_loss_counted_tokens: 3236
{
"epoch": 0,
"step": 40,
"rank": 0,
"loss": 0.017504405230283737,
"overall_throughput": 43.000670432034845,
"lr": 8.000000000000001e-07,
"cuda_mem_allocated": 24.411304473876953,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 23323,
"batch_size": 71,
"total_loss": 0.6515514850616455,
"gradnorm": 0.9589425325393677,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:49:44.295336"
}
total tokens: 7617 num samples: 3 num padding tokens: 1777 - rank: 2 max len: 2539 min len: 1345 avg len: 1946.6666666666667 num_loss_counted_tokens: 824
total tokens: 7627 num samples: 29 num padding tokens: 2077 - rank: 7 max len: 263 min len: 91 avg len: 191.3793103448276 num_loss_counted_tokens: 2397
total tokens: 8024 num samples: 8 num padding tokens: 1091 - rank: 3 max len: 1003 min len: 759 avg len: 866.625 num_loss_counted_tokens: 3193
total tokens: 7800 num samples: 20 num padding tokens: 1486 - rank: 6 max len: 390 min len: 264 avg len: 315.7 num_loss_counted_tokens: 3339
total tokens: 6426 num samples: 2 num padding tokens: 109 - rank: 0 max len: 3213 min len: 3104 avg len: 3158.5 num_loss_counted_tokens: 160
total tokens: 8099 num samples: 13 num padding tokens: 1516 - rank: 5 max len: 623 min len: 406 avg len: 506.38461538461536 num_loss_counted_tokens: 4165
Per-token loss scaled by world size: 0.00030310056172311306
Per-token loss scaled by world size: 0.0002651209069881588
Per-token loss scaled by world size: 0.00026846988475881517
Per-token loss scaled by world size: 0.00022448382514994591
Per-token loss scaled by world size: 2.106957708747359e-06
Per-token loss scaled by world size: 0.0002726210805121809
Per-token loss scaled by world size: 9.102401236305013e-05
Epoch: 0, Step: 41, Rank: 2, loss = 0.8657191395759583
Epoch: 0, Step: 41, Rank: 6, loss = 0.876654863357544
Epoch: 0, Step: 41, Rank: 5, loss = 0.9897370338439941
Epoch: 0, Step: 41, Rank: 4, loss = 0.7330238819122314
Epoch: 0, Step: 41, Rank: 7, loss = 0.8902100324630737
Epoch: 0, Step: 41, Rank: 0, loss = 0.006880007218569517
Epoch: 0, Step: 41, Rank: 1, loss = 0.29722753167152405
Per-token loss scaled by world size: 0.00029196811374276876
Epoch: 0, Step: 41, Rank: 3, loss = 0.9533854126930237
Epoch 0: 34%|███▍ | 41/121 [01:45<03:23, 2.55s/it]
total tokens: 7158 num samples: 3 num padding tokens: 546 - rank: 1 max len: 2386 min len: 2074 avg len: 2204.0 num_loss_counted_tokens: 274
total tokens: 8024 num samples: 8 num padding tokens: 1137 - rank: 4 max len: 1003 min len: 719 avg len: 860.875 num_loss_counted_tokens: 6152
{
"epoch": 0,
"step": 41,
"rank": 0,
"loss": 0.006880007218569517,
"overall_throughput": 42.297091054292345,
"lr": 8.000000000000001e-07,
"cuda_mem_allocated": 24.25260829925537,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 26123,
"batch_size": 88,
"total_loss": 0.7016047239303589,
"gradnorm": 0.9589425325393677,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:49:46.855893"
}
total tokens: 6468 num samples: 22 num padding tokens: 2174 - rank: 7 max len: 294 min len: 86 avg len: 195.1818181818182 num_loss_counted_tokens: 2218
total tokens: 8112 num samples: 4 num padding tokens: 1504 - rank: 2 max len: 2028 min len: 1392 avg len: 1652.0 num_loss_counted_tokens: 1907
total tokens: 7936 num samples: 16 num padding tokens: 1181 - rank: 6 max len: 496 min len: 317 avg len: 422.1875 num_loss_counted_tokens: 3813
total tokens: 5644 num samples: 2 num padding tokens: 18 - rank: 0 max len: 2822 min len: 2804 avg len: 2813.0 num_loss_counted_tokens: 165
total tokens: 7667 num samples: 11 num padding tokens: 1375 - rank: 5 max len: 697 min len: 508 avg len: 572.0 num_loss_counted_tokens: 4263
total tokens: 6950 num samples: 5 num padding tokens: 1094 - rank: 3 max len: 1390 min len: 1006 avg len: 1171.2 num_loss_counted_tokens: 2509
Per-token loss scaled by world size: 0.0001050201608450152
Per-token loss scaled by world size: 0.00023053436598274857
Per-token loss scaled by world size: 0.0009069386287592351
Per-token loss scaled by world size: 0.0004071406729053706
Per-token loss scaled by world size: 0.0005334314191713929
Per-token loss scaled by world size: 4.823124982067384e-06
Per-token loss scaled by world size: 7.580199599033222e-05
Epoch: 0, Step: 42, Rank: 3, loss = 0.4843238890171051
Epoch: 0, Step: 42, Rank: 6, loss = 1.9053646326065063
Epoch: 0, Step: 42, Rank: 0, loss = 0.010132783092558384
Epoch: 0, Step: 42, Rank: 4, loss = 0.8553516864776611
Epoch: 0, Step: 42, Rank: 2, loss = 0.22063423693180084
Epoch: 0, Step: 42, Rank: 7, loss = 1.1206727027893066
Epoch: 0, Step: 42, Rank: 1, loss = 0.15925051271915436
Per-token loss scaled by world size: 0.0006252930616028607
Epoch: 0, Step: 42, Rank: 5, loss = 1.3136625289916992
Epoch 0: 35%|███▍ | 42/121 [01:47<03:20, 2.53s/it]
total tokens: 8063 num samples: 11 num padding tokens: 996 - rank: 4 max len: 733 min len: 561 avg len: 642.4545454545455 num_loss_counted_tokens: 4498
total tokens: 6616 num samples: 4 num padding tokens: 1100 - rank: 1 max len: 1654 min len: 1144 avg len: 1379.0 num_loss_counted_tokens: 966
{
"epoch": 0,
"step": 42,
"rank": 0,
"loss": 0.010132783092558384,
"overall_throughput": 43.158935099078796,
"lr": 8.000000000000001e-07,
"cuda_mem_allocated": 24.46029806137085,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 16807,
"batch_size": 70,
"total_loss": 0.758674144744873,
"gradnorm": 0.9589425325393677,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:49:49.361713"
}
total tokens: 7756 num samples: 14 num padding tokens: 706 - rank: 5 max len: 554 min len: 470 avg len: 503.57142857142856 num_loss_counted_tokens: 4826
total tokens: 7803 num samples: 17 num padding tokens: 1773 - rank: 6 max len: 459 min len: 256 avg len: 354.70588235294116 num_loss_counted_tokens: 2701
total tokens: 7983 num samples: 9 num padding tokens: 500 - rank: 3 max len: 887 min len: 781 avg len: 831.4444444444445 num_loss_counted_tokens: 4578
total tokens: 7525 num samples: 7 num padding tokens: 786 - rank: 2 max len: 1075 min len: 894 avg len: 962.7142857142857 num_loss_counted_tokens: 3223
total tokens: 6855 num samples: 3 num padding tokens: 919 - rank: 0 max len: 2285 min len: 1795 avg len: 1978.6666666666667 num_loss_counted_tokens: 310
total tokens: 7808 num samples: 32 num padding tokens: 2971 - rank: 7 max len: 244 min len: 79 avg len: 151.15625 num_loss_counted_tokens: 1950
Per-token loss scaled by world size: 0.00014639626897405833
Per-token loss scaled by world size: 0.00042091766954399645
Per-token loss scaled by world size: 0.00011771616118494421
Per-token loss scaled by world size: 0.00020373229926917702
Per-token loss scaled by world size: 0.00029096510843373835
Per-token loss scaled by world size: 0.0002795852196868509
Per-token loss scaled by world size: 0.0003187089751008898
Epoch: 0, Step: 43, Rank: 5, loss = 1.3902910947799683
Epoch: 0, Step: 43, Rank: 4, loss = 0.6729277968406677
Epoch: 0, Step: 43, Rank: 2, loss = 0.3888164758682251
Epoch: 0, Step: 43, Rank: 1, loss = 0.4835468530654907
Epoch: 0, Step: 43, Rank: 3, loss = 1.0526957511901855
Epoch: 0, Step: 43, Rank: 7, loss = 0.9234700202941895
Epoch: 0, Step: 43, Rank: 6, loss = 0.9610577821731567
Per-token loss scaled by world size: 1.8369590179645456e-05
Epoch: 0, Step: 43, Rank: 0, loss = 0.0606747567653656
Epoch 0: 36%|███▌ | 43/121 [01:50<03:19, 2.56s/it]
total tokens: 7484 num samples: 4 num padding tokens: 327 - rank: 1 max len: 1871 min len: 1693 avg len: 1789.25 num_loss_counted_tokens: 3106
total tokens: 8050 num samples: 10 num padding tokens: 630 - rank: 4 max len: 805 min len: 673 avg len: 742.0 num_loss_counted_tokens: 6379
total tokens: 7696 num samples: 8 num padding tokens: 569 - rank: 3 max len: 962 min len: 816 avg len: 890.875 num_loss_counted_tokens: 5177
{
"epoch": 0,
"step": 43,
"rank": 0,
"loss": 0.0606747567653656,
"overall_throughput": 41.53024928853794,
"lr": 8.000000000000001e-07,
"cuda_mem_allocated": 24.530761241912842,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 26424,
"batch_size": 80,
"total_loss": 0.7416850328445435,
"gradnorm": 0.9589425325393677,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:49:51.968138"
}
total tokens: 7056 num samples: 28 num padding tokens: 3094 - rank: 7 max len: 252 min len: 75 avg len: 141.5 num_loss_counted_tokens: 1402
total tokens: 8086 num samples: 13 num padding tokens: 1584 - rank: 5 max len: 622 min len: 401 avg len: 500.15384615384613 num_loss_counted_tokens: 4091
total tokens: 5748 num samples: 2 num padding tokens: 405 - rank: 0 max len: 2874 min len: 2469 avg len: 2671.5 num_loss_counted_tokens: 176
total tokens: 7158 num samples: 6 num padding tokens: 886 - rank: 2 max len: 1193 min len: 985 avg len: 1045.3333333333333 num_loss_counted_tokens: 2751
total tokens: 7780 num samples: 20 num padding tokens: 1751 - rank: 6 max len: 389 min len: 257 avg len: 301.45 num_loss_counted_tokens: 3387
Per-token loss scaled by world size: 0.0004709336790256202
Per-token loss scaled by world size: 0.00041861337376758456
Per-token loss scaled by world size: 0.00045743229566141963
Per-token loss scaled by world size: 0.00041176279773935676
Per-token loss scaled by world size: 1.3287355614011176e-05
Per-token loss scaled by world size: 0.00043119132169522345
Per-token loss scaled by world size: 0.0003093344275839627
Epoch: 0, Step: 44, Rank: 5, loss = 1.1258552074432373
Epoch: 0, Step: 44, Rank: 1, loss = 1.030312180519104
Epoch: 0, Step: 44, Rank: 7, loss = 1.1590855121612549
Epoch: 0, Step: 44, Rank: 0, loss = 0.03270350396633148
Epoch: 0, Step: 44, Rank: 4, loss = 1.0134512186050415
Epoch: 0, Step: 44, Rank: 6, loss = 1.0612696409225464
Epoch: 0, Step: 44, Rank: 3, loss = 0.7613493800163269
Per-token loss scaled by world size: 6.520144233945757e-05
Epoch: 0, Step: 44, Rank: 2, loss = 0.16047704219818115
Epoch 0: 36%|███▋ | 44/121 [01:52<03:17, 2.57s/it]
total tokens: 5546 num samples: 2 num padding tokens: 467 - rank: 1 max len: 2773 min len: 2306 avg len: 2539.5 num_loss_counted_tokens: 136
total tokens: 7947 num samples: 9 num padding tokens: 1026 - rank: 4 max len: 883 min len: 694 avg len: 769.0 num_loss_counted_tokens: 4965
{
"epoch": 0,
"step": 44,
"rank": 0,
"loss": 0.03270350396633148,
"overall_throughput": 41.61664148118311,
"lr": 8.000000000000001e-07,
"cuda_mem_allocated": 24.250525951385498,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 19690,
"batch_size": 72,
"total_loss": 0.7930629253387451,
"gradnorm": 0.9589425325393677,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:49:54.572590"
}
total tokens: 7632 num samples: 6 num padding tokens: 867 - rank: 3 max len: 1272 min len: 946 avg len: 1127.5 num_loss_counted_tokens: 1800
total tokens: 6420 num samples: 30 num padding tokens: 2314 - rank: 7 max len: 214 min len: 73 avg len: 136.86666666666667 num_loss_counted_tokens: 1458
total tokens: 6984 num samples: 4 num padding tokens: 806 - rank: 2 max len: 1746 min len: 1334 avg len: 1544.5 num_loss_counted_tokens: 1942
total tokens: 6094 num samples: 2 num padding tokens: 150 - rank: 0 max len: 3047 min len: 2897 avg len: 2972.0 num_loss_counted_tokens: 493
total tokens: 7900 num samples: 20 num padding tokens: 1997 - rank: 6 max len: 395 min len: 231 avg len: 295.15 num_loss_counted_tokens: 3387
total tokens: 8052 num samples: 12 num padding tokens: 956 - rank: 5 max len: 671 min len: 482 avg len: 591.3333333333334 num_loss_counted_tokens: 5097
Per-token loss scaled by world size: 0.0003629309358075261
Per-token loss scaled by world size: 0.00019072710711043328
Per-token loss scaled by world size: 0.0003041441086679697
Per-token loss scaled by world size: 0.00040468695806339383
Per-token loss scaled by world size: 5.418559885583818e-05
Per-token loss scaled by world size: 7.64827273087576e-05
Per-token loss scaled by world size: 0.00011607163469307125
Epoch: 0, Step: 45, Rank: 6, loss = 1.2099664211273193
Epoch: 0, Step: 45, Rank: 2, loss = 0.6358603239059448
Epoch: 0, Step: 45, Rank: 0, loss = 0.2549838423728943
Epoch: 0, Step: 45, Rank: 5, loss = 1.3491756916046143
Epoch: 0, Step: 45, Rank: 4, loss = 1.0139784812927246
Epoch: 0, Step: 45, Rank: 1, loss = 0.18064801394939423
Per-token loss scaled by world size: 0.00039161398308351636
Epoch: 0, Step: 45, Rank: 7, loss = 0.38696831464767456
Epoch: 0, Step: 45, Rank: 3, loss = 1.3055920600891113
Epoch 0: 37%|███▋ | 45/121 [01:55<03:14, 2.57s/it]
total tokens: 7215 num samples: 3 num padding tokens: 642 - rank: 1 max len: 2405 min len: 2050 avg len: 2191.0 num_loss_counted_tokens: 335
total tokens: 8046 num samples: 9 num padding tokens: 1094 - rank: 4 max len: 894 min len: 726 avg len: 772.4444444444445 num_loss_counted_tokens: 5000
total tokens: 4725 num samples: 25 num padding tokens: 1257 - rank: 7 max len: 189 min len: 75 avg len: 138.72 num_loss_counted_tokens: 1367
{
"epoch": 0,
"step": 45,
"rank": 0,
"loss": 0.2549838423728943,
"overall_throughput": 42.42738944609394,
"lr": 8.000000000000001e-07,
"cuda_mem_allocated": 24.32686471939087,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 26671,
"batch_size": 93,
"total_loss": 0.792146623134613,
"gradnorm": 0.9589425325393677,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:49:57.124489"
}
total tokens: 7820 num samples: 17 num padding tokens: 2391 - rank: 6 max len: 460 min len: 196 avg len: 319.3529411764706 num_loss_counted_tokens: 2893
total tokens: 7320 num samples: 6 num padding tokens: 832 - rank: 3 max len: 1220 min len: 937 avg len: 1081.3333333333333 num_loss_counted_tokens: 4277
total tokens: 7832 num samples: 11 num padding tokens: 1541 - rank: 5 max len: 712 min len: 471 avg len: 571.9090909090909 num_loss_counted_tokens: 4862
total tokens: 6974 num samples: 2 num padding tokens: 808 - rank: 0 max len: 3487 min len: 2679 avg len: 3083.0 num_loss_counted_tokens: 194
total tokens: 6796 num samples: 4 num padding tokens: 645 - rank: 2 max len: 1699 min len: 1367 avg len: 1537.75 num_loss_counted_tokens: 2448
Per-token loss scaled by world size: 0.00021682196529582143
Per-token loss scaled by world size: 0.00032285196357406676
Per-token loss scaled by world size: 0.00028426622156985104
Per-token loss scaled by world size: 0.000325443601468578
Per-token loss scaled by world size: 0.00019496992172207683
Per-token loss scaled by world size: 0.0003603589429985732
Per-token loss scaled by world size: 9.507144568488002e-05
Epoch: 0, Step: 46, Rank: 5, loss = 1.2730580568313599
Epoch: 0, Step: 46, Rank: 6, loss = 1.0042414665222168
Epoch: 0, Step: 46, Rank: 3, loss = 0.7659777998924255
Epoch: 0, Step: 46, Rank: 4, loss = 1.1497108936309814
Epoch: 0, Step: 46, Rank: 2, loss = 1.1405552625656128
Epoch: 0, Step: 46, Rank: 7, loss = 0.6887800097465515
Epoch: 0, Step: 46, Rank: 1, loss = 0.3358636498451233
Per-token loss scaled by world size: 4.201751289656386e-05
Epoch: 0, Step: 46, Rank: 0, loss = 0.14843736588954926
Epoch 0: 38%|███▊ | 46/121 [01:57<03:12, 2.57s/it]
total tokens: 6948 num samples: 4 num padding tokens: 662 - rank: 1 max len: 1737 min len: 1416 avg len: 1571.5 num_loss_counted_tokens: 3855
total tokens: 7308 num samples: 9 num padding tokens: 773 - rank: 4 max len: 812 min len: 625 avg len: 726.1111111111111 num_loss_counted_tokens: 4386
{
"epoch": 0,
"step": 46,
"rank": 0,
"loss": 0.14843736588954926,
"overall_throughput": 42.16837318958055,
"lr": 8.000000000000001e-07,
"cuda_mem_allocated": 24.52260398864746,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 28262,
"batch_size": 94,
"total_loss": 0.8133281469345093,
"gradnorm": 0.9589425325393677,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:49:59.691426"
}
total tokens: 8021 num samples: 13 num padding tokens: 1587 - rank: 5 max len: 617 min len: 431 avg len: 494.9230769230769 num_loss_counted_tokens: 4488
total tokens: 7343 num samples: 7 num padding tokens: 562 - rank: 3 max len: 1049 min len: 905 avg len: 968.7142857142857 num_loss_counted_tokens: 4549
total tokens: 7704 num samples: 18 num padding tokens: 1519 - rank: 6 max len: 428 min len: 278 avg len: 343.6111111111111 num_loss_counted_tokens: 3313
total tokens: 8004 num samples: 29 num padding tokens: 3202 - rank: 7 max len: 276 min len: 86 avg len: 165.58620689655172 num_loss_counted_tokens: 1941
total tokens: 8016 num samples: 6 num padding tokens: 940 - rank: 2 max len: 1336 min len: 1075 avg len: 1179.3333333333333 num_loss_counted_tokens: 3672
total tokens: 7050 num samples: 3 num padding tokens: 713 - rank: 0 max len: 2350 min len: 1747 avg len: 2112.3333333333335 num_loss_counted_tokens: 1822
Per-token loss scaled by world size: 0.00035897750058211386
Per-token loss scaled by world size: 0.00042433346970938146
Per-token loss scaled by world size: 0.00017432670574635267
Per-token loss scaled by world size: 0.0005883869016543031
Per-token loss scaled by world size: 0.00017508945893496275
Per-token loss scaled by world size: 0.00023119074467103928
Per-token loss scaled by world size: 0.0002113927184836939
Epoch: 0, Step: 47, Rank: 5, loss = 1.6370394229888916
Epoch: 0, Step: 47, Rank: 6, loss = 1.1806018352508545
Epoch: 0, Step: 47, Rank: 1, loss = 0.4850204885005951
Epoch: 0, Step: 47, Rank: 4, loss = 0.9987651109695435
Epoch: 0, Step: 47, Rank: 7, loss = 0.6432304382324219
Epoch: 0, Step: 47, Rank: 2, loss = 0.4871426522731781
Epoch: 0, Step: 47, Rank: 3, loss = 0.5881474018096924
Per-token loss scaled by world size: 3.264834595029242e-05
Epoch: 0, Step: 47, Rank: 0, loss = 0.09083586186170578
Epoch 0: 39%|███▉ | 47/121 [02:00<03:10, 2.58s/it]
total tokens: 5918 num samples: 2 num padding tokens: 291 - rank: 1 max len: 2959 min len: 2668 avg len: 2813.5 num_loss_counted_tokens: 895
total tokens: 5478 num samples: 22 num padding tokens: 2168 - rank: 7 max len: 249 min len: 85 avg len: 150.45454545454547 num_loss_counted_tokens: 1254
total tokens: 7456 num samples: 8 num padding tokens: 650 - rank: 4 max len: 932 min len: 776 avg len: 850.75 num_loss_counted_tokens: 5212
total tokens: 7815 num samples: 15 num padding tokens: 1566 - rank: 6 max len: 521 min len: 301 avg len: 416.6 num_loss_counted_tokens: 3998
{
"epoch": 0,
"step": 47,
"rank": 0,
"loss": 0.09083586186170578,
"overall_throughput": 41.45521968839562,
"lr": 8.000000000000001e-07,
"cuda_mem_allocated": 24.520647048950195,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 22258,
"batch_size": 86,
"total_loss": 0.7638478875160217,
"gradnorm": 0.9589425325393677,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:50:02.300320"
}
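The step-47 figures above are internally consistent in two checkable ways: each rank's printed `loss` equals a "Per-token loss scaled by world size" value multiplied by the step's global `num_loss_counted_tokens` (22258) and divided by the world size (8), and the JSON `total_loss` equals the mean of the eight per-rank losses. Both relations are inferred from these numbers rather than read from the instructlab training code, and the pairing of per-token lines to ranks is a guess because those lines carry no rank. A minimal Python check under those assumptions:

```python
WORLD_SIZE = 8              # assumption: one process per GPU (--gpus 8)
GLOBAL_LOSS_TOKENS = 22258  # "num_loss_counted_tokens" in the step-47 JSON record

# Per-rank losses printed as "Epoch: 0, Step: 47, Rank: r, loss = ...".
rank_loss = {
    0: 0.09083586186170578,
    1: 0.4850204885005951,
    2: 0.4871426522731781,
    3: 0.5881474018096924,
    4: 0.9987651109695435,
    5: 1.6370394229888916,
    6: 1.1806018352508545,
    7: 0.6432304382324219,
}

# "Per-token loss scaled by world size" values that appear to belong to ranks 0 and 5
# (the log does not label these lines with a rank; the pairing is inferred).
per_token = {0: 3.264834595029242e-05, 5: 0.0005883869016543031}

def rank_loss_from_per_token(value: float) -> float:
    """Inferred relation: per-token value * global loss-counted tokens / world size."""
    return value * GLOBAL_LOSS_TOKENS / WORLD_SIZE

for rank, value in per_token.items():
    reconstructed = rank_loss_from_per_token(value)
    assert abs(reconstructed - rank_loss[rank]) < 1e-4, (rank, reconstructed)

# "total_loss" in the JSON record matches the mean of the eight per-rank losses.
total_loss = sum(rank_loss.values()) / WORLD_SIZE
print(f"mean of per-rank losses: {total_loss:.6f} (JSON total_loss: 0.7638478875160217)")
```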
total tokens: 7480 num samples: 4 num padding tokens: 1532 - rank: 2 max len: 1870 min len: 1277 avg len: 1487.0 num_loss_counted_tokens: 1969
total tokens: 6978 num samples: 6 num padding tokens: 608 - rank: 3 max len: 1163 min len: 946 avg len: 1061.6666666666667 num_loss_counted_tokens: 4439
total tokens: 7890 num samples: 2 num padding tokens: 381 - rank: 0 max len: 3945 min len: 3564 avg len: 3754.5 num_loss_counted_tokens: 441
total tokens: 8019 num samples: 11 num padding tokens: 950 - rank: 5 max len: 729 min len: 528 avg len: 642.6363636363636 num_loss_counted_tokens: 5004
Per-token loss scaled by world size: 0.0004503819509409368
Per-token loss scaled by world size: 0.0003560640325304121
Per-token loss scaled by world size: 0.0002961498685181141
Per-token loss scaled by world size: 0.0003928189689759165
Per-token loss scaled by world size: 6.809273327235132e-05
Per-token loss scaled by world size: 3.832683887594612e-06
Per-token loss scaled by world size: 5.2919685913366266e-06
Epoch: 0, Step: 48, Rank: 7, loss = 1.0013855695724487
Epoch: 0, Step: 48, Rank: 3, loss = 0.8328844904899597
Epoch: 0, Step: 48, Rank: 6, loss = 1.2666429281234741
Epoch: 0, Step: 48, Rank: 4, loss = 1.1047542095184326
Epoch: 0, Step: 48, Rank: 0, loss = 0.010778944939374924
Epoch: 0, Step: 48, Rank: 2, loss = 0.19150230288505554
Epoch: 0, Step: 48, Rank: 1, loss = 0.014883000403642654
Per-token loss scaled by world size: 0.00040955503936856985
Epoch: 0, Step: 48, Rank: 5, loss = 1.1518223285675049
Epoch 0: 40%|███▉ | 48/121 [02:03<03:07, 2.56s/it]
total tokens: 7580 num samples: 10 num padding tokens: 754 - rank: 4 max len: 758 min len: 613 avg len: 682.6 num_loss_counted_tokens: 3916
total tokens: 7806 num samples: 3 num padding tokens: 1441 - rank: 1 max len: 2602 min len: 1696 avg len: 2121.6666666666665 num_loss_counted_tokens: 1824
{
"epoch": 0,
"step": 48,
"rank": 0,
"loss": 0.010778944939374924,
"overall_throughput": 42.77144474456274,
"lr": 8.000000000000001e-07,
"cuda_mem_allocated": 24.303375244140625,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 22499,
"batch_size": 76,
"total_loss": 0.6968317627906799,
"gradnorm": 0.9589425325393677,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:50:04.867177"
}
total tokens: 7440 num samples: 30 num padding tokens: 2674 - rank: 7 max len: 248 min len: 85 avg len: 158.86666666666667 num_loss_counted_tokens: 1971
total tokens: 7644 num samples: 7 num padding tokens: 772 - rank: 3 max len: 1092 min len: 894 avg len: 981.7142857142857 num_loss_counted_tokens: 5193
total tokens: 7580 num samples: 2 num padding tokens: 877 - rank: 0 max len: 3790 min len: 2913 avg len: 3351.5 num_loss_counted_tokens: 999
total tokens: 8025 num samples: 5 num padding tokens: 1188 - rank: 2 max len: 1605 min len: 1101 avg len: 1367.4 num_loss_counted_tokens: 2038
total tokens: 8100 num samples: 18 num padding tokens: 2074 - rank: 6 max len: 450 min len: 258 avg len: 334.77777777777777 num_loss_counted_tokens: 3301
total tokens: 7709 num samples: 13 num padding tokens: 980 - rank: 5 max len: 593 min len: 457 avg len: 517.6153846153846 num_loss_counted_tokens: 4236
Per-token loss scaled by world size: 0.0005574136739596725
Per-token loss scaled by world size: 0.0002091079077217728
Per-token loss scaled by world size: 0.0003604785306379199
Per-token loss scaled by world size: 0.00025925517547875643
Per-token loss scaled by world size: 0.00025941740022972226
Per-token loss scaled by world size: 0.0002179427247028798
Per-token loss scaled by world size: 5.799124210170703e-06
Epoch: 0, Step: 49, Rank: 6, loss = 1.024795413017273
Epoch: 0, Step: 49, Rank: 3, loss = 0.5944676399230957
Epoch: 0, Step: 49, Rank: 5, loss = 1.5846574306488037
Epoch: 0, Step: 49, Rank: 1, loss = 0.6195839047431946
Epoch: 0, Step: 49, Rank: 4, loss = 0.7370300889015198
Epoch: 0, Step: 49, Rank: 0, loss = 0.016486184671521187
Epoch: 0, Step: 49, Rank: 7, loss = 0.737491250038147
Per-token loss scaled by world size: 0.00016608217265456915
Epoch: 0, Step: 49, Rank: 2, loss = 0.47215086221694946
Epoch 0: 40%|████ | 49/121 [02:05<03:04, 2.57s/it]
total tokens: 7560 num samples: 6 num padding tokens: 946 - rank: 1 max len: 1260 min len: 979 avg len: 1102.3333333333333 num_loss_counted_tokens: 3644
total tokens: 7788 num samples: 12 num padding tokens: 734 - rank: 4 max len: 649 min len: 534 avg len: 587.8333333333334 num_loss_counted_tokens: 4810
{
"epoch": 0,
"step": 49,
"rank": 0,
"loss": 0.016486184671521187,
"overall_throughput": 41.99615706250001,
"lr": 8.000000000000001e-07,
"cuda_mem_allocated": 24.364055633544922,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 22743,
"batch_size": 77,
"total_loss": 0.7233328223228455,
"gradnorm": 0.9589425325393677,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:50:07.406672"
}
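Each `total tokens: ...` line summarizes one rank's padded micro-batch. In every line checked here, `total tokens` equals `num samples` times `max len` (each sample padded to the longest one), and `avg len` equals the unpadded token count (`total tokens` minus `num padding tokens`) divided by `num samples`. The sketch below parses one such line (the rank-5 line printed just before step 49) and checks both relations; the regular expression and helper names are hypothetical and written against the format shown in this log, not taken from instructlab:

```python
import re

# Per-rank batch summary line as printed in this log (hypothetical pattern,
# written against the lines shown here rather than the instructlab source).
BATCH_RE = re.compile(
    r"total tokens: (?P<total>\d+) num samples: (?P<samples>\d+) "
    r"num padding tokens: (?P<padding>\d+) - rank: (?P<rank>\d+) "
    r"max len: (?P<max_len>\d+) min len: (?P<min_len>\d+) "
    r"avg len: (?P<avg_len>[\d.]+) num_loss_counted_tokens: (?P<loss_tokens>\d+)"
)

def parse_batch_line(line: str) -> dict:
    """Return the numeric fields of one batch summary line."""
    m = BATCH_RE.search(line)
    if m is None:
        raise ValueError(f"not a batch summary line: {line!r}")
    fields = {k: int(v) for k, v in m.groupdict().items() if k != "avg_len"}
    fields["avg_len"] = float(m["avg_len"])
    return fields

def padding_fraction(batch: dict) -> float:
    """Share of the padded micro-batch occupied by padding tokens."""
    return batch["padding"] / batch["total"]

line = ("total tokens: 7709 num samples: 13 num padding tokens: 980 - rank: 5 "
        "max len: 593 min len: 457 avg len: 517.6153846153846 num_loss_counted_tokens: 4236")
b = parse_batch_line(line)

# Every sample is padded to the longest sample in the micro-batch.
assert b["total"] == b["samples"] * b["max_len"]
# "avg len" counts only real (non-padding) tokens.
assert abs(b["avg_len"] - (b["total"] - b["padding"]) / b["samples"]) < 1e-9
print(f"rank {b['rank']}: {padding_fraction(b):.1%} of the micro-batch is padding")
```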
total tokens: 7995 num samples: 15 num padding tokens: 1460 - rank: 5 max len: 533 min len: 364 avg len: 435.6666666666667 num_loss_counted_tokens: 4262
total tokens: 3168 num samples: 18 num padding tokens: 844 - rank: 7 max len: 176 min len: 76 avg len: 129.11111111111111 num_loss_counted_tokens: 813
total tokens: 7920 num samples: 22 num padding tokens: 2223 - rank: 6 max len: 360 min len: 188 avg len: 258.95454545454544 num_loss_counted_tokens: 2786
total tokens: 7416 num samples: 8 num padding tokens: 720 - rank: 2 max len: 927 min len: 750 avg len: 837.0 num_loss_counted_tokens: 4900
total tokens: 7248 num samples: 3 num padding tokens: 1314 - rank: 0 max len: 2416 min len: 1340 avg len: 1978.0 num_loss_counted_tokens: 2507
total tokens: 7500 num samples: 10 num padding tokens: 393 - rank: 3 max len: 750 min len: 674 avg len: 710.7 num_loss_counted_tokens: 4132
Per-token loss scaled by world size: 0.0003894062538165599
Per-token loss scaled by world size: 0.0003162138455081731
Per-token loss scaled by world size: 0.00023851577134337276
Per-token loss scaled by world size: 0.00032866382389329374
Per-token loss scaled by world size: 0.00045591729576699436
Per-token loss scaled by world size: 4.146520223002881e-05
Per-token loss scaled by world size: 5.4783604355179705e-06
Epoch: 0, Step: 50, Rank: 7, loss = 0.9038181900978088
Epoch: 0, Step: 50, Rank: 5, loss = 1.113020420074463
Epoch: 0, Step: 50, Rank: 4, loss = 1.3031256198883057
Epoch: 0, Step: 50, Rank: 3, loss = 0.6817377209663391
Epoch: 0, Step: 50, Rank: 2, loss = 0.9394033551216125
Epoch: 0, Step: 50, Rank: 0, loss = 0.01565852388739586
Epoch: 0, Step: 50, Rank: 1, loss = 0.1185179129242897
Per-token loss scaled by world size: 0.00044163045822642744
Epoch: 0, Step: 50, Rank: 6, loss = 1.2622902393341064
Epoch 0: 41%|████▏ | 50/121 [02:08<03:00, 2.54s/it]
total tokens: 7206 num samples: 3 num padding tokens: 1221 - rank: 1 max len: 2402 min len: 1648 avg len: 1995.0 num_loss_counted_tokens: 854
total tokens: 7592 num samples: 8 num padding tokens: 1031 - rank: 4 max len: 949 min len: 737 avg len: 820.125 num_loss_counted_tokens: 4482
{
"epoch": 0,
"step": 50,
"rank": 0,
"loss": 0.01565852388739586,
"overall_throughput": 43.999185393816596,
"lr": 8.000000000000001e-07,
"cuda_mem_allocated": 24.364055633544922,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 22866,
"batch_size": 99,
"total_loss": 0.79219651222229,
"gradnorm": 0.9589425325393677,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:50:09.869463"
}
total tokens: 7938 num samples: 14 num padding tokens: 2583 - rank: 6 max len: 567 min len: 269 avg len: 382.5 num_loss_counted_tokens: 3524
total tokens: 7014 num samples: 6 num padding tokens: 736 - rank: 3 max len: 1169 min len: 949 avg len: 1046.3333333333333 num_loss_counted_tokens: 3459
total tokens: 7525 num samples: 5 num padding tokens: 656 - rank: 2 max len: 1505 min len: 1205 avg len: 1373.8 num_loss_counted_tokens: 3152
total tokens: 8070 num samples: 30 num padding tokens: 3026 - rank: 7 max len: 269 min len: 86 avg len: 168.13333333333333 num_loss_counted_tokens: 2116
total tokens: 7821 num samples: 11 num padding tokens: 546 - rank: 5 max len: 711 min len: 583 avg len: 661.3636363636364 num_loss_counted_tokens: 4062
total tokens: 6454 num samples: 2 num padding tokens: 77 - rank: 0 max len: 3227 min len: 3150 avg len: 3188.5 num_loss_counted_tokens: 196
Per-token loss scaled by world size: 0.00025689046015031636
Per-token loss scaled by world size: 0.0004693289229180664
Per-token loss scaled by world size: 0.0002972199581563473
Per-token loss scaled by world size: 0.00019820936722680926
Per-token loss scaled by world size: 0.0002842875546775758
Per-token loss scaled by world size: 4.825916403206065e-05
Per-token loss scaled by world size: 2.7838473215524573e-06
Epoch: 0, Step: 51, Rank: 6, loss = 1.3355927467346191
Epoch: 0, Step: 51, Rank: 7, loss = 0.8458136916160583
Epoch: 0, Step: 51, Rank: 2, loss = 0.7310460209846497
Epoch: 0, Step: 51, Rank: 1, loss = 0.13733351230621338
Epoch: 0, Step: 51, Rank: 4, loss = 0.5640543103218079
Epoch: 0, Step: 51, Rank: 3, loss = 0.8090112805366516
Epoch: 0, Step: 51, Rank: 0, loss = 0.007922133430838585
Per-token loss scaled by world size: 0.0003865555045194924
Epoch: 0, Step: 51, Rank: 5, loss = 1.100040316581726
Epoch 0: 42%|████▏ | 51/121 [02:10<02:58, 2.55s/it]
total tokens: 7845 num samples: 5 num padding tokens: 525 - rank: 1 max len: 1569 min len: 1326 avg len: 1464.0 num_loss_counted_tokens: 3941
total tokens: 7520 num samples: 10 num padding tokens: 686 - rank: 4 max len: 752 min len: 619 avg len: 683.4 num_loss_counted_tokens: 4307
{
"epoch": 0,
"step": 51,
"rank": 0,
"loss": 0.007922133430838585,
"overall_throughput": 42.29003789361672,
"lr": 8.000000000000001e-07,
"cuda_mem_allocated": 24.354268074035645,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 22766,
"batch_size": 70,
"total_loss": 0.6913517117500305,
"gradnorm": 0.9589425325393677,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:50:12.436230"
}
total tokens: 7780 num samples: 20 num padding tokens: 1768 - rank: 6 max len: 389 min len: 232 avg len: 300.6 num_loss_counted_tokens: 3299
total tokens: 5290 num samples: 23 num padding tokens: 1933 - rank: 7 max len: 230 min len: 81 avg len: 145.95652173913044 num_loss_counted_tokens: 1376
total tokens: 7761 num samples: 13 num padding tokens: 933 - rank: 5 max len: 597 min len: 418 avg len: 525.2307692307693 num_loss_counted_tokens: 4968
total tokens: 7912 num samples: 8 num padding tokens: 555 - rank: 3 max len: 989 min len: 821 avg len: 919.625 num_loss_counted_tokens: 5323
total tokens: 7374 num samples: 3 num padding tokens: 1194 - rank: 0 max len: 2458 min len: 1773 avg len: 2060.0 num_loss_counted_tokens: 336
total tokens: 7728 num samples: 6 num padding tokens: 549 - rank: 2 max len: 1288 min len: 1035 avg len: 1196.5 num_loss_counted_tokens: 3900
Per-token loss scaled by world size: 0.00019784610776696354
Per-token loss scaled by world size: 0.000183841708349064
Per-token loss scaled by world size: 9.399676491739228e-05
Per-token loss scaled by world size: 0.0001539696240797639
Per-token loss scaled by world size: 0.0002607290807645768
Per-token loss scaled by world size: 0.000336777011398226
Per-token loss scaled by world size: 0.0004033475706819445
Epoch: 0, Step: 52, Rank: 0, loss = 0.30163562297821045
Epoch: 0, Step: 52, Rank: 3, loss = 0.5899480581283569
Epoch: 0, Step: 52, Rank: 2, loss = 0.6348881721496582
Epoch: 0, Step: 52, Rank: 1, loss = 0.4940885007381439
Epoch: 0, Step: 52, Rank: 6, loss = 1.0807174444198608
Epoch: 0, Step: 52, Rank: 7, loss = 0.8366796374320984
Epoch: 0, Step: 52, Rank: 4, loss = 1.2943423986434937
Per-token loss scaled by world size: 0.00024687970289960504
Epoch: 0, Step: 52, Rank: 5, loss = 0.7922369241714478
Epoch 0: 43%|████▎ | 52/121 [02:13<02:54, 2.53s/it]
total tokens: 5434 num samples: 2 num padding tokens: 361 - rank: 1 max len: 2717 min len: 2356 avg len: 2536.5 num_loss_counted_tokens: 214
total tokens: 8091 num samples: 9 num padding tokens: 724 - rank: 4 max len: 899 min len: 723 avg len: 818.5555555555555 num_loss_counted_tokens: 4597
{
"epoch": 0,
"step": 52,
"rank": 0,
"loss": 0.30163562297821045,
"overall_throughput": 43.565651051419955,
"lr": 8.000000000000001e-07,
"cuda_mem_allocated": 24.44568157196045,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 25672,
"batch_size": 81,
"total_loss": 0.7530670762062073,
"gradnorm": 0.9589425325393677,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:50:14.922017"
}
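In the raw nohup.out, output from the eight ranks sometimes lands on a single physical line (for example two `Epoch: 0, Step: 52, ...` records with no newline between them), and the tqdm progress bar shares a line with the next batch summary, presumably because the processes write to the same stream without coordinating. A sketch for splitting such lines when post-processing the file; `split_fused` is a hypothetical helper, not part of instructlab, and it assumes every record starts with one of the three prefixes observed here:

```python
import re

# Record prefixes observed in this log. Assumption: a new record starts wherever one
# of these appears, and none of them occurs in the middle of a legitimate record.
PREFIXES = (
    "Per-token loss scaled by world size: ",
    "Epoch: ",        # per-rank loss lines; the tqdm bar starts with "Epoch 0:" instead
    "total tokens: ",
)
# Zero-width lookahead, so each prefix stays attached to the record it starts.
SPLIT_RE = re.compile("(?=" + "|".join(re.escape(p) for p in PREFIXES) + ")")

def split_fused(line: str) -> list[str]:
    """Split one physical log line into the records that were printed onto it."""
    parts = (part.strip() for part in SPLIT_RE.split(line))
    return [part for part in parts if part]

fused = ("Epoch: 0, Step: 52, Rank: 3, loss = 0.5899480581283569"
         "Epoch: 0, Step: 52, Rank: 2, loss = 0.6348881721496582")
bar = "Epoch 0: 43%|████▎ | 52/121 [02:13<02:54, 2.53s/it] total tokens: 5434 num samples: 2"

for record in split_fused(fused) + split_fused(bar):
    print(record)
```

Splitting on a zero-width lookahead needs Python 3.7 or later, and the `list[str]` annotation needs 3.9 or later.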
total tokens: 7384 num samples: 4 num padding tokens: 597 - rank: 2 max len: 1846 min len: 1598 avg len: 1696.75 num_loss_counted_tokens: 648
total tokens: 5825 num samples: 25 num padding tokens: 2240 - rank: 7 max len: 233 min len: 87 avg len: 143.4 num_loss_counted_tokens: 1267
total tokens: 8040 num samples: 15 num padding tokens: 2522 - rank: 6 max len: 536 min len: 238 avg len: 367.8666666666667 num_loss_counted_tokens: 3089
total tokens: 7434 num samples: 6 num padding tokens: 1318 - rank: 3 max len: 1239 min len: 914 avg len: 1019.3333333333334 num_loss_counted_tokens: 4080
total tokens: 7854 num samples: 11 num padding tokens: 785 - rank: 5 max len: 714 min len: 537 avg len: 642.6363636363636 num_loss_counted_tokens: 3685
total tokens: 7772 num samples: 2 num padding tokens: 602 - rank: 0 max len: 3886 min len: 3284 avg len: 3585.0 num_loss_counted_tokens: 194
Per-token loss scaled by world size: 0.00032663694582879543
Per-token loss scaled by world size: 0.0003048842481803149
Per-token loss scaled by world size: 0.000158169845235534
Per-token loss scaled by world size: 0.00026047308347187936
Per-token loss scaled by world size: 0.00023948316811583936
Per-token loss scaled by world size: 0.00011947691382374614
Per-token loss scaled by world size: 5.034709374740487e-06
Epoch: 0, Step: 53, Rank: 6, loss = 1.0754791498184204
Epoch: 0, Step: 53, Rank: 5, loss = 1.1522117853164673
Epoch: 0, Step: 53, Rank: 1, loss = 0.557944118976593
Epoch: 0, Step: 53, Rank: 4, loss = 0.9188187718391418
Epoch: 0, Step: 53, Rank: 2, loss = 0.4214548170566559
Epoch: 0, Step: 53, Rank: 7, loss = 0.8447768688201904
Epoch: 0, Step: 53, Rank: 0, loss = 0.017759937793016434
Per-token loss scaled by world size: 0.00029550379258580506
Epoch: 0, Step: 53, Rank: 3, loss = 1.0423896312713623
Epoch 0: 44%|████▍ | 53/121 [02:15<02:51, 2.53s/it]
total tokens: 6792 num samples: 3 num padding tokens: 382 - rank: 1 max len: 2264 min len: 2050 avg len: 2136.6666666666665 num_loss_counted_tokens: 2284
total tokens: 7866 num samples: 9 num padding tokens: 794 - rank: 4 max len: 874 min len: 678 avg len: 785.7777777777778 num_loss_counted_tokens: 5164
{
"epoch": 0,
"step": 53,
"rank": 0,
"loss": 0.017759937793016434,
"overall_throughput": 42.74150473533702,
"lr": 8.000000000000001e-07,
"cuda_mem_allocated": 24.40515947341919,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 28220,
"batch_size": 101,
"total_loss": 0.7538543939590454,
"gradnorm": 0.9589425325393677,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:50:17.454732"
}
total tokens: 7777 num samples: 7 num padding tokens: 867 - rank: 3 max len: 1111 min len: 895 avg len: 987.1428571428571 num_loss_counted_tokens: 4712
total tokens: 5152 num samples: 23 num padding tokens: 1800 - rank: 7 max len: 224 min len: 71 avg len: 145.7391304347826 num_loss_counted_tokens: 1286
total tokens: 7932 num samples: 12 num padding tokens: 1042 - rank: 5 max len: 661 min len: 452 avg len: 574.1666666666666 num_loss_counted_tokens: 4158
total tokens: 6598 num samples: 2 num padding tokens: 710 - rank: 0 max len: 3299 min len: 2589 avg len: 2944.0 num_loss_counted_tokens: 177
total tokens: 8100 num samples: 18 num padding tokens: 1853 - rank: 6 max len: 450 min len: 235 avg len: 347.05555555555554 num_loss_counted_tokens: 3720
total tokens: 8076 num samples: 4 num padding tokens: 2220 - rank: 2 max len: 2019 min len: 1211 avg len: 1464.0 num_loss_counted_tokens: 1401
Per-token loss scaled by world size: 0.00026205729227513075
Per-token loss scaled by world size: 0.00021129030210431665
Per-token loss scaled by world size: 0.00041250750655308366
Per-token loss scaled by world size: 0.0004137573123443872
Per-token loss scaled by world size: 0.00038468287675641477
Per-token loss scaled by world size: 6.601931090699509e-06
Per-token loss scaled by world size: 0.0001686068280832842
Epoch: 0, Step: 54, Rank: 5, loss = 1.1955498456954956
Epoch: 0, Step: 54, Rank: 1, loss = 0.612372100353241
Epoch: 0, Step: 54, Rank: 3, loss = 0.7595075368881226
Epoch: 0, Step: 54, Rank: 6, loss = 1.1991721391677856
Epoch: 0, Step: 54, Rank: 4, loss = 1.114907145500183
Epoch: 0, Step: 54, Rank: 0, loss = 0.019134046509861946
Epoch: 0, Step: 54, Rank: 7, loss = 0.48866474628448486
Per-token loss scaled by world size: 0.00018809801258612424
Epoch: 0, Step: 54, Rank: 2, loss = 0.5451550483703613
Epoch 0: 45%|████▍ | 54/121 [02:18<02:49, 2.53s/it]
total tokens: 6536 num samples: 4 num padding tokens: 440 - rank: 1 max len: 1634 min len: 1437 avg len: 1524.0 num_loss_counted_tokens: 1512
total tokens: 7530 num samples: 10 num padding tokens: 291 - rank: 4 max len: 753 min len: 677 avg len: 723.9 num_loss_counted_tokens: 3325
total tokens: 6312 num samples: 24 num padding tokens: 2423 - rank: 7 max len: 263 min len: 87 avg len: 162.04166666666666 num_loss_counted_tokens: 1548
{
"epoch": 0,
"step": 54,
"rank": 0,
"loss": 0.019134046509861946,
"overall_throughput": 42.550455941342655,
"lr": 8.000000000000001e-07,
"cuda_mem_allocated": 24.375274658203125,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 23186,
"batch_size": 84,
"total_loss": 0.7418078184127808,
"gradnorm": 0.9589425325393677,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:50:20.000394"
}
total tokens: 7968 num samples: 12 num padding tokens: 974 - rank: 5 max len: 664 min len: 458 avg len: 582.8333333333334 num_loss_counted_tokens: 4692
total tokens: 7020 num samples: 5 num padding tokens: 1164 - rank: 2 max len: 1404 min len: 1078 avg len: 1171.2 num_loss_counted_tokens: 2148
total tokens: 7786 num samples: 17 num padding tokens: 1313 - rank: 6 max len: 458 min len: 268 avg len: 380.7647058823529 num_loss_counted_tokens: 3857
total tokens: 6240 num samples: 3 num padding tokens: 692 - rank: 0 max len: 2080 min len: 1719 avg len: 1849.3333333333333 num_loss_counted_tokens: 339
total tokens: 8032 num samples: 8 num padding tokens: 1051 - rank: 3 max len: 1004 min len: 774 avg len: 872.625 num_loss_counted_tokens: 4797
Per-token loss scaled by world size: 0.00031934864819049835
Per-token loss scaled by world size: 0.00028757311520166695
Per-token loss scaled by world size: 0.0002729191619437188
Per-token loss scaled by world size: 5.039776624471415e-06
Per-token loss scaled by world size: 5.635480647470104e-06
Per-token loss scaled by world size: 0.00025633463519625366
Per-token loss scaled by world size: 0.00021784953423775733
Epoch: 0, Step: 55, Rank: 1, loss = 0.014716777950525284
Epoch: 0, Step: 55, Rank: 4, loss = 0.7969580292701721
Epoch: 0, Step: 55, Rank: 2, loss = 0.8397494554519653
Epoch: 0, Step: 55, Rank: 5, loss = 0.9325379729270935
Epoch: 0, Step: 55, Rank: 0, loss = 0.016456307843327522
Epoch: 0, Step: 55, Rank: 3, loss = 0.7485291957855225
Epoch: 0, Step: 55, Rank: 7, loss = 0.6361478567123413
Per-token loss scaled by world size: 0.0004028878756798804
Epoch: 0, Step: 55, Rank: 6, loss = 1.176482915878296
Epoch 0: 45%|████▌ | 55/121 [02:20<02:48, 2.55s/it]
total tokens: 6669 num samples: 3 num padding tokens: 1016 - rank: 1 max len: 2223 min len: 1683 avg len: 1884.3333333333333 num_loss_counted_tokens: 2050
total tokens: 7551 num samples: 9 num padding tokens: 500 - rank: 4 max len: 839 min len: 731 avg len: 783.4444444444445 num_loss_counted_tokens: 5307
total tokens: 7815 num samples: 5 num padding tokens: 1302 - rank: 2 max len: 1563 min len: 1114 avg len: 1302.6 num_loss_counted_tokens: 3153
total tokens: 6000 num samples: 25 num padding tokens: 1935 - rank: 7 max len: 240 min len: 71 avg len: 162.6 num_loss_counted_tokens: 1696
{
"epoch": 0,
"step": 55,
"rank": 0,
"loss": 0.016456307843327522,
"overall_throughput": 41.61745941199812,
"lr": 8.000000000000001e-07,
"cuda_mem_allocated": 24.42158031463623,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 23361,
"batch_size": 72,
"total_loss": 0.6451972723007202,
"gradnorm": 0.9589425325393677,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:50:22.600693"
}
total tokens: 8008 num samples: 8 num padding tokens: 692 - rank: 3 max len: 1001 min len: 855 avg len: 914.5 num_loss_counted_tokens: 2914
total tokens: 7755 num samples: 11 num padding tokens: 872 - rank: 5 max len: 705 min len: 568 avg len: 625.7272727272727 num_loss_counted_tokens: 3890
total tokens: 5474 num samples: 2 num padding tokens: 71 - rank: 0 max len: 2737 min len: 2666 avg len: 2701.5 num_loss_counted_tokens: 203
total tokens: 7952 num samples: 16 num padding tokens: 1838 - rank: 6 max len: 497 min len: 242 avg len: 382.125 num_loss_counted_tokens: 3616
Per-token loss scaled by world size: 0.00027937223785556853
Per-token loss scaled by world size: 0.0003551024419721216
Per-token loss scaled by world size: 0.0003812481591012329
Per-token loss scaled by world size: 0.00039887617458589375
Per-token loss scaled by world size: 0.00017063321138266474
Per-token loss scaled by world size: 0.0001592883054399863
Per-token loss scaled by world size: 5.90530635236064e-06
Epoch: 0, Step: 56, Rank: 5, loss = 1.202885627746582
Epoch: 0, Step: 56, Rank: 6, loss = 1.1203925609588623
Epoch: 0, Step: 56, Rank: 7, loss = 0.8814542889595032
Epoch: 0, Step: 56, Rank: 4, loss = 1.2585041522979736
Epoch: 0, Step: 56, Rank: 3, loss = 0.5025745034217834
Epoch: 0, Step: 56, Rank: 1, loss = 0.5383691191673279
Epoch: 0, Step: 56, Rank: 0, loss = 0.018631979823112488
Per-token loss scaled by world size: 0.0001534399198135361
Epoch: 0, Step: 56, Rank: 2, loss = 0.4841221272945404
Epoch 0: 46%|████▋ | 56/121 [02:23<02:46, 2.55s/it]
total tokens: 8090 num samples: 10 num padding tokens: 807 - rank: 4 max len: 809 min len: 667 avg len: 728.3 num_loss_counted_tokens: 3324
total tokens: 7916 num samples: 4 num padding tokens: 1432 - rank: 1 max len: 1979 min len: 1204 avg len: 1621.0 num_loss_counted_tokens: 3189
{
"epoch": 0,
"step": 56,
"rank": 0,
"loss": 0.018631979823112488,
"overall_throughput": 42.399025208803295,
"lr": 8.000000000000001e-07,
"cuda_mem_allocated": 24.21819305419922,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 25241,
"batch_size": 80,
"total_loss": 0.7508668899536133,
"gradnorm": 0.9589425325393677,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:50:25.156511"
}
total tokens: 7794 num samples: 18 num padding tokens: 1747 - rank: 6 max len: 433 min len: 260 avg len: 335.94444444444446 num_loss_counted_tokens: 3466
total tokens: 7560 num samples: 3 num padding tokens: 565 - rank: 0 max len: 2520 min len: 2123 avg len: 2331.6666666666665 num_loss_counted_tokens: 291
total tokens: 7956 num samples: 12 num padding tokens: 1282 - rank: 5 max len: 663 min len: 471 avg len: 556.1666666666666 num_loss_counted_tokens: 3682
total tokens: 7158 num samples: 6 num padding tokens: 910 - rank: 2 max len: 1193 min len: 973 avg len: 1041.3333333333333 num_loss_counted_tokens: 3894
total tokens: 7020 num samples: 27 num padding tokens: 2005 - rank: 7 max len: 260 min len: 75 avg len: 185.74074074074073 num_loss_counted_tokens: 2158
total tokens: 7528 num samples: 8 num padding tokens: 511 - rank: 3 max len: 941 min len: 825 avg len: 877.125 num_loss_counted_tokens: 5733
Per-token loss scaled by world size: 0.0006988957757130265
Per-token loss scaled by world size: 0.0001346730423392728
Per-token loss scaled by world size: 0.0006961524486541748
Per-token loss scaled by world size: 0.0004565907292999327
Per-token loss scaled by world size: 0.0009174557635560632
Per-token loss scaled by world size: 6.04268007009523e-06
Per-token loss scaled by world size: 2.950769612652948e-06
Epoch: 0, Step: 57, Rank: 2, loss = 0.29436159133911133
Epoch: 0, Step: 57, Rank: 7, loss = 1.5216152667999268
Epoch: 0, Step: 57, Rank: 4, loss = 0.9979931712150574
Epoch: 0, Step: 57, Rank: 6, loss = 1.527611494064331
Epoch: 0, Step: 57, Rank: 5, loss = 2.005328893661499
Epoch: 0, Step: 57, Rank: 0, loss = 0.013207787647843361
Epoch: 0, Step: 57, Rank: 1, loss = 0.006449644919484854
Per-token loss scaled by world size: 0.0004939697682857513
Epoch: 0, Step: 57, Rank: 3, loss = 1.079694390296936
Epoch 0: 47%|████▋ | 57/121 [02:25<02:43, 2.55s/it]
total tokens: 7420 num samples: 4 num padding tokens: 675 - rank: 1 max len: 1855 min len: 1520 avg len: 1686.25 num_loss_counted_tokens: 2384
total tokens: 7770 num samples: 10 num padding tokens: 828 - rank: 4 max len: 777 min len: 623 avg len: 694.2 num_loss_counted_tokens: 2018
{
"epoch": 0,
"step": 57,
"rank": 0,
"loss": 0.013207787647843361,
"overall_throughput": 42.49635893812786,
"lr": 8.000000000000001e-07,
"cuda_mem_allocated": 24.335302352905273,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 17486,
"batch_size": 87,
"total_loss": 0.9307827949523926,
"gradnorm": 0.9589425325393677,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:50:27.695911"
}
total tokens: 8062 num samples: 29 num padding tokens: 2902 - rank: 7 max len: 278 min len: 80 avg len: 177.93103448275863 num_loss_counted_tokens: 2355
total tokens: 7464 num samples: 6 num padding tokens: 1141 - rank: 2 max len: 1244 min len: 904 avg len: 1053.8333333333333 num_loss_counted_tokens: 4123
total tokens: 7808 num samples: 16 num padding tokens: 1561 - rank: 6 max len: 488 min len: 309 avg len: 390.4375 num_loss_counted_tokens: 4152
total tokens: 7865 num samples: 13 num padding tokens: 912 - rank: 5 max len: 605 min len: 489 avg len: 534.8461538461538 num_loss_counted_tokens: 4080
total tokens: 7839 num samples: 9 num padding tokens: 355 - rank: 3 max len: 871 min len: 779 avg len: 831.5555555555555 num_loss_counted_tokens: 4751
total tokens: 6874 num samples: 2 num padding tokens: 130 - rank: 0 max len: 3437 min len: 3307 avg len: 3372.0 num_loss_counted_tokens: 164
Per-token loss scaled by world size: 0.0002828103897627443
Per-token loss scaled by world size: 0.0004572872712742537
Per-token loss scaled by world size: 1.0020711442848551e-06
Per-token loss scaled by world size: 0.0006151991547085345
Per-token loss scaled by world size: 0.00043195782927796245
Per-token loss scaled by world size: 7.227377864182927e-06
Per-token loss scaled by world size: 0.00031230703461915255
Epoch: 0, Step: 58, Rank: 6, loss = 1.217584490776062
Epoch: 0, Step: 58, Rank: 0, loss = 0.0026681397575885057
Epoch: 0, Step: 58, Rank: 3, loss = 0.7530180215835571
Epoch: 0, Step: 58, Rank: 4, loss = 1.150141716003418
Epoch: 0, Step: 58, Rank: 5, loss = 1.6380445957183838
Epoch: 0, Step: 58, Rank: 1, loss = 0.019243797287344933
Epoch: 0, Step: 58, Rank: 7, loss = 0.831556499004364
Per-token loss scaled by world size: 7.78582543716766e-05
Epoch: 0, Step: 58, Rank: 2, loss = 0.2073073387145996
Epoch 0: 48%|████▊ | 58/121 [02:28<02:40, 2.56s/it]
total tokens: 6570 num samples: 3 num padding tokens: 500 - rank: 1 max len: 2190 min len: 1919 avg len: 2023.3333333333333 num_loss_counted_tokens: 450
total tokens: 7944 num samples: 8 num padding tokens: 930 - rank: 4 max len: 993 min len: 794 avg len: 876.75 num_loss_counted_tokens: 6270
{
"epoch": 0,
"step": 58,
"rank": 0,
"loss": 0.0026681397575885057,
"overall_throughput": 42.239983019087354,
"lr": 8.000000000000001e-07,
"cuda_mem_allocated": 24.242695808410645,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 21301,
"batch_size": 71,
"total_loss": 0.7274456024169922,
"gradnorm": 0.9589425325393677,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:50:30.263573"
}
total tokens: 5496 num samples: 24 num padding tokens: 2346 - rank: 7 max len: 229 min len: 77 avg len: 131.25 num_loss_counted_tokens: 1044 | |
total tokens: 7190 num samples: 2 num padding tokens: 692 - rank: 0 max len: 3595 min len: 2903 avg len: 3249.0 num_loss_counted_tokens: 177 | |
total tokens: 8113 num samples: 19 num padding tokens: 2176 - rank: 6 max len: 427 min len: 230 avg len: 312.4736842105263 num_loss_counted_tokens: 3174 | |
total tokens: 7020 num samples: 4 num padding tokens: 898 - rank: 2 max len: 1755 min len: 1310 avg len: 1530.5 num_loss_counted_tokens: 778 | |
total tokens: 7494 num samples: 6 num padding tokens: 623 - rank: 3 max len: 1249 min len: 1027 avg len: 1145.1666666666667 num_loss_counted_tokens: 2420 | |
total tokens: 8107 num samples: 11 num padding tokens: 1465 - rank: 5 max len: 737 min len: 430 avg len: 603.8181818181819 num_loss_counted_tokens: 4656 | |
Per-token loss scaled by world size: 0.00026943007833324373Per-token loss scaled by world size: 0.00027157709700986743Per-token loss scaled by world size: 0.00050318957073614 | |
Per-token loss scaled by world size: 0.0003404158051125705Per-token loss scaled by world size: 0.0005186275229789317 | |
Per-token loss scaled by world size: 5.279351626086282e-06 | |
Per-token loss scaled by world size: 6.721797399222851e-05 | |
Epoch: 0, Step: 59, Rank: 4, loss = 1.4499406814575195
Epoch: 0, Step: 59, Rank: 7, loss = 0.7825493812561035
Epoch: 0, Step: 59, Rank: 6, loss = 0.7763627767562866
Epoch: 0, Step: 59, Rank: 5, loss = 1.4944251775741577
Epoch: 0, Step: 59, Rank: 2, loss = 0.9809081554412842
Epoch: 0, Step: 59, Rank: 0, loss = 0.015212451107800007 | |
Epoch: 0, Step: 59, Rank: 1, loss = 0.19368860125541687 | |
Per-token loss scaled by world size: 0.0002404269325779751 | |
Epoch: 0, Step: 59, Rank: 3, loss = 0.6927902102470398 | |
Epoch 0:  49%|████▉     | 59/121 [02:31<02:38,  2.56s/it]
total tokens: 7623 num samples: 11 num padding tokens: 1253 - rank: 4 max len: 693 min len: 515 avg len: 579.0909090909091 num_loss_counted_tokens: 4023
total tokens: 7490 num samples: 5 num padding tokens: 1411 - rank: 1 max len: 1498 min len: 1040 avg len: 1215.8 num_loss_counted_tokens: 1224 | |
{ | |
"epoch": 0, | |
"step": 59, | |
"rank": 0, | |
"loss": 0.015212451107800007, | |
"overall_throughput": 42.27999530101241, | |
"lr": 8.000000000000001e-07, | |
"cuda_mem_allocated": 24.386998653411865, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 23052, | |
"batch_size": 97, | |
"total_loss": 0.7982346415519714, | |
"gradnorm": 0.9589425325393677, | |
"weight_norm": 433.0432434082031, | |
"timestamp": "2024-08-18T20:50:32.865411" | |
} | |
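Each `total tokens: …` line summarizes one rank's micro-batch. In every line spot-checked in this excerpt, `total tokens` equals `num samples × max len`, and `num padding tokens` equals `total tokens − num samples × avg len`. For the rank 4 line at step 58, for example, 8 × 993 = 7944 and 7944 − 8 × 876.75 = 930. That pattern is consistent with each batch being padded to its longest sample. The Python sketch below parses these lines with a regular expression and checks the relationship. The function names are hypothetical helpers for reading this log; they are not part of `instructlab`.

```python
import re
from collections import defaultdict

# One per-rank batch summary line from this log; search() tolerates a fused
# progress-bar prefix or trailing residue on the same line.
BATCH_RE = re.compile(
    r"total tokens: (?P<total>\d+) num samples: (?P<n>\d+) "
    r"num padding tokens: (?P<pad>\d+) - rank: (?P<rank>\d+) "
    r"max len: (?P<max>\d+) min len: (?P<min>\d+) "
    r"avg len: (?P<avg>[\d.]+) num_loss_counted_tokens: (?P<loss_tok>\d+)"
)


def parse_batch_line(line):
    """Return the numeric fields of a batch summary line, or None."""
    m = BATCH_RE.search(line)
    if m is None:
        return None
    return {k: float(v) if k == "avg" else int(v) for k, v in m.groupdict().items()}


def check_padding(d):
    """True if the batch looks padded to its longest sample (observed, not documented)."""
    real_tokens = round(d["n"] * d["avg"])  # sum of the unpadded sample lengths
    return d["total"] == d["n"] * d["max"] and d["pad"] == d["total"] - real_tokens


def padding_by_rank(lines):
    """Fraction of padding tokens per rank, aggregated over all batch lines."""
    acc = defaultdict(lambda: [0, 0])  # rank -> [padding tokens, total tokens]
    for line in lines:
        d = parse_batch_line(line)
        if d is not None:
            acc[d["rank"]][0] += d["pad"]
            acc[d["rank"]][1] += d["total"]
    return {rank: pad / total for rank, (pad, total) in sorted(acc.items())}
```

To run the check over the whole log, pass an open file handle, for example `padding_by_rank(open("nohup.out"))`. Lines that are not batch summaries, such as JSON records and loss lines, are skipped.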
total tokens: 7981 num samples: 23 num padding tokens: 1404 - rank: 6 max len: 347 min len: 241 avg len: 285.95652173913044 num_loss_counted_tokens: 2899 | |
total tokens: 7140 num samples: 30 num padding tokens: 2458 - rank: 7 max len: 238 min len: 77 avg len: 156.06666666666666 num_loss_counted_tokens: 1738 | |
total tokens: 7938 num samples: 9 num padding tokens: 784 - rank: 3 max len: 882 min len: 719 avg len: 794.8888888888889 num_loss_counted_tokens: 5307 | |
total tokens: 7189 num samples: 7 num padding tokens: 565 - rank: 2 max len: 1027 min len: 894 avg len: 946.2857142857143 num_loss_counted_tokens: 3237 | |
total tokens: 8096 num samples: 16 num padding tokens: 1168 - rank: 5 max len: 506 min len: 348 avg len: 433.0 num_loss_counted_tokens: 3888 | |
total tokens: 7446 num samples: 3 num padding tokens: 1212 - rank: 0 max len: 2482 min len: 1681 avg len: 2078.0 num_loss_counted_tokens: 1401 | |
Per-token loss scaled by world size: 0.0002737885224632919
Per-token loss scaled by world size: 0.00014497540541924536
Per-token loss scaled by world size: 0.00019299837003927678
Per-token loss scaled by world size: 0.0003046545316465199
Per-token loss scaled by world size: 0.0004120226949453354
Per-token loss scaled by world size: 0.00017287737864535302
Per-token loss scaled by world size: 9.607095989849768e-07 | |
Epoch: 0, Step: 60, Rank: 2, loss = 0.6385592222213745
Epoch: 0, Step: 60, Rank: 1, loss = 0.4796692430973053
Epoch: 0, Step: 60, Rank: 6, loss = 1.00798761844635
Epoch: 0, Step: 60, Rank: 4, loss = 1.3632285594940186
Epoch: 0, Step: 60, Rank: 3, loss = 0.9058635830879211
Epoch: 0, Step: 60, Rank: 7, loss = 0.5719864368438721 | |
Epoch: 0, Step: 60, Rank: 0, loss = 0.0031786279287189245 | |
Per-token loss scaled by world size: 0.0002788409183267504 | |
Epoch: 0, Step: 60, Rank: 5, loss = 0.9225800633430481 | |
Epoch 0:  50%|████▉     | 60/121 [02:33<02:35,  2.55s/it]
total tokens: 6675 num samples: 3 num padding tokens: 898 - rank: 1 max len: 2225 min len: 1369 avg len: 1925.6666666666667 num_loss_counted_tokens: 1482
total tokens: 7610 num samples: 10 num padding tokens: 836 - rank: 4 max len: 761 min len: 583 avg len: 677.4 num_loss_counted_tokens: 5671 | |
{ | |
"epoch": 0, | |
"step": 60, | |
"rank": 0, | |
"loss": 0.0031786279287189245, | |
"overall_throughput": 42.89061324541665, | |
"lr": 8.000000000000001e-07, | |
"cuda_mem_allocated": 24.254440784454346, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 26469, | |
"batch_size": 91, | |
"total_loss": 0.7366316914558411, | |
"gradnorm": 0.9589425325393677, | |
"weight_norm": 433.0432434082031, | |
"timestamp": "2024-08-18T20:50:35.353516" | |
} | |
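In this excerpt, the `total_loss` field of each JSON record appears to equal the mean of that step's eight per-rank `loss = …` values. For step 60, the eight values printed just above average to about 0.7366317. The logged `total_loss` is 0.7366316914558411, and the small residual is plausibly float32 rounding. The sketch below recomputes the per-step mean from raw log text. Its regex tolerates the fused lines produced when several ranks write to stdout at once. This is a consistency check on these numbers and an assumption about the metric, not a description of how `instructlab` computes `total_loss`.

```python
import re
from collections import defaultdict
from statistics import fmean

# A decimal or scientific-notation float. The exponent is only consumed when
# digits follow it, so "0.78Epoch: ..." (two fused log lines) stops at "0.78".
FLOAT = r"\d+(?:\.\d+)?(?:[eE][+-]?\d+)?"
STEP_RE = re.compile(rf"Epoch: (\d+), Step: (\d+), Rank: (\d+), loss = ({FLOAT})")


def mean_loss_per_step(text):
    """Map (epoch, step) -> mean of the per-rank losses found in text."""
    by_step = defaultdict(dict)
    for epoch, step, rank, loss in STEP_RE.findall(text):
        # Keyed by rank, so a rank that appears twice for one step counts once.
        by_step[(int(epoch), int(step))][int(rank)] = float(loss)
    return {key: fmean(ranks.values()) for key, ranks in by_step.items()}


if __name__ == "__main__":
    sample = (
        "Epoch: 0, Step: 60, Rank: 2, loss = 0.6385592222213745"
        "Epoch: 0, Step: 60, Rank: 1, loss = 0.4796692430973053\n"
    )
    print(mean_loss_per_step(sample))
```

Comparing each result with the `total_loss` in the matching JSON record is a quick way to confirm that no rank's line was lost in the interleaved output.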
total tokens: 7735 num samples: 17 num padding tokens: 2066 - rank: 6 max len: 455 min len: 253 avg len: 333.47058823529414 num_loss_counted_tokens: 2881 | |
total tokens: 7116 num samples: 6 num padding tokens: 703 - rank: 2 max len: 1186 min len: 991 avg len: 1068.8333333333333 num_loss_counted_tokens: 4466 | |
total tokens: 5920 num samples: 2 num padding tokens: 120 - rank: 0 max len: 2960 min len: 2840 avg len: 2900.0 num_loss_counted_tokens: 179 | |
total tokens: 7696 num samples: 8 num padding tokens: 616 - rank: 3 max len: 962 min len: 775 avg len: 885.0 num_loss_counted_tokens: 4386 | |
total tokens: 8096 num samples: 32 num padding tokens: 2221 - rank: 7 max len: 253 min len: 72 avg len: 183.59375 num_loss_counted_tokens: 2722 | |
total tokens: 7566 num samples: 13 num padding tokens: 544 - rank: 5 max len: 582 min len: 459 avg len: 540.1538461538462 num_loss_counted_tokens: 4576 | |
Per-token loss scaled by world size: 0.0001878267794381827
Per-token loss scaled by world size: 0.0002473706554155797
Per-token loss scaled by world size: 0.0005609646323136985
Per-token loss scaled by world size: 2.916532139352057e-05
Per-token loss scaled by world size: 0.0005521869170479476
Per-token loss scaled by world size: 2.548310840211343e-05 | |
Per-token loss scaled by world size: 0.0002686498628463596 | |
Epoch: 0, Step: 61, Rank: 3, loss = 0.4526155889034271 | |
Epoch: 0, Step: 61, Rank: 0, loss = 0.07028113305568695
Epoch: 0, Step: 61, Rank: 2, loss = 0.5961014032363892
Epoch: 0, Step: 61, Rank: 6, loss = 1.3306324481964111
Epoch: 0, Step: 61, Rank: 4, loss = 1.3517844676971436
Epoch: 0, Step: 61, Rank: 1, loss = 0.061407919973134995 | |
Epoch: 0, Step: 61, Rank: 7, loss = 0.6473789811134338 | |
Per-token loss scaled by world size: 0.000720518350135535 | |
Epoch: 0, Step: 61, Rank: 5, loss = 1.7362691164016724 | |
Epoch 0:  50%|█████     | 61/121 [02:36<02:32,  2.55s/it]
total tokens: 6764 num samples: 4 num padding tokens: 645 - rank: 1 max len: 1691 min len: 1416 avg len: 1529.75 num_loss_counted_tokens: 1674
total tokens: 7416 num samples: 9 num padding tokens: 608 - rank: 4 max len: 824 min len: 677 avg len: 756.4444444444445 num_loss_counted_tokens: 5244 | |
total tokens: 8088 num samples: 12 num padding tokens: 844 - rank: 5 max len: 674 min len: 545 avg len: 603.6666666666666 num_loss_counted_tokens: 4651 | |
{ | |
"epoch": 0, | |
"step": 61, | |
"rank": 0, | |
"loss": 0.07028113305568695, | |
"overall_throughput": 42.67067345316129, | |
"lr": 8.000000000000001e-07, | |
"cuda_mem_allocated": 24.295628547668457, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 19278, | |
"batch_size": 85, | |
"total_loss": 0.7808088660240173, | |
"gradnorm": 0.9589425325393677, | |
"weight_norm": 433.0432434082031, | |
"timestamp": "2024-08-18T20:50:37.891642" | |
} | |
total tokens: 7800 num samples: 24 num padding tokens: 3935 - rank: 7 max len: 325 min len: 82 avg len: 161.04166666666666 num_loss_counted_tokens: 1576 | |
total tokens: 7965 num samples: 15 num padding tokens: 1199 - rank: 6 max len: 531 min len: 325 avg len: 451.06666666666666 num_loss_counted_tokens: 4050 | |
total tokens: 7488 num samples: 8 num padding tokens: 552 - rank: 3 max len: 936 min len: 830 avg len: 867.0 num_loss_counted_tokens: 2535 | |
total tokens: 6438 num samples: 2 num padding tokens: 1075 - rank: 0 max len: 3219 min len: 2144 avg len: 2681.5 num_loss_counted_tokens: 201 | |
total tokens: 7952 num samples: 7 num padding tokens: 583 - rank: 2 max len: 1136 min len: 993 avg len: 1052.7142857142858 num_loss_counted_tokens: 5968 | |
Per-token loss scaled by world size: 0.0004150475433561951
Per-token loss scaled by world size: 0.0003149463445879519
Per-token loss scaled by world size: 0.000596669502556324
Per-token loss scaled by world size: 6.351516731228912e-06
Per-token loss scaled by world size: 8.575078709327499e-07
Per-token loss scaled by world size: 0.00024007105093915015
Per-token loss scaled by world size: 0.00015118405281100422 | |
Epoch: 0, Step: 62, Rank: 5, loss = 1.5943009853363037 | |
Epoch: 0, Step: 62, Rank: 3, loss = 0.8415366411209106 | |
Epoch: 0, Step: 62, Rank: 0, loss = 0.016971252858638763 | |
Epoch: 0, Step: 62, Rank: 4, loss = 1.1090070009231567
Epoch: 0, Step: 62, Rank: 2, loss = 0.6414698362350464
Epoch: 0, Step: 62, Rank: 1, loss = 0.0022912609856575727 | |
Epoch: 0, Step: 62, Rank: 7, loss = 0.40396377444267273 | |
Per-token loss scaled by world size: 0.0003845185856334865 | |
Epoch: 0, Step: 62, Rank: 6, loss = 1.0274336338043213 | |
Epoch 0:  51%|█████     | 62/121 [02:38<02:29,  2.53s/it]
total tokens: 7392 num samples: 4 num padding tokens: 338 - rank: 1 max len: 1848 min len: 1671 avg len: 1763.5 num_loss_counted_tokens: 2320
total tokens: 8112 num samples: 8 num padding tokens: 1667 - rank: 4 max len: 1014 min len: 705 avg len: 805.625 num_loss_counted_tokens: 4978 | |
{ | |
"epoch": 0, | |
"step": 62, | |
"rank": 0, | |
"loss": 0.016971252858638763, | |
"overall_throughput": 43.357387151746714, | |
"lr": 8.000000000000001e-07, | |
"cuda_mem_allocated": 24.4012451171875, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 21376, | |
"batch_size": 77, | |
"total_loss": 0.7046218514442444, | |
"gradnorm": 0.9589425325393677, | |
"weight_norm": 433.0432434082031, | |
"timestamp": "2024-08-18T20:50:40.388446" | |
} | |
total tokens: 7860 num samples: 30 num padding tokens: 2842 - rank: 7 max len: 262 min len: 79 avg len: 167.26666666666668 num_loss_counted_tokens: 2118 | |
total tokens: 7711 num samples: 11 num padding tokens: 1331 - rank: 5 max len: 701 min len: 445 avg len: 580.0 num_loss_counted_tokens: 5182 | |
total tokens: 7974 num samples: 18 num padding tokens: 2210 - rank: 6 max len: 443 min len: 270 avg len: 320.22222222222223 num_loss_counted_tokens: 2977 | |
total tokens: 6676 num samples: 4 num padding tokens: 617 - rank: 2 max len: 1669 min len: 1336 avg len: 1514.75 num_loss_counted_tokens: 2305 | |
total tokens: 7974 num samples: 6 num padding tokens: 795 - rank: 3 max len: 1329 min len: 1046 avg len: 1196.5 num_loss_counted_tokens: 3823 | |
total tokens: 6596 num samples: 2 num padding tokens: 195 - rank: 0 max len: 3298 min len: 3103 avg len: 3200.5 num_loss_counted_tokens: 198 | |
Per-token loss scaled by world size: 0.00032304422347806394
Per-token loss scaled by world size: 0.0002737718168646097
Per-token loss scaled by world size: 0.0002592895762063563
Per-token loss scaled by world size: 0.0002177765272790566
Per-token loss scaled by world size: 0.00020476435020100325
Per-token loss scaled by world size: 0.00020230024529155344
Per-token loss scaled by world size: 5.3382074838737026e-05
Epoch: 0, Step: 63, Rank: 0, loss = 0.1870107501745224
Epoch: 0, Step: 63, Rank: 6, loss = 0.908356249332428
Epoch: 0, Step: 63, Rank: 4, loss = 0.7173407077789307
Epoch: 0, Step: 63, Rank: 5, loss = 1.1317046880722046
Epoch: 0, Step: 63, Rank: 1, loss = 0.959091067314148
Epoch: 0, Step: 63, Rank: 3, loss = 0.7087083458900452
Epoch: 0, Step: 63, Rank: 7, loss = 0.7629256248474121
Per-token loss scaled by world size: 0.0001610093895578757 | |
Epoch: 0, Step: 63, Rank: 2, loss = 0.5640561580657959 | |
Epoch 0:  52%|█████▏    | 63/121 [02:41<02:27,  2.54s/it]
total tokens: 7680 num samples: 8 num padding tokens: 903 - rank: 4 max len: 960 min len: 763 avg len: 847.125 num_loss_counted_tokens: 3503
total tokens: 7896 num samples: 3 num padding tokens: 1173 - rank: 1 max len: 2632 min len: 1954 avg len: 2241.0 num_loss_counted_tokens: 499 | |
{ | |
"epoch": 0, | |
"step": 63, | |
"rank": 0, | |
"loss": 0.1870107501745224, | |
"overall_throughput": 42.41853134282842, | |
"lr": 8.000000000000001e-07, | |
"cuda_mem_allocated": 24.40930986404419, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 28026, | |
"batch_size": 89, | |
"total_loss": 0.7423991560935974, | |
"gradnorm": 0.9589425325393677, | |
"weight_norm": 433.0432434082031, | |
"timestamp": "2024-08-18T20:50:42.939707" | |
} | |
total tokens: 7400 num samples: 10 num padding tokens: 1048 - rank: 5 max len: 740 min len: 555 avg len: 635.2 num_loss_counted_tokens: 4022 | |
total tokens: 7010 num samples: 5 num padding tokens: 576 - rank: 2 max len: 1402 min len: 1179 avg len: 1286.8 num_loss_counted_tokens: 3377 | |
total tokens: 7182 num samples: 27 num padding tokens: 2131 - rank: 7 max len: 266 min len: 74 avg len: 187.07407407407408 num_loss_counted_tokens: 2558 | |
total tokens: 7935 num samples: 15 num padding tokens: 1793 - rank: 6 max len: 529 min len: 277 avg len: 409.46666666666664 num_loss_counted_tokens: 3824 | |
total tokens: 7548 num samples: 2 num padding tokens: 861 - rank: 0 max len: 3774 min len: 2913 avg len: 3343.5 num_loss_counted_tokens: 218 | |
total tokens: 6996 num samples: 6 num padding tokens: 573 - rank: 3 max len: 1166 min len: 965 avg len: 1070.5 num_loss_counted_tokens: 3633 | |
Per-token loss scaled by world size: 0.0005391178419813514
Per-token loss scaled by world size: 0.00022075393644627184
Per-token loss scaled by world size: 8.797919872449711e-05 | |
Per-token loss scaled by world size: 0.00035083515103906393 | |
Per-token loss scaled by world size: 0.0003944748896174133
Per-token loss scaled by world size: 3.479456063359976e-05
Per-token loss scaled by world size: 0.0002142135490430519 | |
Epoch: 0, Step: 64, Rank: 3, loss = 0.64051753282547 | |
Epoch: 0, Step: 64, Rank: 2, loss = 0.25527164340019226 | |
Epoch: 0, Step: 64, Rank: 5, loss = 1.5642504692077637 | |
Epoch: 0, Step: 64, Rank: 4, loss = 1.0179481506347656 | |
Epoch: 0, Step: 64, Rank: 6, loss = 1.144568920135498 | |
Epoch: 0, Step: 64, Rank: 1, loss = 0.10095641762018204
Epoch: 0, Step: 64, Rank: 7, loss = 0.6215406060218811
Per-token loss scaled by world size: 2.743201912380755e-05 | |
Epoch: 0, Step: 64, Rank: 0, loss = 0.07959400117397308 | |
Epoch 0:  53%|█████▎    | 64/121 [02:43<02:26,  2.57s/it]
total tokens: 7266 num samples: 2 num padding tokens: 794 - rank: 1 max len: 3633 min len: 2839 avg len: 3236.0 num_loss_counted_tokens: 186
total tokens: 7720 num samples: 8 num padding tokens: 848 - rank: 4 max len: 965 min len: 791 avg len: 859.0 num_loss_counted_tokens: 4634 | |
{ | |
"epoch": 0, | |
"step": 64, | |
"rank": 0, | |
"loss": 0.07959400117397308, | |
"overall_throughput": 41.08188082412845, | |
"lr": 8.000000000000001e-07, | |
"cuda_mem_allocated": 24.510859966278076, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 23212, | |
"batch_size": 70, | |
"total_loss": 0.6780809760093689, | |
"gradnorm": 0.9589425325393677, | |
"weight_norm": 433.0432434082031, | |
"timestamp": "2024-08-18T20:50:45.576193" | |
} | |
total tokens: 7564 num samples: 31 num padding tokens: 2701 - rank: 7 max len: 244 min len: 82 avg len: 156.8709677419355 num_loss_counted_tokens: 1913 | |
total tokens: 4065 num samples: 1 num padding tokens: 0 - rank: 0 max len: 4065 min len: 4065 avg len: 4065.0 num_loss_counted_tokens: 82 | |
total tokens: 8030 num samples: 5 num padding tokens: 1753 - rank: 3 max len: 1606 min len: 989 avg len: 1255.4 num_loss_counted_tokens: 3735 | |
total tokens: 7720 num samples: 10 num padding tokens: 1003 - rank: 5 max len: 772 min len: 589 avg len: 671.7 num_loss_counted_tokens: 4179 | |
total tokens: 7860 num samples: 4 num padding tokens: 620 - rank: 2 max len: 1965 min len: 1643 avg len: 1810.0 num_loss_counted_tokens: 1863 | |
total tokens: 7800 num samples: 15 num padding tokens: 1926 - rank: 6 max len: 520 min len: 284 avg len: 391.6 num_loss_counted_tokens: 3775 | |
Per-token loss scaled by world size: 0.000316505174851045
Per-token loss scaled by world size: 0.00018296584312338382
Per-token loss scaled by world size: 0.00035575314541347325
Per-token loss scaled by world size: 0.00033105004695244133
Per-token loss scaled by world size: 0.00041151116602122784
Per-token loss scaled by world size: 4.141435056226328e-05
Per-token loss scaled by world size: 0.0004561956156976521 | |
Epoch: 0, Step: 65, Rank: 6, loss = 0.928863525390625 | |
Epoch: 0, Step: 65, Rank: 0, loss = 0.12154076248407364
Epoch: 0, Step: 65, Rank: 1, loss = 0.5369589924812317
Epoch: 0, Step: 65, Rank: 3, loss = 1.0440465211868286
Epoch: 0, Step: 65, Rank: 7, loss = 0.9715490937232971 | |
Epoch: 0, Step: 65, Rank: 5, loss = 1.3388200998306274 | |
Epoch: 0, Step: 65, Rank: 4, loss = 1.2076823711395264 | |
Per-token loss scaled by world size: 0.00015677251212764531 | |
Epoch: 0, Step: 65, Rank: 2, loss = 0.4600881338119507 | |
Epoch 0:  54%|█████▎    | 65/121 [02:46<02:23,  2.57s/it]
total tokens: 6615 num samples: 3 num padding tokens: 1003 - rank: 1 max len: 2205 min len: 1609 avg len: 1870.6666666666667 num_loss_counted_tokens: 351
total tokens: 7942 num samples: 11 num padding tokens: 995 - rank: 4 max len: 722 min len: 567 avg len: 631.5454545454545 num_loss_counted_tokens: 4015 | |
total tokens: 7820 num samples: 20 num padding tokens: 1923 - rank: 6 max len: 391 min len: 224 avg len: 294.85 num_loss_counted_tokens: 3347 | |
{ | |
"epoch": 0, | |
"step": 65, | |
"rank": 0, | |
"loss": 0.12154076248407364, | |
"overall_throughput": 42.203863937195706, | |
"lr": 8.000000000000001e-07, | |
"cuda_mem_allocated": 24.473669052124023, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 23478, | |
"batch_size": 88, | |
"total_loss": 0.8261936902999878, | |
"gradnorm": 0.9589425325393677, | |
"weight_norm": 433.0432434082031, | |
"timestamp": "2024-08-18T20:50:48.138254" | |
} | |
total tokens: 4515 num samples: 21 num padding tokens: 1409 - rank: 7 max len: 215 min len: 76 avg len: 147.9047619047619 num_loss_counted_tokens: 1105 | |
total tokens: 7320 num samples: 8 num padding tokens: 903 - rank: 3 max len: 915 min len: 724 avg len: 802.125 num_loss_counted_tokens: 4454 | |
total tokens: 7735 num samples: 5 num padding tokens: 1812 - rank: 2 max len: 1547 min len: 1021 avg len: 1184.6 num_loss_counted_tokens: 1072 | |
total tokens: 7896 num samples: 14 num padding tokens: 1195 - rank: 5 max len: 564 min len: 397 avg len: 478.64285714285717 num_loss_counted_tokens: 4241 | |
total tokens: 6444 num samples: 2 num padding tokens: 493 - rank: 0 max len: 3222 min len: 2729 avg len: 2975.5 num_loss_counted_tokens: 461 | |
Per-token loss scaled by world size: 0.00020512452465482056
Per-token loss scaled by world size: 0.00037579398485831916
Per-token loss scaled by world size: 0.00022079057816881686
Per-token loss scaled by world size: 0.0002576705301180482
Per-token loss scaled by world size: 5.168040661374107e-05 | |
Per-token loss scaled by world size: 0.00019716547103598714 | |
Per-token loss scaled by world size: 7.161292160162702e-05 | |
Epoch: 0, Step: 66, Rank: 6, loss = 0.8971443176269531
Epoch: 0, Step: 66, Rank: 4, loss = 1.3084206581115723
Epoch: 0, Step: 66, Rank: 0, loss = 0.17993825674057007 | |
Epoch: 0, Step: 66, Rank: 3, loss = 0.7687376141548157 | |
Epoch: 0, Step: 66, Rank: 2, loss = 0.7141923308372498 | |
Epoch: 0, Step: 66, Rank: 1, loss = 0.6864808797836304 | |
Epoch: 0, Step: 66, Rank: 7, loss = 0.249338299036026 | |
Per-token loss scaled by world size: 0.0002357129706069827 | |
Epoch: 0, Step: 66, Rank: 5, loss = 0.8206936120986938 | |
Epoch 0:  55%|█████▍    | 66/121 [02:48<02:20,  2.55s/it]
total tokens: 6852 num samples: 3 num padding tokens: 921 - rank: 1 max len: 2284 min len: 1726 avg len: 1977.0 num_loss_counted_tokens: 3152
total tokens: 7542 num samples: 9 num padding tokens: 1092 - rank: 4 max len: 838 min len: 652 avg len: 716.6666666666666 num_loss_counted_tokens: 3407 | |
total tokens: 7788 num samples: 33 num padding tokens: 2754 - rank: 7 max len: 236 min len: 71 avg len: 152.54545454545453 num_loss_counted_tokens: 2032 | |
{ | |
"epoch": 0, | |
"step": 66, | |
"rank": 0, | |
"loss": 0.17993825674057007, | |
"overall_throughput": 43.1739543464684, | |
"lr": 8.000000000000001e-07, | |
"cuda_mem_allocated": 24.432954788208008, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 27854, | |
"batch_size": 94, | |
"total_loss": 0.7031182050704956, | |
"gradnorm": 0.9589425325393677, | |
"weight_norm": 433.0432434082031, | |
"timestamp": "2024-08-18T20:50:50.647621" | |
} | |
total tokens: 7980 num samples: 19 num padding tokens: 1539 - rank: 6 max len: 420 min len: 240 avg len: 339.0 num_loss_counted_tokens: 3367 | |
total tokens: 7175 num samples: 7 num padding tokens: 309 - rank: 3 max len: 1025 min len: 903 avg len: 980.8571428571429 num_loss_counted_tokens: 5453 | |
total tokens: 7130 num samples: 5 num padding tokens: 1200 - rank: 2 max len: 1426 min len: 1084 avg len: 1186.0 num_loss_counted_tokens: 3905 | |
total tokens: 7388 num samples: 2 num padding tokens: 641 - rank: 0 max len: 3694 min len: 3053 avg len: 3373.5 num_loss_counted_tokens: 603 | |
total tokens: 7668 num samples: 12 num padding tokens: 1327 - rank: 5 max len: 639 min len: 423 avg len: 528.4166666666666 num_loss_counted_tokens: 3896 | |
Per-token loss scaled by world size: 0.00023726793006062508
Per-token loss scaled by world size: 0.00031606658012606204
Per-token loss scaled by world size: 0.000504097668454051
Per-token loss scaled by world size: 0.00039712167927064
Per-token loss scaled by world size: 0.0004929095157422125
Per-token loss scaled by world size: 6.066870355425635e-06
Per-token loss scaled by world size: 2.3820351998438127e-05
Epoch: 0, Step: 67, Rank: 2, loss = 0.6478897333145142 | |
Epoch: 0, Step: 67, Rank: 3, loss = 0.8630593419075012
Epoch: 0, Step: 67, Rank: 6, loss = 1.3765016794204712
Epoch: 0, Step: 67, Rank: 0, loss = 0.01656634733080864 | |
Epoch: 0, Step: 67, Rank: 7, loss = 1.08439040184021 | |
Epoch: 0, Step: 67, Rank: 4, loss = 1.3459510803222656 | |
Epoch: 0, Step: 67, Rank: 1, loss = 0.06504444777965546 | |
Per-token loss scaled by world size: 0.00038537452928721905 | |
Epoch: 0, Step: 67, Rank: 5, loss = 1.0523133277893066 | |
Epoch 0:  55%|█████▌    | 67/121 [02:51<02:16,  2.53s/it]
total tokens: 6858 num samples: 3 num padding tokens: 879 - rank: 1 max len: 2286 min len: 1606 avg len: 1993.0 num_loss_counted_tokens: 2226
total tokens: 8001 num samples: 9 num padding tokens: 744 - rank: 4 max len: 889 min len: 750 avg len: 806.3333333333334 num_loss_counted_tokens: 3999 | |
{ | |
"epoch": 0, | |
"step": 67, | |
"rank": 0, | |
"loss": 0.01656634733080864, | |
"overall_throughput": 43.254684230948214, | |
"lr": 8.000000000000001e-07, | |
"cuda_mem_allocated": 24.338647842407227, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 21845, | |
"batch_size": 79, | |
"total_loss": 0.8064644932746887, | |
"gradnorm": 0.9589425325393677, | |
"weight_norm": 433.0432434082031, | |
"timestamp": "2024-08-18T20:50:53.149207" | |
} | |
total tokens: 8085 num samples: 11 num padding tokens: 2256 - rank: 5 max len: 735 min len: 410 avg len: 529.9090909090909 num_loss_counted_tokens: 4137 | |
total tokens: 7679 num samples: 7 num padding tokens: 788 - rank: 3 max len: 1097 min len: 910 avg len: 984.4285714285714 num_loss_counted_tokens: 4646 | |
total tokens: 6030 num samples: 2 num padding tokens: 245 - rank: 0 max len: 3015 min len: 2770 avg len: 2892.5 num_loss_counted_tokens: 180 | |
total tokens: 8040 num samples: 20 num padding tokens: 1710 - rank: 6 max len: 402 min len: 246 avg len: 316.5 num_loss_counted_tokens: 2948 | |
total tokens: 7520 num samples: 32 num padding tokens: 2159 - rank: 7 max len: 235 min len: 85 avg len: 167.53125 num_loss_counted_tokens: 2187 | |
total tokens: 7895 num samples: 5 num padding tokens: 1603 - rank: 2 max len: 1579 min len: 1158 avg len: 1258.4 num_loss_counted_tokens: 2148 | |
Per-token loss scaled by world size: 0.00019273992802482098
Per-token loss scaled by world size: 0.00030933329253457487
Per-token loss scaled by world size: 0.00017293139535468072
Per-token loss scaled by world size: 0.00035478913923725486
Per-token loss scaled by world size: 0.0003922785690519959 | |
Per-token loss scaled by world size: 4.257708951627137e-06 | |
Per-token loss scaled by world size: 0.0001415474253008142 | |
Epoch: 0, Step: 68, Rank: 6, loss = 1.0613998174667358 | |
Epoch: 0, Step: 68, Rank: 2, loss = 0.6613388657569885
Epoch: 0, Step: 68, Rank: 5, loss = 1.3460057973861694
Epoch: 0, Step: 68, Rank: 0, loss = 0.014609264209866524
Epoch: 0, Step: 68, Rank: 4, loss = 1.2173702716827393
Epoch: 0, Step: 68, Rank: 1, loss = 0.5933708548545837
Epoch: 0, Step: 68, Rank: 7, loss = 0.4856846034526825 | |
Per-token loss scaled by world size: 0.00021395196381490678 | |
Epoch: 0, Step: 68, Rank: 3, loss = 0.7341226935386658 | |
Epoch 0:  56%|█████▌    | 68/121 [02:53<02:14,  2.54s/it]
total tokens: 8082 num samples: 9 num padding tokens: 769 - rank: 4 max len: 898 min len: 707 avg len: 812.5555555555555 num_loss_counted_tokens: 4363
total tokens: 7876 num samples: 4 num padding tokens: 706 - rank: 1 max len: 1969 min len: 1606 avg len: 1792.5 num_loss_counted_tokens: 565 | |
{ | |
"epoch": 0, | |
"step": 68, | |
"rank": 0, | |
"loss": 0.014609264209866524, | |
"overall_throughput": 42.56272046948144, | |
"lr": 8.000000000000001e-07, | |
"cuda_mem_allocated": 24.448001861572266, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 27450, | |
"batch_size": 88, | |
"total_loss": 0.7642378211021423, | |
"gradnorm": 0.9589425325393677, | |
"weight_norm": 433.0432434082031, | |
"timestamp": "2024-08-18T20:50:55.692310" | |
} | |
total tokens: 7612 num samples: 11 num padding tokens: 1187 - rank: 5 max len: 692 min len: 458 avg len: 584.0909090909091 num_loss_counted_tokens: 4392 | |
total tokens: 5586 num samples: 2 num padding tokens: 652 - rank: 0 max len: 2793 min len: 2141 avg len: 2467.0 num_loss_counted_tokens: 151 | |
total tokens: 8060 num samples: 31 num padding tokens: 3056 - rank: 7 max len: 260 min len: 75 avg len: 161.41935483870967 num_loss_counted_tokens: 2101 | |
total tokens: 7220 num samples: 5 num padding tokens: 823 - rank: 2 max len: 1444 min len: 1143 avg len: 1279.4 num_loss_counted_tokens: 1187 | |
total tokens: 7902 num samples: 18 num padding tokens: 1530 - rank: 6 max len: 439 min len: 282 avg len: 354.0 num_loss_counted_tokens: 3796 | |
total tokens: 7882 num samples: 7 num padding tokens: 581 - rank: 3 max len: 1126 min len: 922 avg len: 1043.0 num_loss_counted_tokens: 4213 | |
Per-token loss scaled by world size: 0.0005160618457011878
Per-token loss scaled by world size: 0.00044540074304677546
Per-token loss scaled by world size: 4.21712247771211e-05
Per-token loss scaled by world size: 0.0002244754577986896
Per-token loss scaled by world size: 0.0007427233504131436
Per-token loss scaled by world size: 9.168142241833266e-06 | |
Per-token loss scaled by world size: 0.0003082228358834982 | |
Epoch: 0, Step: 69, Rank: 2, loss = 0.09369391947984695 | |
Epoch: 0, Step: 69, Rank: 5, loss = 0.9895691275596619
Epoch: 0, Step: 69, Rank: 6, loss = 1.1465604305267334
Epoch: 0, Step: 69, Rank: 3, loss = 0.49872833490371704
Epoch: 0, Step: 69, Rank: 4, loss = 1.6501456499099731 | |
Epoch: 0, Step: 69, Rank: 1, loss = 0.02036931924521923 | |
Epoch: 0, Step: 69, Rank: 7, loss = 0.6847940683364868
Per-token loss scaled by world size: 4.934850949211977e-06
Epoch: 0, Step: 69, Rank: 0, loss = 0.010964005254209042 | |
Epoch 0:  57%|█████▋    | 69/121 [02:56<02:12,  2.55s/it]
total tokens: 7194 num samples: 6 num padding tokens: 688 - rank: 1 max len: 1199 min len: 1008 avg len: 1084.3333333333333 num_loss_counted_tokens: 3201
total tokens: 7668 num samples: 12 num padding tokens: 514 - rank: 4 max len: 639 min len: 540 avg len: 596.1666666666666 num_loss_counted_tokens: 3626 | |
{ | |
"epoch": 0, | |
"step": 69, | |
"rank": 0, | |
"loss": 0.010964005254209042, | |
"overall_throughput": 41.668886771410214, | |
"lr": 8.000000000000001e-07, | |
"cuda_mem_allocated": 24.496148586273193, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 17774, | |
"batch_size": 74, | |
"total_loss": 0.6368531584739685, | |
"gradnorm": 0.9589425325393677, | |
"weight_norm": 433.0432434082031, | |
"timestamp": "2024-08-18T20:50:58.281411" | |
} | |
total tokens: 3168 num samples: 24 num padding tokens: 653 - rank: 7 max len: 132 min len: 86 avg len: 104.79166666666667 num_loss_counted_tokens: 693 | |
total tokens: 5566 num samples: 2 num padding tokens: 709 - rank: 0 max len: 2783 min len: 2074 avg len: 2428.5 num_loss_counted_tokens: 267 | |
total tokens: 8040 num samples: 15 num padding tokens: 1436 - rank: 5 max len: 536 min len: 357 avg len: 440.26666666666665 num_loss_counted_tokens: 4304 | |
total tokens: 7872 num samples: 24 num padding tokens: 2389 - rank: 6 max len: 328 min len: 136 avg len: 228.45833333333334 num_loss_counted_tokens: 2377 | |
total tokens: 7784 num samples: 8 num padding tokens: 564 - rank: 2 max len: 973 min len: 847 avg len: 902.5 num_loss_counted_tokens: 5185 | |
total tokens: 7800 num samples: 10 num padding tokens: 617 - rank: 3 max len: 780 min len: 641 avg len: 718.3 num_loss_counted_tokens: 5599 | |
Per-token loss scaled by world size: 0.00038898465572856367
Per-token loss scaled by world size: 0.0003429916687309742
Per-token loss scaled by world size: 0.0005293539143167436
Per-token loss scaled by world size: 2.475303517712746e-06
Per-token loss scaled by world size: 0.00041178142419084907
Per-token loss scaled by world size: 0.00016001032781787217
Per-token loss scaled by world size: 3.3767562854336575e-05
Epoch: 0, Step: 70, Rank: 6, loss = 0.9818993806838989 | |
Epoch: 0, Step: 70, Rank: 4, loss = 1.515407919883728 | |
Epoch: 0, Step: 70, Rank: 5, loss = 1.1135658025741577 | |
Epoch: 0, Step: 70, Rank: 0, loss = 0.00708617502823472 | |
Epoch: 0, Step: 70, Rank: 3, loss = 1.1788272857666016 | |
Epoch: 0, Step: 70, Rank: 7, loss = 0.4580695629119873 | |
Epoch: 0, Step: 70, Rank: 1, loss = 0.09666808694601059 | |
Per-token loss scaled by world size: 0.00012752025213558227 | |
Epoch: 0, Step: 70, Rank: 2, loss = 0.3650586009025574 | |
Epoch 0:  58%|█████▊    | 70/121 [02:58<02:09,  2.54s/it]
total tokens: 7424 num samples: 8 num padding tokens: 763 - rank: 4 max len: 928 min len: 726 avg len: 832.625 num_loss_counted_tokens: 3620
total tokens: 7068 num samples: 4 num padding tokens: 755 - rank: 1 max len: 1767 min len: 1403 avg len: 1578.25 num_loss_counted_tokens: 1097 | |
{ | |
"epoch": 0, | |
"step": 70, | |
"rank": 0, | |
"loss": 0.00708617502823472, | |
"overall_throughput": 42.86151276291768, | |
"lr": 8.000000000000001e-07, | |
"cuda_mem_allocated": 24.356226444244385, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 22902, | |
"batch_size": 78, | |
"total_loss": 0.7145729064941406, | |
"gradnorm": 0.9589425325393677, | |
"weight_norm": 433.0432434082031, | |
"timestamp": "2024-08-18T20:51:00.845490" | |
} | |
total tokens: 6700 num samples: 25 num padding tokens: 2499 - rank: 7 max len: 268 min len: 77 avg len: 168.04 num_loss_counted_tokens: 1712 | |
total tokens: 7984 num samples: 16 num padding tokens: 1302 - rank: 6 max len: 499 min len: 278 avg len: 417.625 num_loss_counted_tokens: 3478 | |
total tokens: 7644 num samples: 7 num padding tokens: 654 - rank: 3 max len: 1092 min len: 950 avg len: 998.5714285714286 num_loss_counted_tokens: 5657 | |
total tokens: 6950 num samples: 5 num padding tokens: 699 - rank: 2 max len: 1390 min len: 1102 avg len: 1250.2 num_loss_counted_tokens: 2977 | |
total tokens: 7590 num samples: 11 num padding tokens: 1060 - rank: 5 max len: 690 min len: 509 avg len: 593.6363636363636 num_loss_counted_tokens: 4071 | |
total tokens: 7488 num samples: 3 num padding tokens: 1132 - rank: 0 max len: 2496 min len: 1787 avg len: 2118.6666666666665 num_loss_counted_tokens: 1930 | |
Per-token loss scaled by world size: 0.00010220974945696071
Per-token loss scaled by world size: 0.0004755923873744905
Per-token loss scaled by world size: 0.00013658934039995074
Per-token loss scaled by world size: 0.0005745669477619231
Per-token loss scaled by world size: 0.00038079574005678296
Per-token loss scaled by world size: 1.1699220294758561e-06
Per-token loss scaled by world size: 0.0002442343102302402 | |
Epoch: 0, Step: 71, Rank: 6, loss = 1.3208389282226562 | |
Epoch: 0, Step: 71, Rank: 5, loss = 1.595716118812561 | |
Epoch: 0, Step: 71, Rank: 0, loss = 0.0032491658348590136Epoch: 0, Step: 71, Rank: 1, loss = 0.28386202454566956Epoch: 0, Step: 71, Rank: 2, loss = 0.3793427646160126Epoch: 0, Step: 71, Rank: 4, loss = 1.0575649738311768 | |
Epoch: 0, Step: 71, Rank: 7, loss = 0.6782997250556946 | |
Per-token loss scaled by world size: 0.0002879296080209315 | |
Epoch: 0, Step: 71, Rank: 3, loss = 0.7996525168418884 | |
Epoch 0: 59%|█████▊ | 71/121 [03:01<02:07, 2.54s/it] total tokens: 7900 num samples: 10 num padding tokens: 433 - rank: 4 max len: 790 min len: 700 avg len: 746.7 num_loss_counted_tokens: 4392 | |
total tokens: 7940 num samples: 5 num padding tokens: 1033 - rank: 1 max len: 1588 min len: 1214 avg len: 1381.4 num_loss_counted_tokens: 4515 | |
{ | |
"epoch": 0, | |
"step": 71, | |
"rank": 0, | |
"loss": 0.0032491658348590136, | |
"overall_throughput": 42.58421912234305, | |
"lr": 8.000000000000001e-07, | |
"cuda_mem_allocated": 24.31266736984253, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 22218, | |
"batch_size": 83, | |
"total_loss": 0.7648157477378845, | |
"gradnorm": 0.9589425325393677, | |
"weight_norm": 433.0432434082031, | |
"timestamp": "2024-08-18T20:51:03.350519" | |
} | |
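Within one step the printed numbers fit together. Take step 71. The JSON `loss` (0.0032491658348590136) is rank 0's `loss =` line. The JSON `total_loss` (0.7648157477378845) is the mean of the eight per-rank `loss =` values. Each per-rank value also appears to be one of the `Per-token loss scaled by world size` values multiplied by `num_loss_counted_tokens / 8` (22218 / 8 = 2777.25). The per-token lines carry no rank label, so the sketch below checks that relationship on their sum instead. These relationships are inferred from the printed values, not taken from the InstructLab training code. `WORLD_SIZE = 8` comes from `--gpus 8`.

```python
WORLD_SIZE = 8  # --gpus 8 in the ilab command that started this run

# Step 71 per-rank "loss =" values, copied from the log.
rank_loss = {
    0: 0.0032491658348590136,
    1: 0.28386202454566956,
    2: 0.3793427646160126,
    3: 0.7996525168418884,
    4: 1.0575649738311768,
    5: 1.595716118812561,
    6: 1.3208389282226562,
    7: 0.6782997250556946,
}
# Step 71 "Per-token loss scaled by world size" values (printed without a rank).
per_token_scaled = [
    0.00010220974945696071, 0.0004755923873744905, 0.00013658934039995074,
    0.0005745669477619231, 0.00038079574005678296, 1.1699220294758561e-06,
    0.0002442343102302402, 0.0002879296080209315,
]
# Fields from the step-71 JSON block.
num_loss_counted_tokens = 22218
reported_loss = 0.0032491658348590136
reported_total_loss = 0.7648157477378845

# total_loss looks like the plain mean of the per-rank "loss =" values.
mean_rank_loss = sum(rank_loss.values()) / WORLD_SIZE
# Each rank's "loss =" looks like its scaled per-token value * N / WORLD_SIZE,
# so summing those and dividing by WORLD_SIZE once more should give total_loss.
from_per_token = sum(per_token_scaled) * num_loss_counted_tokens / WORLD_SIZE**2

print(f"mean of rank losses:      {mean_rank_loss:.6f}")
print(f"from per-token values:    {from_per_token:.6f}")
print(f"reported total_loss:      {reported_total_loss:.6f}")
print(f"JSON loss is rank 0 loss: {reported_loss == rank_loss[0]}")
```

The small differences in the last digits are expected, since the logged values appear to come from float32 tensors.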
total tokens: 7645 num samples: 11 num padding tokens: 1259 - rank: 5 max len: 695 min len: 477 avg len: 580.5454545454545 num_loss_counted_tokens: 4944
total tokens: 8092 num samples: 17 num padding tokens: 1854 - rank: 6 max len: 476 min len: 261 avg len: 366.94117647058823 num_loss_counted_tokens: 3900
total tokens: 7931 num samples: 7 num padding tokens: 505 - rank: 2 max len: 1133 min len: 979 avg len: 1060.857142857143 num_loss_counted_tokens: 4912
total tokens: 7776 num samples: 8 num padding tokens: 515 - rank: 3 max len: 972 min len: 822 avg len: 907.625 num_loss_counted_tokens: 5257
total tokens: 7904 num samples: 32 num padding tokens: 2745 - rank: 7 max len: 247 min len: 71 avg len: 161.21875 num_loss_counted_tokens: 2232
total tokens: 8004 num samples: 4 num padding tokens: 730 - rank: 0 max len: 2001 min len: 1656 avg len: 1818.5 num_loss_counted_tokens: 2493
Per-token loss scaled by world size: 0.0004883022629655898
Per-token loss scaled by world size: 0.00043922686018049717
Per-token loss scaled by world size: 0.0004386535147204995
Per-token loss scaled by world size: 0.00020862463861703873
Per-token loss scaled by world size: 0.0003196638426743448
Per-token loss scaled by world size: 4.3209151954215486e-06
Per-token loss scaled by world size: 0.0001734672114253044
Epoch: 0, Step: 72, Rank: 6, loss = 1.25338876247406
Epoch: 0, Step: 72, Rank: 5, loss = 1.2517526149749756
Epoch: 0, Step: 72, Rank: 4, loss = 1.393431544303894
Epoch: 0, Step: 72, Rank: 2, loss = 0.5953364968299866
Epoch: 0, Step: 72, Rank: 7, loss = 0.9122007489204407
Epoch: 0, Step: 72, Rank: 0, loss = 0.012330272234976292
Epoch: 0, Step: 72, Rank: 1, loss = 0.4950103759765625
Per-token loss scaled by world size: 0.00020103121642023325
Epoch: 0, Step: 72, Rank: 3, loss = 0.5736677050590515
[2024-08-18 20:51:05,925] [INFO] [logging.py:96:log_dist] [Rank 0] step=2, skipped=0, lr=[1.6000000000000001e-06], mom=[(0.9, 0.95)]
Epoch 0: 60%|█████▉ | 72/121 [03:04<02:06, 2.58s/it]
total tokens: 7434 num samples: 7 num padding tokens: 801 - rank: 4 max len: 1062 min len: 831 avg len: 947.5714285714286 num_loss_counted_tokens: 3620
total tokens: 7284 num samples: 3 num padding tokens: 180 - rank: 1 max len: 2428 min len: 2292 avg len: 2368.0 num_loss_counted_tokens: 273
{
"epoch": 0,
"step": 72,
"rank": 0,
"loss": 0.012330272234976292,
"overall_throughput": 41.0709419873187,
"lr": 1.6000000000000001e-06,
"cuda_mem_allocated": 22.637446880340576,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 22829,
"batch_size": 79,
"total_loss": 0.8108897805213928,
"gradnorm": 1.0122549533843994,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:51:06.059291"
}
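In this excerpt the `lr` field moves only once. It is 8.000000000000001e-07 at steps 70 and 71. After the DeepSpeed line `[Rank 0] step=2, skipped=0, lr=[1.6000000000000001e-06]`, it stays at 1.6000000000000001e-06 for every step through 83. `gradnorm` likewise changes only at step 72. The TrainingArgs printed at the start of the run set `learning_rate=2e-05` and `warmup_steps=25`, and the two observed values are 1/25 and 2/25 of that base rate. That fits a linear warmup counted in optimizer steps (DeepSpeed's `step=`), not in the per-micro-batch `Step` counter. The sketch below models that assumed ramp. The post-warmup schedule is not visible in this log, so it is not modeled.

```python
# Values from the TrainingArgs printed at the start of this run.
BASE_LR = 2e-05
WARMUP_STEPS = 25


def warmup_lr(optimizer_step: int) -> float:
    """Learning rate after `optimizer_step` optimizer updates.

    Assumes a plain linear ramp from 0 to BASE_LR over WARMUP_STEPS updates,
    then a flat BASE_LR (the real post-warmup schedule is not shown in the log).
    """
    if optimizer_step < 0:
        raise ValueError("optimizer_step must be non-negative")
    return BASE_LR * min(optimizer_step, WARMUP_STEPS) / WARMUP_STEPS


for k in (1, 2):
    print(k, warmup_lr(k))
```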
total tokens: 6880 num samples: 5 num padding tokens: 854 - rank: 3 max len: 1376 min len: 1145 avg len: 1205.2 num_loss_counted_tokens: 2855
total tokens: 6178 num samples: 2 num padding tokens: 622 - rank: 0 max len: 3089 min len: 2467 avg len: 2778.0 num_loss_counted_tokens: 309
total tokens: 7306 num samples: 26 num padding tokens: 2608 - rank: 7 max len: 281 min len: 72 avg len: 180.69230769230768 num_loss_counted_tokens: 1912
total tokens: 7960 num samples: 4 num padding tokens: 988 - rank: 2 max len: 1990 min len: 1586 avg len: 1743.0 num_loss_counted_tokens: 1479
total tokens: 7548 num samples: 12 num padding tokens: 2286 - rank: 6 max len: 629 min len: 305 avg len: 438.5 num_loss_counted_tokens: 3206
total tokens: 8040 num samples: 10 num padding tokens: 894 - rank: 5 max len: 804 min len: 634 avg len: 714.6 num_loss_counted_tokens: 4636
Per-token loss scaled by world size: 0.00029286538483574986
Per-token loss scaled by world size: 0.0002602968306746334
Per-token loss scaled by world size: 0.00021679738711100072
Per-token loss scaled by world size: 0.00021336728241294622
Per-token loss scaled by world size: 0.0002807514392770827
Per-token loss scaled by world size: 2.977332087539253e-06
Per-token loss scaled by world size: 0.00019802094902843237
Epoch: 0, Step: 73, Rank: 1, loss = 0.8374074697494507
Epoch: 0, Step: 73, Rank: 6, loss = 0.9421845078468323
Epoch: 0, Step: 73, Rank: 4, loss = 0.6974642872810364
Epoch: 0, Step: 73, Rank: 0, loss = 0.00957844965159893
Epoch: 0, Step: 73, Rank: 3, loss = 0.6864292025566101
Epoch: 0, Step: 73, Rank: 2, loss = 0.9032124280929565
Epoch: 0, Step: 73, Rank: 7, loss = 0.6370581388473511
Per-token loss scaled by world size: 0.0002398234064457938
Epoch: 0, Step: 73, Rank: 5, loss = 0.7715418934822083
Epoch 0: 60%|██████ | 73/121 [03:06<02:03, 2.57s/it]
total tokens: 7832 num samples: 8 num padding tokens: 827 - rank: 4 max len: 979 min len: 822 avg len: 875.625 num_loss_counted_tokens: 5623
total tokens: 7455 num samples: 3 num padding tokens: 291 - rank: 1 max len: 2485 min len: 2232 avg len: 2388.0 num_loss_counted_tokens: 600
{
"epoch": 0,
"step": 73,
"rank": 0,
"loss": 0.00957844965159893,
"overall_throughput": 41.76512858398771,
"lr": 1.6000000000000001e-06,
"cuda_mem_allocated": 24.471110343933105,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 25737,
"batch_size": 88,
"total_loss": 0.6856094598770142,
"gradnorm": 1.0122549533843994,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:51:08.555309"
}
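In the raw `nohup.out`, records from different ranks often share one physical line. Examples include two `Epoch: 0, Step: 72, Rank: ...` records side by side, or a tqdm bar immediately followed by a `total tokens:` record. This is most likely because all eight rank processes write to the same stdout at the same moment. Before grepping or plotting, the records can be split apart again on their known prefixes. The sketch below assumes these three prefixes are the only record starts that get fused. It does not handle a JSON `{` glued to a tqdm bar, and `split_fused` is a hypothetical name.

```python
import re

# Prefixes that begin a new record in this log. When several ranks print at
# once, two or more records can end up on the same physical line.
MARKERS = ("Per-token loss scaled by world size:", "Epoch: ", "total tokens:")

# Zero-width lookahead, so the split keeps each marker at the start of its record.
SPLIT_RE = re.compile("(?=" + "|".join(re.escape(m) for m in MARKERS) + ")")


def split_fused(line: str) -> list[str]:
    """Split one physical log line into the separate records it contains."""
    parts = (p.strip() for p in SPLIT_RE.split(line))
    return [p for p in parts if p]  # drop the empty piece before a leading marker


fused = ("Epoch: 0, Step: 72, Rank: 5, loss = 1.2517526149749756"
         "Epoch: 0, Step: 72, Rank: 4, loss = 1.393431544303894")
for record in split_fused(fused):
    print(record)
```

The tqdm bar itself (`Epoch 0: 60%|...`) has no colon directly after `Epoch`, so it does not match the `Epoch: ` marker. It stays intact as its own record.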
total tokens: 8020 num samples: 10 num padding tokens: 1418 - rank: 5 max len: 802 min len: 571 avg len: 660.2 num_loss_counted_tokens: 4480
total tokens: 5714 num samples: 2 num padding tokens: 293 - rank: 0 max len: 2857 min len: 2564 avg len: 2710.5 num_loss_counted_tokens: 246
total tokens: 7260 num samples: 6 num padding tokens: 413 - rank: 3 max len: 1210 min len: 1056 avg len: 1141.1666666666667 num_loss_counted_tokens: 3104
total tokens: 6768 num samples: 4 num padding tokens: 992 - rank: 2 max len: 1692 min len: 1291 avg len: 1444.0 num_loss_counted_tokens: 3075
total tokens: 7602 num samples: 14 num padding tokens: 1899 - rank: 6 max len: 543 min len: 306 avg len: 407.35714285714283 num_loss_counted_tokens: 3724
total tokens: 7930 num samples: 26 num padding tokens: 2791 - rank: 7 max len: 305 min len: 79 avg len: 197.65384615384616 num_loss_counted_tokens: 2193
Per-token loss scaled by world size: 0.0004040475469082594
Per-token loss scaled by world size: 0.00014303348143585026
Per-token loss scaled by world size: 0.00015468306082766503
Per-token loss scaled by world size: 0.00038016383768990636
Per-token loss scaled by world size: 0.0002839408116415143
Per-token loss scaled by world size: 0.00020860570657532662
Per-token loss scaled by world size: 3.69467556993186e-06
Epoch: 0, Step: 74, Rank: 5, loss = 1.1417745351791382
Epoch: 0, Step: 74, Rank: 1, loss = 0.4645712375640869
Epoch: 0, Step: 74, Rank: 4, loss = 0.4295831620693207
Epoch: 0, Step: 74, Rank: 6, loss = 1.2135063409805298
Epoch: 0, Step: 74, Rank: 7, loss = 0.8527806997299194
Epoch: 0, Step: 74, Rank: 0, loss = 0.011096496134996414
Epoch: 0, Step: 74, Rank: 2, loss = 0.6265211701393127
Per-token loss scaled by world size: 0.00020632839004974812
Epoch: 0, Step: 74, Rank: 3, loss = 0.6196815371513367
Epoch 0: 61%|██████ | 74/121 [03:09<02:00, 2.55s/it]
total tokens: 7696 num samples: 8 num padding tokens: 1763 - rank: 4 max len: 962 min len: 651 avg len: 741.625 num_loss_counted_tokens: 5122
total tokens: 7854 num samples: 3 num padding tokens: 565 - rank: 1 max len: 2618 min len: 2101 avg len: 2429.6666666666665 num_loss_counted_tokens: 298
{
"epoch": 0,
"step": 74,
"rank": 0,
"loss": 0.011096496134996414,
"overall_throughput": 41.849181139415684,
"lr": 1.6000000000000001e-06,
"cuda_mem_allocated": 24.389501094818115,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 24027,
"batch_size": 89,
"total_loss": 0.6699394583702087,
"gradnorm": 1.0122549533843994,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:51:11.072098"
}
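Rank 0 prints one JSON object per step with fields such as `epoch`, `step`, `loss`, `total_loss`, `lr` and `gradnorm`. This is the easiest part of the log to turn into a loss curve. The sketch below pulls those objects out of the surrounding text. It assumes each object starts on a line ending in `{`, which may be glued to a tqdm bar as at step 81, and ends on a line that is exactly `}`. `metric_blocks` is a hypothetical name, not an InstructLab utility.

```python
import json
from typing import Iterable, Iterator


def metric_blocks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the per-step JSON metric objects embedded in the training log.

    A block opens on any line ending in '{' and closes on a line that is
    exactly '}'. Everything outside a block is ignored.
    """
    buf = None
    for raw in lines:
        line = raw.rstrip()
        if buf is None:
            if line.endswith("{"):
                buf = ["{"]  # drop any tqdm text glued before the brace
        else:
            buf.append(line)
            if line == "}":
                yield json.loads("\n".join(buf))
                buf = None
```

For example, `with open("nohup.out") as f: losses = {r["step"]: r["total_loss"] for r in metric_blocks(f)}` collects `total_loss` by step. Within one epoch this works as is. The step-keyed dict assumes the counter does not restart across epochs or phases, so key on `(r["epoch"], r["step"])` if it does.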
total tokens: 7530 num samples: 5 num padding tokens: 1325 - rank: 3 max len: 1506 min len: 1053 avg len: 1241.0 num_loss_counted_tokens: 2113
total tokens: 7860 num samples: 30 num padding tokens: 3145 - rank: 7 max len: 262 min len: 80 avg len: 157.16666666666666 num_loss_counted_tokens: 1940
total tokens: 7596 num samples: 12 num padding tokens: 1105 - rank: 5 max len: 633 min len: 472 avg len: 540.9166666666666 num_loss_counted_tokens: 4291
total tokens: 4061 num samples: 1 num padding tokens: 0 - rank: 0 max len: 4061 min len: 4061 avg len: 4061.0 num_loss_counted_tokens: 393
total tokens: 6255 num samples: 3 num padding tokens: 930 - rank: 2 max len: 2085 min len: 1507 avg len: 1775.0 num_loss_counted_tokens: 1265
total tokens: 7905 num samples: 17 num padding tokens: 1989 - rank: 6 max len: 465 min len: 271 avg len: 348.0 num_loss_counted_tokens: 3447
Per-token loss scaled by world size: 0.0008345923852175474
Per-token loss scaled by world size: 0.0001891565480036661
Per-token loss scaled by world size: 0.0006257555796764791
Per-token loss scaled by world size: 5.515092198038474e-06
Per-token loss scaled by world size: 0.00020594018860720098
Per-token loss scaled by world size: 4.789793456438929e-05
Per-token loss scaled by world size: 9.402850264450535e-05
Epoch: 0, Step: 75, Rank: 5, loss = 1.483744740486145
Epoch: 0, Step: 75, Rank: 3, loss = 0.44851380586624146
Epoch: 0, Step: 75, Rank: 0, loss = 0.013076973147690296
Epoch: 0, Step: 75, Rank: 4, loss = 1.9789228439331055
Epoch: 0, Step: 75, Rank: 2, loss = 0.22295333445072174
Epoch: 0, Step: 75, Rank: 1, loss = 0.11357199400663376
Epoch: 0, Step: 75, Rank: 7, loss = 0.48830991983413696
Per-token loss scaled by world size: 0.0005321354838088155
Epoch: 0, Step: 75, Rank: 6, loss = 1.2617597579956055
Epoch 0: 62%|██████▏ | 75/121 [03:11<01:57, 2.54s/it]
total tokens: 7851 num samples: 3 num padding tokens: 1455 - rank: 1 max len: 2617 min len: 1665 avg len: 2132.0 num_loss_counted_tokens: 271
total tokens: 7112 num samples: 7 num padding tokens: 1148 - rank: 4 max len: 1016 min len: 665 avg len: 852.0 num_loss_counted_tokens: 3987
total tokens: 8086 num samples: 13 num padding tokens: 1573 - rank: 5 max len: 622 min len: 389 avg len: 501.0 num_loss_counted_tokens: 4062
{
"epoch": 0,
"step": 75,
"rank": 0,
"loss": 0.013076973147690296,
"overall_throughput": 41.95248737744596,
"lr": 1.6000000000000001e-06,
"cuda_mem_allocated": 24.426692962646484,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 18969,
"batch_size": 77,
"total_loss": 0.7513566613197327,
"gradnorm": 1.0122549533843994,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:51:13.595856"
}
total tokens: 7182 num samples: 6 num padding tokens: 591 - rank: 3 max len: 1197 min len: 1043 avg len: 1098.5 num_loss_counted_tokens: 5313
total tokens: 6628 num samples: 4 num padding tokens: 547 - rank: 2 max len: 1657 min len: 1381 avg len: 1520.25 num_loss_counted_tokens: 947
total tokens: 3496 num samples: 19 num padding tokens: 1312 - rank: 7 max len: 184 min len: 78 avg len: 114.94736842105263 num_loss_counted_tokens: 656
total tokens: 7938 num samples: 21 num padding tokens: 2607 - rank: 6 max len: 378 min len: 187 avg len: 253.85714285714286 num_loss_counted_tokens: 2571
total tokens: 7226 num samples: 2 num padding tokens: 50 - rank: 0 max len: 3613 min len: 3563 avg len: 3588.0 num_loss_counted_tokens: 179
Per-token loss scaled by world size: 0.00010904129885602742
Per-token loss scaled by world size: 0.00042939232662320137
Per-token loss scaled by world size: 0.0003037904389202595
Per-token loss scaled by world size: 0.00046344727161340415
Per-token loss scaled by world size: 6.435919203795493e-05
Per-token loss scaled by world size: 0.0002804531832225621
Per-token loss scaled by world size: 0.00045534392120316625
Epoch: 0, Step: 76, Rank: 5, loss = 1.2729872465133667
Epoch: 0, Step: 76, Rank: 6, loss = 1.3739473819732666
Epoch: 0, Step: 76, Rank: 2, loss = 0.900624692440033
Epoch: 0, Step: 76, Rank: 0, loss = 0.19080086052417755
Epoch: 0, Step: 76, Rank: 1, loss = 0.32326656579971313
Epoch: 0, Step: 76, Rank: 7, loss = 0.8314384818077087
Epoch: 0, Step: 76, Rank: 4, loss = 1.3499239683151245
Per-token loss scaled by world size: 0.000354817311745137
Epoch: 0, Step: 76, Rank: 3, loss = 1.0519002676010132
Epoch 0: 63%|██████▎ | 76/121 [03:14<01:54, 2.54s/it]
total tokens: 7095 num samples: 5 num padding tokens: 648 - rank: 4 max len: 1419 min len: 1179 avg len: 1289.4 num_loss_counted_tokens: 3185
total tokens: 6052 num samples: 2 num padding tokens: 305 - rank: 1 max len: 3026 min len: 2721 avg len: 2873.5 num_loss_counted_tokens: 349
{
"epoch": 0,
"step": 76,
"rank": 0,
"loss": 0.19080086052417755,
"overall_throughput": 41.77716057032778,
"lr": 1.6000000000000001e-06,
"cuda_mem_allocated": 24.457417488098145,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 23717,
"batch_size": 104,
"total_loss": 0.9118610620498657,
"gradnorm": 1.0122549533843994,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:51:16.129445"
}
total tokens: 7774 num samples: 13 num padding tokens: 2031 - rank: 6 max len: 598 min len: 311 avg len: 441.7692307692308 num_loss_counted_tokens: 3287
total tokens: 8034 num samples: 3 num padding tokens: 672 - rank: 2 max len: 2678 min len: 2124 avg len: 2454.0 num_loss_counted_tokens: 1267
total tokens: 7544 num samples: 4 num padding tokens: 572 - rank: 3 max len: 1886 min len: 1658 avg len: 1743.0 num_loss_counted_tokens: 711
total tokens: 7700 num samples: 7 num padding tokens: 2236 - rank: 5 max len: 1100 min len: 637 avg len: 780.5714285714286 num_loss_counted_tokens: 3735
total tokens: 6648 num samples: 24 num padding tokens: 2581 - rank: 7 max len: 277 min len: 90 avg len: 169.45833333333334 num_loss_counted_tokens: 1750
total tokens: 6410 num samples: 2 num padding tokens: 53 - rank: 0 max len: 3205 min len: 3152 avg len: 3178.5 num_loss_counted_tokens: 196
Per-token loss scaled by world size: 0.00034956797026097775
Per-token loss scaled by world size: 0.00019042924395762384
Per-token loss scaled by world size: 0.00021594665304291993
Per-token loss scaled by world size: 0.000333549891365692
Per-token loss scaled by world size: 0.00039773472235538065
Per-token loss scaled by world size: 1.5378537909782608e-06
Per-token loss scaled by world size: 1.5691426597186364e-05
Epoch: 0, Step: 77, Rank: 7, loss = 1.0991719961166382
Epoch: 0, Step: 77, Rank: 6, loss = 0.7116252183914185
Epoch: 0, Step: 77, Rank: 4, loss = 1.3106850385665894
Epoch: 0, Step: 77, Rank: 5, loss = 1.1519575119018555
Epoch: 0, Step: 77, Rank: 2, loss = 0.6275357604026794
Epoch: 0, Step: 77, Rank: 1, loss = 0.0517091378569603
Epoch: 0, Step: 77, Rank: 0, loss = 0.005067804828286171
Per-token loss scaled by world size: 0.0002473424538038671
Epoch: 0, Step: 77, Rank: 3, loss = 0.8150861859321594
Epoch 0: 64%|██████▎ | 77/121 [03:16<01:51, 2.53s/it]
total tokens: 7920 num samples: 9 num padding tokens: 741 - rank: 4 max len: 880 min len: 742 avg len: 797.6666666666666 num_loss_counted_tokens: 3310
total tokens: 5446 num samples: 2 num padding tokens: 50 - rank: 1 max len: 2723 min len: 2673 avg len: 2698.0 num_loss_counted_tokens: 175
total tokens: 7689 num samples: 11 num padding tokens: 1344 - rank: 5 max len: 699 min len: 481 avg len: 576.8181818181819 num_loss_counted_tokens: 3004
total tokens: 8041 num samples: 17 num padding tokens: 1807 - rank: 6 max len: 473 min len: 265 avg len: 366.70588235294116 num_loss_counted_tokens: 4129
{
"epoch": 0,
"step": 77,
"rank": 0,
"loss": 0.005067804828286171,
"overall_throughput": 42.455461086288004,
"lr": 1.6000000000000001e-06,
"cuda_mem_allocated": 24.274834632873535,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 26363,
"batch_size": 91,
"total_loss": 0.7216048836708069,
"gradnorm": 1.0122549533843994,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:51:18.625029"
}
total tokens: 6488 num samples: 2 num padding tokens: 319 - rank: 0 max len: 3244 min len: 2925 avg len: 3084.5 num_loss_counted_tokens: 219
total tokens: 7337 num samples: 29 num padding tokens: 2640 - rank: 7 max len: 253 min len: 79 avg len: 161.9655172413793 num_loss_counted_tokens: 1852
total tokens: 7398 num samples: 6 num padding tokens: 1043 - rank: 3 max len: 1233 min len: 937 avg len: 1059.1666666666667 num_loss_counted_tokens: 4139
total tokens: 7473 num samples: 3 num padding tokens: 1919 - rank: 2 max len: 2491 min len: 1521 avg len: 1851.3333333333333 num_loss_counted_tokens: 1621
Per-token loss scaled by world size: 0.00012608377437572926
Per-token loss scaled by world size: 0.00035810453118756413
Per-token loss scaled by world size: 0.00015491498925257474
Per-token loss scaled by world size: 0.00043326299055479467
Per-token loss scaled by world size: 0.00016809521184768528
Per-token loss scaled by world size: 3.594179133870057e-06
Per-token loss scaled by world size: 0.0003268007712904364
Epoch: 0, Step: 78, Rank: 6, loss = 1.1593186855316162
Epoch: 0, Step: 78, Rank: 5, loss = 1.4026347398757935
Epoch: 0, Step: 78, Rank: 7, loss = 0.5015178918838501
Epoch: 0, Step: 78, Rank: 3, loss = 0.40818047523498535
Epoch: 0, Step: 78, Rank: 0, loss = 0.011635705828666687
Epoch: 0, Step: 78, Rank: 1, loss = 0.5441872477531433
Epoch: 0, Step: 78, Rank: 4, loss = 1.0579766035079956
Per-token loss scaled by world size: 0.000327433692291379
Epoch: 0, Step: 78, Rank: 2, loss = 1.060025691986084
Epoch 0: 64%|██████▍ | 78/121 [03:19<01:49, 2.54s/it]
total tokens: 7496 num samples: 8 num padding tokens: 719 - rank: 4 max len: 937 min len: 761 avg len: 847.125 num_loss_counted_tokens: 4720
total tokens: 8100 num samples: 4 num padding tokens: 633 - rank: 1 max len: 2025 min len: 1692 avg len: 1866.75 num_loss_counted_tokens: 1316
{
"epoch": 0,
"step": 78,
"rank": 0,
"loss": 0.011635705828666687,
"overall_throughput": 41.25017196728111,
"lr": 1.6000000000000001e-06,
"cuda_mem_allocated": 24.33673620223999,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 25899,
"batch_size": 81,
"total_loss": 0.7681846618652344,
"gradnorm": 1.0122549533843994,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:51:21.186251"
}
total tokens: 8107 num samples: 11 num padding tokens: 1051 - rank: 5 max len: 737 min len: 542 avg len: 641.4545454545455 num_loss_counted_tokens: 4630
total tokens: 7305 num samples: 5 num padding tokens: 524 - rank: 2 max len: 1461 min len: 1128 avg len: 1356.2 num_loss_counted_tokens: 3964
total tokens: 7209 num samples: 27 num padding tokens: 2040 - rank: 7 max len: 267 min len: 82 avg len: 191.44444444444446 num_loss_counted_tokens: 2311
total tokens: 7672 num samples: 7 num padding tokens: 538 - rank: 3 max len: 1096 min len: 942 avg len: 1019.1428571428571 num_loss_counted_tokens: 4249
total tokens: 7226 num samples: 2 num padding tokens: 698 - rank: 0 max len: 3613 min len: 2915 avg len: 3264.0 num_loss_counted_tokens: 204
total tokens: 8070 num samples: 15 num padding tokens: 1557 - rank: 6 max len: 538 min len: 297 avg len: 434.2 num_loss_counted_tokens: 3942
Per-token loss scaled by world size: 0.0003785255830734968
Per-token loss scaled by world size: 0.00020959046378266066
Per-token loss scaled by world size: 0.0004416834854055196
Per-token loss scaled by world size: 0.0002668427478056401
Per-token loss scaled by world size: 0.00010036973981186748
Per-token loss scaled by world size: 0.00018033267406281084
Per-token loss scaled by world size: 4.8121955842361785e-06
Epoch: 0, Step: 79, Rank: 6, loss = 0.6261777281761169
Epoch: 0, Step: 79, Rank: 5, loss = 1.319584608078003
Epoch: 0, Step: 79, Rank: 4, loss = 1.1308925151824951
Epoch: 0, Step: 79, Rank: 7, loss = 0.797226071357727
Epoch: 0, Step: 79, Rank: 0, loss = 0.014377035200595856
Epoch: 0, Step: 79, Rank: 1, loss = 0.5387663841247559
Epoch: 0, Step: 79, Rank: 2, loss = 0.2998671531677246
Per-token loss scaled by world size: 0.00022867463121656328
Epoch: 0, Step: 79, Rank: 3, loss = 0.6831940412521362
Epoch 0: 65%|██████▌ | 79/121 [03:21<01:46, 2.55s/it]
total tokens: 7794 num samples: 9 num padding tokens: 491 - rank: 4 max len: 866 min len: 750 avg len: 811.4444444444445 num_loss_counted_tokens: 5401
total tokens: 7551 num samples: 3 num padding tokens: 552 - rank: 1 max len: 2517 min len: 2018 avg len: 2333.0 num_loss_counted_tokens: 491
{
"epoch": 0,
"step": 79,
"rank": 0,
"loss": 0.014377035200595856,
"overall_throughput": 41.31761838872816,
"lr": 1.6000000000000001e-06,
"cuda_mem_allocated": 24.35622549057007,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 23901,
"batch_size": 83,
"total_loss": 0.6762607097625732,
"gradnorm": 1.0122549533843994,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:51:23.748903"
}
total tokens: 7966 num samples: 14 num padding tokens: 2590 - rank: 6 max len: 569 min len: 271 avg len: 384.0 num_loss_counted_tokens: 3308
total tokens: 7416 num samples: 4 num padding tokens: 852 - rank: 2 max len: 1854 min len: 1322 avg len: 1641.0 num_loss_counted_tokens: 506
total tokens: 7931 num samples: 11 num padding tokens: 889 - rank: 5 max len: 721 min len: 597 avg len: 640.1818181818181 num_loss_counted_tokens: 5323
total tokens: 7806 num samples: 6 num padding tokens: 1777 - rank: 3 max len: 1301 min len: 876 avg len: 1004.8333333333334 num_loss_counted_tokens: 4853
total tokens: 5476 num samples: 2 num padding tokens: 96 - rank: 0 max len: 2738 min len: 2642 avg len: 2690.0 num_loss_counted_tokens: 179
total tokens: 8100 num samples: 30 num padding tokens: 2273 - rank: 7 max len: 270 min len: 83 avg len: 194.23333333333332 num_loss_counted_tokens: 2747
Per-token loss scaled by world size: 0.00042449356988072395
Per-token loss scaled by world size: 0.00028748821932822466
Per-token loss scaled by world size: 0.0002529154298827052
Per-token loss scaled by world size: 0.0005231253453530371
Per-token loss scaled by world size: 0.0002102917933370918
Per-token loss scaled by world size: 5.35248773303465e-06
Per-token loss scaled by world size: 0.00035050552105531096
Epoch: 0, Step: 80, Rank: 6, loss = 1.4146617650985718
Epoch: 0, Step: 80, Rank: 0, loss = 0.01447446458041668
Epoch: 0, Step: 80, Rank: 4, loss = 0.5686815977096558
Epoch: 0, Step: 80, Rank: 3, loss = 0.7774400115013123
Epoch: 0, Step: 80, Rank: 5, loss = 1.1479367017745972
Epoch: 0, Step: 80, Rank: 2, loss = 0.6839465498924255
Epoch: 0, Step: 80, Rank: 7, loss = 0.9478545188903809
Per-token loss scaled by world size: 1.0431926966703031e-06
Epoch: 0, Step: 80, Rank: 1, loss = 0.002821053843945265
Epoch 0: 66%|██████▌ | 80/121 [03:24<01:44, 2.54s/it]
total tokens: 7600 num samples: 10 num padding tokens: 502 - rank: 4 max len: 760 min len: 672 avg len: 709.8 num_loss_counted_tokens: 4962
total tokens: 7035 num samples: 5 num padding tokens: 571 - rank: 1 max len: 1407 min len: 1097 avg len: 1292.8 num_loss_counted_tokens: 3599
{
"epoch": 0,
"step": 80,
"rank": 0,
"loss": 0.01447446458041668,
"overall_throughput": 41.61237764414521,
"lr": 1.6000000000000001e-06,
"cuda_mem_allocated": 24.469753742218018,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 21634,
"batch_size": 76,
"total_loss": 0.6947270631790161,
"gradnorm": 1.0122549533843994,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:51:26.290599"
}
total tokens: 7760 num samples: 16 num padding tokens: 1837 - rank: 6 max len: 485 min len: 278 avg len: 370.1875 num_loss_counted_tokens: 3781
total tokens: 7595 num samples: 7 num padding tokens: 592 - rank: 2 max len: 1085 min len: 934 avg len: 1000.4285714285714 num_loss_counted_tokens: 4208
total tokens: 7444 num samples: 4 num padding tokens: 741 - rank: 0 max len: 1861 min len: 1498 avg len: 1675.75 num_loss_counted_tokens: 2607
total tokens: 7440 num samples: 8 num padding tokens: 519 - rank: 3 max len: 930 min len: 764 avg len: 865.125 num_loss_counted_tokens: 6064
total tokens: 8100 num samples: 30 num padding tokens: 2932 - rank: 7 max len: 270 min len: 75 avg len: 172.26666666666668 num_loss_counted_tokens: 2182
total tokens: 8016 num samples: 12 num padding tokens: 980 - rank: 5 max len: 668 min len: 496 avg len: 586.3333333333334 num_loss_counted_tokens: 5943
Per-token loss scaled by world size: 0.0002643285261001438
Per-token loss scaled by world size: 0.000505154428537935
Per-token loss scaled by world size: 0.0003831658395938575
Per-token loss scaled by world size: 0.0005561576108448207
Per-token loss scaled by world size: 4.442329100129427e-06
Per-token loss scaled by world size: 0.000311601092107594
Per-token loss scaled by world size: 3.0491105462715495e-06
Epoch: 0, Step: 81, Rank: 5, loss = 1.2860599756240845
Epoch: 0, Step: 81, Rank: 3, loss = 0.6729474067687988
Epoch: 0, Step: 81, Rank: 6, loss = 1.4159077405929565
Epoch: 0, Step: 81, Rank: 4, loss = 0.9754922986030579
Epoch: 0, Step: 81, Rank: 1, loss = 0.011309614405035973
Epoch: 0, Step: 81, Rank: 7, loss = 0.7932974100112915
Epoch: 0, Step: 81, Rank: 0, loss = 0.007762654218822718
Per-token loss scaled by world size: 0.00019164555124007165
Epoch: 0, Step: 81, Rank: 2, loss = 0.4879056215286255
Epoch 0: 67%|██████▋ | 81/121 [03:27<01:41, 2.55s/it]
{
"epoch": 0,
"step": 81,
"rank": 0,
"loss": 0.007762654218822718,
"overall_throughput": 41.32577987310498,
"lr": 1.6000000000000001e-06,
"cuda_mem_allocated": 24.05413246154785,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 20367,
"batch_size": 76,
"total_loss": 0.7063353061676025,
"gradnorm": 1.0122549533843994,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:51:28.868839"
}
total tokens: 5876 num samples: 2 num padding tokens: 1017 - rank: 1 max len: 2938 min len: 1921 avg len: 2429.5 num_loss_counted_tokens: 591
total tokens: 7936 num samples: 8 num padding tokens: 1689 - rank: 4 max len: 992 min len: 598 avg len: 780.875 num_loss_counted_tokens: 4729
total tokens: 6752 num samples: 2 num padding tokens: 73 - rank: 0 max len: 3376 min len: 3303 avg len: 3339.5 num_loss_counted_tokens: 449
total tokens: 8018 num samples: 19 num padding tokens: 2548 - rank: 6 max len: 422 min len: 234 avg len: 287.89473684210526 num_loss_counted_tokens: 3302
total tokens: 7156 num samples: 4 num padding tokens: 826 - rank: 2 max len: 1789 min len: 1335 avg len: 1582.5 num_loss_counted_tokens: 3840
total tokens: 6552 num samples: 28 num padding tokens: 2284 - rank: 7 max len: 234 min len: 79 avg len: 152.42857142857142 num_loss_counted_tokens: 1601
total tokens: 7540 num samples: 13 num padding tokens: 837 - rank: 5 max len: 580 min len: 438 avg len: 515.6153846153846 num_loss_counted_tokens: 4083
total tokens: 7512 num samples: 6 num padding tokens: 558 - rank: 3 max len: 1252 min len: 1059 avg len: 1159.0 num_loss_counted_tokens: 3038
Per-token loss scaled by world size: 0.000629897927865386
Per-token loss scaled by world size: 0.0006152652204036713
Per-token loss scaled by world size: 0.00011580222053453326
Per-token loss scaled by world size: 0.0004951037117280066
Per-token loss scaled by world size: 0.000213472536415793
Per-token loss scaled by world size: 1.2029913705191575e-05
Per-token loss scaled by world size: 5.17758380738087e-05
Epoch: 0, Step: 82, Rank: 4, loss = 1.4647926092147827
Epoch: 0, Step: 82, Rank: 6, loss = 1.4996294975280762
Epoch: 0, Step: 82, Rank: 2, loss = 0.27569612860679626
Epoch: 0, Step: 82, Rank: 3, loss = 1.1787182092666626
Epoch: 0, Step: 82, Rank: 7, loss = 0.5082247257232666
Epoch: 0, Step: 82, Rank: 1, loss = 0.02864021621644497
Epoch: 0, Step: 82, Rank: 0, loss = 0.1232653260231018
Per-token loss scaled by world size: 0.0006569805555045605
Epoch: 0, Step: 82, Rank: 5, loss = 1.5641064643859863
Epoch 0: 68%|██████▊ | 82/121 [03:29<01:38, 2.53s/it]
total tokens: 7308 num samples: 9 num padding tokens: 762 - rank: 4 max len: 812 min len: 669 avg len: 727.3333333333334 num_loss_counted_tokens: 4589
total tokens: 7874 num samples: 31 num padding tokens: 2845 - rank: 7 max len: 254 min len: 81 avg len: 162.2258064516129 num_loss_counted_tokens: 2174
total tokens: 7372 num samples: 4 num padding tokens: 311 - rank: 1 max len: 1843 min len: 1686 avg len: 1765.25 num_loss_counted_tokens: 1401
{
"epoch": 0,
"step": 82,
"rank": 0,
"loss": 0.1232653260231018,
"overall_throughput": 42.003063993949816,
"lr": 1.6000000000000001e-06,
"cuda_mem_allocated": 24.33745241165161,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 19046,
"batch_size": 84,
"total_loss": 0.8303841352462769,
"gradnorm": 1.0122549533843994,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:51:31.350016"
}
total tokens: 7866 num samples: 19 num padding tokens: 1707 - rank: 6 max len: 414 min len: 255 avg len: 324.1578947368421 num_loss_counted_tokens: 3285 | |
total tokens: 6580 num samples: 4 num padding tokens: 1811 - rank: 2 max len: 1645 min len: 1002 avg len: 1192.25 num_loss_counted_tokens: 2171 | |
total tokens: 8094 num samples: 3 num padding tokens: 948 - rank: 0 max len: 2698 min len: 2046 avg len: 2382.0 num_loss_counted_tokens: 2180 | |
total tokens: 7992 num samples: 8 num padding tokens: 875 - rank: 3 max len: 999 min len: 819 avg len: 889.625 num_loss_counted_tokens: 5913 | |
total tokens: 7982 num samples: 13 num padding tokens: 1310 - rank: 5 max len: 614 min len: 424 avg len: 513.2307692307693 num_loss_counted_tokens: 3981 | |
Per-token loss scaled by world size: 0.00023041688837110996Per-token loss scaled by world size: 0.0001501823280705139Per-token loss scaled by world size: 0.000244573806412518Per-token loss scaled by world size: 0.0003567738749552518Per-token loss scaled by world size: 0.00027550142840482295 | |
Per-token loss scaled by world size: 3.432213998166844e-05 | |
Per-token loss scaled by world size: 0.00022764307504985482 | |
Epoch: 0, Step: 83, Rank: 5, loss = 1.1512646675109863Epoch: 0, Step: 83, Rank: 4, loss = 0.4846196174621582Epoch: 0, Step: 83, Rank: 7, loss = 0.7435265183448792 | |
Epoch: 0, Step: 83, Rank: 0, loss = 0.11075326055288315Epoch: 0, Step: 83, Rank: 1, loss = 0.7892091274261475 | |
Epoch: 0, Step: 83, Rank: 3, loss = 0.8890087008476257 | |
Epoch: 0, Step: 83, Rank: 2, loss = 0.7345757484436035 | |
Per-token loss scaled by world size: 0.0002607592905405909 | |
Epoch: 0, Step: 83, Rank: 6, loss = 0.8414376378059387 | |
Epoch 0: 69%|██████▊ | 83/121 [03:32<01:35, 2.52s/it]
total tokens: 7168 num samples: 7 num padding tokens: 519 - rank: 4 max len: 1024 min len: 877 avg len: 949.8571428571429 num_loss_counted_tokens: 3980
total tokens: 6612 num samples: 3 num padding tokens: 963 - rank: 1 max len: 2204 min len: 1680 avg len: 1883.0 num_loss_counted_tokens: 659
{
    "epoch": 0,
    "step": 83,
    "rank": 0,
    "loss": 0.11075326055288315,
    "overall_throughput": 42.37488901314667,
    "lr": 1.6000000000000001e-06,
    "cuda_mem_allocated": 24.450260639190674,
    "cuda_malloc_retries": 0,
    "num_loss_counted_tokens": 25815,
    "batch_size": 90,
    "total_loss": 0.7180494070053101,
    "gradnorm": 1.0122549533843994,
    "weight_norm": 433.0432434082031,
    "timestamp": "2024-08-18T20:51:33.841762"
}
total tokens: 6544 num samples: 4 num padding tokens: 763 - rank: 2 max len: 1636 min len: 1316 avg len: 1445.25 num_loss_counted_tokens: 481
total tokens: 7930 num samples: 10 num padding tokens: 919 - rank: 5 max len: 793 min len: 605 avg len: 701.1 num_loss_counted_tokens: 4841
total tokens: 7994 num samples: 14 num padding tokens: 1260 - rank: 6 max len: 571 min len: 361 avg len: 481.0 num_loss_counted_tokens: 5292
total tokens: 8096 num samples: 23 num padding tokens: 2810 - rank: 7 max len: 352 min len: 91 avg len: 229.82608695652175 num_loss_counted_tokens: 2516
total tokens: 7590 num samples: 6 num padding tokens: 565 - rank: 3 max len: 1265 min len: 1087 avg len: 1170.8333333333333 num_loss_counted_tokens: 4039
total tokens: 6258 num samples: 2 num padding tokens: 564 - rank: 0 max len: 3129 min len: 2565 avg len: 2847.0 num_loss_counted_tokens: 179
Per-token loss scaled by world size: 0.00019247813906986266
Per-token loss scaled by world size: 0.00029174372320994735
Per-token loss scaled by world size: 0.0004131880996283144
Per-token loss scaled by world size: 0.0003363724099472165
Per-token loss scaled by world size: 0.0005087562603875995
Per-token loss scaled by world size: 0.0001915783795993775
Per-token loss scaled by world size: 3.281491217421717e-06
Epoch: 0, Step: 84, Rank: 3, loss = 0.8194716572761536
Epoch: 0, Step: 84, Rank: 2, loss = 0.540647029876709
Epoch: 0, Step: 84, Rank: 7, loss = 0.9448280930519104
Epoch: 0, Step: 84, Rank: 4, loss = 1.1605937480926514
Epoch: 0, Step: 84, Rank: 0, loss = 0.009217298589646816
Epoch: 0, Step: 84, Rank: 5, loss = 1.429032802581787
Epoch: 0, Step: 84, Rank: 1, loss = 0.5381197333335876
Per-token loss scaled by world size: 0.0003772681811824441
Epoch: 0, Step: 84, Rank: 6, loss = 1.0596991777420044
Epoch 0: 69%|██████▉ | 84/121 [03:34<01:33, 2.52s/it]
total tokens: 6480 num samples: 3 num padding tokens: 343 - rank: 1 max len: 2160 min len: 1943 avg len: 2045.6666666666667 num_loss_counted_tokens: 909
total tokens: 7448 num samples: 8 num padding tokens: 1052 - rank: 4 max len: 931 min len: 740 avg len: 799.5 num_loss_counted_tokens: 3154
{
    "epoch": 0,
    "step": 84,
    "rank": 0,
    "loss": 0.009217298589646816,
    "overall_throughput": 41.98783375148776,
    "lr": 1.6000000000000001e-06,
    "cuda_mem_allocated": 24.287980556488037,
    "cuda_malloc_retries": 0,
    "num_loss_counted_tokens": 22471,
    "batch_size": 89,
    "total_loss": 0.8127012252807617,
    "gradnorm": 1.0122549533843994,
    "weight_norm": 433.0432434082031,
    "timestamp": "2024-08-18T20:51:36.362094"
}
total tokens: 7896 num samples: 28 num padding tokens: 3153 - rank: 7 max len: 282 min len: 74 avg len: 169.39285714285714 num_loss_counted_tokens: 2024
total tokens: 8115 num samples: 15 num padding tokens: 2233 - rank: 6 max len: 541 min len: 292 avg len: 392.1333333333333 num_loss_counted_tokens: 3345
total tokens: 5744 num samples: 2 num padding tokens: 367 - rank: 0 max len: 2872 min len: 2505 avg len: 2688.5 num_loss_counted_tokens: 161
total tokens: 7436 num samples: 4 num padding tokens: 1471 - rank: 2 max len: 1859 min len: 1182 avg len: 1491.25 num_loss_counted_tokens: 773
total tokens: 8071 num samples: 7 num padding tokens: 817 - rank: 3 max len: 1153 min len: 974 avg len: 1036.2857142857142 num_loss_counted_tokens: 4123
total tokens: 7788 num samples: 11 num padding tokens: 766 - rank: 5 max len: 708 min len: 547 avg len: 638.3636363636364 num_loss_counted_tokens: 3782
Per-token loss scaled by world size: 0.0005787216359749436
Per-token loss scaled by world size: 0.0005308112595230341
Per-token loss scaled by world size: 0.00033112603705376387
Per-token loss scaled by world size: 0.00014354031009133905
Per-token loss scaled by world size: 0.00046847882913425565
Per-token loss scaled by world size: 3.301608558103908e-06
Per-token loss scaled by world size: 3.5768789530266076e-06
Epoch: 0, Step: 85, Rank: 6, loss = 1.3779860734939575
Epoch: 0, Step: 85, Rank: 5, loss = 1.2161710262298584
Epoch: 0, Step: 85, Rank: 7, loss = 0.8596031665802002
Epoch: 0, Step: 85, Rank: 1, loss = 0.008570975624024868
Epoch: 0, Step: 85, Rank: 4, loss = 1.5023614168167114
Epoch: 0, Step: 85, Rank: 2, loss = 0.37263065576553345
Epoch: 0, Step: 85, Rank: 0, loss = 0.009285577572882175
Per-token loss scaled by world size: 0.00042734169983305037
Epoch: 0, Step: 85, Rank: 3, loss = 1.1093790531158447
Epoch 0: 70%|███████ | 85/121 [03:37<01:31, 2.53s/it]
total tokens: 7898 num samples: 11 num padding tokens: 499 - rank: 4 max len: 718 min len: 607 avg len: 672.6363636363636 num_loss_counted_tokens: 3857
total tokens: 8060 num samples: 5 num padding tokens: 394 - rank: 1 max len: 1612 min len: 1437 avg len: 1533.2 num_loss_counted_tokens: 1786
{
    "epoch": 0,
    "step": 85,
    "rank": 0,
    "loss": 0.009285577572882175,
    "overall_throughput": 41.311858360949856,
    "lr": 1.6000000000000001e-06,
    "cuda_mem_allocated": 24.234922885894775,
    "cuda_malloc_retries": 0,
    "num_loss_counted_tokens": 20768,
    "batch_size": 87,
    "total_loss": 0.8069984912872314,
    "gradnorm": 1.0122549533843994,
    "weight_norm": 433.0432434082031,
    "timestamp": "2024-08-18T20:51:38.922540"
}
total tokens: 7898 num samples: 22 num padding tokens: 1234 - rank: 6 max len: 359 min len: 264 avg len: 302.90909090909093 num_loss_counted_tokens: 3332
total tokens: 6834 num samples: 3 num padding tokens: 886 - rank: 0 max len: 2278 min len: 1616 avg len: 1982.6666666666667 num_loss_counted_tokens: 909
total tokens: 7680 num samples: 6 num padding tokens: 1041 - rank: 2 max len: 1280 min len: 938 avg len: 1106.5 num_loss_counted_tokens: 3266
total tokens: 7683 num samples: 13 num padding tokens: 1257 - rank: 5 max len: 591 min len: 368 avg len: 494.3076923076923 num_loss_counted_tokens: 4067
total tokens: 7395 num samples: 29 num padding tokens: 2483 - rank: 7 max len: 255 min len: 89 avg len: 169.3793103448276 num_loss_counted_tokens: 2026
total tokens: 7408 num samples: 8 num padding tokens: 818 - rank: 3 max len: 926 min len: 725 avg len: 823.75 num_loss_counted_tokens: 4017
Per-token loss scaled by world size: 0.00045185594353824854
Per-token loss scaled by world size: 0.0003287219151388854
Per-token loss scaled by world size: 0.00010264909360557795
Per-token loss scaled by world size: 0.0003051054081879556
Per-token loss scaled by world size: 0.00028172050951980054
Per-token loss scaled by world size: 1.640593291085679e-05
Per-token loss scaled by world size: 8.82493841345422e-05
Epoch: 0, Step: 86, Rank: 2, loss = 1.0376107692718506
Epoch: 0, Step: 86, Rank: 6, loss = 0.9630652666091919
Epoch: 0, Step: 86, Rank: 1, loss = 0.3240118622779846
Epoch: 0, Step: 86, Rank: 3, loss = 1.4262832403182983
Epoch: 0, Step: 86, Rank: 0, loss = 0.05178532749414444
Epoch: 0, Step: 86, Rank: 4, loss = 0.8892507553100586
Epoch: 0, Step: 86, Rank: 7, loss = 0.2785591781139374
Per-token loss scaled by world size: 0.000460325536550954
Epoch: 0, Step: 86, Rank: 5, loss = 1.4530175924301147
Epoch 0: 71%|███████ | 86/121 [03:39<01:28, 2.53s/it]
total tokens: 3744 num samples: 18 num padding tokens: 1271 - rank: 7 max len: 208 min len: 81 avg len: 137.38888888888889 num_loss_counted_tokens: 914
total tokens: 7208 num samples: 4 num padding tokens: 870 - rank: 1 max len: 1802 min len: 1447 avg len: 1584.5 num_loss_counted_tokens: 2098
total tokens: 7308 num samples: 9 num padding tokens: 993 - rank: 4 max len: 812 min len: 635 avg len: 701.6666666666666 num_loss_counted_tokens: 3474
{
    "epoch": 0,
    "step": 86,
    "rank": 0,
    "loss": 0.05178532749414444,
    "overall_throughput": 41.951886200619036,
    "lr": 1.6000000000000001e-06,
    "cuda_mem_allocated": 24.232909202575684,
    "cuda_malloc_retries": 0,
    "num_loss_counted_tokens": 25252,
    "batch_size": 101,
    "total_loss": 0.802947998046875,
    "gradnorm": 1.0122549533843994,
    "weight_norm": 433.0432434082031,
    "timestamp": "2024-08-18T20:51:41.442803"
}
total tokens: 7896 num samples: 21 num padding tokens: 2014 - rank: 6 max len: 376 min len: 211 avg len: 280.0952380952381 num_loss_counted_tokens: 3459
total tokens: 7364 num samples: 7 num padding tokens: 790 - rank: 3 max len: 1052 min len: 812 avg len: 939.1428571428571 num_loss_counted_tokens: 3186
total tokens: 7872 num samples: 6 num padding tokens: 665 - rank: 2 max len: 1312 min len: 1060 avg len: 1201.1666666666667 num_loss_counted_tokens: 5354
total tokens: 7982 num samples: 13 num padding tokens: 1356 - rank: 5 max len: 614 min len: 393 avg len: 509.6923076923077 num_loss_counted_tokens: 4015
total tokens: 6650 num samples: 2 num padding tokens: 959 - rank: 0 max len: 3325 min len: 2366 avg len: 2845.5 num_loss_counted_tokens: 183
Per-token loss scaled by world size: 0.0001364344934700057
Per-token loss scaled by world size: 0.00033442748826928437
Per-token loss scaled by world size: 0.00019135570619255304
Per-token loss scaled by world size: 0.00039014805224724114
Per-token loss scaled by world size: 7.375221321126446e-05
Per-token loss scaled by world size: 3.955068677896634e-05
Per-token loss scaled by world size: 0.00019416131544858217
Epoch: 0, Step: 87, Rank: 4, loss = 0.4185469150543213
Epoch: 0, Step: 87, Rank: 2, loss = 0.5870314836502075
Epoch: 0, Step: 87, Rank: 5, loss = 1.1968766450881958
Epoch: 0, Step: 87, Rank: 0, loss = 0.12133162468671799
Epoch: 0, Step: 87, Rank: 3, loss = 1.02593994140625
Epoch: 0, Step: 87, Rank: 1, loss = 0.22625336050987244
Epoch: 0, Step: 87, Rank: 7, loss = 0.5956383943557739
Per-token loss scaled by world size: 0.00031522451899945736
Epoch: 0, Step: 87, Rank: 6, loss = 0.9670300483703613
Epoch 0: 72%|███████▏ | 87/121 [03:42<01:25, 2.52s/it]
total tokens: 7015 num samples: 5 num padding tokens: 1264 - rank: 4 max len: 1403 min len: 1017 avg len: 1150.2 num_loss_counted_tokens: 3249
total tokens: 5650 num samples: 2 num padding tokens: 51 - rank: 1 max len: 2825 min len: 2774 avg len: 2799.5 num_loss_counted_tokens: 191
total tokens: 5482 num samples: 2 num padding tokens: 83 - rank: 2 max len: 2741 min len: 2658 avg len: 2699.5 num_loss_counted_tokens: 171
{
    "epoch": 0,
    "step": 87,
    "rank": 0,
    "loss": 0.12133162468671799,
    "overall_throughput": 42.28776662058589,
    "lr": 1.6000000000000001e-06,
    "cuda_mem_allocated": 24.46161460876465,
    "cuda_malloc_retries": 0,
    "num_loss_counted_tokens": 24542,
    "batch_size": 79,
    "total_loss": 0.642331063747406,
    "gradnorm": 1.0122549533843994,
    "weight_norm": 433.0432434082031,
    "timestamp": "2024-08-18T20:51:43.944539"
}
total tokens: 7560 num samples: 4 num padding tokens: 1086 - rank: 3 max len: 1890 min len: 1404 avg len: 1618.5 num_loss_counted_tokens: 1603
total tokens: 7830 num samples: 9 num padding tokens: 1197 - rank: 5 max len: 870 min len: 619 avg len: 737.0 num_loss_counted_tokens: 3609
total tokens: 7956 num samples: 13 num padding tokens: 2029 - rank: 6 max len: 612 min len: 285 avg len: 455.9230769230769 num_loss_counted_tokens: 3596
total tokens: 5358 num samples: 19 num padding tokens: 1354 - rank: 7 max len: 282 min len: 86 avg len: 210.73684210526315 num_loss_counted_tokens: 1995
total tokens: 7698 num samples: 2 num padding tokens: 950 - rank: 0 max len: 3849 min len: 2899 avg len: 3374.0 num_loss_counted_tokens: 1217
Per-token loss scaled by world size: 0.00014240843302104622
Per-token loss scaled by world size: 0.000148817416629754
Per-token loss scaled by world size: 0.0001530916924821213
Per-token loss scaled by world size: 0.00020883062097709626
Per-token loss scaled by world size: 0.00023989545297808945
Per-token loss scaled by world size: 0.00017666697385720909
Per-token loss scaled by world size: 0.0001427593524567783
Epoch: 0, Step: 88, Rank: 5, loss = 0.8521594405174255
Epoch: 0, Step: 88, Rank: 6, loss = 0.9789233803749084
Epoch: 0, Step: 88, Rank: 2, loss = 0.6247097849845886
Epoch: 0, Step: 88, Rank: 3, loss = 0.6072680950164795
Epoch: 0, Step: 88, Rank: 4, loss = 0.5811154246330261
Epoch: 0, Step: 88, Rank: 1, loss = 0.7209116816520691
Epoch: 0, Step: 88, Rank: 7, loss = 0.5825473666191101
Per-token loss scaled by world size: 0.00011780338536482304
Epoch: 0, Step: 88, Rank: 0, loss = 0.480711430311203
Epoch 0: 73%|███████▎ | 88/121 [03:44<01:23, 2.54s/it]
total tokens: 7821 num samples: 11 num padding tokens: 369 - rank: 4 max len: 711 min len: 644 avg len: 677.4545454545455 num_loss_counted_tokens: 4895
total tokens: 8108 num samples: 4 num padding tokens: 1270 - rank: 1 max len: 2027 min len: 1314 avg len: 1709.5 num_loss_counted_tokens: 973
total tokens: 7920 num samples: 8 num padding tokens: 1147 - rank: 3 max len: 990 min len: 714 avg len: 846.625 num_loss_counted_tokens: 2895
{
    "epoch": 0,
    "step": 88,
    "rank": 0,
    "loss": 0.480711430311203,
    "overall_throughput": 41.16723941483895,
    "lr": 1.6000000000000001e-06,
    "cuda_mem_allocated": 24.52360773086548,
    "cuda_malloc_retries": 0,
    "num_loss_counted_tokens": 32645,
    "batch_size": 94,
    "total_loss": 0.6785432696342468,
    "gradnorm": 1.0122549533843994,
    "weight_norm": 433.0432434082031,
    "timestamp": "2024-08-18T20:51:46.514923"
}
total tokens: 7888 num samples: 17 num padding tokens: 1450 - rank: 6 max len: 464 min len: 277 avg len: 378.70588235294116 num_loss_counted_tokens: 4530
total tokens: 6446 num samples: 2 num padding tokens: 340 - rank: 0 max len: 3223 min len: 2883 avg len: 3053.0 num_loss_counted_tokens: 686
total tokens: 7860 num samples: 30 num padding tokens: 2767 - rank: 7 max len: 262 min len: 77 avg len: 169.76666666666668 num_loss_counted_tokens: 2196
total tokens: 7398 num samples: 6 num padding tokens: 659 - rank: 2 max len: 1233 min len: 1028 avg len: 1123.1666666666667 num_loss_counted_tokens: 3542
total tokens: 7536 num samples: 12 num padding tokens: 714 - rank: 5 max len: 628 min len: 502 avg len: 568.5 num_loss_counted_tokens: 5605
Per-token loss scaled by world size: 0.0003699270309880376
Per-token loss scaled by world size: 0.0005684850038960576
Per-token loss scaled by world size: 5.893620254937559e-06
Per-token loss scaled by world size: 3.489888695185073e-05
Per-token loss scaled by world size: 0.0005643228068947792
Per-token loss scaled by world size: 0.0003445304755587131
Per-token loss scaled by world size: 0.00016975219477899373
Epoch: 0, Step: 89, Rank: 5, loss = 1.299698829650879
Epoch: 0, Step: 89, Rank: 1, loss = 0.013474289327859879
Epoch: 0, Step: 89, Rank: 3, loss = 0.8457456827163696
Epoch: 0, Step: 89, Rank: 0, loss = 0.07978758215904236
Epoch: 0, Step: 89, Rank: 4, loss = 0.3880959451198578
Epoch: 0, Step: 89, Rank: 6, loss = 1.2901830673217773
Epoch: 0, Step: 89, Rank: 7, loss = 0.7876827716827393
Per-token loss scaled by world size: 0.00024518067948520184
Epoch: 0, Step: 89, Rank: 2, loss = 0.5605443120002747
Epoch 0: 74%|███████▎ | 89/121 [03:47<01:21, 2.55s/it]
total tokens: 5448 num samples: 2 num padding tokens: 812 - rank: 1 max len: 2724 min len: 1912 avg len: 2318.0 num_loss_counted_tokens: 205
total tokens: 7690 num samples: 10 num padding tokens: 487 - rank: 4 max len: 769 min len: 675 avg len: 720.3 num_loss_counted_tokens: 4105
{
    "epoch": 0,
    "step": 89,
    "rank": 0,
    "loss": 0.07978758215904236,
    "overall_throughput": 41.16520368540104,
    "lr": 1.6000000000000001e-06,
    "cuda_mem_allocated": 24.30566644668579,
    "cuda_malloc_retries": 0,
    "num_loss_counted_tokens": 18290,
    "batch_size": 69,
    "total_loss": 0.6581515669822693,
    "gradnorm": 1.0122549533843994,
    "weight_norm": 433.0432434082031,
    "timestamp": "2024-08-18T20:51:49.084401"
}
total tokens: 7360 num samples: 4 num padding tokens: 675 - rank: 2 max len: 1840 min len: 1482 avg len: 1671.25 num_loss_counted_tokens: 836
total tokens: 7812 num samples: 12 num padding tokens: 1280 - rank: 5 max len: 651 min len: 447 avg len: 544.3333333333334 num_loss_counted_tokens: 2926
total tokens: 7095 num samples: 5 num padding tokens: 1469 - rank: 3 max len: 1419 min len: 943 avg len: 1125.2 num_loss_counted_tokens: 1900
total tokens: 7740 num samples: 18 num padding tokens: 1493 - rank: 6 max len: 430 min len: 282 avg len: 347.05555555555554 num_loss_counted_tokens: 3796
total tokens: 7248 num samples: 2 num padding tokens: 215 - rank: 0 max len: 3624 min len: 3409 avg len: 3516.5 num_loss_counted_tokens: 197
total tokens: 7772 num samples: 29 num padding tokens: 2475 - rank: 7 max len: 268 min len: 82 avg len: 182.6551724137931 num_loss_counted_tokens: 2574
Per-token loss scaled by world size: 0.0002169163926737383
Per-token loss scaled by world size: 0.00031902806949801743
Per-token loss scaled by world size: 0.000320168532198295
Per-token loss scaled by world size: 0.00028694834327325225
Per-token loss scaled by world size: 2.5503815777483396e-05
Per-token loss scaled by world size: 1.9868204617523588e-05
Per-token loss scaled by world size: 0.00033540837466716766
Epoch: 0, Step: 90, Rank: 4, loss = 0.9190002083778381
Epoch: 0, Step: 90, Rank: 6, loss = 0.9222854375839233
Epoch: 0, Step: 90, Rank: 3, loss = 0.8265905380249023
Epoch: 0, Step: 90, Rank: 1, loss = 0.07346692681312561
Epoch: 0, Step: 90, Rank: 0, loss = 0.057232845574617386
Epoch: 0, Step: 90, Rank: 2, loss = 0.6248548030853271
Epoch: 0, Step: 90, Rank: 7, loss = 0.9661857485771179
Per-token loss scaled by world size: 0.0004802969633601606
Epoch: 0, Step: 90, Rank: 5, loss = 1.3835554122924805
Epoch 0: 74%|███████▍ | 90/121 [03:49<01:18, 2.54s/it]
total tokens: 7656 num samples: 11 num padding tokens: 851 - rank: 4 max len: 696 min len: 552 avg len: 618.6363636363636 num_loss_counted_tokens: 4472
total tokens: 7875 num samples: 5 num padding tokens: 1795 - rank: 1 max len: 1575 min len: 1090 avg len: 1216.0 num_loss_counted_tokens: 4251
{
    "epoch": 0,
    "step": 90,
    "rank": 0,
    "loss": 0.057232845574617386,
    "overall_throughput": 42.04731734702061,
    "lr": 1.6000000000000001e-06,
    "cuda_mem_allocated": 24.250526905059814,
    "cuda_malloc_retries": 0,
    "num_loss_counted_tokens": 23045,
    "batch_size": 73,
    "total_loss": 0.7216464877128601,
    "gradnorm": 1.0122549533843994,
    "weight_norm": 433.0432434082031,
    "timestamp": "2024-08-18T20:51:51.641363"
}
total tokens: 8100 num samples: 15 num padding tokens: 1058 - rank: 5 max len: 540 min len: 390 avg len: 469.46666666666664 num_loss_counted_tokens: 4269
total tokens: 7154 num samples: 7 num padding tokens: 458 - rank: 2 max len: 1022 min len: 870 avg len: 956.5714285714286 num_loss_counted_tokens: 2467
total tokens: 7200 num samples: 3 num padding tokens: 1334 - rank: 0 max len: 2400 min len: 1618 avg len: 1955.3333333333333 num_loss_counted_tokens: 241
total tokens: 7389 num samples: 9 num padding tokens: 493 - rank: 3 max len: 821 min len: 703 avg len: 766.2222222222222 num_loss_counted_tokens: 4922
total tokens: 7945 num samples: 35 num padding tokens: 2409 - rank: 7 max len: 227 min len: 85 avg len: 158.17142857142858 num_loss_counted_tokens: 2310
total tokens: 7986 num samples: 22 num padding tokens: 1512 - rank: 6 max len: 363 min len: 227 avg len: 294.27272727272725 num_loss_counted_tokens: 3717
Per-token loss scaled by world size: 0.0006446933257393539
Per-token loss scaled by world size: 0.00028916815062984824
Per-token loss scaled by world size: 0.00012860735296271741
Per-token loss scaled by world size: 0.00038613073411397636
Per-token loss scaled by world size: 3.499255626593367e-06
Per-token loss scaled by world size: 0.0005611648084595799
Per-token loss scaled by world size: 2.536307329137344e-05
Epoch: 0, Step: 91, Rank: 3, loss = 0.6820392608642578
Epoch: 0, Step: 91, Rank: 1, loss = 0.008253431878983974
Epoch: 0, Step: 91, Rank: 5, loss = 1.520589828491211
Epoch: 0, Step: 91, Rank: 7, loss = 0.9107375741004944
Epoch: 0, Step: 91, Rank: 2, loss = 0.303336501121521
Epoch: 0, Step: 91, Rank: 4, loss = 1.3235772848129272
Epoch: 0, Step: 91, Rank: 0, loss = 0.05982197821140289
Per-token loss scaled by world size: 0.0006536963628605008
Epoch: 0, Step: 91, Rank: 6, loss = 1.5418245792388916
Epoch 0: 75%|███████▌ | 91/121 [03:52<01:16, 2.55s/it]
total tokens: 7800 num samples: 8 num padding tokens: 997 - rank: 4 max len: 975 min len: 702 avg len: 850.375 num_loss_counted_tokens: 5549
total tokens: 7290 num samples: 3 num padding tokens: 1298 - rank: 1 max len: 2430 min len: 1736 avg len: 1997.3333333333333 num_loss_counted_tokens: 1662
{
    "epoch": 0,
    "step": 91,
    "rank": 0,
    "loss": 0.05982197821140289,
    "overall_throughput": 41.22796863364372,
    "lr": 1.6000000000000001e-06,
    "cuda_mem_allocated": 24.05379819869995,
    "cuda_malloc_retries": 0,
    "num_loss_counted_tokens": 18869,
    "batch_size": 79,
    "total_loss": 0.7937725186347961,
    "gradnorm": 1.0122549533843994,
    "weight_norm": 433.0432434082031,
    "timestamp": "2024-08-18T20:51:54.169866"
}
total tokens: 5968 num samples: 2 num padding tokens: 197 - rank: 0 max len: 2984 min len: 2787 avg len: 2885.5 num_loss_counted_tokens: 209
total tokens: 7667 num samples: 11 num padding tokens: 919 - rank: 5 max len: 697 min len: 558 avg len: 613.4545454545455 num_loss_counted_tokens: 4489
total tokens: 8062 num samples: 29 num padding tokens: 2764 - rank: 7 max len: 278 min len: 78 avg len: 182.68965517241378 num_loss_counted_tokens: 2067
total tokens: 6764 num samples: 4 num padding tokens: 631 - rank: 2 max len: 1691 min len: 1371 avg len: 1533.25 num_loss_counted_tokens: 1920
total tokens: 7920 num samples: 15 num padding tokens: 1922 - rank: 6 max len: 528 min len: 310 avg len: 399.8666666666667 num_loss_counted_tokens: 3579
total tokens: 7693 num samples: 7 num padding tokens: 417 - rank: 3 max len: 1099 min len: 981 avg len: 1039.4285714285713 num_loss_counted_tokens: 6131
Per-token loss scaled by world size: 0.0005888827727176249
Per-token loss scaled by world size: 0.0006443694583140314
Per-token loss scaled by world size: 8.987231012724806e-06
Per-token loss scaled by world size: 0.00010111679148394614
Per-token loss scaled by world size: 1.0885350093303714e-05
Per-token loss scaled by world size: 0.0005767960683442652
Per-token loss scaled by world size: 0.00016675007645972073
Epoch: 0, Step: 92, Rank: 6, loss = 1.448703646659851
Epoch: 0, Step: 92, Rank: 1, loss = 0.02447298914194107
Epoch: 0, Step: 92, Rank: 0, loss = 0.0202055424451828
Epoch: 0, Step: 92, Rank: 3, loss = 1.3239556550979614
Epoch: 0, Step: 92, Rank: 4, loss = 1.2967817783355713
Epoch: 0, Step: 92, Rank: 2, loss = 0.2273358255624771
Epoch: 0, Step: 92, Rank: 7, loss = 0.3748958706855774
Per-token loss scaled by world size: 0.0007324381731450558
Epoch: 0, Step: 92, Rank: 5, loss = 1.646704077720642
Epoch 0: 76%|███████▌ | 92/121 [03:54<01:13, 2.54s/it]
total tokens: 5964 num samples: 2 num padding tokens: 23 - rank: 1 max len: 2982 min len: 2959 avg len: 2970.5 num_loss_counted_tokens: 702
total tokens: 4664 num samples: 22 num padding tokens: 1595 - rank: 7 max len: 212 min len: 82 avg len: 139.5 num_loss_counted_tokens: 1228
total tokens: 8070 num samples: 10 num padding tokens: 724 - rank: 4 max len: 807 min len: 638 avg len: 734.6 num_loss_counted_tokens: 4997
{
    "epoch": 0,
    "step": 92,
    "rank": 0,
    "loss": 0.0202055424451828,
    "overall_throughput": 41.45973903821548,
    "lr": 1.6000000000000001e-06,
    "cuda_mem_allocated": 24.430901527404785,
    "cuda_malloc_retries": 0,
    "num_loss_counted_tokens": 17986,
    "batch_size": 75,
    "total_loss": 0.7953818440437317,
    "gradnorm": 1.0122549533843994,
    "weight_norm": 433.0432434082031,
    "timestamp": "2024-08-18T20:51:56.703392"
}
total tokens: 6888 num samples: 3 num padding tokens: 1202 - rank: 2 max len: 2296 min len: 1438 avg len: 1895.3333333333333 num_loss_counted_tokens: 633
total tokens: 8040 num samples: 20 num padding tokens: 2357 - rank: 6 max len: 402 min len: 223 avg len: 284.15 num_loss_counted_tokens: 2838
total tokens: 7665 num samples: 7 num padding tokens: 534 - rank: 3 max len: 1095 min len: 894 avg len: 1018.7142857142857 num_loss_counted_tokens: 4973
total tokens: 7764 num samples: 2 num padding tokens: 79 - rank: 0 max len: 3882 min len: 3803 avg len: 3842.5 num_loss_counted_tokens: 230
total tokens: 7596 num samples: 12 num padding tokens: 1505 - rank: 5 max len: 633 min len: 403 avg len: 507.5833333333333 num_loss_counted_tokens: 3739
Per-token loss scaled by world size: 0.0010100876679643989
Per-token loss scaled by world size: 0.0010313765378668904
Per-token loss scaled by world size: 0.0004698181292042136
Per-token loss scaled by world size: 0.00015131689724512398
Per-token loss scaled by world size: 1.1349918167979922e-05
Per-token loss scaled by world size: 0.0004358472360763699
Per-token loss scaled by world size: 5.984314611851005e-06
Epoch: 0, Step: 93, Rank: 5, loss = 1.8667914867401123
Epoch: 0, Step: 93, Rank: 3, loss = 0.273883581161499
Epoch: 0, Step: 93, Rank: 4, loss = 0.8503708243370056
Epoch: 0, Step: 93, Rank: 6, loss = 1.828258752822876
Epoch: 0, Step: 93, Rank: 0, loss = 0.020543351769447327
Epoch: 0, Step: 93, Rank: 1, loss = 0.01083160936832428
Epoch: 0, Step: 93, Rank: 7, loss = 0.7888835072517395
Per-token loss scaled by world size: 9.832592331804335e-05
Epoch: 0, Step: 93, Rank: 2, loss = 0.17796991765499115
Epoch 0: 77%|███████▋ | 93/121 [03:57<01:11, 2.57s/it]
total tokens: 7368 num samples: 8 num padding tokens: 711 - rank: 4 max len: 921 min len: 712 avg len: 832.125 num_loss_counted_tokens: 4742
total tokens: 6549 num samples: 3 num padding tokens: 1202 - rank: 1 max len: 2183 min len: 1533 avg len: 1782.3333333333333 num_loss_counted_tokens: 405
{
    "epoch": 0,
    "step": 93,
    "rank": 0,
    "loss": 0.020543351769447327,
    "overall_throughput": 40.324540881871776,
    "lr": 1.6000000000000001e-06,
    "cuda_mem_allocated": 24.333390712738037,
    "cuda_malloc_retries": 0,
    "num_loss_counted_tokens": 14480,
    "batch_size": 60,
    "total_loss": 0.7271916270256042,
    "gradnorm": 1.0122549533843994,
    "weight_norm": 433.0432434082031,
    "timestamp": "2024-08-18T20:51:59.324128"
}
total tokens: 8115 num samples: 15 num padding tokens: 1727 - rank: 6 max len: 541 min len: 302 avg len: 425.8666666666667 num_loss_counted_tokens: 4209
total tokens: 7450 num samples: 5 num padding tokens: 482 - rank: 2 max len: 1490 min len: 1299 avg len: 1393.6 num_loss_counted_tokens: 4364
total tokens: 7832 num samples: 11 num padding tokens: 625 - rank: 5 max len: 712 min len: 561 avg len: 655.1818181818181 num_loss_counted_tokens: 4908
total tokens: 6486 num samples: 2 num padding tokens: 374 - rank: 0 max len: 3243 min len: 2869 avg len: 3056.0 num_loss_counted_tokens: 161
total tokens: 7693 num samples: 7 num padding tokens: 643 - rank: 3 max len: 1099 min len: 949 avg len: 1007.1428571428571 num_loss_counted_tokens: 4763
total tokens: 8073 num samples: 27 num padding tokens: 3166 - rank: 7 max len: 299 min len: 83 avg len: 181.74074074074073 num_loss_counted_tokens: 2216
Per-token loss scaled by world size: 0.00047986634308472276
Per-token loss scaled by world size: 0.0005184172769077122
Per-token loss scaled by world size: 0.00042661072802729905
Per-token loss scaled by world size: 5.770879943156615e-05
Per-token loss scaled by world size: 0.0003191411087755114
Per-token loss scaled by world size: 1.1559887752810027e-05
Per-token loss scaled by world size: 1.194144033433986e-06
Epoch: 0, Step: 94, Rank: 3, loss = 1.1955350637435913
Epoch: 0, Step: 94, Rank: 5, loss = 0.9838176965713501
Epoch: 0, Step: 94, Rank: 2, loss = 0.13308370113372803
Epoch: 0, Step: 94, Rank: 7, loss = 1.1066317558288574
Epoch: 0, Step: 94, Rank: 4, loss = 0.7359793186187744
Epoch: 0, Step: 94, Rank: 0, loss = 0.026658546179533005
Epoch: 0, Step: 94, Rank: 1, loss = 0.002753845416009426
Per-token loss scaled by world size: 0.0007509095594286919
Epoch: 0, Step: 94, Rank: 6, loss = 1.7316913604736328
Epoch 0: 78%|███████▊ | 94/121 [04:00<01:08, 2.55s/it]
total tokens: 7608 num samples: 8 num padding tokens: 1057 - rank: 4 max len: 951 min len: 740 avg len: 818.875 num_loss_counted_tokens: 3486
total tokens: 5880 num samples: 2 num padding tokens: 154 - rank: 1 max len: 2940 min len: 2786 avg len: 2863.0 num_loss_counted_tokens: 481
total tokens: 6471 num samples: 3 num padding tokens: 972 - rank: 2 max len: 2157 min len: 1602 avg len: 1833.0 num_loss_counted_tokens: 1746
{
    "epoch": 0,
    "step": 94,
    "rank": 0,
    "loss": 0.026658546179533005,
    "overall_throughput": 42.25155762364806,
    "lr": 1.6000000000000001e-06,
    "cuda_mem_allocated": 24.342710971832275,
    "cuda_malloc_retries": 0,
    "num_loss_counted_tokens": 18449,
    "batch_size": 79,
    "total_loss": 0.7395188808441162,
    "gradnorm": 1.0122549533843994,
    "weight_norm": 433.0432434082031,
    "timestamp": "2024-08-18T20:52:01.831915"
}
total tokens: 7725 num samples: 5 num padding tokens: 1821 - rank: 3 max len: 1545 min len: 1004 avg len: 1180.8 num_loss_counted_tokens: 4073
total tokens: 7725 num samples: 25 num padding tokens: 2825 - rank: 7 max len: 309 min len: 84 avg len: 196.0 num_loss_counted_tokens: 2238
total tokens: 7540 num samples: 13 num padding tokens: 2357 - rank: 6 max len: 580 min len: 317 avg len: 398.6923076923077 num_loss_counted_tokens: 3252
total tokens: 7920 num samples: 11 num padding tokens: 539 - rank: 5 max len: 720 min len: 583 avg len: 671.0 num_loss_counted_tokens: 4509
total tokens: 7732 num samples: 2 num padding tokens: 720 - rank: 0 max len: 3866 min len: 3146 avg len: 3506.0 num_loss_counted_tokens: 197
Per-token loss scaled by world size: 0.00016949654673226178
Per-token loss scaled by world size: 0.00019448986859060824
Per-token loss scaled by world size: 0.0003709697921294719
Per-token loss scaled by world size: 0.00028232726617716253
Per-token loss scaled by world size: 0.00031966116512194276
Per-token loss scaled by world size: 4.294802783988416e-05
Per-token loss scaled by world size: 4.4173757487442344e-06
Epoch: 0, Step: 95, Rank: 6, loss = 1.1748613119125366
Epoch: 0, Step: 95, Rank: 7, loss = 0.8941304087638855
Epoch: 0, Step: 95, Rank: 1, loss = 0.13601639866828918
Epoch: 0, Step: 95, Rank: 3, loss = 0.5367955565452576
Epoch: 0, Step: 95, Rank: 0, loss = 0.013989828526973724
Epoch: 0, Step: 95, Rank: 2, loss = 0.6159493923187256
Epoch: 0, Step: 95, Rank: 4, loss = 1.0123668909072876
Per-token loss scaled by world size: 0.00031159218633547425
Epoch: 0, Step: 95, Rank: 5, loss = 0.9868124723434448
Epoch 0: 79%|███████▊ | 95/121 [04:02<01:06, 2.56s/it]
total tokens: 7953 num samples: 11 num padding tokens: 802 - rank: 4 max len: 723 min len: 574 avg len: 650.0909090909091 num_loss_counted_tokens: 3865
total tokens: 7895 num samples: 5 num padding tokens: 903 - rank: 1 max len: 1579 min len: 1252 avg len: 1398.4 num_loss_counted_tokens: 4598
{
"epoch": 0,
"step": 95,
"rank": 0,
"loss": 0.013989828526973724,
"overall_throughput": 40.78303709583472,
"lr": 1.6000000000000001e-06,
"cuda_mem_allocated": 24.430901527404785,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 25336,
"batch_size": 79,
"total_loss": 0.6713653802871704,
"gradnorm": 1.0122549533843994,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:52:04.432513"
}
total tokens: 7994 num samples: 14 num padding tokens: 1263 - rank: 5 max len: 571 min len: 393 avg len: 480.7857142857143 num_loss_counted_tokens: 4425
total tokens: 5280 num samples: 24 num padding tokens: 2248 - rank: 7 max len: 220 min len: 77 avg len: 126.33333333333333 num_loss_counted_tokens: 1108
total tokens: 7212 num samples: 6 num padding tokens: 1461 - rank: 2 max len: 1202 min len: 820 avg len: 958.5 num_loss_counted_tokens: 3497
total tokens: 7820 num samples: 20 num padding tokens: 2085 - rank: 6 max len: 391 min len: 220 avg len: 286.75 num_loss_counted_tokens: 3565
total tokens: 7326 num samples: 9 num padding tokens: 374 - rank: 3 max len: 814 min len: 726 avg len: 772.4444444444445 num_loss_counted_tokens: 2645
total tokens: 7032 num samples: 3 num padding tokens: 98 - rank: 0 max len: 2344 min len: 2282 avg len: 2311.3333333333335 num_loss_counted_tokens: 312
Per-token loss scaled by world size: 0.0003659721987787634
Per-token loss scaled by world size: 1.1935087059100624e-05
Per-token loss scaled by world size: 0.00034196022897958755
Per-token loss scaled by world size: 4.9577370191400405e-06
Per-token loss scaled by world size: 0.000384376646252349
Per-token loss scaled by world size: 3.742313765542349e-06
Per-token loss scaled by world size: 0.0004321872256696224
Epoch: 0, Step: 96, Rank: 3, loss = 1.043386697769165
Epoch: 0, Step: 96, Rank: 2, loss = 0.03402693197131157
Epoch: 0, Step: 96, Rank: 6, loss = 0.974928617477417
Epoch: 0, Step: 96, Rank: 1, loss = 0.01413450762629509
Epoch: 0, Step: 96, Rank: 7, loss = 1.095857858657837
Epoch: 0, Step: 96, Rank: 0, loss = 0.010669336654245853
Epoch: 0, Step: 96, Rank: 4, loss = 1.232165813446045
Per-token loss scaled by world size: 0.00038899367791600525
Epoch: 0, Step: 96, Rank: 5, loss = 1.1090209484100342
Epoch 0: 79%|███████▉ | 96/121 [04:05<01:03, 2.55s/it]
total tokens: 6210 num samples: 3 num padding tokens: 1280 - rank: 1 max len: 2070 min len: 1379 avg len: 1643.3333333333333 num_loss_counted_tokens: 704
total tokens: 8100 num samples: 10 num padding tokens: 581 - rank: 4 max len: 810 min len: 687 avg len: 751.9 num_loss_counted_tokens: 4814
{
"epoch": 0,
"step": 96,
"rank": 0,
"loss": 0.010669336654245853,
"overall_throughput": 42.34869571304124,
"lr": 1.6000000000000001e-06,
"cuda_mem_allocated": 24.221776962280273,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 22808,
"batch_size": 79,
"total_loss": 0.6892738342285156,
"gradnorm": 1.0122549533843994,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:52:06.964617"
}
total tokens: 5532 num samples: 2 num padding tokens: 327 - rank: 0 max len: 2766 min len: 2439 avg len: 2602.5 num_loss_counted_tokens: 211
total tokens: 7075 num samples: 25 num padding tokens: 2496 - rank: 7 max len: 283 min len: 84 avg len: 183.16 num_loss_counted_tokens: 1819
total tokens: 7546 num samples: 11 num padding tokens: 1067 - rank: 5 max len: 686 min len: 503 avg len: 589.0 num_loss_counted_tokens: 4558
total tokens: 7595 num samples: 7 num padding tokens: 688 - rank: 3 max len: 1085 min len: 846 avg len: 986.7142857142857 num_loss_counted_tokens: 4050
total tokens: 8048 num samples: 16 num padding tokens: 1729 - rank: 6 max len: 503 min len: 314 avg len: 394.9375 num_loss_counted_tokens: 3712
total tokens: 8070 num samples: 6 num padding tokens: 955 - rank: 2 max len: 1345 min len: 1089 avg len: 1185.8333333333333 num_loss_counted_tokens: 3253
Per-token loss scaled by world size: 0.00020825346291530877
Per-token loss scaled by world size: 0.0002562287845648825
Per-token loss scaled by world size: 8.292648271890357e-05
Per-token loss scaled by world size: 9.641618089517578e-05
Per-token loss scaled by world size: 0.00017162703443318605
Per-token loss scaled by world size: 9.9565637356136e-05
Per-token loss scaled by world size: 8.746929961489514e-05
Epoch: 0, Step: 97, Rank: 2, loss = 0.41501447558403015
Epoch: 0, Step: 97, Rank: 0, loss = 0.3645939230918884
Epoch: 0, Step: 97, Rank: 3, loss = 0.3456583023071289
Epoch: 0, Step: 97, Rank: 6, loss = 0.8680524826049805
Epoch: 0, Step: 97, Rank: 1, loss = 0.4018867313861847
Epoch: 0, Step: 97, Rank: 7, loss = 0.7153843641281128
Epoch: 0, Step: 97, Rank: 4, loss = 1.0680255889892578
Per-token loss scaled by world size: 0.00028643777477554977
Epoch: 0, Step: 97, Rank: 5, loss = 1.1939442157745361
Epoch 0: 80%|████████ | 97/121 [04:07<01:00, 2.54s/it]
total tokens: 8016 num samples: 8 num padding tokens: 1080 - rank: 4 max len: 1002 min len: 797 avg len: 867.0 num_loss_counted_tokens: 5620
total tokens: 7887 num samples: 3 num padding tokens: 742 - rank: 1 max len: 2629 min len: 1930 avg len: 2381.6666666666665 num_loss_counted_tokens: 1049
total tokens: 7076 num samples: 29 num padding tokens: 2680 - rank: 7 max len: 244 min len: 78 avg len: 151.58620689655172 num_loss_counted_tokens: 1550
{
"epoch": 0,
"step": 97,
"rank": 0,
"loss": 0.3645939230918884,
"overall_throughput": 41.762877356480914,
"lr": 1.6000000000000001e-06,
"cuda_mem_allocated": 24.456066131591797,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 33346,
"batch_size": 92,
"total_loss": 0.6715700030326843,
"gradnorm": 1.0122549533843994,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:52:09.460519"
}
total tokens: 8048 num samples: 16 num padding tokens: 2637 - rank: 6 max len: 503 min len: 255 avg len: 338.1875 num_loss_counted_tokens: 3126
total tokens: 7820 num samples: 10 num padding tokens: 1247 - rank: 5 max len: 782 min len: 563 avg len: 657.3 num_loss_counted_tokens: 4844
total tokens: 7668 num samples: 4 num padding tokens: 419 - rank: 2 max len: 1917 min len: 1666 avg len: 1812.25 num_loss_counted_tokens: 734
total tokens: 7176 num samples: 6 num padding tokens: 542 - rank: 3 max len: 1196 min len: 1025 avg len: 1105.6666666666667 num_loss_counted_tokens: 3353
total tokens: 7128 num samples: 2 num padding tokens: 2 - rank: 0 max len: 3564 min len: 3562 avg len: 3563.0 num_loss_counted_tokens: 172
Per-token loss scaled by world size: 0.0005401856615208089
Per-token loss scaled by world size: 0.000174855042132549
Per-token loss scaled by world size: 0.0003811018541455269
Per-token loss scaled by world size: 2.7653879442368634e-05
Per-token loss scaled by world size: 0.00025230227038264275
Per-token loss scaled by world size: 6.325829599518329e-05
Per-token loss scaled by world size: 0.0003237307828385383
Epoch: 0, Step: 98, Rank: 3, loss = 0.4728299081325531
Epoch: 0, Step: 98, Rank: 2, loss = 1.030547022819519
Epoch: 0, Step: 98, Rank: 0, loss = 0.07477954775094986
Epoch: 0, Step: 98, Rank: 5, loss = 1.4607295989990234
Epoch: 0, Step: 98, Rank: 4, loss = 0.6822568774223328
Epoch: 0, Step: 98, Rank: 1, loss = 0.17105834186077118
Epoch: 0, Step: 98, Rank: 7, loss = 0.8754084706306458
Per-token loss scaled by world size: 0.00048297818284481764
Epoch: 0, Step: 98, Rank: 6, loss = 1.3060333728790283
Epoch 0: 81%|████████ | 98/121 [04:10<00:57, 2.52s/it]
total tokens: 7452 num samples: 9 num padding tokens: 759 - rank: 4 max len: 828 min len: 690 avg len: 743.6666666666666 num_loss_counted_tokens: 4766
total tokens: 8040 num samples: 6 num padding tokens: 578 - rank: 1 max len: 1340 min len: 1184 avg len: 1243.6666666666667 num_loss_counted_tokens: 2162
{
"epoch": 0,
"step": 98,
"rank": 0,
"loss": 0.07477954775094986,
"overall_throughput": 42.80100675786857,
"lr": 1.6000000000000001e-06,
"cuda_mem_allocated": 24.374258518218994,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 21633,
"batch_size": 82,
"total_loss": 0.7592054009437561,
"gradnorm": 1.0122549533843994,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:52:11.932654"
}
total tokens: 7994 num samples: 7 num padding tokens: 237 - rank: 2 max len: 1142 min len: 1003 avg len: 1108.142857142857 num_loss_counted_tokens: 2239
total tokens: 8008 num samples: 8 num padding tokens: 518 - rank: 3 max len: 1001 min len: 839 avg len: 936.25 num_loss_counted_tokens: 4873
total tokens: 7579 num samples: 11 num padding tokens: 1520 - rank: 5 max len: 689 min len: 432 avg len: 550.8181818181819 num_loss_counted_tokens: 3863
total tokens: 7808 num samples: 32 num padding tokens: 2556 - rank: 7 max len: 244 min len: 86 avg len: 164.125 num_loss_counted_tokens: 1928
total tokens: 8080 num samples: 4 num padding tokens: 1956 - rank: 0 max len: 2020 min len: 1348 avg len: 1531.0 num_loss_counted_tokens: 2972
total tokens: 7740 num samples: 18 num padding tokens: 1567 - rank: 6 max len: 430 min len: 249 avg len: 342.94444444444446 num_loss_counted_tokens: 3382
Per-token loss scaled by world size: 0.00014469273446593434
Per-token loss scaled by world size: 0.0002714892034418881
Per-token loss scaled by world size: 0.00024302249948959798
Per-token loss scaled by world size: 0.0003524755884427577
Per-token loss scaled by world size: 6.74632319714874e-05
Per-token loss scaled by world size: 0.0003295539354439825
Per-token loss scaled by world size: 0.0001440553314751014
Epoch: 0, Step: 99, Rank: 3, loss = 0.4647168815135956
Epoch: 0, Step: 99, Rank: 5, loss = 0.8719554543495178
Epoch: 0, Step: 99, Rank: 1, loss = 0.2166750431060791
Epoch: 0, Step: 99, Rank: 7, loss = 0.7805275321006775
Epoch: 0, Step: 99, Rank: 6, loss = 1.1320635080337524
Epoch: 0, Step: 99, Rank: 4, loss = 1.058444857597351
Per-token loss scaled by world size: 0.00019673565111588687
Epoch: 0, Step: 99, Rank: 2, loss = 0.4626697301864624
Epoch: 0, Step: 99, Rank: 0, loss = 0.6318657398223877
Epoch 0: 82%|████████▏ | 99/121 [04:12<00:55, 2.54s/it]
total tokens: 7893 num samples: 9 num padding tokens: 665 - rank: 4 max len: 877 min len: 700 avg len: 803.1111111111111 num_loss_counted_tokens: 3967
total tokens: 2590 num samples: 14 num padding tokens: 751 - rank: 7 max len: 185 min len: 86 avg len: 131.35714285714286 num_loss_counted_tokens: 581
total tokens: 7648 num samples: 8 num padding tokens: 131 - rank: 3 max len: 956 min len: 911 avg len: 939.625 num_loss_counted_tokens: 5962
total tokens: 7645 num samples: 5 num padding tokens: 395 - rank: 1 max len: 1529 min len: 1349 avg len: 1450.0 num_loss_counted_tokens: 2441
{
"epoch": 0,
"step": 99,
"rank": 0,
"loss": 0.6318657398223877,
"overall_throughput": 40.78082568043985,
"lr": 1.6000000000000001e-06,
"cuda_mem_allocated": 24.533984184265137,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 25694,
"batch_size": 91,
"total_loss": 0.7023648619651794,
"gradnorm": 1.0122549533843994,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:52:14.529016"
}
total tokens: 7752 num samples: 6 num padding tokens: 918 - rank: 2 max len: 1292 min len: 958 avg len: 1139.0 num_loss_counted_tokens: 2321
total tokens: 7260 num samples: 3 num padding tokens: 1692 - rank: 0 max len: 2420 min len: 1564 avg len: 1856.0 num_loss_counted_tokens: 2978
total tokens: 7740 num samples: 20 num padding tokens: 1925 - rank: 6 max len: 387 min len: 196 avg len: 290.75 num_loss_counted_tokens: 3585
total tokens: 8064 num samples: 12 num padding tokens: 1582 - rank: 5 max len: 672 min len: 401 avg len: 540.1666666666666 num_loss_counted_tokens: 3589
Per-token loss scaled by world size: 0.0002912842610385269
Per-token loss scaled by world size: 0.00046773377107456326
Per-token loss scaled by world size: 0.00034302467247471213
Per-token loss scaled by world size: 0.0002211699465988204
Per-token loss scaled by world size: 6.557774031534791e-05
Per-token loss scaled by world size: 2.8064672733307816e-05
Per-token loss scaled by world size: 2.8628314794332255e-06
Epoch: 0, Step: 100, Rank: 5, loss = 1.2855077981948853
Epoch: 0, Step: 100, Rank: 7, loss = 0.9427604675292969
Epoch: 0, Step: 100, Rank: 4, loss = 0.6078579425811768
Epoch: 0, Step: 100, Rank: 3, loss = 0.8005583882331848
Epoch: 0, Step: 100, Rank: 2, loss = 0.1802322268486023
Epoch: 0, Step: 100, Rank: 1, loss = 0.07713224738836288
Epoch: 0, Step: 100, Rank: 0, loss = 0.007868134416639805
Per-token loss scaled by world size: 0.0006381099228747189
Epoch: 0, Step: 100, Rank: 6, loss = 1.753765344619751
Epoch 0: 83%|████████▎ | 100/121 [04:15<00:53, 2.54s/it]
total tokens: 5672 num samples: 2 num padding tokens: 1177 - rank: 1 max len: 2836 min len: 1659 avg len: 2247.5 num_loss_counted_tokens: 502
total tokens: 7504 num samples: 8 num padding tokens: 1233 - rank: 4 max len: 938 min len: 657 avg len: 783.875 num_loss_counted_tokens: 4542
{
"epoch": 0,
"step": 100,
"rank": 0,
"loss": 0.007868134416639805,
"overall_throughput": 41.97207337460162,
"lr": 1.6000000000000001e-06,
"cuda_mem_allocated": 24.315226078033447,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 21987,
"batch_size": 69,
"total_loss": 0.7069603800773621,
"gradnorm": 1.0122549533843994,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:52:17.051456"
}
total tokens: 7908 num samples: 6 num padding tokens: 1459 - rank: 3 max len: 1318 min len: 961 avg len: 1074.8333333333333 num_loss_counted_tokens: 4507
total tokens: 7644 num samples: 12 num padding tokens: 705 - rank: 5 max len: 637 min len: 515 avg len: 578.25 num_loss_counted_tokens: 4604
total tokens: 8032 num samples: 16 num padding tokens: 1718 - rank: 6 max len: 502 min len: 297 avg len: 394.625 num_loss_counted_tokens: 3774
total tokens: 7306 num samples: 26 num padding tokens: 2547 - rank: 7 max len: 281 min len: 87 avg len: 183.03846153846155 num_loss_counted_tokens: 1982
total tokens: 6576 num samples: 4 num padding tokens: 507 - rank: 2 max len: 1644 min len: 1411 avg len: 1517.25 num_loss_counted_tokens: 3636
total tokens: 6544 num samples: 2 num padding tokens: 236 - rank: 0 max len: 3272 min len: 3036 avg len: 3154.0 num_loss_counted_tokens: 209
Per-token loss scaled by world size: 0.00010836837464012206
Per-token loss scaled by world size: 0.0003503480111248791
Per-token loss scaled by world size: 0.0005262716440483928
Per-token loss scaled by world size: 0.0004925990360789001
Per-token loss scaled by world size: 0.0006540374597534537
Per-token loss scaled by world size: 8.3443388575688e-05
Per-token loss scaled by world size: 2.478029045960284e-06
Epoch: 0, Step: 101, Rank: 6, loss = 1.2019386291503906
Epoch: 0, Step: 101, Rank: 4, loss = 0.8001510500907898
Epoch: 0, Step: 101, Rank: 5, loss = 1.4937398433685303
Epoch: 0, Step: 101, Rank: 2, loss = 0.24749982357025146
Epoch: 0, Step: 101, Rank: 7, loss = 1.1250346899032593
Epoch: 0, Step: 101, Rank: 1, loss = 0.1905742734670639
Epoch: 0, Step: 101, Rank: 0, loss = 0.005659508518874645
Per-token loss scaled by world size: 0.00031609582947567105
Epoch: 0, Step: 101, Rank: 3, loss = 0.7219233512878418
Epoch 0: 83%|████████▎ | 101/121 [04:17<00:50, 2.54s/it]
total tokens: 7903 num samples: 7 num padding tokens: 827 - rank: 4 max len: 1129 min len: 858 avg len: 1010.8571428571429 num_loss_counted_tokens: 3333
total tokens: 5778 num samples: 2 num padding tokens: 561 - rank: 1 max len: 2889 min len: 2328 avg len: 2608.5 num_loss_counted_tokens: 499
{
"epoch": 0,
"step": 101,
"rank": 0,
"loss": 0.005659508518874645,
"overall_throughput": 41.27268820447733,
"lr": 1.6000000000000001e-06,
"cuda_mem_allocated": 24.25443983078003,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 18271,
"batch_size": 78,
"total_loss": 0.7233151197433472,
"gradnorm": 1.0122549533843994,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:52:19.615502"
}
total tokens: 7981 num samples: 23 num padding tokens: 4218 - rank: 7 max len: 347 min len: 81 avg len: 163.6086956521739 num_loss_counted_tokens: 1503
total tokens: 7659 num samples: 9 num padding tokens: 737 - rank: 5 max len: 851 min len: 617 avg len: 769.1111111111111 num_loss_counted_tokens: 5452
total tokens: 7125 num samples: 5 num padding tokens: 617 - rank: 3 max len: 1425 min len: 1151 avg len: 1301.6 num_loss_counted_tokens: 3466
total tokens: 6171 num samples: 3 num padding tokens: 616 - rank: 2 max len: 2057 min len: 1742 avg len: 1851.6666666666667 num_loss_counted_tokens: 894
total tokens: 6344 num samples: 2 num padding tokens: 281 - rank: 0 max len: 3172 min len: 2891 avg len: 3031.5 num_loss_counted_tokens: 198
total tokens: 7878 num samples: 13 num padding tokens: 1215 - rank: 6 max len: 606 min len: 373 avg len: 512.5384615384615 num_loss_counted_tokens: 4756
Per-token loss scaled by world size: 0.0004239458357915282
Per-token loss scaled by world size: 0.00031359592685475945
Per-token loss scaled by world size: 0.00021933596872258931
Per-token loss scaled by world size: 0.00028875406133010983
Per-token loss scaled by world size: 0.000378787808585912
Per-token loss scaled by world size: 6.144649523776025e-05
Per-token loss scaled by world size: 0.0003507360816001892
Epoch: 0, Step: 102, Rank: 2, loss = 0.9117801189422607
Epoch: 0, Step: 102, Rank: 5, loss = 1.232622504234314
Epoch: 0, Step: 102, Rank: 4, loss = 0.8395524621009827
Epoch: 0, Step: 102, Rank: 6, loss = 1.101325511932373
Epoch: 0, Step: 102, Rank: 3, loss = 0.6377193331718445
Epoch: 0, Step: 102, Rank: 0, loss = 0.1786556839942932
Epoch: 0, Step: 102, Rank: 7, loss = 1.0197651386260986
Per-token loss scaled by world size: 0.00012210274871904403
Epoch: 0, Step: 102, Rank: 1, loss = 0.35501372814178467
Epoch 0: 84%|████████▍ | 102/121 [04:20<00:48, 2.56s/it]
total tokens: 7592 num samples: 8 num padding tokens: 994 - rank: 4 max len: 949 min len: 717 avg len: 824.75 num_loss_counted_tokens: 4547
total tokens: 6717 num samples: 3 num padding tokens: 1106 - rank: 1 max len: 2239 min len: 1619 avg len: 1870.3333333333333 num_loss_counted_tokens: 2357
{
"epoch": 0,
"step": 102,
"rank": 0,
"loss": 0.1786556839942932,
"overall_throughput": 40.7946280716196,
"lr": 1.6000000000000001e-06,
"cuda_mem_allocated": 24.383514404296875,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 23260,
"batch_size": 97,
"total_loss": 0.7845543026924133,
"gradnorm": 1.0122549533843994,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:52:22.207291"
}
total tokens: 8022 num samples: 14 num padding tokens: 1912 - rank: 6 max len: 573 min len: 313 avg len: 436.42857142857144 num_loss_counted_tokens: 3612
total tokens: 7575 num samples: 5 num padding tokens: 437 - rank: 2 max len: 1515 min len: 1354 avg len: 1427.6 num_loss_counted_tokens: 3239
total tokens: 7656 num samples: 11 num padding tokens: 467 - rank: 5 max len: 696 min len: 601 avg len: 653.5454545454545 num_loss_counted_tokens: 3901
total tokens: 7392 num samples: 24 num padding tokens: 2967 - rank: 7 max len: 308 min len: 80 avg len: 184.375 num_loss_counted_tokens: 1964
total tokens: 7478 num samples: 2 num padding tokens: 770 - rank: 0 max len: 3739 min len: 2969 avg len: 3354.0 num_loss_counted_tokens: 161
total tokens: 7944 num samples: 6 num padding tokens: 1452 - rank: 3 max len: 1324 min len: 951 avg len: 1082.0 num_loss_counted_tokens: 5314
Per-token loss scaled by world size: 0.000377753924112767
Per-token loss scaled by world size: 0.0003784565778914839
Per-token loss scaled by world size: 0.00017589255003258586
Per-token loss scaled by world size: 0.00025099579943343997
Per-token loss scaled by world size: 0.0002923366264440119
Per-token loss scaled by world size: 1.306815647694748e-06
Per-token loss scaled by world size: 8.565741882193834e-05
Epoch: 0, Step: 103, Rank: 5, loss = 1.0730663537979126
Epoch: 0, Step: 103, Rank: 3, loss = 0.7116672396659851
Epoch: 0, Step: 103, Rank: 1, loss = 0.49872133135795593
Epoch: 0, Step: 103, Rank: 0, loss = 0.0037053122650831938
Epoch: 0, Step: 103, Rank: 6, loss = 1.0710740089416504
Epoch: 0, Step: 103, Rank: 4, loss = 0.828883945941925
Epoch: 0, Step: 103, Rank: 7, loss = 0.24287091195583344
Per-token loss scaled by world size: 0.00028907370870001614
Epoch: 0, Step: 103, Rank: 2, loss = 0.819632351398468
Epoch 0: 85%|████████▌ | 103/121 [04:22<00:45, 2.55s/it]
total tokens: 7472 num samples: 8 num padding tokens: 859 - rank: 4 max len: 934 min len: 761 avg len: 826.625 num_loss_counted_tokens: 4441
total tokens: 5632 num samples: 2 num padding tokens: 254 - rank: 1 max len: 2816 min len: 2562 avg len: 2689.0 num_loss_counted_tokens: 324
total tokens: 7860 num samples: 30 num padding tokens: 3085 - rank: 7 max len: 262 min len: 70 avg len: 159.16666666666666 num_loss_counted_tokens: 2067
total tokens: 7635 num samples: 15 num padding tokens: 2223 - rank: 6 max len: 509 min len: 271 avg len: 360.8 num_loss_counted_tokens: 2976
{
"epoch": 0,
"step": 103,
"rank": 0,
"loss": 0.0037053122650831938,
"overall_throughput": 41.79496526057776,
"lr": 1.6000000000000001e-06,
"cuda_mem_allocated": 24.362098217010498,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 22683,
"batch_size": 80,
"total_loss": 0.6562026739120483,
"gradnorm": 1.0122549533843994,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:52:24.741846"
}
total tokens: 7140 num samples: 5 num padding tokens: 839 - rank: 3 max len: 1428 min len: 1015 avg len: 1260.2 num_loss_counted_tokens: 4097
total tokens: 7520 num samples: 10 num padding tokens: 1082 - rank: 5 max len: 752 min len: 531 avg len: 643.8 num_loss_counted_tokens: 3194
total tokens: 6831 num samples: 3 num padding tokens: 752 - rank: 2 max len: 2277 min len: 1793 avg len: 2026.3333333333333 num_loss_counted_tokens: 2435
total tokens: 6642 num samples: 2 num padding tokens: 16 - rank: 0 max len: 3321 min len: 3305 avg len: 3313.0 num_loss_counted_tokens: 167
Per-token loss scaled by world size: 0.00023188922205008566
Per-token loss scaled by world size: 0.0005923461285419762
Per-token loss scaled by world size: 0.0007710273494012654
Per-token loss scaled by world size: 0.0006771996268071234
Per-token loss scaled by world size: 5.4260908655123785e-06
Per-token loss scaled by world size: 7.5567550084088e-06
Per-token loss scaled by world size: 0.00048243210767395794
Epoch: 0, Step: 104, Rank: 5, loss = 1.1573703289031982
Epoch: 0, Step: 104, Rank: 6, loss = 1.5064910650253296
Epoch: 0, Step: 104, Rank: 3, loss = 0.4530825614929199
Epoch: 0, Step: 104, Rank: 4, loss = 1.323163390159607
Epoch: 0, Step: 104, Rank: 2, loss = 0.010601903311908245
Epoch: 0, Step: 104, Rank: 1, loss = 0.014764954335987568
Epoch: 0, Step: 104, Rank: 7, loss = 0.9426120519638062
Per-token loss scaled by world size: 8.817338675726205e-05
Epoch: 0, Step: 104, Rank: 0, loss = 0.17227977514266968
Epoch 0: 86%|████████▌ | 104/121 [04:25<00:43, 2.55s/it]
total tokens: 7770 num samples: 10 num padding tokens: 458 - rank: 4 max len: 777 min len: 676 avg len: 731.2 num_loss_counted_tokens: 4387
total tokens: 7380 num samples: 5 num padding tokens: 709 - rank: 1 max len: 1476 min len: 1222 avg len: 1334.2 num_loss_counted_tokens: 2606
total tokens: 7931 num samples: 7 num padding tokens: 471 - rank: 2 max len: 1133 min len: 997 avg len: 1065.7142857142858 num_loss_counted_tokens: 2877
{
"epoch": 0,
"step": 104,
"rank": 0,
"loss": 0.17227977514266968,
"overall_throughput": 41.371928375269654,
"lr": 1.6000000000000001e-06,
"cuda_mem_allocated": 24.487372398376465,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 15631,
"batch_size": 56,
"total_loss": 0.6975457668304443,
"gradnorm": 1.0122549533843994,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:52:27.296912"
}
total tokens: 8100 num samples: 12 num padding tokens: 2394 - rank: 5 max len: 675 min len: 353 avg len: 475.5 num_loss_counted_tokens: 3145
total tokens: 7684 num samples: 34 num padding tokens: 2510 - rank: 7 max len: 226 min len: 80 avg len: 152.1764705882353 num_loss_counted_tokens: 1885
total tokens: 7496 num samples: 8 num padding tokens: 516 - rank: 3 max len: 937 min len: 786 avg len: 872.5 num_loss_counted_tokens: 4095
total tokens: 8073 num samples: 23 num padding tokens: 1383 - rank: 6 max len: 351 min len: 231 avg len: 290.8695652173913 num_loss_counted_tokens: 3443
total tokens: 6147 num samples: 3 num padding tokens: 667 - rank: 0 max len: 2049 min len: 1588 avg len: 1826.6666666666667 num_loss_counted_tokens: 458
Per-token loss scaled by world size: 0.00020804539963137358
Per-token loss scaled by world size: 0.00024777904036454856
Per-token loss scaled by world size: 0.00040929196984507143
Per-token loss scaled by world size: 0.00047369435196742415
Per-token loss scaled by world size: 0.00023292600235436112
Per-token loss scaled by world size: 0.0002660582831595093
Per-token loss scaled by world size: 2.4003027647268027e-05
Epoch: 0, Step: 105, Rank: 6, loss = 1.2955113649368286
Epoch: 0, Step: 105, Rank: 5, loss = 1.4993610382080078
Epoch: 0, Step: 105, Rank: 7, loss = 0.7842825651168823
Epoch: 0, Step: 105, Rank: 3, loss = 0.7372690439224243
Epoch: 0, Step: 105, Rank: 2, loss = 0.6585156917572021
Epoch: 0, Step: 105, Rank: 0, loss = 0.07597558200359344
Epoch: 0, Step: 105, Rank: 4, loss = 0.8421409726142883
Per-token loss scaled by world size: 0.0001044606979121454
Epoch: 0, Step: 105, Rank: 1, loss = 0.3306442201137543
Epoch 0: 87%|████████▋ | 105/121 [04:28<00:40, 2.56s/it]
{
"epoch": 0,
"step": 105,
"rank": 0,
"loss": 0.07597558200359344,
"overall_throughput": 41.19873338301903,
"lr": 1.6000000000000001e-06,
"cuda_mem_allocated": 24.338607788085938,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 25322,
"batch_size": 90,
"total_loss": 0.7779626250267029,
"gradnorm": 1.0122549533843994,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:52:29.864769"
}
Per-token loss scaled by world size: 0.0007665135781280696
Per-token loss scaled by world size: 0.0007400316535495222
Per-token loss scaled by world size: 0.0002667310182005167
Per-token loss scaled by world size: 0.000634319381788373
Per-token loss scaled by world size: 0.00010545395343797281
Per-token loss scaled by world size: 4.777937192557147e-06
Per-token loss scaled by world size: 1.3243148941910476e-06
Epoch: 0, Step: 106, Rank: 2, loss = 0.21801286935806274
Epoch: 0, Step: 106, Rank: 4, loss = 1.5846710205078125
Epoch: 0, Step: 106, Rank: 3, loss = 0.5514330267906189
Epoch: 0, Step: 106, Rank: 6, loss = 1.3113759756088257
Epoch: 0, Step: 106, Rank: 0, loss = 0.00987778790295124
Epoch: 0, Step: 106, Rank: 7, loss = 1.5299229621887207
Epoch: 0, Step: 106, Rank: 1, loss = 0.002737855538725853
Per-token loss scaled by world size: 0.0005778921768069267
Epoch: 0, Step: 106, Rank: 5, loss = 1.1947197914123535
Epoch 0: 88%|████████▊ | 106/121 [04:30<00:38, 2.53s/it]
{
"epoch": 0,
"step": 106,
"rank": 0,
"loss": 0.00987778790295124,
"overall_throughput": 42.66049518366333,
"lr": 1.6000000000000001e-06,
"cuda_mem_allocated": 24.433530807495117,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 16539,
"batch_size": 82,
"total_loss": 0.800343930721283,
"gradnorm": 1.0122549533843994,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:52:32.345371"
}
Per-token loss scaled by world size: 0.00032297801226377487
Per-token loss scaled by world size: 0.00039764257962815464
Per-token loss scaled by world size: 0.00018069567158818245
Per-token loss scaled by world size: 0.00018984438793268055
Per-token loss scaled by world size: 0.00037407863419502974
Per-token loss scaled by world size: 2.2991487185208825e-06
Per-token loss scaled by world size: 0.00024643141659907997
Epoch: 0, Step: 107, Rank: 1, loss = 0.6019198894500732
Epoch: 0, Step: 107, Rank: 4, loss = 1.3245971202850342
Epoch: 0, Step: 107, Rank: 2, loss = 0.6323953866958618
Epoch: 0, Step: 107, Rank: 6, loss = 1.0758801698684692
Epoch: 0, Step: 107, Rank: 0, loss = 0.007658752147108316
Epoch: 0, Step: 107, Rank: 3, loss = 1.2461026906967163
Epoch: 0, Step: 107, Rank: 7, loss = 0.8208938241004944
Per-token loss scaled by world size: 0.00040240780799649656
Epoch: 0, Step: 107, Rank: 5, loss = 1.3404706716537476
Epoch 0: 88%|████████▊ | 107/121 [04:33<00:35, 2.54s/it]
{
"epoch": 0,
"step": 107,
"rank": 0,
"loss": 0.007658752147108316,
"overall_throughput": 41.73801803946499,
"lr": 1.6000000000000001e-06,
"cuda_mem_allocated": 24.428075790405273,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 26649,
"batch_size": 107,
"total_loss": 0.8812397718429565,
"gradnorm": 1.0122549533843994,
"weight_norm": 433.0432434082031,
"timestamp": "2024-08-18T20:52:34.882920"
}
Per-token loss scaled by world size: 0.0003084656782448292
Per-token loss scaled by world size: 0.0003857612609863281
Per-token loss scaled by world size: 7.505448593292385e-05
Per-token loss scaled by world size: 3.2563605145696783e-06
Per-token loss scaled by world size: 6.232602754607797e-05
Per-token loss scaled by world size: 0.00027665658853948116
Per-token loss scaled by world size: 0.00029414540040306747
Epoch: 0, Step: 108, Rank: 5, loss = 1.2347253561019897
Epoch: 0, Step: 108, Rank: 1, loss = 0.24023064970970154
Epoch: 0, Step: 108, Rank: 0, loss = 0.01042279601097107
Epoch: 0, Step: 108, Rank: 2, loss = 0.199490025639534
Epoch: 0, Step: 108, Rank: 4, loss = 0.9873215556144714
Epoch: 0, Step: 108, Rank: 6, loss = 0.9414858818054199
Epoch: 0, Step: 108, Rank: 7, loss = 0.8855085372924805
Per-token loss scaled by world size: 0.0003316248476039618
Epoch: 0, Step: 108, Rank: 3, loss = 1.0614482164382935
[2024-08-18 20:52:37,412] [INFO] [logging.py:96:log_dist] [Rank 0] step=3, skipped=0, lr=[2.4000000000000003e-06], mom=[(0.9, 0.95)] | |
[2024-08-18 20:52:37,489] [INFO] [timer.py:258:stop] epoch=0/micro_step=108/global_step=3, RunningAvgSamplesPerSec=41.632043299114166, CurrSamplesPerSec=41.632043299114166, MemAllocated=22.7GB, MaxMemAllocated=30.58GB | |
Epoch 0: 89%|████████▉ | 108/121 [04:35<00:33, 2.56s/it]{ | |
"epoch": 0, | |
"step": 108, | |
"rank": 0, | |
"loss": 0.01042279601097107, | |
"overall_throughput": 40.57748284835933, | |
"lr": 2.4000000000000003e-06, | |
"cuda_mem_allocated": 22.696479320526123, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 25606, | |
"batch_size": 79, | |
"total_loss": 0.6950791478157043, | |
"gradnorm": 1.007487177848816, | |
"weight_norm": 433.04327392578125, | |
"timestamp": "2024-08-18T20:52:37.550745" | |
} | |
Per-token loss scaled by world size: 0.0006956385332159698
Per-token loss scaled by world size: 0.0003228207933716476
Per-token loss scaled by world size: 3.4351643989793956e-05
Per-token loss scaled by world size: 7.628021558048204e-05
Per-token loss scaled by world size: 0.0002656075230333954
Per-token loss scaled by world size: 0.0005566730978898704
Epoch: 0, Step: 109, Rank: 3, loss = 0.7804192900657654
Epoch: 0, Step: 109, Rank: 5, loss = 1.681706190109253
Epoch: 0, Step: 109, Rank: 1, loss = 0.0830451026558876
Epoch: 0, Step: 109, Rank: 2, loss = 0.18440742790699005
Per-token loss scaled by world size: 0.00023115344811230898
Epoch: 0, Step: 109, Rank: 4, loss = 0.6421061754226685
Epoch: 0, Step: 109, Rank: 6, loss = 1.345757246017456
Per-token loss scaled by world size: 2.6328027161071077e-05
Epoch: 0, Step: 109, Rank: 7, loss = 0.5588134527206421
Epoch: 0, Step: 109, Rank: 0, loss = 0.06364800781011581
Epoch 0: 90%|█████████ | 109/121 [04:38<00:31, 2.59s/it]
{
"epoch": 0,
"step": 109,
"rank": 0,
"loss": 0.06364800781011581,
"overall_throughput": 40.20414871551739,
"lr": 2.4000000000000003e-06,
"cuda_mem_allocated": 24.495201587677002,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 19340,
"batch_size": 78,
"total_loss": 0.6674879193305969,
"gradnorm": 1.007487177848816,
"weight_norm": 433.04327392578125,
"timestamp": "2024-08-18T20:52:40.142481"
}
Per-token loss scaled by world size: 0.00015839148545637727
Per-token loss scaled by world size: 1.4700900692332652e-06
Per-token loss scaled by world size: 0.00025009570526890457
Per-token loss scaled by world size: 0.00011483808339107782
Per-token loss scaled by world size: 1.5624003935954534e-05
Per-token loss scaled by world size: 0.00039461886626668274
Per-token loss scaled by world size: 0.0002000233216676861
Epoch: 0, Step: 110, Rank: 2, loss = 0.8055582642555237
Epoch: 0, Step: 110, Rank: 5, loss = 1.2710673809051514
Epoch: 0, Step: 110, Rank: 0, loss = 0.004735160153359175
Epoch: 0, Step: 110, Rank: 4, loss = 0.36989346146583557
Epoch: 0, Step: 110, Rank: 1, loss = 0.05032491683959961
Epoch: 0, Step: 110, Rank: 3, loss = 0.5101789832115173
Epoch: 0, Step: 110, Rank: 7, loss = 0.6442751288414001
Per-token loss scaled by world size: 0.00032995041692629457
Epoch: 0, Step: 110, Rank: 6, loss = 1.0627702474594116
Epoch 0: 91%|█████████ | 110/121 [04:40<00:28, 2.56s/it]
{
"epoch": 0,
"step": 110,
"rank": 0,
"loss": 0.004735160153359175,
"overall_throughput": 41.98747405186681,
"lr": 2.4000000000000003e-06,
"cuda_mem_allocated": 24.342525005340576,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 25768,
"batch_size": 78,
"total_loss": 0.5898504853248596,
"gradnorm": 1.007487177848816,
"weight_norm": 433.04327392578125,
"timestamp": "2024-08-18T20:52:42.652816"
}
Per-token loss scaled by world size: 0.0003511311369948089
Per-token loss scaled by world size: 0.0003018661809619516
Per-token loss scaled by world size: 0.00043145185918547213
Per-token loss scaled by world size: 0.0006916387937963009
Per-token loss scaled by world size: 0.00025981958606280386
Per-token loss scaled by world size: 0.000197814850253053
Per-token loss scaled by world size: 5.381280425353907e-05
Epoch: 0, Step: 111, Rank: 6, loss = 0.877037763595581
Epoch: 0, Step: 111, Rank: 5, loss = 1.7275407314300537
Epoch: 0, Step: 111, Rank: 4, loss = 0.7539862394332886
Epoch: 0, Step: 111, Rank: 7, loss = 1.0776588916778564
Epoch: 0, Step: 111, Rank: 3, loss = 0.6489643454551697
Epoch: 0, Step: 111, Rank: 2, loss = 0.49409204721450806
Epoch: 0, Step: 111, Rank: 1, loss = 0.13441093266010284
Per-token loss scaled by world size: 6.49542971586925e-06
Epoch: 0, Step: 111, Rank: 0, loss = 0.016223959624767303
Epoch 0: 92%|█████████▏| 111/121 [04:43<00:25, 2.57s/it]
{
"epoch": 0,
"step": 111,
"rank": 0,
"loss": 0.016223959624767303,
"overall_throughput": 41.001441887063585,
"lr": 2.4000000000000003e-06,
"cuda_mem_allocated": 24.491368293762207,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 19982,
"batch_size": 69,
"total_loss": 0.716239333152771,
"gradnorm": 1.007487177848816,
"weight_norm": 433.04327392578125,
"timestamp": "2024-08-18T20:52:45.269018"
}
Per-token loss scaled by world size: 0.00018836244998965412
Per-token loss scaled by world size: 0.00022297118266578764
Per-token loss scaled by world size: 0.0004412824346218258
Per-token loss scaled by world size: 0.0002451048349030316
Per-token loss scaled by world size: 2.037934336840408e-06
Per-token loss scaled by world size: 0.00028445024508982897
Per-token loss scaled by world size: 0.00019181481911800802
Epoch: 0, Step: 112, Rank: 4, loss = 0.735774040222168
Epoch: 0, Step: 112, Rank: 6, loss = 1.3246747255325317
Epoch: 0, Step: 112, Rank: 1, loss = 0.5654405355453491
Epoch: 0, Step: 112, Rank: 0, loss = 0.00611762423068285
Epoch: 0, Step: 112, Rank: 3, loss = 0.6693316102027893
Epoch: 0, Step: 112, Rank: 2, loss = 0.8538841009140015
Epoch: 0, Step: 112, Rank: 7, loss = 0.5758041143417358
Per-token loss scaled by world size: 0.00042108085472136736
Epoch: 0, Step: 112, Rank: 5, loss = 1.2640321254730225
Epoch 0: 93%|█████████▎| 112/121 [04:45<00:23, 2.56s/it]
{
"epoch": 0,
"step": 112,
"rank": 0,
"loss": 0.00611762423068285,
"overall_throughput": 41.80928245712371,
"lr": 2.4000000000000003e-06,
"cuda_mem_allocated": 24.407159328460693,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 24015,
"batch_size": 92,
"total_loss": 0.7493823766708374,
"gradnorm": 1.007487177848816,
"weight_norm": 433.04327392578125,
"timestamp": "2024-08-18T20:52:47.762170"
}
Per-token loss scaled by world size: 0.0004581383545883
Per-token loss scaled by world size: 0.0004093858879059553
Per-token loss scaled by world size: 0.0004562860412988812
Per-token loss scaled by world size: 0.0003013689420185983
Per-token loss scaled by world size: 8.010071906028315e-05
Per-token loss scaled by world size: 9.55424093262991e-06
Per-token loss scaled by world size: 0.0002337862824788317
Epoch: 0, Step: 113, Rank: 6, loss = 1.3187236785888672
Epoch: 0, Step: 113, Rank: 5, loss = 1.1831763982772827
Epoch: 0, Step: 113, Rank: 4, loss = 1.3240771293640137
Epoch: 0, Step: 113, Rank: 1, loss = 0.23150108754634857
Epoch: 0, Step: 113, Rank: 3, loss = 0.8709939122200012
Epoch: 0, Step: 113, Rank: 0, loss = 0.027612950652837753
Epoch: 0, Step: 113, Rank: 7, loss = 0.6756715774536133
Per-token loss scaled by world size: 0.0002764550154097378
Epoch: 0, Step: 113, Rank: 2, loss = 0.7989895343780518
Epoch 0: 93%|█████████▎| 113/121 [04:48<00:20, 2.56s/it]
{
"epoch": 0,
"step": 113,
"rank": 0,
"loss": 0.027612950652837753,
"overall_throughput": 41.14323140579119,
"lr": 2.4000000000000003e-06,
"cuda_mem_allocated": 24.228994369506836,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 23121,
"batch_size": 80,
"total_loss": 0.8038432598114014,
"gradnorm": 1.007487177848816,
"weight_norm": 433.04327392578125,
"timestamp": "2024-08-18T20:52:50.330248"
}
Per-token loss scaled by world size: 0.0005671991966664791
Per-token loss scaled by world size: 0.00022458804596681148
Per-token loss scaled by world size: 0.00021035455574747175
Per-token loss scaled by world size: 4.355869896244258e-05
Per-token loss scaled by world size: 7.875097253418062e-06
Per-token loss scaled by world size: 0.0004406924417708069
Per-token loss scaled by world size: 0.0002821373345796019
Epoch: 0, Step: 114, Rank: 6, loss = 1.126409888267517
Epoch: 0, Step: 114, Rank: 0, loss = 0.020128747448325157
Epoch: 0, Step: 114, Rank: 5, loss = 1.449761152267456
Epoch: 0, Step: 114, Rank: 4, loss = 0.5376662611961365
Epoch: 0, Step: 114, Rank: 3, loss = 0.5740470290184021
Epoch: 0, Step: 114, Rank: 2, loss = 0.11133603751659393
Epoch: 0, Step: 114, Rank: 7, loss = 0.7211430072784424
Per-token loss scaled by world size: 2.8121936338720843e-05
Epoch: 0, Step: 114, Rank: 1, loss = 0.07187967002391815
Epoch 0: 94%|█████████▍| 114/121 [04:51<00:17, 2.56s/it]
{
"epoch": 0,
"step": 114,
"rank": 0,
"loss": 0.020128747448325157,
"overall_throughput": 41.23395474647752,
"lr": 2.4000000000000003e-06,
"cuda_mem_allocated": 24.419190883636475,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 20448,
"batch_size": 78,
"total_loss": 0.5765464305877686,
"gradnorm": 1.007487177848816,
"weight_norm": 433.04327392578125,
"timestamp": "2024-08-18T20:52:52.896686"
}
Per-token loss scaled by world size: 0.00020671900711022317
Per-token loss scaled by world size: 0.00019181812240276486
Per-token loss scaled by world size: 0.00029149872716516256
Per-token loss scaled by world size: 0.00032473698956891894
Per-token loss scaled by world size: 0.00032112447661347687
Per-token loss scaled by world size: 0.0003388051700312644
Per-token loss scaled by world size: 0.00025795208057388663
Epoch: 0, Step: 115, Rank: 3, loss = 0.9541117548942566
Epoch: 0, Step: 115, Rank: 1, loss = 0.6278446912765503
Epoch: 0, Step: 115, Rank: 0, loss = 0.6766171455383301
Epoch: 0, Step: 115, Rank: 6, loss = 1.062904715538025
Epoch: 0, Step: 115, Rank: 5, loss = 1.051080584526062
Epoch: 0, Step: 115, Rank: 7, loss = 0.8443094491958618
Epoch: 0, Step: 115, Rank: 4, loss = 1.1089516878128052
Per-token loss scaled by world size: 0.00011388205894036219
Epoch: 0, Step: 115, Rank: 2, loss = 0.3727502226829529
Epoch 0: 95%|█████████▌| 115/121 [04:53<00:15, 2.57s/it]
{
"epoch": 0,
"step": 115,
"rank": 0,
"loss": 0.6766171455383301,
"overall_throughput": 40.978208090508964,
"lr": 2.4000000000000003e-06,
"cuda_mem_allocated": 24.531991481781006,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 26185,
"batch_size": 95,
"total_loss": 0.8373212814331055,
"gradnorm": 1.007487177848816,
"weight_norm": 433.04327392578125,
"timestamp": "2024-08-18T20:52:55.477863"
}
Per-token loss scaled by world size: 0.00025488759274594486
Per-token loss scaled by world size: 0.00010854459105757996
Per-token loss scaled by world size: 0.00016505405073985457
Per-token loss scaled by world size: 0.00032808436662890017
Per-token loss scaled by world size: 0.00027797490474767983
Per-token loss scaled by world size: 0.00035925835254602134
Epoch: 0, Step: 116, Rank: 5, loss = 1.0426521301269531
Epoch: 0, Step: 116, Rank: 0, loss = 0.5245417952537537
Epoch: 0, Step: 116, Rank: 3, loss = 0.8100327849388123
Epoch: 0, Step: 116, Rank: 1, loss = 0.3449546992778778
Epoch: 0, Step: 116, Rank: 6, loss = 1.1417230367660522
Epoch: 0, Step: 116, Rank: 4, loss = 0.8834042549133301
Per-token loss scaled by world size: 6.462022429332137e-05
Per-token loss scaled by world size: 4.7339886805275455e-05
Epoch: 0, Step: 116, Rank: 2, loss = 0.15044616162776947
Epoch: 0, Step: 116, Rank: 7, loss = 0.20536306500434875
Epoch 0: 96%|█████████▌| 116/121 [04:56<00:12, 2.55s/it]
{
"epoch": 0,
"step": 116,
"rank": 0,
"loss": 0.5245417952537537,
"overall_throughput": 42.05718076221283,
"lr": 2.4000000000000003e-06,
"cuda_mem_allocated": 24.434387683868408,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 25424,
"batch_size": 77,
"total_loss": 0.6378897428512573,
"gradnorm": 1.007487177848816,
"weight_norm": 433.04327392578125,
"timestamp": "2024-08-18T20:52:57.991057"
}
Per-token loss scaled by world size: 0.00043317166273482144
Per-token loss scaled by world size: 0.0003517000295687467
Per-token loss scaled by world size: 0.0003901177551597357
Per-token loss scaled by world size: 0.00024814700009301305
Per-token loss scaled by world size: 0.0001685179740888998
Per-token loss scaled by world size: 2.624317403387977e-06
Per-token loss scaled by world size: 2.476824556651991e-05
Epoch: 0, Step: 117, Rank: 3, loss = 1.0443732738494873
Epoch: 0, Step: 117, Rank: 5, loss = 1.2863032817840576
Epoch: 0, Step: 117, Rank: 7, loss = 0.7368724942207336
Epoch: 0, Step: 117, Rank: 4, loss = 1.1584546566009521
Epoch: 0, Step: 117, Rank: 0, loss = 0.007792910560965538
Epoch: 0, Step: 117, Rank: 2, loss = 0.5004141330718994
Epoch: 0, Step: 117, Rank: 1, loss = 0.0735493078827858
Per-token loss scaled by world size: 0.00034316719393245876
Epoch: 0, Step: 117, Rank: 6, loss = 1.0190349817276
Epoch 0: 97%|█████████▋| 117/121 [04:58<00:10, 2.54s/it]
{
"epoch": 0,
"step": 117,
"rank": 0,
"loss": 0.007792910560965538,
"overall_throughput": 42.14977654824287,
"lr": 2.4000000000000003e-06,
"cuda_mem_allocated": 24.35035228729248,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 23756,
"batch_size": 76,
"total_loss": 0.7283493876457214,
"gradnorm": 1.007487177848816,
"weight_norm": 433.04327392578125,
"timestamp": "2024-08-18T20:53:00.498739"
}
Per-token loss scaled by world size: 0.00027391428011469543
Per-token loss scaled by world size: 0.00038771191611886024
Per-token loss scaled by world size: 0.0005671838880516589
Per-token loss scaled by world size: 6.881056378915673e-06
Per-token loss scaled by world size: 4.1250186768593267e-05
Per-token loss scaled by world size: 7.28774830349721e-05
Per-token loss scaled by world size: 0.00023806751414667815
Epoch: 0, Step: 118, Rank: 5, loss = 1.42512047290802
Epoch: 0, Step: 118, Rank: 0, loss = 0.01728951372206211
Epoch: 0, Step: 118, Rank: 3, loss = 0.9741746783256531
Epoch: 0, Step: 118, Rank: 4, loss = 0.6882438659667969
Epoch: 0, Step: 118, Rank: 2, loss = 0.18311378359794617
Epoch: 0, Step: 118, Rank: 1, loss = 0.10364624857902527
Epoch: 0, Step: 118, Rank: 7, loss = 0.5981743931770325
Per-token loss scaled by world size: 0.0005670187529176474
Epoch: 0, Step: 118, Rank: 6, loss = 1.4247055053710938
Epoch 0: 98%|█████████▊| 118/121 [05:01<00:07, 2.53s/it]
{
"epoch": 0,
"step": 118,
"rank": 0,
"loss": 0.01728951372206211,
"overall_throughput": 42.35613860527613,
"lr": 2.4000000000000003e-06,
"cuda_mem_allocated": 24.3255033493042,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 20101,
"batch_size": 64,
"total_loss": 0.6768085956573486,
"gradnorm": 1.007487177848816,
"weight_norm": 433.04327392578125,
"timestamp": "2024-08-18T20:53:03.034819"
}
Per-token loss scaled by world size: 0.0003004461759701371
Per-token loss scaled by world size: 0.0002987224142998457
Per-token loss scaled by world size: 0.0002618007711134851
Per-token loss scaled by world size: 0.0003349074686411768
Per-token loss scaled by world size: 3.861719960696064e-06
Per-token loss scaled by world size: 0.0002586914342828095
Per-token loss scaled by world size: 0.00010813030530698597
Epoch: 0, Step: 119, Rank: 0, loss = 0.012113732285797596
Epoch: 0, Step: 119, Rank: 2, loss = 0.821236252784729
Epoch: 0, Step: 119, Rank: 6, loss = 0.9424620866775513
Epoch: 0, Step: 119, Rank: 4, loss = 0.9370548725128174
Epoch: 0, Step: 119, Rank: 5, loss = 1.050562858581543
Epoch: 0, Step: 119, Rank: 7, loss = 0.8114826679229736
Epoch: 0, Step: 119, Rank: 1, loss = 0.3391912579536438
Per-token loss scaled by world size: 0.00019692791101988405
Epoch: 0, Step: 119, Rank: 3, loss = 0.6177382469177246
Epoch 0: 98%|█████████▊| 119/121 [05:03<00:05, 2.52s/it]
{
"epoch": 0,
"step": 119,
"rank": 0,
"loss": 0.012113732285797596,
"overall_throughput": 41.99486327647021,
"lr": 2.4000000000000003e-06,
"cuda_mem_allocated": 24.461923599243164,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 25095,
"batch_size": 73,
"total_loss": 0.691480278968811,
"gradnorm": 1.007487177848816,
"weight_norm": 433.04327392578125,
"timestamp": "2024-08-18T20:53:05.516499"
}
Per-token loss scaled by world size: 0.00044983928091824055
Per-token loss scaled by world size: 0.00034963246434926987
Per-token loss scaled by world size: 0.0002892126503866166
Per-token loss scaled by world size: 0.0003644507669378072
Per-token loss scaled by world size: 0.0004460133204702288
Per-token loss scaled by world size: 3.170213403791422e-06
Per-token loss scaled by world size: 2.0736099486384774e-06
Epoch: 0, Step: 120, Rank: 5, loss = 0.8975055813789368
Epoch: 0, Step: 120, Rank: 6, loss = 1.0983635187149048
Epoch: 0, Step: 120, Rank: 0, loss = 0.007807046640664339
Epoch: 0, Step: 120, Rank: 4, loss = 0.861013650894165
Epoch: 0, Step: 120, Rank: 2, loss = 0.7122223377227783
Epoch: 0, Step: 120, Rank: 3, loss = 1.1077854633331299
Epoch: 0, Step: 120, Rank: 1, loss = 0.005106523633003235
Per-token loss scaled by world size: 0.0004087797424290329
Epoch: 0, Step: 120, Rank: 7, loss = 1.0066711902618408
Epoch 0: 99%|█████████▉| 120/121 [05:06<00:02, 2.48s/it]
{
"epoch": 0,
"step": 120,
"rank": 0,
"loss": 0.007807046640664339,
"overall_throughput": 44.47716496365875,
"lr": 2.4000000000000003e-06,
"cuda_mem_allocated": 24.361114025115967,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 19701,
"batch_size": 75,
"total_loss": 0.7120593786239624,
"gradnorm": 1.007487177848816,
"weight_norm": 433.04327392578125,
"timestamp": "2024-08-18T20:53:07.898377"
}
Per-token loss scaled by world size: 0.00019174267072230577
Per-token loss scaled by world size: 0.0003345832519698888
Per-token loss scaled by world size: 0.0003169576812069863
Per-token loss scaled by world size: 0.0004366403736639768
Per-token loss scaled by world size: 0.00023642051382921636
Per-token loss scaled by world size: 1.6517016774741933e-05
Epoch: 0, Step: 121, Rank: 5, loss = 0.9575772881507874
Epoch: 0, Step: 121, Rank: 1, loss = 0.5487675070762634
Epoch: 0, Step: 121, Rank: 4, loss = 1.2496647834777832
Epoch: 0, Step: 121, Rank: 3, loss = 0.9071329236030579
Epoch: 0, Step: 121, Rank: 7, loss = 0.6766355037689209
Per-token loss scaled by world size: 0.00013164509437046945
Epoch: 0, Step: 121, Rank: 0, loss = 0.04727170243859291
Epoch: 0, Step: 121, Rank: 2, loss = 0.3767682611942291
Per-token loss scaled by world size: 0.0003866745682898909
Epoch: 0, Step: 121, Rank: 6, loss = 1.106662631034851
Epoch 0: 100%|██████████| 121/121 [05:08<00:00, 2.50s/it]
{
"epoch": 0,
"step": 121,
"rank": 0,
"loss": 0.04727170243859291,
"overall_throughput": 41.60628512913175,
"lr": 2.4000000000000003e-06,
"cuda_mem_allocated": 24.30147409439087,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 22896,
"batch_size": 102,
"total_loss": 0.7338100075721741,
"gradnorm": 1.007487177848816,
"weight_norm": 433.04327392578125,
"timestamp": "2024-08-18T20:53:10.441395"
}
Saving model in huggingface format at samples_seen: 12688
Model saved in /var/mnt/inststg1/instructlab/phasedbasedir/phase2/checkpoints/hf_format/samples_12688
[20:53:29] INFO saving took 18.56430721282959 seconds utils.py:611
Epoch 0: 100%|██████████| 121/121 [05:27<00:00, 2.70s/it]
total tokens: 6858 num samples: 3 num padding tokens: 1697 - rank: 1 max len: 2286 min len: 1282 avg len: 1720.3333333333333 num_loss_counted_tokens: 1935
total tokens: 7868 num samples: 7 num padding tokens: 1154 - rank: 3 max len: 1124 min len: 848 avg len: 959.1428571428571 num_loss_counted_tokens: 3674
total tokens: 6932 num samples: 4 num padding tokens: 207 - rank: 1 max len: 1733 min len: 1622 avg len: 1681.25 num_loss_counted_tokens: 1829
total tokens: 7551 num samples: 3 num padding tokens: 1038 - rank: 1 max len: 2517 min len: 1773 avg len: 2171.0 num_loss_counted_tokens: 344
total tokens: 7515 num samples: 3 num padding tokens: 578 - rank: 1 max len: 2505 min len: 2066 avg len: 2312.3333333333335 num_loss_counted_tokens: 990
total tokens: 7928 num samples: 4 num padding tokens: 1119 - rank: 1 max len: 1982 min len: 1503 avg len: 1702.25 num_loss_counted_tokens: 604
total tokens: 6279 num samples: 3 num padding tokens: 554 - rank: 1 max len: 2093 min len: 1575 avg len: 1908.3333333333333 num_loss_counted_tokens: 1272
total tokens: 6693 num samples: 3 num padding tokens: 401 - rank: 1 max len: 2231 min len: 1907 avg len: 2097.3333333333335 num_loss_counted_tokens: 402
total tokens: 6963 num samples: 3 num padding tokens: 692 - rank: 1 max len: 2321 min len: 1787 avg len: 2090.3333333333335 num_loss_counted_tokens: 386
total tokens: 6988 num samples: 4 num padding tokens: 1322 - rank: 1 max len: 1747 min len: 1233 avg len: 1416.5 num_loss_counted_tokens: 2279
total tokens: 6990 num samples: 5 num padding tokens: 1059 - rank: 2 max len: 1398 min len: 1070 avg len: 1186.2 num_loss_counted_tokens: 4472
total tokens: 7260 num samples: 6 num padding tokens: 870 - rank: 2 max len: 1210 min len: 983 avg len: 1065.0 num_loss_counted_tokens: 3195
total tokens: 7796 num samples: 4 num padding tokens: 966 - rank: 1 max len: 1949 min len: 1423 avg len: 1707.5 num_loss_counted_tokens: 4297
total tokens: 7130 num samples: 5 num padding tokens: 914 - rank: 2 max len: 1426 min len: 1115 avg len: 1243.2 num_loss_counted_tokens: 4551
total tokens: 6874 num samples: 2 num padding tokens: 795 - rank: 1 max len: 3437 min len: 2642 avg len: 3039.5 num_loss_counted_tokens: 158
total tokens: 8064 num samples: 12 num padding tokens: 1509 - rank: 5 max len: 672 min len: 468 avg len: 546.25 num_loss_counted_tokens: 3627
total tokens: 7714 num samples: 7 num padding tokens: 1215 - rank: 3 max len: 1102 min len: 813 avg len: 928.4285714285714 num_loss_counted_tokens: 4827
total tokens: 7722 num samples: 13 num padding tokens: 1136 - rank: 5 max len: 594 min len: 399 avg len: 506.61538461538464 num_loss_counted_tokens: 4746
total tokens: 7575 num samples: 5 num padding tokens: 945 - rank: 2 max len: 1515 min len: 1189 avg len: 1326.0 num_loss_counted_tokens: 2489
total tokens: 7609 num samples: 7 num padding tokens: 1227 - rank: 3 max len: 1087 min len: 823 avg len: 911.7142857142857 num_loss_counted_tokens: 1419
total tokens: 7942 num samples: 11 num padding tokens: 1142 - rank: 5 max len: 722 min len: 517 avg len: 618.1818181818181 num_loss_counted_tokens: 4497
total tokens: 8108 num samples: 4 num padding tokens: 1070 - rank: 2 max len: 2027 min len: 1386 avg len: 1759.5 num_loss_counted_tokens: 2087
total tokens: 6940 num samples: 5 num padding tokens: 950 - rank: 2 max len: 1388 min len: 1065 avg len: 1198.0 num_loss_counted_tokens: 3642
total tokens: 7680 num samples: 5 num padding tokens: 844 - rank: 2 max len: 1536 min len: 1208 avg len: 1367.2 num_loss_counted_tokens: 3999
total tokens: 7700 num samples: 14 num padding tokens: 1915 - rank: 5 max len: 550 min len: 316 avg len: 413.2142857142857 num_loss_counted_tokens: 3693
total tokens: 7966 num samples: 7 num padding tokens: 799 - rank: 3 max len: 1138 min len: 946 avg len: 1023.8571428571429 num_loss_counted_tokens: 5133
total tokens: 7566 num samples: 13 num padding tokens: 1066 - rank: 5 max len: 582 min len: 447 avg len: 500.0 num_loss_counted_tokens: 4009
total tokens: 7344 num samples: 4 num padding tokens: 439 - rank: 2 max len: 1836 min len: 1616 avg len: 1726.25 num_loss_counted_tokens: 888
total tokens: 7696 num samples: 8 num padding tokens: 1050 - rank: 3 max len: 962 min len: 715 avg len: 830.75 num_loss_counted_tokens: 5818
total tokens: 8076 num samples: 12 num padding tokens: 901 - rank: 5 max len: 673 min len: 477 avg len: 597.9166666666666 num_loss_counted_tokens: 3764
total tokens: 7953 num samples: 11 num padding tokens: 496 - rank: 5 max len: 723 min len: 589 avg len: 677.9090909090909 num_loss_counted_tokens: 4581
total tokens: 7320 num samples: 4 num padding tokens: 796 - rank: 1 max len: 1830 min len: 1451 avg len: 1631.0 num_loss_counted_tokens: 2250
total tokens: 7616 num samples: 8 num padding tokens: 348 - rank: 3 max len: 952 min len: 870 avg len: 908.5 num_loss_counted_tokens: 5221
total tokens: 7987 num samples: 7 num padding tokens: 817 - rank: 3 max len: 1141 min len: 951 avg len: 1024.2857142857142 num_loss_counted_tokens: 3832
total tokens: 7826 num samples: 7 num padding tokens: 953 - rank: 3 max len: 1118 min len: 839 avg len: 981.8571428571429 num_loss_counted_tokens: 4081
total tokens: 7764 num samples: 12 num padding tokens: 1322 - rank: 5 max len: 647 min len: 464 avg len: 536.8333333333334 num_loss_counted_tokens: 3693
total tokens: 7224 num samples: 8 num padding tokens: 547 - rank: 3 max len: 903 min len: 776 avg len: 834.625 num_loss_counted_tokens: 3728
total tokens: 7896 num samples: 3 num padding tokens: 906 - rank: 1 max len: 2632 min len: 1930 avg len: 2330.0 num_loss_counted_tokens: 1083
total tokens: 7769 num samples: 17 num padding tokens: 1525 - rank: 6 max len: 457 min len: 273 avg len: 367.29411764705884 num_loss_counted_tokens: 3637
total tokens: 7816 num samples: 4 num padding tokens: 1243 - rank: 1 max len: 1954 min len: 1404 avg len: 1643.25 num_loss_counted_tokens: 2194
total tokens: 7806 num samples: 6 num padding tokens: 497 - rank: 2 max len: 1301 min len: 1097 avg len: 1218.1666666666667 num_loss_counted_tokens: 3744
total tokens: 6950 num samples: 5 num padding tokens: 1216 - rank: 3 max len: 1390 min len: 974 avg len: 1146.8 num_loss_counted_tokens: 3368
total tokens: 7455 num samples: 3 num padding tokens: 849 - rank: 2 max len: 2485 min len: 1841 avg len: 2202.0 num_loss_counted_tokens: 231
total tokens: 7536 num samples: 6 num padding tokens: 917 - rank: 2 max len: 1256 min len: 968 avg len: 1103.1666666666667 num_loss_counted_tokens: 3262
total tokens: 7806 num samples: 6 num padding tokens: 1329 - rank: 2 max len: 1301 min len: 978 avg len: 1079.5 num_loss_counted_tokens: 3537
total tokens: 7709 num samples: 13 num padding tokens: 1295 - rank: 5 max len: 593 min len: 373 avg len: 493.38461538461536 num_loss_counted_tokens: 4385
total tokens: 7220 num samples: 5 num padding tokens: 746 - rank: 1 max len: 1444 min len: 1177 avg len: 1294.8 num_loss_counted_tokens: 3357
total tokens: 6820 num samples: 4 num padding tokens: 1422 - rank: 3 max len: 1705 min len: 1072 avg len: 1349.5 num_loss_counted_tokens: 1716
total tokens: 8019 num samples: 9 num padding tokens: 450 - rank: 3 max len: 891 min len: 790 avg len: 841.0 num_loss_counted_tokens: 4323
total tokens: 7668 num samples: 4 num padding tokens: 969 - rank: 2 max len: 1917 min len: 1447 avg len: 1674.75 num_loss_counted_tokens: 1838
total tokens: 7693 num samples: 7 num padding tokens: 403 - rank: 2 max len: 1099 min len: 978 avg len: 1041.4285714285713 num_loss_counted_tokens: 4254
total tokens: 7525 num samples: 7 num padding tokens: 708 - rank: 3 max len: 1075 min len: 867 avg len: 973.8571428571429 num_loss_counted_tokens: 4837
total tokens: 7740 num samples: 5 num padding tokens: 1159 - rank: 1 max len: 1548 min len: 1219 avg len: 1316.2 num_loss_counted_tokens: 2345
total tokens: 7843 num samples: 11 num padding tokens: 990 - rank: 5 max len: 713 min len: 541 avg len: 623.0 num_loss_counted_tokens: 5313
total tokens: 5614 num samples: 2 num padding tokens: 423 - rank: 0 max len: 2807 min len: 2384 avg len: 2595.5 num_loss_counted_tokens: 193
total tokens: 8037 num samples: 3 num padding tokens: 1697 - rank: 1 max len: 2679 min len: 1665 avg len: 2113.3333333333335 num_loss_counted_tokens: 1007
total tokens: 8022 num samples: 14 num padding tokens: 1252 - rank: 5 max len: 573 min len: 409 avg len: 483.57142857142856 num_loss_counted_tokens: 4146
total tokens: 7945 num samples: 7 num padding tokens: 727 - rank: 2 max len: 1135 min len: 918 avg len: 1031.142857142857 num_loss_counted_tokens: 4527
total tokens: 7891 num samples: 13 num padding tokens: 1039 - rank: 5 max len: 607 min len: 439 avg len: 527.0769230769231 num_loss_counted_tokens: 4697
total tokens: 7472 num samples: 2 num padding tokens: 1064 - rank: 0 max len: 3736 min len: 2672 avg len: 3204.0 num_loss_counted_tokens: 186
total tokens: 6628 num samples: 2 num padding tokens: 541 - rank: 0 max len: 3314 min len: 2773 avg len: 3043.5 num_loss_counted_tokens: 178
total tokens: 7560 num samples: 10 num padding tokens: 554 - rank: 5 max len: 756 min len: 656 avg len: 700.6 num_loss_counted_tokens: 4201
total tokens: 8021 num samples: 13 num padding tokens: 974 - rank: 5 max len: 617 min len: 468 avg len: 542.0769230769231 num_loss_counted_tokens: 4392
total tokens: 4062 num samples: 1 num padding tokens: 0 - rank: 0 max len: 4062 min len: 4062 avg len: 4062.0 num_loss_counted_tokens: 85
total tokens: 7650 num samples: 9 num padding tokens: 478 - rank: 3 max len: 850 min len: 751 avg len: 796.8888888888889 num_loss_counted_tokens: 4488
total tokens: 8041 num samples: 11 num padding tokens: 905 - rank: 5 max len: 731 min len: 567 avg len: 648.7272727272727 num_loss_counted_tokens: 4931
total tokens: 6344 num samples: 2 num padding tokens: 190 - rank: 0 max len: 3172 min len: 2982 avg len: 3077.0 num_loss_counted_tokens: 709
total tokens: 6178 num samples: 2 num padding tokens: 302 - rank: 0 max len: 3089 min len: 2787 avg len: 2938.0 num_loss_counted_tokens: 161
total tokens: 7370 num samples: 5 num padding tokens: 663 - rank: 2 max len: 1474 min len: 1191 avg len: 1341.4 num_loss_counted_tokens: 6093
total tokens: 7680 num samples: 8 num padding tokens: 596 - rank: 3 max len: 960 min len: 833 avg len: 885.5 num_loss_counted_tokens: 5258
total tokens: 7776 num samples: 12 num padding tokens: 1190 - rank: 5 max len: 648 min len: 476 avg len: 548.8333333333334 num_loss_counted_tokens: 4105
total tokens: 7803 num samples: 17 num padding tokens: 1519 - rank: 6 max len: 459 min len: 281 avg len: 369.6470588235294 num_loss_counted_tokens: 4044
total tokens: 7980 num samples: 19 num padding tokens: 2139 - rank: 6 max len: 420 min len: 236 avg len: 307.42105263157896 num_loss_counted_tokens: 3018
total tokens: 8016 num samples: 16 num padding tokens: 1816 - rank: 6 max len: 501 min len: 300 avg len: 387.5 num_loss_counted_tokens: 3555
total tokens: 8064 num samples: 18 num padding tokens: 1914 - rank: 6 max len: 448 min len: 254 avg len: 341.6666666666667 num_loss_counted_tokens: 3451
total tokens: 6752 num samples: 2 num padding tokens: 43 - rank: 0 max len: 3376 min len: 3333 avg len: 3354.5 num_loss_counted_tokens: 441
total tokens: 8010 num samples: 15 num padding tokens: 2334 - rank: 6 max len: 534 min len: 273 avg len: 378.4 num_loss_counted_tokens: 3218
total tokens: 7800 num samples: 20 num padding tokens: 1809 - rank: 6 max len: 390 min len: 229 avg len: 299.55 num_loss_counted_tokens: 3534
total tokens: 8112 num samples: 26 num padding tokens: 2416 - rank: 6 max len: 312 min len: 150 avg len: 219.07692307692307 num_loss_counted_tokens: 2759
total tokens: 7189 num samples: 7 num padding tokens: 616 - rank: 3 max len: 1027 min len: 850 avg len: 939.0 num_loss_counted_tokens: 4467
total tokens: 5632 num samples: 2 num padding tokens: 674 - rank: 0 max len: 2816 min len: 2142 avg len: 2479.0 num_loss_counted_tokens: 167
total tokens: 7192 num samples: 2 num padding tokens: 1145 - rank: 0 max len: 3596 min len: 2451 avg len: 3023.5 num_loss_counted_tokens: 187
total tokens: 5518 num samples: 2 num padding tokens: 53 - rank: 0 max len: 2759 min len: 2706 avg len: 2732.5 num_loss_counted_tokens: 276
total tokens: 6324 num samples: 2 num padding tokens: 282 - rank: 0 max len: 3162 min len: 2880 avg len: 3021.0 num_loss_counted_tokens: 179
total tokens: 8109 num samples: 17 num padding tokens: 1877 - rank: 6 max len: 477 min len: 301 avg len: 366.5882352941176 num_loss_counted_tokens: 3402
total tokens: 7284 num samples: 3 num padding tokens: 841 - rank: 0 max len: 2428 min len: 1996 avg len: 2147.6666666666665 num_loss_counted_tokens: 2016
total tokens: 6622 num samples: 2 num padding tokens: 51 - rank: 0 max len: 3311 min len: 3260 avg len: 3285.5 num_loss_counted_tokens: 220
total tokens: 7606 num samples: 2 num padding tokens: 284 - rank: 0 max len: 3803 min len: 3519 avg len: 3661.0 num_loss_counted_tokens: 239
total tokens: 8021 num samples: 13 num padding tokens: 1426 - rank: 5 max len: 617 min len: 421 avg len: 507.3076923076923 num_loss_counted_tokens: 4445
total tokens: 7548 num samples: 12 num padding tokens: 1418 - rank: 6 max len: 629 min len: 356 avg len: 510.8333333333333 num_loss_counted_tokens: 4627
total tokens: 7623 num samples: 9 num padding tokens: 977 - rank: 4 max len: 847 min len: 676 avg len: 738.4444444444445 num_loss_counted_tokens: 4362
total tokens: 920 num samples: 8 num padding tokens: 143 - rank: 7 max len: 115 min len: 80 avg len: 97.125 num_loss_counted_tokens: 174
total tokens: 7865 num samples: 11 num padding tokens: 678 - rank: 4 max len: 715 min len: 597 avg len: 653.3636363636364 num_loss_counted_tokens: 4917
total tokens: 7890 num samples: 30 num padding tokens: 2636 - rank: 7 max len: 263 min len: 81 avg len: 175.13333333333333 num_loss_counted_tokens: 2257
total tokens: 8060 num samples: 20 num padding tokens: 1821 - rank: 6 max len: 403 min len: 229 avg len: 311.95 num_loss_counted_tokens: 3851 | |
total tokens: 7644 num samples: 14 num padding tokens: 1953 - rank: 6 max len: 546 min len: 313 avg len: 406.5 num_loss_counted_tokens: 3666 | |
total tokens: 7568 num samples: 8 num padding tokens: 610 - rank: 3 max len: 946 min len: 809 avg len: 869.75 num_loss_counted_tokens: 4321 | |
total tokens: 7923 num samples: 19 num padding tokens: 1491 - rank: 6 max len: 417 min len: 251 avg len: 338.5263157894737 num_loss_counted_tokens: 3342 | |
total tokens: 7767 num samples: 9 num padding tokens: 568 - rank: 4 max len: 863 min len: 734 avg len: 799.8888888888889 num_loss_counted_tokens: 4720 | |
total tokens: 6913 num samples: 31 num padding tokens: 1814 - rank: 7 max len: 223 min len: 81 avg len: 164.48387096774192 num_loss_counted_tokens: 2029 | |
total tokens: 8041 num samples: 17 num padding tokens: 2270 - rank: 6 max len: 473 min len: 266 avg len: 339.47058823529414 num_loss_counted_tokens: 3062 | |
total tokens: 7710 num samples: 3 num padding tokens: 1745 - rank: 0 max len: 2570 min len: 1545 avg len: 1988.3333333333333 num_loss_counted_tokens: 932 | |
total tokens: 8109 num samples: 9 num padding tokens: 1178 - rank: 4 max len: 901 min len: 683 avg len: 770.1111111111111 num_loss_counted_tokens: 4382 | |
total tokens: 7461 num samples: 9 num padding tokens: 1078 - rank: 4 max len: 829 min len: 583 avg len: 709.2222222222222 num_loss_counted_tokens: 4008 | |
total tokens: 8028 num samples: 18 num padding tokens: 1647 - rank: 6 max len: 446 min len: 269 avg len: 354.5 num_loss_counted_tokens: 3480 | |
total tokens: 8010 num samples: 10 num padding tokens: 847 - rank: 4 max len: 801 min len: 646 avg len: 716.3 num_loss_counted_tokens: 5137 | |
total tokens: 7812 num samples: 28 num padding tokens: 3453 - rank: 7 max len: 279 min len: 77 avg len: 155.67857142857142 num_loss_counted_tokens: 1761 | |
total tokens: 7791 num samples: 21 num padding tokens: 1808 - rank: 6 max len: 371 min len: 232 avg len: 284.9047619047619 num_loss_counted_tokens: 2895 | |
total tokens: 6410 num samples: 2 num padding tokens: 76 - rank: 0 max len: 3205 min len: 3129 avg len: 3167.0 num_loss_counted_tokens: 169 | |
total tokens: 7540 num samples: 26 num padding tokens: 3128 - rank: 7 max len: 290 min len: 81 avg len: 169.69230769230768 num_loss_counted_tokens: 1955 | |
total tokens: 7980 num samples: 35 num padding tokens: 2701 - rank: 7 max len: 228 min len: 76 avg len: 150.82857142857142 num_loss_counted_tokens: 1955 | |
total tokens: 7116 num samples: 6 num padding tokens: 825 - rank: 2 max len: 1186 min len: 937 avg len: 1048.5 num_loss_counted_tokens: 3033 | |
total tokens: 7714 num samples: 19 num padding tokens: 1776 - rank: 6 max len: 406 min len: 218 avg len: 312.5263157894737 num_loss_counted_tokens: 3452 | |
total tokens: 6944 num samples: 28 num padding tokens: 2075 - rank: 7 max len: 248 min len: 85 avg len: 173.89285714285714 num_loss_counted_tokens: 2194 | |
total tokens: 6546 num samples: 3 num padding tokens: 491 - rank: 0 max len: 2182 min len: 1721 avg len: 2018.3333333333333 num_loss_counted_tokens: 1774 | |
total tokens: 8090 num samples: 10 num padding tokens: 775 - rank: 4 max len: 809 min len: 668 avg len: 731.5 num_loss_counted_tokens: 4241 | |
total tokens: 6475 num samples: 25 num padding tokens: 2566 - rank: 7 max len: 259 min len: 80 avg len: 156.36 num_loss_counted_tokens: 1562 | |
total tokens: 7461 num samples: 9 num padding tokens: 615 - rank: 4 max len: 829 min len: 730 avg len: 760.6666666666666 num_loss_counted_tokens: 5504 | |
total tokens: 6972 num samples: 28 num padding tokens: 2290 - rank: 7 max len: 249 min len: 78 avg len: 167.21428571428572 num_loss_counted_tokens: 2098 | |
total tokens: 7434 num samples: 9 num padding tokens: 997 - rank: 4 max len: 826 min len: 634 avg len: 715.2222222222222 num_loss_counted_tokens: 4684 | |
total tokens: 7830 num samples: 9 num padding tokens: 687 - rank: 4 max len: 870 min len: 732 avg len: 793.6666666666666 num_loss_counted_tokens: 4598 | |
total tokens: 7868 num samples: 28 num padding tokens: 3286 - rank: 7 max len: 281 min len: 71 avg len: 163.64285714285714 num_loss_counted_tokens: 1693 | |
total tokens: 7700 num samples: 10 num padding tokens: 670 - rank: 4 max len: 770 min len: 672 avg len: 703.0 num_loss_counted_tokens: 3844 | |
total tokens: 8019 num samples: 11 num padding tokens: 675 - rank: 4 max len: 729 min len: 607 avg len: 667.6363636363636 num_loss_counted_tokens: 5829 | |
total tokens: 7824 num samples: 8 num padding tokens: 1092 - rank: 4 max len: 978 min len: 762 avg len: 841.5 num_loss_counted_tokens: 3700 | |
total tokens: 7890 num samples: 30 num padding tokens: 2695 - rank: 7 max len: 263 min len: 77 avg len: 173.16666666666666 num_loss_counted_tokens: 2423 | |
total tokens: 7668 num samples: 9 num padding tokens: 739 - rank: 4 max len: 852 min len: 673 avg len: 769.8888888888889 num_loss_counted_tokens: 3730 | |
total tokens: 7514 num samples: 26 num padding tokens: 3137 - rank: 7 max len: 289 min len: 81 avg len: 168.34615384615384 num_loss_counted_tokens: 1545 | |
total tokens: 7336 num samples: 28 num padding tokens: 3103 - rank: 7 max len: 262 min len: 76 avg len: 151.17857142857142 num_loss_counted_tokens: 1734 | |
total tokens: 7774 num samples: 23 num padding tokens: 3246 - rank: 7 max len: 338 min len: 79 avg len: 196.8695652173913 num_loss_counted_tokens: 2197 | |
total tokens: 8050 num samples: 35 num padding tokens: 2271 - rank: 7 max len: 230 min len: 71 avg len: 165.11428571428573 num_loss_counted_tokens: 2692 | |
total tokens: 7960 num samples: 10 num padding tokens: 1262 - rank: 4 max len: 796 min len: 576 avg len: 669.8 num_loss_counted_tokens: 5376 | |
total tokens: 7576 num samples: 8 num padding tokens: 1064 - rank: 4 max len: 947 min len: 736 avg len: 814.0 num_loss_counted_tokens: 2965 | |
total tokens: 4393 num samples: 23 num padding tokens: 1496 - rank: 7 max len: 191 min len: 75 avg len: 125.95652173913044 num_loss_counted_tokens: 1010 | |
total tokens: 7945 num samples: 35 num padding tokens: 2723 - rank: 7 max len: 227 min len: 78 avg len: 149.2 num_loss_counted_tokens: 2193 | |
total tokens: 7760 num samples: 10 num padding tokens: 746 - rank: 4 max len: 776 min len: 627 avg len: 701.4 num_loss_counted_tokens: 4720 | |
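Each `total tokens:` record above is consistent with a padded layout in which every sample in a rank's micro-batch is padded to the longest one. In that layout `total tokens` equals `num samples * max len`, and `num padding tokens` equals `total tokens - num samples * avg len`. The first record shows both: 2 * 3172 = 6344, and 6344 - 2 * 3077.0 = 190. The Python sketch below parses these records and checks both identities. The regex and helper names are illustrative and are not part of instructlab. It uses `finditer` because concurrent ranks sometimes print two records on one line in the raw output.

```python
import re

# Illustrative parser for the per-rank batch statistics printed in this log.
RECORD_RE = re.compile(
    r"total tokens: (?P<total>\d+) num samples: (?P<samples>\d+) "
    r"num padding tokens: (?P<padding>\d+) - rank: (?P<rank>\d+) "
    r"max len: (?P<max_len>\d+) min len: (?P<min_len>\d+) "
    r"avg len: (?P<avg_len>[\d.]+) num_loss_counted_tokens: (?P<counted>\d+)"
)


def parse_records(text: str) -> list[dict]:
    """Return one dict per 'total tokens:' record, including records sharing a line."""
    records = []
    for match in RECORD_RE.finditer(text):
        fields = match.groupdict()
        record = {key: int(value) for key, value in fields.items() if key != "avg_len"}
        record["avg_len"] = float(fields["avg_len"])
        records.append(record)
    return records


def is_consistent_padded_batch(record: dict) -> bool:
    """Check total == samples * max_len and padding == total - samples * avg_len."""
    samples = record["samples"]
    real_tokens = round(record["avg_len"] * samples)  # avg_len is printed as a float
    return (
        record["total"] == samples * record["max_len"]
        and record["padding"] == record["total"] - real_tokens
    )


if __name__ == "__main__":
    line = (
        "total tokens: 6344 num samples: 2 num padding tokens: 190 - rank: 0 "
        "max len: 3172 min len: 2982 avg len: 3077.0 num_loss_counted_tokens: 709"
    )
    for rec in parse_records(line):
        # rank, fraction of the micro-batch that is padding, identity check
        print(rec["rank"], rec["padding"] / rec["total"], is_consistent_padded_batch(rec))
```

Running `parse_records` over the whole `nohup.out` would give per-rank padding fractions. Rank 7, which receives the shortest samples, pads roughly a third of its tokens in the records shown here.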
Per-token loss scaled by world size: 0.0004431476700119674 | |
Per-token loss scaled by world size: 0.0004245223826728761 | |
Per-token loss scaled by world size: 0.0004812271217815578 | |
Per-token loss scaled by world size: 0.0004481318756006658 | |
Per-token loss scaled by world size: 5.5284708651015535e-06 | |
Per-token loss scaled by world size: 0.0003835844690911472 | |
Epoch: 1, Step: 122, Rank: 0, loss = 0.014387845993041992 | |
Per-token loss scaled by world size: 3.5970988392364234e-05 | |
Epoch: 1, Step: 122, Rank: 5, loss = 1.1048195362091064 | |
Epoch: 1, Step: 122, Rank: 6, loss = 1.2523936033248901 | |
Epoch: 1, Step: 122, Rank: 3, loss = 0.9982785582542419 | |
Epoch: 1, Step: 122, Rank: 4, loss = 1.1532918214797974 | |
Epoch: 1, Step: 122, Rank: 7, loss = 1.166263222694397 | |
Epoch: 1, Step: 122, Rank: 1, loss = 0.09361449629068375 | |
Per-token loss scaled by world size: 0.00016121947555802763 | |
Epoch: 1, Step: 122, Rank: 2, loss = 0.41957369446754456 | |
total tokens: 7389 num samples: 9 num padding tokens: 463 - rank: 4 max len: 821 min len: 731 avg len: 769.5555555555555 num_loss_counted_tokens: 4484 | |
total tokens: 7920 num samples: 4 num padding tokens: 1041 - rank: 1 max len: 1980 min len: 1501 avg len: 1719.75 num_loss_counted_tokens: 3030 | |
total tokens: 7557 num samples: 11 num padding tokens: 1175 - rank: 5 max len: 687 min len: 471 avg len: 580.1818181818181 num_loss_counted_tokens: 3537 | |
{ | |
"epoch": 1, | |
"step": 122, | |
"rank": 0, | |
"loss": 0.014387845993041992, | |
"overall_throughput": 41.10590635951548, | |
"lr": 2.4000000000000003e-06, | |
"cuda_mem_allocated": 24.46029806137085, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 20820, | |
"batch_size": 84, | |
"total_loss": 0.7753278613090515, | |
"gradnorm": 1.007487177848816, | |
"weight_norm": 433.04327392578125, | |
"timestamp": "2024-08-18T20:53:32.058777" | |
} | |
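The per-rank `loss` values for step 122 can be reproduced from the unlabelled "Per-token loss scaled by world size" lines. After pairing the two sets of numbers by value, each logged loss equals its per-token value multiplied by `num_loss_counted_tokens / world_size` (20820 / 8 = 2602.5). The `total_loss` field equals the mean of the eight rank losses. This relationship is inferred from the printed numbers, not taken from the instructlab training source. The Python check below assumes a world size of 8, matching `--gpus 8`, and uses a rank pairing reconstructed by matching values.

```python
import math

# Assumption: world size is 8, matching the --gpus 8 flag used for this run.
WORLD_SIZE = 8
# "num_loss_counted_tokens" from the step-122 JSON summary above.
NUM_LOSS_COUNTED_TOKENS = 20820

# rank -> (per-token loss scaled by world size, logged loss) for step 122.
# The per-token lines carry no rank label, so this pairing was reconstructed by value.
STEP_122 = {
    0: (5.5284708651015535e-06, 0.014387845993041992),
    1: (3.5970988392364234e-05, 0.09361449629068375),
    2: (0.00016121947555802763, 0.41957369446754456),
    3: (0.0003835844690911472, 0.9982785582542419),
    4: (0.0004431476700119674, 1.1532918214797974),
    5: (0.0004245223826728761, 1.1048195362091064),
    6: (0.0004812271217815578, 1.2523936033248901),
    7: (0.0004481318756006658, 1.166263222694397),
}


def rank_loss(per_token_scaled: float) -> float:
    """Recompute a rank's logged loss from its per-token, world-size-scaled value."""
    return per_token_scaled * NUM_LOSS_COUNTED_TOKENS / WORLD_SIZE


def mean_loss(losses) -> float:
    """Average of the per-rank losses; matches the "total_loss" field for this step."""
    losses = list(losses)
    return math.fsum(losses) / len(losses)


if __name__ == "__main__":
    for rank, (scaled, logged) in sorted(STEP_122.items()):
        print(f"rank {rank}: recomputed {rank_loss(scaled):.6f}, logged {logged:.6f}")
    print(f"total_loss: {mean_loss(v[1] for v in STEP_122.values()):.6f}")
```

The logged losses are float32 values, so the recomputed figures agree to about seven significant digits rather than exactly.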
total tokens: 6208 num samples: 2 num padding tokens: 1070 - rank: 0 max len: 3104 min len: 2034 avg len: 2569.0 num_loss_counted_tokens: 154 | |
total tokens: 7758 num samples: 6 num padding tokens: 429 - rank: 2 max len: 1293 min len: 1127 avg len: 1221.5 num_loss_counted_tokens: 4122 | |
total tokens: 7641 num samples: 27 num padding tokens: 2718 - rank: 7 max len: 283 min len: 90 avg len: 182.33333333333334 num_loss_counted_tokens: 2178 | |
total tokens: 7786 num samples: 17 num padding tokens: 1303 - rank: 6 max len: 458 min len: 306 avg len: 381.3529411764706 num_loss_counted_tokens: 3550 | |
total tokens: 7714 num samples: 7 num padding tokens: 1069 - rank: 3 max len: 1102 min len: 863 avg len: 949.2857142857143 num_loss_counted_tokens: 3857 | |
Per-token loss scaled by world size: 0.00028338973061181605 | |
Per-token loss scaled by world size: 0.00028131416183896363 | |
Per-token loss scaled by world size: 0.00031643020338378847 | |
Per-token loss scaled by world size: 2.5795485271373764e-05 | |
Per-token loss scaled by world size: 0.0003353776701260358 | |
Per-token loss scaled by world size: 3.061828238060116e-06 | |
Epoch: 1, Step: 123, Rank: 2, loss = 0.9252774715423584 | |
Epoch: 1, Step: 123, Rank: 6, loss = 0.9321042895317078 | |
Epoch: 1, Step: 123, Rank: 4, loss = 1.0407785177230835 | |
Per-token loss scaled by world size: 0.00020596390822902322 | |
Epoch: 1, Step: 123, Rank: 1, loss = 0.08484457433223724 | |
Epoch: 1, Step: 123, Rank: 0, loss = 0.010070735588669777 | |
Epoch: 1, Step: 123, Rank: 5, loss = 1.1030991077423096 | |
Per-token loss scaled by world size: 0.00040186592377722263 | |
Epoch: 1, Step: 123, Rank: 3, loss = 1.3217872381210327 | |
Epoch: 1, Step: 123, Rank: 7, loss = 0.6774410605430603 | |
total tokens: 7158 num samples: 3 num padding tokens: 1822 - rank: 1 max len: 2386 min len: 1371 avg len: 1778.6666666666667 num_loss_counted_tokens: 859 | |
total tokens: 8090 num samples: 10 num padding tokens: 658 - rank: 4 max len: 809 min len: 681 avg len: 743.2 num_loss_counted_tokens: 4692 | |
{ | |
"epoch": 1, | |
"step": 123, | |
"rank": 0, | |
"loss": 0.010070735588669777, | |
"overall_throughput": 41.52344953508732, | |
"lr": 2.4000000000000003e-06, | |
"cuda_mem_allocated": 24.238781452178955, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 26313, | |
"batch_size": 94, | |
"total_loss": 0.7619253396987915, | |
"gradnorm": 1.007487177848816, | |
"weight_norm": 433.04327392578125, | |
"timestamp": "2024-08-18T20:53:34.606466" | |
} | |
total tokens: 7752 num samples: 8 num padding tokens: 561 - rank: 3 max len: 969 min len: 810 avg len: 898.875 num_loss_counted_tokens: 4165 | |
total tokens: 8109 num samples: 17 num padding tokens: 1626 - rank: 6 max len: 477 min len: 291 avg len: 381.3529411764706 num_loss_counted_tokens: 4491 | |
total tokens: 7248 num samples: 2 num padding tokens: 588 - rank: 0 max len: 3624 min len: 3036 avg len: 3330.0 num_loss_counted_tokens: 185 | |
total tokens: 7836 num samples: 6 num padding tokens: 921 - rank: 2 max len: 1306 min len: 997 avg len: 1152.5 num_loss_counted_tokens: 3095 | |
total tokens: 7480 num samples: 11 num padding tokens: 811 - rank: 5 max len: 680 min len: 536 avg len: 606.2727272727273 num_loss_counted_tokens: 4943 | |
total tokens: 7614 num samples: 27 num padding tokens: 2676 - rank: 7 max len: 282 min len: 88 avg len: 182.88888888888889 num_loss_counted_tokens: 2334 | |
Per-token loss scaled by world size: 0.00031184396357275546 | |
Per-token loss scaled by world size: 0.0001448883704142645 | |
Per-token loss scaled by world size: 0.0002452973276376724 | |
Per-token loss scaled by world size: 0.00019676386727951467 | |
Per-token loss scaled by world size: 0.00030535017140209675 | |
Per-token loss scaled by world size: 1.8834512047760654e-06 | |
Per-token loss scaled by world size: 0.00019069209520239383 | |
Epoch: 1, Step: 124, Rank: 6, loss = 0.9844914078712463 | |
Epoch: 1, Step: 124, Rank: 1, loss = 0.45741257071495056 | |
Epoch: 1, Step: 124, Rank: 3, loss = 0.9639905095100403 | |
Epoch: 1, Step: 124, Rank: 4, loss = 0.7744036912918091 | |
Epoch: 1, Step: 124, Rank: 7, loss = 0.6211835145950317 | |
Epoch: 1, Step: 124, Rank: 0, loss = 0.005946055520325899 | |
Epoch: 1, Step: 124, Rank: 2, loss = 0.60201495885849 | |
Per-token loss scaled by world size: 0.00031234745983965695 | |
Epoch: 1, Step: 124, Rank: 5, loss = 0.9860809445381165 | |
total tokens: 7335 num samples: 3 num padding tokens: 518 - rank: 1 max len: 2445 min len: 1952 avg len: 2272.3333333333335 num_loss_counted_tokens: 357 | |
total tokens: 7632 num samples: 8 num padding tokens: 1110 - rank: 4 max len: 954 min len: 772 avg len: 815.25 num_loss_counted_tokens: 4352 | |
total tokens: 7540 num samples: 10 num padding tokens: 877 - rank: 5 max len: 754 min len: 579 avg len: 666.3 num_loss_counted_tokens: 4493 | |
{ | |
"epoch": 1, | |
"step": 124, | |
"rank": 0, | |
"loss": 0.005946055520325899, | |
"overall_throughput": 42.28005677176176, | |
"lr": 2.4000000000000003e-06, | |
"cuda_mem_allocated": 24.3601393699646, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 25256, | |
"batch_size": 81, | |
"total_loss": 0.6744404435157776, | |
"gradnorm": 1.007487177848816, | |
"weight_norm": 433.04327392578125, | |
"timestamp": "2024-08-18T20:53:37.110671" | |
} | |
total tokens: 7146 num samples: 6 num padding tokens: 707 - rank: 3 max len: 1191 min len: 954 avg len: 1073.1666666666667 num_loss_counted_tokens: 3282 | |
total tokens: 6650 num samples: 2 num padding tokens: 396 - rank: 0 max len: 3325 min len: 2929 avg len: 3127.0 num_loss_counted_tokens: 164 | |
total tokens: 7188 num samples: 4 num padding tokens: 897 - rank: 2 max len: 1797 min len: 1207 avg len: 1572.75 num_loss_counted_tokens: 3275 | |
total tokens: 7749 num samples: 27 num padding tokens: 3523 - rank: 7 max len: 287 min len: 78 avg len: 156.5185185185185 num_loss_counted_tokens: 1794 | |
total tokens: 7924 num samples: 14 num padding tokens: 2249 - rank: 6 max len: 566 min len: 291 avg len: 405.35714285714283 num_loss_counted_tokens: 3760 | |
Per-token loss scaled by world size: 0.0005536731332540512 | |
Per-token loss scaled by world size: 0.0003451017546467483 | |
Per-token loss scaled by world size: 8.309840632136911e-05 | |
Per-token loss scaled by world size: 0.00047730450751259923 | |
Per-token loss scaled by world size: 0.0006338073872029781 | |
Per-token loss scaled by world size: 3.222640589228831e-05 | |
Per-token loss scaled by world size: 5.77377568333759e-06 | |
Epoch: 1, Step: 125, Rank: 3, loss = 0.21387451887130737 | |
Epoch: 1, Step: 125, Rank: 7, loss = 0.8882056474685669 | |
Epoch: 1, Step: 125, Rank: 4, loss = 1.6312617063522339 | |
Epoch: 1, Step: 125, Rank: 2, loss = 1.425016164779663 | |
Epoch: 1, Step: 125, Rank: 1, loss = 0.014860255643725395 | |
Epoch: 1, Step: 125, Rank: 5, loss = 1.2284624576568604 | |
Epoch: 1, Step: 125, Rank: 0, loss = 0.08294271677732468 | |
Per-token loss scaled by world size: 0.0003464070614427328 | |
Epoch: 1, Step: 125, Rank: 6, loss = 0.8915651440620422 | |
total tokens: 7860 num samples: 4 num padding tokens: 1157 - rank: 1 max len: 1965 min len: 1442 avg len: 1675.75 num_loss_counted_tokens: 559 | |
total tokens: 7632 num samples: 9 num padding tokens: 954 - rank: 4 max len: 848 min len: 605 avg len: 742.0 num_loss_counted_tokens: 4296 | |
{ | |
"epoch": 1, | |
"step": 125, | |
"rank": 0, | |
"loss": 0.08294271677732468, | |
"overall_throughput": 42.30731411051347, | |
"lr": 2.4000000000000003e-06, | |
"cuda_mem_allocated": 24.3255033493042, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 20590, | |
"batch_size": 94, | |
"total_loss": 0.7970236539840698, | |
"gradnorm": 1.007487177848816, | |
"weight_norm": 433.04327392578125, | |
"timestamp": "2024-08-18T20:53:39.613939" | |
} | |
total tokens: 7974 num samples: 18 num padding tokens: 1582 - rank: 6 max len: 443 min len: 282 avg len: 355.1111111111111 num_loss_counted_tokens: 3319 | |
total tokens: 8033 num samples: 29 num padding tokens: 2797 - rank: 7 max len: 277 min len: 75 avg len: 180.55172413793105 num_loss_counted_tokens: 2279 | |
total tokens: 7904 num samples: 8 num padding tokens: 432 - rank: 3 max len: 988 min len: 857 avg len: 934.0 num_loss_counted_tokens: 3900 | |
total tokens: 6880 num samples: 5 num padding tokens: 746 - rank: 2 max len: 1376 min len: 1087 avg len: 1226.8 num_loss_counted_tokens: 1611 | |
total tokens: 7629 num samples: 3 num padding tokens: 752 - rank: 0 max len: 2543 min len: 1988 avg len: 2292.3333333333335 num_loss_counted_tokens: 470 | |
total tokens: 7813 num samples: 13 num padding tokens: 1168 - rank: 5 max len: 601 min len: 444 avg len: 511.15384615384613 num_loss_counted_tokens: 3836 | |
Per-token loss scaled by world size: 0.00016968029376585037 | |
Per-token loss scaled by world size: 0.00043697707587853074 | |
Per-token loss scaled by world size: 0.00022829265799373388 | |
Per-token loss scaled by world size: 0.00044452777365222573 | |
Per-token loss scaled by world size: 0.00034727680031210184 | |
Per-token loss scaled by world size: 1.7178894040625892e-06 | |
Per-token loss scaled by world size: 0.0002958408440463245 | |
Epoch: 1, Step: 126, Rank: 5, loss = 1.2159979343414307 | |
Epoch: 1, Step: 126, Rank: 2, loss = 0.6352813839912415 | |
Epoch: 1, Step: 126, Rank: 6, loss = 1.2370096445083618 | |
Epoch: 1, Step: 126, Rank: 1, loss = 0.4721778333187103 | |
Epoch: 1, Step: 126, Rank: 4, loss = 0.9663845300674438 | |
Epoch: 1, Step: 126, Rank: 0, loss = 0.004780456889420748 | |
Epoch: 1, Step: 126, Rank: 7, loss = 0.8232511281967163 | |
Per-token loss scaled by world size: 0.0001971422607311979 | |
Epoch: 1, Step: 126, Rank: 3, loss = 0.5485976338386536 | |
total tokens: 5944 num samples: 2 num padding tokens: 412 - rank: 1 max len: 2972 min len: 2560 avg len: 2766.0 num_loss_counted_tokens: 816 | |
total tokens: 7911 num samples: 9 num padding tokens: 941 - rank: 4 max len: 879 min len: 688 avg len: 774.4444444444445 num_loss_counted_tokens: 5131 | |
{ | |
"epoch": 1, | |
"step": 126, | |
"rank": 0, | |
"loss": 0.004780456889420748, | |
"overall_throughput": 41.60576526421823, | |
"lr": 2.4000000000000003e-06, | |
"cuda_mem_allocated": 24.30566644668579, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 22262, | |
"batch_size": 84, | |
"total_loss": 0.7379351258277893, | |
"gradnorm": 1.007487177848816, | |
"weight_norm": 433.04327392578125, | |
"timestamp": "2024-08-18T20:53:42.158108" | |
} | |
total tokens: 8048 num samples: 16 num padding tokens: 2431 - rank: 6 max len: 503 min len: 247 avg len: 351.0625 num_loss_counted_tokens: 3335 | |
total tokens: 6345 num samples: 27 num padding tokens: 1990 - rank: 7 max len: 235 min len: 78 avg len: 161.2962962962963 num_loss_counted_tokens: 1823 | |
total tokens: 7467 num samples: 3 num padding tokens: 1752 - rank: 2 max len: 2489 min len: 1428 avg len: 1905.0 num_loss_counted_tokens: 538 | |
total tokens: 7710 num samples: 6 num padding tokens: 1580 - rank: 3 max len: 1285 min len: 891 avg len: 1021.6666666666666 num_loss_counted_tokens: 3990 | |
total tokens: 7513 num samples: 11 num padding tokens: 964 - rank: 5 max len: 683 min len: 534 avg len: 595.3636363636364 num_loss_counted_tokens: 4084 | |
total tokens: 5974 num samples: 2 num padding tokens: 4 - rank: 0 max len: 2987 min len: 2983 avg len: 2985.0 num_loss_counted_tokens: 160 | |
Per-token loss scaled by world size: 0.00019664443971123546 | |
Per-token loss scaled by world size: 0.00033893511863425374 | |
Per-token loss scaled by world size: 0.0002353027812205255 | |
Per-token loss scaled by world size: 0.00019304313173051924 | |
Per-token loss scaled by world size: 0.00022410067322198302 | |
Per-token loss scaled by world size: 0.00028643259429372847 | |
Per-token loss scaled by world size: 1.9502303985063918e-05 | |
Epoch: 1, Step: 127, Rank: 5, loss = 1.0451911687850952 | |
Epoch: 1, Step: 127, Rank: 3, loss = 0.6064022779464722 | |
Epoch: 1, Step: 127, Rank: 4, loss = 0.5952967405319214 | |
Epoch: 1, Step: 127, Rank: 1, loss = 0.7256149649620056 | |
Epoch: 1, Step: 127, Rank: 2, loss = 0.6910704374313354 | |
Epoch: 1, Step: 127, Rank: 0, loss = 0.060140229761600494 | |
Epoch: 1, Step: 127, Rank: 7, loss = 0.8832864761352539 | |
Per-token loss scaled by world size: 0.00036767972051166 | |
Epoch: 1, Step: 127, Rank: 6, loss = 1.133832335472107 | |
total tokens: 7504 num samples: 8 num padding tokens: 911 - rank: 4 max len: 938 min len: 741 avg len: 824.125 num_loss_counted_tokens: 5191 | |
total tokens: 7206 num samples: 3 num padding tokens: 904 - rank: 1 max len: 2402 min len: 1894 avg len: 2100.6666666666665 num_loss_counted_tokens: 932 | |
total tokens: 7887 num samples: 11 num padding tokens: 1542 - rank: 5 max len: 717 min len: 395 avg len: 576.8181818181819 num_loss_counted_tokens: 4690 | |
{ | |
"epoch": 1, | |
"step": 127, | |
"rank": 0, | |
"loss": 0.060140229761600494, | |
"overall_throughput": 42.34427644874209, | |
"lr": 2.4000000000000003e-06, | |
"cuda_mem_allocated": 24.374258518218994, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 24670, | |
"batch_size": 85, | |
"total_loss": 0.7176043391227722, | |
"gradnorm": 1.007487177848816, | |
"weight_norm": 433.04327392578125, | |
"timestamp": "2024-08-18T20:53:44.674459" | |
} | |
total tokens: 7096 num samples: 4 num padding tokens: 561 - rank: 2 max len: 1774 min len: 1529 avg len: 1633.75 num_loss_counted_tokens: 1104 | |
total tokens: 7780 num samples: 20 num padding tokens: 1756 - rank: 6 max len: 389 min len: 236 avg len: 301.2 num_loss_counted_tokens: 2929 | |
total tokens: 6770 num samples: 5 num padding tokens: 936 - rank: 3 max len: 1354 min len: 956 avg len: 1166.8 num_loss_counted_tokens: 3115 | |
total tokens: 6110 num samples: 26 num padding tokens: 1710 - rank: 7 max len: 235 min len: 78 avg len: 169.23076923076923 num_loss_counted_tokens: 1981 | |
total tokens: 5920 num samples: 2 num padding tokens: 555 - rank: 0 max len: 2960 min len: 2405 avg len: 2682.5 num_loss_counted_tokens: 172 | |
Per-token loss scaled by world size: 0.0008036900544539094 | |
Per-token loss scaled by world size: 0.0005513833602890372 | |
Per-token loss scaled by world size: 0.0007896597380749881 | |
Per-token loss scaled by world size: 9.589117689756677e-05 | |
Per-token loss scaled by world size: 5.089726073492784e-06 | |
Per-token loss scaled by world size: 1.1957185051869601e-05 | |
Epoch: 1, Step: 128, Rank: 6, loss = 1.6164215803146362 | |
Epoch: 1, Step: 128, Rank: 2, loss = 0.19286112487316132 | |
Epoch: 1, Step: 128, Rank: 5, loss = 1.5882031917572021 | |
Epoch: 1, Step: 128, Rank: 4, loss = 1.108969807624817 | |
Per-token loss scaled by world size: 5.8463097957428545e-05 | |
Epoch: 1, Step: 128, Rank: 1, loss = 0.010236711241304874 | |
Epoch: 1, Step: 128, Rank: 0, loss = 0.02404888905584812 | |
Per-token loss scaled by world size: 0.00048682422493584454 | |
Epoch: 1, Step: 128, Rank: 3, loss = 0.9791252017021179 | |
Epoch: 1, Step: 128, Rank: 7, loss = 0.11758390814065933 | |
total tokens: 6630 num samples: 26 num padding tokens: 2393 - rank: 7 max len: 255 min len: 78 avg len: 162.96153846153845 num_loss_counted_tokens: 1758 | |
total tokens: 7601 num samples: 11 num padding tokens: 831 - rank: 4 max len: 691 min len: 561 avg len: 615.4545454545455 num_loss_counted_tokens: 4864 | |
total tokens: 7125 num samples: 5 num padding tokens: 1532 - rank: 1 max len: 1425 min len: 985 avg len: 1118.6 num_loss_counted_tokens: 3514 | |
{ | |
"epoch": 1, | |
"step": 128, | |
"rank": 0, | |
"loss": 0.02404888905584812, | |
"overall_throughput": 42.28494221885827, | |
"lr": 2.4000000000000003e-06, | |
"cuda_mem_allocated": 24.05379819869995, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 16090, | |
"batch_size": 72, | |
"total_loss": 0.7046812772750854, | |
"gradnorm": 1.007487177848816, | |
"weight_norm": 433.04327392578125, | |
"timestamp": "2024-08-18T20:53:47.176000" | |
} | |
total tokens: 5476 num samples: 2 num padding tokens: 764 - rank: 0 max len: 2738 min len: 1974 avg len: 2356.0 num_loss_counted_tokens: 216 | |
total tokens: 7714 num samples: 14 num padding tokens: 571 - rank: 5 max len: 551 min len: 429 avg len: 510.2142857142857 num_loss_counted_tokens: 5114 | |
total tokens: 7731 num samples: 9 num padding tokens: 842 - rank: 3 max len: 859 min len: 715 avg len: 765.4444444444445 num_loss_counted_tokens: 4877 | |
total tokens: 7923 num samples: 19 num padding tokens: 1796 - rank: 6 max len: 417 min len: 256 avg len: 322.4736842105263 num_loss_counted_tokens: 3232 | |
total tokens: 7840 num samples: 8 num padding tokens: 324 - rank: 2 max len: 980 min len: 894 avg len: 939.5 num_loss_counted_tokens: 5169 | |
Per-token loss scaled by world size: 0.0001966664713108912 | |
Per-token loss scaled by world size: 0.0001095464758691378 | |
Per-token loss scaled by world size: 0.00030617474112659693 | |
Per-token loss scaled by world size: 0.0003008927742484957 | |
Per-token loss scaled by world size: 0.0001922248484333977 | |
Per-token loss scaled by world size: 6.467673188126355e-07 | |
Per-token loss scaled by world size: 0.00025096320314332843 | |
Epoch: 1, Step: 129, Rank: 6, loss = 1.0479342937469482 | |
Epoch: 1, Step: 129, Rank: 4, loss = 1.066330075263977 | |
Epoch: 1, Step: 129, Rank: 2, loss = 0.3815229833126068 | |
Epoch: 1, Step: 129, Rank: 1, loss = 0.6849401593208313 | |
Epoch: 1, Step: 129, Rank: 0, loss = 0.0022525289095938206 | |
Epoch: 1, Step: 129, Rank: 7, loss = 0.6694710850715637 | |
Per-token loss scaled by world size: 0.00032839240157045424 | |
Epoch: 1, Step: 129, Rank: 5, loss = 1.14370858669281 | |
Epoch: 1, Step: 129, Rank: 3, loss = 0.8740420937538147 | |
total tokens: 6094 num samples: 2 num padding tokens: 254 - rank: 1 max len: 3047 min len: 2793 avg len: 2920.0 num_loss_counted_tokens: 165 | |
total tokens: 7068 num samples: 6 num padding tokens: 383 - rank: 4 max len: 1178 min len: 1029 avg len: 1114.1666666666667 num_loss_counted_tokens: 3758 | |
{ | |
"epoch": 1, | |
"step": 129, | |
"rank": 0, | |
"loss": 0.0022525289095938206, | |
"overall_throughput": 41.47733724327907, | |
"lr": 2.4000000000000003e-06, | |
"cuda_mem_allocated": 24.227038383483887, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 27862, | |
"batch_size": 83, | |
"total_loss": 0.7337751984596252, | |
"gradnorm": 1.007487177848816, | |
"weight_norm": 433.04327392578125, | |
"timestamp": "2024-08-18T20:53:49.705725" | |
} | |
total tokens: 8016 num samples: 16 num padding tokens: 1518 - rank: 6 max len: 501 min len: 265 avg len: 406.125 num_loss_counted_tokens: 3598 | |
total tokens: 7964 num samples: 4 num padding tokens: 776 - rank: 3 max len: 1991 min len: 1616 avg len: 1797.0 num_loss_counted_tokens: 415 | |
total tokens: 6264 num samples: 24 num padding tokens: 2144 - rank: 7 max len: 261 min len: 85 avg len: 171.66666666666666 num_loss_counted_tokens: 1744 | |
total tokens: 7974 num samples: 2 num padding tokens: 703 - rank: 0 max len: 3987 min len: 3284 avg len: 3635.5 num_loss_counted_tokens: 349 | |
total tokens: 8073 num samples: 9 num padding tokens: 2207 - rank: 5 max len: 897 min len: 515 avg len: 651.7777777777778 num_loss_counted_tokens: 4298 | |
total tokens: 7056 num samples: 3 num padding tokens: 377 - rank: 2 max len: 2352 min len: 2050 avg len: 2226.3333333333335 num_loss_counted_tokens: 2384 | |
Per-token loss scaled by world size: 0.0002643604821059853 | |
Per-token loss scaled by world size: 0.000382772006560117 | |
Per-token loss scaled by world size: 0.000271481869276613 | |
Per-token loss scaled by world size: 3.5156226658727974e-06 | |
Per-token loss scaled by world size: 6.239629328774754e-07 | |
Per-token loss scaled by world size: 0.00018353613268118352 | |
Epoch: 1, Step: 130, Rank: 5, loss = 1.2674537897109985 | |
Epoch: 1, Step: 130, Rank: 6, loss = 0.8989443778991699 | |
Epoch: 1, Step: 130, Rank: 0, loss = 0.011641105636954308 | |
Epoch: 1, Step: 130, Rank: 2, loss = 0.8753636479377747 | |
Per-token loss scaled by world size: 0.0003246103588026017 | |
Epoch: 1, Step: 130, Rank: 1, loss = 0.0020660972222685814 | |
Epoch: 1, Step: 130, Rank: 7, loss = 0.6077340245246887 | |
Per-token loss scaled by world size: 0.00032646721228957176 | |
Epoch: 1, Step: 130, Rank: 4, loss = 1.0748660564422607 | |
Epoch: 1, Step: 130, Rank: 3, loss = 1.0810145139694214 | |
total tokens: 7672 num samples: 7 num padding tokens: 936 - rank: 4 max len: 1096 min len: 867 avg len: 962.2857142857143 num_loss_counted_tokens: 3200 | |
total tokens: 6610 num samples: 2 num padding tokens: 448 - rank: 1 max len: 3305 min len: 2857 avg len: 3081.0 num_loss_counted_tokens: 155 | |
total tokens: 7690 num samples: 10 num padding tokens: 781 - rank: 5 max len: 769 min len: 604 avg len: 690.9 num_loss_counted_tokens: 5078 | |
{ | |
"epoch": 1, | |
"step": 130, | |
"rank": 0, | |
"loss": 0.011641105636954308, | |
"overall_throughput": 41.55595823632429, | |
"lr": 2.4000000000000003e-06, | |
"cuda_mem_allocated": 24.426838874816895, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 26490, | |
"batch_size": 77, | |
"total_loss": 0.7273854613304138, | |
"gradnorm": 1.007487177848816, | |
"weight_norm": 433.04327392578125, | |
"timestamp": "2024-08-18T20:53:52.249145" | |
} | |
total tokens: 5650 num samples: 2 num padding tokens: 223 - rank: 2 max len: 2825 min len: 2602 avg len: 2713.5 num_loss_counted_tokens: 210 | |
total tokens: 4070 num samples: 1 num padding tokens: 0 - rank: 0 max len: 4070 min len: 4070 avg len: 4070.0 num_loss_counted_tokens: 1038 | |
total tokens: 8030 num samples: 5 num padding tokens: 1189 - rank: 3 max len: 1606 min len: 1172 avg len: 1368.2 num_loss_counted_tokens: 2403 | |
total tokens: 5180 num samples: 20 num padding tokens: 2159 - rank: 7 max len: 259 min len: 76 avg len: 151.05 num_loss_counted_tokens: 1160 | |
total tokens: 7800 num samples: 13 num padding tokens: 2266 - rank: 6 max len: 600 min len: 271 avg len: 425.6923076923077 num_loss_counted_tokens: 3635 | |
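The per-rank `total tokens: ...` lines appear to describe padded microbatches. In every line checked in this excerpt, `total tokens` equals `num samples` × `max len`, and `num padding tokens` equals `total tokens` minus `num samples` × `avg len` (the real tokens). That relation is inferred from the logged numbers, not taken from the instructlab training source. A minimal Python sketch that parses one such line and checks both relations; the regex layout and the helper names `parse_stats` and `check_padding` are hypothetical:

```python
import re

# Layout of the per-rank summary lines in this log (inferred from the text,
# not from the instructlab source).
PATTERN = re.compile(
    r"total tokens: (?P<total>\d+) num samples: (?P<n>\d+) "
    r"num padding tokens: (?P<pad>\d+) - rank: (?P<rank>\d+) "
    r"max len: (?P<max_len>\d+) min len: (?P<min_len>\d+) "
    r"avg len: (?P<avg_len>[\d.]+) num_loss_counted_tokens: (?P<loss_tokens>\d+)"
)


def parse_stats(line: str) -> dict:
    """Return the numeric fields of one 'total tokens: ...' summary line."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError(f"not a summary line: {line!r}")
    fields = {k: int(v) for k, v in m.groupdict().items() if k != "avg_len"}
    fields["avg_len"] = float(m.group("avg_len"))
    return fields


def check_padding(fields: dict) -> None:
    """Assert the padded-batch relations observed in this log."""
    # Every sample appears to be padded up to the longest one in the microbatch.
    assert fields["total"] == fields["n"] * fields["max_len"]
    # Padding = padded size minus the real tokens (n * avg_len, rounded).
    real_tokens = round(fields["n"] * fields["avg_len"])
    assert fields["pad"] == fields["total"] - real_tokens


line = ("total tokens: 7800 num samples: 13 num padding tokens: 2266 - rank: 6 "
        "max len: 600 min len: 271 avg len: 425.6923076923077 "
        "num_loss_counted_tokens: 3635")
stats = parse_stats(line)
check_padding(stats)
print(f"rank {stats['rank']}: {stats['pad'] / stats['total']:.1%} padding")
```

For the rank 6 line above, 2266 of 7800 token slots (about 29%) are padding, while the single 4070-token sample on rank 0 needs none.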
Per-token loss scaled by world size: 0.0001472200092393905
Per-token loss scaled by world size: 0.00025629153242334723
Per-token loss scaled by world size: 0.0002756573085207492
Per-token loss scaled by world size: 0.00023288748343475163
Per-token loss scaled by world size: 0.0004771172534674406
Per-token loss scaled by world size: 1.5136585034269956e-06
Epoch: 1, Step: 131, Rank: 2, loss = 0.8319223523139954
Epoch: 1, Step: 131, Rank: 6, loss = 0.894783616065979
Epoch: 1, Step: 131, Rank: 3, loss = 0.755952775478363
Epoch: 1, Step: 131, Rank: 4, loss = 1.5487226247787476
Epoch: 1, Step: 131, Rank: 1, loss = 0.4778761565685272
Per-token loss scaled by world size: 0.0003685416013468057
Epoch: 1, Step: 131, Rank: 0, loss = 0.004913335666060448
Epoch: 1, Step: 131, Rank: 5, loss = 1.1962860822677612
Per-token loss scaled by world size: 0.0002723717479966581
Epoch: 1, Step: 131, Rank: 7, loss = 0.8841187357902527
total tokens: 7064 num samples: 4 num padding tokens: 383 - rank: 1 max len: 1766 min len: 1494 avg len: 1670.25 num_loss_counted_tokens: 2269
total tokens: 7308 num samples: 9 num padding tokens: 754 - rank: 4 max len: 812 min len: 666 avg len: 728.2222222222222 num_loss_counted_tokens: 4526
{
"epoch": 1,
"step": 131,
"rank": 0,
"loss": 0.004913335666060448,
"overall_throughput": 42.206020029222465,
"lr": 2.4000000000000003e-06,
"cuda_mem_allocated": 24.24073839187622,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 25968,
"batch_size": 101,
"total_loss": 0.8243219256401062,
"gradnorm": 1.007487177848816,
"weight_norm": 433.04327392578125,
"timestamp": "2024-08-18T20:53:54.758478"
}
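The `total_loss` field in the rank-0 JSON record appears to be the plain mean of the eight per-rank `loss` values printed for the same step: for step 131 the mean of the eight logged values agrees with `total_loss` to within about 1e-7. This is an observation about these particular numbers, not a statement of how instructlab computes the field. A short Python check using only values copied from the step-131 lines above:

```python
from statistics import fmean

# Per-rank "loss" values printed for Epoch 1, Step 131 (copied from the log).
step_131_losses = {
    0: 0.004913335666060448,
    1: 0.4778761565685272,
    2: 0.8319223523139954,
    3: 0.755952775478363,
    4: 1.5487226247787476,
    5: 1.1962860822677612,
    6: 0.894783616065979,
    7: 0.8841187357902527,
}
logged_total_loss = 0.8243219256401062  # "total_loss" in the step-131 JSON record

mean_loss = fmean(step_131_losses.values())
# The logged value is apparently a float32 result, so compare with a tolerance.
assert abs(mean_loss - logged_total_loss) < 1e-6
print(f"mean of per-rank losses: {mean_loss}")
```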
total tokens: 8040 num samples: 8 num padding tokens: 747 - rank: 3 max len: 1005 min len: 816 avg len: 911.625 num_loss_counted_tokens: 5982
total tokens: 7415 num samples: 5 num padding tokens: 1415 - rank: 2 max len: 1483 min len: 1015 avg len: 1200.0 num_loss_counted_tokens: 2673
total tokens: 8106 num samples: 3 num padding tokens: 591 - rank: 0 max len: 2702 min len: 2308 avg len: 2505.0 num_loss_counted_tokens: 285
total tokens: 7936 num samples: 32 num padding tokens: 2881 - rank: 7 max len: 248 min len: 70 avg len: 157.96875 num_loss_counted_tokens: 2168
total tokens: 7824 num samples: 12 num padding tokens: 940 - rank: 5 max len: 652 min len: 506 avg len: 573.6666666666666 num_loss_counted_tokens: 5458
total tokens: 7664 num samples: 16 num padding tokens: 1917 - rank: 6 max len: 479 min len: 262 avg len: 359.1875 num_loss_counted_tokens: 3606
Per-token loss scaled by world size: 0.0003236977499909699
Per-token loss scaled by world size: 0.0009724997216835618
Per-token loss scaled by world size: 0.00038814375875517726
Per-token loss scaled by world size: 0.0005891940090805292
Per-token loss scaled by world size: 0.0001362602924928069
Per-token loss scaled by world size: 5.815729309688322e-06
Per-token loss scaled by world size: 8.939716281020083e-06
Epoch: 1, Step: 132, Rank: 5, loss = 1.257119059562683
Epoch: 1, Step: 132, Rank: 6, loss = 2.0749497413635254
Epoch: 1, Step: 132, Rank: 4, loss = 0.6906496286392212
Epoch: 1, Step: 132, Rank: 7, loss = 0.8281532526016235
Epoch: 1, Step: 132, Rank: 2, loss = 0.012408585287630558
Epoch: 1, Step: 132, Rank: 3, loss = 0.290728360414505
Epoch: 1, Step: 132, Rank: 1, loss = 0.019074002280831337
Per-token loss scaled by world size: 3.719959931913763e-05
Epoch: 1, Step: 132, Rank: 0, loss = 0.07936999201774597
total tokens: 7930 num samples: 10 num padding tokens: 877 - rank: 4 max len: 793 min len: 643 avg len: 705.3 num_loss_counted_tokens: 3435
total tokens: 7288 num samples: 4 num padding tokens: 780 - rank: 1 max len: 1822 min len: 1515 avg len: 1627.0 num_loss_counted_tokens: 3100
total tokens: 7105 num samples: 7 num padding tokens: 499 - rank: 3 max len: 1015 min len: 827 avg len: 943.7142857142857 num_loss_counted_tokens: 4654
{
"epoch": 1,
"step": 132,
"rank": 0,
"loss": 0.07936999201774597,
"overall_throughput": 42.231542345095384,
"lr": 2.4000000000000003e-06,
"cuda_mem_allocated": 24.476311683654785,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 17069,
"batch_size": 64,
"total_loss": 0.6565565466880798,
"gradnorm": 1.007487177848816,
"weight_norm": 433.04327392578125,
"timestamp": "2024-08-18T20:53:57.301535"
}
total tokens: 7560 num samples: 30 num padding tokens: 2281 - rank: 7 max len: 252 min len: 77 avg len: 175.96666666666667 num_loss_counted_tokens: 2486
total tokens: 7176 num samples: 6 num padding tokens: 666 - rank: 2 max len: 1196 min len: 1024 avg len: 1085.0 num_loss_counted_tokens: 3309
total tokens: 5546 num samples: 2 num padding tokens: 144 - rank: 0 max len: 2773 min len: 2629 avg len: 2701.0 num_loss_counted_tokens: 194
total tokens: 8046 num samples: 18 num padding tokens: 1959 - rank: 6 max len: 447 min len: 255 avg len: 338.1666666666667 num_loss_counted_tokens: 3390
total tokens: 7656 num samples: 12 num padding tokens: 892 - rank: 5 max len: 638 min len: 458 avg len: 563.6666666666666 num_loss_counted_tokens: 4307
Per-token loss scaled by world size: 0.00027533259708434343
Per-token loss scaled by world size: 0.00012797772069461644
Per-token loss scaled by world size: 0.00020723696798086166
Per-token loss scaled by world size: 0.00024243281222879887
Per-token loss scaled by world size: 0.00023833484738133848
Per-token loss scaled by world size: 0.00026018035714514554
Per-token loss scaled by world size: 0.00010569631558610126
Epoch: 1, Step: 133, Rank: 2, loss = 0.7923144102096558
Epoch: 1, Step: 133, Rank: 6, loss = 0.9153087735176086
Epoch: 1, Step: 133, Rank: 1, loss = 0.42544591426849365
Epoch: 1, Step: 133, Rank: 4, loss = 0.8059375882148743
Epoch: 1, Step: 133, Rank: 3, loss = 0.8649370670318604
Epoch: 1, Step: 133, Rank: 7, loss = 0.6889333724975586
Epoch: 1, Step: 133, Rank: 0, loss = 0.35137417912483215
Per-token loss scaled by world size: 0.0003980571636930108
Epoch: 1, Step: 133, Rank: 5, loss = 1.323291301727295
total tokens: 7461 num samples: 9 num padding tokens: 647 - rank: 4 max len: 829 min len: 685 avg len: 757.1111111111111 num_loss_counted_tokens: 3551
total tokens: 7808 num samples: 4 num padding tokens: 946 - rank: 1 max len: 1952 min len: 1419 avg len: 1715.5 num_loss_counted_tokens: 1967
total tokens: 8010 num samples: 30 num padding tokens: 2819 - rank: 7 max len: 267 min len: 85 avg len: 173.03333333333333 num_loss_counted_tokens: 2333
{
"epoch": 1,
"step": 133,
"rank": 0,
"loss": 0.35137417912483215,
"overall_throughput": 41.395595194588644,
"lr": 2.4000000000000003e-06,
"cuda_mem_allocated": 24.437856197357178,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 26595,
"batch_size": 92,
"total_loss": 0.7709429264068604,
"gradnorm": 1.007487177848816,
"weight_norm": 433.04327392578125,
"timestamp": "2024-08-18T20:53:59.817004"
}
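The `Per-token loss scaled by world size` lines can be tied back to the per-rank `loss` lines numerically. In the cases checked here, a per-token value multiplied by that step's `num_loss_counted_tokens` (from the rank-0 JSON record) and divided by the 8 GPUs (`--gpus 8`) reproduces one rank's printed `loss` to about seven significant digits. This relation is reverse-engineered from the numbers, not read from the instructlab code, and because the ranks' output is interleaved, each per-token value is paired with a rank by matching values rather than by position. A small Python check over three such pairs:

```python
WORLD_SIZE = 8  # --gpus 8 in the ilab command at the top of the log

# (per-token value, rank "loss", step num_loss_counted_tokens), read off this log.
# The rank pairing is inferred by matching values, since the lines interleave.
samples = [
    (0.0002723717479966581, 0.8841187357902527, 25968),   # step 131, rank 7
    (3.719959931913763e-05, 0.07936999201774597, 17069),  # step 132, rank 0
    (0.0003980571636930108, 1.323291301727295, 26595),    # step 133, rank 5
]

for per_token, rank_loss, n_tokens in samples:
    reconstructed = per_token * n_tokens / WORLD_SIZE
    # Relative error stays well under 1e-6 (float32 rounding in the log).
    assert abs(reconstructed - rank_loss) / rank_loss < 1e-6
    print(f"{reconstructed:.9f} vs logged {rank_loss:.9f}")
```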
total tokens: 6995 num samples: 5 num padding tokens: 1102 - rank: 2 max len: 1399 min len: 998 avg len: 1178.6 num_loss_counted_tokens: 2712
total tokens: 7808 num samples: 8 num padding tokens: 442 - rank: 3 max len: 976 min len: 844 avg len: 920.75 num_loss_counted_tokens: 4240
total tokens: 7686 num samples: 3 num padding tokens: 937 - rank: 0 max len: 2562 min len: 2050 avg len: 2249.6666666666665 num_loss_counted_tokens: 556
total tokens: 7992 num samples: 18 num padding tokens: 1651 - rank: 6 max len: 444 min len: 270 avg len: 352.27777777777777 num_loss_counted_tokens: 3377
total tokens: 7656 num samples: 12 num padding tokens: 1025 - rank: 5 max len: 638 min len: 445 avg len: 552.5833333333334 num_loss_counted_tokens: 4330
Per-token loss scaled by world size: 0.0003683593822643161
Per-token loss scaled by world size: 0.0001245876046596095
Per-token loss scaled by world size: 0.0003223164821974933
Per-token loss scaled by world size: 0.0003114262653980404
Per-token loss scaled by world size: 0.0002396363124717027
Per-token loss scaled by world size: 0.0002762637159321457
Per-token loss scaled by world size: 7.41567782824859e-06
Epoch: 1, Step: 134, Rank: 1, loss = 0.3786684572696686
Epoch: 1, Step: 134, Rank: 6, loss = 0.9796406626701355
Epoch: 1, Step: 134, Rank: 5, loss = 1.1195822954177856
Epoch: 1, Step: 134, Rank: 7, loss = 0.9465411901473999
Epoch: 1, Step: 134, Rank: 4, loss = 0.7283446192741394
Epoch: 1, Step: 134, Rank: 3, loss = 0.8396689891815186
Epoch: 1, Step: 134, Rank: 0, loss = 0.02253902517259121
Per-token loss scaled by world size: 0.00010645172005752102
Epoch: 1, Step: 134, Rank: 2, loss = 0.32354670763015747
total tokens: 6520 num samples: 4 num padding tokens: 733 - rank: 1 max len: 1630 min len: 1198 avg len: 1446.75 num_loss_counted_tokens: 1720
total tokens: 7600 num samples: 10 num padding tokens: 457 - rank: 4 max len: 760 min len: 692 avg len: 714.3 num_loss_counted_tokens: 4057
{
"epoch": 1,
"step": 134,
"rank": 0,
"loss": 0.02253902517259121,
"overall_throughput": 41.963973566813905,
"lr": 2.4000000000000003e-06,
"cuda_mem_allocated": 24.358724117279053,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 24315,
"batch_size": 87,
"total_loss": 0.6673164963722229,
"gradnorm": 1.007487177848816,
"weight_norm": 433.04327392578125,
"timestamp": "2024-08-18T20:54:02.337863"
}
total tokens: 7712 num samples: 16 num padding tokens: 2123 - rank: 6 max len: 482 min len: 243 avg len: 349.3125 num_loss_counted_tokens: 3585
total tokens: 8064 num samples: 7 num padding tokens: 647 - rank: 2 max len: 1152 min len: 995 avg len: 1059.5714285714287 num_loss_counted_tokens: 4100
total tokens: 6102 num samples: 27 num padding tokens: 2162 - rank: 7 max len: 226 min len: 78 avg len: 145.92592592592592 num_loss_counted_tokens: 1630
total tokens: 7956 num samples: 12 num padding tokens: 1032 - rank: 5 max len: 663 min len: 497 avg len: 577.0 num_loss_counted_tokens: 5176
total tokens: 7920 num samples: 8 num padding tokens: 902 - rank: 3 max len: 990 min len: 770 avg len: 877.25 num_loss_counted_tokens: 3876
total tokens: 6987 num samples: 3 num padding tokens: 1078 - rank: 0 max len: 2329 min len: 1691 avg len: 1969.6666666666667 num_loss_counted_tokens: 437
Per-token loss scaled by world size: 0.0005423504626378417
Per-token loss scaled by world size: 0.00036555714905261993
Per-token loss scaled by world size: 0.00022980774519965053
Per-token loss scaled by world size: 2.886021502490621e-05
Per-token loss scaled by world size: 2.577510167611763e-05
Per-token loss scaled by world size: 0.00026930312742479146
Per-token loss scaled by world size: 4.34833509643795e-06
Epoch: 1, Step: 135, Rank: 3, loss = 0.5620235800743103
Epoch: 1, Step: 135, Rank: 4, loss = 0.8940157294273376
Epoch: 1, Step: 135, Rank: 2, loss = 0.07058126479387283
Epoch: 1, Step: 135, Rank: 1, loss = 0.0630362331867218
Epoch: 1, Step: 135, Rank: 6, loss = 1.3263858556747437
Epoch: 1, Step: 135, Rank: 0, loss = 0.010634397156536579
Epoch: 1, Step: 135, Rank: 7, loss = 0.658614456653595
Per-token loss scaled by world size: 0.0009107645018957555
Epoch: 1, Step: 135, Rank: 5, loss = 2.227388381958008
total tokens: 6480 num samples: 3 num padding tokens: 750 - rank: 1 max len: 2160 min len: 1742 avg len: 1910.0 num_loss_counted_tokens: 614
total tokens: 7314 num samples: 6 num padding tokens: 901 - rank: 4 max len: 1219 min len: 926 avg len: 1068.8333333333333 num_loss_counted_tokens: 4111
{
"epoch": 1,
"step": 135,
"rank": 0,
"loss": 0.010634397156536579,
"overall_throughput": 41.220052175245854,
"lr": 2.4000000000000003e-06,
"cuda_mem_allocated": 24.333390712738037,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 19565,
"batch_size": 73,
"total_loss": 0.7265850305557251,
"gradnorm": 1.007487177848816,
"weight_norm": 433.04327392578125,
"timestamp": "2024-08-18T20:54:04.904683"
}
total tokens: 7488 num samples: 9 num padding tokens: 392 - rank: 5 max len: 832 min len: 756 avg len: 788.4444444444445 num_loss_counted_tokens: 5496
total tokens: 7700 num samples: 11 num padding tokens: 1846 - rank: 6 max len: 700 min len: 394 avg len: 532.1818181818181 num_loss_counted_tokens: 3763
total tokens: 7185 num samples: 5 num padding tokens: 672 - rank: 3 max len: 1437 min len: 1230 avg len: 1302.6 num_loss_counted_tokens: 1982
total tokens: 6648 num samples: 4 num padding tokens: 299 - rank: 2 max len: 1662 min len: 1515 avg len: 1587.25 num_loss_counted_tokens: 916
total tokens: 7875 num samples: 21 num padding tokens: 3380 - rank: 7 max len: 375 min len: 88 avg len: 214.04761904761904 num_loss_counted_tokens: 2091
total tokens: 5508 num samples: 2 num padding tokens: 33 - rank: 0 max len: 2754 min len: 2721 avg len: 2737.5 num_loss_counted_tokens: 386
Per-token loss scaled by world size: 0.0004460816562641412
Per-token loss scaled by world size: 0.0001522299717180431
Per-token loss scaled by world size: 0.0003226569388061762
Per-token loss scaled by world size: 0.0003637947083916515
Per-token loss scaled by world size: 0.00024503390886820853
Per-token loss scaled by world size: 5.745379894506186e-05
Per-token loss scaled by world size: 6.490522537205834e-06
Epoch: 1, Step: 136, Rank: 6, loss = 1.0003172159194946
Epoch: 1, Step: 136, Rank: 5, loss = 1.3829646110534668
Epoch: 1, Step: 136, Rank: 3, loss = 0.4719509482383728
Epoch: 1, Step: 136, Rank: 1, loss = 0.17812113463878632
Epoch: 1, Step: 136, Rank: 4, loss = 1.127854585647583
Epoch: 1, Step: 136, Rank: 7, loss = 0.759666383266449
Epoch: 1, Step: 136, Rank: 0, loss = 0.020122243091464043
Per-token loss scaled by world size: 0.00021433050278574228
Epoch: 1, Step: 136, Rank: 2, loss = 0.6644781231880188
total tokens: 6660 num samples: 4 num padding tokens: 420 - rank: 1 max len: 1665 min len: 1335 avg len: 1560.0 num_loss_counted_tokens: 1802
total tokens: 7997 num samples: 11 num padding tokens: 948 - rank: 4 max len: 727 min len: 535 avg len: 640.8181818181819 num_loss_counted_tokens: 5687
{
"epoch": 1,
"step": 136,
"rank": 0,
"loss": 0.020122243091464043,
"overall_throughput": 41.5055278365255,
"lr": 2.4000000000000003e-06,
"cuda_mem_allocated": 24.32311248779297,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 24802,
"batch_size": 88,
"total_loss": 0.7006844282150269,
"gradnorm": 1.007487177848816,
"weight_norm": 433.04327392578125,
"timestamp": "2024-08-18T20:54:07.453575"
}
total tokens: 8010 num samples: 15 num padding tokens: 963 - rank: 5 max len: 534 min len: 399 avg len: 469.8 num_loss_counted_tokens: 4616
total tokens: 7308 num samples: 6 num padding tokens: 791 - rank: 2 max len: 1218 min len: 1008 avg len: 1086.1666666666667 num_loss_counted_tokens: 4945
total tokens: 6356 num samples: 28 num padding tokens: 2338 - rank: 7 max len: 227 min len: 81 avg len: 143.5 num_loss_counted_tokens: 1107
total tokens: 7782 num samples: 2 num padding tokens: 2057 - rank: 0 max len: 3891 min len: 1834 avg len: 2862.5 num_loss_counted_tokens: 230
total tokens: 7752 num samples: 8 num padding tokens: 1153 - rank: 3 max len: 969 min len: 740 avg len: 824.875 num_loss_counted_tokens: 4517
total tokens: 8085 num samples: 21 num padding tokens: 1827 - rank: 6 max len: 385 min len: 241 avg len: 298.0 num_loss_counted_tokens: 3422
Per-token loss scaled by world size: 0.00032276863930746913
Per-token loss scaled by world size: 0.0002541161666158587
Per-token loss scaled by world size: 4.045515743200667e-05
Per-token loss scaled by world size: 0.00033090231590904295
Per-token loss scaled by world size: 0.0003559431352186948
Per-token loss scaled by world size: 0.00022709915356244892
Epoch: 1, Step: 137, Rank: 0, loss = 0.13576750457286835
Epoch: 1, Step: 137, Rank: 6, loss = 1.1105082035064697
Epoch: 1, Step: 137, Rank: 3, loss = 0.8528138995170593
Per-token loss scaled by world size: 0.0001023018267005682
Epoch: 1, Step: 137, Rank: 5, loss = 1.0832115411758423
Epoch: 1, Step: 137, Rank: 4, loss = 1.1945451498031616
Epoch: 1, Step: 137, Rank: 1, loss = 0.7621447443962097
Per-token loss scaled by world size: 0.0002989015483763069
Epoch: 1, Step: 137, Rank: 7, loss = 0.3433249294757843
Epoch: 1, Step: 137, Rank: 2, loss = 1.0031136274337769
total tokens: 6924 num samples: 4 num padding tokens: 1102 - rank: 1 max len: 1731 min len: 1172 avg len: 1455.5 num_loss_counted_tokens: 1807
total tokens: 7890 num samples: 10 num padding tokens: 472 - rank: 4 max len: 789 min len: 675 avg len: 741.8 num_loss_counted_tokens: 3224
total tokens: 5649 num samples: 21 num padding tokens: 2159 - rank: 7 max len: 269 min len: 72 avg len: 166.1904761904762 num_loss_counted_tokens: 1470
{
"epoch": 1,
"step": 137,
"rank": 0,
"loss": 0.13576750457286835,
"overall_throughput": 42.12198553508762,
"lr": 2.4000000000000003e-06,
"cuda_mem_allocated": 24.488715171813965,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 26848,
"batch_size": 89,
"total_loss": 0.8106787204742432,
"gradnorm": 1.007487177848816,
"weight_norm": 433.04327392578125,
"timestamp": "2024-08-18T20:54:09.964344"
}
total tokens: 7440 num samples: 8 num padding tokens: 452 - rank: 3 max len: 930 min len: 829 avg len: 873.5 num_loss_counted_tokens: 4943
total tokens: 7932 num samples: 12 num padding tokens: 1107 - rank: 5 max len: 661 min len: 488 avg len: 568.75 num_loss_counted_tokens: 4833
total tokens: 7648 num samples: 16 num padding tokens: 1451 - rank: 6 max len: 478 min len: 284 avg len: 387.3125 num_loss_counted_tokens: 3233
total tokens: 5446 num samples: 2 num padding tokens: 973 - rank: 0 max len: 2723 min len: 1750 avg len: 2236.5 num_loss_counted_tokens: 235
total tokens: 7763 num samples: 7 num padding tokens: 774 - rank: 2 max len: 1109 min len: 938 avg len: 998.4285714285714 num_loss_counted_tokens: 4812
Per-token loss scaled by world size: 0.00015472256927751005
Per-token loss scaled by world size: 0.00039065544842742383
Per-token loss scaled by world size: 0.0003819867270067334
Per-token loss scaled by world size: 0.00013601673708762974
Per-token loss scaled by world size: 0.0002345130778849125
Per-token loss scaled by world size: 9.365750884171575e-05
Per-token loss scaled by world size: 0.00026829339913092554
Epoch: 1, Step: 138, Rank: 5, loss = 1.27397620677948
Epoch: 1, Step: 138, Rank: 1, loss = 0.5045696496963501
Epoch: 1, Step: 138, Rank: 2, loss = 0.4435676038265228
Epoch: 1, Step: 138, Rank: 4, loss = 1.2457064390182495
Epoch: 1, Step: 138, Rank: 0, loss = 0.3054288327693939
Epoch: 1, Step: 138, Rank: 7, loss = 0.8749383091926575
Epoch: 1, Step: 138, Rank: 3, loss = 0.7647764682769775
Per-token loss scaled by world size: 0.0003792343777604401
Epoch: 1, Step: 138, Rank: 6, loss = 1.236730694770813
total tokens: 7994 num samples: 7 num padding tokens: 1969 - rank: 4 max len: 1142 min len: 760 avg len: 860.7142857142857 num_loss_counted_tokens: 3971
total tokens: 6921 num samples: 3 num padding tokens: 423 - rank: 1 max len: 2307 min len: 1952 avg len: 2166.0 num_loss_counted_tokens: 2301
{
"epoch": 1,
"step": 138,
"rank": 0,
"loss": 0.3054288327693939,
"overall_throughput": 42.2029043837429,
"lr": 2.4000000000000003e-06,
"cuda_mem_allocated": 24.34983253479004,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 26089,
"batch_size": 100,
"total_loss": 0.8312118053436279,
"gradnorm": 1.007487177848816,
"weight_norm": 433.04327392578125,
"timestamp": "2024-08-18T20:54:12.475862"
}
total tokens: 7876 num samples: 11 num padding tokens: 1537 - rank: 5 max len: 716 min len: 452 avg len: 576.2727272727273 num_loss_counted_tokens: 4021
total tokens: 7676 num samples: 4 num padding tokens: 763 - rank: 2 max len: 1919 min len: 1594 avg len: 1728.25 num_loss_counted_tokens: 1719
total tokens: 7115 num samples: 5 num padding tokens: 406 - rank: 3 max len: 1423 min len: 1187 avg len: 1341.8 num_loss_counted_tokens: 3469
total tokens: 7202 num samples: 26 num padding tokens: 2407 - rank: 7 max len: 277 min len: 93 avg len: 184.42307692307693 num_loss_counted_tokens: 2108
total tokens: 6352 num samples: 2 num padding tokens: 277 - rank: 0 max len: 3176 min len: 2899 avg len: 3037.5 num_loss_counted_tokens: 710
total tokens: 8046 num samples: 18 num padding tokens: 1591 - rank: 6 max len: 447 min len: 283 avg len: 358.6111111111111 num_loss_counted_tokens: 3535
Per-token loss scaled by world size: 0.0002807814453262836
Per-token loss scaled by world size: 0.0002877341175917536
Per-token loss scaled by world size: 0.00017392370500601828
Per-token loss scaled by world size: 0.00026719356537796557
Per-token loss scaled by world size: 0.00031913904240354896
Per-token loss scaled by world size: 0.000377663760446012
Per-token loss scaled by world size: 3.4834424695873167e-06
Epoch: 1, Step: 139, Rank: 5, loss = 0.8960040807723999
Epoch: 1, Step: 139, Rank: 3, loss = 0.5415984392166138
Epoch: 1, Step: 139, Rank: 1, loss = 0.8743534088134766
Epoch: 1, Step: 139, Rank: 6, loss = 0.9937989711761475
Epoch: 1, Step: 139, Rank: 7, loss = 0.8320407271385193
Epoch: 1, Step: 139, Rank: 0, loss = 0.010847439989447594
Epoch: 1, Step: 139, Rank: 4, loss = 1.1760449409484863
Per-token loss scaled by world size: 0.0001241332065546885
Epoch: 1, Step: 139, Rank: 2, loss = 0.38655081391334534
total tokens: 7483 num samples: 7 num padding tokens: 916 - rank: 4 max len: 1069 min len: 850 avg len: 938.1428571428571 num_loss_counted_tokens: 5047
total tokens: 8067 num samples: 3 num padding tokens: 632 - rank: 1 max len: 2689 min len: 2157 avg len: 2478.3333333333335 num_loss_counted_tokens: 281
{
"epoch": 1,
"step": 139,
"rank": 0,
"loss": 0.010847439989447594,
"overall_throughput": 41.72034163026323,
"lr": 2.4000000000000003e-06,
"cuda_mem_allocated": 24.309249877929688,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 24912,
"batch_size": 83,
"total_loss": 0.7139047980308533,
"gradnorm": 1.007487177848816,
"weight_norm": 433.04327392578125,
"timestamp": "2024-08-18T20:54:15.011121"
}
total tokens: 7392 num samples: 4 num padding tokens: 517 - rank: 2 max len: 1848 min len: 1598 avg len: 1718.75 num_loss_counted_tokens: 1592
total tokens: 3971 num samples: 19 num padding tokens: 1215 - rank: 7 max len: 209 min len: 77 avg len: 145.05263157894737 num_loss_counted_tokens: 1089
total tokens: 7575 num samples: 5 num padding tokens: 836 - rank: 3 max len: 1515 min len: 1078 avg len: 1347.8 num_loss_counted_tokens: 3059
total tokens: 7148 num samples: 2 num padding tokens: 850 - rank: 0 max len: 3574 min len: 2724 avg len: 3149.0 num_loss_counted_tokens: 226
total tokens: 7308 num samples: 9 num padding tokens: 1380 - rank: 5 max len: 812 min len: 530 avg len: 658.6666666666666 num_loss_counted_tokens: 3339
total tokens: 8048 num samples: 16 num padding tokens: 2308 - rank: 6 max len: 503 min len: 219 avg len: 358.75 num_loss_counted_tokens: 3793
Per-token loss scaled by world size: 0.0002717878087423742
Per-token loss scaled by world size: 0.00017612801457289606
Per-token loss scaled by world size: 0.00045838873484171927
Per-token loss scaled by world size: 0.00027124761254526675
Per-token loss scaled by world size: 0.00033216923475265503
Per-token loss scaled by world size: 2.2576082301384304e-06
Per-token loss scaled by world size: 5.144028546055779e-05
Epoch: 1, Step: 140, Rank: 2, loss = 0.5452042818069458
Epoch: 1, Step: 140, Rank: 3, loss = 0.8413191437721252
Epoch: 1, Step: 140, Rank: 5, loss = 1.4189423322677612
Epoch: 1, Step: 140, Rank: 4, loss = 0.8396469950675964
Epoch: 1, Step: 140, Rank: 0, loss = 0.0069884262047708035
Epoch: 1, Step: 140, Rank: 7, loss = 1.028229832649231
Epoch: 1, Step: 140, Rank: 1, loss = 0.15923340618610382
Per-token loss scaled by world size: 0.0004096345801372081
Epoch: 1, Step: 140, Rank: 6, loss = 1.2680238485336304
total tokens: 7968 num samples: 8 num padding tokens: 1069 - rank: 4 max len: 996 min len: 754 avg len: 862.375 num_loss_counted_tokens: 4593
total tokens: 7478 num samples: 2 num padding tokens: 921 - rank: 1 max len: 3739 min len: 2818 avg len: 3278.5 num_loss_counted_tokens: 165
{
"epoch": 1,
"step": 140,
"rank": 0,
"loss": 0.0069884262047708035,
"overall_throughput": 41.71879346229527,
"lr": 2.4000000000000003e-06,
"cuda_mem_allocated": 24.433530807495117,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 24764,
"batch_size": 84,
"total_loss": 0.7634485363960266,
"gradnorm": 1.007487177848816,
"weight_norm": 433.04327392578125,
"timestamp": "2024-08-18T20:54:17.546839"
}
total tokens: 7473 num samples: 3 num padding tokens: 2287 - rank: 2 max len: 2491 min len: 1276 avg len: 1728.6666666666667 num_loss_counted_tokens: 634
total tokens: 7476 num samples: 28 num padding tokens: 2615 - rank: 7 max len: 267 min len: 72 avg len: 173.60714285714286 num_loss_counted_tokens: 2127
total tokens: 7326 num samples: 6 num padding tokens: 603 - rank: 3 max len: 1221 min len: 1065 avg len: 1120.5 num_loss_counted_tokens: 2131
total tokens: 7580 num samples: 2 num padding tokens: 14 - rank: 0 max len: 3790 min len: 3776 avg len: 3783.0 num_loss_counted_tokens: 616
total tokens: 7390 num samples: 10 num padding tokens: 1181 - rank: 5 max len: 739 min len: 522 avg len: 620.9 num_loss_counted_tokens: 3575
total tokens: 7740 num samples: 15 num padding tokens: 1604 - rank: 6 max len: 516 min len: 269 avg len: 409.06666666666666 num_loss_counted_tokens: 3156
Per-token loss scaled by world size: 0.0002609801304060966
Per-token loss scaled by world size: 0.0002962287690024823
Per-token loss scaled by world size: 0.0003096856235060841
Per-token loss scaled by world size: 0.00027970768860541284
Per-token loss scaled by world size: 2.0839811440964695e-06
Per-token loss scaled by world size: 0.0006479129078797996
Per-token loss scaled by world size: 5.338866685633548e-06
Epoch: 1, Step: 141, Rank: 4, loss = 0.7952631711959839
Epoch: 1, Step: 141, Rank: 3, loss = 0.8313897848129272
Epoch: 1, Step: 141, Rank: 7, loss = 0.7006337642669678
Epoch: 1, Step: 141, Rank: 2, loss = 0.750910222530365
Epoch: 1, Step: 141, Rank: 5, loss = 1.7394031286239624
Epoch: 1, Step: 141, Rank: 0, loss = 0.005594708025455475
Epoch: 1, Step: 141, Rank: 1, loss = 0.014332855120301247
Per-token loss scaled by world size: 0.0004338203580118716
Epoch: 1, Step: 141, Rank: 6, loss = 1.1646449565887451
total tokens: 6904 num samples: 4 num padding tokens: 1250 - rank: 1 max len: 1726 min len: 1224 avg len: 1413.5 num_loss_counted_tokens: 3292
total tokens: 7900 num samples: 10 num padding tokens: 629 - rank: 4 max len: 790 min len: 654 avg len: 727.1 num_loss_counted_tokens: 4621
{
"epoch": 1,
"step": 141,
"rank": 0,
"loss": 0.005594708025455475,
"overall_throughput": 43.02419669348744,
"lr": 2.4000000000000003e-06,
"cuda_mem_allocated": 24.362098217010498,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 21477,
"batch_size": 74,
"total_loss": 0.7502715587615967,
"gradnorm": 1.007487177848816,
"weight_norm": 433.04327392578125,
"timestamp": "2024-08-18T20:54:20.007363"
}
total tokens: 7999 num samples: 19 num padding tokens: 1858 - rank: 6 max len: 421 min len: 242 avg len: 323.2105263157895 num_loss_counted_tokens: 3598
total tokens: 7648 num samples: 8 num padding tokens: 656 - rank: 3 max len: 956 min len: 812 avg len: 874.0 num_loss_counted_tokens: 5090
total tokens: 6966 num samples: 6 num padding tokens: 548 - rank: 2 max len: 1161 min len: 993 avg len: 1069.6666666666667 num_loss_counted_tokens: 3065
total tokens: 7440 num samples: 31 num padding tokens: 2495 - rank: 7 max len: 240 min len: 77 avg len: 159.51612903225808 num_loss_counted_tokens: 2088
total tokens: 6690 num samples: 3 num padding tokens: 492 - rank: 0 max len: 2230 min len: 1959 avg len: 2066.0 num_loss_counted_tokens: 280
total tokens: 7728 num samples: 12 num padding tokens: 1190 - rank: 5 max len: 644 min len: 425 avg len: 544.8333333333334 num_loss_counted_tokens: 3724
Per-token loss scaled by world size: 0.0005912419874221087
Per-token loss scaled by world size: 2.4495158868376166e-05
Per-token loss scaled by world size: 0.0005119486595503986
Per-token loss scaled by world size: 4.400004763738252e-05
Per-token loss scaled by world size: 0.00011134906526422128
Per-token loss scaled by world size: 0.0005244921194389462
Per-token loss scaled by world size: 0.0003919812443200499
Epoch: 1, Step: 142, Rank: 1, loss = 0.06206461042165756
Epoch: 1, Step: 142, Rank: 6, loss = 1.297149896621704
Epoch: 1, Step: 142, Rank: 5, loss = 1.4980593919754028
Epoch: 1, Step: 142, Rank: 0, loss = 0.11148512363433838
Epoch: 1, Step: 142, Rank: 2, loss = 0.2821306884288788
Epoch: 1, Step: 142, Rank: 7, loss = 0.9931824803352356
Epoch: 1, Step: 142, Rank: 4, loss = 1.3289319276809692
Per-token loss scaled by world size: 0.00033672110293991864
Epoch: 1, Step: 142, Rank: 3, loss = 0.8531671166419983
total tokens: 6288 num samples: 3 num padding tokens: 423 - rank: 1 max len: 2096 min len: 1765 avg len: 1955.0 num_loss_counted_tokens: 4016
total tokens: 7448 num samples: 8 num padding tokens: 727 - rank: 4 max len: 931 min len: 688 avg len: 840.125 num_loss_counted_tokens: 5430
{
"epoch": 1,
"step": 142,
"rank": 0,
"loss": 0.11148512363433838,
"overall_throughput": 41.71373478675397,
"lr": 2.4000000000000003e-06,
"cuda_mem_allocated": 24.478935718536377,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 20270,
"batch_size": 89,
"total_loss": 0.8032714128494263,
"gradnorm": 1.007487177848816,
"weight_norm": 433.04327392578125,
"timestamp": "2024-08-18T20:54:22.542197"
}
total tokens: 7664 num samples: 16 num padding tokens: 2101 - rank: 6 max len: 479 min len: 261 avg len: 347.6875 num_loss_counted_tokens: 2998 | |
total tokens: 7707 num samples: 7 num padding tokens: 488 - rank: 3 max len: 1101 min len: 961 avg len: 1031.2857142857142 num_loss_counted_tokens: 3858 | |
total tokens: 7130 num samples: 5 num padding tokens: 559 - rank: 2 max len: 1426 min len: 1132 avg len: 1314.2 num_loss_counted_tokens: 3828 | |
total tokens: 7540 num samples: 29 num padding tokens: 3170 - rank: 7 max len: 260 min len: 78 avg len: 150.68965517241378 num_loss_counted_tokens: 1742 | |
total tokens: 6342 num samples: 2 num padding tokens: 886 - rank: 0 max len: 3171 min len: 2285 avg len: 2728.0 num_loss_counted_tokens: 169 | |
total tokens: 7872 num samples: 12 num padding tokens: 877 - rank: 5 max len: 656 min len: 510 avg len: 582.9166666666666 num_loss_counted_tokens: 4836 | |
Per-token loss scaled by world size: 0.0003404757590033114Per-token loss scaled by world size: 0.0005307358223944902Per-token loss scaled by world size: 6.328061135718599e-05Per-token loss scaled by world size: 0.0002552252262830734 | |
Per-token loss scaled by world size: 2.1218120309640653e-06 | |
Per-token loss scaled by world size: 1.8243759768665768e-05 | |
Per-token loss scaled by world size: 0.0003829057968687266Epoch: 1, Step: 143, Rank: 5, loss = 1.3186794519424438 | |
Epoch: 1, Step: 143, Rank: 2, loss = 0.15722858905792236Epoch: 1, Step: 143, Rank: 3, loss = 0.6341390013694763 | |
Epoch: 1, Step: 143, Rank: 0, loss = 0.005271907430142164 | |
Epoch: 1, Step: 143, Rank: 4, loss = 0.8459545969963074 | |
Epoch: 1, Step: 143, Rank: 1, loss = 0.04532890021800995 | |
Epoch: 1, Step: 143, Rank: 7, loss = 0.9513773322105408 | |
Per-token loss scaled by world size: 0.0005416726926341653 | |
Epoch: 1, Step: 143, Rank: 6, loss = 1.3458534479141235 | |
total tokens: 7796 num samples: 4 num padding tokens: 508 - rank: 1 max len: 1949 min len: 1564 avg len: 1822.0 num_loss_counted_tokens: 2068
total tokens: 7893 num samples: 9 num padding tokens: 666 - rank: 4 max len: 877 min len: 712 avg len: 803.0 num_loss_counted_tokens: 5521
total tokens: 7679 num samples: 7 num padding tokens: 655 - rank: 3 max len: 1097 min len: 888 avg len: 1003.4285714285714 num_loss_counted_tokens: 5783
{
"epoch": 1,
"step": 143,
"rank": 0,
"loss": 0.005271907430142164,
"overall_throughput": 42.46446277162131,
"lr": 2.4000000000000003e-06,
"cuda_mem_allocated": 24.28184461593628,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 19877,
"batch_size": 76,
"total_loss": 0.6629791259765625,
"gradnorm": 1.007487177848816,
"weight_norm": 433.04327392578125,
"timestamp": "2024-08-18T20:54:25.071211"
}
total tokens: 2587 num samples: 13 num padding tokens: 827 - rank: 7 max len: 199 min len: 78 avg len: 135.3846153846154 num_loss_counted_tokens: 674
total tokens: 7755 num samples: 11 num padding tokens: 897 - rank: 5 max len: 705 min len: 483 avg len: 623.4545454545455 num_loss_counted_tokens: 3835
total tokens: 8028 num samples: 18 num padding tokens: 2679 - rank: 6 max len: 446 min len: 203 avg len: 297.1666666666667 num_loss_counted_tokens: 3221
total tokens: 7772 num samples: 2 num padding tokens: 1840 - rank: 0 max len: 3886 min len: 2046 avg len: 2966.0 num_loss_counted_tokens: 2038
total tokens: 7080 num samples: 5 num padding tokens: 517 - rank: 2 max len: 1416 min len: 1149 avg len: 1312.6 num_loss_counted_tokens: 2382
Per-token loss scaled by world size: 0.00031199524528346956
Per-token loss scaled by world size: 8.699101454112679e-05
Per-token loss scaled by world size: 1.0524022400204558e-06
Per-token loss scaled by world size: 0.00040260597597807646
Per-token loss scaled by world size: 0.0002991097862832248
Per-token loss scaled by world size: 9.484303154749796e-05
Epoch: 1, Step: 144, Rank: 1, loss = 0.21871715784072876
Epoch: 1, Step: 144, Rank: 0, loss = 0.0026460024528205395
Epoch: 1, Step: 144, Rank: 2, loss = 0.23845909535884857
Epoch: 1, Step: 144, Rank: 4, loss = 1.0122520923614502
Epoch: 1, Step: 144, Rank: 7, loss = 0.7844340801239014
Epoch: 1, Step: 144, Rank: 3, loss = 0.7520367503166199
Per-token loss scaled by world size: 0.0005959446425549686
Per-token loss scaled by world size: 0.00046541052870452404
Epoch: 1, Step: 144, Rank: 5, loss = 1.4983538389205933
Epoch: 1, Step: 144, Rank: 6, loss = 1.1701583862304688
[2024-08-18 20:54:27,526] [INFO] [logging.py:96:log_dist] [Rank 0] step=4, skipped=0, lr=[3.2000000000000003e-06], mom=[(0.9, 0.95)]
[2024-08-18 20:54:27,603] [INFO] [timer.py:258:stop] epoch=0/micro_step=144/global_step=4, RunningAvgSamplesPerSec=41.73388834459918, CurrSamplesPerSec=41.83623290194428, MemAllocated=22.69GB, MaxMemAllocated=30.58GB
total tokens: 7693 num samples: 7 num padding tokens: 967 - rank: 4 max len: 1099 min len: 892 avg len: 960.8571428571429 num_loss_counted_tokens: 3764
total tokens: 8019 num samples: 3 num padding tokens: 876 - rank: 1 max len: 2673 min len: 2126 avg len: 2381.0 num_loss_counted_tokens: 283
total tokens: 8005 num samples: 5 num padding tokens: 1366 - rank: 3 max len: 1601 min len: 1136 avg len: 1327.8 num_loss_counted_tokens: 1813
{
"epoch": 1,
"step": 144,
"rank": 0,
"loss": 0.0026460024528205395,
"overall_throughput": 41.11764673168018,
"lr": 3.2000000000000003e-06,
"cuda_mem_allocated": 22.69074296951294,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 20114,
"batch_size": 79,
"total_loss": 0.709632158279419,
"gradnorm": 0.9710609316825867,
"weight_norm": 433.04327392578125,
"timestamp": "2024-08-18T20:54:27.606216"
}
total tokens: 8100 num samples: 15 num padding tokens: 1493 - rank: 6 max len: 540 min len: 336 avg len: 440.46666666666664 num_loss_counted_tokens: 3689
total tokens: 8000 num samples: 25 num padding tokens: 2965 - rank: 7 max len: 320 min len: 78 avg len: 201.4 num_loss_counted_tokens: 2308
total tokens: 7860 num samples: 4 num padding tokens: 801 - rank: 2 max len: 1965 min len: 1608 avg len: 1764.75 num_loss_counted_tokens: 4235
total tokens: 7858 num samples: 2 num padding tokens: 1014 - rank: 0 max len: 3929 min len: 2915 avg len: 3422.0 num_loss_counted_tokens: 226
total tokens: 7686 num samples: 9 num padding tokens: 1386 - rank: 5 max len: 854 min len: 588 avg len: 700.0 num_loss_counted_tokens: 3557
Per-token loss scaled by world size: 0.00021755551279056817
Per-token loss scaled by world size: 0.00011090948828496039
Per-token loss scaled by world size: 0.0003043776086997241
Per-token loss scaled by world size: 0.00020149040210526437
Per-token loss scaled by world size: 1.0487364079381223e-06
Per-token loss scaled by world size: 0.000273953570285812
Per-token loss scaled by world size: 0.00017797687905840576
Epoch: 1, Step: 145, Rank: 5, loss = 1.0936287641525269
Epoch: 1, Step: 145, Rank: 6, loss = 0.7239550352096558
Epoch: 1, Step: 145, Rank: 3, loss = 0.7816769480705261
Epoch: 1, Step: 145, Rank: 0, loss = 0.0037681099493056536
Epoch: 1, Step: 145, Rank: 1, loss = 0.3984977900981903
Epoch: 1, Step: 145, Rank: 7, loss = 0.6394709348678589
Epoch: 1, Step: 145, Rank: 4, loss = 0.9843152165412903
Per-token loss scaled by world size: 0.00025657241349108517
Epoch: 1, Step: 145, Rank: 2, loss = 0.9218646883964539
total tokens: 5826 num samples: 2 num padding tokens: 6 - rank: 1 max len: 2913 min len: 2907 avg len: 2910.0 num_loss_counted_tokens: 567
total tokens: 7812 num samples: 7 num padding tokens: 1611 - rank: 4 max len: 1116 min len: 752 avg len: 885.8571428571429 num_loss_counted_tokens: 3680
{
"epoch": 1,
"step": 145,
"rank": 0,
"loss": 0.0037681099493056536,
"overall_throughput": 41.9281555047958,
"lr": 3.2000000000000003e-06,
"cuda_mem_allocated": 24.221776962280273,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 28744,
"batch_size": 94,
"total_loss": 0.6933972239494324,
"gradnorm": 0.9710609316825867,
"weight_norm": 433.04327392578125,
"timestamp": "2024-08-18T20:54:30.179291"
}
total tokens: 5792 num samples: 2 num padding tokens: 692 - rank: 2 max len: 2896 min len: 2204 avg len: 2550.0 num_loss_counted_tokens: 520
total tokens: 7612 num samples: 11 num padding tokens: 980 - rank: 5 max len: 692 min len: 509 avg len: 602.9090909090909 num_loss_counted_tokens: 4697
total tokens: 7952 num samples: 16 num padding tokens: 1525 - rank: 6 max len: 497 min len: 266 avg len: 401.6875 num_loss_counted_tokens: 4516
total tokens: 6225 num samples: 25 num padding tokens: 2041 - rank: 7 max len: 249 min len: 75 avg len: 167.36 num_loss_counted_tokens: 1838
total tokens: 6171 num samples: 3 num padding tokens: 1311 - rank: 3 max len: 2057 min len: 1365 avg len: 1620.0 num_loss_counted_tokens: 217
total tokens: 7190 num samples: 2 num padding tokens: 292 - rank: 0 max len: 3595 min len: 3303 avg len: 3449.0 num_loss_counted_tokens: 179
Per-token loss scaled by world size: 0.00040030613308772445
Per-token loss scaled by world size: 2.0256973130017286e-06
Per-token loss scaled by world size: 4.8830220293893944e-06
Per-token loss scaled by world size: 0.0004052049189340323
Per-token loss scaled by world size: 0.0007852399721741676
Per-token loss scaled by world size: 0.00055212079314515
Per-token loss scaled by world size: 0.0006642856751568615
Epoch: 1, Step: 146, Rank: 3, loss = 0.004231428261846304
Epoch: 1, Step: 146, Rank: 6, loss = 1.640268087387085
Epoch: 1, Step: 146, Rank: 2, loss = 0.8361894488334656
Epoch: 1, Step: 146, Rank: 5, loss = 1.3876097202301025
Epoch: 1, Step: 146, Rank: 7, loss = 0.8464224338531494
Epoch: 1, Step: 146, Rank: 4, loss = 1.1533113718032837
Epoch: 1, Step: 146, Rank: 1, loss = 0.010200022719800472
Per-token loss scaled by world size: 5.620273441309109e-05
Epoch: 1, Step: 146, Rank: 0, loss = 0.11740048974752426
total tokens: 5644 num samples: 2 num padding tokens: 996 - rank: 1 max len: 2822 min len: 1826 avg len: 2324.0 num_loss_counted_tokens: 207
total tokens: 7605 num samples: 9 num padding tokens: 737 - rank: 4 max len: 845 min len: 701 avg len: 763.1111111111111 num_loss_counted_tokens: 4932
{
"epoch": 1,
"step": 146,
"rank": 0,
"loss": 0.11740048974752426,
"overall_throughput": 40.05446798320092,
"lr": 3.2000000000000003e-06,
"cuda_mem_allocated": 24.520647048950195,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 16711,
"batch_size": 66,
"total_loss": 0.749454140663147,
"gradnorm": 0.9710609316825867,
"weight_norm": 433.04327392578125,
"timestamp": "2024-08-18T20:54:32.780987"
}
total tokens: 6615 num samples: 27 num padding tokens: 2358 - rank: 7 max len: 245 min len: 88 avg len: 157.66666666666666 num_loss_counted_tokens: 1644
total tokens: 6744 num samples: 4 num padding tokens: 435 - rank: 2 max len: 1686 min len: 1368 avg len: 1577.25 num_loss_counted_tokens: 992
total tokens: 6300 num samples: 2 num padding tokens: 261 - rank: 0 max len: 3150 min len: 2889 avg len: 3019.5 num_loss_counted_tokens: 190
total tokens: 6790 num samples: 5 num padding tokens: 1037 - rank: 3 max len: 1358 min len: 1051 avg len: 1150.6 num_loss_counted_tokens: 2865
total tokens: 7872 num samples: 12 num padding tokens: 1248 - rank: 5 max len: 656 min len: 441 avg len: 552.0 num_loss_counted_tokens: 3289
total tokens: 7776 num samples: 18 num padding tokens: 1655 - rank: 6 max len: 432 min len: 260 avg len: 340.05555555555554 num_loss_counted_tokens: 3056
Per-token loss scaled by world size: 0.0007255689124576747
Per-token loss scaled by world size: 0.0006010388606227934
Per-token loss scaled by world size: 0.0007319062133319676
Per-token loss scaled by world size: 7.5493703661777545e-06
Per-token loss scaled by world size: 0.00025839314912445843
Per-token loss scaled by world size: 5.159122338227462e-06
Per-token loss scaled by world size: 7.537942292401567e-05
Epoch: 1, Step: 147, Rank: 5, loss = 1.5308597087860107
Epoch: 1, Step: 147, Rank: 6, loss = 1.544230580329895
Epoch: 1, Step: 147, Rank: 4, loss = 1.26811683177948
Epoch: 1, Step: 147, Rank: 2, loss = 0.010885103605687618
Epoch: 1, Step: 147, Rank: 1, loss = 0.015928227454423904
Epoch: 1, Step: 147, Rank: 0, loss = 0.159041166305542
Epoch: 1, Step: 147, Rank: 7, loss = 0.5451772212982178
Per-token loss scaled by world size: 0.000249014439759776
Epoch: 1, Step: 147, Rank: 3, loss = 0.5253893136978149
total tokens: 5520 num samples: 2 num padding tokens: 43 - rank: 1 max len: 2760 min len: 2717 avg len: 2738.5 num_loss_counted_tokens: 221
Epoch 1: 21%|██▏ | 26/122 [01:06<04:05, 2.56s/it]
{
"epoch": 1,
"step": 147,
"rank": 0,
"loss": 0.159041166305542,
"overall_throughput": 41.27657023996106,
"lr": 3.2000000000000003e-06,
"cuda_mem_allocated": 24.05473041534424,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 16879,
"batch_size": 60,
"total_loss": 0.699953556060791,
"gradnorm": 0.9710609316825867,
"weight_norm": 433.04327392578125,
"timestamp": "2024-08-18T20:54:35.343018"
}
total tokens: 7721 num samples: 7 num padding tokens: 1814 - rank: 4 max len: 1103 min len: 719 avg len: 843.8571428571429 num_loss_counted_tokens: 3306
total tokens: 6106 num samples: 2 num padding tokens: 283 - rank: 0 max len: 3053 min len: 2770 avg len: 2911.5 num_loss_counted_tokens: 189
total tokens: 8064 num samples: 32 num padding tokens: 2634 - rank: 7 max len: 252 min len: 83 avg len: 169.6875 num_loss_counted_tokens: 2206
total tokens: 7017 num samples: 3 num padding tokens: 592 - rank: 2 max len: 2339 min len: 2001 avg len: 2141.6666666666665 num_loss_counted_tokens: 257
total tokens: 7960 num samples: 5 num padding tokens: 1198 - rank: 3 max len: 1592 min len: 1124 avg len: 1352.4 num_loss_counted_tokens: 3699
total tokens: 7840 num samples: 16 num padding tokens: 2136 - rank: 6 max len: 490 min len: 253 avg len: 356.5 num_loss_counted_tokens: 2793
total tokens: 7744 num samples: 11 num padding tokens: 773 - rank: 5 max len: 704 min len: 499 avg len: 633.7272727272727 num_loss_counted_tokens: 4777
Per-token loss scaled by world size: 0.00016060298366937786
Per-token loss scaled by world size: 0.0004763362812809646
Per-token loss scaled by world size: 1.0486909332030336e-06
Per-token loss scaled by world size: 0.00025925057707354426
Per-token loss scaled by world size: 0.00013205081631895155
Per-token loss scaled by world size: 0.0002572258817963302
Per-token loss scaled by world size: 0.0002500153495930135
Epoch: 1, Step: 148, Rank: 5, loss = 1.6056700944900513
Epoch: 1, Step: 148, Rank: 0, loss = 0.003535006195306778
Epoch: 1, Step: 148, Rank: 2, loss = 0.5413725972175598
Epoch: 1, Step: 148, Rank: 6, loss = 0.8739013075828552
Epoch: 1, Step: 148, Rank: 4, loss = 0.8670763373374939
Epoch: 1, Step: 148, Rank: 7, loss = 0.8427704572677612
Epoch: 1, Step: 148, Rank: 1, loss = 0.44512680172920227
Per-token loss scaled by world size: 0.00031931744888424873
Epoch: 1, Step: 148, Rank: 3, loss = 1.0763791799545288
total tokens: 7304 num samples: 8 num padding tokens: 608 - rank: 4 max len: 913 min len: 793 avg len: 837.0 num_loss_counted_tokens: 4081
total tokens: 7014 num samples: 3 num padding tokens: 554 - rank: 1 max len: 2338 min len: 1840 avg len: 2153.3333333333335 num_loss_counted_tokens: 349
{
"epoch": 1,
"step": 148,
"rank": 0,
"loss": 0.003535006195306778,
"overall_throughput": 40.32522306984037,
"lr": 3.2000000000000003e-06,
"cuda_mem_allocated": 24.535661697387695,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 26967,
"batch_size": 89,
"total_loss": 0.781978964805603,
"gradnorm": 0.9710609316825867,
"weight_norm": 433.04327392578125,
"timestamp": "2024-08-18T20:54:37.948159"
}
total tokens: 7966 num samples: 14 num padding tokens: 3196 - rank: 6 max len: 569 min len: 266 avg len: 340.7142857142857 num_loss_counted_tokens: 2847
total tokens: 6400 num samples: 25 num padding tokens: 2104 - rank: 7 max len: 256 min len: 75 avg len: 171.84 num_loss_counted_tokens: 1881
total tokens: 7720 num samples: 10 num padding tokens: 795 - rank: 5 max len: 772 min len: 616 avg len: 692.5 num_loss_counted_tokens: 5407
total tokens: 5704 num samples: 2 num padding tokens: 490 - rank: 0 max len: 2852 min len: 2362 avg len: 2607.0 num_loss_counted_tokens: 206
total tokens: 7476 num samples: 6 num padding tokens: 582 - rank: 2 max len: 1246 min len: 1054 avg len: 1149.0 num_loss_counted_tokens: 3350
total tokens: 7350 num samples: 7 num padding tokens: 419 - rank: 3 max len: 1050 min len: 921 avg len: 990.1428571428571 num_loss_counted_tokens: 4766
Per-token loss scaled by world size: 0.00031092012068256736
Per-token loss scaled by world size: 0.00025245780125260353
Per-token loss scaled by world size: 0.0003971010446548462
Per-token loss scaled by world size: 0.00022266971063800156
Per-token loss scaled by world size: 0.0001838229363784194
Per-token loss scaled by world size: 0.0004638316167984158
Per-token loss scaled by world size: 4.563625225273427e-06
Epoch: 1, Step: 149, Rank: 4, loss = 0.7849859595298767
Epoch: 1, Step: 149, Rank: 5, loss = 1.2347360849380493
Epoch: 1, Step: 149, Rank: 1, loss = 0.6923636198043823
Epoch: 1, Step: 149, Rank: 7, loss = 0.9667672514915466
Epoch: 1, Step: 149, Rank: 2, loss = 0.5715744495391846
Epoch: 1, Step: 149, Rank: 3, loss = 1.4422264099121094
Epoch: 1, Step: 149, Rank: 0, loss = 0.014190022833645344
Per-token loss scaled by world size: 0.00031763844890519977
Epoch: 1, Step: 149, Rank: 6, loss = 0.9876570105552673
total tokens: 7170 num samples: 3 num padding tokens: 1425 - rank: 1 max len: 2390 min len: 1672 avg len: 1915.0 num_loss_counted_tokens: 1941
total tokens: 7462 num samples: 7 num padding tokens: 447 - rank: 4 max len: 1066 min len: 923 avg len: 1002.1428571428571 num_loss_counted_tokens: 3761
{
"epoch": 1,
"step": 149,
"rank": 0,
"loss": 0.014190022833645344,
"overall_throughput": 42.217359471248905,
"lr": 3.2000000000000003e-06,
"cuda_mem_allocated": 24.2309513092041,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 24875,
"batch_size": 89,
"total_loss": 0.8368127346038818,
"gradnorm": 0.9710609316825867,
"weight_norm": 433.04327392578125,
"timestamp": "2024-08-18T20:54:40.455161"
}
total tokens: 7720 num samples: 5 num padding tokens: 320 - rank: 2 max len: 1544 min len: 1403 avg len: 1480.0 num_loss_counted_tokens: 2899
total tokens: 7384 num samples: 8 num padding tokens: 1322 - rank: 5 max len: 923 min len: 593 avg len: 757.75 num_loss_counted_tokens: 4939
total tokens: 7968 num samples: 24 num padding tokens: 3134 - rank: 7 max len: 332 min len: 91 avg len: 201.41666666666666 num_loss_counted_tokens: 2053
total tokens: 6318 num samples: 2 num padding tokens: 100 - rank: 0 max len: 3159 min len: 3059 avg len: 3109.0 num_loss_counted_tokens: 160
total tokens: 6785 num samples: 5 num padding tokens: 341 - rank: 3 max len: 1357 min len: 1186 avg len: 1288.8 num_loss_counted_tokens: 5137
total tokens: 7602 num samples: 14 num padding tokens: 1273 - rank: 6 max len: 543 min len: 371 avg len: 452.07142857142856 num_loss_counted_tokens: 4011
Per-token loss scaled by world size: 0.00026751268887892365
Per-token loss scaled by world size: 0.00014292483683675528
Per-token loss scaled by world size: 0.0001920466311275959
Per-token loss scaled by world size: 0.0004723104939330369
Per-token loss scaled by world size: 2.162046712328447e-06
Per-token loss scaled by world size: 0.0003416259423829615
Per-token loss scaled by world size: 0.0003082101175095886
Epoch: 1, Step: 150, Rank: 3, loss = 0.7713059782981873
Epoch: 1, Step: 150, Rank: 0, loss = 0.0062337215058505535
Epoch: 1, Step: 150, Rank: 2, loss = 0.5537184476852417
Epoch: 1, Step: 150, Rank: 5, loss = 1.3617892265319824
Epoch: 1, Step: 150, Rank: 1, loss = 0.4120880365371704
Epoch: 1, Step: 150, Rank: 7, loss = 0.9849929809570312
Epoch: 1, Step: 150, Rank: 4, loss = 0.8886467814445496
Per-token loss scaled by world size: 0.00027262946241535246
Epoch: 1, Step: 150, Rank: 6, loss = 0.7860589027404785
total tokens: 7392 num samples: 3 num padding tokens: 958 - rank: 1 max len: 2464 min len: 1969 avg len: 2144.6666666666665 num_loss_counted_tokens: 582
total tokens: 7800 num samples: 10 num padding tokens: 762 - rank: 4 max len: 780 min len: 667 avg len: 703.8 num_loss_counted_tokens: 4946
{
"epoch": 1,
"step": 150,
"rank": 0,
"loss": 0.0062337215058505535,
"overall_throughput": 41.782778937325965,
"lr": 3.2000000000000003e-06,
"cuda_mem_allocated": 24.4852614402771,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 23066,
"batch_size": 89,
"total_loss": 0.7206042408943176,
"gradnorm": 0.9710609316825867,
"weight_norm": 433.04327392578125,
"timestamp": "2024-08-18T20:54:42.989502"
}
total tokens: 7866 num samples: 19 num padding tokens: 2154 - rank: 6 max len: 414 min len: 221 avg len: 300.63157894736844 num_loss_counted_tokens: 2912
total tokens: 7504 num samples: 4 num padding tokens: 1167 - rank: 2 max len: 1876 min len: 1283 avg len: 1584.25 num_loss_counted_tokens: 741
total tokens: 8016 num samples: 8 num padding tokens: 721 - rank: 3 max len: 1002 min len: 795 avg len: 911.875 num_loss_counted_tokens: 6005
total tokens: 5805 num samples: 27 num padding tokens: 2202 - rank: 7 max len: 215 min len: 71 avg len: 133.44444444444446 num_loss_counted_tokens: 1404
total tokens: 6802 num samples: 2 num padding tokens: 541 - rank: 0 max len: 3401 min len: 2860 avg len: 3130.5 num_loss_counted_tokens: 166
total tokens: 7920 num samples: 12 num padding tokens: 1266 - rank: 5 max len: 660 min len: 445 avg len: 554.5 num_loss_counted_tokens: 4048
Per-token loss scaled by world size: 0.0004141188692301512
Per-token loss scaled by world size: 0.0002827317512128502
Per-token loss scaled by world size: 0.00034681695979088545
Per-token loss scaled by world size: 0.0004108196299057454
Per-token loss scaled by world size: 1.1767973546739086e-06
Per-token loss scaled by world size: 9.391092316946015e-05
Per-token loss scaled by world size: 0.00017690712411422282
Epoch: 1, Step: 151, Rank: 3, loss = 1.0656384229660034
Epoch: 1, Step: 151, Rank: 5, loss = 1.2724319696426392
Epoch: 1, Step: 151, Rank: 4, loss = 0.8687286376953125
Epoch: 1, Step: 151, Rank: 0, loss = 0.003615856869146228
Epoch: 1, Step: 151, Rank: 6, loss = 1.2622946500778198
Epoch: 1, Step: 151, Rank: 7, loss = 0.5435692667961121
Epoch: 1, Step: 151, Rank: 1, loss = 0.28855305910110474
Per-token loss scaled by world size: 0.0003057793073821813
Epoch: 1, Step: 151, Rank: 2, loss = 0.9395451545715332
total tokens: 8016 num samples: 8 num padding tokens: 1103 - rank: 4 max len: 1002 min len: 696 avg len: 864.125 num_loss_counted_tokens: 4285
total tokens: 7425 num samples: 3 num padding tokens: 1417 - rank: 1 max len: 2475 min len: 1754 avg len: 2002.6666666666667 num_loss_counted_tokens: 868
{
"epoch": 1,
"step": 151,
"rank": 0,
"loss": 0.003615856869146228,
"overall_throughput": 41.2486467947231,
"lr": 3.2000000000000003e-06,
"cuda_mem_allocated": 24.402647495269775,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 24581,
"batch_size": 87,
"total_loss": 0.780547022819519,
"gradnorm": 0.9710609316825867,
"weight_norm": 433.04327392578125,
"timestamp": "2024-08-18T20:54:45.555476"
}
total tokens: 8046 num samples: 6 num padding tokens: 1241 - rank: 3 max len: 1341 min len: 1041 avg len: 1134.1666666666667 num_loss_counted_tokens: 2869
total tokens: 7560 num samples: 30 num padding tokens: 2626 - rank: 7 max len: 252 min len: 77 avg len: 164.46666666666667 num_loss_counted_tokens: 2002
total tokens: 7755 num samples: 15 num padding tokens: 2376 - rank: 6 max len: 517 min len: 262 avg len: 358.6 num_loss_counted_tokens: 3077
total tokens: 6984 num samples: 4 num padding tokens: 492 - rank: 2 max len: 1746 min len: 1432 avg len: 1623.0 num_loss_counted_tokens: 2214
total tokens: 7491 num samples: 11 num padding tokens: 868 - rank: 5 max len: 681 min len: 536 avg len: 602.0909090909091 num_loss_counted_tokens: 3815
total tokens: 7306 num samples: 2 num padding tokens: 786 - rank: 0 max len: 3653 min len: 2867 avg len: 3260.0 num_loss_counted_tokens: 160
Per-token loss scaled by world size: 0.00020859052892774343
Per-token loss scaled by world size: 0.0006468938081525266
Per-token loss scaled by world size: 0.00038840470369905233
Per-token loss scaled by world size: 8.114238880807534e-05
Per-token loss scaled by world size: 4.119947334402241e-05
Per-token loss scaled by world size: 0.0005277044838294387
Per-token loss scaled by world size: 3.4815836897905683e-06
Epoch: 1, Step: 152, Rank: 5, loss = 1.5654021501541138
Epoch: 1, Step: 152, Rank: 3, loss = 0.5047630071640015
Epoch: 1, Step: 152, Rank: 2, loss = 0.1963544338941574
Epoch: 1, Step: 152, Rank: 7, loss = 0.9398908615112305
Epoch: 1, Step: 152, Rank: 4, loss = 1.276978850364685
Epoch: 1, Step: 152, Rank: 0, loss = 0.008424997329711914
Epoch: 1, Step: 152, Rank: 1, loss = 0.09969757497310638
Per-token loss scaled by world size: 0.0005147996125742793
Epoch: 1, Step: 152, Rank: 6, loss = 1.2457506656646729
total tokens: 5714 num samples: 2 num padding tokens: 255 - rank: 1 max len: 2857 min len: 2602 avg len: 2729.5 num_loss_counted_tokens: 173
total tokens: 7479 num samples: 9 num padding tokens: 752 - rank: 4 max len: 831 min len: 644 avg len: 747.4444444444445 num_loss_counted_tokens: 4908
{
"epoch": 1,
"step": 152,
"rank": 0,
"loss": 0.008424997329711914,
"overall_throughput": 43.280048077519346,
"lr": 3.2000000000000003e-06,
"cuda_mem_allocated": 24.22560167312622,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 19359,
"batch_size": 61,
"total_loss": 0.7296578288078308,
"gradnorm": 0.9710609316825867,
"weight_norm": 433.04327392578125,
"timestamp": "2024-08-18T20:54:48.003629"
}
total tokens: 7932 num samples: 6 num padding tokens: 1038 - rank: 2 max len: 1322 min len: 990 avg len: 1149.0 num_loss_counted_tokens: 3464
total tokens: 6662 num samples: 2 num padding tokens: 247 - rank: 0 max len: 3331 min len: 3084 avg len: 3207.5 num_loss_counted_tokens: 180
total tokens: 3819 num samples: 19 num padding tokens: 1446 - rank: 7 max len: 201 min len: 76 avg len: 124.89473684210526 num_loss_counted_tokens: 686
total tokens: 8052 num samples: 22 num padding tokens: 2171 - rank: 6 max len: 366 min len: 201 avg len: 267.3181818181818 num_loss_counted_tokens: 3024
total tokens: 7912 num samples: 8 num padding tokens: 615 - rank: 3 max len: 989 min len: 855 avg len: 912.125 num_loss_counted_tokens: 4585
total tokens: 8047 num samples: 13 num padding tokens: 1226 - rank: 5 max len: 619 min len: 375 avg len: 524.6923076923077 num_loss_counted_tokens: 5490
Per-token loss scaled by world size: 0.0002795422915369272
Per-token loss scaled by world size: 0.00013167920405976474
Per-token loss scaled by world size: 0.0003253524482715875
Per-token loss scaled by world size: 0.00030560040613636374
Per-token loss scaled by world size: 0.00012384731962811202
Per-token loss scaled by world size: 1.194436777041119e-06
Epoch: 1, Step: 153, Rank: 3, loss = 1.0706535577774048
Per-token loss scaled by world size: 0.0002909142931457609
Epoch: 1, Step: 153, Rank: 1, loss = 0.407550573348999
Epoch: 1, Step: 153, Rank: 7, loss = 0.43332335352897644
Epoch: 1, Step: 153, Rank: 4, loss = 1.0056545734405518
Epoch: 1, Step: 153, Rank: 6, loss = 0.9199037551879883
Epoch: 1, Step: 153, Rank: 0, loss = 0.003930592909455299
Per-token loss scaled by world size: 0.0003188060945831239
Epoch: 1, Step: 153, Rank: 2, loss = 0.9573261737823486
Epoch: 1, Step: 153, Rank: 5, loss = 1.0491111278533936
total tokens: 7839 num samples: 9 num padding tokens: 793 - rank: 4 max len: 871 min len: 691 avg len: 782.8888888888889 num_loss_counted_tokens: 5172
total tokens: 6369 num samples: 3 num padding tokens: 846 - rank: 1 max len: 2123 min len: 1506 avg len: 1841.0 num_loss_counted_tokens: 2162
{
"epoch": 1,
"step": 153,
"rank": 0,
"loss": 0.003930592909455299,
"overall_throughput": 41.68838808478364,
"lr": 3.2000000000000003e-06,
"cuda_mem_allocated": 24.49734401702881,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 26326,
"batch_size": 95,
"total_loss": 0.7309317588806152,
"gradnorm": 0.9710609316825867,
"weight_norm": 433.04327392578125,
"timestamp": "2024-08-18T20:54:50.537829"
}
total tokens: 7215 num samples: 5 num padding tokens: 508 - rank: 2 max len: 1443 min len: 1149 avg len: 1341.4 num_loss_counted_tokens: 3549
total tokens: 7714 num samples: 29 num padding tokens: 2672 - rank: 7 max len: 266 min len: 79 avg len: 173.86206896551724 num_loss_counted_tokens: 2096
total tokens: 8004 num samples: 3 num padding tokens: 275 - rank: 0 max len: 2668 min len: 2403 avg len: 2576.3333333333335 num_loss_counted_tokens: 964
total tokens: 7700 num samples: 7 num padding tokens: 838 - rank: 3 max len: 1100 min len: 894 avg len: 980.2857142857143 num_loss_counted_tokens: 4738
total tokens: 8076 num samples: 12 num padding tokens: 1148 - rank: 5 max len: 673 min len: 516 avg len: 577.3333333333334 num_loss_counted_tokens: 4820
total tokens: 8032 num samples: 16 num padding tokens: 2150 - rank: 6 max len: 502 min len: 274 avg len: 367.625 num_loss_counted_tokens: 3962
Per-token loss scaled by world size: 0.00019867185619659722
Per-token loss scaled by world size: 0.0002461467229295522
Per-token loss scaled by world size: 0.00024380745890084654
Per-token loss scaled by world size: 0.00019991688895970583
Per-token loss scaled by world size: 6.630049756495282e-05
Per-token loss scaled by world size: 1.6863944551914756e-07
Epoch: 1, Step: 154, Rank: 2, loss = 0.7483974695205688
Epoch: 1, Step: 154, Rank: 3, loss = 0.6136698722839355
Epoch: 1, Step: 154, Rank: 4, loss = 0.7555781602859497
Epoch: 1, Step: 154, Rank: 6, loss = 0.6098480820655823
Per-token loss scaled by world size: 0.00021128085791133344
Epoch: 1, Step: 154, Rank: 1, loss = 0.20351766049861908
Epoch: 1, Step: 154, Rank: 0, loss = 0.0005176598788239062
Per-token loss scaled by world size: 0.0005612249951809645
Epoch: 1, Step: 154, Rank: 7, loss = 0.6485530138015747
Epoch: 1, Step: 154, Rank: 5, loss = 1.7227503061294556
total tokens: 7904 num samples: 4 num padding tokens: 814 - rank: 1 max len: 1976 min len: 1533 avg len: 1772.5 num_loss_counted_tokens: 2873
total tokens: 7744 num samples: 11 num padding tokens: 596 - rank: 4 max len: 704 min len: 594 avg len: 649.8181818181819 num_loss_counted_tokens: 5417
{
"epoch": 1,
"step": 154,
"rank": 0,
"loss": 0.0005176598788239062,
"overall_throughput": 42.108806987198356,
"lr": 3.2000000000000003e-06,
"cuda_mem_allocated": 24.21819305419922,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 24557,
"batch_size": 80,
"total_loss": 0.662854015827179,
"gradnorm": 0.9710609316825867,
"weight_norm": 433.04327392578125,
"timestamp": "2024-08-18T20:54:53.052708"
}
total tokens: 7440 num samples: 31 num padding tokens: 3180 - rank: 7 max len: 240 min len: 74 avg len: 137.41935483870967 num_loss_counted_tokens: 1408
total tokens: 7020 num samples: 6 num padding tokens: 1084 - rank: 2 max len: 1170 min len: 846 avg len: 989.3333333333334 num_loss_counted_tokens: 1599
total tokens: 8097 num samples: 3 num padding tokens: 808 - rank: 0 max len: 2699 min len: 2025 avg len: 2429.6666666666665 num_loss_counted_tokens: 947
total tokens: 7999 num samples: 19 num padding tokens: 1990 - rank: 6 max len: 421 min len: 240 avg len: 316.2631578947368 num_loss_counted_tokens: 3596
total tokens: 7657 num samples: 13 num padding tokens: 1095 - rank: 5 max len: 589 min len: 434 avg len: 504.7692307692308 num_loss_counted_tokens: 4851
total tokens: 7560 num samples: 9 num padding tokens: 485 - rank: 3 max len: 840 min len: 716 avg len: 786.1111111111111 num_loss_counted_tokens: 5656
Per-token loss scaled by world size: 0.0003278250514995307
Per-token loss scaled by world size: 0.0003366835881024599
Per-token loss scaled by world size: 0.0003885742917191237
Per-token loss scaled by world size: 0.00019582045206334442
Per-token loss scaled by world size: 0.00032262562308460474
Per-token loss scaled by world size: 0.00013199940440244973
Per-token loss scaled by world size: 3.822985672741197e-05
Epoch: 1, Step: 155, Rank: 5, loss = 1.060516357421875
Epoch: 1, Step: 155, Rank: 4, loss = 0.8947165012359619
Epoch: 1, Step: 155, Rank: 7, loss = 0.9188936948776245
Epoch: 1, Step: 155, Rank: 2, loss = 0.5344429612159729
Epoch: 1, Step: 155, Rank: 3, loss = 0.8805260062217712
Epoch: 1, Step: 155, Rank: 1, loss = 0.36025938391685486
Epoch: 1, Step: 155, Rank: 0, loss = 0.10433883965015411
Per-token loss scaled by world size: 0.00041954353218898177
Epoch: 1, Step: 155, Rank: 6, loss = 1.1450392007827759
total tokens: 8064 num samples: 12 num padding tokens: 820 - rank: 4 max len: 672 min len: 523 avg len: 603.6666666666666 num_loss_counted_tokens: 5303 | |
total tokens: 6845 num samples: 5 num padding tokens: 859 - rank: 1 max len: 1369 min len: 1106 avg len: 1197.2 num_loss_counted_tokens: 4981 | |
{ | |
"epoch": 1, | |
"step": 155, | |
"rank": 0, | |
"loss": 0.10433883965015411, | |
"overall_throughput": 41.90741869411001, | |
"lr": 3.2000000000000003e-06, | |
"cuda_mem_allocated": 24.32686471939087, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 21834, | |
"batch_size": 76, | |
"total_loss": 0.7373416423797607, | |
"gradnorm": 0.9710609316825867, | |
"weight_norm": 433.04327392578125, | |
"timestamp": "2024-08-18T20:54:55.565723" | |
} | |
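The `Per-token loss scaled by world size` lines carry no rank label. Matching them to ranks by value, the printed per-rank `loss` at step 155 appears to equal the per-token value × (`num_loss_counted_tokens` of the step record ÷ 8 GPUs). For example, 3.822985672741197e-05 × 21834 / 8 ≈ 0.10434, which is rank 0's loss. This relationship is inferred from the numbers only and has not been checked against the training library's source. The sketch below recomputes two ranks. `reconstruct_rank_loss` is a hypothetical helper, and the rank-to-value pairing is an assumption.

```python
import math

WORLD_SIZE = 8  # from --gpus 8 in the ilab command


def reconstruct_rank_loss(per_token_scaled: float, step_loss_tokens: int,
                          world_size: int = WORLD_SIZE) -> float:
    """Hypothetical helper: rescale a 'Per-token loss scaled by world size'
    value to the per-rank 'loss' printed on the Epoch/Step/Rank line."""
    return per_token_scaled * step_loss_tokens / world_size


STEP_155_LOSS_TOKENS = 21834  # "num_loss_counted_tokens" in the step-155 record

# (per-token value, logged rank loss), paired by value (assumption).
pairs = {
    0: (3.822985672741197e-05, 0.10433883965015411),
    6: (0.00041954353218898177, 1.1450392007827759),
}
for rank, (per_token, logged) in pairs.items():
    rebuilt = reconstruct_rank_loss(per_token, STEP_155_LOSS_TOKENS)
    print(f"rank {rank}: rebuilt={rebuilt:.6f} logged={logged:.6f}")
    # The values agree to about 1e-8 relative error (float32 logging).
    assert math.isclose(rebuilt, logged, rel_tol=1e-6)
```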
total tokens: 7539 num samples: 7 num padding tokens: 396 - rank: 2 max len: 1077 min len: 913 avg len: 1020.4285714285714 num_loss_counted_tokens: 3046
total tokens: 8118 num samples: 9 num padding tokens: 1166 - rank: 3 max len: 902 min len: 687 avg len: 772.4444444444445 num_loss_counted_tokens: 4403
total tokens: 7740 num samples: 15 num padding tokens: 1204 - rank: 5 max len: 516 min len: 339 avg len: 435.73333333333335 num_loss_counted_tokens: 4055
total tokens: 7732 num samples: 2 num padding tokens: 2186 - rank: 0 max len: 3866 min len: 1680 avg len: 2773.0 num_loss_counted_tokens: 1642
total tokens: 6720 num samples: 30 num padding tokens: 2251 - rank: 7 max len: 224 min len: 83 avg len: 148.96666666666667 num_loss_counted_tokens: 1606
total tokens: 7872 num samples: 24 num padding tokens: 1261 - rank: 6 max len: 328 min len: 226 avg len: 275.4583333333333 num_loss_counted_tokens: 3708
Per-token loss scaled by world size: 0.00035203597508370876
Per-token loss scaled by world size: 0.0004669471236411482
Per-token loss scaled by world size: 0.00028399238362908363
Per-token loss scaled by world size: 0.0006751486216671765
Per-token loss scaled by world size: 8.408135727222543e-06
Per-token loss scaled by world size: 0.0003578344185370952
Epoch: 1, Step: 156, Rank: 2, loss = 0.6541054248809814
Epoch: 1, Step: 156, Rank: 3, loss = 1.075495958328247
Epoch: 1, Step: 156, Rank: 6, loss = 1.5550360679626465
Epoch: 1, Step: 156, Rank: 5, loss = 0.81082683801651
Epoch: 1, Step: 156, Rank: 0, loss = 0.019366038963198662
Per-token loss scaled by world size: 0.0002267559466417879
Epoch: 1, Step: 156, Rank: 4, loss = 0.8241821527481079
Per-token loss scaled by world size: 2.405712393738213e-06
Epoch: 1, Step: 156, Rank: 7, loss = 0.5222756266593933
Epoch: 1, Step: 156, Rank: 1, loss = 0.00554095720872283
total tokens: 6510 num samples: 3 num padding tokens: 687 - rank: 1 max len: 2170 min len: 1746 avg len: 1941.0 num_loss_counted_tokens: 859
total tokens: 7452 num samples: 9 num padding tokens: 620 - rank: 4 max len: 828 min len: 705 avg len: 759.1111111111111 num_loss_counted_tokens: 5560
total tokens: 8004 num samples: 29 num padding tokens: 2622 - rank: 7 max len: 276 min len: 75 avg len: 185.58620689655172 num_loss_counted_tokens: 2559
{
  "epoch": 1,
  "step": 156,
  "rank": 0,
  "loss": 0.019366038963198662,
  "overall_throughput": 40.42005133127166,
  "lr": 3.2000000000000003e-06,
  "cuda_mem_allocated": 24.42158031463623,
  "cuda_malloc_retries": 0,
  "num_loss_counted_tokens": 18426,
  "batch_size": 65,
  "total_loss": 0.6833536028862,
  "gradnorm": 0.9710609316825867,
  "weight_norm": 433.04327392578125,
  "timestamp": "2024-08-18T20:54:58.183018"
}
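The `total_loss` field in the step-156 record appears to be the arithmetic mean of the eight per-rank `loss` values printed for that step: their sum is about 5.46683, and dividing by 8 gives about 0.683354. This is consistent with an all-reduce average across ranks, but that reading is an inference from the numbers, not something the log states. The sketch below reproduces the check with the values copied from the log.

```python
import math
from statistics import fmean

# Per-rank losses printed for Epoch 1, Step 156, copied from the log.
step_156_rank_losses = {
    0: 0.019366038963198662,
    1: 0.00554095720872283,
    2: 0.6541054248809814,
    3: 1.075495958328247,
    4: 0.8241821527481079,
    5: 0.81082683801651,
    6: 1.5550360679626465,
    7: 0.5222756266593933,
}
logged_total_loss = 0.6833536028862  # "total_loss" in the step-156 record

mean_loss = fmean(step_156_rank_losses.values())
print(f"mean of rank losses = {mean_loss:.7f}, logged total_loss = {logged_total_loss:.7f}")
# Agreement to ~5e-8 relative error, i.e. float32 rounding.
assert math.isclose(mean_loss, logged_total_loss, rel_tol=1e-6)
```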
total tokens: 7995 num samples: 15 num padding tokens: 1805 - rank: 6 max len: 533 min len: 277 avg len: 412.6666666666667 num_loss_counted_tokens: 4022
total tokens: 7678 num samples: 11 num padding tokens: 680 - rank: 5 max len: 698 min len: 541 avg len: 636.1818181818181 num_loss_counted_tokens: 5943
total tokens: 6816 num samples: 4 num padding tokens: 1254 - rank: 2 max len: 1704 min len: 1064 avg len: 1390.5 num_loss_counted_tokens: 844
total tokens: 7217 num samples: 7 num padding tokens: 458 - rank: 3 max len: 1031 min len: 875 avg len: 965.5714285714286 num_loss_counted_tokens: 3971
total tokens: 5472 num samples: 2 num padding tokens: 101 - rank: 0 max len: 2736 min len: 2635 avg len: 2685.5 num_loss_counted_tokens: 183
Per-token loss scaled by world size: 0.000452109903562814
Per-token loss scaled by world size: 0.0004523490206338465
Per-token loss scaled by world size: 7.593091595481383e-06
Per-token loss scaled by world size: 0.0006878247950226068
Per-token loss scaled by world size: 9.245219553122297e-05
Per-token loss scaled by world size: 0.0003072120016440749
Per-token loss scaled by world size: 0.0005076072411611676
Epoch: 1, Step: 157, Rank: 5, loss = 0.9610720276832581
Epoch: 1, Step: 157, Rank: 6, loss = 1.4613697528839111
Epoch: 1, Step: 157, Rank: 3, loss = 0.6527103185653687
Epoch: 1, Step: 157, Rank: 2, loss = 0.19642624258995056
Epoch: 1, Step: 157, Rank: 4, loss = 0.9605640172958374
Epoch: 1, Step: 157, Rank: 1, loss = 0.016132472082972527
Epoch: 1, Step: 157, Rank: 7, loss = 1.078474998474121
Per-token loss scaled by world size: 3.8059803046053275e-05
Epoch: 1, Step: 157, Rank: 0, loss = 0.08086281269788742
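All eight ranks write to the same stdout without coordination, so in the raw `nohup.out` several messages can run together on one line with their newlines displaced. The sketch below splits such a line back into individual messages using a zero-width lookahead on the two message prefixes seen in this log. `split_fused` is a hypothetical helper. It assumes those two prefixes are the only message starts and that the gist residue (` | |`) has already been removed. Splitting on an empty-width match requires Python 3.7 or later.

```python
import re
from typing import List

# Assumption: every message begins with one of these two prefixes.
_MSG_START = re.compile(
    r"(?=Per-token loss scaled by world size: |Epoch: \d+, Step: \d+, Rank: \d+)"
)


def split_fused(line: str) -> List[str]:
    """Hypothetical helper: split one raw line into its rank messages."""
    # The lookahead consumes nothing, so each prefix stays with its message;
    # the empty string produced by a match at position 0 is filtered out.
    return [part.strip() for part in _MSG_START.split(line) if part.strip()]


raw = ("Epoch: 1, Step: 156, Rank: 5, loss = 0.81082683801651"
       "Epoch: 1, Step: 156, Rank: 0, loss = 0.019366038963198662")
for msg in split_fused(raw):
    print(msg)
```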
total tokens: 7684 num samples: 4 num padding tokens: 608 - rank: 1 max len: 1921 min len: 1493 avg len: 1769.0 num_loss_counted_tokens: 2916
total tokens: 8118 num samples: 9 num padding tokens: 805 - rank: 4 max len: 902 min len: 707 avg len: 812.5555555555555 num_loss_counted_tokens: 5091
{
  "epoch": 1,
  "step": 157,
  "rank": 0,
  "loss": 0.08086281269788742,
  "overall_throughput": 41.84706536508769,
  "lr": 3.2000000000000003e-06,
  "cuda_mem_allocated": 24.473669052124023,
  "cuda_malloc_retries": 0,
  "num_loss_counted_tokens": 16997,
  "batch_size": 74,
  "total_loss": 0.6759515404701233,
  "gradnorm": 0.9710609316825867,
  "weight_norm": 433.04327392578125,
  "timestamp": "2024-08-18T20:55:00.708630"
}
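The per-step metrics are printed as multi-line JSON objects interleaved with plain-text lines, so a line-by-line `json.loads` will not work. One way to pull them out is to scan for `{` and let `json.JSONDecoder.raw_decode` parse one object at a time. The sketch below does that and builds a step-to-`total_loss` map. `iter_metric_records` is a hypothetical helper. It assumes the gist residue (` | |`) has been stripped, since otherwise the objects are not valid JSON. The sample input is a trimmed excerpt with most fields omitted.

```python
import json
from typing import Dict, Iterator


def iter_metric_records(text: str) -> Iterator[dict]:
    """Hypothetical helper: yield each JSON object embedded in a mixed log."""
    decoder = json.JSONDecoder()
    pos = text.find("{")
    while pos != -1:
        try:
            record, consumed = decoder.raw_decode(text[pos:])
        except json.JSONDecodeError:
            # A '{' that does not open a valid JSON value; skip past it.
            pos = text.find("{", pos + 1)
            continue
        if isinstance(record, dict):
            yield record
        pos = text.find("{", pos + consumed)


# Trimmed excerpt of the log around step 157 (most JSON fields omitted).
sample = """Epoch: 1, Step: 157, Rank: 0, loss = 0.08086281269788742
{
  "epoch": 1,
  "step": 157,
  "total_loss": 0.6759515404701233
}
total tokens: 7973 num samples: 17 num padding tokens: 2030 - rank: 6
"""

total_loss_by_step: Dict[int, float] = {
    rec["step"]: rec["total_loss"] for rec in iter_metric_records(sample)
}
print(total_loss_by_step)
```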
total tokens: 7973 num samples: 17 num padding tokens: 2030 - rank: 6 max len: 469 min len: 276 avg len: 349.5882352941176 num_loss_counted_tokens: 3185
total tokens: 7330 num samples: 5 num padding tokens: 579 - rank: 2 max len: 1466 min len: 1164 avg len: 1350.2 num_loss_counted_tokens: 1678
total tokens: 7656 num samples: 11 num padding tokens: 986 - rank: 5 max len: 696 min len: 545 avg len: 606.3636363636364 num_loss_counted_tokens: 3391
total tokens: 7714 num samples: 29 num padding tokens: 2374 - rank: 7 max len: 266 min len: 78 avg len: 184.13793103448276 num_loss_counted_tokens: 2574
total tokens: 7791 num samples: 7 num padding tokens: 885 - rank: 3 max len: 1113 min len: 906 avg len: 986.5714285714286 num_loss_counted_tokens: 4155
total tokens: 6052 num samples: 2 num padding tokens: 661 - rank: 0 max len: 3026 min len: 2365 avg len: 2695.5 num_loss_counted_tokens: 163
Per-token loss scaled by world size: 0.00020404128008522093
Per-token loss scaled by world size: 0.0002751105057541281
Per-token loss scaled by world size: 0.0002363547682762146
Per-token loss scaled by world size: 0.0002702484780456871
Per-token loss scaled by world size: 0.0001953808678081259
Per-token loss scaled by world size: 0.00024261375074274838
Per-token loss scaled by world size: 1.6901136632441194e-06
Epoch: 1, Step: 158, Rank: 5, loss = 0.8857870101928711
Epoch: 1, Step: 158, Rank: 4, loss = 0.870132565498352
Epoch: 1, Step: 158, Rank: 3, loss = 0.6569619178771973
Epoch: 1, Step: 158, Rank: 7, loss = 0.6290775537490845
Epoch: 1, Step: 158, Rank: 2, loss = 0.7610032558441162
Epoch: 1, Step: 158, Rank: 0, loss = 0.005441743414849043
Epoch: 1, Step: 158, Rank: 1, loss = 0.7811556458473206
Per-token loss scaled by world size: 0.0003746829752344638
Epoch: 1, Step: 158, Rank: 6, loss = 1.2063854932785034
total tokens: 7932 num samples: 3 num padding tokens: 1113 - rank: 1 max len: 2644 min len: 1935 avg len: 2273.0 num_loss_counted_tokens: 430
total tokens: 7551 num samples: 9 num padding tokens: 529 - rank: 4 max len: 839 min len: 726 avg len: 780.2222222222222 num_loss_counted_tokens: 3403
{
  "epoch": 1,
  "step": 158,
  "rank": 0,
  "loss": 0.005441743414849043,
  "overall_throughput": 42.29815333288017,
  "lr": 3.2000000000000003e-06,
  "cuda_mem_allocated": 24.366318225860596,
  "cuda_malloc_retries": 0,
  "num_loss_counted_tokens": 25758,
  "batch_size": 93,
  "total_loss": 0.724493145942688,
  "gradnorm": 0.9710609316825867,
  "weight_norm": 433.04327392578125,
  "timestamp": "2024-08-18T20:55:03.213265"
}
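In this stretch, consecutive metric records are roughly 2.5 s apart. The sketch below computes the wall-clock interval between records from the `timestamp` fields of steps 154 to 158 (copied from the log). It measures time between log records only and makes no assumption about how `overall_throughput` is computed.

```python
from datetime import datetime

# "timestamp" values of the JSON records for steps 154-158, copied from the log.
stamps = [
    "2024-08-18T20:54:53.052708",
    "2024-08-18T20:54:55.565723",
    "2024-08-18T20:54:58.183018",
    "2024-08-18T20:55:00.708630",
    "2024-08-18T20:55:03.213265",
]

times = [datetime.fromisoformat(s) for s in stamps]
# Seconds elapsed between each pair of consecutive records.
intervals = [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]
print([round(x, 3) for x in intervals])
print(f"mean seconds per record: {sum(intervals) / len(intervals):.3f}")
```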
total tokens: 7185 num samples: 5 num padding tokens: 924 - rank: 2 max len: 1437 min len: 1091 avg len: 1252.2 num_loss_counted_tokens: 3371
total tokens: 6068 num samples: 2 num padding tokens: 230 - rank: 0 max len: 3034 min len: 2804 avg len: 2919.0 num_loss_counted_tokens: 189
total tokens: 7942 num samples: 11 num padding tokens: 1172 - rank: 5 max len: 722 min len: 546 avg len: 615.4545454545455 num_loss_counted_tokens: 3713
total tokens: 6292 num samples: 22 num padding tokens: 2266 - rank: 7 max len: 286 min len: 80 avg len: 183.0 num_loss_counted_tokens: 1812
total tokens: 7856 num samples: 16 num padding tokens: 1583 - rank: 6 max len: 491 min len: 299 avg len: 392.0625 num_loss_counted_tokens: 3733
total tokens: 7308 num samples: 7 num padding tokens: 647 - rank: 3 max len: 1044 min len: 844 avg len: 951.5714285714286 num_loss_counted_tokens: 4260
Per-token loss scaled by world size: 0.0002086303138639778
Per-token loss scaled by world size: 0.00018750393064692616
Per-token loss scaled by world size: 0.00023788934049662203
Per-token loss scaled by world size: 0.00018921871378552169
Per-token loss scaled by world size: 0.00015611379058100283
Per-token loss scaled by world size: 0.00034976963070221245
Per-token loss scaled by world size: 2.962535518236109e-06
Epoch: 1, Step: 159, Rank: 6, loss = 0.6299428939819336
Epoch: 1, Step: 159, Rank: 2, loss = 0.7992189526557922
Epoch: 1, Step: 159, Rank: 5, loss = 1.1750948429107666
Epoch: 1, Step: 159, Rank: 1, loss = 0.5244837999343872
Epoch: 1, Step: 159, Rank: 7, loss = 0.6357039213180542
Epoch: 1, Step: 159, Rank: 4, loss = 0.7009196281433105
Epoch: 1, Step: 159, Rank: 0, loss = 0.009953008033335209
Per-token loss scaled by world size: 0.0001541711390018463
Epoch: 1, Step: 159, Rank: 3, loss = 0.5179572105407715
total tokens: 7248 num samples: 3 num padding tokens: 1157 - rank: 1 max len: 2416 min len: 1450 avg len: 2030.3333333333333 num_loss_counted_tokens: 491
total tokens: 7848 num samples: 9 num padding tokens: 544 - rank: 4 max len: 872 min len: 752 avg len: 811.5555555555555 num_loss_counted_tokens: 5085
{
  "epoch": 1,
  "step": 159,
  "rank": 0,
  "loss": 0.009953008033335209,
  "overall_throughput": 41.96663008841316,
  "lr": 3.2000000000000003e-06,
  "cuda_mem_allocated": 24.325264930725098,
  "cuda_malloc_retries": 0,
  "num_loss_counted_tokens": 26877,
  "batch_size": 82,
  "total_loss": 0.6241592168807983,
  "gradnorm": 0.9710609316825867,
  "weight_norm": 433.04327392578125,
  "timestamp": "2024-08-18T20:55:05.729512"
}
total tokens: 7830 num samples: 18 num padding tokens: 1959 - rank: 6 max len: 435 min len: 237 avg len: 326.1666666666667 num_loss_counted_tokens: 3267
total tokens: 8118 num samples: 11 num padding tokens: 1074 - rank: 5 max len: 738 min len: 541 avg len: 640.3636363636364 num_loss_counted_tokens: 4228
total tokens: 7938 num samples: 7 num padding tokens: 978 - rank: 3 max len: 1134 min len: 898 avg len: 994.2857142857143 num_loss_counted_tokens: 3454
total tokens: 7010 num samples: 5 num padding tokens: 659 - rank: 2 max len: 1402 min len: 1181 avg len: 1270.2 num_loss_counted_tokens: 5158
total tokens: 6944 num samples: 31 num padding tokens: 2362 - rank: 7 max len: 224 min len: 74 avg len: 147.80645161290323 num_loss_counted_tokens: 1787
total tokens: 6818 num samples: 2 num padding tokens: 131 - rank: 0 max len: 3409 min len: 3278 avg len: 3343.5 num_loss_counted_tokens: 203
Per-token loss scaled by world size: 0.00015841875574551523
Per-token loss scaled by world size: 0.00011544318113010377
Per-token loss scaled by world size: 0.00030993111431598663
Per-token loss scaled by world size: 0.00033960427390411496
Per-token loss scaled by world size: 0.0002961684949696064
Per-token loss scaled by world size: 0.0003685772535391152
Epoch: 1, Step: 160, Rank: 5, loss = 0.9887577295303345
Epoch: 1, Step: 160, Rank: 3, loss = 1.0834225416183472
Epoch: 1, Step: 160, Rank: 6, loss = 0.9448515772819519
Epoch: 1, Step: 160, Rank: 4, loss = 1.1758536100387573
Epoch: 1, Step: 160, Rank: 1, loss = 0.36829259991645813
Epoch: 1, Step: 160, Rank: 2, loss = 0.5053954124450684
Per-token loss scaled by world size: 5.767856782767922e-05
Epoch: 1, Step: 160, Rank: 7, loss = 0.18400904536247253
Per-token loss scaled by world size: 0.00019863103807438165
Epoch: 1, Step: 160, Rank: 0, loss = 0.6336826682090759
total tokens: 7389 num samples: 9 num padding tokens: 610 - rank: 4 max len: 821 min len: 661 avg len: 753.2222222222222 num_loss_counted_tokens: 4659
total tokens: 7668 num samples: 4 num padding tokens: 927 - rank: 1 max len: 1917 min len: 1314 avg len: 1685.25 num_loss_counted_tokens: 682
total tokens: 7945 num samples: 35 num padding tokens: 3133 - rank: 7 max len: 227 min len: 79 avg len: 137.4857142857143 num_loss_counted_tokens: 1541
{
  "epoch": 1,
  "step": 160,
  "rank": 0,
  "loss": 0.6336826682090759,
  "overall_throughput": 41.55102604571352,
  "lr": 3.2000000000000003e-06,
  "cuda_mem_allocated": 24.496148586273193,
  "cuda_malloc_retries": 0,
  "num_loss_counted_tokens": 25522,
  "batch_size": 69,
  "total_loss": 0.7355331778526306,
  "gradnorm": 0.9710609316825867,
  "weight_norm": 433.04327392578125,
  "timestamp": "2024-08-18T20:55:08.279444"
}
total tokens: 7800 num samples: 12 num padding tokens: 1127 - rank: 5 max len: 650 min len: 421 avg len: 556.0833333333334 num_loss_counted_tokens: 4039
total tokens: 7800 num samples: 6 num padding tokens: 1207 - rank: 2 max len: 1300 min len: 984 avg len: 1098.8333333333333 num_loss_counted_tokens: 3151
total tokens: 8020 num samples: 20 num padding tokens: 2104 - rank: 6 max len: 401 min len: 238 avg len: 295.8 num_loss_counted_tokens: 3264
total tokens: 7640 num samples: 8 num padding tokens: 396 - rank: 3 max len: 955 min len: 852 avg len: 905.5 num_loss_counted_tokens: 6041
total tokens: 5680 num samples: 2 num padding tokens: 585 - rank: 0 max len: 2840 min len: 2255 avg len: 2547.5 num_loss_counted_tokens: 205
Per-token loss scaled by world size: 5.620659976557363e-06
Per-token loss scaled by world size: 0.00039823996485210955
Per-token loss scaled by world size: 0.0004752624372486025
Per-token loss scaled by world size: 0.00014614466635975987
Per-token loss scaled by world size: 0.0004995565977878869
Per-token loss scaled by world size: 0.00032942448160611093
Per-token loss scaled by world size: 0.0004525336844380945
Epoch: 1, Step: 161, Rank: 5, loss = 1.1807301044464111
Epoch: 1, Step: 161, Rank: 2, loss = 0.9893774390220642
Epoch: 1, Step: 161, Rank: 6, loss = 1.2410858869552612
Epoch: 1, Step: 161, Rank: 3, loss = 0.36307814717292786
Epoch: 1, Step: 161, Rank: 1, loss = 0.013963826932013035
Epoch: 1, Step: 161, Rank: 4, loss = 0.8184139132499695
Epoch: 1, Step: 161, Rank: 7, loss = 1.1242634057998657
Per-token loss scaled by world size: 9.448503078601789e-06
Epoch: 1, Step: 161, Rank: 0, loss = 0.023473624140024185
total tokens: 7380 num samples: 3 num padding tokens: 869 - rank: 1 max len: 2460 min len: 1846 avg len: 2170.3333333333335 num_loss_counted_tokens: 394
total tokens: 7407 num samples: 9 num padding tokens: 1144 - rank: 4 max len: 823 min len: 580 avg len: 695.8888888888889 num_loss_counted_tokens: 4039
{
  "epoch": 1,
  "step": 161,
  "rank": 0,
  "loss": 0.023473624140024185,
  "overall_throughput": 40.61008048978514,
  "lr": 3.2000000000000003e-06,
  "cuda_mem_allocated": 24.50694465637207,
  "cuda_malloc_retries": 0,
  "num_loss_counted_tokens": 19875,
  "batch_size": 70,
  "total_loss": 0.7192983031272888,
  "gradnorm": 0.9710609316825867,
  "weight_norm": 433.04327392578125,
  "timestamp": "2024-08-18T20:55:10.880154"
}
total tokens: 8008 num samples: 14 num padding tokens: 808 - rank: 5 max len: 572 min len: 460 avg len: 514.2857142857143 num_loss_counted_tokens: 4103
total tokens: 7992 num samples: 8 num padding tokens: 557 - rank: 3 max len: 999 min len: 868 avg len: 929.375 num_loss_counted_tokens: 6112
total tokens: 7704 num samples: 18 num padding tokens: 1248 - rank: 6 max len: 428 min len: 273 avg len: 358.6666666666667 num_loss_counted_tokens: 3541
total tokens: 8028 num samples: 2 num padding tokens: 1416 - rank: 0 max len: 4014 min len: 2598 avg len: 3306.0 num_loss_counted_tokens: 164
total tokens: 8076 num samples: 6 num padding tokens: 1073 - rank: 2 max len: 1346 min len: 1029 avg len: 1167.1666666666667 num_loss_counted_tokens: 4924
total tokens: 6233 num samples: 23 num padding tokens: 2022 - rank: 7 max len: 271 min len: 80 avg len: 183.08695652173913 num_loss_counted_tokens: 1936
Per-token loss scaled by world size: 0.0008293814607895911
Per-token loss scaled by world size: 0.0006533037521876395
Per-token loss scaled by world size: 0.0005620094598270953
Per-token loss scaled by world size: 1.2225326827319805e-05
Per-token loss scaled by world size: 5.4908236052142456e-05
Per-token loss scaled by world size: 5.549823254114017e-05
Per-token loss scaled by world size: 2.421064209556789e-06
Epoch: 1, Step: 162, Rank: 5, loss = 1.6809488534927368
Epoch: 1, Step: 162, Rank: 0, loss = 0.02477768063545227
Epoch: 1, Step: 162, Rank: 4, loss = 1.1390526294708252
Epoch: 1, Step: 162, Rank: 7, loss = 1.3240833282470703
Epoch: 1, Step: 162, Rank: 2, loss = 0.11248104274272919
Epoch: 1, Step: 162, Rank: 1, loss = 0.1112852692604065
Epoch: 1, Step: 162, Rank: 3, loss = 0.004906891845166683
Per-token loss scaled by world size: 0.0008481117547489703
Epoch: 1, Step: 162, Rank: 6, loss = 1.7189104557037354
total tokens: 7875 num samples: 3 num padding tokens: 1486 - rank: 1 max len: 2625 min len: 1737 avg len: 2129.6666666666665 num_loss_counted_tokens: 747
total tokens: 7448 num samples: 8 num padding tokens: 1221 - rank: 4 max len: 931 min len: 707 avg len: 778.375 num_loss_counted_tokens: 2823
total tokens: 6572 num samples: 4 num padding tokens: 264 - rank: 2 max len: 1643 min len: 1485 avg len: 1577.0 num_loss_counted_tokens: 3122
{
  "epoch": 1,
  "step": 162,
  "rank": 0,
  "loss": 0.02477768063545227,
  "overall_throughput": 42.2840773469095,
  "lr": 3.2000000000000003e-06,
  "cuda_mem_allocated": 24.426692962646484,
  "cuda_malloc_retries": 0,
  "num_loss_counted_tokens": 16214,
  "batch_size": 68,
  "total_loss": 0.7645557522773743,
  "gradnorm": 0.9710609316825867,
  "weight_norm": 433.04327392578125,
  "timestamp": "2024-08-18T20:55:13.378530"
}
total tokens: 7760 num samples: 16 num padding tokens: 1908 - rank: 6 max len: 485 min len: 279 avg len: 365.75 num_loss_counted_tokens: 3363
total tokens: 7944 num samples: 6 num padding tokens: 1666 - rank: 3 max len: 1324 min len: 934 avg len: 1046.3333333333333 num_loss_counted_tokens: 2604
total tokens: 7590 num samples: 11 num padding tokens: 1066 - rank: 5 max len: 690 min len: 492 avg len: 593.0909090909091 num_loss_counted_tokens: 3430
total tokens: 7830 num samples: 29 num padding tokens: 2293 - rank: 7 max len: 270 min len: 77 avg len: 190.93103448275863 num_loss_counted_tokens: 2270
total tokens: 6446 num samples: 2 num padding tokens: 435 - rank: 0 max len: 3223 min len: 2788 avg len: 3005.5 num_loss_counted_tokens: 179
Per-token loss scaled by world size: 0.00018602880300022662
Per-token loss scaled by world size: 0.000572515360545367
Per-token loss scaled by world size: 0.0006904263282194734
Per-token loss scaled by world size: 0.0004460025520529598
Per-token loss scaled by world size: 0.0004231746424920857
Per-token loss scaled by world size: 5.620245701720705e-06
Per-token loss scaled by world size: 8.813981935418269e-07
Epoch: 1, Step: 163, Rank: 6, loss = 1.2291189432144165
Epoch: 1, Step: 163, Rank: 4, loss = 1.4822590351104736
Epoch: 1, Step: 163, Rank: 2, loss = 0.3993805944919586
Epoch: 1, Step: 163, Rank: 0, loss = 0.012065964750945568
Epoch: 1, Step: 163, Rank: 3, loss = 0.9575117230415344
Epoch: 1, Step: 163, Rank: 7, loss = 0.9085030555725098
Epoch: 1, Step: 163, Rank: 1, loss = 0.0018922517774626613
Per-token loss scaled by world size: 0.0005087603931315243
Epoch: 1, Step: 163, Rank: 5, loss = 1.0922449827194214
total tokens: 7695 num samples: 9 num padding tokens: 710 - rank: 4 max len: 855 min len: 712 avg len: 776.1111111111111 num_loss_counted_tokens: 5130
total tokens: 7552 num samples: 4 num padding tokens: 1186 - rank: 1 max len: 1888 min len: 1444 avg len: 1591.5 num_loss_counted_tokens: 754
{
  "epoch": 1,
  "step": 163,
  "rank": 0,
  "loss": 0.012065964750945568,
  "overall_throughput": 42.679599605421174,
  "lr": 3.2000000000000003e-06,
  "cuda_mem_allocated": 24.32099151611328,
  "cuda_malloc_retries": 0,
  "num_loss_counted_tokens": 17175,
  "batch_size": 79,
  "total_loss": 0.7603721022605896,
  "gradnorm": 0.9710609316825867,
  "weight_norm": 433.04327392578125,
  "timestamp": "2024-08-18T20:55:15.854736"
}
total tokens: 6895 num samples: 5 num padding tokens: 862 - rank: 2 max len: 1379 min len: 1055 avg len: 1206.6 num_loss_counted_tokens: 1706
total tokens: 7287 num samples: 7 num padding tokens: 458 - rank: 3 max len: 1041 min len: 894 avg len: 975.5714285714286 num_loss_counted_tokens: 4839
total tokens: 6182 num samples: 22 num padding tokens: 2681 - rank: 7 max len: 281 min len: 75 avg len: 159.13636363636363 num_loss_counted_tokens: 1402
total tokens: 6306 num samples: 2 num padding tokens: 367 - rank: 0 max len: 3153 min len: 2786 avg len: 2969.5 num_loss_counted_tokens: 481
total tokens: 8016 num samples: 16 num padding tokens: 1294 - rank: 6 max len: 501 min len: 281 avg len: 420.125 num_loss_counted_tokens: 3734
total tokens: 7799 num samples: 11 num padding tokens: 922 - rank: 5 max len: 709 min len: 521 avg len: 625.1818181818181 num_loss_counted_tokens: 3657
Per-token loss scaled by world size: 0.000572259072214365
Per-token loss scaled by world size: 0.0008030128665268421
Per-token loss scaled by world size: 0.0005713719874620438
Per-token loss scaled by world size: 0.0008417390054091811
Per-token loss scaled by world size: 4.514108695730101e-06
Per-token loss scaled by world size: 1.0517849659663625e-05
Per-token loss scaled by world size: 8.813677595753688e-06
Epoch: 1, Step: 164, Rank: 5, loss = 1.8358327150344849
Epoch: 1, Step: 164, Rank: 6, loss = 1.7513710260391235
Epoch: 1, Step: 164, Rank: 7, loss = 1.2461622953414917
Epoch: 1, Step: 164, Rank: 2, loss = 0.009845270775258541
Epoch: 1, Step: 164, Rank: 4, loss = 1.2480970621109009
Epoch: 1, Step: 164, Rank: 0, loss = 0.02293943054974079
Epoch: 1, Step: 164, Rank: 1, loss = 0.019222630187869072
Per-token loss scaled by world size: 0.0004583366389852017
Epoch: 1, Step: 164, Rank: 3, loss = 0.9996321797370911
total tokens: 7084 num samples: 4 num padding tokens: 676 - rank: 1 max len: 1771 min len: 1482 avg len: 1602.0 num_loss_counted_tokens: 552
total tokens: 7983 num samples: 9 num padding tokens: 1125 - rank: 4 max len: 887 min len: 678 avg len: 762.0 num_loss_counted_tokens: 4997
{
  "epoch": 1,
  "step": 164,
  "rank": 0,
  "loss": 0.02293943054974079,
  "overall_throughput": 41.52812214957308,
  "lr": 3.2000000000000003e-06,
  "cuda_mem_allocated": 24.29750394821167,
  "cuda_malloc_retries": 0,
  "num_loss_counted_tokens": 17448,
  "batch_size": 78,
  "total_loss": 0.8916378021240234,
  "gradnorm": 0.9710609316825867,
  "weight_norm": 433.04327392578125,
  "timestamp": "2024-08-18T20:55:18.404418"
}
total tokens: 7848 num samples: 12 num padding tokens: 795 - rank: 5 max len: 654 min len: 498 avg len: 587.75 num_loss_counted_tokens: 5391
total tokens: 7010 num samples: 5 num padding tokens: 860 - rank: 2 max len: 1402 min len: 1134 avg len: 1230.0 num_loss_counted_tokens: 3080
total tokens: 7448 num samples: 7 num padding tokens: 586 - rank: 3 max len: 1064 min len: 904 avg len: 980.2857142857143 num_loss_counted_tokens: 3922
total tokens: 7560 num samples: 28 num padding tokens: 2574 - rank: 7 max len: 270 min len: 79 avg len: 178.07142857142858 num_loss_counted_tokens: 2339
total tokens: 7263 num samples: 3 num padding tokens: 675 - rank: 0 max len: 2421 min len: 1797 avg len: 2196.0 num_loss_counted_tokens: 329
total tokens: 8109 num samples: 17 num padding tokens: 1719 - rank: 6 max len: 477 min len: 271 avg len: 375.88235294117646 num_loss_counted_tokens: 3620
Per-token loss scaled by world size: 0.0004442204663064331
Per-token loss scaled by world size: 0.00018070742953568697
Per-token loss scaled by world size: 0.0003504411142785102
Per-token loss scaled by world size: 4.3835102587763686e-06
Per-token loss scaled by world size: 0.00022767498739995062
Per-token loss scaled by world size: 7.03313219219126e-07
Per-token loss scaled by world size: 0.0002697974268812686
Epoch: 1, Step: 165, Rank: 2, loss = 0.5169813632965088
Epoch: 1, Step: 165, Rank: 5, loss = 1.2708592414855957
Epoch: 1, Step: 165, Rank: 3, loss = 1.002568244934082
Epoch: 1, Step: 165, Rank: 4, loss = 0.651349663734436
Epoch: 1, Step: 165, Rank: 7, loss = 0.7718567252159119
Epoch: 1, Step: 165, Rank: 1, loss = 0.012540674768388271
Epoch: 1, Step: 165, Rank: 0, loss = 0.0020120912231504917
Per-token loss scaled by world size: 0.00034711475018411875
Epoch: 1, Step: 165, Rank: 6, loss = 0.9930519461631775
total tokens: 8085 num samples: 11 num padding tokens: 750 - rank: 4 max len: 735 min len: 600 avg len: 666.8181818181819 num_loss_counted_tokens: 3756
total tokens: 7432 num samples: 4 num padding tokens: 599 - rank: 1 max len: 1858 min len: 1560 avg len: 1708.25 num_loss_counted_tokens: 1691
{
  "epoch": 1,
  "step": 165,
  "rank": 0,
  "loss": 0.0020120912231504917,
  "overall_throughput": 42.759868613235994,
  "lr": 3.2000000000000003e-06,
  "cuda_mem_allocated": 24.2490234375,
  "cuda_malloc_retries": 0,
  "num_loss_counted_tokens": 22887,
  "batch_size": 75,
  "total_loss": 0.6526525020599365,
  "gradnorm": 0.9710609316825867,
  "weight_norm": 433.04327392578125,
  "timestamp": "2024-08-18T20:55:20.878553"
}
total tokens: 2249 num samples: 13 num padding tokens: 686 - rank: 7 max len: 173 min len: 79 avg len: 120.23076923076923 num_loss_counted_tokens: 568
total tokens: 7657 num samples: 13 num padding tokens: 1363 - rank: 5 max len: 589 min len: 381 avg len: 484.15384615384613 num_loss_counted_tokens: 3837
total tokens: 7615 num samples: 5 num padding tokens: 1314 - rank: 2 max len: 1523 min len: 1058 avg len: 1260.2 num_loss_counted_tokens: 3900
total tokens: 7980 num samples: 21 num padding tokens: 2329 - rank: 6 max len: 380 min len: 178 avg len: 269.0952380952381 num_loss_counted_tokens: 2730
total tokens: 8040 num samples: 8 num padding tokens: 983 - rank: 3 max len: 1005 min len: 762 avg len: 882.125 num_loss_counted_tokens: 5456
total tokens: 8061 num samples: 3 num padding tokens: 594 - rank: 0 max len: 2687 min len: 2311 avg len: 2489.0 num_loss_counted_tokens: 259
Per-token loss scaled by world size: 0.0003275613998994231
Per-token loss scaled by world size: 0.00014095827646087855
Per-token loss scaled by world size: 0.0002651048998814076
Per-token loss scaled by world size: 0.00036449063918553293
Per-token loss scaled by world size: 0.00021203258074820042
Per-token loss scaled by world size: 0.0002020968240685761
Per-token loss scaled by world size: 1.8056784938380588e-06
Epoch: 1, Step: 166, Rank: 1, loss = 0.4387502372264862
Epoch: 1, Step: 166, Rank: 5, loss = 0.659977912902832
Epoch: 1, Step: 166, Rank: 6, loss = 1.019575834274292
Epoch: 1, Step: 166, Rank: 0, loss = 0.005620399955660105
Epoch: 1, Step: 166, Rank: 7, loss = 0.8251721262931824
Epoch: 1, Step: 166, Rank: 3, loss = 1.1345226764678955
Epoch: 1, Step: 166, Rank: 4, loss = 0.6290516257286072
Per-token loss scaled by world size: 0.0001444466906832531
Epoch: 1, Step: 166, Rank: 2, loss = 0.44960838556289673
total tokens: 5966 num samples: 2 num padding tokens: 617 - rank: 1 max len: 2983 min len: 2366 avg len: 2674.5 num_loss_counted_tokens: 241
total tokens: 7798 num samples: 7 num padding tokens: 1377 - rank: 4 max len: 1114 min len: 740 avg len: 917.2857142857143 num_loss_counted_tokens: 4598
{
  "epoch": 1,
  "step": 166,
  "rank": 0,
  "loss": 0.005620399955660105,
  "overall_throughput": 41.7691118119109,
  "lr": 3.2000000000000003e-06,
  "cuda_mem_allocated": 24.322949409484863,
  "cuda_malloc_retries": 0,
  "num_loss_counted_tokens": 24901,
  "batch_size": 68,
  "total_loss": 0.64528489112854,
  "gradnorm": 0.9710609316825867,
  "weight_norm": 433.04327392578125,
  "timestamp": "2024-08-18T20:55:23.412572"
}
total tokens: 8100 num samples: 12 num padding tokens: 1017 - rank: 5 max len: 675 min len: 510 avg len: 590.25 num_loss_counted_tokens: 5149 | |
total tokens: 7824 num samples: 4 num padding tokens: 1097 - rank: 3 max len: 1956 min len: 1341 avg len: 1681.75 num_loss_counted_tokens: 3450 | |
total tokens: 8000 num samples: 16 num padding tokens: 2121 - rank: 6 max len: 500 min len: 290 avg len: 367.4375 num_loss_counted_tokens: 3595 | |
total tokens: 6925 num samples: 25 num padding tokens: 2774 - rank: 7 max len: 277 min len: 77 avg len: 166.04 num_loss_counted_tokens: 1782 | |
total tokens: 6918 num samples: 3 num padding tokens: 285 - rank: 2 max len: 2306 min len: 2144 avg len: 2211.0 num_loss_counted_tokens: 208 | |
total tokens: 6292 num samples: 2 num padding tokens: 131 - rank: 0 max len: 3146 min len: 3015 avg len: 3080.5 num_loss_counted_tokens: 174 | |
Per-token loss scaled by world size: 0.0004856240702793002Per-token loss scaled by world size: 0.0003434315149206668Per-token loss scaled by world size: 3.41317463607993e-05Per-token loss scaled by world size: 3.1640320230508223e-06Per-token loss scaled by world size: 1.4528293377225054e-06 | |
Per-token loss scaled by world size: 0.0005926437443122268 | |
Per-token loss scaled by world size: 0.0002455389767419547 | |
Epoch: 1, Step: 167, Rank: 5, loss = 1.262865424156189Epoch: 1, Step: 167, Rank: 2, loss = 0.08875960856676102 | |
Epoch: 1, Step: 167, Rank: 6, loss = 0.8930936455726624 | |
Epoch: 1, Step: 167, Rank: 0, loss = 0.0037780827842652798 | |
Epoch: 1, Step: 167, Rank: 4, loss = 1.5411700010299683Epoch: 1, Step: 167, Rank: 1, loss = 0.008228065446019173 | |
Epoch: 1, Step: 167, Rank: 7, loss = 0.6385241150856018 | |
Per-token loss scaled by world size: 0.0002826468553394079 | |
Epoch: 1, Step: 167, Rank: 3, loss = 0.7350231409072876 | |
total tokens: 7464 num samples: 4 num padding tokens: 501 - rank: 1 max len: 1866 min len: 1478 avg len: 1740.75 num_loss_counted_tokens: 1361 | |
total tokens: 7820 num samples: 10 num padding tokens: 879 - rank: 4 max len: 782 min len: 657 avg len: 694.1 num_loss_counted_tokens: 4357 | |
{ | |
"epoch": 1, | |
"step": 167, | |
"rank": 0, | |
"loss": 0.0037780827842652798, | |
"overall_throughput": 41.38714694400821, | |
"lr": 3.2000000000000003e-06, | |
"cuda_mem_allocated": 24.380234718322754, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 20804, | |
"batch_size": 85, | |
"total_loss": 0.6464303731918335, | |
"gradnorm": 0.9710609316825867, | |
"weight_norm": 433.04327392578125, | |
"timestamp": "2024-08-18T20:55:25.970591" | |
} | |
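In the step-167 record above, the `loss` field repeats rank 0's value (0.0037780827842652798). The `total_loss` field agrees, to about six decimal places, with the arithmetic mean of the eight `Epoch: 1, Step: 167, Rank: N, loss = ...` values printed before it, and the same holds for steps 168 and 169. This reading is inferred from the numbers in this log, not from the instructlab source. The minimal check below copies the loss lines verbatim, including the ones that concurrent writes from different ranks fused onto a single line. The `per_rank_losses` helper is hypothetical.

```python
import re
from statistics import fmean

# Per-rank loss lines for step 167, copied verbatim from the log above. Some
# ranks' prints were fused onto one line, so the regex searches the text
# instead of splitting it on newlines.
STEP_167_OUTPUT = """\
Epoch: 1, Step: 167, Rank: 5, loss = 1.262865424156189Epoch: 1, Step: 167, Rank: 2, loss = 0.08875960856676102
Epoch: 1, Step: 167, Rank: 6, loss = 0.8930936455726624
Epoch: 1, Step: 167, Rank: 0, loss = 0.0037780827842652798
Epoch: 1, Step: 167, Rank: 4, loss = 1.5411700010299683Epoch: 1, Step: 167, Rank: 1, loss = 0.008228065446019173
Epoch: 1, Step: 167, Rank: 7, loss = 0.6385241150856018
Epoch: 1, Step: 167, Rank: 3, loss = 0.7350231409072876
"""

# The number pattern stops before the "E" of a fused "Epoch" because no
# exponent digits follow it.
LOSS_RE = re.compile(r"Step: (\d+), Rank: (\d+), loss = (\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")

def per_rank_losses(text: str, step: int) -> dict[int, float]:
    """Map rank -> printed loss for one step (hypothetical helper)."""
    return {
        int(rank): float(value)
        for s, rank, value in LOSS_RE.findall(text)
        if int(s) == step
    }

losses = per_rank_losses(STEP_167_OUTPUT, 167)
mean_loss = fmean(losses.values())
print(len(losses), mean_loss)  # 8 ranks; mean is within 1e-6 of "total_loss"
print(losses[0])               # same value as the record's "loss" field
```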
total tokens: 6966 num samples: 27 num padding tokens: 2143 - rank: 7 max len: 258 min len: 77 avg len: 178.62962962962962 num_loss_counted_tokens: 1881 | |
total tokens: 8032 num samples: 8 num padding tokens: 946 - rank: 3 max len: 1004 min len: 790 avg len: 885.75 num_loss_counted_tokens: 3467 | |
total tokens: 7410 num samples: 6 num padding tokens: 411 - rank: 2 max len: 1235 min len: 1054 avg len: 1166.5 num_loss_counted_tokens: 3794 | |
total tokens: 7668 num samples: 12 num padding tokens: 1179 - rank: 5 max len: 639 min len: 424 avg len: 540.75 num_loss_counted_tokens: 2919 | |
total tokens: 8037 num samples: 19 num padding tokens: 1731 - rank: 6 max len: 423 min len: 269 avg len: 331.89473684210526 num_loss_counted_tokens: 3562 | |
total tokens: 6488 num samples: 2 num padding tokens: 894 - rank: 0 max len: 3244 min len: 2350 avg len: 2797.0 num_loss_counted_tokens: 206 | |
Per-token loss scaled by world size: 0.00039238386671058834Per-token loss scaled by world size: 0.0004944170941598713Per-token loss scaled by world size: 5.2391669669304974e-06Per-token loss scaled by world size: 0.00014728681708220392Per-token loss scaled by world size: 0.0004893930163234472Per-token loss scaled by world size: 3.5653782106237486e-05 | |
Per-token loss scaled by world size: 0.0004791621759068221 | |
Epoch: 1, Step: 168, Rank: 0, loss = 0.012632940895855427Epoch: 1, Step: 168, Rank: 6, loss = 1.180048942565918 | |
Epoch: 1, Step: 168, Rank: 5, loss = 1.1921632289886475Epoch: 1, Step: 168, Rank: 2, loss = 0.35514533519744873Epoch: 1, Step: 168, Rank: 4, loss = 0.9461355805397034 | |
Epoch: 1, Step: 168, Rank: 1, loss = 0.08597017824649811 | |
Epoch: 1, Step: 168, Rank: 7, loss = 1.1553797721862793 | |
Per-token loss scaled by world size: 0.00023344735382124782 | |
Epoch: 1, Step: 168, Rank: 3, loss = 0.5628999471664429 | |
total tokens: 7931 num samples: 11 num padding tokens: 374 - rank: 4 max len: 721 min len: 645 avg len: 687.0 num_loss_counted_tokens: 3685 | |
total tokens: 6885 num samples: 3 num padding tokens: 1645 - rank: 1 max len: 2295 min len: 1266 avg len: 1746.6666666666667 num_loss_counted_tokens: 678 | |
{ | |
"epoch": 1, | |
"step": 168, | |
"rank": 0, | |
"loss": 0.012632940895855427, | |
"overall_throughput": 41.535125133568044, | |
"lr": 3.2000000000000003e-06, | |
"cuda_mem_allocated": 24.440462589263916, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 19290, | |
"batch_size": 79, | |
"total_loss": 0.6862969994544983, | |
"gradnorm": 0.9710609316825867, | |
"weight_norm": 433.04327392578125, | |
"timestamp": "2024-08-18T20:55:28.521157" | |
} | |
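The `Per-token loss scaled by world size` values and the per-rank `loss = ...` values are consistent with a single rescaling. In this excerpt, each printed loss equals one of the per-token values multiplied by `num_loss_counted_tokens / world_size`. Here `num_loss_counted_tokens` is the step-level count in the JSON record, and the world size is 8 (ranks 0 to 7, launched with `--gpus 8`). This is an empirical observation drawn from these numbers, not a reading of the instructlab training code. The sketch below checks it for two ranks at step 168.

```python
# Empirical check, inferred from this log only (assumption, not instructlab code):
#   printed per-rank loss == per-token value * (num_loss_counted_tokens / world_size)
WORLD_SIZE = 8                    # ranks 0-7, launched with --gpus 8
NUM_LOSS_COUNTED_TOKENS = 19290   # "num_loss_counted_tokens" in the step-168 record

# rank -> (per-token value, printed loss), copied from the step-168 lines above.
# Per-token lines carry no rank label: rank 3's pair is printed on consecutive
# lines, while rank 0's pair is matched by magnitude.
PAIRS = {
    0: (5.2391669669304974e-06, 0.012632940895855427),
    3: (0.00023344735382124782, 0.5628999471664429),
}

scale = NUM_LOSS_COUNTED_TOKENS / WORLD_SIZE  # 2411.25
rel_errors = {}
for rank, (per_token, printed) in PAIRS.items():
    reconstructed = per_token * scale
    rel_errors[rank] = abs(reconstructed - printed) / printed
    print(f"rank {rank}: {reconstructed:.6f} vs {printed:.6f}")
```

Both relative errors are below 1e-7. Within a step, every rank's printed loss is its per-token value rescaled by the same factor, so the ranks' losses can be compared directly.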
total tokens: 8112 num samples: 16 num padding tokens: 1692 - rank: 6 max len: 507 min len: 284 avg len: 401.25 num_loss_counted_tokens: 3636 | |
total tokens: 7821 num samples: 9 num padding tokens: 753 - rank: 3 max len: 869 min len: 722 avg len: 785.3333333333334 num_loss_counted_tokens: 5472 | |
total tokens: 7440 num samples: 6 num padding tokens: 983 - rank: 2 max len: 1240 min len: 874 avg len: 1076.1666666666667 num_loss_counted_tokens: 3013 | |
total tokens: 7728 num samples: 12 num padding tokens: 813 - rank: 5 max len: 644 min len: 530 avg len: 576.25 num_loss_counted_tokens: 4677 | |
total tokens: 7306 num samples: 26 num padding tokens: 2436 - rank: 7 max len: 281 min len: 81 avg len: 187.30769230769232 num_loss_counted_tokens: 2351 | |
total tokens: 7062 num samples: 2 num padding tokens: 653 - rank: 0 max len: 3531 min len: 2878 avg len: 3204.5 num_loss_counted_tokens: 193 | |
Per-token loss scaled by world size: 0.00023193543893285096Per-token loss scaled by world size: 0.00030478413100354373Per-token loss scaled by world size: 0.00034480085014365613 | |
Per-token loss scaled by world size: 4.721171990240691e-06Per-token loss scaled by world size: 4.196311692794552e-06 | |
Per-token loss scaled by world size: 0.00038670990034006536 | |
Epoch: 1, Step: 169, Rank: 3, loss = 0.8575863242149353 | |
Per-token loss scaled by world size: 8.75471014296636e-05Epoch: 1, Step: 169, Rank: 6, loss = 0.9701833724975586 | |
Epoch: 1, Step: 169, Rank: 0, loss = 0.013284197077155113Epoch: 1, Step: 169, Rank: 2, loss = 0.652608335018158 | |
Epoch: 1, Step: 169, Rank: 1, loss = 0.011807371862232685 | |
Epoch: 1, Step: 169, Rank: 4, loss = 1.0881049633026123 | |
Per-token loss scaled by world size: 0.0006382779683917761 | |
Epoch: 1, Step: 169, Rank: 7, loss = 0.24633565545082092 | |
Epoch: 1, Step: 169, Rank: 5, loss = 1.7959545850753784 | |
total tokens: 6174 num samples: 3 num padding tokens: 245 - rank: 1 max len: 2058 min len: 1856 avg len: 1976.3333333333333 num_loss_counted_tokens: 674 | |
total tokens: 7389 num samples: 9 num padding tokens: 742 - rank: 4 max len: 821 min len: 683 avg len: 738.5555555555555 num_loss_counted_tokens: 5047 | |
total tokens: 7320 num samples: 30 num padding tokens: 2107 - rank: 7 max len: 244 min len: 91 avg len: 173.76666666666668 num_loss_counted_tokens: 2225 | |
{ | |
"epoch": 1, | |
"step": 169, | |
"rank": 0, | |
"loss": 0.013284197077155113, | |
"overall_throughput": 41.77523610688796, | |
"lr": 3.2000000000000003e-06, | |
"cuda_mem_allocated": 24.364055633544922, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 22510, | |
"batch_size": 81, | |
"total_loss": 0.7044830918312073, | |
"gradnorm": 0.9710609316825867, | |
"weight_norm": 433.04327392578125, | |
"timestamp": "2024-08-18T20:55:31.054810" | |
} | |
total tokens: 7602 num samples: 7 num padding tokens: 811 - rank: 3 max len: 1086 min len: 880 avg len: 970.1428571428571 num_loss_counted_tokens: 5560 | |
total tokens: 8100 num samples: 12 num padding tokens: 1477 - rank: 5 max len: 675 min len: 438 avg len: 551.9166666666666 num_loss_counted_tokens: 3835 | |
total tokens: 7270 num samples: 5 num padding tokens: 1169 - rank: 2 max len: 1454 min len: 1115 avg len: 1220.2 num_loss_counted_tokens: 3611 | |
total tokens: 5712 num samples: 2 num padding tokens: 500 - rank: 0 max len: 2856 min len: 2356 avg len: 2606.0 num_loss_counted_tokens: 161 | |
total tokens: 7848 num samples: 18 num padding tokens: 1235 - rank: 6 max len: 436 min len: 265 avg len: 367.3888888888889 num_loss_counted_tokens: 4095 | |
Per-token loss scaled by world size: 0.00010392792319180444Per-token loss scaled by world size: 0.00014247662329580635Per-token loss scaled by world size: 0.00030479932320304215Per-token loss scaled by world size: 0.0002555457758717239Per-token loss scaled by world size: 0.00023876398336142302Per-token loss scaled by world size: 0.0003587114915717393 | |
Per-token loss scaled by world size: 0.00022289040498435497 | |
Epoch: 1, Step: 170, Rank: 6, loss = 0.8772567510604858 | |
Epoch: 1, Step: 170, Rank: 4, loss = 1.0463379621505737 | |
Epoch: 1, Step: 170, Rank: 2, loss = 0.4891044497489929 | |
Epoch: 1, Step: 170, Rank: 7, loss = 0.8196468949317932Epoch: 1, Step: 170, Rank: 3, loss = 0.7651548981666565 | |
Epoch: 1, Step: 170, Rank: 5, loss = 1.2314116954803467 | |
Epoch: 1, Step: 170, Rank: 1, loss = 0.3567715585231781 | |
Per-token loss scaled by world size: 2.3532686100224964e-05 | |
Epoch: 1, Step: 170, Rank: 0, loss = 0.08078476786613464 | |
total tokens: 7016 num samples: 4 num padding tokens: 1240 - rank: 1 max len: 1754 min len: 1179 avg len: 1444.0 num_loss_counted_tokens: 1903 | |
total tokens: 8085 num samples: 11 num padding tokens: 649 - rank: 4 max len: 735 min len: 609 avg len: 676.0 num_loss_counted_tokens: 4383 | |
{ | |
"epoch": 1, | |
"step": 170, | |
"rank": 0, | |
"loss": 0.08078476786613464, | |
"overall_throughput": 40.4277908497992, | |
"lr": 3.2000000000000003e-06, | |
"cuda_mem_allocated": 24.523925304412842, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 27463, | |
"batch_size": 84, | |
"total_loss": 0.7083086371421814, | |
"gradnorm": 0.9710609316825867, | |
"weight_norm": 433.04327392578125, | |
"timestamp": "2024-08-18T20:55:33.666058" | |
} | |
total tokens: 7812 num samples: 18 num padding tokens: 1829 - rank: 6 max len: 434 min len: 242 avg len: 332.3888888888889 num_loss_counted_tokens: 3261 | |
total tokens: 7917 num samples: 13 num padding tokens: 1050 - rank: 5 max len: 609 min len: 441 avg len: 528.2307692307693 num_loss_counted_tokens: 3856 | |
total tokens: 7644 num samples: 7 num padding tokens: 494 - rank: 2 max len: 1092 min len: 972 avg len: 1021.4285714285714 num_loss_counted_tokens: 4662 | |
total tokens: 7260 num samples: 33 num padding tokens: 2169 - rank: 7 max len: 220 min len: 78 avg len: 154.27272727272728 num_loss_counted_tokens: 1876 | |
total tokens: 7704 num samples: 8 num padding tokens: 1070 - rank: 3 max len: 963 min len: 747 avg len: 829.25 num_loss_counted_tokens: 5234 | |
total tokens: 6336 num samples: 3 num padding tokens: 247 - rank: 0 max len: 2112 min len: 1890 avg len: 2029.6666666666667 num_loss_counted_tokens: 2181 | |
Per-token loss scaled by world size: 0.0003628956328611821Per-token loss scaled by world size: 0.0002919238177128136Per-token loss scaled by world size: 0.0003382969880476594Per-token loss scaled by world size: 7.853787246858701e-05Per-token loss scaled by world size: 0.00014888570876792073Per-token loss scaled by world size: 0.00044355227146297693 | |
Per-token loss scaled by world size: 0.00015364577120635659 | |
Epoch: 1, Step: 171, Rank: 4, loss = 0.961414635181427Epoch: 1, Step: 171, Rank: 3, loss = 1.1141388416290283Epoch: 1, Step: 171, Rank: 1, loss = 0.49033647775650024 | |
Epoch: 1, Step: 171, Rank: 2, loss = 0.2586546540260315 | |
Epoch: 1, Step: 171, Rank: 5, loss = 1.4607839584350586Epoch: 1, Step: 171, Rank: 6, loss = 1.195151448249817 | |
Epoch: 1, Step: 171, Rank: 7, loss = 0.5060131549835205 | |
Per-token loss scaled by world size: 3.983392525697127e-05 | |
Epoch: 1, Step: 171, Rank: 0, loss = 0.1311880499124527 | |
total tokens: 7515 num samples: 9 num padding tokens: 869 - rank: 4 max len: 835 min len: 663 avg len: 738.4444444444445 num_loss_counted_tokens: 5088 | |
total tokens: 8040 num samples: 4 num padding tokens: 1032 - rank: 1 max len: 2010 min len: 1576 avg len: 1752.0 num_loss_counted_tokens: 2036 | |
{ | |
"epoch": 1, | |
"step": 171, | |
"rank": 0, | |
"loss": 0.1311880499124527, | |
"overall_throughput": 40.56471721668778, | |
"lr": 3.2000000000000003e-06, | |
"cuda_mem_allocated": 24.534343242645264, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 26347, | |
"batch_size": 96, | |
"total_loss": 0.7647101283073425, | |
"gradnorm": 0.9710609316825867, | |
"weight_norm": 433.04327392578125, | |
"timestamp": "2024-08-18T20:55:36.272989" | |
} | |
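To follow `total_loss` or `overall_throughput` across steps without reading the interleaved rank output by hand, you can pull the multi-line JSON metric records out of the raw log. The minimal sketch below assumes that each record opens with a line containing only `{` and closes with a line containing only `}`, as in this log. It also strips a trailing ` | |` in case the text was copied from this page. The `extract_records` helper is hypothetical.

```python
import json

def extract_records(log_text: str) -> list[dict]:
    """Collect the multi-line JSON metric records from a raw training log (hypothetical helper)."""
    records, buf = [], None
    for raw in log_text.splitlines():
        # Trim whitespace plus any trailing " | |" left over from copying this page.
        line = raw.strip().rstrip(" |")
        if line == "{":
            buf = [line]                 # start of a new record
        elif buf is not None:
            buf.append(line)
            if line == "}":              # end of the record: parse and reset
                records.append(json.loads("\n".join(buf)))
                buf = None
    return records

# Abbreviated copies of the step-170 and step-171 records, with rank output between them.
sample = """{
  "epoch": 1,
  "step": 170,
  "total_loss": 0.7083086371421814,
  "overall_throughput": 40.4277908497992
}
total tokens: 7812 num samples: 18 num padding tokens: 1829 - rank: 6
{
  "epoch": 1,
  "step": 171,
  "total_loss": 0.7647101283073425,
  "overall_throughput": 40.56471721668778
}"""

for rec in extract_records(sample):
    print(rec["step"], rec["total_loss"], rec["overall_throughput"])
```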
total tokens: 7752 num samples: 17 num padding tokens: 1950 - rank: 6 max len: 456 min len: 279 avg len: 341.29411764705884 num_loss_counted_tokens: 3147 | |
total tokens: 7596 num samples: 12 num padding tokens: 817 - rank: 5 max len: 633 min len: 491 avg len: 564.9166666666666 num_loss_counted_tokens: 5545 | |
total tokens: 7479 num samples: 27 num padding tokens: 3306 - rank: 7 max len: 277 min len: 84 avg len: 154.55555555555554 num_loss_counted_tokens: 1626 total tokens: 8016 num samples: 8 num padding tokens: 490 - rank: 3 max len: 1002 min len: 848 avg len: 940.75 num_loss_counted_tokens: 5067 | |
total tokens: 5446 num samples: 2 num padding tokens: 669 - rank: 0 max len: 2723 min len: 2054 avg len: 2388.5 num_loss_counted_tokens: 841 | |
total tokens: 6835 num samples: 5 num padding tokens: 1245 - rank: 2 max len: 1367 min len: 1005 avg len: 1118.0 num_loss_counted_tokens: 616 | |
Per-token loss scaled by world size: 0.00026996861561201513Per-token loss scaled by world size: 8.522550342604518e-05Per-token loss scaled by world size: 0.00032177582033909857Per-token loss scaled by world size: 9.002388833323494e-05Per-token loss scaled by world size: 9.29309317143634e-05Per-token loss scaled by world size: 0.00024938900605775416 | |
Per-token loss scaled by world size: 0.00014789693523198366 | |
Epoch: 1, Step: 172, Rank: 0, loss = 0.3234558403491974Epoch: 1, Step: 172, Rank: 4, loss = 1.1561405658721924 | |
Epoch: 1, Step: 172, Rank: 6, loss = 0.9699972867965698Epoch: 1, Step: 172, Rank: 2, loss = 0.30621522665023804Epoch: 1, Step: 172, Rank: 1, loss = 0.3339008390903473Epoch: 1, Step: 172, Rank: 5, loss = 0.896054744720459 | |
Epoch: 1, Step: 172, Rank: 7, loss = 0.5313937067985535 | |
Per-token loss scaled by world size: 0.00018880210700444877 | |
Epoch: 1, Step: 172, Rank: 3, loss = 0.67836594581604 | |
total tokens: 6222 num samples: 3 num padding tokens: 850 - rank: 1 max len: 2074 min len: 1514 avg len: 1790.6666666666667 num_loss_counted_tokens: 599 | |
total tokens: 7579 num samples: 11 num padding tokens: 750 - rank: 4 max len: 689 min len: 580 avg len: 620.8181818181819 num_loss_counted_tokens: 3571 | |
{ | |
"epoch": 1, | |
"step": 172, | |
"rank": 0, | |
"loss": 0.3234558403491974, | |
"overall_throughput": 41.46099612058457, | |
"lr": 3.2000000000000003e-06, | |
"cuda_mem_allocated": 24.491368293762207, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 28744, | |
"batch_size": 104, | |
"total_loss": 0.6494404673576355, | |
"gradnorm": 0.9710609316825867, | |
"weight_norm": 433.04327392578125, | |
"timestamp": "2024-08-18T20:55:38.822555" | |
} | |
total tokens: 7328 num samples: 32 num padding tokens: 2250 - rank: 7 max len: 229 min len: 81 avg len: 158.6875 num_loss_counted_tokens: 2126 | |
total tokens: 7920 num samples: 10 num padding tokens: 439 - rank: 3 max len: 792 min len: 702 avg len: 748.1 num_loss_counted_tokens: 6006 | |
total tokens: 7617 num samples: 3 num padding tokens: 480 - rank: 0 max len: 2539 min len: 2078 avg len: 2379.0 num_loss_counted_tokens: 282 | |
total tokens: 7494 num samples: 6 num padding tokens: 1464 - rank: 2 max len: 1249 min len: 825 avg len: 1005.0 num_loss_counted_tokens: 3339 | |
total tokens: 7938 num samples: 14 num padding tokens: 1254 - rank: 5 max len: 567 min len: 424 avg len: 477.42857142857144 num_loss_counted_tokens: 4110 | |
total tokens: 8018 num samples: 19 num padding tokens: 1923 - rank: 6 max len: 422 min len: 231 avg len: 320.7894736842105 num_loss_counted_tokens: 3186 | |
Per-token loss scaled by world size: 0.0002713052381295711Per-token loss scaled by world size: 0.00035623108851723373Per-token loss scaled by world size: 0.00047955545596778393Per-token loss scaled by world size: 0.0002560637367423624Per-token loss scaled by world size: 3.385763557162136e-05Per-token loss scaled by world size: 5.4830157750984654e-05 | |
Per-token loss scaled by world size: 2.089197550958488e-06 | |
Epoch: 1, Step: 173, Rank: 3, loss = 0.766302764415741Epoch: 1, Step: 173, Rank: 2, loss = 0.101323202252388Epoch: 1, Step: 173, Rank: 5, loss = 1.4351296424865723 | |
Epoch: 1, Step: 173, Rank: 4, loss = 1.066066026687622 | |
Epoch: 1, Step: 173, Rank: 1, loss = 0.16408610343933105Epoch: 1, Step: 173, Rank: 7, loss = 0.8119148015975952 | |
Epoch: 1, Step: 173, Rank: 0, loss = 0.006252184975892305 | |
Per-token loss scaled by world size: 0.00036022928543388844 | |
Epoch: 1, Step: 173, Rank: 6, loss = 1.0780311822891235 | |
total tokens: 7947 num samples: 9 num padding tokens: 497 - rank: 4 max len: 883 min len: 762 avg len: 827.7777777777778 num_loss_counted_tokens: 6018 | |
total tokens: 6960 num samples: 3 num padding tokens: 675 - rank: 1 max len: 2320 min len: 1916 avg len: 2095.0 num_loss_counted_tokens: 1117 | |
{ | |
"epoch": 1, | |
"step": 173, | |
"rank": 0, | |
"loss": 0.006252184975892305, | |
"overall_throughput": 42.234314636277595, | |
"lr": 3.2000000000000003e-06, | |
"cuda_mem_allocated": 24.221298694610596, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 23941, | |
"batch_size": 80, | |
"total_loss": 0.678638219833374, | |
"gradnorm": 0.9710609316825867, | |
"weight_norm": 433.04327392578125, | |
"timestamp": "2024-08-18T20:55:41.355552" | |
} | |
total tokens: 6350 num samples: 2 num padding tokens: 380 - rank: 0 max len: 3175 min len: 2795 avg len: 2985.0 num_loss_counted_tokens: 1089 | |
total tokens: 6624 num samples: 4 num padding tokens: 786 - rank: 2 max len: 1656 min len: 1233 avg len: 1459.5 num_loss_counted_tokens: 2379 | |
total tokens: 8041 num samples: 11 num padding tokens: 1409 - rank: 5 max len: 731 min len: 520 avg len: 602.9090909090909 num_loss_counted_tokens: 3830 | |
total tokens: 8064 num samples: 16 num padding tokens: 2063 - rank: 6 max len: 504 min len: 251 avg len: 375.0625 num_loss_counted_tokens: 3722 | |
total tokens: 7936 num samples: 32 num padding tokens: 2626 - rank: 7 max len: 248 min len: 74 avg len: 165.9375 num_loss_counted_tokens: 2201 | |
total tokens: 7314 num samples: 6 num padding tokens: 1159 - rank: 3 max len: 1219 min len: 923 avg len: 1025.8333333333333 num_loss_counted_tokens: 4532 | |
Per-token loss scaled by world size: 0.0002848200674634427Per-token loss scaled by world size: 0.0003533354902174324Per-token loss scaled by world size: 0.00015056866686791182 | |
Per-token loss scaled by world size: 0.00036068688496015966Per-token loss scaled by world size: 0.0002968825865536928 | |
Per-token loss scaled by world size: 0.000274753401754424 | |
Per-token loss scaled by world size: 2.8490408112702426e-06 | |
Epoch: 1, Step: 174, Rank: 6, loss = 1.022597074508667 | |
Epoch: 1, Step: 174, Rank: 2, loss = 0.4357645511627197 | |
Epoch: 1, Step: 174, Rank: 4, loss = 0.8243048787117004 | |
Epoch: 1, Step: 174, Rank: 5, loss = 1.0438729524612427 | |
Epoch: 1, Step: 174, Rank: 7, loss = 0.8592153191566467 | |
Epoch: 1, Step: 174, Rank: 1, loss = 0.7951706647872925 | |
Epoch: 1, Step: 174, Rank: 0, loss = 0.008245480246841908 | |
Per-token loss scaled by world size: 0.0002512831415515393 | |
Epoch: 1, Step: 174, Rank: 3, loss = 0.7272448539733887 | |
total tokens: 7217 num samples: 7 num padding tokens: 815 - rank: 4 max len: 1031 min len: 839 avg len: 914.5714285714286 num_loss_counted_tokens: 4377 | |
total tokens: 5458 num samples: 2 num padding tokens: 51 - rank: 1 max len: 2729 min len: 2678 avg len: 2703.5 num_loss_counted_tokens: 801 | |
{ | |
"epoch": 1, | |
"step": 174, | |
"rank": 0, | |
"loss": 0.008245480246841908, | |
"overall_throughput": 41.476173818456004, | |
"lr": 3.2000000000000003e-06, | |
"cuda_mem_allocated": 24.290608882904053, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 23153, | |
"batch_size": 84, | |
"total_loss": 0.7145519852638245, | |
"gradnorm": 0.9710609316825867, | |
"weight_norm": 433.04327392578125, | |
"timestamp": "2024-08-18T20:55:43.868184" | |
} | |
total tokens: 8085 num samples: 15 num padding tokens: 1660 - rank: 6 max len: 539 min len: 289 avg len: 428.3333333333333 num_loss_counted_tokens: 4071 | |
total tokens: 6890 num samples: 5 num padding tokens: 1024 - rank: 3 max len: 1378 min len: 1085 avg len: 1173.2 num_loss_counted_tokens: 3068 | |
total tokens: 7680 num samples: 3 num padding tokens: 1620 - rank: 2 max len: 2560 min len: 1464 avg len: 2020.0 num_loss_counted_tokens: 1227 | |
total tokens: 7850 num samples: 10 num padding tokens: 1055 - rank: 5 max len: 785 min len: 541 avg len: 679.5 num_loss_counted_tokens: 4251 | |
total tokens: 7830 num samples: 29 num padding tokens: 3032 - rank: 7 max len: 270 min len: 79 avg len: 165.44827586206895 num_loss_counted_tokens: 2136 | |
total tokens: 6586 num samples: 2 num padding tokens: 49 - rank: 0 max len: 3293 min len: 3244 avg len: 3268.5 num_loss_counted_tokens: 217 | |
Per-token loss scaled by world size: 0.0003542072663549334Per-token loss scaled by world size: 0.00031207496067509055Per-token loss scaled by world size: 0.000552273471839726Per-token loss scaled by world size: 0.0003514452837407589 | |
Per-token loss scaled by world size: 8.925243264457094e-07Per-token loss scaled by world size: 2.603805114631541e-06 | |
Per-token loss scaled by world size: 0.00034913059789687395 | |
Epoch: 1, Step: 175, Rank: 4, loss = 0.9186340570449829Epoch: 1, Step: 175, Rank: 6, loss = 1.4435738325119019Epoch: 1, Step: 175, Rank: 1, loss = 0.0023329469840973616Epoch: 1, Step: 175, Rank: 2, loss = 0.9258535504341125 | |
Epoch: 1, Step: 175, Rank: 3, loss = 0.8157249093055725 | |
Epoch: 1, Step: 175, Rank: 0, loss = 0.006806021090596914 | |
Epoch: 1, Step: 175, Rank: 7, loss = 0.9125837683677673 | |
Per-token loss scaled by world size: 0.00037907989462837577 | |
Epoch: 1, Step: 175, Rank: 5, loss = 0.9908674359321594 | |
total tokens: 7552 num samples: 8 num padding tokens: 1076 - rank: 4 max len: 944 min len: 723 avg len: 809.5 num_loss_counted_tokens: 4001 | |
total tokens: 6219 num samples: 3 num padding tokens: 803 - rank: 1 max len: 2073 min len: 1455 avg len: 1805.3333333333333 num_loss_counted_tokens: 1982 | |
{ | |
"epoch": 1, | |
"step": 175, | |
"rank": 0, | |
"loss": 0.006806021090596914, | |
"overall_throughput": 41.26828372064641, | |
"lr": 3.2000000000000003e-06, | |
"cuda_mem_allocated": 24.29252052307129, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 20911, | |
"batch_size": 75, | |
"total_loss": 0.752047061920166, | |
"gradnorm": 0.9710609316825867, | |
"weight_norm": 433.04327392578125, | |
"timestamp": "2024-08-18T20:55:46.430726" | |
} | |
total tokens: 5842 num samples: 23 num padding tokens: 1859 - rank: 7 max len: 254 min len: 86 avg len: 173.17391304347825 num_loss_counted_tokens: 1764 | |
total tokens: 7518 num samples: 7 num padding tokens: 452 - rank: 3 max len: 1074 min len: 955 avg len: 1009.4285714285714 num_loss_counted_tokens: 5885 | |
total tokens: 7803 num samples: 17 num padding tokens: 1386 - rank: 6 max len: 459 min len: 302 avg len: 377.47058823529414 num_loss_counted_tokens: 3507 | |
total tokens: 7755 num samples: 11 num padding tokens: 586 - rank: 5 max len: 705 min len: 578 avg len: 651.7272727272727 num_loss_counted_tokens: 3489 | |
total tokens: 7440 num samples: 6 num padding tokens: 402 - rank: 2 max len: 1240 min len: 1106 avg len: 1173.0 num_loss_counted_tokens: 4030 | |
total tokens: 7226 num samples: 2 num padding tokens: 1117 - rank: 0 max len: 3613 min len: 2496 avg len: 3054.5 num_loss_counted_tokens: 257 | |
Per-token loss scaled by world size: 0.00023268039512913674Per-token loss scaled by world size: 0.00022860463650431484Per-token loss scaled by world size: 0.00033770385198295116Per-token loss scaled by world size: 4.125645318708848e-06Per-token loss scaled by world size: 0.0003991488483734429Per-token loss scaled by world size: 1.1589580026338808e-05 | |
Per-token loss scaled by world size: 0.0002852912584785372 | |
Epoch: 1, Step: 176, Rank: 3, loss = 0.6885303854942322 | |
Epoch: 1, Step: 176, Rank: 0, loss = 0.012208300642669201Epoch: 1, Step: 176, Rank: 6, loss = 0.9993079304695129 | |
Epoch: 1, Step: 176, Rank: 4, loss = 1.181131362915039 | |
Epoch: 1, Step: 176, Rank: 2, loss = 0.6764696836471558 | |
Epoch: 1, Step: 176, Rank: 1, loss = 0.034295015037059784 | |
Epoch: 1, Step: 176, Rank: 7, loss = 0.844212532043457 | |
Per-token loss scaled by world size: 0.0003952819970436394 | |
Epoch: 1, Step: 176, Rank: 5, loss = 1.1696888208389282 | |
total tokens: 7189 num samples: 7 num padding tokens: 797 - rank: 4 max len: 1027 min len: 807 avg len: 913.1428571428571 num_loss_counted_tokens: 4546 | |
total tokens: 7528 num samples: 4 num padding tokens: 414 - rank: 1 max len: 1882 min len: 1584 avg len: 1778.5 num_loss_counted_tokens: 2995 | |
{ | |
"epoch": 1, | |
"step": 176, | |
"rank": 0, | |
"loss": 0.012208300642669201, | |
"overall_throughput": 41.56666979221221, | |
"lr": 3.2000000000000003e-06, | |
"cuda_mem_allocated": 24.38214635848999, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 23673, | |
"batch_size": 86, | |
"total_loss": 0.7007305026054382, | |
"gradnorm": 0.9710609316825867, | |
"weight_norm": 433.04327392578125, | |
"timestamp": "2024-08-18T20:55:49.012721" | |
} | |
total tokens: 8025 num samples: 15 num padding tokens: 1836 - rank: 6 max len: 535 min len: 285 avg len: 412.6 num_loss_counted_tokens: 4071 | |
total tokens: 7525 num samples: 5 num padding tokens: 253 - rank: 2 max len: 1505 min len: 1379 avg len: 1454.4 num_loss_counted_tokens: 1812 | |
total tokens: 7860 num samples: 10 num padding tokens: 1300 - rank: 5 max len: 786 min len: 557 avg len: 656.0 num_loss_counted_tokens: 4726 | |
total tokens: 7980 num samples: 6 num padding tokens: 677 - rank: 3 max len: 1330 min len: 1078 avg len: 1217.1666666666667 num_loss_counted_tokens: 3981 | |
total tokens: 8091 num samples: 29 num padding tokens: 3107 - rank: 7 max len: 279 min len: 72 avg len: 171.86206896551724 num_loss_counted_tokens: 2093 | |
total tokens: 5826 num samples: 2 num padding tokens: 735 - rank: 0 max len: 2913 min len: 2178 avg len: 2545.5 num_loss_counted_tokens: 2214 | |
Per-token loss scaled by world size: 0.0002194504631916061Per-token loss scaled by world size: 0.0005286230007186532Per-token loss scaled by world size: 0.00018121296307072043Per-token loss scaled by world size: 0.00039247411768883467 | |
Per-token loss scaled by world size: 3.4347518521826714e-05Per-token loss scaled by world size: 4.241336228005821e-06 | |
Per-token loss scaled by world size: 0.00030080656870268285 | |
Epoch: 1, Step: 177, Rank: 2, loss = 0.5341705083847046Epoch: 1, Step: 177, Rank: 5, loss = 1.156915545463562 | |
Epoch: 1, Step: 177, Rank: 3, loss = 1.558248519897461 | |
Epoch: 1, Step: 177, Rank: 7, loss = 0.646885097026825 | |
Epoch: 1, Step: 177, Rank: 1, loss = 0.10124789923429489Epoch: 1, Step: 177, Rank: 0, loss = 0.012502399273216724 | |
Epoch: 1, Step: 177, Rank: 4, loss = 0.8867025375366211 | |
Per-token loss scaled by world size: 0.00036306059337221086 | |
Epoch: 1, Step: 177, Rank: 6, loss = 1.0702118873596191 | |
total tokens: 7792 num samples: 8 num padding tokens: 837 - rank: 4 max len: 974 min len: 724 avg len: 869.375 num_loss_counted_tokens: 4288 | |
total tokens: 7242 num samples: 3 num padding tokens: 572 - rank: 1 max len: 2414 min len: 1992 avg len: 2223.3333333333335 num_loss_counted_tokens: 799 | |
{ | |
"epoch": 1, | |
"step": 177, | |
"rank": 0, | |
"loss": 0.012502399273216724, | |
"overall_throughput": 42.43462885426557, | |
"lr": 3.2000000000000003e-06, | |
"cuda_mem_allocated": 24.246610641479492, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 23582, | |
"batch_size": 96, | |
"total_loss": 0.7458605170249939, | |
"gradnorm": 0.9710609316825867, | |
"weight_norm": 433.04327392578125, | |
"timestamp": "2024-08-18T20:55:51.468642" | |
} | |
total tokens: 7854 num samples: 11 num padding tokens: 757 - rank: 5 max len: 714 min len: 600 avg len: 645.1818181818181 num_loss_counted_tokens: 5101 | |
total tokens: 7854 num samples: 14 num padding tokens: 1807 - rank: 6 max len: 561 min len: 335 avg len: 431.92857142857144 num_loss_counted_tokens: 3657 | |
total tokens: 7752 num samples: 24 num padding tokens: 3111 - rank: 7 max len: 323 min len: 76 avg len: 193.375 num_loss_counted_tokens: 2099 | |
total tokens: 5566 num samples: 2 num padding tokens: 100 - rank: 0 max len: 2783 min len: 2683 avg len: 2733.0 num_loss_counted_tokens: 274 | |
total tokens: 7026 num samples: 6 num padding tokens: 376 - rank: 3 max len: 1171 min len: 1022 avg len: 1108.3333333333333 num_loss_counted_tokens: 3075 | |
total tokens: 7916 num samples: 4 num padding tokens: 1590 - rank: 2 max len: 1979 min len: 1340 avg len: 1581.5 num_loss_counted_tokens: 4584 | |
Per-token loss scaled by world size: 0.0001956072374014184Per-token loss scaled by world size: 0.00021871054195798934Per-token loss scaled by world size: 0.0003932247345801443 | |
Per-token loss scaled by world size: 1.0211075277766213e-05 | |
Per-token loss scaled by world size: 0.00026568045723252 | |
Per-token loss scaled by world size: 0.00030937412520870566Epoch: 1, Step: 178, Rank: 2, loss = 0.689293622970581 | |
Per-token loss scaled by world size: 0.000248032680246979Epoch: 1, Step: 178, Rank: 3, loss = 0.6164806485176086 | |
Epoch: 1, Step: 178, Rank: 5, loss = 1.2392969131469727 | |
Epoch: 1, Step: 178, Rank: 1, loss = 0.032181479036808014 | |
Epoch: 1, Step: 178, Rank: 6, loss = 0.8373252153396606 | |
Epoch: 1, Step: 178, Rank: 4, loss = 0.9750311970710754 | |
Per-token loss scaled by world size: 3.7826398511242587e-06Epoch: 1, Step: 178, Rank: 7, loss = 0.7817060351371765 | |
Epoch: 1, Step: 178, Rank: 0, loss = 0.01192146260291338 | |
total tokens: 8037 num samples: 9 num padding tokens: 489 - rank: 4 max len: 893 min len: 768 avg len: 838.6666666666666 num_loss_counted_tokens: 5348 | |
total tokens: 7436 num samples: 4 num padding tokens: 550 - rank: 1 max len: 1859 min len: 1539 avg len: 1721.5 num_loss_counted_tokens: 2082 | |
total tokens: 7380 num samples: 5 num padding tokens: 481 - rank: 2 max len: 1476 min len: 1220 avg len: 1379.8 num_loss_counted_tokens: 1223 | |
{ | |
"epoch": 1, | |
"step": 178, | |
"rank": 0, | |
"loss": 0.01192146260291338, | |
"overall_throughput": 40.62829660225024, | |
"lr": 3.2000000000000003e-06, | |
"cuda_mem_allocated": 24.526740550994873, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 25213, | |
"batch_size": 83, | |
"total_loss": 0.647904634475708, | |
"gradnorm": 0.9710609316825867, | |
"weight_norm": 433.04327392578125, | |
"timestamp": "2024-08-18T20:55:54.071173" | |
} | |
total tokens: 8000 num samples: 16 num padding tokens: 2833 - rank: 6 max len: 500 min len: 256 avg len: 322.9375 num_loss_counted_tokens: 2783 | |
total tokens: 7994 num samples: 7 num padding tokens: 840 - rank: 3 max len: 1142 min len: 895 avg len: 1022.0 num_loss_counted_tokens: 3757 | |
total tokens: 7520 num samples: 10 num padding tokens: 909 - rank: 5 max len: 752 min len: 566 avg len: 661.1 num_loss_counted_tokens: 3632 | |
total tokens: 7560 num samples: 30 num padding tokens: 2931 - rank: 7 max len: 252 min len: 77 avg len: 154.3 num_loss_counted_tokens: 1778 | |
total tokens: 6586 num samples: 2 num padding tokens: 527 - rank: 0 max len: 3293 min len: 2766 avg len: 3029.5 num_loss_counted_tokens: 357 | |
Per-token loss scaled by world size: 0.0004986776039004326
Per-token loss scaled by world size: 0.0002882078697439283
Per-token loss scaled by world size: 0.0005514522199518979
Per-token loss scaled by world size: 0.0002959422126878053
Per-token loss scaled by world size: 0.0005512300995178521
Per-token loss scaled by world size: 4.8149313442991115e-06
Per-token loss scaled by world size: 8.981861901702359e-05
Epoch: 1, Step: 179, Rank: 5, loss = 1.2778526544570923
Epoch: 1, Step: 179, Rank: 6, loss = 1.1555607318878174
Epoch: 1, Step: 179, Rank: 4, loss = 0.6678496599197388
Epoch: 1, Step: 179, Rank: 7, loss = 1.277337908744812
Epoch: 1, Step: 179, Rank: 0, loss = 0.011157399974763393
Epoch: 1, Step: 179, Rank: 1, loss = 0.20813219249248505
Epoch: 1, Step: 179, Rank: 2, loss = 0.6857721209526062
Per-token loss scaled by world size: 0.00024368343292735517
Epoch: 1, Step: 179, Rank: 3, loss = 0.5646754503250122
total tokens: 5876 num samples: 2 num padding tokens: 220 - rank: 1 max len: 2938 min len: 2718 avg len: 2828.0 num_loss_counted_tokens: 745
total tokens: 7839 num samples: 9 num padding tokens: 725 - rank: 4 max len: 871 min len: 630 avg len: 790.4444444444445 num_loss_counted_tokens: 4180
total tokens: 7287 num samples: 7 num padding tokens: 860 - rank: 3 max len: 1041 min len: 872 avg len: 918.1428571428571 num_loss_counted_tokens: 4584
{
    "epoch": 1,
    "step": 179,
    "rank": 0,
    "loss": 0.011157399974763393,
    "overall_throughput": 41.67005705220922,
    "lr": 3.2000000000000003e-06,
    "cuda_mem_allocated": 24.338607788085938,
    "cuda_malloc_retries": 0,
    "num_loss_counted_tokens": 18538,
    "batch_size": 79,
    "total_loss": 0.7310422658920288,
    "gradnorm": 0.9710609316825867,
    "weight_norm": 433.04327392578125,
    "timestamp": "2024-08-18T20:55:56.610997"
}
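In each JSON summary, `loss` is rank 0's own value. `total_loss` appears to be the mean of the eight per-rank losses printed for the same step. That reading is inferred from the numbers, not from the training code. A quick check against the step 179 values above:

```python
from statistics import fmean

# Per-rank losses printed for "Epoch: 1, Step: 179" (ranks 0-7).
step_179_losses = {
    0: 0.011157399974763393,
    1: 0.20813219249248505,
    2: 0.6857721209526062,
    3: 0.5646754503250122,
    4: 0.6678496599197388,
    5: 1.2778526544570923,
    6: 1.1555607318878174,
    7: 1.277337908744812,
}
reported_total_loss = 0.7310422658920288  # "total_loss" in the step 179 JSON record

mean_loss = fmean(step_179_losses.values())
print(f"{mean_loss:.6f}")  # 0.731042
assert abs(mean_loss - reported_total_loss) < 1e-6
```

The tiny residual difference, about 1e-9, is what one would expect if the logged values were produced in float32 and printed as float64.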
total tokens: 8112 num samples: 13 num padding tokens: 833 - rank: 5 max len: 624 min len: 485 avg len: 559.9230769230769 num_loss_counted_tokens: 4530
total tokens: 7329 num samples: 3 num padding tokens: 1943 - rank: 2 max len: 2443 min len: 1182 avg len: 1795.3333333333333 num_loss_counted_tokens: 502
total tokens: 7812 num samples: 28 num padding tokens: 2498 - rank: 7 max len: 279 min len: 87 avg len: 189.78571428571428 num_loss_counted_tokens: 2490
total tokens: 6798 num samples: 2 num padding tokens: 177 - rank: 0 max len: 3399 min len: 3222 avg len: 3310.5 num_loss_counted_tokens: 146
total tokens: 8024 num samples: 17 num padding tokens: 1349 - rank: 6 max len: 472 min len: 304 avg len: 392.6470588235294 num_loss_counted_tokens: 4448
Per-token loss scaled by world size: 0.0003445638285484165
Per-token loss scaled by world size: 0.00042620761087164283
Per-token loss scaled by world size: 7.993769395397976e-05
Per-token loss scaled by world size: 4.4396303565008566e-05
Per-token loss scaled by world size: 0.00013256767124403268
Per-token loss scaled by world size: 0.0002054571668850258
Epoch: 1, Step: 180, Rank: 1, loss = 0.21686097979545593
Epoch: 1, Step: 180, Rank: 5, loss = 1.1562479734420776
Per-token loss scaled by world size: 0.0001180191757157445
Epoch: 1, Step: 180, Rank: 0, loss = 0.12044162303209305
Epoch: 1, Step: 180, Rank: 4, loss = 0.9347586035728455
Epoch: 1, Step: 180, Rank: 3, loss = 0.3596395254135132
Epoch: 1, Step: 180, Rank: 7, loss = 0.5573796033859253
Per-token loss scaled by world size: 0.00041190627962350845
Epoch: 1, Step: 180, Rank: 2, loss = 0.3201712667942047
Epoch: 1, Step: 180, Rank: 6, loss = 1.11745023727417
[2024-08-18 20:55:59,149] [INFO] [logging.py:96:log_dist] [Rank 0] step=5, skipped=0, lr=[4.000000000000001e-06], mom=[(0.9, 0.95)]
[2024-08-18 20:55:59,226] [INFO] [timer.py:258:stop] epoch=0/micro_step=180/global_step=5, RunningAvgSamplesPerSec=41.67583330946055, CurrSamplesPerSec=41.56020644882237, MemAllocated=22.74GB, MaxMemAllocated=30.61GB
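The DeepSpeed line reports lr=4.000000000000001e-06 at optimizer step 5. The step 179 JSON record, one optimizer step earlier, shows 3.2000000000000003e-06. Both values fit a plain linear warmup toward learning_rate=2e-05 over warmup_steps=25, i.e. lr = 2e-05 × step / 25. Those two settings are the ones printed in the TrainingArgs at the start of this log; they are assumed to be unchanged for the phase shown here. This is an inference from the logged numbers, not a reading of the scheduler's source. The helper below is a hypothetical sketch:

```python
import math

def linear_warmup_lr(step: int, peak_lr: float = 2e-05, warmup_steps: int = 25) -> float:
    """Learning rate after `step` optimizer steps under an assumed plain linear warmup."""
    return peak_lr * min(step, warmup_steps) / warmup_steps

# (optimizer step, lr value printed in this log)
observed = [
    (4, 3.2000000000000003e-06),  # lr in the step 179 JSON record (global step 4 is inferred)
    (5, 4.000000000000001e-06),   # DeepSpeed "step=5 ... lr=[4.000000000000001e-06]"
]

for step, logged_lr in observed:
    predicted = linear_warmup_lr(step)
    print(step, predicted, math.isclose(predicted, logged_lr, rel_tol=1e-9))
```

The same reading explains why `lr` and `gradnorm` stay fixed across the micro-steps that follow. Both are only updated when DeepSpeed takes an optimizer step.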
total tokens: 7120 num samples: 4 num padding tokens: 1624 - rank: 1 max len: 1780 min len: 1186 avg len: 1374.0 num_loss_counted_tokens: 1016
total tokens: 7790 num samples: 10 num padding tokens: 1050 - rank: 4 max len: 779 min len: 583 avg len: 674.0 num_loss_counted_tokens: 5078
{
    "epoch": 1,
    "step": 180,
    "rank": 0,
    "loss": 0.12044162303209305,
    "overall_throughput": 40.381532723652576,
    "lr": 4.000000000000001e-06,
    "cuda_mem_allocated": 22.73690176010132,
    "cuda_malloc_retries": 0,
    "num_loss_counted_tokens": 21703,
    "batch_size": 76,
    "total_loss": 0.5978687405586243,
    "gradnorm": 0.8729047775268555,
    "weight_norm": 433.0433044433594,
    "timestamp": "2024-08-18T20:55:59.229405"
}
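The "Per-token loss scaled by world size" lines and the per-rank `loss` lines appear to be linked through two quantities: the step-wide `num_loss_counted_tokens` from the JSON record and the world size of 8. The relationship is loss ≈ per-token value × num_loss_counted_tokens / 8. The pairing of a per-token line with a rank is inferred by matching magnitudes, because the ranks print these lines in no fixed order. For rank 0 at steps 179 and 180 the matching per-token values are 4.8149313442991115e-06 and 4.4396303565008566e-05. A minimal check, with a hypothetical helper name:

```python
WORLD_SIZE = 8  # from "--gpus 8" in the command that launched this run

# step -> (per-token value, step-wide num_loss_counted_tokens, logged rank-0 loss)
rank0_samples = {
    179: (4.8149313442991115e-06, 18538, 0.011157399974763393),
    180: (4.4396303565008566e-05, 21703, 0.12044162303209305),
}

def reconstruct_loss(per_token: float, counted_tokens: int, world_size: int = WORLD_SIZE) -> float:
    """Rebuild a per-rank loss from its 'per-token loss scaled by world size' value."""
    return per_token * counted_tokens / world_size

for step, (per_token, counted, logged) in rank0_samples.items():
    rebuilt = reconstruct_loss(per_token, counted)
    print(step, rebuilt, abs(rebuilt - logged) / logged)  # relative error well below 1e-6
```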
total tokens: 6960 num samples: 29 num padding tokens: 2603 - rank: 7 max len: 240 min len: 85 avg len: 150.24137931034483 num_loss_counted_tokens: 1642
total tokens: 7408 num samples: 8 num padding tokens: 579 - rank: 3 max len: 926 min len: 792 avg len: 853.625 num_loss_counted_tokens: 5923
total tokens: 8020 num samples: 20 num padding tokens: 1553 - rank: 6 max len: 401 min len: 267 avg len: 323.35 num_loss_counted_tokens: 3882
total tokens: 5448 num samples: 2 num padding tokens: 854 - rank: 0 max len: 2724 min len: 1870 avg len: 2297.0 num_loss_counted_tokens: 185
total tokens: 7812 num samples: 14 num padding tokens: 1049 - rank: 5 max len: 558 min len: 414 avg len: 483.07142857142856 num_loss_counted_tokens: 3625
total tokens: 7854 num samples: 7 num padding tokens: 726 - rank: 2 max len: 1122 min len: 943 avg len: 1018.2857142857143 num_loss_counted_tokens: 3158
Per-token loss scaled by world size: 0.00021158010349608958
Per-token loss scaled by world size: 9.194504673359916e-05
Per-token loss scaled by world size: 2.7597000098467106e-06
Per-token loss scaled by world size: 0.00044244344462640584
Per-token loss scaled by world size: 0.0002959812409244478
Per-token loss scaled by world size: 1.918704765557777e-05
Per-token loss scaled by world size: 0.0002461440162733197
Epoch: 1, Step: 181, Rank: 4, loss = 0.640823245048523
Epoch: 1, Step: 181, Rank: 5, loss = 1.3400505781173706
Epoch: 1, Step: 181, Rank: 2, loss = 0.27847856283187866
Epoch: 1, Step: 181, Rank: 0, loss = 0.008358441293239594
Epoch: 1, Step: 181, Rank: 1, loss = 0.058112770318984985
Epoch: 1, Step: 181, Rank: 7, loss = 0.8964532017707825
Epoch: 1, Step: 181, Rank: 3, loss = 0.7455086708068848
Per-token loss scaled by world size: 0.00037991307908669114
Epoch: 1, Step: 181, Rank: 6, loss = 1.1506617069244385
total tokens: 7336 num samples: 8 num padding tokens: 1044 - rank: 4 max len: 917 min len: 694 avg len: 786.5 num_loss_counted_tokens: 4411
total tokens: 5638 num samples: 2 num padding tokens: 201 - rank: 1 max len: 2819 min len: 2618 avg len: 2718.5 num_loss_counted_tokens: 197
{
    "epoch": 1,
    "step": 181,
    "rank": 0,
    "loss": 0.008358441293239594,
    "overall_throughput": 41.99766118896557,
    "lr": 4.000000000000001e-06,
    "cuda_mem_allocated": 24.434746265411377,
    "cuda_malloc_retries": 0,
    "num_loss_counted_tokens": 24230,
    "batch_size": 85,
    "total_loss": 0.6398059129714966,
    "gradnorm": 0.8729047775268555,
    "weight_norm": 433.0433044433594,
    "timestamp": "2024-08-18T20:56:01.770174"
}
total tokens: 8088 num samples: 12 num padding tokens: 1035 - rank: 5 max len: 674 min len: 501 avg len: 587.75 num_loss_counted_tokens: 4793
total tokens: 7696 num samples: 16 num padding tokens: 1539 - rank: 6 max len: 481 min len: 270 avg len: 384.8125 num_loss_counted_tokens: 3508
total tokens: 7340 num samples: 4 num padding tokens: 1220 - rank: 2 max len: 1835 min len: 1287 avg len: 1530.0 num_loss_counted_tokens: 2015
total tokens: 7047 num samples: 27 num padding tokens: 2280 - rank: 7 max len: 261 min len: 83 avg len: 176.55555555555554 num_loss_counted_tokens: 1980
total tokens: 6842 num samples: 2 num padding tokens: 194 - rank: 0 max len: 3421 min len: 3227 avg len: 3324.0 num_loss_counted_tokens: 203
total tokens: 7308 num samples: 6 num padding tokens: 899 - rank: 3 max len: 1218 min len: 951 avg len: 1068.1666666666667 num_loss_counted_tokens: 2980
Per-token loss scaled by world size: 0.00044985805288888514
Per-token loss scaled by world size: 0.0003697045613080263
Per-token loss scaled by world size: 0.00020446558482944965
Per-token loss scaled by world size: 0.0002235985011793673
Per-token loss scaled by world size: 0.0003323642595205456
Per-token loss scaled by world size: 8.518856338923797e-05
Epoch: 1, Step: 182, Rank: 3, loss = 0.6204019784927368
Per-token loss scaled by world size: 0.00010683093569241464
Epoch: 1, Step: 182, Rank: 5, loss = 1.0257915258407593
Epoch: 1, Step: 182, Rank: 6, loss = 0.9221861958503723
Epoch: 1, Step: 182, Rank: 2, loss = 0.5673153400421143
Epoch: 1, Step: 182, Rank: 4, loss = 1.2481874227523804
Epoch: 1, Step: 182, Rank: 1, loss = 0.23636631667613983
Epoch: 1, Step: 182, Rank: 7, loss = 0.296415776014328
Per-token loss scaled by world size: 2.3087804947863333e-06
Epoch: 1, Step: 182, Rank: 0, loss = 0.006406000349670649
total tokens: 7911 num samples: 3 num padding tokens: 1870 - rank: 1 max len: 2637 min len: 1680 avg len: 2013.6666666666667 num_loss_counted_tokens: 268
total tokens: 3888 num samples: 18 num padding tokens: 1364 - rank: 7 max len: 216 min len: 80 avg len: 140.22222222222223 num_loss_counted_tokens: 1085
total tokens: 7695 num samples: 9 num padding tokens: 456 - rank: 4 max len: 855 min len: 750 avg len: 804.3333333333334 num_loss_counted_tokens: 5317
{
    "epoch": 1,
    "step": 182,
    "rank": 0,
    "loss": 0.006406000349670649,
    "overall_throughput": 40.52672826708525,
    "lr": 4.000000000000001e-06,
    "cuda_mem_allocated": 24.530043125152588,
    "cuda_malloc_retries": 0,
    "num_loss_counted_tokens": 22197,
    "batch_size": 78,
    "total_loss": 0.6153838038444519,
    "gradnorm": 0.8729047775268555,
    "weight_norm": 433.0433044433594,
    "timestamp": "2024-08-18T20:56:04.374573"
}
total tokens: 8024 num samples: 17 num padding tokens: 2930 - rank: 6 max len: 472 min len: 227 avg len: 299.6470588235294 num_loss_counted_tokens: 3168
total tokens: 6560 num samples: 4 num padding tokens: 820 - rank: 2 max len: 1640 min len: 1225 avg len: 1435.0 num_loss_counted_tokens: 3801
total tokens: 7062 num samples: 6 num padding tokens: 912 - rank: 3 max len: 1177 min len: 874 avg len: 1025.0 num_loss_counted_tokens: 4465
total tokens: 7854 num samples: 11 num padding tokens: 1369 - rank: 5 max len: 714 min len: 483 avg len: 589.5454545454545 num_loss_counted_tokens: 3058
total tokens: 6820 num samples: 2 num padding tokens: 89 - rank: 0 max len: 3410 min len: 3321 avg len: 3365.5 num_loss_counted_tokens: 208
Per-token loss scaled by world size: 0.00046004995238035917
Per-token loss scaled by world size: 0.0006659817881882191
Per-token loss scaled by world size: 0.0003906514320988208
Per-token loss scaled by world size: 2.6746805815491825e-05
Per-token loss scaled by world size: 0.000365366053301841
Per-token loss scaled by world size: 6.110809408710338e-06
Per-token loss scaled by world size: 2.263839405713952e-06
Epoch: 1, Step: 183, Rank: 6, loss = 1.5981065034866333
Epoch: 1, Step: 183, Rank: 3, loss = 0.9374169707298279
Epoch: 1, Step: 183, Rank: 4, loss = 1.103947401046753
Epoch: 1, Step: 183, Rank: 2, loss = 0.06418230384588242
Epoch: 1, Step: 183, Rank: 7, loss = 0.8767415285110474
Epoch: 1, Step: 183, Rank: 0, loss = 0.005432365462183952
Epoch: 1, Step: 183, Rank: 1, loss = 0.014663650654256344
Per-token loss scaled by world size: 0.0007157879881560802
Epoch: 1, Step: 183, Rank: 5, loss = 1.7176227569580078
total tokens: 7851 num samples: 3 num padding tokens: 445 - rank: 1 max len: 2617 min len: 2389 avg len: 2468.6666666666665 num_loss_counted_tokens: 473
total tokens: 7480 num samples: 11 num padding tokens: 807 - rank: 4 max len: 680 min len: 536 avg len: 606.6363636363636 num_loss_counted_tokens: 4807
{
    "epoch": 1,
    "step": 183,
    "rank": 0,
    "loss": 0.005432365462183952,
    "overall_throughput": 41.57406220293697,
    "lr": 4.000000000000001e-06,
    "cuda_mem_allocated": 24.319289207458496,
    "cuda_malloc_retries": 0,
    "num_loss_counted_tokens": 19197,
    "batch_size": 71,
    "total_loss": 0.7897641658782959,
    "gradnorm": 0.8729047775268555,
    "weight_norm": 433.0433044433594,
    "timestamp": "2024-08-18T20:56:06.915463"
}
total tokens: 976 num samples: 8 num padding tokens: 185 - rank: 7 max len: 122 min len: 80 avg len: 98.875 num_loss_counted_tokens: 195
total tokens: 7280 num samples: 7 num padding tokens: 746 - rank: 3 max len: 1040 min len: 851 avg len: 933.4285714285714 num_loss_counted_tokens: 5111
total tokens: 6990 num samples: 6 num padding tokens: 218 - rank: 2 max len: 1165 min len: 1063 avg len: 1128.6666666666667 num_loss_counted_tokens: 3805
total tokens: 7917 num samples: 21 num padding tokens: 2826 - rank: 6 max len: 377 min len: 129 avg len: 242.42857142857142 num_loss_counted_tokens: 2506
total tokens: 7920 num samples: 15 num padding tokens: 1424 - rank: 5 max len: 528 min len: 378 avg len: 433.06666666666666 num_loss_counted_tokens: 4285
total tokens: 5748 num samples: 2 num padding tokens: 35 - rank: 0 max len: 2874 min len: 2839 avg len: 2856.5 num_loss_counted_tokens: 168
Per-token loss scaled by world size: 0.000344208674505353
Per-token loss scaled by world size: 0.00029513309709727764
Per-token loss scaled by world size: 0.0003547095402609557
Per-token loss scaled by world size: 0.0004679278936237097
Per-token loss scaled by world size: 4.7765744966454804e-05
Per-token loss scaled by world size: 0.0003639253554865718
Per-token loss scaled by world size: 4.720650849776575e-06
Epoch: 1, Step: 184, Rank: 6, loss = 0.9553658366203308
Epoch: 1, Step: 184, Rank: 1, loss = 0.12865106761455536
Epoch: 1, Step: 184, Rank: 2, loss = 0.9270830750465393
Epoch: 1, Step: 184, Rank: 4, loss = 1.2603052854537964
Epoch: 1, Step: 184, Rank: 0, loss = 0.012714482843875885
Epoch: 1, Step: 184, Rank: 7, loss = 0.7949041128158569
Epoch: 1, Step: 184, Rank: 5, loss = 0.9801874756813049
Per-token loss scaled by world size: 0.00017599221609998494
Epoch: 1, Step: 184, Rank: 3, loss = 0.4740130305290222
total tokens: 6996 num samples: 2 num padding tokens: 1478 - rank: 1 max len: 3498 min len: 2020 avg len: 2759.0 num_loss_counted_tokens: 177
total tokens: 7335 num samples: 9 num padding tokens: 438 - rank: 4 max len: 815 min len: 722 avg len: 766.3333333333334 num_loss_counted_tokens: 4746
total tokens: 7648 num samples: 16 num padding tokens: 1473 - rank: 6 max len: 478 min len: 288 avg len: 385.9375 num_loss_counted_tokens: 3125
total tokens: 7722 num samples: 11 num padding tokens: 1234 - rank: 5 max len: 702 min len: 483 avg len: 589.8181818181819 num_loss_counted_tokens: 4245
{
    "epoch": 1,
    "step": 184,
    "rank": 0,
    "loss": 0.012714482843875885,
    "overall_throughput": 41.53668738608439,
    "lr": 4.000000000000001e-06,
    "cuda_mem_allocated": 24.342710971832275,
    "cuda_malloc_retries": 0,
    "num_loss_counted_tokens": 21547,
    "batch_size": 88,
    "total_loss": 0.6916530132293701,
    "gradnorm": 0.8729047775268555,
    "weight_norm": 433.0433044433594,
    "timestamp": "2024-08-18T20:56:09.462656"
}
total tokens: 7588 num samples: 28 num padding tokens: 2405 - rank: 7 max len: 271 min len: 82 avg len: 185.10714285714286 num_loss_counted_tokens: 2309
total tokens: 7024 num samples: 4 num padding tokens: 1375 - rank: 2 max len: 1756 min len: 1193 avg len: 1412.25 num_loss_counted_tokens: 2078
total tokens: 7448 num samples: 7 num padding tokens: 379 - rank: 3 max len: 1064 min len: 961 avg len: 1009.8571428571429 num_loss_counted_tokens: 5273
total tokens: 4061 num samples: 1 num padding tokens: 0 - rank: 0 max len: 4061 min len: 4061 avg len: 4061.0 num_loss_counted_tokens: 393
Per-token loss scaled by world size: 0.00041325463098473847
Per-token loss scaled by world size: 0.00040759582770988345
Per-token loss scaled by world size: 0.0002913470088969916
Per-token loss scaled by world size: 5.2812706599070225e-06
Per-token loss scaled by world size: 9.605777449905872e-05
Per-token loss scaled by world size: 4.5470653276424855e-05
Per-token loss scaled by world size: 0.000289949937723577
Epoch: 1, Step: 185, Rank: 5, loss = 1.2077573537826538
Epoch: 1, Step: 185, Rank: 6, loss = 1.2245250940322876
Epoch: 1, Step: 185, Rank: 0, loss = 0.015649065375328064
Epoch: 1, Step: 185, Rank: 4, loss = 0.8632975816726685
Epoch: 1, Step: 185, Rank: 2, loss = 0.2846311926841736
Epoch: 1, Step: 185, Rank: 1, loss = 0.13473522663116455
Epoch: 1, Step: 185, Rank: 7, loss = 0.859157919883728
Per-token loss scaled by world size: 0.00039052340434864163
Epoch: 1, Step: 185, Rank: 3, loss = 1.1571696996688843
total tokens: 7909 num samples: 11 num padding tokens: 619 - rank: 4 max len: 719 min len: 567 avg len: 662.7272727272727 num_loss_counted_tokens: 3030
total tokens: 7710 num samples: 5 num padding tokens: 1395 - rank: 1 max len: 1542 min len: 1135 avg len: 1263.0 num_loss_counted_tokens: 2357
total tokens: 7533 num samples: 9 num padding tokens: 442 - rank: 3 max len: 837 min len: 720 avg len: 787.8888888888889 num_loss_counted_tokens: 4365
{
    "epoch": 1,
    "step": 185,
    "rank": 0,
    "loss": 0.015649065375328064,
    "overall_throughput": 42.01932932464499,
    "lr": 4.000000000000001e-06,
    "cuda_mem_allocated": 24.411304473876953,
    "cuda_malloc_retries": 0,
    "num_loss_counted_tokens": 23705,
    "batch_size": 85,
    "total_loss": 0.7183653712272644,
    "gradnorm": 0.8729047775268555,
    "weight_norm": 433.0433044433594,
    "timestamp": "2024-08-18T20:56:11.984726"
}
total tokens: 8030 num samples: 22 num padding tokens: 1301 - rank: 6 max len: 365 min len: 237 avg len: 305.8636363636364 num_loss_counted_tokens: 3940
total tokens: 7868 num samples: 14 num padding tokens: 1280 - rank: 5 max len: 562 min len: 411 avg len: 470.57142857142856 num_loss_counted_tokens: 4766
total tokens: 6728 num samples: 29 num padding tokens: 2588 - rank: 7 max len: 232 min len: 75 avg len: 142.75862068965517 num_loss_counted_tokens: 1514
total tokens: 6430 num samples: 2 num padding tokens: 1586 - rank: 0 max len: 3215 min len: 1629 avg len: 2422.0 num_loss_counted_tokens: 1620
total tokens: 7819 num samples: 7 num padding tokens: 799 - rank: 2 max len: 1117 min len: 938 avg len: 1002.8571428571429 num_loss_counted_tokens: 3592
Per-token loss scaled by world size: 0.00034539305488578975
Per-token loss scaled by world size: 0.0004060663341078907
Per-token loss scaled by world size: 0.0002557812840677798
Per-token loss scaled by world size: 0.00037697027437388897
Per-token loss scaled by world size: 0.00021574345009867102
Per-token loss scaled by world size: 1.922340288729174e-06
Per-token loss scaled by world size: 1.7185264368890785e-05
Epoch: 1, Step: 186, Rank: 6, loss = 1.187833309173584
Epoch: 1, Step: 186, Rank: 2, loss = 0.8059667944908142
Epoch: 1, Step: 186, Rank: 0, loss = 0.006057294551283121
Epoch: 1, Step: 186, Rank: 3, loss = 1.0883334875106812
Epoch: 1, Step: 186, Rank: 4, loss = 1.279515027999878
Epoch: 1, Step: 186, Rank: 7, loss = 0.6798076033592224
Epoch: 1, Step: 186, Rank: 1, loss = 0.054150767624378204
Per-token loss scaled by world size: 0.00038644636515527964
Epoch: 1, Step: 186, Rank: 5, loss = 1.217692494392395
total tokens: 8019 num samples: 9 num padding tokens: 612 - rank: 4 max len: 891 min len: 729 avg len: 823.0 num_loss_counted_tokens: 5812
total tokens: 7845 num samples: 3 num padding tokens: 1345 - rank: 1 max len: 2615 min len: 1815 avg len: 2166.6666666666665 num_loss_counted_tokens: 455
{
    "epoch": 1,
    "step": 186,
    "rank": 0,
    "loss": 0.006057294551283121,
    "overall_throughput": 41.92610024295953,
    "lr": 4.000000000000001e-06,
    "cuda_mem_allocated": 24.250525951385498,
    "cuda_malloc_retries": 0,
    "num_loss_counted_tokens": 25208,
    "batch_size": 86,
    "total_loss": 0.7899196147918701,
    "gradnorm": 0.8729047775268555,
    "weight_norm": 433.0433044433594,
    "timestamp": "2024-08-18T20:56:14.513363"
}
total tokens: 8030 num samples: 5 num padding tokens: 2103 - rank: 3 max len: 1606 min len: 905 avg len: 1185.4 num_loss_counted_tokens: 1115
total tokens: 7990 num samples: 17 num padding tokens: 1874 - rank: 6 max len: 470 min len: 282 avg len: 359.7647058823529 num_loss_counted_tokens: 3470
total tokens: 7040 num samples: 4 num padding tokens: 176 - rank: 2 max len: 1760 min len: 1656 avg len: 1716.0 num_loss_counted_tokens: 1048
total tokens: 8091 num samples: 29 num padding tokens: 2852 - rank: 7 max len: 279 min len: 85 avg len: 180.6551724137931 num_loss_counted_tokens: 2267
total tokens: 6206 num samples: 2 num padding tokens: 163 - rank: 0 max len: 3103 min len: 2940 avg len: 3021.5 num_loss_counted_tokens: 193
total tokens: 7722 num samples: 11 num padding tokens: 1158 - rank: 5 max len: 702 min len: 471 avg len: 596.7272727272727 num_loss_counted_tokens: 4117
Per-token loss scaled by world size: 0.0002549219934735447
Per-token loss scaled by world size: 9.078537550522014e-05
Per-token loss scaled by world size: 0.0001285703619942069
Per-token loss scaled by world size: 0.00026954273926094174
Per-token loss scaled by world size: 0.00027302553644403815
Per-token loss scaled by world size: 0.00012735271593555808
Per-token loss scaled by world size: 0.00019962496298830956
Epoch: 1, Step: 187, Rank: 3, loss = 0.3104405999183655
Epoch: 1, Step: 187, Rank: 2, loss = 0.8717057108879089
Epoch: 1, Step: 187, Rank: 6, loss = 0.9217013716697693
Epoch: 1, Step: 187, Rank: 1, loss = 0.4396463632583618
Epoch: 1, Step: 187, Rank: 4, loss = 0.9336108565330505
Epoch: 1, Step: 187, Rank: 0, loss = 0.43548262119293213
Per-token loss scaled by world size: 0.00028984216623939574
Epoch: 1, Step: 187, Rank: 7, loss = 0.6826175451278687
Epoch: 1, Step: 187, Rank: 5, loss = 0.9911152720451355
total tokens: 6432 num samples: 3 num padding tokens: 1172 - rank: 1 max len: 2144 min len: 1513 avg len: 1753.3333333333333 num_loss_counted_tokens: 778
total tokens: 7992 num samples: 8 num padding tokens: 895 - rank: 4 max len: 999 min len: 761 avg len: 887.125 num_loss_counted_tokens: 3688
total tokens: 7580 num samples: 10 num padding tokens: 664 - rank: 5 max len: 758 min len: 591 avg len: 691.6 num_loss_counted_tokens: 5001
{
    "epoch": 1,
    "step": 187,
    "rank": 0,
    "loss": 0.43548262119293213,
    "overall_throughput": 41.80966716175401,
    "lr": 4.000000000000001e-06,
    "cuda_mem_allocated": 24.324402809143066,
    "cuda_malloc_retries": 0,
    "num_loss_counted_tokens": 27356,
    "batch_size": 97,
    "total_loss": 0.6982901096343994,
    "gradnorm": 0.8729047775268555,
    "weight_norm": 433.0433044433594,
    "timestamp": "2024-08-18T20:56:17.044420"
}
total tokens: 7756 num samples: 14 num padding tokens: 2347 - rank: 6 max len: 554 min len: 267 avg len: 386.35714285714283 num_loss_counted_tokens: 3784
total tokens: 5145 num samples: 21 num padding tokens: 1727 - rank: 7 max len: 245 min len: 84 avg len: 162.76190476190476 num_loss_counted_tokens: 1417
total tokens: 7055 num samples: 5 num padding tokens: 476 - rank: 2 max len: 1411 min len: 1211 avg len: 1315.8 num_loss_counted_tokens: 2646
total tokens: 7146 num samples: 6 num padding tokens: 724 - rank: 3 max len: 1191 min len: 1017 avg len: 1070.3333333333333 num_loss_counted_tokens: 3869
total tokens: 7376 num samples: 2 num padding tokens: 819 - rank: 0 max len: 3688 min len: 2869 avg len: 3278.5 num_loss_counted_tokens: 160
Per-token loss scaled by world size: 0.00011057691881433129
Per-token loss scaled by world size: 0.00035715868580155075
Per-token loss scaled by world size: 0.0003879719879478216
Per-token loss scaled by world size: 0.00021651088900398463
Per-token loss scaled by world size: 6.0199621657375246e-05
Per-token loss scaled by world size: 5.681112452293746e-05
Per-token loss scaled by world size: 0.0005215432029217482
Epoch: 1, Step: 188, Rank: 6, loss = 1.0699580907821655
Epoch: 1, Step: 188, Rank: 4, loss = 1.1622670888900757
Epoch: 1, Step: 188, Rank: 2, loss = 0.18034301698207855
Epoch: 1, Step: 188, Rank: 1, loss = 0.3312608003616333
Epoch: 1, Step: 188, Rank: 7, loss = 0.6486124992370605
Epoch: 1, Step: 188, Rank: 0, loss = 0.1701919287443161
Epoch: 1, Step: 188, Rank: 5, loss = 1.562412977218628
Per-token loss scaled by world size: 0.0001381830807076767
Epoch: 1, Step: 188, Rank: 3, loss = 0.4139619469642639
total tokens: 5862 num samples: 2 num padding tokens: 913 - rank: 1 max len: 2931 min len: 2018 avg len: 2474.5 num_loss_counted_tokens: 226
total tokens: 7819 num samples: 7 num padding tokens: 1165 - rank: 4 max len: 1117 min len: 742 avg len: 950.5714285714286 num_loss_counted_tokens: 4359
{
    "epoch": 1,
    "step": 188,
    "rank": 0,
    "loss": 0.1701919287443161,
    "overall_throughput": 40.95455841532473,
    "lr": 4.000000000000001e-06,
    "cuda_mem_allocated": 24.21819305419922,
    "cuda_malloc_retries": 0,
    "num_loss_counted_tokens": 23966,
    "batch_size": 84,
    "total_loss": 0.6923760771751404,
    "gradnorm": 0.8729047775268555,
    "weight_norm": 433.0433044433594,
    "timestamp": "2024-08-18T20:56:19.630875"
}
total tokens: 8085 num samples: 11 num padding tokens: 1172 - rank: 5 max len: 735 min len: 538 avg len: 628.4545454545455 num_loss_counted_tokens: 4424
total tokens: 7680 num samples: 4 num padding tokens: 1258 - rank: 2 max len: 1920 min len: 1352 avg len: 1605.5 num_loss_counted_tokens: 1060
total tokens: 7875 num samples: 15 num padding tokens: 1673 - rank: 6 max len: 525 min len: 296 avg len: 413.46666666666664 num_loss_counted_tokens: 4419
total tokens: 6734 num samples: 2 num padding tokens: 36 - rank: 0 max len: 3367 min len: 3331 avg len: 3349.0 num_loss_counted_tokens: 164
total tokens: 7920 num samples: 6 num padding tokens: 486 - rank: 3 max len: 1320 min len: 1118 avg len: 1239.0 num_loss_counted_tokens: 1697
total tokens: 8064 num samples: 28 num padding tokens: 2898 - rank: 7 max len: 288 min len: 79 avg len: 184.5 num_loss_counted_tokens: 2302
Per-token loss scaled by world size: 0.0002291280252393335
Per-token loss scaled by world size: 0.00042932084761559963
Per-token loss scaled by world size: 0.0003715125494636595
Per-token loss scaled by world size: 0.0003486127534415573
Per-token loss scaled by world size: 0.00029553903732448816
Per-token loss scaled by world size: 2.541187996030203e-06
Per-token loss scaled by world size: 4.7232209908543155e-05
Epoch: 1, Step: 189, Rank: 5, loss = 1.2460501194000244
Epoch: 1, Step: 189, Rank: 6, loss = 1.0782687664031982
Epoch: 1, Step: 189, Rank: 0, loss = 0.007375480607151985
Epoch: 1, Step: 189, Rank: 2, loss = 0.665015459060669
Epoch: 1, Step: 189, Rank: 7, loss = 0.8577651381492615
Epoch: 1, Step: 189, Rank: 4, loss = 1.0118049383163452
Epoch: 1, Step: 189, Rank: 1, loss = 0.13708558678627014
Per-token loss scaled by world size: 0.00045083268196322024
Epoch: 1, Step: 189, Rank: 3, loss = 1.308485507965088
total tokens: 8012 num samples: 4 num padding tokens: 1085 - rank: 1 max len: 2003 min len: 1547 avg len: 1731.75 num_loss_counted_tokens: 501
total tokens: 7704 num samples: 9 num padding tokens: 539 - rank: 4 max len: 856 min len: 750 avg len: 796.1111111111111 num_loss_counted_tokens: 5817
{
    "epoch": 1,
    "step": 189,
    "rank": 0,
    "loss": 0.007375480607151985,
    "overall_throughput": 41.729872575916524,
    "lr": 4.000000000000001e-06,
    "cuda_mem_allocated": 24.477022171020508,
    "cuda_malloc_retries": 0,
    "num_loss_counted_tokens": 23219,
    "batch_size": 98,
    "total_loss": 0.7889814376831055,
    "gradnorm": 0.8729047775268555,
    "weight_norm": 433.0433044433594,
    "timestamp": "2024-08-18T20:56:22.162410"
}
total tokens: 6978 num samples: 6 num padding tokens: 878 - rank: 3 max len: 1163 min len: 953 avg len: 1016.6666666666666 num_loss_counted_tokens: 3406
total tokens: 5496 num samples: 24 num padding tokens: 2060 - rank: 7 max len: 229 min len: 84 avg len: 143.16666666666666 num_loss_counted_tokens: 1259
total tokens: 8074 num samples: 11 num padding tokens: 921 - rank: 5 max len: 734 min len: 585 avg len: 650.2727272727273 num_loss_counted_tokens: 3399
total tokens: 8100 num samples: 15 num padding tokens: 2428 - rank: 6 max len: 540 min len: 251 avg len: 378.1333333333333 num_loss_counted_tokens: 2866
total tokens: 7430 num samples: 5 num padding tokens: 440 - rank: 2 max len: 1486 min len: 1279 avg len: 1398.0 num_loss_counted_tokens: 2676
total tokens: 7548 num samples: 2 num padding tokens: 467 - rank: 0 max len: 3774 min len: 3307 avg len: 3540.5 num_loss_counted_tokens: 178
Per-token loss scaled by world size: 0.00038093223702162504
Per-token loss scaled by world size: 0.0003549058164935559
Per-token loss scaled by world size: 0.0002593057288322598
Per-token loss scaled by world size: 0.0002407356078037992
Per-token loss scaled by world size: 0.0001624725991860032
Per-token loss scaled by world size: 9.274062176700681e-05
Per-token loss scaled by world size: 3.2670384825905785e-05
Epoch: 1, Step: 190, Rank: 6, loss = 1.1850801706314087
Epoch: 1, Step: 190, Rank: 7, loss = 0.8067001104354858
Epoch: 1, Step: 190, Rank: 4, loss = 1.1041120290756226
Epoch: 1, Step: 190, Rank: 3, loss = 0.7489284873008728
Epoch: 1, Step: 190, Rank: 0, loss = 0.10163756459951401
Epoch: 1, Step: 190, Rank: 1, loss = 0.2885160744190216
Epoch: 1, Step: 190, Rank: 2, loss = 0.5054522752761841
Per-token loss scaled by world size: 0.00030219164909794927
Epoch: 1, Step: 190, Rank: 5, loss = 0.9401181936264038
total tokens: 6304 num samples: 2 num padding tokens: 712 - rank: 1 max len: 3152 min len: 2440 avg len: 2796.0 num_loss_counted_tokens: 243
total tokens: 7343 num samples: 7 num padding tokens: 1218 - rank: 4 max len: 1049 min len: 690 avg len: 875.0 num_loss_counted_tokens: 4702
{
    "epoch": 1,
    "step": 190,
    "rank": 0,
    "loss": 0.10163756459951401,
    "overall_throughput": 41.867834901558076,
    "lr": 4.000000000000001e-06,
    "cuda_mem_allocated": 24.32686471939087,
    "cuda_malloc_retries": 0,
    "num_loss_counted_tokens": 24888,
    "batch_size": 83,
    "total_loss": 0.7100681662559509,
    "gradnorm": 0.8729047775268555,
    "weight_norm": 433.0433044433594,
    "timestamp": "2024-08-18T20:56:24.689299"
}
total tokens: 8112 num samples: 12 num padding tokens: 910 - rank: 5 max len: 676 min len: 517 avg len: 600.1666666666666 num_loss_counted_tokens: 4936 | |
total tokens: 4446 num samples: 19 num padding tokens: 1491 - rank: 7 max len: 234 min len: 76 avg len: 155.52631578947367 num_loss_counted_tokens: 1146
total tokens: 7266 num samples: 2 num padding tokens: 342 - rank: 0 max len: 3633 min len: 3291 avg len: 3462.0 num_loss_counted_tokens: 203
total tokens: 7984 num samples: 16 num padding tokens: 1646 - rank: 6 max len: 499 min len: 253 avg len: 396.125 num_loss_counted_tokens: 3681
total tokens: 7490 num samples: 5 num padding tokens: 639 - rank: 2 max len: 1498 min len: 1299 avg len: 1370.2 num_loss_counted_tokens: 3496
total tokens: 7590 num samples: 6 num padding tokens: 448 - rank: 3 max len: 1265 min len: 1117 avg len: 1190.3333333333333 num_loss_counted_tokens: 2752
Per-token loss scaled by world size: 0.0003764858120121062
Per-token loss scaled by world size: 0.0006537719164043665
Per-token loss scaled by world size: 0.00021446413302328438
Per-token loss scaled by world size: 9.457199485041201e-05
Per-token loss scaled by world size: 0.00034635854535736144
Per-token loss scaled by world size: 1.4150586139294319e-05
Per-token loss scaled by world size: 7.414004357997328e-05
Epoch: 1, Step: 191, Rank: 5, loss = 1.6465245485305786
Epoch: 1, Step: 191, Rank: 2, loss = 0.23817956447601318
Epoch: 1, Step: 191, Rank: 3, loss = 0.8723039627075195
Epoch: 1, Step: 191, Rank: 7, loss = 0.9481794834136963
Epoch: 1, Step: 191, Rank: 4, loss = 0.5401279330253601
Epoch: 1, Step: 191, Rank: 0, loss = 0.03563825041055679
Epoch: 1, Step: 191, Rank: 1, loss = 0.18672169744968414
Per-token loss scaled by world size: 0.0005529047921299934
Epoch: 1, Step: 191, Rank: 6, loss = 1.3924907445907593
total tokens: 7998 num samples: 3 num padding tokens: 968 - rank: 1 max len: 2666 min len: 1768 avg len: 2343.3333333333335 num_loss_counted_tokens: 631
total tokens: 7758 num samples: 9 num padding tokens: 1421 - rank: 4 max len: 862 min len: 603 avg len: 704.1111111111111 num_loss_counted_tokens: 3783
{
"epoch": 1,
"step": 191,
"rank": 0,
"loss": 0.03563825041055679,
"overall_throughput": 41.91096199447346,
"lr": 4.000000000000001e-06,
"cuda_mem_allocated": 24.354421615600586,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 20148,
"batch_size": 73,
"total_loss": 0.7325208187103271,
"gradnorm": 0.8729047775268555,
"weight_norm": 433.0433044433594,
"timestamp": "2024-08-18T20:56:27.214193"
}
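The per-rank batch summary lines can be cross-checked arithmetically. Judging only from the logged numbers (not from the instructlab training source), each rank appears to pad every sample in its micro-batch to that batch's `max len`, so `total tokens` = `num samples` × `max len` and `num padding tokens` = `total tokens` − `num samples` × `avg len`. For rank 7 at step 191: 19 × 234 = 4446 and 4446 − 19 × 155.526… = 1491. The sketch below is a hypothetical checker (`parse_batch_line`, `check_padding` and `BatchStats` are illustrative names, not part of instructlab) that assumes the line format stays exactly as printed here.

```python
import re
from dataclasses import dataclass

# Matches the per-rank "total tokens: ..." summary lines in this log.
# The field layout is copied from the log; other versions may print it differently.
BATCH_RE = re.compile(
    r"total tokens: (?P<total>\d+) num samples: (?P<samples>\d+) "
    r"num padding tokens: (?P<padding>\d+) - rank: (?P<rank>\d+) "
    r"max len: (?P<max_len>\d+) min len: (?P<min_len>\d+) "
    r"avg len: (?P<avg_len>[\d.]+) num_loss_counted_tokens: (?P<loss_tokens>\d+)"
)


@dataclass
class BatchStats:
    total: int
    samples: int
    padding: int
    rank: int
    max_len: int
    avg_len: float


def parse_batch_line(line: str) -> BatchStats:
    """Parse one batch summary line into its numeric fields (hypothetical helper)."""
    m = BATCH_RE.search(line)
    if m is None:
        raise ValueError(f"not a batch summary line: {line!r}")
    return BatchStats(
        total=int(m["total"]),
        samples=int(m["samples"]),
        padding=int(m["padding"]),
        rank=int(m["rank"]),
        max_len=int(m["max_len"]),
        avg_len=float(m["avg_len"]),
    )


def check_padding(stats: BatchStats) -> bool:
    """Check the pad-to-longest relation inferred from the logged numbers."""
    real_tokens = round(stats.samples * stats.avg_len)  # sum of unpadded lengths
    return (
        stats.total == stats.samples * stats.max_len
        and stats.padding == stats.total - real_tokens
    )


if __name__ == "__main__":
    line = ("total tokens: 4446 num samples: 19 num padding tokens: 1491 - rank: 7 "
            "max len: 234 min len: 76 avg len: 155.52631578947367 "
            "num_loss_counted_tokens: 1146")
    s = parse_batch_line(line)
    print(s.rank, check_padding(s), f"padding fraction = {s.padding / s.total:.3f}")
```

The padding fraction is a quick way to see how much compute each rank spends on pad tokens; in this window it is highest on rank 7, which receives many short samples.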
total tokens: 7946 num samples: 29 num padding tokens: 2882 - rank: 7 max len: 274 min len: 81 avg len: 174.6206896551724 num_loss_counted_tokens: 2096
total tokens: 8094 num samples: 19 num padding tokens: 1335 - rank: 6 max len: 426 min len: 280 avg len: 355.7368421052632 num_loss_counted_tokens: 4052
total tokens: 8118 num samples: 6 num padding tokens: 1664 - rank: 3 max len: 1353 min len: 917 avg len: 1075.6666666666667 num_loss_counted_tokens: 3173
total tokens: 7826 num samples: 13 num padding tokens: 1193 - rank: 5 max len: 602 min len: 428 avg len: 510.2307692307692 num_loss_counted_tokens: 2941
total tokens: 7920 num samples: 5 num padding tokens: 502 - rank: 2 max len: 1584 min len: 1373 avg len: 1483.6 num_loss_counted_tokens: 2395
total tokens: 7424 num samples: 2 num padding tokens: 728 - rank: 0 max len: 3712 min len: 2984 avg len: 3348.0 num_loss_counted_tokens: 259
Per-token loss scaled by world size: 0.00023188829072751105
Per-token loss scaled by world size: 0.00031627173302695155
Per-token loss scaled by world size: 0.0003686068521346897
Per-token loss scaled by world size: 0.0001379517198074609
Per-token loss scaled by world size: 0.00017684763588476926
Per-token loss scaled by world size: 3.2488858323631575e-06
Per-token loss scaled by world size: 3.9887581806397066e-05
Epoch: 1, Step: 192, Rank: 6, loss = 1.147979974746704
Epoch: 1, Step: 192, Rank: 4, loss = 0.42963337898254395
Epoch: 1, Step: 192, Rank: 0, loss = 0.010118248872458935
Epoch: 1, Step: 192, Rank: 2, loss = 0.9849887490272522
Epoch: 1, Step: 192, Rank: 3, loss = 0.7221871018409729
Epoch: 1, Step: 192, Rank: 7, loss = 0.5507698655128479
Epoch: 1, Step: 192, Rank: 1, loss = 0.12422488629817963
Per-token loss scaled by world size: 0.00024023951846174896
Epoch: 1, Step: 192, Rank: 5, loss = 0.7481959462165833
total tokens: 6840 num samples: 3 num padding tokens: 728 - rank: 1 max len: 2280 min len: 1767 avg len: 2037.3333333333333 num_loss_counted_tokens: 324
total tokens: 8030 num samples: 10 num padding tokens: 837 - rank: 4 max len: 803 min len: 661 avg len: 719.3 num_loss_counted_tokens: 4290
{
"epoch": 1,
"step": 192,
"rank": 0,
"loss": 0.010118248872458935,
"overall_throughput": 42.505572644428966,
"lr": 4.000000000000001e-06,
"cuda_mem_allocated": 24.430901527404785,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 24915,
"batch_size": 77,
"total_loss": 0.589762270450592,
"gradnorm": 0.8729047775268555,
"weight_norm": 433.0433044433594,
"timestamp": "2024-08-18T20:56:29.701973"
}
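The step-192 records appear internally consistent in two ways, inferred only from the logged numbers: the rank-0 JSON `total_loss` equals the mean of the eight per-rank `loss` values, and each per-rank `loss` equals one of the `Per-token loss scaled by world size` values multiplied by the step's global `num_loss_counted_tokens` (24915) and divided by the world size (8, from `--gpus 8`). The log does not label which scaled value belongs to which rank, so the sketch below compares the two collections after sorting (the mapping is a positive scale, so order is preserved). The function names are hypothetical and this is not the instructlab implementation.

```python
import math

WORLD_SIZE = 8  # from "--gpus 8" in the command that produced this log

# Values copied from the step-192 lines of this log.
step_192_losses = {
    6: 1.147979974746704, 4: 0.42963337898254395, 0: 0.010118248872458935,
    2: 0.9849887490272522, 3: 0.7221871018409729, 7: 0.5507698655128479,
    1: 0.12422488629817963, 5: 0.7481959462165833,
}
step_192_scaled = [
    0.00023188829072751105, 0.00031627173302695155, 0.0003686068521346897,
    0.0001379517198074609, 0.00017684763588476926, 3.2488858323631575e-06,
    3.9887581806397066e-05, 0.00024023951846174896,
]
STEP_192_LOSS_TOKENS = 24915  # "num_loss_counted_tokens" in the JSON record
STEP_192_TOTAL_LOSS = 0.589762270450592  # "total_loss" in the JSON record


def mean_rank_loss(losses: dict[int, float]) -> float:
    """Average of the per-rank losses (appears to match total_loss)."""
    return sum(losses.values()) / len(losses)


def losses_from_scaled(scaled: list[float], loss_tokens: int, world_size: int) -> list[float]:
    """Undo the apparent scaling: value * global loss-counted tokens / world size."""
    return [v * loss_tokens / world_size for v in scaled]


def matches(a: list[float], b: list[float], rel_tol: float = 1e-5) -> bool:
    """Compare two unlabelled collections of floats after sorting them."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(sorted(a), sorted(b))
    )


if __name__ == "__main__":
    print("mean of rank losses:", mean_rank_loss(step_192_losses))
    rebuilt = losses_from_scaled(step_192_scaled, STEP_192_LOSS_TOKENS, WORLD_SIZE)
    print("scaled values reproduce losses:",
          matches(rebuilt, list(step_192_losses.values())))
```

A relative tolerance is used because the logged values look like float32 numbers printed at full double precision.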
total tokens: 7060 num samples: 5 num padding tokens: 1273 - rank: 2 max len: 1412 min len: 992 avg len: 1157.4 num_loss_counted_tokens: 3150
total tokens: 7864 num samples: 8 num padding tokens: 738 - rank: 3 max len: 983 min len: 809 avg len: 890.75 num_loss_counted_tokens: 4284
total tokens: 7627 num samples: 29 num padding tokens: 3124 - rank: 7 max len: 263 min len: 85 avg len: 155.27586206896552 num_loss_counted_tokens: 1771
total tokens: 7680 num samples: 12 num padding tokens: 957 - rank: 5 max len: 640 min len: 482 avg len: 560.25 num_loss_counted_tokens: 4379
total tokens: 7837 num samples: 17 num padding tokens: 1825 - rank: 6 max len: 461 min len: 273 avg len: 353.6470588235294 num_loss_counted_tokens: 2956
total tokens: 7128 num samples: 2 num padding tokens: 1238 - rank: 0 max len: 3564 min len: 2326 avg len: 2945.0 num_loss_counted_tokens: 463
Per-token loss scaled by world size: 0.00031890295213088393
Per-token loss scaled by world size: 0.0001828969834605232
Per-token loss scaled by world size: 0.0001293038367293775
Per-token loss scaled by world size: 0.00029604701558128
Per-token loss scaled by world size: 0.00024164760543499142
Per-token loss scaled by world size: 0.00021057862613815814
Per-token loss scaled by world size: 3.072937033721246e-05
Epoch: 1, Step: 193, Rank: 2, loss = 0.4273168742656708
Epoch: 1, Step: 193, Rank: 6, loss = 0.9783613681793213
Epoch: 1, Step: 193, Rank: 1, loss = 0.6044288277626038
Epoch: 1, Step: 193, Rank: 5, loss = 1.0538945198059082
Epoch: 1, Step: 193, Rank: 7, loss = 0.7985849380493164
Epoch: 1, Step: 193, Rank: 0, loss = 0.10155288130044937
Epoch: 1, Step: 193, Rank: 4, loss = 0.6959097385406494
Per-token loss scaled by world size: 0.00022084206284489483
Epoch: 1, Step: 193, Rank: 3, loss = 0.7298278212547302
total tokens: 6780 num samples: 5 num padding tokens: 682 - rank: 4 max len: 1356 min len: 1000 avg len: 1219.6 num_loss_counted_tokens: 3729
total tokens: 7341 num samples: 3 num padding tokens: 810 - rank: 1 max len: 2447 min len: 1943 avg len: 2177.0 num_loss_counted_tokens: 1272
{
"epoch": 1,
"step": 193,
"rank": 0,
"loss": 0.10155288130044937,
"overall_throughput": 41.24658740563778,
"lr": 4.000000000000001e-06,
"cuda_mem_allocated": 24.264228343963623,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 26438,
"batch_size": 78,
"total_loss": 0.6737346053123474,
"gradnorm": 0.8729047775268555,
"weight_norm": 433.0433044433594,
"timestamp": "2024-08-18T20:56:32.267001"
}
total tokens: 7935 num samples: 15 num padding tokens: 1362 - rank: 6 max len: 529 min len: 333 avg len: 438.2 num_loss_counted_tokens: 4416
total tokens: 7840 num samples: 8 num padding tokens: 1773 - rank: 5 max len: 980 min len: 597 avg len: 758.375 num_loss_counted_tokens: 3956
total tokens: 5744 num samples: 2 num padding tokens: 30 - rank: 0 max len: 2872 min len: 2842 avg len: 2857.0 num_loss_counted_tokens: 152
total tokens: 7596 num samples: 4 num padding tokens: 617 - rank: 2 max len: 1899 min len: 1609 avg len: 1744.75 num_loss_counted_tokens: 1330
total tokens: 7644 num samples: 26 num padding tokens: 3101 - rank: 7 max len: 294 min len: 79 avg len: 174.73076923076923 num_loss_counted_tokens: 1861
total tokens: 7405 num samples: 5 num padding tokens: 157 - rank: 3 max len: 1481 min len: 1402 avg len: 1449.6 num_loss_counted_tokens: 2757
Per-token loss scaled by world size: 0.00030297457124106586
Per-token loss scaled by world size: 0.000216660147998482
Per-token loss scaled by world size: 0.00037809842615388334
Per-token loss scaled by world size: 0.00046558064059354365
Per-token loss scaled by world size: 0.00023001583758741617
Per-token loss scaled by world size: 6.362871499732137e-05
Per-token loss scaled by world size: 1.6167678040801547e-05
Epoch: 1, Step: 194, Rank: 5, loss = 1.3895835876464844
Epoch: 1, Step: 194, Rank: 6, loss = 0.9042654633522034
Epoch: 1, Step: 194, Rank: 4, loss = 1.1284819841384888
Epoch: 1, Step: 194, Rank: 1, loss = 0.18990786373615265
Epoch: 1, Step: 194, Rank: 7, loss = 0.6466493010520935
Epoch: 1, Step: 194, Rank: 3, loss = 0.6865110397338867
Epoch: 1, Step: 194, Rank: 0, loss = 0.048254456371068954
Per-token loss scaled by world size: 0.0002928127069026232
Epoch: 1, Step: 194, Rank: 2, loss = 0.873936116695404
total tokens: 7677 num samples: 3 num padding tokens: 483 - rank: 1 max len: 2559 min len: 2205 avg len: 2398.0 num_loss_counted_tokens: 1070
total tokens: 7784 num samples: 8 num padding tokens: 1598 - rank: 4 max len: 973 min len: 692 avg len: 773.25 num_loss_counted_tokens: 3677
{
"epoch": 1,
"step": 194,
"rank": 0,
"loss": 0.048254456371068954,
"overall_throughput": 41.96848339305704,
"lr": 4.000000000000001e-06,
"cuda_mem_allocated": 24.232909202575684,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 23877,
"batch_size": 72,
"total_loss": 0.7334486842155457,
"gradnorm": 0.8729047775268555,
"weight_norm": 433.0433044433594,
"timestamp": "2024-08-18T20:56:34.786179"
}
total tokens: 6501 num samples: 3 num padding tokens: 1290 - rank: 2 max len: 2167 min len: 1410 avg len: 1737.0 num_loss_counted_tokens: 241
total tokens: 7221 num samples: 29 num padding tokens: 2731 - rank: 7 max len: 249 min len: 78 avg len: 154.82758620689654 num_loss_counted_tokens: 1846
total tokens: 8080 num samples: 20 num padding tokens: 1685 - rank: 6 max len: 404 min len: 251 avg len: 319.75 num_loss_counted_tokens: 3499
total tokens: 6885 num samples: 5 num padding tokens: 603 - rank: 3 max len: 1377 min len: 1090 avg len: 1256.4 num_loss_counted_tokens: 2165
total tokens: 7980 num samples: 12 num padding tokens: 1568 - rank: 5 max len: 665 min len: 429 avg len: 534.3333333333334 num_loss_counted_tokens: 4731
total tokens: 6814 num samples: 2 num padding tokens: 843 - rank: 0 max len: 3407 min len: 2564 avg len: 2985.5 num_loss_counted_tokens: 553
Per-token loss scaled by world size: 0.00011609335342654958
Per-token loss scaled by world size: 0.00037096577580086887
Per-token loss scaled by world size: 0.000276279344689101
Per-token loss scaled by world size: 0.0002262169582536444
Per-token loss scaled by world size: 0.0004427096282597631
Per-token loss scaled by world size: 0.00033836739021353424
Per-token loss scaled by world size: 1.8575705325929448e-05
Epoch: 1, Step: 195, Rank: 4, loss = 0.9719303250312805
Epoch: 1, Step: 195, Rank: 7, loss = 0.7238518595695496
Epoch: 1, Step: 195, Rank: 2, loss = 0.3041645884513855
Epoch: 1, Step: 195, Rank: 0, loss = 0.04866834729909897
Epoch: 1, Step: 195, Rank: 1, loss = 0.5926884412765503
Epoch: 1, Step: 195, Rank: 5, loss = 0.8865225911140442
Epoch: 1, Step: 195, Rank: 6, loss = 1.1598992347717285
Per-token loss scaled by world size: 0.00017340479826088995
Epoch: 1, Step: 195, Rank: 3, loss = 0.4543205797672272
total tokens: 6090 num samples: 3 num padding tokens: 726 - rank: 1 max len: 2030 min len: 1643 avg len: 1788.0 num_loss_counted_tokens: 1593
total tokens: 7690 num samples: 10 num padding tokens: 460 - rank: 4 max len: 769 min len: 680 avg len: 723.0 num_loss_counted_tokens: 4881
{
"epoch": 1,
"step": 195,
"rank": 0,
"loss": 0.04866834729909897,
"overall_throughput": 41.3547409747129,
"lr": 4.000000000000001e-06,
"cuda_mem_allocated": 24.354421615600586,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 20960,
"batch_size": 83,
"total_loss": 0.6427558064460754,
"gradnorm": 0.8729047775268555,
"weight_norm": 433.0433044433594,
"timestamp": "2024-08-18T20:56:37.382964"
}
total tokens: 7967 num samples: 31 num padding tokens: 2761 - rank: 7 max len: 257 min len: 83 avg len: 167.93548387096774 num_loss_counted_tokens: 2276
total tokens: 6960 num samples: 5 num padding tokens: 813 - rank: 2 max len: 1392 min len: 1089 avg len: 1229.4 num_loss_counted_tokens: 1530
total tokens: 7856 num samples: 16 num padding tokens: 1984 - rank: 6 max len: 491 min len: 260 avg len: 367.0 num_loss_counted_tokens: 3780
total tokens: 7602 num samples: 7 num padding tokens: 1100 - rank: 3 max len: 1086 min len: 795 avg len: 928.8571428571429 num_loss_counted_tokens: 5015
total tokens: 7469 num samples: 11 num padding tokens: 1062 - rank: 5 max len: 679 min len: 515 avg len: 582.4545454545455 num_loss_counted_tokens: 4102
total tokens: 6152 num samples: 2 num padding tokens: 117 - rank: 0 max len: 3076 min len: 2959 avg len: 3017.5 num_loss_counted_tokens: 200
Per-token loss scaled by world size: 0.00046574202133342624
Per-token loss scaled by world size: 2.0194725038891193e-06
Per-token loss scaled by world size: 0.0005153888487257063
Per-token loss scaled by world size: 0.0002995604299940169
Per-token loss scaled by world size: 0.0004275553219486028
Per-token loss scaled by world size: 3.1239229429047555e-05
Per-token loss scaled by world size: 3.525143984006718e-05
Epoch: 1, Step: 196, Rank: 6, loss = 1.3931604623794556
Epoch: 1, Step: 196, Rank: 3, loss = 1.2589589357376099
Epoch: 1, Step: 196, Rank: 0, loss = 0.005458886735141277
Epoch: 1, Step: 196, Rank: 4, loss = 0.8097493052482605
Epoch: 1, Step: 196, Rank: 2, loss = 0.08444354683160782
Epoch: 1, Step: 196, Rank: 1, loss = 0.09528905153274536
Epoch: 1, Step: 196, Rank: 7, loss = 1.1557354927062988
Per-token loss scaled by world size: 0.0005320304771885276
Epoch: 1, Step: 196, Rank: 5, loss = 1.4381449222564697
total tokens: 6570 num samples: 3 num padding tokens: 733 - rank: 1 max len: 2190 min len: 1806 avg len: 1945.6666666666667 num_loss_counted_tokens: 897
total tokens: 8064 num samples: 9 num padding tokens: 1305 - rank: 4 max len: 896 min len: 651 avg len: 751.0 num_loss_counted_tokens: 4732
{
"epoch": 1,
"step": 196,
"rank": 0,
"loss": 0.005458886735141277,
"overall_throughput": 41.48583814053106,
"lr": 4.000000000000001e-06,
"cuda_mem_allocated": 24.379756450653076,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 21625,
"batch_size": 81,
"total_loss": 0.7801175117492676,
"gradnorm": 0.8729047775268555,
"weight_norm": 433.0433044433594,
"timestamp": "2024-08-18T20:56:39.894850"
}
total tokens: 7520 num samples: 5 num padding tokens: 1864 - rank: 3 max len: 1504 min len: 905 avg len: 1131.2 num_loss_counted_tokens: 2489
total tokens: 7856 num samples: 16 num padding tokens: 1953 - rank: 6 max len: 491 min len: 301 avg len: 368.9375 num_loss_counted_tokens: 3951
total tokens: 7020 num samples: 4 num padding tokens: 441 - rank: 2 max len: 1755 min len: 1533 avg len: 1644.75 num_loss_counted_tokens: 2927
total tokens: 7965 num samples: 27 num padding tokens: 3021 - rank: 7 max len: 295 min len: 88 avg len: 183.11111111111111 num_loss_counted_tokens: 2199
total tokens: 7226 num samples: 2 num padding tokens: 1285 - rank: 0 max len: 3613 min len: 2328 avg len: 2970.5 num_loss_counted_tokens: 471
total tokens: 7764 num samples: 12 num padding tokens: 807 - rank: 5 max len: 647 min len: 495 avg len: 579.75 num_loss_counted_tokens: 4070
Per-token loss scaled by world size: 0.0001413007703376934
Per-token loss scaled by world size: 0.0003590704873204231
Per-token loss scaled by world size: 0.000253127800533548
Per-token loss scaled by world size: 0.00044859678018838167
Per-token loss scaled by world size: 7.146921416278929e-05
Per-token loss scaled by world size: 0.0002180417359340936
Per-token loss scaled by world size: 9.443299404665595e-07
Epoch: 1, Step: 197, Rank: 5, loss = 1.1000573635101318
Epoch: 1, Step: 197, Rank: 2, loss = 0.4328925609588623
Epoch: 1, Step: 197, Rank: 3, loss = 0.7754886150360107
Epoch: 1, Step: 197, Rank: 1, loss = 0.21895486116409302
Epoch: 1, Step: 197, Rank: 4, loss = 1.374332308769226
Epoch: 1, Step: 197, Rank: 7, loss = 0.6679981350898743
Epoch: 1, Step: 197, Rank: 0, loss = 0.002893072785809636
Per-token loss scaled by world size: 0.00030740915099158883
Epoch: 1, Step: 197, Rank: 6, loss = 0.9417863488197327
total tokens: 6920 num samples: 4 num padding tokens: 227 - rank: 1 max len: 1730 min len: 1622 avg len: 1673.25 num_loss_counted_tokens: 2502
total tokens: 7749 num samples: 9 num padding tokens: 674 - rank: 4 max len: 861 min len: 729 avg len: 786.1111111111111 num_loss_counted_tokens: 3445
{
"epoch": 1,
"step": 197,
"rank": 0,
"loss": 0.002893072785809636,
"overall_throughput": 41.993274454566276,
"lr": 4.000000000000001e-06,
"cuda_mem_allocated": 24.219207286834717,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 24509,
"batch_size": 94,
"total_loss": 0.6893004179000854,
"gradnorm": 0.8729047775268555,
"weight_norm": 433.0433044433594,
"timestamp": "2024-08-18T20:56:42.416522"
}
total tokens: 7620 num samples: 15 num padding tokens: 2321 - rank: 6 max len: 508 min len: 250 avg len: 353.26666666666665 num_loss_counted_tokens: 3281
total tokens: 7766 num samples: 11 num padding tokens: 899 - rank: 5 max len: 706 min len: 532 avg len: 624.2727272727273 num_loss_counted_tokens: 4764
total tokens: 5782 num samples: 2 num padding tokens: 1020 - rank: 0 max len: 2891 min len: 1871 avg len: 2381.0 num_loss_counted_tokens: 1847
total tokens: 8015 num samples: 5 num padding tokens: 1364 - rank: 2 max len: 1603 min len: 1206 avg len: 1330.2 num_loss_counted_tokens: 1802
total tokens: 7968 num samples: 32 num padding tokens: 2753 - rank: 7 max len: 249 min len: 81 avg len: 162.96875 num_loss_counted_tokens: 2041
total tokens: 8008 num samples: 7 num padding tokens: 830 - rank: 3 max len: 1144 min len: 885 avg len: 1025.4285714285713 num_loss_counted_tokens: 4315
Per-token loss scaled by world size: 0.00019664198043756187
Per-token loss scaled by world size: 0.0002436544600641355
Per-token loss scaled by world size: 7.737488886050414e-06
Per-token loss scaled by world size: 0.0004965663538314402
Per-token loss scaled by world size: 5.262534159555798e-06
Per-token loss scaled by world size: 0.000477502413559705
Per-token loss scaled by world size: 0.00032058294164016843
Epoch: 1, Step: 198, Rank: 3, loss = 0.611785888671875
Epoch: 1, Step: 198, Rank: 0, loss = 0.019427867606282234
Epoch: 1, Step: 198, Rank: 6, loss = 1.2468160390853882
Epoch: 1, Step: 198, Rank: 2, loss = 0.4937434196472168
Epoch: 1, Step: 198, Rank: 1, loss = 0.013213565573096275
Epoch: 1, Step: 198, Rank: 4, loss = 1.198948860168457
Epoch: 1, Step: 198, Rank: 7, loss = 0.8049436807632446
Per-token loss scaled by world size: 0.0005729582044295967
Epoch: 1, Step: 198, Rank: 5, loss = 1.4386264085769653
total tokens: 7452 num samples: 9 num padding tokens: 706 - rank: 4 max len: 828 min len: 701 avg len: 749.5555555555555 num_loss_counted_tokens: 4285
total tokens: 7940 num samples: 5 num padding tokens: 529 - rank: 1 max len: 1588 min len: 1397 avg len: 1482.2 num_loss_counted_tokens: 3826
total tokens: 7634 num samples: 11 num padding tokens: 983 - rank: 5 max len: 694 min len: 488 avg len: 604.6363636363636 num_loss_counted_tokens: 4895
{
"epoch": 1,
"step": 198,
"rank": 0,
"loss": 0.019427867606282234,
"overall_throughput": 41.69130867994642,
"lr": 4.000000000000001e-06,
"cuda_mem_allocated": 24.38558578491211,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 20087,
"batch_size": 77,
"total_loss": 0.7284382581710815,
"gradnorm": 0.8729047775268555,
"weight_norm": 433.0433044433594,
"timestamp": "2024-08-18T20:56:44.992595"
}
total tokens: 7712 num samples: 16 num padding tokens: 2400 - rank: 6 max len: 482 min len: 260 avg len: 332.0 num_loss_counted_tokens: 2891
total tokens: 7206 num samples: 3 num padding tokens: 834 - rank: 0 max len: 2402 min len: 1692 avg len: 2124.0 num_loss_counted_tokens: 304
total tokens: 7511 num samples: 29 num padding tokens: 2662 - rank: 7 max len: 259 min len: 84 avg len: 167.20689655172413 num_loss_counted_tokens: 1997
total tokens: 7902 num samples: 6 num padding tokens: 937 - rank: 2 max len: 1317 min len: 1043 avg len: 1160.8333333333333 num_loss_counted_tokens: 5167
total tokens: 7928 num samples: 8 num padding tokens: 746 - rank: 3 max len: 991 min len: 833 avg len: 897.75 num_loss_counted_tokens: 4239
Per-token loss scaled by world size: 0.00035409454721957445
Per-token loss scaled by world size: 0.00035435750032775104
Per-token loss scaled by world size: 0.00030246065580286086
Per-token loss scaled by world size: 9.443990165891591e-06
Per-token loss scaled by world size: 1.890566181828035e-06
Per-token loss scaled by world size: 0.00016794257680885494
Per-token loss scaled by world size: 0.00030859385151416063
Epoch: 1, Step: 199, Rank: 3, loss = 0.9465774893760681
Epoch: 1, Step: 199, Rank: 1, loss = 0.0050501748919487
Epoch: 1, Step: 199, Rank: 0, loss = 0.0252272579818964
Epoch: 1, Step: 199, Rank: 2, loss = 0.8079480528831482
Epoch: 1, Step: 199, Rank: 4, loss = 0.9458750486373901
Epoch: 1, Step: 199, Rank: 5, loss = 0.8243313431739807
Epoch: 1, Step: 199, Rank: 7, loss = 0.4486165940761566
Per-token loss scaled by world size: 0.00041021130164153874
Epoch: 1, Step: 199, Rank: 6, loss = 1.095776915550232
total tokens: 7592 num samples: 13 num padding tokens: 445 - rank: 4 max len: 584 min len: 509 avg len: 549.7692307692307 num_loss_counted_tokens: 5215
total tokens: 6915 num samples: 5 num padding tokens: 522 - rank: 1 max len: 1383 min len: 1143 avg len: 1278.6 num_loss_counted_tokens: 1111
total tokens: 8064 num samples: 36 num padding tokens: 2321 - rank: 7 max len: 224 min len: 71 avg len: 159.52777777777777 num_loss_counted_tokens: 2309
{
"epoch": 1,
"step": 199,
"rank": 0,
"loss": 0.0252272579818964,
"overall_throughput": 42.03544117506762,
"lr": 4.000000000000001e-06,
"cuda_mem_allocated": 24.382384777069092,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 21370,
"batch_size": 70,
"total_loss": 0.6374253630638123,
"gradnorm": 0.8729047775268555,
"weight_norm": 433.0433044433594,
"timestamp": "2024-08-18T20:56:47.471975"
}
total tokens: 7920 num samples: 16 num padding tokens: 1517 - rank: 5 max len: 495 min len: 348 avg len: 400.1875 num_loss_counted_tokens: 4230
total tokens: 7693 num samples: 7 num padding tokens: 1201 - rank: 2 max len: 1099 min len: 763 avg len: 927.4285714285714 num_loss_counted_tokens: 3819
total tokens: 7610 num samples: 10 num padding tokens: 880 - rank: 3 max len: 761 min len: 590 avg len: 673.0 num_loss_counted_tokens: 4994
total tokens: 8004 num samples: 23 num padding tokens: 1596 - rank: 6 max len: 348 min len: 229 avg len: 278.60869565217394 num_loss_counted_tokens: 3462
total tokens: 7608 num samples: 3 num padding tokens: 1669 - rank: 0 max len: 2536 min len: 1608 avg len: 1979.6666666666667 num_loss_counted_tokens: 938
Per-token loss scaled by world size: 0.0003396017709746957
Per-token loss scaled by world size: 0.0003830210189335048
Per-token loss scaled by world size: 0.000320710358209908
Per-token loss scaled by world size: 0.0005103153525851667
Per-token loss scaled by world size: 0.0006997347227297723
Per-token loss scaled by world size: 1.8154447616325342e-06
Epoch: 1, Step: 200, Rank: 6, loss = 0.8558957576751709
Epoch: 1, Step: 200, Rank: 3, loss = 0.9063122272491455
Epoch: 1, Step: 200, Rank: 2, loss = 1.022187352180481
Epoch: 1, Step: 200, Rank: 5, loss = 1.3619040250778198
Epoch: 1, Step: 200, Rank: 4, loss = 1.8674170970916748
Per-token loss scaled by world size: 3.414201637497172e-05
Epoch: 1, Step: 200, Rank: 0, loss = 0.0048449682071805
Per-token loss scaled by world size: 5.252029586699791e-05
Epoch: 1, Step: 200, Rank: 7, loss = 0.09111650288105011
Epoch: 1, Step: 200, Rank: 1, loss = 0.14016354084014893
total tokens: 7851 num samples: 3 num padding tokens: 1199 - rank: 1 max len: 2617 min len: 1961 avg len: 2217.3333333333335 num_loss_counted_tokens: 624
total tokens: 8019 num samples: 27 num padding tokens: 2910 - rank: 7 max len: 297 min len: 79 avg len: 189.22222222222223 num_loss_counted_tokens: 2220
total tokens: 7539 num samples: 7 num padding tokens: 949 - rank: 4 max len: 1077 min len: 826 avg len: 941.4285714285714 num_loss_counted_tokens: 4098
{
"epoch": 1,
"step": 200,
"rank": 0,
"loss": 0.0048449682071805,
"overall_throughput": 41.06569124957365,
"lr": 4.000000000000001e-06,
"cuda_mem_allocated": 24.254440784454346,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 21350,
"batch_size": 73,
"total_loss": 0.7812302112579346,
"gradnorm": 0.8729047775268555,
"weight_norm": 433.0433044433594,
"timestamp": "2024-08-18T20:56:50.045786"
}
total tokens: 7458 num samples: 11 num padding tokens: 2562 - rank: 6 max len: 678 min len: 317 avg len: 445.09090909090907 num_loss_counted_tokens: 2861
total tokens: 7362 num samples: 9 num padding tokens: 766 - rank: 5 max len: 818 min len: 687 avg len: 732.8888888888889 num_loss_counted_tokens: 4108
total tokens: 7554 num samples: 6 num padding tokens: 667 - rank: 3 max len: 1259 min len: 1083 avg len: 1147.8333333333333 num_loss_counted_tokens: 931
total tokens: 5794 num samples: 2 num padding tokens: 43 - rank: 0 max len: 2897 min len: 2854 avg len: 2875.5 num_loss_counted_tokens: 721
total tokens: 7444 num samples: 4 num padding tokens: 878 - rank: 2 max len: 1861 min len: 1304 avg len: 1641.5 num_loss_counted_tokens: 1667
Per-token loss scaled by world size: 0.0003313705965410918
Per-token loss scaled by world size: 0.00037777406396344304
Per-token loss scaled by world size: 0.00013144082913640887
Per-token loss scaled by world size: 0.00039307758561335504
Per-token loss scaled by world size: 0.00033154338598251343
Per-token loss scaled by world size: 1.7019568986142986e-05
Per-token loss scaled by world size: 5.37650066689821e-06
Epoch: 1, Step: 201, Rank: 6, loss = 0.9256009459495544
Epoch: 1, Step: 201, Rank: 2, loss = 0.3671470880508423
Epoch: 1, Step: 201, Rank: 0, loss = 0.04753991216421127
Epoch: 1, Step: 201, Rank: 3, loss = 1.0552173852920532
Epoch: 1, Step: 201, Rank: 4, loss = 1.0979639291763306
Epoch: 1, Step: 201, Rank: 7, loss = 0.9260835647583008
Epoch: 1, Step: 201, Rank: 1, loss = 0.015017910860478878
Per-token loss scaled by world size: 0.0004141141544096172
Epoch: 1, Step: 201, Rank: 5, loss = 1.1567243337631226
total tokens: 7734 num samples: 3 num padding tokens: 874 - rank: 1 max len: 2578 min len: 2141 avg len: 2286.6666666666665 num_loss_counted_tokens: 217
total tokens: 7511 num samples: 7 num padding tokens: 887 - rank: 4 max len: 1073 min len: 833 avg len: 946.2857142857143 num_loss_counted_tokens: 5956
{
"epoch": 1,
"step": 201,
"rank": 0,
"loss": 0.04753991216421127,
"overall_throughput": 42.99976331813581,
"lr": 4.000000000000001e-06,
"cuda_mem_allocated": 24.05379819869995,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 22346,
"batch_size": 78,
"total_loss": 0.6989118456840515,
"gradnorm": 0.8729047775268555,
"weight_norm": 433.0433044433594,
"timestamp": "2024-08-18T20:56:52.499632"
}
total tokens: 6544 num samples: 2 num padding tokens: 303 - rank: 0 max len: 3272 min len: 2969 avg len: 3120.5 num_loss_counted_tokens: 203
total tokens: 8040 num samples: 15 num padding tokens: 2803 - rank: 6 max len: 536 min len: 243 avg len: 349.1333333333333 num_loss_counted_tokens: 2637
total tokens: 6354 num samples: 3 num padding tokens: 420 - rank: 2 max len: 2118 min len: 1749 avg len: 1978.0 num_loss_counted_tokens: 450
total tokens: 7452 num samples: 9 num padding tokens: 1025 - rank: 5 max len: 828 min len: 555 avg len: 714.1111111111111 num_loss_counted_tokens: 3110
total tokens: 7005 num samples: 5 num padding tokens: 460 - rank: 3 max len: 1401 min len: 1187 avg len: 1309.0 num_loss_counted_tokens: 3212
total tokens: 6902 num samples: 29 num padding tokens: 3026 - rank: 7 max len: 238 min len: 78 avg len: 133.6551724137931 num_loss_counted_tokens: 1459
Per-token loss scaled by world size: 0.00022836425341665745
Per-token loss scaled by world size: 0.0001815208961488679
Per-token loss scaled by world size: 8.564612653572112e-05
Per-token loss scaled by world size: 0.0004483639495447278
Per-token loss scaled by world size: 0.00014287869271356612
Per-token loss scaled by world size: 0.0001819162571337074
Per-token loss scaled by world size: 0.0001619806425878778
Epoch: 1, Step: 202, Rank: 2, loss = 0.5714277625083923
Epoch: 1, Step: 202, Rank: 5, loss = 1.411449670791626
Epoch: 1, Step: 202, Rank: 1, loss = 0.26961401104927063
Epoch: 1, Step: 202, Rank: 3, loss = 0.7188906669616699
Epoch: 1, Step: 202, Rank: 0, loss = 0.4497821033000946
Epoch: 1, Step: 202, Rank: 4, loss = 0.5726723670959473
Epoch: 1, Step: 202, Rank: 7, loss = 0.5099150538444519
Per-token loss scaled by world size: 0.0003626368416007608
Epoch: 1, Step: 202, Rank: 6, loss = 1.1415808200836182
total tokens: 7960 num samples: 4 num padding tokens: 506 - rank: 1 max len: 1990 min len: 1681 avg len: 1863.5 num_loss_counted_tokens: 885
total tokens: 8028 num samples: 9 num padding tokens: 771 - rank: 4 max len: 892 min len: 741 avg len: 806.3333333333334 num_loss_counted_tokens: 4521
{
"epoch": 1,
"step": 202,
"rank": 0,
"loss": 0.4497821033000946,
"overall_throughput": 41.605158110193564,
"lr": 4.000000000000001e-06,
"cuda_mem_allocated": 24.336650848388672,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 25184,
"batch_size": 99,
"total_loss": 0.7056666016578674,
"gradnorm": 0.8729047775268555,
"weight_norm": 433.0433044433594,
"timestamp": "2024-08-18T20:56:55.026480"
}
total tokens: 8040 num samples: 30 num padding tokens: 2761 - rank: 7 max len: 268 min len: 77 avg len: 175.96666666666667 num_loss_counted_tokens: 2391
total tokens: 8090 num samples: 5 num padding tokens: 1606 - rank: 2 max len: 1618 min len: 1055 avg len: 1296.8 num_loss_counted_tokens: 2154
total tokens: 7315 num samples: 7 num padding tokens: 483 - rank: 3 max len: 1045 min len: 932 avg len: 976.0 num_loss_counted_tokens: 4783
total tokens: 7920 num samples: 16 num padding tokens: 1848 - rank: 6 max len: 495 min len: 269 avg len: 379.5 num_loss_counted_tokens: 3451
total tokens: 7579 num samples: 11 num padding tokens: 1108 - rank: 5 max len: 689 min len: 497 avg len: 588.2727272727273 num_loss_counted_tokens: 4657
total tokens: 7665 num samples: 3 num padding tokens: 442 - rank: 0 max len: 2555 min len: 2201 avg len: 2407.6666666666665 num_loss_counted_tokens: 2954
Per-token loss scaled by world size: 0.0001610679319128394
Per-token loss scaled by world size: 0.000670548586640507
Per-token loss scaled by world size: 5.642953510687221e-06
Per-token loss scaled by world size: 0.0005202414467930794
Per-token loss scaled by world size: 0.00042746320832520723
Per-token loss scaled by world size: 1.2397128557495307e-05
Per-token loss scaled by world size: 0.0004940929939039052
Epoch: 1, Step: 203, Rank: 5, loss = 1.5487158298492432
Epoch: 1, Step: 203, Rank: 2, loss = 0.37200650572776794
Epoch: 1, Step: 203, Rank: 0, loss = 0.013033106923103333
Epoch: 1, Step: 203, Rank: 6, loss = 1.2015626430511475
Epoch: 1, Step: 203, Rank: 1, loss = 0.028632719069719315
Epoch: 1, Step: 203, Rank: 4, loss = 1.141169548034668
Epoch: 1, Step: 203, Rank: 7, loss = 0.9872797131538391
Per-token loss scaled by world size: 0.00014795419701840729
Epoch: 1, Step: 203, Rank: 3, loss = 0.3417187035083771
total tokens: 5726 num samples: 2 num padding tokens: 666 - rank: 1 max len: 2863 min len: 2197 avg len: 2530.0 num_loss_counted_tokens: 483
total tokens: 7280 num samples: 8 num padding tokens: 395 - rank: 4 max len: 910 min len: 794 avg len: 860.625 num_loss_counted_tokens: 4783
{
"epoch": 1,
"step": 203,
"rank": 0,
"loss": 0.013033106923103333,
"overall_throughput": 41.573804652467835,
"lr": 4.000000000000001e-06,
"cuda_mem_allocated": 24.309247970581055,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 18477,
"batch_size": 80,
"total_loss": 0.7042648792266846,
"gradnorm": 0.8729047775268555,
"weight_norm": 433.0433044433594,
"timestamp": "2024-08-18T20:56:57.572853"
}
total tokens: 7865 num samples: 13 num padding tokens: 1975 - rank: 6 max len: 605 min len: 321 avg len: 453.0769230769231 num_loss_counted_tokens: 4477
total tokens: 7068 num samples: 4 num padding tokens: 740 - rank: 2 max len: 1767 min len: 1462 avg len: 1582.0 num_loss_counted_tokens: 1481
total tokens: 6864 num samples: 22 num padding tokens: 2832 - rank: 7 max len: 312 min len: 80 avg len: 183.27272727272728 num_loss_counted_tokens: 1790
total tokens: 7750 num samples: 10 num padding tokens: 711 - rank: 5 max len: 775 min len: 654 avg len: 703.9 num_loss_counted_tokens: 2490
total tokens: 7506 num samples: 6 num padding tokens: 1303 - rank: 3 max len: 1251 min len: 942 avg len: 1033.8333333333333 num_loss_counted_tokens: 3894
total tokens: 7992 num samples: 2 num padding tokens: 51 - rank: 0 max len: 3996 min len: 3945 avg len: 3970.5 num_loss_counted_tokens: 176
Per-token loss scaled by world size: 0.0005467137671075761
Per-token loss scaled by world size: 0.0001344903139397502
Per-token loss scaled by world size: 2.761472160273115e-06
Per-token loss scaled by world size: 0.0002821572998072952
Per-token loss scaled by world size: 0.00027734090690501034
Per-token loss scaled by world size: 4.0628190618008375e-05
Epoch: 1, Step: 204, Rank: 2, loss = 0.3588033616542816
Epoch: 1, Step: 204, Rank: 5, loss = 1.4585639238357544
Epoch: 1, Step: 204, Rank: 0, loss = 0.007367262616753578
Epoch: 1, Step: 204, Rank: 4, loss = 0.7527604103088379
Per-token loss scaled by world size: 0.00022612858447246253
Epoch: 1, Step: 204, Rank: 1, loss = 0.1083909347653389
Epoch: 1, Step: 204, Rank: 3, loss = 0.7399108409881592
Per-token loss scaled by world size: 0.0005020878161303699
Epoch: 1, Step: 204, Rank: 7, loss = 0.6032828092575073
Epoch: 1, Step: 204, Rank: 6, loss = 1.3395075798034668
total tokens: 6300 num samples: 3 num padding tokens: 1167 - rank: 1 max len: 2100 min len: 1431 avg len: 1711.0 num_loss_counted_tokens: 1709
total tokens: 7578 num samples: 9 num padding tokens: 661 - rank: 4 max len: 842 min len: 721 avg len: 768.5555555555555 num_loss_counted_tokens: 5406
{ | |
"epoch": 1, | |
"step": 204, | |
"rank": 0, | |
"loss": 0.007367262616753578, | |
"overall_throughput": 42.3404581250765, | |
"lr": 4.000000000000001e-06, | |
"cuda_mem_allocated": 24.448826789855957, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 21343, | |
"batch_size": 69, | |
"total_loss": 0.6710734367370605, | |
"gradnorm": 0.8729047775268555, | |
"weight_norm": 433.0433044433594, | |
"timestamp": "2024-08-18T20:57:00.072190" | |
} | |
total tokens: 7812 num samples: 28 num padding tokens: 2702 - rank: 7 max len: 279 min len: 93 avg len: 182.5 num_loss_counted_tokens: 2270
total tokens: 8109 num samples: 17 num padding tokens: 1674 - rank: 6 max len: 477 min len: 282 avg len: 378.52941176470586 num_loss_counted_tokens: 3720
total tokens: 7854 num samples: 11 num padding tokens: 1060 - rank: 5 max len: 714 min len: 484 avg len: 617.6363636363636 num_loss_counted_tokens: 5323
total tokens: 5918 num samples: 2 num padding tokens: 56 - rank: 0 max len: 2959 min len: 2903 avg len: 2931.0 num_loss_counted_tokens: 173
total tokens: 7658 num samples: 7 num padding tokens: 462 - rank: 3 max len: 1094 min len: 870 avg len: 1028.0 num_loss_counted_tokens: 4939
total tokens: 8088 num samples: 6 num padding tokens: 933 - rank: 2 max len: 1348 min len: 1098 avg len: 1192.5 num_loss_counted_tokens: 6453
Per-token loss scaled by world size: 0.0004229408223181963
Per-token loss scaled by world size: 6.994641353230691e-06
Per-token loss scaled by world size: 2.8473236852732953e-06
Per-token loss scaled by world size: 0.0008069836185313761
Per-token loss scaled by world size: 0.00010431646660435945
Per-token loss scaled by world size: 0.00042939934064634144
Per-token loss scaled by world size: 0.0006417424301616848
Epoch: 1, Step: 205, Rank: 2, loss = 0.24320079386234283
Epoch: 1, Step: 205, Rank: 6, loss = 1.8813813924789429
Epoch: 1, Step: 205, Rank: 4, loss = 1.0010908842086792
Epoch: 1, Step: 205, Rank: 7, loss = 0.9860336780548096
Epoch: 1, Step: 205, Rank: 0, loss = 0.016307132318615913
Epoch: 1, Step: 205, Rank: 5, loss = 1.4961422681808472
Epoch: 1, Step: 205, Rank: 1, loss = 0.006638179067522287
Per-token loss scaled by world size: 0.00024272690643556416
Epoch: 1, Step: 205, Rank: 3, loss = 0.565887451171875
total tokens: 7544 num samples: 4 num padding tokens: 665 - rank: 1 max len: 1886 min len: 1535 avg len: 1719.75 num_loss_counted_tokens: 918
total tokens: 7810 num samples: 10 num padding tokens: 835 - rank: 4 max len: 781 min len: 634 avg len: 697.5 num_loss_counted_tokens: 3430
{ | |
"epoch": 1, | |
"step": 205, | |
"rank": 0, | |
"loss": 0.016307132318615913, | |
"overall_throughput": 41.43704192183203, | |
"lr": 4.000000000000001e-06, | |
"cuda_mem_allocated": 24.372108459472656, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 18651, | |
"batch_size": 75, | |
"total_loss": 0.7745852470397949, | |
"gradnorm": 0.8729047775268555, | |
"weight_norm": 433.0433044433594, | |
"timestamp": "2024-08-18T20:57:02.626787" | |
} | |
total tokens: 7680 num samples: 6 num padding tokens: 744 - rank: 2 max len: 1280 min len: 1064 avg len: 1156.0 num_loss_counted_tokens: 6296
total tokens: 6860 num samples: 28 num padding tokens: 2293 - rank: 7 max len: 245 min len: 78 avg len: 163.10714285714286 num_loss_counted_tokens: 1793
total tokens: 8112 num samples: 13 num padding tokens: 1290 - rank: 5 max len: 624 min len: 454 avg len: 524.7692307692307 num_loss_counted_tokens: 5007
total tokens: 8046 num samples: 18 num padding tokens: 1903 - rank: 6 max len: 447 min len: 248 avg len: 341.27777777777777 num_loss_counted_tokens: 3492
total tokens: 5766 num samples: 2 num padding tokens: 993 - rank: 0 max len: 2883 min len: 1890 avg len: 2386.5 num_loss_counted_tokens: 1080
total tokens: 7856 num samples: 8 num padding tokens: 604 - rank: 3 max len: 982 min len: 795 avg len: 906.5 num_loss_counted_tokens: 3800
Per-token loss scaled by world size: 0.0005271086702123284
Per-token loss scaled by world size: 0.00045938679249957204
Per-token loss scaled by world size: 3.3147989597637206e-06
Per-token loss scaled by world size: 2.3308498384722043e-06
Per-token loss scaled by world size: 0.0002847542054951191
Per-token loss scaled by world size: 0.0002816052583511919
Per-token loss scaled by world size: 0.00022615509806200862
Epoch: 1, Step: 206, Rank: 1, loss = 0.005856843199580908
Epoch: 1, Step: 206, Rank: 6, loss = 1.324492335319519
Epoch: 1, Step: 206, Rank: 0, loss = 0.008329261094331741
Epoch: 1, Step: 206, Rank: 2, loss = 0.7076036334037781
Epoch: 1, Step: 206, Rank: 3, loss = 0.7155161499977112
Epoch: 1, Step: 206, Rank: 4, loss = 1.1543241739273071
Epoch: 1, Step: 206, Rank: 7, loss = 0.5682712197303772
Per-token loss scaled by world size: 0.0005236774450168014
Epoch: 1, Step: 206, Rank: 5, loss = 1.3158705234527588
total tokens: 7600 num samples: 5 num padding tokens: 1200 - rank: 4 max len: 1520 min len: 945 avg len: 1280.0 num_loss_counted_tokens: 2473
total tokens: 6054 num samples: 2 num padding tokens: 522 - rank: 1 max len: 3027 min len: 2505 avg len: 2766.0 num_loss_counted_tokens: 167
total tokens: 6924 num samples: 3 num padding tokens: 422 - rank: 2 max len: 2308 min len: 2080 avg len: 2167.3333333333335 num_loss_counted_tokens: 427
{ | |
"epoch": 1, | |
"step": 206, | |
"rank": 0, | |
"loss": 0.008329261094331741, | |
"overall_throughput": 41.21002850546722, | |
"lr": 4.000000000000001e-06, | |
"cuda_mem_allocated": 24.469753742218018, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 20102, | |
"batch_size": 76, | |
"total_loss": 0.7250330448150635, | |
"gradnorm": 0.8729047775268555, | |
"weight_norm": 433.0433044433594, | |
"timestamp": "2024-08-18T20:57:05.194525" | |
} | |
total tokens: 7872 num samples: 24 num padding tokens: 2881 - rank: 7 max len: 328 min len: 83 avg len: 207.95833333333334 num_loss_counted_tokens: 2171
total tokens: 7536 num samples: 8 num padding tokens: 1132 - rank: 5 max len: 942 min len: 619 avg len: 800.5 num_loss_counted_tokens: 4424
total tokens: 7956 num samples: 13 num padding tokens: 1551 - rank: 6 max len: 612 min len: 344 avg len: 492.6923076923077 num_loss_counted_tokens: 3611
total tokens: 8112 num samples: 4 num padding tokens: 1080 - rank: 3 max len: 2028 min len: 1605 avg len: 1758.0 num_loss_counted_tokens: 520
total tokens: 7128 num samples: 2 num padding tokens: 321 - rank: 0 max len: 3564 min len: 3243 avg len: 3403.5 num_loss_counted_tokens: 173
Per-token loss scaled by world size: 0.0001850179978646338
Per-token loss scaled by world size: 3.4617110031831544e-06
Per-token loss scaled by world size: 0.0002683971542865038
Per-token loss scaled by world size: 0.0003899486910086125
Per-token loss scaled by world size: 0.00042667845264077187
Per-token loss scaled by world size: 8.096924830169883e-06
Per-token loss scaled by world size: 0.00022364444157574326
Epoch: 1, Step: 207, Rank: 2, loss = 0.7098768949508667
Epoch: 1, Step: 207, Rank: 0, loss = 0.009155793115496635
Epoch: 1, Step: 207, Rank: 6, loss = 1.0313655138015747
Epoch: 1, Step: 207, Rank: 3, loss = 0.48934948444366455
Epoch: 1, Step: 207, Rank: 1, loss = 0.021415354683995247
Epoch: 1, Step: 207, Rank: 4, loss = 1.1285111904144287
Epoch: 1, Step: 207, Rank: 7, loss = 0.591511607170105
Per-token loss scaled by world size: 0.00048212718684226274
Epoch: 1, Step: 207, Rank: 5, loss = 1.2751661539077759
total tokens: 7845 num samples: 5 num padding tokens: 421 - rank: 1 max len: 1569 min len: 1404 avg len: 1484.8 num_loss_counted_tokens: 4961
total tokens: 7887 num samples: 11 num padding tokens: 927 - rank: 4 max len: 717 min len: 576 avg len: 632.7272727272727 num_loss_counted_tokens: 3254
total tokens: 6467 num samples: 29 num padding tokens: 1852 - rank: 7 max len: 223 min len: 79 avg len: 159.13793103448276 num_loss_counted_tokens: 1984
{ | |
"epoch": 1, | |
"step": 207, | |
"rank": 0, | |
"loss": 0.009155793115496635, | |
"overall_throughput": 41.547768908524304, | |
"lr": 4.000000000000001e-06, | |
"cuda_mem_allocated": 24.43647813796997, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 21159, | |
"batch_size": 69, | |
"total_loss": 0.6570440530776978, | |
"gradnorm": 0.8729047775268555, | |
"weight_norm": 433.0433044433594, | |
"timestamp": "2024-08-18T20:57:07.743363" | |
} | |
total tokens: 7980 num samples: 20 num padding tokens: 2050 - rank: 6 max len: 399 min len: 224 avg len: 296.5 num_loss_counted_tokens: 3262
total tokens: 8024 num samples: 8 num padding tokens: 757 - rank: 3 max len: 1003 min len: 749 avg len: 908.375 num_loss_counted_tokens: 5777
total tokens: 8022 num samples: 14 num padding tokens: 1040 - rank: 5 max len: 573 min len: 420 avg len: 498.7142857142857 num_loss_counted_tokens: 4255
total tokens: 7974 num samples: 6 num padding tokens: 1244 - rank: 2 max len: 1329 min len: 1027 avg len: 1121.6666666666667 num_loss_counted_tokens: 4559
total tokens: 5482 num samples: 2 num padding tokens: 958 - rank: 0 max len: 2741 min len: 1783 avg len: 2262.0 num_loss_counted_tokens: 672
Per-token loss scaled by world size: 0.0004824527713935822
Per-token loss scaled by world size: 0.000384941027732566
Per-token loss scaled by world size: 0.0005819547805003822
Per-token loss scaled by world size: 0.0003608883998822421
Per-token loss scaled by world size: 0.00046005993499420583
Per-token loss scaled by world size: 5.933908323640935e-05
Per-token loss scaled by world size: 1.4446718523686286e-05
Epoch: 1, Step: 208, Rank: 6, loss = 1.4061481952667236
Epoch: 1, Step: 208, Rank: 3, loss = 1.1657265424728394
Epoch: 1, Step: 208, Rank: 5, loss = 0.9301137924194336
Epoch: 1, Step: 208, Rank: 1, loss = 0.14337806403636932
Epoch: 1, Step: 208, Rank: 7, loss = 0.8719965815544128
Epoch: 1, Step: 208, Rank: 0, loss = 0.03490688279271126
Epoch: 1, Step: 208, Rank: 4, loss = 1.1116198301315308
Per-token loss scaled by world size: 0.0002515815431252122
Epoch: 1, Step: 208, Rank: 2, loss = 0.607883870601654
total tokens: 6438 num samples: 2 num padding tokens: 348 - rank: 1 max len: 3219 min len: 2871 avg len: 3045.0 num_loss_counted_tokens: 205
total tokens: 7539 num samples: 7 num padding tokens: 636 - rank: 4 max len: 1077 min len: 894 avg len: 986.1428571428571 num_loss_counted_tokens: 4629
{ | |
"epoch": 1, | |
"step": 208, | |
"rank": 0, | |
"loss": 0.03490688279271126, | |
"overall_throughput": 40.69133919277144, | |
"lr": 4.000000000000001e-06, | |
"cuda_mem_allocated": 24.454561710357666, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 19330, | |
"batch_size": 86, | |
"total_loss": 0.7839717268943787, | |
"gradnorm": 0.8729047775268555, | |
"weight_norm": 433.0433044433594, | |
"timestamp": "2024-08-18T20:57:10.341855" | |
} | |
total tokens: 7893 num samples: 9 num padding tokens: 1279 - rank: 5 max len: 877 min len: 621 avg len: 734.8888888888889 num_loss_counted_tokens: 3674
total tokens: 7657 num samples: 13 num padding tokens: 2094 - rank: 6 max len: 589 min len: 288 avg len: 427.9230769230769 num_loss_counted_tokens: 3574
total tokens: 5474 num samples: 2 num padding tokens: 292 - rank: 2 max len: 2737 min len: 2445 avg len: 2591.0 num_loss_counted_tokens: 182
total tokens: 7275 num samples: 5 num padding tokens: 1073 - rank: 3 max len: 1455 min len: 1087 avg len: 1240.4 num_loss_counted_tokens: 3573
total tokens: 6975 num samples: 25 num padding tokens: 2126 - rank: 7 max len: 279 min len: 79 avg len: 193.96 num_loss_counted_tokens: 2228
total tokens: 7194 num samples: 2 num padding tokens: 95 - rank: 0 max len: 3597 min len: 3502 avg len: 3549.5 num_loss_counted_tokens: 205
Per-token loss scaled by world size: 0.0006150374538265169
Per-token loss scaled by world size: 0.00033614260610193014
Per-token loss scaled by world size: 0.0003573091235011816
Per-token loss scaled by world size: 0.0002815852640196681
Per-token loss scaled by world size: 0.00023183257144410163
Per-token loss scaled by world size: 2.870956450351514e-05
Per-token loss scaled by world size: 2.8179058062960394e-05
Epoch: 1, Step: 209, Rank: 6, loss = 0.96549391746521
Epoch: 1, Step: 209, Rank: 4, loss = 0.9082993268966675
Epoch: 1, Step: 209, Rank: 5, loss = 1.6619080305099487
Epoch: 1, Step: 209, Rank: 2, loss = 0.7608785629272461
Epoch: 1, Step: 209, Rank: 7, loss = 0.6264405846595764
Epoch: 1, Step: 209, Rank: 0, loss = 0.07757683098316193
Epoch: 1, Step: 209, Rank: 1, loss = 0.07614333927631378
Per-token loss scaled by world size: 0.00017559101979713887
Epoch: 1, Step: 209, Rank: 3, loss = 0.4744688868522644
total tokens: 7210 num samples: 5 num padding tokens: 1177 - rank: 1 max len: 1442 min len: 1085 avg len: 1206.6 num_loss_counted_tokens: 3033
total tokens: 7776 num samples: 9 num padding tokens: 1371 - rank: 4 max len: 864 min len: 614 avg len: 711.6666666666666 num_loss_counted_tokens: 2696
{ | |
"epoch": 1, | |
"step": 209, | |
"rank": 0, | |
"loss": 0.07757683098316193, | |
"overall_throughput": 41.93509249352839, | |
"lr": 4.000000000000001e-06, | |
"cuda_mem_allocated": 24.419190883636475, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 21617, | |
"batch_size": 86, | |
"total_loss": 0.6939011812210083, | |
"gradnorm": 0.8729047775268555, | |
"weight_norm": 433.0433044433594, | |
"timestamp": "2024-08-18T20:57:12.865160" | |
} | |
total tokens: 7826 num samples: 13 num padding tokens: 1029 - rank: 5 max len: 602 min len: 416 avg len: 522.8461538461538 num_loss_counted_tokens: 4387
total tokens: 7904 num samples: 19 num padding tokens: 1674 - rank: 6 max len: 416 min len: 259 avg len: 327.89473684210526 num_loss_counted_tokens: 3581
total tokens: 7758 num samples: 3 num padding tokens: 1159 - rank: 0 max len: 2586 min len: 1912 avg len: 2199.6666666666665 num_loss_counted_tokens: 346
total tokens: 7568 num samples: 8 num padding tokens: 373 - rank: 3 max len: 946 min len: 874 avg len: 899.375 num_loss_counted_tokens: 4817
total tokens: 7441 num samples: 7 num padding tokens: 404 - rank: 2 max len: 1063 min len: 956 avg len: 1005.2857142857143 num_loss_counted_tokens: 3350
total tokens: 7967 num samples: 31 num padding tokens: 2428 - rank: 7 max len: 257 min len: 80 avg len: 178.67741935483872 num_loss_counted_tokens: 2210
Per-token loss scaled by world size: 0.0002890804025810212
Per-token loss scaled by world size: 0.0005653423140756786
Per-token loss scaled by world size: 0.000670413370244205
Per-token loss scaled by world size: 9.497793507762253e-05
Per-token loss scaled by world size: 2.160387111871387e-06
Per-token loss scaled by world size: 9.657991176936775e-05
Per-token loss scaled by world size: 0.00031631108140572906
Epoch: 1, Step: 210, Rank: 5, loss = 1.631869912147522
Epoch: 1, Step: 210, Rank: 2, loss = 0.23118816316127777
Epoch: 1, Step: 210, Rank: 1, loss = 0.2350875735282898
Epoch: 1, Step: 210, Rank: 4, loss = 1.3761138916015625
Epoch: 1, Step: 210, Rank: 3, loss = 0.703657865524292
Epoch: 1, Step: 210, Rank: 0, loss = 0.005258652381598949
Epoch: 1, Step: 210, Rank: 7, loss = 0.7699407339096069
Per-token loss scaled by world size: 0.000578251841943711
Epoch: 1, Step: 210, Rank: 6, loss = 1.4075372219085693
total tokens: 7904 num samples: 8 num padding tokens: 750 - rank: 4 max len: 988 min len: 839 avg len: 894.25 num_loss_counted_tokens: 5760
total tokens: 7317 num samples: 3 num padding tokens: 636 - rank: 1 max len: 2439 min len: 1946 avg len: 2227.0 num_loss_counted_tokens: 845
{ | |
"epoch": 1, | |
"step": 210, | |
"rank": 0, | |
"loss": 0.005258652381598949, | |
"overall_throughput": 42.45218456081296, | |
"lr": 4.000000000000001e-06, | |
"cuda_mem_allocated": 24.25443983078003, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 19473, | |
"batch_size": 68, | |
"total_loss": 0.7950817346572876, | |
"gradnorm": 0.8729047775268555, | |
"weight_norm": 433.0433044433594, | |
"timestamp": "2024-08-18T20:57:15.357201" | |
} | |
total tokens: 7389 num samples: 9 num padding tokens: 1138 - rank: 5 max len: 821 min len: 562 avg len: 694.5555555555555 num_loss_counted_tokens: 4077
total tokens: 7860 num samples: 15 num padding tokens: 2296 - rank: 6 max len: 524 min len: 263 avg len: 370.93333333333334 num_loss_counted_tokens: 3347
total tokens: 6396 num samples: 26 num padding tokens: 1827 - rank: 7 max len: 246 min len: 83 avg len: 175.73076923076923 num_loss_counted_tokens: 2093
total tokens: 7170 num samples: 6 num padding tokens: 733 - rank: 3 max len: 1195 min len: 1001 avg len: 1072.8333333333333 num_loss_counted_tokens: 2366
total tokens: 7220 num samples: 4 num padding tokens: 329 - rank: 2 max len: 1805 min len: 1615 avg len: 1722.75 num_loss_counted_tokens: 1729
total tokens: 7124 num samples: 2 num padding tokens: 264 - rank: 0 max len: 3562 min len: 3298 avg len: 3430.0 num_loss_counted_tokens: 186
Per-token loss scaled by world size: 0.0008867266005836427
Per-token loss scaled by world size: 0.00016864115605130792
Per-token loss scaled by world size: 3.4453678381396458e-06
Per-token loss scaled by world size: 7.723766611889005e-05
Per-token loss scaled by world size: 7.638386159669608e-05
Per-token loss scaled by world size: 0.0004412019916344434
Per-token loss scaled by world size: 0.00035067120916210115
Epoch: 1, Step: 211, Rank: 5, loss = 1.9709715843200684
Epoch: 1, Step: 211, Rank: 3, loss = 0.3748471140861511
Epoch: 1, Step: 211, Rank: 1, loss = 0.16978223621845245
Epoch: 1, Step: 211, Rank: 0, loss = 0.1716800183057785
Epoch: 1, Step: 211, Rank: 2, loss = 0.007658191490918398
Epoch: 1, Step: 211, Rank: 4, loss = 0.9806817173957825
Epoch: 1, Step: 211, Rank: 7, loss = 0.7794544100761414
Per-token loss scaled by world size: 0.0005940343835391104
Epoch: 1, Step: 211, Rank: 6, loss = 1.320389986038208
total tokens: 6598 num samples: 2 num padding tokens: 510 - rank: 1 max len: 3299 min len: 2789 avg len: 3044.0 num_loss_counted_tokens: 202
total tokens: 7126 num samples: 7 num padding tokens: 665 - rank: 4 max len: 1018 min len: 822 avg len: 923.0 num_loss_counted_tokens: 4429
{ | |
"epoch": 1, | |
"step": 211, | |
"rank": 0, | |
"loss": 0.1716800183057785, | |
"overall_throughput": 42.027410025626445, | |
"lr": 4.000000000000001e-06, | |
"cuda_mem_allocated": 24.381672859191895, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 17782, | |
"batch_size": 82, | |
"total_loss": 0.721933126449585, | |
"gradnorm": 0.8729047775268555, | |
"weight_norm": 433.0433044433594, | |
"timestamp": "2024-08-18T20:57:17.873318" | |
} | |
total tokens: 8049 num samples: 3 num padding tokens: 1897 - rank: 2 max len: 2683 min len: 1693 avg len: 2050.6666666666665 num_loss_counted_tokens: 2086
total tokens: 6725 num samples: 25 num padding tokens: 2254 - rank: 7 max len: 269 min len: 87 avg len: 178.84 num_loss_counted_tokens: 2062
total tokens: 7750 num samples: 10 num padding tokens: 610 - rank: 5 max len: 775 min len: 681 avg len: 714.0 num_loss_counted_tokens: 2811
total tokens: 8060 num samples: 13 num padding tokens: 2361 - rank: 6 max len: 620 min len: 304 avg len: 438.38461538461536 num_loss_counted_tokens: 3659
total tokens: 7035 num samples: 5 num padding tokens: 813 - rank: 3 max len: 1407 min len: 1064 avg len: 1244.4 num_loss_counted_tokens: 3760
total tokens: 7708 num samples: 2 num padding tokens: 291 - rank: 0 max len: 3854 min len: 3563 avg len: 3708.5 num_loss_counted_tokens: 181
Per-token loss scaled by world size: 0.00016636037616990507
Per-token loss scaled by world size: 0.00016077600594144315
Per-token loss scaled by world size: 0.00041039849747903645
Per-token loss scaled by world size: 0.00037365706521086395
Per-token loss scaled by world size: 0.00032838378683663905
Per-token loss scaled by world size: 0.0003097376029472798
Per-token loss scaled by world size: 3.766906047530938e-06
Epoch: 1, Step: 212, Rank: 5, loss = 1.1992356777191162
Epoch: 1, Step: 212, Rank: 3, loss = 0.46980756521224976
Epoch: 1, Step: 212, Rank: 1, loss = 1.0918726921081543
Epoch: 1, Step: 212, Rank: 2, loss = 0.48612579703330994
Epoch: 1, Step: 212, Rank: 0, loss = 0.011007370427250862
Epoch: 1, Step: 212, Rank: 4, loss = 0.9595785140991211
Epoch: 1, Step: 212, Rank: 7, loss = 0.9050920009613037
Per-token loss scaled by world size: 0.00043126812670379877
Epoch: 1, Step: 212, Rank: 6, loss = 1.2602193355560303
total tokens: 7704 num samples: 8 num padding tokens: 1465 - rank: 4 max len: 963 min len: 685 avg len: 779.875 num_loss_counted_tokens: 4503
total tokens: 8007 num samples: 3 num padding tokens: 648 - rank: 1 max len: 2669 min len: 2232 avg len: 2453.0 num_loss_counted_tokens: 287
{ | |
"epoch": 1, | |
"step": 212, | |
"rank": 0, | |
"loss": 0.011007370427250862, | |
"overall_throughput": 42.88947039606486, | |
"lr": 4.000000000000001e-06, | |
"cuda_mem_allocated": 24.303375244140625, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 23377, | |
"batch_size": 85, | |
"total_loss": 0.7978672981262207, | |
"gradnorm": 0.8729047775268555, | |
"weight_norm": 433.0433044433594, | |
"timestamp": "2024-08-18T20:57:20.342573" | |
} | |
total tokens: 7920 num samples: 12 num padding tokens: 1442 - rank: 5 max len: 660 min len: 470 avg len: 539.8333333333334 num_loss_counted_tokens: 4323
total tokens: 6372 num samples: 3 num padding tokens: 828 - rank: 2 max len: 2124 min len: 1644 avg len: 1848.0 num_loss_counted_tokens: 677
total tokens: 7533 num samples: 27 num padding tokens: 2411 - rank: 7 max len: 279 min len: 75 avg len: 189.7037037037037 num_loss_counted_tokens: 2282
total tokens: 7255 num samples: 5 num padding tokens: 1237 - rank: 3 max len: 1451 min len: 980 avg len: 1203.6 num_loss_counted_tokens: 3251
total tokens: 7388 num samples: 2 num padding tokens: 920 - rank: 0 max len: 3694 min len: 2774 avg len: 3234.0 num_loss_counted_tokens: 589
total tokens: 7973 num samples: 17 num padding tokens: 1660 - rank: 6 max len: 469 min len: 291 avg len: 371.3529411764706 num_loss_counted_tokens: 3245
Per-token loss scaled by world size: 0.0003884605539496988
Per-token loss scaled by world size: 0.0004556115891318768
Per-token loss scaled by world size: 0.0003792895295191556
Per-token loss scaled by world size: 0.00035762478364631534
Per-token loss scaled by world size: 0.00014603856834582984
Per-token loss scaled by world size: 3.432124140090309e-05
Per-token loss scaled by world size: 8.249920938396826e-05
Epoch: 1, Step: 213, Rank: 6, loss = 1.0305296182632446
Epoch: 1, Step: 213, Rank: 4, loss = 1.2378966808319092
Epoch: 1, Step: 213, Rank: 7, loss = 0.9716665744781494
Epoch: 1, Step: 213, Rank: 3, loss = 0.39678677916526794
Epoch: 1, Step: 213, Rank: 0, loss = 0.0932508111000061
Epoch: 1, Step: 213, Rank: 2, loss = 1.0554473400115967
Epoch: 1, Step: 213, Rank: 1, loss = 0.22415034472942352
Per-token loss scaled by world size: 0.00048252404667437077
Epoch: 1, Step: 213, Rank: 5, loss = 1.3110178709030151
total tokens: 7245 num samples: 3 num padding tokens: 751 - rank: 1 max len: 2415 min len: 2008 avg len: 2164.6666666666665 num_loss_counted_tokens: 1753
total tokens: 8024 num samples: 8 num padding tokens: 872 - rank: 4 max len: 1003 min len: 763 avg len: 894.0 num_loss_counted_tokens: 3911
{ | |
"epoch": 1, | |
"step": 213, | |
"rank": 0, | |
"loss": 0.0932508111000061, | |
"overall_throughput": 42.19836853236888, | |
"lr": 4.000000000000001e-06, | |
"cuda_mem_allocated": 24.430901527404785, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 21736, | |
"batch_size": 78, | |
"total_loss": 0.7900933623313904, | |
"gradnorm": 0.8729047775268555, | |
"weight_norm": 433.0433044433594, | |
"timestamp": "2024-08-18T20:57:22.852352" | |
} | |
total tokens: 7104 num samples: 24 num padding tokens: 3028 - rank: 7 max len: 296 min len: 82 avg len: 169.83333333333334 num_loss_counted_tokens: 1782
total tokens: 7856 num samples: 16 num padding tokens: 1303 - rank: 6 max len: 491 min len: 317 avg len: 409.5625 num_loss_counted_tokens: 3722
total tokens: 8107 num samples: 11 num padding tokens: 1365 - rank: 5 max len: 737 min len: 516 avg len: 612.9090909090909 num_loss_counted_tokens: 5379
total tokens: 7816 num samples: 4 num padding tokens: 627 - rank: 2 max len: 1954 min len: 1579 avg len: 1797.25 num_loss_counted_tokens: 1262
total tokens: 7015 num samples: 5 num padding tokens: 1303 - rank: 3 max len: 1403 min len: 1004 avg len: 1142.4 num_loss_counted_tokens: 2463
total tokens: 7132 num samples: 2 num padding tokens: 1084 - rank: 0 max len: 3566 min len: 2482 avg len: 3024.0 num_loss_counted_tokens: 887
Per-token loss scaled by world size: 0.0001603560958756134
Per-token loss scaled by world size: 0.00011656123388092965
Per-token loss scaled by world size: 0.00013124111865181476
Per-token loss scaled by world size: 0.0004040842177346349
Per-token loss scaled by world size: 0.0002459329552948475
Per-token loss scaled by world size: 0.00028513988945633173
Per-token loss scaled by world size: 0.0004030088894069195
Epoch: 1, Step: 214, Rank: 2, loss = 0.4810081720352173
Epoch: 1, Step: 214, Rank: 0, loss = 0.3936741352081299
Epoch: 1, Step: 214, Rank: 1, loss = 0.34963998198509216
Epoch: 1, Step: 214, Rank: 6, loss = 1.2121011018753052
Epoch: 1, Step: 214, Rank: 7, loss = 0.7377066612243652
Epoch: 1, Step: 214, Rank: 4, loss = 0.855312705039978
Epoch: 1, Step: 214, Rank: 5, loss = 1.2088755369186401
Per-token loss scaled by world size: 0.00028479599859565496
Epoch: 1, Step: 214, Rank: 3, loss = 0.8542811870574951
total tokens: 7895 num samples: 5 num padding tokens: 1253 - rank: 1 max len: 1579 min len: 1089 avg len: 1328.4 num_loss_counted_tokens: 1673
total tokens: 7480 num samples: 11 num padding tokens: 884 - rank: 4 max len: 680 min len: 516 avg len: 599.6363636363636 num_loss_counted_tokens: 4908
{ | |
"epoch": 1, | |
"step": 214, | |
"rank": 0, | |
"loss": 0.3936741352081299, | |
"overall_throughput": 41.376296140594235, | |
"lr": 4.000000000000001e-06, | |
"cuda_mem_allocated": 24.258357048034668, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 23997, | |
"batch_size": 85, | |
"total_loss": 0.761574923992157, | |
"gradnorm": 0.8729047775268555, | |
"weight_norm": 433.0433044433594, | |
"timestamp": "2024-08-18T20:57:25.410853" | |
} | |
total tokens: 8085 num samples: 35 num padding tokens: 2995 - rank: 7 max len: 231 min len: 82 avg len: 145.42857142857142 num_loss_counted_tokens: 1964
total tokens: 6426 num samples: 2 num padding tokens: 1555 - rank: 0 max len: 3213 min len: 1658 avg len: 2435.5 num_loss_counted_tokens: 537
total tokens: 7791 num samples: 21 num padding tokens: 1378 - rank: 6 max len: 371 min len: 236 avg len: 305.3809523809524 num_loss_counted_tokens: 3598
total tokens: 7740 num samples: 15 num padding tokens: 793 - rank: 5 max len: 516 min len: 375 avg len: 463.1333333333333 num_loss_counted_tokens: 3953
total tokens: 8104 num samples: 8 num padding tokens: 972 - rank: 2 max len: 1013 min len: 810 avg len: 891.5 num_loss_counted_tokens: 5296
total tokens: 8100 num samples: 10 num padding tokens: 449 - rank: 3 max len: 810 min len: 698 avg len: 765.1 num_loss_counted_tokens: 3926
Per-token loss scaled by world size: 0.00016885650984477252
Per-token loss scaled by world size: 0.00022565454128198326
Per-token loss scaled by world size: 0.00031425835913978517
Per-token loss scaled by world size: 7.733783036201203e-07
Per-token loss scaled by world size: 0.00022409454686567187
Per-token loss scaled by world size: 0.00017783122893888503
Per-token loss scaled by world size: 0.00018609287508297712
Epoch: 1, Step: 215, Rank: 2, loss = 0.7786210179328918
Epoch: 1, Step: 215, Rank: 5, loss = 1.084348440170288
Epoch: 1, Step: 215, Rank: 1, loss = 0.5826393961906433
Epoch: 1, Step: 215, Rank: 0, loss = 0.002668541856110096
Epoch: 1, Step: 215, Rank: 6, loss = 0.7732382416725159
Epoch: 1, Step: 215, Rank: 4, loss = 0.6136066317558289
Epoch: 1, Step: 215, Rank: 7, loss = 0.642113447189331
Per-token loss scaled by world size: 0.00024193401623051614
Epoch: 1, Step: 215, Rank: 3, loss = 0.8347933292388916
total tokens: 7668 num samples: 9 num padding tokens: 1000 - rank: 4 max len: 852 min len: 647 avg len: 740.8888888888889 num_loss_counted_tokens: 4699
total tokens: 8060 num samples: 5 num padding tokens: 515 - rank: 1 max len: 1612 min len: 1420 avg len: 1509.0 num_loss_counted_tokens: 2649
{ | |
"epoch": 1, | |
"step": 215, | |
"rank": 0, | |
"loss": 0.002668541856110096, | |
"overall_throughput": 41.217687956745806, | |
"lr": 4.000000000000001e-06, | |
"cuda_mem_allocated": 24.42807674407959, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 27604, | |
"batch_size": 87, | |
"total_loss": 0.6640036106109619, | |
"gradnorm": 0.8729047775268555, | |
"weight_norm": 433.0433044433594, | |
"timestamp": "2024-08-18T20:57:27.975308" | |
} | |
total tokens: 7025 num samples: 5 num padding tokens: 744 - rank: 2 max len: 1405 min len: 1116 avg len: 1256.2 num_loss_counted_tokens: 4200 | |
total tokens: 7672 num samples: 7 num padding tokens: 693 - rank: 3 max len: 1096 min len: 866 avg len: 997.0 num_loss_counted_tokens: 5684 | |
total tokens: 8112 num samples: 13 num padding tokens: 563 - rank: 5 max len: 624 min len: 528 avg len: 580.6923076923077 num_loss_counted_tokens: 5251 | |
total tokens: 7856 num samples: 16 num padding tokens: 1465 - rank: 6 max len: 491 min len: 299 avg len: 399.4375 num_loss_counted_tokens: 3069 | |
total tokens: 7812 num samples: 28 num padding tokens: 2476 - rank: 7 max len: 279 min len: 86 avg len: 190.57142857142858 num_loss_counted_tokens: 2263 | |
total tokens: 6210 num samples: 3 num padding tokens: 683 - rank: 0 max len: 2070 min len: 1654 avg len: 1842.3333333333333 num_loss_counted_tokens: 1033 | |
Per-token loss scaled by world size: 0.00016547582345083356 | |
Per-token loss scaled by world size: 0.0002693594142328948 | |
Per-token loss scaled by world size: 0.00031613183091394603 | |
Per-token loss scaled by world size: 5.018114825361408e-05 | |
Per-token loss scaled by world size: 0.0004250952915754169 | |
Per-token loss scaled by world size: 0.00035303577897138894 | |
Per-token loss scaled by world size: 5.689787940355018e-05 | |
Epoch: 1, Step: 216, Rank: 5, loss = 1.0305107831954956 | |
Epoch: 1, Step: 216, Rank: 2, loss = 0.5394098162651062 | |
Epoch: 1, Step: 216, Rank: 7, loss = 0.8780443072319031 | |
Epoch: 1, Step: 216, Rank: 1, loss = 0.16357800364494324 | |
Epoch: 1, Step: 216, Rank: 0, loss = 0.18547286093235016 | |
Epoch: 1, Step: 216, Rank: 4, loss = 1.3857043981552124 | |
Epoch: 1, Step: 216, Rank: 3, loss = 1.150808334350586 | |
Per-token loss scaled by world size: 0.00032035927870310843 | |
Epoch: 1, Step: 216, Rank: 6, loss = 1.0442911386489868 | |
[2024-08-18 20:57:30,497] [INFO] [logging.py:96:log_dist] [Rank 0] step=6, skipped=0, lr=[4.800000000000001e-06], mom=[(0.9, 0.95)] | |
[2024-08-18 20:57:30,575] [INFO] [timer.py:258:stop] epoch=0/micro_step=216/global_step=6, RunningAvgSamplesPerSec=41.682838913878484, CurrSamplesPerSec=41.70386986575945, MemAllocated=22.89GB, MaxMemAllocated=30.61GB | |
total tokens: 8096 num samples: 11 num padding tokens: 403 - rank: 4 max len: 736 min len: 676 avg len: 699.3636363636364 num_loss_counted_tokens: 3366 | |
total tokens: 7895 num samples: 5 num padding tokens: 1102 - rank: 1 max len: 1579 min len: 1157 avg len: 1358.6 num_loss_counted_tokens: 2579 | |
{ | |
"epoch": 1, | |
"step": 216, | |
"rank": 0, | |
"loss": 0.18547286093235016, | |
"overall_throughput": 40.61028464966421, | |
"lr": 4.800000000000001e-06, | |
"cuda_mem_allocated": 22.89185380935669, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 26078, | |
"batch_size": 113, | |
"total_loss": 0.7972275614738464, | |
"gradnorm": 0.7308346033096313, | |
"weight_norm": 433.0433349609375, | |
"timestamp": "2024-08-18T20:57:30.638455" | |
} | |
total tokens: 7917 num samples: 7 num padding tokens: 347 - rank: 2 max len: 1131 min len: 1017 avg len: 1081.4285714285713 num_loss_counted_tokens: 5762 | |
total tokens: 7920 num samples: 12 num padding tokens: 1053 - rank: 5 max len: 660 min len: 459 avg len: 572.25 num_loss_counted_tokens: 3495 | |
total tokens: 7784 num samples: 8 num padding tokens: 989 - rank: 3 max len: 973 min len: 781 avg len: 849.375 num_loss_counted_tokens: 3166 | |
total tokens: 7786 num samples: 17 num padding tokens: 2184 - rank: 6 max len: 458 min len: 242 avg len: 329.52941176470586 num_loss_counted_tokens: 3198 | |
total tokens: 7887 num samples: 33 num padding tokens: 2326 - rank: 7 max len: 239 min len: 82 avg len: 168.5151515151515 num_loss_counted_tokens: 2289 | |
total tokens: 7284 num samples: 3 num padding tokens: 783 - rank: 0 max len: 2428 min len: 1739 avg len: 2167.0 num_loss_counted_tokens: 406 | |
Per-token loss scaled by world size: 0.0005763740628026426 | |
Per-token loss scaled by world size: 0.0009438088163733482 | |
Per-token loss scaled by world size: 6.679360376438126e-05 | |
Per-token loss scaled by world size: 0.0001800585159799084 | |
Per-token loss scaled by world size: 0.0003732589539140463 | |
Per-token loss scaled by world size: 7.849858957342803e-05 | |
Per-token loss scaled by world size: 0.00011912822810700163 | |
Epoch: 1, Step: 217, Rank: 1, loss = 0.1438567191362381 | |
Epoch: 1, Step: 217, Rank: 5, loss = 2.0327281951904297 | |
Epoch: 1, Step: 217, Rank: 3, loss = 0.38780102133750916 | |
Epoch: 1, Step: 217, Rank: 6, loss = 1.241365671157837 | |
Epoch: 1, Step: 217, Rank: 0, loss = 0.16906633973121643 | |
Epoch: 1, Step: 217, Rank: 4, loss = 0.8039065003395081 | |
Epoch: 1, Step: 217, Rank: 2, loss = 0.256572425365448 | |
Per-token loss scaled by world size: 0.0005111052305437624 | |
Epoch: 1, Step: 217, Rank: 7, loss = 1.1007928848266602 | |
total tokens: 6974 num samples: 2 num padding tokens: 562 - rank: 1 max len: 3487 min len: 2925 avg len: 3206.0 num_loss_counted_tokens: 216 | |
total tokens: 7728 num samples: 7 num padding tokens: 633 - rank: 4 max len: 1104 min len: 930 avg len: 1013.5714285714286 num_loss_counted_tokens: 5349 | |
{ | |
"epoch": 1, | |
"step": 217, | |
"rank": 0, | |
"loss": 0.16906633973121643, | |
"overall_throughput": 41.92863106467066, | |
"lr": 4.800000000000001e-06, | |
"cuda_mem_allocated": 24.260313034057617, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 17230, | |
"batch_size": 69, | |
"total_loss": 0.767011284828186, | |
"gradnorm": 0.7308346033096313, | |
"weight_norm": 433.0433349609375, | |
"timestamp": "2024-08-18T20:57:33.123706" | |
} | |
total tokens: 4065 num samples: 1 num padding tokens: 0 - rank: 0 max len: 4065 min len: 4065 avg len: 4065.0 num_loss_counted_tokens: 82 | |
total tokens: 6741 num samples: 3 num padding tokens: 1158 - rank: 2 max len: 2247 min len: 1528 avg len: 1861.0 num_loss_counted_tokens: 930 | |
total tokens: 7992 num samples: 27 num padding tokens: 3238 - rank: 7 max len: 296 min len: 81 avg len: 176.07407407407408 num_loss_counted_tokens: 2221 | |
total tokens: 7632 num samples: 12 num padding tokens: 2411 - rank: 6 max len: 636 min len: 301 avg len: 435.0833333333333 num_loss_counted_tokens: 3582 | |
total tokens: 7080 num samples: 5 num padding tokens: 814 - rank: 3 max len: 1416 min len: 1134 avg len: 1253.2 num_loss_counted_tokens: 2449 | |
total tokens: 8091 num samples: 9 num padding tokens: 1045 - rank: 5 max len: 899 min len: 643 avg len: 782.8888888888889 num_loss_counted_tokens: 5684 | |
Per-token loss scaled by world size: 0.0003471940290182829 | |
Per-token loss scaled by world size: 0.0005411332240328193 | |
Per-token loss scaled by world size: 6.277004104049411e-06 | |
Per-token loss scaled by world size: 0.0005339714116416872 | |
Per-token loss scaled by world size: 4.73601221528952e-06 | |
Per-token loss scaled by world size: 0.00031345669412985444 | |
Epoch: 1, Step: 218, Rank: 5, loss = 1.1664127111434937 | |
Per-token loss scaled by world size: 5.614342057924659e-07 | |
Epoch: 1, Step: 218, Rank: 1, loss = 0.010208474472165108 | |
Epoch: 1, Step: 218, Rank: 3, loss = 0.748376727104187 | |
Epoch: 1, Step: 218, Rank: 0, loss = 0.013530082069337368 | |
Epoch: 1, Step: 218, Rank: 4, loss = 1.1509753465652466 | |
Epoch: 1, Step: 218, Rank: 7, loss = 0.6756559014320374 | |
Epoch: 1, Step: 218, Rank: 2, loss = 0.0012101713800802827 | |
Per-token loss scaled by world size: 0.00039466869202442467 | |
Epoch: 1, Step: 218, Rank: 6, loss = 0.8507083654403687 | |
total tokens: 6480 num samples: 3 num padding tokens: 206 - rank: 1 max len: 2160 min len: 1983 avg len: 2091.3333333333335 num_loss_counted_tokens: 1674 | |
total tokens: 7760 num samples: 10 num padding tokens: 447 - rank: 4 max len: 776 min len: 688 avg len: 7 |