Commit 784801f

FanhaiLu1 authored and wang2yn84 committed
Fixed exhausted bug between head and workers (#163)
* add xla2 fix
* update jax version
* revert jax TPU version
1 parent 743c0e5 commit 784801f

3 files changed: +3 -2 lines changed


README.md

Lines changed: 1 addition & 0 deletions
@@ -184,6 +184,7 @@ Note: Get address ip and port information from ray head.
 Here is an example to run the server with ray for llama2 7B model:

 ```bash
+export DISABLE_XLA2_PJRT_TEST="true"
 python run_server_with_ray.py --tpu_chips=16 --num_hosts=4 --worker_chips=4 -model_name=$model_name --size=7b --batch_size=96 --max_cache_length=2048 --quantize_weights=$quantize --quantize_type=$quantize_type --quantize_kv_cache=$quantize --checkpoint_path=$output_ckpt_dir --tokenizer_path=$tokenizer_path --sharding_config="default_shardings/llama.yaml"
 ```

deps/xla

Submodule xla updated 298 files

run_interactive_multiple_host.py

Lines changed: 1 addition & 1 deletion
@@ -56,7 +56,7 @@ def create_engine():
         sharding_config=FLAGS.sharding_config,
         num_hosts=_NUM_HOSTS.value,
         worker_chips=_WORKER_CHIPS.value,
-        tpu_chips=_TPU_CHIPS,
+        tpu_chips=_TPU_CHIPS.value,
     )

     print("Initialize engine", time.perf_counter() - start)
