Prototype API to build global tensor using local processes #8842

pgmoka · 2025-03-17T17:28:05Z

As a sub-item for enabling 2D sharding with minibatch=True, we want to create a prototype of SPMD that uses local batching. This will hopefully be the beginning of an implementation that can replace the current workaround.

pgmoka · 2025-04-03T17:40:00Z

We plan to base this prototype on the existing mark_sharding. We need an API such that each host calls TransferShardsToDevice for the devices it is associated with. For reference, the existing pipeline for mark_sharding is:

mark_sharding -> _xla_mark_sharding -> XlaMarkSharding -> CreateTensorsData -> CreateShardedData -> TransferShardsToDevice

In our prototype, we will do something like:

create_global_tensor_from_local_process_data -> _create_global_tensor_from_local_process_data -> XlaGlobalTensorFromLocalProcessData -> CreateGlobalTensorData -> CreateGlobalShardedData -> TransferShardsToDevice
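To make the intent of that flow concrete, here is a minimal pure-Python sketch of the bookkeeping involved. It is not the torch_xla implementation: `Shard`, `global_shape`, and `local_shards` are hypothetical names, and the sketch assumes every process holds an equally sized slice along the batch dimension (dim 0) and splits it evenly across its own devices.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Shard:
    """One device's slice of the global tensor (hypothetical type)."""
    device: int                 # global device ordinal
    row_range: Tuple[int, int]  # [start, stop) rows in the global tensor
    rows: List[List[float]]     # data held by this device


def global_shape(local_shape: Sequence[int], num_processes: int) -> Tuple[int, ...]:
    """Global shape, assuming equal local slices stacked along dim 0."""
    return (local_shape[0] * num_processes, *local_shape[1:])


def local_shards(local_rows: List[List[float]], process_index: int,
                 devices_per_process: int) -> List[Shard]:
    """Split this process's batch across its own devices only."""
    n = len(local_rows)
    if n % devices_per_process != 0:
        raise ValueError("local batch must divide evenly across local devices")
    per_device = n // devices_per_process
    process_offset = process_index * n  # where this process starts globally
    shards = []
    for i in range(devices_per_process):
        start = i * per_device
        shards.append(Shard(
            device=process_index * devices_per_process + i,
            row_range=(process_offset + start,
                       process_offset + start + per_device),
            rows=local_rows[start:start + per_device],
        ))
    return shards


if __name__ == "__main__":
    # Two hosts with two devices each; host 1 loads its own 4x2 local batch.
    host1 = [[float(8 + r * 2 + c) for c in range(2)] for r in range(4)]
    print(global_shape((4, 2), num_processes=2))  # (8, 2)
    for s in local_shards(host1, process_index=1, devices_per_process=2):
        print(s.device, s.row_range)  # 2 (4, 6) then 3 (6, 8)
```

The point of the sketch is that no host ever materializes the full global tensor: each one only computes the global shape and the placement of its own shards, which is the information a per-host TransferShardsToDevice call would need.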

pgmoka · 2025-04-15T17:18:36Z

I am still working on this prototype; I wanted to give a follow-up on the progress:

I first attempted to prototype only CreateGlobalTensorData -> CreateGlobalShardedData -> TransferShardsToDevice, along with something like load_local_shards_. However, I ran into a series of issues related to object creation.

I have since increased the scope of this prototype to cover the entire flow cited in #8842 (comment).

I have run into a couple of type conversion issues with xla::Shape, but I currently believe there is a path forward by leveraging xla::ShapeUtil::MakeShape. This will hopefully unblock us to run some tests.

pgmoka added the distributed label Mar 17, 2025

pgmoka self-assigned this Mar 17, 2025

ysiraichi added the enhancement label Mar 18, 2025

pgmoka changed the title ~~Prototype SPMD using local process data~~ Prototype API to build global tensor using local processes Mar 27, 2025

pgmoka linked a pull request Apr 3, 2025 that will close this issue

add CreateGlobalShardedData prototype #8932

Draft
