write _shard_map; refactor flash attention to support 5d inputs. #8730

Merged: 9 commits merged into master from hanq_shard_map on Mar 3, 2025

Conversation

@qihqi (Collaborator) commented on Feb 21, 2025

JAX's shard_map works by enabling manual sharding on the inputs and disabling it on the outputs. Here we introduce _shard_map to simulate that behavior. This is sufficient to support the use case of calling Pallas kernels. It is not sufficient for other uses of shard_map, such as issuing manual collectives, because the current collective implementations (e.g. xm.all_gather) fail their checks under manual sharding.
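
A minimal sketch of the idea, assuming the existing torch_xla xs.enable_manual_sharding / xs.disable_manual_sharding helpers; the name and signature below are illustrative, and the actual _shard_map added in this PR may differ:

import torch_xla.distributed.spmd as xs

def _shard_map_sketch(func, mesh, input_specs, output_specs):
    # Illustrative wrapper: run `func` on per-device shards under manual sharding.
    def wrapped(*args):
        # Enable manual sharding on every input, so `func` only sees the
        # local shard of each tensor.
        local_args = [
            xs.enable_manual_sharding(a, spec, mesh=mesh).global_tensor
            for a, spec in zip(args, input_specs)
        ]
        outs = func(*local_args)
        if not isinstance(outs, (tuple, list)):
            outs = (outs,)
        # Disable manual sharding on the outputs, restoring their full
        # (global) shapes for the rest of the SPMD program.
        return tuple(
            xs.disable_manual_sharding(o, spec, full_shape, mesh=mesh).global_tensor
            for o, (spec, full_shape) in zip(outs, output_specs))

    return wrapped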

@qihqi force-pushed the hanq_shard_map branch 2 times, most recently from 0a27ca4 to 8a5a29f, on February 27, 2025 05:19
@qihqi requested a review from pgmoka on February 27, 2025 05:19
@qihqi changed the title from "write _shard_map; refactor flash attention to use it." to "write _shard_map; refactor flash attention to support 5d inputs." on Feb 27, 2025
@qihqi requested a review from tengyifei on February 27, 2025 05:25
@qihqi force-pushed the hanq_shard_map branch 2 times, most recently from d885e7b to 8983df7, on February 28, 2025 04:21
@tengyifei (Collaborator) left a comment


Makes sense to refactor flash_attention to be shard_map(local_flash_attention)
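
For illustration only, the refactor described above would look roughly like the following, reusing the hypothetical _shard_map_sketch from the PR description; the kernel and variable names are assumed, not the actual ones in this PR:

# `local_flash_attention` stands for the per-shard Pallas kernel call; q, k, v
# are the full 5d inputs and `partition_spec` shards their leading dims.
sharded_flash_attention = _shard_map_sketch(
    local_flash_attention,
    mesh=mesh,
    input_specs=[partition_spec] * 3,
    output_specs=[(partition_spec, q.shape)])
o = sharded_flash_attention(q, k, v)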

Comment on lines +91 to +94
q = torch.randn(4, 2, 2, 128, 4).to("xla")
k = torch.randn(4, 2, 2, 128, 4).to("xla")
v = torch.randn(4, 2, 2, 128, 4).to("xla")

Collaborator commented:

In test_flash_attention_spmd_data_parallel, (8, 2, 128, 8) is used. Would the equivalent here be something like (8, 2, 2, 128, 8)?

@qihqi (Collaborator Author) replied:

(8, 2, 2, 128, 8) would also work.
In general, each sharded axis has to be a multiple of the number of devices along that mesh axis.
In this case the mesh is (n_devices // 2, 2, 1, 1, 1), i.e. (4, 2, 1, 1, 1) on 8 devices, so having (4, 2) as the leading dims is sufficient.

For TPUs with 4 devices the mesh would be (2, 2, 1, 1, 1), and (4, 2) is still a multiple of that, so this configuration is general.
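
A quick check of that divisibility rule, using the shapes discussed above (the helper is just for illustration):

def check_shardable(tensor_shape, mesh_shape):
    # Each tensor dimension must be divisible by the mesh size on that axis.
    for dim, devices in zip(tensor_shape, mesh_shape):
        assert dim % devices == 0, f"dim {dim} not divisible by {devices} devices"

n_devices = 8
mesh_shape = (n_devices // 2, 2, 1, 1, 1)           # (4, 2, 1, 1, 1)
check_shardable((4, 2, 2, 128, 4), mesh_shape)      # shape used in this test
check_shardable((8, 2, 2, 128, 8), mesh_shape)      # the suggested shape also works

# On a 4-device TPU the mesh becomes (2, 2, 1, 1, 1); (4, 2, ...) still divides.
check_shardable((4, 2, 2, 128, 4), (2, 2, 1, 1, 1))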

@qihqi requested reviews from pgmoka and tengyifei on March 1, 2025 01:02
@tengyifei (Collaborator) commented

Still need to fix the TPU test, which looks relevant.

@qihqi merged commit cee0820 into master on Mar 3, 2025
23 checks passed