pytorch
diff --git a/.github/workflows/torch_xla2.yml b/.github/workflows/torch_xla2.yml
Lines changed: 1 addition & 2 deletions
diff --git a/torchax/README.md b/torchax/README.md
Lines changed: 158 additions & 93 deletions
diff --git a/torchax/pyproject.toml b/torchax/pyproject.toml
Lines changed: 31 additions & 10 deletions
@@ -39,7 +39,6 @@ jobs:
         run: |
           pip install -r test-requirements.txt
           pip install -e .[cpu]
-          pip install tensorflow-cpu  # for TF integrations tests
       - name: Run tests
         working-directory: torchax
         shell: bash
@@ -52,7 +51,7 @@ jobs:
           pytest test/test_context.py
           pytest test/test_train.py
           pytest test/test_mutations.py
-          pytest test/test_tf_integration.py
+          # pytest test/test_tf_integration.py # TODO(8770)
           pytest test/gemma/test_gemma.py
           pytest test/llama/test_llama.py
           pytest test/test_core_aten_ops.py
 
@@ -1,72 +1,90 @@
-# torchxla2
+# torchax: Running PyTorch on TPU
 
-## Install
-
-Currently this is only source-installable. Requires Python version >= 3.10.
+**torchax** is a backend for PyTorch that allows users to run
+PyTorch on Google Cloud TPUs. **torchax** is also a library that provides
+graph-level interoperability between PyTorch and JAX.
 
-### NOTE:
+This means, with **torchax** you can:
+* Run PyTorch code on TPU with as little as 2 lines of code change.
+* Call a JAX function from a PyTorch function, passing in `jax.Array`s.
+* Call a PyTorch function from a JAX function, passing in a `torch.Tensor` subclass.
+* Use JAX features such as `jax.grad`, `optax` and `GSPMD` to train a PyTorch model.
+* Use a PyTorch model as a feature extractor and use it with a JAX model.
+And more.
 
-Please don't install torch-xla from instructions in
-https://github.com/pytorch/xla/blob/master/CONTRIBUTING.md .
-In particular, the following are not needed:
+## Install
 
-* There is no need to build pytorch/pytorch from source.
-* There is no need to clone pytorch/xla project inside of pytorch/pytorch
-  git checkout.
 
+### On Google Cloud TPU:
+First install torch CPU:
 
-TorchXLA2 and torch-xla have different installation instructions, please follow
-the instructions below from scratch (fresh venv / conda environment.)
+```bash
+pip install torch --index-url https://download.pytorch.org/whl/cpu
+```
 
+Then install jax TPU:
 
-### 1. Installing `torchax`
+```bash
+pip install -U jax[tpu]
+```
 
-The following instructions assume you are in the `torchax` directory:
+Finally install torchax:
 
-```
-Fork the repository
-$ git clone https://github.com/<github_username>/xla.git
-$ cd xla/torchax
+```bash
+pip install torchax
 ```
 
+### On GPU machines:
+First install torch CPU:
 
-#### 1.0 (recommended) Make a virtualenv / conda env
+```bash
+pip install torch --index-url https://download.pytorch.org/whl/cpu
+```
 
-If you are using VSCode, then [you can create a new environment from
-UI](https://code.visualstudio.com/docs/python/environments). Select the
-`dev-requirements.txt` when asked to install project dependencies.
+Then install jax CUDA:
 
-Otherwise create a new environment from the command line.
+```bash
+pip install -U jax[cuda12]
+```
+
+Finally install torchax:
 
 ```bash
-# Option 1: venv
-python -m venv my_venv
-source my_venv/bin/activate
+pip install torchax
+```
+
+### On CPU machines (mac included)
+First install torch CPU:
 
-# Option 2: conda
-conda create --name <your_name> python=3.10
-conda activate <your_name>
+```bash
+# Linux
+pip install torch --index-url https://download.pytorch.org/whl/cpu
 
-# Either way, install the dev requirements.
-pip install -r dev-requirements.txt
+# OR Mac:
+pip install torch
 ```
 
-Note: `dev-requirements.txt` will install the CPU-only version of PyTorch.
+Then install jax CPU:
+
+```bash
+pip install -U jax
+```
 
-#### 1.1 Install this package
+Finally install torchax:
 
-Install `torchax` from source for your platform:
 ```bash
-pip install -e .[cpu]
-pip install -e .[cuda]
-pip install -e .[tpu] -f https://storage.googleapis.com/jax-releases/libtpu_releases.html
+pip install torchax
 ```
 
-#### 1.2 (optional) verify installation by running tests
+NOTE: if you would like Metal support for Apple devices, install the
+Metal version of jax: https://developer.apple.com/metal/jax/
+
+### Installing `torchax` from source
+
+You still need to install `torch` CPU and the `jax` build for your accelerator (GPU, TPU, or none).
 
 ```bash
-pip install -r test-requirements.txt
-pytest test
+pip install git+https://github.com/pytorch/xla.git#subdirectory=torchax
 ```
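
After any of the install paths above, a quick sanity check is to ask JAX which backend and devices it sees. This is a minimal sketch that uses only public `jax` calls (`jax.default_backend()` and `jax.devices()`); it does not import `torchax` itself, and the exact output depends on your machine.

```python
import jax

# List the devices JAX can see; on a CPU-only install this is a single CPU device,
# on TPU/GPU machines it should list the accelerators instead.
devices = jax.devices()
backend = jax.default_backend()  # e.g. "cpu", "gpu" or "tpu"

print(f"JAX backend: {backend}")
print(f"JAX devices: {devices}")
```

If the backend reported here is `cpu` on a machine with an accelerator, the accelerator-specific `jax` package was most likely not installed.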
 
 ## Run a model
@@ -104,75 +122,47 @@ print(m(inputs))
 This model `m` contains 2 parts: the weights that is stored inside of the model
 and it's submodules (`nn.Linear`).
 
-To execute this model with `torchax`; we need construct and run the model
-under an `environment` that captures pytorch ops and swaps them with TPU equivalent.
-
-To create this environment: use
+To execute this model with `torchax`, we need to enable torchax to capture PyTorch ops.
+To enable this, use:
 
 ```python
 import torchax
-
-env = torchax.default_env() 
+torchax.enable_globally()
 ```
-Then, execute the instantiation of the model, as well as evaluation of model, 
-using `env` as a context manager:
+Then, a `jax` device will be available to use:
 
 ```python
-with env:
-  inputs = torch.randn(3, 3, 28, 28)
-  m = MyModel()
-  res = m(inputs)
-  print(type(res))  # outputs Tensor
+inputs = torch.randn(3, 3, 28, 28, device='jax')
+m = MyModel()
+res = m(inputs)
+print(type(res))  # outputs torchax.tensor.Tensor
 ```
 
-You can also enable the environment globally with
-```python
-import torchax
-
-torchax.enable_globally() 
-```
+`torchax.tensor.Tensor` is a `torch.Tensor` subclass that holds
+a `jax.Array`. You can inspect that JAX array with `res.jax()`.
 
-Then everything afterwards is run with XLA.
 
 ## What is happening behind the scene:
 
-When a torch op is executed inside of `env` context manager, we can swap out the 
-implementation of that op with a version that runs on TPU. 
-When a model's constructor runs, it will call some tensor constructor, such as
-`torch.rand`, `torch.ones` or `torch.zeros` etc to create its weights. Those
-ops are captured by `env` too and placed directly on TPU.
-
-See more at [how_it_works](docs/how_it_works.md) and [ops registry](docs/ops_registry.md).
-
-### What if I created model outside of `env`.
-
-So if you have
-
-```
-m = MyModel()
-```
-outside of env, then regular torch ops will run when creating this model.
-Then presumably the model's weights will be on CPU (as instances of `torch.Tensor`).
+We took the approach detailed in the [new device](https://github.com/albanD/subclass_zoo/blob/main/new_device.py) recipe by Alban (@albanD), using `jax.Array` for the `raw_data`.
 
-To move this model into XLA device, one can use `env.to_xla()` function.
+In other words, when a torch op is executed inside the `env` context manager (which is enabled with `torchax.enable_globally()`), we swap out the
+implementation of that op for one written in JAX.
 
-i.e.
-```
-m2 = env.to_xla(m)
-inputs = env.to_xla(inputs)
+When a model's constructor runs, it will call some tensor constructor, such as
+`torch.rand`, `torch.ones` or `torch.zeros` etc to create its weights. The constructor
+will create a `torch.Tensor` subclass that contains a `jax.Array`.
 
-with env:
-  res = m2(inputs)
-```
+Then, each subsequent op can unpack the `jax.Array`, call the op implementation,
+and wrap the result back into a `torch.Tensor` subclass.
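
The unpack / call / wrap cycle can be illustrated with a toy sketch. This is not torchax's actual implementation: `ToyTensor`, `OPS` and `dispatch` are hypothetical names invented for this example, and a plain Python class stands in for the real `torch.Tensor` subclass. Only `jax.numpy` is used.

```python
import jax.numpy as jnp


class ToyTensor:
    """Hypothetical stand-in for a tensor subclass that holds a jax.Array."""

    def __init__(self, array):
        self._array = array

    def jax(self):
        # Expose the wrapped jax.Array.
        return self._array


# Hypothetical registry mapping torch-style op names to JAX implementations.
OPS = {
    "add": jnp.add,
    "matmul": jnp.matmul,
    "relu": lambda x: jnp.maximum(x, 0),
}


def dispatch(op_name, *args):
    # 1. Unpack: pull the jax.Array out of every wrapped argument.
    unwrapped = [a.jax() if isinstance(a, ToyTensor) else a for a in args]
    # 2. Call: run the JAX implementation registered for this op.
    result = OPS[op_name](*unwrapped)
    # 3. Wrap: return the result as a wrapper again.
    return ToyTensor(result)


x = ToyTensor(jnp.array([[1.0, -2.0], [3.0, 4.0]]))
w = ToyTensor(jnp.eye(2))
y = dispatch("relu", dispatch("matmul", x, w))
print(y.jax())
```

Because every op returns a wrapper, ops chain naturally, and the underlying arrays never leave JAX between calls.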
 
-NOTE that we also need to move inputs to xla using `.to_xla`. 
-`to_xla` works with all pytrees of `torch.Tensor`.
+See more at [how_it_works](docs/how_it_works.md) and [ops registry](docs/ops_registry.md).
 
 
 ### Executing with jax.jit
 
-The above script will execute the model using eager mode Jax as backend. This 
-does allow executing torch models on TPU, but is often slower than what we can 
+The above script will execute the model using eager mode Jax as backend. This
+does allow executing torch models on TPU, but is often slower than what we can
 achieve with `jax.jit`.
 
 `jax.jit` is a function that takes a Jax function (i.e. a function that takes jax array
@@ -190,9 +180,9 @@ def model_func(param, inputs):
   return torch.func.functional_call(m, param, inputs)
 
 ```
-Here we use [torch.func.functional_call](https://pytorch.org/docs/stable/generated/torch.func.functional_call.html) 
+Here we use [torch.func.functional_call](https://pytorch.org/docs/stable/generated/torch.func.functional_call.html)
 from PyTorch to replace the model
-weights with `param`, then call the model. This is equivalent to:
+weights with `param`, then call the model. This is roughly equivalent to:
 
 ```python
 def model_func(param, inputs):
@@ -208,4 +198,79 @@ model_func_jitted = jax_jit(model_func)
 print(model_func_jitted(new_state_dict, inputs))
 ```
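
As a standalone illustration of `torch.func.functional_call`, which `model_func` relies on, the sketch below runs a small module with substituted weights. The `nn.Linear` model and the zero/one replacement values are assumptions chosen for this example only; they are not part of the torchax examples.

```python
import torch
from torch import nn

model = nn.Linear(4, 2)
inputs = torch.randn(3, 4)

# Replacement parameters, keyed by the same names as model.state_dict().
new_params = {
    "weight": torch.zeros(2, 4),
    "bias": torch.ones(2),
}

# Run the forward pass with new_params in place of the module's own weights.
out = torch.func.functional_call(model, new_params, (inputs,))

# The module's stored parameters are left untouched by the call.
weights_unchanged = not torch.equal(model.weight, new_params["weight"])
print(out)
print(weights_unchanged)
```

Since the weights are passed as an explicit argument rather than read from module state, the resulting function is a pure function of its inputs, which is the form `jax.jit` expects.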
 
-See more examples at [eager_mode.py](examples/eager_mode.py) and the (examples folder)[examples/]
+See more examples at [eager_mode.py](examples/eager_mode.py) and the [examples folder](examples/).
+
+However, to simplify the idiom of creating a functional model and calling it with parameters,
+we also created the `JittableModule` helper class.
+
+So the above can be written as:
+
+```python
+
+from torchax.interop import JittableModule
+
+m_jitted = JittableModule(m)
+res = m_jitted(...)
+```
+
+The first time `m_jitted` is called, it triggers `jax.jit` compilation;
+subsequent calls with inputs of the same shape will then be fast.
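
The shape-dependent caching comes from `jax.jit` itself and can be observed in plain JAX. The sketch below does not use `JittableModule`; `scaled_sum` and `trace_count` are hypothetical names for this example. It counts how often the Python body of a jitted function runs, which happens only while JAX traces the function for a new input shape or dtype.

```python
import jax
import jax.numpy as jnp

trace_count = 0


@jax.jit
def scaled_sum(x):
    global trace_count
    # Python side effect: executes only while JAX traces the function,
    # not on calls that reuse an already compiled version.
    trace_count += 1
    return jnp.sum(x * 2.0)


scaled_sum(jnp.ones((4,)))   # first call: trace and compile
scaled_sum(jnp.zeros((4,)))  # same shape and dtype: reuses the compiled function
count_same_shape = trace_count

scaled_sum(jnp.ones((8,)))   # new shape: traced and compiled again
count_new_shape = trace_count

print(count_same_shape, count_new_shape)
```

Keeping input shapes stable across calls (for example by padding batches to a fixed size) avoids repeated compilation.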
+
+
+
+# Citation:
+
+@software{torchax,
+  author = {Han Qi and Chun-nien Chan and Will Cromar and Manfei Bai and Kevin Gleason},
+  title = {torchax: PyTorch on TPU and Jax interoperability},
+  url = {https://github.com/pytorch/xla/tree/master/torchax},
+  version = {0.0.4},
+  date = {2025-02-24},
+}
+
+# Maintainers & Contributors:
+
+This library is created and maintained by the PyTorch/XLA team at Google Cloud.
+
+However, it benefitted from many direct and indirect
+contributions from outside the team. Many of them were made by
+fellow Googlers using [Google's 20% project policy](https://ebsedu.org/blog/google-tapping-workplace-actualization-20-time-rule), others by partner teams.
+
+Here is the full list of contributors as of 2025-02-25.
+
+Han Qi (qihqi), Pytorch / XLA
+Manfei Bai (manfeibai), Pytorch / XLA
+Will Cromar (will-cromar), Meta
+Milad Mohammadi (miladm), Pytorch / XLA
+Siyuan Liu (lsy323), Pytorch / XLA
+Bhavya Bahl (bhavya01), Pytorch / XLA
+Pei Zhang (zpcore), Pytorch / XLA
+Yifei Teng (tengyifei), Pytorch / XLA
+Chunnien Chan (chunnienc), Google, ODML
+Alban Desmaison (albanD), Meta, Pytorch
+Simon Teo (simonteozw), Google(20%)
+David Huang (dvhg), Google(20%)
+Barni Seetharaman (barney-s), Google(20%)
+Anish Karthik (anishfish2), Google(20%)
+Yao Gu (guyao), Google(20%)
+Yenkai Wang (yenkwang), Google(20%)
+Greg Shikhman (commander), Google(20%)
+Matin Akhlaghinia (matinehAkhlaghinia), Google(20%)
+Tracy Chen (tracych477), Google(20%)
+Matthias Guenther (mrguenther), Google(20%)
+WenXin Dong (wenxindongwork), Google(20%)
+Kevin Gleason (GleasonK), Google, StableHLO
+Nupur Baghel (nupurbaghel), Google(20%)
+Gwen Mittertreiner (gmittert), Google(20%)
+Zeev Melumian (zmelumian), Lightricks
+Vyom Sharma (vyom1611), Google(20%)
+Shitong Wang (ShitongWang), Adobe
+Rémi Doreau (ayshiff), Google(20%)
+Lance Wang (wang2yn84), Google, CoreML
+Hossein Sarshar (hosseinsarshar), Google(20%)
+Daniel Vega-Myhre (danielvegamyhre), Google(20%)
+Tianqi Fan (tqfan28), Google(20%)
+Jim Lin (jimlinntu), Google(20%)
+Fanhai Lu (FanhaiLu1), Google Cloud
+DeWitt Clinton (dewitt), Google PyTorch
+Aman Gupta (aman2930), Google(20%)
@@ -4,25 +4,46 @@ build-backend = "hatchling.build"
 
 [project]
 name = "torchax"
-dependencies = [
-    "absl-py",
-    "immutabledict",
-    "pytest",
-    # Developers should install `dev-requirements.txt` first
-    "torch>=2.3.0",
-]
+dependencies = []
 requires-python = ">=3.10"
 license = {file = "LICENSE"}
 dynamic = ["version"]
+authors = [
+    {name = "Han Qi", email = "[email protected]"},
+    {name = "Pytorch/XLA team", email = "[email protected]"},
+]
+description = "torchax is a library for running PyTorch on TPU"
+readme = "README.md"
+classifiers = [
+    "Development Status :: 3 - Alpha",
+    "Intended Audience :: Developers",
+    "Intended Audience :: Education",
+    "Intended Audience :: Science/Research",
+    "License :: OSI Approved :: BSD License",
+    "Topic :: Scientific/Engineering",
+    "Topic :: Scientific/Engineering :: Mathematics",
+    "Topic :: Scientific/Engineering :: Artificial Intelligence",
+    "Topic :: Software Development",
+    "Topic :: Software Development :: Libraries",
+    "Topic :: Software Development :: Libraries :: Python Modules",
+    "Programming Language :: Python :: 3.10",
+    "Programming Language :: Python :: 3.11",
+    "Programming Language :: Python :: 3.12",
+    "Programming Language :: Python :: 3.13",
+]
+
+[project.urls]
+"Homepage" = "https://github.com/pytorch/xla/tree/master/torchax"
+
 
 [tool.hatch.version]
 path = "torchax/__init__.py"
 
 [project.optional-dependencies]
-cpu = ["jax[cpu]>=0.4.30", "jax[cpu]", "tensorflow-cpu"]
+cpu = ["jax[cpu]>=0.4.30", "jax[cpu]"]
 # Add libtpu index `-f https://storage.googleapis.com/libtpu-releases/index.html`
-tpu = ["jax[cpu]>=0.4.30", "jax[tpu]", "tensorflow-cpu"]
-cuda = ["jax[cpu]>=0.4.30", "jax[cuda12]", "tensorflow-cpu"]
+tpu = ["jax[cpu]>=0.4.30", "jax[tpu]"]
+cuda = ["jax[cpu]>=0.4.30", "jax[cuda12]"]
 odml = ["jax[cpu]>=0.4.30", "jax[cpu]"]
 
 [tool.hatch.build.targets.wheel]