
Commit fddb146

Author: Thomas Markovich
Commit message: Expanded readme
1 parent 334164f

File tree

1 file changed: +23 -4 lines

torchbiggraph/examples/e2e/README.md

Lines changed: 23 additions & 4 deletions
@@ -1,8 +1,8 @@
 This is intended as a simple end-to-end example of how to get your data into
 the format that PyTorch BigGraph expects using SQL. It's implemented in SQLite
 for portability, but similar techniques scale to billions of edges using cloud
-databases such as BigQuery. This pipeline can be split into three different
-components:
+databases such as BigQuery or Snowflake. This pipeline can be split into three
+different components:
 
 1. Data preparation
 2. Data verification/checking
@@ -17,5 +17,24 @@ pedagogical purpose.
 
 In the data preparation stage, we first load the graph
 into a SQLite database and then we transform and partition it. The transformation
-can be understood as first generating a mapping between the graph-ids and
-ordinal ids per-type that PBG will expect.
+can be understood as first partitioning the entities, then generating a mapping
+between the graph-ids and ordinal ids per-type that PBG will expect, and finally
+writing out all the files required to train, including the config file. By
+keeping track of the vertex types, we're able to specifically verify our mappings
+in a fully self-consistent fashion.
+
+Once the data has been prepared and generated, we're ready to embed the graph. We
+do this by passing the generated config to `torchbiggraph_train` in the following
+way:
+
+```
+torchbiggraph_train \
+    path/to/generated/config.py
+```
+
+The `data_prep.py` script will also compute the approximate amount of shared memory
+that will be needed for training. If the training demands exceed the available
+shared memory, you'll need to regenerate your data with more partitions than
+what you currently have. If you're seeing either a bus error or an OOM-kill
+message in the kernel ring buffer but your machine has enough RAM, you'll want to
+verify that `/dev/shm` is large enough to accommodate your embedding table.
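For readers following along, the per-type id mapping that the expanded README describes can be sketched with a few lines of Python on top of SQLite. The schema below (an `entities(graph_id, type, partition)` table feeding an `entity_map` table) is an illustrative assumption, not the actual schema used by the example's `data_prep.py`:

```
import sqlite3

# Minimal sketch of the per-type ordinal-id mapping, assuming a hypothetical
# entities(graph_id, type, partition) table; real table and column names
# belong to the example's data_prep.py and may differ.
conn = sqlite3.connect("graph.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS entity_map (
        graph_id   TEXT,
        type       TEXT,
        partition  INTEGER,
        ordinal_id INTEGER
    )
""")

# Within each (type, partition) bucket, number entities 0..N-1; this dense,
# per-type ordering is the layout PBG expects for its entity files.
rows = conn.execute(
    "SELECT graph_id, type, partition FROM entities "
    "ORDER BY type, partition, graph_id"
)
counters = {}  # (type, partition) -> next ordinal id
mapping = []
for graph_id, etype, part in rows:
    ordinal = counters.get((etype, part), 0)
    counters[(etype, part)] = ordinal + 1
    mapping.append((graph_id, etype, part, ordinal))

conn.executemany("INSERT INTO entity_map VALUES (?, ?, ?, ?)", mapping)
conn.commit()
```

With such a mapping in place, edges can be rewritten into ordinal ids by joining the edge list against `entity_map` on both endpoints before writing out the partitioned edge files.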

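The "generated config" that the training command consumes is a Python file that PBG imports. A minimal sketch of what such a file can look like is below; the paths, entity and relation names, and hyperparameters are placeholders rather than what `data_prep.py` actually emits:

```
# Sketch of the kind of config file torchbiggraph_train consumes. The keys
# follow PBG's documented config schema, but every path, entity type,
# relation, and hyperparameter here is a placeholder.
def get_torchbiggraph_config():
    return dict(
        entity_path="data/example_graph",
        edge_paths=["data/example_graph/edges_partitioned"],
        checkpoint_path="model/example_graph",
        entities={
            "user": {"num_partitions": 4},
            "item": {"num_partitions": 4},
        },
        relations=[
            {"name": "clicked", "lhs": "user", "rhs": "item", "operator": "none"},
        ],
        dimension=200,
        num_epochs=10,
    )
```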
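The shared-memory caveat at the end of the diff comes down to simple arithmetic: the embedding tables resident during a training bucket must fit in `/dev/shm`. A rough back-of-the-envelope check, assuming float32 embeddings and roughly one partition per side held at a time (the real footprint depends on PBG's bucket scheduling), might look like this:

```
import shutil

def approx_shm_bytes(entities_per_partition, dimension,
                     partitions_resident=2, bytes_per_float=4):
    # Rough size of the embedding tables held in shared memory at once,
    # assuming float32 embeddings and about one lhs plus one rhs partition
    # resident per bucket. Treat this as an estimate, not PBG's exact number.
    return entities_per_partition * dimension * bytes_per_float * partitions_resident

# Hypothetical numbers: 50M entities per partition, 200-dimensional embeddings.
needed = approx_shm_bytes(entities_per_partition=50_000_000, dimension=200)
available = shutil.disk_usage("/dev/shm").free

if needed > available:
    print(f"Need ~{needed / 2**30:.1f} GiB of shared memory but /dev/shm has "
          f"only {available / 2**30:.1f} GiB free; add partitions or grow /dev/shm.")
```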