<p>There have been <a href="https://pair-code.github.io/interpretability/bert-tree/">several</a> <a href="https://arxiv.org/abs/1905.05950">interesting</a> <a href="https://arxiv.org/abs/1906.04341">papers</a> from the NLP community on what Transformers might be learning.
The basic premise is that performing attention on all word pairs in a sentence&ndash;with the purpose of identifying which pairs are the most interesting&ndash;enables Transformers to learn something like a <strong>task-specific syntax</strong>.
Different heads in the multi-head attention might also be &lsquo;looking&rsquo; at different syntactic properties.</p>
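<p>As a minimal sketch of that premise (the sentence length, model width and head count below are illustrative assumptions, not values from any of the papers above), here is scaled dot-product attention computed over all word pairs, split across heads:</p>

<pre><code>import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy, untrained setup: 6 "words", model width 64, 4 attention heads.
num_words, d_model, num_heads = 6, 64, 4
d_head = d_model // num_heads

x = torch.randn(num_words, d_model)   # one embedding per word
W_q = torch.randn(d_model, d_model)   # query projection
W_k = torch.randn(d_model, d_model)   # key projection

# Project, then split the width across heads: (num_heads, num_words, d_head).
Q = (x @ W_q).view(num_words, num_heads, d_head).transpose(0, 1)
K = (x @ W_k).view(num_words, num_heads, d_head).transpose(0, 1)

# Scaled dot-product scores for every word pair, separately per head.
scores = Q @ K.transpose(-2, -1) / d_head ** 0.5
attn = F.softmax(scores, dim=-1)      # (num_heads, num_words, num_words)

# attn[h, i, j] = how much head h lets word i attend to word j.
# In a trained model, the claim is that different heads put weight on
# different kinds of word pairs, i.e. different syntactic relations.
print(attn.shape)   # torch.Size([4, 6, 6])
</code></pre>

<p>With trained weights, plotting each head&rsquo;s <code>attn[h]</code> matrix is exactly the kind of probing the linked papers perform.</p>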
<p>In graph terms, by using GNNs on full graphs, can we recover the most important edges&ndash;and what they might entail&ndash;from how the GNN performs neighbourhood aggregation at each layer?
I&rsquo;m <a href="https://arxiv.org/abs/1909.07913">not so convinced</a> by this view yet.</p>
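<p>For concreteness, the probe in question amounts to something like the following sketch, where a random matrix stands in for one head&rsquo;s trained attention and the cutoff is an arbitrary illustrative choice: treat the attention matrix as a weighted, fully-connected graph over the words, and keep only the strongest edges.</p>

<pre><code>import torch

torch.manual_seed(0)

# Stand-in for one head's trained attention over a 6-word sentence:
# each row sums to 1, as after a softmax.
attn = torch.softmax(torch.randn(6, 6), dim=-1)

# "Recover the most important edges": keep pairs whose weight clears an
# (arbitrary, illustrative) threshold, turning the full graph sparse.
threshold = 0.25
edges = (attn > threshold).nonzero(as_tuple=False)
print(edges.tolist())   # candidate (word_i, word_j) edges
</code></pre>

<p>Whether the edges that survive such a cutoff track anything meaningful is precisely the point of contention.</p>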
<p>Finally, we wrote <a href="https://graphdeeplearning.github.io/publication/xu-2019-multi/">a recent paper</a> applying Transformers to sketch graphs. Do check it out!</p>
<hr>
<h4 id="updates">Updates</h4>
<p>The post is also available on <a href="https://medium.com/@chaitjo/transformers-are-graph-neural-networks-bca9f75412aa?source=friends_link&amp;sk=c54de873b2cec3db70166a6cf0b41d3e">Medium</a>, and has been translated to <a href="https://mp.weixin.qq.com/s/DABEcNf1hHahlZFMttiT2g">Chinese</a> and <a href="https://habr.com/ru/post/491576/">Russian</a>.
Do join the discussion on <a href="https://twitter.com/chaitjo/status/1233220586358181888?s=20">Twitter</a>, <a href="https://www.reddit.com/r/MachineLearning/comments/fb86mo/d_transformers_are_graph_neural_networks_blog/">Reddit</a> or <a href="https://news.ycombinator.com/item?id=22518263">HackerNews</a>!</p>
<blockquote class="twitter-tweet"><p lang="en" dir="ltr">Transformers are a special case of Graph Neural Networks. This may be obvious to some, but the following blog post does a good job at explaining these important concepts. <a href="https://t.co/H8LT2F7LqC">https://t.co/H8LT2F7LqC</a></p>&mdash; Oriol Vinyals (@OriolVinyalsML) <a href="https://twitter.com/OriolVinyalsML/status/1233783593626951681?ref_src=twsrc%5Etfw">February 29, 2020</a></blockquote><script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>