# prior_knowledge_matrix_for_sequence_tagging

**Paper name (useful phrases)**:
* Bring prior knowledge
* Constraints in sequence tagging (and possibly beyond)
* Use this knowledge in the training process
* Add it to the loss
* Prior knowledge matrix
* Faster convergence
* Unsupervised (no labels are needed to compute the Gumbel loss; see the sketch below)

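A minimal sketch of the prior-knowledge loss idea, assuming a 0/1 matrix `allowed` where `allowed[i, j] = 1` iff tag `j` may follow tag `i`. All names, shapes, and the toy constraint below are illustrative assumptions, not the final design:

```python
import torch
import torch.nn.functional as F

num_tags, batch, seq_len = 5, 2, 7

# Prior knowledge matrix: allowed[i, j] = 1 if tag j may follow tag i
allowed = torch.ones(num_tags, num_tags)
allowed[0, 2] = 0.0  # e.g. an I- tag may not directly follow O in IOB2

logits = torch.randn(batch, seq_len, num_tags)  # model outputs (unnormalized)

# Differentiable (approximately one-hot) tag samples; no gold labels needed.
# hard=True would instead give straight-through one-hot samples.
samples = F.gumbel_softmax(logits, tau=1.0, hard=False, dim=-1)

# Expected "forbiddenness" of each adjacent tag pair:
# samples[t] @ (1 - allowed) @ samples[t + 1]
prev, nxt = samples[:, :-1], samples[:, 1:]
penalty = torch.einsum('bsi,ij,bsj->bs', prev, 1.0 - allowed, nxt)
prior_loss = penalty.mean()
```

Because the penalty only needs the model's own sampled predictions, it can be computed on unlabeled text, which is what makes the term unsupervised.
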
**Data**:
* [Datasets for Entity Recognition](https://github.com/juand-r/entity-recognition-datasets) - use this!
* [Annotated Corpus for Named Entity Recognition](https://www.kaggle.com/abhinavwalia95/entity-annotated-corpus/kernels)

**Links for relevant papers, articles, implementations**:
* [Categorical Reparameterization with Gumbel-Softmax](https://arxiv.org/abs/1611.01144)
* [The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables](https://arxiv.org/abs/1611.00712)
* [Neural Networks gone wild! They can sample from discrete distributions now!](https://anotherdatum.com/gumbel-gan.html)
* [ ] [Anticipation-RNN: enforcing unary constraints in sequence generation, with application to interactive music generation](https://link.springer.com/article/10.1007/s00521-018-3868-4)
* [ ] [Enhancing Neural Sequence Labeling with Position-Aware Self-Attention](https://arxiv.org/pdf/1908.09128.pdf)
* [ ] [Inference constraints (not in training) in allennlp](https://github.com/allenai/allennlp/blob/master/allennlp/modules/conditional_random_field.py)
* [ ] [Prior initialization for impossible transitions (-10000)](https://github.com/threelittlemonkeys/lstm-crf-pytorch/blob/master/model.py) - see the sketch after this list

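A hedged sketch of the "-10000 initialization" trick from the last link: impossible tag transitions get a large negative score in a CRF-style transition matrix, so decoding effectively never selects them. The toy tag set is an assumption:

```python
import torch

tags = ['O', 'B-PER', 'I-PER', 'B-LOC', 'I-LOC']  # toy IOB2 tag set
num_tags = len(tags)

# transitions[i, j]: score of moving from tag i to tag j
transitions = torch.randn(num_tags, num_tags)
IMPOSSIBLE = -10000.0

for i, src in enumerate(tags):
    for j, dst in enumerate(tags):
        # IOB2 rule: I-X may only follow B-X or I-X of the same type X
        if dst.startswith('I-') and src[2:] != dst[2:]:
            transitions[i, j] = IMPOSSIBLE
```
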
**Useful links**:
* [Attention? Attention!](https://lilianweng.github.io/lil-log/2018/06/24/attention-attention.html) - Lilian Weng Blog
* [The Illustrated Transformer](http://jalammar.github.io/illustrated-transformer/) - Jay Alammar Blog
* [The Annotated Transformer](http://nlp.seas.harvard.edu/2018/04/03/attention.html) - Harvard Blog
* [15 Free Datasets and Corpora for Named Entity Recognition (NER)](https://lionbridge.ai/datasets/15-free-datasets-and-corpora-for-named-entity-recognition-ner/)
* [PyTorch-Tutorial-to-Sequence-Labeling](https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Sequence-Labeling)
* [Sequence tagging example](http://www.cse.chalmers.se/~richajo/nlp2019/l6/Sequence%20tagging%20example.html)
* NER: [tutorial](https://cs230.stanford.edu/blog/namedentity/) - [github](https://github.com/cs230-stanford/cs230-code-examples)
* [Approaching a Named Entity Recognition (NER) — End to End Steps](https://mc.ai/approaching-a-named-entity-recognition-ner%E2%80%8A-%E2%80%8Aend-to-end-steps/)
* [Named Entity Recognition on CoNLL dataset using BiLSTM+CRF implemented with Pytorch](https://pythonawesome.com/named-entity-recognition-on-conll-dataset-using-bilstm-crf-implemented-with-pytorch/)
* [Named Entity Recognition with BiLSTM-CNNs](https://medium.com/illuin/named-entity-recognition-with-bilstm-cnns-632ba83d3d41)
* [How does pytorch backprop through argmax?](https://stackoverflow.com/questions/54969646/how-does-pytorch-backprop-through-argmax)
* [Differentiable Argmax!](https://lucehe.github.io/differentiable-argmax/) - see the straight-through sketch after this list
* [Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation](https://arxiv.org/pdf/1308.3432.pdf)
* [nn.Embedding source code](https://pytorch.org/docs/stable/_modules/torch/nn/modules/sparse.html#Embedding)
* [What is the correct way to use OHE lookup table for a pytorch RNN?](https://stackoverflow.com/questions/57632084/what-is-the-correct-way-to-use-ohe-lookup-table-for-a-pytorch-rnn)
* Beam Search: [Andrew Ng YouTube](https://youtu.be/RLWuzLLSIgw) - [How to Implement a Beam Search Decoder for Natural Language Processing](https://machinelearningmastery.com/beam-search-decoder-natural-language-processing/)
* [Backpropgating error to emedding matrix](https://datascience.stackexchange.com/questions/33041/backpropgating-error-to-emedding-matrix)
* [Inside–outside–beginning (tagging)](https://en.wikipedia.org/wiki/Inside–outside–beginning_(tagging))
* [Gumbel-Softmax trick vs Softmax with temperature](https://datascience.stackexchange.com/questions/58376/gumbel-softmax-trick-vs-softmax-with-temperature)
* PyTorchLightning: [github](https://github.com/PyTorchLightning/pytorch-lightning) - [docs](https://pytorch-lightning.readthedocs.io/en/stable/) - [medium](https://towardsdatascience.com/from-pytorch-to-pytorch-lightning-a-gentle-introduction-b371b7caaf09)
* [PyTorch Lightning vs PyTorch Ignite vs Fast.ai](https://towardsdatascience.com/pytorch-lightning-vs-pytorch-ignite-vs-fast-ai-61dc7480ad8a)

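A short sketch of the straight-through argmax discussed in the links above: the forward pass returns the hard one-hot vector while gradients flow through the softmax. This mirrors how `F.gumbel_softmax(..., hard=True)` is implemented internally, but the function name here is our own:

```python
import torch

def st_argmax(logits: torch.Tensor) -> torch.Tensor:
    """One-hot argmax in the forward pass, softmax gradients in the backward pass."""
    soft = torch.softmax(logits, dim=-1)
    index = soft.argmax(dim=-1, keepdim=True)
    hard = torch.zeros_like(soft).scatter_(-1, index, 1.0)
    # Value equals `hard`, but autograd differentiates through `soft`
    return (hard - soft).detach() + soft
```
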
**Architecture**:
* RNN with attention
* BERT and friends

**Hypotheses**:
* [ ] Does *gumbel-softmax* take log probabilities as input? (The PyTorch docs describe the input as unnormalized log probabilities, i.e. logits)
* [ ] Compare a basic configuration with *two-token constraints* (a matrix) against *three-token constraints* and beyond (a tensor) in terms of convergence and results (although a NN can capture difficult dependencies from data, it still struggles to capture temporal dependencies over tokens, especially in long sequence patterns)
* [ ] Check filtering of the PAD token in the Gumbel loss term - see the masking sketch after this list
* [ ] Play with the *hard* parameter of PyTorch's *F.gumbel_softmax*
* [ ] Play with *beam search* (see the links and the sketch after this list)
* [ ] Check *FP/FN* rates when using *gumbel-softmax*
* [ ] Try using *softmax* probabilities instead of *gumbel-softmax* samples, and decide what strategy to use
* [ ] Try *argmax* with backpropagation (see the straight-through sketch above)
* [ ] Since the loss is a sum of the *cross-entropy loss* and the *prior-knowledge loss*, compare their orders of magnitude and find out how to select *lambda* automatically (e.g. as the ratio of the two losses) - see the sketch after this list
* [ ] Play with the inverted matrix (swap 0s and 1s) and understand how it affects the final score
* [ ] Implement an *nn.Module* for efficient computation and describe it in the paper (see the embedding links above)
* [ ] Use two classifiers (neural networks): the first for *tokenization*, the second for *classification* (halving the number of classes)
* [ ] Play with the knowledge matrix initialization (0/1 or something else)
* [ ] (additional) Try a seq2seq architecture for sequence labelling (maybe find a relevant paper and add it to the references)
* [ ] (additional) Try seq2seq constraints (output length matches input length)

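A hypothetical sketch of the PAD-filtering hypothesis: zero out the prior-knowledge penalty for adjacent pairs that touch a PAD position. `PAD_IDX`, the toy batch, and all shapes are assumptions; `penalty` plays the role of the per-pair term from the first sketch above:

```python
import torch

PAD_IDX = 0
token_ids = torch.tensor([[5, 3, 7, PAD_IDX, PAD_IDX]])  # (batch, seq_len)

not_pad = (token_ids != PAD_IDX).float()
pair_mask = not_pad[:, :-1] * not_pad[:, 1:]  # a pair counts iff both tokens are real
penalty = torch.rand(1, 4)                    # stand-in for the per-pair penalty
prior_loss = (penalty * pair_mask).sum() / pair_mask.sum().clamp(min=1)
```
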
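One hedged option for the automatic *lambda* hypothesis: rescale the prior-knowledge loss to the current magnitude of the cross-entropy loss, detaching the ratio so it acts as a coefficient rather than a gradient path:

```python
# ce_loss and prior_loss are the two scalar loss terms from the sketches above
lam = ce_loss.detach() / prior_loss.detach().clamp(min=1e-8)
loss = ce_loss + lam * prior_loss
```
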
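And a small beam-search sketch for the decoding hypothesis, combining per-step emission log-probabilities with a transition score matrix (the function name and shapes are our own assumptions; see the beam search links above):

```python
import torch

def beam_search(emissions, transitions, beam=3):
    # emissions: (seq_len, num_tags); transitions[i, j]: score of tag i -> tag j
    num_tags = emissions.size(1)
    beams = [([t], emissions[0, t].item()) for t in range(num_tags)]
    beams = sorted(beams, key=lambda b: b[1], reverse=True)[:beam]
    for step in range(1, emissions.size(0)):
        candidates = [
            (seq + [t], score + transitions[seq[-1], t].item() + emissions[step, t].item())
            for seq, score in beams
            for t in range(num_tags)
        ]
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam]
    return beams[0][0]  # highest-scoring tag sequence found
```

Passing the (-10000)-masked transition matrix from the earlier sketch would make decoding respect the prior constraints.
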
**TODO**:
* [ ] Understand how to submit a preprint to arXiv, what the publication rules are, and other conference details
* [ ] Find datasets to compare against (NER, POS, etc.)
* [ ] Make a table on another page (or another board) for hypothesis results
* [ ] Add POS-tag links to the useful links
* [ ] Decide which papers to use for structure (follow conference restrictions)
* [ ] Compare our implementation vs cs230-code-examples (on their dataset)
* [ ] (additional) Maybe use Docker during the experiments and afterwards (final project version)

**Rules**:
* Save all references to use them later in the paper
* Create issues for hypotheses
* Use PyTorchLightning (a minimal skeleton is sketched below)
* Use a separate folder for each hypothesis

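A minimal hedged skeleton for the PyTorchLightning rule; the BiLSTM tagger, hyperparameters, and PAD index are illustrative assumptions, not the final design:

```python
import pytorch_lightning as pl
import torch
import torch.nn as nn
import torch.nn.functional as F

class TaggerModule(pl.LightningModule):
    def __init__(self, vocab_size=10000, num_tags=9, emb_dim=100, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.rnn = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_tags)

    def forward(self, tokens):
        out, _ = self.rnn(self.emb(tokens))
        return self.head(out)  # (batch, seq_len, num_tags)

    def training_step(self, batch, batch_idx):
        tokens, tags = batch
        logits = self(tokens)
        # cross_entropy expects (batch, classes, seq_len); PAD tags are ignored
        loss = F.cross_entropy(logits.transpose(1, 2), tags, ignore_index=0)
        self.log('train_loss', loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)
```

The prior-knowledge term from the sketches above would be added inside `training_step` once its weighting is settled.
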
**Metrics**:

**NER Datasets**

A project to collect datasets for named entity recognition using Git LFS:
* [15 Free Datasets and Corpora for Named Entity Recognition (NER)](https://lionbridge.ai/datasets/15-free-datasets-and-corpora-for-named-entity-recognition-ner/)
* [NeuroNER](https://github.com/Franck-Dernoncourt/NeuroNER)
* [Datasets for Entity Recognition](https://github.com/juand-r/entity-recognition-datasets)