Text Recognition: Add script to evaluate text recognition by ICDAR2003 #71

Merged
merged 3 commits into from
Sep 7, 2022
14 changes: 14 additions & 0 deletions models/text_recognition_crnn/README.md
@@ -2,11 +2,24 @@

An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition

Accuracy measured with [tools/eval](../../tools/eval) on different text recognition datasets:

| Model name | ICDAR03(%) | IIIT5k(%) | CUTE80(%) |
|--------------|------------|-----------|-----------|
| CRNN_EN | 81.66 | 74.33 | 52.78 |
| CRNN_EN_FP16 | 82.01 | 74.93 | 52.34 |
| CRNN_CH | 71.28 | 80.90 | 67.36 |
| CRNN_CH_FP16 | 78.63 | 80.93 | 67.01 |

\*: 'FP16' stands for 'model quantized into FP16'.

Note:
- Model source:
    - `text_recognition_CRNN_EN_2021sep.onnx`: https://docs.opencv.org/4.5.2/d9/d1e/tutorial_dnn_OCR.html (CRNN_VGG_BiLSTM_CTC.onnx)
    - `text_recognition_CRNN_CH_2021sep.onnx`: https://docs.opencv.org/4.x/d4/d43/tutorial_dnn_text_spotting.html (crnn_cs.onnx)
    - `text_recognition_CRNN_CN_2021nov.onnx`: https://docs.opencv.org/4.5.2/d4/d43/tutorial_dnn_text_spotting.html (crnn_cs_CN.onnx)
- `text_recognition_CRNN_EN_2021sep.onnx` recognizes digits (0\~9) and letters (returned as lowercase a\~z); see `charset_36_EN.txt` for details.
- `text_recognition_CRNN_CH_2021sep.onnx` recognizes digits (0\~9), upper/lower-case letters (a\~z and A\~Z), and some special characters; see `charset_94_CH.txt` for details.
- `text_recognition_CRNN_CN_2021nov.onnx` recognizes digits (0\~9), upper/lower-case letters (a\~z and A\~Z), some Chinese characters, and some special characters; see `charset_3944_CN.txt` for details.
- For details on training this model series, please visit https://github.com/zihaomu/deep-text-recognition-benchmark.

@@ -16,6 +29,7 @@ Note:
- This demo uses [text_detection_db](../text_detection_db) as text detector.
- Selected model must match the charset:
    - Try `text_recognition_CRNN_EN_2021sep.onnx` with `charset_36_EN.txt`.
    - Try `text_recognition_CRNN_CH_2021sep.onnx` with `charset_94_CH.txt`.
    - Try `text_recognition_CRNN_CN_2021nov.onnx` with `charset_3944_CN.txt`.
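
The charset has to match the model because the recognizer's output indices are mapped to characters by their position in the charset file. A minimal sketch of that mapping, assuming greedy CTC decoding with index 0 reserved for the blank symbol (this is an assumption, and the helpers `load_charset` and `greedy_ctc_decode` are hypothetical, not part of this repository):

```python
def load_charset(lines):
    """Build the index-to-character table from charset lines (one character per line)."""
    return [line.rstrip("\n") for line in lines if line.rstrip("\n")]


def greedy_ctc_decode(indices, charset, blank=0):
    """Collapse repeated indices, drop blanks, and map the rest to characters.

    Assumes index 0 is the CTC blank, so class index k maps to charset[k - 1].
    """
    text = []
    previous = blank
    for index in indices:
        if index != blank and index != previous:
            text.append(charset[index - 1])
        previous = index
    return "".join(text)


# Per-time-step argmax indices for a hypothetical model output.
charset = load_charset(["0\n", "1\n", "a\n", "b\n"])
decoded = greedy_ctc_decode([3, 3, 0, 4, 4, 0, 4, 1], charset)
print(decoded)
```

With a mismatched charset the same indices would map to different characters, or fall outside the table entirely.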

Run the demo detecting English:
94 changes: 94 additions & 0 deletions models/text_recognition_crnn/charset_94_CH.txt
@@ -0,0 +1,94 @@
0
1
2
3
4
5
6
7
8
9
a
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
X
Y
Z
!
"
#
$
%
&
'
(
)
*
+
,
-
.
/
:
;
<
=
>
?
@
[
\
]
^
_
`
{
|
}
~
4 changes: 3 additions & 1 deletion models/text_recognition_crnn/crnn.py
@@ -54,7 +54,9 @@ def _preprocess(self, image, rbbox):
        rotationMatrix = cv.getPerspectiveTransform(vertices, self._targetVertices)
        cropped = cv.warpPerspective(image, rotationMatrix, self._inputSize)

        # 'CH' can recognize digits (0~9), upper/lower-case letters (a~z and A~Z), and some special characters
        # 'CN' can recognize digits (0~9), upper/lower-case letters (a~z and A~Z), some Chinese characters and some special characters
        if 'CN' in self._model_path or 'CH' in self._model_path:
Member

Please add some comments about the difference between CN and CH.

Member

@Charles-258 Kind reminder.
            pass
        else:
            cropped = cv.cvtColor(cropped, cv.COLOR_BGR2GRAY)
Empty file.
53 changes: 53 additions & 0 deletions tools/eval/README.md
@@ -19,6 +19,8 @@ Supported datasets:
- [ImageNet](#imagenet)
- [WIDERFace](#widerface)
- [LFW](#lfw)
- [ICDAR2003](#icdar2003)
- [IIIT5K](#iiit5k)

## ImageNet

@@ -137,4 +139,55 @@ Run evaluation with the following command:

```shell
python eval.py -m sface -d lfw -dr /path/to/lfw
```

## ICDAR2003

### Prepare data

Please visit http://iapr-tc11.org/mediawiki/index.php/ICDAR_2003_Robust_Reading_Competitions to download the ICDAR2003 dataset and the labels.

```shell
$ tree -L 2 /path/to/icdar
.
├── word
│   ├── 1
│   │   ├── self
│   │   ├── ...
│   │   └── willcooks
│   ├── ...
│   └── 12
└── word.xml
```

### Evaluation

Run evaluation with the following command:

```shell
python eval.py -m crnn -d icdar -dr /path/to/icdar
```
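
The labels in `word.xml` are stored as attributes of `image` elements: `file` holds the relative image path and `tag` holds the ground-truth word. A minimal sketch of reading them with `xml.dom.minidom`; the inline XML is a made-up sample that only mirrors that attribute layout (the root element name is an assumption), and `parse_labels` is a hypothetical helper:

```python
import os
import xml.dom.minidom as minidom

# Made-up sample; only the `image` elements and their attributes matter here.
SAMPLE_XML = """<imagelist>
  <image file="word/1/1.jpg" tag="self" />
  <image file="word/1/2.jpg" tag="Adhesives" />
</imagelist>
"""


def parse_labels(xml_text, root_dir):
    """Return [image_path, lowercase_label] pairs from ICDAR2003-style XML."""
    dom = minidom.parseString(xml_text)
    labels = []
    for node in dom.documentElement.getElementsByTagName("image"):
        path = os.path.join(root_dir, node.getAttribute("file"))
        labels.append([path, node.getAttribute("tag").lower()])
    return labels


labels = parse_labels(SAMPLE_XML, "/path/to/icdar")
print(labels)
```

Labels are lowercased so that the comparison matches a recognizer that only returns lowercase letters.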

### Example

```shell
# Download the word recognition archive
wget http://www.iapr-tc11.org/dataset/ICDAR2003_RobustReading/TrialTrain/word.zip
# Unzip it to /path/to/icdar
unzip word.zip -d /path/to/icdar
python eval.py -m crnn -d icdar -dr /path/to/icdar
```

## IIIT5K

### Prepare data

Please visit https://github.com/cv-small-snails/Text-Recognition-Material to download the IIIT5K dataset and the labels.

### Evaluation

Any dataset stored in LMDB format can be evaluated with this script.<br>
Run evaluation with the following command:

```shell
python eval.py -m crnn -d iiit5k -dr /path/to/iiit5k
```
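
The loader expects LMDB keys of the form `image-<index>`, each with a matching `label-<index>` entry. A minimal sketch of that pairing, using a plain dictionary as a stand-in for the LMDB transaction so it runs without a dataset; the keys and values are made-up samples and `pair_labels` is a hypothetical helper:

```python
def pair_labels(store):
    """Pair each image-<index> entry with its label-<index> entry.

    `store` maps byte keys to byte values, like an LMDB transaction would.
    """
    pairs = []
    for key in sorted(store):
        prefix, _, index = key.decode().partition("-")
        if prefix != "image":
            continue
        label = store[("label-" + index).encode()].decode().lower()
        pairs.append((store[key], label))
    return pairs


# Stand-in for an LMDB database; real values would be encoded image bytes.
fake_db = {
    b"image-000000001": b"<jpeg bytes>",
    b"label-000000001": b"PRIVATE",
    b"image-000000002": b"<jpeg bytes>",
    b"label-000000002": b"Parking",
    b"num-samples": b"2",
}
pairs = pair_labels(fake_db)
print([label for _, label in pairs])
```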
6 changes: 5 additions & 1 deletion tools/eval/datasets/__init__.py
@@ -1,6 +1,8 @@
from .imagenet import ImageNet
from .widerface import WIDERFace
from .lfw import LFW
from .icdar import ICDAR
from .iiit5k import IIIT5K

class Registery:
    def __init__(self, name):
@@ -16,4 +18,6 @@ def register(self, item):
DATASETS = Registery("Datasets")
DATASETS.register(ImageNet)
DATASETS.register(WIDERFace)
DATASETS.register(LFW)
DATASETS.register(ICDAR)
DATASETS.register(IIIT5K)
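
The body of the registry class is not shown in this diff. A minimal sketch of how such a name-to-class registry could work, assuming classes are keyed by their own name; the class `Registry` and its `get` method are hypothetical and not the repository's implementation:

```python
class Registry:
    """Minimal name-to-class registry (a sketch, not the repository's implementation)."""

    def __init__(self, name):
        self._name = name
        self._items = {}

    def register(self, item):
        # Key each class by its own name so it can be looked up from a string.
        self._items[item.__name__] = item
        return item

    def get(self, key):
        return self._items[key]


class ICDAR:
    """Placeholder dataset class used only for this example."""

    def __init__(self, root):
        self.root = root


DATASETS = Registry("Datasets")
DATASETS.register(ICDAR)
dataset = DATASETS.get("ICDAR")(root="/path/to/icdar")
print(type(dataset).__name__, dataset.root)
```

Keying by class name lets a configuration entry such as `name="ICDAR"` select the dataset class without a chain of conditionals.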
53 changes: 53 additions & 0 deletions tools/eval/datasets/icdar.py
@@ -0,0 +1,53 @@
import os
import numpy as np
import cv2 as cv
import xml.dom.minidom as minidom
from tqdm import tqdm

class ICDAR:
    def __init__(self, root):
        self.root = root
        self.acc = -1
        self.inputSize = [100, 32]
        self.val_label_file = os.path.join(root, "word.xml")
        self.val_label = self.load_label(self.val_label_file)

    @property
    def name(self):
        return self.__class__.__name__

    def load_label(self, label_file):
        # Each <image> element stores the relative image path in `file`
        # and the ground-truth word in `tag`.
        label = list()
        dom = minidom.parse(label_file)
        root = dom.documentElement
        names = root.getElementsByTagName('image')
        for name in names:
            key = os.path.join(self.root, name.getAttribute('file'))
            value = name.getAttribute('tag').lower()
            label.append([key, value])

        return label

    def eval(self, model):
        right_num = 0
        pbar = tqdm(self.val_label)
        for fn, label in pbar:
            pbar.set_description("Evaluating {} with {} val set".format(model.name, self.name))

            img = cv.imread(fn)

            # Use the whole image as the text region:
            # bottom-left, top-left, top-right, bottom-right corners as (x, y).
            rbbox = np.array([0, img.shape[0], 0, 0, img.shape[1], 0, img.shape[1], img.shape[0]])
            pred = model.infer(img, rbbox)
            if label == pred:
                right_num += 1

        self.acc = right_num/(len(self.val_label) * 1.0)

    def get_result(self):
        return self.acc

    def print_result(self):
        print("Accuracy: {:.2f}%".format(self.acc*100))
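
Both dataset classes pass the whole image to the recognizer by building a box from the image size, then count exact string matches. The eight box values are four (x, y) corners in the order bottom-left, top-left, top-right, bottom-right. A minimal sketch of both steps, using plain lists rather than NumPy arrays; `full_image_box` and `accuracy` are hypothetical helpers:

```python
def full_image_box(height, width):
    """Return the four corners of a box covering the whole image as a flat list.

    Corner order: bottom-left, top-left, top-right, bottom-right, each as (x, y).
    """
    return [0, height, 0, 0, width, 0, width, height]


def accuracy(predictions, labels):
    """Fraction of predictions that match their label exactly."""
    correct = sum(1 for pred, label in zip(predictions, labels) if pred == label)
    return correct / len(labels)


box = full_image_box(height=32, width=100)
acc = accuracy(["self", "wilcooks"], ["self", "willcooks"])
print(box, "Accuracy: {:.2f}%".format(acc * 100))
```

Because the metric is an exact match per word, a single wrong character counts the whole word as an error.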
55 changes: 55 additions & 0 deletions tools/eval/datasets/iiit5k.py
@@ -0,0 +1,55 @@
import lmdb
import os
import numpy as np
import cv2 as cv
from tqdm import tqdm

class IIIT5K:
    def __init__(self, root):
        self.root = root
        self.acc = -1
        self.inputSize = [100, 32]

        self.val_label = self.load_label(self.root)

    @property
    def name(self):
        return self.__class__.__name__

    def load_label(self, root):
        lmdb_file = root
        lmdb_env = lmdb.open(lmdb_file)
        lmdb_txn = lmdb_env.begin()
        lmdb_cursor = lmdb_txn.cursor()
        label = list()
        # Keys are iterated in sorted order, so 'image-*' entries come before 'label-*' entries.
        for key, value in lmdb_cursor:
            image_index = key.decode()
            if image_index.split('-')[0] == 'image':
                # np.frombuffer replaces the deprecated binary mode of np.fromstring.
                img = cv.imdecode(np.frombuffer(value, np.uint8), 3)
                label_index = 'label-' + image_index.split('-')[1]
                value = lmdb_txn.get(label_index.encode()).decode().lower()
                label.append([img, value])
            else:
                break
        return label

    def eval(self, model):
        right_num = 0
        pbar = tqdm(self.val_label)
        for img, value in pbar:
            pbar.set_description("Evaluating {} with {} val set".format(model.name, self.name))

            # Use the whole image as the text region:
            # bottom-left, top-left, top-right, bottom-right corners as (x, y).
            rbbox = np.array([0, img.shape[0], 0, 0, img.shape[1], 0, img.shape[1], img.shape[0]])
            pred = model.infer(img, rbbox).lower()
            if value == pred:
                right_num += 1

        self.acc = right_num/(len(self.val_label) * 1.0)

    def get_result(self):
        return self.acc

    def print_result(self):
        print("Accuracy: {:.2f}%".format(self.acc*100))
11 changes: 11 additions & 0 deletions tools/eval/eval.py
@@ -73,6 +73,11 @@
        name="SFace",
        topic="face_recognition",
        modelPath=os.path.join(root_dir, "models/face_recognition_sface/face_recognition_sface_2021dec-act_int8-wt_int8-quantized.onnx")),
    crnn=dict(
        name="CRNN",
        topic="text_recognition",
        modelPath=os.path.join(root_dir, "models/text_recognition_crnn/text_recognition_CRNN_EN_2021sep.onnx"),
        charsetPath=os.path.join(root_dir, "models/text_recognition_crnn/charset_36_EN.txt")),
)

datasets = dict(
@@ -87,6 +92,12 @@
        name="LFW",
        topic="face_recognition",
        target_size=112),
    icdar=dict(
        name="ICDAR",
        topic="text_recognition"),
    iiit5k=dict(
        name="IIIT5K",
        topic="text_recognition"),
)

def main(args):