A Detailed Walkthrough of the SynthText Text Dataset

1. Official Description of the Dataset

SynthText in the Wild Dataset
-----------------------------
Ankush Gupta, Andrea Vedaldi, and Andrew Zisserman
Visual Geometry Group, University of Oxford, 2016

Data format:
------------

SynthText.zip (size = 42074172 bytes (41GB)) contains 858,750 synthetic
scene-image files (.jpg) split into 200 directories, with
7,266,866 word-instances, and 28,971,487 characters.

Ground-truth annotations are contained in the file "gt.mat" (Matlab format).
The file "gt.mat" contains the following cell-arrays, each of size 1x858750:

  1. imnames :  names of the image files

  2. wordBB  :  word-level bounding-boxes for each image, represented by
                tensors of size 2x4xNWORDS_i, where:
                  - the first dimension is 2 for x and y respectively,
                  - the second dimension corresponds to the 4 points
                    (clockwise, starting from top-left), and
                  - the third dimension of size NWORDS_i, corresponds to
                    the number of words in the i_th image.

  3. charBB  :  character-level bounding-boxes,
                each represented by a tensor of size 2x4xNCHARS_i
                (format is same as wordBB's above)

  4. txt     :  text-strings contained in each image (char array).

                Words which belong to the same "instance", i.e.,
                those rendered in the same region with the same font, color,
                distortion etc., are grouped together; the instance
                boundaries are demarcated by the line-feed character (ASCII: 10)

                A "word" is any contiguous substring of non-whitespace
                characters.

                A "character" is defined as any non-whitespace character.

For any questions or comments, contact Ankush Gupta at:
removethisifyouarehuman-ankush@robots.ox.ac.uk

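2. Dataset Characteristics

The contents of the dataset are as follows.

(1) The dataset is 41 GB in total and contains 858,750 synthetic images in JPG format. The images are split into 200 scene directories, where each scene uses a different background image (in practice there are 202 scenes). In total there are 7,266,866 word instances and 28,971,487 characters.

(2) The annotation file is in MAT format. Once loaded, it contains the following fields.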

2.1 imnames

Stores the relative path of each image file.
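As a minimal sketch of how these paths might be used, the snippet below reads gt.mat with scipy.io.loadmat and joins an entry of imnames with the dataset root. The helper names load_gt and image_path, the root directory "SynthText", and the sample entry "8/ballet_106_0.jpg" are assumptions made for illustration. It also assumes that loadmat returns each field as a 1 x N object array and wraps each name in a length-1 array.

```python
import os


def load_gt(gt_path):
    """Load gt.mat and return the four annotation arrays, one entry per image."""
    # Imported lazily so the path helper below works without SciPy installed.
    from scipy.io import loadmat

    gt = loadmat(gt_path)
    # Assumption: each field is a 1 x N cell array; [0] drops the leading axis.
    return gt["imnames"][0], gt["wordBB"][0], gt["charBB"][0], gt["txt"][0]


def image_path(root, imname):
    """Join the dataset root with a relative path taken from imnames."""
    # Assumption: loadmat wraps each name in a length-1 array; plain strings
    # are accepted as well.
    name = imname if isinstance(imname, str) else str(imname[0])
    return os.path.join(root, *name.split("/"))
```

Since gt.mat covers all 858,750 images, loading it in one call can take a noticeable amount of time and memory.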

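2.2 wordBB: word level

Each image has one annotation tensor of size (2, 4, n_word_i): 2 is the x and y coordinates; 4 is the four corner points, listed clockwise starting from the top-left; n_word_i is the number of words in the i-th image.

A "word" is any contiguous substring of non-whitespace characters.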
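The (2, 4, n_word_i) layout is often easier to work with as one polygon per word. The following is a minimal sketch of that conversion with NumPy; the function name boxes_to_polygons is hypothetical, and the handling of 2 x 4 input assumes that an image with a single word loses its third dimension when loaded.

```python
import numpy as np


def boxes_to_polygons(bb):
    """Convert a 2 x 4 x N box tensor into an N x 4 x 2 array of (x, y) corners."""
    bb = np.asarray(bb, dtype=np.float32)
    # Assumption: an image with a single box is stored as a plain 2 x 4 array.
    if bb.ndim == 2:
        bb = bb[:, :, np.newaxis]
    # Reorder axes from (xy, point, box) to (box, point, xy).
    return bb.transpose(2, 1, 0)
```

After the conversion, polys[k] holds the four (x, y) corners of the k-th word in the same clockwise order as the annotation.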

2.3 charBB: character-level bounding boxes

The size is likewise (2, 4, n_char_i), with the same meaning as for wordBB.

A "character" is any non-whitespace character.
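Because every character is a non-whitespace character of some word, the character boxes can be regrouped per word by slicing along the third axis. The sketch below assumes that the boxes in charBB appear in the same order as the characters of the words in txt, which the description above does not state explicitly; group_chars_by_word is a hypothetical helper name.

```python
import numpy as np


def group_chars_by_word(char_bb, words):
    """Split a 2 x 4 x NCHARS tensor into one 2 x 4 x len(word) block per word."""
    char_bb = np.asarray(char_bb)
    # Assumption: a single character box is stored as a plain 2 x 4 array.
    if char_bb.ndim == 2:
        char_bb = char_bb[:, :, np.newaxis]
    lengths = [len(w) for w in words]
    if sum(lengths) != char_bb.shape[2]:
        raise ValueError("character count does not match charBB")
    # Cumulative word lengths give the split points along the character axis.
    splits = np.cumsum(lengths)[:-1]
    return np.split(char_bb, splits, axis=2)
```

The length check guards against the ordering assumption failing silently on an image whose text and boxes disagree.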

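2.4 txt: text level

The text strings contained in each image (a char array).

Take the image ballet_106_0.jpg as an example: its annotation contains 8 text instances. Words rendered in the same region with the same font, color, distortion and so on are treated as one text instance.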
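The grouping into text instances can be recovered by splitting on the line-feed character, and individual words by splitting on whitespace. The following is a minimal sketch in plain Python; the helper names and the sample strings in the usage note are made up for illustration, and it assumes that each entry of the per-image char array may be padded with spaces and that entry boundaries also separate instances.

```python
def split_instances(txt):
    """Return the text instances of one image as lists of words."""
    instances = []
    for entry in txt:  # one entry per row of the per-image char array
        # Line feeds (ASCII 10) mark instance boundaries inside an entry.
        for line in str(entry).split("\n"):
            words = line.split()  # split on whitespace, dropping any padding
            if words:
                instances.append(words)
    return instances


def flatten_words(txt):
    """Return all words of one image in annotation order."""
    return [word for instance in split_instances(txt) for word in instance]
```

For example, split_instances(["the cat\nsat  ", "on"]) yields [["the", "cat"], ["sat"], ["on"]]. The length of flatten_words(txt) should then match the third dimension of the image's wordBB tensor.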
