Fine Tuning TrOCR on a Curved Text Dataset

TrOCR (Transformer-based Optical Character Recognition) models are among the best-performing OCR models. In our previous article, we analyzed how well they perform on single-line printed and handwritten text. However, like any other deep learning model, they have their limitations: TrOCR does not perform well on curved text out of the box. This article takes the TrOCR series a step further by fine-tuning the TrOCR model on a curved text dataset.

Figure 1. Fine Tuning TrOCR

We know from the previous article that TrOCR cannot recognize curved or vertical text. The images that exposed this were part of the SCUT-CTW1500 dataset. We will train the TrOCR model on this dataset and run inference again to analyze the results. This will give us a clearer idea of how far TrOCR models can be pushed for different use cases.

We will use the Hugging Face Trainer API to train the model. The process involves the following steps:

  • Prepare and analyze the curved text images dataset.
  • Load the TrOCR Small Printed model from Hugging Face.
  • Initialize the Hugging Face sequence-to-sequence Trainer API.
  • Define the evaluation metric.
  • Train the model and run inference.
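
As a rough illustration of two of the steps above (loading the model and defining a metric), here is a minimal sketch. It rests on several assumptions that this section does not state: the Hub ID `microsoft/trocr-small-printed` for the TrOCR Small Printed checkpoint, and character error rate (CER) as the evaluation metric, which is a common choice for OCR but has not been named at this point. The function names `load_trocr`, `edit_distance`, and `character_error_rate` are hypothetical helpers for this sketch.

```python
"""Sketch of the model-loading and metric steps (assumptions noted inline)."""

# Assumption: Hugging Face Hub ID of the TrOCR Small Printed checkpoint.
MODEL_ID = "microsoft/trocr-small-printed"


def load_trocr(model_id=MODEL_ID):
    """Load processor and model; needs `transformers` and network access."""
    # Imported lazily so the metric below works without transformers installed.
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    processor = TrOCRProcessor.from_pretrained(model_id)
    model = VisionEncoderDecoderModel.from_pretrained(model_id)
    # The decoder must know which token starts a sequence and which one pads it.
    model.config.decoder_start_token_id = processor.tokenizer.cls_token_id
    model.config.pad_token_id = processor.tokenizer.pad_token_id
    return processor, model


def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def character_error_rate(predictions, references):
    """Total edit distance divided by total reference length over a batch."""
    total_edits = sum(edit_distance(p, r) for p, r in zip(predictions, references))
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references must contain at least one character")
    return total_edits / total_chars
```

In a training loop, a metric callback would typically decode the predicted and label token IDs back to strings (for example with the processor's `batch_decode`) before passing them to a function like `character_error_rate`. Lower values are better, and 0.0 means every prediction matched its reference exactly.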

The Curved Text Dataset

The SCUT-CTW1500 dataset (referred to as CTW1500 from here on) contains several thousand images of curved text and text in the wild.

The original dataset is available in the official GitHub repository. It comprises both the training and test sets. Only the training set contains ...
