Document Layout Detection and OCR Using Detectron2!

This article was published as a part of the Data Science Blogathon

The goal is to get bounding boxes around the paragraphs and tables in scanned documents.

Suppose we have a scanned document or a set of scanned images, and we want to detect the layout of each page: find the paragraphs and tables and draw bounding boxes around them, as shown in the image below.

Note that the task does not require detecting individual words or headlines. We only have to detect the paragraphs and tables. This is useful in many use cases involving official documents.

Figure: document layout detection with bounding boxes
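To make the goal concrete, here is a minimal sketch of the filtering idea in plain Python. The detection records, the label names ("Title", "Text", "Table") and the helper keep_paragraphs_and_tables are all hypothetical and only for illustration. A real layout model returns its own objects and its own label set.

```python
# Hypothetical detections: each has a label and a box (x1, y1, x2, y2).
detections = [
    {"type": "Title", "box": (40, 20, 560, 60)},
    {"type": "Text", "box": (40, 80, 560, 300)},
    {"type": "Table", "box": (40, 320, 560, 520)},
]

# Assumed label names for paragraphs and tables.
WANTED = {"Text", "Table"}


def keep_paragraphs_and_tables(blocks):
    """Return only the blocks whose label is a paragraph or a table."""
    return [b for b in blocks if b["type"] in WANTED]


kept = keep_paragraphs_and_tables(detections)
print([b["type"] for b in kept])
```

The headline block is dropped and only the paragraph and table boxes remain, which is the output we want from the detection step.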

Solution Approach

We get the bounding boxes from a deep learning model and then perform OCR on the detected regions using OpenCV and an API. The steps below make this work.

1. Install all required packages

You need to install layoutparser and detectron2 for detection.

You can find more details on Detectron2 here: https://github.com/facebookresearch/detectron2/tree/master/projects

!pip install layoutparser
!pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.8/index.html
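After installing, it can help to confirm that both packages are visible to the Python interpreter before going further. The sketch below uses only the standard library. The helper name missing_packages is hypothetical, and the check assumes the importable module names match the pip package names used in the install commands.

```python
import importlib.util


def missing_packages(names):
    """Return the names that cannot be found as importable top-level modules."""
    return [n for n in names if importlib.util.find_spec(n) is None]


# Module names assumed to match the pip package names.
print(missing_packages(["layoutparser", "detectron2"]))
```

An empty list means both packages were found. Any name that is printed still needs to be installed in the current environment.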

2. Convert the image from BGR (cv2 default loading style) to RGB

OpenCV uses the BGR channel order. So, when we read a color image with cv2.imread(), the returned array stores its channels in BGR order by default.

We can use cv2.cvtColor() or the slice image[..., ::-1] to convert a BGR image to RGB and vice versa.

import cv2

image = cv2.imread("/content/imagetemp.png")  # channels are loaded in BGR order
image = image[..., ::-1]  # reverse the channel axis: BGR -> RGB
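To see what the channel reversal does, here is a small sketch on a synthetic array instead of a real file. It uses only NumPy, and the two-pixel "image" is made up for illustration. The slice returns a view with a negative stride, so the sketch also makes a contiguous copy, which some libraries may require.

```python
import numpy as np

# A synthetic 1x2 "image" in BGR order: one pure-blue and one pure-red pixel.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reverse the last (channel) axis: BGR -> RGB.
rgb = bgr[..., ::-1]

# The slice is a view with a negative stride; copy it if a contiguous array is needed.
rgb_contiguous = np.ascontiguousarray(rgb)

print(rgb[0, 0].tolist())  # the blue pixel, now in RGB order
print(rgb[0, 1].tolist())  # the red pixel, now in RGB order
```

Applying the same slice a second time restores the original BGR order, which is why the conversion works in both directions. With OpenCV installed, cv2.cvtColor(image, cv2.COLOR_BGR2RGB) performs the same conversion and returns a new array.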
