Slicing a scanned image based on large white spaces

2023-12-19

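I am trying to separate the questions in this PDF document: https://www.tnpsc.gov.in/Tentative/Document/RAGS-2022_opt.pdf. The challenge is that the questions are not evenly spaced. For example, the first question takes up a whole page, the second question also takes up a whole page, while the third and fourth questions share a single page. Slicing them by hand would take a very long time, so I would like to convert the pages to images and process those. Is it possible to take an image like this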

and split it into separate components like this?


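This is a classic use case for dilate (https://opencv24-python-tutorials.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_morphological_ops/py_morphological_ops.html#dilation). The idea is that adjacent text belongs to the same question, while text that is farther away belongs to a different question. Whenever you want to connect multiple items together, you can dilate them so that adjacent contours join into a single contour. Here is a simple approach: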

  1. Obtain a binary image. Load the image (https://www.geeksforgeeks.org/python-opencv-cv2-imread-method/), convert it to grayscale (https://opencv24-python-tutorials.readthedocs.io/en/stable/py_tutorials/py_imgproc/py_colorspaces/py_colorspaces.html), apply a Gaussian blur (https://opencv24-python-tutorials.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_filtering/py_filtering.html#gaussian-filtering), then apply Otsu's threshold (https://opencv24-python-tutorials.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_thresholding/py_thresholding.html#otsus-binarization) to obtain a binary image.

  2. Remove small noise and artifacts. We create a rectangular kernel (https://opencv24-python-tutorials.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_morphological_ops/py_morphological_ops.html#structuring-element) and apply a morphological opening (https://opencv24-python-tutorials.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_morphological_ops/py_morphological_ops.html#opening) to remove small noise and artifacts from the image.

  3. Connect adjacent words together. We create a larger rectangular kernel and dilate (https://opencv24-python-tutorials.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_morphological_ops/py_morphological_ops.html#dilation) to merge the individual contours together.

  4. Detect the questions. From here we find contours (https://docs.opencv.org/2.4/modules/imgproc/doc/structural_analysis_and_shape_descriptors.html#findcontours), sort them from top to bottom using imutils.sort_contours() (https://github.com/PyImageSearch/imutils/blob/master/imutils/contours.py#L7), filter by a minimum contour area (https://docs.opencv.org/2.4/modules/imgproc/doc/structural_analysis_and_shape_descriptors.html#contourarea), obtain the bounding rectangle coordinates (https://docs.opencv.org/2.4/modules/imgproc/doc/structural_analysis_and_shape_descriptors.html?highlight=boundingrect#boundingrect), and highlight the rectangular contours (https://opencv24-python-tutorials.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_contours/py_contours_begin/py_contours_begin.html). We then crop each question using Numpy slicing and save the ROI image.
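
The sort-and-filter part of step 4 can be sketched without imutils. This is a minimal illustration and not the imutils implementation: it assumes each contour has already been reduced to an (x, y, w, h) box (for example with cv2.boundingRect), and it filters on the box area rather than on cv2.contourArea. The helper name sort_and_filter_boxes and the sample boxes are hypothetical.

```python
def sort_and_filter_boxes(boxes, min_area=150):
    """Drop boxes whose area is at most min_area, then order the rest top to bottom.

    boxes: iterable of (x, y, w, h) tuples.
    """
    kept = [b for b in boxes if b[2] * b[3] > min_area]
    # Sorting on the y coordinate of the top-left corner gives reading order
    # for questions that are stacked vertically on the page.
    return sorted(kept, key=lambda b: b[1])


# Hypothetical boxes: two questions and one 5x5 speck of noise.
sample_boxes = [(10, 300, 200, 80), (12, 20, 180, 60), (50, 150, 5, 5)]
ordered = sort_and_filter_boxes(sample_boxes)
print(ordered)
```

The 5x5 speck is discarded because its area (25) does not exceed the threshold, and the remaining boxes come back with the upper question first.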


Otsu's threshold to obtain a binary image

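Here is where the interesting part happens. We assume that adjacent text/characters are part of the same question, so we merge the individual words into a single contour. A question is a group of words that sit close together, so we dilate to connect them.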

Individual questions highlighted in green

Top question

Bottom question

Saved ROI questions (assumed to be ordered from top to bottom)

Code

import cv2
from imutils import contours

# Load image, grayscale, Gaussian blur, Otsu's threshold
image = cv2.imread('1.png')
original = image.copy()
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (7,7), 0)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Remove small artifacts and noise with morph open
open_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5,5))
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, open_kernel, iterations=1)

# Create rectangular structuring element and dilate
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (9,9))
dilate = cv2.dilate(opening, kernel, iterations=4)

# Find contours, sort from top to bottom, and extract each question
cnts = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
(cnts, _) = contours.sort_contours(cnts, method="top-to-bottom")

# Get bounding box of each question, crop ROI, and save
question_number = 0
for c in cnts:
    # Filter by area to ensure it's not noise
    area = cv2.contourArea(c)
    if area > 150:
        x,y,w,h = cv2.boundingRect(c)
        cv2.rectangle(image, (x, y), (x + w, y + h), (36,255,12), 2)
        question = original[y:y+h, x:x+w]
        cv2.imwrite('question_{}.png'.format(question_number), question)
        question_number += 1

cv2.imshow('thresh', thresh)
cv2.imshow('dilate', dilate)
cv2.imshow('image', image)
cv2.waitKey()
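
As an alternative when the questions span the full page width and are stacked vertically, the large white gaps can be located directly from a row projection instead of through dilation. This is a different technique from the answer above, shown only as a rough sketch: the function name split_rows_on_gaps, the min_gap value, and the synthetic page are all assumptions. It expects a binary image in which text pixels are nonzero, such as the thresh image above.

```python
import numpy as np


def split_rows_on_gaps(binary, min_gap=30):
    """Return (top, bottom) row ranges separated by at least min_gap blank rows.

    binary: 2D array where nonzero pixels are text. bottom is exclusive.
    """
    has_ink = (binary > 0).any(axis=1)
    bands = []
    start = None     # first row of the band being built
    last_ink = None  # most recent row that contained text
    for row, ink in enumerate(has_ink):
        if not ink:
            continue
        if start is None:
            start = row
        elif row - last_ink - 1 >= min_gap:
            # The blank run since the last text row is large: close the band.
            bands.append((start, last_ink + 1))
            start = row
        last_ink = row
    if start is not None:
        bands.append((start, last_ink + 1))
    return bands


# Synthetic page: two lines 20 px apart, then a third block 120 px lower.
page = np.zeros((300, 200), dtype=np.uint8)
page[20:40, 20:180] = 255
page[60:80, 20:180] = 255
page[200:220, 20:180] = 255

bands = split_rows_on_gaps(page, min_gap=30)
print(bands)
```

Each (top, bottom) pair can then be used to crop with Numpy slicing, for example original[top:bottom, :]. Unlike the contour approach, this will not separate questions that sit side by side.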