初次使用PPYOLOE-R

2023-11-03

目的：优化基于yolov5-obb旋转目标检测算法的证件区域检测，之前的方法是基于anchor，每次使用都要调试anchor；而ppyoloe-r是free anchor的算法；

源码位置：https://github.com/PaddlePaddle/PaddleDetection/issues/7291

本次尝试使用PaddleDetection中的 ppyoloe-r进行效果优化；

本次的目的：保持backbone一致的情况下，从anchor转移到free anchor；

基本的步骤如下：

1.PaddleDetection工程的安装和试运行；

paddle的安装我就不再详述，可参照官网，其他细节的补充：

# pip install cython
# pip install lap -i https://pypi.tuna.tsinghua.edu.cn/simple
# pip install terminaltables -i https://pypi.tuna.tsinghua.edu.cn/simple
# pip install typeguard -i https://pypi.tuna.tsinghua.edu.cn/simple
# git clone https://github.com/cocodataset/cocoapi
# pip install pycocotools==2.0.2 -i https://pypi.tuna.tsinghua.edu.cn/simple
# pip install numba==0.56.4 -i https://pypi.tuna.tsinghua.edu.cn/simple
# cd PaddleDetection-release-2.6
# python setup.py install
# cd ppdet/ext_op
# python setup.py install


#安装后确认测试通过：
#python ppdet/modeling/tests/test_architectures.py
#测试通过后会提示如下信息：
#.......
#----------------------------------------------------------------------
#Ran 7 tests in 12.816s
#OK

2.数据格式的变化；

数据格式从txt转换到coco格式，可以通过tools/x2coco.py来进行处理，自己对其中的代码进行了修改，因为自己的数据格式不符合其中的任意一种选项；我这边是将yolov5-obb的数据转换成coco格式，训练样本不同批次的数据在各个子文件夹中，子文件夹中包含images子文件夹和labels文件夹，其中images子文件夹中包含的是图片，而labels文件夹包含的是用于yolov5-obb中旋转目标检测的标签：

修改后的x2coco.py中的代码如下：

import argparse
import glob
import json
import os
import os.path as osp
import shutil
import xml.etree.ElementTree as ET

import numpy as np
import PIL.ImageDraw
from tqdm import tqdm
import cv2
from tqdm import tqdm

# label_to_num = {}
categories_list = []
labels_list = []
label_to_num = {}



class MyEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.integer):
            return int(obj)
        elif isinstance(obj, np.floating):
            return float(obj)
        elif isinstance(obj, np.ndarray):
            return obj.tolist()
        else:
            return super(MyEncoder, self).default(obj)


def images_labelme(data, num, file_):
    image = {}
    image['height'] = data['imageHeight']
    image['width'] = data['imageWidth']
    image['id'] = num + 1
    if '\\' in data['imagePath']:
        image['file_name'] = os.path.join(file_, data['imagePath'].split('\\')[-1])
    else:
        image['file_name'] = os.path.join(file_, data['imagePath'].split('/')[-1])
    return image


def images_cityscape(data, num, img_file):
    image = {}
    image['height'] = data['imgHeight']
    image['width'] = data['imgWidth']
    image['id'] = num + 1
    image['file_name'] = img_file
    return image


def categories(label, labels_list):
    category = {}
    category['supercategory'] = 'component'
    category['id'] = len(labels_list) + 1
    category['name'] = label
    return category


def annotations_rectangle(points, label, image_num, object_num, label_to_num):
    annotation = {}
    seg_points = np.asarray(points).copy()
    seg_points[1, :] = np.asarray(points)[2, :]
    seg_points[2, :] = np.asarray(points)[1, :]
    annotation['segmentation'] = [list(seg_points.flatten())]
    annotation['iscrowd'] = 0
    annotation['image_id'] = image_num + 1
    annotation['bbox'] = list(
        map(float, [
            points[0][0], points[0][1], points[1][0] - points[0][0], points[1][
                1] - points[0][1]
        ]))
    annotation['area'] = annotation['bbox'][2] * annotation['bbox'][3]
    annotation['category_id'] = label_to_num[label]
    annotation['id'] = object_num + 1
    return annotation


def annotations_polygon(height, width, points, label, image_num, object_num,
                        label_to_num):
    annotation = {}
    annotation['segmentation'] = [list(np.asarray(points).flatten())]
    annotation['iscrowd'] = 0
    annotation['image_id'] = image_num + 1
    annotation['bbox'] = list(map(float, get_bbox(height, width, points)))
    annotation['area'] = annotation['bbox'][2] * annotation['bbox'][3]
    annotation['category_id'] = label_to_num[label]
    annotation['id'] = object_num + 1
    return annotation


def get_bbox(height, width, points):
    polygons = points
    mask = np.zeros([height, width], dtype=np.uint8)
    mask = PIL.Image.fromarray(mask)
    xy = list(map(tuple, polygons))
    PIL.ImageDraw.Draw(mask).polygon(xy=xy, outline=1, fill=1)
    mask = np.array(mask, dtype=bool)
    index = np.argwhere(mask == 1)
    rows = index[:, 0]
    clos = index[:, 1]
    left_top_r = np.min(rows)
    left_top_c = np.min(clos)
    right_bottom_r = np.max(rows)
    right_bottom_c = np.max(clos)
    return [
        left_top_c, left_top_r, right_bottom_c - left_top_c,
        right_bottom_r - left_top_r
    ]


def deal_json(ds_type, img_dir, files):
    data_coco = {}
    images_list = []
    annotations_list = []
    image_num = -1
    object_num = -1
    for file_ in files:
        print(file_)
        img_path = os.path.join(img_dir, file_)
        json_path = os.path.join(img_dir, file_.replace('/images', '/labels'))
        for img_file in tqdm(os.listdir(img_path)):
            img = cv2.imread(os.path.join(img_path, img_file))
            height, width = img.shape[:2]
            img_label = os.path.splitext(img_file)[0]
            if img_file.split('.')[
                    -1] not in ['bmp', 'jpg', 'jpeg', 'png', 'JPEG', 'JPG', 'PNG']:
                continue
            # label_file = osp.join(json_path, img_label + '.json')
            label_file = osp.join(json_path, img_label + '.txt')
            # print('Generating dataset from:', label_file)
            image_num = image_num + 1
            # with open(label_file) as f:
            with open(label_file, 'r', encoding='utf-8') as f:
                # data = json.load(f)
                data_str = f.readlines()
                # point_list = []
                # label_list = []
                data = {}
                data['imageHeight'] = height
                data['imageWidth'] = width
                data['imagePath'] = os.path.join(img_path, img_file)
                data['shapes'] = []

                for data_ in data_str:
                    shapes = {}
                    data_list = data_.strip().split(' ')
                    points = data_list[:-2]
                    label = data_list[-2]
                    # point_list.append(points)
                    # label_list.append(label)
                    shapes['label'] = label
                    point_s = list(map(int, points))
                    shapes['points'] = [[point_s[0], point_s[1]], [point_s[2], point_s[3]], [point_s[4], point_s[5]], [point_s[6], point_s[7]]]
                    shapes['shape_type'] = 'polygon'
                    data['shapes'].append(shapes)

                if ds_type == 'labelme':
                    images_list.append(images_labelme(data, image_num, file_))
                elif ds_type == 'cityscape':
                    images_list.append(images_cityscape(data, image_num, img_file))
                    # images_list.append(annotations_polygon(height, width, points, label, image_num, object_num,
                    #         label_to_num))
                if ds_type == 'labelme':
                    for shapes in data['shapes']:
                        object_num = object_num + 1
                        label = shapes['label']
                        if label not in labels_list:
                            categories_list.append(categories(label, labels_list))
                            labels_list.append(label)
                            label_to_num[label] = len(labels_list)
                        p_type = shapes['shape_type']
                        if p_type == 'polygon':
                            points = shapes['points']
                            annotations_list.append(
                                annotations_polygon(data['imageHeight'], data[
                                    'imageWidth'], points, label, image_num,
                                                    object_num, label_to_num))

                        if p_type == 'rectangle':
                            (x1, y1), (x2, y2) = shapes['points']
                            x1, x2 = sorted([x1, x2])
                            y1, y2 = sorted([y1, y2])
                            points = [[x1, y1], [x2, y2], [x1, y2], [x2, y1]]
                            annotations_list.append(
                                annotations_rectangle(points, label, image_num,
                                                      object_num, label_to_num))
                elif ds_type == 'cityscape':
                    for shapes in data['objects']:
                        object_num = object_num + 1
                        label = shapes['label']
                        if label not in labels_list:
                            categories_list.append(categories(label, labels_list))
                            labels_list.append(label)
                            label_to_num[label] = len(labels_list)
                        points = shapes['polygon']
                        annotations_list.append(
                            annotations_polygon(data['imgHeight'], data[
                                'imgWidth'], points, label, image_num, object_num,
                                                label_to_num))
    data_coco['images'] = images_list
    data_coco['categories'] = categories_list
    data_coco['annotations'] = annotations_list
    return data_coco


def voc_get_label_anno(ann_dir_path, ann_ids_path, labels_path):
    with open(labels_path, 'r') as f:
        labels_str = f.read().split()
    labels_ids = list(range(1, len(labels_str) + 1))

    with open(ann_ids_path, 'r') as f:
        ann_ids = [lin.strip().split(' ')[-1] for lin in f.readlines()]

    ann_paths = []
    for aid in ann_ids:
        if aid.endswith('xml'):
            ann_path = os.path.join(ann_dir_path, aid)
        else:
            ann_path = os.path.join(ann_dir_path, aid + '.xml')
        ann_paths.append(ann_path)

    return dict(zip(labels_str, labels_ids)), ann_paths


def voc_get_image_info(annotation_root, im_id):
    filename = annotation_root.findtext('filename')
    assert filename is not None
    img_name = os.path.basename(filename)

    size = annotation_root.find('size')
    width = float(size.findtext('width'))
    height = float(size.findtext('height'))

    image_info = {
        'file_name': filename,
        'height': height,
        'width': width,
        'id': im_id
    }
    return image_info


def voc_get_coco_annotation(obj, label2id):
    label = obj.findtext('name')
    assert label in label2id, "label is not in label2id."
    category_id = label2id[label]
    bndbox = obj.find('bndbox')
    xmin = float(bndbox.findtext('xmin'))
    ymin = float(bndbox.findtext('ymin'))
    xmax = float(bndbox.findtext('xmax'))
    ymax = float(bndbox.findtext('ymax'))
    assert xmax > xmin and ymax > ymin, "Box size error."
    o_width = xmax - xmin
    o_height = ymax - ymin
    anno = {
        'area': o_width * o_height,
        'iscrowd': 0,
        'bbox': [xmin, ymin, o_width, o_height],
        'category_id': category_id,
        'ignore': 0,
    }
    return anno


def voc_xmls_to_cocojson(annotation_paths, label2id, output_dir, output_file):
    output_json_dict = {
        "images": [],
        "type": "instances",
        "annotations": [],
        "categories": []
    }
    bnd_id = 1  # bounding box start id
    im_id = 0
    print('Start converting !')
    for a_path in tqdm(annotation_paths):
        # Read annotation xml
        ann_tree = ET.parse(a_path)
        ann_root = ann_tree.getroot()

        img_info = voc_get_image_info(ann_root, im_id)
        output_json_dict['images'].append(img_info)

        for obj in ann_root.findall('object'):
            ann = voc_get_coco_annotation(obj=obj, label2id=label2id)
            ann.update({'image_id': im_id, 'id': bnd_id})
            output_json_dict['annotations'].append(ann)
            bnd_id = bnd_id + 1
        im_id += 1

    for label, label_id in label2id.items():
        category_info = {'supercategory': 'none', 'id': label_id, 'name': label}
        output_json_dict['categories'].append(category_info)
    output_file = os.path.join(output_dir, output_file)
    with open(output_file, 'w') as f:
        output_json = json.dumps(output_json_dict)
        f.write(output_json)


def widerface_to_cocojson(root_path):
    train_gt_txt = os.path.join(root_path, "wider_face_split", "wider_face_train_bbx_gt.txt")
    val_gt_txt = os.path.join(root_path, "wider_face_split", "wider_face_val_bbx_gt.txt")
    train_img_dir = os.path.join(root_path, "WIDER_train", "images")
    val_img_dir = os.path.join(root_path, "WIDER_val", "images")
    assert train_gt_txt
    assert val_gt_txt
    assert train_img_dir
    assert val_img_dir
    save_path = os.path.join(root_path, "widerface_train.json")
    widerface_convert(train_gt_txt, train_img_dir, save_path)
    print("Wider Face train dataset converts sucess, the json path: {}".format(save_path))
    save_path = os.path.join(root_path, "widerface_val.json")
    widerface_convert(val_gt_txt, val_img_dir, save_path)
    print("Wider Face val dataset converts sucess, the json path: {}".format(save_path))


def widerface_convert(gt_txt, img_dir, save_path):
    output_json_dict = {
        "images": [],
        "type": "instances",
        "annotations": [],
        "categories": [{'supercategory': 'none', 'id': 0, 'name': "human_face"}]
    }
    bnd_id = 1  # bounding box start id
    im_id = 0
    print('Start converting !')
    with open(gt_txt) as fd:
        lines = fd.readlines()

    i = 0
    while i < len(lines):
        image_name = lines[i].strip()
        bbox_num = int(lines[i + 1].strip())
        i += 2
        img_info = get_widerface_image_info(img_dir, image_name, im_id)
        if img_info:
            output_json_dict["images"].append(img_info)
            for j in range(i, i + bbox_num):
                anno = get_widerface_ann_info(lines[j])
                anno.update({'image_id': im_id, 'id': bnd_id})
                output_json_dict['annotations'].append(anno)
                bnd_id += 1
        else:
            print("The image dose not exist: {}".format(os.path.join(img_dir, image_name)))
        bbox_num = 1 if bbox_num == 0 else bbox_num
        i += bbox_num
        im_id += 1
    with open(save_path, 'w') as f:
        output_json = json.dumps(output_json_dict)
        f.write(output_json)


def get_widerface_image_info(img_root, img_relative_path, img_id):
    image_info = {}
    save_path = os.path.join(img_root, img_relative_path)
    if os.path.exists(save_path):
        img = cv2.imread(save_path)
        image_info["file_name"] = os.path.join(os.path.basename(
            os.path.dirname(img_root)), os.path.basename(img_root),
            img_relative_path)
        image_info["height"] = img.shape[0]
        image_info["width"] = img.shape[1]
        image_info["id"] = img_id
    return image_info


def get_widerface_ann_info(info):
    info = [int(x) for x in info.strip().split()]
    anno = {
        'area': info[2] * info[3],
        'iscrowd': 0,
        'bbox': [info[0], info[1], info[2], info[3]],
        'category_id': 0,
        'ignore': 0,
        'blur': info[4],
        'expression': info[5],
        'illumination': info[6],
        'invalid': info[7],
        'occlusion': info[8],
        'pose': info[9]
    }
    return anno


def main():
    parser = argparse.ArgumentParser(
        formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument(
        '--dataset_type',default='labelme',
        help='the type of dataset, can be `voc`, `widerface`, `labelme` or `cityscape`')

    parser.add_argument('--image_label_dir_train',
                        default=['2347/train/images'],
                        help='image and label dir')

    parser.add_argument('--image_label_dir_test',
                        default=['2347/test/images'],
                        help='image and label dir')

    parser.add_argument('--image_input_dir',
                        default='./card_det/process_ok',
                        help='image directory')
    parser.add_argument(
        '--output_dir', help='output dataset directory', default='./card_det/process_ok/ppyoloer/demo')
    parser.add_argument(
        '--train_proportion',
        help='the proportion of train dataset',
        type=float,
        default=1.0)
    parser.add_argument(
        '--val_proportion',
        help='the proportion of validation dataset',
        type=float,
        default=1.0)
    parser.add_argument(
        '--test_proportion',
        help='the proportion of test dataset',
        type=float,
        default=0.0)
    parser.add_argument(
        '--voc_anno_dir',
        help='In Voc format dataset, path to annotation files directory.',
        type=str,
        default=None)
    parser.add_argument(
        '--voc_anno_list',
        help='In Voc format dataset, path to annotation files ids list.',
        type=str,
        default=None)
    parser.add_argument(
        '--voc_label_list',
        help='In Voc format dataset, path to label list. The content of each line is a category.',
        type=str,
        default=None)
    parser.add_argument(
        '--voc_out_name',
        type=str,
        default='voc.json',
        help='In Voc format dataset, path to output json file')
    parser.add_argument(
        '--widerface_root_dir',
        help='The root_path for wider face dataset, which contains `wider_face_split`, `WIDER_train` and `WIDER_val`.And the json file will save in this path',
        type=str,
        default=None)
    args = parser.parse_args()
    try:
        assert args.dataset_type in ['voc', 'labelme', 'cityscape', 'widerface']
    except AssertionError as e:
        print(
            'Now only support the voc, cityscape dataset and labelme dataset!!')
        os._exit(0)
    img_dir_list = []
    label_dir_list = []
    if args.dataset_type == 'voc':
        assert args.voc_anno_dir and args.voc_anno_list and args.voc_label_list
        label2id, ann_paths = voc_get_label_anno(
            args.voc_anno_dir, args.voc_anno_list, args.voc_label_list)
        voc_xmls_to_cocojson(
            annotation_paths=ann_paths,
            label2id=label2id,
            output_dir=args.output_dir,
            output_file=args.voc_out_name)
    elif args.dataset_type == "widerface":
        assert args.widerface_root_dir
        widerface_to_cocojson(args.widerface_root_dir)
    else:
        files_list = args.image_label_dir_train
        for file in files_list:
            img_file = os.path.join(args.image_input_dir, file)
            label_file = os.path.join(args.image_input_dir, file.replace('/images','/labels'))
            try:
                assert os.path.exists(img_file)
                img_dir_list.append(img_file)
            except AssertionError as e:
                print(img_file, 'The json folder does not exist!')
                os._exit(0)
            try:
                assert os.path.exists(label_file)
                label_dir_list.append(label_file)
            except AssertionError as e:
                print(label_file, 'The json folder does not exist!')
                os._exit(0)

        files_list = args.image_label_dir_test
        for file in files_list:
            img_file = os.path.join(args.image_input_dir, file)
            label_file = os.path.join(args.image_input_dir, file.replace('/images', '/labels'))
            try:
                assert os.path.exists(img_file)
                img_dir_list.append(img_file)
            except AssertionError as e:
                print(img_file, 'The json folder does not exist!')
                os._exit(0)
            try:
                assert os.path.exists(label_file)
                label_dir_list.append(label_file)
            except AssertionError as e:
                print(label_file, 'The json folder does not exist!')
                os._exit(0)

        try:
            assert os.path.exists(args.image_input_dir)
        except AssertionError as e:
            print('The image folder does not exist!')
            os._exit(0)
       
        # Deal with the json files.
        if not os.path.exists(args.output_dir + '/annotations'):
            os.makedirs(args.output_dir + '/annotations')
        if args.train_proportion != 0:
            train_data_coco = deal_json(args.dataset_type,
                                        args.image_input_dir, #所在主文件夹
                                        args.image_label_dir_train) #各个子文件夹
            train_json_path = osp.join(args.output_dir + '/annotations',
                                       'instance_train.json')
            json.dump(
                train_data_coco,
                open(train_json_path, 'w'),
                indent=4,
                cls=MyEncoder)

        if args.val_proportion != 0:
            val_data_coco = deal_json(args.dataset_type,
                                      args.image_input_dir,
                                      args.image_label_dir_test)
            val_json_path = osp.join(args.output_dir + '/annotations',
                                     'instance_val.json')
            json.dump(
                val_data_coco,
                open(val_json_path, 'w'),
                indent=4,
                cls=MyEncoder)
        


if __name__ == '__main__':
    main()

3.训练参数配置；

可以参考该链接：手把手教你使用PP-YOLOE-R进行旋转框检测 - 飞桨AI Studio

我这边还修改了图片的输入尺寸为640*640：

PaddleDetection-release-2.6/configs/rotate/ppyoloe_r/_base_/ppyoloe_r_reader.yml

修改了epoch后，还对max_epochs和预热的epoch进行了修改：

PaddleDetection-release-2.6/configs/rotate/ppyoloe_r/_base_/optimizer_3x.yml

4.主干的迁移；

正在处理中；（计划是将yolov5_obb中的轻量级主干运用到PPYOLOE-R）

其中遇到的问题：

1.数据转化问题

（略）训练使用的json文件和测试的json文件同时生成；

2.修改yaml问题：

ERROR: yaml.parser.ParserError: while parsing a block mapping in “./docker-peer.yaml“

原因：空格导致的未对齐（严格意义上的对齐）
解决方案，添加或者删除空格，使得同一层的保持对齐。

问题详情查看：

yaml.parser.ParserError: while parsing a block mapping · Issue #8339 · PaddlePaddle/PaddleDetection · GitHub

3.训练时回报问题：

不知道什么原因导致的，重新训练就可以了；

4.使用 PaddleDetection-release-2.6/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_dota.yml训练结果分析：

长宽比明显的目标效果较好，当图像中的目标接近正方形的时候，效果就比较差；这个问题issue中有人进行了提问，没有得到解答；参考链接：使用自己的数据集训练ppyoloe_r模型，dfl_loss下降到0.8左右不下降，可视化训练后预测结果角度预测有偏差 · Issue #8013 · PaddlePaddle/PaddleDetection · GitHub

目前没有最新回复，后续我这边是将计算角度的loss从0.05调到了0.25，（同事说有好的预训练模型可以进行这种尝试，没有好的预训练模型，这种尝试会失败），我这边已经进行了修改，正在训练，结果还没有出来；

PaddleDetection-release-2.6/configs/rotate/ppyoloe_r/_base_/ppyoloe_r_crn.yml

5.安装pycocotool遇到的问题

# pip install pycocotools==2.0.2 -i https://pypi.tuna.tsinghua.edu.cn/simple

参考链接：https://wenku.csdn.net/answer/e2e2016ef0955f357421520784e2e8bf

补充一下知识点：

1.PPYOLOE-R的角度预测使用的是0-90°；yolov5-obb中角度和长宽比是分离的且基于anchor，所以yolov5obb在对接近方形的目标预测效果较好，但是宽高比远离1的目标效果较差；

未来的工作：

1.搜索文献的时候发现有yolov6旋转目标检测和mmrotate旋转目标检测，后期可以进行尝试；

本文内容由网友自发贡献，版权归原作者所有，本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容，请联系:hwhale#tublm.com(使用前将#替换为@)

初次使用PPYOLOE-R 的相关文章

用通俗易懂的方式讲解：使用 LlamaIndex 和 Eleasticsearch 进行大模型 RAG 检索增强生成

检索增强生成 Retrieval Augmented Generation RAG 是一种结合了检索 Retrieval 和生成 Generation 的技术它有效地解决了大语言模型 LLM 的一些问题比如幻觉知识限制等随着 RAG
OpenCV-Python 中的简单数字识别 OCR

我正在尝试在 OpenCV Python cv2 中实现数字识别 OCR 它仅用于学习目的我想学习 OpenCV 中的 KNearest 和 SVM 功能我有每个数字 100 个样本即图像我想和他们一起训练有一个样本letter
为什么用 PIL 和 pytesseract 无法获取字符串？

这是一个简单的Python 3光学字符识别 OCR 程序来获取字符串我已经在这里上传了目标gif文件请下载并另存为 tmp target gif try from PIL import Image except ImportError
了解 OCR 的 Freeman 链码

请注意我确实在寻找问题的答案我是not寻找一些源代码或一些学术论文的链接我已经使用了源代码并且我已经阅读了论文但仍然没有弄清楚这个问题的最后部分我正在研究一些快速屏幕字体 OCRing 并且取得了很好的进展我已经找到基线分离
Tesseract 对阿拉伯语单词/字母不返回任何内容

我已经安装了 Pytesseract 它可以完美地处理法语英语文本以及数字但是当我尝试阅读任何阿拉伯文本字母时它不会返回任何内容这是我使用过的代码 try from PIL import Image except ImportEr
使用 python 和 opencv 检测图像中的文本区域

我想使用 python 2 7 和 opencv 2 4 9 检测图像的文本区域并在其周围画一个矩形区域就像下面的示例图片所示我对图像处理很陌生所以任何想法如何做到这一点将不胜感激有多种方法可以检测图像中的文本我建议看看这个问题
Tess-2 OCR 不工作

我试图在 Android 上使用 tess two 从图像中获取文本但这给了我一个非常糟糕的结果 01 16 12 00 25 339 I Tesseract native 29038 Initialized Tesseract API
提高 Python Tesseract OCR 的准确性

我在用pytesseract https pypi org project pytesseract 随着openCV https pypi org project opencv python 在 Python 中的简单 django 应用程
提高识别率的图像预处理步骤

我正在为我的项目使用 TessBaseAPI 制作一个简单的 OCR Android 应用程序我已经完成了一些图像预处理步骤例如二值化和图像增强但他们的结果是50 到60 怎样才能提高识别率呢我包括两个示例图像 http image
我自己的 Python OCR 程序

我还是一个初学者但我想写一个字符识别程序这个程序还没有准备好而且我编辑了很多所以评论可能不完全一致我将使用 8 个连通性来标记连通分量 from PIL import Image import numpy as np im Ima
使用 OpenCV 对 Tesseract OCR 进行图像预处理

我正在尝试开发一个应用程序它使用 Tesseract 来识别手机摄像头拍摄的文档中的文本我使用 OpenCV 来预处理图像以实现更好的识别应用高斯模糊和阈值方法进行二值化但结果非常糟糕 Here https s6 postimg c
tess4j 与 Spring mvc

我已经尝试将 tess4j 作为独立的 java 程序并且它可以正常工作并给出文本输出现在我正在尝试创建一个 spring mvc web 项目在 pom 中添加 tess4j 的依赖项并且我已在我的项目中添加了 tess4j 源
让 tesseract 只识别数字

我正在尝试改进我制作的 OCR 程序来读取我正在使用的某个图像的布局现在我希望我的 OCR 程序只能识别数字 0 9 我尝试遵循问题的解决方案限制 tesseract 正在寻找的字符 https stackoverflow com q
断言失败 - 训练 Tesseract

我正在尝试使用 Serak Tesseract Trainer 训练 tesseract https code google com p serak tesseract trainer https code google com p ser
Google Vision API 文本识别器无法正常工作

我使用 Google Vision API 来读取报纸等任何物体上的文本或墙上的文本我已经尝试过来自 Google 开发者网站的相同示例但我的文本识别器总是返回 falseIsOperational功能我在 Blackberry ke
tesseract (v3.03) 输出为 PDF [关闭]

Closed 这个问题不符合堆栈溢出指南 help closed questions 目前不接受答案为什么会返回这个错误呢 root amd 3700 2gb ocr test tesseract l dan pdf png out pd
OCR 扑克牌 [关闭]

Closed 这个问题需要多问focused help closed questions 目前不接受答案我决定做一个有趣的项目我想将扑克牌的图像作为输入并返回其等级和花色我认为我只需要查看左上角因为那里包含了所有信息它应该是稳健的
收据褪色部分可以恢复吗？

我有一些包含一些扫描收据的文件我需要使用 OCR 从中提取文本由于收据上打印的文字在一段时间后会褪色导致收据上的某些文字不清晰影响OCR结果褪色单词的一些示例有什么方法可以恢复褪色的部分以便提高 OCR 结果吗我在OpenC
训练 tesseract 与 iPhone 一起使用

我正在尝试在我的 iPhone 应用程序中使用 tesseract 2 04 只想检测数字我在这里所做的首先是使用这篇文章交叉编译 tesseract 以生成 lib 文件http robertcarlsen net 2009 07 15
来自 Google Vision API OCR 的响应 400，带有指定图像的 base64 字符串

我读了如何使用 Google Vision API 对 Base64 编码图像进行文本检测 https stackoverflow com questions 43094048 how to use the google vision ap

随机推荐

【千律】C++基础：List 链表循环删除元素，及其报错的解决方案

报错原因采用erase移除迭代器后迭代器的值变为 572662307 无法作为迭代器继续运算详细当程序执行到 list int erase itor 时满足条件的第一个元素被删除从而导致 itor 指针被删除使其不指向任何元素
全新出品！阿里 P5 工程师~P8 架构师晋升路线揭秘

阿里巴巴终于公开了从初级程序员到架构师的学习路线图这里相对应的基本上就是从 P5 到 P8 的晋升体系今天老师将会带着大家从初级程序员开始一点点分享整个晋升体系职级初级程序员薪资 6 12K 开发年限 0 1 年技术能力能够理
小程序开发：监听返回当前页面

故事背景小程序开发需要判断进入当前页面是初次加载还是返回的操作就分享一下卤煮的实现思路吧实现原理先上图我们都知道在创建page页面的时候开发工具会默认帮我们把生命周期的钩子一起生成卤煮就用到了下面这两个钩子函数原理页面
改变Element-ui 主题颜色

主题颜色的改变本质是改变样式的color 文字背景描边局部改变可以使用样式穿透普通css root gt gt gt el upload list item transition none important webkit tr
产业区块链一周动态丨深圳龙华区与腾讯共建产业区块链联盟，新四板试水区块链...

作者邱祥宇 10 24 会议之后产业区块链似乎成为了一种政治正确全国26省将区块链写入政府工作报告 12省发布专项政策19件一顿操作猛如虎整个行业拿到融资的公司却只有17家政策热火朝天资本望而却步是什么原因导致中间出现断层
mysql分布式实践 - keepalived 实现IP漂移

前言 mysql 分布式尤其是主主复制架构中也是实现了读写分离的如果有一个主master 挂掉了那么如何让用户无感知的将请求打到另外一个master上 keepalived 插件的IP漂移就可以实现 1 keepalived 原理
利用ipv6，在windows和ipad上远程访问共享文件夹

利用ipv6 在windows和ipad上远程访问共享文件夹背景之前说到用ipv6 解决了远程桌面连接问题利用ipv6远程桌面彻底解决校园网掉线带来的问题那么问题又来了我在实验室电脑下载了python学习视频这个视频特别大几
帧间差分法、背景减法、光流场法简介

概述运动目标检测是指当监控场景中有活动目标时采用图像分割的方法从背景图像中提取出目标的运动区域运动目标检测技术是智能视频分析的基础因为目标跟踪行为理解等视频分析算法都是针对目标区域的像素点进行的目标检测的结果直接决定着智能视觉监
4个最实用最强大ChatGPT插件

GPT的插件有很多功能也很强大这些插件是自定义模块可以集成到为特定行业量身定制的 AI 聊天机器人中包括电子商务医疗保健金融和教育使用 ChatGPT 插件您现在可以做的不仅仅是聊天今天给大家分享4个经常使用的插件希望能
Qt Plugin插件机制与实例

1 Qt 插件机制 1 1 Qt 插件简介插件是一种遵循一定规范的应用程序接口编写出来的程序定位于开发实现应用软件平台不具备的功能的程序插件与宿主程序之间通过接口联系就像硬件插卡一样可以被随时删除插入和修改所以结构很灵活容易
深度：微软对Sun的步步紧逼催生了JavaFX

在旧金山召开的JavaOne会议上 Sun首次公开了Java家族的新产品JavaFX 之后国内外各大媒体就开始竞相报道也采访了国内的一些Java技术方面的专家大家针对这一产品的褒贬不一打开参与JavaFX产品研发的Sun的工程师Chr
解决一个安装office2016缺少vcruntime140.dll的问题

问题描述在安装并激活好office2016之后双击word 显示缺少vcruntime140 dll 解决方案一 1 下载vcruntime140 dll 2 复制到C盘windows sysWOW64 3 然后寻找下载文件夹里zhu
一文详解 Spring Bean 循环依赖

一背景有好几次线上发布老应用时遇到代码启动报错具体错误如下 Caused by org springframework beans factory BeanCurrentlyInCreationException Error cre
chooseAddress:fail the api need to be declared in the requiredPrivateInfos field in app.json/ext.js

错误描述在使用uni app开发微信小程序的时候想要通过uni chooseLocation获取用户地理位置的时候出现chooseAddress fail the api need to be declared in the requi
Linux环境安装配置ffmpeg

最近需要在云主机上配置ffmpeg 租的服务器上面的环境往往是Linux 参考别人的文章配好了环境在此进行综合记录参考文章 https zhuanlan zhihu com p 347780238 https blog csdn net
Python 变量类型

变量是存储在内存中的值这就意味着在创建变量时会在内存中开辟一个空间基于变量的数据类型解释器会分配指定内存并决定什么数据可以被存储在内存中因此变量可以指定不同的数据类型这些变量可以存储整数小数或字符变量赋值 Python 中
python技能描述_【python】利用python爬虫将LOL所有英雄的技能介绍给爬取下来

工欲善其事必先利其器要想玩好LOL 那了解所有英雄的技能必然是其最基本的所以此爬虫就应运而生运行环境 python 3 7 此爬虫所用的库有 requests 获取网页信息 openpyxl Excel相关操作 pymysql My
UE4 命令行创建Pak

原创文章转载请注明出处回头还会出一个通过编辑器扩展创建Pak的命令行的还是比较麻烦命令行打包如下引擎版本4 25 由于使用新的引擎版本感觉pak这块变化挺大的 1 gt 注意中间的空格 2 gt 解析 1 E engine 4
2021全国大学生信息安全竞赛初赛部分WP

第一阶段 Misc tiny traffic 分析pcap包能发现几个可疑的请求 flag wrapper test secret 分别把它们传输的文件提取出来得到三个压缩包通过flag gzip的内容可以确认这个IP就是我们要分析的
初次使用PPYOLOE-R

目的优化基于yolov5 obb旋转目标检测算法的证件区域检测之前的方法是基于anchor 每次使用都要调试anchor 而ppyoloe r是free anchor的算法源码位置 https github com PaddlePad

初次使用PPYOLOE-R

初次使用PPYOLOE-R 的相关文章

随机推荐

热门标签