Generating a Small-Batch Dataset

2023-05-16

Use a shell command to filter the *.mp4 files under a directory, keeping those whose names end in 2, 4, 6, or 8.

find /mnt/sdb/dataset/20181217_RX5_zheA5MV46/ -name "*.mp4" | grep '[2468]\.mp4$' > tem.txt

The result looks like this:

/mnt/sdb/dataset/20181217_RX5_zheA5MV46/suzhou2shanghai_freeway_sunny_day/ch1_20181217_002104_0022.mp4
/mnt/sdb/dataset/20181217_RX5_zheA5MV46/suzhou2shanghai_freeway_sunny_day/ch1_20181217_002305_0024.mp4
/mnt/sdb/dataset/20181217_RX5_zheA5MV46/suzhou2shanghai_freeway_sunny_day/ch1_20181217_002505_0026.mp4
/mnt/sdb/dataset/20181217_RX5_zheA5MV46/suzhou2shanghai_freeway_sunny_day/ch1_20181217_002705_0028.mp4
/mnt/sdb/dataset/20181217_RX5_zheA5MV46/suzhou2shanghai_freeway_sunny_day/ch1_20181217_003105_0032.mp4
/mnt/sdb/dataset/20181217_RX5_zheA5MV46/suzhou2shanghai_freeway_sunny_day/ch1_20181217_003305_0034.mp4
/mnt/sdb/dataset/20181217_RX5_zheA5MV46/suzhou2shanghai_freeway_sunny_day/ch1_20181217_003505_0036.mp4
/mnt/sdb/dataset/20181217_RX5_zheA5MV46/suzhou2shanghai_freeway_sunny_day/ch1_20181217_003705_0038.mp4
/mnt/sdb/dataset/20181217_RX5_zheA5MV46/suzhou2shanghai_freeway_sunny_day/ch1_20181217_004105_0042.mp4
/mnt/sdb/dataset/20181217_RX5_zheA5MV46/suzhou2shanghai_freeway_sunny_day/ch1_20181217_004306_0044.mp4
/mnt/sdb/dataset/20181217_RX5_zheA5MV46/suzhou2shanghai_freeway_sunny_day/ch1_20181217_004506_0046.mp4
/mnt/sdb/dataset/20181217_RX5_zheA5MV46/suzhou2shanghai_freeway_sunny_day/ch1_20181217_004706_0048.mp4
/mnt/sdb/dataset/20181217_RX5_zheA5MV46/suzhou2shanghai_freeway_sunny_day/ch1_20181217_005106_0052.mp4
/mnt/sdb/dataset/20181217_RX5_zheA5MV46/suzhou2shanghai_freeway_sunny_day/ch1_20181217_005306_0054.mp4
/mnt/sdb/dataset/20181217_RX5_zheA5MV46/suzhou2shanghai_freeway_sunny_day/ch1_20181217_005506_0056.mp4
/mnt/sdb/dataset/20181217_RX5_zheA5MV46/suzhou2shanghai_freeway_sunny_day/ch1_20181217_005706_0058.mp4
/mnt/sdb/dataset/20181217_RX5_zheA5MV46/suzhou2shanghai_freeway_sunny_day/ch1_20181217_010106_0062.mp4
/mnt/sdb/dataset/20181217_RX5_zheA5MV46/suzhou2shanghai_freeway_sunny_day/ch1_20181217_010306_0064.mp4
/mnt/sdb/dataset/20181217_RX5_zheA5MV46/suzhou2shanghai_freeway_sunny_day/ch1_20181217_010506_0066.mp4
/mnt/sdb/dataset/20181217_RX5_zheA5MV46/suzhou2shanghai_freeway_sunny_day/ch1_20181217_010707_0068.mp4
/mnt/sdb/dataset/20181217_RX5_zheA5MV46/suzhou2shanghai_freeway_sunny_day/ch1_20181217_011107_0072.mp4
/mnt/sdb/dataset/20181217_RX5_zheA5MV46/suzhou2shanghai_freeway_sunny_day/ch1_20181217_011307_0074.mp4
/mnt/sdb/dataset/20181217_RX5_zheA5MV46/suzhou2shanghai_freeway_sunny_day/ch1_20181217_011507_0076.mp4
/mnt/sdb/dataset/20181217_RX5_zheA5MV46/suzhou2shanghai_freeway_sunny_day/ch1_20181217_011707_0078.mp4
/mnt/sdb/dataset/20181217_RX5_zheA5MV46/suzhou2shanghai_freeway_sunny_day/ch1_20181217_012043_0082.mp4
/mnt/sdb/dataset/20181217_RX5_zheA5MV46/suzhou2shanghai_freeway_sunny_day/ch1_20181217_012243_0084.mp4
/mnt/sdb/dataset/20181217_RX5_zheA5MV46/suzhou2shanghai_freeway_sunny_day/ch1_20181217_012443_0086.mp4
/mnt/sdb/dataset/20181217_RX5_zheA5MV46/suzhou2shanghai_freeway_sunny_day/ch1_20181217_012643_0088.mp4
/mnt/sdb/dataset/20181217_RX5_zheA5MV46/suzhou2shanghai_freeway_sunny_day/ch1_20181217_013043_0092.mp4

Alternatively, use the -regex option of find:

find /mnt/sdb/dataset/20181217_RX5_zheA5MV46/ -regex '.*[2468]\.mp4' > tem.txt
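
The same filter can be sketched in Python, which also makes it easy to draw a genuinely random subset afterwards. This is only a minimal sketch and not part of the original workflow: the names `select_even_suffix` and `sample_paths` and the sample paths under /data are hypothetical.

```python
import random
import re

# Names ending in 2, 4, 6, or 8 right before ".mp4" (same intent as the grep filter).
EVEN_SUFFIX = re.compile(r"[2468]\.mp4$")


def select_even_suffix(paths):
    """Keep only the paths whose file name ends in 2, 4, 6, or 8 before .mp4."""
    return [p for p in paths if EVEN_SUFFIX.search(p)]


def sample_paths(paths, k, seed=0):
    """Draw up to k paths at random (hypothetical helper; fixed seed for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(paths, min(k, len(paths)))


# Hypothetical input paths, shaped like the file names in the listing
paths = [
    "/data/ch1_20181217_002104_0022.mp4",
    "/data/ch1_20181217_002205_0023.mp4",
    "/data/ch1_20181217_002305_0024.mp4",
    "/data/ch1_20181217_002405_0025.mp4",
]
selected = select_even_suffix(paths)
print(selected)
print(sample_paths(selected, 1))
```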

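# Randomly generating a small-batch dataset from the test dataset
Dataset directory: JPEGImages (source images), Annotations (XML label files), and testshuffle100.txt. The contents of testshuffle100.txt look like this: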

/mnt/sdb/dataset/BMDD/Target/JPEGImages/ch1_20190707_002425_0053_2130_0.jpg 1920 1080 2,855,506,929,585 0,930,543,964,570 
/mnt/sdb/dataset/BMDD/Target/JPEGImages/ch1_20190110_003309_0034_540_0.jpg 1920 1080 6,1216,371,1275,423 0,955,523,1033,587 4,1247,515,1314,586 2,852,491,908,557 2,900,503,938,550 7,948,384,1009,426 7,1005,385,1070,424 
/mnt/sdb/dataset/BMDD/Target/JPEGImages/ch1_20190123_005205_0051_630_0.jpg 1920 1080 0,1147,507,1322,631 0,839,521,954,604 0,950,526,1005,570 0,1060,502,1107,562 
/mnt/sdb/dataset/BMDD/Target/JPEGImages/ch1_20190725_134338_0072_4200_0.jpg 1920 1080 5,710,427,737,483 5,1078,328,1107,400 5,1132,336,1162,408 5,1183,343,1210,411 5,1758,494,1780,537 0,1254,601,1677,831 7,1503,570,1528,605 6,1328,561,1350,589 6,1325,587,1349,614 6,1070,449,1098,478 
/mnt/sdb/dataset/BMDD/Target/JPEGImages/ch1_20190707_113025_0023_1890_0.jpg 1920 1080 7,1306,488,1338,510 0,831,541,900,568 0,503,543,560,563 0,100,530,590,821 
/mnt/sdb/dataset/BMDD/Target/JPEGImages/ch1_20190119_030713_0385_600_0.jpg 1920 1080 2,1544,427,1642,581 0,1138,526,1348,660 0,1047,524,1118,578 0,924,503,1055,627 4,685,535,714,595 4,501,534,545,603 3,329,531,370,616 2,1438,420,1614,584 3,1367,507,1387,575 4,873,526,887,552 4,781,531,808,576 7,937,479,966,493 
/mnt/sdb/dataset/BMDD/Target/JPEGImages/ch1_20190725_161919_0117_3600_0.jpg 1920 1080 4,313,568,335,601 0,559,563,614,599 0,637,567,682,599 0,679,572,710,594 1,705,540,768,596 1,658,550,708,589 7,875,522,902,545 5,889,468,899,494 5,860,467,870,493 5,403,429,419,467 5,457,446,472,483 
/mnt/sdb/dataset/BMDD/Target/JPEGImages/ch1_20190119_024712_0365_90_0.jpg 1920 1080 1,1191,56,1917,814 0,944,521,1153,692 0,986,495,1042,552 
/mnt/sdb/dataset/BMDD/Target/JPEGImages/ch1_20190725_135537_0076_3240_0.jpg 1920 1080 1,927,550,985,619 0,589,564,663,618 6,986,428,1013,457 6,1017,428,1044,457 
/mnt/sdb/dataset/BMDD/Target/JPEGImages/ch1_20190119_000509_0118_930_0.jpg 1920 1080 0,489,531,764,683 0,930,512,1029,606 2,1107,404,1335,634 
/mnt/sdb/dataset/BMDD/Target/JPEGImages/ch1_20181230_013534_0096_720_0.jpg 1920 1080 2,1567,222,1956,683 0,961,523,1012,566 2,1020,502,1049,538 2,2,325,641,660 6,1274,472,1299,501 0,1069,522,1093,544 
/mnt/sdb/dataset/BMDD/Target/JPEGImages/ch1_20190123_011829_0079_810_0.jpg 1920 1080 0,673,507,831,614 0,884,521,926,551 2,970,453,1046,563 7,1068,490,1090,505 
/mnt/sdb/dataset/BMDD/Target/JPEGImages/ch1_20190110_001608_0017_780_0.jpg 1920 1080 4,1131,524,1154,554 2,1042,473,1116,561 0,970,523,1016,564 0,1000,526,1020,547 
/mnt/sdb/dataset/BMDD/Target/JPEGImages/ch1_20190122_003846_0035_930_0.jpg 1920 1080 0,413,458,770,711 
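
For reference, one line of this file could be parsed roughly as follows. This is a sketch under an assumption the listing does not state explicitly: each comma-separated group is taken to be class_id, x_min, y_min, x_max, y_max, and the two numbers after the path are taken to be the image width and height. The function name `parse_line` is hypothetical.

```python
def parse_line(line):
    """Split one list-file line into (image_path, width, height, boxes)."""
    fields = line.split()  # any whitespace; the trailing space is ignored
    image_path, width, height = fields[0], int(fields[1]), int(fields[2])
    boxes = []
    for item in fields[3:]:
        # Assumed layout of each group: class_id, x_min, y_min, x_max, y_max
        class_id, x_min, y_min, x_max, y_max = (int(v) for v in item.split(","))
        boxes.append((class_id, x_min, y_min, x_max, y_max))
    return image_path, width, height, boxes


line = ("/mnt/sdb/dataset/BMDD/Target/JPEGImages/ch1_20190119_024712_0365_90_0.jpg "
        "1920 1080 1,1191,56,1917,814 0,944,521,1153,692 0,986,495,1042,552 ")
path, w, h, boxes = parse_line(line)
print(w, h, len(boxes))  # 1920 1080 3
print(boxes[0])          # (1, 1191, 56, 1917, 814)
```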

# The script below builds the small dataset, which is handy when debugging model training

#!/usr/bin/env python
# coding: utf-8

import os
import random
import shutil
import time


def test(num):
    # time.clock() was removed in Python 3.8; perf_counter() replaces it
    start = time.perf_counter()

    with open("/mnt/sdb/dataset/BMDD/Target/test_8.txt", "r") as f:
        raw_list = f.readlines()
    random.shuffle(raw_list)

    with open("/home/szhang/project/data/test/test100/testshuffle100.txt", "w") as fw:
        for line in raw_list[:num]:
            # Copy the image: shutil.copy(source file, destination directory)
            image_path = line.strip().split(" ")[0]
            shutil.copy(image_path, "/home/szhang/project/data/test/test100/JPEGImages/")

            # Copy the label: .../JPEGImages/<name>.jpg -> .../Annotations/<name>.xml
            data_root = os.path.dirname(os.path.dirname(image_path))
            img_name_pre = os.path.basename(image_path).split(".")[0]
            annotation = os.path.join(data_root, "Annotations", img_name_pre + ".xml")
            shutil.copy(annotation, "/home/szhang/project/data/test/test100/Annotations/")

            # Record the selected line in the new list file
            fw.write(line)

    end = time.perf_counter()
    print("cost time is {}".format(end - start))


test(100)
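
The only non-obvious step in the script is deriving the label path from the image path. A minimal sketch of that step in isolation, assuming the JPEGImages and Annotations directories sit side by side as described above (the helper name `annotation_path_for` is hypothetical):

```python
import os


def annotation_path_for(image_path):
    """Map <root>/JPEGImages/<name>.jpg to <root>/Annotations/<name>.xml."""
    data_root = os.path.dirname(os.path.dirname(image_path))
    name = os.path.splitext(os.path.basename(image_path))[0]
    return os.path.join(data_root, "Annotations", name + ".xml")


result = annotation_path_for(
    "/mnt/sdb/dataset/BMDD/Target/JPEGImages/ch1_20190122_003846_0035_930_0.jpg")
print(result)
```

Keeping this in a small function makes it easy to check the mapping on a few paths before any files are copied.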

Take a string from one txt file and look up the matching string, and its line number, in another txt file:

# Read each line of the first txt file and extract the image name (no directory, no extension)
src_txt_path = '/home/szhang/project/data/test/test100/testshuffle100.txt'
with open(src_txt_path, 'r') as src_txt_data:
    txt_data_lists = src_txt_data.readlines()

# Read the second txt file once, outside the loop
src_txt_path2 = '/home/szhang/project/data/test/test100/detection_SNPE_CDSP_result/SNPE_CDSP_result/file_list.txt'
with open(src_txt_path2, 'r') as src_txt_data2:
    txt_data_lists2 = src_txt_data2.readlines()

for index, txt_data_list in enumerate(txt_data_lists):
    image_name_pre = txt_data_list.strip().split(" ")[0].split("/")[-1].split(".")[0]
    # Find the line of the second file that contains this name and record its index
    i = -1  # -1 means "not found" (0 would be ambiguous with a match on the first line)
    for index2, txt_data_list2 in enumerate(txt_data_lists2):
        if txt_data_list2.find(image_name_pre) != -1:
            image_name_pre2 = txt_data_list2.strip().split("/")[-1].split(".")[0]
            i = index2
            break
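
When both files are large, the nested loop gets slow. One possible alternative, sketched here with hypothetical in-memory data and hypothetical helper names (`stem`, `build_index`), is to index the second file by file stem once and then look names up in a dict. Note that this matches exact stems rather than substrings, so it is not a drop-in replacement if substring matching is really needed.

```python
import os


def stem(path):
    """File name without directory or extension."""
    return os.path.splitext(os.path.basename(path))[0]


def build_index(lines):
    """Map each file stem to the index of the first line that carries it."""
    index = {}
    for line_no, line in enumerate(lines):
        index.setdefault(stem(line.strip()), line_no)
    return index


# Hypothetical contents of the two text files
list_a = ["/data/JPEGImages/img_b.jpg 1920 1080 0,1,2,3,4\n",
          "/data/JPEGImages/img_a.jpg 1920 1080 0,5,6,7,8\n"]
list_b = ["/out/img_a.raw\n", "/out/img_b.raw\n"]

index_b = build_index(list_b)
matches = {}
for line in list_a:
    name = stem(line.split()[0])
    matches[name] = index_b.get(name)  # line index in list_b, or None if missing
print(matches)
```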

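The difference between split() and os.path.split() in Python

Python has two functions, split() and os.path.split(), which do the following:

split(): splits a string. It slices the string at the given separator and returns the pieces as a list.

os.path.split(): splits a path into its directory part and its file name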


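1. The split() function

Syntax: str.split(sep=None, maxsplit=-1)[n]

Parameters:
sep: the separator. By default any run of whitespace (spaces, newlines, tabs) acts as the separator; an empty string ('') is not allowed. If the separator does not occur in the string, the whole string becomes the only element of the list.
maxsplit: the maximum number of splits. If it is given, the string is split into at most maxsplit+1 substrings, and each substring can be assigned to its own variable.
[n]: selects the n-th piece of the result.

Note: with the default separator, consecutive whitespace counts as a single separator, so empty items in the middle are dropped. With an explicit ' ' separator, consecutive spaces produce empty strings in the result.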
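
The whitespace behaviour is easy to check directly. A short sketch (the sample string is arbitrary):

```python
text = "a  b c"  # two spaces between a and b

default_split = text.split()        # no separator: runs of whitespace collapse
explicit_split = text.split(" ")    # explicit space: empty strings are kept
limited_split = text.split(" ", 1)  # at most one split

print(default_split)   # ['a', 'b', 'c']
print(explicit_split)  # ['a', '', 'b', 'c']
print(limited_split)   # ['a', ' b c']

# An empty separator is rejected
try:
    text.split("")
except ValueError as err:
    print("ValueError:", err)
```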

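2. The os.path.split() function

Syntax: os.path.split('PATH')

Parameters:

1. PATH is the full path of a file.

2. If PATH consists of a directory and a file name, the result is the directory and the file name.

3. If PATH is a directory name ending in '/', the result is the directory and an empty file name.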

II. Splitting a string

string = "www.gziscas.com.cn"

1. Use '.' as the separator

print(string.split('.'))

['www', 'gziscas', 'com', 'cn']

2. Split twice

print(string.split('.',2))

['www', 'gziscas', 'com.cn']

3. Split twice and take the item at index 1

print(string.split('.',2)[1])

gziscas

4. Split twice and store the three resulting parts in three variables

u1, u2, u3 = string.split('.',2)

print(u1)  # www

print(u2)  # gziscas

print(u3)  # com.cn

III. Splitting a file name from its path

import os

print(os.path.split('/dodo/soft/python/'))

('/dodo/soft/python', '')

print(os.path.split('/dodo/soft/python'))

('/dodo/soft', 'python')

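The difference between os.path.splitext() and os.path.split()

Summary:

#os.path.splitext() separates the file name from its extension

#os.path.split() returns the file's directory path and its file name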

# -*- coding:utf-8 -*-
"""
@author:lei 
"""
import os
 
#os.path.join() joins separate parts into one path
filename=os.path.join('/home/ubuntu/python_coding','split_func')
print(filename)
#Output: /home/ubuntu/python_coding/split_func
 
#os.path.splitext() separates the file name from its extension
fname,fename=os.path.splitext('/home/ubuntu/python_coding/split_func/split_function.py')
print('fname is:',fname)
print('fename is:',fename)
#Output:
#fname is: /home/ubuntu/python_coding/split_func/split_function
#fename is: .py
 
#os.path.split() returns the file's directory path and its file name
dirname,filename=os.path.split('/home/ubuntu/python_coding/split_func/split_function.py')
print(dirname)
print(filename)
#Output:
#/home/ubuntu/python_coding/split_func
#split_function.py
 
#The split() function
#string.split(sep=None, maxsplit=-1)[n]
#sep - - the separator; by default any whitespace, including spaces, newlines (\n), and tabs (\t).
#maxsplit - - the maximum number of splits.
#[n] - - selects the n-th piece
string = "hello.world.python"
print(string.split('.'))#Output: ['hello', 'world', 'python']
print(string.split('.',1))#Output: ['hello', 'world.python']
print(string.split('.',1)[0])#Output: hello
print(string.split('.',1)[1])#Output: world.python
string2="hello<python.world>and<c++>end"
print(string2.split("<",2)[2].split(">")[0])#Output: c++

A detailed look at the os.walk() method
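
As a rough illustration of how os.walk() is typically used, the sketch below collects all files with a given suffix under a directory tree. The helper name `list_files` and the throwaway directory layout are hypothetical; os.walk() itself yields a (dirpath, dirnames, filenames) tuple for every directory it visits.

```python
import os
import tempfile


def list_files(root, suffix):
    """Return the sorted paths under root (recursively) whose names end with suffix."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                found.append(os.path.join(dirpath, name))
    return sorted(found)


# Build a small temporary tree so the example runs anywhere
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub"))
    for rel in ("a.mp4", "b.txt", os.path.join("sub", "c.mp4")):
        open(os.path.join(root, rel), "w").close()
    result = [os.path.relpath(p, root) for p in list_files(root, ".mp4")]

print(result)
```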


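Copying a List in Python (direct assignment, shallow copy, deep copy)

Direct assignment:

Assigning with = does not copy anything.

Both names refer to the same list, so a change made through either one is visible through the other.

old = [1,[1,2,3],3]
new = old
 
new[0] = 3
new[1][0] = 3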
 
'''
-----------------------
Before:
[1, [1, 2, 3], 3]
[1, [1, 2, 3], 3]
After:
[3, [3, 2, 3], 3]
[3, [3, 2, 3], 3]
-----------------------
'''

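Shallow copy:

1. The copy() method
For a List, copy() creates a new outer list, so the first level is independent of the original, but any nested List is still shared.

The outer list stores only a reference (address) to each nested List. The copy receives that same reference, so both lists point to the same nested List in memory.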

old = [1,[1,2,3],3]
new = old.copy()
 
new[0] = 3
new[1][0] =3
 
'''
---------------------
Before:
[1, [1, 2, 3], 3]
[1, [1, 2, 3], 3]
After:
[1, [3, 2, 3], 3]
[3, [3, 2, 3], 3]
---------------------
'''

2. List comprehension
Building a new list with a list comprehension is also a shallow copy: only the first level is independent.

old = [1,[1,2,3],3]
new = [i for i in old]
 
new[0] = 3
new[1][0] = 3
 
'''
----------------------
Before
[1, [1, 2, 3], 3]
[1, [1, 2, 3], 3]
After
[1, [3, 2, 3], 3]
[3, [3, 2, 3], 3]
----------------------
'''

3. for loop
Appending the elements one by one to a new list in a for loop is also a shallow copy: only the first level is independent.

old = [1,[1,2,3],3]
new = []
for i in range(len(old)):
    new.append(old[i])
 
new[0] = 3
new[1][0] = 3
 
'''
-----------------------
Before:
[1, [1, 2, 3], 3]
[1, [1, 2, 3], 3]
After:
[1, [3, 2, 3], 3]
[3, [3, 2, 3], 3]
-----------------------
'''

4. Slicing
The [:] slice shallow-copies the whole list; again, only the first level is independent.

old = [1,[1,2,3],3]
new = old[:]
 
new[0] = 3
new[1][0] = 3
 
'''
------------------
Before:
[1, [1, 2, 3], 3]
[1, [1, 2, 3], 3]
After:
[1, [3, 2, 3], 3]
[3, [3, 2, 3], 3]
------------------
'''

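Deep copy:

With deepcopy(), the new list is fully independent of the original, no matter how deeply it is nested or what it contains, which makes it the safest choice.

This requires import copy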

import copy
old = [1,[1,2,3],3]
new = copy.deepcopy(old)
 
new[0] = 3
new[1][0] = 3
 
'''
-----------------------
Before:
[1, [1, 2, 3], 3]
[1, [1, 2, 3], 3]
After:
[1, [1, 2, 3], 3]
[3, [3, 2, 3], 3]
-----------------------
'''
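
The three behaviours can also be compared without modifying anything, by checking object identity with `is`. A small sketch (the variable names `alias`, `shallow`, and `deep` are just for illustration):

```python
import copy

old = [1, [1, 2, 3], 3]

alias = old                # direct assignment: the same object
shallow = old.copy()       # new outer list, shared nested list
deep = copy.deepcopy(old)  # new outer list and new nested list

print(alias is old)          # True
print(shallow is old)        # False
print(shallow[1] is old[1])  # True
print(deep[1] is old[1])     # False
```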

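Converting between tensor and numpy

Converting between the Python List type and the numpy.array type

Tensor to numpy
import torch
a = torch.FloatTensor(2,3)
print(a.numpy())  # works when a does not require grad (the usual form before PyTorch 1.0)
print(a.detach().numpy())  # PyTorch 1.0 and later
print(a.detach().cpu().numpy())  # for a tensor on the GPU: move it to the CPU, then convert to numpy

numpy to Tensor
a = np.ones(5)
torch.from_numpy(a)


numpy has to be imported first, of course
import numpy as np
Python List to numpy
temp = np.array(my_list)  # my_list stands for any Python list

numpy to Python List
arr = temp.tolist()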
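
A quick round trip shows what the two conversions return. This is a minimal sketch with an arbitrary nested list:

```python
import numpy as np

values = [[1, 2, 3], [4, 5, 6]]

arr = np.array(values)  # nested list -> 2-D array
back = arr.tolist()     # array -> nested Python list of plain Python ints

print(arr.shape)         # (2, 3)
print(back == values)    # True
print(type(back[0][0]))  # <class 'int'>
```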

Example: comparing List, tensor, and numpy

a = [1,2,3,4,5]
A  = np.array(a)  				#A = array([1, 2, 3, 4, 5])
AA = torch.from_numpy(A)		#AA = tensor([1, 2, 3, 4, 5])
[1] + a                             #outputs [1, 1, 2, 3, 4, 5]
[1] + A                             #outputs array([2, 3, 4, 5, 6]): the list is converted to an array and broadcast
[1] + AA                             #error
1 + a                                #error
1 + A                                #outputs array([2, 3, 4, 5, 6])
1 + AA                                #outputs tensor([2, 3, 4, 5, 6])
2 * a                                #outputs [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
2 * A                                #outputs array([2,  4,  6,  8, 10])
2 * AA                                #outputs tensor([2,  4,  6,  8, 10])
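
The List and numpy rows of this comparison can be reproduced without PyTorch. A small sketch, limited to the numpy side on purpose:

```python
import numpy as np

a = [1, 2, 3, 4, 5]
A = np.array(a)

concat = [1] + a                   # list + list: concatenation
broadcast = ([1] + A).tolist()     # list + array: element-wise addition
repeated = 2 * a                   # int * list: repetition
scaled = (2 * A).tolist()          # int * array: element-wise multiplication

print(concat)     # [1, 1, 2, 3, 4, 5]
print(broadcast)  # [2, 3, 4, 5, 6]
print(repeated)   # [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
print(scaled)     # [2, 4, 6, 8, 10]

# int + list is not defined
try:
    1 + a
except TypeError as err:
    print("TypeError:", err)
```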

A detailed look at the numpy part of LaneNet lane-line post-processing:

The code is the file utils/postprocess.py in https://github.com/harryhan618/LaneNet.

import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth


def embedding_post_process(embedding, bin_seg, band_width=1.5, max_num_lane=4):
    """
    First use mean shift to find dense cluster center.

    Arguments:
    ----------
    embedding: numpy [H, W, embed_dim]
    bin_seg: numpy [H, W], each pixel is 0 or 1, 0 for background pixel
    delta_v: coordinates within distance of 2*delta_v to cluster center are

    Return:
    ---------
    cluster_result: numpy [H, W], index of different lanes on each pixel
    """
    cluster_result = np.zeros(bin_seg.shape, dtype=np.int32)#[256,512]

    cluster_list = embedding[bin_seg>0] #[2438,4]
    if len(cluster_list)==0:
        return cluster_result

    mean_shift = MeanShift(bandwidth=1.5, bin_seeding=True, n_jobs=-1)
    mean_shift.fit(cluster_list)

    labels = mean_shift.labels_   #[2438]
    cluster_result[bin_seg>0] = labels + 1

    cluster_result[cluster_result > max_num_lane] = 0
    for idx in np.unique(cluster_result):
        if len(cluster_result[cluster_result==idx]) < 15:
            cluster_result[cluster_result==idx] = 0

    return cluster_result
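
The last few lines of the function (dropping labels above max_num_lane and clusters with fewer than 15 pixels) use only numpy and can be tried in isolation. The sketch below is a standalone illustration, not the repository's code: the name `filter_clusters`, the `min_pixels` parameter, and the toy label map are hypothetical, and the background label 0 is skipped explicitly.

```python
import numpy as np


def filter_clusters(cluster_result, max_num_lane=4, min_pixels=15):
    """Zero out labels above max_num_lane and clusters smaller than min_pixels."""
    result = cluster_result.copy()
    result[result > max_num_lane] = 0
    for idx in np.unique(result):
        if idx != 0 and np.count_nonzero(result == idx) < min_pixels:
            result[result == idx] = 0
    return result


labels = np.zeros((8, 8), dtype=np.int32)
labels[0, :] = 1     # 8 pixels: too small, removed
labels[2:5, :] = 2   # 24 pixels: kept
labels[6, :4] = 7    # label above max_num_lane: removed

filtered = filter_clusters(labels)
print(np.unique(filtered))  # [0 2]
```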

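Test:
1. Run "python" in a terminal to enter the Python interpreter, then run "import numpy as np".
2. Enter: a=np.random.random((5,5))
3. Enter: c=np.random.random((5,5,3))
4. Enter: c[a>0.1]. Note that the shape of c[a>0.1] is (N, 3), where N is the number of elements of a greater than 0.1.
5. Counterexample: enter d=np.random.random((3,5,5)), then d[a>0.1]; this raises an error. Compare the correct results of steps 1-4 with the error in step 5 to see the rule: the shape of the boolean mask must match the leading dimensions of the array it indexes.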
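
The same experiment as a script, for readers who prefer to run it in one go. This is a minimal sketch; the seeded generator is only there to make the run repeatable.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((5, 5))
c = rng.random((5, 5, 3))  # leading dimensions (5, 5) match the mask
d = rng.random((3, 5, 5))  # leading dimensions (3, 5) do not

mask = a > 0.1
n = int(mask.sum())        # number of True entries in the mask

print(c[mask].shape == (n, 3))     # True: one row of 3 values per selected pixel
print(d[:, mask].shape == (3, n))  # True: mask applied to the last two dimensions

try:
    d[mask]
except IndexError as err:
    print("IndexError:", err)
```

This mirrors embedding[bin_seg>0] in the post-processing function, where a [H, W] mask selects rows from a [H, W, embed_dim] array.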
