Multi-Object Tracking, DeepSort Analysis (Part 1): Paper Walkthrough and Code Architecture

2023-05-16

First, here are the DeepSort paper and its code (Python version):

Paper: https://arxiv.org/pdf/1703.07402.pdf

Code: https://github.com/nwojke/deep_sort

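Preface:

    Over the past few days I read the DeepSort paper and source code and searched online for related material (there is not much of it). This post summarizes the paper, several blog posts and my own understanding.
    This is the first post in a series. It first walks through the paper and then gives a brief analysis of the flow of the GitHub code. Later posts will cover the individual algorithms in detail.

    Structure of this post:
        1. Key parts of the paper
        2. Code flow and algorithm analysis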

1. Key Parts of the Paper

1.1. Track Handling and State Estimation

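The state of each track is described in an eight-dimensional state space:

$$(u, v, \gamma, h, \dot{x}, \dot{y}, \dot{\gamma}, \dot{h})$$

where
u and v are the coordinates of the bounding box center;
γ and h (the third and fourth values) are the aspect ratio (width/height, which is expected to change little for a given object) and the height;
these four values form the observed variables (the measurement);
the remaining four values are their respective velocities (time derivatives) in image coordinates.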

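For each track k, a counter $a_k$ (not a threshold) records the number of frames since the track was last successfully matched to a detection. It is incremented during Kalman filter prediction and reset to 0 when the track is matched. When $a_k$ exceeds a predefined maximum age $A_{\max}$, the track is considered terminated. Intuitively, a track that has gone unmatched for a long time is treated as finished.
A track is in one of three states (defined as an enumeration in the code):
tentative (the initial default state)
confirmed
deleted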

enum TrackState
{
     Tentative = 1,
     Confirmed = 2,
     Deleted = 3
};

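During matching, every detection that cannot be matched to an existing track may start a new track. Because such detections may be false alarms, the following rules apply:
the newly created track is given the state tentative (the initial default state);
if it is matched successfully in every one of its first 3 frames, it is marked confirmed and counts as a new track;
otherwise it is marked deleted and removed. In addition, a track that has gone unmatched for more than the preset $A_{\max} = 30$ frames is considered to have left the scene, and it is also marked deleted and removed.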

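    # Python version (track.py): called when a track was not matched in the current frame.
    def mark_missed(self):
        # A tentative track that misses a frame is deleted immediately.
        if self.state == TrackState.Tentative:
            self.state = TrackState.Deleted
        # A track unmatched for more than max_age frames has left the scene.
        elif self.time_since_update > self._max_age:
            self.state = TrackState.Deleted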
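The whole life cycle described above can be sketched in plain Python. This is a simplified sketch and not the repository's Track class. The Kalman filter state is left out. The counter names `hits` and `time_since_update` and the defaults `n_init = 3` and `max_age = 30` are assumptions chosen to match the text.

```python
from enum import Enum


class TrackState(Enum):
    Tentative = 1
    Confirmed = 2
    Deleted = 3


class Track:
    """Simplified track life cycle (sketch, no Kalman filter state)."""

    def __init__(self, track_id, n_init=3, max_age=30):
        self.track_id = track_id
        self.hits = 1                # the detection that created the track is the first hit
        self.time_since_update = 0   # a_k: frames since the last successful match
        self.state = TrackState.Tentative
        self._n_init = n_init        # assumed: hits needed to confirm a track
        self._max_age = max_age      # assumed: A_max

    def predict(self):
        """Called once per frame before matching: the track ages by one frame."""
        self.time_since_update += 1

    def update(self):
        """Called when the track was matched to a detection in this frame."""
        self.hits += 1
        self.time_since_update = 0
        if self.state == TrackState.Tentative and self.hits >= self._n_init:
            self.state = TrackState.Confirmed

    def mark_missed(self):
        """Called when the track was not matched in this frame."""
        if self.state == TrackState.Tentative:
            self.state = TrackState.Deleted
        elif self.time_since_update > self._max_age:
            self.state = TrackState.Deleted
```

With `n_init = 3`, a new track becomes confirmed once it is also matched in the two frames after the one that created it. A confirmed track survives up to `max_age` consecutive misses.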

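1.2. Assignment (Matching) Problem

Matching here mainly refers to matching the tracks currently marked confirmed (i.e. valid tracks) against the current detections. Unconfirmed tracks are matched afterwards using IOU (see the code in Section 1.3).
The Hungarian algorithm is used to assign detections to tracks;
the assignment cost combines motion matching and appearance matching.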

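1.2.1. Motion Matching

The Mahalanobis distance measures how well the j-th detection matches the i-th track in terms of motion:

$$d^{(1)}(i,j) = (\mathbf{d}_j - \mathbf{y}_i)^\top \mathbf{S}_i^{-1} (\mathbf{d}_j - \mathbf{y}_i)$$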

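where
$\mathbf{d}_j$ is the j-th detection (bounding box) in measurement space $(u, v, \gamma, h)$;
$\mathbf{y}_i$ is the predicted measurement of track i at the current time;
$\mathbf{S}_i$ is the covariance matrix of track i in measurement space, predicted by the Kalman filter.
Detections are filtered with this Mahalanobis distance, using the 0.95 quantile of the chi-square distribution as the threshold $t^{(1)}$. For the four-dimensional measurement space, $t^{(1)} = 9.4877$, and a detection is admissible for track i only if its Mahalanobis distance to the track is at most $t^{(1)}$.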
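The distance and gate above can be sketched in pure Python as follows. To avoid depending on NumPy, $\mathbf{S}_i^{-1}(\mathbf{d}_j - \mathbf{y}_i)$ is computed by solving a linear system with Gaussian elimination instead of forming the inverse. The function names are hypothetical. The threshold 9.4877 is the 0.95 chi-square quantile for 4 degrees of freedom quoted above.

```python
# 0.95 quantile of the chi-square distribution with 4 degrees of freedom,
# i.e. the gate t1 for the measurement space (u, v, gamma, h).
CHI2_095_4DOF = 9.4877


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[r]] for r, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def mahalanobis_sq(d_j, y_i, s_i):
    """d1(i, j) = (d_j - y_i)^T S_i^{-1} (d_j - y_i)."""
    diff = [a - b for a, b in zip(d_j, y_i)]
    x = solve(s_i, diff)
    return sum(a * b for a, b in zip(diff, x))


def gate(d_j, y_i, s_i, threshold=CHI2_095_4DOF):
    """b1(i, j): 1 if detection j is admissible for track i, else 0."""
    return int(mahalanobis_sq(d_j, y_i, s_i) <= threshold)
```

For example, with $\mathbf{S}_i = \mathrm{diag}(4, 4, 1, 4)$, a detection 2 pixels away from the prediction in u has squared distance 1 and passes the gate. A detection 10 pixels away has squared distance 25 and is rejected.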

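1.2.2. Appearance Matching

In practice, effects such as camera motion can make the Mahalanobis distance unreliable for association. A second metric is therefore introduced for appearance matching: the smallest cosine distance between the i-th track and the j-th detection in appearance space. This metric is especially useful for recovering identities after long-term occlusion. For each detection $\mathbf{d}_j$ an appearance descriptor $\mathbf{r}_j$ with $\lVert\mathbf{r}_j\rVert = 1$ is computed. Each track i keeps a gallery $\mathcal{R}_i$ of its last $L_i = 100$ associated descriptors $\mathbf{r}^{(i)}_k$. The distance is:

$$d^{(2)}(i,j) = \min\left\{\, 1 - \mathbf{r}_j^\top \mathbf{r}^{(i)}_k \;\middle|\; \mathbf{r}^{(i)}_k \in \mathcal{R}_i \,\right\}$$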
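A minimal pure-Python version of this formula is shown below. The gallery is assumed to be a plain list of the descriptors stored for track i. Both sides are normalized first because the formula assumes unit-length descriptors. The function names are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale v to unit length (the formula assumes ||r|| = 1)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def min_cosine_distance(gallery, r_j):
    """d2(i, j): smallest cosine distance between detection descriptor r_j
    and the descriptors stored in track i's gallery."""
    r_j = l2_normalize(r_j)
    return min(
        1.0 - sum(a * b for a, b in zip(l2_normalize(r), r_j))
        for r in gallery
    )
```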

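Finally, the two distances are fused by a weighted sum, where $d^{(1)}$ is the Mahalanobis distance and $d^{(2)}$ the cosine distance. The overall association cost is:

$$c_{i,j} = \lambda\, d^{(1)}(i,j) + (1 - \lambda)\, d^{(2)}(i,j)$$

A pair (i, j) is admissible only if it passes the gates of both metrics:

$$b_{i,j} = \prod_{m=1}^{2} b^{(m)}_{i,j}, \qquad b^{(m)}_{i,j} = \mathbb{1}\left[d^{(m)}(i,j) \le t^{(m)}\right]$$

Here $t^{(1)}$ is the chi-square gate of the motion metric and $t^{(2)}$ is the gate threshold of the appearance metric.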

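The paper finds λ = 0 to be a reasonable choice when there is substantial camera motion. In that setting only appearance information enters the association cost. The Mahalanobis gate is still used to discard infeasible assignments based on the object locations predicted by the Kalman filter.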
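The fusion and gating can be sketched as a cost-matrix builder. The distance matrices are assumed to be precomputed, with `d1[i][j]` and `d2[i][j]` for track i and detection j. T1 = 9.4877 comes from the paper. `T2 = 0.2` and the large finite `GATED_COST` for rejected pairs are hypothetical placeholders, not values taken from the text.

```python
T1 = 9.4877        # chi-square gate for the Mahalanobis distance (from the paper)
T2 = 0.2           # hypothetical gate for the cosine distance
GATED_COST = 1e5   # hypothetical "infinite" cost for inadmissible pairs


def association_cost_matrix(d1, d2, lam=0.0, t1=T1, t2=T2):
    """Combine motion (d1) and appearance (d2) distance matrices.

    Returns c[i][j] = lam * d1 + (1 - lam) * d2 where both gates accept
    the pair (b_ij = 1), and GATED_COST where either gate rejects it.
    """
    cost = []
    for row1, row2 in zip(d1, d2):
        row = []
        for a, b in zip(row1, row2):
            admissible = a <= t1 and b <= t2  # b_ij = b1_ij * b2_ij
            row.append(lam * a + (1.0 - lam) * b if admissible else GATED_COST)
        cost.append(row)
    return cost
```

With the paper's choice `lam = 0` the returned cost is purely appearance-based. The motion gate `t1` still removes pairs that are implausible given the predicted positions.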

1.3. Matching Cascade

 //# Run matching cascade (C++ version of the tracker's _match step).
 #include <utility>
 #include <vector>
 // Project headers (Detection, KalmanTracker, linear_assignment, ...) omitted.

 typedef std::vector<int> IDS;

 // Result of one association step.
 struct RR
 {
     std::vector<std::pair<int, int>> matches;  // (track index, detection index)
     IDS unmatched_detections;
     IDS unmatched_tracks;
 };

 // Member functions of the tracker class. The class declaration and its data
 // members (kalmanTrackers_, max_age_, max_iou_distance_, n_init_, _next_id_)
 // are omitted.
 RR _match(const std::vector<Detection> &detections)
 {
     //# Split the track set into confirmed and unconfirmed tracks.
     IDS confirmed_trackIds;
     IDS unconfirmed_trackIds;
     for (int i = 0; i < static_cast<int>(kalmanTrackers_.size()); i++)
     {
         const KalmanTracker &t = kalmanTrackers_[i];
         if (t->is_confirmed())
             confirmed_trackIds.push_back(i);
         else
             unconfirmed_trackIds.push_back(i);
     }

     //# Associate confirmed tracks using appearance features (matching cascade).
     RR rr = linear_assignment::matching_cascade(
         getCostMatrixByNND,
         NearestNeighborDistanceMetric::Instance()->matching_threshold(),
         max_age_,
         kalmanTrackers_,
         detections,
         &confirmed_trackIds);
     std::vector<std::pair<int, int>> matches_a = rr.matches;
     IDS unmatched_tracks_a = rr.unmatched_tracks;
     IDS unmatched_detections = rr.unmatched_detections;

     //# Associate the remaining tracks together with the unconfirmed tracks using IOU.
     // Candidates: all unconfirmed tracks, plus unmatched confirmed tracks that were
     // last updated exactly one frame ago. Older unmatched tracks stay unmatched.
     IDS iou_track_candidateIds = unconfirmed_trackIds;
     IDS still_unmatched;
     for (int id : unmatched_tracks_a)
     {
         if (kalmanTrackers_[id]->time_since_update_ == 1)
             iou_track_candidateIds.push_back(id);
         else
             still_unmatched.push_back(id);
     }
     unmatched_tracks_a = still_unmatched;

     RR rr1 = linear_assignment::min_cost_matching(
         iou_matching::getCostMatrixByIOU,
         max_iou_distance_,
         kalmanTrackers_,
         detections,
         &iou_track_candidateIds,
         &unmatched_detections);
     std::vector<std::pair<int, int>> matches_b = rr1.matches;
     IDS unmatched_tracks_b = rr1.unmatched_tracks;
     unmatched_detections = rr1.unmatched_detections;

     //# Merge the results of both stages.
     RR re;
     re.matches = matches_a;
     re.matches.insert(re.matches.end(), matches_b.begin(), matches_b.end());
     re.unmatched_detections = unmatched_detections;
     re.unmatched_tracks = unmatched_tracks_a;
     re.unmatched_tracks.insert(re.unmatched_tracks.end(),
                                unmatched_tracks_b.begin(),
                                unmatched_tracks_b.end());
     return re;
 }

 // Create a new (tentative) track from an unmatched detection; returns its id.
 int _NewTrack(const Detection &detection)
 {
     int id = _next_id_;
     std::pair<MEAN, VAR> pa =
         KF::Instance()->initiate(detection.to_xyah());
     KalmanTracker newt(new KalmanTrackerN(
         pa.first, pa.second, _next_id_, n_init_, max_age_,
         detection.feature_, true, detection.oriPos_));
     kalmanTrackers_.push_back(newt);
     _next_id_ += 1;
     return id;
 }

 // Called from the tracker's update step:
 // RR rr = this->_match(detections);

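When an object is occluded for a long time, the uncertainty of the Kalman filter prediction grows considerably. Counterintuitively, when two tracks compete for the same detection, the Mahalanobis distance then favors the track with larger uncertainty, because a large covariance shrinks the distance of any detection to that track. This can fragment tracks and make them unstable. To address this, the paper uses a matching cascade that gives priority to more frequently seen objects, i.e. tracks with a smaller age $a_k$ are matched first, which improves matching accuracy. The algorithm in the paper (Algorithm 1; the figure is not reproduced here) is summarized below.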

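Here, T denotes the set of tracks;
D denotes the set of detections;
the matrix C stores the cost (distance) between every track and every detection;
the matrix B stores, for every track-detection pair, whether the pair may be associated (0 or 1);
M and U are the return values: the set of matches and the set of unmatched detections.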
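The cascade can be sketched in Python as below. This is an illustrative sketch under stated assumptions. The cost matrix is assumed to be precomputed with gated pairs already set to infinity. `track_ages[i]` plays the role of $a_i$, where 1 means the track was matched in the previous frame. `min_cost_matching` here is a brute-force search over permutations that stands in for the Hungarian algorithm, so it is only suitable for tiny inputs. All names are hypothetical.

```python
from itertools import permutations

INF = float("inf")  # cost of a gated (inadmissible) track-detection pair


def min_cost_matching(cost, track_ids, det_ids, max_distance):
    """Optimal assignment by brute force (stand-in for the Hungarian algorithm).

    Costs above max_distance are clipped before solving, and the
    corresponding pairs are dropped from the result afterwards.
    """
    if not track_ids or not det_ids:
        return []
    clip = max_distance + 1e-5
    if len(track_ids) <= len(det_ids):
        candidates = (list(zip(track_ids, p))
                      for p in permutations(det_ids, len(track_ids)))
    else:
        candidates = (list(zip(p, det_ids))
                      for p in permutations(track_ids, len(det_ids)))
    best = min(candidates,
               key=lambda pairs: sum(min(cost[i][j], clip) for i, j in pairs))
    return [(i, j) for i, j in best if cost[i][j] <= max_distance]


def matching_cascade(cost, max_distance, max_age, track_ages, det_ids):
    """Matching cascade following Algorithm 1 of the paper.

    cost[i][j]: association cost c_ij, already INF where b_ij = 0.
    Returns (matches M, unmatched tracks, unmatched detections U).
    """
    matches = []
    unmatched_dets = list(det_ids)  # U <- D
    for n in range(1, max_age + 1):
        if not unmatched_dets:
            break
        tracks_n = [i for i, a in enumerate(track_ages) if a == n]  # T_n
        if not tracks_n:
            continue
        for i, j in min_cost_matching(cost, tracks_n, unmatched_dets, max_distance):
            matches.append((i, j))
            unmatched_dets.remove(j)
    matched = {i for i, _ in matches}
    unmatched_tracks = [i for i in range(len(track_ages)) if i not in matched]
    return matches, unmatched_tracks, unmatched_dets
```

For example, take cost `[[0.3], [0.1]]` and ages `[1, 5]`. A single global assignment would give detection 0 to the long-occluded track 1 because its cost is lower. The cascade gives it to the recently seen track 0, because that track is matched first.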

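1.4. Deep Appearance Descriptor

The authors use a deep convolutional neural network (CNN) to extract appearance features of the targets. The pre-trained network was trained on a large-scale person re-identification (ReID) dataset containing 1,100,000 images of 1,261 pedestrians, which makes it well suited for tracking people.
The network architecture is as follows:
[Figure: CNN architecture, not reproduced here]
The network has 2,800,864 parameters. One forward pass over 32 bounding boxes takes about 30 ms on an NVIDIA GeForce GTX 1050 mobile GPU. A final batch and L2 normalization projects the features onto the unit hypersphere, which makes them compatible with the cosine appearance metric.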

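2. Code Flow and Algorithm Analysis (to be completed)

2.1. detection

Detection base class: holds a bounding box detection in a single image together with its appearance feature.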
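A simplified pure-Python sketch of such a detection class follows. It assumes the fields `tlwh`, `confidence` and `feature` of the linked Python repository's Detection class and provides `to_xyah()`, which the C++ code in Section 1.3 calls before initiating a Kalman track. Plain lists replace NumPy arrays.

```python
class Detection:
    """A bounding box detection in a single image (simplified sketch)."""

    def __init__(self, tlwh, confidence, feature):
        self.tlwh = [float(x) for x in tlwh]        # (top-left x, top-left y, width, height)
        self.confidence = float(confidence)
        self.feature = [float(x) for x in feature]  # appearance descriptor

    def to_tlbr(self):
        """Convert to (min x, min y, max x, max y)."""
        x, y, w, h = self.tlwh
        return [x, y, x + w, y + h]

    def to_xyah(self):
        """Convert to (center x, center y, aspect ratio w/h, height),
        i.e. the measurement space (u, v, gamma, h)."""
        x, y, w, h = self.tlwh
        return [x + w / 2, y + h / 2, w / h, h]
```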

2.2. HungarianOper

Hungarian assignment: it uses the Hungarian algorithm, also known as the Munkres algorithm, for minimum-cost assignment in a weighted bipartite graph.
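For illustration, here is a pure-Python Hungarian algorithm in the potential-based shortest augmenting path formulation, which runs in $O(n^2 m)$. This is a sketch and not the HungarianOper code. It assumes a cost matrix with no more rows than columns (transpose it otherwise) and finite costs, so gated pairs should use a large finite value rather than infinity.

```python
INF = float("inf")


def hungarian(cost):
    """Minimum-cost assignment for an n x m cost matrix with n <= m.

    Returns assign, where assign[i] is the column matched to row i.
    """
    n = len(cost)
    m = len(cost[0]) if n else 0
    if n > m:
        raise ValueError("expected no more rows than columns")
    u = [0.0] * (n + 1)   # row potentials (1-based)
    v = [0.0] * (m + 1)   # column potentials (1-based)
    p = [0] * (m + 1)     # p[j]: row matched to column j, 0 = free
    way = [0] * (m + 1)   # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while True:  # augment along the stored path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break
    assign = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assign[p[j] - 1] = j - 1
    return assign
```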

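2.3. iou_matching

IOU matching module. IoU (intersection over union) is the area of the overlap between two boxes divided by the area of their union.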
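A minimal IoU computation and the corresponding cost matrix are sketched below. The boxes are assumed to be in (top-left x, top-left y, width, height) format. The cost is assumed to be 1 - IoU, so that it can be compared with a maximum IOU distance such as `max_iou_distance_` in the C++ code above. The function names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))  # overlap width
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))  # overlap height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def iou_cost_matrix(track_boxes, det_boxes):
    """cost[i][j] = 1 - IoU(track i, detection j)."""
    return [[1.0 - iou(t, d) for d in det_boxes] for t in track_boxes]
```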

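2.4. kalman_filter

Kalman filter. This module implements initiation, prediction and update of object states in image space, i.e. the concrete parameterization of the filter.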
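The prediction step of a constant-velocity Kalman filter on the eight-dimensional state from Section 1.1 can be sketched in pure Python: $\mathbf{x}' = F\mathbf{x}$ and $P' = F P F^\top + Q$. The height-dependent process noise (weights 1/20 and 1/160, plus small constants for the aspect ratio) is modeled on the linked repository's defaults. Treat those constants as assumptions.

```python
STD_WEIGHT_POSITION = 1.0 / 20   # assumed process-noise weight for positions
STD_WEIGHT_VELOCITY = 1.0 / 160  # assumed process-noise weight for velocities
NDIM = 4                         # measurement space (u, v, gamma, h)


def motion_matrix(dt=1.0):
    """8x8 constant-velocity transition F: position += dt * velocity."""
    f = [[1.0 if r == c else 0.0 for c in range(2 * NDIM)] for r in range(2 * NDIM)]
    for i in range(NDIM):
        f[i][NDIM + i] = dt
    return f


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def predict(mean, cov, dt=1.0):
    """Return the predicted (mean, covariance) one time step ahead."""
    h = mean[3]
    std = [STD_WEIGHT_POSITION * h, STD_WEIGHT_POSITION * h, 1e-2, STD_WEIGHT_POSITION * h,
           STD_WEIGHT_VELOCITY * h, STD_WEIGHT_VELOCITY * h, 1e-5, STD_WEIGHT_VELOCITY * h]
    f = motion_matrix(dt)
    new_mean = [sum(f[i][k] * mean[k] for k in range(2 * NDIM)) for i in range(2 * NDIM)]
    new_cov = matmul(matmul(f, cov), transpose(f))
    for i in range(2 * NDIM):
        new_cov[i][i] += std[i] ** 2  # add diagonal process noise Q
    return new_mean, new_cov
```

Because Q grows with the box height h, large (close) objects are allowed to move more pixels per frame than small ones.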

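2.5. linear_assignment

Linear assignment: minimum-cost matching on a cost matrix, and the matching cascade built on it. Both motion information and appearance information are taken into account.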

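// Excerpt from the gating step (gate_cost_matrix) in linear_assignment.
// For every track, entries of the cost matrix whose Mahalanobis distance
// exceeds the chi-square gating threshold are set to gated_cost.
for (int row = 0; row < static_cast<int>(track_indices.size()); row++)
{
    int track_idx = track_indices[row];
    KalmanTracker track = tracks[track_idx];

    // Mahalanobis distance between the track's predicted box y_i and every
    // detection box d_j = (u, v, gamma, h), i.e. between the predicted
    // Kalman state and the newly arrived measurements.
    Eigen::Matrix<float, 1, -1> gating_distance = kalmanFilter.gating_distance(
        track->mean_, track->covariance_, measurements, only_position);

    // gating_distance is a row vector with one entry per detection.
    for (int i = 0; i < gating_distance.cols(); i++)
    {
        if (gating_distance(0, i) > gating_threshold)
        {
            cost_matrix(row, i) = gated_cost;
        }
    }
}
return cost_matrix;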

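2.6. nn_matching

Nearest-neighbor matching: the distance metric used in the matching step (NearestNeighborDistanceMetric) keeps the appearance features of each track and computes, for every track-detection pair, the smallest distance between the detection's feature and the features stored for that track.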
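A simplified nearest-neighbor cosine metric with a per-track feature budget can be sketched as below. The class name `NNCosineMetric`, its methods and the default budget of 100 (matching the gallery size $L_i = 100$ from the paper) are hypothetical. This is not the repository's NearestNeighborDistanceMetric.

```python
import math
from collections import deque


def _unit(v):
    """Scale a feature vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


class NNCosineMetric:
    """Hypothetical nearest-neighbor cosine metric with a per-track budget."""

    def __init__(self, matching_threshold, budget=100):
        self.matching_threshold = matching_threshold  # used by the caller as gate t2
        self.budget = budget
        self.samples = {}  # track id -> deque of the most recent unit-length features

    def partial_fit(self, features, targets, active_targets):
        """Store new features for their tracks and forget inactive tracks."""
        for feature, target in zip(features, targets):
            gallery = self.samples.setdefault(target, deque(maxlen=self.budget))
            gallery.append(_unit(feature))
        self.samples = {t: self.samples[t] for t in active_targets if t in self.samples}

    def distance(self, features, targets):
        """Cost matrix with one row per track and one column per feature.

        Each entry is the smallest cosine distance to the track's gallery;
        every track in `targets` must already have stored samples.
        """
        dets = [_unit(f) for f in features]
        return [
            [min(1.0 - sum(a * b for a, b in zip(s, d)) for s in self.samples[t])
             for d in dets]
            for t in targets
        ]
```

The `deque(maxlen=budget)` automatically drops the oldest feature once a track's gallery is full.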

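2.7. tracker

Object tracker: manages the set of tracks and runs the per-frame matching step (_match) and new-track creation (_NewTrack) shown in Section 1.3.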
