Computer Vision for Autonomous Vehicles:
Problems, Datasets and State-of-the-Art
自动驾驶技术的计算机视觉:问题,数据和前沿技术
摘要,前沿和技术历史

Abstract 摘要

  • Recent years have witnessed amazing progress in AI related fields such as computer vision, machine learning and autonomous vehicles. |As with any rapidly growing field, however, it becomes increasingly difficult to stay up-to-date or enter the field as a beginner. |While several topic specific survey papers have been written, to date no general survey on problems, datasets and methods in computer vision for autonomous vehicles exists.|This paper attempts to narrow this gap by providing a state-of-the-art survey on this topic. Our survey includes both the historically most relevant literature as well as the current state-of-the-art on several specific topics, including recognition, reconstruction, motion estimation, tracking, scene understanding and end-to-end learning. |Towards this goal, we first provide a taxonomy to classify each approach and then analyze the performance of the state-of-the-art on several challenging benchmarking datasets including KITTI, ISPRS, MOT and Cityscapes. |Besides, we discuss open problems and current research challenges. To ease accessibility and accommodate missing references, we will also provide an interactive platform which allows to navigate topics and methods, and provides additional information and project links for each paper.
    Keywords: Computer Vision, Autonomous Vehicles, Autonomous Vision
  • 概述:人类见证了最近几年 AI 相关领域的惊人进步,如计算机视觉,机器学习和自动驾驶。
    然而任何一个快速发展的领域,保持领先或刚开始进入这些领域变得越来越难(业内人员难以跟上行业节奏或者业外人员难入行)。
    尽管已经(有人)发表了一些这方面的专题研究文章,但在自动驾驶技术中,计算机视觉的问题、数据和方法至今没有普遍的研究。
    对于这个话题,这篇论文试图通过提供对前沿技术的研究来减少这种缺口。我们的研究包括最相关的历史资料和当前最前沿的技术,包括识别、重建运动估测、追踪场景理解和端到端学习等。
    为完成这个目标,我们首先通过分类学对每一个方法进行分类,然后在一些具有挑战性的基础数据集上,如 KITTI、ISPRS、MOT 和 Cityscapes 上分析每一个方法在前沿技术上的表现
    此外,我们还讨论了一些开放问题和当前研究的挑战,为了轻松访问和适应缺失的参考,我们将提供一个具有主题和方法的驾驶交互平台,并提供额外信息和每篇论文的项目链接
    关键词:计算机视觉,自动驾驶,自主视觉

previous 前言

  • Since the first successful demonstrations in the 1980s (Dick-manns & Mysliwetz (1992); Dickmanns & Graefe (1988); Thorpeet al. (1988)), great progress has been made in the field of autonomous vehicles. |Despite these advances, however, it is safe to believe that fully autonomous navigation in arbitrarily complex environments is still decades away. |The reason for this is two-fold: First, autonomous systems which operate in complex dynamic environments require artificial intelligence which generalizes to unpredictable situations and reasons in a timely manner. |Second, informed decisions require accurate perception, yet most of the existing computer vision systems produce errors at a rate which is not acceptable for autonomous navigation.

  • 从 20 世纪 80 年代首次成功展示以来(Dick-manns & Mysliwetz (1992); Dickmanns & Graefe (1988); Thorpeet al. (1988))(Dick-manns & Mysliwetz (1992); Dickmanns & Graefe (1988); Thorpeet al. (1988)),自动驾驶技术领域已经取得了很大进展
    尽管有了这些进展,但在任意复杂环境中,实现完全自动驾驶仍然被认为需要几十年
    原因有两点:第一,在复杂的、动态的环境中运行的自动驾驶系统需要人工智能来归纳不可预测的情形和原因,给出及时的方法
    第二,信息的决策需要准确的感知,目前大多数已有的计算机视觉系统有一定的错误率,这是自动驾驶技术无法接受的

  • In this paper, we focus on the second aspect which we call autonomous vision and investigate the performance of current perception systems for autonomous vehicles. |Towards this goal, we first provide a taxonomy of problems and classify existing datasets and techniques using this taxonomy, describing the pros and cons of each method. Second, we analyze the current state-of-the-art performance on several popular publicly available benchmarking datasets. |In particular, we provide a novel in-depth qualitative analysis of the KITTI benchmark which shows the easiest and most difficult examples based on the methods submitted to the evaluation server. |Based on this analysis, we discuss open research problems and challenges. To ease navigation, we also provide an interactive online tool which visualizes our taxonomy using a graph and provides additional information and links to project pages in an easily accessible manner. |We hope that our survey will become a useful tool for researchers in the field of autonomous vision and lowers the entry barrier for beginners by providing an exhaustive overview over the field.

  • 在这篇论文中,我们关注第二个方面的问题,也就是自动驾驶视觉,同时调查最近的自动驾驶视觉中感知系统的表现
    为完成这个目标,我们首先给出了问题的分类,归类了已有的数据和可使用的技术,描述每种方法的优缺点。第二,我们在几个流行的公开数据集上分析了最近前沿成果的表现
    特别是我们给出一种 KITTI 基准的新的深入定性分析,这些分析展示了提交给评价服务器的方法中最简单和最困难的例子
    基于这些分析,我们讨论了开放的研究问题和挑战,为了简化学习,我们也给出一个在线交互式工具,用图像可视化了分类,并提供额外信息和一个简单可行的方法与项目页链接
    我们希望我们的研究能够成为自动驾驶领域研究人员的一个有用的工具,并通过透彻的概述,降低新人进入该领域的门槛

  • There exist several other related surveys. Winner et al. (2015) explains in detail systems for active safety and driver assistance, considering both their structure and their function. |Their focus is to cover all aspects of driver assistance systems and the chapter about machine vision covers only the most basic concepts of the autonomous vision problem. |Klette (2015) provide an overview over vision-based driver assistance systems. They describe most aspects of the perception problem at a high level, but do not provide an in-depth review of the state-of-the-art in each task as we pursue in this paper. |Complementary to our survey, Zhu et al. (2017) provide an overview of environment perception for intelligent vehicles, focusing on lane detection, traffic sign/light recognition as well as vehicle tracking. |In contrast, our goal is to bridge the gap between the robotics, intelligent vehicles, photogrammetry and computer vision communities by providing an extensive overview and comparison which includes works from all fields.

  • 目前也有一些其它相关的研究,Winner et al. (2015)详细地解释了主动安全性和驾驶辅助系统,同时考虑了它们的结构和功能
    这些研究注重覆盖辅助驾驶系统的所有方面,但关于机器视觉的章节只覆盖到了自动驾驶技术中最基础的概念。
    Klette (2015)提供了一个基于视觉的辅助驾驶系统的概述,他们描述了高层次感知问题的大部分方面,但并没有像我们在论文中追求的一样,给出在各种前沿任务中比较深入的评测
    Zhu et al. (2017)提出了智能汽车环境感知的概述,聚焦在车道检测,交通信号灯识别和机车追踪问题,这与我们的研究相互补充。
    相比较下,我们的目标是通过提供广泛的概述和比较,包括在这个领域所有的成果,在机器人、智能汽车、摄影测绘和计算机视觉之间建立起一座桥梁

History of Autonomous Driving 自动驾驶技术历史

Autonomous Driving Projects 自动驾驶项目

  • Many governmental institutions worldwide started various projects to explore intelligent transportation systems (ITS). The PROMETHEUS project started 1986 in Europe and involved more than 13 vehicle manufacturers, several research units from governments and universities of 19 European countries. |One of the first projects in the United States was Navlab Thorpe et al. (1988) by the Carnegie Mellon University which achieved a major milestone in 1995, by completing the first autonomous drive from Pittsburgh, PA and Sand Diego, CA. |After many initiatives were launched by universities, research centers and automobile companies, the U.S. government established the National Automated Highway System Consortium (NAHSC) in 1995. |Similar to the U.S., Japan established the Advanced Cruise-Assist Highway System Research Association in 1996 among many automobile industries and research centers to foster research on automatic vehicle guidance. |Bertozzi et al. (2000) survey many approaches to the challenging task of autonomous road following developed during these projects. They concluded that suffcient computing power is become increasingly available, but diffculties like reflections, wet road, direct sunshine, tunnels and shadows still make data interpretation challenging. |Thus, they suggested the enhancement of sensor capabilities. They also pointed out that the legal aspects related to the responsibility and impact of automatic driving on human passengers need to be considered carefully. |In summary, the automation will likely be restricted to special infrastructures and will be extended gradually.

  • 世界各地的许多政府机构启动各式各样的项目来开发智能交通系统(ITS)。PROMETHEUS 这个项目 1986 年在欧洲启动,包括超过 13 个交通工具生产商,当中的许多研究成员来自 19 个欧洲国家的政府和高校。
    美国的其中一个项目就是由卡耐基梅隆大学的 Navlab Thorpe 等人(1988)创建的。这个项目完成了第一次从 Pittsburgh,PA,Sand Diego 和 CA 的自动驾驶,在 1995 年是一个重要的里程碑。
    在许多大学,研究中心和自动驾驶公司的倡议下,美国政府在 1995 年成立了自动化公路系统联盟(NAHSC)。
    和美国一样,日本于 1996 年成立了高级巡航公路系统研究协会(Advanced Cruise-Assist Highway System Research Association),包括各大自动驾驶公司和研究中心,来促进自动驾驶导航的研究。
    Bertozzi 等人(2000)调查了许多具有挑战性的任务(通过这些项目发展的自动道路跟随),给出解决方法。他们得出结论,计算能力逐渐得到满足,但像反射,湿面潮湿,阳光直射,隧道和阴影这样的困难仍然使数据解释具有挑战性。
    因此,他们建议提高传感器性能,同时也指出,关系到自动驾驶对行人法律方面的责任和影响,应该认真的考虑
    总之,自动化技术(发展)可能会受限于特殊的基础设施,然后再慢慢的普及开来。

  • Motivated by the success of the PROMETHEUS projects to drive autonomously on highways, Franke et al. (1998) describe a real-time vision system for autonomous driving in complex urban traffic situations. |While highway scenarios have been studied intensively, urban scenes have not been addressed before. Their system included depth-based obstacle detection and tracking from stereo as well as a framework for monocular detection and recognition of relevant objects such as traffic signs.

  • The fusion of several perception systems developed by Vis-Lab have led to several proto-type vehicles including ARGO Broggi et al. (1999), TerraMax Braid et al. (2006), and BRAiVE Grisleri & Fedriga (2010). |BRAiVE is the latest vehicle proto-type which is now integrating all systems that VisLab has developed so far. Bertozzi et al. (2011) demonstrated the robustness of their system at the VisLab Intercontinental Autonomous Challenge, a semi-autonomous drive from Italy to China. |The onboard system allows to detect obstacles, lane marking, ditches,berms and identify the presence and position of a preceding vehicle. The information produced by the sensing suite is used to perform different tasks such as leader-following and stop & go.

  • PROMETHEUS 项目可以实现在高速公路上自动驾驶,在这个成功的案例推动下,Franke 等人描述了在复杂的城市交通场景下的自动驾驶的实时视觉系统。
    虽然在此之前公路场景情况已经有很多深入的研究,但城市场景却从未得到解决。他们的系统包括基于深度的障碍检测和立体追踪,以及针对相关物体(比如:交通信号)的单目检测和识别框架。

  • Vis-Lab 发展的多种传感系统的融合促成了几款原型车包括 ARGO Broggi(1999),TerraMax Braid(2006)和 BRAiVE Grisleri & Fedriga(2010)的出现
    BRAiVE 是目前 VisLab 开发的整合所有系统的最新车型。 Bertozzi 等人(2011)在 VisLab 洲际自治挑战赛(VisLab Intercontinental Autonomous Challenge,意大利到中国的半自主驾驶)展示了其系统的稳健性(鲁棒性)。
    车载系统允许检测障碍物,标记车道、沟渠、护堤,并识别前方是否存在车辆和车辆位置。感应套件提供的信息被用于执行不同的任务,如(leader-following)和前进/停止。?

  • The PROUD project Broggi et al. (2015) slightly modified the BRAiVE prototype Grisleri & Fedriga (2010) to drive in urban roads and freeways open to regular traffic in Parma. |Towards this goal they enrich an openly licensed map with information about the maneuver to be managed (e.g. pedestrian crossing, traffic light, . . . ). |The vehicle was able to handle complex situations such as roundabouts, intersections, priority roads, stops, tunnels, crosswalks, traffic lights, highways, and urban roads without any human intervention.

  • The V-Charge project Furgale et al. (2013) presents an electric automated car outfitted with close-to-market sensors. A fully operational system is proposed including vision-only localization, mapping, navigation and control. |The project supported many works on different problems such as calibration Heng et al. (2013, 2015), stereo H¨ane et al. (2014), reconstruction Haene et al. (2012, 2013, 2014), SLAM Grimmett et al.(2015) and free space detection H¨ane et al. (2015). In addition to these research objectives, the project keeps a strong focus on deploying and evaluating the system in realistic environments.

  • PROUD 的项目 Broggi(2015)略微修改了 BRAiVE 原型 Grisleri & Fedriga(2010)使得汽车可以在 parma 城市道路和高速公路的常规交通情况下开车。
    为了实现这一目标,他们丰富了一份公开授权的地图,其中包含有待完成的机动信息(比如行人过路,交通信号灯等)。
    该车辆能够在没有人为干涉的情况下处理复杂的场景,例如回旋处,交叉口,优先道路,站点,隧道,人行横道,交通信号灯,高速公路和城市道路。

  • V-Charge 项目 Furgale 等人 (2013 年)提供配备了近距离市场(close-to-market)传感器的电动自动车。提出了一个全面可使用的系统,包括视觉定位,映射,导航和控制。
    该项目解决了诸多困难比如,Heng et al. (2013, 2015)的校准问题, H¨ane(2014)的立体问题,Haene(2012, 2013, 2014)的重建问题, Grimmett(2015)的 SLAM 问题和 H¨ane(2015)的空白区域检测的问题。除了这些研究目标,该项目还非常重视在现实环境中部署和系统评估。

  • Google started their self-driving car project in 2009 and completed over 1,498,000 miles autonomously until March 2016 in Mountain View, CA, Austin, TX and Kirkland, WA. |Different sensors (i.a. cameras, radars, LiDAR, wheel encoder, GPS) allow to detect pedestrians, cyclists, vehicles, road work and more in all directions. |According to their accident reports, Google’s self-driving cars were involved only in 14 collisions while 13 times were caused by others. In 2016, the project was split off to Waymo, an independent self-driving technology company.

  • Tesla Autopilot is an advanced driver assistant system developed by Tesla which was first rolled out(推出) in 2015 with version of their software. The automation level of the system allows full automation but requires the full attention of the driver to take control if necessary. |From October 2016, all vehicles produced by Tesla were equipped with eight cameras, twelve ultrasonic sensors and a forward-facing radar to enable full self-driving capability.

  • Google 于 2009 年开始了自驾车项目,直到 2016 年 3 月完成了超过 1,498,000 英里的驾驶距离,在美国加利福尼亚州奥斯汀市的 Mountain View,WA 和柯克兰。
    不同的传感器(例如摄像机,雷达,LiDAR,车轮编码器,GPS)可以全方位的检测行人,骑自行车的人,车辆,道路工作等等。
    据他们的事故报道,Google 的自动驾驶车只涉及 14 次碰撞,13 次是由别人造成的。 在 2016 年,这个项目分引入到了一家独立的自动驾驶技术公司 Waymo。

  • Tesla Autopilot 是由特斯拉开发的高级驾驶员辅助系统,该系统于 2015 年第一次推出其视觉软件。系统的自动化级别允许完全的自动化,但是仍然需要 要求驾驶员集中注意来控制。
    从 2016 年 10 月起,特斯拉生产的所有车辆配备了 8 台摄像机,12 台超声波传感器和一个前置雷达,以实现全自动驾驶能力。

  • Long Distance Test Demonstrations: In 1995 the team within the PROMETHEUS project Dickmanns et al. (1990); Franke et al. (1994); Dickmanns et al. (1994) performed the first autonomous long-distance drive from Munich, Germany, to Odense, Denmark, at velocities up to 175 km/h with about 95% autonomous driving. |Similarly, in the U.S. Pomerleau & Jochem (1996) drove from Washington DC to San Diego in the ’No hands across America’ tour with 98% automated steering yet manual longitudinal control.

  • In 2014, Ziegler et al. (2014) demonstrated a 103 km ride from Mannheim to Pforzheim Germany, known as Bertha Benz memorial route, in nearly fully autonomous manner. |They present an autonomous vehicle equipped with close-to-production sensor hardware. Object detection and free-space analysis is performed with radar and stereo vision. Monocular vision is used for traffic light detection and object classification. |Two complementary vision algorithms, point feature based and lane marking based, allow precise localization relative to manually annotated digital road maps. They concluded that even thought the drive was successfully completed the overall behavior is far inferior to the performance level of an attentive human driver.

  • 长距离测试演示:1995 年,PROMETHEUS 项目里 Dickmanns(1990)、Franke(1994)、Dickmanns(1994 年)的团队演示了从德国慕尼黑(Munich)到丹麦欧登塞(Odense)进行的第一次自动长途驾驶,速度达 175 公里/小时,其中约 95%为自主驾驶。
    同样,在美国 Pomerleau 和 Jochem(1996)在‘No hands across from America ???’中从华盛顿特区开往圣地亚哥,整个行程中有 98%的自动驾驶和偶尔的手动纵向控制。

  • 2014 年,Zieglar(2014)以近乎完全自动的方式,展示了从曼海姆(Mannheim)到德国普福尔茨海姆(Pforzheim Germany)的 103km 的骑行,也就是众人所熟知的 Bertha Benz 纪念路线。
    他们展示了一种装配有接近生产(close-to-production)的传感器硬件的自动驾驶车辆。由雷达 radar 和立体视觉来进行物体检测和空白区域分析。单目视觉用来检测交通信号灯和目标分类。
    两种互补的算法,基于点特征和基于场景标记,允许相对于手动注释的数字路线图进行精确定位。他们得出结论,甚至认为自动驾驶虽然成功完成了,但是整体行为远远达不到细心的驾驶司机的水平。

  • Recently, Bojarski et al. (2016) drove autonomously 98% of the time from Holmdel to Atlantic Highlands in Monmouth County NJ as well as 10 miles on the Garden State Parkway without intervention. |Towards this goal, a convolutional neural network which predicts vehicle control directly from images is used in the NVIDIA DRIVETM PX self-driving car. The system is discussed in greater detail in Section 11.

  • While all aforementioned performed impressively, the general assumption of precisely annotated road maps as well as prerecorded maps for localization demonstrates that autonomous systems are still far from human capabilities. |Most importantly, robust perception from visual information but also general artificial intelligence are required to reach human level reliability and react safely even in complex innercity situations.

  • 最近,Bojarski(2016)从霍尔姆德尔(Holmdel)到新泽西州蒙茅斯县(Monmouth)的大西洋高原,以及在花园州立大道没有任何干扰的自动行驶了 10 英里,其中 98%是在自动驾驶。
    为了实现这一目标,在 NVIDIA DRIVETM PX 自动驾驶车中使用了一种从图像直接预测车辆控制的卷积神经网络。该系统在第 11 节中有更详细的讨论。

  • 虽然所有上述表现令人印象深刻,但精确注释路线图的一般假设,以及用于定位的预先载入的地图证实了自主性系统仍然差强人意。
    最重要的是,这不仅需要视觉信息的强大的感知,也需要一般的人工智能达到和人一样的可靠性,并且在复杂的城市情况下也能安全地做出反应。

Autonomous Driving Competitions 自动驾驶竞赛

  • The European Land Robot Trial (ELROB) is a demonstration and competition of unmanned systems in realistic scenarios and terrains, focusing mainly on military aspects such as reconnaissance and surveillance, autonomous navigation and convoy transport. In contrast to autonomous driving challenges, ELROB scenarios typically include navigation in rough terrain.
  • The first autonomous driving competition focusing on road scenes (though primarily dirt roads) has been initiated by the American Defense Advanced Research Projects Agency (DARPA) in 2004. The DARPA Grand Challenge 2004 offered a prize money of 1 million for the team first finishing a 150 mile route which crossed the border from California to Nevada. |However, none of the robot vehicles completed the route. One year later, in 2005, DARPA announced a second edition of its challenge with 5 vehicles successfully completing the route (Buehler et al.(2007)). The third competition of the DARPA Grand Challenge, known as the Urban Challenge (Buehler et al. (2009)), took place on November 3, 2007 at the site of the George Air Force Base in California. The challenge involved a 96 km urban area course where traffic regulations had to be obeyed while negotiating with other vehicles and merging into traffic.
  • The Grand Cooperative Driving Challenge (GCDC, see also Geiger et al. (2012a)), a competition focusing on autonomous cooperative driving behavior was held in Helmond, Netherlands in 2011 for the first time and in 2016 for a second edition. During the competition, teams had to negotiate convoys, join convoys and lead convoys. The winner was selected based on a system that assigned points to randomly mixed teams.

  • European Land Robot Trial (ELROB)是现实场景和地形中无人系统的示范与竞赛,主要集中在军事方面,如侦察监视,自动导航和车队运输。与自主驾驶挑战相反,ELROB 场景通常包括崎岖地形的导航。

  • 2004 年,美国国防高级研究计划署(DARPA)发起了第一个专注于道路场景(主要是泥土路)的自动驾驶比赛。挑战赛提供了 100 万美元的奖金给首先完成从加利福尼亚州内华达州过境的 150 英里的路线。
    然而,机器人车辆都没有完成路线。 一年后,DARPA 公布了第二版的挑战,5 辆车顺利完成了路线(Buehler(2007))。DARPA 大挑战赛的第三场比赛,被称为城市挑战赛(Buehler(2009)),于 2007 年 11 月 3 日在乔治航空加利福尼亚州的基地。
    这个挑战涉及到一个 96 公里的城市地区航线,在这段路程中车辆在对其他车辆进行判断并汇合车流时,必须遵守交通法规。
  • 专注于自动合作驾驶行为的大型合作驾驶挑战(GCDC,Geiger et al(2012a))在荷兰赫尔蒙德(Helmond)举行,2011 年首次,2016 年第二次。在比赛中,团队需要判断,加入和引导车队。获胜者是基于给随机混合团队分配点数的系统选出来的。