The challenge will have two benchmarks, details of which are given below:

  1. Segmentation
  2. Instance Segmentation
The latest version of the evaluation code will be available at:

1. Segmentation

The segmentation benchmark involves pixel-level predictions for all 26 classes at level 3 of the label hierarchy (see Overview for details of the level 3 ids).

Output Format

The output format is a png image with the same resolution as the input image, where the value of every pixel is an integer in {1, ..., 27}. The first 26 values correspond to the level3Ids (see Overview for details of the level 3 ids), and class 27 is used as a miscellaneous class.
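The format above can be sketched as follows. This is a minimal illustration, not official challenge code: the array contents, file name, and helper name are assumptions; only the constraint that each pixel holds an integer in 1-27 and that the PNG matches the input resolution comes from the text.

```python
import numpy as np
from PIL import Image

def save_prediction(pred, out_path):
    """Save a per-pixel label map as a single-channel PNG prediction.

    `pred` is an HxW integer array whose values lie in 1..27
    (1-26 = level3Ids, 27 = miscellaneous), at the same resolution
    as the input image.
    """
    assert pred.min() >= 1 and pred.max() <= 27, "labels must be in 1..27"
    Image.fromarray(pred.astype(np.uint8), mode="L").save(out_path)

# Example: a dummy 4x6 prediction filled with the miscellaneous class (27)
dummy = np.full((4, 6), 27, dtype=np.uint8)
save_prediction(dummy, "frame_0001_pred.png")
```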


Metric

We will be using the mean Intersection over Union metric. All ground truth and prediction maps will be resized to 1080p (using nearest-neighbor interpolation), and true positives (TP), false negatives (FN) and false positives (FP) will be computed for each class (except class 27) over the entire test split of the dataset. The Intersection over Union (IoU) for each class is computed by the formula TP/(TP+FN+FP), and the mean over classes is taken as the metric (commonly known as mIoU) for the segmentation challenge.
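The accumulation described above can be sketched as below. This is an illustrative implementation, not the official evaluation code; it assumes the ground truth and prediction maps have already been resized to a common resolution, and the function name and signature are invented for the example.

```python
import numpy as np

def mean_iou(gts, preds, num_classes=26):
    """Accumulate TP/FP/FN per class over a whole split, then average IoU.

    gts/preds: iterables of HxW integer arrays with labels in 1..27.
    Class 27 (miscellaneous) is excluded by only scoring 1..num_classes.
    Maps are assumed to be resized to the evaluation resolution already.
    """
    tp = np.zeros(num_classes)
    fp = np.zeros(num_classes)
    fn = np.zeros(num_classes)
    for gt, pred in zip(gts, preds):
        for c in range(1, num_classes + 1):
            gt_c, pred_c = gt == c, pred == c
            tp[c - 1] += np.sum(gt_c & pred_c)
            fp[c - 1] += np.sum(~gt_c & pred_c)
            fn[c - 1] += np.sum(gt_c & ~pred_c)
    # IoU = TP / (TP + FN + FP) per class; guard against empty classes
    iou = tp / np.maximum(tp + fp + fn, 1)
    return iou.mean()
```

Note that TP/FP/FN are accumulated over the entire split before the division, so large and small images contribute in proportion to their pixel counts.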

Additionally, we will report the mIoU for level 2 and level 1 ids, computed at 720p resolution, on the leaderboard.

Leader Board

Team Name             Model Name     mIoU at level 3Id and 1080p
Mapillary Research    -              74.32%
BDAI                  -              74.12%
Vinda                 -              74.07%
Geelpen               -              73.76%
HUST_IALab            -              73.39%
DeepScene             -              71.11%
Team7                 -              67.94%
Baseline*             DRN-D-38 [3]   66.56%
SGGS                  -              65.96%
TUEMPS                -              64.13%
Great wall of motors  -              63.76%
Baseline*             ERFNet [2]     55.41%
* Baselines were run by the organizers using the code released by the authors of ERFNet [2] and DRN [3].

2. Instance Segmentation

In the instance segmentation benchmark, the model is expected to segment each instance of a class separately. Instance segments are only expected for "things" classes, which are all level3Ids under living things and vehicles (i.e., level3Ids 4-12).

Output Format and Metric

The output format and metric is the same as Cityscapes instance segmentation [1].

The predictions should use the "id" specified in the label definitions, unlike the semantic segmentation challenge, where level3Ids were used.
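Since the output format follows Cityscapes [1], predictions are organized as one text file per image, each line referencing a binary instance mask PNG along with the predicted label id and a confidence score. The sketch below illustrates that layout; the function name, file naming scheme, and directory structure are assumptions for the example, not part of the challenge specification.

```python
import os
import numpy as np
from PIL import Image

def write_instance_predictions(instances, image_name, out_dir):
    """Write predictions in a Cityscapes-style submission layout:
    one .txt per image, each line giving a mask PNG path, the
    predicted label id, and a confidence score.

    `instances` is a list of (mask, label_id, confidence) tuples,
    where `mask` is a boolean HxW array marking one instance.
    """
    os.makedirs(out_dir, exist_ok=True)
    txt_path = os.path.join(out_dir, image_name + ".txt")
    with open(txt_path, "w") as f:
        for i, (mask, label_id, conf) in enumerate(instances):
            mask_name = f"{image_name}_{i}.png"
            # Non-zero pixels belong to the instance
            Image.fromarray(mask.astype(np.uint8) * 255).save(
                os.path.join(out_dir, mask_name))
            f.write(f"{mask_name} {label_id} {conf:.4f}\n")

# Example: one dummy instance with (hypothetical) label id 9
mask = np.zeros((4, 6), dtype=bool)
mask[1:3, 2:5] = True
write_instance_predictions([(mask, 9, 0.87)], "frame_0001", "results")
```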

Leader Board

Team Name       AP      AP 50%
TUTU            39.2%   67.5%
Poly            26.8%   49.9%
Dynamove_IITM   18.6%   38.7%
DV              10.4%   20%


  1. The Cityscapes Dataset for Semantic Urban Scene Understanding.
    Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, Bernt Schiele.
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 3213-3223.
  2. ERFNet: Efficient Residual Factorized ConvNet for Real-time Semantic Segmentation.
    E. Romera, J. M. Alvarez, L. M. Bergasa and R. Arroyo.
    IEEE Transactions on Intelligent Transportation Systems (T-ITS), December 2017.
  3. Dilated Residual Networks.
    Fisher Yu, Vladlen Koltun and Thomas Funkhouser.
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.