Datasets

Research datasets for computer vision, human action recognition, and geo-localization. All data are provided for research purposes only, unless stated otherwise. Please cite the original authors properly when using the data.

4 total datasets: 1 video, 2 geo-localization, 1 multimodal

Research Datasets

Video Anomaly Detection Dataset (video)

The UCF-Crime dataset is a new large-scale dataset, the first of its kind, comprising 128 hours of video. It consists of 1,900 long, untrimmed real-world surveillance videos covering 13 realistic anomalies: Abuse, Arrest, Arson, Assault, Road Accident, Burglary, Explosion, Fighting, Robbery, Shooting, Stealing, Shoplifting, and Vandalism.

These anomalies were selected because they have a significant impact on public safety. The dataset can be used for two tasks: first, general anomaly detection, treating all anomalies as one group and all normal activities as another; second, recognizing each of the 13 anomalous activities.
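Both tasks can share a single label map built from the 13 anomaly classes listed above. Below is a minimal Python sketch; the filename convention and the exact class spellings (e.g. "RoadAccidents" without a space) are assumptions to verify against the official annotation files, not part of the dataset's documented format.

```python
# Minimal sketch: derive labels for both UCF-Crime tasks from a video filename.
# Assumption: filenames begin with their class name (e.g. "Abuse001_x264.mp4");
# check the official annotation files before relying on this convention.

ANOMALY_CLASSES = [
    "Abuse", "Arrest", "Arson", "Assault", "RoadAccidents", "Burglary",
    "Explosion", "Fighting", "Robbery", "Shooting", "Stealing",
    "Shoplifting", "Vandalism",
]

def multiclass_label(filename: str) -> str:
    """Task 2: recognize which of the 13 anomalous activities occurs."""
    for cls in ANOMALY_CLASSES:
        if filename.startswith(cls):
            return cls
    return "Normal"

def binary_label(filename: str) -> int:
    """Task 1: general anomaly detection (1 = anomaly, 0 = normal)."""
    return int(multiclass_label(filename) != "Normal")

print(binary_label("Fighting033_x264.mp4"))       # 1
print(multiclass_label("Normal_Videos_001.mp4"))  # Normal
```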

Real-world Anomaly Detection in Surveillance Videos

Waqas Sultani, Chen Chen, Mubarak Shah

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018

Download Options

Primary Download:

www.crcv.ucf.edu/data1/chenchen/UCF_Crimes.zip

Alternative Download (Dropbox):

Note: The "Anomaly_Train.txt" file in the zip archive is corrupted; please download the corrected file here: Anomaly_Train.txt
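For reference, a hedged Python sketch for fetching the archive; it assumes the listed address is reachable over HTTPS, and it streams the large file to disk rather than buffering it in memory:

```python
# Sketch: stream the UCF_Crimes.zip archive to disk.
# Assumption: the download URL from this page is served over HTTPS.
import shutil
import urllib.request

URL = "https://www.crcv.ucf.edu/data1/chenchen/UCF_Crimes.zip"

with urllib.request.urlopen(URL) as response, open("UCF_Crimes.zip", "wb") as out:
    shutil.copyfileobj(response, out)  # copies in chunks, no full in-memory load
```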
VIGOR: Cross-View Image Geo-localization beyond One-to-one Retrieval (geolocalization)

Cross-view image geo-localization aims to determine the location of a street-view query image by matching it with GPS-tagged reference images from the aerial view. Recent works have achieved surprisingly high retrieval accuracy on city-scale datasets.

However, these results rely on the assumption that for any query image there exists a reference image exactly centered at its location, which does not hold in practical scenarios. In this paper, we redefine the problem with a more realistic assumption: the query image can be taken anywhere in the area of interest, and the reference images are captured before the queries emerge. This assumption breaks the one-to-one retrieval setting of existing datasets, as queries and reference images are no longer perfectly aligned pairs and multiple reference images may cover one query location. To bridge the gap between this realistic setting and existing datasets, we propose a new large-scale benchmark, VIGOR, for cross-view Image Geo-localization beyond One-to-one Retrieval.
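One practical consequence of dropping the one-to-one assumption is that evaluation must accept several positive references per query. The sketch below illustrates a Recall@k computed against sets of positives; the embedding shapes and the `positives` structure are illustrative assumptions, not the benchmark's official evaluation protocol.

```python
import numpy as np

def recall_at_k(query_emb, ref_emb, positives, k=1):
    """Recall@k when each query may have multiple covering references.

    query_emb: (Q, D) L2-normalized query embeddings.
    ref_emb:   (R, D) L2-normalized reference embeddings.
    positives: list of sets; positives[i] holds every reference index
               whose aerial image covers query i (one-to-many setting).
    """
    sims = query_emb @ ref_emb.T             # cosine similarities
    topk = np.argsort(-sims, axis=1)[:, :k]  # best k references per query
    hits = [bool(positives[i] & set(row)) for i, row in enumerate(topk)]
    return float(np.mean(hits))
```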

VIGOR: Cross-View Image Geo-localization beyond One-to-one Retrieval

Sijie Zhu, Taojiannan Yang, Chen Chen

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021

Cross-View Geolocalization Dataset (geolocalization)

The UCF cross-view geolocalization dataset was created for the geo-localization task using cross-view image matching. It contains street view and bird's eye view image pairs around downtown Pittsburgh, Orlando, and part of Manhattan.

There are 1,586, 1,324, and 5,941 GPS locations in Pittsburgh, Orlando, and Manhattan, respectively. We utilize DualMaps to generate side-by-side street view and bird's eye view images at each GPS location with the same heading direction. The street view images are from Google, and the overhead 45-degree bird's eye view images are from Bing. For each GPS location, four image pairs are generated, with camera heading directions of 0, 90, 180, and 270 degrees. To train the deep network for building matching, we annotate corresponding buildings in every street view and bird's eye view image pair.
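The pairing scheme is straightforward to reproduce programmatically. A minimal sketch, assuming GPS locations are given as (lat, lon) tuples (the record layout here is illustrative, not the dataset's published format):

```python
# Sketch: enumerate the four street-view / bird's-eye-view pairs per location.
HEADINGS = (0, 90, 180, 270)  # camera heading directions in degrees

def enumerate_pairs(gps_locations):
    """Yield one record per street-view / bird's-eye-view image pair.

    gps_locations: iterable of (lat, lon) tuples, e.g. the 1,586
    Pittsburgh points. Each location contributes 4 pairs, one per heading.
    """
    for lat, lon in gps_locations:
        for heading in HEADINGS:
            yield {"lat": lat, "lon": lon, "heading": heading}

# e.g. Pittsburgh: 1,586 locations x 4 headings = 6,344 image pairs
```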

Cross-View Image Matching for Geo-localization in Urban Environments

Yicong Tian, Chen Chen, Mubarak Shah

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017

UTD-MHAD Dataset (multimodal)

The UTD-MHAD dataset was collected as part of our research on human action recognition using the fusion of depth and inertial sensor data. The objective of this research is to develop more robust human action recognition algorithms by fusing data from sensors of differing modalities.

The UTD-MHAD dataset consists of 27 different actions: (1) right arm swipe to the left, (2) right arm swipe to the right, (3) right hand wave, (4) two hand front clap, (5) right arm throw, (6) cross arms in the chest, (7) basketball shoot, (8) right hand draw x, (9) right hand draw circle (clockwise), (10) right hand draw circle (counter clockwise), (11) draw triangle, (12) bowling (right hand), (13) front boxing, (14) baseball swing from right, (15) tennis right hand forehand swing, (16) arm curl (two arms), (17) tennis serve, (18) two hand push, (19) right hand knock on door, (20) right hand catch an object, (21) right hand pick up and throw, (22) jogging in place, (23) walking in place, (24) sit to stand, (25) stand to sit, (26) forward lunge (left foot forward), (27) squat (two arms stretch out).
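The numbered list above doubles as a label map. A minimal sketch (indices simply follow the 1-based numbering above; the dataset's own file-naming convention should be checked before use):

```python
# Label map for the 27 UTD-MHAD actions, in the order listed above (1-based).
UTD_MHAD_ACTIONS = [
    "right arm swipe to the left", "right arm swipe to the right",
    "right hand wave", "two hand front clap", "right arm throw",
    "cross arms in the chest", "basketball shoot", "right hand draw x",
    "right hand draw circle (clockwise)",
    "right hand draw circle (counter clockwise)", "draw triangle",
    "bowling (right hand)", "front boxing", "baseball swing from right",
    "tennis right hand forehand swing", "arm curl (two arms)",
    "tennis serve", "two hand push", "right hand knock on door",
    "right hand catch an object", "right hand pick up and throw",
    "jogging in place", "walking in place", "sit to stand", "stand to sit",
    "forward lunge (left foot forward)", "squat (two arms stretch out)",
]

def action_name(action_id: int) -> str:
    """Map a 1-based action id (as in the list above) to its description."""
    return UTD_MHAD_ACTIONS[action_id - 1]

assert action_name(7) == "basketball shoot"
```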

UTD-MHAD: A Multimodal Dataset for Human Action Recognition Utilizing a Depth Camera and a Wearable Inertial Sensor

Chen Chen, Roozbeh Jafari, Nasser Kehtarnavaz

IEEE International Conference on Image Processing (ICIP), 2015

Important Notice

All datasets provided here are intended for research purposes only. Please ensure proper citation of the original authors and papers when using these datasets in your research. Commercial use may require separate licensing agreements.