YouTube-8M Dataset
A YouTube-8M Dataset is a large-scale labeled video dataset that contains millions of YouTube video IDs with machine-generated annotations.
- Context:
- It can support a YouTube-8M Video Understanding Challenge.
- It can be a Video Classification Dataset (benchmark).
- …
- Counter-Example(s):
- an ImageNet Dataset.
- See: Image Recognition.
References
2019
- https://github.com/google/youtube-8m
- QUOTE: ... This repo contains starter code for training and evaluating machine learning models over the YouTube-8M dataset. This is the starter code for our 3rd Youtube8M Video Understanding Challenge on Kaggle and part of the International Conference on Computer Vision (ICCV) 2019 selected workshop session. The code gives an end-to-end working example for reading the dataset, training a TensorFlow model, and evaluating the performance of the model. ...
2017
- https://research.google.com/youtube8m/
- QUOTE: YouTube-8M is a large-scale labeled video dataset that consists of millions of YouTube video IDs, with high-quality machine-generated annotations from a diverse vocabulary of 3,800+ visual entities. It comes with precomputed audio-visual features from billions of frames and audio segments, designed to fit on a single hard disk. This makes it possible to train a strong baseline model on this dataset in less than a day on a single GPU! At the same time, the dataset's scale and diversity can enable deep exploration of complex audio-visual models that can take weeks to train even in a distributed fashion.
Our goal is to accelerate research on large-scale video understanding, representation learning, noisy data modeling, transfer learning, and domain adaptation approaches for video. More details about the dataset and initial experiments can be found in our technical report and in last year's workshop. Some statistics from the latest version of the dataset are included below.
2016
- (Abu-El-Haija et al., 2016) ⇒ Sami Abu-El-Haija, Nisarg Kothari, Joonseok Lee, Paul Natsev, George Toderici, Balakrishnan Varadarajan, and Sudheendra Vijayanarasimhan. (2016). “YouTube-8M: A Large-Scale Video Classification Benchmark.” In: arXiv preprint arXiv:1609.08675.