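A Text-to-Video Generation Model is a text-to-* generation model that takes a text prompt as input and produces a video (a temporally ordered sequence of image frames) as output, and that can support a video generation task.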