Zero-Shot In-Context Learning Task
A Zero-Shot In-Context Learning Task is an in-context learning task in which a pretrained model is expected to make accurate predictions about data from classes it has not encountered during training, guided only by a task description or auxiliary information rather than by in-context examples.
- Context:
- Optional Inputs: Auxiliary Information that encodes observable distinguishing properties of objects.
- Optional Inputs: Semantic Spaces that represent the high-level feature space in which the model makes predictions.
- It can be solved by a Zero-Shot Learning System (that implements a zero-shot learning algorithm).
- It can range from being a Zero-Shot Computer Vision Learning Task, to being a Zero-Shot Natural Language Processing Task, to being a Zero-Shot Sound Processing Task.
- It can range from being a Zero-Shot Classification Task, to being a Zero-Shot Regression Task, to being a Zero-Shot Ordering Task.
- ...
- Example(s):
- a Zero-Shot Benchmark Task.
- a Zero-Shot Reasoning Task.
- a Zero-Shot NLP Task, such as: Zero-Shot Document Classification or Zero-Shot Information Extraction.
- a Zero-Shot Image Processing Task, such as a zero-shot animal identification task (e.g., recognizing a zebra the model has never seen).
- …
- Counter-Example(s):
- Few-Shot Learning: Here, one or more labeled examples are provided at inference time.
- Transductive Learning: This learning strategy makes predictions on the entire test set at once, leveraging the unlabeled test samples during learning.
- Supervised Learning: This requires the model to have been trained with labeled examples of all classes it will encounter.
- Unsupervised Learning: This involves the model discovering patterns in the data without labels or with only minimal supervision.
- See: Semantic Space, Auxiliary Information, Computer Vision, Natural Language Processing.
References
2022
- (Wikipedia, 2022) ⇒ https://en.wikipedia.org/wiki/zero-shot_learning Retrieved:2022-12-8.
- Zero-shot learning (ZSL) is a problem setup in machine learning, where at test time, a learner observes samples from classes which were not observed during training, and needs to predict the class that they belong to. Zero-shot methods generally work by associating observed and non-observed classes through some form of auxiliary information, which encodes observable distinguishing properties of objects. For example, given a set of images of animals to be classified, along with auxiliary textual descriptions of what animals look like, an artificial intelligence model which has been trained to recognize horses, but has never been given a zebra, can still recognize a zebra when it also knows that zebras look like striped horses. This problem is widely studied in computer vision, natural language processing, and machine perception.
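The mechanism described above, associating seen and unseen classes through auxiliary attribute information, can be sketched minimally. In this hypothetical toy setup (the attribute signatures and values are illustrative, not from any dataset), a trained attribute predictor outputs a vector of observable properties, and the unseen class is recovered by nearest-signature matching:

```python
import numpy as np

# Toy attribute signatures: [has_stripes, horse_shaped, has_mane]
# (hypothetical values for illustration)
class_signatures = {
    "horse": np.array([0.0, 1.0, 1.0]),   # seen during training
    "tiger": np.array([1.0, 0.0, 0.0]),   # seen during training
    "zebra": np.array([1.0, 1.0, 1.0]),   # never seen during training
}

def zero_shot_classify(predicted_attributes, signatures):
    """Return the class whose attribute signature is closest
    (Euclidean distance) to the attributes predicted for the input."""
    return min(signatures,
               key=lambda c: np.linalg.norm(signatures[c] - predicted_attributes))

# Suppose the attribute predictor outputs "striped, horse-shaped, maned"
# for an image of a zebra the classifier has never seen:
pred = np.array([0.9, 0.95, 0.8])
print(zero_shot_classify(pred, class_signatures))  # → zebra
```

The model never needs zebra training images; the textual knowledge that "zebras look like striped horses" is encoded in the attribute signature, which is exactly the role of auxiliary information in the passage above.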
2020
- (Brown et al., 2020) ⇒ Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, et al. (2020). “Language Models Are Few-Shot Learners.” In: Advances in Neural Information Processing Systems 33 (NeurIPS 2020).
- QUOTE: ... There are many approaches to building multi-task models. Giving task instructions in natural language was first formalized in a supervised setting with [MKXS18] and used in [RWC+19] for in-context learning and in [RSR+19] for multi-task fine-tuning. ...
Figure 2.1: Zero-shot, one-shot and few-shot, contrasted with traditional fine-tuning. The panels above show four methods for performing a task with a language model – fine-tuning is the traditional method, whereas zero-, one-, and few-shot, which we study in this work, require the model to perform the task with only forward passes at test time. We typically present the model with a few dozen examples in the few shot setting. Exact phrasings for all task descriptions, examples and prompts can be found in Appendix G.
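The prompting formats contrasted in the figure can be illustrated with a small helper. This is a hypothetical sketch of the prompt structure only; the exact phrasings used by Brown et al. differ and are listed in their Appendix G:

```python
def build_prompt(task_description, examples, query):
    """Build a zero-, one-, or few-shot prompt: a natural-language task
    description, k in-context demonstrations (k = 0 for zero-shot),
    and the query the model must complete with a forward pass only."""
    lines = [task_description]
    for source, target in examples:      # demonstrations, if any
        lines.append(f"{source} => {target}")
    lines.append(f"{query} =>")          # the model completes this line
    return "\n".join(lines)

# Zero-shot: task description only, no demonstrations.
print(build_prompt("Translate English to French.", [], "cheese"))

# Few-shot: the same prompt with k demonstrations prepended;
# no gradient updates are performed in either setting.
demos = [("sea otter", "loutre de mer"), ("peppermint", "menthe poivrée")]
print(build_prompt("Translate English to French.", demos, "cheese"))
```

The only difference between the settings is the number of demonstrations in the context window, which is what distinguishes zero-shot in-context learning from one- and few-shot variants.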
2019
- (Wang, Zheng et al., 2019) ⇒ Wei Wang, Vincent W. Zheng, Han Yu, and Chunyan Miao. (2019). “A Survey of Zero-shot Learning: Settings, Methods, and Applications.” In: ACM Transactions on Intelligent Systems and Technology (TIST), 10(2).
- ABSTRACT: Most machine-learning methods focus on classifying instances whose classes have already been seen in training. In practice, many applications require classifying instances whose classes have not been seen previously. Zero-shot learning is a powerful and promising learning paradigm, in which the classes covered by training instances and the classes we aim to classify are disjoint. In this paper, we provide a comprehensive survey of zero-shot learning. First of all, we provide an overview of zero-shot learning. According to the data utilized in model optimization, we classify zero-shot learning into three learning settings. Second, we describe different semantic spaces adopted in existing zero-shot learning works. Third, we categorize existing zero-shot learning methods and introduce representative methods under each category. Fourth, we discuss different applications of zero-shot learning. Finally, we highlight promising future research directions of zero-shot learning.
- ... In zero-shot learning, the goal is to learn the zero-shot classifier fu(⋅). During model learning, if information about the testing instances is involved, the learned model is transductive for these specific testing instances. In zero-shot learning, this transduction can be embodied in two progressive degrees: transductive for specific unseen classes and transductive for specific testing instances. This is different from the well-known transductive setting in semisupervised learning, which is just for the testing instances. In the setting that is transductive for specific unseen classes, information about the unseen classes is involved in model learning, and the model is optimized for these specific unseen classes. In the setting that is transductive for specific testing instances, the transductive degree goes further. The testing instances are also involved in model learning, and the model is optimized for these specific testing instances. Based on the degree of transduction, we categorize zero-shot learning into three learning settings.
- Definition 1.2 (Class-Inductive Instance-Inductive (CIII) Setting). Only labeled training instances Dtr and seen class prototypes Ts are used in model learning.
- Definition 1.3 (Class-Transductive Instance-Inductive (CTII) Setting). Labeled training instances Dtr, seen class prototypes Ts, and unseen class prototypes Tu are used in model learning.
- Definition 1.4 (Class-Transductive Instance-Transductive (CTIT) Setting). Labeled training instances Dtr, seen class prototypes Ts, unlabeled testing instances Xte, and unseen class prototypes Tu are used in model learning. ...
- ... We organize the existing works on zero-shot learning from three perspectives: (1) semantic spaces, which contain the semantic information that is important for zero-shot learning; (2) methods, which are different methods for solving zero-shot learning problems under different learning settings; and (3) applications, the application areas in which zero-shot learning is used. ...
2017
- (Xian et al., 2017) ⇒ Yongqin Xian, Bernt Schiele, and Zeynep Akata. (2017). “Zero-shot Learning-the Good, the Bad and the Ugly.” In: Proceedings of the IEEE conference on computer vision and pattern recognition.
- ABSTRACT: Due to the importance of zero-shot learning, the number of proposed approaches has increased steadily recently. We argue that it is time to take a step back and to analyze the status quo of the area. The purpose of this paper is three-fold. First, given the fact that there is no agreed upon zero-shot learning benchmark, we first define a new benchmark by unifying both the evaluation protocols and data splits. This is an important contribution as published results are often not comparable and sometimes even flawed due to, e.g. pre-training on zero-shot test classes. Second, we compare and analyze a significant number of the state-of-the-art methods in depth, both in the classic zero-shot setting but also in the more realistic generalized zero-shot setting. Finally, we discuss limitations of the current status of the area which can be taken as a basis for advancing it.
2015
- (Paredes & Torr, 2015) ⇒ Bernardino Romera-Paredes, and Philip Torr. (2015). “An Embarrassingly Simple Approach to Zero-shot Learning.” In: International Conference on Machine Learning.
- ABSTRACT: Zero-shot learning consists in learning how to recognize new concepts by just having a description of them. Many sophisticated approaches have been proposed to address the challenges this problem comprises. In this paper we describe a zero-shot learning approach that can be implemented in just one line of code, yet it is able to outperform state of the art approaches on standard datasets. The approach is based on a more general framework which models the relationships between features, attributes, and classes as a two linear layers network, where the weights of the top layer are not learned but are given by the environment. We further provide a learning bound on the generalization error of this kind of approaches, by casting them as domain adaptation methods. In experiments carried out on three standard real datasets, we found that our approach is able to perform significantly better than the state of art on all of them, obtaining a ratio of improvement up to 17%.
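The "one line of code" the abstract refers to is the closed-form ridge-regression solution of the ESZSL framework: with features X (d×m), seen-class attribute signatures S (a×z), and ±1 label matrix Y (m×z), the bilinear map is V = (XXᵀ + γI)⁻¹ X Y Sᵀ (SSᵀ + λI)⁻¹, and an unseen class is chosen by argmaxᵢ xᵀV sᵢ. A minimal numpy sketch on synthetic data (the dimensions, γ, and λ values below are arbitrary illustrative choices, not the paper's cross-validated settings):

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, a, z = 20, 100, 5, 4           # feature dim, instances, attributes, seen classes

# Synthetic seen-class data: each class has an attribute signature, and
# instances are noisy linear images of their class's signature.
S = rng.standard_normal((a, z))      # seen-class signatures (a x z)
W_true = rng.standard_normal((d, a)) # hidden generative map (unknown to learner)
labels = rng.integers(0, z, size=m)
X = W_true @ S[:, labels] + 0.01 * rng.standard_normal((d, m))   # d x m
Y = -np.ones((m, z)); Y[np.arange(m), labels] = 1.0              # m x z, +/-1

# The ESZSL closed-form solution ("one line"):
gamma, lam = 0.1, 0.1
V = np.linalg.solve(X @ X.T + gamma * np.eye(d),
                    X @ Y @ S.T) @ np.linalg.inv(S @ S.T + lam * np.eye(a))

# Training sanity check: score instances against seen-class signatures.
train_acc = np.mean(np.argmax(X.T @ V @ S, axis=1) == labels)
print(f"seen-class training accuracy: {train_acc:.2f}")

# Zero-shot prediction: score a test instance against *unseen* signatures
# that played no role in learning V.
S_unseen = rng.standard_normal((a, 3))           # 3 unseen-class signatures
x_test = W_true @ S_unseen[:, 1] + 0.01 * rng.standard_normal(d)
scores = x_test @ V @ S_unseen                   # one score per unseen class
print("unseen-class scores:", scores)
```

Because V factors the input-to-class relation through the shared attribute space, any class describable by an attribute signature can be scored at test time, which is what allows the approach to classify classes absent from training.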
2008
- (Chang et al., 2008) ⇒ Ming-Wei Chang, Lev-Arie Ratinov, Dan Roth, and Vivek Srikumar. (2008). “Importance of Semantic Representation: Dataless Classification.” In: AAAI, 2.