LeakGAN Model
A LeakGAN Model is a Generative Adversarial Network that leaks the high-level feature representation extracted by its discriminator to a FuN-based hierarchical generator in order to produce long, coherent, and semantically meaningful text.
- AKA: Long Text Generation Via Adversarial Training With Leaked Information (LeakGAN).
- Context:
- It was first introduced by Guo et al. (2018).
- It can be trained using a LeakGAN Training System.
- It is based on a discriminator-generator neural network architecture that contains the following modules (a minimal code sketch appears after this definition list):
- a generator module based on a manager-worker network, composed of:
- a generator-worker module: an LSTM that generates an action embedding vector;
- a generator-manager module: an LSTM that generates a feature sub-goal vector that is later transformed into an action sub-goal through a linear projection;
- a discriminator module: a CNN that leaks its extracted feature representation to the generator and performs a final real/fake classification.
- Example(s):
- Counter-Example(s):
- See: FeUdal neural network, Text Generation System, Natural Language Generation System, Natural Language Understanding System, Language Model, Generative Adversarial Network, Leaked Information, Reinforcement Learning Neural Network, FuN Manager Module, FuN Worker Module, GAN Discriminator Module, GAN Generator Module, Hierarchical Neural Network.
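The module structure above can be made concrete with a short, illustrative PyTorch sketch (class names, layer choices, and dimensions here are assumptions, not the authors' released implementation). The discriminator below both scores a sequence and returns the pooled CNN feature vector $f$ (Eqs. 1-2 in the reference quotes), which is the information "leaked" to the generator's MANAGER module:

```python
import torch
import torch.nn as nn

class LeakGANDiscriminator(nn.Module):
    """Illustrative CNN discriminator: a feature extractor F(s; phi_f) plus a
    classification layer. The pooled feature vector f is what gets 'leaked'
    to the generator's MANAGER module at every generation step."""
    def __init__(self, vocab_size, emb_dim=64, feature_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, feature_dim, kernel_size=3, padding=1)
        self.classifier = nn.Linear(feature_dim, 1)    # phi_l

    def forward(self, tokens):                          # tokens: (batch, seq_len)
        emb = self.embed(tokens).transpose(1, 2)        # (batch, emb_dim, seq_len)
        feature_map = torch.relu(self.conv(emb))        # CNN feature map
        f = feature_map.max(dim=2).values               # pooled feature f = F(s; phi_f)
        real_prob = torch.sigmoid(self.classifier(f))   # D_phi(s)
        return real_prob, f                             # score plus the leaked feature
```

Returning the intermediate feature together with the real/fake score is what distinguishes this discriminator from a standard GAN discriminator, which would expose only the score.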
References
2018
- (Guo et al., 2018) ⇒ Jiaxian Guo, Sidi Lu, Han Cai, Weinan Zhang, Yong Yu, and Jun Wang. (2018). “Long Text Generation via Adversarial Training with Leaked Information.” In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), the 30th Innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18).
- QUOTE: As illustrated in Figure 1, we specifically introduce a hierarchical generator $G$, which consists of a high-level MANAGER module and a low-level WORKER module. The MANAGER is a long short-term memory network (LSTM) (Hochreiter and Schmidhuber 1997) and serves as a mediator. In each step, it receives discriminator $D$’s high-level feature representation, e.g., the feature map of the CNN, and uses it to form the guiding goal for the WORKER module in that timestep. As the information from $D$ is internally maintained and in an adversarial game it is not supposed to provide $G$ with such information, we thus call it a leakage of information from $D$.
{|style="border: 0px; text-align:center; border-spacing: 1px; margin: 1em auto; width: 80%"
|- |$f =\mathcal{F}\left(s ; \phi_{f}\right)$ |style="width:5%;text-align:right"|(1) |- |$D_{\phi}(s) =\operatorname{sigmoid}\left(\phi_{l} \cdot \mathcal{F}\left(s ; \phi_{f}\right)\right)=\operatorname{sigmoid}\left(\phi_{l}, f\right)$ |style="width:5%;text-align:right"|(2) |- |+ align="bottom" style="caption-side:top;text-align:center;font-weight:bold"|Discriminator
|}{|style="border: 0px; text-align:center; border-spacing: 1px; margin: 1em auto; width: 80%" |- |$\hat{g}_{t}, h_{t}^{M} =\mathcal{M}\left(f_{t}, h_{t-1}^{M} ; \theta_{m}\right) $ |style="width:5%;text-align:right"|(3) |- |$g_{t} =\hat{g}_{t} /\left\|\hat{g}_{t}\right\|$ |style="width:5%;text-align:right"|(4) |- |$w_{t}=\psi\left(\sum_{i=1}^{c} g_{t-i}\right)=W_{\psi}\left(\sum_{i=1}^{c} g_{t-i}\right)$ |style="width:5%;text-align:right"|(5) |- |$O_{t}, h_{t}^{W}= \mathcal{W}\left(x_{t}, h_{t-1}^{W} ; \theta_{w}\right)$ |style="width:5%;text-align:right"|(6) |- |$G_{\theta}\left(\cdot \mid s_{t}\right)= \operatorname{sigmoid}\left(O_{t} \cdot w_{t} / \alpha\right)$ |style="width:5%;text-align:right"|(7) |- |$Q\left(f_{t}, g_{t}\right)=\mathbb{E}\left[r_{t}\right]$ |style="width:5%;text-align:right"|(8) |- |$\nabla_{\theta_{m}}^{\mathrm{adv}} g_{t}=-Q\left(f_{t}, g_{t}\right) \nabla_{\theta_{m}} d_{\cos }\left(\mathcal{F}\left(s_{t+c}\right)-\mathcal{F}\left(s_{t}\right), g_{t}\left(\theta_{m}\right)\right)$ |style="width:5%;text-align:right"|(9) |- |+ align="bottom" style="caption-side:top;text-align:center;font-weight:bold"|MANAGER of Generator |}
{|style="border: 0px; text-align:center; border-spacing: 1px; margin: 1em auto; width: 80%" |- |$\nabla_{\theta_{w}} \mathbb{E}_{s_{t-1} \sim G}\left[\sum_{x_{t}} r_{t}^{I} \mathcal{W}\left(x_{t} \mid s_{t-1} ; \theta_{w}\right)\right] =\mathbb{E}_{s_{t-1} \sim G, x_{t} \sim \mathcal{W}\left(x_{t} \mid s_{t-1}\right)}\left[r_{t}^{I} \nabla_{\theta_{w}} \log \mathcal{W}\left(x_{t} \mid s_{t-1} ; \theta_{w}\right)\right] $ |style="width:5%;text-align:right"|(10) |- |$r_{t}^{I}=\frac{1}{c} \sum_{i=1}^{c} d_{\cos }\left(\mathcal{F}\left(s_{t}\right)-\mathcal{F}\left(s_{t-i}\right), g_{t-i}\right)$ |style="width:5%;text-align:right"|(11) |- |+ align="bottom" style="caption-side:top;text-align:center;font-weight:bold"|WORKER of Generator |}
2017
- (Vezhnevets et al., 2017) ⇒ Alexander Sasha Vezhnevets, Simon Osindero, Tom Schaul, Nicolas Heess, Max Jaderberg, David Silver, and Koray Kavukcuoglu. (2017). “FeUdal Networks for Hierarchical Reinforcement Learning.” In: Proceedings of the 34th International Conference on Machine Learning (ICML 2017).
- QUOTE: What is FuN? FuN is a modular neural-network consisting of two modules – the Worker and the Manager. The Manager internally computes a latent state representation $s_t$ and outputs a goal vector $g_t$. The Worker produces actions conditioned on external observation, its own state, and the Manager's goal. The Manager and the Worker share a perceptual module which takes an observation from the environment $x_t$ and computes a shared intermediate representation $z_t$. The Manager's goals $g_t$ are trained using an approximate transition policy gradient. This is a particularly efficient form of policy gradient training that exploits the knowledge that the Worker's behaviour will ultimately align with the goal directions it has been set. The Worker is then trained via intrinsic reward to produce actions that cause these goal directions to be achieved. Figure 1a illustrates the overall design and the following equations describe the forward dynamics of our network:
: $z_{t}=f^{\text{percept}}\left(x_{t}\right) ; \quad s_{t}=f^{\text{Mspace}}\left(z_{t}\right)$ (1)
: $h_{t}^{M}, \hat{g}_{t}=f^{Mrnn}\left(s_{t}, h_{t-1}^{M}\right) ; \quad g_{t}=\dfrac{\hat{g}_{t}}{\parallel\hat{g}_{t}\parallel}$ (2)
: $w_{t}=\phi\left(\sum_{i=t-c}^{t} g_{i}\right)$ (3)
: $h_{t}^{W}, U_{t}=f^{Wrnn}\left(z_{t}, h_{t-1}^{W}\right) ; \quad \pi_{t}=\operatorname{SoftMax}\left(U_{t} w_{t}\right)$ (4)
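A minimal sketch of these FuN forward dynamics in PyTorch; the module attribute names mirror the symbols in Eqs. (1)-(4), while the concrete layer types and sizes are assumptions (in particular, the original Manager uses a dilated LSTM, which is simplified to a plain LSTM cell here):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeudalNetwork(nn.Module):
    """Illustrative FuN forward pass: shared perception, Manager goal, Worker policy."""
    def __init__(self, obs_dim, n_actions, z_dim=64, s_dim=64, goal_dim=16, c=10):
        super().__init__()
        self.c, self.goal_dim, self.n_actions = c, goal_dim, n_actions
        self.f_percept = nn.Linear(obs_dim, z_dim)               # z_t = f_percept(x_t)
        self.f_Mspace = nn.Linear(z_dim, s_dim)                  # s_t = f_Mspace(z_t)
        self.f_Mrnn = nn.LSTMCell(s_dim, s_dim)                  # Manager recurrence
        self.goal_head = nn.Linear(s_dim, goal_dim)              # \hat{g}_t
        self.f_Wrnn = nn.LSTMCell(z_dim, n_actions * goal_dim)   # Worker recurrence -> U_t
        self.phi = nn.Linear(goal_dim, goal_dim, bias=False)     # goal embedding phi

    def forward(self, x_t, manager_state, worker_state, recent_goals):
        # Eq. (1): shared perceptual module and Manager state space.
        z_t = torch.relu(self.f_percept(x_t))
        s_t = torch.relu(self.f_Mspace(z_t))

        # Eq. (2): Manager RNN emits a normalized goal direction g_t.
        h_m, c_m = self.f_Mrnn(s_t, manager_state)
        g_hat = self.goal_head(h_m)
        g_t = g_hat / (g_hat.norm(dim=-1, keepdim=True) + 1e-8)
        recent_goals = (recent_goals + [g_t])[-(self.c + 1):]

        # Eq. (3): pool the goals from step t-c to t and embed them with phi.
        w_t = self.phi(torch.stack(recent_goals).sum(dim=0))

        # Eq. (4): Worker RNN output U_t combined with w_t gives the action policy pi_t.
        h_w, c_w = self.f_Wrnn(z_t, worker_state)
        U_t = h_w.view(-1, self.n_actions, self.goal_dim)
        pi_t = F.softmax(torch.einsum('bag,bg->ba', U_t, w_t), dim=-1)
        return pi_t, g_t, (h_m, c_m), (h_w, c_w), recent_goals
```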