PPLM

Controlling text-generating language models without any training


Plug and Play Language Models (PPLMs) are a way to steer large language models like GPT-2 with attribute models 100,000 times smaller, making critical progress toward controllable language generation.

Other approaches finetune the original language model, but that is expensive given the number of parameters of such models, and it needs to be done for each specific attribute we want the generated text to have. Moreover, one needs specific data to finetune the model on, which may be non-trivial for certain types of attributes.

A better approach is conditional language generation, where a control code is presented to the model and the model generates text conditioned on it. CTRL is an example of this approach. Still, one needs to train such a huge model from scratch, which is slow and expensive.

In PPLM, we do not retrain the original language generation model (the mammoth in the picture); we just plug in an additional small model (the mouse in the picture) that steers it, modifying the latent representations at each step of the generation so that the probability of the attribute given the text increases. The attribute models can either be bags of words that represent a topic, or small models trained on a dataset. In our experiments we tested both topical bags of words like science, military, politics, etc., and attribute models for sentiment, toxicity, and clickbaitiness, as illustrated in the sketch below.
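To make the steering step concrete, here is a heavily simplified sketch in PyTorch. It is not the released implementation: the actual code perturbs the model's past key/value history over several gradient iterations and adds a KL penalty and geometric-mean fusion to keep the text fluent, while this sketch nudges only the last hidden state with a single gradient step toward a tiny illustrative bag of words. The word list, step size, and prompt are assumptions made for the example.

```python
# Simplified sketch of PPLM-style steering with a bag-of-words attribute model.
# The released code perturbs the past key/value history over several iterations
# and adds KL / geometric-mean fusion terms; here we nudge only the last hidden
# state once per token, purely for illustration.
import torch
import torch.nn.functional as F
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
for p in model.parameters():          # optimize only the perturbation, not the LM
    p.requires_grad_(False)

# Tiny illustrative "science" bag of words defining the attribute p(a | x).
bow_ids = [tokenizer.encode(" " + w)[0] for w in ["science", "research", "experiment"]]

def steered_next_token(input_ids, step_size=0.02):
    # Latent representation of the last token from the unmodified LM.
    hidden = model(input_ids, output_hidden_states=True).hidden_states[-1][:, -1, :]
    delta = torch.zeros_like(hidden, requires_grad=True)   # perturbation to optimize

    # One gradient-ascent step on log p(attribute | text): the probability mass
    # the LM head assigns to the bag-of-words tokens from the perturbed latent.
    probs = F.softmax(model.lm_head(hidden + delta), dim=-1)
    loss = -torch.log(probs[:, bow_ids].sum())
    loss.backward()
    with torch.no_grad():
        delta -= step_size * delta.grad
        steered = F.softmax(model.lm_head(hidden + delta), dim=-1)
    return torch.multinomial(steered, num_samples=1)

ids = tokenizer.encode("The issue focused on", return_tensors="pt")
for _ in range(20):
    ids = torch.cat([ids, steered_next_token(ids)], dim=1)
print(tokenizer.decode(ids[0]))
```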

A paper describing PPLMs was accepted at ICLR 2020 (here is the 5-minute video poster), and we also wrote a blog post about it in a very readable format with only minimal use of math.

We released the code in two ways: as a crystallized version for reproducibility purposes on the Uber Research repository together with an example Colab, and as a contribution to the amazing Hugging Face Transformers repository. Our friends at Hugging Face also built a great demo to play around with.

News outlets talked about it, in particular VentureBeat's article and InfoQ's article; there was buzz in the Chinese community; the blog post got to the front page of Hacker News with this and this entry; it was discussed on reddit.com r/ML, on PapersWithCode.com, and at a random meetup in Toronto; and Kyunghyun Cho discussed it in his NeurIPS 2019 tutorial.

Additional press coverage:


Collaborators

Sumanth Dathathri

Rosanne Liu

Andrea Madotto

Jason Yosinski

Janice Lan

Jane Hung

Eric Frank