HuggingFace's Transformers: State-of-the-art Natural Language Processing, Wolf et al., 2019

Paper, Website, Tags: #nlp, #frameworks

Transformers brought a paradigm shift to NLP: the starting point for training a model on a downstream task moved from a blank task-specific model to a general-purpose pretrained architecture. But creating these general-purpose architectures is an expensive and time-consuming process.

We present HuggingFace's Transformers library, making these developments available to the community by gathering state-of-the-art general-purpose pretrained models under a unified API together with an ecosystem of libraries, examples, tutorials and scripts targeting many downstream tasks.

The API follows a classic NLP pipeline: setting up configuration, processing data with a tokenizer and encoder, and using a model for either training or inference.
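
A minimal sketch of that pipeline, assuming the `transformers` and `torch` packages are installed; the `bert-base-uncased` checkpoint is used purely for illustration, and output attribute names may differ slightly between library versions.

```python
import torch
from transformers import BertTokenizer, BertModel

# Tokenizer handles text -> token ids; model maps ids to hidden states.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Transformers make transfer learning easy.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Last hidden states of the encoder: (batch, sequence_length, hidden_size)
print(outputs.last_hidden_state.shape)
```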

Library design

Steps: defining the model architecture, processing the text data, training the model, and performing inference in production.

1. Configuration

  • A configuration class instance stores the model and tokenizer parameters (vocabulary size, hidden dimensions, dropout rate, etc.). Configurations are agnostic to the deep learning framework used.
  • Tokenizers: a tokenizer class is available for each model; it stores the vocabulary (token-to-index map) for the corresponding model and handles encoding and decoding of input sequences according to the model's specific tokenization process.
  • Model: a base class implementing the model's computational graph, from encoding through the series of self-attention layers up to the last hidden states.
  • Auto classes: a set of Auto classes provides a unified API that enables very fast switching between different models/configurations/tokenizers (see the sketch after this list).
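
A short sketch of the Auto classes, assuming the `roberta-base` and `bert-base-uncased` checkpoints purely as examples: swapping the checkpoint name is the only change needed to switch architectures.

```python
from transformers import AutoConfig, AutoTokenizer, AutoModel

model_name = "roberta-base"  # swap e.g. for "bert-base-uncased" with no other change

config = AutoConfig.from_pretrained(model_name)        # framework-agnostic hyperparameters
tokenizer = AutoTokenizer.from_pretrained(model_name)  # model-specific vocabulary + encoding
model = AutoModel.from_pretrained(model_name, config=config)

print(config.hidden_size, type(model).__name__)  # e.g. 768 RobertaModel
```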

2. Training

  • Optimizer: the library provides a few optimization utilities as subclasses of PyTorch's `torch.optim.Optimizer`
  • Scheduler: additional learning-rate schedulers are also provided, as subclasses of PyTorch's `torch.optim.lr_scheduler.LambdaLR` (see the sketch after this list)
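
A sketch of wiring these utilities together. `get_linear_schedule_with_warmup` is the library's warmup scheduler (a `LambdaLR` subclass); `AdamW` is taken from PyTorch here, since the library's own implementation has since been deprecated in its favour. The checkpoint and step counts are placeholders.

```python
from torch.optim import AdamW
from transformers import AutoModelForSequenceClassification, get_linear_schedule_with_warmup

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

num_training_steps = 1000
optimizer = AdamW(model.parameters(), lr=5e-5)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=100, num_training_steps=num_training_steps
)

# Inside the training loop, after loss.backward():
#     optimizer.step()
#     scheduler.step()
#     optimizer.zero_grad()
```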

Experiments with Transformers

  • Language understanding benchmarks
  • Language model fine-tuning
  • Ecosystems
    • Write with Transformer: an interactive interface that leverages the generative capabilities of pretrained architectures like GPT, GPT-2 and XLNet to suggest text, like an auto-completion plugin.
    • Conversational AI
    • Using in production: all the models in the library are compatible with TorchScript, an intermediate representation of a PyTorch model that can be run in Python or C++ (sketched below).
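
A hedged sketch of TorchScript export, again with `bert-base-uncased` as an illustrative checkpoint: `torchscript=True` puts the model in a traceable configuration, and the traced module can then be saved and later loaded from Python or C++.

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", torchscript=True)
model.eval()

# Trace the computational graph with example inputs of the expected shape.
inputs = tokenizer("A sentence to trace the graph with.", return_tensors="pt")
traced = torch.jit.trace(model, (inputs["input_ids"], inputs["attention_mask"]))

torch.jit.save(traced, "bert_traced.pt")  # loadable via torch::jit::load in C++
```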

Architectures

List of architectures for which reference implementations and pretrained weights are currently provided by Transformers. These are generative models (GPT, GPT-2, Transformer-XL, XLNet, XLM) and language-understanding models (BERT, DistilBERT, RoBERTa, XLM):

  • BERT
  • RoBERTa
  • DistilBERT
  • GPT
  • Transformer-XL
  • XLNet
  • XLM