ahmed-moubtahij/TokenHealer

What is token healing?

Token healing rectifies the token boundary bias in greedy tokenization. It does this by trimming and regrowing the prompt to better align with the model's tokenizer, thus enhancing generation quality. The improvement is clearest with completion models.
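The trim-and-regrow idea can be illustrated with a toy sketch. It assumes a hand-written string vocabulary and an already tokenized prompt; `regrow_candidates` and `toy_vocab` are illustrative names, not part of this package. A real healer would let the model score the candidates instead of only listing them.

```python
def regrow_candidates(prompt_tokens, vocab):
    """Trim the prompt's tail token and list the tokens that could regrow it.

    Toy illustration only: tokens are plain strings and the vocabulary is a
    hypothetical hand-written set, not a real tokenizer's vocabulary.
    """
    *kept, tail = prompt_tokens
    # Any token that starts with the trimmed text is a valid replacement,
    # including the tail token itself.
    candidates = sorted(tok for tok in vocab if tok.startswith(tail))
    return kept, candidates


toy_vocab = {"http", ":", "://", "/", "//", "www"}
kept, candidates = regrow_candidates(["http", ":"], toy_vocab)
print(kept)        # ['http']
print(candidates)  # [':', '://']
```

Restricting the next token to these candidates lets the model pick `://` where the untrimmed prompt had committed it to `:`.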

Example: given a completion prompt that ends with a partial URL whose last character is :, the model might have seen the expected completion :// as a single token in training. The prompt's tail token : therefore tells it that the next token is not //, because otherwise the tokenizer would have produced ://, and so it favors wrong completions. Such errors compound in auto-regressive language models.
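The mismatch can be reproduced with a toy longest-match tokenizer. This is a sketch under stated assumptions: `greedy_tokenize` and the four-token `toy_vocab` are hypothetical stand-ins for a real tokenizer, which would typically use a learned vocabulary and merge rules.

```python
def greedy_tokenize(text, vocab):
    """Tokenize by always taking the longest vocabulary token that matches."""
    tokens, i = [], 0
    while i < len(text):
        matches = [tok for tok in vocab if text.startswith(tok, i)]
        if not matches:
            raise ValueError(f"no token matches at position {i}")
        longest = max(matches, key=len)
        tokens.append(longest)
        i += len(longest)
    return tokens


toy_vocab = {"http", ":", "://", "//"}  # hypothetical vocabulary
full_url = greedy_tokenize("http://", toy_vocab)
prompt = greedy_tokenize("http:", toy_vocab)
print(full_url)         # ['http', '://']
print(prompt)           # ['http', ':']
print(prompt + ["//"])  # ['http', ':', '//']
```

The last sequence spells the same text as `full_url`, but it is a token sequence that greedy tokenization of `http://` never produces, so a model trained on greedily tokenized text has little reason to predict `//` after `:`.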

A more thorough explanation can be found in Scott Lundberg's article "The Art of Prompt Design: Prompt Boundaries and Token Healing".

Installation

The only dependency is transformers. Install it directly with pip install transformers, or run pip install . from the repository root, which should pick it up from pyproject.toml.

Usage

from token_healing import TokenBoundaryHealer

prompt = 'The link is <a href="http:'

# `generate`, `completion_model`, and `tokenizer` are not defined in this snippet.
output = generate(prompt, completion_model, tokenizer)
# The link is <a href="http:&#47;&#47;www&#47;dailymail&#

# The model saw '://' as a single token in training. Seeing a prompt ending with `:` tells it that the
# next token is likely not `//`, because otherwise it would've seen `://`.
# Thus, it completes with a token other than `//`, in this case, `&`.

token_healer = TokenBoundaryHealer(completion_model, tokenizer)
healed_prompt = token_healer(prompt)
# The link is <a href="http://
healed_output = generate(healed_prompt, completion_model, tokenizer)
# The link is <a href="http://www.365doki.com/post/3699

See example.py for the full example.
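The usage snippet calls `generate` without defining it. A minimal sketch of such a helper, assuming a Hugging Face transformers causal language model and its matching tokenizer, might look like the following. The function body and the `max_new_tokens` default are assumptions for illustration and are not taken from example.py.

```python
def generate(prompt, model, tokenizer, max_new_tokens=8):
    """Hypothetical helper: complete `prompt` and return the decoded text.

    Assumes `tokenizer(prompt, return_tensors="pt")` returns a mapping of
    model inputs, and that the model and those inputs are on the same device.
    """
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # The first (and only) sequence in the batch holds prompt plus completion.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

With transformers, `model` and `tokenizer` could be loaded through `AutoModelForCausalLM.from_pretrained` and `AutoTokenizer.from_pretrained` for a checkpoint of your choice.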

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Contact

@ahmed_moubtahij
