ByT5: Towards a token-free future with pre-trained byte-to-byte models
Pre-trained language models usually operate on sequences of tokens based on words or subword units.
Token-free models instead operate directly on raw text (characters or bytes). They can handle any language, are more robust to noise, and don't require a preprocessing pipeline such as a tokenizer. A minimal sketch of the idea follows.
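To make this concrete, here's a minimal sketch (not from the post) of what byte-level "tokenization" amounts to in ByT5: the ids are simply the UTF-8 bytes of the text, shifted by a small offset to reserve ids for special tokens (ByT5 reserves 3: pad, eos, unk).

```python
OFFSET = 3  # ids 0..2 reserved for pad/eos/unk, as in ByT5

def text_to_ids(text: str) -> list[int]:
    # No vocabulary and no language-specific preprocessing needed:
    # the "tokens" are just the raw UTF-8 bytes, shifted by the offset.
    return [b + OFFSET for b in text.encode("utf-8")]

def ids_to_text(ids: list[int]) -> str:
    # Inverse mapping: undo the offset and decode the bytes.
    return bytes(i - OFFSET for i in ids).decode("utf-8")

ids = text_to_ids("héllo")
print(ids)                    # [107, 198, 172, 111, 111, 114] — 'é' spans 2 bytes
assert ids_to_text(ids) == "héllo"  # lossless round-trip for any script
```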
The authors take the mT5 architecture, modify it to consume UTF-8 bytes directly, and show that ByT5 is competitive with token-level models.
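For the curious, a hedged usage sketch, assuming the Hugging Face `transformers` library and the `google/byt5-small` checkpoint (neither is mentioned in the post): since the tokenizer has no learned vocabulary, any script works out of the box.

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/byt5-small")

# Cyrillic text needs no special handling — it's just bytes.
ids = tok("Привет, мир!").input_ids
print(ids)  # raw UTF-8 bytes (+3 offset), followed by the </s> id
print(tok.decode(ids, skip_special_tokens=True))  # round-trips exactly
```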