

Auto-generated summaries in Google Docs
Google Docs now automatically generates summaries of document content when available. While all users can add summaries manually, auto-generated suggestions are currently available only to Google Workspace business customers.
This is achieved through natural language understanding (NLU) and natural language generation (NLG) ML models, specifically Transformer and Pegasus. A popular technique for combining NLU and NLG is sequence-to-sequence learning, where the input is the words of the document and the output is the words of the summary. A neural network then learns to map input tokens to output tokens. Early applications of the sequence-to-sequence paradigm used recurrent neural networks (RNNs) for both the encoder and the decoder.
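As a rough illustration of this encoder-decoder setup, here is a minimal sequence-to-sequence sketch in PyTorch. The vocabulary size, dimensions, and toy batch are assumptions made for illustration, not details of Google's model: a GRU encoder reads the document tokens and a GRU decoder emits summary tokens.

```python
# Minimal sequence-to-sequence sketch with RNNs (illustrative, not Google's code).
import torch
import torch.nn as nn

VOCAB, EMB, HID = 1000, 64, 128   # assumed toy sizes

class Seq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMB)
        self.encoder = nn.GRU(EMB, HID, batch_first=True)
        self.decoder = nn.GRU(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, src_ids, tgt_ids):
        # Encode the document into a final hidden state.
        _, h = self.encoder(self.embed(src_ids))
        # Decode the summary conditioned on that state (teacher forcing).
        dec_out, _ = self.decoder(self.embed(tgt_ids), h)
        return self.out(dec_out)                  # logits over the output vocabulary

model = Seq2Seq()
src = torch.randint(0, VOCAB, (2, 30))            # toy "document" token ids
tgt = torch.randint(0, VOCAB, (2, 8))             # toy "summary" token ids
logits = model(src, tgt)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, VOCAB), tgt.reshape(-1))
loss.backward()
```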
The introduction of Transformers offered a promising alternative to RNNs, because their self-attention allows better modeling of long-range dependencies in the input and output, which is critical when summarizing documents. However, these models still require large amounts of manually labeled data to train well, so the arrival of Transformers alone was not enough to make significant progress in document summarization.
The combination of Transformers with self-supervised pre-training (BERT, GPT, T5) led to major breakthroughs in many NLU problems for which only limited labeled data is available. In self-supervised pre-training, a model uses large amounts of unlabeled text to learn general language understanding and generation capabilities. In a subsequent fine-tuning step, the model then learns to apply these abilities to a specific task, such as summarization or question answering.
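As an illustration of this pre-train-then-fine-tune pattern, a publicly released Pegasus checkpoint can be applied to summarization through the Hugging Face transformers library. This sketch assumes that library is installed; the checkpoint shown is a public model fine-tuned on the XSum dataset, not the model deployed in Google Docs.

```python
# Applying a public, already fine-tuned Pegasus checkpoint to summarization
# (illustrative; not the production Google Docs model).
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

name = "google/pegasus-xsum"   # pre-trained on unlabeled text, fine-tuned on XSum
tokenizer = PegasusTokenizer.from_pretrained(name)
model = PegasusForConditionalGeneration.from_pretrained(name)

document = ("Google Docs can now suggest a short summary of a document. "
            "The feature is built on Transformer-based models that are "
            "pre-trained on unlabeled text and then fine-tuned on labeled "
            "summarization data.")
batch = tokenizer(document, truncation=True, padding="longest", return_tensors="pt")
summary_ids = model.generate(**batch, max_length=32, num_beams=4)
print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True)[0])
```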
Pegasus takes this idea one step further by introducing a pre-training objective tailored to abstractive summarization. In Pegasus pre-training, also called Gap Sentence Prediction (GSP), whole sentences from unlabeled news articles and web documents are masked out of the input, and the model is required to reconstruct them from the remaining unmasked sentences. In particular, GSP uses various heuristics to mask sentences that are considered important to the document, to make pre-training as close to the summarization task as possible. Pegasus achieved state-of-the-art results on a diverse set of summarization datasets.
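A crude sketch of what GSP-style masking looks like, using simple word overlap as a stand-in for the importance heuristics (an assumption made for illustration; the actual Pegasus heuristics are more involved):

```python
# Simplified Gap Sentence Prediction-style masking (illustrative proxy, not Pegasus code).
# Each sentence is scored by word overlap with the rest of the document; the
# top-scoring sentences are replaced with a mask token and become the target.

def gsp_mask(sentences, mask_ratio=0.3, mask_token="<MASK>"):
    def score(i):
        sent = set(sentences[i].lower().split())
        rest = set(" ".join(s for j, s in enumerate(sentences) if j != i).lower().split())
        return len(sent & rest) / max(len(sent), 1)   # crude importance score

    n_mask = max(1, int(len(sentences) * mask_ratio))
    masked_idx = sorted(range(len(sentences)), key=score, reverse=True)[:n_mask]
    inputs = [mask_token if i in masked_idx else s for i, s in enumerate(sentences)]
    targets = [sentences[i] for i in sorted(masked_idx)]
    return " ".join(inputs), " ".join(targets)

doc = ["Google Docs can now suggest summaries.",
       "The feature relies on Transformer and Pegasus models.",
       "Suggestions appear for Workspace business customers."]
model_input, reconstruction_target = gsp_mask(doc)
```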
Building on Transformer and Pegasus, the Google AI researchers carefully cleaned and filtered the fine-tuning data so that it contained training examples that were more consistent and represented a coherent definition of summaries. Despite the reduction in the amount of training data, this resulted in a higher-quality model.
The next problem was serving a high-quality model in production. Although the Transformer encoder-decoder architecture is the dominant approach to training models for sequence-to-sequence tasks such as abstractive summarization, it can be inefficient and impractical to serve in real-world applications. The main inefficiency comes from the Transformer decoder, which generates the output summary token by token through autoregressive decoding. Decoding becomes noticeably slower as summaries grow longer, because the decoder attends to all previously generated tokens at each step. RNNs are a more efficient architecture for decoding, since there is no self-attention over previous tokens as in the Transformer model.
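To see why decoding cost grows with summary length, here is a toy comparison of the per-step work done by a Transformer decoder layer versus an RNN decoder cell. The dimensions, start token, and random encoder output are assumptions for illustration; this is a sketch of the general effect, not the production model.

```python
# Toy comparison of per-step decoding work (illustrative assumptions throughout).
import torch
import torch.nn as nn

HID, VOCAB = 128, 1000
embed = nn.Embedding(VOCAB, HID)
tr_layer = nn.TransformerDecoderLayer(d_model=HID, nhead=4, batch_first=True)
rnn_cell = nn.GRUCell(HID, HID)
out_proj = nn.Linear(HID, VOCAB)
memory = torch.randn(1, 30, HID)                   # stand-in for encoder outputs

generated = torch.zeros(1, 1, dtype=torch.long)    # assumed start token id 0
h = torch.zeros(1, HID)
for step in range(8):
    # Transformer decoder: the input sequence grows every step, so step t attends
    # over all t tokens generated so far -- total work grows with summary length.
    tr_out = tr_layer(embed(generated), memory)
    # RNN decoder: only the last token and a fixed-size hidden state are processed,
    # so the per-step cost stays constant regardless of summary length.
    h = rnn_cell(embed(generated[:, -1]), h)
    next_id = out_proj(tr_out[:, -1]).argmax(dim=-1, keepdim=True)
    generated = torch.cat([generated, next_id], dim=1)
```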
Knowledge distillation, the process of transferring knowledge from a large model to a smaller, more efficient one, was used to distill the Pegasus model into a hybrid architecture with a Transformer encoder and an RNN decoder, and the number of RNN decoder layers was reduced to further improve efficiency. The resulting model has lower latency and a smaller memory footprint while maintaining the quality of the original.
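A minimal sketch of such a distillation setup, under simplifying assumptions: random tensors stand in for the teacher's output distributions, and the RNN decoder is conditioned on pooled encoder states rather than cross-attention. It is meant only to show the shape of the idea, not the production recipe.

```python
# Distillation sketch: a student with a Transformer encoder and a shallow GRU decoder
# is trained to match the per-token distributions of a larger teacher (illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, HID = 1000, 128

class HybridStudent(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HID)
        enc_layer = nn.TransformerEncoderLayer(d_model=HID, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)    # Transformer encoder
        self.decoder = nn.GRU(HID, HID, num_layers=1, batch_first=True)  # shallow RNN decoder
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, src_ids, tgt_ids):
        memory = self.encoder(self.embed(src_ids))
        # Simplification: condition the RNN on pooled encoder states instead of attention.
        h0 = memory.mean(dim=1).unsqueeze(0)
        dec_out, _ = self.decoder(self.embed(tgt_ids), h0)
        return self.out(dec_out)

student = HybridStudent()
src = torch.randint(0, VOCAB, (2, 30))
tgt = torch.randint(0, VOCAB, (2, 8))
with torch.no_grad():
    teacher_logits = torch.randn(2, 8, VOCAB)        # stand-in for the teacher's outputs
student_logits = student(src, tgt)
# Distillation loss: match the teacher's per-token distributions (KL divergence).
loss = F.kl_div(F.log_softmax(student_logits, dim=-1),
                F.softmax(teacher_logits, dim=-1), reduction="batchmean")
loss.backward()
```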
https://ai.googleblog.com/2022/03/auto-generated-summaries-in-google-docs.html