How has Google's new natural language processing model "ALBERT" evolved?
Bidirectional Encoder Representations from Transformers (BERT), a natural language processing model announced by Google, can be pre-trained on vast amounts of existing text to learn context understanding and sentiment analysis. This removes the need to train a natural language processing model from scratch, because the model starts out with prior knowledge of the language. In September 2019, Google published ALBERT, a lighter and faster version of BERT.
Google AI Blog: ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations
Google presented ALBERT, an upgrade to BERT, at the International Conference on Learning Representations (ICLR) 2020. Google researcher Radu Soricut says that ALBERT delivers excellent performance on 12 natural language processing tasks, including benchmarks such as the Stanford Question Answering Dataset and RACE.
BERT and ALBERT learn both context-independent and context-dependent representations of a single word. For example, the word "bank" has a representation for the word itself, plus different representations depending on the context, such as "bank" in the context of financial transactions and "riverbank" in the context of rivers. This improves learning accuracy.
Improving learning accuracy means increasing the capacity of the model itself. "Increasing model capacity improves performance, but it also increases the time required for pre-training and makes unexpected bugs more likely, so model capacity cannot keep being increased," said Zhenzhong Lan, who, like Soricut, is a Google researcher.
ALBERT optimizes performance by allocating the model's capacity more efficiently. It uses low-dimensional input embeddings for context-independent word representations and keeps high-dimensional hidden-layer embeddings, as in BERT, for context-dependent representations. With this change, ALBERT succeeds in reducing the number of embedding parameters by as much as 80%, at the cost of a slight drop in performance compared with BERT.
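The effect of the low-dimensional input embeddings can be checked with a short calculation. The following is a minimal sketch, not Google's implementation: the function name `embedding_params` is hypothetical, and the sizes (a vocabulary of 30,000 tokens, a hidden size of 768, and an input embedding size of 128) are illustrative assumptions that do not appear in this article.

```python
def embedding_params(vocab_size, hidden_size, embedding_size=None):
    """Count the weights needed to map tokens to hidden-size vectors.

    embedding_size=None models a BERT-style table (vocab x hidden).
    Otherwise it models a factorized table (vocab x embedding) followed
    by a projection (embedding x hidden).
    """
    if embedding_size is None:
        return vocab_size * hidden_size
    return vocab_size * embedding_size + embedding_size * hidden_size


# Assumed, illustrative sizes (not taken from this article).
VOCAB, HIDDEN, EMBED = 30_000, 768, 128

full = embedding_params(VOCAB, HIDDEN)
factorized = embedding_params(VOCAB, HIDDEN, EMBED)
reduction = 1 - factorized / full

print(f"full table:       {full:,} parameters")
print(f"factorized table: {factorized:,} parameters")
print(f"reduction:        {reduction:.1%}")
```

Under these assumptions the factorized table needs roughly one sixth of the weights, which is in line with the reduction of about 80% described above.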
According to Soricut, ALBERT adds another important design change. Natural language processing models such as BERT, XLNet, and RoBERTa rely on a stack of multiple independent layers, but this introduces redundancy because similar operations end up being performed at different layers. ALBERT removes this redundancy by sharing parameters across layers. This design slightly reduces language processing accuracy, but it further reduces the model size and increases processing speed.
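Sharing parameters across layers can be illustrated by counting the distinct weights in a stack of layers. This is a rough sketch under stated assumptions: the `Layer` class and `unique_params` helper are hypothetical, each layer is assumed to be a standard Transformer encoder layer (self-attention, a feed-forward block four times the hidden size, and two layer normalizations), and the hidden size of 768 and depth of 12 are illustrative values.

```python
class Layer:
    """Hypothetical stand-in for one Transformer encoder layer."""

    def __init__(self, hidden_size):
        self.hidden_size = hidden_size

    def num_params(self):
        h = self.hidden_size
        # Query, key, value, and output projections, each with a bias.
        attention = 4 * (h * h + h)
        # Two dense layers: h -> 4h and 4h -> h, each with a bias.
        feed_forward = (h * 4 * h + 4 * h) + (4 * h * h + h)
        # Two layer normalizations, each with a scale and a shift vector.
        layer_norms = 2 * 2 * h
        return attention + feed_forward + layer_norms


def unique_params(layers):
    """Sum parameters over distinct layer objects only."""
    distinct = {id(layer): layer for layer in layers}
    return sum(layer.num_params() for layer in distinct.values())


# Independent stack: 12 separate layer objects, each with its own weights.
independent = [Layer(768) for _ in range(12)]

# Shared stack: the same layer object is applied 12 times.
shared_layer = Layer(768)
shared = [shared_layer] * 12

print(f"independent: {unique_params(independent):,} parameters")
print(f"shared:      {unique_params(shared):,} parameters")
```

In this sketch the shared stack still applies 12 layers, so the amount of computation per input is unchanged; only the number of stored weights drops to that of a single layer.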
By implementing these two design changes together, ALBERT achieves an 89% reduction in parameters and faster training compared with BERT. The smaller parameter count frees up memory, which leaves room for more pre-training and, as a result, significantly better performance.
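The 89% figure can be reproduced from rounded parameter counts. The counts below (about 108 million for BERT-base and about 12 million for ALBERT-base) are taken from the ALBERT paper, not from this article, and should be treated as approximate.

```python
# Rounded parameter counts as reported in the ALBERT paper (approximate).
BERT_BASE_PARAMS = 108_000_000
ALBERT_BASE_PARAMS = 12_000_000

reduction = 1 - ALBERT_BASE_PARAMS / BERT_BASE_PARAMS
print(f"parameter reduction: {reduction:.0%}")  # prints "parameter reduction: 89%"
```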
Soricut and his colleagues evaluated the models with a reading comprehension test based on the RACE dataset. The scores of each natural language model were as follows: Gated AR, a model pre-trained only on context-independent word representations, scored 45.9; BERT, which learns context-dependent representations, scored 72.0; XLNet and RoBERTa, which were developed after BERT, scored 81.8 and 83.2, respectively; and ALBERT scored 89.4. A higher score means the model answered more questions correctly.
"Albert's high scores in reading comprehension tests show the importance of natural language processing models in producing more compelling contextual representations. By focusing on improving design, Solicut said, Both model efficiency and performance for language processing tasks can be greatly improved. "