2021 saw many exciting advances in machine learning (ML) and natural language processing (NLP). In this post, I will cover the papers and research areas that I found most inspiring. I tried to cover the papers that I was aware of but likely missed many relevant ones. Feel free to highlight them, as well as ones that you found inspiring, in the comments. I discuss the following highlights:

1) Universal Models

Figure: Self-supervised cross-lingual representation learning on speech using XLS-R. The model is pre-trained on diverse multilingual speech data using a self-supervised wav2vec 2.0-style loss. The trained model can then be fine-tuned on different speech tasks (Babu et al., 2021).

What happened? 2021 saw the continuation of the development of ever larger pre-trained models. Pre-trained models were applied in many different domains and started to be considered critical for ML research. In computer vision, supervised pre-trained models such as Vision Transformer have been scaled up, and self-supervised pre-trained models have started to match their performance. The latter have been scaled beyond the controlled environment of ImageNet to random collections of images. In speech, new models have been built on top of wav2vec 2.0, such as W2v-BERT, as well as more powerful multilingual models such as XLS-R (a fine-tuning sketch appears at the end of this section). At the same time, we saw new unified pre-trained models for previously under-researched modality pairs such as video and language as well as speech and language. In vision and language, controlled studies shed new light on important components of such multi-modal models. By framing different tasks in the paradigm of language modelling, models have also had great success in other domains such as reinforcement learning and protein structure prediction. Given the observed scaling behaviour of many of these models, it has become common to report performance at different parameter sizes. However, increases in pre-training performance do not necessarily translate to downstream settings.

Why is it important? Pre-trained models have been shown to generalize well to new tasks in a given domain or modality. They demonstrate strong few-shot learning behaviour and robust learning capabilities. As such, they are a valuable building block for research advances and enable new practical applications.

What's next? We will undoubtedly see more and even larger pre-trained models developed in the future. At the same time, we should expect individual models to perform more tasks at once. This is already the case in language, where models can perform many tasks by framing them in a common text-to-text format (see the sketch below). Similarly, we will likely see image and speech models that can perform many common tasks in a single model.
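To make the text-to-text framing concrete, here is a minimal sketch using the publicly available "t5-small" checkpoint from Hugging Face Transformers. The prompts follow T5's documented task prefixes; this is an illustration of the general idea, not code from any of the papers discussed above.

```python
# A minimal sketch of the text-to-text framing with T5 ("t5-small" is a public
# Hugging Face checkpoint; prompts use T5's documented task prefixes).
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Very different tasks, all expressed as "text in, text out".
prompts = [
    "translate English to German: The house is wonderful.",   # translation
    "summarize: Pre-trained models were applied in many different domains "
    "and started to be considered critical for ML research.",  # summarization
    "cola sentence: The course is jumping well.",               # acceptability judgement
]
for prompt in prompts:
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_length=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Because inputs and outputs are both plain strings, adding a new task only requires a new prefix and training data in the same format, which is what makes this framing attractive for multi-task models.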
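For the XLS-R example shown in the figure at the start of this section, the sketch below shows roughly how the pre-trained checkpoint can be fine-tuned for speech recognition with a CTC head. It assumes the Hugging Face Transformers port of XLS-R ("facebook/wav2vec2-xls-r-300m") and a hypothetical character vocabulary file "vocab.json"; the toy training step and settings are for illustration only, not the recipe of Babu et al. (2021).

```python
# A minimal sketch, assuming the Hugging Face port of XLS-R
# ("facebook/wav2vec2-xls-r-300m") and a hypothetical "vocab.json"
# character vocabulary for the target language.
import numpy as np
import torch
from transformers import Wav2Vec2CTCTokenizer, Wav2Vec2FeatureExtractor, Wav2Vec2ForCTC

# Tokenizer over the task-specific character vocabulary (hypothetical file).
tokenizer = Wav2Vec2CTCTokenizer(
    "vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|"
)
# XLS-R expects 16 kHz, normalized waveforms.
feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1, sampling_rate=16000, padding_value=0.0, do_normalize=True
)

# Load the self-supervised checkpoint and attach a freshly initialised CTC head.
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-xls-r-300m",
    vocab_size=len(tokenizer),
    pad_token_id=tokenizer.pad_token_id,
    ctc_loss_reduction="mean",
)
model.freeze_feature_encoder()  # common practice: keep the convolutional encoder fixed

# One illustrative training step on a (waveform, transcript) pair.
waveform = np.random.randn(16000).astype(np.float32)  # placeholder: 1 second of audio
inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")
labels = torch.tensor([tokenizer("hello world").input_ids])
loss = model(input_values=inputs.input_values, labels=labels).loss
loss.backward()  # in practice this runs inside a Trainer / optimiser loop
```

The appeal of this setup is that the expensive multilingual pre-training is done once, while each downstream speech task only needs a small task-specific head and comparatively little labelled data.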