Abstract: We explore the boundaries of scaling up a multilingual vision and language model, both in terms of size of the components and the breadth of its training task mixture. Our model achieves new ...