Human language communication is via sequences of words, but language understanding requires constructing rich hierarchical structures that are never observed explicitly. The mechanisms for this have been a prime mystery of human language acquisition, while engineering work has mainly proceeded by supervised learning on treebanks of sentences hand labeled for this latent structure.
However, we demonstrate that modern deep contextual language models learn major aspects of this structure, without any explicit supervision. (...) we show that a linear transformation of learned embeddings in these models captures parse tree distances to a surprising degree, allowing approximate reconstruction of the sentence tree structures normally assumed by linguists. These results help explain why these models have brought such large improvements across many language-understanding tasks.
This is super super cool. We've started to understand the mechanisms of why BERT works as well as it does and it turns out that it's parsing sentence structure in much the same way we do without having been explicitly taught how to do so.
If you've been waiting for a resource to dig into the inner workings of BERT and transformer models, this is a good one.Read more...