Weight Banding


Let's just follow the "we don't know what the fuck is happening inside of neural networks" thread for a second. OpenAI just released two papers—Weight Banding (this link) and Branch Specialization—and both of them are focused on this. The conclusion from Weight Banding is particularly noteworthy in just how open-ended it is:

Once we really understand neural networks, one would expect us to be able to leverage that understanding to design more effective neural networks architectures. (...) It’s unclear whether weight banding is “good” or “bad.” We don’t have any recommendation or action to take away from it. However, it is an example of a consistent link between architecture decisions and the resulting trained weights. It has the right sort of flavor for something that could inform architectural design, even if it isn’t particularly actionable itself.

(emphasis mine)

I can't tell you the last time I read "flavor" used in an academic paper on ML, but I absolutely love it. I increasingly feel like many researchers today are perfectly happy constructing black-box experiments and describing effects, but I find myself far more curious about what is actually inside the box. For an example of why focusing on this stuff matters, check out the comment from a neuroscientist at the end of Branch Specialization:

From the perspective of a neuroscientist, a striking result from the investigation of branch specialization by Voss and her colleagues is that robust branch specialisation emerges in the absence of any complex branch specific design rules. Their analyses show that specialisation is similar within and across architectures, and across different training tasks. The implication here is that no specific instructions are required for branch specialisation to emerge. Indeed, their analyses suggest that it even emerges in the absence of predetermined branches. By contrast, the intuition of many neuroscientists would be that specialisation of different areas of the neocortex requires developmental mechanisms that are specific to each area. For neuroscientists aiming to understand how perceptual and cognitive functions of the brain arise, an important idea here is that developmental mechanisms that drive the separation of cortical pathways, such as the dorsal and ventral visual streams, may be absolutely critical.

!!! So neat.

