I was writing a note to a friend that mentioned my tedious opinions on “AI” discourse. It veered off into my usual argument that big “AI” companies are shaping the industry ecosystem to their own ends by setting up a situation where expensive-to-run models are overvalued. I think they’re doing this because they have a competitive advantage in that tier of the market, having bought (time on) a lot of GPUs. It’s like how a company that owns diamond mines will probably promote the idea that large, mined diamonds are important and valuable, and that there’s something off about running a sub-industrial mine or lab-growing diamonds. You can do this without lying at all, but I still dislike it. Large mined diamonds here are
To support this argument, I started making my case against the necessity of the standard transformer model. I admit that the case is scattershot and circumstantial. It’s not that SDPA (the normal transformer architecture) is a fraud, or that there is something m