Has anyone tried >70B LLMs on M3 Ultra?

Since the Mac Studio is the only machine with 0.5TB of memory at decent memory bandwidth under $15k, I'd like to know what's the PP and token generation speeds for dense LLMs, such Llama 3.1 70B and 3.1 405B.

Has anyone acquired the new Macs and tried them? Or, what speculations you have if you used M2 Ultra/M3 Max/M4 Max?