The GPU is slow and only supports GLES/OpenGL 2. OpenGL 2.1 isn’t different from GLES in terms of feature set, and that’s what matters.
CPU/core usage doesn’t matter, because the CPU is barely used with a Pi 3 plus mupen64plus (thanks to the ARM dynarec).
mupen64plus-gliden64 is fairly well optimised for the Pi at this point, in my view. Ultimately the GPU’s lack of features dooms it.
System bus speed is probably always a bottleneck, but no amount of speed will get around the GLES/GL limitations.
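
If you want to check which API level your own setup reports, the version string returned by glGetString(GL_VERSION) tells you. Here's a rough Python sketch that just parses such a string. The function name parse_gl_version is my own invention, and the sample strings are made up for illustration, not copied from a real Pi.

```python
import re


def parse_gl_version(version_string):
    """Split a GL_VERSION string into (api, major, minor).

    Hypothetical helper for illustration only; it is not part of
    mupen64plus or GLideN64.
    """
    # OpenGL ES contexts report "OpenGL ES <major>.<minor> <vendor info>"
    # (ES 1.x uses "OpenGL ES-CM" / "OpenGL ES-CL"), while desktop OpenGL
    # contexts report "<major>.<minor>[.<release>] <vendor info>".
    match = re.match(r"\s*(OpenGL ES(?:-\w+)?\s+)?(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError("unrecognised GL_VERSION string: %r" % version_string)
    api = "GLES" if match.group(1) else "GL"
    return api, int(match.group(2)), int(match.group(3))


if __name__ == "__main__":
    # Made-up sample strings, only to show the two shapes.
    for sample in ("OpenGL ES 2.0 (example build)", "2.1 Example 1.0"):
        print(sample, "->", parse_gl_version(sample))
```

Either way you land on a 2.x feature level, which is the point above: it's the feature set that limits things, not which of the two names the driver uses.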