Enough information now that I'll hazard an answer.
By the way, the logic behind suggesting GMRES is NOT that your matrix is sparse, but that if it changes only by a tiny amount each time, then you might be able to gain using GMRES, because you supply a starting value. It might be worth a shot, but I'm not sure you would gain. Depending on the condition number of your matrices, the solution could change a fair amount based on only a small perturbation.
But something in your response makes no sense. A single solve for an 800x800 matrix takes 20-30 seconds? Or 90000 of them? I assume you mean the latter. A quick test on my computer shows:
A = rand(800,800);
tic,[L,U,P] = lu(A);toc
Elapsed time is 0.024238 seconds.
And a factorization will be the lion's share of the time. But that also makes no sense.
So not 20-30 seconds, but 2000 seconds. Your computer must be either REALLY old and slow, or REALLY fast. A really slow computer makes no sense, as you say you have 40GB of RAM, and a computer old enough to be that slow would be unlikely have that much RAM.
Anyway, one thing I would point out is that QR is not necesssaily a good choice to do a square matrix solve. If these matrices are square, then why are you using QR? A pivoted LU will be nicely stable, and will be considerably faster than QR.
A = rand(4000);
>> tic,[L,U,P] = lu(A);toc
Elapsed time is 0.300080 seconds.
>> tic,[Q,R,E] = qr(A);toc
Elapsed time is 8.192923 seconds.
So perhaps when you said 20-30 seconds, you meant for a 4kx4k matrix took that long for a solve, if your computer is 3x as slow as mine. That is possible, since mine should be putting up some decent bench scores. (All 8 cores on my CPU were running on that QR factorization.)
But the point is, why are you using QR here at all? The matrix is square! Just use a column pivoted LU. The difference is huge, certainly on a 4kx4k matrix.
In fact, since your matrices are changing on every iteration anyway, then why are you using QR or even LU anyway? Just use backslash, and let it worry about then doing the final solve too. Its probably faster than doing an LU, then applying those factors to a column vector for the solve.
tic,x = A\b;toc
Elapsed time is 0.349814 seconds.
That was pretty fast, in comparison to the time the LU itself took.
My point in this is that using QR to solve the problem may well be a major factor in why your code is so slow. Just use backslash, unless there is some reason why you REALLY, REALLY need a QR. But if you would be willing to consider an iterative solver, that also makes no sense. So use backslash.