Here are the problems with conclusively proving how much one thing or another helps your precision:
Individually, the improvements you get from adding or refining a specific step are pretty small. They are small enough that, without a large sample size, critics will tell you the improvement is within the margin of error. Then, when you do use a large enough sample, they'll say it's something else.
So, why is it that some people have reliably lower SDs than others? Because they do a number of things together that, in the aggregate, produce meaningful results.
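To put rough numbers on the sample-size problem, here is a minimal sketch of how loosely a small string pins down an SD. It assumes velocities are roughly normally distributed and uses the common large-sample approximation SE(s) ≈ s / sqrt(2(n - 1)), which is only a rough guide at small shot counts. The function names and the 10.0 SD figure are made up for illustration, not measured data.

```python
import math


def sd_standard_error(sample_sd: float, n: int) -> float:
    """Approximate standard error of a sample SD (assumes roughly normal data)."""
    if n < 2:
        raise ValueError("need at least 2 shots to estimate an SD")
    return sample_sd / math.sqrt(2 * (n - 1))


def approx_sd_interval(sample_sd: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Rough ~95% range for the true SD, using a normal approximation."""
    se = sd_standard_error(sample_sd, n)
    return max(0.0, sample_sd - z * se), sample_sd + z * se


if __name__ == "__main__":
    # Hypothetical measured SD of 10.0 at several string lengths.
    for shots in (5, 10, 20, 50):
        low, high = approx_sd_interval(10.0, shots)
        print(f"{shots:>2} shots: a measured SD of 10.0 is consistent with roughly {low:.1f} to {high:.1f}")
```

Under these assumptions, a one- or two-point SD improvement from a single step sits well inside the uncertainty of a short string, which is why the "margin of error" objection is so easy to raise.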
This would be a pretty invalid test for what you're looking for. Here's why:
1) It would have no bearing on brass life. EDIT: for clarification, one firing would have no bearing.
2) It would likely have little to no impact on precision either. Firing brass that's newly annealed vs. brass that was annealed last firing will show little difference. Harder cases will spring back more after sizing and using a mandrel, and will thus exhibit a higher effective neck tension on the bullet. If the cases all have the same firing history, you will get progressively higher effective neck tension as the number of firings goes up, which will ramp pressure progressively as well, but with proper prep, they should all maintain a consistent firing profile with respect to each other. Whether this pressure ramp impacts precision down range is entirely dependent on a vast array of other variables - some of which will change by the hour.
3) Even going to 3 or 4 firings, you probably won't see much difference, as long as you keep the brass separated by number of firings (i.e. every case in the group you're shooting has the same firing count) and you use other mitigating factors in your reloading process.
The most effective way to test would be to get a group of brass that you never anneal, let's say 40 pieces. After each firing, you pull 5 out and put them aside. This will give you a group of brass ranging from 1 to 8 firings since annealing. Then fire them all together and compare the SD and group size of that mixed group against another group that's been freshly annealed.
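The pull schedule described above can be sketched as follows. This is only an illustration of the bookkeeping: the case numbering and the function name are hypothetical, and it assumes every case still in rotation is fired on each pass before 5 are set aside.

```python
def build_pull_schedule(total_cases: int = 40, pulled_per_firing: int = 5) -> dict[int, list[int]]:
    """Map 'number of firings' -> case IDs set aside after that firing."""
    if total_cases % pulled_per_firing != 0:
        raise ValueError("total_cases must divide evenly by pulled_per_firing")
    in_rotation = list(range(1, total_cases + 1))  # hypothetical case IDs
    schedule: dict[int, list[int]] = {}
    firing = 0
    while in_rotation:
        firing += 1
        # Everything still in rotation gets fired, then a batch is pulled out.
        schedule[firing] = in_rotation[:pulled_per_firing]
        in_rotation = in_rotation[pulled_per_firing:]
    return schedule


if __name__ == "__main__":
    schedule = build_pull_schedule()
    for firings, case_ids in schedule.items():
        print(f"{firings} firing(s): cases {case_ids[0]}-{case_ids[-1]}")
    # Total rounds needed to build the mixed group (before the comparison string).
    total_rounds = sum(firings * len(ids) for firings, ids in schedule.items())
    print(f"rounds fired to build the group: {total_rounds}")
```

With 40 cases and 5 pulled per firing, this gives eight batches of five and works out to 180 rounds just to build the mixed group, before the final comparison against the freshly annealed group.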
I've done enough testing on other things. Perhaps I'll do that once I get my newly rebarreled 308 dialed in again. Would be a fun, if long, test to do.