
5rd groups aren't statistically significant! Wanna bet?!!

JR1200W3

Supporter
Supporter
Full Member
Minuteman
Apr 21, 2020
546
623
The ongoing, neverending argument over how many rounds a shooter needs to generate a reasonable estimate of the quality of his ammo, rifle precision, and average MV for his ballistic profile.

Hornady says minimum of 30rds and quite a few people agree.

Some people say a handful of 5rd groups from this or that shooting session.


Who's right?

"Your sample size isn't statistically significant! REEEE!!!!"

Is that a good hook? It's a matter of statistics, huh? Okay, I went out and shot 30rds in an aggregate of six 5rd strings today. Shot'em all into the same target, ran the Xero for the entire 30. Here are the results.

Equipment:
Zermatt TL3 in a bedded AI AICS ATX chassis, CRB .25 cal Comp Contour barrel chambered in 25GT, NF ATACR 7x35 T3, Hawkins Heavy Tactical rings, Hawkins updraft brake, TT Diamond set at 16oz, Gen I CkyePod, Accurate Mag 12rd mag

Garmin Xero, Schedium GC with heavy fill, TAB Pollock shooting mat

100yd Indoor range

Alpha brass(2x fired), 37gr H4350, Berger 135gr LRHT, FGMM SRP


Method:

Brass prep: Decap with an A419 decapping die, wet tumble for 20mins (no pins), dry, anneal on AMP MKII using code 142, trim on Henderson 3-way trimmer, lube with Imperial sizing die wax on the body, swab Neolube 2 inside the necks, size using a Forster FL bushing die with .278 bushing, then mandrel with a SS Sinclair mandrel turned to make a .279" neck diameter for a loaded .282" neck. This Alpha brass has a .012" neck thickness. Finally, dry tumble in walnut. All in a 21 year old Dillon 550.

Load: Prepped brass was loaded on a Dillon 550. Primed in station 1 using FGMM SRP, nothing in station 2, charged in station 3 with a Dillon powder thru die and funnel insert with an A419 funnel. Powder is dropped, weighed, and trickled on an FX-120i equipped AT V2+ and IP trickler. Powder was dried to my ambient RH and gained approximately 50fps since opening the jug. Charged round seated in station 4 with a Forster Micrometer seating die in 25GT. Dumped into an Akron bin. Every 5 loaded rounds were gathered and placed in an MTM ammo box so some loading sequence was maintained to be carried over to the test. All charges trickled to either 36.98 or 37.00gr.

Barrel: The barrel had 700rds on it since early June. This consisted of 5 matches. It was recently cleaned thoroughly after the last match two weeks ago so I had a clean bore. I loaded 5 more rounds to season the bore prior to shooting the 30rds for record.

Equipment: I shot the test in the prone using the Ckyepod and heavy fill GC as a rear bag. I lay on the mat, placed the Xero just in front of the bipod, under the barrel so the brake didn't blow it around. The test was shot in an indoor 100yd range. The distance from the front of my scope to the target measured 98 yards.

Environment: 9500ft DA, 23.12inhg, 75°, 50% RH. Indoor 100yd range. No wind, mirage, or lighting changes.

Shooting method: I fouled the bore using 5rds. Let the barrel cool for 3 mins and commenced testing. I used a .3" white paster and dialed .5mrad into my elevation to place my group roughly 2"(1.98 actual) above my POA on the white paster.

I shot 5rds at a time, stopping to run the target back up range, take a photo, run it back, write down the individual velocities from the Xero, load 5 more rounds in the mag, return the Xero to its position in front of the bipod, and commence shooting again. This process took on average 2 mins to shoot a string and 8 mins to reset. I repeated this for a total of six 5rd strings, 30rds in all. The Xero ran the entire time and recorded every shot as a cumulative 30rd string.

1000008342.jpg
1000008348.png

1000008343.jpg

Results:

Here's the raw data:
1000008341.png


30rd Mean/Standard Deviation/ Extreme Spread
Mean: 2691.5
SD: 7.9
ES: 27.5

String 1:
Mean: 2689
SD: 6
ES: 16

String 2:
Mean: 2684
SD: 4.6
ES: 10

String 3:
Mean: 2686.8
SD: 5
ES: 13

String 4:
Mean: 2695
SD: 6.4
ES: 12

String 5:
Mean: 2701
SD: 4.9
ES: 12

String 6:
Mean: 2693
SD: 9.3
ES: 25

Here's a comparison of each string vs the running total as the 30rd data develops. * Indicates running total:

String 1:
2689/5.4/16*

String 2:
2684/4.6/10
2687/5.4/20*

String 3:
2686.8/5/12.7
2686.9/5.2/20.1*

String 4:
2694.5/6.4/12
2688.9/6.3/25.3*

String 5:
2701/4.9/12
2691.3/7.7/27.1*

String 6:
2693/9.3/25
2691.5/7.9/27.5*

1000008315.jpg
1000008339.jpg
1000008313.jpg



Observations:
Let's inspect the data. My mean SD across all six strings was 6. My lowest 5rd SD was 4.6 and the highest 5rd SD was 9.3. My total finished SD across the larger sample size was 7.9.
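
For anyone who wants to check those numbers, here is a minimal sketch that works only from the per-string means and SDs listed above (the shot-by-shot velocities are in the screenshot, so they are not reproduced here). It assumes five shots per string and sample (n-1) standard deviations, and the function name is just illustrative. Because the inputs are rounded, the combined SD it returns should land near, not exactly on, the 7.9 the Xero reported.

```python
import math

# Per-string summary values as posted above (fps). Five shots per string.
STRING_MEANS = [2689, 2684, 2686.8, 2695, 2701, 2693]
STRING_SDS = [6, 4.6, 5, 6.4, 4.9, 9.3]
SHOTS_PER_STRING = 5


def combine_strings(means, sds, n_per_string):
    """Rebuild the overall mean and sample SD from equal-sized strings.

    Total sum of squares = within-string part + between-string part.
    Assumes each SD is a sample SD (n-1 denominator).
    """
    k = len(means)
    total_n = k * n_per_string
    grand_mean = sum(means) / k  # valid because every string has the same n
    within_ss = sum((n_per_string - 1) * sd ** 2 for sd in sds)
    between_ss = sum(n_per_string * (m - grand_mean) ** 2 for m in means)
    combined_sd = math.sqrt((within_ss + between_ss) / (total_n - 1))
    return grand_mean, combined_sd


grand_mean, combined_sd = combine_strings(STRING_MEANS, STRING_SDS, SHOTS_PER_STRING)
mean_of_sds = sum(STRING_SDS) / len(STRING_SDS)

print(f"mean of string means: {grand_mean:.1f} fps")
print(f"mean of string SDs:   {mean_of_sds:.1f} fps")
print(f"combined 30rd SD:     {combined_sd:.1f} fps")
```

The between-string term is what pushes the 30rd SD above the average 5rd SD: the string means themselves moved around, and that movement only shows up once the strings are pooled.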

The mean of my means was of course 2692fps. The spread between my highest and lowest string means was 17fps. The 95% confidence interval on the 30rd mean runs from 2688 to 2694. Just 6fps!
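
As a rough check on that interval, the sketch below rebuilds it from the 30rd mean, SD, and shot count. It assumes the velocities are roughly normal and uses the two-sided 95% t critical value for 29 degrees of freedom (2.045, hard-coded from a standard t table so nothing beyond the standard library is needed). The function name is illustrative.

```python
import math

# 30rd summary values from the Xero, as posted above.
MEAN_FPS = 2691.5
SD_FPS = 7.9
N_SHOTS = 30

# Two-sided 95% critical value of Student's t for 29 degrees of freedom,
# taken from a standard t table (hard-coded to avoid extra dependencies).
T_CRIT_DF29 = 2.045


def mean_confidence_interval(mean, sd, n, t_crit):
    """Return (low, high) for the mean, assuming roughly normal velocities."""
    standard_error = sd / math.sqrt(n)
    margin = t_crit * standard_error
    return mean - margin, mean + margin


low, high = mean_confidence_interval(MEAN_FPS, SD_FPS, N_SHOTS, T_CRIT_DF29)
print(f"95% CI for the mean MV: {low:.1f} to {high:.1f} fps (width {high - low:.1f})")
```

One thing to keep in mind when reading it: this interval describes the uncertainty in the 30rd average, so it is narrower than the scatter you would expect among individual 5rd string averages.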

There is a trend in the last two 5rd strings where the velocities were increasing. I was seeing mirage off the barrel that correlated to the increased velocities. String #5 was the highest and had an interesting result that we'll talk about later.

As a note, AB is saying for a 135 Berger at 2692fps in a DA environment of 9500ft, it takes 14fps to generate a 0.1mrad elevational difference at 1000yds
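
To put the velocity differences in this post on the same scale as a turret click, here is a small sketch built on the AB figure quoted above. Scaling linearly from 14fps per 0.1mrad is my simplifying assumption and only holds for small changes; a real solver should be used for anything that matters. The deltas are ones that appear in this post: the 6fps CI width, the 9fps gap between 2701 and 2692, and the 17fps spread of string means.

```python
# Sensitivity quoted above from AB for this load: about 14 fps of MV change
# per 0.1 mrad of elevation at 1000 yd (135 Berger, 2692 fps, 9500 ft DA).
FPS_PER_CLICK_AT_1000 = 14.0
CLICK_MRAD = 0.1


def elevation_shift_mrad(mv_delta_fps, fps_per_click=FPS_PER_CLICK_AT_1000):
    """Approximate elevation change at 1000 yd for a given MV difference.

    Linear scaling is an assumption that is only reasonable for small deltas.
    """
    return mv_delta_fps / fps_per_click * CLICK_MRAD


for delta in (6, 9, 14, 17):
    shift = elevation_shift_mrad(delta)
    print(f"{delta:>2} fps -> {shift:.3f} mrad ({shift / CLICK_MRAD:.2f} clicks)")
```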

Discussion:
Okay. What does all this mean? Well, first I want to point out something that I picked up from my wife who has 20 years of experience in clinical research and deals in stats all the time. What is statistically significant isn't always clinically significant. I mean sometimes it is, and researchers absolutely use P values to demonstrate statistical significance as a way of proving clinical significance, but the two aren't always reciprocal. This is a key concept because it's the same with shooting. What is statistically significant isn't always ballistically significant and there's a data set in this population that proves that point. So for all the nerds still using Dandy tricklers and tucking their tshirts into their underwear and thundering on the Internet shooting forums, "Your sample size isn't statistically significant!". This data set is about to prove you wrong.

As you can see, each 5rd string's SD is under 10fps and almost all of them were under 8fps. String #5 had an increased Mean. The increased mean was due to a general speed-up in individual velocities. I believe this was the consequence of the barrel and chamber heating up. If this is true it would be a false artifact of shooting a large sample size. I was seeing mirage and shooting through it during strings 5 and 6. The 8 mins in-between strings wasn't enough to adequately cool the barrel. Perhaps once you get a CRB Comp Contour hot enough, all that steel retains the heat longer. Not something you would encounter at a 10rd stage PRS match. This makes for an interesting counter-argument against the 30rd group folks. The increased SD in string 6 is due to one round. 2980. It's not even the lowest velocity in the extreme spread. And I would not call it an outlier. In my estimation, it still falls within the category of random chance. It wouldn't cause a miss, I'm not going to change the way I load ammo due to it, and it still doesn't generate a full 0.1mrad of difference at 1000yds. (Remember, 14fps to generate a dial'able diff)

We ran a standard T test to compare the individual strings against each other and the overall 30rd data. The goal was to compare the differences to each other and use a standard statistics metric to determine if a "statistical significance" existed. This will answer the question, through a statistics metric, of whether the 5rd groups provide a good estimate and are representative of the overall 30rd population. We assumed the standard Alpha of 5% or 0.05. The standard T tests generated a P value to be judged against the Alpha. The only string that generated a P value less than the Alpha, and so demonstrated a statistically significant difference from the other strings or the 30rd population, was String #5. You can see the mean MV falls outside of the 95% confidence interval. This means this result should not happen often. Hence this one 5rd string is not representative. Statistically speaking.
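
For readers who have not run one, here is a sketch of what a two-sample T test looks like when all you have are the summary numbers. It uses Welch's version (unequal variances), which is my choice for illustration and not necessarily the exact test that was run here. It returns the t statistic and the approximate degrees of freedom; turning those into a P value needs the t distribution, which R or any stats package will supply. The two pairs compared are just examples.

```python
import math


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t statistic and approximate degrees of freedom
    for two independent samples with possibly unequal variances."""
    var_a = sd_a ** 2 / n_a
    var_b = sd_b ** 2 / n_b
    t_stat = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t_stat, df


# Summary values posted above: (mean fps, SD fps, shots)
string_1 = (2689, 6.0, 5)
string_2 = (2684, 4.6, 5)
string_5 = (2701, 4.9, 5)
string_6 = (2693, 9.3, 5)

t_5v2, df_5v2 = welch_t(*string_5, *string_2)
t_1v6, df_1v6 = welch_t(*string_1, *string_6)

print(f"String 5 vs String 2: t = {t_5v2:.2f}, df = {df_5v2:.1f}")
print(f"String 1 vs String 6: t = {t_1v6:.2f}, df = {df_1v6:.1f}")
```

The larger the absolute t value, the less plausible it is that the two strings share the same underlying mean. Note the test treats the samples as independent, which is not strictly true when a string is compared against a 30rd total that contains it.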

Oh! So a win for the 30rd group crowd, right?!! I shot a 5rd group that isn't representative of the normal outcome, right? So if I shot one 5rd group and got this result, it would not be representative of a 30rd sample size. Well, case closed! The 30rd group guys win the argument! Well, not so fast. This one string would make me think my SDs are worse in a 30rd string than they actually are. So, if the argument is that some guy posts a 5rd group with a single digit SD and 30rd group guy says, "Cool, good for you but that SD in that 5rd string won't hold up over a larger sample size." ..... then.... Akshually.... this unrepresentative String #5 ... doesn't actually prove that. In fact the difference in SD of 9 vs 6 is pretty F'ing immaterial. You can refer to Bryan Litz's WEZ to determine the hit probability difference of 3fps SD. Okay, well... you'd have 2701 plugged into your Kestrel instead of 2692 if you went off this one 5rd string. Okay, so what? Is that ballistically significant? It's 9fps difference. You still can't even dial that at 1000. And certainly inconsequential at 560, or 735, or 865. So this demonstrates the thing I was saying about what is statistically significant isn't necessarily ballistically significant to us as shooters. And let's not forget there's a decent chance this is only an artificial result of increased heat from shooting an annoyingly large sample size.

The remaining 5 strings are both statistically representative and practically representative.

Conclusion:
This 30rd data set demonstrates that when you use instruments that are capable of reproducing accurate and precise data, a 5rd string can provide a good estimate of the greater population and is representative of what your rifle and ammo will do throughout a match and larger population size.

Now, if the data were different, this conclusion could be different. If your data was less accurate and precise, then yes, you probably do need a larger sample size to have a good estimate and be representative of what your gun and ammo are capable of. Both of these two things can be true. It depends on how good your ammo and gun are. It's not a truism that you need a larger sample size to have a good estimate of what your setup is capable of. It is absolutely a statistical truth that a larger sample size will provide a better estimate and be more representative, but it's about the context. A lot of us don't need this better estimate and representation. We're in the range of splitting hairs when you're already using really precise and accurate data. And it's not usable to us as shooters. This is where the confusion comes in. Just because it's a statistical truth that a larger sample size will provide a better estimate doesn't mean a handful of 5rd groups can't provide all the precision and accuracy we need as shooters to understand what our ammo is capable of throughout a match.

Edit: phone autocorrect induced typos
 
I am in that same boat that 5 round groups are statistically insignificant and that the more rounds the closer to actual results.

However, I also know that it's not practical to spend 1000 rounds testing, load development, etc.

I perform my initial load development with 5 rounds groups. After I've selected a load, I will evaluate at longer ranges and might even do a greater round group to validate my initial results.
 
You’ll have to demonstrate #1 first.

Otherwise, the tests would be inappropriate.
Without getting into a statistical nerd fight, anyone can tell that the six strings with AVG MVs of 2684, 2689, 2687, 2693, 2701, and 2695 and SDs of 6, 6, 5, 5, 5, and 9 are representative of a 30rd population of 2692 AVG MV and SD of 7.
 
Look, I'm not.

But I remember this much from High School, and it's been stuck in my memory ever since.

Our instructor gave us a data set to crunch with an ANOVA.

While we demonstrated our technical capacity to actually perform the test, we failed the exercise because the numbers did not follow a Normal Distribution to begin with.

1754797193540.png


Google's bot actually gives a surprisingly good, easily understandable (with some minor digging) summary on it.

Assuming there was in fact a Normal Distribution to begin with...

First: Shooting off the hip, I can tell you that your numbers will not be representative of a sample set of thirty "with any degree of confidence," as they would say.

Second: The sample set of thirty is meant to represent the "total sum of the universe's results." Using a set of five to represent thirty, which was supposed to represent the "universe" to begin with - I really don't think it works that way. You can attempt to use a set of five to again, represent the "total sum of the universe's results," but we all know this would fail as your test would lack statistical power.

I am old and there may be some errors in what I typed - let's wait for the young 'uns to chime in.

Hope this helps.
 
I think this is where statisticians get lost in the rules of the math and lose touch with reality. They stick to the truisms outside of context and make false statements.

I don't feel like getting up right now and logging in to R to run the distribution but I will tomorrow.
 
On the other side of the statistical coin there is me today. I moved a scope from the CZ 22 to the 300PRC. Zeroed with pre loaded known hand loads. Bore sight and one shot on steel at 200. A few clicks to Center-ish on the paper and send 3. 0.1 mil down and zero the turrets.
Of course my 22LR data collection was different. That stuff is a lot cheaper.
 

One thing about statistics -- Stats is/are a tool, like a hammer.

You know that saying: the guy holding the hammer, to him everything looks like a nail.

Hornady's batch sizes are quite a bit bigger, I think, compared to most any home reloader. It would make sense that their demands, statistics-speaking, are for larger sample groups. If you're going to make 100,000 rounds, you probably don't make the powder charge decision based on a small test sample group.

I think that's where they are coming from on the 30+ rds POV.
 
I agreed with you right up to the last sentence. The one thing I learned from this is if you have sloppy SDs and ES and can't reproduce ammo with very similar average velocities from session to session, then yes, you need much larger sample sizes. Enter Hornady and their QC. Makes sense that this is their position.

But their statements about 30rd minimum sample sizes were very much directed at shooters. Not guys running ammo companies producing large runs of ammo with high variability.
 
I find it utterly ironic that Hornady can only seem to keep their “match grade” bullets within 1.4 grain of each other, in a box of 100. Nor can they make “match grade” brass that isn’t known as some of the softest out there and even when it is free, I toss it and buy Alpha, Lapua, Peterson or ADG. Yet they lecture folks on reloading match quality ammo. Clowns…..
 
I literally just proved they absolutely can be
Again, I MYSELF am in that boat that 5 round groups are statistically insignificant. Not saying YOU are not.

Also, you say you proved otherwise. Your rifle, loads, barrel, etc are all only ONE system. You proved it with one. That is the exception and the exception does not make the rule.

Unless you can prove it with multiple other systems, idk how you can prove that your results apply across the board.
 
This is a very American trait right now. Willful belief in the face of contradictory evidence. New evidence will not change your mind.

I discussed how reproducible this test is with a statistics mentor. I did this to answer the question: is my situation just an outlier? I don't think so. Look at my equipment. Look at my rifle, ammo components. It's very common. And it's very common for people using the same equipment to chrono 5rd strings with SDs below 10, repeatedly, on demand. I don't think I'm at some platinum level apex of reloading. I'm loading on a Dillon 550, lol.

And again....as I said....if you're an AR15 varmint shooter guy who is regularly chrono'ing SDs of 15-20 with ES's around 40-60.....yes, you need a larger sample size. But like null hypothesis and alternative hypothesis logic, just because the latter requires larger sample sizes doesn't mean everyone needs larger sample sizes.
 
@Edsel I assumed chrono'ing MVs creates a data set that follows a normal distribution because the mean is in the middle of the distribution, we're using SDs, and the small variations are due to some chance. I did some poking around and see that everyone seems to agree. I'll open up POSIT tomorrow and generate a histogram of the data. And you can use T tests on skewed distributions.
 
This 30rd data set demonstrates that when you use instruments that are capable of reproducing accurate and precise data, a 5rd string can provide a good estimate of the greater population and is representative of what your rifle and ammo will do throughout a match and larger population size.
This depends on the estimator. If you’re talking about the mean, then yes. But standard deviation is biased at small sample sizes and highly variable.
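
To put rough numbers on that, the sketch below uses the standard c4 correction factor for normally distributed data. c4(n) is the average sample SD divided by the true SD, and sqrt(1 - c4^2) is how much the sample SD itself scatters, again as a fraction of the true SD. Normality is an assumption here, and the function names are illustrative.

```python
import math


def c4(n):
    """Expected value of the sample SD divided by the true SD,
    for n draws from a normal distribution."""
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)


def sd_of_sample_sd(n):
    """Standard deviation of the sample SD, as a fraction of the true SD."""
    return math.sqrt(1.0 - c4(n) ** 2)


for n in (5, 10, 30):
    print(f"n={n:>2}: sample SD averages {c4(n):.3f} x true SD, "
          f"and scatters by about +/-{sd_of_sample_sd(n):.2f} x true SD")
```

The bias is small either way (around 6% low at five shots). The bigger issue is the scatter: at five shots the SD you read off the chrono wanders by roughly a third of the true value from string to string, versus a bit over a tenth at thirty shots.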
 
I find it utterly ironic that Hornady can only seem to keep their “match grade” bullets within 1.4 grain of each other, in a box of 100. Nor can they make “match grade” brass that isn’t known as some of the softest out there and even when it is free, I toss it and buy Alpha, Lapua, Peterson or ADG. Yet they lecture folks on reloading match quality ammo. Clowns…..
I was just about to post how audacious Hornady is touting this shit, when if using their ammo one would chase their tail forever! Lol
 
School me up. Where in this data are error rate corrections required?
First, you should have used an ANOVA test for 6 groups. It improves their error estimations. Second, you cannot make 15 statistical test comparisons without inflating the type I error rate. You have to make some sort of adjustment to either the alpha or the p-values. You selected 0.05 as your overall alpha. Now divide that by 15 and that’s the new alpha to compare each p-value to.
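
As a sketch of what the one-way ANOVA would look like using only the summary numbers from the original post (six strings, five shots each), the F statistic can be built from the string means and SDs. This assumes equal group sizes, sample (n-1) SDs, roughly normal velocities, and similar variance in every string. The function name is illustrative, and the inputs are rounded, so treat the result as approximate.

```python
# Per-string summary values from the original post (fps), five shots each.
STRING_MEANS = [2689, 2684, 2686.8, 2695, 2701, 2693]
STRING_SDS = [6, 4.6, 5, 6.4, 4.9, 9.3]
SHOTS_PER_STRING = 5


def one_way_anova_f(means, sds, n_per_group):
    """F statistic for equal-sized groups, built from group means and SDs."""
    k = len(means)
    grand_mean = sum(means) / k
    df_between = k - 1
    df_within = k * (n_per_group - 1)
    ms_between = n_per_group * sum((m - grand_mean) ** 2 for m in means) / df_between
    ms_within = sum((n_per_group - 1) * sd ** 2 for sd in sds) / df_within
    return ms_between / ms_within, df_between, df_within


f_stat, df_between, df_within = one_way_anova_f(STRING_MEANS, STRING_SDS, SHOTS_PER_STRING)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

The P value comes from the F distribution with those two degrees of freedom, which R's aov or any stats package reports directly. A single F test asks whether any of the six string means differ, so it avoids the pile of pairwise comparisons in the first place.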
 
Given that String 5's mean doesn't fall within the 30rd strings CI, doesn't that prove in a normal distribution that it's significantly different?
 
No, because the type I error rate is inflated. The CI is based on an alpha that is no longer the correct value because you made all those comparisons. You have to adjust the alpha to 0.0033 and the CI coefficient to 1-0.0033.
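
The arithmetic behind those figures is short enough to sketch. Six strings compared pairwise give 15 tests, which is presumably where the 15 comes from, and the Bonferroni adjustment simply divides the overall alpha by that count. The familywise error line treats the tests as independent, which is a simplifying assumption.

```python
from math import comb

N_STRINGS = 6
ALPHA = 0.05

# Every string compared against every other string: "6 choose 2" pairs.
n_comparisons = comb(N_STRINGS, 2)

# Chance of at least one false positive if each test uses ALPHA unadjusted.
# Treating the tests as independent is a simplifying assumption.
familywise_error = 1 - (1 - ALPHA) ** n_comparisons

# Bonferroni adjustment: split the overall alpha across all comparisons.
bonferroni_alpha = ALPHA / n_comparisons

print(f"pairwise comparisons: {n_comparisons}")
print(f"unadjusted familywise error rate: {familywise_error:.2f}")
print(f"Bonferroni per-test alpha: {bonferroni_alpha:.4f}")
print(f"matching CI level: {1 - bonferroni_alpha:.4f}")
```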

The problem with introductory statistics courses is they don’t really teach you statistics. I have over 60 credit hours in statistics and there’s still too much crap to learn.
 
Does it matter if String 5 is or isn't statistically significant based on P values(which can be over relied on) since we know it isn't ballistically significant? It's kind of the minor point given that the other 5 strings are representative and provide a good estimate.
 
First, you should have used an ANOVA test for 6 groups. It improves their error estimations. Second, you cannot make 15 statistical test comparisons without inflating the type I error rate. You have to make some sort of adjustment to either the alpha or the p-values. You selected 0.05 as your overall alpha. Now divide that by 15 and that’s the new alpha to compare each p-value to.
I'll do an anova tomorrow and have my wife walk me through adjusted P values.
 
Personally, it’s not really that much of a difference to care. But this was about the incorrect practice of conducting so many comparisons without adjusting the error rates. Don’t worry though, researchers outside the stats department make these mistakes all the time because they get one or two stats courses in grad school that beat them up mathematically but never really teach them correct practices.
 
I highly suspected folks that really know stats would sense blood and come in to nibble at small details, without disproving the conclusion.
 
And again....as I said....if you're an AR15 varmint shooter guy who is regularly chrono'ing SDs of 15-20 with ES's around 40-60.....yes, you need a larger sample size. But like null hypothesis and alternative hypothesis logic, just because the latter requires larger sample sizes doesn't mean everyone needs larger sample sizes.
I think you just shot your own story down.

If I gave you only samples from one recipe to compare to samples of another recipe, and you have to bet the farm on the results, but I don't show you the total populations from either recipe.... if you are being honest then how many samples of each would you want me to show you before giving your answer? Just five of each, or more like 30?
 
That's a contrived trap. Inherent in this is that you've shot YOUR rifle more than 5 rounds.

I think guys that are fine with a couple of 5 and 10rd groups getting their rifle ready for a match have intuitively taken aggregate samples from multiple preps, practices, and matches. So when they shoot two 5rd groups and see the same results as all the past sessions, especially a rock solid AVG MV over and over again...they aren't looking at just one or two isolated samples. Hence the position of "Why do I need to shoot a 30rd group when every time I shoot a 5rd group the SD is under 8?". The counter-argument is that a 5rd group won't show natural and random variation, but a 30rd group will. And I'm making the argument that with quality components and equipment, natural and random variation is in the neighborhood of 5-8 fps. Which isn't going to be statistically significant or ballistically significant. A significant variation is frankly an outlier and is going to be caused by a mechanical or chemical catalyst that we as shooters seek to find, control, and eliminate. If that kind of outlier exists, we eliminate the cause. So, once you can demonstrate reproducibility of accurate and precise data with YOUR equipment, you don't need a large sample to understand unknown data, because it's not completely unknown to you.

Think outside of hypothetical logic traps and think about what we do practically.

I shot about 100 - 150 rounds through this barrel and took it to its first match. This white target was the 25rds of zero confirm before I ran the press on the 100rd sample and shot the match. The yellow cut out was from a later "check" and I don't remember when.
1000008340.jpg

I shot this prep session on 18 June. Looking back into my Xero, I have two strings. A 5rd AVG MV 2700, SD 7, and a 19rd string AVG velocity 2698.7, SD 6.4 My Xero occasionally misses shots, but this is a good example of what I'm referring to. When this just keeps happening, do you really need to shoot a 30rd group every time?
 
Pretty cool test you did. Nice! And that’s a very good 30 round group!
It would have been cool if you recorded the group size of each 5 round group. Seems like that was what Hornady emphasized more?

On that note I’ve seen many times guys claim their rifle is “sub moa” based on their 3 round group. And it is a sub moa 3rd group. But soon as they shoot 5 to 10 rounds then it grows quickly to 1.5 or sub 2 moa. Then they think something is wrong “cause I know this is a sub moa rifle”
I’m sure much of what I have seen is based on hunting style rifles and barrel heat is an issue.
Dad would always do 3rd groups, or sometimes a single round to verify zero. Then again he was strictly a big game hunter, and I don’t know how many dozens or scores of deer, elk, and moose he downed with a single shot.

Not statistically or scientifically accurate, but I did an 80 round string with my 308 ar10, 8 different loads, 130gr to 180gr, 3 different shooters. And at 100 yards everything was sub 3moa, closer to 2.5moa. The best 5 round group was just under 1 moa. For whatever that’s worth.

TLDR
I do 5 rd groups, if that’s accurate enough for me then rest is usually accurate enough.

Anyways, nice 30 rd group!
 
That's a contrived trap. Inherent in this is that you've shot YOUR rifle more than 5 rounds.

I think guys that are fine with a couple of 5 and 10rd groups getting their rifle ready for a match have intuitively taken aggregate samples from multiple preps, practices, and matches. So when they shoot two 5rd groups and see the same results as all the past sessions, especially a rock solid AVG MV over and over again...they aren't looking at just one or two isolated samples. Hence the position of "Why do I need to shoot a 30rd group when everytime I shoot a 5rd group the SD is under 8?"
Well, just trying to help you out, but here is the problem.

We have everything from experts to rookies on here.

When you give general advice about only needing 5 shots to make a judgement call, you are only correct when addressing folks who have very tight guns and lots of experience. What has your advice done for the folks with no scientific background and low shooting experience?

This is what happened when Scott Satterlee told the 6.5 Guys that you only need a chrono and a ten round ladder to find a flat spot and you are done.....

Not all systems have an inherently tight SD/ES so your story only works when you already know the gun has a tight SD/ES. The average rookie who reads on this forum has become confused by the swirl that has erupted from the proliferation of chronographs and that YT video that told them they only needed to blast a 10 shot ladder over the chrono and not even aim.
 
I stated many times that when you have equipment that can produce accurate and precise data a 5rd group CAN be a good estimator. I know the OP is long and most people just skimmed it, but I've already said this
 
Yes, but if we are being honest, are you setting a trap too? Like the premise that just because you already found the tight recipe that rookies will find one in 5 shots too?

Sure, if we have a very tight system, many 5 shot samples will look similar. But what if that system is an unknown before we start?

How do you know ahead of time if a system is tight? Will we tell rookies to stop at 5 samples and go to work?
 
I'm not telling rookies anything. That is your world-building. I'm simply proving that we don't always need 30rd samples. You're having a hard time with absolutes.

I think a lot of people are stuck on the very simplistic, yet true, fact in statistics that a larger sample size will expose more random variation. What they're not considering is that the size and effect of that variation can be inconsequential: the additional variation a larger sample exposes is immaterial if you're working with an accurate and precise data set that is well under the real-world requirement.

It's not always true that a 5rd group with an SD of 6 WILL BE an SD of 15 in a 30rd sample. And not just by random chance. Our equipment is getting to the point where we can control this.
 
And here's another point that I forgot to tease out: that one low MV in String 6 that caused the SD to shoot up to 9. That one round had the statistical power to blow up the SD in a 5rd string, but when factored into the total 30rd sample it was muted by the volume of data. This is something experienced shooters intuitively understand: larger sample sizes don't always generate larger SDs, because a single errant low or high velocity can actually be somewhat concealed. The normal distribution settles down over time.
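
A quick sketch of that muting effect, for anyone who wants to see the arithmetic. The velocities below are made up for illustration (they are NOT the String 6 data from this test), and the variable names are just placeholders:

```python
import statistics

# Hypothetical muzzle velocities in fps, NOT the data from this test.
typical = [2750, 2752, 2748, 2751, 2749]   # a tight 5rd string
flyer = 2725                                # one errant low round


def sd(values):
    """Sample standard deviation (n - 1 in the denominator)."""
    return statistics.stdev(values)


# 5rd string where the flyer replaces the last normal round.
string_with_flyer = typical[:4] + [flyer]

# 30rd aggregate: five clean strings plus the one string with the flyer.
aggregate = typical * 5 + string_with_flyer

sd_clean = sd(typical)
sd_string = sd(string_with_flyer)
sd_aggregate = sd(aggregate)

print(f"clean 5rd string SD:       {sd_clean:.1f} fps")
print(f"5rd string SD with flyer:  {sd_string:.1f} fps")
print(f"30rd aggregate SD:         {sd_aggregate:.1f} fps")
```

With these assumed numbers the flyer dominates the 5rd string's SD but is diluted in the 30rd aggregate. The aggregate SD still sits above the clean string's SD, so the flyer is muted, not erased.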
 
I'll make one more comment and let the rest roll.

It is also not true that you can always know the SD/ES of an unknown system before you know it.

Taking the stance that your specific 5 shot samples prove this is only because your population was tight, and it is a disingenuous argument against statistics and ballistics.

We are not saying that every SD grows just because the sample size is small, but we are saying that unless you already have some view of the real population stats, sample sizes smaller than 30 run the risk that the 5 shot samples are wrong when the ES is unknown.
 
You're using the word "we" as if your take on this is everyone's take. The fact is that is exactly what some people have said.

I also don't understand why you keep sticking to this surprise drop gun scenario. Who is shooting an unknown gun? I guess it could happen. New build. New barrel. But that would be the exception and not the norm right? We wouldn't build an argument in absolutism about the minority case would we? You should probably caveat that you believe in 30rd samples specifically for this odd occurrence, not just a statement without caveat and then when someone presents evidence disproving it, it's this other thing actually.

I think what's happening is that you're taking this stance because it's a very statistically minded way of thinking about it. It's what happens to a statistician. Someone gives him or her a data set and says earn your paycheck. And so this unknown gun scenario fits your statistical belief habits. But that's not really what the majority of shooters experience when they shoot a group or two and post it on SH right before someone comes along and clubs them over the head with the law of large numbers. That group they posted was an excerpt from a history.
 
You have me all wrong.

I am never going to tell folks they "need to" burn 30 shot samples doing their testing and personal load development. However....

I am going to warn them that there is a risk when they hang their hat on just 3 or 5 shots to make decisions.

I then show them that in real world ballistics, we often do not see a clean normal Gaussian Distribution and that means we sometimes can't assume that even 30 will show the real story.

Few of us have DoD or corporate sized budgets to burn for personal use, so most of us run our lives taking some risk, making decisions with less than the whole picture, often much less.

We run on accumulated experience with both good and bad barrels, guns, weather, etc. Some of us can also do the math, and have professional industrial experience to lean on as well.

However, posting results that are specific to a very tight system, and setting up a straw-man hypothesis on the grounds that the 5 shot samples lie inside the distribution, doesn't cancel the statistics textbooks.

If there have been folks who banged the table and insisted that only 30 shot samples will work, let them be. Don't let them bother you.

At the same time, we have to highlight we take risks when we use smaller samples and that time (more samples) will tell if we were correct. Your example shows only a very good ES and as such the 5 shot samples line up well. But we also only know this because you have 25 more good shots to prove it.

We must be careful to help the rookies learn what happens when they are starting up, not just what happens when a Satterlee puts on another barrel that matches the last good one. The context matters.

Nice shooting. Good Luck, carry on.
 

I'll take your acknowledgement that this test proved out the conclusion as a win, even though I feel you didn't read that part.

As far as the rookies go, they can take advice from the front of the Ranger handbook on this topic as far as I'm concerned.
 
The 30rnd thing. Temp management: Separating SD from group sizes, I think it is important to fire the set number of rounds at a certain barrel heat. A very thin/pencil barrel might start throwing rounds past 3 shots, but 2 shot groups could always be touching, or sub-MOA. If those 2 shot groups, recorded over a period of time, print consistently/precisely, I call that a winning recipe. This was the case with a Kimber lightweight hunter I had. I also only expected to fire two shots on a hunt. 2 at most. Switch to a 1.25" straight taper and it can, and to date has, held up to 30 shots at .75-1 MOA.

Then things change with ambient temp conditions. Every gun I own prints better at 30F than at 100F. So I separate my load development by the ambient temps I shoot the rigs in. Sometimes the load development groups from 30F go pear shaped at 100F.

My recipes' SDs are typically low double digits with some single digits. I print sub MOA at 1000 consistently when I read the wind right. I know we don't want to stack tolerances, but making ammo to 0SD won't matter shit at distance if one can't read wind to 2mph and with accurate wind value. There are inches of difference in those two, too!
 
I highly suspected folks that really know stats would sense blood and come in to nibble at small details, without disproving the conclusion.
I stand corrected and owe you the result. With an ANOVA model pooling the error together, which is more precise than a bunch of t-tests as you did, and a Tukey error-rate adjustment applied to the p-values and 95% confidence intervals, the Group2-Group5 MV difference is significant at the 0.003 level with a difference in averages of -16.68 and CI of [-28.95, -4.4], and the Group3-Group5 MV difference is significant at the 0.016 level with a difference in averages of -14.26 and CI of [-26.53, -1.98].
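
For anyone following along, here is a minimal sketch of what "pooling the error together" means in a one-way ANOVA. The velocities are hypothetical (not the strings from this thread), `one_way_anova` is just an illustrative helper name, and the sketch stops at the F statistic and pooled mean squared error: the p-value and the Tukey adjustment need an F and a studentized range distribution, which a stats package would supply.

```python
import statistics

# Hypothetical muzzle velocities (fps) for three 5rd strings.
# These are NOT the thread's data; they only illustrate the arithmetic.
strings = {
    "A": [2748, 2751, 2750, 2753, 2749],
    "B": [2752, 2750, 2754, 2751, 2753],
    "C": [2765, 2768, 2763, 2767, 2766],
}


def one_way_anova(groups):
    """Return (F, df_between, df_within, pooled_mse) for a dict of samples."""
    all_values = [v for g in groups.values() for v in g]
    grand_mean = statistics.mean(all_values)
    # Variation of the string averages around the overall average.
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups.values()
    )
    # Variation inside each string, pooled across all strings.
    ss_within = sum(
        (v - statistics.mean(g)) ** 2 for g in groups.values() for v in g
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    mse = ss_within / df_within
    f_stat = (ss_between / df_between) / mse
    return f_stat, df_between, df_within, mse


f_stat, df_b, df_w, mse = one_way_anova(strings)
print(f"F = {f_stat:.1f} on ({df_b}, {df_w}) df, pooled MSE = {mse:.2f}")
```

The pooled MSE uses every string's within-string scatter at once, which is why the pairwise comparisons get more degrees of freedom than separate two-string t-tests would.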

The standard T tests generated a P value to be judged against the Alpha. The only string that generated a P value greater than the Alpha, or demonstrated a statistical significance in difference from the other strings or the 30rd population, was String #5.
Should say P value less than Alpha and show which comparisons have a difference. Also be specific about what you’re comparing when making a statement: Group A vs Group B, and the estimator.

a 5rd group CAN be a good estimator.
A five round group isn’t an estimator. The statistic estimating a population parameter is an estimator. So, the sample mean is an estimator for the population mean and 5 rounds, assuming the variance is low, can be quite good as an estimator. But the same cannot be said about sample standard deviation for population standard deviation. The variance is quite high at 4 degrees of freedom. This applies to ES as well. Worse, ES is dramatically more biased than SD and has a larger variance.
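
A rough way to see this is to simulate it. The sketch below assumes a normally distributed population with a made-up mean of 2750 fps and SD of 8 fps (both hypothetical, as is the `simulate` helper), draws many 5 and 30 shot samples, and looks at how much the sample mean, sample SD, and ES move around from sample to sample.

```python
import random
import statistics


def simulate(n, trials=2000, mu=2750.0, sigma=8.0, seed=1):
    """Draw `trials` samples of size n from an assumed normal population
    and collect the sample mean, sample SD, and ES of each one."""
    rng = random.Random(seed)
    means, sds, spreads = [], [], []
    for _ in range(trials):
        shots = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.mean(shots))
        sds.append(statistics.stdev(shots))
        spreads.append(max(shots) - min(shots))
    return means, sds, spreads


results = {n: simulate(n) for n in (5, 30)}

for n, (means, sds, spreads) in results.items():
    print(
        f"n={n:2d}  "
        f"scatter of sample means={statistics.stdev(means):5.2f}  "
        f"scatter of sample SDs={statistics.stdev(sds):5.2f}  "
        f"average ES={statistics.mean(spreads):5.1f}"
    )
```

Under these assumptions the 5 shot sample SDs scatter far more than the 30 shot ones, and the average ES changes with sample size, so an ES from 5 shots and an ES from 30 shots are not estimating the same number.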
 
In here now to hold my spot until after church. Thanks for the effort man.
- I literally just proved they absolutely can be

Edit:

I have been around long enough to make most every incorrect assumption about reloading for precision (or watch other people make them and fail). With that experience, I can say with practiced certainty that by the time I've shot 3 - 4 groups of five shots - at least 2 of which are at distances of 300+ - that I have a clue of what is going to happen to that next 5-10 round group.

I will never argue against the statistical accuracy and confidence level of more data inputs. That being said, there comes a point where it is less applicable to your intended discipline than the online reloading statisticians might have one believe. And until you get a human who is capable of shooting equally across all days and environment that is equal across those days as well, then you are going to have a mean that will not be replicated the next time out. It would be like taking two samples of political leanings in Oklahoma, and then conducting it again in Connecticut.

^ So what we really need are a bunch of 50 round groups, shot on different days, in order to get your true data :LOL: .

What we learned from this exercise above, is that people are going to poo-poo whatever it is that doesn't meet their ideologies and methods.

@JR1200W3 Thanks for your effort. If anything, it replicated what I've seen around here and gave me more confidence that I'm not completely we-todd-did.
 
Well, just trying to help you out, but here is the problem.

We have everything from experts to rookies on here.

When you give general advice about only needing 5 shots to make a judgement call, you are only correct when addressing folks who have very tight guns and lots of experience. What has your advice done for the folks with no scientific background and little shooting experience?

This is what happened when Scott Satterlee told the 6.5 Guys that you only need a chrono and a ten round ladder to find a flat spot and you are done.....

Not all systems have an inherently tight SD/ES so your story only works when you already know the gun has a tight SD/ES. The average rookie who reads on this forum has become confused by the swirl that has erupted from the proliferation of chronographs and that YT video that told them they only needed to blast a 10 shot ladder over the chrono and not even aim.
The trap is that we all were new once, and a carefully written, technically sound statistics presentation meant SFA to us then. We were worried about spending an extra $200 on a scope or rifle. Our shooting wasn’t consistent or precise, and with Hornady 6.5 CM ammo now at $4 a shot, a 30 round sample means $120 wasted.
Like I posted earlier, my 3 shot group is representative of my experience with that rifle, load, and my shooting ability. For several years my results have averaged out to about 0.75 MOA. I’m not burning a bunch of V570, 245 Bergers, and Lapua brass life, along with limited barrel life, to give a statistician a hard on and prove to myself what I already know.
I do 20-30 shot groups with rimfire at 200+ because it is very inconsistent and it’s easy to forget the cost/session. Let’s see some super geek statistical discussion of rimfire results if folks want real frustration.
 
Agreed.

When I lot test expensive rimfire ammo before buying a bunch, I shoot the entire 50rd box at 100yds in five 10rd groups. It gives me the best understanding of that lot I can afford in time and $. It also fits Region Rat's argument.

The problem with the new guy argument is that they're missing so many other tools that a lot of them can't leverage the increased reliability of their data. I would argue that the three most important skills or attributes in the competitions that I do are: reproducing consistent ammo, being able to shoot your same zero out of any position, and the mental game of shooting a stage well. When you cheaped out on a scope, are shooting factory ammo, still haven't figured out how to drive a gun the same way off weird barricades, struggle with wind, and are chasing dope, what are you going to do with 30rd data knowledge? Okay, your Bergara and Vortex PST and Hornady ammo's real capability is a 1.5" group and a 24fps SD. What does the guy do with that? It's like AB Quantum integrating WEZ into the field use part of the app. Like I'm going to be at a match determining hit probability of the individual targets. What am I going to do? Skip targets in the stage? lol. Some data just isn't helpful. But don't tell a statistician that.