The ongoing, never-ending argument: how many rounds does a shooter need to generate a reasonable estimate of the quality of his ammo, rifle precision, and average MV for his ballistic profile?
Hornady says minimum of 30rds and quite a few people agree.
Some people say a handful of 5rd groups from this or that shooting session.
Who's right?
"Your sample size isn't statistically significant! REEEE!!!!"
Is that a good hook? It's a matter of statistics, huh? Okay, I went out and shot 30rds in an aggregate of six 5rd strings today. Shot 'em all into the same target, ran the Xero for the entire 30. Here are the results.
Equipment:
Zermatt TL3 in a bedded AI AICS ATX chassis, CRB .25 cal Comp Contour barrel chambered in 25GT, NF ATACR 7x35 T3, Hawkins Heavy Tactical rings, Hawkins updraft brake, TT Diamond set at 16oz, Gen I CkyePod, Accurate Mag 12rd mag
Garmin Xero, Schedium GC with heavy fill, TAB Pollock shooting mat
100yd Indoor range
Alpha brass(2x fired), 37gr H4350, Berger 135gr LRHT, FGMM SRP
Method:
Brass prep: Decap with A419 decapping die, wet tumble for 20 mins (no pins), dry, anneal on AMP MKII using code 142, trim on Henderson 3-way trimmer, lube with Imperial sizing die wax on the body, swab Neolube 2 inside the necks, and size using a Forster FL bushing die with a .278 bushing and an SS Sinclair mandrel turned to make a .279" neck diameter for a loaded .282" neck. This Alpha brass has a .012" neck thickness. Finally, dry tumble in walnut. All in a 21 year old Dillon 550.
Load: Prepped brass was loaded on a Dillon 550. Primed in station 1 using FGMM SRP, nothing in station 2, charged in station 3 with a Dillon powder-thru die and funnel insert with an A419 funnel. Powder is dropped, weighed, and trickled on an FX-120i equipped with an AT V2+ and IP trickler. Powder was dried to my ambient RH and gained approximately 50fps since opening the jug. Charged rounds were seated in station 4 with a Forster Micrometer seating die in 25GT, then dumped into an Akron bin. Every 5 loaded rounds were gathered and placed in an MTM ammo box so some loading sequence was maintained and carried over to the test. All charges were trickled to either 36.98 or 37.00gr.
Barrel: The barrel had 700rds on it since early June, across 5 matches. It was cleaned thoroughly after the last match two weeks ago, so I started with a clean bore. I loaded 5 extra rounds to season the bore prior to shooting the 30rds for record.
Equipment: I shot the test prone using the Ckyepod and the heavy fill GC as a rear bag. I lay on the mat and placed the Xero just in front of the bipod, under the barrel, so the brake didn't blow it around. The test was shot in an indoor 100yd range. The distance from the front of my scope to the target measured 98 yards.
Environment: 9500ft DA, 23.12inhg, 75°, 50% RH Indoor 100yd range. No wind, mirage, or lighting changes
Shooting method: I fouled the bore using 5rds, let the barrel cool for 3 mins, and commenced testing. I used a .3" white paster and dialed .5mrad into my elevation to place my group roughly 2" (1.98 actual) above my POA on the white paster.
I shot 5rds at a time, stopping to run the target back up range, take a photo, run it back, write down the individual velocities from the Xero, load 5 more rounds in the mag, return the Xero to its position in front of the bipod, and commence shooting again. This process took on average 2 mins to shoot a string and 8 mins to reset. I repeated this for 6 total 5rd strings, 30rds in all. The Xero ran the entire time and recorded every shot as a cumulative 30rd string.
Results:
Here's the raw data:
30rd Mean/Standard Deviation/ Extreme Spread
Mean: 2691.5
SD: 7.9
ES: 27.5
String 1:
Mean: 2689
SD: 6
ES: 16
String 2:
Mean: 2684
SD: 4.6
ES: 10
String 3:
Mean: 2686.8
SD: 5
ES: 13
String 4:
Mean: 2695
SD: 6.4
ES: 12
String 5:
Mean: 2701
SD: 4.9
ES: 12
String 6:
Mean: 2693
SD: 9.3
ES: 25
Here's a comparison of each string vs the running total as the 30rd data develops, listed as Mean/SD/ES. * indicates running total:
String 1:
2689/5.4/16*
String 2:
2684/4.6/10
2687/5.4/20*
String 3:
2686.8/5/12.7
2686.9/5.2/20.1*
String 4:
2694.5/6.4/12
2688.9/6.3/25.3*
String 5:
2701/4.9/12
2691.3/7.7/27.1*
String 6:
2693/9.3/25
2691.5/7.9/27.5*
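For anyone who wants to check how the per-string numbers roll up into the 30rd figure, here is a rough sketch. It assumes each string SD above is a sample SD (n-1) over 5 shots, which the post doesn't state, and it works from the rounded summary values, so expect it to land near the reported 30rd numbers rather than exactly on them. The function name `combine` is made up for illustration.

```python
import math

# (mean fps, SD fps) for each 5rd string, copied from the raw data above
strings = [(2689, 6), (2684, 4.6), (2686.8, 5), (2695, 6.4), (2701, 4.9), (2693, 9.3)]
N_PER_STRING = 5

def combine(summaries, n):
    """Combine equal-size strings into an aggregate mean and sample SD.

    Assumption: every SD in `summaries` is a sample SD (n-1 denominator).
    """
    total = n * len(summaries)
    grand_mean = sum(m for m, _ in summaries) / len(summaries)
    # Scatter inside each string plus scatter of the string means around the grand mean
    within = sum((n - 1) * sd ** 2 for _, sd in summaries)
    between = sum(n * (m - grand_mean) ** 2 for m, _ in summaries)
    sd = math.sqrt((within + between) / (total - 1))
    return grand_mean, sd

mean_30, sd_30 = combine(strings, N_PER_STRING)
print(f"Aggregate: {mean_30:.1f} fps, SD {sd_30:.1f} fps")
```

The "between" term is why the 30rd SD comes out larger than the average 5rd SD: the string means wander a bit, and that wander gets counted in the aggregate.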
Observations:
Let's inspect the data. My mean SD across all six strings was 6. My lowest 5rd SD was 4.6 and the highest 5rd SD was 9.3. My total finished SD across the larger sample size was 7.9.
The mean of my means was of course 2692fps. The spread between my highest and lowest string means was 17fps. The 95% confidence interval runs from 2688 to 2694. Just 6fps!
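A minimal sketch of where a 95% confidence interval like that comes from, assuming it is a t-based interval on the 30rd mean (the post doesn't say how it was calculated). The critical value 2.045 is the standard table value for 29 degrees of freedom; the function name is hypothetical.

```python
import math

MEAN_30 = 2691.5      # fps, 30rd mean from the raw data
SD_30 = 7.9           # fps, 30rd SD from the raw data
N = 30
T_CRIT_DF29 = 2.045   # two-sided 95% t critical value, 29 degrees of freedom (table lookup)

def confidence_interval(mean, sd, n, t_crit):
    """Return (low, high) for a t-based confidence interval on the mean."""
    half_width = t_crit * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

low, high = confidence_interval(MEAN_30, SD_30, N, T_CRIT_DF29)
print(f"95% CI: {low:.1f} to {high:.1f} fps")
```

Note this is an interval for the average MV, not a band that individual shots are expected to stay inside.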
There is a trend in the last two 5rd strings where the velocities were increasing. I was seeing mirage off the barrel that correlated to the increased velocities. String #5 was the highest and had an interesting result that we'll talk about later.
As a note, AB (Applied Ballistics) is saying that for a 135 Berger at 2692fps in a DA environment of 9500ft, it takes 14fps to generate a 0.1mrad elevational difference at 1000yds.
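To put velocity differences in scope-click terms, here is a quick sketch built on that 14fps figure. It assumes the elevation shift scales linearly with the velocity difference over a range this small, which is a simplification and not something AB states; the function names are made up for illustration.

```python
FPS_PER_TENTH_MRAD = 14.0  # AB figure quoted above: fps per 0.1mrad at 1000yds

def elevation_shift_mrad(delta_fps):
    """Approximate elevation change at 1000yds for a velocity difference (linear assumption)."""
    return 0.1 * delta_fps / FPS_PER_TENTH_MRAD

def is_dialable(delta_fps, click_mrad=0.1):
    """True if the shift amounts to at least one full 0.1mrad click."""
    return abs(elevation_shift_mrad(delta_fps)) >= click_mrad

# The 17fps spread between string means, and the 9fps between 2701 and 2692
for delta in (17, 9):
    print(f"{delta} fps -> {elevation_shift_mrad(delta):.3f} mrad, dialable: {is_dialable(delta)}")
```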
Discussion:
Okay. What does all this mean? Well, first I want to point out something that I picked up from my wife, who has 20 years of experience in clinical research and deals in stats all the time. What is statistically significant isn't always clinically significant. I mean sometimes it is, and researchers absolutely use P values to demonstrate statistical significance as a way of proving clinical significance, but the two aren't always reciprocal. This is a key concept because it's the same with shooting. What is statistically significant isn't always ballistically significant, and there's a data set in this population that proves that point. So for all the nerds still using Dandy tricklers, tucking their tshirts into their underwear, and thundering on the Internet shooting forums, "Your sample size isn't statistically significant!": this data set is about to prove you wrong.
As you can see, the SD of each 5rd string is under 10fps, and all but one were under 8fps. String #5 had an increased mean, which was due to a general speed-up in individual velocities. I believe this was the consequence of the barrel and chamber heating up. If this is true, it would be an artifact of shooting a large sample size. I was seeing mirage and shooting through it during strings 5 and 6. The 8 mins in between strings wasn't enough to adequately cool the barrel. Perhaps once you get a CRB Comp Contour hot enough, all that steel retains the heat longer. Not something you would encounter at a 10rd stage PRS match. This makes for an interesting counter-argument against the 30rd group folks. The increased SD in string 6 is due to one round: 2680. It's not even the lowest velocity in the extreme spread. And I would not call it an outlier. In my estimation, it still falls within the category of random chance. It wouldn't cause a miss, I'm not going to change the way I load ammo due to it, and it still doesn't generate a full 0.1mrad of difference at 1000yds. (Remember, 14fps to generate a dial'able diff.)
We ran a standard t-test to compare the individual strings against each other and against the overall 30rd data. The goal was to use a standard statistics metric to determine whether a "statistical significance" existed in the differences. This answers the question, through a statistics metric, of whether the 5rd groups provide a good estimate and are representative of the overall 30rd population. We assumed the standard Alpha of 5%, or 0.05. The t-tests generated a P value to be judged against the Alpha. The only string that generated a P value less than the Alpha, i.e. demonstrated a statistically significant difference from the other strings or the 30rd population, was String #5. You can see its mean MV falls outside of the 95% confidence interval. This means this result should not happen often. Hence this one 5rd string is not representative. Statistically speaking.
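For the curious, here is a rough sketch of that kind of comparison for String #5 against the 30rd data. The exact test variant we ran isn't spelled out above, so this assumes a pooled-variance two-sample t statistic computed from the rounded summary numbers, and it compares against a table critical value rather than computing a P value. It also ignores that the 5 rounds are part of the 30, so treat it as illustrative only. `pooled_t` is a hypothetical helper name.

```python
import math

def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Pooled-variance two-sample t statistic and degrees of freedom from summary stats."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_err, df

T_CRIT_DF33 = 2.035  # approximate two-sided 95% critical value for 33 degrees of freedom

# String #5 (2701 / 4.9 / n=5) vs the 30rd data (2691.5 / 7.9 / n=30)
t5, df = pooled_t(2701, 4.9, 5, 2691.5, 7.9, 30)
significant = abs(t5) > T_CRIT_DF33
print(f"t = {t5:.2f}, df = {df}, significant at alpha 0.05: {significant}")
```

Because the inputs are rounded, a string that sits right at the threshold could flip either way in a sketch like this; the individual shot velocities are what a real test should be run on.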
Oh! So a win for the 30rd group crowd, right?!! I shot a 5rd group that isn't representative of the normal outcome, right? So if I shot one 5rd group and got this result, it would not be representative of a 30rd sample size. Well, case closed! The 30rd group guys win the argument! Well, not so fast. This one string would make me think my SDs are worse in a 30rd string than they actually are. So, if the argument is that some guy posts a 5rd group with a single digit SD and 30rd group guy says, "Cool, good for you but that SD in that 5rd string won't hold up over a larger sample size." ..... then.... Akshually.... this unrepresentative String #5 ... doesn't actually prove that. In fact the difference in SD of 9 vs 6 is pretty F'ing immaterial. You can refer to Bryan Litz's WEZ to determine the hit probability difference of 3fps SD. Okay, well... you'd have 2701 plugged into your Kestrel instead of 2692 if you went off this one 5rd string. Okay, so what? Is that ballistically significant? It's a 9fps difference. You still can't even dial that at 1000. And it's certainly inconsequential at 560, or 735, or 865. So this demonstrates the thing I was saying: what is statistically significant isn't necessarily ballistically significant to us as shooters. And let's not forget there's a decent chance this is only an artificial result of increased heat from shooting an annoyingly large sample size.
The remaining 5 strings are both statistically representative and practically representative.
Conclusion:
This 30rd data set demonstrates that when you use instruments that are capable of reproducing accurate and precise data, a 5rd string can provide a good estimate of the greater population and is representative of what your rifle and ammo will do throughout a match and larger population size.
Now, if the data were different, this conclusion could be different. If your data were less accurate and precise, then yes, you probably do need a larger sample size to have a good estimate and be representative of what your gun and ammo are capable of. Both of these things can be true. It depends on how good your ammo and gun are. It's not a truism that you need a larger sample size to have a good estimate of what your setup is capable of. It is absolutely a statistical truth that a larger sample size will provide a better estimate and be more representative, but it's about the context. A lot of us don't need this better estimate and representation. We're in the range of splitting hairs when you're already using really precise and accurate data. And it's not usable to us as shooters. This is where the confusion comes in. Just because it's a statistical truth that a larger sample size will provide a better estimate doesn't mean a handful of 5rd groups can't provide all the precision and accuracy we need as shooters to understand what our ammo is capable of throughout a match.
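To put rough numbers on "better estimate, but does it matter", this sketch compares how tightly the average MV is pinned down at different sample sizes and holds that up against the 14fps it takes to move 0.1mrad at 1000yds. It assumes a true SD around the 7.9fps measured here and uses standard t-table critical values; with only 5 shots the SD itself is also an estimate, which this ignores. The function name is hypothetical.

```python
import math

SD = 7.9                   # fps, the 30rd SD from this test (assumed representative)
FPS_PER_TENTH_MRAD = 14.0  # AB figure from the Observations section

# Two-sided 95% t critical values keyed by sample size n (degrees of freedom = n - 1)
T_CRIT = {5: 2.776, 10: 2.262, 20: 2.093, 30: 2.045}

def mean_half_width(sd, n):
    """Half-width of the 95% confidence interval on the mean for n shots."""
    return T_CRIT[n] * sd / math.sqrt(n)

widths = {n: mean_half_width(SD, n) for n in T_CRIT}
for n, w in widths.items():
    print(f"n={n:2d}: mean MV known to about +/-{w:.1f} fps "
          f"({0.1 * w / FPS_PER_TENTH_MRAD:.2f} mrad at 1000yds)")
```

The interval shrinks as the round count grows, which is the statistical truth; whether the tighter number buys you anything on target depends on whether the wider one was already inside a click.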
Edit: phone autocorrect induced typos