It is scientific and what we do is scientifically analysed too, despite our tendency to try and improve the situation!
It is in the nature of scientifically run trials that they are randomised and that the agents and participants are "blinded" - not knowing what to expect from any particular test or process.
The "mushroom theory" (keep us in the dark and feed us with manure) is a common comic example of this.
Some participants are controls, receiving a neutral or placebo input during the process; others receive a more potent input during the trial, and nobody is told which group they are in.
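As a rough illustration of randomised, blinded allocation, here is a minimal Python sketch. The function name `assign_groups`, the participant IDs and the seed are all made up for the example; it is not a description of how any real trial or Insider ring is actually allocated.

```python
import random

def assign_groups(participant_ids, seed=None):
    """Randomly split participants into 'control' and 'treatment' halves."""
    rng = random.Random(seed)      # a seeded generator makes the split repeatable
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    # The mapping stays with the organiser; participants never see it (blinding).
    return {pid: ("control" if i < half else "treatment")
            for i, pid in enumerate(shuffled)}

# Hypothetical participant IDs 0..9
assignment = assign_groups(range(10), seed=1)
```

The point is only that chance, not the participant or the researcher, decides who gets what.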
The trouble is that the participants are human and curious and may develop beliefs that affect their behaviour, and worse, communicate these beliefs to other participants, changing their behaviour too, beyond the original conditions laid out by the researchers and developers.
Thus a build or update that is found by some early adopters to be difficult to install, or to be repeatedly bugchecking, may be feared by other users who may not attempt to install it on the basis of the reports from those with difficulties, or will perform workarounds based on popular received knowledge.
The effect of this may be to reduce the number of attempted installations, since some users are convinced "to sit this one out" on the basis of the experiences of others, or avoid the upgrade and perform an unsupported clean install instead. This may skew the statistics for the analysts working out the results.
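To show how that skew can arise, here is a small worked example in Python. All the figures are invented for illustration: 800 "robust" machines that fail 2% of the time, 200 "fragile" machines that fail 30% of the time, and an assumption that bad reports scare off three quarters of the fragile group.

```python
def observed_failure_rate(groups):
    """groups: list of (machines, failure_rate, fraction_attempting) tuples."""
    attempts = sum(n * attempting for n, _, attempting in groups)
    failures = sum(n * attempting * rate for n, rate, attempting in groups)
    return failures / attempts

# Hypothetical population: everyone attempts the install.
everyone = [(800, 0.02, 1.0), (200, 0.30, 1.0)]

# Same population, but only a quarter of the fragile group still attempts it.
scared_off = [(800, 0.02, 1.0), (200, 0.30, 0.25)]

true_rate = observed_failure_rate(everyone)      # about 7.6%
seen_rate = observed_failure_rate(scared_off)    # about 3.6%
```

Under these assumptions the build looks roughly twice as healthy as it really is, purely because the people most likely to fail did not try.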
There are always going to be differences between faster and slower hardware combinations. Whereas internal rings may have optimum hardware and strictly regulated third party additions, and run much of the early testing on standardised VMs, the Insiders offer a multi-dimensioned testbed for mainstream real world testing.
Feedback and Insider reporting let participants communicate their subjective experiences, but this is very limited and likely to be irreproducible without the necessary combination of hardware and software. The biggest wealth of objective reports comes from telemetry, mostly from failure mode dumps.
Fortunately, feedback and telemetry are tied together by the Insider Account, unless the user uses separate accounts for testing and feedback.
When these reports come to MS electronically via the well established WER route following bugchecks, and via other routes like rollbacks, they feed directly into the analytical system that has done so much to improve the quality and stability of Windows over the past decade and a half.
That's a pretty well established methodology - data mining - and I know that some of my medico-scientist colleagues thought it was not scientific, since it turns the traditional scientific method on its head. It doesn't, really, it's just a tool to give insights into complex datasets - like a telescope resolves very distant objects, or a microscope very small ones.
The traditional scientific method starts off with a proposal, say "that B always follows A", and then goes out to falsify or test that hypothesis through experiment.
If no experiment can falsify the original premise, then the theory is generally considered to be sound, but a theory can never be proved - Newtonian theory was superseded (shown to be inaccurate in some circumstances) by Einsteinian relativity, and is now considered a special case that is generally applicable to non-relativistic situations.
Data mining has no original premise - it is a shotgun approach that takes all available outcomes and all measurable variables over a large number of test situations, and derives relationships and probabilities between variables, combinations of factors and the outcomes.
Thus any set of starting values can be given a predictive value even before testing.
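A toy version of that idea, sketched in Python: count outcomes for every combination of factors, then read off a failure probability for any combination already seen. The records, factor names (`driver_a`, `ssd` and so on) and the function `failure_rates` are hypothetical; real telemetry is vastly larger and I have no knowledge of how MS actually models it.

```python
from collections import defaultdict

# Hypothetical install records: (gpu_driver, disk_type, outcome)
records = [
    ("driver_a", "ssd", "ok"),
    ("driver_a", "ssd", "ok"),
    ("driver_a", "hdd", "ok"),
    ("driver_a", "hdd", "bugcheck"),
    ("driver_b", "ssd", "bugcheck"),
    ("driver_b", "ssd", "bugcheck"),
    ("driver_b", "ssd", "ok"),
    ("driver_b", "hdd", "bugcheck"),
]

def failure_rates(records):
    """Estimate P(bugcheck | factor combination) by simple counting."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for *factors, outcome in records:
        key = tuple(factors)
        totals[key] += 1
        if outcome == "bugcheck":
            failures[key] += 1
    return {key: failures[key] / totals[key] for key in totals}

rates = failure_rates(records)

# A "prediction" for a machine that has not installed yet.
risk = rates[("driver_b", "ssd")]   # 2 failures out of 3 installs
```

No hypothesis went in at the start; the relationship between `driver_b` and bugchecks simply falls out of the counts, and it says nothing yet about why.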
It is then that the scientific process comes into play.
A theory may then be formed as to why an unwanted event happens, and the system can be examined to see whether there is a causative explanation and a fix for it.
Since only MS has the code, the analysts with their data from us, and the developers in regular conference with them, they will get around to their own considered priorities, which may not be the same as ours.
Some bugs may never get fixed.