
Re: [Xen-devel] [RFC] Results of Phase 1 of the Review Process study



On Thu, 2015-10-15 at 22:18 +0100, Lars Kurth wrote:
> 
> > On 15 Oct 2015, at 10:06, Ian Campbell <Ian.Campbell@xxxxxxxxxx> wrote:
> > 
> > On Wed, 2015-10-14 at 18:32 +0100, Lars Kurth wrote:
> >> C1) Only 60% of the reviews on the mailing list could be matched
> >> to commits. This can be improved going forward, but we felt that the
> >> dataset is big enough for statistical analysis and didn't want to spend
> >> too much time getting the matching perfect at this stage. See "Coverage
> >> analysis" for more details.
> > 
> > How strict or fuzzy is the matching?
> 
> Not very: it doesn't deal with extra/fewer spaces, punctuation, slash
> direction changes and capitalisation. Once we make some minor changes to
> accommodate that, we can also look at common things such as changes
> to the prefix ("xen/arm", ...) or in fact ignore them. That should get us
> to a *far* higher percentage. Whether we then still need to look at
> spelling mistakes remains to be seen.

Doing some simple normalisation of the Subject (which I guess is what is
being matched?) would quite possibly help, yes. I'm thinking of things
like:
 * Stripping potentially multiple prefixes matching /^[a-zA-Z]+[:/] ?/
 * Removing all whitespace and perhaps punctuation
 * Uppercasing the whole lot

Then we can see how much that helps.
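
A minimal sketch of the normalisation steps listed above, in Python. It
assumes the prefix pattern is meant to match one or more letters followed by
':' or '/'; the function name normalise_subject is hypothetical, and tags such
as "[PATCH v2]" or prefixes containing digits (e.g. "x86/") are deliberately
not handled here.

```python
import re

# Assumed prefix shape: one or more letters, then ':' or '/', then an
# optional space (e.g. "xen/" or "arm: ").
_PREFIX = re.compile(r"^[a-zA-Z]+[:/] ?")
# Anything that is not a letter or digit (whitespace, punctuation, '_').
_NON_ALNUM = re.compile(r"[\W_]+")


def normalise_subject(subject: str) -> str:
    """Return a key for comparing mail and commit subjects."""
    s = subject.strip()
    # Strip potentially multiple leading prefixes, one at a time.
    while True:
        stripped = _PREFIX.sub("", s, count=1)
        if stripped == s:
            break
        s = stripped
    # Remove all whitespace and punctuation, then uppercase the whole lot.
    return _NON_ALNUM.sub("", s).upper()


if __name__ == "__main__":
    print(normalise_subject("xen/arm: Fix  the foo-bar handling."))
    print(normalise_subject("Fix the foo-bar handling"))
```

Both calls should produce the same key, so a mail and a commit whose subjects
differ only in prefix, spacing, punctuation or case would be treated as a
match.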

Jesus raised the potential worry of false matching in his reply. My gut
(which, while expansive, has been known to be wrong) thinks that false
matches will mostly be either mechanical changes (e.g. "QEMU_TAG update"),
which don't tend to need much review or take long to get in, or a small
minority of poor (i.e. short or not very specific) commit messages which
don't get fixed on commit. In either case, just associating such mails with
the first (chronologically) matching commit in the tree would probably not
be too harmful to the overall data.
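
Roughly what I mean by "first matching commit", as a sketch only. The names
(first_commit_index, simple_key) and the (date, id, subject) tuple layout are
made up for illustration and say nothing about how the real tooling stores
its data; simple_key is a stand-in for whatever subject normalisation ends up
being used.

```python
def simple_key(subject: str) -> str:
    """Placeholder normalisation: drop whitespace and uppercase."""
    return "".join(subject.split()).upper()


def first_commit_index(commits, key=simple_key):
    """Map each normalised subject to its chronologically first commit.

    commits: iterable of (iso_date, commit_id, subject) tuples. ISO 8601
    date strings sort chronologically when compared as plain strings.
    """
    index = {}
    for _date, commit_id, subject in sorted(commits):
        # setdefault keeps the earliest commit and ignores later duplicates.
        index.setdefault(key(subject), commit_id)
    return index


if __name__ == "__main__":
    commits = [
        ("2015-03-01", "c2", "QEMU_TAG update"),
        ("2015-01-10", "c1", "QEMU_TAG update"),
        ("2015-02-01", "c3", "xen: fix foo"),
    ]
    index = first_commit_index(commits)
    # A mail with an ambiguous subject is associated with the earliest commit.
    print(index.get(simple_key("qemu_tag  update")))
```

A mail whose subject matches nothing simply gets None from index.get(), so
it can still be reported as unmatched.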

> >> == Backlog Analysis ==
> >> This section shows us the total of patch series reviews that could be
> >> modelled (60%) over the project's life-time 
> > 
> > How does this interact with the 60% in caveat C1? Is it the same 60% or
> > is this 60% of that 60% (i.e. 36% overall)?
> 
> I don't know: the coverage data was really only added yesterday. And I
> have not had time to look into it in more detail. Thus, this is one for
> @Bitergia, if the answer is unsatisfactory.
> 
> However, given the data we looked at, 60% (which excludes more detailed
> data for the backlog) is a big enough sample for the metrics we looked
> at. In particular, most of the data covered only completed reviews. There
> may be some skew, but it should not change the bulk of what is in section
> 6 in any significant way.

Right, I didn't spot any issue with the majority of the analysis, just the
backlog analysis, which also happened to be the thing which looked most
worrying.

> I attached a sample for 2015: there are two files 
> a) commits from git not matched to mails, 
> b) e-mails with patches attached that are not matched to git (which is
> larger than a).

I'll have a skim in a moment.


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

