Re: [Xen-devel] [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
Taking the advisory board list off the CC list: will summarize when we have more of a plan forward.

On 03/07/2018, 11:47, "Juergen Gross" <jgross@xxxxxxxx> wrote:

    On 03/07/18 12:23, Lars Kurth wrote:
    > Combined reply to Jan and Roger
    > Lars
    >
    > On 03/07/2018, 11:07, "Roger Pau Monne" <roger.pau@xxxxxxxxxx> wrote:
    >
    >     On Mon, Jul 02, 2018 at 06:03:39PM +0000, Lars Kurth wrote:
    >     > We then had a discussion around why the positive benefits didn't
    >     > materialize:
    >     > * Andrew and a few others believe that the model isn't broken, but
    >     >   that the issue is with how we develop. In other words, moving to
    >     >   a 9 month model will *not* fix the underlying issues, but merely
    >     >   provide an incentive not to fix them.
    >     > * Issues highlighted were:
    >     >   * The 2-3 month stabilizing period is too long
    >
    >     I think one of the goals with the 6 month release cycle was to shrink
    >     the stabilizing period, but it didn't turn out that way, and the
    >     stabilizing period is quite similar with a 6 or a 9 month release
    >     cycle.
    >
    > Right: we need to establish what the reasons are:
    > * One has to do with a race condition between security issues and the
    >   desire to cut a release which has those issues fixed in it. If I
    >   remember correctly, that has in effect added almost a month to the
    >   last few releases (more to this one).

    The only way to avoid that would be to not allow any security fixes to
    be included in the release in the last few weeks before the planned
    release date. I don't think this is a good idea. I'd rather miss the
    planned release date.

This partially comes back to re-opening master: when we are at the stage where we are only waiting for security issues, we should already have opened master. Although in this case, we also had

    BTW: the problem wasn't waiting for the security patches, but some
    fixes for those were needed. And this is something you can never rule
    out. And waiting for the fixes meant new security fixes being ready...

That is of course true. And some of the side-channel attack mitigations are complex and large, and introduce more risk than more traditional fixes.

    > * One seems to have to do with issues with OSSTEST ... which in turn
    >   led to more security fixes being available.

Agreed: because we didn't release when we planned, another set of security fixes pushed out the release.

    > * <Please add other reasons>

    We didn't look at the sporadically failing tests thoroughly enough. The
    hypercall buffer failure has been there for ages; a newer kernel just
    made it more probable. This would have saved us some weeks.

That is certainly something we could look at. It seems to me that there is a dynamic of "there is too much noise / too many random issues / HW issues", so we ignore OSSTEST too often.

I am wondering whether there is a way of mapping some tests to maintainers. Maintainers should certainly care about test failures in their respective areas, but to make this practical, we need a way to map failures to the right people and to CC them on reports. We could also potentially use get_maintainers.pl on the patches which are being tested (aka the staging => master transition), but we would need to know that a test was "clean" before (a rough sketch of what this could look like is below).

Maybe we need to build in an effort to deal with the sporadically failing tests: e.g. a commit moratorium until we get to a better base state.

I also think that, from a mere psychological viewpoint, having some test capability at patch posting time and a patchbot rejecting patches would change the contribution dynamic significantly. In other words, it would make dealing with quality issues part of the contribution process, which often seems to be deferred until commit time and/or release hardening time. Just a thought.
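To make the get_maintainers.pl idea a bit more concrete, here is a rough Python sketch (not something OSSTEST does today) of how a failed staging => master push could be turned into a CC list: walk the commits the push would have covered, run each of them through the tree's get_maintainer.pl, and collect the addresses. The tree path, the script location and the default revision range are my assumptions.

#!/usr/bin/env python3
# Rough sketch only: map the commits of a staging => master push to the
# maintainers reported by get_maintainer.pl, so a test failure report could
# be CC'd to the right people. Paths and the revision range are assumptions.

import subprocess
import sys
import tempfile

XEN_TREE = "/path/to/xen.git"            # assumption: a local Xen clone
GET_MAINT = "scripts/get_maintainer.pl"  # script shipped in the Xen tree

def commits_in_range(rev_range):
    """Commit hashes the push would move master over, oldest first."""
    out = subprocess.run(
        ["git", "-C", XEN_TREE, "rev-list", "--reverse", rev_range],
        check=True, capture_output=True, text=True)
    return out.stdout.split()

def maintainers_for(commit):
    """Format one commit as a patch and feed it to get_maintainer.pl."""
    patch = subprocess.run(
        ["git", "-C", XEN_TREE, "format-patch", "--stdout", "-1", commit],
        check=True, capture_output=True, text=True).stdout
    with tempfile.NamedTemporaryFile("w", suffix=".patch") as f:
        f.write(patch)
        f.flush()
        out = subprocess.run(
            ["perl", GET_MAINT, f.name], cwd=XEN_TREE,
            check=True, capture_output=True, text=True)
    return {line.strip() for line in out.stdout.splitlines() if line.strip()}

def main():
    # Default: everything queued on staging that has not reached master yet.
    rev_range = sys.argv[1] if len(sys.argv) > 1 else "master..staging"
    cc = set()
    for commit in commits_in_range(rev_range):
        cc |= maintainers_for(commit)
    print("Suggested CC list for the failure report:")
    for addr in sorted(cc):
        print("  " + addr)

if __name__ == "__main__":
    main()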
Also, coming back to Jan's bandwidth issue: if we had a set of more generic tests that could be offloaded to a cloud instance (e.g. by testing on QEMU), then we could reserve OSSTEST for tests which require real hardware, thus potentially reducing bottlenecks (see the rough sketch in the PS below).

I am also wondering whether the bottleneck we are seeing is caused by the lack of good Arm test hardware (aka whether that is the critical path for the entire system): if so, maybe the two things can somehow be de-coupled.

These ideas are fairly half-baked right now, so I am opening them up for discussion. I wanted to get a good amount of input before we discuss this at the community call.

Regards
Lars
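PS: to make the QEMU part above a little more concrete, here is a minimal Python sketch of the kind of smoke test that could run in a cloud instance without any special hardware: boot the freshly built hypervisor plus a dom0 kernel under qemu-system-x86_64 (using QEMU's multiboot support) and watch the serial console for a success marker. The paths, the Xen command line and the marker string are assumptions for illustration; none of this is lifted from OSSTEST.

#!/usr/bin/env python3
# Minimal QEMU smoke test sketch: boot Xen + a dom0 kernel and wait for a
# known string on the serial console. All paths, command-line options for
# Xen/dom0 and the success marker are assumptions, not OSSTEST behaviour.

import subprocess
import sys
import time

XEN_IMAGE   = "binaries/xen"       # hypervisor built by the job under test
DOM0_KERNEL = "binaries/bzImage"   # dom0 kernel with a built-in initramfs
SUCCESS     = "smoke test passed"  # string the dom0 init is assumed to print
TIMEOUT_S   = 300

QEMU_CMD = [
    "qemu-system-x86_64",
    "-m", "2G", "-smp", "2",
    "-display", "none", "-monitor", "none", "-serial", "stdio",
    "-no-reboot",
    # Boot Xen via QEMU's multiboot support; the dom0 kernel is loaded as a
    # multiboot module via -initrd, while -append holds Xen's command line.
    "-kernel", XEN_IMAGE,
    "-initrd", DOM0_KERNEL,
    "-append", "loglvl=all console=com1 noreboot",
]

def run_smoke_test():
    qemu = subprocess.Popen(QEMU_CMD, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    deadline = time.time() + TIMEOUT_S
    try:
        for line in qemu.stdout:
            sys.stdout.write(line)      # keep the full boot log for debugging
            if SUCCESS in line:
                return True
            if time.time() > deadline:  # note: only checked between lines
                break
        return False
    finally:
        qemu.kill()
        qemu.wait()

if __name__ == "__main__":
    sys.exit(0 if run_smoke_test() else 1)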