Re: [Xen-devel] Enabling vm_event for a guest with more VCPUs than available ring buffer slots freezes the virtual machine
On 02/08/2017 01:23 AM, Tamas K Lengyel wrote:
> On Tue, Feb 7, 2017 at 1:41 PM, Razvan Cojocaru
> <rcojocaru@xxxxxxxxxxxxxxx> wrote:
>> On 02/07/2017 10:20 PM, Tamas K Lengyel wrote:
>>> On Tue, Feb 7, 2017 at 11:57 AM, Razvan Cojocaru
>>> <rcojocaru@xxxxxxxxxxxxxxx> wrote:
>>>> On 02/07/2017 08:39 PM, Andrew Cooper wrote:
>>>>> On 07/02/17 18:31, Razvan Cojocaru wrote:
>>>>>> On 02/07/2017 08:15 PM, Tamas K Lengyel wrote:
>>>>>>> On Tue, Feb 7, 2017 at 9:53 AM, Razvan Cojocaru
>>>>>>> <rcojocaru@xxxxxxxxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> Setting, e.g., 16 VCPUs for a HVM guest ends up blocking the guest
>>>>>>>> completely when subscribing to vm_events, apparently because of this
>>>>>>>> code in xen/common/vm_event.c:
>>>>>>>>
>>>>>>>> 315     /* Give this vCPU a black eye if necessary, on the way out.
>>>>>>>> 316      * See the comments above wake_blocked() for more information
>>>>>>>> 317      * on how this mechanism works to avoid waiting. */
>>>>>>>> 318     avail_req = vm_event_ring_available(ved);
>>>>>>>> 319     if( current->domain == d && avail_req < d->max_vcpus )
>>>>>>>> 320         vm_event_mark_and_pause(current, ved);
>>>>>>>>
>>>>>>>> It would appear that even if the guest only has 2 online VCPUs, the
>>>>>>>> "avail_req < d->max_vcpus" condition will pause current, and we
>>>>>>>> eventually end up with all the VCPUs paused.
>>>>>>>>
>>>>>>>> An ugly hack ("avail_req < 2") has allowed booting a guest with many
>>>>>>>> VCPUs (max_vcpus; the guest only brings 2 VCPUs online), however
>>>>>>>> that's just to prove that that was the culprit - a real solution
>>>>>>>> needs a more in-depth understanding of the issue and of a potential
>>>>>>>> fix. That's basically very old code (pre-2012 at least) that got
>>>>>>>> moved around into the current shape of Xen today - please CC anyone
>>>>>>>> relevant to the discussion that you're aware of.
>>>>>>>>
>>>>>>>> Thoughts?
>>>>>>>
>>>>>>> I think this is a side-effect of the growth of the vm_event structure
>>>>>>> and the fact that we have a single page ring. The check effectively
>>>>>>> sets a threshold of having enough space for each vCPU to place at
>>>>>>> least one more event on the ring, and if that's not the case it gets
>>>>>>> paused. OTOH I think this would only have an effect on asynchronous
>>>>>>> events; for all other events the vCPU is already paused. Is that the
>>>>>>> case you have?
>>>>>>
>>>>>> No, on the contrary, all my events are synchronous (the VCPU is paused
>>>>>> waiting for the vm_event reply).
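
To put some rough numbers on why the "avail_req < d->max_vcpus" check above
fires straight away for a 16-VCPU guest: the ring is a single 4K page, and
since the request structure grew to carry register state there are far fewer
than 16 slots in it. The sizes below are purely illustrative (not the real
sizeof() values); the point is just the shape of the arithmetic, including
the power-of-two rounding that the standard ring macros do, if I'm reading
them correctly:

/* Illustrative arithmetic only - the header and request sizes are made up;
 * the real values come from the ring header and vm_event_request_t. */
#include <stdio.h>

int main(void)
{
    unsigned int page_size    = 4096; /* one ring page                   */
    unsigned int header_size  = 64;   /* assumed ring bookkeeping header */
    unsigned int request_size = 256;  /* assumed size of one request     */
    unsigned int max_vcpus    = 16;

    unsigned int slots = (page_size - header_size) / request_size;

    /* Round down to a power of two, as the ring macros do. */
    unsigned int ring_size = 1;
    while ( ring_size * 2 <= slots )
        ring_size *= 2;

    printf("~%u usable slots vs. max_vcpus = %u\n", ring_size, max_vcpus);
    return 0;
}

With anything in that ballpark, avail_req starts out below d->max_vcpus, so
the very first event marks and pauses the vCPU that sent it.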
>>>>>> I've debugged this a bit, and the problem seems to be that
>>>>>> vm_event_wake_blocked() breaks here:
>>>>>>
>>>>>> 150     /* We remember which vcpu last woke up to avoid scanning always linearly
>>>>>> 151      * from zero and starving higher-numbered vcpus under high load */
>>>>>> 152     if ( d->vcpu )
>>>>>> 153     {
>>>>>> 154         int i, j, k;
>>>>>> 155
>>>>>> 156         for (i = ved->last_vcpu_wake_up + 1, j = 0; j < d->max_vcpus; i++, j++)
>>>>>> 157         {
>>>>>> 158             k = i % d->max_vcpus;
>>>>>> 159             v = d->vcpu[k];
>>>>>> 160             if ( !v )
>>>>>> 161                 continue;
>>>>>> 162
>>>>>> 163             if ( !(ved->blocked) || online >= avail_req )
>>>>>> 164                 break;
>>>>>> 165
>>>>>> 166             if ( test_and_clear_bit(ved->pause_flag, &v->pause_flags) )
>>>>>> 167             {
>>>>>> 168                 vcpu_unpause(v);
>>>>>> 169                 online++;
>>>>>> 170                 ved->blocked--;
>>>>>> 171                 ved->last_vcpu_wake_up = k;
>>>>>> 172             }
>>>>>> 173         }
>>>>>> 174     }
>>>>>>
>>>>>> at "if ( !(ved->blocked) || online >= avail_req )". At this point,
>>>>>> nothing ever gets unblocked. It's hard to believe that this is desired
>>>>>> behaviour, as I don't know what could possibly happen for that
>>>>>> condition to become false once all the online VCPUs are stuck
>>>>>> (especially when the guest has just started booting).
>>>
>>> Ah, I see what happens. During boot vCPU 0 generates an event and gets
>>> marked blocked because the number of vCPUs is so high. The other vCPUs
>>> are still unblocked since they are idle, but this test here will still
>>> be true (online >= avail_req) and thus we can never unblock vCPU 0. And
>>> then the boot process hangs because vCPU 0 never resumes. I would argue
>>> that this test should be changed to check that there is at least 1 spot
>>> on the ring, and only break if that is not the case anymore (i.e.
>>> instead of incrementing online we should be decrementing avail_req).
>>
>> That is exactly what happens. And it can't really be fixed just by
>> increasing the ring buffer (although that definitely helps a lot and
>> would be a smart move): no matter how large it is, we can always ask the
>> domain to use more VCPUs than there are slots in the buffer.
>
> No, not increasing the ring buffer, but fixing the check to unblock when
> there is at least 1 spot on the ring. So look at this comment...
>
> /*
>  * vm_event_wake_blocked() will wakeup vcpus waiting for room in the
>  * ring. These vCPUs were paused on their way out after placing an event,
>  * but need to be resumed where the ring is capable of processing at least
>  * one event from them.
>  */
>
> ... it seems to me that the check "online >= avail_req" was just wrong
> from the start.

I had read that comment to mean that there need to be more than one
available slot in the ring: "wake vcpus [plural] [...] at least one event
from them".
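
If I understand the suggestion correctly, the loop above would then become
something along these lines - a rough, untested sketch rather than an actual
patch:

    /* Sketch: keep waking blocked vCPUs for as long as the ring still has
     * at least one free slot, consuming avail_req as we go instead of
     * comparing it against the number of vCPUs brought back online. */
    if ( d->vcpu )
    {
        int i, j, k;

        for ( i = ved->last_vcpu_wake_up + 1, j = 0; j < d->max_vcpus; i++, j++ )
        {
            k = i % d->max_vcpus;
            v = d->vcpu[k];
            if ( !v )
                continue;

            if ( !ved->blocked || avail_req == 0 )
                break;

            if ( test_and_clear_bit(ved->pause_flag, &v->pause_flags) )
            {
                vcpu_unpause(v);
                avail_req--;
                ved->blocked--;
                ved->last_vcpu_wake_up = k;
            }
        }
    }

That way a single free slot is enough to unblock a waiting vCPU, instead of
requiring room for every vCPU at once.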
>>>>> I wouldn't bet that this logic has ever been tested. If you recall,
>>>>> the addition of register state into the vm_event ring made each entry
>>>>> far larger, which in turn makes it more likely to hit this condition.
>>>>>
>>>>> However, simply fixing the logic to re-online the cpus isn't a good
>>>>> solution either, as having $N vcpus paused at any one time because of
>>>>> ring contention is not conducive to good system performance.
>>>>>
>>>>> Realistically, the ring size needs to be max_cpus * sizeof(largest
>>>>> vm_event) at an absolute minimum, and I guess this is now beyond 1
>>>>> page?
>>>>
>>>> Yes, of course the reason this triggers earlier now is the growth of
>>>> the request's size. Yes, using e.g. 20 VCPUs in the guest's setup will
>>>> exceed a page's number of slots.
>>>>
>>>> And yes, ideally we should have multi-page ring buffers - however that
>>>> is a long-term project that requires design changes in other parts of
>>>> Xen as well (Andrew, CCd here, was recently talking about one).
>>>>
>>>> However, even with a one-page ring buffer, surely it's not good to end
>>>> up in this situation, especially for guests such as mine, which never
>>>> actually bring more than 2 VCPUs online. But even if they were to use
>>>> more, blocking the guest on vm_event init is completely pointless - we
>>>> might as well return some kind of error if max_vcpus > available slots.
>>>>
>>>> I don't follow the system performance argument. Surely completely
>>>> blocking the guest is worse.
>>>
>>> I also don't see the point in marking a vCPU blocked if it is already
>>> paused. I think this behavior of blocking vCPUs only makes sense for
>>> asynchronous events. Razvan, could you test what happens if
>>> vm_event_mark_and_pause is only called if the vCPU is unpaused?
>>
>> It works for me with this change (using Xen 4.7 sources here):
>>
>> @@ -318,7 +329,11 @@ void vm_event_put_request(struct domain *d,
>>       * on how this mechanism works to avoid waiting. */
>>      avail_req = vm_event_ring_available(ved);
>>      if( current->domain == d && avail_req < d->max_vcpus )
>> -        vm_event_mark_and_pause(current, ved);
>> +    {
>> +        if ( !atomic_read( &current->vm_event_pause_count ) )
>> +            vm_event_mark_and_pause(current, ved);
>> +    }
>
> Good, that's what I expected. There really is no point in blocking a
> vCPU that is already paused. But I would just add that check to the if
> above as another && clause.

Quite right, this was late-night debugging and I also had a printk() in
that scope, which I quickly removed on pasting the change. The patch will
simply have the additional && condition.

Thanks,
Razvan
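
P.S. For clarity, the folded-in form of the check would be something like
this (same logic as the diff above, just written as an extra && clause;
untested in exactly this shape):

    /* Don't mark-and-pause a vCPU that is already paused waiting for a
     * vm_event reply. */
    avail_req = vm_event_ring_available(ved);
    if ( current->domain == d && avail_req < d->max_vcpus &&
         !atomic_read(&current->vm_event_pause_count) )
        vm_event_mark_and_pause(current, ved);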