Re: [Xen-devel] Enabling VT-d PI by default
> From: Gao, Chao
> Sent: Monday, April 17, 2017 4:14 AM
>
> On Tue, Apr 11, 2017 at 02:21:07AM -0600, Jan Beulich wrote:
> >>>> On 11.04.17 at 02:59, <chao.gao@xxxxxxxxx> wrote:
> >> As you know, with VT-d PI enabled, hardware can directly deliver
> >> external interrupts to the guest without any VMM intervention. This
> >> reduces the overall interrupt latency to the guest and the overhead
> >> otherwise incurred by the VMM for virtualizing interrupts. In my
> >> mind, it's an important feature for interrupt virtualization.
> >>
> >> But the VT-d PI feature is disabled by default in Xen because of
> >> some corner cases and bugs. Based on Feng's work, we have fixed
> >> those corner cases related to VT-d PI. Do you think it is time to
> >> enable VT-d PI by default? If not, could you list your concerns so
> >> that we can resolve them?
> >
> >I don't recall you addressing the main issue (blocked vCPU-s list
> >length; see the comment next to the iommu_intpost definition).
>
> Indeed. I have gone through the discussion that happened in April
> 2016 [1, 2].
> [1] https://lists.gt.net/xen/devel/422661?search_string=VT-d%20posted-interrupt%20core%20logic%20handling;#422661
> [2] https://lists.gt.net/xen/devel/422567?search_string=%20The%20length%20of%20the%20list%20depends;#422567
>
> First of all, I admit this is an issue in extreme cases and we should
> come up with a solution.
>
> The problem we are facing is:
> There is a per-CPU list used to maintain all the vCPUs blocked on a
> pCPU. When a wakeup interrupt comes in, the interrupt handler
> traverses the list to wake the vCPUs whose pi_desc indicates an
> interrupt has been posted. There is no policy restricting the size of
> the list, so in some extreme cases the list can grow too long and
> cause problems (the most obvious being interrupt latency).
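>
> To make the cost concrete, this is roughly what the handler has to do
> on every wakeup interrupt (a simplified sketch of the traversal, with
> error handling and accounting omitted):
>
>     /* Simplified sketch of the per-pCPU wakeup interrupt handler. */
>     static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
>     {
>         unsigned int cpu = smp_processor_id();
>         struct arch_vmx_struct *vmx, *tmp;
>         spinlock_t *lock = &per_cpu(vmx_pi_blocking, cpu).lock;
>         struct list_head *blocked = &per_cpu(vmx_pi_blocking, cpu).list;
>
>         ack_APIC_irq();
>         spin_lock(lock);
>
>         /* O(n) in the number of vCPUs currently blocked on this pCPU. */
>         list_for_each_entry_safe ( vmx, tmp, blocked, pi_blocking.list )
>         {
>             if ( pi_test_on(&vmx->pi_desc) )
>             {
>                 list_del(&vmx->pi_blocking.list);
>                 vmx->pi_blocking.lock = NULL;
>                 vcpu_unblock(container_of(vmx, struct vcpu,
>                                           arch.hvm_vmx));
>             }
>         }
>
>         spin_unlock(lock);
>     }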
>
> The theoretical maximum number of entries in the list is 4M, as one
> host can have up to 32K domains and every domain can have up to 128
> vCPUs (32768 * 128 = 4194304). If all the vCPUs are blocked in one
> list, the list reaches its theoretical maximum.
>
> The root cause of this issue, I think, is that the wakeup interrupt
> vector is shared by all the vCPUs on one pCPU. Lacking enough
> information (such as which device sent the interrupt or which IRTE
> translated it), there is no effective method to identify the
> interrupt's destination vCPU other than traversing this list. Right?
> So we can only mitigate this issue by decreasing or limiting the
> maximum number of entries in one list.
>
> Several methods we can take to mitigate this issue:
> 1. According to your discussions, evenly distributing all the blocked
> vCPUs among all pCPUs can mitigate this issue. With this approach,
> having all vCPUs blocked in one list can be avoided, and the maximum
> number of entries in one list decreases by a factor of N (N being the
> number of pCPUs).
>
> 2. Don't put blocked vCPUs that won't be woken by a wakeup interrupt
> into the per-CPU list at all. Currently, we put the blocked vCPUs
> belonging to domains that have assigned devices into the list. But if
> a blocked vCPU of such a domain is not a destination of any
> posted-format IRTE, it needn't be added to the per-CPU list; it will
> be woken by IPIs or other virtual interrupts. From this aspect, we
> can decrease the number of entries in the per-CPU list.
>
> 3. Like what we do in struct irq_guest_action_t, can we limit the
> maximum number of entries we support in the list? With this approach,
> during domain creation, we calculate the available entries and
> compare with the domain's vCPU count to decide whether the domain can
> use VT-d PI.

VT-d PI is global instead of per-domain. I guess you actually mean
failing the device assignment operation if counting in the new
domain's #vCPUs would exceed the limit.
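
Something along these lines would do (an illustrative sketch only --
none of these names exist in Xen today):

    /* Global budget of blocking-list entries across the host. */
    #define PI_BLOCKING_ENTRY_LIMIT (32 * 1024)

    static unsigned int pi_blocking_entries;
    static DEFINE_SPINLOCK(pi_budget_lock);

    /*
     * Called from device assignment: fail the assignment if admitting
     * this domain's vCPUs would exceed the global budget.
     */
    static int pi_admit_domain(const struct domain *d)
    {
        int rc = 0;

        spin_lock(&pi_budget_lock);
        if ( pi_blocking_entries + d->max_vcpus > PI_BLOCKING_ENTRY_LIMIT )
            rc = -ENOSPC;
        else
            pi_blocking_entries += d->max_vcpus;
        spin_unlock(&pi_budget_lock);

        return rc;
    }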
> This method will pose a strict restriction on the maximum number of
> entries in one list. But it may affect vCPU hotplug.
>
> Which of these methods do you think are feasible and acceptable? I
> will attempt to mitigate this issue per your advice.

My understanding is that we need them all. #1 is the baseline, with
#2/#3 as further optimizations. :-)
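
For #1, even something as simple as picking the pCPU with the shortest
blocking list when a vCPU blocks would help (again only a sketch with
made-up names, to show the idea):

    /* Number of vCPUs currently on each pCPU's blocking list. */
    static DEFINE_PER_CPU(unsigned int, pi_blocking_cnt);

    /*
     * Pick the pCPU whose blocking list is currently shortest; the
     * blocking vCPU's pi_desc NDST field is then pointed at that pCPU
     * so its wakeup interrupts arrive there.
     */
    static unsigned int pi_pick_blocking_cpu(void)
    {
        unsigned int cpu, best = smp_processor_id();

        for_each_online_cpu ( cpu )
            if ( per_cpu(pi_blocking_cnt, cpu) <
                 per_cpu(pi_blocking_cnt, best) )
                best = cpu;

        return best;
    }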

Thanks
Kevin