
Re: [Xen-devel] <summary-1> (v2) Design proposal for RMRR fix



> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> Sent: Friday, January 09, 2015 6:46 PM
> 
> >>> On 09.01.15 at 11:26, <kevin.tian@xxxxxxxxx> wrote:
> >>  From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> >> >>> On 09.01.15 at 07:57, <kevin.tian@xxxxxxxxx> wrote:
> >> > 1.1) per-device 'warn' vs. global 'warn'
> >> >
> >> > Both Tim and Jan prefer 'warn' as a per-device option for the admin
> >> > instead of a global option.
> >> >
> >> > At a glance a per-device 'warn' option provides more fine-grained
> >> > control than a global option. However, thinking about it carefully,
> >> > allowing one device w/ a potential problem isn't more correct or
> >> > secure than allowing multiple devices w/ potential problems. Even in
> >> > practice a device like USB can work despite a <1MB conflict, and as
> >> > Jan pointed out there are always corner cases we might not know
> >> > about. So as long as we open the door for one device, it implies a
> >> > problematic environment to users, and the user's judgement on whether
> >> > he can live with this problem is not affected by how many devices the
> >> > door is opened for (he anyway needs to study the warning messages and
> >> > do verification if choosing to live with it).
> >> >
> >> > Given that, imo if we agree to provide a 'warn' option, just
> >> > providing a global overriding option (definitely per-vm) is
> >> > acceptable and simpler.
> >>
> >> If the admin determined that ignoring the RMRR requirements for one
> >> device is safe, that doesn't (and shouldn't) mean this is the case for
> >> all other devices too.
> >
> > I don't think the admin can determine whether it's 100% safe. What the
> > admin can decide is whether he can live with the potential problem,
> > based on his purpose or on some experiments; only the device vendor
> > knows when and how the RMRR is used. So as long as 'warn' is opened for
> > one device, it already means a problematic environment, and adding more
> > devices is just the same situation.
> 
> What if the admin consulted the device and BIOS vendors, and got
> assured there's not going to be any accesses to the reserved regions
> post-boot?

Such consultation could still be inaccurate, or human error may happen.

> 
> >> > 1.2) when to 'fail'
> >> >
> >> > There is one open question: whether we should fail immediately in
> >> > the domain builder if a conflict is detected.
> >> >
> >> > Jan's comment is yes, we should 'fail' the VM creation as it's an error.
> >> >
> >> > My previous point is more about mimicking native behavior, where a
> >> > device failure (in our case actually a potential device failure,
> >> > since the VM is not powered on yet) doesn't impact the user until its
> >> > function is actually touched. In our case, even if the domain builder
> >> > fails to re-arrange guest RAM to skip the reserved regions, we have a
> >> > centralized policy (either 'fail' or 'warn' per the above conclusion)
> >> > in the Xen hypervisor when the device is actually assigned. So a
> >> > 'warn' should be fine, but I don't insist on this strongly.
> >>
> >> See my earlier reply: Failure to add a device to me is more like a
> >> device preventing a bare metal system from coming up altogether.
> >
> > Not all devices are required for bare metal to boot; a device causes a
> > problem only when it's actually used in the boot process. Say the disk
> > (inserted in a PCI slot) is broken at power-up (not sure whether you
> > call such a thing 'failure to add a device'); it only becomes an error
> > when the BIOS tries to read the disk.
> 
> Not necessarily. Any malfunctioning device touched by the BIOS,
> irrespective of whether the device is needed for booting, can cause
> the boot process to hang. Again, the analogy to bare metal is
> device presence, not whether the device is functioning properly.
> 
> > Note that the device assignment path is where it is actually decided
> > whether a device will be presented to the guest, not at domain build
> > time.
> 
> That would only make a marginal difference in time of when domain
> creation fails.

It's not a marginal difference in timing; it's about who owns the policy.

To me, detecting/avoiding conflicts in the domain builder is just a
preparation for later device assignment (either deterministic static
assignment or non-deterministic hotplug). As a preparation, a failure
there doesn't need to become a blocker that prevents the guest from
booting. Instead, leave the decision to the point where device
assignment actually happens, where a hard requirement is made on any
conflict anyway. Then we simply follow the existing device-assignment
policy (either block guest boot, or move forward w/o presenting the
device) if a conflict is treated as a failure by default (i.e. w/o the
'warn' override).
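
To make that split concrete, here is a rough sketch (made-up names and
structures, not actual Xen code): the builder only warns about a
conflict it could not avoid, while the assignment path owns the
fail-vs-warn decision.

/*
 * Hypothetical sketch only -- all names below are invented for
 * illustration, they are not the real domain builder or toolstack API.
 */
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct region { uint64_t start, end; };               /* [start, end) */

static bool overlaps(struct region a, struct region b)
{
    return a.start < b.end && b.start < a.end;
}

/* Domain builder: if guest RAM could not be re-arranged around the
 * reserved region, just report the conflict and carry on building. */
static void builder_check(struct region ram, struct region rmrr)
{
    if (overlaps(ram, rmrr))
        fprintf(stderr,
                "warn: RMRR [%#" PRIx64 ",%#" PRIx64 ") overlaps guest RAM\n",
                rmrr.start, rmrr.end);
}

/* Assignment path: this is where the hard requirement lives -- fail by
 * default, proceed only if the per-VM 'warn' override was given. */
static int assign_device(struct region ram, struct region rmrr,
                         bool warn_override)
{
    if (!overlaps(ram, rmrr))
        return 0;
    if (warn_override) {
        fprintf(stderr, "warn: assigning device despite RMRR conflict\n");
        return 0;
    }
    fprintf(stderr, "error: RMRR conflict, device not assigned\n");
    return -1;
}

int main(void)
{
    struct region ram  = { 0x0,        0xc0000000 };   /* lowmem up to 3G */
    struct region rmrr = { 0x40000000, 0x40100000 };   /* 1M region at 1G */

    builder_check(ram, rmrr);               /* warn only, guest still builds */
    return assign_device(ram, rmrr, false) ? 1 : 0;    /* policy lives here */
}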

> 
> >> > And another point is about hotplug: 'fail' for future devices is too
> >> > strict, but to differentiate them from statically assigned devices,
> >> > the domain builder would then need to maintain a per-device reserved
> >> > region structure. Just 'warn' keeps things simple.
> >>
> >> Whereas here I agree - hotplug should just fail (without otherwise
> >> impacting the guest).
> >
> > so 'should' -> 'shouldn't'?
> 
> No. Perhaps what you imply from fail is different from my reading:
> I mean this to be the result of the hotplug operation - the device
> would just not appear in the guest. The guest isn't to be brought
> down because of such failure (i.e. behavior here is different from
> the boot time assignment, where the guest would be prevented
> from coming up).

Yes, the guest shouldn't be blocked for a failure that lies only in a
possible future. But to differentiate such a case from static
assignment, as you proposed earlier, we would need to whitelist all
devices that might ever be hotplugged, which is unfriendly for users to
figure out. That's why I want to check whether report-all is simple
enough and has no big impact (as we discussed in another mail, we can't
make the boot-time reservation adapt to dynamic device changes later,
so at some point the user may see unrelated reserved regions anyway).
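
Roughly, the difference between the two strategies in the builder would
look like the following hypothetical sketch (all names made up, not the
real domain builder code): report-all reserves every host RMRR in the
guest e820 so any later assignment or hotplug needs no relayout, while
report-sel needs a user-supplied whitelist.

#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct rmrr { const char *sbdf; uint64_t start, end; };

/* Reserved regions reported by the host (e.g. from the ACPI DMAR table);
 * the values here are invented for illustration. */
static const struct rmrr host_rmrrs[] = {
    { "0000:00:1a.0", 0x000ed000, 0x000ef000 },   /* USB, <1MB            */
    { "0000:02:00.0", 0x40000000, 0x40200000 },   /* unrelated device @1G */
};
#define NR_RMRRS (sizeof(host_rmrrs) / sizeof(host_rmrrs[0]))

static void reserve(const struct rmrr *r)
{
    printf("e820 reserved (%s): %#" PRIx64 "-%#" PRIx64 "\n",
           r->sbdf, r->start, r->end);
}

/* report-all: no per-device knowledge needed in the builder. */
static void report_all(void)
{
    for (size_t i = 0; i < NR_RMRRS; i++)
        reserve(&host_rmrrs[i]);
}

/* report-sel: the user must enumerate every device that may ever be
 * assigned or hotplugged, which is the usability problem noted above. */
static void report_sel(const char *const *whitelist, size_t n)
{
    for (size_t i = 0; i < NR_RMRRS; i++)
        for (size_t j = 0; j < n; j++)
            if (!strcmp(host_rmrrs[i].sbdf, whitelist[j]))
                reserve(&host_rmrrs[i]);
}

int main(void)
{
    const char *assigned[] = { "0000:00:1a.0" };

    puts("report-all:");
    report_all();
    puts("report-sel:");
    report_sel(assigned, 1);
    return 0;
}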

> 
> >> > Second, I'm not sure to what extent users care about those reserved
> >> > regions. At most it's the same layout as the physical machine, so
> >> > even sensitive users won't see it as a UFO. :-) And the e820 is a
> >> > platform attribute, so users shouldn't make assumptions about it.
> >>
> >> Just consider the case where, in order to accommodate the reserved
> >> regions, low memory needs to be reduced from the default of over
> >> 3Gb to say 1Gb. If the guest OS then is incapable of using memory
> >> above 4Gb (say Linux with HIGHMEM=n), there is a significant
> >> difference to be seen by the user.
> >
> > That makes some sense... but if so, it's also a limitation of your
> > proposal below about not fiddling with lowmem, if there's a region at
> > 1GB. I think for this we can go with your earlier assumption that we
> > only support the case where the regions sit reasonably high, say 3G;
> > violating that assumption will be warned about (so guest RAM is not
> > moved) and later device assignment will fail.
> 
> No, we shouldn't put in arbitrary restrictions on where RMRRs can sit.
> If there's one at 1Gb, and the associated device is to be passed through,
> so be it. All I wanted to make clear is that the report-all approach is
> going to have too heavy an impact on the guest.
> 

I think the key here goes back to the policy discussion above about the
domain builder.

If we think 'warn' in the domain builder is acceptable, then report-all
doesn't make things worse. If the 1G conflict is caused by a statically
assigned device, both report-all and report-sel will fail in the later
assignment path. If the 1G conflict is caused by an unrelated device,
report-all will only throw a warning but no failure, since that device
is not assigned. Hotplug is the same for both.

It's only a problem for report-all if we treat any conflict as a
'failure' that blocks guest boot in the domain builder.

If 'warn' is acceptable, then the only impact we discussed is that
report-all will expose more reserved regions than report-sel, but per
our other discussion I haven't seen that as a hard problem so far.
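
To put the four cases above in one place, a tiny hypothetical sketch of
the outcome matrix (illustrative only, not real code):

#include <stdio.h>

enum source { CONFLICT_WITH_ASSIGNED_DEV, CONFLICT_WITH_UNRELATED_DEV };

static const char *outcome(enum source s, int report_all, int warn_override)
{
    if (s == CONFLICT_WITH_ASSIGNED_DEV)
        /* Identical for report-all and report-sel: the assignment path
         * enforces the policy. */
        return warn_override ? "warn, device assigned anyway"
                             : "device assignment fails";
    /* Unrelated device: report-sel never exposes the region at all;
     * report-all warns at build time.  Neither blocks the guest. */
    return report_all ? "build-time warning only, no failure"
                      : "region not exposed, no conflict";
}

int main(void)
{
    printf("assigned dev, default policy : %s\n",
           outcome(CONFLICT_WITH_ASSIGNED_DEV, 1, 0));
    printf("assigned dev, 'warn' override: %s\n",
           outcome(CONFLICT_WITH_ASSIGNED_DEV, 1, 1));
    printf("unrelated dev, report-all    : %s\n",
           outcome(CONFLICT_WITH_UNRELATED_DEV, 1, 0));
    printf("unrelated dev, report-sel    : %s\n",
           outcome(CONFLICT_WITH_UNRELATED_DEV, 0, 0));
    return 0;
}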

Thanks
Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

