[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features

To: Julien Grall <julien.grall.oss@xxxxxxxxx>
From: Oleksandr <olekstysh@xxxxxxxxx>
Date: Tue, 11 Aug 2020 09:13:26 +0300
Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Ian Jackson <ian.jackson@xxxxxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Oleksandr Tyshchenko <oleksandr_tyshchenko@xxxxxxxx>, Julien Grall <julien.grall@xxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Daniel De Graaf <dgdegra@xxxxxxxxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>
Delivery-date: Tue, 11 Aug 2020 06:13:57 +0000
List-id: Xen developer discussion <xen-devel.lists.xenproject.org>


On 11.08.20 01:37, Julien Grall wrote:

Hi Julien

On Mon, 10 Aug 2020 at 21:29, Oleksandr <olekstysh@xxxxxxxxx> wrote:


On 10.08.20 22:00, Julien Grall wrote:

Hi Julien

@@ -2275,6 +2282,16 @@ static void check_for_vcpu_work(void)
    */
   void leave_hypervisor_to_guest(void)
   {
+#ifdef CONFIG_IOREQ_SERVER
+    /*
+     * XXX: Check the return. Shall we call that in
+     * continue_running and context_switch instead?
+     * The benefits would be to avoid calling
+     * handle_hvm_io_completion on every return.
+     */

Yeah, that could be a simple and good optimization

Well, it is not simple as it is sounds :).
handle_hvm_io_completion() is the function in charge to mark the
vCPU as waiting for I/O. So we would at least need to split the
function.

I wrote this TODO because I wasn't sure about the complexity of
handle_hvm_io_completion(current). Looking at it again, the main
complexity is the looping over the IOREQ servers.

I think it would be better to optimize handle_hvm_io_completion()
rather than trying to hack the context_switch() or continue_running().

Well, is the idea in proposed dirty test patch below close to what
you expect? Patch optimizes handle_hvm_io_completion() to avoid extra
actions if vcpu's domain doesn't have ioreq_server, alternatively
the check could be moved out of handle_hvm_io_completion() to avoid
calling that function at all.

This looks ok to me.

BTW, TODO also suggests checking the return value of
handle_hvm_io_completion(), but I am completely sure we can simply
just return from leave_hypervisor_to_guest() at this point. Could you
please share your opinion?

 From my understanding, handle_hvm_io_completion() may return false if
there is pending I/O or a failure.

It seems, yes

In the former case, I think we want to call handle_hvm_io_completion()
later on. Possibly after we call do_softirq().

I am wondering whether check_for_vcpu_work() could return whether
there are more work todo on the behalf of the vCPU.

So we could have:

do
{
   check_for_pcpu_work();
} while (check_for_vcpu_work())

The implementation of check_for_vcpu_work() would be:

if ( !handle_hvm_io_completion() )
   return true;

/* Rest of the existing code */

return false;

Thank you, will give it a try.

Can we behave the same way for both "pending I/O" and "failure" or we
need to distinguish them?

We don't need to distinguish them. In both cases, we will want to
process softirqs. In all the failure cases, the domain will have
crashed. Therefore the vCPU will be unscheduled.


Got it.

Probably we need some sort of safe timeout/number attempts in order to
not spin forever?

Well, anything based on timeout/number of attempts is flaky. How do
you know whether the I/O is just taking a "long time" to complete?

But a vCPU shouldn't continue until an I/O has completed. This is
nothing very different than what a processor would do.

In Xen case, if an I/O never completes then it most likely means that
something went horribly wrong with the Device Emulator. So it is most
likely not safe to continue. In HW, when there is a device failure,
the OS may receive an SError (this is implementation defined) and
could act accordingly if it is able to recognize the issue.

It *might* be possible to send a virtual SError but there are a couple
of issues with it:
      * How do you detect a failure?
      * SErrors are implementations defined. You would need to teach
your OS (or the firmware) how to deal with them.

I would expect quite a bit of effort in order to design and implement
it. For now, it is probably best to just let the vCPU spin forever.

This wouldn't be an issue for Xen as do_softirq() would be called at
every loop.


 Thank you for clarification. Fair enough and sounds reasonable.


--
Regards,

Oleksandr Tyshchenko

Follow-Ups:
- Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  - From: Oleksandr

References:
- [RFC PATCH V1 00/12] IOREQ feature (+ virtio-mmio) on Arm
  - From: Oleksandr Tyshchenko
- [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  - From: Oleksandr Tyshchenko
- Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  - From: Stefano Stabellini
- Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  - From: Julien Grall
- Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  - From: Oleksandr
- Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  - From: Julien Grall
- Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  - From: Oleksandr
- Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  - From: Julien Grall

Prev by Date: [seabios test] 152554: tolerable FAIL - PUSHED
Next by Date: [xen-unstable test] 152555: tolerable FAIL - PUSHED
Previous by thread: Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
Next by thread: Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
Index(es):
- Date
- Thread

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.