[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v4 6/6] x86/microcode: Synchronize late microcode loading


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>, Chao Gao <chao.gao@xxxxxxxxx>
  • From: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
  • Date: Thu, 29 Nov 2018 17:43:15 -0500
  • Autocrypt: addr=boris.ostrovsky@xxxxxxxxxx; prefer-encrypt=mutual; keydata= xsFNBFH8CgsBEAC0KiOi9siOvlXatK2xX99e/J3OvApoYWjieVQ9232Eb7GzCWrItCzP8FUV PQg8rMsSd0OzIvvjbEAvaWLlbs8wa3MtVLysHY/DfqRK9Zvr/RgrsYC6ukOB7igy2PGqZd+M MDnSmVzik0sPvB6xPV7QyFsykEgpnHbvdZAUy/vyys8xgT0PVYR5hyvhyf6VIfGuvqIsvJw5 C8+P71CHI+U/IhsKrLrsiYHpAhQkw+Zvyeml6XSi5w4LXDbF+3oholKYCkPwxmGdK8MUIdkM d7iYdKqiP4W6FKQou/lC3jvOceGupEoDV9botSWEIIlKdtm6C4GfL45RD8V4B9iy24JHPlom woVWc0xBZboQguhauQqrBFooHO3roEeM1pxXjLUbDtH4t3SAI3gt4dpSyT3EvzhyNQVVIxj2 FXnIChrYxR6S0ijSqUKO0cAduenhBrpYbz9qFcB/GyxD+ZWY7OgQKHUZMWapx5bHGQ8bUZz2 SfjZwK+GETGhfkvNMf6zXbZkDq4kKB/ywaKvVPodS1Poa44+B9sxbUp1jMfFtlOJ3AYB0WDS Op3d7F2ry20CIf1Ifh0nIxkQPkTX7aX5rI92oZeu5u038dHUu/dO2EcuCjl1eDMGm5PLHDSP 0QUw5xzk1Y8MG1JQ56PtqReO33inBXG63yTIikJmUXFTw6lLJwARAQABzTNCb3JpcyBPc3Ry b3Zza3kgKFdvcmspIDxib3Jpcy5vc3Ryb3Zza3lAb3JhY2xlLmNvbT7CwXgEEwECACIFAlH8 CgsCGwMGCwkIBwMCBhUIAgkKCwQWAgMBAh4BAheAAAoJEIredpCGysGyasEP/j5xApopUf4g 9Fl3UxZuBx+oduuw3JHqgbGZ2siA3EA4bKwtKq8eT7ekpApn4c0HA8TWTDtgZtLSV5IdH+9z JimBDrhLkDI3Zsx2CafL4pMJvpUavhc5mEU8myp4dWCuIylHiWG65agvUeFZYK4P33fGqoaS VGx3tsQIAr7MsQxilMfRiTEoYH0WWthhE0YVQzV6kx4wj4yLGYPPBtFqnrapKKC8yFTpgjaK jImqWhU9CSUAXdNEs/oKVR1XlkDpMCFDl88vKAuJwugnixjbPFTVPyoC7+4Bm/FnL3iwlJVE qIGQRspt09r+datFzPqSbp5Fo/9m4JSvgtPp2X2+gIGgLPWp2ft1NXHHVWP19sPgEsEJXSr9 tskM8ScxEkqAUuDs6+x/ISX8wa5Pvmo65drN+JWA8EqKOHQG6LUsUdJolFM2i4Z0k40BnFU/ kjTARjrXW94LwokVy4x+ZYgImrnKWeKac6fMfMwH2aKpCQLlVxdO4qvJkv92SzZz4538az1T m+3ekJAimou89cXwXHCFb5WqJcyjDfdQF857vTn1z4qu7udYCuuV/4xDEhslUq1+GcNDjAhB nNYPzD+SvhWEsrjuXv+fDONdJtmLUpKs4Jtak3smGGhZsqpcNv8nQzUGDQZjuCSmDqW8vn2o hWwveNeRTkxh+2x1Qb3GT46uzsFNBFH8CgsBEADGC/yx5ctcLQlB9hbq7KNqCDyZNoYu1HAB Hal3MuxPfoGKObEktawQPQaSTB5vNlDxKihezLnlT/PKjcXC2R1OjSDinlu5XNGc6mnky03q yymUPyiMtWhBBftezTRxWRslPaFWlg/h/Y1iDuOcklhpr7K1h1jRPCrf1yIoxbIpDbffnuyz kuto4AahRvBU4Js4sU7f/btU+h+e0AcLVzIhTVPIz7PM+Gk2LNzZ3/on4dnEc/qd+ZZFlOQ4 KDN/hPqlwA/YJsKzAPX51L6Vv344pqTm6Z0f9M7YALB/11FO2nBB7zw7HAUYqJeHutCwxm7i BDNt0g9fhviNcJzagqJ1R7aPjtjBoYvKkbwNu5sWDpQ4idnsnck4YT6ctzN4I+6lfkU8zMzC gM2R4qqUXmxFIS4Bee+gnJi0Pc3KcBYBZsDK44FtM//5Cp9DrxRQOh19kNHBlxkmEb8kL/pw XIDcEq8MXzPBbxwHKJ3QRWRe5jPNpf8HCjnZz0XyJV0/4M1JvOua7IZftOttQ6KnM4m6WNIZ 2ydg7dBhDa6iv1oKdL7wdp/rCulVWn8R7+3cRK95SnWiJ0qKDlMbIN8oGMhHdin8cSRYdmHK kTnvSGJNlkis5a+048o0C6jI3LozQYD/W9wq7MvgChgVQw1iEOB4u/3FXDEGulRVko6xCBU4 SQARAQABwsFfBBgBAgAJBQJR/AoLAhsMAAoJEIredpCGysGyfvMQAIywR6jTqix6/fL0Ip8G jpt3uk//QNxGJE3ZkUNLX6N786vnEJvc1beCu6EwqD1ezG9fJKMl7F3SEgpYaiKEcHfoKGdh 30B3Hsq44vOoxR6zxw2B/giADjhmWTP5tWQ9548N4VhIZMYQMQCkdqaueSL+8asp8tBNP+TJ PAIIANYvJaD8xA7sYUXGTzOXDh2THWSvmEWWmzok8er/u6ZKdS1YmZkUy8cfzrll/9hiGCTj u3qcaOM6i/m4hqtvsI1cOORMVwjJF4+IkC5ZBoeRs/xW5zIBdSUoC8L+OCyj5JETWTt40+lu qoqAF/AEGsNZTrwHJYu9rbHH260C0KYCNqmxDdcROUqIzJdzDKOrDmebkEVnxVeLJBIhYZUd t3Iq9hdjpU50TA6sQ3mZxzBdfRgg+vaj2DsJqI5Xla9QGKD+xNT6v14cZuIMZzO7w0DoojM4 ByrabFsOQxGvE0w9Dch2BDSI2Xyk1zjPKxG1VNBQVx3flH37QDWpL2zlJikW29Ws86PHdthh Fm5PY8YtX576DchSP6qJC57/eAAe/9ztZdVAdesQwGb9hZHJc75B+VNm4xrh/PJO6c1THqdQ 19WVJ+7rDx3PhVncGlbAOiiiE3NOFPJ1OQYxPKtpBUukAlOTnkKE6QcA4zckFepUkfmBV1wM Jg6OxFYd01z+a+oL
  • Cc: Kevin Tian <kevin.tian@xxxxxxxxx>, Wei Liu <wei.liu2@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Jun Nakajima <jun.nakajima@xxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, Thomas Gleixner <tglx@xxxxxxxxxxxxx>, Borislav Petkov <bp@xxxxxxx>, Ashok Raj <ashok.raj@xxxxxxxxx>
  • Delivery-date: Thu, 29 Nov 2018 22:40:40 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Openpgp: preference=signencrypt

On 11/29/18 4:56 AM, Roger Pau Monné wrote:
> On Thu, Nov 29, 2018 at 12:43:25PM +0800, Chao Gao wrote:
>> On Wed, Nov 28, 2018 at 04:22:09PM +0100, Roger Pau Monné wrote:
>>> On Wed, Nov 28, 2018 at 01:34:16PM +0800, Chao Gao wrote:
>>>
>>>> @@ -311,13 +350,45 @@ int 
>>>> microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
>>>>      if ( ret <= 0 )
>>>>      {
>>>>          printk("No valid or newer microcode found. Update abort!\n");
>>>> -        return -EINVAL;
>>>> +        ret = -EINVAL;
>>>> +        goto put;
>>>>      }
>>>>  
>>>> -    info->error = 0;
>>>> -    info->cpu = cpumask_first(&cpu_online_map);
>>>> +    atomic_set(&info->cpu_in, 0);
>>>> +    atomic_set(&info->cpu_out, 0);
>>>> +
>>>> +    /* Calculate the number of online CPU core */
>>>> +    nr_cores = 0;
>>>> +    for_each_online_cpu(cpu)
>>>> +        if ( cpu == cpumask_first(per_cpu(cpu_sibling_mask, cpu)) )
>>>> +            nr_cores++;
>>>> +
>>>> +    printk("%d cores are to update its microcode\n", nr_cores);
>>>>  
>>>> -    return continue_hypercall_on_cpu(info->cpu, do_microcode_update, 
>>>> info);
>>>> +    /*
>>>> +     * We intend to disable interrupt for long time, which may lead to
>>>> +     * watchdog timeout.
>>>> +     */
>>>> +    watchdog_disable();
>>>> +    /*
>>>> +     * Late loading dance. Why the heavy-handed stop_machine effort?
>>>> +     *
>>>> +     * - HT siblings must be idle and not execute other code while the 
>>>> other
>>>> +     *   sibling is loading microcode in order to avoid any negative
>>>> +     *   interactions cause by the loading.
>>> Well, the HT siblings will be executing code, since they are in a
>>> while loop waiting for the non-siblings cores to finish updating.
>> Strictly speaking, you are right. The 'idle' I think means no other
>> workload on the cpu except microcode loading (for a HT sibling which
>> isn't chosen to do the update, means waiting for the completion of
>> the other sibling).
> Could you clarify the comment then?
>
> By workload you mean that no other microcode loading should be
> attempted from a HT sibling?
>
> Is there a set of instructions or functionality that cannot be used by
> HT siblings while performing a microcode load?

The sibling should really not execute anything. For example, when
updating from microcode which introduced MSR0x48 to a newer microcode
which also updates 0x48 behavior the MSR (apparently) momentarily
disappears. We've seen this reliably happen, with crashes when the
sibling tries to access the MSR while the other thread is loading the
microcode.

One other comment about this patch (which IIRC was raised by Andrew on
an earlier version) is that it may be worth to stop timer calibration. I
am pretty sure we've seen deadlocks, which is why we ended up disabling
it during microcode updates.

-boris




_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.