| [Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
 Re: [PATCH 3/3] x86: correct fencing around CLFLUSH
 
To: Jan Beulich <jbeulich@xxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx"	<xen-devel@xxxxxxxxxxxxxxxxxxxx>From: Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>Date: Wed, 23 Feb 2022 12:33:00 +0000Accept-language: en-GB, en-USArc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=noneArc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=1Wetn+zddTJITpG8eRN2BGHMe/qZpbS3LX2iByw6EHY=; b=ZJsSuQMPwby0hq9JQWbS9zpai4H70XaMGsZstJEgDfrgwDaPyoIVhkEb/W4h0AKh697ApN0Tg++XkNtB+GQ8QigYMUfKpgPFzLmn2af4Ym65pT1z3IQvtKo5IZDgJ8d7K0M5ie/li+4+valmajO8paMMsklfP6/CtvRORYdAC6oM36bvogj/j06WiMFRY8EHzDiMnZUblEieni6A+o/aoLDdZyTW7lz6OG7JnYdxGO8MF6qcUDcWQZTgurckLcD48Mvgu1kphJAdq8ENCn74Ia3SgZKlv2BLrx2fDyZILfqm6BTjbshdmLSAD26uqdKUj502xv0LrOecm1VmeplYFw==Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=TBuwGofTIRDemmoqwuT9LZpFne1P4sgX5MSVXEGIbj2bnQdc3xGgKt5gWUDlaPi2gJlh3dQKXKctWEKHswE+sywNoDxbfO79t1hkXgg0gyw8G3CybL6uzZXqBjpsDGx5+3efWkKEZS5kKwbJztIBnAEoHCBVZt0eE5mklqSjaXES4P59hSrjUY1rZwMcvkyqOrAWi9Rtpff0sE3GodRqls2/sDCVeSSRFlgc1scWrR2IMn195eIQ7ia6j9u1VglxXuSuApX6WRchbFg6suV/Rcv2zogEXz0U5cllc7ajB5lw6Xt6PxymCtc/4MhMQalpX+TPfVZCxSbAg93/hJvzCQ==Authentication-results: esa3.hc3370-68.iphmx.com; dkim=pass (signature verified) header.i=@citrix.onmicrosoft.comCc: Wei Liu <wl@xxxxxxx>, Roger Pau Monne <roger.pau@xxxxxxxxxx>Delivery-date: Wed, 23 Feb 2022 12:33:26 +0000Ironport-data: A9a23:ptcQAawYHrn08QnTwGt6t+clxirEfRIJ4+MujC+fZmUNrF6WrkUHm GIZUG3SP66NZzH8eYxzYI6yp04Bu8fSnIAwT1Zo+yAxQypGp/SeCIXCJC8cHc8zwu4v7q5Dx 59DAjUVBJlsFhcwnj/0bv656yMUOZigHtIQMsadUsxKbVIiGHdJZS5LwbZj2NYy24PhWmthh PupyyHhEA79s9JLGjp8B5Kr8HuDa9yr5Vv0FnRnDRx6lAe2e0s9VfrzFonoR5fMeaFGH/bSe gr25OrRElU1XfsaIojNfr7TKiXmS1NJVOSEoiI+t6OK2nCuqsGuu0qS2TV1hUp/0l20c95NJ Nplj7qBVDkmPof1qt8dXkJnCRF+MKJk5+qSSZS/mZT7I0zudnLtx7NlDV0sPJ1e8eFyaY1M3 aVGcnZXNEnF3r/ohuLgIgVvrp1LwM3DFYUToHx/ixreCu4rW8vrSKTW/95Imjw3g6iiGN6AO 5pBNGIxPHwsZTVjFQZGEIMHv9zxg2PDImJpl3y34po4tj27IAtZj+G2bYu9lsaxbdpRtlaVo CTB5WuRKgEXMpmTxCSI9lqoh/TThmXrVYQKDrq6+/V2xlqJyQQ7ChcbSF+6qvmRkVOlVpRUL El8x8Y1hfFsrgrxFIC7BkDm5i7f1vIBZzZOO9cc2Ryzxar+2jSiBmVfcSRxaNEpltBjEFTGy WS1t9/uADVutpicRnSc6qqYoFuOBMQFEYMRTXRaFFVYurEPtKl210uSFYg7TMZZm/WoQWmY/ tyckMQpa1z/Z+Yv3r7zw13IiinESnPhHl9svVW/so5IA2pEiG+Zi26AtACzARVodt/xory9U J4swZT2AAcmV8zlqcB1aL9RdIxFHt7cWNEmvXZhHoM66xOm8GO5cIZb7VlWfRk1b5tZJ2e5O hOM6Wu9AaO/2lPwMcebhKrrVqwXIVXIT4y5Bpg4kPIUCnSOSON31H43PhPBt4wcuEMtjbs+K f+mnTWEVh4n5VBc5GPuHY81iOZzrghnnD+7bc2rnnyPjOvFDFbIGOhtDbd7Rr1ghE9yiF6Oq Ig32grj40g3bdASlQGNr9ZIdAhSdSJT6FKfg5U/S9Nv6zFOQQkJI/TQ3akga8pimaFUnf3P5 XazRglTz1+XuJENAV7ihqxLAF83YatCkA==Ironport-hdrordr: A9a23:RfFN9q+3nNB3z/oIF1Fuk+F2db1zdoMgy1knxilNoENuHfBwxv rDoB1E73LJYW4qKQwdcdDpAtjkfZtFnaQFrrX5To3SIDUO31HYYr2KjLGSjwEIfheRygcz79 YYT0ETMqySMbE+t7eB3ODaKadg/DDkytHRuQ629R4EJmsKC52IrT0JcTpzencGHzWubqBJcK Z0k/A3wQZIDk5nCfhTaEN1PdTrlpnurtbLcBQGDxko5E2lljWz8oP3FBCew1M3Ty5P6a1Kyx mEryXJooGY992rwB7V0GHeq75MnsH699dFDMuQzuAINzTXjBqybogJYczAgNl1mpDs1L8Zqq iJn/4SBbU115oXRBDynfLZ4Xik7N/p0Q669bbXuwq6nSWzfkNENyMIv/MmTvKe0Tt7gDg06t M644rS3aAnfC/ojWDz4cPFWAptkVfxqX0+kfQLh3gaSocGbqRNxLZvt3+9Pa1wVR4S0rpXWN WGzfuskMp+YBefdTTUr2NvyNujUjA6GQqHWFELvoiQ3yJNlH50wkMEzIhH901wua4VWt1B/a DJI65onLZBQosfar98Hv4IRY+yBnbWSRzBPWqOKRDsFb0BOXjKt5nriY9Frt2CadgN1t8/iZ 7BWFRXuSo7fF/vE9SH2NlR/hXEUAyGLELQIwFllu9EU5HHNcjW2He4OSMTeuOb0oAiPvE=List-id: Xen developer discussion <xen-devel.lists.xenproject.org>Thread-index: AQHYKJ4ejcG5b0mVokmgaQG/G3Z+sqyhEcAAThread-topic: [PATCH 3/3] x86: correct fencing around CLFLUSH 
 On 23/02/2022 10:13, Jan Beulich wrote:
> As noted in the context of 3330013e6739 ("VT-d / x86: re-arrange cache
> syncing"): While cache_writeback() has the SFENCE on the correct side of
> CLFLUSHOPT, flush_area_local() doesn't. While I can't prove it due to
> lacking a copy of the old SDM version, I can only assume this placement
> was a result of what had been described there originally. In any event
> recent versions of the SDM hve been telling us otherwise.
>
> For AMD (and Hygon, albeit there it's benign, since all their CPUs are
> expected to support CLFLUSHOPT) the situation is more complex: MFENCE is
> needed ahead and/or after CLFLUSH when the CPU doesn't also support
> CLFLUSHOPT. (It's "and" in our case, as we cannot know what the caller's
> needs are.)
>
> Fixes: 623c720fc8da3 ("x86: use CLFLUSHOPT when available")
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>
> --- a/xen/arch/x86/cpu/common.c
> +++ b/xen/arch/x86/cpu/common.c
> @@ -346,9 +346,14 @@ void __init early_cpu_init(void)
>              c->x86_model, c->x86_model, c->x86_mask, eax);
>  
>       if (c->cpuid_level >= 7)
> -             cpuid_count(7, 0, &eax, &ebx,
> +             cpuid_count(7, 0, &eax,
> +                            &c->x86_capability[FEATURESET_7b0],
>                              &c->x86_capability[FEATURESET_7c0], &edx);
>  
> +     if (!(c->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) ||
> +         cpu_has(c, X86_FEATURE_CLFLUSHOPT))
> +             setup_force_cpu_cap(X86_FEATURE_CLFLUSH_NO_MFENCE);
This is somewhat ugly, not only because it presumes that the early AMD
implementation peculiarities are common.
It also has a corner case that goes wrong when the BSP enumerates
CLFLUSHOPT but later CPUs don't.  In this case the workaround will be
disengaged even when it's not safe to.
Most importantly however, it makes the one current slow usecase (VT-d on
early Intel with only CLFLUSH) even slower.
I suggest inverting this workaround (and IMO, using the bug
infrastructure, because that's exactly what it is) and leaving a big
warning by the function saying "don't use on AMD before alternatives
have run" or something.  It's quite possibly a problem we'll never need
to solve in practice, although my plans for overhauling CPUID scanning
will probably fix it because we can move the first alternatives pass far
earlier as a consequence.
> +
>       eax = cpuid_eax(0x80000000);
>       if ((eax >> 16) == 0x8000 && eax >= 0x80000008) {
>               ebx = eax >= 0x8000001f ? cpuid_ebx(0x8000001f) : 0;
> --- a/xen/arch/x86/flushtlb.c
> +++ b/xen/arch/x86/flushtlb.c
> @@ -245,12 +245,15 @@ unsigned int flush_area_local(const void
>               c->x86_clflush_size && c->x86_cache_size && sz &&
>               ((sz >> 10) < c->x86_cache_size) )
>          {
> -            alternative("", "sfence", X86_FEATURE_CLFLUSHOPT);
> +            alternative("mfence", , X86_FEATURE_CLFLUSH_NO_MFENCE);
An an aside, the absence of "" is very weird parse, and only compiles
because this is a macro rather than a function.
I'd request that it stays, simply to make the code read more like regular C.
~Andrew
 |