
Re: [PATCH v6 04/15] xen/arm: add Dom0 cache coloring support



Hi Jan,

On 01/02/2024 13:39, Jan Beulich wrote:
On 01.02.2024 14:35, Julien Grall wrote:
Hi Jan,

On 01/02/2024 13:30, Jan Beulich wrote:
On 29.01.2024 18:18, Carlo Nonato wrote:
Add a command line parameter to allow the user to set the coloring
configuration for Dom0.
A common configuration syntax for cache colors is introduced and
documented.
Take the opportunity to also add:
   - default configuration notion.
   - function to check well-formed configurations.

Direct mapping Dom0 isn't possible when coloring is enabled, so
CDF_directmap flag is removed when creating it.

What implications does this have?

--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -963,6 +963,15 @@ Controls for the dom0 IOMMU setup.
Specify a list of IO ports to be excluded from dom0 access.
+### dom0-llc-colors
+> `= List of [ <integer> | <integer>-<integer> ]`
+
+> Default: `All available LLC colors`
+
+Specify dom0 LLC color configuration. This options is available only when
+`CONFIG_LLC_COLORING` is enabled. If the parameter is not set, all available
+colors are used.

Even Arm already has a "dom0=" option. Is there a particular reason why
this doesn't become a new sub-option there?

As to meaning: With just a single <integer>, that's still a color value
then (and not a count of colors)? Wouldn't it make sense to have a
simpler variant available where you just say how many, and a suitable
set/range is then picked?

Finally a nit: "This option is ...".
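To make the documented `List of [ <integer> | <integer>-<integer> ]` syntax concrete, here is a minimal standalone C sketch that parses a value such as `0-3,8,10-11` and checks that the result is well formed. It is not the code from the series. The names `parse_color_list()` and `colors_are_valid()` are hypothetical. The sketch assumes a comma separator, treats a single integer as a color index rather than a count (the reading Jan asks about), and requires lists to be ascending and free of duplicates. The real patch may be more or less permissive.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical sketch, not the Xen implementation: parse a string such as
 * "0-3,8,10-11" into color indices. Returns the number of colors stored in
 * colors[] or -EINVAL if the string is malformed or does not fit.
 */
static int parse_color_list(const char *s, unsigned int *colors,
                            unsigned int max_entries)
{
    unsigned int n = 0;

    assert(colors != NULL);

    if ( !s || !*s )
        return -EINVAL;

    while ( *s )
    {
        char *end;
        unsigned long start, last;

        /* Each element must start with a digit (no signs or spaces). */
        if ( !isdigit((unsigned char)*s) )
            return -EINVAL;
        start = last = strtoul(s, &end, 10);
        s = end;

        /* An optional "-<integer>" turns the element into an inclusive range. */
        if ( *s == '-' )
        {
            s++;
            if ( !isdigit((unsigned char)*s) )
                return -EINVAL;
            last = strtoul(s, &end, 10);
            s = end;
        }

        /* Reject reversed ranges, oversized values and buffer overflow. */
        if ( last < start || last > UINT_MAX ||
             last - start >= max_entries - n )
            return -EINVAL;

        for ( unsigned long i = 0; i <= last - start; i++ )
            colors[n++] = (unsigned int)(start + i);

        /* Elements are separated by single commas; no trailing comma. */
        if ( *s == ',' && s[1] != '\0' )
            s++;
        else if ( *s != '\0' )
            return -EINVAL;
    }

    return (int)n;
}

/*
 * Hypothetical well-formedness check: non-empty, every color below
 * max_colors, and strictly ascending (so no duplicates).
 */
static bool colors_are_valid(const unsigned int *colors, unsigned int n,
                             unsigned int max_colors)
{
    if ( n == 0 )
        return false;

    for ( unsigned int i = 0; i < n; i++ )
    {
        if ( colors[i] >= max_colors )
            return false;
        if ( i > 0 && colors[i] <= colors[i - 1] )
            return false;
    }

    return true;
}
```

Under these assumptions, `0-3,8,10-11` yields the seven colors 0, 1, 2, 3, 8, 10 and 11. It passes `colors_are_valid()` on a platform with 16 colors and fails on one with only 8.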

@@ -2188,10 +2190,16 @@ void __init create_dom0(void)
               panic("SVE vector length error\n");
       }
-    dom0 = domain_create(0, &dom0_cfg, CDF_privileged | CDF_directmap);
+    if ( !llc_coloring_enabled )
+        flags |= CDF_directmap;
+
+    dom0 = domain_create(0, &dom0_cfg, flags);
       if ( IS_ERR(dom0) )
           panic("Error creating domain 0 (rc = %ld)\n", PTR_ERR(dom0));
+    if ( llc_coloring_enabled && (rc = dom0_set_llc_colors(dom0)) )
+        panic("Error initializing LLC coloring for domain 0 (rc = %d)", rc);

As for the earlier patch, I find panic()ing here dubious. You can continue
quite fine, with a warning and perhaps again tainting the system.
There are arguments for both approaches.

In which case - perhaps allow for both? With a Kconfig-established
default and a command line option to override?
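Jan's suggestion of a Kconfig-established default with a command line override could look roughly like the sketch below. Everything in it is hypothetical. Neither `CONFIG_LLC_COLORING_STRICT` nor `opt_llc_coloring_strict` exists in the series. Returning `false` stands in for the point where Xen would call `panic()`, so the logic can be exercised outside the hypervisor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical Kconfig default (not part of the series):
 * 1 = stop the boot on failure, 0 = warn, taint and keep booting.
 */
#ifndef CONFIG_LLC_COLORING_STRICT
#define CONFIG_LLC_COLORING_STRICT 1
#endif

/* Hypothetical command line override: -1 means "not given, use Kconfig". */
static int opt_llc_coloring_strict = -1;

static bool llc_coloring_strict(void)
{
    if ( opt_llc_coloring_strict < 0 )
        return CONFIG_LLC_COLORING_STRICT;
    return opt_llc_coloring_strict;
}

/*
 * Decide what to do after dom0 color setup returned rc (0 or -errno).
 * Returns true if the boot may continue; false where Xen would panic().
 */
static bool dom0_llc_error_may_continue(int rc)
{
    assert(rc <= 0);

    if ( rc == 0 )
        return true;
    if ( llc_coloring_strict() )
        return false;

    fprintf(stderr, "WARNING: dom0 LLC coloring failed (rc = %d), "
            "continuing without it\n", rc);
    /* A real implementation would also taint the system here. */
    return true;
}
```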

Perhaps. But this is a separate discussion from this series. What Carlo has been doing matches the surrounding code on Arm.


I agree that you can continue, but
technically this is not the configuration you asked for. Someone may not
notice the tainting until it is too late (read: until they have done an
investigation).

Bear in mind that the users of cache coloring will be in a very
specialized environment.

s/will/may/ I suppose. People may enable the option without being in
any specialized environment.

Sure. But again, why would you want to boot with a half-broken configuration?

In a lot of cases, you are not doing the admin a favor by continuing to boot. It is easy to say there is a warning in the logs, but it can easily be overlooked and be difficult to diagnose afterwards. For instance, with cache coloring the symptom would be latency.

I don't think an average user will be able to easily figure out that their configuration was wrong.

Furthermore, when you operate at scale, I feel it is better to have an early boot crash than to allow the system to boot (parsing the logs is feasible but IMO risky, as they are not stable).


So if you can't enable cache coloring in
production, then something has gone really wrong, and continuing to
boot is probably not right.

This matches the approach we have been using on Arm since the
beginning, and I will strongly argue for continuing this way.

I'm okay with this, and here (for Arm-specific code) it may even be okay
to do so without further justification. But in the earlier patch where
common code is affected, I'll insist on at least justifying this behavior.

See above for a justification. If someone asks for cache coloring, then you most likely don't want to continue without cache coloring.

If you dislike the panic() in common code, then we can simply modify the function to return an error and move the panic() into the Arm code. This is not my preference, but I am under the impression that we have very divergent views on how to handle boot errors, and it will be hard to reconcile them (at least in this series; this can be done afterwards if someone fancies writing a series matching what you proposed above). So this would be the second-best option for me.
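The fallback described above, where the common function only reports an error and the Arm caller keeps the `panic()`, can be sketched as follows. The helper names are hypothetical, `panic_stub()` merely stands in for Xen's `panic()`, and the validation shown is a placeholder for whatever the common code really checks.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for Xen's panic(): report and stop. */
static void panic_stub(const char *msg, int rc)
{
    fprintf(stderr, "%s (rc = %d)\n", msg, rc);
    abort();
}

/*
 * Hypothetical "common" helper: validates and reports, but leaves the
 * policy to the caller. Returns 0 or -EINVAL, like other Xen init code.
 */
static int domain_set_llc_colors_common(const unsigned int *colors,
                                        unsigned int nr,
                                        unsigned int max_colors)
{
    assert(colors != NULL || nr == 0);

    if ( nr == 0 )
        return -EINVAL;
    for ( unsigned int i = 0; i < nr; i++ )
        if ( colors[i] >= max_colors )
            return -EINVAL;

    /* ... apply the color configuration to the domain here ... */
    return 0;
}

/* Hypothetical "Arm" caller: same behaviour, the panic() just lives here. */
static void arm_create_dom0_colors(const unsigned int *colors,
                                   unsigned int nr, unsigned int max_colors)
{
    int rc = domain_set_llc_colors_common(colors, nr, max_colors);

    if ( rc )
        panic_stub("Error initializing LLC coloring for domain 0", rc);
}
```

Boot-time behaviour on Arm stays the same; only the policy decision moves out of common code, which leaves room for another architecture to warn and continue instead.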

Cheers,

--
Julien Grall


