[Xen-devel] Re: [RFC][PATCH] Per-cpu xentrace buffers
Final release is still a few weeks away. It should probably go in for rc2
then.

 -- Keir

On 20/01/2010 18:06, "George Dunlap" <George.Dunlap@xxxxxxxxxxxxx> wrote:

> How long between rc2 and the expected release (if no other candidates are
> considered)? It's more of a debugging feature, so it's not going to screw
> over any production systems if it's got some subtle bugs. (The
> "tb_init_done" flag that turns it on or off is exactly the same.) I could
> try to put it through its paces this week and early next week, and if
> nothing turns up, it's probably fine to go in.
>
> It will definitely require a tools rebuild if anyone's using xentrace,
> which people may not expect. :-)
>
> -George
>
> Keir Fraser wrote:
>> Oh, I'm fine with it. I wasn't sure about putting it in for 4.0.0, but
>> actually plenty is going in for rc2. What do you think?
>>
>> -- Keir
>>
>> On 20/01/2010 17:38, "George Dunlap" <George.Dunlap@xxxxxxxxxxxxx> wrote:
>>
>>> Keir, would you mind commenting on this new design in the next few
>>> days? If it looks like a good design, I'd like to do some more
>>> testing and get this into our next XenServer release.
>>>
>>> -George
>>>
>>> On Thu, Jan 7, 2010 at 3:13 PM, George Dunlap <dunlapg@xxxxxxxxx> wrote:
>>>
>>>> In the current xentrace configuration, xentrace buffers are all
>>>> allocated in a single contiguous chunk, and then divided among logical
>>>> cpus, one buffer per cpu. The size of an allocatable chunk is fairly
>>>> limited, in my experience about 128 pages (512KiB). As the number of
>>>> logical cores increases, this means a much smaller maximum trace
>>>> buffer per cpu; on my dual-socket quad-core Nehalem box with
>>>> hyperthreading (16 logical cpus), that comes to 8 pages per logical
>>>> cpu.
>>>>
>>>> The attached patch addresses this issue by allocating per-cpu buffers
>>>> separately. This allows larger trace buffers; however, it requires an
>>>> interface change to xentrace, which is why I'm making a Request For
>>>> Comments. (I'm not expecting this patch to be included in the 4.0
>>>> release.)
>>>>
>>>> The old interface to get trace buffers was fairly simple: you ask for
>>>> the info, and it gives you:
>>>> * the mfn of the first page in the buffer allocation
>>>> * the total size of the trace buffer
>>>>
>>>> The tools then mapped [mfn,mfn+size), calculated where the per-pcpu
>>>> buffers were, and went on to consume records from them.
>>>>
>>>> -- Interface --
>>>>
>>>> The proposed interface works as follows.
>>>>
>>>> * XEN_SYSCTL_TBUFOP_get_info still returns an mfn and a size (so no
>>>> changes to the library). However, the mfn now refers to a trace buffer
>>>> info area (t_info), allocated once at boot time. The trace buffer info
>>>> area contains the mfns of the per-pcpu buffers.
>>>> * The t_info struct contains an array of "offset pointers", one per
>>>> pcpu. Each is an offset into the t_info data area, pointing to the
>>>> array of mfns for that pcpu. So logically, the layout looks like this
>>>> (an illustrative C rendering follows after this message):
>>>> struct {
>>>>     int16_t tbuf_size;       /* Number of pages per cpu */
>>>>     int16_t offset[NR_CPUS]; /* Offset into the t_info area of the array */
>>>>     uint32_t mfn[NR_CPUS][TBUF_SIZE];
>>>> };
>>>>
>>>> So if NR_CPUS were 16, and TBUF_SIZE were 32, we'd have:
>>>> struct {
>>>>     int16_t tbuf_size;       /* Number of pages per cpu */
>>>>     int16_t offset[16];      /* Offset into the t_info area of the array */
>>>>     uint32_t p0_mfn_list[32];
>>>>     uint32_t p1_mfn_list[32];
>>>>     ...
>>>>     uint32_t p15_mfn_list[32];
>>>> };
>>>> * So the new way to map trace buffers is as follows (a sketch of the
>>>> tool-side flow follows after this message):
>>>>  + Call TBUFOP_get_info to get the mfn and size of the t_info area,
>>>>    and map it.
>>>>  + Get the number of cpus.
>>>>  + For each cpu:
>>>>   - Calculate the location of its mfn list in the t_info area thus:
>>>>     uint32_t *mfn_list = ((uint32_t *)t_info) + t_info->offset[cpu];
>>>>   - Map t_info->tbuf_size mfns from mfn_list using
>>>>     xc_map_foreign_batch().
>>>>
>>>> In the current implementation, the t_info size is fixed at 2 pages,
>>>> allowing about 2000 pages total to be mapped. For a 32-way system,
>>>> this would allow up to 63 pages per cpu (just under 256KiB per cpu).
>>>> Bumping this up to 4 pages would allow even larger systems if
>>>> required.
>>>>
>>>> The current implementation also allocates each trace buffer
>>>> contiguously, since that's the easiest way to get contiguous virtual
>>>> address space. But this interface allows Xen the flexibility, in the
>>>> future, to allocate buffers in several chunks if necessary, without
>>>> having to change the interface again.
>>>>
>>>> -- Implementation notes --
>>>>
>>>> The t_info area is allocated once at boot. Trace buffers are allocated
>>>> either at boot (if a parameter is passed) or when TBUFOP_set_size is
>>>> called. Due to the complexity of tracking pages mapped by dom0,
>>>> unmapping or resizing trace buffers is not supported.
>>>>
>>>> I introduced a new per-cpu spinlock guarding trace data and buffers.
>>>> This allows per-cpu data to be safely accessed and modified without
>>>> racing with trace events currently being generated. The per-cpu
>>>> spinlock is grabbed whenever a trace event is generated; but in the
>>>> (very very very) common case, the lock should be in the cache already.
>>>>
>>>> Feedback welcome.
>>>>
>>>> -George
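For reference, a minimal C rendering of the t_info layout described above.
The type and field names (t_buf_info, offset, mfn) are taken from the
description in the message and are illustrative only; in particular, the
assumption that the per-cpu offsets are expressed in uint32_t-sized units
is mine, not something the RFC pins down.

#include <stdint.h>

#define NR_CPUS   16   /* example values from the message above */
#define TBUF_SIZE 32

/* Illustrative layout of the t_info area. */
struct t_buf_info {
    int16_t  tbuf_size;               /* number of pages per cpu         */
    int16_t  offset[NR_CPUS];         /* per-cpu offset into this area,  */
                                      /* assumed to be in uint32_t units */
    uint32_t mfn[NR_CPUS][TBUF_SIZE]; /* per-cpu mfn lists               */
};

/* Where cpu N's mfn list lives, per the calculation given in the message. */
static inline uint32_t *cpu_mfn_list(struct t_buf_info *ti, int cpu)
{
    return (uint32_t *)ti + ti->offset[cpu];
}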
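Building on the layout sketch above, here is a rough sketch of the tool-side
mapping flow ("the new way to map trace buffers"). It is a sketch under
assumptions, not the patch itself: get_tbuf_info() is a hypothetical stand-in
for whatever wrapper issues XEN_SYSCTL_TBUFOP_get_info, and the libxc calls
are used with the signatures they had around the Xen 4.0 era.

#include <stddef.h>
#include <sys/mman.h>   /* PROT_READ, PROT_WRITE */
#include <xenctrl.h>    /* xc_map_foreign_range(), xc_map_foreign_batch() */

/* Hypothetical wrapper around XEN_SYSCTL_TBUFOP_get_info: returns the mfn
 * and size (in bytes) of the t_info area; zero on success. */
int get_tbuf_info(int xc_handle, unsigned long *mfn, unsigned long *size);

/* Map cpu N's trace buffer pages; returns a virtual address or NULL. */
void *map_tbufs_for_cpu(int xc_handle, int cpu)
{
    unsigned long tinfo_mfn, tinfo_size;
    struct t_buf_info *ti;
    uint32_t *mfn_list;
    xen_pfn_t pfns[TBUF_SIZE];
    int i, npages;

    /* Get and map the t_info area; it is owned by Xen, hence DOMID_XEN. */
    if ( get_tbuf_info(xc_handle, &tinfo_mfn, &tinfo_size) )
        return NULL;
    ti = xc_map_foreign_range(xc_handle, DOMID_XEN, tinfo_size,
                              PROT_READ, tinfo_mfn);
    if ( ti == NULL )
        return NULL;

    /* Find this cpu's mfn list inside the t_info area. */
    mfn_list = cpu_mfn_list(ti, cpu);
    npages   = ti->tbuf_size;
    if ( npages > TBUF_SIZE )
        return NULL;    /* sketch assumes at most TBUF_SIZE pages per cpu */

    /* Widen the 32-bit mfns to xen_pfn_t and map them in one batch. */
    for ( i = 0; i < npages; i++ )
        pfns[i] = mfn_list[i];

    return xc_map_foreign_batch(xc_handle, DOMID_XEN, PROT_READ | PROT_WRITE,
                                pfns, npages);
}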
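Finally, a rough sketch of the shape the per-cpu locking described in the
implementation notes could take inside Xen. Again this is illustrative, not
the patch: the lock name (t_lock) and the emit function are placeholders,
showing only the idea of taking a cpu-local lock around record emission.

#include <xen/percpu.h>
#include <xen/spinlock.h>

/* One lock per cpu, guarding that cpu's trace buffer and bookkeeping.
 * (Initialised elsewhere, e.g. with spin_lock_init() at boot.) */
static DEFINE_PER_CPU(spinlock_t, t_lock);

static void emit_trace_record(/* event id, payload, ... */ void)
{
    spinlock_t *lock = &this_cpu(t_lock);
    unsigned long flags;

    /* Taken on every trace event; uncontended and cache-hot in the
     * common case, as noted in the message above. */
    spin_lock_irqsave(lock, flags);
    /* ... write the record into this cpu's buffer ... */
    spin_unlock_irqrestore(lock, flags);
}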
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel