RE: RE: Re: RE: Re: Re: [Xen-devel] when timer go back in dom0 save and restore or migrate, PV domain hung
 
 
Kevin, OK, I see we were talking about different time_resume functions ;-). The time_resume I mentioned is in xen/arch/x86/time.c, but the one you mentioned lives in dom0's kernel. OK, I also think we could modify something in dom0's kernel, which would also resolve this problem. Let me check which is better.
Thanks --James
>>> "Tian, Kevin" 08/11/27 13:37 >>>
No, time_resume is definitely invoked. Look at machine_reboot.c, which contains the whole path for s/r (save/restore) and lm (live migration).

"date" will change because, by default, the guest's wall clock is synced to the real one. Maybe independent_wallclock is something you want to start with; it is not handled at s/r for now.
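To make the "synced to the real one" point concrete, here is a minimal C sketch. It assumes the usual PV arrangement in which shared_info's wc_sec/wc_nsec hold the wall-clock time that corresponds to Xen system time zero (the do_settime() arithmetic in the patch at the end of this thread computes it that way), so a guest's time of day is that base plus its current system time. The struct, the function name, and the `independent` flag standing in for independent_wallclock are hypothetical simplifications, not XenLinux code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical, simplified view of the shared_info wall-clock base:
 * the wall-clock time that corresponds to Xen system time zero. */
struct sketch_wc_base {
    uint32_t wc_sec;
    uint32_t wc_nsec;
};

/* Time of day in ns since the epoch, as a guest would derive it.
 * independent == false: follow Xen's base, so any change to wc_sec/wc_nsec
 *   (for example dom0 running "date") shows up in the guest at once.
 * independent == true: keep the guest's own base (a stand-in for
 *   independent_wallclock) and ignore later changes in shared_info. */
uint64_t sketch_guest_tod_ns(const struct sketch_wc_base *xen_base,
                             bool independent, uint64_t own_base_ns,
                             uint64_t system_time_ns)
{
    assert(xen_base->wc_nsec < NSEC_PER_SEC); /* base is normalized */

    uint64_t base_ns = independent
        ? own_base_ns
        : (uint64_t)xen_base->wc_sec * NSEC_PER_SEC + xen_base->wc_nsec;
    return base_ns + system_time_ns;
}
```

With the base moved back by 100 s, a guest that follows Xen reports a time of day 100 s earlier, while one with its own base does not notice.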
  
Thanks, 
Kevin  
  
  F.Y.I
>>> "Tian, Kevin" 08/11/27 11:50 >>>
Sorry for a typo. I did mean domU instead of dom0. :-)
The point here is that time_resume will sync to the new system time and wall clock at restore, and thus the PV guest should be able to continue... Xen system time is not wall-clock time; it just counts up from power-up. As Keir points out, only its progress is used to drive internal jiffies.
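As an illustration of "only its progress is used to drive internal jiffies", here is a minimal C sketch of a tick handler that turns the delta of Xen system time into jiffies. It is only loosely modelled on the approach described in this thread; NS_PER_TICK (a 250 Hz tick), the struct, and the function name are hypothetical. Because the sketch accounts only the delta since the last processed time, the absolute value of system time never reaches jiffies, but if system time appears to go backwards, no tick is accounted until it catches up again.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_TICK 4000000LL /* hypothetical: HZ = 250 */

/* Hypothetical per-guest tick state. */
struct sketch_tick_state {
    int64_t  processed_system_time; /* system time already turned into ticks */
    uint64_t jiffies;
};

/* Called on each timer interrupt with the current Xen system time (ns). */
void sketch_timer_tick(struct sketch_tick_state *s, int64_t now_ns)
{
    int64_t delta = now_ns - s->processed_system_time;

    /* Account whole ticks only; a negative delta accounts nothing. */
    while (delta >= NS_PER_TICK) {
        delta -= NS_PER_TICK;
        s->processed_system_time += NS_PER_TICK;
        s->jiffies++;
    }
}
```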
--- Actually, save/restore or migrate will not call time_resume; this function may only be called in power saving.
  
Then what do you mean by "system time stop" here? TOD at user level, or do you observe within the kernel that Xen system time never changes?
--- If you run the "date" command in user mode, you will find that its output does not change until a time interval equal to the time delay (the amount the clock went back) has passed. You can still run applications that have little to do with time, such as vi, cd, etc., but if you run ping x.x.x.x you will see only one line of response and it never continues.

Thanks --James
  
  
  
    
Hi, yes, there was an earlier patch to fix the wc_sec/wc_nsec problem in xc_domain_restore.c, but it still missed something. When dom0 is constructed or a PV domain is restored, the guest OS reads the local wc_sec from Xen as its base time; wc_sec is initialized from CMOS data. There are some cases in which wc_sec changes. One is setting dom0's system time back, which changes dom0's time and makes wc_sec smaller for both the guest OS and Xen. We can do a simple test: start a PV domain, then change dom0's time, and you will find that the system time of the guest OS stops. That is because you changed wc_sec for both Xen and the guest OS.
This patch only considers the save/restore case. I am still not sure what the policy should be when dom0's system time goes back: what should VMs do? So I have added this case to this patch. By the way, Kevin, it is the guest OS that hangs, not dom0 ;-), and the duration of the hang is equal to the interval by which you set the time back in dom0, or by which the new machine you migrate to is behind.
Thanks -- James
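The arithmetic behind "setting dom0's time back makes wc_sec smaller" can be read from the do_settime() context lines in the patch at the end of this thread: the new wall-clock base is the requested time minus the current Xen system time. A minimal C sketch of just that step follows; the function name is hypothetical, and `x % 10^9` / `x / 10^9` stand in for Xen's do_div().

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical restatement of the wc_sec/wc_nsec computation:
 * base = requested wall-clock time - current Xen system time. */
void sketch_settime(uint32_t secs, uint32_t nsecs, uint64_t system_time_base,
                    uint32_t *wc_sec, uint32_t *wc_nsec)
{
    assert(nsecs < NSEC_PER_SEC);

    uint64_t x = (uint64_t)secs * NSEC_PER_SEC + nsecs - system_time_base;

    *wc_nsec = (uint32_t)(x % NSEC_PER_SEC);
    *wc_sec  = (uint32_t)(x / NSEC_PER_SEC);
}
```

For example, setting the clock to 1000 s at system time 50 s gives wc_sec = 950; setting it to 900 s at system time 60 s (when it would otherwise read 1010 s) gives wc_sec = 840, i.e. the base drops by exactly the 110 s the clock went back, and a guest following that base sees the same jump.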
>>> Keir Fraser 08/11/26 22:58 >>>
So what happens if someone changes wallclock using 'date'? That's basically kind of what will appear to happen when s/r occurs.
   -- Keir
On 26/11/08 14:32, "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
  
hrtimer supports two timer bases: CLOCK_MONOTONIC and CLOCK_REALTIME. wall_to_monotonic is only added in the former case; for the latter, TOD is used directly, per my reading. I did a quick search, and it looks like futex and ntp use CLOCK_REALTIME. There is also a vsyscall gate that can pass CLOCK_REALTIME from the caller.
  Thanks, Kevin
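A minimal C sketch of the two timer bases as Kevin describes them here and as Keir states in the message quoted below: the monotonic clock is xtime plus wall_to_monotonic, a step of the time of day moves xtime and shifts wall_to_monotonic the opposite way, and a CLOCK_REALTIME expiry is an absolute TOD value. The enum, struct, and function names are hypothetical simplifications, not the kernel's hrtimer code.

```c
#include <assert.h>
#include <stdint.h>

enum sketch_base { SKETCH_MONOTONIC, SKETCH_REALTIME };

/* Hypothetical clock state, in ns. */
struct sketch_clock {
    int64_t xtime;             /* time of day (TOD) */
    int64_t wall_to_monotonic; /* offset added to get the monotonic clock */
};

int64_t sketch_now(const struct sketch_clock *c, enum sketch_base b)
{
    return b == SKETCH_MONOTONIC ? c->xtime + c->wall_to_monotonic : c->xtime;
}

/* Time left until an absolute expiry on base b (0 if already expired). */
int64_t sketch_time_left(const struct sketch_clock *c, enum sketch_base b,
                         int64_t expiry)
{
    int64_t left = expiry - sketch_now(c, b);
    return left > 0 ? left : 0;
}

/* Step the TOD (e.g. "date"): xtime jumps and wall_to_monotonic absorbs the
 * jump, so the monotonic clock does not warp. */
void sketch_step_tod(struct sketch_clock *c, int64_t new_xtime)
{
    int64_t delta = new_xtime - c->xtime;

    c->xtime = new_xtime;
    c->wall_to_monotonic -= delta;
}
```

Stepping the TOD back by 100 s leaves a 1 s monotonic timer with 1 s to go, while a 1 s absolute CLOCK_REALTIME timer now has 101 s to go, which resembles the kind of stall reported in this thread.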
  
         
         
From: Keir Fraser [mailto:keir.fraser@xxxxxxxxxxxxx]
Sent: Wednesday, November 26, 2008 10:26 PM
To: Tian, Kevin; 'James Song'; xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-devel] when timer go back in dom0 save and restore or migrate, PV domain hung
hrtimers add wall_to_monotonic to xtime to get a timesource that doesn't (or shouldn't!) warp.
   -- Keir
On 26/11/08 14:20, "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
    
How about hrtimers? One mode is CLOCK_REALTIME, which uses getnstimeofday for the expiration. Once system time is changed, either locally or on a new machine, that expiration can't be adjusted. But I'm not sure whether it still makes sense to try hrtimers in a guest.
  Thanks Kevin
    
               
             
From: Keir Fraser [mailto:keir.fraser@xxxxxxxxxxxxx]
Sent: Wednesday, November 26, 2008 10:11 PM
To: Tian, Kevin; 'James Song'; xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-devel] when timer go back in dom0 save and restore or migrate, PV domain hung
The problem hasn't been fully explained, but I can say that PV guests expect system time to jump across s/r and deal with that. For example, Linux doesn't use Xen system time internally, but uses its progress to periodically update jiffies, which does not warp across s/r.
We have had problems corrupting wc_sec/wc_nsec in xc_domain_restore.c, but that was fixed some time ago.
 -- Keir
On 26/11/08 14:00, "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
      
This is not an s/r- or lm-specific issue. For example, system time can be changed even while a PV guest is running. Your patch only hacks the restore point once, and wc_sec can still be changed later when system time is changed on the fly again.
IIRC, a PV guest can catch up with a wall-clock change in its timer interrupt, and time_resume will sync the internally processed system time with the new system time after restore. But I'm not sure whether that's enough. Actually, the more interesting issue is the uptime difference. For example, a timer whose expiration was calculated from the previous system time may wait nearly forever if the uptimes of the two boxes differ a lot. But I think such an issue should have been considered already, e.g. with some user-tool assistance. I think Keir can comment better here.
BTW, do you happen to know what exactly dom0 hangs on? Some busy loop catching up with time, or a long delay until some critical timer expires?
  Thanks, Kevin
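Kevin's uptime point can be put in numbers with a minimal C sketch: a timer whose absolute expiry was computed against the old host's system time waits (expiry - now) on the new host, which can be huge if the new host has a much smaller uptime. Rebasing the expiry at restore keeps only the time that was left. Both functions are hypothetical illustrations, not existing Xen or Linux interfaces.

```c
#include <assert.h>
#include <stdint.h>

/* Time left until an absolute expiry, in system-time units (0 if expired). */
uint64_t sketch_remaining(uint64_t expiry, uint64_t now)
{
    return expiry > now ? expiry - now : 0;
}

/* Hypothetical restore-time fixup: carry over the time that was left on the
 * old host and re-express it relative to the new host's system time. */
uint64_t sketch_rebase_expiry(uint64_t expiry, uint64_t old_now, uint64_t new_now)
{
    return new_now + sketch_remaining(expiry, old_now);
}
```

With the old host at 1000 s of uptime and the new one at 10 s, a timer due 5 s later would otherwise wait 995 s on the new host; after rebasing it waits 5 s.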
      
                     
                 
From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of James Song
Sent: Tuesday, November 25, 2008 4:02 PM
To: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-devel] when timer go back in dom0 save and restore or migrate, PV domain hung
Hi,
I find that a PV domain hangs when we take these steps:
1. save a PV domain
2. change the system time of the PV domain back
3. restore the PV domain
or:
1. migrate a PV domain from machine A to machine B
2. the system time of machine B is behind that of machine A.
The problem is that wc_sec changes when the system time is changed in dom0, or when restoring on a machine whose system time is behind, but when restoring, Xen doesn't restore the wc_sec of shared_info from xenstore and uses the native one instead. So the guest OS hangs. This patch should fix the issue.
Thanks -- Song Wei
diff -r a5ed0dbc829f tools/libxc/xc_domain_restore.c
--- a/tools/libxc/xc_domain_restore.c	Tue Nov 18 14:34:14 2008 +0800
+++ b/tools/libxc/xc_domain_restore.c	Fri Nov 21 17:34:15 2008 +0800
@@ -328,6 +328,16 @@
 
     /* For info only */
     nr_pfns = 0;
+    //jsong@xxxxxxxxxx, james song
+    memset(&domctl, 0, sizeof(domctl));
+    domctl.domain = dom;
+    domctl.cmd    = XEN_DOMCTL_restoredomain;
+    frc = do_domctl(xc_handle, &domctl);
+    if ( frc != 0 )
+    {
+        ERROR("Unable to set flag of restore.");
+        goto out;
+    }
 
     if ( read_exact(io_fd, &p2m_size, sizeof(unsigned long)) )
     {
@@ -1120,6 +1130,8 @@
 
     /* restore saved vcpu_info and arch specific info */
     MEMCPY_FIELD(new_shared_info, old_shared_info, vcpu_info);
+    MEMCPY_FIELD(new_shared_info, old_shared_info, wc_nsec);
+    MEMCPY_FIELD(new_shared_info, old_shared_info, wc_sec);
     MEMCPY_FIELD(new_shared_info, old_shared_info, arch);
 
     /* clear any pending events and the selector */
diff -r a5ed0dbc829f xen/arch/x86/time.c
--- a/xen/arch/x86/time.c	Tue Nov 18 14:34:14 2008 +0800
+++ b/xen/arch/x86/time.c	Fri Nov 21 17:34:15 2008 +0800
@@ -689,7 +689,6 @@
     wmb();
     (*version)++;
 }
-
 void update_vcpu_system_time(struct vcpu *v)
 {
     struct cpu_time       *t;
@@ -703,7 +702,6 @@
 
     if ( u->tsc_timestamp == t->local_tsc_stamp )
         return;
-
     version_update_begin(&u->version);
 
     u->tsc_timestamp     = t->local_tsc_stamp;
@@ -713,14 +711,19 @@
 
     version_update_end(&u->version);
 }
-
 void update_domain_wallclock_time(struct domain *d)
 {
     spin_lock(&wc_lock);
+    if ( d->after_restore )
+    {
+        d->after_restore = 0;
+        goto out; //jsong@xxxxxxxxxx
+    }
     version_update_begin(&shared_info(d, wc_version));
     shared_info(d, wc_sec)  = wc_sec + d->time_offset_seconds;
     shared_info(d, wc_nsec) = wc_nsec;
     version_update_end(&shared_info(d, wc_version));
+out:
     spin_unlock(&wc_lock);
 }
 
@@ -751,7 +754,6 @@
     u64 x;
     u32 y, _wc_sec, _wc_nsec;
     struct domain *d;
-
     x = (secs * 1000000000ULL) + (u64)nsecs - system_time_base;
     y = do_div(x, 1000000000);
 
@@ -1050,7 +1052,6 @@
 struct tm wallclock_time(void)
 {
     uint64_t seconds;
-
     if ( !wc_sec )
         return (struct tm) { 0 };
 
diff -r a5ed0dbc829f xen/common/domctl.c
--- a/xen/common/domctl.c	Tue Nov 18 14:34:14 2008 +0800
+++ b/xen/common/domctl.c	Fri Nov 21 17:34:15 2008 +0800
@@ -24,7 +24,6 @@
 #include <asm/current.h>
 #include <public/domctl.h>
 #include <xsm/xsm.h>
-
 extern long arch_do_domctl(
     struct xen_domctl *op, XEN_GUEST_HANDLE(xen_domctl_t) u_domctl);
 
@@ -315,6 +314,16 @@
         ret = 0;
     }
     break;
+    case XEN_DOMCTL_restoredomain:
+    {
+        struct domain *d;
+        if ( (d = rcu_lock_domain_by_id(op->domain)) == NULL )
+            break;
+
+        d->after_restore = 1;
+        rcu_unlock_domain(d);
+        break;
+    }
 
     case XEN_DOMCTL_createdomain:
     {
diff -r a5ed0dbc829f xen/include/public/domctl.h
--- a/xen/include/public/domctl.h	Tue Nov 18 14:34:14 2008 +0800
+++ b/xen/include/public/domctl.h	Fri Nov 21 17:34:15 2008 +0800
@@ -61,6 +61,7 @@
 #define XEN_DOMCTL_destroydomain      2
 #define XEN_DOMCTL_pausedomain        3
 #define XEN_DOMCTL_unpausedomain      4
+#define XEN_DOMCTL_restoredomain     51
 #define XEN_DOMCTL_resumedomain      27
 
 #define XEN_DOMCTL_getdomaininfo      5
diff -r a5ed0dbc829f xen/include/xen/sched.h
--- a/xen/include/xen/sched.h	Tue Nov 18 14:34:14 2008 +0800
+++ b/xen/include/xen/sched.h	Fri Nov 21 17:34:15 2008 +0800
@@ -231,6 +231,7 @@
      * cause a deadlock. Acquirers don't spin waiting; they preempt.
      */
     spinlock_t hypercall_deadlock_mutex;
+    int after_restore; //jsong@xxxxxxxxxx
 };
 
 struct domain_setup_info
---------------------------------------------------------------------------------------------
Thanks -- Song Wei
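Kevin's objection that the patch "only hacks the restore point once" can be seen in a minimal C model of the after_restore logic in the patch. It is a hypothetical simplification: a single global stands in for Xen's wc_sec, while time_offset_seconds, wc_nsec, the version counter, and the locking are omitted, and the function name is made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for Xen's global wall-clock base (seconds only). */
uint32_t sketch_xen_wc_sec;

/* Hypothetical per-domain state: the flag set by XEN_DOMCTL_restoredomain
 * and the guest-visible copy of wc_sec in shared_info. */
struct sketch_domain {
    bool after_restore;
    uint32_t shared_wc_sec;
};

/* Mirrors the patched update_domain_wallclock_time(): the first update after
 * a restore is skipped (keeping the saved value), later ones overwrite it. */
void sketch_update_domain_wallclock(struct sketch_domain *d)
{
    if (d->after_restore) {
        d->after_restore = false;
        return;
    }
    d->shared_wc_sec = sketch_xen_wc_sec;
}
```

A domain restored with a saved wc_sec of 950 keeps it across the first update even if the host's base is 840, but the next update overwrites it with 840, so a later time change on the host still reaches the guest.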
 
 
 
  
 
 
  
 
 
  
   
 _______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
 
 
    