[time-nuts] Cheap jitter measurements

J Grizzard elfchief-timenuts at lupine.org
Wed Apr 11 22:28:55 EDT 2018


It's worth noting that you can get rid of a /lot/ of the variance on a 
modern linux box:

1) Set the CPU to run at the same speed at all times (generally "max 
performance" but which way you do it doesn't really matter)
2) Set processor masks so that no process other than your timing code 
runs on a core of your choice (see the sched_setaffinity() sketch further 
down). On hyperthreaded processors, make sure nothing is scheduled on the 
other 'half' of that core.
3) Set your interrupts to not be scheduled onto that core
4) Make sure your timing code fits in the L1 cache
5) When possible, make sure you don't conditionally branch. That last 
one means that instead of doing something like this:

while true {
   if x < y {
     continue loop
   } else {
     write to hardware
   }
}

You do something more like:

while true {
   compare x to y
   conditional mov 1 to hardware register on x gte y
   conditional mov 0 to hardware register on x lt y
}

(and if possible, write to memory-mapped hardware pages, rather than 
making calls into the kernel)

This guarantees both that a) the latency of writing to hardware is 
consistent on every loop pass (though hardware-induced jitter isn't), and 
b) there are no branch mispredicts, because there are no conditional 
branches -- conditional move instructions take a constant time to 
execute (plus or minus memory access latency).
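
To make that concrete, here's a rough C sketch of the same idea. The 
physical address, TICKS_PER_SEC, and the use of the TSC are placeholders 
for whatever your hardware and deadline bookkeeping actually look like, 
and error handling is omitted:

/* Sketch only: MMIO_PHYS and TICKS_PER_SEC are made-up placeholders,
   and open()/mmap() failures aren't checked. */
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <x86intrin.h>                       /* __rdtsc(), x86 only */

#define MMIO_PHYS     0xfed00000UL           /* placeholder register address */
#define TICKS_PER_SEC 1600000000ULL          /* placeholder TSC rate */

int main(void)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    volatile uint32_t *reg = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                  MAP_SHARED, fd, MMIO_PHYS);

    uint64_t deadline = __rdtsc() + TICKS_PER_SEC;

    for (;;) {
        uint64_t now = __rdtsc();
        /* The store happens on every pass; only the value written depends
           on the comparison, so there is no branch to mispredict. */
        uint32_t fire = (uint32_t)(now >= deadline);
        *reg = fire;
        /* Advance the deadline branchlessly as well. */
        deadline += (uint64_t)fire * TICKS_PER_SEC;
    }
}

Whether the compiler actually emits a cmov/setcc rather than a branch is 
worth checking in the disassembly, but at the source level there is 
nothing left to mispredict.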

This basically removes the entire kernel, any other processes, and shared 
CPU resources from the picture, except for those times when you have no 
choice but to touch the memory bus and such. Otherwise, your code will 
just sit there on its own core doing its own thing, nothing will 
interrupt it, and most sources of unknown jitter are removed.
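
The scheduling side of this (steps 1 through 3 above) can be handled 
partly from inside the program; a minimal sketch using the Linux-specific 
sched_setaffinity() call, with the core number as an example, and with 
the cpufreq governor, IRQ affinity, and the hyperthread sibling still 
handled outside the program via sysfs and /proc/irq:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Pin the calling thread to one core.  Core 3 is only an example; ideally
   it's also a core you've removed from the general scheduler (isolcpus=)
   and steered interrupts away from (/proc/irq/<n>/smp_affinity). */
static int pin_to_core(int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return sched_setaffinity(0, sizeof(set), &set);   /* 0 = calling thread */
}

int main(void)
{
    if (pin_to_core(3) != 0)
        perror("sched_setaffinity");
    /* ... timing loop goes here, now confined to core 3 ... */
    return 0;
}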

(It's not perfect, but it's probably the closest you'll get on a PC 
without specialized hardware. Though I _do_ wonder what could be done 
with something like the Intel i210AT chips on boards like the apu2, 
which can do hardware PPS out and hardware event timestamping...)
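
For what it's worth, the kernel exposes that kind of hardware event 
timestamping through the PTP hardware clock interface, so something along 
these lines ought to read edge timestamps off one of the i210's SDP pins. 
The /dev/ptp0 node and pin index 0 here are guesses, and the details 
depend on the igb driver and how the pin is wired:

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/ptp_clock.h>

int main(void)
{
    int fd = open("/dev/ptp0", O_RDWR);   /* assumed device node */

    /* Ask for timestamps of rising edges on external-timestamp channel 0
       (the pin may first need to be assigned to that function, e.g. via
       the PTP_PIN_SETFUNC ioctl, depending on driver and kernel). */
    struct ptp_extts_request req = { .index = 0,
                                     .flags = PTP_RISING_EDGE | PTP_ENABLE_FEATURE };
    if (ioctl(fd, PTP_EXTTS_REQUEST, &req) < 0)
        perror("PTP_EXTTS_REQUEST");

    for (;;) {
        struct ptp_extts_event ev;
        if (read(fd, &ev, sizeof(ev)) == sizeof(ev))
            printf("edge at %lld.%09u\n", (long long)ev.t.sec, ev.t.nsec);
    }
}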

-j

On 4/11/2018 4:01 PM, Hal Murray wrote:
> kb8tq at n1k.org said:
>> Except that’s not the way most timers run. The silicon needed to get a
>> programmable divider to work at 2.4 GHz is expensive. If you dig into the
>> hardware descriptions,  the clock tree feeds something much slower to the
>> “top end” of the typical timer in a CPU or MCU. The exception is the high
>> perf timers in some of the Intel chips.  There the issue is getting them to
>> relate to anything “outside” the chip.
> I think I got started in this area back in the early DEC Alpha days.  They
> had a register that counted raw clock cycles.  Simple.  I got stuck thinking
> that was the obvious/clean way to do things.
>
> Many thanks for giving me a poke to go learn more about this area.
>
> That was back before battery operation was as interesting as it is today.  I
> suspect power is more likely the critical factor.  Half the power goes into
> the low order bit, so counting by 4 every 4th cycle rather than 1 every cycle
> saves 3/4 of the power.
>
>
>> That may be what the kernel does, but it implements the result as a drop /
>> add to a counter.
> If the source of time is a register counting CPU clock ticks, and the CPU
> clock (2 or 3 GHz) is faster than the resolution of the clock (1 ns) it will
> be hard to see any drop/add.  However, if the time register is significantly
> slower, then the drop/add is easy to spot.  But all that is lost in the noise
> of cache misses and such.
>
> Here is a histogram from an Intel Atom running at 1.6 GHz.
>
> First pass, using rpcc.
>      cycles      Hits
>          24     86932
>          36    904825
>          48      8011
>          60       122
>          72         1
>         144        11
> ...
> So it looks like the cycle counter gets bumped by 12.  That's a strange
> number.  I suspect it's tangled up with changing the clock speed to save
> power.  There are conflicting interests in this area.  If you want to keep
> time, you need a register that ticks at a constant rate as you change speed.
> If you are doing performance analysis, you want a register that counts cycles
> at whatever speed the CPU is running.  Or maybe I'm confused.
>
> Second pass, using clock_gettime.
>        nSec      Hits
>         698         2
>         768         5
>         769         2
>         838         3
>         908         2
>         977         1
>         978         3
>        1047    237102
>        1048    383246
>        1117    204072
>        1118    172490
>        1187       275
>        1188       135
>        1257       263
>        1258        47
>        1326         7
>        1327       216
> ...
> The clock seems to be ticking in 70ns steps.  That doesn't match 12 clock
> cycles so I assume they are using something else.
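>
> (A loop like the following, reading the clock back to back and
> histogramming the deltas, is enough to produce tables like these; the
> clock ID, bin width, and pass count here are guesses:
>
>   #include <stdio.h>
>   #include <time.h>
>
>   #define BINS 4096                          /* 1 ns bins */
>
>   int main(void)
>   {
>       static long hist[BINS];
>       struct timespec a, b;
>
>       for (long i = 0; i < 1000000; i++) {
>           clock_gettime(CLOCK_MONOTONIC, &a);
>           clock_gettime(CLOCK_MONOTONIC, &b);
>           long d = (b.tv_sec - a.tv_sec) * 1000000000L
>                  + (b.tv_nsec - a.tv_nsec);  /* delta in ns */
>           if (d >= 0 && d < BINS)
>               hist[d]++;
>       }
>       for (int d = 0; d < BINS; d++)
>           if (hist[d])
>               printf("%8d %10ld\n", d, hist[d]);
>       return 0;
>   }
> )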
>
> From another system:
> Second pass, using clock_gettime.
>        nSec      Hits
>          19     45693
>          20    347538
>          21    591129
>          22     15284
>          23        63
>          24        34
>          25        32
> ...
> Note that this is 50 times faster than the previous example.
>
> I haven't figured out the kernel and library software for reading the clock.
> There is a special path for some functions like reading the clock that avoids
> the overhead of getting in/out of the kernel.  I assume there is some shared
> memory.
>    https://en.wikipedia.org/wiki/VDSO
>
> Again, thanks Bob.
>
> TICC arrived today.
