Sun 07 of Sep, 2008 [09:49 UTC]

voip-info.org


Asterisk new jitterbuffer

Created by: stevek, last modification on Wed 19 of Mar, 2008 [19:03 UTC] by JustRumours

Asterisk 1.4 comes with a new adaptive, general jitterbuffer for SIP and IAX. More details are available here, written by Russell. The CLI command "iax2 show netstats" can help diagnose jitterbuffer issues.

What is a Jitterbuffer

Basics

A jitterbuffer is used to compensate for jitter. Jitter describes the variation in relative lateness between network packets. Mild jitter (less than one frame's worth) is usually not a problem. However, larger jitter can, without a jitterbuffer, result in gaps in the audio. A jitterbuffer can also compensate for out-of-order delivery, which can happen when there are multiple network paths.

  • A more in-depth overview is available here

Asterisk Jitterbuffers

Asterisk currently has a jitterbuffer in chan_sip and chan_iax2, as well as in chan_zap. Asterisk 1.2 had no jitterbuffer in the RTP-based channels (e.g. chan_sip).

chan_local and Option 'j'

The Asterisk jitterbuffer for IAX and SIP in Asterisk 1.4 (and most probably 1.6 as well) only works for calls that bridge two channels; it does not apply to calls into applications such as MeetMe() or Record(). However, a workaround is available by routing the call through a Local channel with the 'j' option:

 [some-context]
 ; Local channel options: n = don't optimize the Local channel away, j = enable the jitterbuffer on it
 exten => 123,1,Dial(Local/124@some-context/nj)
 exten => 124,1,MeetMe(some-room,dM)


This article is designed to help design a new jitterbuffer implementation for Asterisk and iaxclient (libiax2). Contributions are welcome.
Article Initial Author: Steve Kann <SteveK@SteveK.com>
Also contributing: Stephen Davies <steve@daviesfam.org>

Issues with the old Asterisk 1.2 jitterbuffer implementation

Asterisk's old IAX2 jitterbuffer is used to correct timing of frames as they arrive. The timestamps of the incoming frames - set by the sender - are used to determine when the frames should be processed. The issues are as follows:
  • The dejittering is arguably in the wrong place. If the incoming frames will be forwarded over another VoIP link (whether IAX, SIP, MGCP etc.), it may not be necessary to dejitter. Provided we maintain the timestamps, we can leave the dejittering to the final target system. Similarly, some applications may not require the frames to be dejittered. Examples:
    • Bridged calls:
      • Audio frame dejitter can probably be handled by the remote agent, for any VoIP <-> VoIP bridge
    • Applications:
      • "Conferencing applications" (app_meetme, app_conference)
        • These applications essentially act as audio agents for incoming audio, so they require full, real-time dejittering on input sources.
      • "Recording" applications (app_voicemail, app_record, etc)
        • These applications still require some measure of jitterbuffering, if only to be able to do PLC and reordering. The timing of frames coming to these applications is not important, but some jitterbuffering is needed in order to reorder frames and to do interpolation (PLC). So a jitterbuffer without shrinking would be fine here. The needs of different applications might be communicated to the jitterbuffer as hints of some kind (e.g. JITTERBUFFER_MODE_REALTIME, JITTERBUFFER_MODE_RECORD).
  • Dejittering is "clocked" by the frames that arrive. This makes lost frame reconstruction impossible (or at least difficult) in the current design.
  • Lost frames can't be distinguished from the case where the sender stops sending.
  • Each frame is retimed once when it arrives. After it's queued for future delivery, it's not touched again, so when we resize the jitterbuffer, it's hard to avoid audible gaps. Keeping the frames queued in a structure of our own would permit some audio-stretching tricks to cover up the jitterbuffer size change.
  • Trunking distorts frame timestamps by up to "trunkfreq" milliseconds (usually 20 ms). Accurate timestamps are critical to the operation of the current jitterbuffer, so trunking and dejittering are currently incompatible.
    • What are the exact timing accuracy and resolution requirements for incoming frames? I'm (stevek) not entirely convinced that we can't easily deal with frames whose timestamps are off by ±20 ms, as long as they are monotonically increasing and correct over the long term.

Progress


Design goals for a new jitterbuffer implementation

  • Channel Independent (new)
    • Applicable to all network protocols, especially IAX2 and SIP

  • Able to be used in libiax2/iaxclient as well as in Asterisk proper
    • libiax2 presently uses a variant of chan_iax2's jitterbuffer implementation
    • The implementation of the jitterbuffer should be LGPL-compatible for this reason.

  • Features
    • Re-order misordered frames
    • Deal with remote clock skew (i.e. sender/receiver clocks operate at different speeds).
      • It might be cleaner to keep this outside of the jitterbuffer altogether. We can probably adjust the "rxcore" timestamp (effectively, our local idea of the remote time) as we receive packets, with a low-pass filter; this way we will gradually adjust our local reference time for received packets based on what we get from the remote side. The low-pass filter will make this adjustment independent of jitter.

    • Clean separation of the core dejitter function, and the "meta level" function of tuning the amount of jitter buffer needed based on seen jitter.
      • Jitterbuffer sizing algorithm?
        • The present Asterisk/IAX jitterbuffer uses a "loss control" sizing mechanism. In CiteSeer (see references below), there are other algorithms which may be more appropriate. In particular, there is an interesting MOS-maximizing algorithm, which they call E-MOS, explored in http://tinyurl.com/6ylww. Compared to a loss-control algorithm, what it seems to do in practice is allow lower loss in low-jitter situations and higher loss in high-jitter situations.


    • Properly handle multiple streams
      • Audio, Video, Control (DTMF, Text, Hangup, etc.).
      • It is certainly necessary for audio and video frames to be jitterbuffered together in order to preserve synchronization, but it is also important for DTMF and many control frames to be buffered in sync. Consider the case where you have a 1-second buffer, leave a voicemail, and then hang up: if the HANGUP frame isn't buffered, you will lose the last second of your message.

    • Automatic sizing

    • Shrink rate accelerated during silence: no noticeable drops.
    • PLC (Packet Loss Concealment): Tell the channel or translator attached to the channel to interpolate a frame, when a frame is lost, or the jitterbuffer needs to grow.

    • DTX (discontinuous transmission): deal with senders which don't necessarily send frames continuously. Perhaps we should specify that they must send a CNG frame to indicate the beginning of a silent period; otherwise, it is impossible to distinguish silence from a lost frame.

    • Performance statistics:
      • Report current jitterbuffer size, target jitterbuffer size, jitter measurements (last 5 min, last 10sec?), number of lost frames, number of frames interpolated due to jb growth, number of frames dropped due to jb shrinking, others?
        • It would be nice to return these statistics to the other side: Mark seems to think that sending this type of information in LAGRQ packets (for IAX2) would be good. For RTP channels, they could report this via RTCP (need to read specs to see what RTCP usually provides).

  • Tunables
    • History length: How much historical jitter to consider in deciding the proper length of the jitterbuffer




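#define JB_OK           0
#define JB_EMPTY        1
#define JB_NOFRAME      2
#define JB_DROP         3
#define JB_INTERP       4       /* returned by jb_get(): interpolate a frame */

/* history buffer sizes used by struct jitterbuf below (illustrative values; tune as needed) */
#define JB_HISTORY_SECONDS      60
#define JB_HISTORY_MAXPERSEC    50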

typedef struct jb_info {
        /* statistics */
        long frames_in;         /* number of frames input to the jitterbuffer.*/
        long frames_out;        /* number of frames output from the jitterbuffer.*/
        long frames_late;       /* number of frames which were too late, and dropped.*/
        long frames_lost;       /* number of missing frames.*/
        long frames_cur;        /* number of frames presently in jb, awaiting delivery.*/
        long jitter;            /* jitter measured within current history interval*/
        long length;            /* the present jitterbuffer delay*/
        long drift;             /* drift in ms between received frame clock, and local clock */
        long last_ts;           /* the last ts that was read from the jb */
} jb_info;

typedef struct jb_frame {
        void *data;             /* the frame data */
        long ts;        /* the relative delivery time expected */
        long ms;        /* the time covered by this frame, in ms (as passed to jb_put) */
        struct jb_frame *next, *prev;
} jb_frame;

typedef struct jitterbuf {
        jb_info info;

        /* history */
        long hist_long[JB_HISTORY_SECONDS];     /* history buckets */
        long hist_short[JB_HISTORY_MAXPERSEC];   /* short-term history */
        long hist_ts;                           /* effective start time of last bucket */
        int  hist_shortcur;                     /* current index into short-term history */

        jb_frame *frames;               /* queued frames */
        jb_frame *free;                 /* free frames (avoid malloc?) */
} jitterbuf;


/* new jitterbuf */
jitterbuf *             jb_new(void);

/* destroy jitterbuf */
void                    jb_destroy(jitterbuf *jb);

/* queue a frame.  data = frame data; timings are in ms: ms = length of the frame (for voice),
 * ts = the frame's timestamp as set by the sender, now = current local time */
int                     jb_put(jitterbuf *jb, void *data, long ms, long ts, long now);

/* get a frame for time now.  The return value is one of:
 * JB_OK:      you've got a frame!
 * JB_DROP:    here's an audio frame you should just drop; ask me again for this time.
 * JB_NOFRAME: there's no frame scheduled for this time.
 * JB_INTERP:  please interpolate an audio frame for this time (either we need to grow, or a frame was lost).
 * JB_EMPTY:   the jb is empty.
 */
int                     jb_get(jitterbuf *jb, jb_frame *frame, long now);

/* when is the next frame due out (0=EMPTY)
 * This value may change as frames are added (esp non-audio frames)
 */
long                    jb_next(jitterbuf *jb);

/* get jitterbuf info: only "statistics" may be valid */
int                     jb_getinfo(jitterbuf *jb, jb_info *stats);

/* set jitterbuf info: only "settings" may be honored */
int                     jb_setinfo(jitterbuf *jb, jb_info *settings);




Discussion

  • Which translators support interpolation?
    • Speex, iLBC, and G.729 (?) support interpolation natively.
    • For LPCM, and perhaps uLaw/aLaw, we can probably write a simple interpolator which (for example) plays the previous frame's audio but reduces its power by 10 dB or so.
    • In experiments, I've also found that GSM is relatively happy in the case of a single interpolation if you just feed it the previous compressed frame again, when you want to interpolate; it's not ideal, but it is orders of magnitude better than playing silence.
    • We'll need to define an API for interpolation, and add that to the translators (it might be as simple as passing a 0 byte frame to the translator, but then how do you know how much you'd get back?).
    • In a more general design, we should also have an "interpolation" frame type, which could be passed to applications that presently handle translation internally, such as app_conference.

  • Interpolating a frame that is lost is different from interpolating a frame because of the jitterbuffer changing size
    • This makes a difference when using codecs that carry state from frame to frame. Where we have lost a frame we need to use the codec's PLC facility to interpolate it. Where we want to "invent" extra audio to cover up a jitterbuffer size change, we will need something else that just makes audio without changing the codec state.
      • We'll have to actually test this, and see how it works, with the codecs' native interpolators, and also with any interpolators we make for the simple codecs. I (stevek) suspect that using the codecs' interpolators in the case of jitter-buffer growth would be fine.

  • Where we have a chain of codec translations, can we choose where to do PLC?
    • For example, can we propagate a "missing frame" token through the translations, until we hit a codec that can do the interpolation? If we make the missing frame token an IAX frame then it could theoretically even pass over IAX links and get handled on another box.
      • We should probably try to do the PLC with the first translator, as it will best be able to interpolate.

  • When exactly will we be able to skip dejittering?
    • I think that we should only disable dejittering when a call is bridged to a channel which can accept jittered input (i.e. other IP channels). I think all applications will want some kind of dejittering, even if the exact parameters are different than those that need "real-time" dejittering.

  • How to handle scheduling?
    • The present IAX2 jitterbuffer operates when packets are received, scheduling them for delivery at some point in the future. This is somewhat limiting, because it makes it impossible for the jitterbuffer to alter the delivery time of not-yet-delivered packets.
    • An alternate mechanism would be for the jitterbuffer implementation to hold on to frames until they should be delivered. This mechanism would essentially run in parallel with Asterisk's normal scheduling mechanism: the jitterbuffer would schedule itself to run when it expects the next frame to be available for delivery, and when it runs, it would (if it has a frame to deliver) schedule that frame to be delivered immediately. This would then cause apps waiting on the channel, etc., to wake up and get the next frame.

Reference material


Other Implementations



Comments


Jitterbuffer problems with DTMF

by lschweiss, Friday 06 of January, 2006 [15:26:02 UTC]
What is up with DTMF becoming unreliable on IAX2 when the jitterbuffer is turned on? Are there any solutions to make it work like it is supposed to when the jitterbuffer is on?

This problem seems to have been around for a long time without much attention. Do the majority of Asterisk users not make IAX2 connections across the Internet, or do they just not build IVR menus on these connections?

Re: Re: jitter on trunking

by kleptog, Tuesday 05 of April, 2005 [14:09:01 UTC]
The main reason I can think of is handling dejittering of fax/data calls. Fax machines can handle data errors (frame loss) by organising retransmission, but they are not generally set up to handle a continuously changing round-trip time. A variable jitterbuffer means a variable RTT.

Re: jitter on trunking

by stevek, Wednesday 27 of October, 2004 [21:41:04 UTC]
I'd imagine that the de-jittering would happen _after_ the frames were "de-trunked", and that in this case it should work fine.

I'm not sure why you'd want a fixed-size jitterbuffer; having a fixed size would only be a band-aid for a broken automatic sizing methodology.



jitter on trunking

by , Tuesday 12 of October, 2004 [15:09:32 UTC]
Hi

Just wondering if this will work well for trunking of calls as well. I would like to suggest a fixed size jitter buffer option to add to your list of features, as this would be useful in terms of trunking as well.

Regards
Clive