Asterisk new jitterbuffer

Asterisk 1.4 offers two jitterbuffers to choose from: the older adaptive jitterbuffer, for IAX channels and for channels that use RTP, and the newer generic jitterbuffer, for channels that use RTP only. The adaptive jitterbuffer is in fact also a generic one. Russel has written more details about the generic fixed-size buffer here.

The CLI command iax2 show netstats can help diagnose adaptive jitterbuffer issues on IAX channels. The generic jitterbuffer writes its log to /tmp if the logging option has been turned on.

What is a Jitterbuffer?

A jitterbuffer is used to compensate for jitter. Jitter describes the variation in relative lateness between network packets. Mild jitter (less than one frame's worth) is usually not a problem. Larger jitter, however, can result in gaps in the audio if there is no jitterbuffer. A jitterbuffer can also compensate for out-of-order delivery, which can happen when packets travel over multiple network paths.

  • More in-depth overview here

Jitter and packet loss can be measured using RTCP. Because jitter is difficult to measure directly, packet loss is the better benchmarking tool: a jitterbuffer essentially exists to prevent late packets from being dropped.

Asterisk 1.4 Jitterbuffers

Asterisk currently has a jitterbuffer in chan_sip and chan_iax2, as well as in chan_zap. In Asterisk 1.2 there was no jitterbuffer in the RTP-based channels (e.g. chan_sip).

chan_local and Option 'j' (in 1.6 with backport for 1.4)

The Asterisk jitterbuffer for IAX and SIP in Asterisk 1.4 (and most probably 1.6 as well) only works for calls that bridge two channels; it does not apply to calls into applications such as MeetMe() or Record(). However, the Local channel construct together with the 'j' option provides a workaround:

exten => 123,1,Dial(Local/124@some-context/nj)
exten => 124,1,MeetMe(some-room,dM)

This article was written to help design a new jitterbuffer implementation for Asterisk 1.2 and iaxclient (libiax2).
Article initial author: Steve Kann
Also contributing: Stephen Davies

Issues with the old Asterisk 1.0 jitterbuffer implementation

Asterisk's old IAX2 jitterbuffer corrects the timing of frames as they arrive. The timestamps of incoming frames, set by the sender, are used to determine when the frames should be processed. The issues are as follows:
  • The dejittering is arguably in the wrong place. If the incoming frames will be forwarded over another VoIP link (whether IAX, SIP, MGCP etc.), it may not be necessary to dejitter them. Provided we maintain the timestamps, we can leave the dejittering to the final target system. Similarly, some applications may not require the frames to be dejittered. Examples:
    • Bridged calls:
      • Audio frame dejittering can probably be handled by the remote agent for any VoIP <-> VoIP bridge
    • Applications:
      • "Conferencing applications" (app_meetme, app_conference)
        • These applications essentially act as audio agents for incoming audio, so they require full, real-time dejittering on input sources.
      • "Recording" applications (app_voicemail, app_record, etc)
        • These applications still require some jitterbuffering, if only to reorder frames and to do interpolation (PLC). The timing of frames delivered to these applications is not important, so a jitterbuffer that never shrinks would be fine here. The needs of different applications might be communicated to the jitterbuffer as hints of some kind (e.g. JITTERBUFFER_MODE_REALTIME, JITTERBUFFER_MODE_RECORD).
  • Dejittering is "clocked" by the frames that arrive. This makes lost frame reconstruction impossible (or at least difficult) in the current design.
  • Lost frames can't be distinguished from the case where the sender stops sending.
  • Each frame is retimed once when it arrives. After it's queued for future delivery, it's not touched again. So when we resize the jitterbuffer, it's hard to avoid audible gaps. Keeping the frames queued in a structure of our own would permit some audio-stretching tricks to cover up the jitterbuffer size change.
  • Trunking distorts frame timestamps by up to "trunkfreq" milliseconds (usually 20 ms). Accurate timestamps are critical to the operation of the current jitterbuffer, so trunking and dejittering are currently incompatible.
    • What are the exact timing accuracy and resolution requirements for incoming frames? I'm (stevek) not entirely convinced that we can't easily deal with frames whose timestamps are off by ±20 ms, as long as they are monotonically increasing and correct over the long term.


Design goals for a new adaptive jitterbuffer implementation in Asterisk 1.2

  • Channel-independent (new)
    • Applicable to all network protocols, especially IAX2 and SIP

  • Able to be used in libiax2/iaxclient as well as in Asterisk proper
    • libiax2 presently uses a variant of chan_iax2's jitterbuffer implementation
    • The implementation of the jitterbuffer should be LGPL-compatible for this reason.

  • Features
    • Re-order misordered frames
    • Deal with remote clock skew (i.e. sender/receiver clocks operate at different speeds).
      • It might be cleaner to keep this outside of the jitterbuffer altogether. We can probably adjust the "rxcore" timestamp (effectively, our local idea of the remote time) as we receive packets, with a low-pass filter; this way we will gradually adjust our local reference time for received packets based on what we get from the remote side. The low-pass filter will make this adjustment independent of jitter.

    • Clean separation of the core dejitter function, and the "meta level" function of tuning the amount of jitter buffer needed based on seen jitter.
      • Jitterbuffer sizing algorithm?
        • The present Asterisk/IAX jitterbuffer uses a "loss control" sizing mechanism. Other algorithms that may be more appropriate have been developed and are available on CiteSeer (see references below). In particular, one paper explores an interesting MOS-maximizing algorithm, which its authors call E-MOS. Compared with a loss-control algorithm, it seems in practice to allow lower loss in low-jitter situations and higher loss in high-jitter situations.

    • Properly handle multiple streams
      • Audio, Video, Control (DTMF, Text, Hangup, etc.).
      • Audio and video frames certainly need to be jitterbuffered together in order to preserve synchronization, but it is also important for DTMF and many control frames to be buffered in sync. Consider the case where you have a 1 second buffer, leave a voicemail, and then hang up: if the HANGUP frame isn't buffered, you will lose the last second of your message.

    • Automatic sizing

    • Shrink rate accelerated during silence, so that drops are not noticeable.
    • PLC (Packet Loss Concealment): Tell the channel or translator attached to the channel to interpolate a frame, when a frame is lost, or the jitterbuffer needs to grow.

    • DTX (Discontinuous Transmission): deal with senders which don't necessarily send frames continuously. Perhaps we should require them to send a CNG frame to indicate the beginning of a silent period; otherwise, it is impossible to distinguish silence from a lost frame.

    • Performance statistics:
      • Report current jitterbuffer size, target jitterbuffer size, jitter measurements (last 5 min, last 10sec?), number of lost frames, number of frames interpolated due to jb growth, number of frames dropped due to jb shrinking, others?
        • It would be nice to return these statistics to the other side: Mark seems to think that sending this type of information in LAGRQ packets (for IAX2) would be good. For RTP channels, they could report this via RTCP (need to read specs to see what RTCP usually provides).

  • Tunables
    • History length: How much historical jitter to consider in deciding the proper length of the jitterbuffer

  • API [header-in-progress]

#define JB_OK           0
#define JB_EMPTY        1
#define JB_NOFRAME      2
#define JB_DROP         3
#define JB_INTERP       4

/* history sizes (assumed values; this draft does not specify them) */
#define JB_HISTORY_SECONDS      60
#define JB_HISTORY_MAXPERSEC    50

typedef struct jb_info {
        /* statistics */
        long frames_in;         /* number of frames input to the jitterbuffer.*/
        long frames_out;        /* number of frames output from the jitterbuffer.*/
        long frames_late;       /* number of frames which were too late, and dropped.*/
        long frames_lost;       /* number of missing frames.*/
        long frames_cur;        /* number of frames presently in jb, awaiting delivery.*/
        long jitter;            /* jitter measured within current history interval*/
        long length;            /* the present jitterbuffer delay*/
        long drift;             /* drift in ms between received frame clock, and local clock */
        long last_ts;           /* the last ts that was read from the jb */
} jb_info;

typedef struct jb_frame {
        void *data;             /* the frame data */
        long ts;                /* the relative delivery time expected */
        long ms;                /* the time covered by this frame, in ms */
        struct jb_frame *next, *prev;
} jb_frame;

typedef struct jitterbuf {
        jb_info info;

        /* history */
        long hist_long[JB_HISTORY_SECONDS];     /* history buckets */
        long hist_short[JB_HISTORY_MAXPERSEC];  /* short-term history */
        long hist_ts;                           /* effective start time of last bucket */
        int  hist_shortcur;                     /* current index into short-term history */

        jb_frame *frames;               /* queued frames */
        jb_frame *free;                 /* free frames (avoid malloc?) */
} jitterbuf;

/* new jitterbuf */
jitterbuf *             jb_new(void);

/* destroy jitterbuf */
void                    jb_destroy(jitterbuf *jb);

/* queue a frame data=frame data, timings (in ms): ms=length of frame (for voice), ts=ts, now=now */
int                     jb_put(jitterbuf *jb, void *data, long ms, long ts, long now);

/* get a frame for time now.  return value is one of
 * JB_OK:  You've got frame!
 * JB_DROP: Here's an audio frame you should just drop.  Ask me again for this time.
 * JB_NOFRAME: There's no frame scheduled for this time.
 * JB_INTERP: Please interpolate an audio frame for this time (either we need to grow, or there was a lost frame).
 * JB_EMPTY: The jb is empty.
 */
int                     jb_get(jitterbuf *jb, jb_frame *frame, long now);

/* when is the next frame due out (0=EMPTY)
 * This value may change as frames are added (esp non-audio frames)
 */
long                    jb_next(jitterbuf *jb);

/* get jitterbuf info: only "statistics" may be valid */
int                     jb_getinfo(jitterbuf *jb, jb_info *stats);

/* set jitterbuf info: only "settings" may be honored */
int                     jb_setinfo(jitterbuf *jb, jb_info *settings);


  • Which translators support interpolation?
    • Speex, iLBC, and G729(??) support interpolation natively.
    • For LPCM, and perhaps uLaw/aLaw, we can probably write a simple interpolator which, for example, plays the previous frame's audio but reduces its power by 10 dB or so.
    • In experiments, I've also found that GSM is relatively happy in the case of a single interpolation if you just feed it the previous compressed frame again, when you want to interpolate; it's not ideal, but it is orders of magnitude better than playing silence.
    • We'll need to define an API for interpolation, and add that to the translators (it might be as simple as passing a 0 byte frame to the translator, but then how do you know how much you'd get back?).
    • In a more general design, we should also have an "interpolation" frame type, which could be passed to applications that presently handle translation internally, such as app_conference.

  • Interpolating a frame that is lost is different from interpolating a frame because of the jitterbuffer changing size
    • This makes a difference when using codecs that carry state from frame to frame. Where we have lost a frame we need to use the codec's PLC facility to interpolate it. Where we want to "invent" extra audio to cover up a jitterbuffer size change, we will need something else that just makes audio without changing the codec state.
      • We'll have to actually test this, and see how it works, with the codecs' native interpolators, and also with any interpolators we make for the simple codecs. I (stevek) suspect that using the codecs' interpolators in the case of jitter-buffer growth would be fine.

  • Where we have a chain of codec translations, can we choose where to do PLC?
    • For example, can we propagate a "missing frame" token through the translations, until we hit a codec that can do the interpolation? If we make the missing frame token an IAX frame then it could theoretically even pass over IAX links and get handled on another box.
      • We should probably try to do the PLC with the first translator, as it will best be able to interpolate.

  • When exactly will we be able to skip dejittering?
    • I think that we should only disable dejittering when a call is bridged to a channel which can accept jittered input (i.e. other IP channels). I think all applications will want some kind of dejittering, even if the exact parameters are different than those that need "real-time" dejittering.

  • How to handle scheduling?
    • The present IAX2 jitterbuffer operates when packets are received, scheduling them for delivery at some point in the future. This is somewhat limiting, because it makes it impossible for the jitterbuffer to alter the delivery time of not-yet-delivered packets.
    • An alternative mechanism would be for the jitterbuffer implementation to hold on to frames until they should be delivered. This mechanism would essentially run in parallel with Asterisk's normal scheduling mechanism: the jitterbuffer would schedule itself to run when it expects the next frame to be available for delivery, and when it runs, it would (if it has a frame to deliver) schedule that frame for immediate delivery. This would then wake up apps waiting on the channel, etc., so that they get the next frame.

Reference material

Other Implementations

    • openh323 jitter buffer implementation [MPL; GPL-incompatible]
    • rat (Uses an old-BSD license with advertising clause)
      • File of interest: mainly rat/playout_calc.c
      • Calculates "jitter" using jitter = (7/8)jitter + (1/8) new_jitter
Created by: stevek, Last modification: Wed 15 of Jul, 2015 (11:47 UTC) by davemidd