
voip-info.org

Asterisk High Availability Solutions

Created by: jht2, last modified on Mon 09 of Mar, 2009 [11:40 UTC] by selintra
Ways to increase system availability and load balancing:

  • DNS SRV records on the CPE side, although not all phones support them.

  • SARK-HA from Aelintra Telecom offers high-availability Asterisk out of the box. It runs Aelintra's SARK UCS MVP Asterisk implementation on a pair of servers.... Real-time failover takes less than 20 seconds to complete. Setup requires only 4 additional data fields to be filled out in the SARK globals panel. Illustrated set-up guide HERE.

  • Ranch Networks offers a hardware-based high-availability solution for Asterisk (see White_Paper_one_one_HA.pdf). It covers two Asterisk boxes only.

  • Flip1405 manages a virtual IP between two Asterisk servers and queries UDP port 5060 to detect state changes
    • Downtime of less than 30 seconds
    • Only 2 dependencies (nmap and arping)
    • Incredibly easy to set up

  • SERVERware: a fault-tolerant, high-availability solution with unlimited scalability (commercial).

  • Failover switches to automatically switch connections (T1, Ethernet, etc.) to a backup system.
    • CSS: supports load balancing with failover across multiple Asterisk servers
    • Altéon: a better tool that can also load-balance RTP, but there are problems if you use qualify=yes with NATed phones
    • Big-IP: supports load balancing with failover across multiple Asterisk servers (real SIP proxy functionality is coming soon)
    • Ask me if you have questions about layer 7 switches


  • Vovida has a SIP load balancer. This allows several Asterisk servers to be set up so that they appear to users as a single server. Other load-balancing approaches involve the SER SIP proxy, UltraMonkey (see below) or simple DNS round-robin. There is also app_distributor (a third-party application) or app_random.
    • There are a lot of bugs, and the last version was released in 2002


  • Use the Linux-HA software to provide high-availability (HA) failover on programmed conditions (by default, a node hang or crash). Linux-HA also has many telephony-oriented HA APIs as defined by the Service Availability Forum (SAF). It provides sub-second failover and works well with or without a shared disk. It is commonly used with the DRBD package to provide HA with no single point of failure and no special hardware requirements.

  • Stratus, which has been making high-end continuous processing systems for 20 years, has just added a Linux-based continuous processing solution priced under $10,000: Stratus ftServer T Series Systems



  • QueueMetrics is able to monitor clustered call centers with the load distributed over a number of Asterisk servers as if they were one big single box.

  • OrderlyStats - a dedicated real-time call centre management and statistics package that can monitor single or clustered Asterisk servers from a single page.
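
For the DNS SRV approach at the top of this list, the client is expected to try the SRV targets in priority order. The following is only a minimal sketch of that ordering, not how any particular phone implements it. It assumes input in the "priority weight port target" form printed by dig +short SRV; the function name srv_order and the example.com host names are made up for illustration.

```shell
#!/bin/sh
# Order SRV answers the way a client would try them: lowest priority
# value first. Within one priority this sketch simply prefers the
# higher weight (RFC 2782 asks for a weighted random choice instead).
# Input lines: "<priority> <weight> <port> <target>".
srv_order() {
    sort -k1,1n -k2,2nr | awk '{ sub(/\.$/, "", $4); print $4 ":" $3 }'
}

# Canned records (hypothetical hosts) instead of a live DNS query.
printf '%s\n' \
    '20 0 5060 asterisk2.example.com.' \
    '10 0 5060 asterisk1.example.com.' | srv_order
```

A phone that supports SRV would register with the first target and fall back to the next one only when the first stops answering.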

Asterisk High Availability HOWTO with Heartbeat and Redfone fonebridge

Overview
The following is a brief HOWTO for installing High-Availability Asterisk using Open Source tools combined with fail-over capable & intelligent hardware (the fonebridge).
The heartbeat utility is used in a 'Passive-Active' scenario but could easily be modified to do 'Active-Active'.

Background
Some of our more demanding customers in the call center and banking industries are loath to accept an implementation with no mechanism for fail-over and high availability, so this is the hardware/software combination we use to meet their demands.

Client Background
The following scenario was used for a medium sized call center operation with about 60 analog stations, and a single T1 PRI.

Hardware
  • 2 x 1U Supermicro Servers (P4, 512 MB, Dual Gig Eth, Dual SATA with RAID 0)
  • 1 x Redfone Quad T1 fonebridge to terminate PRI connectivity, power channel banks and provide fail-over capability between the two Supermicros.
  • 1 x T1 PRI
  • 3 x Adtran 750 FXS channel banks to drive analog phones
  • 2 x UPS/Surge Protectors

Software
  • Fedora Core 4
  • Asterisk, zaptel, libpri from CVS head
  • Linux HA software suite from Ultramonkey. They have RPMs for RHEL 3 that install fine on Fedora Core 4
  • Each server is a mirror image of the other in terms of Asterisk configs and software.

Software Install
After a standard install of FC4, Asterisk, zaptel and libpri, we installed all of the packages from Ultramonkey, pretty much following their guidelines: http://www.ultramonkey.org/3/installation-rh.el.3.html
You may run into a few dependency issues, mainly Perl libraries, but we were able to satisfy all of them using Yum. If you are running Apt you should be able to accomplish the same thing.

Configuring Heartbeat
After installing Heartbeat there are only three files that need to be modified for your environment: ha.cf, haresources and authkeys. All three belong in the /etc/ha.d/ directory, and they should be absolutely identical on all machines that are part of your Asterisk high-availability cluster. We only have two servers running, but you could easily scale to more using the exact same configurations. These are our config files. All comment lines have been removed, but as you can see they are short and simple.

ha.cf
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 200ms
deadtime 2
warntime 1
initdead 120
udpport 694
bcast eth0
node asterisk1
node asterisk2
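
As a rough reading of the timing values in this ha.cf, assuming Heartbeat's usual convention that a bare number means seconds and an "ms" suffix means milliseconds, the sketch below works out how many heartbeat intervals fit into deadtime. The helper names to_ms and missed_beats are made up for illustration.

```shell
#!/bin/sh
# Convert a Heartbeat time value to milliseconds.
# A bare number is seconds; an "ms" suffix means milliseconds.
to_ms() {
    case "$1" in
        *ms) echo "${1%ms}" ;;
        *)   echo $(( $1 * 1000 )) ;;
    esac
}

# Roughly how many consecutive heartbeats must go missing before the
# peer is declared dead: deadtime / keepalive.
# $1 = keepalive, $2 = deadtime
missed_beats() {
    echo $(( $(to_ms "$2") / $(to_ms "$1") ))
}

missed_beats 200ms 2   # keepalive and deadtime from the ha.cf above
```

With keepalive 200ms and deadtime 2 this prints 10, so about ten heartbeats in a row have to be lost before fail-over starts; warntime 1 logs a late-heartbeat warning at the halfway point.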

haresources
asterisk1 10.10.10.110 fonulator asterisk

authkeys
auth 1
1 sha1 SuPerS&cretP@$$werd
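
Heartbeat expects authkeys to be readable by its owner only (mode 600). The sketch below shows one possible way to generate the file with a random secret instead of a hand-typed password; the function name make_authkeys is made up for illustration, and the same generated file still has to be copied to every node.

```shell
#!/bin/sh
# Print authkeys content with a random 40-hex-digit secret for sha1.
make_authkeys() {
    secret=$(od -An -N20 -tx1 /dev/urandom | tr -d ' \n')
    printf 'auth 1\n1 sha1 %s\n' "$secret"
}

# Typical use (as root), keeping the file private to its owner:
#   umask 077
#   make_authkeys > /etc/ha.d/authkeys
#   chmod 600 /etc/ha.d/authkeys
make_authkeys
```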

Operation
Each Asterisk server has a unique IP address on the LAN segment. This could be a NATed network or an Internet-facing one with public IP addresses. Heartbeat monitors the hardware state of each machine over Ethernet, a serial port, or a combination of both (recommended), and assigns the virtual IP to whichever Asterisk server is currently active. Example:

Asterisk1= 10.10.10.100
Asterisk2= 10.10.10.120
Virtual IP= 10.10.10.110 (see haresources)

With Heartbeat it is important that your node names are identical to the host names reported by uname -n. You may also need to add IP/host entries to your /etc/hosts file manually so that each machine knows how to reach the other by name.
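
A quick way to catch a node-name mismatch before starting Heartbeat is to compare uname -n with the node lines in ha.cf. This is a minimal sketch; the function name node_listed is made up for illustration.

```shell
#!/bin/sh
# Succeed if the given host name appears on a "node" line of ha.cf.
# $1 = path to ha.cf, $2 = host name (defaults to `uname -n`).
node_listed() {
    conf=$1
    host=${2:-$(uname -n)}
    awk -v h="$host" '
        $1 == "node" { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
        END { exit !found }
    ' "$conf"
}

# Usage:
#   node_listed /etc/ha.d/ha.cf || echo "uname -n is not a node in ha.cf"
```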

Following the rules in haresources, Heartbeat will assign the machine named asterisk1 as the primary server when both systems start up. It will then start the following scripts: fonulator (a small script that configures the fonebridge) and asterisk, which starts the Asterisk server. Both are standard startup scripts placed in /etc/init.d/.
If the primary server suffers a hardware fault or simply stops responding to the heartbeats exchanged between the two nodes, asterisk2 will execute /etc/init.d/fonulator start to reconfigure the fonebridge on the fly and redirect traffic to asterisk2, followed by /etc/init.d/asterisk start to start the Asterisk server.

Results
With Heartbeat, IP takeover occurs in under a second. The fonulator utility reconfigures the fonebridge in about the same amount of time. Then, depending on your hardware platform and the complexity of the applications running in Asterisk, it can take 5-15 seconds for Asterisk to start up on the secondary server, load all config files, clear alarms and be ready to process calls. Total fail-over time is about 15-20 seconds.

Resources
Ultramonkey http://www.ultramonkey.org (High Avail software packages)
Linux HA http://www.linux-ha.org (The High Availability Linux Project)
Redfone http://www.red-fone.com (Maker of the Quad T1/E1 fonebridge)

Asterisk+Heartbeat+SIP+Multi-homed on Debian/Ubuntu


Overview
Use standard Ubuntu/Debian packages to create an Active/Passive high-availability solution for Asterisk 1.4 using Heartbeat 1.0 (and FreePBX) and SIP (not Redfone/PRI/analog/etc). Note: use Debian server; do not use Ubuntu server until its RAID-1 issues are solved (perhaps in Ubuntu Intrepid?).

Background
Many ISPs now provide "Dynamic T1" instead of (or in addition to) standard T1-PRI service. "Dynamic T1" just means that they provide highly prioritized VOIP/SIP between your customer site and them across a T1 (or other high-speed connection). As a result, it is increasingly possible to get cheaper service using VOIP only, without T1-PRI, at very similar call quality. This solution covers Debian/Ubuntu, and also the special issues that Heartbeat raises when connecting to the upstream provider via SIP. Many clients want failover support to "seal the deal".

Issues
Heartbeat "takes over" an IP address by adding an "alias" to an interface IN ADDITION to the IP that must always be there so that Heartbeat can communicate. For a PBX-type install that is not behind NAT and has no upstream SIP proxy (OpenSer), an alias will be added to BOTH the WAN interface and the LAN interface, and Asterisk will need to bind to both the LAN and the WAN to operate. Unless you do the routing/proxy magic outlined in this solution, you will run into trouble because Asterisk will put the wrong SRC/VIA address in IP/SIP packets. This causes problems upstream, because your ISP/SIP provider may authenticate based on IP and you will appear to be sending packets from the wrong IP. It causes problems on the LAN for similar reasons.

Software Install
apt-get install asterisk
apt-get install heartbeat

Heartbeat Config Generally
See the configuration info in the "Redfone" HOWTO above this one generally. I'm using the 10.10.10.0 addresses from above and 77.77.77.0 as a WAN address in my examples. I'm assuming that the shared LAN address is 10.10.10.110 and the shared WAN address is 77.77.77.110. Asterisk1 server's "other" WAN IP is 77.77.77.100. For sake of example: Asterisk2 machine has 77.77.77.120.

haresources
asterisk1 10.10.10.110 77.77.77.110 fixrouting asterisk

Routing fixes
For each interface to which Asterisk binds, it gets the IP address by doing a routing lookup. If you run 'ip route show' and look after the word 'src', you will see which IP will be used for that interface (see also 'ip route get'). Asterisk puts this IP into VIA headers and sends all IP/UDP/SIP packets from it. When this server is primary, we need to fix the routing so that all packets on the LAN look like they are coming from the 'shared' LAN IP of the two servers, and (for multi-homed setups) we need to fix the routing for the WAN interface as well.
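
To see which source address is currently in effect for a subnet, the 'src' field can be pulled out of the 'ip route show' output. This is a minimal sketch; the function name route_src is made up for illustration, and the canned line only mimics typical iproute2 output for the example LAN.

```shell
#!/bin/sh
# Read `ip route show` output on stdin and print the preferred source
# address (the word after "src") of the route for the given prefix.
route_src() {
    awk -v p="$1" '
        $1 == p { for (i = 2; i < NF; i++) if ($i == "src") print $(i + 1) }
    '
}

# Usage on a live system:
#   ip route show | route_src 10.10.10.0/24
# Canned example:
echo '10.10.10.0/24 dev eth0 proto kernel scope link src 10.10.10.100' |
    route_src 10.10.10.0/24
```

Running the live form before and after a fail-over is a simple check that the routing fix was applied: the active node should report the shared address and the passive node its own.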


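The 'fixrouting' script detailed below needs to be installed as /etc/init.d/fixrouting

#!/bin/sh
# Set the preferred source address of the LAN and WAN routes.
# Heartbeat runs this with start on takeover and stop on release.
set -e

case "$1" in
  start)
    # Active: send from the shared addresses.
    ip route change 10.10.10.0/24 src 10.10.10.110 dev eth0
    ip route change 77.77.77.0/24 src 77.77.77.110 dev eth1
    ;;
  stop)
    # Passive: revert to this node's own addresses
    # (the values shown are those of asterisk1).
    ip route change 10.10.10.0/24 src 10.10.10.100 dev eth0
    ip route change 77.77.77.0/24 src 77.77.77.100 dev eth1
    ;;
  force-reload|restart)
    "$0" stop
    "$0" start
    ;;
  *)
    echo "Usage: /etc/init.d/fixrouting {start|stop|restart|force-reload}"
    exit 1
    ;;
esac

exit 0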
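
The stop branch of the script uses asterisk1's own addresses (10.10.10.100 and 77.77.77.100); on asterisk2 it would need 10.10.10.120 and 77.77.77.120. One possible way to keep a single script for both nodes, sketched here under the assumption that the node names match uname -n, is to look the addresses up by host name. The function name own_addrs is made up for illustration.

```shell
#!/bin/sh
# Print "<LAN address> <WAN address>" of a node's own (non-shared) IPs.
# $1 = node name; defaults to `uname -n`.
own_addrs() {
    case "${1:-$(uname -n)}" in
        asterisk1) echo "10.10.10.100 77.77.77.100" ;;
        asterisk2) echo "10.10.10.120 77.77.77.120" ;;
        *) echo "unknown node: ${1:-$(uname -n)}" >&2; return 1 ;;
    esac
}

# In the stop) branch this could replace the hard-coded addresses:
#   addrs=$(own_addrs) || exit 1
#   set -- $addrs
#   ip route change 10.10.10.0/24 src "$1" dev eth0
#   ip route change 77.77.77.0/24 src "$2" dev eth1
own_addrs asterisk2
```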


Results
When a failover makes this server primary, the "shared" IPs are taken over and the routing fix makes sure that all packets from Asterisk appear to come from those IPs. When this server fails or becomes secondary, the IPs are released and the routing fix sets things back to the passive state, so that the active machine can still communicate with it (and IP conflicts are avoided).



Ultra Monkey

The current solution I have uses UltraMonkey ( http://www.ultramonkey.org ) for load-balancing and failover and it works like a champ. There are obviously a lot of details there, and I'd be happy to detail them if people are interested. There is also a site that has two clusters with uniform reachability for all phones and PRIs. None of this requires a lot of dialplan tuning on a day-to-day basis.

See also



Asterisk

Comments

Re: Asterisk with Ultramonkey load balancing; bug in real server health check

by nhadie, Saturday 23 of August, 2008 [20:10:31 UTC]
Hi Madhuri,

I'm very much interested in how to make Ultramonkey work with Asterisk. Currently I have Ultramonkey set up to load-balance my web traffic (ports 80 and 443) and it's working fine.
I tried adding a SIP service; my ldirectord.cf looks like this:

virtual=12.13.14.155:5060
       real=12.13.14.130:5060 gate
       real=12.13.14.131:5060 gate
       service=sip
       scheduler=rr
       persistent=600
       protocol=udp
       checktype=connect

but it does not seem to work when I configure my phone to register on 12.13.14.155.
Any help would be really appreciated.

Asking for info

by wturra, Thursday 08 of May, 2008 [17:21:45 UTC]
Hi, I would like to get info about load balancing of Asterisk with Ultramonkey. Could you please send me some info about your experience with it? Regards.

Asterisk with Ultramonkey load balancing; bug in real server health check

by madhuri, Tuesday 06 of May, 2008 [06:39:00 UTC]
I found a bug when trying to get Asterisk working with Ultramonkey. After fixing this bug I have Asterisk working with Ultramonkey doing load balancing and heartbeat without any problem.

The bug:

The Asterisk real server health check does not work reliably from Ultramonkey. ldirectord from Ultramonkey sends a SIP OPTIONS request for the real server health check. Asterisk often sends the "200 OK" response to this request to the wrong port, so the real server is deactivated.

Here are the details:

- Ultramonkey can be set up to use a SIP OPTIONS request for the Asterisk real server health check. When you do that, the script /etc/ha.d/resource.d/ldirectord uses the same Call-ID for all the OPTIONS requests it sends.

- In Asterisk, chan_sip.c checks for each new SIP request whether an existing dialog is already set up for it. If it doesn't find an existing dialog, it sets up a new one. Since the Call-ID, To, From and CSeq are the same for every request sent from ldirectord, it sometimes picks up the wrong, earlier dialog and sends the response to the wrong port.

- In that case ldirectord never receives a response and marks the real server down.

Solution:

Modify ldirectord to generate a new Call-ID for each request. After this change there is no problem with the real server health check.

Here is a quick modification to the check_sip subroutine of ldirectord. You can use any method to generate a different Call-ID; I have used the following one.

               my $range = 100000000000;
               my $callid = int(rand($range));
               my $request =
               "OPTIONS sip:" . $$v{login} . " SIP/2.0\r\n" .
               "Via: SIP/2.0/UDP $sip_s_addr_str:$sip_s_port;" . "rport;" .
                       "branch=z9hG4bKhjhs8ass877\r\n" .
               "Max-Forwards: 70\r\n" .
               "To: <sip:" . $$v{login} . ">\r\n" .
               "From: <sip:" . $$v{login} . ">;tag=1928301774\r\n" .
               "Call-ID: $callid\r\n" .
               "CSeq: 63104 OPTIONS\r\n" .
               "Contact: <sip:" . $$v{login} . ">\r\n" .
               "Accept: application/sdp\r\n" .
               "Content-Length: 0\r\n\r\n";


If anybody wants full details of how to get Asterisk working with ultramonkey load balancing and heartbeat let me know.
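
For testing the same idea outside ldirectord, the request can also be built by hand. The sketch below is not the ldirectord code; it is a stand-alone shell illustration that additionally randomizes the Via branch and the From tag, on the assumption that these should differ per request too. The function names rand_hex and sip_options and the identity check@10.10.10.110 are made up for illustration.

```shell
#!/bin/sh
# Print 16 random hex digits.
rand_hex() {
    od -An -N8 -tx1 /dev/urandom | tr -d ' \n'
}

# Print a SIP OPTIONS request with a fresh Call-ID, Via branch and tag.
# $1 = SIP identity to probe (user@host), $2 = local ip:port for Via.
sip_options() {
    printf 'OPTIONS sip:%s SIP/2.0\r\n' "$1"
    printf 'Via: SIP/2.0/UDP %s;rport;branch=z9hG4bK%s\r\n' "$2" "$(rand_hex)"
    printf 'Max-Forwards: 70\r\n'
    printf 'To: <sip:%s>\r\n' "$1"
    printf 'From: <sip:%s>;tag=%s\r\n' "$1" "$(rand_hex)"
    printf 'Call-ID: %s\r\n' "$(rand_hex)"
    printf 'CSeq: 1 OPTIONS\r\n'
    printf 'Contact: <sip:%s>\r\n' "$1"
    printf 'Accept: application/sdp\r\n'
    printf 'Content-Length: 0\r\n\r\n'
}

sip_options check@10.10.10.110 10.10.10.100:5060
```

Two consecutive calls produce different Call-ID headers, which is the property the fix above relies on.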


by schapman, Friday 29 of February, 2008 [14:26:57 UTC]
Why couldn't you just add an extension from a backup server to the main server, one that re-registers every n seconds or so? If it ever unregisters, then you know that Asterisk on that machine is down. Then send a sort of tattle-tale signal to a main SER server to say "I am the one who should get the calls". You could also write a manager connection from each server to the other, so that whichever server is still registered and has had the fewest events in the past n minutes gets the call. I guess much of that is possible with SER; maybe the question is whether it has already been done.

CSS load balancing

by dgman, Wednesday 17 of October, 2007 [15:56:53 UTC]
Hi,

It is written: "CSS: You can make load-balancing with failover with multiple asterisk"
But how do I do it with registration on the CSS IP address?
Thank you for your help!

foneBRIDGE2 setup

by bisente, Sunday 26 of August, 2007 [10:54:14 UTC]
Hi

I've published my Asterisk/foneBRIDGE2/heartbeat setup: config files, scripts... along with a brief description of the architecture and working of the cluster. It's available here:

Asterisk clusters with a foneBRIDGE2

Hope somebody finds it useful. :)

Regards

by maolivar, Monday 23 of July, 2007 [12:26:15 UTC]


My question is

How can I synchronize the contents of asterisk1 ("master") in real time, in order to guarantee an exact copy on asterisk2 ("slave")?
And how do I configure the physical access to ISDN?

Thanks

Asterisk High Availability Solutions

by maolivar, Monday 23 of July, 2007 [12:16:21 UTC]

Morning

I am trying to implement a high-availability solution using UltraMonkey.

Please can you describe all the details...

Thanks

by vitaly_il, Thursday 28 of June, 2007 [08:56:40 UTC]
Does Linux-HA know how to test an application by sending SIP OPTIONS and analyzing its answers?