Hi everyone,
I have recently run into an issue at one of my company's larger customers - before I explain it, let me give you a summary of the system in question:
Hardware:
HP Proliant DL380 G4:
Intel Xeon 3.2GHz (2x)
2GB PC2-3200 DDR2
3x 73GB SAS HDD (RAID 5)
Software:
PIAF-Green 2.0.6.4
FreePBX Version: 2.11.0.11
Asterisk Version: 11.5.1
OS: CentOS 6.4 Final
Kernel Version: 2.6.32-358.23.2.el6.i686 (32 bit)
System:
Number of active extensions: 70
Avg. concurrent calls: 30-50
Using SIP
Call recordings are enabled (wav49 format, stored on the same drive as the system)
Audio codec: g729
The issue I'm having is that at some point during the day, Asterisk stops closing channels when calls complete. The open channels stack up until they reach ~1000, at which point CPU usage hits 100%, the system falls over, and no new channels can be opened until I do a forced amportal restart or reboot the server (if the amportal restart doesn't work). Under normal load, before falling over, with 30-50 concurrent calls, CPU usage sits at ~50%.
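To pin down when the build-up starts, the channel count could be polled and logged well before it reaches ~1000. The following is only a minimal sketch: it assumes Python 3 is available on the server, that the script can run the Asterisk CLI, and that `core show channels count` prints a line such as "412 active channels". The threshold of 300 is an arbitrary illustrative value, not a recommendation.

```python
import re
import subprocess

# Assumed alert level: far above the normal 30-50 calls, far below the ~1000 failure point.
ALERT_THRESHOLD = 300


def parse_active_channels(cli_output):
    """Return the active channel count from the CLI output, or None if not found."""
    match = re.search(r"(\d+)\s+active channels?", cli_output)
    return int(match.group(1)) if match else None


def read_active_channels():
    """Ask the local Asterisk instance for its channel count (needs CLI access)."""
    result = subprocess.run(
        ["asterisk", "-rx", "core show channels count"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return parse_active_channels(result.stdout)


def is_leaking(channel_count, threshold=ALERT_THRESHOLD):
    """True when the count is known and above the assumed threshold."""
    return channel_count is not None and channel_count > threshold
```

Running `read_active_channels()` from cron every minute and logging the result with a timestamp would show whether the count climbs gradually or jumps at a particular time of day.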
The customer mentioned that their previous VoIP provider (also using Asterisk) had encountered a similar issue. After performing diagnostics, they determined that the hard drives were bottlenecking the storage of the call recordings (too many IOPS), and because the recordings could not finish, Asterisk was not closing the channels created for them. They solved this by setting up a separate server to handle the call recordings, linking the two using an IAX trunk (I'm not too certain of the details).
However, I would prefer to try to resolve this issue by making changes on the current system before resorting to that approach. So far I have come up with the following changes, which I'm going to implement and test tomorrow:
- Disable ringback. It is generated by our provider and causes the recordings to start before the actual call does: recordings are set to start on answer, but the system sees the generated ringback as an answer.
- Enable SIP reinvites, to remove some of the load from Asterisk, although I'm not certain how much of a difference this will make.
- Try recording in standard uncompressed 16-bit WAV instead of wav49. The wav49 compression could be generating extra overhead, although uncompressed files will make storage space an issue.
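The disk side of the wav49 versus WAV trade-off can be estimated with some quick arithmetic. This sketch assumes 8 kHz mono audio, one recording file per call, and the standard data rates for the two formats (wav49 is GSM 6.10 in a WAV container, 65 bytes per 320 samples; 16-bit PCM is 2 bytes per sample). Container headers and filesystem overhead are ignored.

```python
SAMPLE_RATE_HZ = 8000  # assumed telephony sample rate

BYTES_PER_SECOND = {
    # wav49: one 65-byte block per 320 samples, i.e. 25 blocks per second.
    "wav49": 65 * 25,
    # Uncompressed 16-bit PCM: 2 bytes per sample.
    "wav": SAMPLE_RATE_HZ * 2,
}


def write_rate(fmt, concurrent_calls, files_per_call=1):
    """Sustained recording write rate in bytes per second."""
    return BYTES_PER_SECOND[fmt] * concurrent_calls * files_per_call


def megabytes_per_call_hour(fmt):
    """Storage used by one hour of recorded audio, in MB (10**6 bytes)."""
    return BYTES_PER_SECOND[fmt] * 3600 / 1e6


for fmt in ("wav49", "wav"):
    print(fmt, write_rate(fmt, 50), "B/s at 50 calls,",
          megabytes_per_call_hour(fmt), "MB per call-hour")
```

On these assumptions, 50 concurrent calls write roughly 81 kB/s as wav49 and 800 kB/s as WAV. Both are small in terms of raw throughput, so if the disks really are the bottleneck, the number of small writes on the RAID 5 array is a more likely culprit than bandwidth. The storage cost is about ten times higher for WAV (about 57.6 MB versus 5.85 MB per call-hour).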
Do you think that the above changes will lessen the load on the system enough to solve the issue, and do you have any other recommendations as to changes I could make?
One other issue that caught my attention, and which I've also seen at one of our other customers, is a warning that Asterisk generates several times per second when I'm logged into the Asterisk CLI in verbose mode:
WARNING[xxxx]: translate.c:xxx framein: no samples for g729tolin
After researching the issue, I discovered that it is caused by the customer's dialler trying to send audio using lin; Asterisk refuses it (as our systems only use g729) and forces the dialler to retransmit. What I'd like to know is whether this process could be generating a significant amount of overhead, as it doesn't seem to be causing any errors.
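One way to judge whether the warning matters is to measure how often it actually fires and compare that with the times the channels start piling up. This is a rough sketch that counts the warning per timestamp in a log file. It assumes each log line begins with a bracketed timestamp at one-second resolution; the sample path in the usage note is the usual FreePBX location but is also an assumption.

```python
import re
from collections import Counter

WARNING_TEXT = "no samples for g729tolin"
# Assumed log layout: "[<timestamp>] WARNING[...]: translate.c:..."
TIMESTAMP = re.compile(r"^\[([^\]]+)\]")


def warnings_per_timestamp(lines):
    """Count g729tolin warnings, keyed by the bracketed timestamp of each line."""
    counts = Counter()
    for line in lines:
        if WARNING_TEXT not in line:
            continue
        match = TIMESTAMP.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

For example, `warnings_per_timestamp(open("/var/log/asterisk/full"))` followed by `.most_common(10)` would list the ten busiest seconds.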
Thanks in advance! I've given all the information that I think is relevant to the issue, but if I've missed something, please let me know and I'll be happy to provide it.