- Different Levels for Different Purposes
- Telephony Hardware
- Core Software Debugging
- HowTo Debug a DeadLock in Asterisk
Different Levels for Different Purposes
There are (at least) three different levels or types of debugging:
1) Telephony hardware configuration and fault finding in the physical equipment, connections and cabling
2) Configuration based on the configuration files and call handling
3) Core software, looking at the C code and modules
Asterisk does not require any telephony hardware to operate VOIP, but it is flexible enough to connect to and from standard telephone lines and handsets, either instead of or in addition to VOIP terminals. Standard telephone lines (FXO, FXS, T1, E1 etc) are all handled via the Zaptel drivers.
To confirm what devices are available and configured for use in the system, type ZAP SHOW CHANNELS at the console prompt. For more specific information about any individual channel, type ZAP SHOW CHANNEL X (where X is the channel number). Remember that channels will not work if they are not physically connected to a real, working telephone line - the Dial command will return all channels busy/congested. If channels are not shown, then it is time to revisit both zapata.conf and zaptel.conf, to ensure that the modprobe(s) have been done and that ztcfg has been run.
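For reference, a minimal single-FXO configuration might look like the sketch below (illustrative only - your card, signalling type and tone zone will differ; note that an FXO port uses FXS signalling and vice versa):

```
; /etc/zaptel.conf - hedged sketch for one FXO port on channel 1
fxsks=1
loadzone=us
defaultzone=us

; /etc/asterisk/zapata.conf
[channels]
signalling=fxs_ks
context=incoming
channel => 1
```

After editing both files, run ztcfg and restart Asterisk, then check ZAP SHOW CHANNELS again.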
SIP Channels are only shown if registered
SIP SHOW INUSE will list all SIP extensions defined in SIP.CONF
SIP SHOW CHANNELS will list all SIP extensions registered at that time
Asterisk will normally grab the sound card and use it as the console voice I/O. This means that running a SIP softphone on the same server is at best difficult. Asterisk is designed to operate standalone, without an X Windows GUI, for best performance - expect poor audio quality if you try this.
Where inspection of the configuration files fails, the simplest thing is to turn debugging/trace output on. This is done by editing the logger.conf file (in /etc/asterisk). To add this output to the console screen, simply add ",debug" at the end of the console configuration line - the sample files have an alternative line with this text already in place. Then type "LOGGER ROTATE" at the console, and all significant telephony events will now pour out. These can also be written to a log file for more detailed analysis.
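For reference, the relevant logger.conf line would end up looking something like this (a sketch; your file may list other levels):

```
; /etc/asterisk/logger.conf
console => notice,warning,error,debug
```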
Asterisk by default writes to the system log a complete call record, for example:
> cd /var/log/asterisk/cdr-csv
> tail -f Master.csv
"","5032273698","9714042975","sip","""DORGAN M"" <5032273698>","SIP/220.127.116.11-08100358","","AGI","","2004-12-09 13:14:44","2004-12-09 13:14:53",12,9,"ANSWERED","DOCUMENTATION"
Or, if you are having problems catching intermittent problems on your system, consider adding more information to the Asterisk message log. If you look in logger.conf you will see something like:
messages => notice,warning,error
consider changing this to:
messages => notice,warning,error,debug,verbose
(for short periods of time anyway — it can really eat disk space). Note you will need to restart Asterisk or type LOGGER ROTATE at the CLI to get this change to take effect (reload doesn't do it)
Core Software Debugging
Where the behaviour of the application is not as expected, more serious program debugging may be required.
"strace" might give you some helpful info; see man strace. It records system calls and signals.
"viewcore" is a useful tool for analysing coredumps: http://adimitrov.net/main/viewcore/
I'd run Asterisk in a debugger; I like "DDD". Compile everything with the -g gcc option first so the debugger can display source code lines. When the program stops, the cursor will be left on the last line executed, and you can "run backwards" and see how you got there.
An Important Note on Submitting Backtraces
When getting a backtrace to have developers help you debug a problem, it is VERY important that you build asterisk with "make clean; make valgrind; make install". This will turn off optimizations so that we can get a readable backtrace.
Tips for Asterisk 1.4
Two things you need to make sure you enable when you compile Asterisk: the DONT_OPTIMIZE and DEBUG_THREADS compiler flags.
These can be toggled under the "Compiler Flags" options when you do a "make menuselect". Make sure that you have them turned on (i.e. an asterisk next to them).
Then, follow the guidelines for obtaining a backtrace with GDB. These can be found in the file "asterisk/doc/backtrace.txt".
I would suggest that you issue the following, to make your life easier:
script backtrace.txt <- This will start a typescript session, which captures the input/output from the session into a file called "backtrace.txt"
Once you get into gdb, issue "set pagination off" to turn off paging.
Then, get your backtrace, exit out of GDB and find a related issue on the bugtracker (or create a new issue) to post your backtrace to. Do NOT post your backtrace to #asterisk-dev or #asterisk-bugs on Freenode, as this will only serve to annoy the developers who are typically busy helping others.
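If you prefer, the same capture can be scripted: put the gdb commands in a small command file (the name "gdb_cmds" here is just an example) and run it non-interactively against the binary and core file:

```
# gdb_cmds - run as: gdb -batch -x gdb_cmds /usr/sbin/asterisk core.xxxx > backtrace.txt
set pagination off
bt full
thread apply all bt
```

This avoids the script/typescript dance entirely and leaves the full backtrace in backtrace.txt, ready to attach to a bugtracker issue.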
Backtracing a core dump file in /tmp
- start Asterisk with safe_asterisk
- enter "gdb asterisk core.xxxx"
- enter "bt" while in gdb (or do a "bt full")
- enter "thread apply all bt"
100% CPU
Are you sure the Asterisk process died and is not just hung, either in an infinite loop or waiting for something that will never happen? What does "ps" say?
CONSOLE=no
Are you running safe_asterisk? If so, try modifying safe_asterisk to change CONSOLE=yes to CONSOLE=no.
Debugging a running Asterisk
List all the Asterisk threads with ps axum -C asterisk to find the thread that takes the most CPU. Now connect with gdb:
gdb /usr/sbin/asterisk <pid>
and do "bt" and post the last few lines to the mailing list ...
ulimit
If Asterisk is crashing, as in exiting, issue the command
ulimit -c unlimited
and this should allow asterisk to drop a core file if it can.
HowTo Debug a DeadLock in Asterisk
1) In the Asterisk Makefile you need to uncomment the lines under
- Optional debugging parameters
- NB: the DO_CRASH flag will force a core dump on certain conditions that indicate a possible deadlock
For testing this is a useful option, because the core will show what was happening between the threads.
Grep the sources for the CRASH macro to see under what conditions this might occur.
2) Turn on Verbose logging
Current CVS as of Feb 1 2004 allows verbose messages to be logged; see logger.conf and add verbose to the messages line.
This will allow you to log ast_verbose messages to the logs so we can
- see what the backtraced threads are doing in time-sequence order
- re-create the situation that led to the core or deadlock
3) When you deadlock don't restart the box or restart asterisk
Instead take the 5 mins while everyone is freaking out to attach gdb to the running asterisk process and do
gdb /usr/sbin/asterisk <pid of main * process>
...you can get the Asterisk process identification number (PID) with asterisk -r ("-> currently running on blah (pid = 9075)"). Note: if the box is truly hosed and blocked on all I/O this will fail too; use ps ax, or look for the lowest PID after doing ps ax -C asterisk.
4) after gdb loads do
thread apply all bt
At the very least you are now going to save that bt output to a file and post that to bugs.digium.com
5) Identify deadlocked threads by this pattern
Note the "__pthread_wait_for_restart_signal". That means the thread is in a wait loop, waiting for the mutex lock.
Thread 23 (Thread 3576854 (LWP 2910)):
#0 0x400c787e in sigsuspend () from /lib/libc.so.6
#1 0x40022879 in __pthread_wait_for_restart_signal () from /lib/libpthread.so.0
#2 0x40024a36 in __pthread_alt_lock () from /lib/libpthread.so.0
#3 0x40020fd2 in pthread_mutex_lock () from /lib/libpthread.so.0
Note that apparently not all systems implement "__pthread_wait_for_restart_signal", so you might just want to scan for at least "pthread_mutex_lock". You will usually find more than one of these patterns, because once a thread is deadlocked on a mutex lock, other threads that want the same lock will pile up quickly.
6) Try to identify the first thread, that is dead locked.
The sequence number of backtraced threads is not relevant, because threads are re-used.
Look in your log files at the time stamps and try to correlate the THREAD number (e.g. "Thread 23 (Thread 3576854 (LWP 2910))") to the earliest entry in the log file with that same THREAD number (e.g. a "VERBOSE" entry). Note the FRAME number just before the "pthread_mutex_lock()" call (that is the #0, #1, #2 number right after the backtrace's THREAD number).
Log files are usually in /var/log/asterisk/. Check the files messages and debug in this directory.
7) Find the position
Now that we have our potential guilty party as the first in line for the lock, do
thread <sequence number> for the THREAD of interest
frame <frame number> for the frame # before the pthread_mutex_lock ()
This should now put us in our Asterisk sources, right where we call ast_mutex_lock(). Record the name of the lock it was trying to get, e.g. ast_mutex_lock(&agentlock).
8) Check who has the lock
Now, if we have properly turned on thread debugging, we will be able to see into the ast_mutex_t struct in include/asterisk/lock.h, which records the file, function and line that last took each lock.
So now that we have our lock, we can see who has it and what we are waiting on; in gdb, print the lock structure (e.g. "print agentlock").
This is the guilty code that is holding the lock that we want to look at.
9) Now comes the hard part ...:)
Why is the code in that thread, file, function and line number not releasing our lock? We now have to scour the code, checking all places where that lock is set and released, looking for:
- places where there is a lock hierarchy and locks are set and released in a different order,
- places where a lock is held too long in for and while loops, before calling longer-running functions, etc.
- places where a mutex is locked in a critical section where we might receive operating system signals
Don't Forget Asterisk's Logging Features (see above)
If you really want to see what's going on under the hood you can start Asterisk using:
/usr/sbin/asterisk -vvvvvv -g -dddddd -c
Finally, issue "iax2 debug" or "sip debug" at the CLI prompt. You can easily expect 5 to 10 MB an hour of output.