AQuA - Audio Quality Analyzer
AQuA (Audio Quality Analyzer), a product of Sevana Oy, Finland, is a simple but powerful tool for perceptual voice quality testing and audio quality monitoring. It offers a straightforward way to compare two audio files and to test how voice quality differs between the original file and the degraded one. The technology implemented in AQuA is based on the research of internationally recognized scientists and produces perceptual audio quality metrics for voice, HD Voice and wideband audio signals.
Goals for automated audio quality testing
Sound signal quality estimation is becoming increasingly important with the spread of new generations of mobile communications, synthetic telephony systems, VoIP, and various portable sound recording and reproduction devices. The goal of an automated audio quality testing and monitoring system is therefore to provide an objective estimate of audio signal quality (that is, one independent of any particular listener's judgment) and to make that estimation automatable. This matters both for comparing competing commercial products and for optimizing the parameters of proprietary products. One of the main parameters of systems for compressing, transmitting and reproducing sound is the quality of the restored, received or reproduced sound. Quantitative measurement of sound quality has specific features because the final receiver of a sound signal is always a human, and a human is also the source of most sound signals. Sound signal quality is known to be determined not only by the technical characteristics of the sound processing and transmission systems, but also by individual peculiarities of speech perception and production, which vary over time and from individual to individual.
Review of ITU-T P.862
The ITU-T P.862 standard, also known as PESQ (Perceptual Evaluation of Speech Quality), is an objective measurement method that predicts the results of subjective listening tests on telephony systems. PESQ uses a sensory model to compare the original, unprocessed signal with the degraded signal from the network or network element. The resulting quality score is similar to the subjective "Mean Opinion Score" (MOS) measured using panel tests according to ITU-T P.800. The PESQ scores are calibrated using a large database of subjective tests. The method takes into account coding distortions, errors, packet loss, delay and variable delay, and filtering in analogue network components.
Although PESQ is one of the most popular tools, it has a number of disadvantages. It requires test signals to be speech-like, because many systems are optimized for speech and respond in an unrepresentative way to non-speech signals (e.g. tones, noise, ITU-T P.50). The PESQ test signal is chosen by the tester, so vendor estimates may differ from end-customer estimates. The approach equalizes signal levels, which is questionable in theory because speech produced at different volumes may have different spectra. PESQ also fails to detect the significant quality loss that occurs when the voice is equalized so that it has far less low-frequency and high-frequency energy than the original voice file.
New methods must therefore be developed, and existing ones improved, to bring objective and subjective quality estimates closer together and to make explicit use of our knowledge of hearing and speech production. Whether to use an arbitrary or a specialized source signal depends on the purpose of the estimation (speech intelligibility evaluation, sound reproduction quality, quality estimation of speech transmitted through intercommunication channels, etc.), and being able to choose makes the estimate more objective.
General scheme of the technology
The scheme above represents the general concept of the quality estimation system for sound signals. A test signal generator forms a sound signal according to one of the sound flow models. This can be either a specialized set of sound signals or the output of a statistical speech model, which AQuA can generate on demand. The generator's signal can either be saved for later use or passed on for processing and estimation. The bank of signals stores sound data produced by the signal generator or obtained from external sources.
The input to the estimation block is either the generator's signal directly or a signal from the bank of signals. The test signal is fed to the synchronizer and to the device under test, which can be, for example, an echo server or a communication channel whose audio is recorded. The output signal of the device under test is fed into the synchronizer as well.
The synchronizer aligns the initial signal and the processed one in time. The synchronized signals are passed in chunks to the analytical module, which determines how similar they are and issues the quality estimate as a measure of similarity between the initial (original) and the processed (degraded) signals.
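The synchronization step can be illustrated with a minimal sketch. The source does not describe how AQuA aligns signals, so plain cross-correlation over a bounded lag range is swapped in here purely as an assumption; the function names `best_lag`, `align` and `chunks` are hypothetical, and signals are plain lists of samples.

```python
def best_lag(reference, degraded, max_lag):
    """Return the delay (in samples) of `degraded` relative to `reference`
    that maximizes their raw cross-correlation within +/- max_lag."""
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(reference):
            j = i + lag
            if 0 <= j < len(degraded):
                score += r * degraded[j]
        if score > best_score:
            best, best_score = lag, score
    return best


def align(reference, degraded, max_lag):
    """Trim both signals so that sample k of one corresponds to sample k of the other."""
    lag = best_lag(reference, degraded, max_lag)
    if lag >= 0:
        degraded = degraded[lag:]      # degraded starts late: drop its leading samples
    else:
        reference = reference[-lag:]   # degraded starts early: drop reference samples
    n = min(len(reference), len(degraded))
    return reference[:n], degraded[:n]


def chunks(signal, size):
    """Split an aligned signal into consecutive, non-overlapping chunks of `size` samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]
```

Once aligned, matching chunks of the two signals can be handed pairwise to whatever similarity measure the analytical module uses. The nested loops are quadratic in signal length, so a real system would likely use an FFT-based correlation instead.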
Perceptual audio quality analysis
The human ear is a non-linear system, which gives rise to an effect called masking. Masking occurs when a message is heard against a background of noise or masking sounds. From research into the masking of harmonic signals by narrow-band noise, Zwicker determined that the entire spectrum of audible frequencies can be divided into frequency groups, or bands, that the human ear can distinguish. Fletcher, who named these frequency groups critical bands of hearing, had reached a similar conclusion before Zwicker. The critical bands determined by Fletcher and by Zwicker differ, since the former defined the bands by means of masking with noise and the latter from the relations of perceived loudness. Sapozhkov defined a critical band as “a band of frequency speech range, perceptible as a single whole”. In his earlier research he even suggested that the sound signal in a band could be replaced by an equivalent tone signal, but experiments did not confirm that assumption. The critical bands determined by Sapozhkov differ from those of Fletcher and Zwicker because Sapozhkov proceeded from the properties of the speech signal. Pokrovskij also determined critical bands on the basis of speech signal properties. According to his definition, the bands provide an equal probability of finding formants in them.
The spectrum energy in the bands can be used for different purposes, one of which is sound signal quality estimation. However, using only one author's critical bands does not yield a sufficiently objective estimate, since each set reflects only one aspect of perception or speech production. In AQuA, spectrum energy can be determined in various critical bands as well as in logarithmic and resonator bands, which makes it possible to take more properties of hearing and speech processing into account.
Because the bands determined by Pokrovskij and Sapozhkov suit speech signals better than sound signals in general, choosing the bands to match the purpose of the estimation increases its accuracy. The acoustics of the speech path are non-stationary and non-linear. Taking resonator bands into account increases accuracy in determining the quality of sound signals, particularly speech.
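As a rough illustration of "spectrum energy in bands", the sketch below sums the power spectrum of one frame inside each band of a given band set. It is not AQuA's implementation. The edges used by default are the commonly published Zwicker (Bark) critical band edges; the Fletcher, Sapozhkov, Pokrovskij, logarithmic and resonator band sets are not reproduced here and would be passed in through the hypothetical `edges_hz` parameter.

```python
import cmath
import math

# Edges (Hz) of the 24 critical bands published by Zwicker (Bark scale).
ZWICKER_EDGES_HZ = [20, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270,
                    1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300,
                    6400, 7700, 9500, 12000, 15500]


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (adequate for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def band_energies(frame, sample_rate, edges_hz=ZWICKER_EDGES_HZ):
    """Sum spectral power inside each [low, high) band defined by edges_hz."""
    n = len(frame)
    energies = [0.0] * (len(edges_hz) - 1)
    for k, p in enumerate(power_spectrum(frame)):
        freq = k * sample_rate / n          # centre frequency of DFT bin k
        for b in range(len(edges_hz) - 1):
            if edges_hz[b] <= freq < edges_hz[b + 1]:
                energies[b] += p
                break
    return energies
```

Computing `band_energies` for matching frames of the original and degraded signals gives two vectors that can be compared band by band. Swapping in a different list of edges changes which aspect of perception or speech production the comparison emphasizes.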
AQuA Command Line for IP PBX
AQuA Command Line is a command line implementation of AQuA technology that makes it simple to add voice and audio quality monitoring to any IP PBX. Typically, the quality of VoIP terminations is tested in the following way:
- Originate a call from the monitoring server, using the IP PBX interface, to a VoIP termination server that is running an echo application.
- Monitor both the inbound and outbound legs of the call and save them as WAV files.
- Use AQuA (Linux or Windows version) to compare the WAV files.
This approach is quite effective: it uses the existing infrastructure, so no additional hardware has to be purchased, and the percentage, MOS and PESQ values produced by AQuA can easily be mapped to the various call terminations.
AQuA Command Line parameters tune the software to work properly in different environments. The software includes 53 different sounds typical of human speech, so AQuA can create test signals for more precise voice quality testing. The command line application can be invoked from any IP PBX, which makes integration very simple. On Windows systems, AQuA can also be used as a DLL library.
The most common way to deploy AQuA for voice quality monitoring is to use PHP scripting and cron jobs. AQuA customers find this approach the fastest way to deploy AQuA on Asterisk PBXs.
Taking the same example of monitoring voice quality at various call terminations, and the combination of Asterisk, cron jobs and PHP scripting, one would have the following setup:
1. A cron job invokes a PHP script according to a predefined timetable.
2. The PHP script connects to the Asterisk server and originates a call to, for instance, an echo server (for example, using this Asterisk API: http://code.google.com/p/asterisk-php-api/).
3. Audio stored locally is played to the echo server and then recorded back from the echo server.
4. The recorded audio is compared to the original audio available locally using the AQuA Command Line application (invoked, for example, from the same PHP script).
5. Once the percentage, MOS and PESQ values and the reasons for voice quality loss (when the -fau option is used) are received, they can be stored in a local database with the corresponding call and time stamp.
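The last step, storing results with a time stamp, might look like the sketch below. The source describes a PHP script; Python with the standard `sqlite3` module is used here only for illustration. The table name, column names and helper functions are all hypothetical, and the AQuA invocation and output parsing are deliberately left out because the source does not specify them.

```python
import sqlite3
from datetime import datetime, timezone


def open_log(path=":memory:"):
    """Open (or create) the results database; the schema is a hypothetical example."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS quality_log (
               termination TEXT NOT NULL,
               measured_at TEXT NOT NULL,
               percent REAL, mos REAL, pesq REAL,
               reasons TEXT)"""
    )
    return db


def store_result(db, termination, percent, mos, pesq, reasons=""):
    """Insert one measurement, stamped with the current UTC time."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    db.execute(
        "INSERT INTO quality_log VALUES (?, ?, ?, ?, ?, ?)",
        (termination, stamp, percent, mos, pesq, reasons),
    )
    db.commit()


def worst_terminations(db, mos_threshold):
    """Return (termination, average MOS) pairs below the threshold, worst first."""
    rows = db.execute(
        "SELECT termination, AVG(mos) FROM quality_log "
        "GROUP BY termination HAVING AVG(mos) < ? ORDER BY AVG(mos)",
        (mos_threshold,),
    )
    return rows.fetchall()
```

Keeping one row per test call makes it straightforward to map quality values to call terminations later, for instance by querying for terminations whose average MOS falls below a chosen threshold.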
Within a couple of minutes, your Asterisk server gains a voice quality monitoring feature and the ability to log the reasons for voice quality loss alongside the percentage, MOS and PESQ values generated by AQuA. The AQuA manual contains all the information required to choose the right parameters for voice quality testing. It is available as a PDF document at http://www.sevana.fi/AQuA%20-%20Audio%20Quality%20Analyzer%202.1%20Manual.pdf and as an online software manual at http://www.sevana.fi/voice_quality_testing_online_manual_1.php.
Among the benefits of AQuA, one will appreciate that:
- AQuA is available as a server solution without any “per channel” limitations
- The AQuA license does not involve any annual royalty fee
- AQuA is available for all Linux systems and servers (32-bit and 64-bit)
- AQuA is easy to deploy and use in software product development
- AQuA provides perceptual estimation of audio quality and can be utilized in VoIP, PSTN, ISDN, GSM, CDMA networks and combinations of those
- AQuA is also available as a service