ASCIIMath creating images

Monday, March 20, 2017

Publication update

This blog has been basically been inactive since last October since being a PostDoc means that there are a whole bunch other things that I am busy with.  And of course, it was winter - that means statistically, there is always someone in the family who is sick (kids bring home every germ that is going around...).

View of Kiel. Source: Johannes Barre 2006, on Wikipedia
But it is spring now!  And I have just returned from DAGA 2017 in Kiel (where we found a very nice Thai restaurant) so time to update some of my work!

First off, my colleagues in Hannover published "Customized high performance low power processor for binaural speaker localization" at ICECS 2016 in Monte Carlo, Monaco (paper on IEEE Xplore), there was the Winter plenary of Hearing4all, and at DAGA 2017, I presented "Pitch features for low-complexity online speaker tracking", and Sarina (Ph.D. student I'm co-supervising) presented "A distance measure to combine monaural and binaural auditory cues for sound source segregation", both of which can be found on my homepage.  In the pipeline is now "Real-time Implementation of a GMM-based Binaural Localization Algorithm on a VLIW-SIMD Processor" by Christopher, which has been accepted and will be presented at ICME 2017 in Hong Kong in July, and I submitted a paper ("Segregation and Linking of Speech Glimpses in Multisource Scenarios on a Hearing Aid") to EUSIPCO 2017; that one is still in review.

I was also teaching a class in the past semester ("5.04.4223 Introduction into Music Information Retrieval") which, because it's a brand new class took a crazy amount of work to prepare for - but I think the students really enjoyed it, and I saw some very good code being written for the final project.

Now back to real work (writing more papers, that is)!  (Well, there's one or two topics I'll put on the blog in the next little while, too.  Later.)

Friday, March 17, 2017

What happened to my homepage? Or: why you should read emails from your hosting provider carefully...

Having had about a million things on my plate recently (WHEN was the last time I blogged anything?), I tend to prioritize emails - and I did kinda ignore an email sent out by Atlassian in January.  It meant to inform me that the personal homepages on BitBucket were moving from to (and switching to https in the process); a redirect from Jan 23rd on, then the old link would be dead after March 1st.

I noticed this of course only when I suddenly couldn't access my homepage. D'Oh!

I have now gone though most of the blog postings here to fix all the links.  If you find any that I missed (or other dead links) please let me know in the comments! Thanks!

Sunday, October 23, 2016

A (very) short trip to Korea

Panorama view from my room at the Nest Hotel in Incheon.  The Incheon Airport is visible on the right.
Earlier this month, I was in Incheon, South Korea, to present a talk at the Symposium for "Statistical physics, machine learning, and its application to speech and pattern recognition", organized by Prof. Kang-Hun Ahn as part of the Korea Institute for Advanced Study (KIAS).  I was specifically invited to give a talk there (along with Jörg Lücke and Steven van de Par). One does not refuse such an invitation, esp. as a post-doc trying to make an academic career happen.

As a conference, the event was very good, in both of scientific content and forging connections that hopefully can  continue in the future.  Many interesting discussions happened outside the sessions, too.
The location of the Symposium banquet. It was excellent.  Don't ask me for the name though; but I can give the coordinates: 37°25'53.1"N 126°25'27.0"E

The trip was bizarre for me for one reason, though.  It's the first time I traveled that far just to present at a conference.  Unfortunately, this symposium was scheduled not long after I had returned from Italy (where I was at MLSP 2016, coupled with a 1 week vacation), and the week before classes start here at the University of Oldenburg; so I had no time to do any sightseeing in Korea.  I literally arrived the day before the first day of the symposium, and left the morning after the last day.  Total time in Korea: about 66 hours. Total time flying there and back: 30 hours. We (myself, Jörg, and Steven) never went further than about 5 km from the Airport.

I certainly hope to go to Korea again, but then stay a little longer! There is so much to see, and I have friends in Japan I'd like to visit, too.  (I've been to Jeju before, so I know Korea can be very beautiful. Next time I'd like to bring the wife and kids along!)

Wednesday, September 14, 2016

MLSP2016 paper: Speaker Tracking for Hearing Aids

MLSP 2016 poster, the print version
can be found here.
Yesterday, I presented my poster at the 2016 IEEE International Workshop on Machine Learning for Signal processing. I think it was received pretty well, there were several people that talked to me, and we had very good discussions. The biggest problem (and typical for all poster sessions) was that there were other good posters being presented, and I couldn't really spend time talking to the other authors at that session. However, over the next few days I'll have a chance to chat with them, so it's all good.
The beach of the conference venue,
with view towards Salerno.
My own paper I'm presenting is entitled "Speaker Tracking for Hearing Aids", and it basically a method to link speech utterances spoken at different times by the same speaker, a classic problem also found in speaker diarization (but I don't need to do segmentation).  My method however is optimized for low computational complexity (for hearing aids), yet reaches comparable performance to typical far more complex methods.  You can find the abstract, paper, and poster (seen in the pic) on my homepage.
Overall, I like these small, highly focused conferences - and being in a nice sunny environment is not to be sneezed at either.

Wednesday, August 31, 2016

A library of Gammatone Filterbanks in Python

I have already programmed gammatone filterbanks several times in MATLAB, but in for some recent work I needed a specific one (the One-Zero Gammatone filterbank) [Katsiamis, 2006] in Python.
Gammatone impulse responses, in time domain

In addition, I wanted my "old" FIR GTFB (from my Ph.D. Thesis) to be in there as well as the GTFB of a former colleague here in Oldenburg [Chen, 2015].  So I packed them all up in a library - I also wanted to learn how to make a PyPI compatible Python library.

This is a work in progress. The gammatone filterbanks mentioned are there, but momentarily I don't have time to polish the code; furthermore, I'd like to eventually add the Slaney GTFB and the Hohmann GTFB - not to mention the structure needs more cleanup.  Consider this v0.01, for educational purposes. You can find it on github.

Thursday, August 11, 2016

Audio Resampling in Python

 (Note: this post was written as a Jupyter Notebook which can be found with the Python code at
One of the most basic tasks in audio processing anyone would need to do is resampling audio files; seems like the data you want to process is never sampled in the rate you want. 44.1k? 16k? 8k? Those are the common ones; there are some really odd ones out there.
Resampling is actually quite hard to get right. You need to properly choose your antialias filters, and write a interpolation/decimation procedure that won't introduce too much noise. There have been books written on this topic. For the most part, there is no single univeral way to to this right, since context matters.
For a more detailed discussion on sample rate conversion see the aptly named "Digital Audio Resampling Homepage" []. Then look at [] for a super informative comparison of a ton of resampling implementations. No seriously, go there. (It inspired me to compare the methods below using a sine sweep.)
MATLAB has the built-in resample function. Everyone uses it. It works, but also sucks in many aspects - a much better function is in the DSP toolbox, but as near as I can tell, noone uses it (even when copious other parts of the DSP toolbox are being used!)
But I'm not talking about MATLAB, I'm talking about Python here, and it doesn't have a default resampling method - except if you're doing any kind of numerical computing, you'll be using numpy and probably scipy --- and the latter has a built-in resample function.
To get ahead of myself a bit, scipy.signal.resample sucks for audio resampling. That becomes apparent quite quickly - it works in frequency domain, by basically truncation or zero-padding the signal in the frequency domain. This is quite ugly in time domain (especially since it assumes the signal to be circular).
There are two alternatives I want to point at that are "turn-key", that is, you just have to specify your resampling ratio, and the library does the rest. I know of a few others, but at the time of writing this they didn't seem as easy to use (eg. just doing the interpolation/decimation, but not calculating a filter.) If there are other notable libraries I should know of, please mention it in the comments of the blog version of this notebook [].
The first alternative is scikit.resample. Using it has two problems: it relies on an external (C-language) library called "Secret Rabbit Code" (SRC as in "sample rate conversion", geddit?) aka libsamplerate []. Not a problem in itself, but you need to install it and this step is not automated in pip and friends (as far as I know at time of writing). (Problem for me was that I needed to run it on a platform where I couldn't do this easily.) Problem 2: scikit.resample 0.3.3 is Python 2 only, and the Python 3 version is unofficial (0.4.0-dev, I used []).
The other one is resampy, which can be found in PyPI or []. It is easy to install and works with Python 3 (librosa now uses it as preferred resampler over libsamplerate)
TL;DR: if you can install external libs, use scikit.resample (0.4.0-dev). resampy is faster, but not as good, but entirely reasonable. (If MATLAB resample is good enough for you, resampy will serve you just fine and is easier to install)
Now for a comparison with pretty pictures.

Downsampling a sweep

A common task is to convert between the CD sampling rate of 44.1 kHz, and the multiples of 8kHz that originated from the telekon industry. In particular, 48kHz. (Supposedly the rate of 44.1k was chosen in part because is was difficult to convert to 48 kHz; but it is also connected to NTSC timing). So let's have a sweep sampled at 48 kHz. (See the Jupyter Notebook for the Python code)
Sweep 100-23900 Hz, sampled at 48 kHz.
Now convert it to 44.1 kHz using the different libraries, and plot the spectrograms.
Same sweep downsampled to 44.1 kHz using scikit.resample.
scikit.samplerate (or libsamplerate AKA "Secret Rabbit Code") does it well. Almost no aliasing, yet close to Nyquist, and noise levels in line with the input. Note that once the sweep is above Nyquist, the output is basically nil.
The same sweep but now downsampled using scipy.signal.resample.
And this is scipy.signal.resample. Throughout the entire signal there is a high-frequency tone, and there is considerable energy even if the sweep is above Nyquist. This is a result of the frequency-domain processing operating on the entire signal at one. This is NOT suited to audio signals.
The same sweep downsampled using resampy.
Finally, resampy. There are some funny nonlinear effects, but at very low level - for audio applications generally nothing to worry about. Even the aliasing is below -60 dB. So, while not as good as scikit.resample is is useable and certainly better than scipy.signal.resample!

Upsampling an impulse

A way to examine the antialias filter is to upsample an impulse. Here, note a frustrating aspect of all these differing libraries: the interface is completely different beween all of them; and to boot, resampy and scikit.resample give different length outputs for the same upsampling ratio. This means you can't just switch between them using a from foo import resample statement, you'll need a wrapper function if you want to switch freely amongst them (see librosa as an example).
>>> us1 = sk_samplerate.resample(impulse, P/Q, 'sinc_best')
>>> us2 = scipy_signal.resample(impulse, int(len(impulse)*P/Q))
>>> us3 = resampy.resample(impulse, Q, P)
>>> print(us1.shape, us2.shape, us3.shape)
(1087,) (1088,) (1088,)
Frequency response of an impulse upsampled from 44.1 kHz to 48 kHz
'scikit.resample' certainly uses the best antialias filter, with a steep slope, allowing very little energy over Nyquist. 'resampy' is a more gentle filter (lower order, perhaps?) but decent. 'scipy.signal.resample'... well, anyways.
What do the impulse responses look like in time domain?
Time-domain impulse upsampled from 44.1 kHz to 48 kHz
The same section of the impulse, but with samples converted into dB (10*log10 magnitude squared)
Unsurprisingly, 'resampy' is the most compact, which explains the gentler cut-off. 'scikit.resample' is wider, but acceptable. 'scipy.signal.resample' sticks out like a sore thumb.

Speed comparison

>>> %timeit sk_samplerate.resample(sig, Q/P, 'sinc_best')
10 loops, best of 3: 98.1 ms per loop
>>> %timeit scipy_signal.resample(np.float64(sig), int(len(sig)*Q/P))
100 loops, best of 3: 4.52 ms per loop
>>> %timeit resampy.resample(np.float64(sig), P, Q) 
10 loops, best of 3: 40.7 ms per loop

And here is a notable merit of 'scipy.signal.resample'. It is faster by an order of magnitude compared to the other methods. I suspect that if you make sure your signals are of length 2^N, you'll get even faster results, since it'll switch to a FFT instead of a DFT. The other two are probably losing some speed in the passing of data from Python to C - but fundamentally, frequency domain techniques do tend to be fast. So keep that in mind you you need speed.


I'll just repeat the above TL;DR: use scikit.resample (0.4.0-dev) if you can install libsamplerate in a sane place. Else, use resampy, which is faster, but not as good --- but entirely reasonable. Use 'scipy.signal.resample' only if you really need the speed, and be aware of its shortcomings (note the circular assumption!).

Monday, June 6, 2016

FFT-based Overlap-Add FIR filtering in Python

Here is a small Python function I've written (github), that might be useful if you're doing signal processing in Python.  The function implements the classic FFT-based Overlap-Add filtering, potentially saving a heck of a lot of processing time (assuming your filter is of sufficiently high order: usually about 128 taps).

For a thorough explanation of the algorithm, see the Wikipedia article.

I wrote this code as part of a lager ongoing project (gammatone filtering) that I will release eventually.  What I still have to add to this code is the ability to set and save state information (it's simply the part of the response cut off at the end of the function) and the ability to filter complex signals with complex filters (replacing rfft with fft).  Changes made in the latest version.