Audiovisual classification of vocal outbursts in human conversation using long-short-term memory networks



Eyben, Florian and Petridis, Stavros and Schuller, Björn and Tzimiropoulos, Georgios and Zafeiriou, Stefanos and Pantic, Maja (2011) Audiovisual classification of vocal outbursts in human conversation using long-short-term memory networks. In: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2011, 22-27 May 2011, Prague, Czech Republic.

PDF (225Kb), restricted to UT campus only
Abstract: We investigate the classification of non-linguistic vocalisations with a novel audiovisual approach, using Long Short-Term Memory (LSTM) Recurrent Neural Networks as highly successful dynamic sequence classifiers. The evaluation database is the Audiovisual Interest Corpus of natural human-to-human conversation, as used in this year's Paralinguistic Challenge. For video-based analysis we compare shape-based and appearance-based features, which are fused with typical audio descriptors at the feature level (early fusion). The results show significant improvements of LSTM networks over a static approach based on Support Vector Machines. More importantly, we show a significant gain in performance when fusing audio and visual shape features.
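As an illustration of the early fusion mentioned in the abstract, the following minimal sketch concatenates frame-aligned audio and visual feature vectors into one vector per frame, which a sequence classifier such as an LSTM could then consume. The function name `early_fusion`, the toy feature values, and the assumption that both streams are already aligned to the same frame rate are hypothetical and not taken from the paper.

```python
def early_fusion(audio_frames, visual_frames):
    """Concatenate per-frame audio and visual features (feature-level fusion).

    Assumes both streams are already aligned frame by frame; this is an
    illustrative assumption, not the paper's documented procedure.
    """
    if len(audio_frames) != len(visual_frames):
        raise ValueError("audio and visual streams must have the same number of frames")
    # Each fused frame is the audio vector followed by the visual vector.
    return [list(a) + list(v) for a, v in zip(audio_frames, visual_frames)]


# Toy example: 3 frames, 2 audio features and 3 visual shape features per frame.
audio = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
visual = [[1.0, 1.1, 1.2], [2.0, 2.1, 2.2], [3.0, 3.1, 3.2]]
fused = early_fusion(audio, visual)
```

Each fused frame here has 5 dimensions (2 audio plus 3 visual). In contrast to late fusion, which combines the decisions of separate per-modality classifiers, a single classifier sees both modalities at once.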
Item Type: Conference or Workshop Item
Copyright: © 2011 IEEE
Faculty: Electrical Engineering, Mathematics and Computer Science (EEMCS)
Link to this item: http://purl.utwente.nl/publications/79507
Official URL: http://dx.doi.org/10.1109/ICASSP.2011.5947690