Audiovisual classification of vocal outbursts in human conversation using long-short-term memory networks
Eyben, Florian and Petridis, Stavros and Schuller, Björn and Tzimiropoulos, Georgios and Zafeiriou, Stefanos and Pantic, Maja (2011) Audiovisual classification of vocal outbursts in human conversation using long-short-term memory networks. In: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2011, 22-27 May 2011, Prague, Czech Republic.
| Abstract: | We investigate the classification of non-linguistic vocalisations with a novel audiovisual approach, using Long Short-Term Memory (LSTM) Recurrent Neural Networks, which are highly successful dynamic sequence classifiers. The evaluation database is the Audiovisual Interest Corpus of natural human-to-human conversation from this year's Paralinguistic Challenge. For video-based analysis we compare shape-based and appearance-based features. These are combined with typical audio descriptors by early fusion. The results show significant improvements of LSTM networks over a static approach based on Support Vector Machines. More importantly, we show a significant gain in performance when fusing audio and visual shape features. |
| Item Type: | Conference or Workshop Item |
| Copyright: | © 2011 IEEE |
| Faculty: | Electrical Engineering, Mathematics and Computer Science (EEMCS) |
| Link to this item: | http://purl.utwente.nl/publications/79507 |
| Official URL: | http://dx.doi.org/10.1109/ICASSP.2011.5947690 |