Better Spanish Emotion Recognition In-the-Wild: Bringing Attention to Deep Spectrum Voice Analysis
- Ortega-Beltrán, Elena
- Cabacas-Maso, Josep
- Benito-Altamirano, Ismael
- Ventura, Carles
ISSN: 0302-9743, 1611-3349
ISBN: 9783031915802, 9783031915819
Year of publication: 2025
Pages: 335-348
Type: Book chapter
Bibliographic References
- Abdollahi, H., Mahoor, M.H., Zandie, R., Siewierski, J., Qualls, S.H.: Artificial emotional intelligence in socially assistive robots for older adults: a pilot study. IEEE Trans. Affect. Comput. 14(3), 2020–2032 (2022). https://doi.org/10.1109/TAFFC.2022.3143803
- Amiriparian, S., et al.: Snore sound classification using image-based deep spectrum features. In: Interspeech 2017, pp. 3512–3516. ISCA (August 2017). https://doi.org/10.21437/Interspeech.2017-434
- Amiriparian, S., Gerczuk, M., Ottl, S., Schuller, B.: DeepSpectrum (2020). https://github.com/DeepSpectrum/DeepSpectrum. Accessed 30 July 2024
- Barra Chicote, R., et al.: Spanish expressive voices: corpus for emotion research in Spanish, pp. 60–70 (May 2008). http://www.lrec-conf.org/proceedings/lrec2008/
- Bestelmeyer, P.E.G., Kotz, S.A., Belin, P.: Effects of emotional valence and arousal on the voice perception network. Soc. Cogn. Affect. Neurosci. 12, 1351–1358 (2017). https://doi.org/10.1093/scan/nsx059
- Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W.F., Weiss, B., et al.: A database of German emotional speech. In: Interspeech, vol. 5, pp. 1517–1520 (2005)
- Busso, C., et al.: IEMOCAP: interactive emotional dyadic motion capture database. Lang. Resour. Eval. 42, 335–359 (2008)
- Ekman, P.: Emotional and conversational nonverbal signals. In: Language, knowledge, and representation: Proceedings of the Sixth International Colloquium on Cognitive Science (ICCS-99), pp. 39–50. Springer (2004)
- Garcia-Cuesta, E., Salvador, A.B., Páez, D.G.: EmoMatchSpanishDB: study of speech emotion recognition machine learning models in a new Spanish elicited database 83(5), 13093–1311. https://doi.org/10.1007/s11042-023-15959-w
- Górriz, M., Antony, J., McGuinness, K., Giró-i Nieto, X., O’Connor, N.E.: Assessing knee OA severity with CNN attention-based end-to-end architectures. In: International Conference on Medical Imaging with Deep Learning, pp. 197–214 (2019)
- Hortal, E., Brechard Alarcia, R.: Gantron: emotional speech synthesis with generative adversarial networks (October 2021)
- Jaafar, N., Lachiri, Z.: Stress recognition from speech by combining image-based deep spectrum and text-based features. In: 2022 IEEE Information Technologies & Smart Industrial Systems (ITSIS), pp. 1–6. IEEE (2022)
- Kerkeni, L., Serrestou, Y., Mbarki, M., Raoof, K., Mahjoub, M.A.: Speech emotion recognition: methods and cases study. In: Proceedings of the 10th International Conference on Agents and Artificial Intelligence, pp. 175–182. SCITEPRESS - Science and Technology Publications. https://doi.org/10.5220/0006611601750182
- Lin, C.H., Liao, W.K., Hsieh, W.C., Liao, W.J., Wang, J.C.: Emotion identification using extremely low frequency components of speech feature contours. Sci. World J. 2014(1), 757121 (2014)
- Martin, O., Kotsia, I., Macq, B., Pitas, I.: The eNTERFACE’05 audio-visual emotion database, p. 8. IEEE (2006). https://doi.org/10.1109/ICDEW.2006.145
- Preston, S.H., Vierboom, Y.C.: The changing age distribution of the United States. Popul. Dev. Rev. 47(2), 527–539 (2021)
- Ringeval, F., Sonderegger, A., Sauer, J., Lalanne, D.: Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions. In: 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), pp. 1–8. IEEE (2013)
- Schuller, B.W., et al.: The ACM Multimedia 2023 Computational Paralinguistics Challenge: Emotion Share & Requests. In: Proceedings of the 31st ACM International Conference on Multimedia, pp. 9635–9639 (2023)
- Šídlo, L., Šprocha, B., Durček, P.: A retrospective and prospective view of current and future population ageing in the European Union 28 countries. Morav. Geogr. Rep. 28(3), 187–207 (2020)
- Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
- Vary, P., Martin, R.: Digital Speech Transmission and Enhancement. IEEE Press, Wiley, Hoboken (2023). https://books.google.es/books?id=3rDiEAAAQBAJ
- Wu, S., Falk, T.H., Chan, W.Y.: Automatic speech emotion recognition using modulation spectral features 53(5), 768–785. https://doi.org/10.1016/j.specom.2010.08.013
- Yu, C., Sommerlad, A., Sakure, L., Livingston, G.: Socially assistive robots for people with dementia: systematic review and meta-analysis of feasibility, acceptability and the effect on cognition, neuropsychiatric symptoms and quality of life. Ageing Res. Rev. 78, 101633 (2022)
- Zadeh, A.B., Cao, Y., Hessner, S., Liang, P.P., Poria, S., Morency, L.P.: CMU-MOSEAS: a multimodal language dataset for Spanish, Portuguese, German and French, pp. 1801–1812. Association for Computational Linguistics (2020). https://doi.org/10.18653/v1/2020.emnlp-main.141