“Sizing up” emotional responses

Here is a copy of a recent theoretical paper I wrote on the importance of “sizing up” emotional responses (musical and otherwise).  A formatted version can be obtained from the 2016 ICMPC Conference website.

Listener-normalized musical affect: conceptualizing “felt” musical emotions

Joseph Plazak
School of Music, Illinois Wesleyan University, U.S.A.


One commonly addressed distinction within the literature on musical affect is the difference between perceived and felt emotional experiences. This paper presents a conceptual framework for understanding how felt musical emotions differ from perceived musical emotions. Theories of perceived musical emotion often assume that musical sounds are normalized relative to their sound source. In this way, perceived emotions are often “externally” normalized relative to the capacities of a given instrument or ensemble. Felt musical emotions are, by definition, “internal” to the listener, and therefore, our understanding of these emotions might benefit from a process of “listener-normalization.” By employing “psychomechanical” attributes of sound, such as perceived sound source size and/or perceived sound source energy, it is possible to normalize sounds relative to the receiver/listener. When listeners normalize sounds relative to themselves, they may be able to determine their ability to “cope” with the size and energy of the sound source, and thereafter, activate proper “felt” emotional responses. This paper addresses current issues relating to our understanding of induced musical emotion, and in particular, details how the process of listener-normalization might explain some variability within experimental research on “felt” musical emotions.


There are a number of complexities concerning the ability of music to communicate emotional information, but one commonly made distinction pertains to the difference between “perceived” and “felt” affective communication. Since Gabrielsson (2002) reinvigorated the question of “internal locus of emotion” vs. “external locus of emotion” within musical communication, many studies have investigated this distinction (Schubert, 2013). The purpose of the present paper is to discuss how the process of listener-normalization might explain some variability within experimental research on “felt” musical emotion, and to encourage further research via this particular conceptualization.


Perceived/expressed musical emotions are those that can successfully communicate emotional information to a receiver/listener, without necessarily inducing these emotions. Normalization plays an important role in the successful communication of such perceived emotions. For example, when perceiving a sad musical passage performed on a given instrument, such as the violin, one might claim that the music sounds sad because it is low in pitch, utilizes a dark timbre, and/or is generally quiet throughout (see Juslin & Timmers, 2010 for a summary of communicative codes in music). It is important to note that such claims typically assume the normalization and relativity of musical cues. In the case of the sad violin example, the characteristics of sad music are normalized relative to the capacities of the violin, and thus we might more accurately claim that a given passage is relatively low in pitch “for a violin,” relatively dark in timbre “for a violin,” and relatively quiet “for a violin.” By definition, perceived emotions (with the possible exception of self-perception) are heard relative to an “external” sound source, and therefore, sound-source normalization is useful for perceiving variations of expressive intent.

Within the literature on affective speech, it is common to explicitly normalize cues relative to the sound source. For example, in a study that analyzed the acoustic features of confidence in speech, Jiang & Pell (2015) normalized the f0 contours of each stimulus relative to the mean of each speaker’s minimum frequency that was spoken within a neutral statement. In studies on pitch perception in music, sound-source normalization is sometimes ignored, thus leading to puzzling associations across the literature. For example, Juslin & Timmers’s (2010) review of associations between affect and musical structures revealed that “High Pitch Level” has been previously associated with a diverse range of perceived emotional information, including happiness, pleading, anger, and fear. With such a wide range of pitch associations, a standardized source-normalization procedure, much like those found in the affective speech literature, might be beneficial for understanding how the abovementioned associations differ. By normalizing musical pitch relative to a sound source, we might more accurately capture interactions between pitch and timbre (i.e. register), an interaction known to result in different perceived emotional information (Huron et al., 2006).
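As a rough illustration of sound-source normalization, the sketch below expresses pitch in semitones relative to a reference frequency chosen for each source. This is only one plausible way to normalize pitch; it is not the procedure used by Jiang & Pell (2015), whose formula is not given here. The function name and the reference values (approximate lowest notes of a violin and a double bass) are assumptions made for the example.

```python
import math

def semitones_re_reference(f0_hz, reference_hz):
    """Express f0 values (in Hz) in semitones above a per-source reference frequency."""
    if reference_hz <= 0 or any(f <= 0 for f in f0_hz):
        raise ValueError("frequencies must be positive")
    return [12.0 * math.log2(f / reference_hz) for f in f0_hz]

# Assumed reference values for illustration only (approximate lowest notes).
VIOLIN_REFERENCE_HZ = 196.0      # roughly G3
DOUBLE_BASS_REFERENCE_HZ = 41.2  # roughly E1

# The same absolute pitch (220 Hz) heard relative to two different sources.
passage_hz = [220.0]
relative_to_violin = semitones_re_reference(passage_hz, VIOLIN_REFERENCE_HZ)
relative_to_bass = semitones_re_reference(passage_hz, DOUBLE_BASS_REFERENCE_HZ)

print("re violin:", round(relative_to_violin[0], 1), "semitones")
print("re double bass:", round(relative_to_bass[0], 1), "semitones")
```

Under these assumed references, the same 220 Hz tone sits near the bottom of the violin’s range but well above the bottom of the double bass’s range, which is the sense in which a pitch can be low “for a violin” without being low in absolute terms.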


When discussing felt/induced musical emotion, the role of normalization might be considered more abstract. Because felt emotional responses are “internal” by definition, it might be useful to consider normalizing felt emotional responses relative to the receiver (i.e. listener) rather than the sound source.

Normalizing a set of musical cues against a listener might initially seem problematic. While we can think of an instrument as being relatively low in pitch or dark in timbre, we typically do not describe ourselves in this same way. Most musical descriptors are not particularly apt for also describing listeners. Therefore, in order to normalize musical sounds relative to the listener, we will need to examine a unique set of sound descriptors.

Beyond acoustic and musical descriptors of sound, psychophysical and psychomechanical descriptors of sound (McAdams et al., 2004; Stoelinga, 2009) may prove to be useful for normalizing sound stimuli relative to a listener. Psychomechanical attributes of sound include characteristics that can be inferred about a source from sound cues alone, such as perceived sound source size, perceived sound source energy, sound source material, sound source proximity, etc. One unique feature of these cues is that they are common to both the sound source and the listener; both the source and the receiver will have a physical size, amount of energy, physical location, etc. For example, via psychomechanical sound descriptors, it is possible to assess if a given sound source is much smaller or much larger than a particular listener, as might be the case when comparing performances of ukulele music with pipe organ music. The former sound source is likely to be relatively smaller than the listener, whereas the latter sound source is likely to be much bigger. A similar comparison could be made between an energetic sound source and a lethargic sound receiver.

The conceptual framework brought forward here is that induced musical emotions stem from cues that are normalized relative to a given musical receiver (i.e. listener), and perceived musical emotions stem from cues that are normalized to a given musical sender (i.e. instrument or ensemble). In many ways, this idea represents what is already inherent within the connotative meaning of the words “perceived” and “felt.” Perceived emotion, by definition, implies a sense of outwardness, especially with regards to perceiving emotion within music, whereas felt emotion, on the other hand, implies a sense of inwardness. Therefore, the process of normalization may play an important role towards understanding the difference between these two emotional loci.


In order to illustrate the utility of “double-normalizing” affective musical communication, that is, normalizing cues relative to both the sound source (perceived emotion) and the sound receiver (felt emotion), consider the various ways in which one might respond to the sound of a human vocalizing an angry growl.

Being able to perceive that a sound source is expressing anger involves multiple cues, such as sharp amplitude envelopes (Juslin, 1997), lower f0 (Morton, 1977), large f0 range (Williams & Stevens, 1972), etc. Recall that these cues are not absolute, but instead perceived relative to a given sound source. Without the ability to normalize the pitch and intensity ranges of the sound source, it might not be possible to understand how we perceive expressive intent. Normalization facilitates the ability to hear sounds as relatively low and/or relatively sharp, and thereafter “perceive” that a given sound source may be angry.

Beyond sound source-normalization, listener-normalization may provide insight into plausible induced emotional responses to this angry vocalization. For simplicity, a singular psychomechanical property will be used in this example: sound source size. Assume that the listener not only perceives that the sound source is angry (via the cues mentioned above), but also simultaneously perceives the physical size of the sound source (see Patterson et al., 2008 for an explanation of auditory size perception). Here, we will examine three source size exemplars: same source size (an angry adult), much smaller source size (an angry dwarf), and much larger source size (an angry 10-foot-tall giant). In all three cases, the “perceived” emotional message is the same (i.e. anger), yet through the process of listener-normalization, the “felt” emotional response to the angry vocalization could be manifested in many different ways.

Hypothetically, if an angry vocalization was produced by a sound source of approximately the same size as the listener, the felt affective responses might be expected to vary widely, ranging from indifference to fear. In the second example, in which the angry vocalization was produced from a much smaller dwarf-like sound source, different induced responses might be expected, such as laughter, ridicule, or compassion. In the final example, in which the angry vocalization was produced by a much larger giant-like sound source, one might expect induced responses such as terror or panic. These hypothetical responses are closely related to the sound receiver’s ability to cope with the angry sound source.

All of the potential induced responses in the example above are theoretically derived from normalizing a single psychomechanical source cue relative to the listener. However, many concurrent psychomechanical cues should be expected to play a role in induced emotional responses (e.g. source energy, source proximity, etc.). Sound source size is therefore but one of many descriptors that may prove to be useful in understanding felt emotional responses.
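The single-cue comparison in the three size cases can be sketched numerically. The example below is a hypothetical illustration, not a model proposed in this paper: it expresses perceived source size as a log ratio relative to the listener and labels the source as smaller, similar, or larger. The function names, the heights, and the tolerance threshold are all assumptions, and the code makes no attempt to predict the felt response itself.

```python
import math

def log_size_ratio(source_size_m, listener_size_m):
    """Return log2(source / listener): 0 means equal size, +1 means twice as large."""
    if source_size_m <= 0 or listener_size_m <= 0:
        raise ValueError("sizes must be positive")
    return math.log2(source_size_m / listener_size_m)

def size_relation(source_size_m, listener_size_m, tolerance=0.25):
    """Label the source as 'smaller', 'similar', or 'larger' than the listener.

    The tolerance (in log2 units) is an arbitrary threshold chosen for illustration.
    """
    ratio = log_size_ratio(source_size_m, listener_size_m)
    if ratio < -tolerance:
        return "smaller"
    if ratio > tolerance:
        return "larger"
    return "similar"

# Hypothetical heights in meters; 10 feet is 3.048 m.
listener_m = 1.7
sources_m = {
    "same-size source": 1.75,
    "much smaller source": 0.9,
    "10-foot giant": 3.048,
}

for name, size_m in sources_m.items():
    print(name, size_relation(size_m, listener_m))
```

The same source sizes would be labeled differently for a much smaller listener, such as an infant, which is the point of normalizing to the receiver rather than to the source.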


The utility of listener-normalized musical affect is largely unknown. In order to develop a complete listener-normalized music perception framework, more research is needed on the perception of psychomechanical aspects of music. By better understanding the perception of sound source attributes within music, more meaningful conclusions may be drawn. In this brief paper, the role of listener-normalized affect has only been discussed in relation to isolated sounds (some of which were musical). Actual music making typically involves constant changes of psychomechanical attributes (energy, proximity, etc.), and the combination of these changing attributes results in a very complex sound source, perhaps even the most complex sound source regularly encountered in everyday life.

Despite limited knowledge on this topic, we can still speculate about the implications of listener-normalized musical affect, both within and beyond music perception. From a developmental perspective, listener-normalized emotional responses might be useful for understanding how music perception changes from infancy through adulthood. To an infant, we should expect that the world “sounds much bigger” than it does to an adult listener. A recent study by Masapollo, Polka, & Ménard (2015) found that infants preferred listening to synthesized speech with infant vocal properties relative to adult vocal properties. In this case, listener-normalization (i.e. normalizing sounds to an infant listener) provides an interesting conceptual lens for examining these experimental results. From a theoretical perspective, listener-normalized musical affect might be useful for developing a “scaled” theory of emotional responses, such that normalizing psychomechanical musical features to a specific listener might provide insight into how and why that listener prefers certain types of music for certain purposes.

According to Schubert (2013), those most interested in comparing perceived and felt emotional communication tend to be engaged in music perception research. However, the implications for understanding induced emotional responses extend far beyond music into realms such as Human-Computer Interaction (HCI), comparative psychology, cross-species communication, and potentially even psychological disorders that involve abnormal self-perceptions. It is my hope that researchers investigating emotional responses to music will consider the possibilities afforded by listener-normalization. This concept may be of use not only for future research on induced emotional responses, but also as a meta-analysis tool for investigating variability in previously published work on musical emotions.


The author wishes to thank Illinois Wesleyan University for supporting a junior faculty research leave proposal that facilitated this research. The author also wishes to thank Stephen McAdams, Marc Pell, David Huron, and Zachary Silver for their support, feedback, and suggestions during this leave.


Gabrielsson, A. (2002). Emotion perceived and emotion felt: Same or different? Musicae Scientiae, 5(1 suppl), 123-147.

Huron, D., Kinney, D., & Precoda, K. (2006). Influence of pitch height on the perception of submissiveness and threat in musical passages. Empirical Musicology Review, 1(3), 170-177.

Jiang, X., & Pell, M. D. (2015). On how the brain decodes vocal cues about speaker confidence. Cortex, 66, 9-34.

Juslin, P. N. (1997). Perceived emotional expression in synthesized performances of a short melody: Capturing the listener’s judgment policy. Musicae Scientiae, 1(2), 225-256.

Juslin, P. N., & Timmers, R. (2010). Expression and communication of emotion in music performance. Handbook of music and emotion: Theory, research, applications, 453-489.

McAdams, S., Chaigne, A., & Roussarie, V. (2004). The psychomechanics of simulated sound sources: Material properties of impacted bars. The Journal of the Acoustical Society of America, 115(3), 1306-1320.

Patterson, R. D., Smith, D. R., van Dinther, R., & Walters, T. C. (2008). Size information in the production and perception of communication sounds. In Auditory perception of sound sources (pp. 43-75). Springer US.

Masapollo, M., Polka, L., & Ménard, L. (2015). When infants talk, infants listen: pre‐babbling infants prefer listening to speech with infant vocal properties. Developmental Science, 19(2), 318-328.

Morton, E. S. (1977). On the occurrence and significance of motivation-structural rules in some bird and mammal sounds. American Naturalist, 111(981), 855-869.

Schubert, E. (2013). Emotion felt by the listener and expressed by the music: literature review and theoretical perspectives. Frontiers in Psychology, 4:837.

Stoelinga, C. (2009). A psychomechanical study of rolling sounds. VDM Publishing.

Williams, C. E., & Stevens, K. N. (1972). Emotions and speech: Some acoustical correlates. The Journal of the Acoustical Society of America, 52(4B), 1238-1250.

Please cite this paper as:  Plazak, J. (2016). Listener-normalized musical affect: conceptualizing “felt” musical emotions. In V. Johnson, D. Zhao, H. Conrad (editors), “Proceedings of the 14th International Conference for Music Perception and Cognition.” pp. 304-306.